Created by Zed A. Shaw Updated 2024-10-08 04:45:56

05: Character Sets

In this exercise you'll learn to match sets of characters rather than just any character or a single character.

What is a Set?

A "set" is a list of characters, written in any order, where the regex matches any one of them. The advantage of a set is you can give a range of values to match, which makes it easier to write complex regex. For example, if you wanted to match only the lowercase letters of the alphabet you could write [a-z] which I read as "match 'a' through 'z'."

Sets in Regex

We now get into "operators" that are more complex than one character. Other operators like * do a lot with just the one character, but for a set to work you need the [ (left-bracket), the ] (right-bracket) and then some contents. Let's break down the set I mentioned to match the alphabet [a-z]: the [ starts the set, the a-z is a range that covers 'a' through 'z', and the ] ends the set.

When you write this regex it means that grep will match one character that is specified in the set. It does not mean to match multiple characters that can be found in the set. You need more regex to do that, which I'll cover in a bit.
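
To see the "one character" rule in action, here's a minimal sketch. It assumes your grep supports the -o option (GNU and BSD grep both do), which prints each match on its own line. The input string is just something I made up, and LC_ALL=C is there to keep the a-z range to plain lowercase ASCII.

```shell
# Made-up input: two lowercase letters, then two digits.
# -o prints every match on its own line, so each match is visible.
matches=$(printf 'ab12\n' | LC_ALL=C grep -o '[a-z]')
printf '%s\n' "$matches"
```

You should see a and b on separate lines, because each match is exactly one character from the set.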

Try a few of these commands on the last poem:

$ grep "^[A-Z]" ex04.txt
I led you astray with promises
$ grep "^[a-z]" ex04.txt
once burnished and unspoken
yet without you here I apprehend
for everything I may have broken.

NOTE: Remember that the ^ in this case means "anchor to the start" and not how we use it next.

Inverted Sets

You can also add a ^ (caret) to the start of a regex set to say "do not match the set." This is an inverted set where you're saying you do not want these characters. We can add a ^ (caret) to the previous example like this: [^a-z].

When we do this the regex will match any character that's not in the alphabet a-z. So it will match an @ in an email address but not any of the lowercase letters.
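
Here's a quick sketch of that email idea, assuming a grep with the -o option and a made-up address. LC_ALL=C keeps a-z to plain lowercase ASCII.

```shell
# Made-up email address for illustration.
# [^a-z] matches one character that is NOT a lowercase letter.
not_letters=$(printf 'zed@example.com\n' | LC_ALL=C grep -o '[^a-z]')
printf '%s\n' "$not_letters"
```

Only the @ and the . should come out, each on its own line, since every other character is in a-z.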

Try some of these commands on the last poem:

$ grep "[j-z]$" ex04.txt
I led you astray with promises
once burnished and unspoken
$ grep "[^j-z]$" ex04.txt
yet without you here I apprehend
  unyielding vigilance
for everything I may have broken.

In this example I use a normal set, then have you compare it to the inverted version.

Escaping Inside Sets

There may be situations where you have to match the [, -, and ] characters inside the set. To do that just use the \ (backslash) character to make the regex explicitly escape them. You would do it like this [\[\-\]] which would match any of the characters a regex uses for sets. Let's break this down to make sure you're reading each part as individual components instead of a big blob of confusing randomness: the first [ starts the set, \[ is a literal [, \- is a literal -, \] is a literal ], and the final ] ends the set.

One way you can think of the \ (backslash) escape is it changes an operator into a normal character. If you use it on \* then you change the * operator to just the * character. Another way to think of \ is it "kills" the next character, turning it into just a boring dead character instead of an alive active operator.
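
One thing to watch for when you try this in grep: POSIX-style sets treat a backslash inside [ ] as a plain character, so the backslash form is for engines that honor escapes inside sets, which most programming language regex libraries do. Here's a hedged sketch of both ideas with made-up input. The first command uses \* outside a set, and the second shows the POSIX way to put ], [, and - in a set by position instead of escaping. It assumes a grep with the -o option.

```shell
# Made-up input lines for illustration.
# Outside a set, \* turns the * operator into a plain * character.
star_line=$(printf 'a*b\naab\n' | grep 'a\*b')
printf '%s\n' "$star_line"

# POSIX sets: ] goes first and - goes last to be taken literally.
set_chars=$(printf 'x[y-z]\n' | grep -o '[][-]')
printf '%s\n' "$set_chars"
```

The first command should print only the a*b line, and the second should print [, -, and ] on separate lines.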

Combinations

A fundamental aspect of computation is combination. Everything you learn is usually designed to be combined with everything else you've learned. In fact, it's so common that if you run into something that can't be combined you'll think it's bizarre. This doesn't mean that every combination you can think of will work, but if you combine two things correctly according to the rules then they should work.

To practice this I want you to take what you know about sets, inverted sets, and everything you've learned so far to search through the two poems for different lines. See how complex you can make the combinations and still have them work. For example, what does this do:

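$ grep -E "[aeiou][aeiou]+" ex04.txt

Here I combined sets with the + (plus) operator to find sequences of 2 or more vowels. The -E option turns on extended regex so that plain grep treats + as an operator instead of a literal character. This is what you should be trying to create.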
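
Here's a small sketch of that same combination on a made-up word list, so you can see exactly which part of each line matches. It assumes a grep that supports -o (print only the match) and -E (extended regex, where + is an operator).

```shell
# Made-up word list for illustration.
# -E enables extended regex so + means "one or more of the previous item".
# -o prints only the matched part of each line.
vowel_runs=$(printf 'book\ncat\nqueue\n' | grep -oE '[aeiou][aeiou]+')
printf '%s\n' "$vowel_runs"
```

You should get oo from "book" and ueue from "queue", while "cat" is skipped because it never has two vowels in a row.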

Operators So Far

As usual, update your flash cards and keep drilling. I promise if you suffer through the pain of memorization you'll learn Regex faster than if you just flap around randomly.

You should also be combining all of these in as many ways as possible, and use grep on real data you have or that you can find.

Further Study
