
Created by Zed A. Shaw Updated 2024-10-08 04:45:56

04: Basic Repetition

You are now about to learn the operators that make regex shine. These operators tell grep to repeatedly match characters in a pattern. It's how you make grep find lines with long repetition or big gaps in the line. When you combine these with the operators you've learned so far you can do about 80-90% of what you need from a regex.

We Need a Better Poem

Let's change the poem to a new one just for practice. Create a text file named ex04.txt and put it in your Documents folder just like you did with ex02.txt. Refer back to Exercise 02 if you forgot how to do that.

Here's the new poem:

I led you astray with promises
once burnished and unspoken
yet without you here I apprehend
  unyielding vigilance
for everything I may have broken.

Be sure to get those two space characters at the start of line 4. They totally make the poem.

One or Zero

The easiest form of repetition is to match "zero or one" of something with the ? (question-mark) operator. You can also call this the "optional" operator, which is more how I think about it. When regex sees this it looks at the character or operator right before it, and if that shows up 1 or 0 times then the ? also matches.

Let's match any lines that have the letter o with an optional n after it:

$ grep "on?" ex04.txt
I led you astray with promises
once burnished and unspoken
yet without you here I apprehend
for everything I may have broken.

Notice how it's matching the on in "once" and also the lone o in "unspoken"? Because the n is optional, any line with an o in it will match. Look at the other result lines and confirm you know why each one matched.
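If you want to see exactly which characters the ? grabbed, here's a small sketch using the -o option, which prints only the matched text instead of the whole line. It assumes your grep supports -E and -o (GNU grep, BSD grep, and ugrep do). The -E flag asks for extended regex so that ? is treated as an operator; you may not need it if your grep is really ugrep. The three sample words are just for illustration.

```shell
# Print only the matched text (-o) for "o with an optional n".
# -E turns on extended regex so ? is an operator, not a literal character.
matches=$(printf 'once\nunspoken\nvigilance\n' | grep -oE "on?")
echo "$matches"
```

The first word gives you on because the optional n was there, the second gives you a lone o because it wasn't, and the third has no o so it prints nothing.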

Zero or More

The ? is nice, but it'd be better if we could match many characters or operators. The * (asterisk) will match zero or more of the previous character or operator. That means if you write p* it will match zero p characters, one p character, or one million p characters.
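Here's a quick sketch of "zero or more" on some made-up input before you try it on the poem. It assumes a grep with -E and -o, and the sample lines are invented just to show the different counts of p.

```shell
# "ap*" is an a followed by zero or more p characters.
# -o prints only the matched text so you can see how many p's were taken.
matches=$(printf 'a\nap\nappp\nbbb\n' | grep -oE "ap*")
echo "$matches"
```

The a with no p still matches (that's the "zero" part), and the line with no a at all is skipped.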

Try this regex on ex04.txt:

$ grep "ap*" ex04.txt
I led you astray with promises
once burnished and unspoken
yet without you here I apprehend
  unyielding vigilance
for everything I may have broken.

Fairly boring, since every line has an a or an ap, but now try this one:

$ grep "p.*" ex04.txt
I led you astray with promises
once burnished and unspoken
yet without you here I apprehend

See how combining the . (dot) with * (asterisk) matches a much larger piece of each line? This tells regex to match "zero or more of any character." In this case the regex is matching "p followed by zero or more of any character."
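To see just how much of each line the .* swallows, this sketch runs the same regex with -o so only the matched piece is printed. It copies the poem into a temporary file (an assumption made so the example runs anywhere without touching your ex04.txt) and assumes your grep supports -o.

```shell
# Write the poem to a temporary file so this sketch doesn't touch your ex04.txt.
poem=$(mktemp)
cat > "$poem" <<'EOF'
I led you astray with promises
once burnished and unspoken
yet without you here I apprehend
  unyielding vigilance
for everything I may have broken.
EOF

# -o prints only the part of each line that "p.*" matched.
matches=$(grep -o "p.*" "$poem")
echo "$matches"
rm -f "$poem"
```

Each result starts at the first p on the line and runs all the way to the end of the line, because .* keeps taking characters for as long as it can.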

Remember that each operator is really a whole operation or phrase, so we can break this regex down to come up with this translation:

  1. p is "the p character"
  2. . is "any character"
  3. * is "zero or more of the previous character or operator"

Which becomes, "p character followed by zero or more of any character." You could also write it as, "p character followed by any character (zero or more)."

We call this "repetition" in regex because you're telling the regex engine inside grep to "repeat the last character or operator zero or more times."

One or More

It's puzzle time again. If the * is how you repeat the last regex "zero or more times" then you should be able to figure out the +, which means "one or more times." Using this knowledge, how would you write a regex that finds only lines with "y followed by one or more o characters"?

See the Further Study section for the answer.

Terminal/Shell Regex Is Different

If you're experienced with using the Terminal (shell) then you may notice that the grep regex is different from the shell "regex" (usually called glob patterns). That's because the shell has to make it easy for you to match file names, which are full of characters like . in their extensions, so it lets you do ls *.txt to mean "anything ending in .txt." In the shell the * stands for any run of characters all by itself and the . is just a plain dot. Most shell patterns also predate the widespread adoption of regex, so they're a little off.
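Here's a rough side-by-side of the two styles, picking the same files once with a shell pattern and once with a grep regex. The file names and the scratch folder are invented for the example. The regex side also uses \. (a literal dot) and $ (end of line), which this exercise hasn't explained, so treat those as assumptions about what you already know.

```shell
# Make a scratch folder with a few made-up file names.
dir=$(mktemp -d)
touch "$dir/ex02.txt" "$dir/ex04.txt" "$dir/notes.md"

# Shell pattern: * means "any run of characters" and . is a plain dot.
glob_hits=$(cd "$dir" && ls *.txt)

# Regex version of the same idea: escape the dot, then anchor to end of line.
regex_hits=$(ls "$dir" | grep -E '\.txt$')

echo "$glob_hits"
echo "$regex_hits"
rm -rf "$dir"
```

Both commands list the same two .txt files, but notice how the * does a different job in each one.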

Operators So Far

As with the previous exercise, here's what should be on your flash cards:

Be sure to update your flash cards with the three new operators, and do a bit of study before and after each exercise. I promise this will help internalize what these symbols actually mean faster.

Further Study

  1. Answer: grep "yo+" ex04.txt
  2. Spend as much time as you can writing more and more complex regex on ex04.txt.
  3. Find another text file you have and use grep (ugrep) on it.
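If you want to check the answer from item 1 and see why + is not the same as *, this sketch counts matching lines for both. It copies the poem into a temporary file (an assumption so it runs without your ex04.txt) and uses -c to count matching lines and -E for extended regex, which you may not need if your grep is ugrep.

```shell
# Write the poem to a temporary file so this sketch doesn't touch your ex04.txt.
poem=$(mktemp)
cat > "$poem" <<'EOF'
I led you astray with promises
once burnished and unspoken
yet without you here I apprehend
  unyielding vigilance
for everything I may have broken.
EOF

# -c prints how many lines matched instead of the lines themselves.
plus_lines=$(grep -cE "yo+" "$poem")   # y followed by at least one o
star_lines=$(grep -cE "yo*" "$poem")   # y followed by zero or more o
echo "yo+ matched $plus_lines lines"
echo "yo* matched $star_lines lines"
rm -f "$poem"
```

The + version only finds the two lines with "you" in them, while the * version also matches any line with a bare y, like the one with "unyielding."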