04: Basic Repetition
You are now about to learn the operators that make regex shine. These operators tell grep to repeatedly match characters in a pattern. It's how you make grep find lines with long repetition or big gaps in the line. When you combine these with the operators you've learned so far, you can do about 80-90% of what you need from a regex.
We Need a Better Poem
Let's change the poem to a new one just for practice. Create a text file named ex04.txt and put it in your Documents folder just like you did with ex02.txt. Refer back to Exercise 02 if you forgot how to do that.
Here's the new poem:
I led you astray with promises
once burnished and unspoken
yet without you here I apprehend
  unyielding vigilance
for everything I may have broken.
Be sure to get those two space characters at the start of line 4. They totally make the poem.
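If you'd rather check your file than trust your eyes, here's a minimal sketch that writes the poem and counts the lines starting with two spaces. It uses a scratch directory from mktemp (my assumption, so nothing in your Documents folder gets overwritten) and the ^ anchor you already know.

```shell
# Scratch directory (hypothetical location) so your real ex04.txt is untouched.
workdir=$(mktemp -d)
poem="$workdir/ex04.txt"

# printf writes each argument on its own line; line 4 starts with two spaces.
printf '%s\n' \
  'I led you astray with promises' \
  'once burnished and unspoken' \
  'yet without you here I apprehend' \
  '  unyielding vigilance' \
  'for everything I may have broken.' > "$poem"

# ^ anchors the start of a line, so this counts lines that begin with two spaces.
indented=$(grep -c "^  " "$poem")
echo "lines starting with two spaces: $indented"   # prints 1
```

Run the same grep -c command on your own ex04.txt. If it prints 0, the two spaces got lost when you pasted the poem.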
One or Zero
The easiest form of repetition is to match "zero or one" characters with the ? (question-mark) operator. You can also call this the "optional" operator, which is more how I think about it. When regex sees this, it looks at the character or operator before it, and if there are 1 or 0 matches of that, then this also matches.
Let's match any line that has the letter o, optionally followed by an n:
$ grep "on?" ex04.txt
I led you astray with promises
once burnished and unspoken
yet without you here I apprehend
for everything I may have broken.
Notice how it's matching the on in "once", and also the o in "unspoken"? Since the n is optional, any line with an o matches. Look at the other result lines and confirm you know why each one matched.
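To see what "optional" really buys you, here's a small sketch that counts matching lines for three patterns and then prints each matched piece. I'm assuming your grep wants the -E flag for extended regex and supports -o (GNU and BSD grep both do); the temp file is just a hypothetical scratch copy of the poem.

```shell
# Scratch copy of the poem (hypothetical temp file).
poem=$(mktemp)
printf '%s\n' \
  'I led you astray with promises' \
  'once burnished and unspoken' \
  'yet without you here I apprehend' \
  '  unyielding vigilance' \
  'for everything I may have broken.' > "$poem"

# -E enables extended regex so ? is an operator; -c counts matching lines.
with_o=$(grep -Ec "o" "$poem")      # lines containing an o
with_on=$(grep -Ec "on" "$poem")    # lines containing o followed by n
with_opt=$(grep -Ec "on?" "$poem")  # lines containing o, with the n optional
echo "o: $with_o  on: $with_on  on?: $with_opt"   # o: 4  on: 1  on?: 4

# -o prints every matched piece on its own line instead of the whole line.
grep -Eo "on?" "$poem"
```

The counts show that on? selects the same lines as a plain o. The difference is in what gets matched: the -o output has one "on" (from "once") and a bare "o" everywhere else.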
Zero or More
The ? is nice, but it'd be better if we could match many characters or operators. The * (asterisk) will match zero or more of the previous character or operator. That means if you write p* it will match either zero p characters, or one million p characters.
Try this regex on ex04.txt:
$ grep "ap*" ex04.txt
I led you astray with promises
once burnished and unspoken
yet without you here I apprehend
  unyielding vigilance
for everything I may have broken.
Fairly boring, since every line has an a or an ap, but now try this one:
$ grep "p.*" ex04.txt
I led you astray with promises
once burnished and unspoken
yet without you here I apprehend
See how combining the . (dot) with * (asterisk) matches a much larger piece of each line? This tells regex to match "zero or more of any character." In this case the regex is matching "p followed by zero or more of any character."
Remember that each operator is really a whole operation or phrase, so we can break this regex down to come up with this translation:
p -- the p character
. -- any character
* -- zero or more of the previous
Which becomes, "p character followed by zero or more of any character." You could also write it as, "p character followed by any character (zero or more)."
We call this "repetition" in regex because you're telling the regex engine inside grep to "repeat the last character or operator zero or more times."
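Here's a quick sketch that prints only the matched pieces for both regexes from this section, so you can see how far each * stretches. As before, the -E and -o flags and the temp file are my assumptions about your setup, not something the exercise requires.

```shell
# Scratch copy of the poem (hypothetical temp file).
poem=$(mktemp)
printf '%s\n' \
  'I led you astray with promises' \
  'once burnished and unspoken' \
  'yet without you here I apprehend' \
  '  unyielding vigilance' \
  'for everything I may have broken.' > "$poem"

# "ap*" is an a followed by zero or more p characters.
# Most pieces are a lone "a"; "apprehend" gives "app".
a_pieces=$(grep -Eo "ap*" "$poem")
echo "$a_pieces"

# "p.*" is a p followed by zero or more of any character,
# so each piece runs from the first p to the end of the line.
p_tails=$(grep -Eo "p.*" "$poem")
echo "$p_tails"
# promises
# poken
# pprehend
```

The * grabs as much as it can: in "apprehend" the a takes both p characters with it, and the .* in the second regex swallows the rest of the line.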
One or More
It's puzzle time again. If the * is how you repeat the last regex "zero or more times" then you should be able to figure out the +, which means "one or more times." Using this knowledge, how would you write a regex that found only lines with "y followed by one or more o characters?"
See the Further Study section for the answer.
Terminal/shell Regex are Different
If you're experienced with using the Terminal (shell) then you may notice that the grep regex is different from the shell's pattern matching (usually called "globbing"). That's because the shell has to make it easy for you to match characters like . and file extensions, so it lets you do ls *.txt to mean "anything ending in .txt." Most shell pattern syntax also predates widespread adoption of regex, so it's a little off.
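To make the difference concrete, here's a minimal sketch that lists the same files twice: once with the shell pattern and once with a regex. The directory and file names are hypothetical, and I'm assuming -E as in the other examples.

```shell
# Scratch directory with a few hypothetical files.
dir=$(mktemp -d)
touch "$dir/ex02.txt" "$dir/ex04.txt" "$dir/notes.md"

# Shell pattern: * on its own means "any run of characters" and . is just a dot.
glob_hits=$(cd "$dir" && ls *.txt)

# Regex: . means any one character and * repeats the thing before it,
# so "anything" is .* and a literal dot needs a backslash in front of it.
regex_hits=$(ls "$dir" | grep -E '^.*\.txt$')

echo "$glob_hits"
echo "$regex_hits"
```

Both commands print ex02.txt and ex04.txt, but look at how much more the regex has to spell out. If you typed the shell's *.txt into grep as a regex, the * would have nothing before it to repeat.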
Operators So Far
As with the previous exercise, here's what should be on your flash cards:
. -- Any one character.
\s -- Any space character (tabs, newlines, spaces).
\t -- A tab explicitly.
\n -- A newline explicitly.
\\ -- A backslash explicitly.
^ -- Match (anchor) the start of a line.
$ -- Match (anchor) the end of a line.
? -- Match zero or one.
* -- Match zero or more.
+ -- Match one or more.
Be sure to update your flash cards with the three new operators, and do a bit of study before and after each exercise. I promise this will help you internalize what these symbols actually mean faster.
Further Study
- Answer: grep "yo+" ex04.txt
- Spend as much time as you can writing more and more complex regex on ex04.txt.
- Find another text file you have and use grep (ugrep) on it.
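Once you've tried the puzzle yourself, this sketch shows why + and * give different answers for it. It assumes -E for extended regex (in some grep setups a bare + is treated as a literal plus sign without it) and uses a hypothetical temp copy of the poem.

```shell
# Scratch copy of the poem (hypothetical temp file).
poem=$(mktemp)
printf '%s\n' \
  'I led you astray with promises' \
  'once burnished and unspoken' \
  'yet without you here I apprehend' \
  '  unyielding vigilance' \
  'for everything I may have broken.' > "$poem"

# "yo+" needs at least one o after the y, so only the "you" lines match.
plus_lines=$(grep -Ec "yo+" "$poem")

# "yo*" is happy with zero o characters, so any line with a y matches.
star_lines=$(grep -Ec "yo*" "$poem")

echo "yo+: $plus_lines  yo*: $star_lines"   # yo+: 2  yo*: 4
```

The only difference between the two is whether "zero" counts, and that one detail doubles the number of matching lines here.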