03: Anchoring
In this exercise you'll learn about the concept of "anchoring", which refines a regex to work only on the beginning, end, or entire line of text. I'll also talk about memorizing the main characters in regex to speed up your learning. You'll continue to use the ex02.txt poem from the previous exercise.
Memorizing Regex Operators
The special characters in regex are normally called "operators". I'll use this word when it's one of the special characters like . (dot) and use "character" for everything else. Here are some examples:
- . -- operator
- a -- character
- a.a.b -- character operator character operator character
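To see the difference between an operator and a character in practice, here's a small sketch. The three input lines are made up for illustration and are not part of the poem.

```shell
# Three made-up lines (not from ex02.txt), fed to grep on standard input.
# The . operator matches any one character, so the regex a.a.b matches
# the line with literal dots and also the line with X and Y in those spots.
matches=$(printf 'a.a.b\naXaYb\nabc\n' | grep "a.a.b")
echo "$matches"
```

The a and b in the regex only ever match themselves, while each . accepts whatever single character sits in that position. The line abc is too short to fit the pattern, so it is left out.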
To get good at regex you can sit there with a text file and grep until it sinks in organically, or you can help yourself by doing a bit of memorization. If you memorize the few regex operators you'll have an easier time reading and writing regex.
The easiest way to memorize the operators is to create some flash cards. I like using index cards with the operator on one side and the name/description on the other.
Take some time now to make some flash cards like this with the . (dot), \s, \n, and \t operators on them. Then spend some time studying them. It doesn't take a long time each day. Maybe 10-20 minutes before you do an exercise.
NOTE A word about Anki: Anki is fine but don't use a flash card set that someone else made. You actually get more value out of cards you make yourself.
The Beginning Only
Imagine you want to find all the lines in the poem that start with "I". If you try this it won't work:
grep "I" ex02.txt
When you run this you get this output:
I have one million bees
But if I squint
Even if I also say please
We only want the first line, since the other two just have an "I" somewhere in the middle. So what can we do? That's where the ^ (caret) operator comes in. It tells grep that you only want the beginning of each line to match. Change your regex to this:
grep "^I" ex02.txt
Which will produce this output:
I have one million bees
That's because the regex engine will do this when it sees ^:
- If it's the start of the line...
- and the operators/characters after the ^ match...
- then it's a match.
- Otherwise it's not a match, move to the next line.
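Here's a minimal sketch of those steps in action. It does not use the real ex02.txt. It assumes a stand-in temporary file holding only the three poem lines quoted above, and the variable names are just for illustration.

```shell
# Stand-in for ex02.txt: only the three poem lines quoted in this section,
# written to a temporary file created by mktemp.
poem=$(mktemp)
printf '%s\n' "I have one million bees" "But if I squint" \
  "Even if I also say please" > "$poem"

# -c makes grep print how many lines matched instead of the lines.
anywhere=$(grep -c "I" "$poem")   # lines with an I anywhere
at_start=$(grep "^I" "$poem")     # lines where I is the first character

echo "lines containing I: $anywhere"
echo "lines starting with I: $at_start"
rm -f "$poem"
```

The unanchored regex counts all three lines, while the anchored one keeps only the line whose first character is I.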
Experiment #1
For this experiment I want you to attempt to write a regex that can match any lines with exactly 4 characters, but I want you to write the regex and try to run it in your head before you run it with grep. The best way to do this is to write the regex in a file in your text editor, then process each line in the poem manually and paste the matching lines into a third file.
Once you have your results file, compare it to the output grep (ugrep) gives you. If your output is different then try to figure out why.
Let's Talk About Terminal
I'm going to simplify the commands by writing them like this:
$ grep "^I" ex02.txt
I have one million bees
This is attempting to simulate what you see on your screen, but if you're on Windows it'll look a little different. Even on macOS or Linux you'll see something slightly different. Here's how to read this:
- $ -- This is just the prompt you see on screen. On Windows it'll be a > character and there's usually a status message before this character. Do not type this character.
- grep ... -- This is the command you type exactly. Everything, even the spaces you see.
- Once your command is correct, you hit ENTER and the lines after will be the output of the program. In this case it's the lines that grep (ugrep) thinks match your regex.
This will be the format I use from here on out, so make sure you understand. Do not type the $ character.
The End Only
What if you only want lines that end in e? Once again, this doesn't work:
$ grep "e" ex02.txt
I have one million bees
None of these have knees
They won't sprint
Even if I also say please
That's too many lines because the e can be found on all of them. To restrict the matches to only lines ending in e we need the $ (dollar-sign) operator:
$ grep "e$" ex02.txt
Even if I also say please
The $ tells grep to do this:
- If the characters and operators before the $ match...
- then check if we're at the end of the line...
- and if we are, then it's a match...
- otherwise it's not.
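The steps above also work when more than one character comes before the $. Here's a rough sketch, again assuming a stand-in temporary file with only the four poem lines quoted in this section rather than the real ex02.txt.

```shell
# Stand-in for ex02.txt: only the four poem lines quoted in this section.
poem=$(mktemp)
printf '%s\n' "I have one million bees" "None of these have knees" \
  "They won't sprint" "Even if I also say please" > "$poem"

# -c makes grep print how many lines matched instead of the lines.
contains_e=$(grep -c "e" "$poem")   # an e anywhere on the line
ends_in_e=$(grep "e$" "$poem")      # an e right before the end of the line
ends_in_ees=$(grep "ees$" "$poem")  # the three characters ees, then the end

echo "lines containing e: $contains_e"
echo "ends in e: $ends_in_e"
echo "ends in ees:"
echo "$ends_in_ees"
rm -f "$poem"
```

Everything in front of the $ has to match first, and only then does grep check that the line is over. That is why "ees$" picks the bees and knees lines but skips the one ending in please.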
Experiment #2
Repeat Experiment #1 but this time use the $ (dollar-sign) for your regex.
The Entire Line
You can also combine the two anchors so the regex has to match the entire line, although this is hard to put to use until the next exercise. For now, try this regex to see how it works:
$ grep "^They won't sprint$" ex02.txt
They won't sprint
Like I said, kind of useless, but try replacing some of these characters with the . (dot) operator to experiment with it. In the next exercise we'll learn the operator you need to make this more useful.
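As one possible starting point for that experiment, here's a small sketch. The second input line ("They won't sprint today") is made up to show what the anchors exclude, and the temporary file stands in for ex02.txt.

```shell
# One real poem line plus one made-up longer line for contrast.
poem=$(mktemp)
printf '%s\n' "They won't sprint" "They won't sprint today" > "$poem"

# -c makes grep print how many lines matched instead of the lines.
unanchored=$(grep -c "They won't sprint" "$poem")    # text anywhere on the line
whole_line=$(grep -c "^They won't sprint$" "$poem")  # text must be the whole line
with_dots=$(grep "^.... won't spr..t$" "$poem")      # some characters swapped for .

echo "unanchored matches: $unanchored"
echo "whole-line matches: $whole_line"
echo "with dots: $with_dots"
rm -f "$poem"
```

With both anchors in place the longer line no longer matches, because there is more text between "sprint" and the end of the line. Swapping characters for dots keeps the length of the line fixed but loosens what each position may contain.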
Operator Review
You should now have flash cards for the following operators:
- . -- Any one character.
- \s -- Any space character (tabs, newlines, spaces).
- \t -- A tab explicitly.
- \n -- A newline explicitly.
- \\ -- A backslash explicitly.
- ^ -- Match (anchor) the start of a line.
- $ -- Match (anchor) the end of a line.
Be sure to spend time memorizing each character and its "translation." You want to get to where you can take a regex like ^..a\s..$ and read each character correctly like this:
anchor the start, any char, any char, the letter a, any space, any char, any char, anchor the end
Keep doing this and you'll quickly internalize what these do.
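If you want to check a reading like that against grep, here's a quick sketch. The four input lines are made up and are not from the poem, and it assumes your grep understands \s the way the one used in these exercises does.

```shell
# Four made-up lines (not from ex02.txt), fed to grep on standard input.
# Reading ^..a\s..$ aloud: start, any char, any char, the letter a,
# any space, any char, any char, end. So only six-character lines with
# an a in position 3 and a space in position 4 can match.
matches=$(printf '%s\n' "tea at" "bla hi" "team up" "a tea at" | grep "^..a\s..$")
echo "$matches"
```

The first two lines fit the reading position by position. "team up" fails because its fourth character is m rather than a space, and "a tea at" is too long to fit between the two anchors.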
Further Study
- Spend time writing as many regex as you can using what you know. Treat it like a game where you have to find something in the poem using the most cursed regex you can.