02: Basic Matching
In this exercise you'll create a simple little text file and use grep (ugrep on Windows) to search for lines in the file. I'll also walk through exactly how a regex works, but keep in mind you'll have to play with them for a while before they really make sense. Just keep experimenting and testing your ideas and you'll get it.
The Setup
To do this exercise you'll need to create a little text file with some text to search. Let's use a little poem:
I have one million bees
None of these have knees
But if I squint
They won't sprint
Even if I also say please
Thank you. Thank you. I'll be off in my studio writing my Nobel Prize acceptance speech now.
Save this file in your Documents folder so you can access it easily and name it ex02.txt.
You'll need to create this text file with a simple text editor like Geany but not with a word processor like Microsoft Word. I repeat, do not use Microsoft Word.
Finding Your Text File
If you did as instructed and saved the text file into Documents then you can do this to get your Terminal to the same location:
- Start Terminal.
- Type cd ~/Documents. The ~ is called a "tilde" character and is above the ` (backtick) character on my keyboard but might be somewhere else on yours.
- This command will change directory to Home slash Documents.
- Once you are there you should be able to type cat ex02.txt and see the poem printed to your screen.
- If you can't, then you did something wrong and need to find the file.
If you can't do cat ex02.txt then probably the best thing to do is create the file again but be very sure that you are saving it in Documents. Another way is to do this:
1. Use your mouse to find the file on your computer. Basically, how would you find the file so you can double click it and open it?
2. Start Terminal like normal.
3. Type cd (that's cd then space).
4. Grab the file with your mouse and drag it into your Terminal window then let go.
5. When you do, it will print out the real location of the file into your terminal. To cd to this location, just delete the ex02.txt on the end (use your arrow keys to get to the end) and hit ENTER to submit the cd command you typed in #3.
Now, if this happened you need to figure out why. One major thing that programming teaches is paying attention to what you do. If you saved the file in a weird place, take the time to go back and find out why you did that, then try not to do it again.
Before You Continue
You should now be set up to actually play with this file. Confirm you have this:
- A text file named ex02.txt in Documents.
- A Terminal open for you to type commands.
- Your poem printed to the screen with cat ex02.txt.
If you don't have this then go back and try again.
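If you'd like the computer to do that check for you, here is a rough sketch for a macOS or Linux shell (PowerShell on Windows uses different syntax). It assumes you saved the poem as ex02.txt in Documents exactly as described above; the variable names poem and status are just ones I made up for this example.

```shell
# Assumes the poem was saved as ~/Documents/ex02.txt, as the exercise asks.
poem="$HOME/Documents/ex02.txt"

if [ -f "$poem" ]; then
    status="found"
    cat "$poem"    # prints the poem, like: cd ~/Documents then cat ex02.txt
else
    status="missing"
    echo "No ex02.txt in $HOME/Documents. Go back and find where you saved it."
fi
```

If this prints the poem you're ready to continue. If it prints the message instead, the file is somewhere else.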
Your First Regex
A regular expression is constructed with a mixture of the text you want to find, and patterns that add additional steps in the search. At the most basic level you can search for an exact word in the poem like this:
grep bees ex02.txt
If you type this into your Terminal you should see this:
$ grep bees ex02.txt
I have one million bees
If you're on Windows then you would see this (because you typed ugrep):
PS C:\Users\lcthw\Documents> ugrep bees .\ex02.txt
I have one million bees
WARNING From now on I'm only going to show the grep version of these commands and assume you know to type ugrep.
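One thing worth seeing early is that plain text in a pattern is matched exactly, including upper and lower case. This is a small sketch for a macOS or Linux shell. The first few lines only recreate the poem in a scratch directory so the example runs on its own; if you're already in Documents with ex02.txt you just need the grep lines. The variable names upper and lower are mine.

```shell
# Recreate the poem in a scratch directory so this runs on its own.
cd "$(mktemp -d)"
printf '%s\n' \
  'I have one million bees' \
  'None of these have knees' \
  'But if I squint' \
  "They won't sprint" \
  'Even if I also say please' > ex02.txt

# "Even" with a capital E is in the last line of the poem.
upper=$(grep Even ex02.txt)

# Lowercase "even" is not anywhere in the poem, so grep prints nothing.
# The "|| true" only stops a strict shell from treating "no match" as an error.
lower=$(grep even ex02.txt || true)

echo "Even: $upper"
echo "even: $lower"
```

The first echo should show the last line of the poem and the second should show nothing after the colon.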
You can type any sequence of characters that are also in the ex02.txt poem, but you can also use additional operators to match patterns. The simplest pattern is the "anything" operator, a single . (dot), which matches any one character:
grep ..ees ex02.txt
I have one million bees
None of these have knees
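You can see that I typed ..ees which means, "Match any two characters (..) then ees." That's why it matches the lines with bees and knees. In knees the two dots match kn, and in bees they match the space in front of the word and then the b.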
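If you want to see exactly which characters a pattern grabbed, the -o option tells grep to print only the matched part of each line instead of the whole line. Here's a sketch for a macOS or Linux shell. The setup lines just recreate the poem in a scratch directory so it runs on its own, and the variable name matched is mine.

```shell
# Recreate the poem in a scratch directory so this runs on its own.
cd "$(mktemp -d)"
printf '%s\n' \
  'I have one million bees' \
  'None of these have knees' \
  'But if I squint' \
  "They won't sprint" \
  'Even if I also say please' > ex02.txt

# -o prints only the text that matched, one match per output line.
matched=$(grep -o '..ees' ex02.txt)
echo "$matched"
```

You should get two lines of output: " bees" (with a leading space, because a dot will happily match a space) and "knees".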
Experiment #1
Take the time now to use this to match as many other lines as possible.
- What is the one regex that will match the most lines that has both . and at least one other character?
- How would you match a space character? Try putting " (double-quote) around your regex.
Matching Blank Space
If you want to match a space then you put " (double-quote) around the regex, but you can also use the \ character to specify different blank space characters:
\s -- any space character (spaces, tabs, newlines, etc)
\t -- tabs
\n -- newlines
Support for these varies between grep versions, so try each one to see what yours accepts.
The \ character is like saying, "Treat the next character as a command." It can also mean, "Treat the next character literally." We'll experiment with that in the next exercise.
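Here's a small sketch of both ideas for a macOS or Linux shell. It assumes your grep accepts \s (GNU grep and ugrep do, but check yours). The setup lines recreate the poem in a scratch directory so the example runs on its own, and the variable names with_space and with_class are mine.

```shell
# Recreate the poem in a scratch directory so this runs on its own.
cd "$(mktemp -d)"
printf '%s\n' \
  'I have one million bees' \
  'None of these have knees' \
  'But if I squint' \
  "They won't sprint" \
  'Even if I also say please' > ex02.txt

# Quotes keep the space inside one pattern instead of two separate words.
with_space=$(grep "one million" ex02.txt)

# \s stands for a blank space character (assumes your grep supports it).
# Quotes also stop the shell from eating the backslash before grep sees it.
with_class=$(grep 'if\sI' ex02.txt)

echo "$with_space"
echo "$with_class"
```

The first pattern should print the bees line, and the second should print the two lines that contain "if I". Without the quotes the shell would hand grep the word one as the pattern and treat million as a file name.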
Experiment #2
- How would you write a regex to match any character after a space?
- What if someone isn't using actual space characters, but instead using tabs?
Further Study
- Find your own text file and search for things in it.
- Use grep to search for file names with ls | grep <pattern> but replace <pattern> with a search pattern from this exercise.
- Research the | (pipe) character to find out what it does. Best way to research is search online for shell pipe character or powershell pipe character.
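To give you a starting point for the ls | grep idea, here is a sketch for a macOS or Linux shell. It makes a scratch directory with three empty files whose names I invented for the example (ex01.txt, ex02.txt, and notes.md), then filters the file listing with a pattern from this exercise.

```shell
# Work in a scratch directory with a few made-up file names.
cd "$(mktemp -d)"
touch ex01.txt ex02.txt notes.md

# ls prints one name per line when piped, and grep keeps the names
# that contain "ex0" followed by any one character.
names=$(ls | grep 'ex0.')
echo "$names"
```

This should list ex01.txt and ex02.txt but not notes.md. Try it in your own Documents folder with a pattern of your choosing.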