00: Introduction
Most of the things you do with a computer are related to processing inputs. If you're dealing with files, then you're usually processing text or binary data in a similar way. If you're opening an image file, then you're doing some kind of input processing. Want to convert a CSV file to Excel? That's going to require processing the CSV. Even if it's not a data file, you're still doing some kind of input processing. The text you type into your browser window to access a website, and even your mouse clicks, require some form of input processing.
Another term for "input processing" is parsing. Parsing text involves creating a logical piece of software that understands the structure of the input, and organizes it into an internal structure so you can work on it. When you read that CSV file you are parsing it because you create rules that say things like, "When I see a comma, do this, and when I see a number, do that." Whether people realize it or not, a ton of software is mostly just parsing inputs, even if those inputs aren't text.
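Here's a rough sketch of what those "when I see a comma, do this" rules can look like in Python. The function name parse_csv_line and the sample line are made up for illustration, and it assumes a very simple CSV with no quoted fields or escaped commas, so treat it as a toy rather than a real CSV reader.

```python
def parse_csv_line(line):
    """Split one simple CSV line into fields (hypothetical helper).

    Assumes no quoted fields and no escaped commas.
    """
    fields = []
    current = ""
    for ch in line:
        if ch == ",":
            # Rule: when I see a comma, the current field is finished.
            fields.append(current)
            current = ""
        else:
            # Rule: any other character belongs to the current field.
            current += ch
    fields.append(current)  # the last field has no comma after it

    # Rule: a field made only of decimal digits is treated as a number.
    return [int(f) if f.isdecimal() else f for f in fields]


row = parse_csv_line("apple,3,pear,12")
print(row)
```

The input goes in as one flat string and comes out as a list you can work on, with the numbers already converted. That's the "internal structure" part of parsing.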
You can extend the concepts in parsing to so many things in computer science that it becomes one of the central components of computation. Need to control the state of a User Interface when a user clicks around with a mouse randomly? Parsing. Need to organize the data packets that a server receives so they make sense? Parsing. Need to convert a stream of telemetry data into a meaningful log? Parsing. The techniques in parsing are literally found everywhere in computer science.
The first step to understanding parsing is to learn about regular expressions, or "regex" for short. You can "get by" without them, but learning regex teaches you the first baby steps of parsing. If you can learn how regex work, then the rest of parsing is effectively just higher level regex.
Not Just for Programmers
I don't want you to think that regex are only for programmers. Far, far from it. Everyone from system administrators to data processors to people who just want to manage tons of files on their computers can benefit from learning regex. If you want to get into managing Linux, then regex is useful. If you have mountains of data to process, regex will help you. The list goes on and on.
Learning regex also teaches you an important way of thinking that helps later when you learn programming concepts: characters in many programming languages have multiple complex meanings. Learning this concept is difficult at first, but once you do, it helps with learning any other programming language you'll encounter, and even with learning advanced mathematics.
Why Regex are Difficult to Learn
People really hate regex. They say they're hard to read, difficult to learn, and don't make sense. You could say that about anything in programming, but there is a reason why even programmers find regex difficult to learn:
Most people read whole words, not individual characters, but with regex each character is a whole word or even a phrase.
Let's look at an example:
[a-z]*\.txt
If you've been reading and writing English or another language your whole life you'll see that as a "word." Since your brain is trained to consider a single word as a single concept you can't make sense of it immediately. You probably see that and think, "blagazzstartext?" However, this is actually more like a whole sentence of instructions:
Start a set that is all letters a through z repeated zero or more times followed explicitly by .txt.
If I break this down by each character you get this "translation":
[ - start a set
a-z - that is all letters a through z
] - end of set
* - repeated zero or more times
\ - explicitly
.txt - followed explicitly by .txt
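If you want to see that translation run, here's a small sketch using Python's standard re module. The choice of Python, the use of fullmatch (which requires the whole string to match), and the sample file names are my assumptions for illustration. The re.VERBOSE flag lets you write the same pattern with one piece per line and a comment next to each.

```python
import re

# The pattern from the text, written the usual compact way.
compact = re.compile(r"[a-z]*\.txt")

# The same pattern spelled out. With re.VERBOSE, whitespace and
# "#" comments inside the pattern are ignored by the regex engine.
spelled_out = re.compile(r"""
    [a-z]   # a set that is all letters a through z
    *       # repeated zero or more times
    \.      # explicitly a dot
    txt     # followed by the letters t, x, t
""", re.VERBOSE)

# Hypothetical file names to try the pattern on.
names = ["notes.txt", ".txt", "notes.csv", "Notes.txt", "notes2.txt"]

# fullmatch returns a match object or None; bool() turns that into True/False.
results = {name: bool(compact.fullmatch(name)) for name in names}

for name in names:
    print(name, results[name], bool(spelled_out.fullmatch(name)))
```

Only notes.txt and .txt match. The second one matches because "zero or more" letters includes no letters at all, and Notes.txt and notes2.txt fail because N and 2 are not in the set a through z.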
This "information density" is both the regex super power and its curse. With very little information you can craft insanely powerful text processing machines, but that minuscule information also makes them hard to understand without some training.
This course is that training. You'll learn what each of the characters in a regex mean, what they do, and how to use them to process input efficiently. At first it'll be tough, but if you take it slow you'll start to see how powerful and useful they are.
How to Learn Regex
To learn regex you need to do three things:
- Memorize what each character does, as in the operation it performs on text. There are only about 9 to remember, so it's something almost anyone can do.
- Learn how these characters combine to create more complex "text machines."
- Try to use them over and over until you "get it."
That last one sounds like you can't do it unless you're special, but it really is just a natural result of doing #1 and #2. If you memorize the 9 different characters, then it's possible to write and read more complex regex. You then apply this to a series of challenges and to your own work. Eventually your brain stops treating the symbols as one big inscrutable word and you can understand them.
One thing that helps immensely during this process is having a "cheat sheet" available so you can look up this information while you're using it. If you keep trying to memorize, and keep looking up what each operation does, then you'll get it. It just takes time and effort, not magic skill or special talent.
Unanswered Questions
These are production notes for me to research things to include. You can safely ignore them unless you have an idea of answers.
- Unicode? link
- Lookahead, Lookbehind, etc.
- Point people to list of advanced topics to research
- Demo in a programming language? Python or JavaScript
- Vim Tricks?