Created by Zed A. Shaw Updated 2024-10-08 04:45:56

06: Advanced Repetition

I'll be honest with you and admit that I almost never use these. It's not that these operations aren't useful; it's more that I find the other operations do everything I need. I think you may find the same thing, so consider this more of an "expansion" exercise. You learn this more to stretch your understanding of regex than to learn something you'll use constantly.

New Test File

Make a new file for this exercise named ex06.txt and put this in it:

444-55-5555
22-234567
4444 5555 6666 7777 8888
+91 444 44444
555-624-3476

These are common formats you'll find in business. The first is a US Social Security Number (SSN). The second is an EIN, which is like an SSN for businesses. The third is a credit card number. The last two are phone numbers from different countries.

Exactly i Sequences

If you need to match a fixed number of characters or digits then this is what you want. You use it the same way you use the other repetition operators (*, +), but instead you write { (left-curly-bracket), a count, and } (right-curly-bracket). For example, if you want to match a US SSN you may try to write something like this:

[0-9]+-[0-9]+-[0-9]+

If you run this on the sample file for this exercise you'll realize this doesn't work:

ugrep "[0-9]+-[0-9]+-[0-9]+" .\ex06.txt
444-55-5555
555-624-3476

See how it also picks up the phone number? You only want the SSN 444-55-5555. This is where the exact sequence comes in:

[0-9]{3}-[0-9]{2}-[0-9]{4}

You replace the + or * in a regex with {3} and it will match exactly 3 of the previous pattern. Running this regex we get the correct answer:

ugrep "[0-9]{3}-[0-9]{2}-[0-9]{4}" .\ex06.txt
444-55-5555

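If we break this down we get this:

  1. [0-9]{3} matches exactly 3 digits.
  2. - matches a literal dash.
  3. [0-9]{2} matches exactly 2 digits.
  4. - matches another literal dash.
  5. [0-9]{4} matches exactly 4 digits.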

Make sure you go through and confirm you understand each part of this regex. Then take the time to write a regex for each line in ex06.txt that finds that line but none of the others.
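If you'd rather double check both results outside of ugrep, here's a minimal sketch in Python. It assumes Python's re module treats these simple digit classes and counts the same way ugrep does, and the helper name matching_lines is made up for this example. The sample lines are pasted in directly rather than read from ex06.txt.

```python
import re

# The same five lines as ex06.txt, pasted in so the sketch is self-contained.
SAMPLE = """444-55-5555
22-234567
4444 5555 6666 7777 8888
+91 444 44444
555-624-3476"""

def matching_lines(pattern, text=SAMPLE):
    """Return every line that contains a match, similar to how grep prints lines."""
    return [line for line in text.splitlines() if re.search(pattern, line)]

# The + version is too loose and also finds the phone number.
loose = matching_lines(r"[0-9]+-[0-9]+-[0-9]+")
# The exact-count version only finds the SSN.
exact = matching_lines(r"[0-9]{3}-[0-9]{2}-[0-9]{4}")

print(loose)
print(exact)
```

The first list should contain the SSN and the phone number, and the second only the SSN, which lines up with the ugrep output above.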

The "What then How Much" Pattern

Hopefully you're understanding a common pattern in regex:

  1. Write the thing you want to match, or not to match.
  2. Then write how much of that thing to match.

You can see this pattern in the regex from this exercise:

  1. [0-9]+ is "a digit, then 1 or more of it."
  2. [0-9]{3} is "a digit, then exactly 3 of it."

This is effectively backwards from how you might say these kinds of patterns normally. You would say:

"Match 3 numbers."

You wouldn't say:

"Match numbers, 3 only."

Which is one of the reasons why regex are confusing to people. Once you get used to this though it's not too difficult to understand.
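One way to see the "what then how much" order is to build the SSN regex out of pairs. This is only a sketch in Python. The list name pieces and its layout are invented for this illustration and aren't something regex requires.

```python
import re

# Each pair is (what to match, how much of it).
# An empty "how much" means exactly once.
pieces = [
    ("[0-9]", "{3}"),
    ("-", ""),
    ("[0-9]", "{2}"),
    ("-", ""),
    ("[0-9]", "{4}"),
]

# Gluing each "what" to its "how much" gives the SSN regex.
pattern = "".join(what + how_much for what, how_much in pieces)
print(pattern)

# The SSN line matches, the phone number line does not.
print(bool(re.search(pattern, "444-55-5555")))
print(bool(re.search(pattern, "555-624-3476")))
```

Reading the pairs top to bottom is the same as reading the regex left to right: the thing first, then the amount.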

Between i and j, Inclusive

It's now fairly easy to understand the next sequence operation, {i,j}. It says "match at least i and at most j of the previous pattern." The range is inclusive, which means it includes both the number for i and the number for j. Another way to say inclusive is, "Up to and including j." For example, if you write {3,4} it will match 3 up to and including 4 occurrences, so 3 or 4.
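Here's a small sketch in Python of what "inclusive" means for {3,4}. It uses re.fullmatch so the whole string has to fit the pattern, which makes the boundaries easy to see. The test strings are made up for this example.

```python
import re

pattern = r"[0-9]{3,4}"

# fullmatch requires the entire string to fit, so only 3 or 4 digits pass.
results = {s: bool(re.fullmatch(pattern, s)) for s in ["12", "123", "1234", "12345"]}

for text, ok in results.items():
    print(text, ok)

# A plain search is looser: it finds the first 4 digits inside 12345.
inside = re.search(pattern, "12345").group()
print(inside)
```

Only the 3 digit and 4 digit strings pass the full match. The search at the end still finds something in 12345 because {3,4} is happy to match just part of a longer run, which is also how a grep style search behaves.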

Using this knowledge try to find two lines at a time with one regex. For example, find SSNs and Phone Numbers.

i or More Sequences

Finally we have the {i,} sequence, which means i or more occurrences. You can think of it like + but with a minimum you choose. The + will find 1 or more, but {i,} will find i or more for whatever minimum i you pick. For example, [0-9]{3,} will match 3 or more digits.
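To see what {3,} picks out, this Python sketch lists every run of 3 or more digits in each sample line. It uses re.findall, and the lines are pasted in rather than read from ex06.txt.

```python
import re

lines = [
    "444-55-5555",
    "22-234567",
    "4444 5555 6666 7777 8888",
    "+91 444 44444",
    "555-624-3476",
]

# For each line, collect every run of 3 or more digits.
runs = {line: re.findall(r"[0-9]{3,}", line) for line in lines}

for line, found in runs.items():
    print(line, "->", found)
```

Notice that the 2 digit pieces (55, 22, and 91) are skipped because they fall below the minimum, while longer runs like 234567 are matched whole.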

Using this knowledge, try to match other lines in more combinations. You should also try adding number formats you know to this file and match those exactly.

Your Operators So Far

Time to update your flash cards with {i}, {i,j}, and {i,}. As I mentioned in the beginning, I don't really use these, but learn them anyway because they might come up in some rare situations.

Some Uses For Sequences

The only place I've really found a use for these is input validation. They aren't as useful for searching, but they do help you confirm that input from users matches a format you need. For example, if you want people's phone numbers on a website then you can use [0-9]{3}-[0-9]{3}-[0-9]{4} to confirm the input fits the US format.

However, keep in mind that it's pretty hard to get validations like this right. No matter how "correct" you think a format is you'll always find some part of the world that surprises you.
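One thing that can trip up validation is that a regex normally matches anywhere inside the input, so extra characters around a valid looking number still pass. Here's a sketch in Python, assuming you want the whole input to be a US style phone number and nothing else. The function name is_us_phone is hypothetical.

```python
import re

US_PHONE = r"[0-9]{3}-[0-9]{3}-[0-9]{4}"

def is_us_phone(text):
    """True only when the entire input is in the 555-624-3476 style."""
    return re.fullmatch(US_PHONE, text) is not None

good = is_us_phone("555-624-3476")        # whole input fits the format
padded = is_us_phone("5555-624-34766")    # extra digits on both ends
foreign = is_us_phone("+91 444 44444")    # a different country's format

# A plain search still finds a phone number hiding inside the padded input.
found_inside = re.search(US_PHONE, "5555-624-34766") is not None

print(good, padded, foreign, found_inside)
```

Requiring the full input to match is one reasonable choice for a form field, but it also rejects the +91 number, which is exactly the kind of surprise mentioned above.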

Further Study
