Our first regex pattern

The compile method compiles a regex pattern into a regular expression object that can be used for matching later.
It's advisable to compile a regex when it'll be used several times in your program. Saving the regular expression object that re.compile returns, and reusing it, is more efficient.
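
To make the reuse idea concrete, here is a minimal sketch. The sample strings and variable names are made up for illustration; the point is only that the pattern is compiled once and then applied to several inputs.

```python
import re

# Compile once, then reuse the same pattern object for every string.
digit = re.compile(r"\d")

samples = ["room 101", "no digits here", "7 days"]  # hypothetical inputs
first_digits = []
for text in samples:
    found = digit.search(text)
    # search returns None when nothing matches, so guard before calling group()
    first_digits.append(found.group() if found else None)

print(first_digits)  # ['1', None, '7']
```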

Inside the raw string that holds our pattern, we'll use some special sequences to make our work easier.

Special sequences are simply a backslash \ followed by a character.
For instance,
\d matches one digit [0-9]
\w matches one word character: a letter, a digit, or an underscore [a-zA-Z0-9_]. In Python 3, \w and \d also match non-ASCII letters and digits unless you pass the re.ASCII flag.

It's important to know them since they help us write simpler and shorter regex.
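
As a quick illustration of how these two sequences behave, the sketch below runs them over a couple of made-up strings. The inputs and variable names are assumptions chosen for the demo.

```python
import re

# \d matches a single digit, so each digit comes back separately.
digits = re.findall(r"\d", "Room 42")
# \w matches a single word character; the underscore counts, the "!" does not.
word_chars = re.findall(r"\w", "a_1!")
# The shorthand keeps patterns short: \d\d\d instead of [0-9][0-9][0-9].
area_code = re.findall(r"\d\d\d", "call 555 now")

print(digits)      # ['4', '2']
print(word_chars)  # ['a', '_', '1']
print(area_code)   # ['555']
```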

Here's a table with more special sequences

| Element | Description |
| --- | --- |
| . | Matches any character except \n |
| \d | Matches any digit [0-9] |
| \D | Matches any non-digit character [^0-9] |
| \s | Matches any whitespace character [ \t\n\r\f\v] |
| \S | Matches any non-whitespace character [^ \t\n\r\f\v] |
| \w | Matches any word character, alphanumeric or underscore [a-zA-Z0-9_] |
| \W | Matches any non-word character [^a-zA-Z0-9_] |

Points to note:

[0-9] is the same as [0123456789]

\d is short for [0-9]

\w is short for [a-zA-Z0-9_]

[7-9] is the same as [789]
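
These equivalences are easy to check for yourself. The following sketch compares the long and short forms on a made-up ASCII string (the sample text is an assumption, and the comparison with \d only holds exactly for ASCII input):

```python
import re

text = "a1 b_2\tc9!"  # hypothetical sample: letters, digits, space, tab, "_", "!"

# A range and its spelled-out form find the same characters.
same_digits = re.findall(r"[0-9]", text) == re.findall(r"\d", text)
same_range = re.findall(r"[7-9]", text) == re.findall(r"[789]", text)

# \s picks out the whitespace; \W picks out everything that is not a word character.
spaces = re.findall(r"\s", text)
non_word = re.findall(r"\W", text)

print(same_digits, same_range)  # True True
print(spaces)                   # [' ', '\t']
print(non_word)                 # [' ', '\t', '!']
```

Note that the underscore does not show up in the \W result, because it counts as a word character.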

Having learned something about special sequences, let's continue with our coding. Write down and run the code below.

import re

pattern = re.compile(r"\w")
# Let's feed in some strings to match
string = "regex is awesome!"
# Then call a matching method to match our pattern
result = pattern.match(string)
print(result.group())  # will print out 'r'

The match method returns a match object, or None if no match was found.
We are printing result.group(). group() is a match object method that returns the entire match. If there had been no match to our compiled pattern, result would be None, and calling result.group() would raise an AttributeError.
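
Because match can return None, it's worth checking the result before calling group(). Here is a minimal sketch of that check; the helper name first_word_char is made up for this example.

```python
import re

pattern = re.compile(r"\w")

def first_word_char(text):
    """Return the word character at the start of text, or None if there is none."""
    result = pattern.match(text)
    if result is None:
        # No match at the start of the string, so there is no group() to call.
        return None
    return result.group()

print(first_word_char("regex is awesome!"))  # r
print(first_word_char("!regex"))             # None
```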

You may wonder why the output is only a letter and not the whole word. It's simply because \w matches exactly one word character, and match() only looks at the start of the string.
We've just written our first regex program!

First, notice there's no re.compile this time. Programs that use only a few regular expressions at a time don't have to compile them, because module-level functions such as re.match() compile the pattern for us. We therefore don't need re.compile for this.
Next, re.match() takes the string to search as its second argument, so we fed it the line variable.
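
A minimal sketch of calling re.match() directly might look like this. The contents of line and the \w+ pattern are assumptions made for illustration, not the original example.

```python
import re

line = "Learn regex with Python"  # hypothetical sample text

# No re.compile here: the pattern and the string are both passed to re.match().
result = re.match(r"\w+", line)
matched = result.group() if result else None

print(matched)  # Learn
```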
Moving on swiftly!

The search method, like the match method, can also take an extra flags argument.
The re.MULTILINE flag makes ^ and $ match at the start and end of every line in the string (lines being separated by the newline character), not just at the start and end of the whole string.
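
The effect of the flag is easiest to see side by side. This is a small sketch with a made-up two-line string:

```python
import re

text = "first line\nsecond line"  # hypothetical sample text

# Without the flag, ^ only matches at the very start of the string.
plain = re.search(r"^second", text)
# With re.MULTILINE, ^ also matches right after each newline.
multi = re.search(r"^second", text, re.MULTILINE)

print(plain)          # None
print(multi.group())  # second
```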

But what if we wanted to find all instances of words in a string?
Enter re.findall.

re.findall() finds every non-overlapping occurrence of a pattern, not just the first one as re.search() does. Unlike search, which returns a match object, findall returns a list of the matched strings.
Let's write and run this functionality.
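
Here is a minimal sketch comparing the two, reusing the sample sentence from earlier. The \w+ pattern is an assumption chosen to pick out whole words.

```python
import re

sentence = "regex is awesome!"

# findall returns every match as a plain list of strings.
words = re.findall(r"\w+", sentence)
# search stops at the first match and returns a match object.
first = re.search(r"\w+", sentence)

print(words)          # ['regex', 'is', 'awesome']
print(first.group())  # regex
```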

Sometimes you might encounter (?=...) in a regex. This syntax defines a lookahead.
A lookahead matches an entity only if it's followed by the given pattern, without including that pattern in the match.
For instance, r"a(?=b)" will match a only if it's followed by b.
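
To see that the lookahead checks the next character without consuming it, here is a small sketch. It assumes the pattern is written as a(?=b) with no space, and the test strings are made up.

```python
import re

pattern = re.compile(r"a(?=b)")

hit = pattern.search("cab")   # the a is followed by b
miss = pattern.search("cat")  # the a is followed by t

print(hit.group())  # a
print(hit.span())   # (1, 2)
print(miss)         # None
```

The span covers only the a, which shows that the b is required but not part of the match.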

The pattern tries to match the closest string that is followed by a space character and the word fox.
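
A pattern of that kind could be sketched as follows. Both the sentence and the exact pattern are assumptions for illustration; they show one way to match the word that sits right before " fox".

```python
import re

sentence = "The quick brown fox jumps"  # hypothetical sample text

# \w+ matches a word, and (?= fox) requires a space and "fox" right after it.
fox_pattern = re.compile(r"\w+(?= fox)")
before_fox = fox_pattern.search(sentence)

print(before_fox.group())  # brown
```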

Let's look at another example. Go ahead and write this snippet:

"""
Match any word followed by a comma.
The example below is not the same as re.compile(r"\w+,")
For this will result in ['Me,', 'myself,']
"""
import re

pattern = re.compile(r"\w+(?=,)")
res = pattern.findall("Me, myself, and I")
print(res)

The above regex matches every run of word characters that is followed by a comma.
When we run this, it should print out a list containing: ['Me', 'myself']

What if you wanted to match a string that contains some of these special regex characters?
A backslash is used to define special sequences in regex. So to match special characters literally in our pattern string, we need to escape each of them with a backslash '\'.
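
The sketch below shows escaping in practice, first by hand and then with re.escape, which is the standard library helper that escapes a string for you. The sample text is made up.

```python
import re

price_text = "Total: $5.00 (approx.)"  # hypothetical sample text

# Unescaped, $ means "end of string" and . means "any character", so this fails.
unescaped = re.search(r"$5.00", price_text)
# Escaped by hand, $ and . are treated as literal characters.
manual = re.search(r"\$5\.00", price_text)
# re.escape adds the backslashes for us.
auto = re.search(re.escape("(approx.)"), price_text)

print(unescaped)       # None
print(manual.group())  # $5.00
print(auto.group())    # (approx.)
```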

For a real world application, here's a function that monetizes a number using thousands separator commas.

import re

number = input("Enter your number\n")

def monetizer(number):
    """This function adds a thousands separator using comma characters."""
    number = str(number)
    try:
        # int() raises ValueError if the input is not a whole number
        int(number)
    except ValueError:
        return "Not a Number"
    # Format into groups of three from the right to the left
    pattern = re.compile(r'\d{1,3}(?=(\d{3})+(?!\d))')
    # Substitute each match with itself plus a comma, then return
    return pattern.sub(r'\g<0>,', number)

# Function call, passing in number as an argument
print(monetizer(number))

As you might have noticed, the pattern uses a lookahead. The parentheses inside it group the digits that follow into clusters of three, so a comma is added only where the remaining digits form complete groups of three.
For example, the number 1223456 will become 1,223,456.
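
To see which pieces the pattern actually matches, this sketch lists the matches for the example number before substituting. The variable names are made up; the last line uses Python's built-in format() with the "," option, a different technique shown only to cross-check the result.

```python
import re

pattern = re.compile(r"\d{1,3}(?=(\d{3})+(?!\d))")

# Each match is a run of digits that is followed by complete groups of three.
pieces = [m.group() for m in pattern.finditer("1223456")]
with_commas = pattern.sub(r"\g<0>,", "1223456")
# A number with three digits or fewer has no such run, so it is left unchanged.
short_number = pattern.sub(r"\g<0>,", "999")

print(pieces)                # ['1', '223']
print(with_commas)           # 1,223,456
print(short_number)          # 999
print(format(1223456, ","))  # 1,223,456
```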

Congratulations on making it to the end of this intro! From special sequences, matching and searching, to finding all matches, using lookaheads and manipulating strings with regex, we've covered quite a lot.

There are some advanced concepts in regex, such as backtracking and performance optimization, which we can continue to learn as we grow. A good resource for more intricate details would be the re module documentation.

Great job on learning something that many consider difficult!
If you found this helpful, spread the word.