Python RegEx:

Regular Expressions can be used to search, edit and manipulate text. This opens up a wide variety of applications across the sub-domains of Python. Python RegEx is widely used in industry, which makes Regular Expressions a valuable asset for the modern-day programmer.

In this Python RegEx blog, we will be checking out the following concepts:

Let’s begin this Python RegEx blog by checking out why we need to make use of Regular Expressions.

Why Use Regular Expressions?

To answer this question, we will look at several common problems that can be solved by using Regular Expressions.

Consider the following scenario:

You have a log file which contains a large amount of data, and from this log file you wish to fetch only the date and time. As you can see in the image, the log file is hard to read at first glance.

Regular Expressions can be used in this case to recognize the patterns and extract the required information easily.
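As a rough sketch of this idea, assume a hypothetical log format in which each entry begins with a date written as YYYY-MM-DD followed by a time written as HH:MM:SS. The sample lines and the names sample_log and timestamp_pattern are made up for illustration; a real log file may need a different pattern.

```python
import re

# Hypothetical log lines; real log formats vary.
sample_log = """\
2019-03-14 09:26:53 INFO  server started on port 8080
2019-03-14 09:27:10 WARN  disk usage at 91%
2019-03-14 09:31:02 ERROR connection reset by peer
"""

# Assumed format: YYYY-MM-DD, one space, then HH:MM:SS.
timestamp_pattern = re.compile(r"(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})")

# Because the pattern has two groups, findall returns one (date, time) tuple per match.
timestamps = timestamp_pattern.findall(sample_log)

for date, time in timestamps:
    print(date, time)
```

Everything else on each line (the level and the message) is simply ignored, which is what makes the extraction easy.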

Consider the next scenario: you are a salesperson with a lot of email addresses, and many of those addresses are fake or invalid. Check out the image below:

With Regular Expressions, you can verify the format of the email addresses and filter out the fake IDs from the genuine ones.

The next scenario is pretty similar to the one with the salesperson example. Consider the following image:

How do we verify the phone number and then classify it based on the country of origin?

Every correct number will have a particular pattern which can be traced and followed through by using Regular Expressions.

Next up is another simple scenario:

We have a Student Database containing details such as name, age, and address. Consider the case where the Area code was originally 59006 but now has been changed to 59076. To manually update this for each student would be time-consuming and a very lengthy process.

To solve this using Regular Expressions, we first find the strings in the student data that contain the old pin code and then replace all of them with the new one.

Regular expressions can be used with multiple languages, such as:
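A minimal sketch of that find-and-replace step is shown here. The student records and their field layout are hypothetical; only the two codes, 59006 and 59076, come from the scenario above.

```python
import re

# Hypothetical student records: name, age, address, code.
students = [
    "Asha, 21, 12 Lake Road, 59006",
    "Ravi, 22, 4 Hill Street, 59006",
    "Meera, 20, 9 Park Lane, 59010",
]

# \b (word boundary) makes sure only the standalone code 59006 is matched,
# not the same digits buried inside a longer number.
old_code = re.compile(r"\b59006\b")

# sub() returns a new string with every match replaced.
updated = [old_code.sub("59076", record) for record in students]

for record in updated:
    print(record)
```

Records that do not contain the old code, such as the third one here, pass through unchanged.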

Java

Python

Ruby

Swift

Scala

Groovy

C#

PHP

JavaScript

There are any number of other scenarios in which Regular Expressions help us. I will be walking you through some of them in the upcoming sections of this Python RegEx blog.

So, next up on this Python RegEx blog, let us look at what Regular Expressions actually are.

What Are Regular Expressions?

A Regular Expression is used for identifying a search pattern in a text string. It also helps in checking the correctness of the data, and operations such as finding, replacing and formatting the data are possible using Regular Expressions.

Consider the following example:

Among all of the data in the given string, let us say we require only the name and the city. This can be converted into a dictionary with just the name and the city in a formatted way. The question now is: can we identify a pattern to pick out the name and the city? We can find out the age too. With age, it is easy, right? It is just an integer.

How do we go about the name? If you take a look at the pattern, all of the names start with an uppercase letter. With the help of Regular Expressions, we can identify both the name and the age using this method.

Now consider another example. What is common in the string? You can see that the letters ‘a’ and ‘t’ are common among all of the input words. [shmp] in the code denotes the starting letter of the words to be found, so any substring that starts with s, h, m or p and is immediately followed by ‘at’ will be considered a match.
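Here is a small sketch of that idea. The input string name_age is an assumption made up for illustration, and the city is left out to keep the example short; only the two rules come from the text (a name starts with an uppercase letter, an age is an integer).

```python
import re

# Hypothetical input; the original data is not reproduced here.
name_age = "Janice is 22 and Theon is 33. Gabriel is 44 and Joey is 21."

# An age is just a run of one to three digits.
ages = re.findall(r"\d{1,3}", name_age)

# A name is an uppercase letter followed by any number of lowercase letters.
names = re.findall(r"[A-Z][a-z]*", name_age)

# Pair each name with the age found in the same position.
age_by_name = dict(zip(names, ages))
print(age_by_name)
```

This pairing relies on names and ages appearing in the same order, which holds for this sample but would need checking on real data.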
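A minimal version of this example might look as follows. The input string text is an assumption, chosen so that the first word starts with a capital S.

```python
import re

# Assumed input string; note the capital S in "Sat".
text = "Sat, hat, mat, pat"

# [shmp] matches exactly one of the lowercase letters s, h, m or p,
# and "at" must follow immediately.
matches = re.findall(r"[shmp]at", text)

for word in matches:
    print(word)
```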

Output:

hat
mat
pat

Do note that matching is case sensitive, so a capitalised word such as Sat does not match [shmp]at. Once you get to know the basics, regular expressions become quite readable and you can start working with them in full swing. Pretty easy, right?

Next up on this Python RegEx blog, we will be checking out how we can match a range of characters at once using Regular Expressions.

Matching a range of characters:

We wish to output all the words whose first letter lies between h and m and is compulsorily followed by at. For the words sat, hat, mat and pat, the output we should get is hat and mat, correct?

Now consider a subtle change: we add a caret symbol (^) at the start of the character class, giving [^h-m]at. Placed there, the caret negates the class. Instead of giving us the output of everything starting with h to m, we will be presented with the output of everything apart from that.

The output we can expect is the words which do NOT start with a letter between h and m but still end with at.
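A short sketch of the range match, assuming the input string "sat, hat, mat, pat" (the variable names are illustrative):

```python
import re

# Assumed input string.
text = "sat, hat, mat, pat"

# [h-m] matches any single lowercase letter from h through m.
in_range = re.findall(r"[h-m]at", text)

print(in_range)  # ['hat', 'mat']
```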
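The negated version can be sketched the same way, again assuming the input string "sat, hat, mat, pat":

```python
import re

# Assumed input string.
text = "sat, hat, mat, pat"

# A caret at the START of a character class negates it:
# [^h-m] matches any single character that is not a letter from h through m.
out_of_range = re.findall(r"[^h-m]at", text)

for word in out_of_range:
    print(word)
```

Note that a negated class also accepts non-letters such as spaces or digits, so on other input it can match more than just words.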

Output:

sat
pat

Next up on this Python RegEx blog, I will explain how we can replace a string using Regular Expressions.

Replacing a string:

Next up, we can check out another operation using Regular Expressions where we replace an item of the string with something else. It is very simple and can be illustrated with the following piece of code:
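A minimal sketch of such a replacement, assuming the input string "hat rat mat pat" (the variable names are illustrative):

```python
import re

# Assumed input string.
food = "hat rat mat pat"

# Compile the pattern once, then use sub() to replace every match.
pattern = re.compile(r"rat")
food = pattern.sub("food", food)

print(food)
```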

In the above example, the word rat is replaced with the word food, and the final output will look like this. The substitute method of Regular Expressions (sub) is used in this case, and it has a wide variety of practical use cases as well.

Output:

hat food mat pat

Next up on this Python RegEx blog, we will check out a unique problem to Python called the Python Backslash problem.

The Backslash Problem:

Consider an example code shown below:

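import re
randstr = "Here is \\Edureka"
print(randstr)

Output:

Here is \Edureka

This is the backslash problem. One of the slashes vanished from the output because Python treats a backslash inside an ordinary string literal as an escape character, so two backslashes are stored as one. Regular Expression patterns use backslashes heavily, so this problem is usually avoided by writing them as raw strings (prefixed with r), which leave backslashes untouched.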
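One common way around this is Python's raw string literal, written with an r prefix, which keeps backslashes exactly as typed. A small sketch (the variable names are illustrative):

```python
import re

# Ordinary literal: "\\" is an escape sequence that produces ONE backslash.
normal = "Here is \\Edureka"

# Raw literal: both backslashes are kept as typed.
raw = r"Here is \\Edureka"

print(normal)
print(raw)

# A regex needs \\ to match one literal backslash.
# Writing the pattern as a raw string avoids doubling the backslashes again.
pattern = r"\\Edureka"
found = re.search(pattern, normal) is not None
print(found)
```

Without the r prefix, the same pattern would have to be written with four backslashes, which is why raw strings are the usual choice for patterns.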

Problem statement – To verify the validity of an E-mail address in any scenario.

Consider the following examples of email addresses:

Anirudh@gmail.com

Anirudh @ com

AC .com

123 @.com

Manually, it just takes you one good glance to identify the valid mail IDs from the invalid ones. But what about having our program do this for us? It is pretty simple as long as the following guidelines are followed for this use case.

As you can check out from the above output, we have one valid address among the 4 emails given as input.
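As a hedged sketch of such a check, the pattern below encodes one possible set of rules: a local part without spaces, a single @, a domain, then a dot and a top-level domain of at least two letters. These rules and the name email_pattern are assumptions for illustration, not a complete email validator.

```python
import re

emails = ["Anirudh@gmail.com", "Anirudh @ com", "AC .com", "123 @.com"]

# Assumed rules:
#   [\w.+-]+         local part: letters, digits, underscore, dot, plus, hyphen
#   @                exactly one at sign
#   [\w-]+(\.[\w-]+)*  one or more domain labels separated by dots
#   \.[A-Za-z]{2,}   a dot and a top-level domain of at least two letters
email_pattern = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)*\.[A-Za-z]{2,}")

# fullmatch() requires the WHOLE string to fit the pattern.
valid = [address for address in emails if email_pattern.fullmatch(address)]

for address in emails:
    status = "valid" if address in valid else "invalid"
    print(address, "->", status)
```

The three rejected addresses all contain a space, which none of the character classes in this pattern allows.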

This basically proves how simple and efficient it is to work with Regular Expressions and make use of them practically.

Web Scraping

Problem Statement – Scraping all of the phone numbers from a website for a requirement.

To understand web scraping, check out the following diagram:

We already know that a single website will consist of multiple web pages. And let us say we need to scrape some information from these pages.

Web scraping is basically used to extract the information from the website. You can save the extracted information in the form of XML, CSV or even a MySQL database. This is achieved easily by making use of Python Regular Expressions.

We first begin by importing the packages which are needed to perform the web scraping. The final result comprises the phone numbers extracted from the pages using Regular Expressions.
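The steps can be sketched roughly as follows, using only the standard library. The phone number format (three digits, three digits, four digits, with an optional bracketed area code), the sample page and the function names are all assumptions for illustration. The example runs on a sample HTML string, so no network access is needed.

```python
import re
from urllib.request import urlopen

# Assumed format: (555) 123-4567, 555-123-4567, 555.123.4567 or 555 123 4567.
PHONE_PATTERN = re.compile(r"\(?\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}")


def extract_phone_numbers(html):
    """Return every substring of html that looks like a phone number."""
    return PHONE_PATTERN.findall(html)


def scrape_phone_numbers(url):
    """Download one page and extract phone numbers from it (hypothetical helper)."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_phone_numbers(html)


# Made-up page content standing in for a downloaded web page.
sample_html = """
<html><body>
<p>Sales: (555) 123-4567</p>
<p>Support: 555-987-6543</p>
<p>Order id: 12345</p>
</body></html>
"""

numbers = extract_phone_numbers(sample_html)
print(numbers)
```

For a real site, scrape_phone_numbers would be called once per page URL, and the pattern would need adjusting to the number formats that site actually uses.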

Conclusion

I hope this Python RegEx tutorial helps you in learning all the fundamentals needed to get started with using Regular Expressions in Python.

This will be very handy when you are trying to develop applications that require the usage of Regular Expressions and similar principles. Now, you should also be able to use these concepts to develop applications easily with the help of Regular Expressions and Web Scraping too.
