Regular Expression in Python

The Regular Expressions

This tutorial covers regular expressions in Python. A regular expression is a special sequence of characters that helps you match or find other strings, or sets of strings, using a specialized pattern syntax. Regular expressions are widely used in the UNIX world. In Python, the re module provides full support for them.

We will discuss the two main functions for handling regular expressions, re.match and re.search. Before we begin, note that regular expression patterns in Python are usually written as raw strings, in the form r'expression', so that backslashes reach the regular expression engine unchanged.
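To illustrate why raw strings matter, here is a minimal sketch. The variable names and the sample text are made up for this illustration. In a normal string Python itself interprets "\b" as a backspace character, while in a raw string the two characters reach the re module and act as a word boundary.

```python
import re

# Normal string: Python turns "\b" into a single backspace character.
plain = "\bcat\b"
# Raw string: the backslash and the "b" are kept as two characters.
raw = r"\bcat\b"

text = "a cat sat"
plain_result = re.search(plain, text)  # searches for backspace + "cat" + backspace
raw_result = re.search(raw, text)      # searches for the whole word "cat"

print(plain_result)        # None
print(raw_result.group())  # cat
```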

The Match Function

The match function attempts to match an RE pattern, with optional flags, against the beginning of a string. Its syntax is shown below.

re.match(pattern, string, flags=0)

The parameters are described in the table below.

No.  Parameter  Description
1    pattern    The regular expression to be matched.
2    string     The string that is searched; the pattern must match at the beginning of the string.
3    flags      Optional modifiers, which can be combined using bitwise OR (|). The available modifiers are listed in the option flags table later in this tutorial.

The re.match function returns a match object on success and None on failure. To retrieve the matched text from a match object, use its group(num) or groups() methods, which are described in the table below.

No.  Match Object Method  Description
1    group(num=0)         Returns the entire match (when num is 0) or the specific subgroup num.
2    groups()             Returns all matching subgroups in a tuple; the tuple is empty if the pattern contains no groups.

An example is shown below.

#!/usr/bin/python3

import re

line = "Cats are smarter than dogs"

# Group 1 captures the text before " are "; group 2 captures the next word
matchObj = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)

if matchObj:
    print("matchObj.group():", matchObj.group())
    print("matchObj.group(1):", matchObj.group(1))
    print("matchObj.group(2):", matchObj.group(2))
else:
    print("No match!!")

Executing this code produces the following result.

matchObj.group(): Cats are smarter than dogs
matchObj.group(1): Cats
matchObj.group(2): smarter
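The groups() method from the table above can be sketched in the same way. This is a small illustration, assuming the pattern is written with spaces around "are"; the names match_obj, whole and subgroups are chosen only for this example.

```python
import re

line = "Cats are smarter than dogs"
match_obj = re.match(r"(.*) are (.*?) .*", line, re.M | re.I)

if match_obj:
    whole = match_obj.group()       # the entire match, same as group(0)
    subgroups = match_obj.groups()  # tuple holding every captured subgroup
    print(whole)      # Cats are smarter than dogs
    print(subgroups)  # ('Cats', 'smarter')
```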

The Search Function

The search function looks for the first location where an RE pattern, with optional flags, matches anywhere within a string. Its syntax is shown below.

re.search(pattern, string, flags=0)

The parameters are described in the table below.

No.  Parameter  Description
1    pattern    The regular expression to be matched.
2    string     The string that is searched; the pattern may match anywhere in the string.
3    flags      Optional modifiers, which can be combined using bitwise OR (|). The available modifiers are listed in the option flags table later in this tutorial.

The re.search function returns a match object on success and None on failure. To retrieve the matched text, use the group(num) or groups() methods of the match object. An example is shown below.

#!/usr/bin/python3

import re

line = "Cats are smarter than dogs"

# Group 1 captures the text before " are "; group 2 captures the next word
searchObj = re.search(r'(.*) are (.*?) .*', line, re.M | re.I)

if searchObj:
    print("searchObj.group():", searchObj.group())
    print("searchObj.group(1):", searchObj.group(1))
    print("searchObj.group(2):", searchObj.group(2))
else:
    print("Nothing found!")

Executing this code produces the following result.

searchObj.group(): Cats are smarter than dogs
searchObj.group(1): Cats
searchObj.group(2): smarter
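Because re.search returns None on failure, the result should be checked before group() is called. The following sketch shows both outcomes; the patterns and variable names are illustrative, and span() is an additional match object method not covered in the tables above.

```python
import re

line = "Cats are smarter than dogs"

found = re.search(r"smart\w*", line)  # matches in the middle of the string
missing = re.search(r"birds", line)   # no match anywhere, so None is returned

if found:
    # span() gives the (start, end) indexes of the match
    print(found.group(), found.span())  # smarter (9, 16)

if missing is None:
    print("Nothing found!")
```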

The Matching Vs. Searching

Python offers two different primitive operations based on regular expressions: match and search. The match operation checks for a match only at the beginning of the string, while search checks for a match anywhere in the string. Searching anywhere in the string is also what Perl does by default. An example is shown below.

#!/usr/bin/python3

import re

line = "Cats are smarter than dogs"

# re.match only tries the pattern at the start of the string
matchObj = re.match(r'dogs', line, re.M | re.I)
if matchObj:
    print("match --> matchObj.group():", matchObj.group())
else:
    print("No match!!")

# re.search scans the whole string for the first match
searchObj = re.search(r'dogs', line, re.M | re.I)
if searchObj:
    print("search --> searchObj.group():", searchObj.group())
else:
    print("Nothing found!!")

Executing this code produces the following result.

No match!!
search --> searchObj.group(): dogs

The Search and Replace

One of the most important methods of the re module is sub, which replaces the text that a pattern matches.

Syntax

re.sub(pattern, repl, string, count=0)

This method replaces the occurrences of the RE pattern in string with repl. It substitutes every occurrence unless count is provided, in which case at most count replacements are made. The method returns the modified string.

For example

#!/usr/bin/python3

import re

phone = "2004-959-559# This is a phone number"

# Delete Python-style comments
num = re.sub(r'#.*$', "", phone)
print("Phone Num:", num)

# Remove anything other than digits
num = re.sub(r'\D', "", phone)
print("Phone Num:", num)

Executing this code produces the following result.

Phone Num: 2004-959-559
Phone Num: 2004959559
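The effect of limiting the number of replacements can be sketched as follows. The keyword count is the name the re module uses for this limit; the variable names are illustrative.

```python
import re

phone = "2004-959-559"

# With no limit, every hyphen is replaced.
all_removed = re.sub(r"-", "", phone)
# With count=1, only the first hyphen is replaced.
first_removed = re.sub(r"-", "", phone, count=1)

print(all_removed)    # 2004959559
print(first_removed)  # 2004959-559
```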

The Option Flags: The Regular Expression Modifier

Regular expressions accept optional modifiers that control various aspects of matching. The modifiers are specified as option flags. You can provide multiple modifiers by combining them with bitwise OR (|), as shown in the previous sections.

The modifiers are listed in the table below.

No.  Modifier  Description
1    re.I      Performs case-insensitive matching.
2    re.L      Interprets words according to the current locale. This affects the alphabetic group (\w and \W) as well as word boundary behavior (\b and \B).
3    re.M      Makes $ match at the end of each line, not just at the end of the string, and makes ^ match at the start of each line, not just at the start of the string.
4    re.S      Makes a period (dot) match any character, including a newline.
5    re.U      Interprets letters according to the Unicode character set. This affects the behavior of \w, \W, \b, and \B.
6    re.X      Permits "cuter" (more readable) regular expression syntax. Whitespace is ignored (except inside a set [ ] or when escaped by a backslash), and an unescaped # starts a comment.
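As a brief sketch of how flags combine, the example below uses re.I together with re.M, and then re.X for a commented pattern. The sample text, the pattern, and the variable names are assumptions made for illustration.

```python
import re

text = "first line\nPython rocks\npython again"

# Without flags, ^ only matches at the very start and case must match.
no_flags = re.findall(r"^python", text)
# re.I ignores case; re.M lets ^ match at the start of every line.
hits = re.findall(r"^python", text, re.I | re.M)

# re.X allows whitespace and # comments inside the pattern.
phone_pattern = re.compile(r"""
    (\d{4})   # first block of digits
    -         # literal hyphen
    (\d{3})   # second block of digits
""", re.X)
blocks = phone_pattern.match("2004-959-559").groups()

print(no_flags)  # []
print(hits)      # ['Python', 'python']
print(blocks)    # ('2004', '959')
```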

The Regular Expression Patterns

Except for the control characters, + ? . * ^ $ ( ) [ ] { } | \, all characters match themselves. You can escape a control character by preceding it with a backslash. The table below lists the regular expression pattern syntax.

No.  Pattern      Description
1    ^            Matches the beginning of the string; with re.M, also the beginning of each line.
2    $            Matches the end of the string; with re.M, also the end of each line.
3    .            Matches any single character except a newline. With the re.S flag it matches a newline as well.
4    [...]        Matches any single character in the brackets.
5    [^...]       Matches any single character not in the brackets.
6    re*          Matches 0 or more occurrences of the preceding expression.
7    re+          Matches 1 or more occurrences of the preceding expression.
8    re?          Matches 0 or 1 occurrence of the preceding expression.
9    re{n}        Matches exactly n occurrences of the preceding expression.
10   re{n,}       Matches n or more occurrences of the preceding expression.
11   re{n,m}      Matches at least n and at most m occurrences of the preceding expression.
12   a|b          Matches either a or b.
13   (re)         Groups regular expressions and remembers the matched text.
14   (?imx)       Turns on the i, m, or x options for the regular expression. Current Python versions require this form to appear at the very start of the pattern.
15   (?-imx)      Turns off the i, m, or x options. Python's re module accepts this only in the scoped form (?-imx:re) shown in row 18.
16   (?:re)       Groups regular expressions without remembering the matched text.
17   (?imx:re)    Turns on the i, m, or x options within the parentheses only.
18   (?-imx:re)   Turns off the i, m, or x options within the parentheses only.
19   (?#...)      Comment.
20   (?=re)       Lookahead: matches only at a position where re matches next, without consuming any characters.
21   (?!re)       Negative lookahead: matches only at a position where re does not match next, without consuming any characters.
22   (?>re)       Matches an independent pattern without backtracking (supported by Python's re module from version 3.11).
23   \w           Matches word characters.
24   \W           Matches nonword characters.
25   \s           Matches whitespace; for ASCII text, equivalent to [ \t\n\r\f\v].
26   \S           Matches nonwhitespace.
27   \d           Matches digits; for ASCII text, equivalent to [0-9].
28   \D           Matches nondigits.
29   \A           Matches the beginning of the string.
30   \Z           Matches only at the end of the string.
31   \number      Matches the nth grouped subexpression. A number that starts with 0, or that is three octal digits long, is read as the octal representation of a character code instead.
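Escaping a control character with a backslash, as described before the table, can be sketched as follows. The sample strings and variable names are illustrative; re.escape is a helper from the re module that adds the backslashes for you.

```python
import re

sample = "3.50 and 3x50"

# An unescaped dot matches any character, so "3x50" matches too.
unescaped = re.findall(r"3.50", sample)
# An escaped dot matches only a literal ".".
escaped = re.findall(r"3\.50", sample)

# re.escape backslash-escapes the control characters in a literal string.
price = "Total: 3.50 (approx.)"
literal = re.search(re.escape("(approx.)"), price)

print(unescaped)        # ['3.50', '3x50']
print(escaped)          # ['3.50']
print(literal.group())  # (approx.)
```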

The Regular Expression Examples

The Literal Characters

The table below shows an example of literal characters.

No.  Example  Description
1    python   Match "python".

The Character Classes

The table below shows examples of character classes.

No.  Example       Description
1    [Pp]ython     Match "Python" or "python".
2    rub[ye]       Match "ruby" or "rube".
3    [aeiou]       Match any one lowercase vowel.
4    [0-9]         Match any digit; same as [0123456789].
5    [a-z]         Match any lowercase ASCII letter.
6    [A-Z]         Match any uppercase ASCII letter.
7    [a-zA-Z0-9]   Match any of the above.
8    [^aeiou]      Match anything other than a lowercase vowel.
9    [^0-9]        Match anything other than a digit.
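A few of the character classes above can be tried out with re.findall, which returns every non-overlapping match as a list. This is a minimal sketch with made-up sample text.

```python
import re

text = "Python 3 and python 2"

names = re.findall(r"[Pp]ython", text)        # either capitalization
digits = re.findall(r"[0-9]", text)           # any single digit
vowels = re.findall(r"[aeiou]", "ruby")       # lowercase vowels only
non_vowels = re.findall(r"[^aeiou]", "ruby")  # everything except lowercase vowels

print(names)       # ['Python', 'python']
print(digits)      # ['3', '2']
print(vowels)      # ['u']
print(non_vowels)  # ['r', 'b', 'y']
```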

The Special Character Classes

The table below shows the special character classes.

No.  Example  Description
1    .        Match any character except newline.
2    \d       Match a digit: [0-9].
3    \D       Match a nondigit: [^0-9].
4    \s       Match a whitespace character: [ \t\r\n\f].
5    \S       Match nonwhitespace: [^ \t\r\n\f].
6    \w       Match a single word character: [A-Za-z0-9_].
7    \W       Match a nonword character: [^A-Za-z0-9_].
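The special character classes can be sketched on a short ASCII string. The sample text and variable names are illustrative. Note that for ordinary (str) patterns in Python 3 these classes also match non-ASCII digits, letters, and whitespace, so the bracket equivalents in the table describe the ASCII case.

```python
import re

text = "Room 42, floor_7"

digits = re.findall(r"\d", text)         # single digits
words = re.findall(r"\w+", text)         # runs of word characters
non_words = re.findall(r"\W", text)      # spaces and punctuation
pieces = re.split(r"\s+", "a  b\tc\nd")  # split on any run of whitespace

print(digits)     # ['4', '2', '7']
print(words)      # ['Room', '42', 'floor_7']
print(non_words)  # [' ', ',', ' ']
print(pieces)     # ['a', 'b', 'c', 'd']
```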

The Repetition Cases

The table below shows examples of repetition.

No.  Example   Description
1    ruby?     Match "rub" or "ruby": the y is optional.
2    ruby*     Match "rub" plus 0 or more ys.
3    ruby+     Match "rub" plus 1 or more ys.
4    \d{3}     Match exactly 3 digits.
5    \d{3,}    Match 3 or more digits.
6    \d{3,5}   Match 3, 4, or 5 digits.
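The repetition operators can be compared side by side on the same input. The sample strings below are made up for this sketch.

```python
import re

words = "rub ruby rubyy"

optional_y = re.findall(r"ruby?", words)     # at most one y is consumed
zero_or_more = re.findall(r"ruby*", words)   # any number of ys
one_or_more = re.findall(r"ruby+", words)    # at least one y is required

numbers = "12 123 1234567"
exactly_three = re.findall(r"\d{3}", numbers)
three_to_five = re.findall(r"\d{3,5}", numbers)

print(optional_y)     # ['rub', 'ruby', 'ruby']
print(zero_or_more)   # ['rub', 'ruby', 'rubyy']
print(one_or_more)    # ['ruby', 'rubyy']
print(exactly_three)  # ['123', '123', '456']
print(three_to_five)  # ['123', '12345']
```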

The Nongreedy Repetitions

Nongreedy repetition matches the smallest possible number of repetitions. The table below shows a greedy pattern and its nongreedy counterpart.

No.  Example  Description
1    <.*>     Greedy repetition: matches "<python>perl>".
2    <.*?>    Nongreedy: matches "<python>" in "<python>perl>".
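The difference between the two rows can be checked directly. This is a minimal sketch using the same sample string as the table; the variable names are illustrative.

```python
import re

text = "<python>perl>"

greedy = re.search(r"<.*>", text).group()      # runs to the last ">"
nongreedy = re.search(r"<.*?>", text).group()  # stops at the first ">"

print(greedy)     # <python>perl>
print(nongreedy)  # <python>
```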

The Grouping with Parenthesis

The table below shows examples of grouping with parentheses.

No.  Example             Description
1    \D\d+               No group: + repeats \d.
2    (\D\d)+             Grouped: + repeats the \D\d pair.
3    ([Pp]ython(, )?)+   Match "Python", "Python, python, python", etc.
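The effect of the parentheses on what + repeats can be sketched as follows. The input strings and variable names are made up for illustration; re.fullmatch requires the whole string to match the pattern.

```python
import re

# Without a group, + applies only to \d.
no_group = re.match(r"\D\d+", "a123b4").group()
# With a group, + repeats the whole \D\d pair.
grouped = re.match(r"(\D\d)+", "a1b2c3").group()
# A repeated group remembers only its last repetition.
last_pair = re.match(r"(\D\d)+", "a1b2c3").group(1)

names = re.fullmatch(r"([Pp]ython(, )?)+", "Python, python, python")

print(no_group)       # a123
print(grouped)        # a1b2c3
print(last_pair)      # c3
print(names.group())  # Python, python, python
```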

The Backreferences

Backreferences match a previously matched group again. The table below shows examples of backreferences.

No.  Example              Description
1    ([Pp])ython&\1ails   Match "python&pails" or "Python&Pails".
2    (['"])[^\1]*\1       A single- or double-quoted string. \1 matches whatever the 1st group matched, \2 matches whatever the 2nd group matched, and so on. Note that inside [...] Python reads \1 as an octal character escape rather than a backreference, so [^\1]* does not actually exclude the quote character.
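Backreferences can be sketched as follows. The quoted-string pattern here is a variation that uses a nongreedy .*? in place of a negated character class, which is an assumption of this example rather than the pattern from the table; the sample text is made up.

```python
import re

pattern = r"([Pp])ython&\1ails"
same_case = re.search(pattern, "Python&Pails")   # \1 repeats the captured "P"
mixed_case = re.search(pattern, "Python&pails")  # "p" differs from the captured "P"

# Group 1 captures the opening quote; \1 requires the same quote to close.
text = "He said \"hello\" and 'bye'"
quoted = re.findall(r"""(['"])(.*?)\1""", text)

print(same_case.group())  # Python&Pails
print(mixed_case)         # None
print(quoted)             # [('"', 'hello'), ("'", 'bye')]
```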

The Alternatives

The table below shows examples of alternatives.

No.  Example         Description
1    python|perl     Match "python" or "perl".
2    rub(y|le)       Match "ruby" or "ruble".
3    Python(!+|\?)   "Python" followed by one or more ! or one ?.
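The alternatives above can be tried with re.finditer, which yields one match object per match. The sample strings are illustrative. The example uses finditer rather than findall because findall returns only the group contents when a pattern contains a capturing group.

```python
import re

languages = re.findall(r"python|perl", "perl and python")

rubs = [m.group() for m in re.finditer(r"rub(y|le)", "ruby ruble rube")]

marks = [m.group() for m in re.finditer(r"Python(!+|\?)", "Python!!! Python? Python.")]

print(languages)  # ['perl', 'python']
print(rubs)       # ['ruby', 'ruble']
print(marks)      # ['Python!!!', 'Python?']
```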

The Anchors

Anchors specify the position at which a match must occur. The table below shows examples of anchors.

No.  Example        Description
1    ^Python        Match "Python" at the start of a string or, with re.M, at the start of an internal line.
2    Python$        Match "Python" at the end of a string or, with re.M, at the end of a line.
3    \APython       Match "Python" at the start of a string.
4    Python\Z       Match "Python" at the end of a string.
5    \bPython\b     Match "Python" at a word boundary.
6    \brub\B        \B is a nonword boundary: match "rub" in "rube" and "ruby" but not alone.
7    Python(?=!)    Match "Python", if followed by an exclamation point.
8    Python(?!!)    Match "Python", if not followed by an exclamation point.
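The anchors in the table can be sketched on a two-line string. The sample strings and variable names are made up for this illustration; start() returns the index at which a match begins.

```python
import re

text = "Python first\nPython second"

start_only = re.findall(r"^Python", text)            # ^ matches at the string start only
each_line = re.findall(r"^Python", text, re.M)       # with re.M, ^ matches on every line
string_start = re.findall(r"\APython", text, re.M)   # \A is not affected by re.M

whole_word = re.findall(r"\bPython\b", "Python Pythonic")
inside_word = re.findall(r"\brub\B", "rub ruby rube")

sample = "Python! Python?"
followed = re.search(r"Python(?=!)", sample).start()      # first "Python"
not_followed = re.search(r"Python(?!!)", sample).start()  # second "Python"

print(len(start_only), len(each_line), len(string_start))  # 1 2 1
print(whole_word)              # ['Python']
print(inside_word)             # ['rub', 'rub']
print(followed, not_followed)  # 0 8
```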

The Special Syntax with Parenthesis

The table below shows examples of the special syntax with parentheses.

No.  Example        Description
1    R(?#comment)   Matches "R". All the rest is a comment.
2    R(?i)uby       Case-insensitive while matching "uby". Python 3.11 and later reject a global flag that is not at the start of the pattern; use the scoped form in row 3 instead.
3    R(?i:uby)      Case-insensitive while matching "uby".
4    rub(?:y|le)    Group only, without creating a \1 backreference.
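The special syntax above can be sketched as follows. The input strings and variable names are illustrative, and the scoped flag form (?i:...) assumes Python 3.6 or later.

```python
import re

# (?#...) is ignored by the engine, so this pattern is simply "Ruby".
with_comment = re.match(r"R(?#the rest is literal)uby", "Ruby")

# Only "uby" is case-insensitive; the leading "R" must still be uppercase.
scoped_upper = re.match(r"R(?i:uby)", "RUBY")
scoped_lower = re.match(r"R(?i:uby)", "rUBY")

# A capturing group is remembered; a (?:...) group is not.
capturing = re.match(r"rub(y|le)", "ruble").groups()
non_capturing = re.match(r"rub(?:y|le)", "ruble").groups()

print(with_comment.group())  # Ruby
print(scoped_upper.group())  # RUBY
print(scoped_lower)          # None
print(capturing)             # ('le',)
print(non_capturing)         # ()
```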

This concludes the regular expressions part of our Python tutorial.