Linux Regular Expressions Fundamentals

Jithin

5 Jan 2017

Regular expressions are a pattern-matching language that lets applications sift through data looking for specific content. Tools such as vim, grep, and less use regular expressions, and programming languages such as Perl, Python, and C support them for pattern matching. Regular expressions are a language of their own, which means they have their own syntax and rules. In this tutorial, we will look at the syntax used to create regular expressions and work through some examples of using them.

A simple regular expression

The simplest regular expression is an exact match. An exact match occurs when the characters in the regular expression appear in the data being searched, in the same order. Suppose that a user is looking through the following file of data for all occurrences of the pattern cat:

cat

dog

concatenate

dogma

category

educated

family

vindication

chilidog

The pattern cat is an exact match of a c, followed by an a, followed by a t. Matching is case-sensitive by default. Using cat as the regular expression while searching the previous file gives the following matches:

cat

concatenate

category

educated

vindication
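As a minimal sketch, the exact-match search above can be reproduced in Python with the standard re module. The names words and matching are assumptions made for illustration. The data is written in lowercase because an exact match is case-sensitive.

```python
import re

# The sample file from above, one entry per line (lowercase, since matching is case-sensitive).
words = ["cat", "dog", "concatenate", "dogma", "category",
         "educated", "family", "vindication", "chilidog"]

def matching(pattern, lines):
    """Return the lines that contain a match for pattern anywhere in the line."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

print(matching("cat", words))
# ['cat', 'concatenate', 'category', 'educated', 'vindication']
```

re.search is used here because, like grep, it reports a match wherever the pattern occurs in the line.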

Using line anchors

The previous section used an exact match regular expression on a file of data. Note that the regular expression would match the data no matter where on the line it occurred: beginning, end, or middle of the word or line. One way to control where the regular expression looks for a match is a line anchor.

Use ^, the beginning-of-line anchor, or $, the end-of-line anchor. Using the file from earlier:

cat

dog

concatenate

dogma

category

educated

family

vindication

chilidog

To have the regular expression match cat, but only if it occurs at the beginning of a line in the file, use ^cat. Applying the regular expression ^cat to the data yields the following matches:

cat

category

If users want to locate only the lines in the file that end with dog, they can combine that exact expression with an end-of-line anchor to create the regular expression dog$. Applying dog$ to the file finds two matches:

dog

chilidog

If users want to make sure that the pattern is the only thing on a line, they can use both the beginning and end of line anchors. ^cat$ locates only one line in the file: one with a beginning of line, a c, followed by an a, followed by a t, and then an end of line.

Another type of anchor is the word boundary. \< matches the beginning of a word and \> matches the end of a word.
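The anchors described above can be tried with a short Python sketch. The names words and matching are assumptions for illustration. Note that Python's re module does not treat \< and \> as word anchors, so this sketch swaps in \b, which matches at either edge of a word.

```python
import re

# The sample file from earlier, in lowercase.
words = ["cat", "dog", "concatenate", "dogma", "category",
         "educated", "family", "vindication", "chilidog"]

def matching(pattern, lines):
    """Return the lines that contain a match for pattern."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

print(matching(r"^cat", words))   # ['cat', 'category']
print(matching(r"dog$", words))   # ['dog', 'chilidog']
print(matching(r"^cat$", words))  # ['cat']

# Word boundaries: \b stands in for both \< and \> in Python.
sentences = ["the cat sat", "concatenate the files"]
print(matching(r"\bcat\b", sentences))  # ['the cat sat']
```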

Wildcards and multipliers

Regular expressions use a . as the unrestricted wildcard character. A regular expression of c.t will look for data containing a c, followed by any one character, followed by a t. Examples of data that would match this pattern are cat, cot, and cut, but also c5t and cQt.

Another type of wildcard used in a regular expression is a set of characters at a specific character position. With an unrestricted wildcard, users cannot predict which character will match the wildcard. To match only the words cat, cot, and cut, but not odd items like c5t or cQt, replace the unrestricted wildcard with one where the accepted characters are specified. The regular expression c[aou]t specifies a pattern that starts with a c, is followed by an a, an o, or a u, and is followed by a t.
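The difference between the unrestricted wildcard and a character set can be seen in a small Python sketch. The names candidates and full_matches are assumptions for illustration. re.fullmatch is used so that the whole string must fit the pattern, which is stricter than a grep-style search anywhere in the line.

```python
import re

# Hypothetical test strings, chosen to show what each pattern accepts.
candidates = ["cat", "cot", "cut", "c5t", "cQt", "ct", "coat"]

def full_matches(pattern, items):
    """Return the items that the pattern matches in their entirety."""
    regex = re.compile(pattern)
    return [item for item in items if regex.fullmatch(item)]

# . accepts any single character between c and t.
print(full_matches(r"c.t", candidates))      # ['cat', 'cot', 'cut', 'c5t', 'cQt']

# [aou] accepts only a, o, or u in that position.
print(full_matches(r"c[aou]t", candidates))  # ['cat', 'cot', 'cut']
```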

Multipliers are a mechanism often used with wildcards. Multipliers apply to the previous character in the regular expression. One of the more common multipliers is *. When used in a regular expression, * modifies the previous character to mean zero to infinitely many of that character. If the regular expression c.*t were used, it would match cut, cat, coat, culvert, etc.: any data that starts with a c, followed by zero to infinitely many characters, ending with a t. Another type of multiplier indicates the exact number of previous characters desired in the pattern. An example of an explicit multiplier is c.\{2\}t. Using this regular expression, users are looking for data that begins with a c, followed by exactly two characters of any kind, ending with a t.
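A short Python sketch of the multipliers follows. The names candidates and full_matches are assumptions for illustration. Python writes the explicit multiplier as {2} without backslashes, whereas the \{2\} form shown above is the basic regular expression syntax used by tools such as grep.

```python
import re

# Hypothetical test strings, chosen to show what each pattern accepts.
candidates = ["ct", "cat", "caat", "cut", "coat", "culvert", "cab"]

def full_matches(pattern, items):
    """Return the items that the pattern matches in their entirety."""
    regex = re.compile(pattern)
    return [item for item in items if regex.fullmatch(item)]

# a* means zero or more of the previous character, a.
print(full_matches(r"ca*t", candidates))    # ['ct', 'cat', 'caat']

# .* means zero or more of any character.
print(full_matches(r"c.*t", candidates))    # ['ct', 'cat', 'caat', 'cut', 'coat', 'culvert']

# .{2} means exactly two of any character.
print(full_matches(r"c.{2}t", candidates))  # ['caat', 'coat']
```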
