How to Use Regular Expressions TODAY in Your Windows PowerShell Code

Do you need to learn all about regular expressions before using them with PowerShell? Nope. Timothy Warner, author of Sams Teach Yourself Windows PowerShell 5 in 24 Hours, doesn't waste time with boring backstory. Learn how to combine regex with your PowerShell code to jump right into performing search-and-replace operations, validation, and more.


If you're a Windows systems administrator (and decidedly not a programmer), I would hazard a guess that your PowerShell adoption thus far has been a bit...slow. Am I correct?

Let me speed things up for you. I'll teach you in this article how to use regular expressions (regex for short, typically pronounced REJ-ex) in your PowerShell code to parse string data with laser-like efficiency.

Suppose you're tasked with one or more of the following real-world scenarios:

The aforementioned tasks are trivial for .NET programmers: "I'll just use regex!" they say. However, if you're getting into PowerShell automation slowly, your blood might run cold at the thought of performing complicated pattern matches.

Don't stress! By the end of this article, you'll understand what regex actually does, and you'll learn how to implement regex patterns in PowerShell by using the -match operator, the -replace operator, and the Select-String cmdlet. Let's begin.

Regular Expression Basics

In a nutshell, regular expressions represent a rule set for performing pattern matching on string data. You're probably familiar with using the old MS-DOS wildcard characters. For instance, we can run the following command at the prompt to find Excel workbooks (such as .xls files) in the current folder whose names contain the word report:

C:\>dir *report*.xl?

In this example, the asterisk (*) represents zero or more characters, and the question mark (?) substitutes for any single character.

NOTE

As used in this example, the asterisk and question mark are not regular expression operators. In regex we use both the asterisk and the question mark, but they have slightly different functionality, which you'll see soon.
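To see the difference side by side, the sketch below first uses PowerShell's -like operator, which follows MS-DOS-style wildcard rules. It then writes a similar test as a regex with -match. The file name q3report.xls is a made-up example.

```powershell
# -like uses wildcard rules: * = zero or more characters, ? = exactly one character
'q3report.xls' -like '*report*.xl?'     # True

# In regex, . matches any single character, .* matches any run of characters,
# and \. matches a literal dot
'q3report.xls' -match 'report.*\.xl.'   # True
```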

Open an administrative PowerShell console, and let's dive right in. We can use the -match operator to perform true/false tests against incoming string data. Doing so gives you valuable practice with both regex and PowerShell syntax.

NOTE

To learn more about PowerShell comparison operators, run the command Get-Help -Name about_*operators to view the relevant conceptual help files. These documents offer a treasure trove of useful information but, sadly, they're often overlooked by Windows systems administrators.

The following tests should both evaluate to True. Can you see why?

'project14' -match 'pro'
'project14' -match '14'

Your first regular expressions lesson is that you can perform literal matches. The subject string project14 contains both pro and 14, so both expressions evaluate to True. Of course, this question arises: Does the match value include just the matching characters, or the entire string?

Windows PowerShell populates the $matches automatic variable (a hashtable, not an array) with the result of the most recent successful match. Run the previous tests again, this time adding $matches after each. In the following code, I'm using the PowerShell command separator, the semicolon (;), to keep the example compact:
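Here is a minimal version of those tests, with $matches displayed after each -match:

```powershell
# Repeat each literal-match test, then display the automatic $matches hashtable
'project14' -match 'pro'; $matches
'project14' -match '14';  $matches
```

In each case, $matches[0] holds only the matched characters (pro, then 14), not the whole subject string. PowerShell updates $matches only when -match succeeds against a single string. A failed match leaves the previous contents in place.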

PowerShell variable names are not case-sensitive, so it doesn't matter whether I type $Matches, $matches, $MatCHeS, or some other combination. As long as PowerShell recognizes the command name and syntax, it will process the command correctly.

Now let's say we have a bunch of files whose names start with the word project. Do you think the following expression will result in True or False?

'project14' -match 'project*'

If you tried the previous example, you'll see True here, but probably not for the reason you expect. Inspect $matches and you'll find only project, not project14. Why? Your second regular expression lesson is that some regex metacharacters (the quantifiers) operate only on the preceding character. So 'project*' translates to "projec followed by zero or more occurrences of t," not "anything that starts with project." Yes, that's right. With regex, you need to construct your match patterns one character at a time.
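A quick way to confirm what the asterisk is doing is to look at $matches[0]. The second subject string, projec14, is a made-up value chosen to show that zero t characters still count as a match.

```powershell
# 'project*' means "projec" followed by zero or more t characters
'project14' -match 'project*'; $matches[0]   # True, then project
'projec14'  -match 'project*'; $matches[0]   # True, then projec (zero t's)
```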

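While the asterisk matches zero or more occurrences of the preceding character, the regex question mark matches zero or one occurrence of it. In other words, it makes that character optional, which is quite different from the MS-DOS ? wildcard. The regex counterpart of the DOS ? is the dot (.), which matches any single character. Let's say we wanted to match project10 through project19:

'project14' -match 'project1.'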
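The following comparison shows what the regex question mark actually does. project27 is a hypothetical value outside the project10 through project19 range.

```powershell
# ? makes the preceding character (here, 1) optional
'project14' -match 'project1?'; $matches[0]   # True, then project1
'project27' -match 'project1?'; $matches[0]   # True, then project

# The dot requires some character after a literal 1
'project27' -match 'project1.'                # False
```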

A metacharacter in regex is a character (or character combination) that's processed by the regex engine in a non-literal way.

NOTE

Speaking of regex engines, a common question is what regex "flavor" PowerShell uses. Because PowerShell is built on the .NET Framework, its regex features rely on the .NET regular expression engine (the System.Text.RegularExpressions.Regex class). The .NET regex syntax, in turn, is designed to be largely compatible with Perl 5 regular expressions.
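You can see the .NET engine directly by calling the [regex] type accelerator for System.Text.RegularExpressions.Regex. The sketch below runs the same kind of test that -match performs. It also shows one behavioral difference: the .NET class is case-sensitive by default.

```powershell
# Call the .NET Regex class directly (\d+ = one or more digits)
$m = [regex]::Match('project14', '\d+')
$m.Success   # True
$m.Value     # 14

# [regex] is case-sensitive by default; -match is not
[regex]::IsMatch('PROJECT14', 'project')   # False
'PROJECT14' -match 'project'               # True
```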

Let's check out another metacharacter:

'8675309' -match '\d'

The \d metacharacter is called a character class, and it matches a single decimal digit, such as 0 through 9. On its own, then, '\d' matches only the first digit in the subject string, 8. You can use quantifiers to match a specific number of occurrences. Take a look:

'8675309' -match '\d{7}'

The $matches variable should show you the entire subject string (8675309) instead of only the number 8, because the {7} denotes seven repetitions of the digit match. The following table shows other examples of using the \d character class with the { } quantifier.

Example       Interpretation
'\d{1,3}'     Match between one and three times
'\d{5,}'      Match five or more times
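Here are the bare \d class and the { } quantifiers applied to the same subject string. The four-digit string 1234 is a made-up counterexample.

```powershell
'8675309' -match '\d';      $matches[0]   # True, then 8 (one digit only)
'8675309' -match '\d{7}';   $matches[0]   # True, then 8675309
'8675309' -match '\d{1,3}'; $matches[0]   # True, then 867 (quantifiers are greedy)
'8675309' -match '\d{5,}';  $matches[0]   # True, then 8675309
'1234'    -match '\d{5,}'                 # False: only four digits
```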

Regex has many character classes, but I can't explain them all here. Instead, the following table gives you a "punchlist" of my favorites.

One more regex concept before we do some "real world" examples: Put match ranges in square brackets ([ ]). The following expression should evaluate to True (be sure to inspect $matches as well):

'admin@company.com' -match '[a-z]+'

The match should be 'admin' in this case. Yes, I sneaked in another metacharacter: in regex syntax, the plus (+) quantifier matches one or more instances of the preceding element (here, any character in the a-z range). The asterisk, you'll recall, matches zero or more instances. Combining a range with a quantifier is awesome in regex, because the text you need to match might vary in length. Here, [a-z]+ keeps matching letters until it reaches the @ sign.
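The sketch below shows three details worth knowing about ranges. The match stops at the first character outside the range. Because -match ignores case, [a-z] also matches capital letters. And * can succeed with an empty match. Admin@company.com is a made-up, capitalized variation on the article's address.

```powershell
'admin@company.com' -match  '[a-z]+'; $matches[0]   # admin (stops at @)
'Admin@company.com' -match  '[a-z]+'; $matches[0]   # Admin: -match ignores case
'Admin@company.com' -cmatch '[a-z]+'; $matches[0]   # dmin: -cmatch respects case

# * accepts zero occurrences, so this is True even though there are no digits
'admin@company.com' -match '[0-9]*'
```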

Using the -match Operator in the Real World

Let's say we need to parse a list of universal naming convention (UNC) paths in a text file named C:\input\servers.txt:

We need to find out (a) whether server532 exists in the file and, if so, (b) the name(s) of any shared folder(s) hosted by that server. How can we do this? Well, the first thing we need to do is grab all the servers.txt content and import the data into our PowerShell runspace:

Get-Content -Path 'C:\input\servers.txt'

That's not enough, though. We need to filter that file content by using the Where-Object cmdlet, the -match operator, and a regular expression:
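A minimal sketch of that filter follows. The $lines array holds hypothetical UNC paths. In the real scenario, you'd pipe Get-Content -Path 'C:\input\servers.txt' into Where-Object instead. The pattern ends with an escaped backslash so that a hypothetical server5320 wouldn't match as well.

```powershell
# Hypothetical stand-in for: Get-Content -Path 'C:\input\servers.txt'
$lines = @(
    '\\server101\public'
    '\\server532\finance'
    '\\server532\hr'
    '\\sharepoint.company.pri\sites'
)

# (a) Keep only lines containing \\server532\ (each literal \ is escaped as \\)
$lines | Where-Object { $_ -match '\\\\server532\\' }

# (b) Strip the server prefix, leaving just the share names (finance, hr)
$lines | Where-Object { $_ -match '\\\\server532\\' } |
    ForEach-Object { $_ -replace '\\\\server532\\', '' }
```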

You probably know that the $_ token is shorthand notation for the current object in the PowerShell pipeline. But doubtless you're wondering what \\\\ means. Get ready for regular expression lesson three: we need to escape certain characters to prevent the .NET regex engine from treating them as metacharacters.

TIP

By default, PowerShell comparison operators, including -match, are not case-sensitive. Use the -cmatch operator if you need a case-sensitive regex match.

The UNC example is particularly confusing because the backslash (\) is itself the regex escape character. To match one literal backslash you write \\, so the two literal backslashes that begin every UNC path become \\\\ in the pattern.
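If counting backslashes gets tedious, the .NET [regex]::Escape() static method can do the escaping for you. It returns a copy of the string with backslashes, dots, and other metacharacters escaped.

```powershell
# Let .NET escape literal text for you
[regex]::Escape('\\server532')               # \\\\server532
[regex]::Escape('sharepoint.company.pri')    # sharepoint\.company\.pri

# Use the escaped text as a -match pattern
'\\server532\hr' -match [regex]::Escape('\\server532')   # True
```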

Let's try another example. This time, we want to match \\sharepoint.company.pri from servers.txt:

Whoa, Nelly! Now we're truly getting into the thick of things. Notice that I used the shorthand \w+ construction to match one or more occurrences of a word character (a letter, digit, or underscore). An unescaped period/dot (.) is a metacharacter that matches any single character, so I escape the two periods in the hostname sharepoint.company.pri to make them match only literal dots. Cool, eh?
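The pattern below is one way to put that description into practice. It is a sketch, not the only valid pattern. The entries in $lines are hypothetical, and sharepointXcompanyYpri is a deliberately bogus hostname that shows why the dots need escaping.

```powershell
# Hypothetical lines standing in for servers.txt
$lines = @('\\server532\hr', '\\sharepoint.company.pri\sites', '\\sharepointXcompanyYpri\sites')

# \w+ = one or more word characters; \. = a literal dot
$lines | Where-Object { $_ -match '\\\\\w+\.company\.pri' }   # only the real hostname

# Unescaped dots match ANY character, so the bogus host slips through too
$lines | Where-Object { $_ -match '\\\\\w+.company.pri' }     # both sharepoint lines
```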

CAUTION

Don't fall into the conceptual trap of thinking that there is always one preferred way to construct a valid regular expression. PowerShell, just like any programming language and even any operating system, offers many different methods for accomplishing a given result. Your goal as a PowerShell administrative scripter is to gradually learn to write more concise and performant code.

Introducing Select-String

For jobs when you need to dip into one or more files and find matches (and then, with a little help, make replacements), Select-String is what you need. Consider the following sample file named C:\input\customers.csv:

First of all, the names and metadata in this example are entirely fictional. Second, notice that we have a comma and no intervening space separating each column entry (this file contains comma-separated values, after all).

Now imagine that instead of four records this database file has several thousand records. We're tasked with identifying every U.S. Social Security number (SSN) in the file. As you may know, the SSN has the following general format:
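The SSN format is three digits, a hyphen, two digits, a hyphen, and four digits (###-##-####). That maps neatly onto the \d and {n} syntax covered earlier. In the sketch below, the $rows array holds hypothetical, fictional records standing in for customers.csv. Against the real file, you would use Select-String -Path 'C:\input\customers.csv' -Pattern $ssnPattern. The redaction string XXX-XX-XXXX is also an assumption.

```powershell
# Hypothetical, fictional records standing in for C:\input\customers.csv
$rows = @(
    'FirstName,LastName,SSN'
    'Jane,Doe,123-45-6789'
    'John,Roe,987-65-4321'
)

# ###-##-####: three digits, hyphen, two digits, hyphen, four digits
$ssnPattern = '\d{3}-\d{2}-\d{4}'

# Select-String emits MatchInfo objects; .Matches holds the matched text
$rows | Select-String -Pattern $ssnPattern |
    ForEach-Object { $_.Matches[0].Value }

# Redact each matching line with -replace (this does NOT modify any file)
$rows | Select-String -Pattern $ssnPattern |
    ForEach-Object { $_.Line -replace $ssnPattern, 'XXX-XX-XXXX' }
```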

I used the ForEach construct to loop through the dataset and the -replace operator to replace the SSN matches with our redaction string. The results look good, but if you open the source file, you won't see the letter X everywhere. What's up?

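Well, Select-String writes MatchInfo objects to the pipeline, and -replace works on copies of those strings in memory. Neither one writes anything back to customers.csv. To change the stored data, we need to read the source text, transform it, and save the result.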

My proposed solution is to use Get-Content to "vacuum" the customers.csv text into our runspace, perform the match/replace, and then export the final result set to a new file. Try the following:
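Here is a minimal sketch of that approach. To keep it self-contained, it first writes a small fictional customers.csv to the temp folder. The file names are hypothetical; in the article's scenario, the source would be C:\input\customers.csv.

```powershell
# Hypothetical file locations in the temp folder
$tempDir = [System.IO.Path]::GetTempPath()
$source  = Join-Path -Path $tempDir -ChildPath 'customers.csv'
$target  = Join-Path -Path $tempDir -ChildPath 'customers-redacted.csv'

# Create a small fictional source file
Set-Content -Path $source -Value @('FirstName,LastName,SSN', 'Jane,Doe,123-45-6789')

# Read every line, redact SSNs (-replace runs on each line), write a new file
(Get-Content -Path $source) -replace '\d{3}-\d{2}-\d{4}', 'XXX-XX-XXXX' |
    Set-Content -Path $target

Get-Content -Path $target
```

Wrapping Get-Content in parentheses reads the whole file before -replace runs. Writing to a separate file leaves the original untouched, so you can verify the redaction before you delete anything.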