Finding Things on Linux and Understanding Regular Expressions

September 14, 2009

Regular expressions (regexps) are a very powerful tool, allowing you to look for text strings matching a particular pattern. In this first part of a two-part series, I'm going to look at using them on the command line. The next part will cover regexps in editors and other programs.

The shell built-in wildcard provision

The shell has some basic regexp-like pattern matching built in (usually called globbing): the most basic example of this is the * wildcard. This example will list every file in the current directory that has a .jpg extension:

ls *.jpg

What actually happens here is that the shell expands the * before it passes the file list to ls. So that line is really equivalent to:

ls file1.jpg file2.jpg ...
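One way to watch the expansion happen is to put echo in front of the command, so the shell prints the argument list it would hand to ls instead of running it. Here's a minimal sketch, assuming a POSIX sh and the mktemp command found on Linux; the file names are made up for illustration:

```shell
#!/bin/sh
# Work in a throwaway directory so no real files are touched.
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

# Hypothetical example files.
touch file1.jpg file2.jpg notes.txt

# echo shows the command line *after* the shell has expanded *.jpg,
# which is exactly what ls would receive as its arguments.
expanded=$(echo ls *.jpg)
echo "$expanded"    # prints: ls file1.jpg file2.jpg
```

Note that notes.txt never reaches ls at all: the filtering is done entirely by the shell.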

In contrast, this command line produces the same list of files, but uses grep with full regexp syntax (see the next section for more on grep):

ls | grep '.*\.jpg$'

This runs ls on the current directory (so listing all files), then passes the output through grep, which uses 'proper' regexps rather than the shell built-in. Here, . means 'any character', and * means 'zero or more of the preceding character': so .* is 'zero or more of any character'. The \ is used to escape the second period, so it's treated as a real period rather than a stand-in for 'any character', and the $ anchors the match to the end of the line. So we get only files ending in .jpg; without the $, grep would also match a name such as photo.jpg.bak, because it looks for the pattern anywhere in the line. Note the difference between this and the shell built-in, where a period is just treated as a real period, and * means 'any sequence of characters, including none'.
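To see why an end-of-line anchor matters, here's a sketch with a made-up file called holiday.jpg.bak. In grep's syntax $ matches the end of the line, so only the anchored pattern rejects that name. It assumes a POSIX sh, grep, and mktemp; all file names are hypothetical:

```shell
#!/bin/sh
# Throwaway directory for the demo.
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

# One name merely contains ".jpg" in the middle.
touch holiday.jpg holiday.jpg.bak notes.txt

# Unanchored: matches ".jpg" anywhere in the name.
loose=$(ls | grep '.*\.jpg')
# Anchored with $: the match must run to the end of the line.
strict=$(ls | grep '.*\.jpg$')

echo "$loose"     # holiday.jpg and holiday.jpg.bak
echo "$strict"    # holiday.jpg only
```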

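The single quotes are very important! Without them, the shell will try to expand the pattern itself before running the command: it may match the * against filenames in the current directory, and it removes the backslash, so grep can end up being handed a different pattern from the one you typed. Always single-quote your regular expressions on the command line.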
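printf '%s\n' prints each of its arguments on a separate line, which makes it a handy stand-in for grep when you want to see exactly what pattern a command would receive. This sketch (assuming a POSIX sh; the file names are made up) shows both ways an unquoted pattern can go wrong:

```shell
#!/bin/sh
# Throwaway directory for the demo.
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

touch photo.jpg

# Unquoted, no file matches the glob .*\.jpg, so the word is passed on
# with only the backslash removed: grep would see .*.jpg, where the
# second period now means 'any character'.
unquoted=$(printf '%s\n' .*\.jpg)

# Add a hidden file that the glob *does* match: the shell now replaces
# the whole pattern with that file name before the command runs.
touch .secret.jpg
expanded=$(printf '%s\n' .*\.jpg)

# Single-quoted, the pattern arrives exactly as typed.
quoted=$(printf '%s\n' '.*\.jpg')

echo "$unquoted"    # .*.jpg
echo "$expanded"    # .secret.jpg
echo "$quoted"      # .*\.jpg
```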

The shell wildcard provision can be very useful. The important thing is to remember that the syntax isn't quite the same as for 'proper' regular expressions. Here's another shell built-in example, which will move all of your old logs (which on my system are named like mail.log.0.gz, system.log.1.bz2, etc.) to a subdirectory:

mv *.log.[0-9].* logarchive/
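Because mv acts immediately, it's worth trying the pattern somewhere harmless first (or prefixing the command with echo to preview it). A minimal sketch, assuming a POSIX sh and mktemp; the log names are hypothetical stand-ins for rotated and live logs:

```shell
#!/bin/sh
# Throwaway directory for the demo.
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

# Rotated logs plus the live logs, which should stay where they are.
touch mail.log mail.log.0.gz system.log system.log.1.bz2
mkdir logarchive

# Only names containing .log.<one digit>. match the glob.
mv *.log.[0-9].* logarchive/

archived=$(ls logarchive)
remaining=$(ls)
echo "$archived"     # mail.log.0.gz and system.log.1.bz2
echo "$remaining"    # logarchive, mail.log, system.log
```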

[0-9] will match any single character in the range 0 to 9 (that is, any one digit): this works with proper regexps as well as with the shell built-in.
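The sketch below selects the same files twice, once with a shell glob and once with grep, to show that the bracket expression is written identically in both while the literal periods need escaping only in the regexp. It assumes a POSIX sh, grep, and mktemp; the file names are made up:

```shell
#!/bin/sh
# Throwaway directory for the demo.
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

touch mail.log mail.log.0.gz system.log.1.bz2 system.log.x.gz

# Shell glob: [0-9] matches exactly one digit at that position.
by_glob=$(printf '%s\n' *.log.[0-9].*)

# grep regexp: same bracket expression, but each literal period
# is escaped so it isn't read as 'any character'.
by_grep=$(ls | grep '\.log\.[0-9]\.')

echo "$by_glob"    # mail.log.0.gz and system.log.1.bz2
echo "$by_grep"    # the same two names
```

Neither approach picks up system.log.x.gz, since x is not a digit.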