Category: Education

Whenever I need to search through multiple files after a specific term, I usually turn to GNU grep. If I happen to be on a Windows computer, I normally install Cygwin that allows me to use the same tool. If all else fails, or I am unable to install Cygwin for some reason, I turn to the the built-in Windows command line tools find and findstr for emergencies.

Grep is a great tool. It’s easy to learn, and it has multiple additional functionaltity (like regular expression search patterns) for finding that particular needle in the haystack you are looking for .

In its simplest form, you normally run:

grep "search term" *

This will search through all files in the current folder and report where instances of “search term” are found.

If you however need to search through the current folder (which also happens to contain multiple subfolders), and you want to do a recursive search through all the subfolders, you would on a Linux system normally write the following:

grep -r "search term" *

And you will then get the desired response. Grep will recurse through the current directory and all subdirectories for instances of “search term”.

I got really surprised the other day when I had to do a recursive search on files with a specific suffix, like *.txt. Beforehand, I knew to a certain degree what the result would be, but grep was unable to find what I was looking for! Let me illustrate by example:

For this test we will use a random set of text files found on the internet. For my own amusement, I used the ufo archives from textfiles.com, located at http://archives.textfiles.com/ufo.tar.gz. Feel free to try out for yourself.

Now, if we want to find out which files that contain a specific search term, in the current directory and all subdirectories, the intuitive approach would be to run the following command:

grep -r "The lecture was then essentially" *.ufo

Don’t you agree? However, grep will not provide the desired response if you run this command!

Instead, it will search through all files in the *current* directory!

In order to search through all the subdirectories as well, you need to use the following command:

grep -r "The lecture was then essentially" * --include=*.ufo

This will give the desired output. Note that the file search pattern has been modified from “*.ufo” to “*”.

I haven’t confirmed my theory on this yet, but my assumption is that “*.ufo” is interpreted by the shell first and then passed on to grep. Grep would then think (since the shell interpreted your input first hand) you would only be interested in the files listed in the current directory.

Now that you know better, you won’t make the mistake I did (and probably have done several times earlier without knowing better!). Happy searching!