Wizard Boot Camp, Part Eight: Utilities You Should Know

If you’re a heavy user of scripting languages, like Python, you may find it faster to write a few lines of Python code than to type a long command line (or a loop) at a shell prompt. But Linux systems have a long heritage — from Unix, and from other systems too — of power-packed utility programs that do a particular job particularly well. Even if Ruby or Perl code drips from your fingers, passing some data to a shell or a utility can sometimes make quick work of hacks that aren’t part of Your Favorite Interpreter’s feature set.

Some of the contents of /bin and /usr/bin can be surprising, even for the grey-bearded wizards who’ve run Linux systems for decades. Here’s the first of several “Boot Camp” columns on utilities: browse through it and ask yourself whether you’ve used all of the non-obvious features of these obvious utilities. You long-time wizards might also want to read the series “What’s GNU in Old Utilities?”; its first article is online at http://www.linux-mag.com/id/2048.

cat

What use is a utility that reads files, or standard input, and writes them to standard output?

One answer is cat’s filtering options: -b and -n for line numbering; -s for squeezing runs of empty lines down to one; and -v (and more) for showing “unprintable” characters. (See the man or info page.)
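Here's a quick way to see those options at work. It's only a sketch: the sample function and its text are made up for the demonstration, and the options are assumed to behave as in GNU cat.

```shell
# Print a small sample: two text lines separated by three empty lines.
sample() { printf 'first\n\n\n\nsecond\n'; }

sample | cat -n      # number every line, including the empty ones
sample | cat -b      # number only the non-empty lines
sample | cat -s      # squeeze each run of empty lines down to one

# -v makes control characters visible; the byte \001 shows up as ^A.
printf 'tab\there\001\n' | cat -v
```

Note that -v alone leaves TABs as they are; the "and more" options in the man page cover those.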

Or consider this little shell script named foo:

#!/bin/sh
{
prog1
cat - "$@"
prog2
} > outfile

It’s using cat as “plumbing” to insert text from the script’s standard input, and from any files named on foo’s command line, into the proper sequence of output being written to outfile. (We covered the shell’s curly-brace operators in the second column of this Boot Camp series, available online at http://www.linux-mag.com/id/4705.) So, if you run foo this way:

$ aprog | foo infile[12]

then outfile will hold the following data, in the following order:

the output of prog1

the output of aprog

the contents of infile1

the contents of infile2

the output of prog2

Like many standard Linux utilities, if you put a single dash (-) on its command line, cat will read its standard input at that point. Here, cat reads its standard input (which comes from aprog), then reads the files infile1 and infile2 (which the shell expanded from the wildcard pattern infile[12]).
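If you'd like to watch the plumbing work without writing real programs, here's a sketch that runs in a scratch directory. The one-line shell functions standing in for prog1, prog2, and aprog are assumptions for the demo, and foo is written as a function rather than a script file.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
cd "$(mktemp -d)" || exit 1

# Stand-ins for the article's programs (hypothetical; each prints one line).
prog1() { echo "output of prog1"; }
prog2() { echo "output of prog2"; }
aprog() { echo "output of aprog"; }
echo "contents of infile1" > infile1
echo "contents of infile2" > infile2

# The body of the foo script, wrapped in a function for the demo.
foo() {
    {
        prog1
        cat - "$@"     # "-" is standard input; then each named file in turn
        prog2
    } > outfile
}

aprog | foo infile[12]
cat outfile            # five lines, in the order listed above
```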

date and touch

These tiny utilities show, and set, times of files and the time on the system clock.

By itself, date shows the time in your system’s timezone. Setting the TZ environment variable tells date to use a timezone different than the system default; timezone names are typically listed under /usr/share/zoneinfo. So, for instance, to check the times in your location, in Poland, and in western Australia, you could pass temporary values of TZ to date this way:
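A sketch of those three commands, assuming your system has the zone names Europe/Warsaw and Australia/Perth under /usr/share/zoneinfo (if a name is missing, date typically falls back to UTC without complaining):

```shell
# Local time, using the system default timezone.
date

# A TZ value placed before a command applies to that one command only;
# the shell's own TZ setting (if any) is left alone.
TZ=Europe/Warsaw date
TZ=Australia/Perth date
```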

Options include -u to show UTC (GMT) time and -R for RFC-2822 time (used in email and other messages). An argument starting with a plus sign lets you choose almost any output format. As an example:

$ date "+It's %A at %l:%M %p."
It's Wednesday at 12:27 PM.
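To compare -u, -R, and a custom format side by side, it helps to pin the time so the output doesn't change from run to run. This sketch assumes GNU date, whose -d @0 option selects the start of the Unix epoch; leave it off to format the current time instead.

```shell
# -d @0 (a GNU date extension) fixes the time at 1970-01-01 00:00:00 UTC.
date -u -d @0                        # UTC, in the default format
date -u -R -d @0                     # RFC 2822 style, as seen in email headers
date -u -d @0 '+%Y-%m-%d %H:%M:%S'   # a format of your own after the plus sign
```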

Other uses are in setting file timestamps and using them for time-tracking. For instance, typing touch last-report-printed saves the current time as the last-modified date of the file last-report-printed. You can use that timestamp file later:

The first command shows the timestamp. The second searches the directory /reports for all files modified more recently than the timestamp file, then sends those files to the printer with lpr. (The -print0 and -0 options keep “unusual” filenames from causing problems, as explained in Filename Trouble, online at http://www.linux-mag.com/id/1994.)
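The two commands just described might look like the sketch below, assuming GNU find and xargs. The function name print_new_reports and its optional third argument are inventions for illustration: they let you try the pipeline with a harmless command such as echo before pointing it at lpr. The -type f test is also an addition, to keep directories out of the print queue.

```shell
# print_new_reports DIR STAMP [COMMAND]
# Show STAMP's timestamp, then hand every regular file under DIR that is
# newer than STAMP to COMMAND (lpr unless you name something else).
print_new_reports() {
    dir=$1 stamp=$2 print_cmd=${3:-lpr}
    ls -l "$stamp"
    find "$dir" -type f -newer "$stamp" -print0 | xargs -0 "$print_cmd"
}

# Typical use (not run here):
#   print_new_reports /reports last-report-printed
#   touch last-report-printed      # reset the marker for next time
```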

The touch utility can also copy timestamps and set arbitrary timestamps. The first command below sets the timestamp of the file named last-backup to January 23 at 11:45 PM (2345) of the current year. The second command copies the timestamp of last-report-printed onto the file previous-report-printed:
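Those two touch commands could be written as follows. This is a sketch that runs in a scratch directory so it can't disturb real timestamp files; it relies on the -t and -r options of touch.

```shell
# Scratch directory, plus a marker file to copy from.
cd "$(mktemp -d)" || exit 1
touch last-report-printed

# -t takes MMDDhhmm: January 23 at 23:45; with no year given, touch uses
# the current year.
touch -t 01232345 last-backup

# -r copies the timestamp of the reference file onto the target file.
touch -r last-report-printed previous-report-printed

ls -l last-backup last-report-printed previous-report-printed
```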

Store a line of text in a shell variable, including escape sequences for TAB and a newline that echo will interpret using its -e option. Send five of them to the printer (plus a final empty line), without or with line numbers:

We’re redirecting the standard output of all echo and netstat calls within the while loop to the file netstat.log. See Part Two of this series, online at http://www.linux-mag.com/id/4705.
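A loop of that general shape might look like the sketch below. It is an illustration of the redirection only: the snapshot function is a hypothetical stand-in for the netstat call, and the three-pass counter and commented-out sleep are assumptions, not the original loop.

```shell
cd "$(mktemp -d)" || exit 1

# snapshot stands in for netstat (hypothetical); swap in the real command,
# for example netstat -i, on a system that has it.
snapshot() { echo "interface statistics would appear here"; }

count=0
while [ "$count" -lt 3 ]
do
    echo "=== sample $count at $(date) ==="
    snapshot
    count=$((count + 1))
    # sleep 60              # uncomment to pause between samples
done > netstat.log          # one redirection covers every command in the loop

cat netstat.log
```

Putting the redirection after done opens the log file once, instead of once per command inside the loop.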

grep

This utility is invaluable for finding things, and most of its uses are so obvious that they don’t belong in this column. A few possibly-non-obvious tips, though:

If the files might not have readable text, pipe grep’s output through cat -v to protect your terminal.

The GNU color-highlighting option --color makes it much easier to spot a match in a mass of output text. But if you pipe colored text through a filter like cat -v, cat will change colored areas into escape-sequence gibberish. Try using two greps — with the highlighting done after the filtering:

grep 'xyz$' */* | cat -v | grep --color 'xyz$'

(What’s GNU, Part Two, online at http://www.linux-mag.com/id/2117, has much more about color and other useful GNU grep options.)

If grep finds lots of matching text, piping the output to the less pager can help. Using sed G (as we’ll see in a future column) will double-space the matching lines, separating them for clarity:

grep 'xyz$' */* | sed G | less

If you also use a less search command (here, type /xyz$ after less starts running), then less will highlight the matching text. (Another useful less option here is -a, which skips all matches shown on the current screen when you type n to repeat the search. For more tips, read the two-part series Doing More with less. It starts at http://www.linux-mag.com/id/4145/.)

The Swiss-Army-Knife find is another utility that’s so obvious we shouldn’t even need to mention it. Still, it’s handy with grep for searching a filesystem tree. As we saw earlier (in the date section), use its -print0 option with xargs -0. Adding the grep option -H makes sure that every filename is output. (Without -H, in case xargs passes just a single filename to grep, grep wouldn’t show the filename.)
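Here's a small sketch of find and grep working together, assuming GNU find, xargs, and grep. The directory tree and file contents are made up so that exactly one line ends with xyz.

```shell
# Build a tiny tree to search.
cd "$(mktemp -d)" || exit 1
mkdir -p docs/old
echo 'ends with xyz'          > docs/a.txt
echo 'xyz in the middle here' > docs/old/b.txt

# -print0 and -0 pass the names safely; -H labels each match with its
# filename even when xargs hands grep only one file.
find . -type f -print0 | xargs -0 grep -H 'xyz$'
```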

gzip and bzip2

Compressing files is a no-brainer when you need more disk space, and gzip/gunzip/zcat are often used. bzip2/bunzip2/bzcat often compress more effectively, though the compression phase may be slower. Here are some things you might not know about these power tools:

Not all data is worth compressing. Data that has enough redundancy, or gaps of “filler” (like the NUL bytes in a tar archive), can be reduced in size by compression. For instance, a TIFF photo with big areas of a single color, or a text file with lots of contiguous whitespace, will compress. A JPEG photo or an already-compressed file usually won’t compress well.

Using gzip on an already-compressed file usually won’t hurt, although the file size may increase a little. (By the way, gzip refuses to compress files with names ending in .gz.) Still, to avoid needless gzipping, you can do “trial compression” with the -v option (which shows compression percentage) and a temporary file:

$ gzip -cv photo.tif > gz.gz
photo.tif: 96.5%

The -c option writes the compressed data to stdout. (The output of -v goes to stderr.) If the compression ratio was high enough, replace the original with the compressed copy by running mv gz.gz photo.tif.gz and then rm photo.tif; otherwise, use rm gz.gz.

When -c isn’t used, gzip gives the compressed file the same file mode (access permissions) as the original. Our two-step method above doesn’t do that, but you can do it yourself by transferring the original file mode onto the compressed copy before you run mv:

$ chmod --reference=photo.tif gz.gz

Yup, all of this is a likely candidate for a shell script that does conditional compression, testing to see whether gzip or bzip2 is better, etc. The shell operator 2> redirects stderr (where gzip -v writes its percentage) into a file. You might pre-check the file type with file so your script won’t try to compress already-compressed data types.
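As a starting point, here's one way such a script might go. It's a sketch under several assumptions: the function name maybe_gzip and its 10 percent default threshold are invented, it compares file sizes with wc -c instead of parsing the output of gzip -v, it tries gzip only (no bzip2 comparison), and it uses the GNU chmod --reference option shown above.

```shell
# maybe_gzip FILE [MIN_PERCENT]
# Replace FILE with FILE.gz only if gzip saves at least MIN_PERCENT
# (default 10) of its size. Returns 0 if compressed, 1 if left alone.
maybe_gzip() {
    file=$1 min=${2:-10}

    # Skip types that rarely shrink, if the file utility is installed.
    if command -v file > /dev/null 2>&1; then
        case $(file -b "$file") in
            *compress*|*JPEG*) return 1 ;;
        esac
    fi

    tmp=$(mktemp "$file.XXXXXX") || return 2
    gzip -c "$file" > "$tmp" || { rm -f "$tmp"; return 2; }
    old=$(wc -c < "$file") new=$(wc -c < "$tmp")

    # Integer percentage saved; an empty file is never worth compressing.
    if [ "$old" -gt 0 ] && [ $(( (old - new) * 100 / old )) -ge "$min" ]; then
        chmod --reference="$file" "$tmp"   # copy the original's file mode
        mv "$tmp" "$file.gz" && rm -f "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

You might call it as maybe_gzip photo.tif 25 to insist on a 25 percent saving before the original is replaced.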

When sending compressible data across a network, you can compress data on-the-fly to reduce transmission time. (This is typically only useful if the network bandwidth is low, and if the systems on both ends of the link aren’t so overloaded that the compression itself bogs down.)

If you’re using ssh, and if both systems support it, the ssh -C option will compress network traffic. Otherwise, use built-in compression in utilities (like tar) that have it — or pipe data through gzip and zcat. Here are two examples:

Let’s say you’re using a portable Linux laptop without a printer and you want to print some text files on your office printer. You could compress their data in one of these two ways:
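Two approaches of that kind are sketched below: one lets ssh do the compressing with -C, the other pipes the data through gzip locally and zcat remotely. The function names are invented for illustration, the host name office in the usage line is hypothetical, and lpr is assumed to be set up on the remote system.

```shell
# print_with_ssh_c HOST FILE...
# Let ssh compress the whole connection with its -C option.
print_with_ssh_c() {
    host=$1; shift
    cat -- "$@" | ssh -C "$host" lpr
}

# print_with_gzip HOST FILE...
# Compress locally with gzip; uncompress on the remote end with zcat.
print_with_gzip() {
    host=$1; shift
    cat -- "$@" | gzip -c | ssh "$host" 'zcat | lpr'
}

# Usage (not run here):
#   print_with_gzip office report1.txt report2.txt
```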

Although scp has a -r option to copy directory trees, using tar preserves more of the metadata. If the versions of tar on both systems support gzip compression, use the first version below. Or, if the remote system doesn’t support tar z, try the second:
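The two versions might be sketched like this. The function names and the argument order (host, source directory, destination directory) are assumptions for the example; the destination directory must already exist on the remote system, and its name must not contain a single quote.

```shell
# copy_tree_z HOST SRC DEST
# Both tars understand the z (gzip) option.
copy_tree_z() {
    tar czf - "$2" | ssh "$1" "cd '$3' && tar xzf -"
}

# copy_tree_gz HOST SRC DEST
# The remote tar lacks z, so gunzip runs as a separate step over there.
copy_tree_gz() {
    tar czf - "$2" | ssh "$1" "cd '$3' && gunzip | tar xf -"
}

# Usage (not run here; "office" is a hypothetical host):
#   copy_tree_z office project /home/me/backups
```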

If you haven’t used ssh non-interactively like this, see Transfer Tips, Part I, online at http://www.linux-mag.com/id/1312.

ln

Links are commonly used to point from one filesystem location to another. They also have a more surprising use: to make a program that does several things, deciding what to do by checking the name it was called with. Here’s an example: a single program in /usr/bin that sets, shows, and removes at jobs:
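To see the trick in miniature, here's a toy script that behaves differently under each of its names. It's a sketch, not the real at program: the names job, jobq, and jobrm are made up, and the script only reports what it would do.

```shell
cd "$(mktemp -d)" || exit 1

# A toy multi-name script: it picks an action from the name it was run as.
cat > job <<'EOF'
#!/bin/sh
case ${0##*/} in
    job)   echo "would set a job" ;;
    jobq)  echo "would show the queue" ;;
    jobrm) echo "would remove a job" ;;
    *)     echo "unknown name: ${0##*/}" >&2; exit 1 ;;
esac
EOF
chmod +x job

# Hard links: three names for one file (compare the inode numbers).
ln job jobq
ln job jobrm
ls -li job jobq jobrm

./job
./jobq
./jobrm
```

The expression ${0##*/} strips any leading directory from $0, so the script sees the same name whether you run it as ./jobq or by a full pathname.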
