PDF files are simultaneously wonderful and heinous. They are wonderful in being ubiquitous and mostly being cross platform. They are heinous in being very difficult to work with from the command line, search, grep, use only the text inside the PDF, or use outside of proprietary products.

xpdf is a wonderful set of PDF tools. It is on many linux distros and can be installed on OS X. While primarily an open PDF viewer for X, xpdf has the tool "pdftotext" that can extract formated or unformatted text from inside a PDF that has text. This text stream can then be further processed by grep or other tool. The '-' after the file name directs output to stdout rather than to a text file the same name as the PDF.

Make sure you use version 3.02 of pdftotext or later; earlier versions clipped lines.

The lines extracted from a PDF without the "-layout" option are very long. More paragraphs. Use just to test that a pattern exists in the file. With "-layout" the output resembles the lines, but it is not perfect.

Bash has a great history system of its commands accessed by the ! built-in history expansion operator (documented elsewhere on this site or on the web). You can combine the ! operator inside the process redirection

How often do you make a directory (or series of directories) and then change into it to do whatever? 99% of the time that is what I do.

This BASH function 'md' will make the directory path then immediately change to the new directory. By using the 'mkdir -p' switch, the intermediate directories are created as well if they do not exist.

The backtick operator, in general, will execute the text inside the backticks. On OS X, the pbpaste command will put the contents of the OS X clipboard to STDOUT. So if you put backticks around pbpaste, the text from the OS X clipboard is executed.

If you add the pipeline | pbcopy, the output from executing the command on the clipboard is placed back on the clipboard.

This pipeline will find, sort and display all files based on mtime. This could be done with find | xargs, but the find | xargs pipeline will not produce correct results if the results of find are greater than xargs command line buffer. If the xargs buffer fills, xargs processes the find results in more than one batch which is not compatible with sorting.

Note the "-print0" on find and "-0" switch for perl. This is the equivalent of using xargs. Don't you love perl?

Note that this pipeline can be easily modified to any data produced by perl's stat operator. eg, you could sort on size, hard links, creation time, etc. Look at stat and just change the '9' to what you want. Changing the '9' to a '7' for example will sort by file size. A '3' sorts by number of links....

Use head and tail at the end of the pipeline to get oldest files or most recent. Use awk or perl -wnla for further processing. Since there is a tab between the two fields, it is very easy to process.

diff is designed to compare two files. You can also compare directories. In this form, bash uses 'process substitution' in place of a file as an input to diff. Each input to diff can be filtered as you choose. I use find and egrep to select the files to compare.