Category: Unix

Writing a search engine for your LAMP server is actually fairly simple. I realized I could simply use grep for the search algorithm and have the search engine use regular expression metacharacters. I mean, why reinvent the wheel? Of course, a search engine that crawls the entire web will be more complicated; this one just searches the local web server.

Here is the code I’ve written. It uses less than ten lines of PHP code, and even what I have here has some parts that are rather superfluous (I should probably streamline it)…

Since the search algorithm is simply a front-end for grep, I didn’t really have to think about its implementation. Basically, the script has a decision statement that looks at the superglobal variable $_GET['query']. If it’s null, that means the query hasn’t been submitted yet, so it shows the prompt for the query. If it’s not null, it shows the results of the query, which is of course a regular expression. The results are obtained by greping the local server filesystem and returning all files that contain that pattern.

One thing that makes PHP code somewhat confusing is the way you can have a PHP script interleaved with HTML code. It’s one of those things you just have to get used to.

Possible enhancements include:

Using egrep instead of grep so the user can use extended regular expressions.

Adding the ability to search for images, videos, and other media on the server, rather than just web pages (this could be done by returning any such media that are used by pages that match the pattern).

Searching filenames in addition to file contents (you could use find for this).

Using a ranking algorithm (currently it just lists them in the canonical order returned by grep).

And of course adding CSS and other formatting to make the page more aesthetically pleasing.

For those of you who don’t know, the Snoopy calendar is a gimmick of the hacker culture. It basically refers to a line printer calendar for 1969 featuring the iconic beagle from Peanuts, that apparently hangs on the wall of every Real Programmer’s office. I’m not entirely certain of the origins of this meme, but it dates back at least to the humorous essay Real Programmers Don’t Use Pascal, which was posted to Usenet back in 1983.

I have a certain fondness for this particular gimmick which I can’t quite explain. Perhaps it is the fact that it provides a feasible and exciting challenge for me – that of making my own Snoopy calendar. I have actually done this a couple of times. This time I used a combination of C programming and several Unix programs like enscript, figlet, cal, and sed.

There are three components to the Snoopy calendar. The first is the Snoopy ASCII/line printer art, the second is the calendar, and the third is the year number banner. These components must be pasted together in the same text file, which can then be converted to Postscript for printing. The Unix paste program is not satisfactory for this, since it doesn’t align the text, so I wrote my own program. Here is the final result:

I then did some manual editing to round out the corners. I put this output above the output of cal -y 1969 in Vim, and then ran the program that I had written to paste it onto a Snoopy ASCII picture (I used a different picture this time). I then ran it through enscript, using landscape orientation and reducing the text size to it would all fit on one page, and also telling it to run in line printer emulation mode.

I found a somewhat interesting feature in the Apple terminal, supported by both Terminal.app and iTerm2. Essentially, it allows for the printing of emojis in the terminal. I discovered this when I noticed that the Homebrew package manager prints beer and tap emojis when it is downloading and installing packages. I was curious about how these emojis are represented, so I used a program that I had written, called hex, which essentially just displays the hex values of characters entered by the user. I’ve used this program to find escape sequences for things like function keys and the like, allowing me to implement these inputs in my programs.

I copied the beer mug emoji from the terminal after downloading a package. I then started the hex program and hit Command-V, then copied the hex bytes (there were four of them) by hand into a file using a hex editor. To test it, I ran cat on the file I had created, and lo and behold – a beer emoji was printed to my terminal.

I then decided to test a concept, the concept of printing emojis in a program. I used assembly language for this, because I’m not entirely sure of the proper syntax for C, and I thought it would be faster to write if I did it in assembler. I tested several codes, and then when I found the result, I wrote a comment in the source file indicating what the emoji was. Be warned – this code is extremely tedious, even for an assembly program.

It’s not particularly useful, especially considering that these emojis are not portable to other systems, but it’s an interesting concept to test. It’s kind of confusing that the emojis use four bytes each, whereas the UTF-8 character set used by the terminal uses three bytes for each Unicode character. I’m not sure how Apple implemented this.

One of the most useful features of the GNU Compiler Collection is the -D option to cpp, which allows you to define macros at the command line. This in combination with #ifdefs and #ifndefs in the C/C++ source files allows for very versatile conditional compilation, because it allows you to set certain parameters that the program uses without having to edit the original source code. You can use this to, say, compile a debugging version of a program, among other things.

It is widely known that the C/C++ languages use CPP for preprocessing. What is less well-known is the fact that calendar, the default BSD reminder program, also uses CPP. This is one advantage that calendar has over the newer and more feature-rich remind program, which, as far as I know, doesn’t use preprocessing. calendar is available for all major BSD variants, including macOS. It may have been ported to other *NIXs such as Linux, though I am not sure, and I don’t feel like looking it up right now.

calendar uses CPP to allow for the conditional inclusion of several libraries of pre-written reminders or events – from the standard run-of-the-mill dates like US holidays and birthdays of famous people, to more exotic things like important events in the history of computing, and important dates in the Lord of the Rings timeline. This is done in the obvious way:

# include <calendar.usholiday># include <calendar.birthday># include <calendar.computer># include <calendar.lotr>

Unfortunately, that’s about all you can do with CPP in the default calendar program. I decided I wanted to be able to include certain libraries conditionally, so that if I want to just view reminders for things I have to do in my own life, I have that option, and if I want to also check on upcoming holidays, or events in Tolkien’s universe, I can manipulate those options with a simple command line flag. The CPP code in my calendar file would then look something like this:

#ifdef _HOL_# include <calendar.usholiday>#endif#ifdef _BDAY_# include <calendar.birthday>#endif#ifdef _COMP_# include <calendar.computer>#endif#ifdef _LOTR_# include <calendar.lotr>#endif

… And I would manipulate these options from the command line using a parameter like -D_COMP_.

So I got to work writing a frontend for calendar that adds that capability. Here is the result, written in bash and sed:

After this I set an alias in my .bashrc file to have the calendar command run this script, rather than running the calendar program directly.

There are some problems with this script, the main one being that it is extremely slow, sometimes taking as long as 10-15 seconds to do the preprocessing. If I rewrote this program in C, I could speed it up by a few orders of magnitude, not only because C is inherently faster, but also because it exposes more of the underlying details of how everything is implemented, which allows you to program more intelligently and optimize your program for the hardware.

For example, I don’t know for sure whether comparing two integers is faster than comparing two strings in the bash shell (a problem I ran into here when trying to decide whether to just use a “true”/”false” string to determine whether to use -B or -W; the bash shell doesn’t have Boolean types), because I don’t understand the underlying implementation. I would have to spend days studying the source code for the shell to get a sense of how to optimize everything. All I know is that all shell variables are essentially strings, so it’s not the same as C, where comparing two integers is much faster than going through two character arrays and comparing each pair of characters one by one. In the bash shell, you have to convert the numerical strings to numbers, perform an arithmetic or comparison operation on them, and then convert them back to strings. Both methods are extremely inefficient. This is what I don’t like about extremely high-level scripting languages such as Unix shell, Python, Ruby, PHP, etc. But hey, they allow you to write programs a hell of a lot faster than C, which is better if you just want a quick-and-dirty solution to a programming problem.

For a long time I have needed a way to transfer data into my DOS virtual machines. I came somewhat close to a solution by using Keka to archive directories as ISO files, but I was unable to create what I truly needed – a floppy disk image containing the archived contents of a directory. Well, now I’ve found a way to do just that, partly by doing some research on Google and partly by just figuring stuff out on my own.

The first thing you need to do is create a 1.44 MB empty file with an extension of .ima, .img, or some other raw floppy image extension. This can be done using dd, with the following command:

dd if=/dev/zero of=floppy.img bs=1024 count=1440

I took a screenshot of the output of dd for a visual:

The next step is to format the file with an MS-DOS FAT filesystem. The easiest way to do this is in DOS. So insert the blank disk image in the VM and type format a:

Now you have a blank floppy image and are ready to add files to it. To verify that the floppy had been formatted, I ran a hexdump in the Unix terminal.

Next you need to mount the floppy image. The easiest way to do this is to just double click on its icon in the GUI, then copy and paste the files from the source directory or directories to the directory that the image is mounted on. I tested this first with a couple of simple standalone programs – Visicalc and DOS Cal:

Now I have installed two programs: CAL.EXE and VC.COM. Indeed, when I go to the C: drive and type cal, the program starts.

Just a side note here: the default path in MS-DOS is set to C:\DOS, so if you want to run programs from a different directory you need to edit the search path in the AUTOEXEC.BAT file:

For some reason, my DOS VM would only read the first floppy image that I created. Creating further floppy images using the same technique resulted in an Abort-Retry-Fail error message. To work around this, I have written a C program that automates the process of creating and formatting the floppy image, using the hexdump of the original image. That way I know the file will be exactly the same and I’ll be able to create multiple floppy images for software that requires a multi-disk installation. I will talk about this program, as well as MS-DOS 6.22, MS-DOS disk labels, and some additional software that I’ve installed in the next few blog entries. For now, farewell.

Unix man pages are written in the troff language. There are three basic Unix programs that interpret troff code – nroff, troff, and groff. nroff is used for preparing documents for display in the terminal. troff prepares documents to be printed on phototypesetters, a technology that is pretty much obsolete by now. The Unix man program is essentially just a frontend for nroff that preprocesses the code with a set of macros known as the man macros and then pipes it into less.

What we will be using is the GNU roff program, or groff. groff is a troff interpreter that converts the troff code to Postscript. In order to apply the command, you first have to locate the man page you want to convert. On my system, most of the man pages are located in /usr/share/man. Let’s say we want to convert the file nmap.1 into Postscript. We would use a command like this:

groff -man /usr/share/man/man1/nmap.1 > nmap.ps

The -man option tells groff to run the code through the man macro package, which is the macro package used for man pages, before sending its output to the Postscript file.

Now we have a Postscript file, which is fairly easy to convert to a PDF. Many document viewing programs (such as Apple Preview) will automatically convert a Postscript file to a PDF when you open it. Alternatively, if you just want a hard copy, you can skip the PDF and send the Postscript file directly to a printer using the lp command. The only requirement is that it must be a Postscript printer, which the majority of printers on the market nowadays are.

Through some config file hacking, I have managed to set up an Apache HTTP server on my Macbook. I did this so that I could test the full functionality of PHP. Since PHP is one of the top most needed skills for freelance coding jobs, I figured it would be a good idea to learn it, and of course to use any of the features of PHP beyond just the core language, you need a web server.

Starting the Apache server is pretty easy. All you have to do is type sudo httpd at the command line (assuming Apache is installed on your system, which I think it is for most Unix-based systems). It is recommended that you use apachectl as a frontend instead of using httpd directly, but I couldn’t seem to get this to work, so to start Apache I use sudo httpd and to stop it I use sudo killall httpd.

Now configuring the server to use PHP was somewhat more difficult, though still not too much so. First of all, for a server to use PHP, the PHP DOS initialization file needs to be present as /usr/local/lib/php.ini. After some digging around, I found the PHP ini file at etc/php.ini.default, so I just copied it (changing the filename of course).

The next thing I had to do was tell Apache to load the PHP module at startup. This is done by editing the file /etc/apache2/httpd.conf and uncommenting the appropriate code line. It must be remembered that editing this file requires root privileges.

The appropriate line is

LoadModule php5_module libexec/apache2/libphp5.so

…Shown here already uncommented.

The next thing you have to do is find out what directory Apache is using to serve files to clients. This is determined by the DocumentRoot environment variable, and controlled by a <Directory> tag.

Here we see that the server’s filesystem is rooted at /Library/WebServer/Documents. Of course this is Mac-specific, and the root will be different on other systems, and we can also change it, though I felt no need to.

If you title a document “index.html”, “index.php”, etc. then this will be the file that the client goes to when the user types your domain name without appending a path at the end. Also, if you title a document, say, my-pictures.html, the file extension can be omitted in the URL.