The Importance of Sanitizing Input Strings

As a young programmer, it’s easy for me to forget that, unlike in the academic programming environment, it is vital that I protect my programs against any and all potential attacks. Recently I realized just how easy it is to leave a gaping hole in a program’s security. I am posting this today as both a reminder to myself and a warning to any who might run across it: sanitize your program input!

Recently I was working on a personal project: a little web app that displays the machines in the Computer Science department lab based on active OS (since they are dual-booted with Ubuntu and Windows XP, the OS will regularly change on any given machine). You can see a snapshot of an older version of the map that was released a few months ago here.

The map worked fine, but it had a lot of overhead and required root privileges to run, so I decided to rewrite the backend using netcat instead of my previous solution which utilized nmap‘s OS detection. With netcat’s z flag, determining whether a machine is listening on a given port is trivial and since the labs are all extremely homogeneous it only requires a test of port 22 to determine if the machines are booted into Ubuntu (which will accept SSH connections) or into Windows (which will deny port 22 connections) or offline (in which case the connection will be dropped).

In any event, I wrote a PHP script which accepts three parameters based on the names of the lab machines, “prefix,” “start,” and “end.” If I were to call the script with the following query string: ?prefix=dog&start=1&end=62 then the script would scan the machines between dog01.cs.utsa.edu and dog62.cs.utsa.edu.

In order to make this magic happen, the program must make a shell call to netcat for each machine to be scanned. This can be completely safe, but only if you sanitize your data. Since the prefix will be used in each shell call and, unlike the numeric values for “start” and “end,” will remain static, it is perfect for inserting malicious code by the user. Before I released the script, I did one final sanity check and realized that I had not sanitized this value. Before closing the hole, I decided to play with the exploit a bit and the following is what I did:

First, I considered how the input strings were used in my program. The following string was passed to the shell for each iteration through the machines: /bin/nc -zv -w 1 $mac 22 2>&1| /bin/cat -. The value for $mac was defined as the user-entered $prefix and the sanitized integer value somewhere between $start and $end. As a proof of concept, I decided to do an ls of the /home directory to see who was currently mounted on the web server. Obviously, a user who wanted to do more damage could conceive of much worse things, but I was not inclined to foo-bar the server for the sake of this test.

It turns out, all that was required was to insert the URL-encoded value of “;ls -l /home;” as the value for $prefix. It breaks the netcat call, but that’s OK. The output returned was the following:

(That last bit about command not found is the shell trying to figure out what to do with the integer that was given after our second semicolon.)

To resolve this security hole, all that is required is a regex replacement of all special characters. In this particular instance, anything other than letters is not going to represent a valid prefix, so, after pulling the prefix from the user, I simply apply the following: $prefix = preg_replace( '/[^a-zA-Z]/s', '', $prefix ). For my specific implementation, I am also able to limit the length of the value passed in (not shown). And that’s it. The program is now secure against malicious users compromising the server, through user input, at any rate.

If you have a similar almost uh-oh, feel free to share in the comments below!