Uploading Files Using CGI and Perl

Creating the File Upload Script

Handling the data that the browser sends when it uploads a file is quite a complex process. Fortunately, the Perl CGI library, CGI.pm, does most of the dirty work for us!

Using two methods of the CGI query object, param and upload, we can retrieve the uploaded file’s filename and file handle, respectively. Using the file handle, we can read the contents of the file, and save it to a new file in our file upload area on the server.

1. First Things First

At the top of our script, we need to create the shebang line. We then put the Perl interpreter into strict mode to make our script as safe as possible, and include the Perl CGI and File::Basename modules for use in the script. We’ll also use the CGI::Carp module to display errors in the web page, rather than displaying a generic “500 Server Error” message (it’s a good idea to comment out this line in a production environment):

#!/usr/bin/perl -wT

use strict;
use CGI;
use CGI::Carp qw ( fatalsToBrowser );
use File::Basename;

Note the use of the -w switch to make Perl warn us of any potential dangers in our code. It’s nearly always a good idea to put the -w in! In addition, the -T switch turns on taint checking. This ensures that any untrusted input to the script, such as the uploaded file’s filename, is marked as tainted; we then need to explicitly “clean” this data before using it. (If you try to use tainted data in an operation that affects the outside world, such as opening a file for writing, Perl throws an error.) More on this in a moment.

2. Setting Safety Limits

In order to prevent the server being overloaded by huge file uploads, we’ll limit the size of the POST request, and therefore of the uploaded file, to about 5MB (5,120,000 bytes); this should be big enough to handle most digital photos:

$CGI::POST_MAX = 1024 * 5000;

We’ll also create a list of “safe” characters for filenames. Some characters, such as slashes (/), are dangerous in filenames, as they might allow attackers to upload files to any directory they wanted. Generally speaking, letters, digits, underscores, periods, and hyphens are safe bets:

my $safe_filename_characters = "a-zA-Z0-9_.-";

3. The Upload Directory

We need to create a location on our server where we can store the uploaded files. We want these files (the photos) to be visible on our web site, so we should store them in a directory under our document root, for example:

my $upload_dir = "/home/mywebsite/htdocs/upload";

You’ll need to create a directory called “upload” in your web site’s document root, then set $upload_dir to the absolute path to that directory, as I’ve done above. Make sure your directory can be read and written to by your script; on a shared UNIX server, this usually means setting the mode to 777 (for example, by issuing the chmod 777 upload command at the command line). Check with your web hosting provider if you’re not sure what you need to do.

4. Reading the Form Variables

The next step is to create a CGI object (we assign it to $query below); this allows us to access methods in the CGI.pm library. We can then read in the filename of our uploaded file, and the email address that the user entered into the form:
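
A minimal sketch of this step follows. The field name photo matches the one passed to the upload method later in the script; email_address is an assumed name for the email field, so change it to whatever your form actually uses.

```perl
use strict;
use CGI;

# Create the CGI query object that the rest of the script works with.
my $query = CGI->new;

# "photo" is the file field; "email_address" is an assumed field name.
my $filename      = $query->param("photo");
my $email_address = $query->param("email_address");
```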

If there was a problem uploading the file — for example, the file was bigger than the $CGI::POST_MAX setting — $filename will be empty. We can test for this and report the problem to the user as follows:
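
One way to write this test is sketched below. It is wrapped in a helper called report_missing_upload, a name invented here for illustration, and the wording of the message is only a suggestion.

```perl
use strict;
use CGI;

# Hypothetical helper: if no filename arrived, send a short error page
# and return 1 so the caller can stop; otherwise return 0.
sub report_missing_upload {
    my ( $query, $filename ) = @_;

    return 0 if defined $filename && length $filename;

    print $query->header();
    print "There was a problem uploading your photo (try a smaller file).";
    return 1;
}
```

Right after reading the form variables, the script could then call: exit if report_missing_upload( $query, $filename );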

5. Cleaning Up the Filename

We can’t necessarily trust the filename that’s been sent by the browser; an attacker could manipulate this filename to do nasty things such as upload the file to any directory on the Web server, or attempt to run programs on the server.

The first thing we’ll do is use the fileparse routine in the File::Basename module to split the filename into its leading path (if any), the filename itself, and the file extension. We can then safely ignore the leading path. Not only does this help thwart attempts to save the file anywhere on the web server, but some browsers send the whole path to the file on the user’s hard drive, which is obviously no use to us:
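
A sketch of this step is shown below. The sample path is made up and stands in for whatever the browser sends; fileparse and its suffix-pattern argument come from File::Basename.

```perl
use strict;
use File::Basename;

# Made-up stand-in for a full client-side path sent by some browsers.
my $filename = "/home/user/My Photos/beach.jpg";

# Split into name, leading path, and extension. The pattern '\..*'
# matches a literal period followed by the rest of the name.
my ( $name, $path, $extension ) = fileparse( $filename, '\..*' );

# Rebuild the filename without the leading path.
$filename = $name . $extension;

print "$filename\n";    # prints "beach.jpg"
```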

The above code splits the full filename, as passed by the browser, into the name portion ($name), the leading path to the file ($path), and the filename’s extension ($extension). To locate the extension, we pass in the regular expression '\..*': a literal period (\.) followed by zero or more characters. (The backslash matters; an unescaped period would match any character at all.) We then join the extension back onto the name to reconstruct the filename without any leading path.

The next stage in our quest to clean up the filename is to remove any characters that aren’t in our safe character list ($safe_filename_characters). We’ll use Perl’s substitution operator (s///) to do this. First, though, we’ll use the transliteration operator (tr///) to convert any spaces in the filename to underscores, as underscores are easier to deal with in URLs:

$filename =~ tr/ /_/;
$filename =~ s/[^$safe_filename_characters]//g;
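
To make the effect of these two lines concrete, here is a small worked example. The sample filename is made up for illustration; the character list is the one defined in step 2.

```perl
use strict;

my $safe_filename_characters = "a-zA-Z0-9_.-";

# Made-up example of a filename a browser might send.
my $filename = "My Photo (1).jpg";

$filename =~ tr/ /_/;                             # spaces become underscores
$filename =~ s/[^$safe_filename_characters]//g;   # strip anything not on the list

print "$filename\n";    # prints "My_Photo_1.jpg"
```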

Finally, to make doubly sure that our filename is now safe, we’ll match it against our $safe_filename_characters regular expression, and extract the characters that match (which should be all of them). We also need to do this to untaint the $filename variable. This variable is tainted because it contains potentially unsafe data passed by the browser. The only way to untaint a tainted variable is to use regular expression matching to extract the safe characters:
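
A minimal sketch of this final check follows. The sample value stands in for the filename produced by the previous step, and the wording of the error message is an assumption.

```perl
use strict;

my $safe_filename_characters = "a-zA-Z0-9_.-";

# Stand-in for the filename left over after the clean-up above.
my $filename = "My_Photo_1.jpg";

# Capturing the match and assigning $1 back is what untaints the value.
if ( $filename =~ /^([$safe_filename_characters]+)$/ ) {
    $filename = $1;
}
else {
    die "Filename contains invalid characters";
}
```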

(Note that the above die function should never be executed, because we’ve already removed our dodgy characters using the earlier substitution. However, it doesn’t hurt to be cautious!)

6. Getting the File Handle

As I mentioned above, we can use the upload method to grab the file handle of the uploaded file (which actually points to a temporary file created by CGI.pm). We do this like so:

my $upload_filehandle = $query->upload("photo");

7. Saving the File

Now that we have a handle to our uploaded file, we can read its contents and save it out to a new file in our file upload area. We’ll use the uploaded file’s filename — now fully sanitised — as the name of our new file:
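
The sketch below shows one way to do this. So that it runs by itself, it uses stand-ins that are not part of the upload script: a temporary directory in place of $upload_dir and an in-memory file handle in place of the one returned by upload.

```perl
use strict;
use File::Temp qw( tempdir );

# Stand-ins so the fragment runs by itself; in the upload script these
# three values come from the earlier steps.
my $upload_dir  = tempdir( CLEANUP => 1 );
my $filename    = "beach.jpg";
my $fake_upload = "fake image bytes";
open( my $upload_filehandle, "<", \$fake_upload ) or die "$!";

# Create the new file in the upload area and switch it to binary mode.
open( my $upload_file, ">", "$upload_dir/$filename" ) or die "$!";
binmode $upload_file;

# Copy the uploaded data into the new file.
while ( my $chunk = <$upload_filehandle> ) {
    print $upload_file $chunk;
}

close $upload_file or die "$!";
```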

Notice the die function at the end of the open line above; if there’s an error opening the file for writing, this function stops the script running and reports the error message (stored in the special variable $!). Meanwhile, the binmode function tells Perl to write the file in binary mode, rather than in text mode. This prevents the uploaded file from being corrupted on non-UNIX servers (such as Windows machines).

8. Thanking the User

We’ve now uploaded our file! The last step is to display a quick thank-you note to the users, and to show them their uploaded photo and email address:
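
A minimal sketch of such a page follows. The sample filename and email address are stand-ins, the markup is only a suggestion, and the /upload URL assumes that the upload directory sits directly under the document root as described in step 3.

```perl
use strict;
use CGI;

my $query = CGI->new;

# Stand-ins for values gathered in the earlier steps.
my $filename      = "beach.jpg";
my $email_address = 'someone@example.com';

# Escape the user-supplied address before echoing it into HTML.
my $safe_email = $query->escapeHTML($email_address);

# Assumes the upload directory is served at /upload on the site.
my $page = <<"END_HTML";
<!DOCTYPE html>
<html>
<head><title>Thanks!</title></head>
<body>
<p>Thanks for uploading your photo!</p>
<p>Your email address: $safe_email</p>
<p>Your photo:</p>
<p><img src="/upload/$filename" alt="Uploaded photo" /></p>
</body>
</html>
END_HTML

print $query->header();
print $page;
```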