Given the lack of political will to make deep cuts to greenhouse gas emissions, and the pitiful excuses politicians make for inaction; given the present nature of the debate, where special interests fund campaigns aimed at stalling any progress by appealing to the ignorance of the public; given the nature of the Foundation, an organisation which raises its funds and conducts most of its activities in the richest and most polluting country in the world: I think there is an argument for voluntary reduction of emissions by the Foundation.

I don’t mean by buying tree-planting or efficiency offsets, of which I am deeply skeptical. I think the best way for Wikimedia to take action on climate change would be by buying renewable energy certificates (RECs). Buying RECs from new wind and solar electricity generators is a robust way to reduce CO2 emissions, with minimal danger of double-counting, forward-selling, outright fraud, etc., problems which plague the offset industry.

If Domas Mituzas is correct, and Wikimedia uses on the order of 100 kW for its servers, then that works out to roughly 876 MWh per year (100 kW × 8760 hours), and buying a matching number of RECs (typically sold at one per MWh) would be a small portion of our hosting budget. If funding is nevertheless a problem, then we could have a restricted donation drive, and thereby get a clear mandate from our reader community.

Our colocation facilities would not need to do anything, such as changing their electricity provider. We would, however, need monitoring of our total electricity usage, so that we would know how many RECs to buy.

I’m not appealing to the PR benefits here, or to the way this action would promote the climate change cause in general. I’m just saying that as an organisation composed of rational, moral people, Wikimedia has as much responsibility to act as does any other organisation or individual.

Ultimately, the US will need to reduce its per-capita emissions by around 90% by 2050 to have any hope of avoiding catastrophe (see e.g. table 9.3 in the Garnaut Review, and chapter 4.3 for more context). Nature doesn’t have exemptions or loopholes; we can’t keep emitting simply by moving economic activity from corporations to charities.

Secure web uploads

Tue, 16 Dec 2008, by Tim

I’ve written hundreds of mailing list posts over the years, in my role first as a volunteer software developer and system administrator for Wikipedia, and later as an employee in the same role. But I’ve never had my own domain name, and I’ve never had a blog.

But I do have things to say, and I’ve often thought about setting up a soap box such as this, with the aim of reaching a wider audience than the mailing lists I usually post to. An important issue has finally come up, and I feel compelled to tell you about it. So I have created this blog.

The issue is a basic feature, which is present in many web applications: file uploads. Due to design choices by the browsers, particularly Internet Explorer, it turns out to be extremely difficult to allow users to upload arbitrary files, without endangering the security of the application.

We spent a lot of time working on secure uploads for MediaWiki, and we thought we had it more or less right. But it turns out that our handling of Internet Explorer wasn’t nearly rigorous enough, and there were still a number of ways to use file uploads to steal the authentication cookies of Internet Explorer users. In MediaWiki 1.13.3, I have, hopefully, closed these gaps. I did this by reverse-engineering three versions of Internet Explorer.

In the rest of this post, I’ll give a tutorial to building a file upload application, working through the security pitfalls from the most naive to the most subtle. I’ll use PHP in my examples, but none of the issues here are PHP-specific.

Upload feature in 10 lines of code: what could possibly go wrong?

Let’s suppose an unlucky newbie developer decided to build their upload feature by following the example in the PHP manual. What could possibly go wrong?

$uploaddir = '/var/www/uploads/';
$uploadfile = $uploaddir . basename($_FILES['userfile']['name']);
if (move_uploaded_file($_FILES['userfile']['tmp_name'], $uploadfile)) {
    echo "File is valid, and was successfully uploaded.\n";
} else {
    echo "Possible file upload attack!\n";
}

It opens up an arbitrary script execution vulnerability. An attacker can just upload a file ending with .php, navigate to it in their browser, and the server will execute it. Many web applications are (or have been) vulnerable to this most basic and severe vulnerability.

It is particularly severe because there is a profit motive to exploit it. Spammers have written scripts to search for these kinds of vulnerabilities. They automatically upload a script which runs perpetually, in a virtual() loop to avoid max_execution_time, which relays spam from another host out to the Internet. They can also use vulnerabilities such as this to set up a spamvertised website on the server.

If your code is going to be distributed, it’s not enough to ask the user to disable PHP execution in the /var/www/uploads directory; nobody reads the manual. Instead, we have to work out the circumstances under which a typically configured web server will execute a file as a script, and make sure those circumstances never arise for uploaded files. In practice, that means checking the file extension.

There is another pitfall, however, which is that some web servers (notably Apache with mod_mime) consider files to have multiple extensions. For example, index.php.fr is considered to be the equivalent of index.php, but in French. So to be secure, we must compile a blacklist of script extensions, and check each part of the filename against it.
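A sketch of that check follows. The blacklist here is illustrative and deliberately incomplete; a real application should maintain a much longer list:

```php
<?php
// Reject a filename if ANY dot-separated part after the base name
// matches a script extension, so "shell.php.fr" is caught just like
// "shell.php". The blacklist below is illustrative, not exhaustive.
function has_script_extension($filename) {
    $blacklist = ['php', 'php3', 'php4', 'php5', 'phtml',
                  'pl', 'py', 'cgi', 'asp', 'jsp', 'sh'];
    $parts = explode('.', strtolower($filename));
    array_shift($parts); // the first part is the base name, not an extension
    foreach ($parts as $part) {
        if (in_array($part, $blacklist, true)) {
            return true;
        }
    }
    return false;
}
```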

Client-side scripts

We know that certain file types will be executed by web servers as scripts, and these should not be allowed to be uploaded. Similarly, certain file types may contain scripts that will be executed by the client web browser. These types must be detected, and then either validated or disallowed. Although client scripts can’t take over the client’s computer and use it for sending spam, they can steal the user’s login credentials for your web site, and use it to do anything the user can do. JavaScript running from the same origin as your application will have full access to the application’s cookies. If the victim is an administrator for a web app such as Drupal or WordPress, the attacker may be able to insert arbitrary PHP code into the site’s skin, and thus take over the server.

To the list of client-side hazards, we will also add file types which can easily be downloaded and executed on the client computer, giving the uploader/attacker full control, without the user being properly warned of the risks of doing so.

Bad client-side extensions:

html, htm, mhtml, mht

svg

exe, scr, msi, com, pif, cmd, cpl

js, jsb, vbs, bat

For a long time, MediaWiki omitted SVG from the list, despite the fact that it has been as dangerous as HTML since Firefox 1.5.

Rather than maintain an extensive blacklist of file types in your application, it’s probably easier to just have a whitelist, say, just allowing the common image formats. But if you let the user configure the whitelist, you need to have a mechanism for warning them when they try to allow one of these dangerous types. And as we will see shortly, controlling the extension alone is not sufficient to provide security.

Content type detection

Back in around 1997, Microsoft decided that web application developers were having it too easy. They could blacklist a few bad file extensions and create a reasonably secure file upload application. So with the release of Internet Explorer 4.0, they launched a crackdown on this secure practice. The result was FindMimeFromData().

Apparently some users were uploading files with the wrong extension on them, or something. So the IE team decided that they weren’t going to trust the content type specified by the server (generally derived from the extension), and instead, they were going to try to detect the file type by looking at the data. They assigned an inexperienced developer to the case, and never bothered to properly test the resulting code.

It was eventually determined that if a file contained certain HTML tags in its first 256 bytes, then under certain obscure circumstances, Internet Explorer would decide that the file was in fact HTML, and go on to execute malicious scripts contained within the file. But Microsoft never documented which HTML tags would cause this, or the circumstances under which the type could be reassigned. They did release a vague and incomplete document called MIME Type Detection in Internet Explorer, but it wasn’t much help.

The general approach by a security-conscious web application is to check the first part of the file for HTML tags, to determine if IE will detect the file as HTML. Such files can then be rejected. The problem is, the algorithm in the web application must precisely match the secret algorithm used by IE. If the web application is less strict than IE in any minor detail during the process, that detail can be exploited by an attacker to create a file which will be accepted by the web application, but detected as HTML by IE.

I didn’t think this was an acceptable situation. So with the help of IDA Pro Freeware, I disassembled the relevant code in IE 5.0, 6.0 and 7.0. I then ported the whole thing from assembly language to PHP, with version differences incorporated into the code as conditional blocks. A web application can use this code to determine, with some confidence, what type Internet Explorer will assign to a file, and thus whether it should be allowed or not.

The algorithm is large and complex, so rather than detail it here in full, I would prefer to encourage reuse, redistribution and porting of my PHP code, which can be found on MediaWiki’s Subversion server:

If you want to blacklist dangerous file types, or you have a user-configurable whitelist, you are probably best off using the full code. For developers who just want to whitelist a few image types, it’s probably overkill, so I will give a synopsis here of the relevant parts.

FindMimeFromData synopsis

The algorithm proceeds as follows:

1. Look at the server’s proposed content type. If it is not in a list of “known” types, then immediately accept that type.

2. If the proposed content type is text/html, image/gif, image/jpeg or, as of IE 7, image/png, then do a special-case check to see if the data matches that declared type. If it does match, the type is returned.

3. Do a heuristic test on the data to see if it looks like CDF, RSS, Atom, other XML, HTML, XBitMap, BinHex or “scriptlet”. If the heuristic test matches, return the corresponding type.

4. Look for magic numbers in the first few bytes of the file, for a sizeable list of candidate file types. If a match is found, return that type.

5. Do a heuristic test to determine whether the data is “text” or “binary”. This test is buggy and will detect non-ASCII text as binary.

6. If the server’s proposed content type is known to be a binary type, and the heuristic suggests that the file is binary, return the proposed type.

7. If the server’s proposed content type is known to be a text type, and the heuristic suggests that the file is text, return the proposed type.

8. If the server’s proposed content type is on a list usually containing only text/html, return the proposed type.

9. Search HKEY_CLASSES_ROOT to see if the file extension has a corresponding MIME type. If it does, return that type.

10. Search HKEY_CLASSES_ROOT to see if the file extension has an application registered to it. If it does, return application/octet-stream.

11. Return text/plain or application/octet-stream according to the result of step 5.

The trick for whitelisters is that you might be able to get off the ride at step 1 or 2, and so avoid the complexity of steps 3 to 11. For step 1, the known types are:

So if your type is not on this list, and you can be sure the webserver will always return it, then you can allow that upload. This is useful if your web app needs to stream arbitrary user data for some other reason: you can send a header “Content-Type: application/x-my-secret-type”, and IE won’t interpret it as HTML.
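A minimal sketch of that approach; the helper name is hypothetical, and the content type is the example given above:

```php
<?php
// Hypothetical helper: build response headers for streaming untrusted
// user data under a type that is NOT on IE's "known" list, so
// FindMimeFromData() accepts it at step 1 and never sniffs the body
// for HTML.
function safe_stream_headers($downloadName) {
    return [
        'Content-Type: application/x-my-secret-type',
        // Also suggest downloading rather than inline rendering.
        'Content-Disposition: attachment; filename="'
            . rawurlencode($downloadName) . '"',
    ];
}
```

Each returned string would then be emitted with header() before sending the file body.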

For step 2, the magic numbers are as follows:

HTML: complex, but ASCII “<html” in the first 255 bytes is sufficient

GIF: first 5 bytes must be GIF87 or GIF89. Note that PHP’s getimagesize() only checks that the first 3 bytes are “GIF”, so if you use getimagesize(), you will be insecure. Don’t rely on any libraries, check the magic numbers yourself.
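A strict GIF check along those lines might look like this (the helper name is hypothetical):

```php
<?php
// Check the GIF magic number the way the post describes: the first
// FIVE bytes must be exactly "GIF87" or "GIF89". Do not rely on
// getimagesize(), which only checks the first three bytes.
function is_strict_gif($data) {
    $magic = substr($data, 0, 5);
    return $magic === 'GIF87' || $magic === 'GIF89';
}
```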

Safari

Safari is known to use a similar content detection process, based on the Internet Explorer one. But they didn’t have access to the Internet Explorer code when they wrote it, nor to my disassembly results, so it differs in many minor details. I can’t tell you what those details are in any rigorous way; I just know a few by word of mouth. Like Internet Explorer, it is closed-source and undocumented. So all I can say is that if you use Safari, don’t advertise the fact to any potentially malicious people. You’re probably insecure in most web apps.

Browser plugins

There are only two browser plugins which are used by large numbers of people across multiple browsers and platforms: Flash and Java. Both of them create severe security problems in a file upload application, and need to be dealt with specially.

Flash

A reasonable description of the cross-domain-policy problem in web applications was written by Stefan Esser, titled Poking new holes with Flash Crossdomain Policy Files. In a nutshell: you need to scan uploaded files in their entirety for the text “<cross-domain-policy>”, with optional whitespace between the tag name and the angle brackets, and reject any matching files. If you don’t, your server will be exposed to CSRF vulnerabilities. External flash applets see this text as a license to breach the same-origin policy, allowing them to utilise the victim’s cookies for arbitrary purposes.
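A sketch of such a scan; matching case-insensitively here is an extra-cautious assumption on my part, not something the post specifies:

```php
<?php
// Scan an uploaded file's full contents for a Flash cross-domain
// policy declaration. Whitespace is permitted between the angle
// brackets and the tag name, so a plain strpos() is not enough.
function contains_crossdomain_policy($data) {
    return preg_match('/<\s*cross-domain-policy\s*>/i', $data) === 1;
}
```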

Java

The so-called GIFAR vulnerability is particularly irritating, and MediaWiki only deals with it in a heavy-handed manner, by blacklisting uploads of zip and zip-like file formats (such as OpenOffice formats).

The problem is that an external web page can embed a Java applet using a JAR file hosted on your site. Such an applet can perform requests with the cookies of the site that hosts it, thus opening up a client-side script injection vulnerability.

The Java plugin doesn’t check whether the file has a .jar extension. It doesn’t even check whether the file has the proper magic number at the start; it just executes the file regardless. It does, however, check that the zip central directory at the end of the file has the appropriate magic number, so you can use that to blacklist potential JAR files, and all other zip files along with them. To do this, search the last 65558 bytes of the file for the hexadecimal bytes 50 4B 05 06, and reject any file that matches.
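A sketch of that test; the helper name is hypothetical:

```php
<?php
// Reject potential JAR/zip uploads by searching the last 65558 bytes
// (the maximum size of a zip end-of-central-directory record plus its
// comment) for the EOCD signature 50 4B 05 06, i.e. "PK\x05\x06".
function looks_like_zip($data) {
    $tail = substr($data, -65558); // returns the whole string if shorter
    return strpos($tail, "PK\x05\x06") !== false;
}
```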

An alternative would be to parse the entire zip directory and to reject any archives that contain a file with a .class extension. I can’t vouch for this method. If you did this, the zip library you used would have to be exactly as tolerant of zip format errors as the one used by Java. It would probably be best to actually shell out to Java to do the test.

Conclusion

The web sucks. Writing secure web applications is bizarrely complicated and even the most diligent developers get it wrong. Me included. Use this post at your own risk.

Here’s a better idea: restrict web uploads to people who provide a credit card number, and pre-authorise a $50 fine for malicious uploads.