Something to keep in mind: files don't really have types. That is, the information you know as the "file type" is not actually part of the file. The file type that you see displayed in your file manager, or returned by the appropriate PHP function, and so on, is simply a heuristic guess at the intended use of the file, based on the file's name and possibly its content. What really matters is what you do with the file, i.e. what program(s) you use to read or process it.
– David Z, May 13 '14 at 16:11

There is a "good" answer to this already somewhere on the Stack Exchange network (was it this site?). But basically there is no foolproof way. You need to secure your system, rename the file, stream it, etc. so that when "opened" it won't compromise your system.
– w3d, May 13 '14 at 20:41

5 Answers

You want PHP's Fileinfo functions, which are the PHP moral equivalent of the Unix 'file' command.

Be aware that typing a file is a murky area at best. Aim for whitelist ("this small set of types is okay") instead of blacklist ("no exes, no dlls, no ..."). Do not depend on file typing as your sole defense against malicious files.
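As a rough illustration of the whitelist idea, here is a minimal sketch using the Fileinfo extension. The `ALLOWED_TYPES` map and the `allowedExtensionFor()` helper are names made up for this example, and the set of types is only an assumption about what a site might accept.

```php
<?php
// Hypothetical whitelist: detected MIME type => extension to store the file under.
const ALLOWED_TYPES = [
    'image/jpeg' => 'jpg',
    'image/png'  => 'png',
    'image/gif'  => 'gif',
];

/**
 * Returns the extension to use for an allowed file, or null when the
 * detected type is not on the whitelist. Detection looks at the file's
 * content, not at the name the client supplied.
 */
function allowedExtensionFor(string $path): ?string
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime  = $finfo->file($path);   // false on failure

    if ($mime === false) {
        return null;
    }
    // Anything not explicitly listed is rejected.
    return ALLOWED_TYPES[$mime] ?? null;
}
```

With an upload you would pass `$_FILES['file']['tmp_name']` and store the file under a server-generated name plus the returned extension. This is still only a guess at the type, so it should not be the sole defense.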

True, although I would add that not all files have magic numbers. In particular, Linux-like systems tend to recognize a text file by its lack of a known magic number. And depending on which program is doing the interpretation, a known file extension could override the type determination from the magic number (which is occasionally desirable). I would say only that Linux file managers base their determination of the file type on the file's content and its name, whereas Windows Explorer bases it solely on the extension.
– David Z, May 13 '14 at 16:06

There is no such thing as a file type. In the computer world everything is a bunch of 0s and 1s, and whether it is an image or a lot of random characters depends on how you interpret those zeros and ones. A file type (an extension like .docx or .png) is just a convenience that lets the user make an educated guess at what the file might be and open it with a proper tool. As with any guess, it can be wrong.

So instead of playing around with techniques like the suggested fileinfo, if I were you I would rather figure out what I allow people to upload.

So if you allow people to upload images, use getimagesize and maybe even check that the width and height are in an appropriate range (who knows, maybe someone will upload an image 500,000 pixels wide or high and your server will die while resizing it; it's a valid image, but still not what you want). It may make sense to resize every image, serve only the resized versions, and store the originals somewhere untouchable.
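A minimal sketch of that dimension check follows. The limits and the `checkedImageSize()` helper are hypothetical and should be adapted to whatever your resizing step can handle. Note that getimagesize() only reads the image header, so this is a sanity check on the declared dimensions, not proof that the rest of the file is a well-formed image.

```php
<?php
// Hypothetical limits; pick values that fit your own resizing pipeline.
const MAX_WIDTH  = 4000;
const MAX_HEIGHT = 4000;

/**
 * Returns [width, height] if $path has an image header that getimagesize()
 * recognises and the declared dimensions are within the limits.
 * Returns null otherwise.
 */
function checkedImageSize(string $path): ?array
{
    $info = @getimagesize($path);   // false when no known image header is found
    if ($info === false) {
        return null;
    }

    [$width, $height] = $info;      // indexes 0 and 1 hold width and height
    if ($width < 1 || $height < 1 || $width > MAX_WIDTH || $height > MAX_HEIGHT) {
        return null;
    }
    return [$width, $height];
}
```

Doing this before handing the file to GD or Imagick means an oversized image is rejected before any pixel buffer is allocated for it.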

If you decide that users can upload .mp3 files, take a look at something that can deal with this sort of file. Who knows, maybe there are already tested methods to check whether a file really is an MP3.

Regardless of what you decide, use something to mitigate possible problems (assuming that the person uploads a file as $file = $_FILES['file']):

You can parse the file with a type-specific parser that throws an exception when the file is not what it expects. Anything else can be falsified, I think.
For example, you can use GD or Imagick for image files, a JSON parser for JSON files, and a DOM or XML parser (with external entities turned off) for HTML and XML files. With Imagick you can use the identify tool as well. I think there are similar tools for other file types.
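To make the "let a real parser decide" idea concrete, here is a small sketch for the JSON case. The `parseUploadedJson()` helper and its size and depth limits are assumptions chosen for illustration; it relies on the `JSON_THROW_ON_ERROR` flag, which needs PHP 7.3 or later.

```php
<?php
/**
 * Parses an uploaded file as JSON and returns the decoded value.
 * Throws JsonException when the content is not valid JSON and
 * RuntimeException when the file is unreadable or too large.
 */
function parseUploadedJson(string $path, int $maxBytes = 1048576)
{
    $size = filesize($path);
    if ($size === false || $size > $maxBytes) {
        throw new RuntimeException('File is missing or too large');
    }

    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException('File could not be read');
    }

    // Decode to arrays, cap the nesting depth at 64, and throw on any syntax error.
    return json_decode($raw, true, 64, JSON_THROW_ON_ERROR);
}
```

A caller wraps this in try/catch and rejects the upload on any exception; the same pattern applies to other formats with their own parsers.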

file inclusion (Never include() an uploaded file when serving it to clients; use file reading functions like file_get_contents(), or use the X-Sendfile header, making sure there is no HTTP header injection vulnerability, if you want to have access control on the file. If not, then let the HTTP server do its job.),

eval injection (Never use EXIF data in an eval context, for example with preg_replace() and its /e modifier.),
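The file inclusion point above can be sketched roughly like this. `sendStoredUpload()`, `$storageDir` and `$storedName` are hypothetical names; the sketch assumes uploads live in a directory outside the web root under server-generated names, and that any access control check has already happened.

```php
<?php
/**
 * Streams a stored upload to the client as an opaque download.
 * Returns false when the file does not exist or cannot be read.
 */
function sendStoredUpload(string $storageDir, string $storedName): bool
{
    // basename() drops any directory components, so "../" cannot escape $storageDir.
    $path = rtrim($storageDir, '/') . '/' . basename($storedName);
    if (!is_file($path)) {
        return false;
    }

    if (!headers_sent()) {
        // Only fixed strings go into headers, so user input cannot inject new ones.
        header('Content-Type: application/octet-stream');
        header('Content-Disposition: attachment; filename="download"');
        header('X-Content-Type-Options: nosniff');
        header('Content-Length: ' . filesize($path));
    }

    // readfile() copies the bytes to the output. Unlike include, it never
    // executes PHP code that happens to be inside the file.
    return readfile($path) !== false;
}
```

For large files, X-Sendfile (where the web server supports it) avoids pushing the bytes through PHP at all, but the rule is the same: read or hand off the file, never include it.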

Salvador Dali has some very good suggestions with regard to images, but he is missing one thing: an image can show as perfectly valid and still contain malicious code. For example, the code can be placed after the JPEG end-of-image marker (0xFF, 0xD9). One potential way around this is to resample the file using something like GD. It used to be quite common for avatar and signature uploads in forums to be abused this way: someone would upload an image that displays normally but also contains code that could infect users' PCs with malware.
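A minimal sketch of the resampling idea, assuming the GD extension is installed. `reencodeAsJpeg()` is a hypothetical helper: it decodes the upload into pixels and writes a fresh JPEG, so bytes appended after the end-of-image marker are not carried over. Treat it as one layer of defense rather than a guarantee, since a payload could in principle be crafted to survive re-encoding.

```php
<?php
/**
 * Decodes $source with GD and writes a newly encoded JPEG to $destination.
 * Returns false when GD cannot decode the source as an image.
 */
function reencodeAsJpeg(string $source, string $destination, int $quality = 85): bool
{
    $raw = file_get_contents($source);
    if ($raw === false) {
        return false;
    }

    // imagecreatefromstring() detects the format itself and keeps only pixel data.
    $image = @imagecreatefromstring($raw);
    if ($image === false) {
        return false;
    }

    // Everything in the output file is produced by the encoder, not copied from the input.
    return imagejpeg($image, $destination, $quality);
}
```

Serve only the re-encoded copy; if you keep the original at all, store it where it can never be requested or executed directly.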