Take advantage of md5 checksums for download validity

I'm fairly confident that you have, at one time or another, run across an md5 checksum file while browsing the internet. Whether it accompanied a download or an application upgrade, those md5 files are there for a reason. But just what is the reason?

When someone puts a file up on a server for download, how does the host or the end-user know, for sure, that the file they are about to download (or are serving up) is the valid file? What if someone hacked into the server and replaced the file with a bogus file that contained malicious code? It's happened before and it will happen again. Fortunately there is a way to avoid downloading invalid files: checking the md5 hash. The only problem is that this method only works if the host and the user know how to use md5 tools. In this tutorial you will learn how to create an md5 checksum for a file and how to run a check on a file you have downloaded.

What is md5 and checksum?

Before we continue with the actual steps, you might benefit from knowing how the process of checksumming works. MD5 stands for Message-Digest Algorithm 5, a cryptographic hash function that produces a 128-bit value (usually written as 32 hexadecimal characters) and serves as a "fingerprint" for a digital file. A checksum is a fixed-size datum that is computed from a block of data. When it is crucial for a piece of data (such as a download) to be valid, the checksum is recomputed from the copy in hand and compared to the checksum the host published for the original. When the md5 checksums match, the user/host can be confident the copy is identical to the original. When they do not match, a red flag should immediately go up and the downloaded copy should be discarded. If a file changes by so much as a byte, the checksum will no longer match.
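That last point is easy to see for yourself. The following is a minimal sketch, assuming a Linux system with the md5sum tool from GNU coreutils; the file name sample.txt and its contents are made up purely for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample file, used only for this illustration.
printf 'hello world\n' > sample.txt
before=$(md5sum sample.txt | cut -d ' ' -f 1)

# Change a single byte (lowercase w becomes uppercase W).
printf 'hello World\n' > sample.txt
after=$(md5sum sample.txt | cut -d ' ' -f 1)

echo "before: $before"
echo "after:  $after"

# Both values are 32 hex characters, but they no longer match.
[ "$before" != "$after" ] && echo "checksums differ"
```

The cut command keeps only the hash and drops the file name, which makes the two values easy to compare.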

For most users these tasks are handled from the command line. There are GUI tools available (such as GtkHash) that can tackle the same tasks, but for the purposes of this tutorial we will stick with the command line tool.

Creating an md5 sum

For those who plan on hosting files for download, you will want to know how to create an md5 sum. This is very simple. Open up a terminal and change to the directory holding the file you want to work with. Say, for example, you want to create an md5 sum for the file /var/www/files/download.tgz. To do this you would change to the /var/www/files directory and issue the following command (on Linux the tool is called md5sum; the md5 command found on BSD and macOS prints its result in a different format):

md5sum download.tgz

The above command will output something like:

632668fb5bb3fe578033a42b4ba718f2  download.tgz

If you want to make an md5 checksum file available, run the same command and redirect the output to a file like so:

md5sum download.tgz > download.md5

Now you can upload the download.md5 file alongside the download.tgz file so the users can run a checksum.
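To see what actually ends up in the checksum file, here is a small sketch of the steps above. It assumes GNU md5sum and uses a placeholder file in a temporary directory in place of the real archive under /var/www/files.

```shell
# Work in a throwaway directory; a real host would cd to /var/www/files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for the real archive; the contents are placeholder data.
printf 'placeholder archive contents\n' > download.tgz

# Record the checksum line in download.md5.
md5sum download.tgz > download.md5

# Show what was recorded: the hash, two spaces, then the file name.
cat download.md5
```

Because the command is run from inside the directory with a bare file name, the checksum file records download.tgz without any path. That keeps the file usable for people who save the download somewhere else on their own machines.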

Running a checksum

Now that you have both files, you want to run your checksum to make sure the .tgz file is the legitimate file. Start by displaying the checksum the host published. Note that running md5sum on download.md5 itself would only hash the checksum file, so simply print its contents instead:

cat download.md5

The output of the above command should look familiar (if you created the md5 sum):

632668fb5bb3fe578033a42b4ba718f2  download.tgz

Now run the md5sum command on the .tgz file like this:

md5sum download.tgz

The output should reveal the exact same string as shown above:

632668fb5bb3fe578033a42b4ba718f2  download.tgz

If that string of characters isn't the same, the checksum didn't pass and you might be dealing with a corrupted or altered file. In that case you will want to contact the host of the file or the developer. But if the strings match, you know the checksum passed and the file you have is the same one the host published.
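Comparing two 32-character strings by eye is error-prone, so it can help to let the tool do the comparison. The sketch below assumes GNU md5sum, whose -c option reads a checksum file and re-hashes each file named in it. The files here are placeholders created in a temporary directory, and the variable names are only for illustration.

```shell
# Work in a throwaway directory with placeholder files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'placeholder archive contents\n' > download.tgz
md5sum download.tgz > download.md5

# -c reads each "hash  filename" line and re-hashes the named file.
# The exit status is 0 only when every listed file matches.
if md5sum -c download.md5; then
    result_good="match"
else
    result_good="mismatch"
fi

# Simulate a corrupted or altered download by appending one byte.
printf 'x' >> download.tgz
if md5sum -c download.md5 2>/dev/null; then
    result_bad="match"
else
    result_bad="mismatch"
fi

echo "untouched file: $result_good"
echo "altered file:   $result_bad"
```

Relying on the exit status rather than the printed text makes the check easy to use in a script, for example to refuse to unpack an archive that did not pass.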

Final thoughts

MD5 sums have been in use for quite some time. Whenever given the chance you should always take advantage of that system. Who knows, it might save you from installing a piece of malicious software some day.

About Jack Wallen

Jack has been a technical writer, covering Linux and open source, for nearly ten years. He began as an editor in chief of Linux content with Techrepublic and is now a freelance writer for numerous sites. Jack is also a writer of novels and is currently working on his first zombie fiction!

Honestly, the reason I use checksums these days is to verify that large files have been downloaded completely. I've had the occasional Linux ISO that seemed to download correctly but, after some troubleshooting, I'd find that the checksum didn't match the one posted. It's not a concern with torrents, but torrents can be slower than a superfast web server.

There is the issue of malicious code, but honestly I trust my [open] sources, and while it's possible malicious code could be slipped in, it's such a rare occurrence that I'm really not concerned. Not the most secure security model, I know, but I've yet to have it fail me in the fifteen or so years I've been downloading from remote sources.

About Ghacks

Ghacks is a technology news blog that was founded in 2005 by Martin Brinkmann. It has since then become one of the most popular tech news sites on the Internet with five authors and regular contributions from freelance writers.