
Recently, much to my surprise, I found out that some critical data I had burned on DVD could not be read back from the medium. A more thorough examination showed that many of the files on the DVD had different filesizes than the ones on the hard disk. This is when I started to seriously think about verifying my critical backups.
I don’t really know what had caused that faulty copy. That incident made me add the step of data verification to the backup process, at least for my server backups. I am aware that some graphical burning programs can verify a copy, but I needed a CLI method and this is what will be described in this small article.

It seems that there are two ways to verify the burned data:

Make a list of the files on the hard disk and get their md5 sums, then do the same for the burned files and finally compare the md5sums for each file.

Move the files to a central directory, make an ISO image, burn the image and finally compare the on-disk image’s md5 sum with the md5 sum of the written data on the medium.

I tend to stick to the second method, because it is less painful, even though it requires moving the files into a single directory in order to create the ISO image. But be advised: this method can give false negatives if it is not done in the right way.

So, assuming that all backups have been moved to a directory, create the ISO image with mkisofs:

$ mkisofs -J -l -R -V "Sep2006" -o sep2006.iso /path/to/backups/

The sep2006.iso image gets created and mkisofs (or genisoimage) prints a progress summary. The exact output varies between versions, but it ends with a line reporting the total number of extents written, for example:

169383 extents written (330 MB)

What is critical to take note of is that last line: the number of extents (blocks) that have been written to the image.

Use whatever program to write the image to a CD/DVD. For example, to write it to a DVD using growisofs, the command would be:

$ growisofs -Z /dev/hdc=sep2006.iso

Although growisofs accepts mkisofs (or genisoimage) options, making it easy to write the files directly to the DVD with the desired extensions, the image-creation stage is still necessary so that the md5 sum of the on-disk data can be calculated easily. I bet there could be a way to pass the directory contents through md5sum with a long BASH one-liner, but I haven't tried it.
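For what it's worth, such a one-liner is not hard to sketch. The following (a hedged sketch, using a made-up demo directory) checksums every file in a tree individually, in a stable order; note that this is not equivalent to checksumming the image, since it ignores filesystem metadata, but it would still catch corrupted file contents:

```shell
# Per-file checksums of a directory tree (the demo paths are hypothetical).
# sort -z fixes the order, so runs against the disk and the mounted disc
# can be compared line by line.
mkdir -p /tmp/bk-demo
printf 'hello\n' > /tmp/bk-demo/a.txt
find /tmp/bk-demo -type f -print0 | sort -z | xargs -0 md5sum
rm -rf /tmp/bk-demo
```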

Also, note that growisofs, having finished writing, outputs the number of extents it has written to the DVD. For example, in my test it was:

builtin_dd: 169392*2KB out @ average 4.4x1385KBps

This number is not the number of blocks of the ISO image data. growisofs also writes some other data to the medium, e.g. when closing the session, so do not take it into account. You can safely use either the number from the mkisofs (or genisoimage) output, or calculate the number of extents (blocks of size 2048 bytes) yourself with ls and awk:

$ echo $(( $(ls -l sep2006.iso | awk '{ print $5 }') / 2048 ))
169383

The above divides the image’s filesize by 2048 and prints the result.
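The same calculation can also be done a little more directly with stat(1) instead of ls and awk. A small self-contained sketch (demo.iso is a made-up stand-in for the real image):

```shell
# Create a dummy "image" of a known size, then derive its extent count by
# dividing the byte size reported by stat by the 2048-byte extent size.
dd if=/dev/zero of=/tmp/demo.iso bs=2048 count=16 2>/dev/null
echo $(( $(stat -c %s /tmp/demo.iso) / 2048 ))   # prints 16
rm -f /tmp/demo.iso
```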

So, getting back to the md5 sum calculations, you can get the on-disk image’s md5 sum with the following:

$ cat sep2006.iso | md5sum
cc363de222ba6fe7455258e72b6c26ca -

The final step is to calculate the md5 sum of the burned data. The dd command can be used to read the DVD, but the crucial part is that dd must read as much data as the size of the ISO image. Otherwise, it is almost certain that you’ll get a false negative about the quality of the copy.

Data is written on a CD/DVD in blocks of 2048 bytes. The number of written blocks is the number of extents mkisofs (or genisoimage) printed to stdout when the ISO image was created. The following command instructs dd to read 169383 blocks, 2048 bytes each, and pipe the data to md5sum:

$ dd if=/dev/hdc bs=2048 count=169383 | md5sum
cc363de222ba6fe7455258e72b6c26ca -
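The effect of the block-limited read can be demonstrated with ordinary files (a sketch; demo.iso and demo-medium are made-up names standing in for the image and the burned medium):

```shell
# The "medium" holds the image plus trailing padding, as a burner would leave.
dd if=/dev/urandom of=/tmp/demo.iso bs=2048 count=4 2>/dev/null
cat /tmp/demo.iso > /tmp/demo-medium
dd if=/dev/zero bs=2048 count=2 2>/dev/null >> /tmp/demo-medium
# Read back exactly as many 2048-byte blocks as the image contains.
blocks=$(( $(stat -c %s /tmp/demo.iso) / 2048 ))
dd if=/tmp/demo-medium bs=2048 count="$blocks" 2>/dev/null | md5sum
md5sum /tmp/demo.iso   # same hash; hashing all of demo-medium would differ
rm -f /tmp/demo.iso /tmp/demo-medium
```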

The two md5 sums are identical, which means that the DVD copy is good.

A mistake I’d been making, before starting to take into account the number of blocks written on the DVD, was that I calculated the DVD’s md5 sum with the following:

$ dd if=/dev/hdc | md5sum

This is a totally wrong approach, because this method feeds md5sum not only with the ISO image data, but also with the extra data that is written to the medium, e.g. when closing the session. I do not know exactly what that extra data is; the fact remains that this last method is wrong.

The procedure described above may seem a bit complicated, but it's not. This small article was written at a fast pace, but I hope the procedure is clear.

About George Notaras

George Notaras is the editor of the G-Loaded Journal, a technical blog about Free and Open-Source Software. George, among other things, is an enthusiastic, self-taught GNU/Linux system administrator. He has created this web site to share the IT knowledge and experience he has gained over the years with other people. George primarily uses CentOS and Fedora. He has also developed some open-source software projects in his spare time.

19 responses on “Verify a burned CD/DVD image on Linux”

diff will tell you whether all the files' contents match; double-check with $? (it will be 0 if everything's OK) and triple-check with dmesg (see if you have any bad sector errors).
This can work for ISO images too if you mount them via the loop device.
Finally, if you check video DVDs like this, mount them as a UDF filesystem so the files on them are lowercase.
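For reference, the commenter's check might look like the following sketch, demonstrated here on two local directories (on a real disc, the second path would be the mount point, e.g. /mnt/cdrom; all demo paths are hypothetical):

```shell
# Compare a source tree against a copy, recursively; the exit status ($?)
# is 0 when every file's contents match.
mkdir -p /tmp/src-demo /tmp/disc-demo
printf 'data\n' > /tmp/src-demo/f.txt
cp /tmp/src-demo/f.txt /tmp/disc-demo/f.txt
diff -r /tmp/src-demo /tmp/disc-demo && echo "copies match"
rm -rf /tmp/src-demo /tmp/disc-demo
```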

Beside all this, there’s a neat tool called cdck that reports some interesting statistics about your media.

This is very interesting. I never thought of using diff for this, but I’ll also try the method you mentioned next time.
I had tried cdck in the past. It’s a good tool for checking optical media, but, IIRC, it needed a significant amount of time to check a single disc. I ended up with md5sum because it can be computed fast and generally, if the two sums don’t match, gives an idea that something might have gone wrong.

This is a great article! I must confess that after reading it I have settled on the simpler diff procedure, outlined in the comment by linportal.

Why? I am a really old man and, much as I like computers in general and Linux in particular, my time is now limited “by external factors”, so I have not learned some of the finer points of commands invokable only in the CLI. Also, some of the lines of code in the article seem unnecessarily complex (I may well be totally wrong in this…). For instance, the author uses the following command:

$ cat sep2006.iso | md5sum

Would it not be the same to do it in one step without piping, viz.
$ md5sum sep2006.iso

This is not criticism, just would like to know… Great article, loved it!
OldAl.

Hi Al,
You’re right about the md5sum usage. I have got used to using the “cat” command so often, because I usually pipe the data to a little “pipe-monitoring” program, called pv, in order to have some kind of progress indicator about the whole operation, since md5sum does not have one. So, the actual command was (for example):

$ cat knoppix.iso | pv | md5sum

But, when I was writing the post, I just stripped the “pv” part off the command line so that I wouldn’t have to provide any explanation about it, since it’s not very popular. But I liked your feedback, because it gave me the chance to write about the pipe-viewer, which I find very useful on many occasions.
Also – just to criticize myself a bit :) – the iso_size/2048 calculation could have taken place inside the awk statement. I’ll correct these things when I have some free time, because, in the way I have written them, they add unnecessary complexity to the whole operation.

I was searching for a simple way to check the written CD image when I said to myself – this should be really easy because Linux people like things to be straight forward and as simple as they can. Tried:

I thought I had made some small stupid mistake or missed an argument. I kept searching, but everything I found seemed so complicated compared to the everyday task I wanted to accomplish. Just before I started this adventure I was removing the automount option in fstab. I remembered that in fstab my CD was /dev/hda. Then I tried:

Hello GNot,
As far as I understand your approach, there is one major gap in your verification process: you verify the burned DVD against the on-disk image, but have no means to assure that this image is correct compared to the original data in your directory tree.
So, in my opinion, linportal’s approach is safer, because it compares the data on the DVD directly with the source data as a whole, and not only by some hash value. I found no information about the exact method diff uses to compare the contents of files, but I assume it to be more exact than comparing hashes (there is a -distinct- possibility of two files having the same hash).

Jetero,
as it was mentioned, md5sum /dev/hdX (where hdX is the cdrom device node) is the wrong approach. CD/DVD burning utilities write slightly more data to the medium than the image data, e.g. when closing sessions.

Hello Hellfire,
This is correct. The on-hard-disk ISO image is not checked, but it is assumed that the hard disk itself and mkisofs are functioning correctly, so the image contains the exact data as it is in the directory tree. This is indeed a gap in the verification process, but I suppose that it must be extremely rare to create a bad ISO image with mkisofs.

Regarding “cat FILE | md5sum” vs. “md5sum FILE”:
I also use the first one, as it happens that some md5sum versions have a problem with the 2 GB file limit (basically not passing all the needed options to the open() call). Given that DVD ISO images are mostly larger than 2 GB, one can bypass this error by letting cat do the opening and reading and stream the result into md5sum.

A handy tool for md5sums is md5deep ( http://md5deep.sourceforge.net/ )
I usually generate quickly a list of md5sums with it and I include it in the medium. Not exactly what you are trying to achieve, but a handy utility nevertheless.

Trailing zeros and NULs at the end can change the MD5 hash. So to calculate the md5sum we need to:
1) find the size of the ISO in bytes
2) run dd with this exact size in bytes: dd if=/dev/dvd | head --bytes= | md5sum
So for example:
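A sketch of that recipe, demonstrated on a local file instead of /dev/dvd (demo.img is a made-up name): the trailing padding changes the hash, but limiting the read to the exact image size recovers the original checksum.

```shell
printf 'payload' > /tmp/demo.img
size=$(stat -c %s /tmp/demo.img)
# Simulate a padded read of the medium, then truncate it to the exact size.
cat /tmp/demo.img /dev/zero | head --bytes="$size" | md5sum
md5sum /tmp/demo.img   # identical hash
rm -f /tmp/demo.img
```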
