Giant file - rsync + dd + md5sum = no cry

Create a bash/cmd script at each end to break the file into pieces with dd.

md5sum each piece at both ends and compare to figure out which chunks are bad

transfer the bad chunks from source to target

dd the chunks back into the giant file

recheck the md5sum of the file to make sure it matches

Create a bash/cmd script at each end to break the file into pieces

Tip: rename the file to something which doesn’t require escape sequences, especially if your source/target are running different OSes. For example, spaces mean the name has to be enclosed in quotes on Windows and have a backslash prepended on Linux. So get those spaces out of there.

dd thinks in terms of blocks.

I set the block size to 1 megabyte to make the math easier. I want each chunk to be 128MB. The size of the chunk is up to you, but the trade-off is waiting for excess data to transfer versus dealing with more part files. Anyhow, we have bs=1048576 count=128.

To tell dd where to start when it’s copying data out of a file, supply the skip option. So the first chunk has skip=0, the second chunk has skip=128, the third has skip=256, and so on. Why? Because dd thinks in terms of blocks: skip counts blocks of bs bytes, not bytes, so each chunk starts 128 blocks after the previous one.

I usually create an Excel workbook and use fill-down to create the correct skip numbers and then CONCATENATE() to create the actual dd command lines. Copy and paste them into a text document. Send it to both ends with the correct extensions/permissions/shebang line/etc.

Here’s how I set up my Excel sheet to create my batch/shell script.

The formulas allow me to fill down to create the correct lines in my batch/shell script.

Run the batch/shell script at each end to create corresponding partXXXX files. If you follow my example, the value in the K column shows you where to stop copying; it changes to false at the line where you’ve passed the final dd required.
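If you would rather skip the spreadsheet on the Linux end, a loop can do the same skip arithmetic. This is only a minimal sketch, not the Excel method described above: the function name split_into_parts and the part%04d naming are made up for illustration, and it assumes a POSIX-style shell with dd, wc, and printf available.

```shell
# Sketch: split a file into fixed-size chunks with dd, one part file per chunk.
# split_into_parts and the part%04d naming are illustrative assumptions.
split_into_parts() {
    local file="$1"            # file to split
    local chunk_blocks="$2"    # blocks per chunk, e.g. 128
    local bs="${3:-1048576}"   # block size in bytes (default 1 MiB)
    local size chunk_bytes chunks i

    size=$(wc -c < "$file")
    chunk_bytes=$((bs * chunk_blocks))
    # Round up so a trailing partial chunk still gets its own part file.
    chunks=$(( (size + chunk_bytes - 1) / chunk_bytes ))

    i=0
    while [ "$i" -lt "$chunks" ]; do
        # skip is counted in blocks of bs bytes, so chunk i starts at i * chunk_blocks.
        dd if="$file" of="$(printf 'part%04d' "$i")" \
           bs="$bs" count="$chunk_blocks" skip=$((i * chunk_blocks)) 2>/dev/null
        i=$((i + 1))
    done
}
```

Calling split_into_parts Bigfile.iso 128 in the directory where you want the pieces should write part0000, part0001, and so on, using the same bs and count values as above. The last part is simply shorter when the file size is not a multiple of the chunk size.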

md5sum the pieces at each end and compare

Pretty easy; use md5sum on all of the partXXXX files at each end. Save the output into an md5 file and then get both files in the same place so you can compare.

md5sum all of the pieces

md5sum part* > part.md5

The command-line diff tool will work, but if you have a GUI tool it should make it easier to see which files don’t match. Let’s hope there aren’t many.
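If there are a lot of parts, it can help to print just the names that differ. The following is a rough sketch under a few assumptions: the two listings are called source.md5 and target.md5 (hypothetical names), the part names contain no spaces, and both ends print checksums in the same letter case. The function name list_bad_parts is also made up.

```shell
# Sketch: print the names of parts whose checksum differs between two md5sum listings.
# list_bad_parts, source.md5 and target.md5 are illustrative names.
# Limitation: a part missing from the target listing is not reported.
list_bad_parts() {
    local source_md5="$1"
    local target_md5="$2"

    awk '
        { name = $2; sub(/^\*/, "", name) }   # md5sum marks binary mode with a leading "*"
        NR == FNR { sum[name] = $1; next }    # first file: remember the source checksum
        sum[name] != $1 { print name }        # second file: report differing checksums
    ' "$source_md5" "$target_md5"
}
```

Running list_bad_parts source.md5 target.md5 prints one part name per line, which is the list of chunks to fetch again from the source.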

Transfer the bad chunks from source to target

This part should be easy; just send the good chunks from the source to the target to replace the bad chunks. To make sure you haven’t wasted your time, md5sum the replacement chunks once they reach the destination. Retransfer any that don’t match.
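One way to check only the replaced chunks is to pull their lines out of the source’s md5 file and hand them to md5sum -c. This is a sketch that assumes GNU md5sum (for the -c check mode reading from standard input), a source listing copied to the target, and simple part names with no regex characters in them. The name verify_parts is hypothetical.

```shell
# Sketch: re-check specific part files against the checksums recorded at the source.
# verify_parts is an illustrative name; run it in the directory holding the parts.
verify_parts() {
    local source_md5="$1"
    local part
    shift

    for part in "$@"; do
        # Select this part's line (name preceded by " " or "*") and let md5sum
        # compare it with the local copy. Stop at the first mismatch.
        grep -- " [ *]$part\$" "$source_md5" | md5sum -c - || return 1
    done
}
```

For example, verify_parts source.md5 part0003 part0017 returns a non-zero status as soon as one of the named parts fails its check, which means that chunk needs to be transferred again.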

dd the chunks back into the giant file

We will use dd again. Instead of redoing the whole process in reverse, we only need to dd in the fixed chunks.

Either redo your Excel sheet or just find and replace in your target batch/shell script.

dd to reassemble file

dd of=Bigfile.iso if=partXXXX bs=1048576 count=128 conv=notrunc seek=YYYY

The key things here are that the if and of have been swapped, we must add conv=notrunc, and we use seek instead of skip. We swap the input and output files because we’re outputting to the big file. We use conv=notrunc because by default dd truncates the destination file at the point where it starts writing, so everything after the chunk you write would be lost. We don’t want to destroy the file, so this is important. Finally, when we need to write to the destination file anywhere other than the start, we have to use seek instead of skip: skip positions dd in the input file, while seek positions it in the output file.
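To see the seek arithmetic in one place, here is a small sketch of the write-back step wrapped in a function. The name patch_chunk and its argument order are assumptions for illustration. Because seek is counted in blocks of bs bytes just like skip, the seek value for a chunk is the same number that was used as skip when the chunk was cut.

```shell
# Sketch: write one repaired part back into the big file at its original offset.
# patch_chunk and its argument order are illustrative assumptions.
patch_chunk() {
    local big="$1"             # file being repaired, e.g. Bigfile.iso
    local part="$2"            # good copy of the chunk, e.g. part0003
    local index="$3"           # zero-based chunk number
    local chunk_blocks="$4"    # blocks per chunk, e.g. 128
    local bs="${5:-1048576}"   # block size in bytes (default 1 MiB)

    # seek counts output blocks of bs bytes; conv=notrunc keeps the data that
    # follows the patched chunk instead of truncating the file.
    dd if="$part" of="$big" bs="$bs" count="$chunk_blocks" \
       seek=$((index * chunk_blocks)) conv=notrunc 2>/dev/null
}
```

With the values used in this post, patch_chunk Bigfile.iso part0003 3 128 would write part0003 at block 384 of Bigfile.iso and leave the rest of the file alone.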

You only need the lines corresponding to the fixed chunks. So your final batch/shell script might end up looking like this: