Transfer Logical Volume from one physical machine to another using ssh

I have used dd to move lvm partitions around quite often. In the past, the procedure has been

stop all processes using lvm

dd lv to image file, possibly compressing in the process

copy image file to new server using ssh or rsync

on new server, use dd to populate lv
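As a sketch, the old procedure looked something like the following (the LV path /dev/vg0/temp, the host name backupServer, and the /tmp paths are all placeholders for your own setup):

```shell
# 1. On the source, image the LV to a file, compressing on the fly.
dd if=/dev/vg0/temp bs=4M | gzip -c > /tmp/temp.img.gz

# 2. Copy the image file to the new server.
rsync -av /tmp/temp.img.gz backupServer:/tmp/

# 3. On the new server, decompress straight into the waiting LV.
gunzip -c /tmp/temp.img.gz | dd of=/dev/vg0/temp bs=4M
```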

I have often wished there were a way to copy directly from one lv to another on separate machines, without the intermediate steps and the extra file space. Thanks to the following web sites, I have a new procedure. I recommend reading them, as they go into more detail than I do and show additional ways of saving time using the power of ssh.

In my new procedure, the lv image is copied directly to the remote server's lv, requiring less time and less disk space. It is most simply performed if root on the source computer has ssh access to the target computer, which I allow only for the duration of the actual transfer (root ssh login is not a good idea normally). Its simplest form is one command issued from the source computer:

dd if=/path/to/source | ssh root@targetServer 'dd of=/path/to/target'

Example: you are logged into the source as root, root has remote access to the target at 192.168.1.55, and you have created an lv of the same size in Volume Group vg0 on the target, with both lvs named 'temp'. Issue the following command:

dd if=/dev/vg0/temp | ssh 192.168.1.55 'dd of=/dev/vg0/temp'

Tweaks

Compression

I have read, in the above articles and other places, that compressing the image during transfer will increase the speed of the transfer, and I had taken this as gospel without checking it. In preparation for this article, I performed a most unscientific test of this assumption, transferring the same 10G disk image with and without compression, to wit:
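The two variants would look something like this, reusing the host and LV names from the earlier example (192.168.1.55 and /dev/vg0/temp are stand-ins for your own setup):

```shell
# Plain transfer: raw dd output straight over ssh.
dd if=/dev/vg0/temp | ssh 192.168.1.55 'dd of=/dev/vg0/temp'

# Compressed transfer: gzip on the source, gunzip -c on the target.
dd if=/dev/vg0/temp | gzip -c | ssh 192.168.1.55 'gunzip -c | dd of=/dev/vg0/temp'
```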

In the compressed version, I'm simply passing the output of dd to gzip, then letting ssh send it over the network; on the target machine it is uncompressed (gunzip -c), then piped to dd. Imagine my surprise when it actually took longer!

I then decided to "help" gzip by writing zeros to all the unused space on the volume before executing the command:
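A zero-fill along these lines does the trick (assuming the source LV's filesystem is mounted at /mnt/temp; the file name is arbitrary):

```shell
# Fill all free space with zeros; dd exits with "No space left on
# device" once the filesystem is full, which is expected here.
dd if=/dev/zero of=/mnt/temp/zero.fill bs=4M
sync
# Delete the file; the freed blocks now read back as zeros,
# which compress extremely well.
rm /mnt/temp/zero.fill
```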

Discounting the time it took to write the zeros (188s, i.e. 3m8s, plus about 3s to delete the file), the compressed transfer still took almost as long, and if you add that time in, it actually takes the longest of all (1633s). I can only postulate that the processing overhead of the compression overshadows the savings in transfer time. Note: the source volume contained only about 700M of jpg images, so 9.3G was nothing but zeros.

Bottom Line

On a slower network, compression may help you, but if you are working on a network of 1G or so, and your source and/or target are under load, it may be faster to forgo compression completely. Under the following circumstances, I would definitely not use compression:

Source and target are running virtual servers (i.e., other processes require a lot of cpu)

Target is running software RAID (the source does not matter as much, but writing to a RAID does take processor time)

Network is 1G or faster.

Additional Tweaks

Thanks to Clay (see comments below), who suggested dropping ssh's default encryption for a faster cipher to gain some additional speed, and tweaking your block size (the bs= parameter on the dd command). From what I have read (I did not know this), arcfour is a much less secure cipher, but significantly faster. Use it only over secured networks.

Clay also pointed out that the block size makes a difference. I remember reading somewhere that the block size should be a multiple of the sector size of your physical hard drive (512 bytes on mine). So, the optimized command would be:
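Putting both tweaks together, the command might look like this (Clay's exact version appears in the comments below; 192.168.1.55 and /dev/vg0/temp are again stand-ins, and note that recent OpenSSH releases have removed the arcfour ciphers entirely, so this only works on older builds):

```shell
# bs=4M on both ends, arcfour cipher on the ssh link.
dd if=/dev/vg0/temp bs=4M | ssh -c arcfour 192.168.1.55 'dd of=/dev/vg0/temp bs=4M'
```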

I tested this (and forgot to record the results), and there was an obvious speedup using -c arcfour. Just don't do this over an insecure network if your data is sensitive. And every time I've used dd without setting bs=4M, the process has taken longer. Thanks, Clay.

Watching what you're doing!

I love dd, and have ever since I "discovered" it in the 90s. It is fast and efficient, and can copy almost anything to anything else. But it gives no indication of how long it will take (after all, screen I/O would cut into that efficiency, right?).

So, type dd if=bigvolume of=output and just wait. How long? Who knows. You can log into another terminal, find the pid of the dd process, and type kill -USR1 pid to make it print its progress so far (or background the original command with &, then repeatedly send the signal), but then you have to figure out how large the move is and how much is left to go, then calculate the remaining time. BORING!
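A quick sketch of the signal trick (this is GNU dd behavior; on BSD/macOS the signal is INFO rather than USR1):

```shell
# Start a long-running dd in the background and note its pid.
dd if=/dev/zero of=/dev/null bs=1M count=100000 &
DD_PID=$!

# Ask it to report progress; the stats appear on dd's stderr.
kill -USR1 "$DD_PID"
```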

About 10 years ago, I found a cute little solution for this, and fell in love. Then I lost it, but recently I have found my lost love again. Her name is "pv" and she is available on almost all Linux systems (debian: apt-get install pv). pv is designed to sit in a pipeline, displaying a progress bar (or other indicators) for the data going through the pipe. Note that I said a pipe, any pipe, anything that moves data around. So

command | pv | command ....

displays a progress indicator on STDERR while the pipeline is running.

Let's give a good example. We are doing a dd from an lv to a file. Normally, you would type:

dd if=/dev/vg0/temp of=/tmp/temp.img

Press enter, then wait. Is it running? Well, the machine is certainly slower. When will it be done? Who knows. Rewrite the command as:

dd if=/dev/vg0/temp | pv | dd of=/tmp/temp.img

and you will at least get an indication that the thing is running.

However, pv takes several flags, one of the most important being --size (-s), which tells it how much data to expect from STDIN (think about it: it has no way of knowing beforehand). By setting this to even the most approximate of values, you get an indication of whether you have time for a cup of coffee, or even a full movie, before your process finishes. My favorite flags are:

-p show a progress bar

-e give an ETA (completion time)

-t show total elapsed time

-r show rate during transfer

So, I just remember it as "peters" without one of the e's: pv -petrs SIZE.
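A full transfer with a progress display might then look like this (10G being the size of the source LV, which lvs will tell you; the host and LV names are again stand-ins):

```shell
# pv sits in the middle of the pipe and reports progress, ETA,
# elapsed time, and transfer rate against the 10G expected size.
dd if=/dev/vg0/temp bs=4M | pv -petrs 10G | ssh 192.168.1.55 'dd of=/dev/vg0/temp bs=4M'
```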

Comment of Clay: Thanks for the informative post!
I have a setup with a direct 10 GbE connection between two servers, so I found that the ssh encryption and dd block size were actually the bottlenecks. Thus, using the bs=4M modifier on the dd commands, and using ssh's arcfour cipher (less secure, but faster) let me get up to 130MB/s. Something like:
dd if=/dev/mylvtocopy bs=4M | ssh -c arcfour otherserver dd of=/dev/newlv bs=4M
Added at: 2013-07-15 07:34

Comment of Nock: Hello! I know this is kind of old, but I was wondering if you tried the same thing with the tweaks Clay commented. I'm planning a backup strategy that will require these steps, and your info would be useful. Thanks! Added at: 2015-03-25 20:30

Comment of Rod: I'm sorry I did not post this earlier.
Yes, I did try Clay's notes, and it did increase the speed of transfer. Just turning off the default encryption made a measurable speedup, and setting the block size helped also. Strongly recommended. Do make sure you understand the security concerns of using arcfour (I use it quite often on secure networks). Added at: 2015-03-25 21:15