I have to migrate a few servers to Linux, and one important requirement for my new host system is elastic storage capacity. Naturally, doing some basic research, I came across LVM.

Is there any performance penalty for using LVM? If so, how can I measure it?

What I am considering right now is to have Linux as a host OS with LVM and virtualized Linux boxes running on top of it (should I add LVM on the guest OS as well?).

6 Answers

LVM is designed in a way that keeps it from really getting in the way very much. From the userspace point of view, it looks like another layer of "virtual stuff" on top of the disk, and it seems natural to imagine that all of the I/O has to now pass through this before it gets to or from the real hardware.

But it's not like that. The kernel already needs to have a mapping (or several layers of mapping actually) which connects high level operations like "write this to a file" to the device drivers which in turn connect to actual blocks on disk.

When LVM is in use, that lookup is changed, but that's all. (Since it has to happen anyway, doing it a bit differently is a negligible performance hit.) When it comes to actually writing the file, the bits take as direct a path to the physical media as they would otherwise.

There are cases where LVM can cause performance problems. You want to make sure the LVM blocks are aligned properly with the underlying system, which should happen automatically with modern distributions. And make sure you're not using old kernels subject to bugs like this one. Oh, and using LVM snapshots degrades performance (and increasingly so with each active snapshot). But mostly, the impact should be very small.

As for the last: how can you test? The standard disk benchmarking tool is bonnie++. Make a partition with LVM, test it, wipe that out and (in the same place, to keep other factors identical) create a plain filesystem and benchmark again. They should be close to identical.
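An A/B test along those lines might look like the following sketch. The device name (/dev/sdb), volume group name, and mount point are assumptions; substitute your own scratch disk, and note that these commands destroy its contents:

```shell
# Pass 1: benchmark ext4 on top of LVM (assumed scratch disk: /dev/sdb)
pvcreate /dev/sdb
vgcreate benchvg /dev/sdb
lvcreate -n benchlv -l 100%FREE benchvg
mkfs.ext4 /dev/benchvg/benchlv
mount /dev/benchvg/benchlv /mnt/bench
bonnie++ -d /mnt/bench -u root
umount /mnt/bench
lvremove -f benchvg/benchlv; vgremove benchvg; pvremove /dev/sdb

# Pass 2: same disk, plain ext4, no LVM
mkfs.ext4 /dev/sdb
mount /dev/sdb /mnt/bench
bonnie++ -d /mnt/bench -u root
umount /mnt/bench
```

Running both passes on the same physical region of the same disk keeps the other variables identical, so any difference you see is attributable to LVM.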

With respect to performance, LVM will hinder you a little bit because it is another layer of abstraction that has to be worked out before bits hit (or can be read from) the disk. In most situations, this performance hit will be practically unmeasurable.

The advantages of LVM include the fact that you can add more storage to existing filesystems without having to move data around. Most people like it for this advantage.

One disadvantage of LVM used in this manner is that if your additional storage spans disks (i.e. involves more than one disk), you increase the likelihood that a disk failure will cost you data. If your filesystem spans two disks and either of them fails, you are probably lost. For most people, this is an acceptable risk for space-vs-cost reasons (i.e. if this is really important, there will be a budget to do it correctly) and because, as they say, backups are good, right?

For me, the single reason to not use LVM is that disaster recovery is not (or at least, was not) well defined. A disk with LVM volumes that had a scrambled OS on it could not trivially be attached to another computer and the data recovered from it; many of the instructions for recovering LVM volumes seemed to include steps like go back in time and run vgcfgbackup, then copy the resulting /etc/lvmconf file to the system hosting your hosed volume. Hopefully things have changed in the three or four years since I last had to look at this, but personally I never use LVM for this reason.

That said.

In your case, I would presume that the VMs are going to be relatively small as compared to the host system. This means to me you are more likely to want to expand storage in a VM later; this is best done by adding another virtual disk to the VM and then growing the affected VM filesystems. You don't have the spanning-multiple-disks vulnerability because the virtual disks will quite likely be on the same physical device on the host system.

If the VMs are going to have any importance to you at all, you will be RAID'ing the host system somehow, which will reduce flexibility for growing storage later. So the flexibility of LVM is probably not going to be required.

So I would presume you would not use LVM on the host system, but would install VMs to use LVM.

@DM - You seem to have skipped mentioning that an LVM2 physical volume may be any block device, including md-RAID, i.e. pvcreate /dev/md0 regardless of what the underlying RAID type of /dev/md0 is. So if your /dev/md0 happens to be a RAID array of mirrored physical disks... it is sort of hard for the loss of a single physical drive to affect your LVM2 group. Also: you can use LVM2 logical volume(s) as the media side when creating a RAID array. Both operate at the device-mapper level; both are device-in/device-out layers.
– user13719 Dec 27 '11 at 22:27


Your recovery concerns are excessive; it is trivial to move an LVM array between computers with a reasonably recent Linux distro (i.e. Debian oldstable is new enough).
– hildred Jan 14 '14 at 13:33

@user13719: yes, you can LVM any block device, but in practice people don't do this. They end up with a single drive LVM'd. Then they add another drive, and use LVM to extend the existing file system onto the new disk. At that point, the failure of either disk will kill the LVM.
– David Mackintosh Feb 11 '14 at 21:59

@hildred, the above is what I was referring to: I am not aware of any tools that can recover data from an LVM that spans multiple disks (block devices) with one disk missing.
– David Mackintosh Feb 11 '14 at 22:00


This is like saying knives are bad because you might cut yourself while juggling them... How about just not doing that? Use them for tasks that they are better suited towards, like cutting veggies.
– Chinoto Vokro Nov 3 '16 at 21:05

But just benchmark it on your own: see whether your hardware and the OS you want to use behave the same, and whether you can ignore the (maybe slight) impact of an additional layer of complexity that gives you elastic storage.

Should you add LVM to the guest OS? That depends on whether you need the guest OS to have elastic storage as well, doesn't it? Your needs dictate what you have to deploy.

You certainly can change how work is done. For example, I can give you directions to a location via GPS coordinates, or by street names, or by local landmarks. Different ways, but you still have to walk the same path. The time it takes to look at a paper map vs. following your phone's instructions may differ slightly, but ultimately it's negligible compared to the walking time.
– mattdm Feb 4 '15 at 16:05

I already stated that the added work in the case of LVM has no real impact. I wonder what point you are driving at?
– akira Feb 4 '15 at 16:10

The point I am "driving at" is that "Note: You only add work and not 'change' the way the work is done" is not a factual statement.
– mattdm Feb 13 '16 at 19:26

@mattdm: it's obvious that if you change the way the work is done (e.g. another algorithm, another fs, etc.), you get different results. LVM is not changing the way the fs works; you know that. And that is why I am wondering what your point actually is. "Adding a layer of something" means "adding", not "changing the other thing as well". You know that as well.
– akira Feb 14 '16 at 20:51

No one has mentioned that LVM2 can multiply read and write speed (similar to RAID 0). I personally use three identical disks with LVM2 in striped mode over them; read and write operations take about 1/3 of the time. That is a big impact: the filesystem is three times faster over it.
I know: if any disk fails, all the data on them becomes inaccessible. But that does not have to mean data loss, since backups are a MUST; nothing like RAID, LVM2, or ZFS will remove the need for backups. So I never use mirroring, RAID 5, and such; I always use striping (to get the topmost performance) and keep synced backups.
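For reference, a striped setup like that can be created in a few commands (a sketch only; the device names and the 64 KiB stripe size are assumptions to adjust for your hardware):

```shell
# Three whole disks as physical volumes (hypothetical device names)
pvcreate /dev/sda /dev/sdb /dev/sdc
vgcreate stripedvg /dev/sda /dev/sdb /dev/sdc
# -i 3: stripe across all three PVs; -I 64: 64 KiB stripe size
lvcreate -n data -i 3 -I 64 -l 100%FREE stripedvg
mkfs.ext4 /dev/stripedvg/data
```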
ZFS is great for on-the-fly compression, and with the copies parameter bigger than one it is like mirroring. One thing ZFS has that no one else has is on-the-fly auto-recovery from bit rot (bits that spontaneously change while the disk is powered off). But ZFS imposes a really great performance impact (calculating checksums, verifying them) and a major problem (adding more physical disks).

To sum up: I use ZFS only for my backups on external disks; multiple (two or three) SSDs with LVM2 striped for the OS (after upgrades I redo the clone of the OS; I tend to use an immutable OS); and multiple (six) spinning disks with LVM2 striped for data, like virtual machines (again, after any change I redo the backups). So after any disk failure I only need to replace the disk and restore the last backup. Nowadays I have near 1.8 GiB/s write speed, so restoring one virtual machine from backup takes less than 30 seconds (32 GiB per virtual machine disk).

So my answer is: do not use just one thing; be smart and use the best of each part. LVM2 striped is faster than mdraid level 0, even more so when using six spinning disks. One warning about striping SSDs: two or three is good, but four SSDs can degrade performance (my tests gave lower write speed when I used four identical SSDs in striped mode, no matter whether LVM, mdraid0, etc.). It seems that SSD TRIM and the resulting write amplification can be the main reason why adding more SSDs to a striped volume lowers the write speed.

A warning with SSDs, and with any RAID 0 (striped volumes): align things perfectly, assign the filesystem cluster size correctly, set the stripe size, etc., so nothing causes degradation. As a sample: say the disk sector is 2048 bytes, so any read/write is 2K at minimum; never use a filesystem with a 512-byte cluster, since beyond that it is better to use a 2K or 4K cluster size. Now imagine you use 3×HDD, each with 2K sectors; then for any read/write the optimum filesystem cluster would be 3×2K = 6K, but that is not possible on many filesystems. Then think about what happens with a 64K cluster size: 64K/6K = 32/3, which causes imbalance, so it is not optimal, and so on. Do the math to get the optimum cluster size.

My best results are: cluster size = stripe size × number of disks in the stripe. That way each read/write is of the exact size that makes all the disks work, so the speed improvement is really great. As an example, a 192K cluster size for 3 disks with a 64K stripe size; another example, a 192K cluster size for 6 disks with a 32K stripe size.
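That rule is simple arithmetic, and both examples work out to the same cluster size:

```shell
# cluster size = stripe size * number of disks in the stripe
stripe_kb=64; disks=3
echo "$((stripe_kb * disks))K"   # prints 192K

stripe_kb=32; disks=6
echo "$((stripe_kb * disks))K"   # prints 192K
```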

And always remember to test a single disk with 4K, 8K, 16K, 32K, and 64K blocks; a lot of disks give really bad speeds with small block sizes like 4K, but are more than ten times faster at 64K, 128K, or higher.
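A quick way to see that effect is to time dd at a few block sizes. The scratch path is an assumption: point it at the filesystem you want to measure, and use a total size well beyond the disk cache for serious numbers:

```shell
f=/tmp/bs-test.bin
for bs in 4096 16384 65536; do
  count=$((32 * 1024 * 1024 / bs))   # always write 32 MiB in total
  printf 'block size %6s: ' "$bs"
  # conv=fdatasync forces the data to disk before dd reports its speed
  dd if=/dev/zero of="$f" bs="$bs" count="$count" conv=fdatasync 2>&1 | tail -n 1
done
rm -f "$f"
```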

Yes, using big cluster sizes can waste a lot of space in the last cluster of each file (if you use millions of files of only 1 byte each); better use a compact/pack on-the-fly system on top of the filesystem. As a sample, a 4 TiB disk with a 4K cluster size can only hold fewer than 4TiB/4K = 1073741824 files of 1 byte each, and that is just 1 GiB of data if all files are 1 byte in size (cluster size 4K); a bigger cluster size gives a worse ratio. But if files are huge, like virtual machines (near 32 GiB as a sample, or just a few megabytes), the loss is only in the last cluster; so for big files, a big cluster size is much better for performance, but beware of how the virtual machine uses it.
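Worked out, the 1-byte-files arithmetic from that paragraph looks like this (pure arithmetic, nothing system-specific):

```shell
disk=$((4 * 1024 * 1024 * 1024 * 1024))   # 4 TiB in bytes
cluster=4096                              # 4K cluster size
files=$((disk / cluster))                 # one file per cluster at worst
echo "$files"                             # prints 1073741824 (max 1-byte files)
echo "$((files / 1024 / 1024 / 1024))"    # prints 1 -- only 1 GiB of payload on a 4 TiB disk
```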

No one will tell you this secret: inside the guest, do not use a 4K cluster size; use the same cluster size as the one where the virtual disk resides, or a multiple of it.

Yes, I am a maniac about getting the topmost speed inside the guest disks. As I said, with 6 rotating disks I get near 1.7 GiB/s; the SATA III bus speed is the bottleneck, not the disks themselves. I use high-end (not cheap) disks with 128 MiB cache and a write speed of 283 MiB/s each.

For you and for all people: it is much better to learn how cluster size, stripe size, and block size must be related before doing any speed test; otherwise testing LVM2 or any other RAID (also ZFS) can give FALSE conclusions.

Just a sample of that: I tested my Linux boot times with 2×60 MiB/s 2.5-inch 5400 rpm SATA disks on a mainboard with SATA II ports, and then tested with 2×SSD SATA III (they can write more than 250 MiB/s each if connected to SATA III ports); boot only takes two seconds less, just two seconds on a five-minute boot. Why? Because during most of the boot time the disks are not being used; the system is doing things in RAM and CPU, not I/O.

Always test the real day-to-day things you will do, not just crude speeds (in other words, max speed).

Max speed is good to know, but not representative; you may not be using the disks at max speed 100% of the time. The OS and apps must do things in RAM and CPU without I/O, and during that time disk speed does not matter at all.

All people say SSDs improve Windows boot speed a lot; in my tests that is also FALSE. It only shaves 28 seconds off a boot time of near eight minutes.

So if you do like me (Linux copy-to-RAM on boot), SSDs will not be better than rotating HDDs. I also tested a USB 3.1 Gen2 stick (139 MiB/s read); boot time is only affected by a few seconds on a five-minute boot. Why? Easy: the read is done when copying to RAM, and after that the disk/SSD/USB stick is not used again for the rest of the boot; the data is in RAM, like a RAM drive.

Nowadays I am selling all the SSDs I have; they do not improve Linux copy-to-RAM at boot, but benchmarking says they are 5× faster... see, benchmarks give FALSE conclusions... yes, test, and test real-day work.

Hope this makes things clear: LVM with bad cluster and stripe sizes hurts much more, by far, than the overhead of the layer.