Introduction

LVM (Logical Volume Management) offers great flexibility in managing your storage and significantly reduces server downtime by allowing online disk space management. The great idea behind LVM is to keep the data and its storage loosely coupled through several layers of abstraction. You (the system administrator) have control over each of those layers, which makes the entire space management process simple and flexible through a coherent set of commands.

Several other well-known binary Linux distributions make heavy use of LVM, and several Unixes, including HP-UX, AIX and Solaris, have offered similar functionality for a while, modulo the commands to be used. LVM is not mandatory, but using it can bring you additional flexibility and make your everyday life much simpler.

Concepts

As usual, having a good idea of the underlying concepts is mandatory. LVM is not very complicated, but it is easy to become confused, especially because it is a multi-layered system. Fortunately, the LVM designers had the good idea of keeping the command names consistent across all LVM command sets, which makes your life easier.

LVM consists mainly of three things:

Physical volumes (or PV): nothing more than physical storage space. A physical volume can be anything, such as a partition on a local hard disk, a partition located on a remote SAN disk, a USB key or whatever else could offer storage space (so yes, technically it would be possible to use an optical storage device accessed in packet writing mode). The storage space on a physical volume is divided (and managed) in small units called Physical Extents (or PE). To give an analogy, if you are a bit familiar with RAID, PEs are a bit like RAID stripes.

Volume Groups (or VG): a group of at least one PV. VG are named entities and will appear in the system via the device mapper as /dev/volume-group-name.

Logical Volumes (or LV): a named division of a volume group in which a filesystem is created and that can be mounted in the VFS. Just for the record, just as for the PEs in a PV, a LV is managed as chunks known as Logical Extents (or LE). Most of the time those LEs are hidden from the system administrator thanks to a 1:1 mapping between them and the PEs lying just beneath, but a cool fact to know about LEs is that they can be spread over several PVs just like RAID stripes in a RAID-0 volume. However, research done on the Web tends to show that system administrators prefer to build their RAID volumes with mdadm rather than with LVM, for performance reasons.

In short: LVM logical volumes (LV) are containers that can hold a single filesystem and are created inside a volume group (VG), itself composed of an aggregation of at least one physical volume (PV), themselves stored on various media (USB key, hard disk partition and so on). The data is stored in chunks spread over the various PVs.

Note

Remember what PV, VG and LV mean, as we will use those abbreviations in the rest of this article.

Your first tour of LVM

Physical volumes creation

Note

We give the same size to all volumes for the sake of the demonstration. This is not mandatory: it is possible to have PVs of mixed sizes inside the same VG.

Okay, nothing really exciting there, but wait, the fun is coming! First check that sys-fs/lvm2 is present on your system and emerge it if not. At this point, we must tell you a secret: although several articles and authors use the term "LVM", it denotes "LVM version 2" or "LVM 2" nowadays. You must know that LVM had, in the good old times (RHEL 3.x and earlier), a previous revision known as "LVM version 1". LVM 1 is now considered an extinct species and is not compatible with LVM 2, although the LVM 2 tools maintain backward compatibility.

The very first step in LVM is to create the physical volumes or PV. "Wait, create what?! Aren't the loopback devices already present on the system?" Yes, they are present, but they are empty: we must initialize them with some metadata to make them usable by LVM. This is simply done by:
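A minimal sketch of this step, assuming the loopback devices are /dev/loop0, /dev/loop1 and /dev/loop2 (the names used later in this article). The helper name init_pvs is hypothetical; only pvcreate itself is an LVM command:

```shell
# Initialize each given block device as an LVM physical volume.
# Assumption: the device names are supplied by the caller; init_pvs is a
# hypothetical helper written for this illustration.
init_pvs() {
    for dev in "$@"; do
        pvcreate "$dev" || return 1   # stop at the first failure
    done
}
```

Run as root, it would be called as `init_pvs /dev/loop0 /dev/loop1 /dev/loop2`; pvdisplay then shows the details of each freshly created PV.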

Allocatable indicates whether the PV can be used to store data. As the PV is not yet a member of a VG, it cannot be used, hence the "NO" shown. Another set of information is the lines starting with PE. PE stands for Physical Extents (data stripes) and is the finest granularity LVM can manipulate. The PE size is "0" here because we have a blank PV; once the PV joins a VG it takes the extent size of that VG (4 MiB by default with LVM 2, which is also the value we will find later in this article). Following PE Size are Total PE, which shows the total number of PEs available on this PV, and Free PE, the number of PEs remaining available for use. Allocated PE just shows the difference between Total PE and Free PE.

The last line (PV UUID) is a unique identifier used internally by LVM to name the PV. You should know that it exists because it is sometimes useful when recovering from corruption or doing weird things with PVs; however, most of the time you don't have to worry about its existence.

Note

It is possible to force how LVM handles alignment on the physical storage. This is useful when dealing with 4K-sector drives that lie about their physical sector size. Refer to the manual page.

Volume group creation

We have blank PVs at this point, but to make them usable for storage we must tell LVM how they are grouped to form a VG (storage pool) in which LVs will be created. A nice aspect of VGs is that they are not "written in stone" once created: you can still add, remove or exchange PVs (in case the device a PV is stored on fails, for example) inside a VG at a later time. To create our first volume group, named vgtest:
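A minimal sketch of the creation step, assuming the three PVs live on /dev/loop0 to /dev/loop2. The wrapper create_vg is hypothetical and only adds a sanity check around the real vgcreate command:

```shell
# Group one or more PVs into a new volume group.
# create_vg is a hypothetical wrapper; vgcreate is the real LVM command.
create_vg() {
    if [ "$#" -lt 2 ]; then
        echo "usage: create_vg VG_NAME PV [PV...]" >&2   # a VG needs at least one PV
        return 1
    fi
    vg_name=$1
    shift
    vgcreate "$vg_name" "$@"
}
```

For the setup of this article the call would be `create_vg vgtest /dev/loop0 /dev/loop1 /dev/loop2`.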

Just like we did before with the PVs, we can get a list of the VGs known by the system. This is done through the command vgs:

# vgs
VG #PV #LV #SN Attr VSize VFree
vgtest 3 0 0 wz--n- 5.99g 5.99g

vgs shows you a tabular view of information:

VG: the name of the VG

#PV: the number of PV composing the VG

#LV: the number of logical volumes (LV) located inside the VG

Attr: a status field. w, z and n here mean that the VG is:

w: writable

z: resizable

n: using the allocation policy normal (tweaking allocation policies is beyond the scope of this article, we will use the default value normal in the rest of this article)

VSize and VFree give statistics on how full the VG is versus its size

Note the dashes in Attr; they mean that the corresponding attribute is not active:

First dash (3rd position) indicates whether the VG is exported (an 'x' would be shown at this position in that case).

Second dash (4th position) indicates whether the VG is partial (a 'p' would be shown at this position in that case).

Third dash (rightmost position) indicates whether the VG is clustered (a 'c' would be shown at this position in that case).

Exporting a VG and clustered VGs are more advanced aspects of LVM and won't be covered here, especially clustered VGs, which are used when a storage space is shared by a cluster of machines. Talking about clustered VG management in particular would require an entire article in itself. For now, the only thing you have to check in Attr is that a dash, and not a p, appears at the 4th position. Seeing p there would be bad news: the VG would have missing parts (PVs), making it unusable.
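To make the positions easier to remember, here is a small sketch that decodes such an attribute string. The function decode_vg_attr is hypothetical (it is not an LVM tool) and only knows the six flags described above:

```shell
# Decode the six-character Attr field printed by vgs, e.g. "wz--n-".
# Hypothetical helper: only the flags discussed in the text are handled.
decode_vg_attr() {
    attr=$1
    # nth N: print the character found at position N of $attr
    nth() { printf '%s' "$attr" | cut -c"$1"; }
    case $(nth 1) in w) echo "writable" ;; esac
    case $(nth 2) in z) echo "resizable" ;; esac
    case $(nth 3) in x) echo "exported" ;; esac
    case $(nth 4) in p) echo "partial (missing PVs)" ;; esac
    case $(nth 5) in n) echo "allocation policy: normal" ;; esac
    case $(nth 6) in c) echo "clustered" ;; esac
    return 0
}

decode_vg_attr "wz--n-"
```

For the string shown by vgs above, this prints "writable", "resizable" and "allocation policy: normal", one per line.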

Note

In the exact same manner as you can see detailed information about physical volumes with pvdisplay, you can see detailed information about a volume group with vgdisplay. We will demonstrate the latter command in the paragraphs to follow.

Before leaving the volume group aspect, do you remember the pvs command shown in the previous paragraphs? Try it again:

Logical volumes creation

Now the final step: we will create the storage areas (logical volumes or LVs) inside the VG, on which we will then create filesystems. Just as a VG has a name, a LV also has a name, which is unique within the VG.

Note

Two LVs can be given the same name as long as they are located in different VGs.

To divide our VG as follows:

lvdata1: 2 GB

lvdata2: 1 GB

lvdata3: 10% of the VG size

lvdata4: all of the remaining free space in the VG

We use the following commands (notice the capital 'L' and the small 'l' to declare absolute or relative sizes):
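A minimal sketch of such commands, wrapped in a hypothetical helper named create_lv that picks the capital 'L' for absolute sizes and the small 'l' for relative ones. The size strings (2G, 1G, 10%VG) are our reading of the layout above:

```shell
# Create a logical volume in a VG.
# A size containing '%' is relative and goes to -l (e.g. 10%VG, 100%FREE);
# anything else is absolute and goes to -L (e.g. 2G).
# create_lv is a hypothetical wrapper around the real lvcreate command.
create_lv() {
    vg=$1 name=$2 size=$3
    case $size in
        *%*) lvcreate -n "$name" -l "$size" "$vg" ;;
        *)   lvcreate -n "$name" -L "$size" "$vg" ;;
    esac
}
```

The first three volumes would then be created with `create_lv vgtest lvdata1 2G`, `create_lv vgtest lvdata2 1G` and `create_lv vgtest lvdata3 10%VG`. The same helper also accepts a request such as 100%FREE for "whatever space remains".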

Basically it says we have 1533 PEs (chunks) available for a total size of 5.99 GiB. Of those 1533, 921 are used (for a size of 3.60 GiB) and 612 remain free (for a size of 2.39 GiB). So we expect lvdata4 to have an approximate size of 2.4 GiB. Before creating it, have a look at some statistics at the PV level:

Quite interesting! Did you notice? The first PV is full, the second is more or less full and the third is empty. This is due to the allocation policy used for the VG: it fills its first PV then its second PV and then its third PV (this, by the way, gives you a chance to recover from a dead physical storage if by luck none of your PE was present on it).

It is now time to create our last LV; again, notice the small 'l' used to specify a relative size:

Now the $100 question: if the pvdisplay and vgdisplay commands exist, does a command named lvdisplay exist as well? Yes, absolutely! Indeed, the command sets are coherent between abstraction levels (PV/VG/LV) and they are named in the exact same manner, modulo their first two letters:

There is nothing extremely useful to comment on in this overview, with the exception of two things:

LVs are accessed via the device mapper (see the lines starting with LV Name and notice how the name is composed). So lvdata1 will be accessed via /dev/vgtest/lvdata1, lvdata2 will be accessed via /dev/vgtest/lvdata2 and so on.

Just like PVs are managed in sets of data chunks (the famous Physical Extents or PEs), LVs are managed in sets of data chunks known as Logical Extents or LEs. Most of the time you don't have to worry about the existence of LEs because each one fits within a single PE, although it is possible to make them smaller and hence have several LEs within a single PE. Demonstration: if you consider the first LV, lvdisplay says it has a size of 2 GiB and holds 512 logical extents. Dividing 2 GiB by 512 gives 4 MiB as the size of a LE, which is the exact same size used for PEs, as seen when demonstrating the pvdisplay command some paragraphs above. So in our case we have a 1:1 match between a LE and the underlying PE.
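The figures quoted in this section can be cross-checked with a little arithmetic. The sketch below assumes 4 MiB extents, as derived above; the helper name extents_to_gib is ours:

```shell
# Cross-check the extent figures quoted in the text (assumes 4 MiB extents).
PE_MIB=4

# Convert a number of extents into GiB, rounded to two decimals.
extents_to_gib() {
    awk -v n="$1" -v pe="$PE_MIB" 'BEGIN { printf "%.2f\n", n * pe / 1024 }'
}

echo "LE size: $(( 2 * 1024 / 512 )) MiB"   # 2 GiB spread over 512 extents -> 4 MiB
extents_to_gib 1533    # whole VG      -> 5.99
extents_to_gib 921     # used extents  -> 3.60
extents_to_gib 612     # free extents  -> 2.39
```

The results agree with the sizes reported for the VG before the creation of lvdata4.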

Oh, another great point to underline: you can display the PVs related to a LV :-) Just give a special option to lvdisplay:

To go one step further, let's analyze how the PEs are used: the first LV has 512 LEs (remember: one LE fits within one PE here, so 1 LE = 1 PE). Amongst those 512 LEs, 511 (0 to 510) are stored on /dev/loop0 and the 512th LE is on /dev/loop1. Huh? Something seems to be wrong here: pvdisplay said that /dev/loop0 was holding 512 PEs, so why has an extent been placed on the second storage device? In fact it is not a misbehaviour and is absolutely normal: LVM internally uses some metadata about the PVs, VGs and LVs, which makes some of the storage space unavailable for the payload. This explains why 1 PE has been "eaten" to store that metadata. Also notice the linear allocation process: /dev/loop0 has been used, then, once it was full, /dev/loop1 has been used, then came the turn of /dev/loop2.
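The linear filling described above can be mimicked with a toy simulation. This is not LVM's real allocator, only a sketch under stated assumptions: every PV offers 511 usable extents (one being lost to metadata), and the four LVs need 512, 256, 153 and 612 extents (153 being what remains of the 921 used extents once the first two LVs are counted):

```shell
# Toy simulation of linear allocation: fill PV 0, then PV 1, and so on.
# Not LVM code; the number of PVs is not bounded in this sketch.
# Usage: allocate_linear PE_PER_PV LV_SIZE_IN_EXTENTS...
allocate_linear() {
    pe_per_pv=$1
    shift
    pv=0                # index of the PV currently being filled
    free=$pe_per_pv     # extents still free on that PV
    lv=1
    for need in "$@"; do
        line="lv$lv:"
        while [ "$need" -gt 0 ]; do
            if [ "$free" -eq 0 ]; then      # current PV is full, move on
                pv=$((pv + 1))
                free=$pe_per_pv
            fi
            take=$need
            if [ "$take" -gt "$free" ]; then take=$free; fi
            line="$line loop$pv=$take"
            need=$((need - take))
            free=$((free - take))
        done
        echo "$line"
        lv=$((lv + 1))
    done
}

allocate_linear 511 512 256 153 612
```

The first line printed is "lv1: loop0=511 loop1=1", which matches the 511 + 1 split observed for the first LV, and the last LV ends up using the rest of loop1 plus all of loop2.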

Now everything is in place. If you want, check again with vgs/pvs/vgdisplay/pvdisplay and you will notice that the VG is now 100% full and all of the underlying PVs are also 100% full.

Filesystems creation and mounting

Now that we have our LVs, it would be fun to do something useful with them. In case you missed it, LVs are accessed via the device mapper, which uses a combination of the VG and LV names, thus:

lvdata1 is accessible via /dev/vgtest/lvdata1

lvdata2 is accessible via /dev/vgtest/lvdata2

and so on!

Just like any traditional storage device, the newly created LVs are seen as block devices, just as if they were a kind of hard disk (don't worry about the "dm-..", it is just an internal block device automatically allocated by the device mapper for you):
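Creating and mounting the filesystems can then be sketched as follows. We assume ext3 (the filesystem used later in this article) and mount points such as /mnt/data01; the helper name format_and_mount is hypothetical:

```shell
# Create an ext3 filesystem on a LV and mount it.
# Assumptions: ext3 as filesystem type, mount point chosen by the caller.
# format_and_mount is a hypothetical helper; mkfs, mkdir and mount are the
# usual system commands.
format_and_mount() {
    vg=$1 lv=$2 mountpoint=$3
    mkfs -t ext3 "/dev/$vg/$lv" || return 1
    mkdir -p "$mountpoint" || return 1
    mount "/dev/$vg/$lv" "$mountpoint"
}
```

For the first LV this would be `format_and_mount vgtest lvdata1 /mnt/data01`, and similarly for the three others.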

Renaming a volume group and its logical volumes

So far we have four LVs named lvdata1 to lvdata4 mounted on /mnt/data01 to /mnt/data04. It would be nicer to:

make the numbers in our LV names look like "01" instead of "1"

rename our volume group to "vgdata" instead of "vgtest"

To show how dynamic the LVM world is, we will rename our VG and LVs on the fly using two commands: vgrename for acting at the VG level and its counterpart lvrename for acting at the LV level. Starting with the VG or with the LVs makes strictly no difference: you can start either way and get the same result. In our example we have chosen to start with the VG:
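A minimal sketch of the renaming, VG first and LVs second. The wrapper rename_all is hypothetical; vgrename and lvrename are the two commands named above:

```shell
# Rename vgtest to vgdata, then lvdata1..lvdata4 to lvdata01..lvdata04.
# rename_all is a hypothetical wrapper written for this illustration.
rename_all() {
    vgrename vgtest vgdata || return 1
    for n in 1 2 3 4; do
        # the LVs now live in the renamed VG
        lvrename vgdata "lvdata$n" "lvdata0$n" || return 1
    done
}
```

Had we started with the LVs, the lvrename calls would simply have referred to vgtest instead of vgdata.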

Ooops... It is not exactly a bug: mount still shows the symlinks used at the time the LVs were mounted in the VFS and has not updated its information. However, once again everything is correct because the underlying block devices (/dev/dm-0 to /dev/dm-3) did not change at all. To see the right information, the LVs must be unmounted and mounted again:

Using /dev/volumegroup/logicalvolume or /dev/mapper/volumegroup-logicalvolume makes no difference at all: those are two sets of symlinks pointing to the exact same block device.

Expanding and shrinking the storage space

Did you notice that in the previous sections we never talked about topics like "create this partition at the beginning" or "allocate 10 more sectors"? In LVM you do not have to worry about that kind of problem: your only concern is more "Do I have the space to allocate a new LV?" or "How can I extend an existing LV?". LVM takes care of the low-level aspects for you; just focus on what you want to do with your storage space.

The most common problem with computers is the shortage of space on a volume. Most of the time, production servers can run for months or years without requiring a reboot (which happens for various reasons such as a kernel upgrade or a hardware failure), yet they regularly need their storage space extended because we generate more and more data as time goes by. With the "traditional" approach of fiddling directly with hard drive partitions, storage space manipulation can easily become a headache, mainly because it requires coherent copies to be made and thus application downtime. Don't expect the situation to be more enjoyable with SAN storage rather than a directly attached storage device... Basically the problem remains the same.

Expanding a storage space

The most common task for a system administrator is to expand the available storage space. In the LVM world this implies:

Creating a new PV

Adding the PV to the VG (thus extending the VG capacity)

Extending the existing LVs or creating new ones

Extending the structures of the filesystem located on a LV when that LV is extended (not all filesystems support that capability).

Bringing a new PV in the VG

In the exact same manner as we created our first PVs, let's create our additional storage device, associate it to a loopback device and then create a PV on it:
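A minimal sketch of these steps, followed by the addition of the new PV to the VG. The backing file path and the helper name add_loop_pv are assumptions made for the illustration, and the 2 GiB size is inferred from the figures quoted in this section:

```shell
# Back a loop device with a new 2 GiB file, turn it into a PV and add it
# to an existing VG.
# Assumptions: the backing file path and loop device are chosen by the
# caller; add_loop_pv is a hypothetical helper.
add_loop_pv() {
    file=$1 loopdev=$2 vg=$3
    dd if=/dev/zero of="$file" bs=1M count=2048 || return 1   # 2 GiB of zeros
    losetup "$loopdev" "$file" || return 1                    # attach the file
    pvcreate "$loopdev" || return 1                           # write LVM metadata
    vgextend "$vg" "$loopdev"                                 # grow the VG
}
```

With a backing file of your choice, the call would look like `add_loop_pv /path/to/file /dev/loop3 vgdata`.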

Great, vgdata is now 8 GB large instead of 6 GB and has 2 GB of free space to allocate either to new LVs or to existing LVs.

Extending the LV and its filesystem

Bringing in a new LV would demonstrate nothing more; extending our existing LVs is much more interesting. How can we use our 2 GB of extra free space? We can, for example, split it in two, allocating 50% to our first (lvdata01) and third (lvdata03) LVs, thus adding 1 GB of space to each. The best part of the story is that the operation is very simple and is done with a command named lvextend:

Oops! We made a mistake there: lvdata01 has the expected size (2 GB + 1 GB for a grand total of 3 GB) but lvdata03 only grew by 512 MB (for a grand total of 1.1 GB). Our mistake was obvious: once the first gigabyte (50% of 2 GB) of extra space had been given to lvdata01, only one gigabyte remained free in the VG. Thus, when we said "allocate 50% of the remaining gigabyte to lvdata03", LVM added only 512 MB, leaving the other half of this gigabyte unused. The vgs command can confirm this:
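The arithmetic behind this mistake can be sketched as follows. Figures are in MiB and rounded as in the text (they are not real extent counts), and the function name split_free is ours:

```shell
# Two successive "50% of the free space" requests: each one takes half of
# what is free at that moment, not half of the original free space.
# Figures in MiB, rounded as in the text; split_free is a hypothetical name.
split_free() {
    free=$1
    first=$(( free / 2 ));  free=$(( free - first ))
    second=$(( free / 2 )); free=$(( free - second ))
    echo "$first $second $free"
}

split_free 2048    # prints: 1024 512 512
```

The three numbers are what lvdata01 received, what lvdata03 received and what is left unused in the VG. Asking for an absolute amount for each LV would have avoided the surprise.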

Obviously, resizing a LV does not "automagically" resize the filesystem structures to take the new LV size into account, so that step is part of our duty. Happily for us, ext3 can be resized and, better, it can be grown while mounted in the VFS. This is known as online resizing, and a few other filesystems support that capability; among them we can cite ext2 (ext3 without a journal), ext4 (patches integrated very recently as of Nov/Dec 2011), XFS, ReiserFS and BTRFS. To our knowledge, only BTRFS supports both online growing and online shrinking as of December 2011; all of the others require a filesystem to be unmounted before being shrunk.

Note

Consider using the option -r when invoking lvextend, it asks the command to perform a filesystem resize.

Now let's extend (grow) the ext3 filesystem located on lvdata01. As said above, ext3 supports online resizing, hence we do not need to kick it out of the VFS first:
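A minimal sketch of this step, assuming the resize2fs tool from e2fsprogs is used; called without a size, it grows the filesystem to fill the device. The helper name grow_fs is hypothetical:

```shell
# Grow an ext3 filesystem so that it fills its (already extended) LV.
# Assumption: resize2fs is the resizing tool; grow_fs is a hypothetical
# wrapper. The filesystem can stay mounted while this runs.
grow_fs() {
    vg=$1 lv=$2
    resize2fs "/dev/$vg/$lv"
}
```

Here the call would be `grow_fs vgdata lvdata01`, and df on the mount point should then report the new size.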

Et voila! Our LV has now plenty of new space usable :-) We do not bother about how the storage is organized by LVM amongst the underlying storage devices and it is not our problem after all. We only worry about having our storage requirements being satisfied without any further details. From our point of view everything is seen just as if we were manipulating a single storage device subdivided in several partitions of a dynamic size and always organized in a set of contiguous blocks.

Now let's shuffle the cards a bit more: when we examined how the LEs of our LVs were allocated, we saw that lvdata01 (named lvdata1 at this time) consisted of 512 LEs or 512 PEs (because of the 1:1 mapping between those) spread over two PVs. As we have extended it to use an additional PV, we should see it using 3 segments:

Segment 1: located on the PV stored on /dev/loop0 (LE/PE #0 to #510)

Segment 2: located on the PV stored on /dev/loop1 (LE/PE #511)

Segment 3: located on the PV stored on /dev/loop3 (LE/PE #512 and following)

Bingo! Note that while this is true here (LVM used linear allocation), it would not be true in the general case.

Warning

Never mix a local storage device with a SAN disk within the same volume group, especially if that volume group holds your system. It will bring you a lot of trouble if the SAN disk goes offline, or cause weird performance fluctuations, as PEs allocated on the SAN will get faster response times than those located on a local disk.

Shrinking a storage space

On some occasions it can be useful to reduce the size of a LV or the size of the VG itself. The principle is similar to what has been demonstrated in the previous section:

unmount the filesystem belonging to the LV to be processed (if your filesystem does not support online shrinking)

reduce the filesystem size (if the LV is to be kept)

reduce the LV size - OR - remove the LV

remove a PV from the volume group if no longer used to store extents

The simplest case to start with is how a LV can be removed. A good candidate for removal is lvdata03: we failed to resize it properly and the best option is to scrap it. First unmount it:

Noticed the little change with lvs? It lies in the Attr field: once the lvdata03 has been unmounted, lvs tells us the LV is not opened anymore (the little o at the rightmost position has been replaced by a dash). The LV still exists but nothing is using it.

To remove lvdata03 use the command lvremove and confirm the removal by entering 'y' when asked:

Notice the 1.60 GB of space now free in the VG. What can we do next? Shrinking lvdata04 by 50%, to roughly 1.2 GB or 1228 MB (1.2*1024) of its size, could be a good idea, so here we go. First we need to unmount the filesystem from the VFS because ext3 does not support online shrinking.
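The whole shrinking sequence can be sketched as below. The order matters: the filesystem is reduced first and the LV second, otherwise the end of the filesystem would be cut off. The helper name shrink_lv is hypothetical, and the use of e2fsck and resize2fs assumes an ext3 filesystem managed with e2fsprogs:

```shell
# Shrink an ext3 filesystem and then its LV to the same target size.
# Assumptions: e2fsprogs tools, a size such as 1228M, and a hypothetical
# helper name. Take a backup before trying this on real data.
shrink_lv() {
    vg=$1 lv=$2 mountpoint=$3 size=$4
    umount "$mountpoint" || return 1
    e2fsck -f "/dev/$vg/$lv" || return 1             # forced check before shrinking
    resize2fs "/dev/$vg/$lv" "$size" || return 1     # 1. shrink the filesystem
    lvreduce -L "$size" "/dev/$vg/$lv" || return 1   # 2. shrink the LV
    mount "/dev/$vg/$lv" "$mountpoint"
}
```

For our example the call would be `shrink_lv vgdata lvdata04 /mnt/data04 1228M`; lvreduce asks for a confirmation before touching the LV.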

Wow, we have nearly 3 GB of free space inside, a bit more than one of our PVs. It would be great if we could free one of those, and of course LVM gives you the possibility to do that. Before going further, let's check what happened at the PV level:

Did you notice? 1 GB of space has been freed on the last PV (/dev/loop3) since lvdata04 was shrunk, not counting the space freed on /dev/loop1 and /dev/loop2 after the removal of lvdata03.

Next step: can we remove a PV directly (the command to remove a PV from a VG is vgreduce)?

# vgreduce vgdata /dev/loop0
Physical volume "/dev/loop0" still in use

Of course not: all of our PVs hold the content of our LVs, and we must find a way to move all of the PEs (physical extents) currently held by the PV /dev/loop0 elsewhere within the VG. But wait a minute, victory is not there yet: we do have some free space on /dev/loop0 and we will get more and more free space on it as the displacement process progresses. What is going to happen if, from a concurrent session, we create other LVs in vgdata at the same time the content of /dev/loop0 is being moved? Simple: it can be filled again with the newly allocated PEs.

So before proceeding to the displacement of what /dev/loop0 contains, we must say to LVM: "please don't allocate any more PEs on /dev/loop0". This is achieved via the parameter -x of the command pvchange:

Great news here: the Attr field shows a dash instead of 'a' at the leftmost position, meaning the PV is effectively not allocatable. However, marking a PV as not allocatable does not wipe the existing PEs stored on it. In other words, the data present on the PV remains absolutely intact. Another positive point lies in the remaining capacity of the other PVs composing vgdata: the sum of the free space available on /dev/loop1, /dev/loop2 and /dev/loop3 is 3060 MB (1016 MB + 1020 MB + 1024 MB), which is largely sufficient to hold the 2048 MB (2 GB) currently stored on the PV /dev/loop0.

Now that we have frozen the allocation of PEs on /dev/loop0, we can make LVM move all of the PEs located on this PV onto the other PVs composing the VG vgdata. Again, we don't have to worry about gory details such as where precisely LVM will relocate the PEs currently held by /dev/loop0; our only concern is to get all of them moved out of /dev/loop0. That job gets done by:

We don't have to tell LVM the VG name because it already knows that /dev/loop0 belongs to vgdata and which other PVs belonging to that VG can host the PEs coming from /dev/loop0. It is absolutely normal for the process to take some minutes (real-life cases can take up to several hours, even with SAN disks located on high-end storage hardware, which is much faster than local SATA or even SAS drives).

At the end of the moving process, we can see that the PV /dev/loop0 is totally free:

511 PEs free out of a maximum of 511 PEs, so all of its content has been successfully spread over the other PVs (the volume is also still marked as "unallocatable", this is normal). Now it is ready to be detached from the VG vgdata with the help of vgreduce:

Great! Things are just as simple as that. In their day-to-day reality, system administrators run their show in a very similar manner; as extra precautions, they also perform additional tasks such as taking backups of the data located on the LVs before doing any risky operation, or planning application shutdown periods before starting a manipulation on an LVM volume.
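The whole removal sequence of this section (freeze, move, detach) can be sketched in one place. The helper name evacuate_pv is hypothetical; pvchange, pvmove and vgreduce are the commands discussed above:

```shell
# Empty a PV and detach it from its VG.
# evacuate_pv is a hypothetical helper chaining the three real commands.
evacuate_pv() {
    pv=$1 vg=$2
    pvchange -x n "$pv" || return 1   # forbid new allocations on this PV
    pvmove "$pv" || return 1          # relocate its extents within the VG
    vgreduce "$vg" "$pv"              # detach the now-empty PV
}
```

In our example the call would be `evacuate_pv /dev/loop0 vgdata`. The pvmove step is the long one and its duration depends on how much data has to be relocated.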

Replacing a PV (storage device) by another

The principle is a mix of what has been said in the sections above. It is basically:

Create a new PV

Associate it to the VG

Move the contents of the PV to be removed onto the remaining PVs composing the VG

Remove the PV from the VG and wipe it
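The four steps above can be sketched as one helper. The name replace_pv and its parameters are hypothetical; here the destination PV is given explicitly to pvmove, whereas omitting it lets LVM choose among the remaining PVs of the VG:

```shell
# Replace one PV by another inside a VG.
# replace_pv is a hypothetical helper; the commands it chains are the
# real LVM tools.
replace_pv() {
    vg=$1 old=$2 new=$3
    pvcreate "$new" || return 1          # 1. create the new PV
    vgextend "$vg" "$new" || return 1    # 2. associate it to the VG
    pvmove "$old" "$new" || return 1     # 3. move the extents off the old PV
    vgreduce "$vg" "$old" || return 1    # 4. remove the old PV from the VG...
    pvremove "$old"                      #    ...and wipe its LVM label
}
```

A call would look like `replace_pv vgdata OLD_PV NEW_PV`, with the two device names filled in for your system.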

The strategy in this paragraph is to reuse /dev/loop0 and make it replace /dev/loop2 (both devices are of the same size; however, we could also have used a bigger /dev/loop0).

Here we go! First we need to (re-)create the LVM metadata to make /dev/loop0 usable by LVM: