I have a KVM environment where a shared disk, backed by Ceph RBD, is attached to two VMs. I plan to use OCFS2 on top of it, and I would like to know how that filesystem can be expanded in the future.

LVM2 is not an option:

"As of this writing OCFS2 is not integrated or supported with any volume managers. The procedure to extend an existing OCFS2 repository is done manually from only one Oracle VM server pool member (dom0)."

"It is important to note that creating OCFS2 volumes on logical volumes (LVM) is not supported. This
is due to the fact that logical volumes are not cluster aware and corruption of the OCFS2 file system
may occur."

In other cases, when one of these KVM machines needs more disk space, I can add more disks to it on the fly (e.g. vdb, vdc, vdd), add them to LVM as physical volumes (pvcreate/vgextend), and then simply grow my logical volumes. If this is not possible with OCFS2, what is the solution? (Preferably one where I never have to shut the VM down or unmount the filesystem.)

1 Answer

I don't have a Ceph cluster on hand to play with at the moment, but from memory you can expand an existing RBD volume. Thus, no LVM required. You might have trouble getting the VMs to recognise the larger RBD volume without a reboot, though -- KVM's ability to pass changes in the underlying block device into the VM is spotty, at best (testing will quickly answer that for you, one way or another). The OCFS2 docs are wrong about LVM not being cluster-aware, though -- you just need to use clvm instead of regular local-only LVM. (Oracle being wrong about something? Say it ain't so!)
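To make the order of operations concrete, here is a rough sketch in Python that only builds the command lines and executes nothing. It assumes the image is grown with `rbd resize`, that the guests have to be fully stopped and started to see the new size, and that OCFS2 is then grown from a single node with `tunefs.ocfs2 -S`. The pool, image, domain, and device names are made up for illustration, and the whole thing is untested against a real cluster.

```python
import shlex


def build_resize_plan(pool, image, new_size_mb, domains, guest_device):
    """Return (where, argv) steps for growing a shared RBD-backed OCFS2 volume.

    Nothing is executed here; the plan is only meant to be read or printed.
    All names passed in are hypothetical placeholders.
    """
    # Step 1: grow the RBD image (rbd interprets a bare --size value as megabytes).
    plan = [("ceph-client",
             ["rbd", "resize", "--size", str(new_size_mb), f"{pool}/{image}"])]

    # Step 2: fully stop every guest that has the disk attached, then start it,
    # so a fresh qemu process opens the enlarged image. In practice you would
    # wait for each shutdown to finish before issuing the start.
    for domain in domains:
        plan.append(("kvm-host", ["virsh", "shutdown", domain]))
    for domain in domains:
        plan.append(("kvm-host", ["virsh", "start", domain]))

    # Step 3: grow OCFS2 to the new device size, from one cluster node only.
    plan.append((f"guest:{domains[0]}", ["tunefs.ocfs2", "-S", guest_device]))
    return plan


if __name__ == "__main__":
    steps = build_resize_plan("rbd", "shared", 204800, ["vm1", "vm2"], "/dev/vdb")
    for where, argv in steps:
        print(f"[{where}] {shlex.join(argv)}")
```

If your libvirt and qemu versions handle it, `virsh blockresize` might let you skip the stop/start step, but that is exactly the part worth testing before relying on it.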

Another option, given that you're using RBD and it uses sparse allocation, is that you can just initially make the RBD volume "freaking enormous" (as large as you could ever possibly want), and then add more OSDs to store the extra data as you store more data. This solution comes unstuck if you're regularly deleting files (unless TRIM is supported all the way down the stack, which I'd say is extremely unlikely with OCFS2). If you're only adding files, though, this approach is likely to work quite well.

I'd strongly recommend against this whole "one giant filesystem" approach, though. There are "sweet spots" for filesystem sizes, and going beyond them tends to cause problems. If you ever get filesystem corruption, for instance, trying to recover a multi-petabyte filesystem is likely to take... quite a while. At some point, you're probably better off just using RADOS directly (or through an interface layer like radosgw, using the S3 or Swift protocols) to store your files as individual objects, and skipping the filesystem layer entirely.

Hello, you are right. I tested it yesterday: it is possible to extend the RBD image from outside and then grow OCFS2 to the new size. However, a simple reboot of the VMs is not enough; they have to be completely shut down and started again to recognize the new disk geometry. Still, that's better than putting another layer like CLVM on top.
– pluto73, Aug 12 '15 at 9:06

A reboot of a VM does not stop the qemu process; a stop/start does, and when the process starts it rescans the attached disks.
– dyasny, Aug 12 '15 at 12:41