After we wrote virt-df and later libguestfs, what customers kept asking me for was a way to read /proc and /sys in a running virtual machine.

Of course that’s not possible with libguestfs. libguestfs reads the filesystem. /proc is a synthetic “filesystem” that only exists in the living C structs in the Linux kernel. What’s worse, those structs change with every release and every vendor-specific patch. Following C structs is not easy, although we did it (with the help of a giant database) for virt-mem.

The prize of being able to “read” /proc is great: reading out statistics, process tables, network configuration, and much more information besides.

To do this tractably, what we need is to be able to inject syscalls into the virtual machine. If we could inject the following sequence of syscalls, we’d be able to read /proc in a completely portable manner without needing to chase kernel structs:

fd = open ("/proc", O_RDONLY);
getdents (fd, ...);
read (fd, ...);
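To make the sequence concrete, here is a rough sketch of the same three steps as they would look when issued by an ordinary process inside a Linux guest, written in Python. The helper name `read_dir_via_syscalls` and its `root` and `max_bytes` parameters are my own illustrative assumptions, not part of any tool mentioned here. On Linux, listing a directory descriptor is normally implemented with getdents, and each entry is then opened relative to that descriptor and read.

```python
import os


def read_dir_via_syscalls(root="/proc", max_bytes=4096):
    """Open a directory, list it, and try to read every entry in it.

    Returns a dict mapping entry name to the first max_bytes bytes read,
    or None for entries that cannot be opened or read (e.g. directories).
    """
    results = {}
    # open("/proc", O_RDONLY): get a descriptor for the directory itself.
    dir_fd = os.open(root, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # getdents(fd, ...): enumerate the directory entries.
        for name in sorted(os.listdir(dir_fd)):
            try:
                # Open the entry relative to dir_fd; O_NONBLOCK avoids
                # hanging on entries that would otherwise block.
                fd = os.open(name, os.O_RDONLY | os.O_NONBLOCK, dir_fd=dir_fd)
            except OSError:
                results[name] = None
                continue
            try:
                # read(fd, ...): fetch the start of the entry's contents.
                results[name] = os.read(fd, max_bytes)
            except OSError:
                # Directories (EISDIR) and some special files are unreadable.
                results[name] = None
            finally:
                os.close(fd)
    finally:
        os.close(dir_fd)
    return results
```

Nothing in this sketch depends on kernel internals, which is the whole attraction: the hard part is getting a guest process to issue these calls on our behalf.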

Here is my half-baked idea for how to do it.

Wait for a userspace program to be running. Then pause the VM.

Fork qemu, so we have a complete copy of the VM, its state, memory and so on. The original (parent) process can now be resumed, and hence the VM resumes. The rest of this discussion concerns only the child process.

Disconnect qemu from any outside influence. This means disconnecting any block devices, network devices, and perhaps other devices. This ensures our private copy of the VM can’t accidentally overwrite any state from the real, running VM.

At this point we have a “captured” userspace process in the VM. It doesn’t particularly matter which process we happened to capture. We now set up the stack frame and registers for the system call we want to execute. Any previous contents of the memory and registers can be discarded.

Set the emulation running. (The captured userspace process now runs and performs the syscall).

Trap back into qemu when the syscall exits.

Capture the return value from the syscall, which might be a status code, error or read buffer. In any case, we’ve successfully injected a syscall into the VM and this has allowed us to read something out of /proc.

Discard the qemu child process.
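The fork-and-discard pattern in the steps above can be shown with a toy model. This is only an analogy in Python, assuming a Unix host: a plain dict stands in for the VM state, and `run_in_discarded_copy` is a made-up name, not a qemu interface. The child works on its private copy, sends one answer back through a pipe, and is then thrown away, while the parent's state is untouched.

```python
import os
import pickle


def run_in_discarded_copy(state, mutate_and_query):
    """Fork, let the child work on a private copy of state, return its answer."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: this is the private copy. Changes here never reach the parent.
        status = 1
        try:
            os.close(read_end)
            answer = mutate_and_query(state)
            with os.fdopen(write_end, "wb") as out:
                pickle.dump(answer, out)
            status = 0
        finally:
            # Discard the copy immediately, skipping the parent's cleanup code.
            os._exit(status)
    # Parent: carries on with the original state and only collects the answer.
    os.close(write_end)
    with os.fdopen(read_end, "rb") as inp:
        data = inp.read()
    os.waitpid(pid, 0)
    return pickle.loads(data) if data else None
```

For example, a callback that overwrites the "registers" in the copy and returns a value leaves the caller's dict exactly as it was, which is the property we want from the forked qemu.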

We make the modest assumption that the syscall we chose will run to completion without the kernel scheduling another process. Even if it does schedule, the fact that we have disconnected qemu from any block devices (writes effectively go to /dev/null) should mean at least it won’t damage anything.

Notice that we’re using the public syscall interface to the Linux kernel, not depending on the details of changing internal structures.
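As an illustration of how small that public interface is, here is a sketch of the register contents for one injected syscall. It assumes an x86-64 Linux guest, where the syscall number goes in rax and the arguments go in rdi, rsi, rdx, r10, r8 and r9. Other architectures use different registers and numbers. The function name and the guest address in the usage note are hypothetical.

```python
# x86-64 Linux syscall numbers for the calls used in this post.
SYSCALL_NUMBERS = {"read": 0, "open": 2, "getdents": 78}

# Registers that carry syscall arguments on x86-64 Linux, in order.
ARG_REGISTERS = ("rdi", "rsi", "rdx", "r10", "r8", "r9")


def syscall_registers(name, *args):
    """Return the register values to load before executing `syscall`."""
    if len(args) > len(ARG_REGISTERS):
        raise ValueError("a syscall takes at most six arguments")
    regs = {"rax": SYSCALL_NUMBERS[name]}
    # Pair each argument with its register; unused registers are left out.
    regs.update(zip(ARG_REGISTERS, args))
    return regs
```

Calling `syscall_registers("open", 0x7FFF0000, 0)` describes open with a path pointer at a made-up guest address and flags of 0 (O_RDONLY). The string "/proc" would still have to be written into guest memory at that address, and the result comes back in rax when the syscall returns.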

As ideas go this seems tractable, although the implementation is both technically difficult and probably hard to get upstream. We need a way to trap-and-pause when a VM switches to userspace. We need to be able to fork the VM and do all sorts of modifications on our copy. Then we would need some nice wrappers around this so the user just has to type “virt-ifconfig myvm” (note: previously we implemented virt-ifconfig as part of the virt-mem project by chasing kernel structs).
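Once the bytes are out of the guest, the wrapper part is comparatively easy. Here is a sketch of what a virt-ifconfig style tool might do with the contents of /proc/net/dev, assuming the usual layout of two header lines followed by one "name: counters" line per interface. The function name is my own invention, and a real tool would need more than these four counters.

```python
def parse_net_dev(text):
    """Parse /proc/net/dev text into per-interface byte and packet counters."""
    stats = {}
    # The first two lines are column headers, not interfaces.
    for line in text.splitlines()[2:]:
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or len(fields) < 10:
            continue  # skip blank or malformed lines
        stats[name.strip()] = {
            "rx_bytes": int(fields[0]),
            "rx_packets": int(fields[1]),
            # The transmit counters start after the eight receive columns.
            "tx_bytes": int(fields[8]),
            "tx_packets": int(fields[9]),
        }
    return stats
```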

Or maybe it would be possible to write a driver for getting this kind of information. Like a hardware driver that does something with an interrupt and some memory which makes the guest OS report the status of whatever you want.
I must admit I don’t know enough about the other virtual drivers to know if this is easier. It just seems easier and cleaner to me.

VMware engineers have documented the difficulties of inspecting kernel structures “from the outside” at http://stackframe.blogspot.com/. See especially the references to “getlinuxoffsets”. It sounds like you’ve done something similar for virt-mem.

I’m ignorant of KVM’s support for kernel debugging but I /do/ know that VMware includes a robust GDB stub in their free products. Powerful stuff.

I think the “non-intrusive” approach (perhaps aided by some heuristics tailored for “recent” kernels) is slick, but the method you outlined above certainly sounds more powerful.

About the author

I am Richard W.M. Jones, a computer programmer. I have strong opinions on how we write software, about Reason and the scientific method. Consequently I am an atheist. [To nutcases: Please stop emailing me about this, I'm not interested in your views on it.] By day I work for Red Hat on all things to do with virtualization. I am a "citizen of the world".

My motto is "often wrong". I don't mind being wrong (I'm often wrong), and I don't mind changing my mind.

This blog is not affiliated with or endorsed by Red Hat, and all views are entirely my own.