Posted
by
samzenpus
on Friday October 30, 2015 @04:34AM
from the the-fix-is-in dept.

williamyf writes: Ars Technica, The Register, and other outlets are reporting that today the Xen project patched a vulnerability in its paravirtualized (PV) VMs that allowed a guest to access the control OS of the hypervisor. Qubes researchers wrote: "On the other hand, it is really shocking that such a bug has been lurking in the core of the hypervisor for so many years. In our opinion the Xen project should rethink their coding guidelines and try to come up with practices and perhaps additional mechanisms that would not let similar flaws to plague the hypervisor ever again".

It never really had "relevant" market share. Hypervisors have only been on the market for roughly ten years, and it's not like they have been running nuclear power plants. Its closest relative, the microkernel, has had that kind of share, in QNX and other proprietary products.

It was almost as bad as an "if (true == true)" bug. They should have unit tested this to make sure it failed as expected. Always test your edge cases, and try to test corner cases too. But really, who doesn't test their edge cases? They're the simplest tests to identify.

My biggest complaint about other programmers is that they don't account for the possible ranges of their parameters. Many just want to get the program "working" and nothing more. I don't like undefined behavior; I try my best to design my programs so that all behaviors are well defined. When something goes wrong, I can almost always tell you exactly why it went wrong, or exactly where to look. I hate the whole "get 20 people involved because no one knows what code is causing the issue, and let's spend hours guessing why" routine.

I can offer some thoughts that are obvious, at least to me:
*) Endpoints are sometimes ill defined, and handling of the last legal value and the first illegal value must be correct. Off-by-one bugs fall into this category, and so does testing for zero in floating-point land. These are often found inside a function.
*) Edge cases tend to show up at interfaces.

By some usage of "rarely". They seem pathological to me. In my experience they tend to happen in bursts, or start once a threshold is met. 80% of my job is dealing with these "rare" corner cases that other people never thought about or cared about.

The truth is, nobody uses paravirtualized VMs anymore. EC2, which was the last bastion of PV Xen, stopped using it a couple of years ago and moved entirely to the HVM model. I'm not even sure that the latest Linux kernels are compiled with Xen PV support. If you ever looked at the kernel code for Xen PV support, you know what a mess it was, so good riddance. You need to understand what PV mode means for hypervisors: a kernel must be specifically modified to talk to a hypervisor, so that instead of performing a privileged CPU instruction it calls a hypervisor-provided function. I'm sure there were tons of security issues with that approach, and many still exist. Anyway, the PV model has not been relevant since Intel introduced hardware virtualization on the CPU. PV was introduced to improve the performance of VMs, but it's not needed anymore.

I'm not even sure that the latest Linux kernels are compiled with Xen PV support.

You mean, you're not sure if it defaults to Y? Or whether common distributions enable it when they build the kernel? The answer to the latter, at least, is yes. AFAICT, though, most people still using PV are using KVM. The rest are using containers.

PV and HVM are not mutually exclusive. The basic idea of having a modified, "virtualization-aware" guest OS (a.k.a. PV) is a good one and results in better performance. Hardware virtualization often simplifies the implementation of both the hypervisor and PV on the guest, but it does not obviate the need for PV.
Using both in tandem can result in even greater performance gains.
http://wiki.xen.org/wiki/PV_on... [xen.org]

You probably want to link to PVH, not PVHVM, for the really relevant approach. That said, HVM usually implies, at a minimum, hardware support for nested page tables. The bug in question is only present when using shadow page tables. Even if you're using PV devices in an HVM or PVH VM, you're using the hardware page tables.

The parent post should be considered a NOOP. There's plenty of accurate information available about the different modes and hybrid modes that hypervisors use these days, but reading the above will just make you stupid.

That may be the case for cloud deployments. However, there are other very important areas where PV is being used. For example: Qubes, a security-focused Linux distribution https://www.qubes-os.org/ [qubes-os.org].

In addition, there is actually a full spectrum between PV and HVM: http://wiki.xen.org/wiki/Xen_P... [xen.org]. Very few use straight HVM; generally it is HVM + PV drivers. Linux on Xen ends up using PVHVM. The sweet spot for open-source OSes under Xen is PVH.

Do you understand the function of a hypervisor? Do you understand how tremendously BAD it is if a guest OS can control the hypervisor?

For seven years, Xen virtualization software used by Amazon Web Services and other cloud computing providers has contained a vulnerability that allowed attackers to break out of their confined accounts and access extremely sensitive parts of the underlying operating system.

So, imagine... all these people selling cloud services, making millions and millions of dollars...

Proprietary software: lawful users may never know about critical security exploits. Even if they do, they are at the mercy of the software's owner; if the owner tells you to toss off, you're SOL.

FLOSS software: anybody can discover a bug, notify the maintainer, and have it fixed promptly. Even if the maintainer won't do it, one also has the freedom to make the fix and recompile the source on one's own.

ESR didn't say "given enough eyeballs, no bugs exist." He said they are -shallow-: "the fix will be obvious to someone". That is, you won't spend a month trying to figure out exactly why foo sometimes conflicts with widget; with several people looking at the source (not just the output of the binary), someone will more quickly see why foo conflicts with widget and how to fix it.

It looks like in this case it was about 48 hours or so to characterize the problem, agree on the proper fix, code it, test it, patch the major public clouds, and release it publicly. Guessing that patching the public clouds took 24 hours, that's about 24 hours for understanding the problem, discussing it, fixing it, and testing. Not bad. Here's a quote from CATB with the context of the "bugs are shallow" part:

----... if any serious bug proved intractable. Linus was behaving as though he believed something like this:

8. Given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix obvious to someone.

My original formulation was that every problem ``will be transparent to somebody''. Linus demurred that the person who understands and fixes the problem is not necessarily or even usually the person who first characterizes it. ``Somebody finds the problem,'' he says, ``and somebody else understands it.''

----

It's about bugs not being intractable - they aren't extremely hard to figure out, "the fix will be obvious to someone". That doesn't mean they never existed.

First off, to some other chap who called me a shill (fortunately, down-modded): my disbelief about the many eyes is not new, nor is it related to FOSS versus closed source. Please see my posting history on /.

Have a read of the relevant sections of the oldest, most original CATB you can find. I think you'll see it says the same thing. You see, he was talking about the (then new) troubleshooting process that Linus had implemented.

The solution to the metafile bug didn't require deep meditation for ten years. If you don't know there is a bug, that doesn't mean it's buried deep; it just means you don't know there's a bug.

Of course, to prevent bugs you need educated developers, good testing, etc. That's all true.

To me, a huge value of FOSS is that the vendor doesn't have you by the balls. If you need something fixed or changed, you can hire any of millions of programmers to take care of that for you. It doesn't matter if the vendor has gone out of business, isn't interested, etc. - you're in control of your own systems.

This can be worth millions of dollars to a large business or government agency, because migrating to a different, competing system can cost that much if your current software doesn't meet your needs.

The actual bug is shown in the original article. The author says "It appears the seven-year-old Xen bug is caused by an entanglement of C macros, bit masking, and Intel x86's fiddly page table flags" but fails to explain exactly what's going on (probably because he doesn't understand it himself). Can someone explain what actually happens in this line, and what failure modes caused the check to be bypassed?

The fact that such a simple-looking line could result in such seriously flawed code says a lot.

I dumped Xen for VMware last year and haven't looked back. The deciding factor (not to mention sliding market share, lack of compatible backup products, and weak tech support) was a VM that simply 'disappeared' due to a faulty cleanup process. The faulty process deleted the VM, and support told me to call data restoration... When I asked for the number, he said, "no, I mean your in-house data restoration, your backup administrator"...
VMware has so much of the market that every single virtual product or offering...