Computers, often from a low-level systems perspective. Note that I speak for myself, not my employer.

Monday, July 02, 2007

BluePill detection in two easy steps.

In our HotOS submission, we alluded to a large number of theoretically possible methods of detecting the presence of a hypervisor. I'm not sure which of these many ways the well-publicised BluePill challengers have chosen, but I thought I'd make things a bit more concrete.

In our paper, we outline a family of detection methods based on resource consumption. In short, the hypervisor must use cache, memory bandwidth, TLB entries, etc., in the course of multiplexing the CPU. A guest can be made intentionally sensitive to these resources in order to detect a hypervisor. For instance, you can obtain a good estimate of the effective size of the hardware TLB with pseudo-code like the following:
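What follows is a minimal simulation of the sizing idea, not the original listing. Real code would edit hardware PTEs in ring 0; this sketch assumes a toy machine with a fully associative, strictly LRU TLB, which real TLBs are not. The names `SimMachine` and `apparent_tlb_size` are hypothetical; `BIGNUM` is the page count used in the discussion below. The second pass walks the pages newest-first so that, under the LRU assumption, it starts with translations that are still cached.

```python
from collections import OrderedDict

OLD_PAGE, NEW_PAGE = 0, 1          # two physical page numbers
OLD_BYTE, NEW_BYTE = 0x11, 0x22    # distinct fill patterns for the two pages


class SimMachine:
    """Toy model: one-level page table plus a fully associative LRU TLB."""

    def __init__(self, tlb_entries):
        self.tlb_entries = tlb_entries
        self.tlb = OrderedDict()               # vpn -> cached ppn, oldest first
        self.page_table = {}                   # vpn -> ppn (the "PTEs")
        self.phys = {OLD_PAGE: OLD_BYTE, NEW_PAGE: NEW_BYTE}

    def flush_tlb(self):
        self.tlb.clear()

    def write_pte(self, vpn, ppn):
        # Deliberately no invalidation: a cached translation stays stale.
        self.page_table[vpn] = ppn

    def read(self, vpn):
        if vpn in self.tlb:                    # hit: use the cached translation
            self.tlb.move_to_end(vpn)
        else:                                  # miss: walk the page table
            if len(self.tlb) >= self.tlb_entries:
                self.tlb.popitem(last=False)   # evict the least recently used
            self.tlb[vpn] = self.page_table[vpn]
        return self.phys[self.tlb[vpn]]


def apparent_tlb_size(machine, bignum):
    machine.flush_tlb()
    # Pass 1: map every virtual page to the old page and touch it.
    for vpn in range(bignum):
        machine.write_pte(vpn, OLD_PAGE)
        machine.read(vpn)
    # Pass 2: remap to the new page, newest first, and count stale reads.
    stale = 0
    for vpn in reversed(range(bignum)):
        machine.write_pte(vpn, NEW_PAGE)
        if machine.read(vpn) == NEW_BYTE:      # translation was re-walked
            break
        stale += 1
    return stale


if __name__ == "__main__":
    BIGNUM = 512
    print(apparent_tlb_size(SimMachine(tlb_entries=64), BIGNUM))  # prints 64
```

A read that still returns `0x11` after the PTE was rewritten can only have come from a cached translation, so the count of such reads bounds the number of entries the guest actually had available.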

OK, so you can size the TLB. So what? Well, just run the above loop twice, but the second time, stick an instruction that's guaranteed to trap out to the hypervisor (e.g., CPUID) before the loop. On bare hardware, this innocuous instruction will have no impact on the TLBs. In the presence of a hypervisor, the instruction will either flush the TLB entirely (on current Intel VT implementations), or, if you're on a clever SVM hypervisor that uses the ASID facility to avoid TLB flushes on every hypervisor invocation, cost the guest at least one TLB entry. (Every hypervisor must at least read the VMCB's exit code, requiring 1 data TLB entry. For increased sensitivity, you could also detect the VMM's code footprint by rejiggering the above to measure ITLB entries, an exercise left to the reader. Hint: the return instruction is a single byte on x86.)
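The comparison can be sketched as a simulation under loudly stated assumptions: a toy CPU with a strictly LRU TLB, physical pages identified only by their fill byte, and two made-up hypervisor behaviours, `"flush"` (the whole TLB is lost on an exit) and `"asid"` (the exit handler touches one data page that competes for TLB capacity). `SimCPU`, `stale_reads`, and `hypervisor_suspected` are hypothetical names, and the sketch assumes the trapping instruction sits between the pass that loads the TLB and the pass that probes it.

```python
from collections import OrderedDict

OLD, NEW = 0x11, 0x22   # physical pages, identified here by their fill byte


class SimCPU:
    """Toy CPU with an LRU TLB; `hypervisor` is None, "flush", or "asid"."""

    def __init__(self, tlb_entries, hypervisor=None):
        self.tlb_entries = tlb_entries
        self.hypervisor = hypervisor
        self.tlb = OrderedDict()      # (owner, vpn) -> page cached at fill time
        self.page_table = {}          # guest vpn -> page

    def _fill(self, key, page):
        if key in self.tlb:                    # hit: keep the cached mapping
            self.tlb.move_to_end(key)
            return
        if len(self.tlb) >= self.tlb_entries:
            self.tlb.popitem(last=False)       # evict the least recently used
        self.tlb[key] = page

    def read(self, vpn):
        key = ("guest", vpn)
        self._fill(key, self.page_table[vpn])
        return self.tlb[key]                   # stale if the PTE changed since

    def cpuid(self):
        """Stand-in for an instruction that always exits to the hypervisor."""
        if self.hypervisor == "flush":
            self.tlb.clear()                   # every guest translation is lost
        elif self.hypervisor == "asid":
            self._fill(("hv", 0), None)        # handler touches one data page
        # On bare hardware the TLB is untouched.


def stale_reads(cpu, bignum, trap):
    cpu.tlb.clear()
    for vpn in range(bignum):                  # pass 1: load translations
        cpu.page_table[vpn] = OLD
        cpu.read(vpn)
    if trap:
        cpu.cpuid()                            # the only difference between runs
    stale = 0
    for vpn in reversed(range(bignum)):        # pass 2: remap, count stale hits
        cpu.page_table[vpn] = NEW
        if cpu.read(vpn) == NEW:
            break
        stale += 1
    return stale


def hypervisor_suspected(cpu, bignum=512):
    with_trap = stale_reads(cpu, bignum, trap=True)
    without_trap = stale_reads(cpu, bignum, trap=False)
    return with_trap < without_trap
```

In this model a 64-entry TLB yields 64 stale reads without the trap, 0 with it under `"flush"`, and 63 under `"asid"`, so either hypervisor behaviour makes the two runs disagree while bare hardware makes them match.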

This detection technique uses only TLB capacity, but a more sophisticated approach could easily include the hardware eviction policy. We could also apply this same technique to instruction and data caches. (Even the x86's coherent data caches can be sized using the INVD instruction, which discards cache contents without writing back dirty data.) A less naive approach that uses the hardware's event-based performance monitoring support (RDPMC and friends) would have an even richer menu of options for noticing virtualization-induced perturbations.

I've seen zero evidence that Rutkowska has pondered resource-based detection attempts like this, or indeed, any attacks more sophisticated than a "go-slow" loop between reads of the PIT. It is hard for me to imagine a "hypervisor" worthy of the name that doesn't leave noticeable traces in resource usage. In fact, to the degree that a hypervisor goes to heroic lengths to prevent such a detection technique, e.g., by running a hardware-accurate cache simulator on every guest memory access, it will only open up wider timing discrepancies for the guest's HV-detector to exploit.

I can only conclude that in 2006 Rutkowska was either naive about the possibilities of such attacks, or that she consciously chose to make an outrageous and indefensible claim ("undetectable malware!!!!111") in order to draw attention to herself and her company. Given the peripheral evidence of Rutkowska's competence, I think the evidence favors the latter, but I'd simply love to hear from her on the subject...

12 Comments:

Nice description of this method. Yes, it's one we came up with but luckily not the only one. You could probably see the foreshadowing based on Peter Ferrie's focus last year. I noted a quick reply here.

Hopefully this friendly competition will result in a whole catalog of ways to detect hypervisor rootkits.

Wow, thanks for bringing that to my attention. Their discussion of my technique starts on slide 89. While they strongly imply it, they never come out and say that my technique is incomplete, because, as far as I can tell, they realize that it isn't.

Their argument appears to hinge on this hand-wave on slide 93: "We need to allocate 512 4k-pages at quasi-fixed virtual addresses – this is tricky!"

Uhh. No, it isn't. In fact, any contiguous 2MB chunk of virtual memory (512 pages of 4KB each) will do it. They go on to make it sound as though the attack is conceptually broken just because the pseudocode I present was simplified. Sure, the hardware may have a few TLB entries that go undetected by my loop, but the loop will *always* show fewer TLB entries when a hypervisor is present. They admit to this after a few more bullet points, and then fall back to a claim that any attempt to modify lots of consecutive PTEs will trigger their magical "detector detector," which they call "Blue Chicken."

"Blue chicken" is a layer of software that attempts to detect hypervisor detectors. Confused? Good. That's what the blue-pillista snake oil salesmen are counting on. They're perfectly aware that detecting any VMM detector is easily reduced to the halting problem. So, they've admitted that their original claim ("undetectable malware!") is false. What they now have is a much, much harder to engineer malware environment, that might be somewhat harder to detect.

Good point. Actually, by creating Blue Chicken they are admitting that Blue Pill is detectable. Any rootkit could have a built-in detector of detectors, and leave a much smaller footprint than a VMM, in my opinion.

Although, if most of the Blue Pill detectors that come out are open source, there is nothing stopping the attacker from modifying Blue Chicken to detect the new methods of detection. It can go on and on... After all, in the end, it appears nothing will be completely undetectable.

Right. As Tal likes to say, "it degenerates into core wars." Another aspect of the "blue chicken" is that it introduces an opportunity for the system under attack to insert its own prophylactic hypervisor. Since the VT/SVM hardware doesn't nest unless you carefully construct your hypervisor to allow it, this is pretty much foolproof. So: 1. Run your detector. If it says there's a hypervisor there, and you didn't load it, something's amiss, bluescreen. 2. Your hypervisor detector didn't detect anything. Great! Either nothing's there, or blue pill is in "blue chicken" mode. Either way, load a prophylactic hypervisor. End of blue pill's reign of terror.
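The two-step procedure above can be written down as a small decision function. This is only a sketch: `Action` and `respond` are hypothetical names, and the `CONTINUE` branch (a positive result explained by a hypervisor we loaded ourselves) is an assumption added to make the function total, not something the steps spell out.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"                      # our own hypervisor is in place
    HALT = "halt"                              # the "bluescreen" in step 1
    LOAD_PROPHYLACTIC = "load prophylactic hypervisor"


def respond(detector_fired, our_hypervisor_loaded=False):
    """Pick the defensive action from the two-step procedure."""
    if our_hypervisor_loaded:
        # Assumption: a hypervisor we loaded explains any positive result.
        return Action.CONTINUE
    if detector_fired:
        # Step 1: something we did not load is running underneath us.
        return Action.HALT
    # Step 2: nothing is there, or the rootkit unloaded itself to hide.
    return Action.LOAD_PROPHYLACTIC
```

For example, `respond(detector_fired=False)` returns `Action.LOAD_PROPHYLACTIC`, which covers both the clean machine and a rootkit that has temporarily stepped aside.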

I prototyped it before we wrote it up for the paper, and it seemed to work (it provided different answers for native hardware, the software-only VMM, and the hardware-assisted VMM). However, the code is not ready for standalone use; it relies on a library we link against for such bare-metal tests, and we haven't released the library. Honestly, there's very little of value beyond the pseudocode I've listed above; you could make a Linux kernel driver that does what that code does with interrupts disabled. A few hours with the Intel manuals will make it obvious how to turn it into compilable C.

I have read the pseudo-code in this article, and I think that in the 2nd loop, i should go descending rather than ascending. After the 1st loop, since BIGNUM is big enough, the pages at the front are no longer in the TLB, while the ones at the end are.

There is a lot of talk about detecting this particular hypervisor rootkit. In the reading I've done, though, I've noticed there is very little on persistence.

Why not just reboot the system?

At this point I would like to note that I am pretty much a newb to this level of rootkit detection. I'm decent at looking at the kernel level, but I see a very dark horizon if the hypervisor-level rootkit is completed in a way that is stable and persistent. If it is then able to build an anti-anti-rootkit method that blocks detection, it wouldn't be something to sneeze at. I'm curious about this and look forward to learning more.

First off, I'm not an expert or anything in this area of computing... but couldn't the hypervisor just copy the TLB and return a spoofed amount when it's called again? And then spoof the amount of TLB space left to prevent any buffer overflowing... maybe track all the variables that read it to prevent detection of a change in the TLB. It might take some research, but for any way to detect it there should be a way to prevent detection.