Thursday, June 28, 2007

We're ready for Ptacek's challenge!

Thomas Ptacek and company have just come up with this funny challenge to test our Blue Pill rootkit. Needless to say, the Invisible Things Lab team is ready to take up their challenge, albeit with some additional requirements that would ensure the fairness of the contest.

First, we believe that 2 machines are definitely not enough, because the chance of a correct guess using a completely random (read: unreliable) detection method is 50%. Thus we think that a reasonable number is 5 machines. Each of them can be in state 0 or 1 (0 = not infected, 1 = infected). On each of these machines we install two files: bluepill.exe and bluepill.sys.

The .sys file is digitally signed, so it loads without any problem. (We could use one of our methods for loading unsigned code on Vista, which we are planning to demonstrate at BH, but that is not part of the challenge, so we will use the official way.)

bluepill.exe takes one argument, which is either 0 or 1. If it is 1, it loads the driver and infects the machine. If it is 0, it also loads the driver, but the driver does not infect the machine.

So, on each of the 5 machines we run bluepill.exe with a randomly chosen argument (0 or 1). We make sure that at least one machine is infected and at least one machine is not.
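The random assignment described above can be sketched as rejection sampling: draw each machine's state uniformly and redraw until both states appear. This is a minimal illustration assuming the judges use a uniform draw; the function name `draw_states` is hypothetical and not part of any actual contest tooling.

```python
import random

def draw_states(n_machines=5, rng=None):
    """Hypothetical helper: return a list of 0/1 states (1 = infected)
    that contains at least one 0 and at least one 1.

    Redrawing until both states appear (rejection sampling) yields a
    uniform choice among the 2**n_machines - 2 valid configurations.
    """
    if n_machines < 2:
        raise ValueError("need at least 2 machines to have both states")
    if rng is None:
        rng = random.Random()
    while True:
        states = [rng.randint(0, 1) for _ in range(n_machines)]
        if 0 in states and 1 in states:
            return states

if __name__ == "__main__":
    # Each value would be the argument passed to bluepill.exe on one machine.
    for machine, state in enumerate(draw_states(), start=1):
        print(f"machine {machine}: bluepill.exe {state}")
```

Rejecting the all-clean and all-infected draws, rather than forcing one machine's state, keeps every valid configuration equally likely, which is what the blind-guess figure relies on.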

After that, the detection team runs their detector.exe executable on each machine. This program cannot take any arguments and must return exactly one value: 0 or 1. It must act autonomously -- no human assistance when interpreting the results.

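The goal of the detection team is to correctly mark each machine as either infected (1) or not infected (0). Because the all-clean and all-infected configurations are ruled out, 2^5 - 2 = 30 possible configurations remain, so the chance of a correct blind guess is:

1/(2^5 - 2) = 1/30 ≈ 3%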
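As a quick check of this arithmetic, and of the 50% figure for 2 machines, the sketch below enumerates all 0/1 configurations, keeps only those with at least one infected and one clean machine, and returns the chance that a single random guess among them is exactly right (assuming the true configuration is drawn uniformly). The function name is illustrative only.

```python
from fractions import Fraction
from itertools import product

def blind_guess_probability(n_machines):
    """Chance that one uniformly random guess matches the true states,
    given at least one infected and at least one clean machine."""
    valid = [cfg for cfg in product((0, 1), repeat=n_machines)
             if 0 in cfg and 1 in cfg]
    return Fraction(1, len(valid))

for n in (2, 5):
    p = blind_guess_probability(n)
    print(f"{n} machines: {p} ~ {float(p):.1%}")
```

With 2 machines only two valid configurations exist (one infected, one clean, in either order), which is where the 50% comes from; with 5 machines the figure drops to 1/30, about 3.3%.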

The detector cannot cause a system crash or halt the machine; if it does, the detection team loses. The detector cannot consume a significant amount of CPU time (say > 90%) for more than, say, 1 second. If it does, it is considered disruptive to the user and thus impractical.

The source code of our rootkit, as well as of the detector, should be provided to the judges at the beginning of the contest. The judges will compile the rootkit and the detector and copy the resulting binaries to all test machines.

After the contest, regardless of who wins, the sources for both the rootkit and the detector will be published on the Internet, for educational purposes, to allow others to research this subject.

Our current Blue Pill has been in development for only about 2 months (please note that we do not have the rights to use the previous version developed for COSEINC), and it is more of a prototype, intended primarily for our training in Vegas, rather than a "commercial grade rootkit". Obviously we will be discussing all the limitations of this prototype during our training. We believe that we would need about 6 months of full-time work by 2 people to turn it into the kind of commercial-grade creature that would win the contest described above. We're ready to do this, but we expect somebody to compensate us for the time spent on this work. We would expect an industry-standard fee, which we estimate at $200 USD per hour per person.
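For a rough sense of scale, the estimate above multiplies out as follows. The figure of 160 working hours per month is an assumption (roughly 40 hours a week; the post only says "full-time"), so treat the total as an order-of-magnitude illustration, not a quoted price.

```python
PEOPLE = 2
MONTHS = 6
HOURS_PER_MONTH = 160  # assumption: ~40 h/week full-time; not stated in the post
RATE_USD = 200         # per hour per person, as stated above

person_months = PEOPLE * MONTHS
total_hours = person_months * HOURS_PER_MONTH
total_usd = total_hours * RATE_USD
print(f"{person_months} person-months, {total_hours} hours, ${total_usd:,}")
```

Under that assumption this comes to 12 person-months, 1,920 hours and $384,000; a different hours-per-month figure scales the total proportionally.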

If Thomas Ptacek and his colleagues are so certain that they have found a panacea for virtualization-based malware, then I'm sure they will be able to find sponsors willing to financially support this challenge.

So Joanna, if you think it'll take 12 man (or woman) months to get to the point of winning such a contest, is it safe to say you don't think it's going to happen? I suppose it's definitely not going to be ready in time for BH.

The reactions from Matasano Chargen are pathetic (http://www.matasano.com/log/897/joannas-shocking-confession-there-exists-some-amount-of-money-for-which-i-would-agree-to-see-bluepill-detected-by-lawson-ferrie-dai-zovi-and-ptacek/).

It's as if they are trying to promote themselves using your name. It doesn't feel like we're in a movie; it feels like we're in a stupid tabloid.

Joanna, I suggest you demand fewer systems, not more. If they are so confident that they can detect the presence of the rootkit, let them do so in a vacuum. That is, only one machine will be used. At the beginning of the contest, you flip a coin: heads, install the rootkit; tails, drink coffee for an hour. At the end of the hour, they run their code and declare the machine infected or rootkit-free. My reasoning is this: very few people in the "real world" have access to two identical systems during a rootkit infection crisis to compare behavior. If they insist on more than one system, then make certain that the systems have different processors, varying memory speeds, and varying disk drives, so that no system-to-system comparisons can reasonably be made.

Well, KTMM, it sounds like they have a detector that they think will work against a rootkit which Joanna previously said was 100% undetectable. They're certainly enjoying the fact that this is no longer being claimed, but this industry isn't known for polite handshakes and pleasantly amicable discussions over cups of tea.

We're not comparing between the two machines. If Joanna wants to use 1 machine, that's fine. If she wants to use 5, that's fine. The number of machines in the challenge needn't have anything to do with its statistical validity.

Well, Joanna, you are setting the conditions for the contest... But all of your (and your team's, of course) additional requirements are one-sided, don't you think? For "fair play", I suggest establishing some prize for Ptacek's team in case they successfully detect Blue Pill (they will also be doing some work, won't they?). Are you ready for a fair contest with equivalent requirements and benefits, Joanna?

To be honest, even if they do detect it, the test environment is much friendlier than the real world. Thus their techniques must be based on something "solid", as opposed to timing and related methods, because those would be worthless outside the lab, and they might as well lose. I have spoken with a source who knows their stuff: this is the real deal. They said that, once released, the source code will almost certainly cause significant security problems. Nice job on the *ware, and thanks for the inspiration.

I think that if I had a threat like that on my PC, I would not mind my machine occasionally crashing during the checking process, or the detection tool using all of the processor, as long as the tool detects it. A fair price to pay for removing a critter like that!

The requirement of not consuming 90% of the CPU for more than 1 second, because it would be disturbing to the user, isn't consistent with the resource demands of any traditional anti-malware tool (or of general applications, for that matter). I think what you really mean is that it would be disturbing to Blue Pill ;-)

Joanna, I also agree with the post above about the CPU usage and time constraints; the limit is really tight. If the detector searches for Blue Pill on an on-demand basis, I think the user shouldn't be disturbed; after all, he or she requested the scan, so, say, 10 seconds shouldn't matter at all. I do agree, however, that if the scanning is always active, it shouldn't waste almost all of the CPU time just on testing.

devnull, remember that you are not supposed to know beforehand that you are infected, so crashing the system to perform this task is not acceptable in any way (even if it crashes due to a vulnerability in Blue Pill, because there could be another, non-malware driver with the same or similar buggy code).

I know this is a late comment, so it might not even be seen, but wouldn't it also be fair to have at least 2 of those machines (or a certain percentage of the runs, if you use just one machine) sometimes run a normal, legitimate VMM, like VPC or VMware (preferably a less popular one as well, disclosed only on the day of the contest)?

I don't know whether Vista has the VMM bit enabled by default (I believe Windows 2003 R2 does), but you would still want another kind of hypervisor in there to add some challenge, no?

That way, the detection tool would not merely have to detect a VMM, which would likely not be much of a challenge, especially if they provide controlled hardware, but would really have to detect Blue Pill in particular.

Maybe I am missing something obvious and all this isn't really adding to the challenge; I usually do =)

I would also require 2 runs (of 5 machines each), or 10 runs with one machine, without a single false detection. 3% is still quite likely; people do win the lottery =)
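To put numbers on this suggestion, the sketch below computes the chance of passing by pure guessing under each scheme. It assumes the runs are independent and, for the single-machine variant, that each run is a fair coin flip with no "at least one of each" constraint; both are assumptions, not rules stated in the challenge.

```python
from fractions import Fraction

p_five = Fraction(1, 2**5 - 2)       # one 5-machine run, as in the original post
p_two_runs = p_five ** 2             # assumption: the two runs are independent
p_ten_single = Fraction(1, 2) ** 10  # assumption: each 1-machine run is a fair coin flip

for label, p in [("1 run of 5", p_five),
                 ("2 runs of 5", p_two_runs),
                 ("10 runs of 1", p_ten_single)]:
    print(f"{label}: {p} ~ {float(p):.3%}")
```

Either variant pushes the blind-guess chance from about 3.3% down to roughly 0.1% (1/900 and 1/1024 respectively).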

Also, even though Thomas added that 1 or 5 machines doesn't matter, and given that he is on the detecting side and not the cheating side (thus assuming he is the more honest one), it might not matter. But if you wanted to cheat, having the machines networked and identical surely adds possibilities =)

Sure, it doesn't have anything to do with the statistical validity, but it surely adds to the statistical/heuristic methods available to the detector =).