Several days ago, when clicking on one of the news links, a new IE window opened, showing the news article, and ultimately, downloading a trojan. Someone must have taken action, because as of this writing, the trojan is not downloaded anymore. And just to be clear: the trojan was not hosted on or linked to from the findmadeleine.com site.

It’s a known tactic of scammers to exploit the curiosity of the general public whenever there’s an important news event. I don’t think I can do something to help find Madeleine, but I’ll keep an eye on the news section to try to stop these scammers.

Monday 21 May 2007

In my previous post about steganography and rainbow tables, I explained a technique to hide data in a rainbow table. The disadvantage of this method is that there is a way, albeit costly, to detect the hidden data. This is because we replace the random bytes, that makeup the start of the chain, by the data we want to hide, thereby breaking the chain. A broken chain can be detected by recalculating the chain and comparing the recalculated hash with the stored hash. If they differ, the chain is broken.

But if we know that we are breaking chains, why don’t we fix them? We can proceed as follows:

replace the start of the chain (random bytes) with the data we want to hide

recalculate the chain

replace the hash of the chain with the new hash we calculated

This way, there are no more broken chains that give away our hidden secret. But now there is another telltale sign that the rainbow table has been modified to hide data: the hashes aren’t sorted anymore. Remember that a rainbow table has to be sorted (the sort key is the index of the hash) to be useful. It is very unlikely that our new hash is greater (or equal) than it’s predecessor and smaller (or equal) than its successor. Detecting an unsorted rainbow table is much easier than finding broken chains.

OK, so if the new rainbow table is unsorted, why don’t we just sort it again? Well, if we resort the rainbow table, we destroy the order in which we stored our hidden data, so we loose the hidden data itself.

You could keep the original order of the hidden data by creating an index, this is another file that indexes the chains with hidden data. For example, you could make a list of all the hashes with hidden data. This list will then allow you to retrieve all chains with hidden data in the correct order. And the fact that you have such a list of chains isn’t necessarily suspicious, it’s just a list of hashes you want to crack…

But there is a simple way out of the unsorted rainbow table problem. Rainbow tables generated with the rtgen program are unsorted. In fact, you have to sort them with the rtsort command after generating them, before they can be used by the rtcrack program. The solution is to adapt the rtgen program to generate a rainbow table with hidden data, and keep this unsorted rainbow table.

The arguments are a file handle to the file with data we want to hide, and the number of bytes per chain we use to hide data. We call the InjectHiddenData method in the rtgen program just after having generated random data (cwc.GenerateRandomIndex();, line 206 of file RainbowTableGenerate.cpp).

Our modified rtgen program allows us to generate an unsorted rainbow table with hidden data. The only way to detect this hidden data is with statistical analysis, provided that the hidden data doesn’t appear random. There are no broken chains that indicate hidden data, unlike with the previous method.
The disadvantage of this method is that you’ll have to generate a new rainbow table to hide your data, which is a lengthy process.

To extract the data file, use the same program as for the previous method, rtreveal

If you don’t feel comfortable using an unsorted rainbow table to hide data, I have probably two extra techniques for you.
One technique creates a sorted rainbow table without broken chains and it is fast. The disadvantage is that it stores much less hidden data. But you’ll have to wait a bit before I publish this technique. I’ve submitted an article about this steganographic technique to 2600 Magazine, and I can only release it after it gets published or refused.

The other technique also creates a sorted rainbow table without broken chains and it is fast, but I still have to work on it. It works, but it might be detectable. I’ll publish it when I’ve finished working on it.

Some persons have commented that I didn’t discount the click fraud factor. The reason why I didn’t is that the motivation of the persons who clicked on my ad doesn’t matter at all. If it’s a person clicking on a “malicious” ad to commit click fraud, the result is the same: the cybercriminal gets a shot at trying to infect his machine.

And if it’s a program instead of a person doing the click fraud, the result is also the same if it’s a Windows program using the MS IE ActiveX component. I’m waiting for feedback to try to quantify the amount of non-Windows automated click-fraud that could have impacted my Google Adwords campaign. I’ll post an update when I get said feedback.

Monday 7 May 2007

How do I know? Because I’ve been running this Google Adwords campaign for 6 months now.

Last fall, my attention got caught by a small book on Google Adwords at our local library. Turns out it’s very easy to setup an ad and manage the budget. You can start with a couple of euros per month. And that gave me an idea: this can be used with malicious intend. It’s a way to get a drive-by download site on the first page of a search result (FYI, I’ve reported on other ways to achieve this). So I started an experiment…

I bought the drive-by-download.info domain. .info domains are notorious for malware hosting.

I setup a web server to display a simple page saying “Thank you for your visit!” and to log each request. That’s all. I want to be absolutely clear about this: no malware or other scripts/code were ever hosted on this server. No PCs were harmed in this experiment.

I started a Google Adwords campaign with several combinations of the words “drive by download” and the aforementioned ad, linking to drive-by-download.info

I was patient for 6 months

During this period, my ad was displayed 259,723 times and clicked on 409 times. That’s a click-through-rate of 0.16%. My Google Adwords campaign cost me only €17 ($23). That’s €0.04 ($0.06) per click or per potentially compromised machine. 98% of the machines ran Windows (according to the User Agent string).

In a previous post on spamdexing , I reported 6,988 click-throughs to malicious websites over a 3 month period. That’s 2,329 click-throughs per month, compared to my 68 click-throughs per month. The Spamdexing “R” Us operation was much more successful than my little experiment, but at a greater cost (they ran a bunch of dedicated web servers). I’m sure I could get much more traffic with a higher Google Adwords budget and a better designed ad.

This is how my ad looks on a search result page:

I designed my ad to make it suspect, but even then it was accepted by Google without problem and I got no complaints to date. And many users clicked on it. Now you may think that they were all stupid Windows users, but there is no way to know what motivated them to click on my ad. I did not submit them to an IQ-test 😉

Recently there have been several stories in the press pointing out that this technique is used “in the wild”. That’s why I’m publishing my results now, but my experiment is still running. Of course, the nature of the experiment has changed now that I have revealed it, but it could still turn out to be interesting.

You can find a video of Google showing my ad here hosted on YouTube, and you can find a hires version (XviD) here. Not the best quality, but I wanted to show off my new Nokia N800.