Krebs on Security

In-depth security news and investigation

New Clues Draw Stronger Chinese Ties to ‘Aurora’ Attacks

A leading security researcher today published perhaps the best evidence yet showing a link between Chinese hackers and the sophisticated cyber intrusions at Google, Adobe and a slew of other top U.S. corporations late last year.

In mid-December, Google discovered that its networks had been breached by attackers who appeared to be coming from China. A Wall Street Journal article cited researchers saying the attacks, dubbed Operation Aurora, were launched from six Internet addresses in Taiwan, which experts say is a common staging ground for Chinese espionage.

While Google itself has said that the attacks “originated in China,” experts have been quick to point out that attackers commonly route their communications through faraway computers, and that the real attackers may be located anywhere in the world. But new clues about the origins of the malicious software that was used to exploit the as-yet unpatched Internet Explorer vulnerability suggest that the exploit was in fact assembled by Chinese programmers.

The evidence comes from forensic work published today by Joe Stewart, director of malware research for Atlanta-based managed security firm SecureWorks. Stewart said he found that a snippet of the source code used in the backdoor Trojan horse program planted by the exploit (called “Hydraq” by various anti-virus companies) matched a source code sample that was detailed in a Chinese-language white paper on mathematical algorithms used in electronics.

Stewart said a Google search for one of the key text strings in that code sample shows that it is virtually unknown outside of China, and that almost every page with meaningful content concerning the algorithm is written in Chinese.

Stewart deduces that the Aurora code base originated with someone who is comfortable reading simplified Chinese.

“Although source code itself is not restrained by any particular human language or nationality, most programmers [tend] to reuse code documented in their native language. To do otherwise is to invite bugs and other unexpected problems that might arise from misunderstanding of the source code’s purpose and implementation as given by the code comments or documentation.”

He concludes that the use of this unique programming implementation in Hydraq “is evidence that someone from within the [People’s Republic of China] authored the Aurora codebase. And certainly, considering the scope, choice of targets and the overwhelming boldness of the attacks (in light of the harsh penalties we have seen handed out in communist China for other computer intrusion offenses), this creates speculation around whether the attacks could be state-sponsored.”

Ironically, if indeed the code was developed by Chinese hacking groups or the Chinese government and intended for use as a weapon against American companies, Chinese Windows users may have the most to lose from the public exploitation of this vulnerability.

That’s because massive numbers of Internet users in China still use Internet Explorer 6, the version of IE most at risk from this flaw. According to current figures gathered by StatCounter, nearly 60 percent of computer users in China browse the Web with IE6 (see chart below). By comparison, StatCounter states that only about six percent of U.S.-based Internet users still browse the Web with IE6.

This becomes even more significant when you consider that the Aurora exploit is now showing up on hugely popular Chinese Web sites, said Gary Warner, director of research in computer forensics at the University of Alabama, Birmingham. Warner shared evidence with krebsonsecurity.com that one of China’s most-visited anime sites was recently hacked and seeded with the Aurora exploit, serving those who visited with IE6 a Trojan that dropped at least 32 different malicious programs, including password stealers and tools used to enlist infected PCs in coordinated, distributed cyber attacks.

“Tens of thousands of people got hit by this, and the malware that got installed was just incredible,” Warner said. “There is just a lot of active exploitation going on in the Chinese market right now, and part of that is because there’s a much larger use of IE6 there than there is over in the United States.”

Microsoft said today that it plans to issue an emergency update on Thursday to address the Internet Explorer vulnerability. Krebsonsecurity.com will have more details on that update shortly after it is released, probably around 1 p.m. or 2 p.m. ET.

What fraction of computers are regularly updated with patches like what we will soon get for this one? I am guessing that it is actually a fairly small number, but I don’t have any evidence to back this statement up.

…Joe Stewart’s comment….”…Corporate and state secrets both have been shanghaied over a period of five or more years…” is so deliciously appropriate. Any readers unfamiliar with that term’s use can “Google” it…no humor intended. The Chinese have a wicked sense of humor, and some mainlanders earnestly reading his blog must be smiling very broadly.

CORRECTION — Asian Characters are authorized in the Windows program itself and when I tried to add Asian Characters to my Windows XP character sets, I was instructed to place the Windows CD into my drive BUT OH NO — it couldn’t find the necessary file.

With Firefox this isn’t necessary, provided you can generate the characters by ‘other means.’

Brucerealtor, I’m not quite sure what you mean by «generat[ing] the characters by ‘other means’»; my experience is that, generally speaking, CJK glyphs are enabled at the OS level, which usually works well on XP but, depending upon how the OS was configured, can fail. I remember trying to help a user who had purchased her version of the OS in Switzerland, with the default language set to German. Despite all our best efforts, we found ourselves unable to reset it to so similar a language as Swedish! She was informed by the local offices of both the OEM and Microsoft that her only alternative was to purchase a new Swedish XP license and re-install. Needless to say, she chose to stay with her German-language version….

Good to see, Brian, that you recognise the significance of the disparity between browser-version market share in China on the one hand, and most of the rest of the world on the other! My interpretation of this is that, from the evidence that has hitherto been made public, one cannot conclude that Google was specifically targeted by Aurora; employees may instead have inadvertently downloaded a Trojan when visiting other sites, like the animé site mentioned by Gary Warner in your blog post above. I hope you will continue to pursue this matter further, so that it can be lifted from the realm of speculation to that of more substantial evidence….

Stewart may very well be right, but I’d like to clarify something. It’s a little technical, but bear with me. Stewart did not have access to the source code. He had the binaries. So this is what he did:

1. Disassembled binary to get the assembly code (the stuff that looks like “MOVZX ECX, BYTE PTR …”).

2. Decompiled the assembly code to reconstruct one possible set of source code (C language) programming statements that would compile down to that assembly code.

3. Searched the internet and “was able to locate one [and presumably only one] instance of source code that fully matched the structural code implementation in Hydraq and also produced the same output when given the same input”

That one instance was in the Chinese white paper and used the variable name “crc_ta” for the constants table. Note that when you decompile assembly code, you generally don’t get the variable names. You could change the name from “crc_ta” to “happy_fun_table” in the C source code, recompile it, and you’d get the exact same assembly code.

4. Searched Google for “crc_ta[16]” and only found Chinese hits.

It’s this last step, 4, that’s unconvincing. Since the variable name “crc_ta” is not in the assembly code, and only comes from that Chinese white paper, it’s not surprising that it’s only coming up on other Chinese pages. [It’s possible that the crc_ta name was in a debugging symbol table in the binary, but if so, you’d think Stewart would have mentioned it.]

Now I haven’t done a careful analysis, but the strength of his argument rests on step 3. Step 4 adds nothing, in my opinion.
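The renaming point can be illustrated with a minimal sketch. It uses Python bytecode as a stand-in for x86 assembly, which is only an analogy, and a generic nibble-at-a-time CRC-16 with the 0x1021 polynomial, which is an assumption made for illustration (the full Hydraq routine is not reproduced in this post). The function names are hypothetical.

```python
def crc_named_like_paper(data):
    # 16-entry table for polynomial 0x1021 (an assumed, illustrative choice).
    crc_ta = (0x0000, 0x1021, 0x2042, 0x3063, 0x4084, 0x50A5, 0x60C6, 0x70E7,
              0x8108, 0x9129, 0xA14A, 0xB16B, 0xC18C, 0xD1AD, 0xE1CE, 0xF1EF)
    crc = 0
    for byte in data:
        # Process the high nibble, then the low nibble, of each input byte.
        crc = ((crc << 4) & 0xFFFF) ^ crc_ta[(crc >> 12) ^ (byte >> 4)]
        crc = ((crc << 4) & 0xFFFF) ^ crc_ta[(crc >> 12) ^ (byte & 0x0F)]
    return crc


def crc_renamed(data):
    # Same routine; only the table's variable name has changed.
    happy_fun_table = (0x0000, 0x1021, 0x2042, 0x3063, 0x4084, 0x50A5, 0x60C6, 0x70E7,
                       0x8108, 0x9129, 0xA14A, 0xB16B, 0xC18C, 0xD1AD, 0xE1CE, 0xF1EF)
    crc = 0
    for byte in data:
        crc = ((crc << 4) & 0xFFFF) ^ happy_fun_table[(crc >> 12) ^ (byte >> 4)]
        crc = ((crc << 4) & 0xFFFF) ^ happy_fun_table[(crc >> 12) ^ (byte & 0x0F)]
    return crc


# The compiled instruction bytes refer to local variables by index, not by name.
same_instructions = (crc_named_like_paper.__code__.co_code
                     == crc_renamed.__code__.co_code)
same_output = crc_named_like_paper(b"123456789") == crc_renamed(b"123456789")
print(same_instructions, same_output)
```

In this sketch the names survive only in side metadata (`co_varnames`), roughly the way a C variable name would survive only in a debugging symbol table, if at all.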

Note that making use of Google’s code search, I was able to find at least one other very similar 16-constant version of the CRC code. See, for example:

Note that here, the programmer has used the variable name “CrcTable” and not “crc_ta”. So if I repeat Stewart’s step 4 and do a Google search on “CrcTable[16]” rather than “crc_ta[16]”, I’ll get different results (first hit is a Russian website in this case).

Again, I haven’t done a careful analysis, and there may be other clues in the assembly code that rule out this source code. But my main point is you can’t depend on the variable name as evidence.

@David Eisner
Joe Stewart connects the snippet of binary code in the Aurora attack that computes a CRC to the code from China because 1) it is very unusual to have a table of just 16 constants, and 2) the Chinese code compiles to match the binary snippet. He uses the name of the table only to see who else has used or referenced that piece of C source code. Even so, it still seems a bit tenuous: you would have to assume the same compiler was used along with similar compiler options, and who’s to say that someone else didn’t code the same algorithm, get the same output, and simply not post the source on the Internet? It’s certainly not a smoking gun, but it is a piece of circumstantial evidence.

My take on it is that the Chinese C source is consistent with the CRC implementation revealed in the disassembled binary. But on first gloss, at least, there appears to be other, non-Chinese C source floating around that is also consistent with the assembly code (see link in my previous comment). Yes, Googling a variable name from the Chinese C source results in Chinese hits, but this isn’t surprising.

Not a perfect analogy, but: It would be like finding a delicious slice of chocolate cake on a park bench, doing a chemical analysis, and finding a German recipe that’s consistent with the makeup of the cake. You Google a word in the recipe — Zucker — and lo and behold, you find a bunch of German recipe websites with the same recipe. So you consider that additional evidence that the cake was made by a German. I’m not sure it is, only that a German recipe is likely to show up on other German websites. And because there appears to be a French recipe that’s also consistent with the chemical makeup of the cake, we don’t know which recipe was used.

I think I agree with you, but to clarify, I would accept it as circumstantial evidence if there was only one published source that, using one of a likely set of compilers, produced an exact copy of the binary code used in the Aurora attack. One would assume that an attacker wouldn’t bother to change the code but would just use it as is. If this particular source was only discussed or referenced on Chinese-language sites, then it might indicate a Chinese attack. Again, that would only be circumstantial, not proof in itself. An article today in The Register points out that one or more algorithms using a 16-constant table certainly do exist in the non-Chinese world, and casts doubt on the idea that this particular binary code was unlikely to have been compiled from non-Chinese source code. I would hope that Joe Stewart would take the time to see if the various non-Chinese examples of source code will produce the same binary output. I think The Register greatly exaggerates by saying it had been a smoking gun, but anyway, here is the URL: http://www.theregister.co.uk/2010/01/26/aurora_attack_origins/
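A rough sketch of the weaker half of that check, written in Python rather than C: it compares only the outputs of three independently written CRC-16 routines, not their compiled binaries, so it cannot settle the compiler question. The 0x1021 polynomial, the zero initial value, and the function names are assumptions made for illustration; `binascii.crc_hqx` is the Python standard library's CRC-CCITT routine.

```python
import binascii
import random


def crc16_nibble_table(data):
    # Nibble-at-a-time variant driven by a 16-constant table (assumed polynomial 0x1021).
    table = (0x0000, 0x1021, 0x2042, 0x3063, 0x4084, 0x50A5, 0x60C6, 0x70E7,
             0x8108, 0x9129, 0xA14A, 0xB16B, 0xC18C, 0xD1AD, 0xE1CE, 0xF1EF)
    crc = 0
    for byte in data:
        crc = ((crc << 4) & 0xFFFF) ^ table[(crc >> 12) ^ (byte >> 4)]
        crc = ((crc << 4) & 0xFFFF) ^ table[(crc >> 12) ^ (byte & 0x0F)]
    return crc


def crc16_bitwise(data):
    # Table-free variant: shift one bit at a time, XOR in the polynomial on overflow.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def candidates_agree(samples):
    # True when all three routines return the same CRC for every sample.
    return all(
        crc16_nibble_table(s) == crc16_bitwise(s) == binascii.crc_hqx(s, 0)
        for s in samples
    )


rng = random.Random(2010)  # fixed seed so the run is repeatable
samples = [bytes(rng.randrange(256) for _ in range(rng.randrange(64)))
           for _ in range(200)]
print(candidates_agree(samples))
```

Routines that are written quite differently can agree on every input, which is why matching output alone says little about which source was used; the structural match against the disassembly carries the weight.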

Joe Stewart contends that, in addition to being an obscure algorithm (not true), the code also has an optimization that’s uniquely Chinese.

However, upon further examination of his Google search on “crc_ta[16]”, it appears the code snippets passed around in China are mostly an unoptimized variation that relies on division to obtain the top 4 bits:

da=((uchar)(crc/256))/16

Meanwhile, the 12-bit-shift code optimization fingered as “Chinese code” has been shown to exist as early as 1988, in the Novell Programmer’s Guide, according to the Register article.
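The two ways of getting the top 4 bits can be compared with a small sketch, in Python rather than C. It assumes that `uchar` is an 8-bit unsigned type and that the CRC is a non-negative value of at most 16 bits; the function names are hypothetical.

```python
def top_nibble_by_division(crc):
    # Mirrors da=((uchar)(crc/256))/16; the & 0xFF stands in for the uchar cast.
    return ((crc // 256) & 0xFF) // 16


def top_nibble_by_shift(crc):
    # The 12-bit-shift form; the mask is a no-op for 16-bit values.
    return (crc >> 12) & 0x0F


# Check every possible 16-bit CRC value.
mismatches = [crc for crc in range(1 << 16)
              if top_nibble_by_division(crc) != top_nibble_by_shift(crc)]
print(len(mismatches))
```

Under those assumptions the two forms compute the same value, so the shift is an optimization of the division form and not a different algorithm.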