The Threat From Bad Malware Analysis

The leak that the U.S. was apparently behind the Stuxnet worm that impacted the Natanz nuclear enrichment facility in Iran came – as it always does – with a rash of analysis and opinion related to the hazards associated with “sophisticated” malware and “cyber weapons.” But it is a reliance on subjective factors and the amateur musings of technicians that poses a greater threat to security worldwide than any given piece of code.

Consider the much-ballyhooed issue of authorship. There is no objective factor that can be used to determine if a given piece of code was a collaborative work, or if one (or more) person(s) copied and pasted the code of others. Even if you could make that determination definitively, what insight does that provide? Focusing on the written language a particular piece of malware is compiled in is equally useless: In how many different countries is English spoken? Chinese? Russian? Spanish? Who says you have to compile in your own language?

What factors would you attribute to code – any code – that was “advanced”? Length? Number of calculations it has to perform? Language? Is Microsoft Excel a more advanced program than Word? Than an anti-virus product? Than Angry Birds? Calling something “advanced” or “sophisticated” is easy when there is no widely accepted definition. Labeling Stuxnet as sophisticated because it targets non-commodity hardware and triggers only when certain specific conditions are met is fine, but Flame does none of those things, yet it is given the same label.

Who has the ability to build “weapons-grade” malcode? Arguing that such capability is the exclusive domain of nation-states completely ignores the democratizing effects of computer technology. Digital weapons are not nuclear weapons; they require no special resources, hard-to-acquire materials, or knowledge that would or could bring you to the attention of authorities. The only barrier between the malicious equivalent of “Hello, World” and Flame is practice.

The further one goes down the attribution trail, the worse the analysis problem becomes. Malware authors who could put together Stuxnet or Duqu know that at some point, someone is going to identify and tear apart their work. They are also smart enough to know what reverse engineering is capable of extracting from compiled code. Consequently, they can ensure that a reverse engineer will see specific data points the author wants to be seen, like, say, words in Hebrew.

The state of malware analysis today is not dissimilar to other technical fields where science and art mix. Such alchemy is generally reliable for the pedestrian and mundane, but increasingly dubious if not outright dangerous the closer you get to the edge. Consider the practice of arson investigations, whose practitioners use NFPA 921. No manual that deals with such a topic is ever going to be truly comprehensive or definitive given the myriad variables that come into play. Add in a culture of “everybody knows” thinking and you have Texas v. Cameron Todd Willingham.

Political-military decision-makers know they need to pay attention to malware and computer security issues in general, but the analytic rigor associated with malware analysis simply isn’t there when compared to more traditional issues. When faced with a lack of sound data a decision-maker will defer a decision, or make a middling one, which is exactly what our adversaries want. Maintenance of the status quo helps ensure the pwnage can continue unabated.

Solutions to this problem are ready and realistic, and begin with an insistence on focusing on verifiable facts. There is nothing wrong with using data points like written language or differences in coding style to inform a larger analytic effort, as long as those data points are properly weighted and it is understood that absent corroboration they could be completely meaningless.
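One way to picture "properly weighted" is a simple odds update, sketched below in Python. This is an illustration, not a method the analysis community has agreed on: the function name and every number are assumptions made up for the example. Each data point is given a likelihood ratio (how much more likely it is to appear if the hypothesis is true than if it is false). An indicator that any author could plant at will, such as a string in a particular language, earns a ratio of about 1 and therefore moves the estimate nowhere.

```python
def update_probability(prior, likelihood_ratios):
    """Combine a prior probability with a list of likelihood ratios.

    A ratio of 1.0 means the data point is uninformative; ratios above 1.0
    support the hypothesis and ratios below 1.0 count against it.
    All values used with this function are hypothetical illustrations.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior)      # convert probability to odds
    for ratio in likelihood_ratios:
        if ratio <= 0.0:
            raise ValueError("likelihood ratios must be positive")
        odds *= ratio                 # weigh in one data point
    return odds / (1.0 + odds)        # convert odds back to probability


# An easily planted language string (ratio 1.0) changes nothing on its own.
uncorroborated = update_probability(0.5, [1.0])

# The same string plus one independently corroborated fact (assumed ratio 3.0).
corroborated = update_probability(0.5, [1.0, 3.0])
```

The point of the sketch is the first case: a data point that cannot be corroborated leaves the assessment exactly where it started, no matter how striking it looks in a disassembler.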

We need to develop specific meanings for the language we use in assessments so as to ensure that we are communicating with precision. “Advanced” currently has no definition; “possibly” and “probably” can mean different things to different people. While assigning numeric values to inherently non-mathematical attributes can be problematic, it has potential to reduce ambiguity (e.g. “we may be attacked at dawn” vs. “there is a 75% chance we’ll be attacked at dawn”) and perhaps lead to better decision-making.
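A shared vocabulary of this kind could be as small as a lookup table. The Python sketch below is a hypothetical scale: the terms and the probability ranges are assumptions chosen for illustration, not a published standard. What matters is that every analyst and every reader resolves the same word to the same range.

```python
# Hypothetical scale: terms and ranges are illustrative assumptions only.
ESTIMATIVE_SCALE = [
    ("remote", 0.00, 0.10),
    ("unlikely", 0.10, 0.40),
    ("even chance", 0.40, 0.60),
    ("probable", 0.60, 0.90),
    ("almost certain", 0.90, 1.00),
]


def term_for(probability):
    """Return the agreed term for a numeric probability."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for term, low, high in ESTIMATIVE_SCALE:
        if low <= probability < high:
            return term
    return ESTIMATIVE_SCALE[-1][0]  # only reached when probability == 1.0


def range_for(term):
    """Return the (low, high) probability range an agreed term stands for."""
    for name, low, high in ESTIMATIVE_SCALE:
        if name == term:
            return (low, high)
    raise KeyError(term)


dawn_attack = term_for(0.75)  # the 75% example resolves to "probable"
```

With a table like this, "probably" stops being a matter of taste: a reader who sees the word can look up exactly what range the author committed to.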

Finally, we need to develop, agree upon, and promulgate objective measures that can be applied to the testing and evaluation of malware. Every lab analyzing a given piece of malware around the world should be able to replicate the results of every other lab, so that deviations can be identified and disseminated. Anything less is subjective art, not objective science.
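Replication across labs presumes two things: that everyone is demonstrably looking at the same sample, and that findings are recorded in a form that can be compared. The Python sketch below assumes a SHA-256 hash as the sample identifier and assumes each lab reports its findings as a set of plain strings; the lab names and finding labels are hypothetical.

```python
import hashlib


def sample_id(data):
    """Identify a sample by the SHA-256 hash of its bytes."""
    return hashlib.sha256(data).hexdigest()


def compare_reports(reports):
    """Split findings into what every lab observed and what only some did.

    `reports` maps a lab name to the collection of findings it recorded
    for one sample. The report format is an assumption for this sketch.
    """
    if not reports:
        raise ValueError("at least one lab report is required")
    findings = {lab: set(items) for lab, items in reports.items()}
    consensus = set.intersection(*findings.values())
    deviations = {
        lab: sorted(items - consensus)
        for lab, items in findings.items()
        if items - consensus
    }
    return {"consensus": sorted(consensus), "deviations": deviations}


# Hypothetical reports from two labs on the same sample.
result = compare_reports({
    "lab_a": {"drops_file", "opens_port"},
    "lab_b": {"drops_file"},
})
```

Here `result["consensus"]` holds the findings both labs reproduced, and `result["deviations"]` flags what only one lab saw, which is exactly the material that should be investigated and disseminated rather than quietly averaged away.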

Michael Tanji

Michael Tanji spent nearly 20 years in the US intelligence community. Trained in both SIGINT and HUMINT disciplines, he has worked at the Defense Intelligence Agency, the National Security Agency, and the National Reconnaissance Office. At various points in his career he served as an expert in information warfare, computer network operations, computer forensics, and indications and warning. A veteran of the US Army, Michael has served in both strategic and tactical assignments in the Pacific Theater, the Balkans, and the Middle East.