We need to exploit the science of order and disorder to protect networks against coming generations of superworms

Internet security professionals are, by occupational temperament, a pretty nervous bunch. But lately they've had more reason than ever to be jumpy. Early this year, a new kind of worm, known as Storm, began to sweep through the Internet. It is far more sophisticated than previous worms, using peer-to-peer technologies and other novel techniques to evade detection and to spread, and it has given security professionals more than a few sleepless nights. The mainstream press, though, has paid little attention to Storm, because it has yet to wreak the devastating havoc on businesses that some previous worms did. We shouldn't be fooled by that relative quiet: Storm's designers appear to be biding their time, building an attack network far more disruptive than any seen before.

Storm methodically infiltrates computers with dormant code that could be used to take down the entire network of a corporation, creating opportunities for blackmail or for profiting by selling the company's stock short. And Storm's creators, whoever they are, continue to modify and refine their malevolent progeny even as it already stands as a dark cloud poised over the Internet.

Network security software products on the market today offer only limited defense. They use firewalls, which simply block access by unauthorized users, and signature updates, which can be created only after a worm's or virus's unique bit pattern has been discerned. By the time this laborious process of hand coding is complete, the infestation has had hours to spread, mutate, or be modified by its creators.

A new kind of answer is needed. Network security researchers--including ones at our company, Narus, in Mountain View, Calif.--are developing software that can rapidly detect a wide variety of intrusions from worms, viruses, and other attacks without the high rate of false alarms that plagues many conventional Internet security products. These new programs can detect anomalous network behavior in seconds, as opposed to hours or days--even on so-called backbone networks running at 10 billion bits per second. That means the software is fast enough to block threats that can span the globe in minutes, a rate that far outpaces what a firewall can monitor.

This new generation of algorithms is built on the thermodynamic concept of entropy. Often defined briefly as a measure of the disorder of a system, entropy has been a cornerstone of thermodynamic theory for more than a century and a half. But as a construct of information theory it is only 60 years old, and its application to data communications began only in the last decade or so.

In essence, an entropy-based defense works because a worm's malicious activity changes, in subtle but unavoidable ways, the character of the flow of data on a network. Those data flow changes alter, in clearly measurable ways, the entropy of the network--a measure of the endlessly shifting ebb and flow between the predictability and randomness of the movement of data on the network.

Researchers at Intel, Microsoft, Boston University, and the University of Massachusetts are among those plumbing the mysteries of randomness and order in data flows to get a leg up on network attackers. Although ours is the only company we know of whose commercial products apply entropy to network security, we are confident that the approach will find much wider favor in the next few years.

We'll have lots more to say about entropy and how algorithms that measure changes to the order and disorder of a network can detect a worm outbreak long before traditional methods can. But to get a grip on those algorithms, first consider how viruses and worms attack.

Virus or worm? Security experts distinguish between them, but their differences are less important than their similarities. Either can render computers on a network unstable, and in many cases unusable. A virus is a program that can copy itself and infect a computer without the knowledge of the user. It can, and often does, damage a computer's files or the hardware itself. A worm is, similarly, a self-replicating computer program that uses a network to send copies of itself from one computer, which we will call a "host" of the infection, to other computers on the network. Worms usually harm the network, if only by consuming bandwidth.

Illustration: Harry Campbell

An e-mail worm, the most common kind, spreads slowly, because users have to click on an attachment to become infected or to propagate the worm. Storm is one example; it uses a variety of means to get installed on a host, but the most common one is the e-mail attachment. Not all worms spread by e-mail; in 2004, an infamous worm called Sasser instead exploited a Microsoft Windows network vulnerability, instructing infected systems to download viral code and then execute it. Such an infestation can spread very quickly indeed. Although there has not been a catastrophic worm since Sasser, network security systems still have to be on guard against this sort of attack, because we never know when the next one will swoop down on us.

Faced with attacks that could occur too quickly for their firewalls to cope with, companies and governments are now depending on Internet and other service providers from which they buy their communications bandwidth to "clean the traffic" before it ever reaches their front doors. The world's largest carriers, such as AT&T, BT, Korea Telecom, NTT, and Verizon, strive mightily to do that. They are the backbone of the Internet; they carry most of the world's traffic every day. Yet their unique position, that of owning the largest, most complex networks in the world, also makes screening this traffic no easy feat--for two reasons.

First, these global networks have hundreds of entrances and exits. BT Global Services, for example, operates in 170 countries, connecting to hundreds or thousands of large corporations and service providers in each one. Yet firewalls and other security technologies are designed to protect a single "link," or connection to the Internet--the point at which an organization's wide area network exchanges its data with the carrier. Second, firewall devices are designed to operate at the speeds of corporate networks, not backbone networks of the sort operated by AT&T, NTT, and so on. Corporate networks generally operate at speeds below 1 gigabit per second. Commercial firewall products designed for them simply cannot protect networks containing thousands of links that operate at core speeds 10 to hundreds of times that fast.

Using principles of entropy to protect a network begins with knowing a great deal about how traffic moves around that network, from hour to hour and minute to minute. Network security systems, including ours, operate inside the data center of a large Internet service provider or carrier. They run on standard off-the-shelf servers from, say, Dell or IBM, and collect data about traffic from a variety of key locations, called nodes, on the network. To collect these data, the carrier has to properly configure its network routers, the devices that direct data traffic throughout the network. The routers must be configured to send "streams" of traffic statistics, a capability that is built into them. These data provide detail about traffic features such as the source and destination Internet Protocol (IP) addresses of packets in the traffic, the source and destination port numbers, the type of protocol, the number of bytes per packet, and the time elapsed between packets.
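The per-feature bookkeeping this implies can be sketched in a few lines. The record format below is a simplified illustration, not an actual router export format such as NetFlow, and the addresses and counts are invented:

```python
from collections import Counter

# Hypothetical flow records of the kind a router might export
# (fields simplified for illustration).
flows = [
    {"src_ip": "10.0.0.1", "dst_ip": "192.0.2.7",    "dst_port": 80, "proto": "TCP", "bytes": 1420},
    {"src_ip": "10.0.0.2", "dst_ip": "192.0.2.7",    "dst_port": 80, "proto": "TCP", "bytes": 900},
    {"src_ip": "10.0.0.1", "dst_ip": "198.51.100.3", "dst_port": 25, "proto": "TCP", "bytes": 310},
]

# Build one counter per traffic feature; these counts become the
# probability distributions that the entropy algorithms work on.
features = {}
for field in ("src_ip", "dst_ip", "dst_port", "proto"):
    features[field] = Counter(f[field] for f in flows)

print(features["dst_port"])  # port 80 appears twice, port 25 once
```

A production system would stream millions of such records per second, but the principle--turning raw flows into per-feature counts--is the same.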

It is around traffic features such as these that our entropy algorithms first build a profile of the network's normal behavior. This profile serves as a baseline in detecting anomalies. Our system also collects other data from the network's routers that provide detail about how hard the routers themselves are working--data such as CPU and memory usage--and some additional detail about the volume of traffic on each of the router's interfaces to the network. We then correlate all of these router statistics to verify anomalies detected by our algorithms, identify their root cause, and even suggest mitigating actions to cleanse the traffic.

Today, the big carriers collect some of the same data, but by and large they rely on "behavior-based" systems to protect their backbone networks. These systems are based on algorithms that focus primarily on changes in the volume of traffic at specific points on the network, ones where large companies and Internet service providers connect to the carrier. For example, during a denial-of-service attack, traffic between a single source (the attacker's computer) and a single destination (the victim's Internet servers) surges precipitously, reflecting an attempt to flood that destination and cut it off from users. In order to maximize the mayhem, attackers spread out their attacks by hijacking unprotected machines on the Internet and planting code that recruits them as "zombies" (or "bots," short for robots). These computers in effect form armies ("botnets") that number in the tens of thousands and can be orchestrated to launch attacks that emanate from multiple sources. Known as distributed denial-of-service attacks, these actions concentrate the damage into a period lasting minutes or even seconds rather than hours.

Traditional behavior-based systems detect such a sudden increase in traffic volume at the customer link and promptly alert the operator. Of course, this is just the start of an extended cat-and-mouse game. Attackers then devise new and clever methods to fly under the radar and avoid detection. Some intruders, for example, strive to consume the resources of a Web server located within the victim's network. They don't need to flood the server with traffic. Instead, they simply identify the Web pages that are the biggest drain on memory and CPU time--ones containing video clips, for example--and coordinate their armies to request frequent access to those pages only.

Graph: Bryan Christie Design

FINGERING THE CULPRIT

Examples of network fingerprints during well-behaved traffic (distinct traffic-feature distributions in yellow) and during a worm attack (distinct traffic-feature distributions in orange). Note the changes in shape of the distributions during malicious activity. The spikes in graphs A, C, and D show a change in network entropy, as does the flattening of the expected high curve in B.

In such a case, the overall volume of traffic into the network looks normal, yet the attack is effective because it degrades or even chokes off service. Similarly, a flood of spam messages, for example, can overwhelm a mail server. It might seem like such a flood would unavoidably trigger a detection system. But that's not always the case. The load on the server depends on the number of messages, not the quantity of data, which is what the detection system is measuring. If a spam attack is written so that each spam message consists of only a few data packets, then the overall traffic never rises to the threshold level. Internet telephony spam can similarly clog a network.

Many network operators have responded to these sorts of attacks by lowering the threshold of their behavior-based systems in an attempt to detect more subtle changes in traffic volume. This threshold change, however, tends to create false positives, in which the system often mistakenly takes nonmalicious fluctuations in the volume of traffic to be an attack. Such fluctuations are common; think of the flood of traffic that ensues when a Web page on a site with modest traffic is cited on a popular bulletin-board site such as Slashdot or Digg. The problem is that such false positives prevent operators from trusting the system, forcing slow and expensive case-by-case human intervention.

To avoid false positives, security software needs to monitor Internet traffic across the entire network, as opposed to a single link at a single time, and then correlate all the events it detects. Only then can a model of the traffic behavior on the entire network be created, allowing security algorithms to focus on the structure and composition of the traffic and not just its volume.

In other words, a security system must monitor the actual entropy of the network itself.

In thermodynamics, entropy refers to changes in the status quo of a physical system--a cup of ice water, the gas in a balloon, a solar system. It is a measure of "molecular disorder." In 1877, Ludwig Boltzmann visualized a probabilistic way to measure the entropy of an ensemble of gas molecules. Boltzmann showed that the ensemble's entropy was proportional to the logarithm of the number of microscopic states such a gas could occupy. More precisely, entropy equals k log W, where k is a constant and W is the number of microscopic configurations of molecules consistent with the gas's macroscopic state.

What exactly is a configuration of molecules? Consider the temperature of air, which is determined by the average speed at which its molecules are moving. The temperature of a room might be 20 °C, but some molecules will be moving very quickly, for example in the sudden draft when a door opens or in the vicinity of a hot burner on a stove. Entropy reflects the amount of uncertainty about which exact molecules are moving at what speed.

For a given set of macroscopic quantities, such as temperature and volume, entropy measures the degree to which the probability of the system--in this case, the air in the room--is spread out over different possible states. To take a much simpler example, if you roll a pair of dice, there are 11 different outcomes, some more likely than others. The complete array of possibilities and probabilities--only one way to get a 2, for example, but five chances of a 6 and six for a 7--is a probability distribution. Similarly, each gas molecule in that room has a number of different possible locations and speeds, just as the two dice each have six possible values.

For the entropy of the distribution of possible outcomes of a single die, each possible outcome has the same probability (1/6), so the distribution is flat. In this case there is nothing we can predict about the outcomes in the distribution. They are completely random, and the entropy of the distribution is very high--at its maximum, in fact. In the case of two dice, on the other hand, there are several possible combinations or outcomes that have a higher probability than others. The probability of a 7 is much higher than that of an 11, for example. So if you roll two dice 25 times, the results will be less random than if you rolled one die 25 times. Another way of putting this is that the two-dice system has less entropy than the one-die system. We can guess more reliably about specific outcomes.
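The dice comparison can be made concrete in a few lines of code. This sketch computes the Shannon entropy of each distribution and normalizes it by the maximum possible entropy for that number of outcomes--which is the sense in which the two-dice distribution is the less random one:

```python
from math import log2
from collections import Counter

def entropy(dist):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * log2(p) for p in dist if p > 0)

# One die: six equally likely outcomes.
one_die = [1 / 6] * 6

# Sum of two dice: 11 outcomes, some far likelier than others.
counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
two_dice = [c / 36 for c in counts.values()]

h1, h2 = entropy(one_die), entropy(two_dice)
# Normalize by the maximum entropy for that many outcomes (log2 of the
# outcome count): 1.0 means perfectly uniform, i.e. maximally random.
print(h1 / log2(6))   # ~1.0 -- the flat single-die distribution
print(h2 / log2(11))  # below 1.0 -- a 7 is far likelier than a 2
```

The single die sits exactly at its entropy maximum; the two-dice sum falls short of its own maximum because its distribution is peaked around 7.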

That is the principle behind our entropy algorithms. Malicious network anomalies are created by humans, so they must affect the natural "randomness," or entropy, that normal traffic has when left to its own devices. Detecting these shifts in entropy in turn detects anomalous traffic.

Getting back to the gas example, the array of all possible locations and speeds creates a probability distribution for the gas. Because entropy describes the configuration of a system in terms of the probabilities of its possible outcomes, high or low entropy tells us how predictable those outcomes are. So there's a rough equivalence between thermodynamic entropy--understood as the probability that the molecules in a gas are in a predicted state--and the amount of information we have about the system.

Information entropy was originally conceived by Claude Shannon in 1948 to study the amount of information in a transmitted message. If the two states of a digital signal, 0 and 1, have exactly the same probability of appearing in the signal, then our uncertainty about which bit we will receive next is maximized--like calling the flip of a fair coin. On the other hand, if a 1 has a higher probability of appearing, then there is slightly less uncertainty about what the next bit will be. That is, if the next bit has a greater chance of being a 1, entropy is reduced. When information entropy is low, we know more about the details of the digital signal being transmitted.
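Shannon's measure for such a binary source can be written down directly. This sketch shows how the entropy peaks when 0 and 1 are equally likely and falls toward zero as one symbol comes to dominate:

```python
from math import log2

def binary_entropy(p):
    """Entropy in bits of a source that emits a 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # no uncertainty at all
    return -(p * log2(p) + (1 - p) * log2(1 - p))

print(binary_entropy(0.5))   # 1.0 -- both bits equally likely: maximal uncertainty
print(binary_entropy(0.9))   # about 0.47 -- the next bit is easier to guess
print(binary_entropy(0.99))  # about 0.08 -- hardly any uncertainty left
```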

Much the same can be said about traffic patterns on the Internet. More specifically, an enormous amount of information can be gleaned by observing traffic flows on a data network. If we observe enough of them, we can come up with historical averages for inbound and outbound data packets, noting such key features as which Internet addresses the network receives packets from and which ones it sends packets to. We can also note how many packets are sent in accord with which Internet protocols at various times of the day and the overall traffic volume. At any given time, the probability distributions of the flow of traffic through the network will be characterized by distinct curves [see graphs, "Fingering the Culprit"]. In fact, the shape of the curve reveals the entropy of the system. If the curve is uniform--all outcomes about equally likely--entropy is high. If there's a spike--one outcome dominating the rest--the traffic is far more predictable, and the entropy is correspondingly low.

Internet traffic is dynamic and constantly evolving. Nevertheless, over the course of, say, a year, some consistent patterns emerge. These patterns are driven mainly by the mixture of applications generating the traffic, such as Web surfing, e-mail, music downloading, or Internet telephony, though seasonal and geographical factors also affect them. The first step in using these patterns to spot anomalous activity is to develop a probability distribution for each of the characteristics. When these distributions are taken together, they uniquely profile the traffic and create a "fingerprint" of the network under consideration and what we might call its internal state--the sum total of these network characteristics.

If we have monitored and measured a system long enough, we know which internal states are associated with well-behaved Internet traffic. Any malicious activity introduced into the network alters the nature of the Internet traffic, because it has a designed, premeditated outcome that is different from any of the network's normal states. Even if an attack came in the form of an activity that fits within network norms--say, downloading a number of music files--the fingerprint of the network would look unusual, because it would differ in some way from the network's established patterns of usage, if not in terms of volume, then time of day, source, or some combination of those or other characteristics.
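The idea of comparing a current fingerprint against a learned baseline can be sketched as follows. The feature names, toy counts, and the 0.5-bit tolerance are illustrative assumptions, not parameters of any real product:

```python
from math import log2
from collections import Counter

def entropy(counter):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counter.values())
    return -sum((c / total) * log2(c / total) for c in counter.values() if c)

def fingerprint(feature_counts):
    """One entropy value per traffic feature -- a compact network state."""
    return {name: entropy(counts) for name, counts in feature_counts.items()}

def is_anomalous(current, baseline, tolerance=0.5):
    """Flag traffic whose entropy drifts beyond the tolerance on any feature."""
    return any(abs(current[f] - baseline[f]) > tolerance for f in baseline)

# Toy counts; a real system would build these from millions of flows.
baseline_fp = fingerprint({
    "dst_port": Counter({80: 50, 25: 20, 443: 30}),
    "src_ip":   Counter({"a": 25, "b": 25, "c": 25, "d": 25}),
})
current_fp = fingerprint({
    "dst_port": Counter({9996: 90, 80: 5, 25: 3, 443: 2}),  # one port dominates
    "src_ip":   Counter({"infected": 95, "a": 5}),          # one source dominates
})

print(is_anomalous(current_fp, baseline_fp))  # True
```

Notice that the check never asks *which* worm is loose; it only asks whether the shape of the traffic has changed, which is what lets this approach catch attacks it has never seen before.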

Paradoxically, Internet traffic has features of both randomness and structure, and a worm, for example, will alter both, making the traffic appear in some respects more uniform or structured than normal Internet traffic, while appearing more random in others.

Packets flowing into a server seem to come from random locations. For example, requests for Web pages typically come from surfers all over the Internet. More will come from some people and networks than from others, to be sure, but a graph of them will normally be a fairly uniform curve. If a worm is loose on the Internet, however, and the packet flows from infected hosts grow to be a significant part of the set of total traffic flows, then the addresses of those hosts will show up disproportionately in any distribution graph--indicating how many flows have come from a given source.

During a worm infestation, hosts that have been maliciously co-opted connect to many other hosts in a short period. The number of open connections from infected hosts becomes dominant, so the distribution of source addresses concentrates around those hosts and its entropy decreases. Meanwhile, the target IP addresses seen in packet flows become much more random than in normal traffic. That is, the distribution of destination IP addresses grows more dispersed, resulting in higher network entropy.
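A toy calculation illustrates both shifts at once. The host names and counts below are invented, but the entropy movements they produce are the ones just described: source-address entropy falls while destination-address entropy rises:

```python
from math import log2
from collections import Counter

def entropy(counter):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counter.values())
    return -sum((c / total) * log2(c / total) for c in counter.values() if c)

# Normal traffic (toy counts): varied sources, a few popular destinations.
normal_src = Counter({"a": 3, "b": 3, "c": 3, "d": 3})
normal_dst = Counter({"web": 8, "mail": 4})

# Outbreak: two infected hosts dominate as sources, and their random
# scanning scatters the destinations across many addresses.
worm_src = Counter({"infected1": 10, "infected2": 9, "a": 1})
worm_dst = Counter({f"scan{i}": 1 for i in range(20)})

print(entropy(worm_src) < entropy(normal_src))  # True: source entropy falls
print(entropy(worm_dst) > entropy(normal_dst))  # True: destination entropy rises
```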

Most malicious attacks tend to seek out and exploit certain vulnerabilities in the implementation of an Internet protocol. Two of the most important of these are the Hypertext Transfer Protocol (HTTP), used to download Web pages, and the Simple Mail Transfer Protocol (SMTP), used to send e-mail. Besides the protocol, specific operating-system ports are used to send and receive traffic. We can think of the protocol as a means of transit, such as an ocean freighter or a yacht, while the port (as the name suggests) terminates the data's journey at the computer's equivalent of a berth number at a marina.

Fingerprinting is also possible at the port level. An attacker can scan for a specific vulnerability by sending packets to see whether they are received and what response they get; these scans often have to go to a specific destination port. If the traffic that results from this scanning becomes a significant component of the overall network traffic, it will create an unusual fingerprint. Lastly, the flow sizes--the number of packets per flow--characteristic of the worm's activity will come to dominate, altering the distribution of flow sizes observed during normal network operation.

The Sasser worm, one of the largest and best-studied infestations in Internet history, is an ideal example of this port-specific approach. It began by scanning the computers on whatever network it had infiltrated. Whenever a connection was made, the worm sent a piece of code whose goal was to cause the infected host computer to accept commands on TCP port 9996. Sasser then created a small program named cmd.ftp on the host computer, which the host then executed. The "ftp" in this script's name stands for the File Transfer Protocol. The FTP script instructed the victim machine to download and execute the worm from the infected host without human intervention. The infected host accepted this FTP traffic on still another port. To spread itself even faster, Sasser spawned multiple threads, finding and capturing as many vulnerable computers within an organization's network as possible.

Each of Sasser's activities created a unique network fingerprint. Information entropy can capture the dynamics of such fingerprints by extracting any sudden change in the shape of the distributions constituting that fingerprint. There is little that the attacker can do to control the information entropy associated with the fingerprint and thereby conceal the attack.

The Sasser worm significantly affected the information entropy of a large North American wireless service provider network [see graph, "Sasser's Entropy," based on an analysis done after the attack]. Notice that traffic is much heavier during the day, as reflected by the information entropy: high during the day, low at night. When the Sasser worm invaded this wireless carrier's network, the behavior-based security systems were unable to detect the outbreak until the network became saddled with more than 30 times its normal traffic volume. Behavior-based systems cannot detect the initial attack, because the traffic generated by one infected machine is negligible. Within minutes, however, that one machine has infected 10 others, and those 10 infect 10 more, and so on, each generating its share of data. By the time the behavior-based system can generate an alert, the traffic is overwhelming.

Sasser quickly infected some 20 000 computers. Patches were soon created, but the worm was relaunched in multiple waves, spawning nine more variants in just over 30 days and infecting hundreds of thousands more machines.

The traditional defense against Sasser worked, eventually. But had the worm been detected earlier, for example by a system based on network entropy, Sasser would have been much more limited in its damage and probably would have spawned fewer variants.

If traditional defenses struggle to keep up with traditional viruses, they fall far behind when it comes to new, more sophisticated forms of attack, such as the Storm worm. In broad outline, Storm shares some of the typical characteristics of an e-mail worm--users click on an attachment, which opens a file that places new code on the user's computer. The code then causes the computer to join an existing botnet, hooking itself up as a slave machine to a master computer out on the Net.

But Storm differs from earlier worms in a number of important ways. First, it does an excellent job of getting people to click on the attachment--it employs some clever social engineering by using subject lines and file names related to a hot topic in the news, such as a major storm or hurricane warning (this is where its name came from).

More significantly, Storm cleverly hides its network activities. While Sasser, for example, created a lot of new--and easily detected--traffic on TCP port 9996, Storm first looks to see what ports and protocols a user is already using. If Storm finds a file-sharing program--such as eDonkey, a popular program for trading music and videos--it uses that program's port and protocol to do its network scanning. The resulting minor increase in activity on that port would be missed by a conventional intrusion detector. Storm also looks to see what IP addresses the file-sharing program has already exchanged data with, instead of suddenly communicating with a whole bunch of new IP addresses, which would again be easily detected.

Finally, traditional worms spread as fast as they can, generating a fingerprint that is easily seen by a network-entropy security system. Storm, on the other hand, has a dormant mode and a waking mode. For example, every 10 minutes it will try to gather information. Then it will go quiet, and then start again.

Storm is now hibernating in millions and millions of computers in North America, Europe, and Asia. It is flying under the radar of current detection systems by tailoring its behavior to its victims' existing patterns of network usage. Its methods are changing in subtle ways over time as its creators stay one step ahead of their adversaries. Its botnet is poised to strike at major networks at any time.

How do we deal with this monster? We look, as we do with all worms, for changes in the entropy of the network. After all, Storm still has to alter certain things about a user's behavior in ways that can be detected. For example, during the 10-minute periods that Storm is active, the victim's computer will send a lot of e-mail, much more than it normally does--typically about 30 messages per minute, or about 300 in one of its 10-minute active phases. Nobody sends out that much e-mail. And the e-mail goes out on a port that doesn't usually get e-mail traffic--in our example, the port that eDonkey uses. Normally, eDonkey traffic is very dense--bulky audio and video files--while e-mail is very low-volume data. These are all ways in which the network's entropy has changed.
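A toy entropy calculation shows how such a mail burst stands out even when it hides on a borrowed port. The per-port message counts below are invented for illustration; port 4662 is the port commonly associated with eDonkey:

```python
from math import log2
from collections import Counter

def entropy(counter):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counter.values())
    return -sum((c / total) * log2(c / total) for c in counter.values() if c)

# Outbound flows per destination port for one host (toy counts).
baseline = Counter({80: 40, 443: 30, 4662: 25, 25: 5})  # varied, mostly Web
infected = Counter({4662: 300, 80: 5, 443: 3, 25: 2})   # mail burst on eDonkey's port

# One port suddenly dominating makes the next flow's port far more
# predictable, so the entropy of the port distribution collapses.
print(round(entropy(baseline), 2), round(entropy(infected), 2))
```

The detector need not know that port 4662 "should" carry file-sharing traffic; the collapse in entropy alone flags the change.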

A victim's communication to a master computer will also differ from previous usage patterns, maintaining a connection for days or even weeks of very low volume. Consider, too, that when two computers on the same network, say Nucci's and Bannerman's, are both victims of Storm, their behavior will suddenly be quite similar, whereas previously they were very different (perhaps Nucci downloaded lots of TV programs, while Bannerman did not).

In all of these ways, Storm is altering the network entropy of the victim's computer traffic. The increase in e-mail activity, for example, biases traffic in favor of port 25, the usual e-mail port--decreasing entropy, because port-25 activity becomes more predictable. Similarly, it becomes more predictable than previously that e-mail is sent rather than received. Our existing code, at Narus, will detect these changes in entropy, even if the classification of the changes is confusing--the victim behaves somewhat like an e-mail spam generator, somewhat like a worm victim, and somewhat like a botnet member.

Traditional behavior-based Internet security servers cannot detect these attacks accurately enough and early enough to mitigate them before they achieve their goals. Today's carriers and other large network owners need a new approach to security that can correlate traffic data at extremely high speeds. Systems based on information entropy can do that, and the security of these most critical networks depends on it.

About the Author

Antonio Nucci is the chief technology officer, and Steve Bannerman is vice president of product management, of Narus, in Mountain View, Calif. The company produces software for analyzing data traffic and protecting networks. Both authors are IEEE members.

For a general discussion of information entropy, see Charles Seife's 2006 book, Decoding the Universe, published by Viking Penguin; in particular, check out pages 46, 47, and 71. The Wikipedia entry for entropy, at http://en.wikipedia.org/wiki/Entropy, is also helpful.