Reversible computing could pinpoint network congestion

TROY, N.Y. - A Rensselaer Polytechnic Institute researcher has proposed a method for network modeling that he claims could significantly cut the time required to locate points of congestion. The technique is based on the concept of reversible computer algorithms.

The work is being done by Christopher Carothers, an assistant professor of computer science who has received a $375,000 five-year award from the National Science Foundation to apply reversible computation to network simulation. "Our first demonstration will shoot for a million-node simulation," Carothers said.

Traditional simulators top out at about 150,000 nodes. Rewriting the simulator using reversible operators, Carothers said, can speed the simulator sixfold, putting 1,000,000-node simulations within reach.

To make maximal use of reverse computing, Carothers has begun re-engineering popular public-domain network simulators to use reversible computing throughout. He hopes not only to build the first network model to simulate one million nodes in real time but also to catalog the ins and outs of reverse-computing technology along the way.

"We want to understand the fundamental limitations of reverse computation  in general for computational purposes but in particular for these network simulation models," Carothers said. "What models can and can we not reverse? How far can we push the technology of reverse computation?"

Reverse computation hinges on the ability to run some mathematical operations in either direction without losing anything. Incrementing a counter can be reversed by decrementing that counter, for example.

However, most operations involving variables are not as easily reversible, since intermediate values during the operations are only kept in scratchpad memories. To make operations reversible in general, traditional thinking would resort to the same time-stamped audit trail kept for parallel-processing simulations, wherein everything that happens is stored and logged. To run in reverse, such a simulator would pull out the stored audit trail values in precisely the correct order.
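The audit-trail approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from Carothers' simulator: the class name `StateSavingSim`, the state fields and the event names are all hypothetical.

```python
import copy


class StateSavingSim:
    """Toy simulator that logs a full state snapshot before every event."""

    def __init__(self):
        # Hypothetical state fields, chosen only for illustration.
        self.state = {"queue_len": 0, "packets_seen": 0}
        self.audit_trail = []  # (timestamp, snapshot) pairs, oldest first

    def process(self, timestamp, event):
        # Copy the whole state first; this copy is the overhead of state saving.
        self.audit_trail.append((timestamp, copy.deepcopy(self.state)))
        if event == "arrive":
            self.state["queue_len"] += 1
            self.state["packets_seen"] += 1
        elif event == "depart":
            # max() discards information, so it could not be undone without a log.
            self.state["queue_len"] = max(0, self.state["queue_len"] - 1)

    def rollback(self):
        # Undo the most recent event by restoring its snapshot (last in, first out).
        timestamp, snapshot = self.audit_trail.pop()
        self.state = snapshot
        return timestamp


sim = StateSavingSim()
for t, ev in enumerate(["arrive", "arrive", "depart"]):
    sim.process(t, ev)
```

Each call to `rollback()` restores exactly one earlier state, newest first. The price is one full copy of the state per event, which is the cost a reversible design tries to avoid.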

Backward march

For computations that are reversible by design, on the other hand, no audit trail is kept, because all operations can be run in reverse.

"We are using reverse computations  literally running programs backward  to make the simulation of large-scale models faster. In this case the application is models of networks, but other things could also be modeled with reverse computations," said Carothers. Today, network managers use their simulators to discover bottlenecks and then use trial and error plus prior knowledge to brainstorm the possible causes of network congestion. The process is commonly called the "what if" methodology.

When a bottleneck is discovered during a simulation, the engineers make educated guesses about how the snarl might have been avoided. Each what-if guess is then implemented, usually by changing a parameter in the simulation program, and the entire simulation is then repeated to see if it avoids the bottleneck.

The second major question asked by engineers of their network simulators is, What can be done to expand the network to more nodes and higher levels of functionality, without creating new bottlenecks? Network simulators can tip off engineers designing networks as to what new resources might be needed and how to interconnect them to optimize the operation of new or expanded networks.

"The output of these simulations is used to make a decision about what parameters should be changed in the system. We are ultimately doing control, but you can also use them for network engineering tasks such as designing future networks and asking 'How should we do it?' You can really ask both kinds of questions of the system," said Carothers.

Normally during a simulation, a known initial condition is set up, and the computation runs forward in time until a point of congestion is reached. At that point, what-if conjectures about parameters are tested to prevent future occurrences of congestion. Trial and error among many simulation runs can eventually craft an optimal set of parameters.

No more guesswork

But if Carothers can perfect the use of reverse simulators, the work might change the way engineers use such tools in the future. In particular, the what-if methodology would go out the window. Instead of brainstorming about the possible causes of network congestion and conducting subsequent simulation runs to test out the theory, Carothers would just reverse his simulator to roll back to the exact cause, with no guessing required.

"We are speeding up these simulations with reverse computation, looking for interesting points of congestion, but the other aspect that is potentially even more interesting is the kinds of experiments you [can] run. For instance, if you are looking for congestion you can run our parallel simulation forward until you find congestion. Then the question becomes, What caused it? Literally from the point at which congestion occurred, you can just run backward and consider all the possible causes that would have led up to this congestion," said Carothers.

"What you really want to know is what congestion you have here and now, under these conditions. What are the potential causes of this congestion? So what we begin to do is run [the simulation] backward and figure out what the potential causes were that created this congestion," said Carothers.

Backward simulations, he said, provide "a more holistic view of what is going on and, with that, a better understanding of all the factors that led to the congestion. You can make a more informed decision about which parameters need tweaking."
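The forward-then-backward experiment Carothers describes can be illustrated with a toy queue model. Everything here is an assumption made for the sketch: the event tuples, the `THRESHOLD` value, the host names and the function names are hypothetical, and a real simulator would track far more state.

```python
from collections import Counter

THRESHOLD = 3  # hypothetical queue length that counts as congestion


def apply(queue_len, event):
    # Forward step: an arrival adds one packet, a departure removes one.
    return queue_len + (1 if event[1] == "arrive" else -1)


def undo(queue_len, event):
    # Reverse step: the exact inverse of apply().
    return queue_len - (1 if event[1] == "arrive" else -1)


def run_forward(events):
    """Process events in order and stop as soon as the queue is congested."""
    queue_len, processed = 0, []
    for event in events:
        queue_len = apply(queue_len, event)
        processed.append(event)
        if queue_len > THRESHOLD:
            break
    return queue_len, processed


def trace_back(queue_len, processed):
    """Undo events newest first until the queue is empty, tallying arrivals by source."""
    suspects = Counter()
    while processed and queue_len > 0:
        event = processed.pop()
        queue_len = undo(queue_len, event)
        if event[1] == "arrive":
            suspects[event[2]] += 1
    return queue_len, suspects


# (timestamp, kind, source) tuples for a single hypothetical router queue.
events = [
    (0, "arrive", "hostA"),
    (1, "depart", "hostA"),
    (2, "arrive", "hostB"),
    (3, "arrive", "hostB"),
    (4, "arrive", "hostA"),
    (5, "arrive", "hostB"),
    (6, "depart", "hostB"),
]
queue_len, processed = run_forward(events)
congested_at = processed[-1][0]
final_len, suspects = trace_back(queue_len, processed)
```

In this toy run the queue exceeds the threshold at timestamp 5, and walking backward attributes three of the four queued packets to `hostB`. No second forward run is needed to reach that conclusion.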

The first step, currently being undertaken by Carothers, is to write a completely reversible simulator, because current simulator programs were not intended to be run backward. Having dissected the simulator programs of others, Carothers has identified which procedures in the simulator can be made reversible and which cannot. "Certain operations are perfectly reversible, such as random-number generation," he said. "We leverage that when writing our simulators, because you don't want to keep an audit trail; that is the kind of operation that really slows down a simulator.

"Instead, we are running and doing things in reverse so that we are not saving the state, and ultimately what we have shown is that on a parallel simulation run, we can achieve up to sixfold improvement over state saving."

Since computations in general are not reversible and audit trails are slow, Carothers is concentrating on identifying reversible operations that can replace even the simplest non-reversible algebraic operations. For instance, the assignment A = B is not in general reversible, because the earlier value of A is lost unless a time-stamped audit trail is kept to make note of it.

Carothers handles such cases without an audit trail by swapping the values of A and B, a reversible operation that he claims preserves the data like an audit trail but without its processing overhead.
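A swap of this kind might look as follows. The sketch assumes that B is a field carried by an incoming message whose old value is not needed after the event, since a swap changes B as well as A. The names `swap_route`, `node` and `message` are hypothetical.

```python
def swap_route(node, message):
    """Exchange the node's route with the one carried by the message.

    After the swap the node holds the new value, as an assignment would
    leave it, and the message holds the old one. Calling the function a
    second time restores both, so it serves as its own reverse.
    """
    node["route"], message["route"] = message["route"], node["route"]


node = {"route": "via-R1"}
message = {"route": "via-R2"}

swap_route(node, message)  # forward: node now holds the message's route
after_forward = (node["route"], message["route"])

swap_route(node, message)  # reverse: both values are restored
```

The displaced value is parked in the message rather than in a separate log, which is why no audit-trail entry is needed.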

Counting packets

Regardless, such troublesome algebraic operations are relatively foreign to network simulators. Indeed, one reason Carothers proposed reversible computations for network simulations is that this application naturally avoids common non-reversible operations. Instead, network simulators fall into the "bean counting" category: they answer such questions as how many, how often and the like. Such metrics are reversible.

"All I want to know in my network model is how many packets did I process and, once in a while, the delay of an average packet, but these are all counting [operations]. So every time you process a packet you merely increment a counter. Then, in reverse, all you have to do is decrement the counter, a perfectly reversible operation," said Carothers.

After the million-node simulation, a future goal (probably two years away, according to Carothers) is to demonstrate the ability to run that simulation in reverse to pinpoint and identify the "causes" of particular cases of congestion from their real effects later in time.

"Eventually we will demonstrate actually running the simulation backwards  there is subtle difference when you are actually doing that," said Carothers.

An audio recording of reporter R. Colin Johnson's full interview with Christopher Carothers can be found online at AmpCast.com/RColinJohnson.