Tuesday, February 20, 2007

Network Coding, Part 2

A vuln was discovered in Snort's DCE-RPC reassembly, similar to last year's bug in their SunRPC reassembly. These problems stem from Snort's core architecture. There are two ways of constructing network applications like intrusion detection: streaming and backtracking. Snort uses the backtracking model, which is more prone to such mistakes than the streaming model.

In a streaming system, once a byte of input has been analyzed, it is never re-analyzed. In a backtracking system like Snort, the engine may go back and re-analyze previous bytes, which requires a more complicated reassembly architecture to store those bytes. Streaming models are inherently faster, more reliable, and more secure, but much harder to program.

An intrusion-detection system has a choice whether to use backtracking or streaming technologies. The well-known Boyer-Moore pattern-matching algorithm works by skipping ahead and then backtracking, so it would be inappropriate for a streaming system. The Aho-Corasick algorithm, on the other hand, searches for patterns one byte at a time, so it works well in a streaming system.
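To make the streaming property concrete, here is a minimal Python sketch of Aho-Corasick, assuming patterns and input are byte strings. The class name `AhoCorasick` and its `feed()` method are illustrative, not taken from any particular IDS. The only state kept between fragments is the current automaton node and a byte counter, so a pattern split across two fragments is still found without reassembling them.

```python
from collections import deque


class AhoCorasick:
    """Multi-pattern matcher that consumes input one byte at a time.

    Between calls to feed() it keeps only the current node and the running
    byte offset, so input may arrive in any fragmentation.
    """

    def __init__(self, patterns):
        self.goto = [{}]   # goto[node][byte] -> child node (trie edges)
        self.fail = [0]    # failure link for each node
        self.out = [[]]    # patterns that end at each node
        for pat in patterns:
            node = 0
            for b in pat:
                if b not in self.goto[node]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[node][b] = len(self.goto) - 1
                node = self.goto[node][b]
            self.out[node].append(pat)
        # Breadth-first pass computes failure links, shallowest nodes first.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for b, child in self.goto[node].items():
                queue.append(child)
                f = self.fail[node]
                while f and b not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(b, 0)
                # Inherit matches that end at the failure target.
                self.out[child] += self.out[self.fail[child]]
        self.state = 0
        self.offset = 0

    def feed(self, chunk):
        """Scan one fragment; return (end_offset, pattern) for each match."""
        matches = []
        for b in chunk:
            # Fall back along failure links; input bytes are never re-read.
            while self.state and b not in self.goto[self.state]:
                self.state = self.fail[self.state]
            self.state = self.goto[self.state].get(b, 0)
            self.offset += 1
            for pat in self.out[self.state]:
                matches.append((self.offset, pat))
        return matches
```

For example, feeding `b"us"`, `b"h"`, and `b"ers"` as three separate fragments reports `she` and `he` ending at offset 4 and `hers` ending at offset 6, exactly as if `b"ushers"` had arrived in one piece.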

The same applies to more complex pattern matching using regular expressions (regex). A regex represents a finite automaton, and there are two basic ways that a finite automaton might work. Using an NFA (nondeterministic finite automaton), all possible combinations of the regex are tested at runtime using backtracking. Using a DFA (deterministic finite automaton), all possible combinations are precomputed into a big table, and each streaming byte of input causes a transition to a new state in the table.
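A DFA really is just a lookup table indexed by (state, byte). The sketch below hand-builds the three-state DFA for an unanchored search for the regex `ab*c`, chosen only as an illustration; a real IDS would generate such tables with a regex compiler. The scanning loop does a single table lookup per byte and never moves backward, so the state simply carries over from one fragment to the next.

```python
# States of the DFA for an unanchored search of the example regex "ab*c".
START, SEEN_A, MATCH = 0, 1, 2


def build_table():
    """Return table[state][byte] -> next state, 256 entries per state."""
    table = [[START] * 256 for _ in range(3)]
    for state in (START, SEEN_A, MATCH):
        table[state][ord("a")] = SEEN_A   # an 'a' always (re)starts a match
    table[SEEN_A][ord("b")] = SEEN_A      # b* loops in place
    table[SEEN_A][ord("c")] = MATCH       # 'c' completes the match
    return table


TABLE = build_table()


def scan(chunks):
    """Run the DFA over a sequence of fragments; return match end offsets."""
    state = START
    offset = 0
    hits = []
    for chunk in chunks:
        for byte in chunk:
            state = TABLE[state][byte]    # one lookup per byte, no backtracking
            offset += 1
            if state == MATCH:
                hits.append(offset)
    return hits
```

Scanning the fragments `b"xxab"`, `b"bbc"`, `b"ac"` reports matches ending at offsets 7 and 9, even though the first match straddles a fragment boundary.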

Both backtracking and streaming IDSs need to take care when writing regular expressions to avoid an explosion of possible states. When a regex is compiled as an NFA, a hacker can attack the system by causing all states to be traversed: a recent paper shows that a backtracking system like Snort can be DoSed with as little as 4 kbps of traffic that forces every backtracking path to be explored. When a regex is compiled as a DFA, the explosion of states instead consumes all memory when the regex is compiled: what looks like a simple regex can, in fact, require a 5-gigabyte DFA to store all combinations.
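The DFA blow-up is easy to reproduce with a textbook example (not the regex from the paper): `(a|b)*a(a|b){n-1}`, meaning "the n-th character from the end is `a`", has an NFA with only n+1 states, yet subset construction produces 2**n DFA states. This Python sketch hard-codes that NFA (its state numbering is an assumption of the sketch) and counts the reachable DFA states.

```python
def nfa_step(states, symbol, n):
    """One NFA step for (a|b)*a(a|b){n-1}. States 0..n; n is accepting."""
    nxt = set()
    for s in states:
        if s == 0:
            nxt.add(0)            # (a|b)* loops on state 0
            if symbol == "a":
                nxt.add(1)        # guess that this 'a' is n-th from the end
        elif s < n:
            nxt.add(s + 1)        # count the remaining n-1 symbols
    return frozenset(nxt)


def count_dfa_states(n):
    """Subset construction: each DFA state is a set of NFA states."""
    start = frozenset({0})
    seen = {start}
    work = [start]
    while work:
        cur = work.pop()
        for sym in "ab":
            nxt = nfa_step(cur, sym, n)
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return len(seen)


for n in (1, 4, 8, 12):
    # n, NFA state count, DFA state count
    print(n, n + 1, count_dfa_states(n))
```

The DFA column doubles with every increment of n (at n = 12 it is 4096 states versus 13 NFA states), which is how an innocent-looking rule can end up needing gigabytes.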

The streaming model can be used for protocol analysis as well as pattern matching. There are not many examples in the open-source community, but a good one can be found in Mozilla's GIF parser (function gif_write() in GIF2.cpp). This code parses the GIF format one byte at a time as the image is streamed from the web server, so that it can render the image on the screen before the file has completely downloaded. Since each byte is processed individually, each incoming fragment of data is processed on its own rather than being reassembled first.
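Here is a hypothetical Python sketch in the same spirit (it is not Mozilla's gif_write()): a state machine that parses the 13-byte GIF header and Logical Screen Descriptor one byte at a time. Only the signature and the little-endian width and height fields are interpreted; the class name and state names are illustrative, and the remaining three descriptor bytes are simply skipped.

```python
class GifHeaderParser:
    """Hypothetical byte-at-a-time parser for the GIF header and
    Logical Screen Descriptor (6 + 7 = 13 bytes)."""

    def __init__(self):
        self.state = "SIG"   # SIG -> WIDTH -> HEIGHT -> REST -> DONE
        self.pos = 0         # bytes consumed within the current state
        self.width = 0
        self.height = 0
        self.error = None

    def feed(self, chunk):
        """Consume a fragment of any size; state carries over between calls."""
        for byte in chunk:
            if self.state == "SIG":
                # Expect "GIF87a" or "GIF89a", checked one byte at a time.
                if self.pos < 4:
                    ok = byte == b"GIF8"[self.pos]
                elif self.pos == 4:
                    ok = byte in b"79"
                else:
                    ok = byte == ord("a")
                if not ok:
                    self.state, self.error = "ERROR", "bad signature"
                    return
                self.pos += 1
                if self.pos == 6:
                    self.state, self.pos = "WIDTH", 0
            elif self.state in ("WIDTH", "HEIGHT"):
                # 16-bit little-endian field: low byte first, then high byte.
                value = byte << (8 * self.pos)
                if self.state == "WIDTH":
                    self.width |= value
                else:
                    self.height |= value
                self.pos += 1
                if self.pos == 2:
                    self.state = "HEIGHT" if self.state == "WIDTH" else "REST"
                    self.pos = 0
            elif self.state == "REST":
                # Packed fields, background color index, pixel aspect ratio.
                self.pos += 1
                if self.pos == 3:
                    self.state = "DONE"
            else:
                return  # DONE or ERROR: this sketch ignores further input
```

Feeding the header one byte per call or all at once gives the same width and height, because the parser's position lives in `state` and `pos` rather than in a buffer of earlier bytes.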

The Mozilla GIF parser looks almost identical to the GIF parser I wrote for the Proventia IDS/IPS. Its structure is similar to all the other 200-odd protocol decodes in Proventia, including the SMB and DCE-RPC parsers. These parsers decode the protocols as a stream of bytes.

Since all the logic in Proventia is stream-oriented, it does not actually "reassemble" fragments, it just "reorders" them. When one fragment ends and the next one starts, the decoder continues where it left off as if there were no fragment break. The TCP protocol delivers a series of ordered fragments to the NetBIOS/SMB decode, which itself delivers a series of ordered fragments to the DCE-RPC decode, which delivers a series of ordered fragments to the application decodes on top of DCE-RPC. The simplicity of this approach is why Proventia has had SMB and DCE-RPC "reassembly" in the core engine as far back as 2000, even though the major DCE-RPC vulnerabilities weren't discovered until 2003 (in contrast, Snort added DCE-RPC reassembly in 2006).
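A minimal sketch of this layering, under stated assumptions: the framing below (a 4-byte big-endian length followed by the payload) is hypothetical and simpler than real NetBIOS/SMB framing, and `LengthPrefixedDecoder` and `MessageSizeRecorder` are illustrative names. The lower layer strips its headers and hands the payload upward as zero-copy `memoryview` slices of whatever fragment it was given, so a message split across several fragments arrives at the next layer as several ordered pieces.

```python
class MessageSizeRecorder:
    """Stand-in for the next layer up (e.g. a DCE-RPC decode)."""

    def __init__(self):
        self.current = 0   # payload bytes seen in the current message
        self.sizes = []    # sizes of completed messages
        self.pieces = 0    # number of fragments handed up

    def feed(self, view):
        self.current += len(view)
        self.pieces += 1

    def end_message(self):
        self.sizes.append(self.current)
        self.current = 0


class LengthPrefixedDecoder:
    """Hypothetical framing layer: 4-byte big-endian length, then payload."""

    def __init__(self, upper):
        self.upper = upper     # next decoder up; needs feed() and end_message()
        self.header_left = 4   # header bytes still to read
        self.length = 0        # length accumulated from header bytes so far
        self.body_left = 0     # payload bytes still owed to the upper layer

    def feed(self, chunk):
        view = memoryview(chunk)
        i = 0
        while i < len(view):
            if self.header_left:
                self.length = (self.length << 8) | view[i]
                self.header_left -= 1
                i += 1
                if self.header_left == 0:
                    self.body_left = self.length
                    if self.body_left == 0:
                        self._finish()
            else:
                n = min(self.body_left, len(view) - i)
                self.upper.feed(view[i:i + n])  # slice of the fragment, no copy
                self.body_left -= n
                i += n
                if self.body_left == 0:
                    self._finish()

    def _finish(self):
        self.upper.end_message()
        self.header_left, self.length = 4, 0
```

Feeding the same byte stream whole or one byte at a time yields the same message sizes; only the number of pieces handed upward changes.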

I talked about ASICs in Part 1 of this series. As Chief Scientist of ISS, I had ASIC vendors come to me with proposals to accelerate TCP reassembly and regex pattern-matching. Not only were their proposals slower than our shipping products, but they had a hard time grasping the concepts that (a) TCP reassembly isn't really needed, and (b) their methods of accelerating regex by converting to a DFA can be done in software without their ASIC.

I have talked to engineers at Ironport (an e-mail appliance) and Sidewinder (a firewall). They have indicated that they use the same approach in their products, and like Proventia, they are the fastest in their class of products. Even Microsoft's IIS uses a streaming model. For example, when sending "GET /index.html HTTP/1.0", you can send 5 billion spaces between the "GET" and the "/index.html". This works because Microsoft uses a state machine to parse the incoming bytes from TCP. In contrast, Apache reads in a 16k block of bytes, then backtracks to re-parse the boundary between "GET" and "/index.html".
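A state-machine request-line parser of the kind described above might look like this hypothetical Python sketch (it is not IIS code). Spaces between the method and the URI are consumed by a state that loops on itself, so their count costs no memory; only the method and URI tokens are stored, and the `MAX_TOKEN` cap is an arbitrary value chosen for the sketch.

```python
class RequestLineParser:
    """Hypothetical streaming parser for the start of an HTTP request line."""

    MAX_TOKEN = 8192  # arbitrary per-token cap for this sketch

    def __init__(self):
        self.state = "METHOD"   # METHOD -> SPACES -> URI -> DONE (or ERROR)
        self.method = bytearray()
        self.uri = bytearray()

    def feed(self, chunk):
        """Consume one TCP fragment; state carries over to the next call."""
        for byte in chunk:
            if self.state == "METHOD":
                if byte == 0x20:
                    self.state = "SPACES"
                else:
                    self.method.append(byte)
            elif self.state == "SPACES":
                # Any run of spaces loops here without storing anything.
                if byte != 0x20:
                    self.state = "URI"
                    self.uri.append(byte)
            elif self.state == "URI":
                if byte in (0x20, 0x0D, 0x0A):
                    self.state = "DONE"
                else:
                    self.uri.append(byte)
            if len(self.method) > self.MAX_TOKEN or len(self.uri) > self.MAX_TOKEN:
                self.state = "ERROR"
            if self.state in ("DONE", "ERROR"):
                return
```

Feeding `GET`, then a hundred thousand spaces in 1,000-byte fragments, then `/index.html HTTP/1.0\r\n` leaves `method == b"GET"` and `uri == b"/index.html"`, with memory use independent of the number of spaces.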

Wow, thanks for the series of articles, Robert. It's really nice to read casual musings about somewhat complicated topics! Both you and David break down these things really well. Looking forward to the next one!

So what do you do when you don't even know what protocol it is until after N bytes? Just track all possible protocols, dropping each as you get bytes that eliminate matches?

Yes, that's precisely what you do. In particular, HTTP is pretty messy, so it's kept around as a possible match for a long time. Also, you cannot disambiguate using just the stream going in one direction; you have to also look at the response from the other side.
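A rough Python sketch of that candidate-elimination idea, assuming drastically simplified, hypothetical protocol signatures (real identification needs far more evidence, including the response from the other side). Each protocol stays in the running while at least one of its signatures agrees with every byte seen so far; the bytes themselves are never stored.

```python
# Hypothetical, heavily simplified client-side prefixes for illustration only.
SIGNATURES = {
    "http": [b"GET ", b"POST ", b"HEAD ", b"PUT "],
    "ssh": [b"SSH-"],
    "tls": [b"\x16\x03"],
}


class ProtocolGuesser:
    """Track every candidate protocol, dropping those the input rules out."""

    def __init__(self, signatures=SIGNATURES):
        # For each protocol, the signatures still consistent with the input.
        self.alive = {name: list(sigs) for name, sigs in signatures.items()}
        self.pos = 0  # stream offset of the next byte

    def feed(self, chunk):
        """Consume a fragment; return the sorted names still possible."""
        for byte in chunk:
            for name in list(self.alive):
                # A signature survives if it is already fully matched
                # or agrees with the byte at this offset.
                sigs = [s for s in self.alive[name]
                        if len(s) <= self.pos or s[self.pos] == byte]
                if sigs:
                    self.alive[name] = sigs
                else:
                    del self.alive[name]
            self.pos += 1
        return sorted(self.alive)
```

For instance, a stream starting with `P` immediately eliminates SSH and TLS but keeps HTTP alive, since both `POST` and `PUT` remain possible until the next byte arrives.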

This approach would not handle the TCP overlap cases and retransmission cases.

As I mentioned, the system still has to re-order fragments and otherwise disambiguate them; it just doesn't have to copy fragments into another buffer. That copying of fragments together into another buffer is where the Snort buffer overflows in their TCP, SunRPC, and DCE-RPC parsers came from.
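A hypothetical Python sketch of reordering without a reassembly buffer. The class name, the callback, and the "first data wins" overlap policy are all assumptions (a real IDS may need to mimic the target host's overlap handling), and sequence-number wraparound is ignored. Early segments are held as-is until the gap before them fills, and retransmitted or overlapping bytes are trimmed off with a `memoryview` slice instead of being copied into a merged buffer.

```python
class Reorderer:
    """Hypothetical in-order delivery for one direction of a TCP stream."""

    def __init__(self, initial_seq, deliver):
        self.next_seq = initial_seq  # next byte the upper decoder expects
        self.pending = {}            # seq -> payload, held but never merged
        self.deliver = deliver       # callback into the streaming decoder

    def segment(self, seq, payload):
        if seq + len(payload) <= self.next_seq:
            return  # pure retransmission: every byte was already delivered
        if seq > self.next_seq:
            # Arrived early: hold it, keeping the longest payload for this seq.
            if len(payload) > len(self.pending.get(seq, b"")):
                self.pending[seq] = payload
            return
        self._emit(seq, payload)
        # Drain held segments that now touch or overlap the delivered data.
        while True:
            ready = [s for s in self.pending if s <= self.next_seq]
            if not ready:
                break
            for s in ready:
                held = self.pending.pop(s)
                if s + len(held) > self.next_seq:
                    self._emit(s, held)

    def _emit(self, seq, payload):
        # Trim bytes already delivered ("first data wins"), then pass a
        # zero-copy view of the remainder to the decoder.
        view = memoryview(payload)[self.next_seq - seq:]
        self.next_seq += len(view)
        self.deliver(view)
```

For example, with an initial sequence number of 1000, sending `world` at 1006, then `hello ` at 1000, a retransmitted `hello `, and an overlapping `ld!!` at 1009 delivers `hello `, `world`, and `!!` in order, with each byte handed to the decoder exactly once.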