Tuesday, June 28, 2011

Out of Business Before It Happens

Ethernet frames carry a CRC32 to ensure data integrity. It is often assumed that this CRC is carried all the way across the network, but in reality there are cases where the switch fabric modifies the frame and calculates a new CRC at egress. For example:

L3 routing, to decrement TTL and rewrite MAC DA

VLAN tag insertion/removal

QoS classification and overwrite of DSCP/802.1p bits

2011 market share of switch fabrics with at least one of these features: 100%

Even the ultra-cheap five port switches sold at retail contain a fabric capable of reasonably sophisticated packet handling. The functionality is not exposed in the software for market segmentation reasons, but the hardware can do it.

ASIC designers hate special cases: they add test burden and risk. Because some logic paths must check the CRC at ingress and discard it, the chip will always check the CRC at ingress and discard it. Data inside the chip will be protected somehow, if only to guard against soft errors, but there is no way to know what scheme the designers chose. It might be ECC; it might be a weak parity scheme.

I've worked on a number of different designs for switch fabrics, NICs, forwarding engines, and packet processors. In bringing up these systems I have twice run into hardware bugs which corrupted packet data and then calculated a fresh CRC over the corrupted data. In both of those cases the bug was fixed. Yet if I know of two such examples, how many more products in the market slipped out with hidden data corruption bugs?
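To make the failure mode concrete, here is a minimal Python sketch. It assumes `zlib.crc32` as a stand-in for the Ethernet FCS (both use the IEEE 802.3 CRC-32 polynomial; bit ordering on the wire is ignored), and `fabric_forward` is a hypothetical model of a buggy switch that corrupts a byte internally and then computes a fresh FCS at egress. The application's own CRC, carried inside the payload, is what catches the damage.

```python
import zlib

def fcs(frame: bytes) -> int:
    """Link-layer check: CRC-32 over the frame (a stand-in for the Ethernet FCS)."""
    return zlib.crc32(frame)

def fabric_forward(frame: bytes, flip_byte: int, flip_mask: int) -> tuple[bytes, int]:
    """Hypothetical buggy switch: discards the ingress FCS, corrupts one byte
    internally, then computes a fresh FCS over the corrupted data at egress."""
    corrupted = bytearray(frame)
    corrupted[flip_byte] ^= flip_mask
    corrupted = bytes(corrupted)
    return corrupted, fcs(corrupted)

# The application appends its own CRC-32 to the payload before sending.
payload = b"set replica_count=3"
app_frame = payload + zlib.crc32(payload).to_bytes(4, "big")

# Flip one bit inside the payload portion while the frame crosses the "fabric".
received, received_fcs = fabric_forward(app_frame, flip_byte=16, flip_mask=0x01)

# The receiver's link-layer check compares against the recomputed FCS.
link_ok = fcs(received) == received_fcs

# The end-to-end check compares against the CRC the sender put in the payload.
body, trailer = received[:-4], received[-4:]
app_ok = zlib.crc32(body) == int.from_bytes(trailer, "big")

print(f"link-layer check passed: {link_ok}")   # True: the FCS was recomputed
print(f"application check passed: {app_ok}")   # False: the corruption is caught
```

The link-layer check can never fail in this scenario, because the FCS it verifies was computed after the corruption happened.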

As software developers, what does this mean? It means the application needs to check the integrity of its own data. The network can't be trusted to do it for you.

This is not a theoretical problem. A flipped bit in a control packet caused an Amazon S3 outage in 2008. There were Ethernet CRCs at each hop along the path; the problem happened inside one of the switches or NICs. Amazon added MD5 checks to all control messages to prevent a repeat of the problem. Believing that it only happens at Amazon scale is another way of saying you expect to be out of business before it happens to you. Statistically, it's only a matter of time.
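As an illustration of the kind of check Amazon describes, here is a minimal sketch of MD5-protected control messages. The format (a JSON body followed by its 16-byte MD5 digest) and the names `seal` and `unseal` are hypothetical; this is not Amazon's actual message format. The key property is that a receiver rejects a damaged message instead of acting on it.

```python
import hashlib
import json

DIGEST_LEN = 16  # size of an MD5 digest in bytes

def seal(message: dict) -> bytes:
    """Hypothetical format: serialize a control message and append its MD5 digest."""
    body = json.dumps(message, sort_keys=True).encode("utf-8")
    return body + hashlib.md5(body).digest()

def unseal(blob: bytes) -> dict:
    """Verify the trailing MD5 digest and return the decoded message.

    Raises ValueError if the digest does not match, so a corrupted
    message is rejected rather than acted upon."""
    if len(blob) < DIGEST_LEN:
        raise ValueError("message too short to contain a digest")
    body, digest = blob[:-DIGEST_LEN], blob[-DIGEST_LEN:]
    if hashlib.md5(body).digest() != digest:
        raise ValueError("control message failed MD5 check")
    return json.loads(body.decode("utf-8"))

wire = seal({"node": "storage-17", "state": "healthy"})
print(unseal(wire))

damaged = bytearray(wire)
damaged[5] ^= 0x04  # simulate a single flipped bit in the body
try:
    unseal(bytes(damaged))
except ValueError as err:
    print(f"rejected: {err}")
```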

Applications need end-to-end integrity checking. CRC32 carried as part of the payload is fine for this, and is quite efficient on modern CPUs. MD5 and SHA1 are also good choices. MD5 is no longer suitable for security purposes, but for data integrity it is quite strong. SSL guarantees the integrity of data crossing the Internet, but traffic within the datacenter must also be protected somehow. Given the increasingly heroic measures disks must take for ECC, storing the checksums alongside the data is a good idea too.
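One way to carry a CRC32 with the data, whether on the wire or on disk, is a simple record format. The sketch below assumes a hypothetical layout of an 8-byte header (payload length and CRC-32 of the payload, both big-endian) followed by the payload; `write_record` and `read_records` are illustrative names, not an existing library API.

```python
import io
import struct
import zlib

# Hypothetical header layout: payload length, CRC-32 of payload (big-endian u32s).
HEADER = struct.Struct(">II")

def write_record(out, payload: bytes) -> None:
    """Write one record: header followed by the payload bytes."""
    out.write(HEADER.pack(len(payload), zlib.crc32(payload)))
    out.write(payload)

def read_records(inp):
    """Yield each payload, raising ValueError on truncation or CRC mismatch."""
    while True:
        header = inp.read(HEADER.size)
        if not header:
            return  # clean end of stream
        if len(header) != HEADER.size:
            raise ValueError("truncated record header")
        length, expected = HEADER.unpack(header)
        payload = inp.read(length)
        if len(payload) != length:
            raise ValueError("truncated record payload")
        if zlib.crc32(payload) != expected:
            raise ValueError("record failed CRC-32 check")
        yield payload

buf = io.BytesIO()
for item in (b"alpha", b"beta", b"gamma"):
    write_record(buf, item)
buf.seek(0)
print(list(read_records(buf)))  # [b'alpha', b'beta', b'gamma']
```

In this sketch the length field sits outside the CRC, so a corrupted length shows up only indirectly, as a truncation or a CRC mismatch; a stricter format might checksum the header as well.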

The most important thing is to do something.

footnote: this blog contains articles on a range of topics. If you want more posts like this, I suggest the Ethernet label.