Tuesday, November 10, 2009

Ethernet Integrity, or the Lack Thereof

We don't need our own integrity check. The TCP checksum is pretty weak, but Ethernet uses a ludicrously strong CRC. Even if you don't trust the TCP checksum, Ethernet will detect any errors.

Let's dig into this a bit, shall we?

A modern switch fabric chip is designed for both L2 ethernet switching and L3 IP routing. The additional logic for IP routing adds relatively little area in modern silicon technologies, while not having a routing capability would put them at a competitive disadvantage. Essentially all ethernet fabric chips, even those inside relatively cheap L2 switches, have the design features to route IPv4 traffic to at least a basic degree.

When a packet arrives at the input port (A) its CRC will be checked and the packet discarded if corrupt. If the packet is destined to the router's MAC address, its destination IP address will be looked up for L3 routing (C). An L3 router modifies the packet as part of its function, by decrementing the IP TTL and replacing the L2 destination with that of the next hop. Therefore a fresh CRC has to be regenerated at egress.

Even if the packet is to be switched at L2 (B), there are cases where the packet is modified. For example server machines and switch uplinks often handle multiple vlans, so their ports will be configured for tagging (D). Addition of the vlan tag requires the packet CRC to be recalculated on egress (E).

The point of this description? There are numerous cases at both L2 and L3 where a packet CRC cannot be preserved through the switch and will need to be regenerated at egress. ASIC designers hate special cases, as they add logic and test cases to the design. Because there are cases where the CRC must be regenerated, modern switch fabrics always regenerate the CRC at egress. Even if the packet has not been modified, even if the ingress CRC could have been preserved, it is discarded at ingress and regenerated at egress.

It bears repeating that this is a function of the chip, not the specific product. Even the tiny ethernet switches sold for practically nothing at retail use chips which contain basic vlan tagging and IP routing features (even if that product doesn't use them), and regenerate the CRC on every packet.
The fabric chip they use wasn't specifically designed for such low cost switches, there is not enough profit to justify the effort. In addition to simple L2 switches those chips can be used to build NAT appliances, as the ethernet fan-out for small wireless access points, in DSL and cable routers, for low end WAN routers, etc. When only basic L2 switching is desired these fabrics can function completely standalone without a management CPU, reducing BOM cost to the bare minimum. Addition of a CPU allows the basic L3 functions to be used in the more featureful (but still low end) products.

Impact

What does this mean? The internal memories and logic paths within the switch are not covered by the ethernet CRC, it does not provide end-to-end protection. The switch might implement ECC over the whole path, but this is not common. The packet buffers are generally large enough to justify ECC, but miscellaneous FIFOs are more likely to have simple parity and logic elements often have no protection at all. It only takes one soft error to corrupt the packet contents, and then a fresh CRC will be calculated over the corrupted data.

If you care about the data you send over the network, you should include your own integrity check at the application level. This is another good argument for using SSL: not only do you protect privacy by encrypting the data, you also get a strong end-to-end integrity check.