A few days ago, I ran into an ugly bug on different Scientific Linux 6.3 hosts (therefore this should also affect RHEL 6.3 and CentOS 6.3). The network hangs while the system itself is up, running and responsive. “Just” no network. Restarting the affected network interfaces is not enough, only a complete reboot brings the Intel 82574L-based network cards back to life (those NICs are onBoard on the Supermicro X9SCM-F and X8SIL mainboards of the affected hosts, so I can't simply change them). The logs showed entries like the following:

It seems that the ASPM of the Intel 82574L is broken. The corresponding Linux driver “e1000” therefore has this chip on its ASPM blacklists and disables it when the systems boots. However, there is some side effect which re-enabled the NIC'S ASPM state L1 after a network connection was established. This does not happen on all Linux flavors and kernel versions, but it happens at least on Scientific 6.3 with kernel 2.6.32-279.19.1.

Workaround: disable the NIC's ASPM after the system boots

A quick workaround is to manually disable the NIC'S ASPM after the system booted and the network “stabilized” (e.g. after a few minutes). The following command disables ASPM for a device:

setpci -s <ID-of-device> CAP_EXP+10.b=40

You can use lspci -vnn to get the device ID (first number of the line, 02:00.0 in the following example output):