PCPU locked up on Cisco UCS

ESXi 5.5 Update 2 is a stable version, but I got a PSOD on one UCS blade a few days ago. It scared me because I hit a serious bug when I upgraded ESXi from 5.1 to 5.5 Update 1 last year (see ESXi 5.5 and Emulex OneConnect 10Gb NIC for details), which caused a dozen virtual machines to crash over and over again. I bet I’m gonna die if it happens again. 🙂

The error message on the PSOD was “PCPU 20 locked up. Failed to ack TLB invalidate”. I checked the ESXi logs after rebooting. It looked like the server had crashed suddenly, without any error or warning messages beforehand. I suspected it was not a software-layer issue. Eventually I found that this CPU lockup problem occurs on Cisco UCS, and the root cause is a bug in the fnic driver. Please refer to CSCut64613 for details. Basically, you need to update the fnic driver to 1.6.0.17a.
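
If you have many blades to check, here is a minimal Python sketch that compares an installed fnic version string against 1.6.0.17a. You can read the installed version on the host, for example from the output of `esxcli software vib list`. The function names and the assumption that the version may carry a build suffix after a "-" are mine, not from the Cisco bug note, so treat this as an illustration only.

```python
import re

# Driver version named in this post as containing the fix.
FIXED_FNIC = "1.6.0.17a"


def version_key(version):
    """Turn '1.6.0.17a' into ((1, ''), (6, ''), (0, ''), (17, 'a')) for comparison."""
    parts = []
    for piece in version.split("."):
        match = re.fullmatch(r"(\d+)([a-z]*)", piece)
        if match is None:
            raise ValueError("unexpected version component: %r" % piece)
        parts.append((int(match.group(1)), match.group(2)))
    return tuple(parts)


def needs_update(installed, fixed=FIXED_FNIC):
    """Return True if the installed fnic version is older than the fixed one."""
    # Assumption: anything after the first '-' is a build suffix, so drop it.
    base = installed.split("-", 1)[0]
    return version_key(base) < version_key(fixed)


if __name__ == "__main__":
    # Hypothetical version strings, for illustration only.
    for candidate in ["1.6.0.12", "1.6.0.17", "1.6.0.17a", "1.6.0.24"]:
        print(candidate, "needs update:", needs_update(candidate))
```

A plain string comparison would rank "1.6.0.9" above "1.6.0.17a", which is why the sketch compares the numeric parts as integers and only then looks at the trailing letter.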