That dhclient bug

[update 1/19/2012: The patch to fix this bug is now in the repos. If you haven’t already installed that update, it will be there when you next do online updates. My thanks to the openSUSE team for their work on this.]

openSUSE 12.1 gives me a choice of two DHCP clients: “dhcpcd” and “dhclient”. The “dhclient” one comes from ISC.

When you configure your network with “ifup” settings, the default is to use “dhcpcd”, though this can be changed to use “dhclient”. If you are using NetworkManager, then it only uses dhclient.

First a little background. The acronym “DHCP” stands for “Dynamic Host Configuration Protocol”. It is an extension of the older “bootp” protocol. Its purpose is to simplify the configuration of network devices. When a computer uses DHCP, it sends out a network request for configuration information. Initially, it sets its own IP address to the bogus 0.0.0.0, for sending out a broadcast request. If all works well, a DHCP server will see the request and reply with the IP address and other network configuration information that the system should be using.
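To make that request a little more concrete, here is a rough sketch of what a minimal DHCPDISCOVER message looks like on the wire, following the BOOTP layout that DHCP inherited. It only builds the bytes and does not send anything. The function name, the MAC address and the transaction id in the usage note are made up for illustration; a real client such as “dhclient” adds many more options.

```python
import struct

# Marks the start of the DHCP options area after the fixed BOOTP header.
DHCP_MAGIC_COOKIE = bytes([0x63, 0x82, 0x53, 0x63])


def build_dhcp_discover(mac: bytes, xid: int) -> bytes:
    """Build a minimal DHCPDISCOVER message (illustrative only, never sent)."""
    if len(mac) != 6:
        raise ValueError("expected a 6-byte Ethernet address")
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,          # op: BOOTREQUEST (client to server)
        1,          # htype: Ethernet
        6,          # hlen: length of the hardware address
        0,          # hops: filled in by relay agents
        xid,        # transaction id chosen by the client
        0,          # secs
        0x8000,     # flags: ask the server to reply by broadcast
        bytes(4),   # ciaddr: 0.0.0.0, the client has no address yet
        bytes(4),   # yiaddr: filled in by the server in its reply
        bytes(4),   # siaddr
        bytes(4),   # giaddr: filled in by a relay agent, if any
        mac,        # chaddr: struct pads this to 16 bytes with zeros
        b"",        # sname: unused here, 64 zero bytes
        b"",        # file: unused here, 128 zero bytes
    )
    # Option 53 (message type), length 1, value 1 = DISCOVER; 255 ends the list.
    options = bytes([53, 1, 1, 255])
    return header + DHCP_MAGIC_COOKIE + options
```

For example, `build_dhcp_discover(bytes.fromhex("020000000001"), 0x12345678)` gives a 244-byte message: the 236-byte fixed header, the 4-byte cookie, and 4 bytes of options.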

DHCP does not actually assign your computer an IP address. Rather, it gives your computer a temporary lease on an IP address. Your computer can use that until the lease expires. Normally, sometime before the lease has expired, a computer will send out a request to renew the lease for a longer time period. If the lease is not renewed, it is typically returned to the pool of available IP addresses for leasing to other computers.
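As a rough illustration of “sometime before the lease has expired”: RFC 2131 gives default timers of half the lease time for the first renewal attempt (T1) and seven eighths of it for the fallback rebinding attempt (T2). A server can override both, so treat the sketch below as the defaults only. The function name is my own.

```python
from typing import Tuple


def renewal_times(lease_seconds: int) -> Tuple[int, int]:
    """Return the default (T1, T2) timers, in seconds after the lease starts.

    T1 is when the client first asks its own server to renew;
    T2 is when it gives up on that server and asks any server.
    """
    t1 = lease_seconds // 2        # 0.5 * lease
    t2 = lease_seconds * 7 // 8    # 0.875 * lease
    return t1, t2
```

With a one-day lease of 86400 seconds, `renewal_times(86400)` returns `(43200, 75600)`, so the client would first try to renew after 12 hours and start rebinding after 21 hours.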

I have been preferring “dhclient” even for networks configured with “ifup”. There are two reasons for this preference, though I am not sure whether they still apply.

When I was running openSUSE 11.3 on the desktop system at work, I noticed that “dhcpcd” was not properly renewing leases. The logs showed the lease running out, the network being reset, and then another request being sent out to reconfigure the network. I’m not sure why it did that, but it looked as if it was not properly handling a setup where the initial DHCP request is made through a DHCP relay agent on the LAN, but the actual DHCP server is elsewhere. Switching to “dhclient” completely solved that problem.

When testing WiFi using “ifup” configuration (I think this was with openSUSE 11.4, but it could have been with 11.3), I ran into a different problem with “dhcpcd”. I configured one WiFi network, and had that running. Then I tried to switch to a different network by editing the settings file and restarting the network. According to the “ifconfig” command, the WiFi connection was up. But it didn’t actually work. On closer investigation, it turned out that the wireless card was still configured with the IP address from the first WiFi network, and that happened to be inappropriate for the second. As best I can tell, “dhcpcd” was caching the IP lease information, and then continuing to use that for the new network because the lease had not yet expired. Switching to “dhclient” also fixed that problem.

After installing openSUSE 12.1 on my home desktop system, I naturally switched to “dhclient” based on this past experience. Everything worked well at first. But when I rebooted a few days later, there was no network. The logs showed a strange DHCP failure.

I first tried using NetworkManager. Then I switched back to using “dhcpcd”. Both solutions got my network running again.

It turned out that “/sbin/dhclient-script” had been corrupted: its contents had been overwritten with an informational message. In normal use, “dhclient” calls this script, though with NetworkManager a different script is used.

This has been reported as Bug 732910. And it is a strange bug. Here’s what seems to be happening:

Just before running “dhclient-script”, the “dhclient” process for the connection is closing file descriptor 2. That’s the file descriptor for stderr and should never be closed. Where it is being closed has not yet been tracked down.

File descriptor 2 is being reopened for reading “/sbin/dhclient-script”. That probably comes from opening the script ready to run it: open() normally returns the lowest-numbered available file descriptor, which would be 2 once that had been closed.
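The descriptor reuse is easy to reproduce. The sketch below is only a demonstration of the general rule, not of what “dhclient” actually does internally: it starts a child Python process, closes descriptor 2 in it, opens a file read-only, and reports which descriptor number came back. The helper name and the child snippet are my own.

```python
import subprocess
import sys

# Runs in a child process so that closing stderr cannot disturb the caller.
CHILD_CODE = """
import os, sys
os.close(2)                              # simulate the stray close of stderr
fd = os.open(sys.argv[1], os.O_RDONLY)   # gets the lowest free descriptor
print(fd)
"""


def fd_after_closing_stderr(path: str) -> int:
    """Return the descriptor number a child gets for `path` after closing fd 2."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, path],
        stdin=subprocess.DEVNULL,    # make sure descriptors 0 and 1 are in use
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        check=True,
        text=True,
    )
    return int(result.stdout.strip())
```

Calling `fd_after_closing_stderr` on any readable file returns 2 when descriptors 0 and 1 are both open, which is the situation described above.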

Another script, called from “dhclient-script”, writes a message to “/dev/stderr”, which is really “/dev/fd/2”. Now you would think that because file descriptor 2 is opened only for reading, that would fail. But it seems that the Linux kernel has implemented “/dev/fd/n” in a strange way that allows writing. So the message overwrites the script, which then fails when it is next used.
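The reason writing is allowed is that, on Linux, “/dev/fd” is a symbolic link to “/proc/self/fd”, and opening an entry there opens the underlying file afresh instead of duplicating the existing descriptor. The read-only mode of the original descriptor therefore does not carry over; only the file's permissions matter, and a root process passes those. The sketch below reproduces the effect on a temporary file rather than on the real script. It is Linux-only, the function name is my own, and it assumes the message was written with a truncating redirect, which I have not confirmed for the actual bug.

```python
import os
import tempfile


def overwrite_via_dev_fd(original: bytes, message: bytes) -> bytes:
    """Open a file read-only, write to it through /dev/fd/N, return the result."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(original)
        path = tmp.name
    fd = os.open(path, os.O_RDONLY)      # stands in for the read-only fd 2
    try:
        # Roughly what `echo message > /dev/fd/N` does: open for writing,
        # truncating the target. On Linux this reopens the file itself.
        with open(f"/dev/fd/{fd}", "wb") as alias:
            alias.write(message)
        with open(path, "rb") as check:
            return check.read()
    finally:
        os.close(fd)
        os.unlink(path)
```

Starting from a small shell script as `original`, the returned contents are just the message, which matches the kind of corruption seen in “/sbin/dhclient-script”.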

It turns out that there is a workaround. If you turn on DHCP debugging, then “dhclient-script” reassigns file descriptor 2 to a log file, and that saves the day. Of course, if you have already been hit by this bug, then it’s too late, for “dhclient-script” will already be corrupted. You will need to reinstall “dhclient” or copy the correct script from another system to get it working again.