Cisco Bug: CSCus91865 - DHCP server runs out of memory

Last Modified

Dec 16, 2016

Products (1)

Cisco Network Registrar

Known Affected Releases

7.2

Description (partial)

Symptom:
<p>
Some customers have seen the DHCP server abort itself and generate a core file, with DHCP server log messages indicating that the process aborted because it was unable to create a thread. Before the abort, memory usage may increase significantly (by 100 MB or more) after a reload. The problem occurs on Linux (it has not been reported or found on Solaris or Windows), typically affects customers with fairly large configurations (where the DHCP server uses 2 GB or more of memory), and is triggered by a reload (after anywhere from a few to hundreds of reloads). Please refer to CSCus91865 for more details and the latest information regarding this issue.
</p>
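<p>
One way to watch for the memory growth described above is to compare the process's memory counters before and after a reload. The following is a minimal sketch, assuming a Linux host where /proc/&lt;pid&gt;/status is readable; the helper names parse_status and growth_kb are hypothetical and are not part of Network Registrar.
</p>

```python
def parse_status(text):
    """Extract VmSize and VmRSS (both in kB) and Threads from the text
    of /proc/<pid>/status. Fields that are absent are simply omitted."""
    wanted = {"VmSize", "VmRSS", "Threads"}
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in wanted:
            # Lines look like "VmSize:   2097152 kB" or "Threads:  12".
            values[key] = int(rest.split()[0])
    return values


def growth_kb(before, after, field="VmSize"):
    """Return how much a counter grew between two snapshots."""
    return after[field] - before[field]
```

<p>
A snapshot can be taken with parse_status(open(f"/proc/{pid}/status").read()) before and after each reload. Growth on the order of 100 MB (about 102400 kB) per reload would be consistent with the symptom described here, although it does not by itself confirm this bug.
</p>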
<p>
When the server aborts itself because it was unable to create a thread or has otherwise run out of memory, cnrservagt automatically starts a new DHCP server process. Thus, the impact to most customers is:
<ol>
<li>Slightly longer reload times.</li>
<li>Large core files (typically 3.5 GB to just over 4 GB) in the /opt/nwreg2/local directory. These must be removed periodically to avoid running out of disk space. Whether and how these core files are created depends on the system settings (see the core(5) man page).</li>
<li>Occasionally, the server will take a long time while reloading before aborting (the server is found to be using 100% CPU on one processor - spending most of its time in memory allocation system calls).</li>
</ol>
</p>
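<p>
Because the core files must be cleaned up periodically, it can help to list them with their sizes before deciding what to remove. The sketch below only reports files and deletes nothing. It assumes that core file names start with "core", which depends on the system's core file settings and may not hold on a given host; the function names are hypothetical.
</p>

```python
import os
import time


def find_core_files(directory, min_age_days=0.0, prefix="core"):
    """Return a sorted list of (path, size_bytes) for regular files in
    `directory` whose name starts with `prefix` and whose last
    modification is at least `min_age_days` old."""
    cutoff = time.time() - min_age_days * 86400
    found = []
    for entry in os.scandir(directory):
        if not entry.is_file(follow_symlinks=False):
            continue
        if not entry.name.startswith(prefix):
            continue
        info = entry.stat(follow_symlinks=False)
        if info.st_mtime <= cutoff:
            found.append((entry.path, info.st_size))
    return sorted(found)


def total_size_gib(files):
    """Sum the sizes of (path, size_bytes) pairs and convert to GiB."""
    return sum(size for _, size in files) / 2**30
```

<p>
For example, find_core_files("/opt/nwreg2/local", min_age_days=7) would list candidate core files older than a week, and total_size_gib on the result shows how much disk space they occupy. Review the list before removing anything, since the name prefix is only a heuristic.
</p>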
Conditions:
<p>
Work with Red Hat on this issue determined that it results from the interaction between the behavior of the glibc MALLOC library and the pattern of memory allocations and thread usage within the DHCP server; the two do not work well together.
</p>
<p>
The MALLOC library uses the concept of ARENAs (memory pools) to improve performance by reducing the need for locks and lowering lock contention. However, at times the ARENAs are reused differently than they were earlier in the life of the process, so memory held by an ARENA is not necessarily reused or freed to the system. This can result in many ARENAs holding large amounts of memory, increasing the memory required for the DHCP server process.
</p>
<p>
Eventually most of the memory space is in use (or what is still available is fragmented). When the server then asks the system to create a thread, the system is unable to obtain the necessary contiguous mappable space for it, so thread creation fails; the server considers this "fatal" and (by design) aborts itself.
</p>
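<p>
The fragmentation described above can be roughly illustrated by looking for the largest unmapped hole in a process's address space. This is a minimal sketch, assuming a Linux host and the standard /proc/&lt;pid&gt;/maps line format ("start-end perms ..."); the function names are hypothetical. It is only an approximation, since the kernel may not be able to use every hole between mappings.
</p>

```python
def parse_maps(text):
    """Parse the text of /proc/<pid>/maps into a sorted list of
    (start, end) address pairs, one per mapping."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The first field is the address range, e.g. "08048000-08100000".
        start_hex, _, end_hex = fields[0].partition("-")
        regions.append((int(start_hex, 16), int(end_hex, 16)))
    return sorted(regions)


def largest_gap(regions):
    """Return (size_bytes, start_address) of the largest unmapped hole
    between adjacent mappings, or (0, 0) if there is none."""
    best = (0, 0)
    prev_end = None
    for start, end in regions:
        if prev_end is not None and start - prev_end > best[0]:
            best = (start - prev_end, prev_end)
        prev_end = end if prev_end is None else max(prev_end, end)
    return best
```

<p>
Calling largest_gap(parse_maps(open(f"/proc/{pid}/maps").read())) gives a rough upper bound on the contiguous space still available. A largest gap smaller than the stack size requested for a new thread would be consistent with the thread creation failure described here.
</p>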
<p>
This is known to occur on Red Hat Enterprise Linux (RHEL) / CentOS 5.x with Network Registrar 8.2 and earlier. It can also occur on RHEL / CentOS 6.x with Network Registrar 8.3.
</p>