Agent-Free Systems Management

Summary

On server-class systems, remotely monitoring and managing hardware health and configuration for a large number of systems is crucial. One important component of this systems management solution on each server is the Service Processor. System administrators and monitoring/configuration software (such as Nagios) connect to the Service Processor via shared or dedicated management networks.

The information provided by the Service Processor is mostly independent of the operating system running on the server. Systems management software installed on the operating system can provide a richer set of systems management functionality overall. On Linux, such software is specific to the server vendor and may also be proprietary. It can also be bulky, and it must be validated and managed like any other application.

We can envision an ideal systems management solution in which the Service Processor and the operating system together “just work”, without vendor-specific (and sometimes proprietary) software and without a major loss of features.

The goal of this feature is to substitute a native implementation for some of the important functionality of the systems management software that is usually installed on the operating system. This will also put standards already in use by Service Processors, such as IPMI and WSMAN, to better use.

Example on Dell PowerEdge servers: to redirect OIDs under .1.3.6.1.4.1.674.10892.2 to the service processor, add this line to /etc/snmpd.conf:

proxy -v2c -Os -c public <Serv_Proc_IP> .1.3.6.1.4.1.674.10892.2
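The redirection entry in the Dell example can be generated rather than edited by hand. The following is a minimal sketch that mirrors the directive format of that example; the function names `snmp_proxy_line` and `ensure_directive` and the address 192.0.2.10 are hypothetical and are not part of this proposal.

```python
def snmp_proxy_line(host, oid, community="public", version="2c"):
    """Build an snmpd.conf proxy directive in the format of the Dell example."""
    # Normalise the OID so that it always carries a leading dot.
    if not oid.startswith("."):
        oid = "." + oid
    return f"proxy -v{version} -Os -c {community} {host} {oid}"


def ensure_directive(conf_text, directive):
    """Return conf_text with the directive appended, unless already present."""
    existing = [line.strip() for line in conf_text.splitlines()]
    if directive in existing:
        return conf_text  # nothing to do, keeps repeated runs idempotent
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + directive + "\n"


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address standing in for <Serv_Proc_IP>.
    line = snmp_proxy_line("192.0.2.10", ".1.3.6.1.4.1.674.10892.2")
    print(ensure_directive("rocommunity public\n", line), end="")
```

Keeping the append step idempotent matters because a start-up script would run on every boot and should not accumulate duplicate entries.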

Purpose: One-to-many management consoles can launch the service processor management console by retrieving its URL from an OS-based agent. Moving this functionality into the OS enables the same feature without installing an additional application in the OS.

Retrieve IP address and URL of service processor and expose them via

standard DMTF name-space

environment variable for privileged users

Needs to be dynamic: any change to the service processor IP address/URL should be reflected on the host OS and in the values returned when queried via wsman.
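One way to obtain the service processor address is to parse the output of `ipmitool lan print`. The sketch below assumes the usual "Key : Value" layout of that output and assumes the management console is reachable over HTTPS at the same address; the function names, the sample text and the URL scheme are illustrative assumptions rather than part of this proposal.

```python
def parse_lan_print(output):
    """Return the value of the 'IP Address' field, or None if it is absent."""
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        # Match the exact key so 'IP Address Source' is not picked up.
        if sep and key.strip() == "IP Address":
            return value.strip()
    return None


def console_url(ip, scheme="https"):
    """Build a console URL from the address (the scheme is an assumption)."""
    return f"{scheme}://{ip}/"


def address_changed(previous, current):
    """Report whether the published address needs to be refreshed."""
    return current is not None and current != previous


if __name__ == "__main__":
    # Sample text in the style of `ipmitool lan print`, using a documentation address.
    sample = (
        "IP Address Source       : Static Address\n"
        "IP Address              : 192.0.2.10\n"
        "Subnet Mask             : 255.255.255.0\n"
    )
    ip = parse_lan_print(sample)
    if address_changed(None, ip):
        print(console_url(ip))
```

Re-running the parse periodically and comparing against the last published value is one simple way to meet the requirement that the exposed address stays current.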

Purpose: Many management consoles and tools manage hardware via WS-MAN. This requires the addition of a WMI provider from the hardware vendor. Placing a WS-MAN redirection to the service processor’s WS-MAN stack in the OS enables the same feature without installing an additional application in the OS.

Benefit to Fedora

Fedora users whose servers contain Service Processors do not have to install additional software dedicated to systems management, yet can still expect standard pieces of information to be available remotely.

Assists with debugging system failures (panic, hang, etc.) remotely.

Scope

The new features will require:

Automated loading of the ipmi driver where service processor hardware is available.

Contingency: OpenIPMI already has a systemd start-up script that loads the drivers when enabled.

One start-up script that will run after ipmi drivers have loaded to:

fetch the service processor IP address/URL

set the OS name and version in the service processor

set up snmpd.conf for redirection

A configuration file that accompanies the start-up script that contains:

snmpd OID of the service processor (will differ for each OEM)
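The "set OS name, version" step of the start-up script could look roughly like the sketch below. It assumes that the OS identity is read from /etc/os-release and that the installed ipmitool supports the `mc setsysinfo os_name` subcommand; neither is specified by this proposal, and the function names are hypothetical.

```python
def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dictionary."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be quoted; drop the surrounding quotes.
        info[key.strip()] = value.strip().strip('"').strip("'")
    return info


def os_info_commands(info):
    """Return the ipmitool invocations that would publish the OS identity.

    Assumption: `ipmitool mc setsysinfo os_name <string>` is available.
    """
    name = info.get("NAME", "Linux")
    version = info.get("VERSION_ID", "")
    os_name = f"{name} {version}".strip()
    return [["ipmitool", "mc", "setsysinfo", "os_name", os_name]]


if __name__ == "__main__":
    with open("/etc/os-release") as handle:
        commands = os_info_commands(parse_os_release(handle.read()))
    for command in commands:
        # Printed rather than executed; a real script could pass each
        # list to subprocess.run(command, check=True) as root.
        print(" ".join(command))
```

Returning the commands as argument lists keeps the parsing logic testable on machines without a service processor.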

systemd already has support for the hardware watchdog, but we will require that the ipmi_watchdog driver is loaded and does not conflict with iTCO_wdt on systems that have both kinds of watchdog hardware.

This is not Dell specific and will work on any system with an IPMI-compliant Service Processor.

Patch /usr/libexec/openipmi-helper to set systemd watchdog if /etc/sysconfig/ipmi:IPMI_WATCHDOG=yes

Support for IPMI (driver and freeipmi) in Anaconda

systemd already supports hardware watchdog.
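The watchdog check proposed for /usr/libexec/openipmi-helper can be illustrated as follows. The helper itself is a separate script, so this is only a sketch of the decision logic: it reads the IPMI_WATCHDOG setting from sysconfig-style text and, when it is "yes", produces a systemd drop-in that sets `RuntimeWatchdogSec=`. The function names and the 30-second timeout are assumptions made for illustration.

```python
def sysconfig_enabled(text, key="IPMI_WATCHDOG"):
    """Return True if the last assignment of key in the text is 'yes'."""
    value = None
    for line in text.splitlines():
        # Drop trailing comments, then look for KEY=value.
        line = line.split("#", 1)[0].strip()
        name, sep, rest = line.partition("=")
        if sep and name.strip() == key:
            value = rest.strip().strip('"').strip("'")
    return value is not None and value.lower() == "yes"


def watchdog_dropin(seconds=30):
    """Build a systemd manager drop-in enabling the runtime watchdog.

    The timeout is a hypothetical default, not a value from the proposal.
    """
    return f"[Manager]\nRuntimeWatchdogSec={seconds}\n"


if __name__ == "__main__":
    with open("/etc/sysconfig/ipmi") as handle:
        if sysconfig_enabled(handle.read()):
            print(watchdog_dropin(), end="")
```

Taking the last assignment mirrors how a shell would evaluate the file when it is sourced.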

How To Test

Install Fedora on test machine with service processor

Publish OS information to Service Processor

Service Processor should provide the OS Version and Name via various supported interfaces

Heartbeat to Service Processor

With the watchdog daemon configured, a kernel panic or system crash should result in the system rebooting after the set time. A snapshot of the crash and/or an entry in the SEL should be recorded.

Retrieve log from Service Processor

syslog should contain IPMI SEL events logged by ipmievd

Support for redirection of SNMP

After configuration of /etc/snmpd.conf, SNMP queries to Fedora for the Service Processor OID should succeed and return the same values that would otherwise be retrieved via the Service Processor's SNMP agent.

Include IPMI support in anaconda

During install time, we should have access to ipmitool or freeipmi commands that can be used via kickstart's pre-install section.

Contingency Plan

Documentation

Release Notes

Better management of Fedora systems remotely via the Service Processor

On systems that contain IPMI-compliant Service Processors, it is now possible to have closer integration of the OS and the Service Processor without the need for third-party software. This will enable better remote management of the system.