Oracle Blog

Reflections on OS integration

Sunday Nov 09, 2008

It's hard to believe that this day has finally come. After more
than two and a half years, our first Fishworks-based product has been
released. You can keep up to date with the latest info at the
Fishworks blog.

For my first technical post, I'd thought I'd give an introduction to
the chassis subsystem at the heart of our hardware integration strategy. This
subsystem is responsible for gathering, cataloging, and presenting a unified
view of the hardware topology. It
underwent two major rewrites (one by myself and one by Keith) but the
fundamental design has remained the same. While it
may not be the most glamorous feature (no one's going to purchase a box
because they can get model information on their DIMMs), I found it an
interesting cross-section of disparate technologies and awash in subtle
complexity. You can find a video of myself talking about and
demonstrating this feature
here.

libtopo discovery

At the heart of the chassis subsystem is the FMA topology as
exported by
libtopo.
This library is already
capable of enumerating hardware in a physically meaningful manner, and
FMRIs (fault managed resource identifiers) form the basis of
FMA fault diagnosis. This alone provides us the following basic
capabilities:

Much of this requires platform-specific XML files, or leverages IPMI
behind the scenes, but this minimal integration work is common to Solaris. Any
platform supported by Solaris is supported by the FishWorks software
stack.

Additional metadata

Unfortunately, this falls short of a complete picture:

No way to identify absent CPUs, DIMMs, or empty PCI slots

DIMM enumeration not supported on all platforms

Human-readable labels often wrong or missing

No way to identify complete PCI cards

No integration with visual images of the chassis

To address these limitations (most of which lie outside the purview of
libtopo), we leverage additional metadata for each supported
chassis. This metadata identifies all physical slots (even those that
may not be occupied), cleans up various labels, and includes visual
information about the chassis and its components. And
we can identify physical cards based on devinfo properties extracted
from firmware and/or the pattern of PCI functions and their attributes
(a process worthy of its own blog entry). Combined with libtopo, we have
images that we can assemble into a
complete view based on the current physical layout, highlight
components within the image, and respond to user mouse clicks.

Supplemental information

However, we are still missing many of
the component details. Our goal is to be able to provide complete
information for every FRU on the system. With just libtopo, we can get
this for disks but not much else. We need to look to alternate
sources of information.

kstat

For CPUs, there is a rather rich set of information available via
traditional kstat interfaces. While we use libtopo to identify CPUs
(it lets us correlate physical CPUs), the
bulk of the information comes from kstats. This is used to get model,
speed, and the number of cores.

libdevinfo

The device tree snapshot provides additional information for PCI
devices that can only be retrieved by private driver interfaces.
Despite the existence of a VPD (Vital Product Data)
standard, effectively no vendors implement it. Instead, it is read by some firmware-specific
mechanism private to the driver. By exporting these as properties in
the devinfo snapshot, we can transparently pull in dynamic FRU
information for PCI cards. This is used to get model, part, and
revision information for HBAs and 10G NICs.

IPMI

IPMI (Intelligent Platform Management Interface) is used to
communicate with the service processor on most enterprise class
systems. It is used within libtopo for power supply and fan
enumeration in libtopo as well as LED management. But IPMI
also supports FRU data, which includes a lot of juicy tidbits
that only the SP knows. We reference this FRU information directly to
get model and part information for power supplies and DIMMs.

SMBIOS

Even with IPMI, there are bits of information that exist only in SMBIOS,
a standard is supposed to provide information about the physical
resources on the system. Sadly, it does not provide enough information
to correlate OS-visible abstractions with their underlying physical
counterparts. With metadata, however, we can use SMBIOS to make this
correlation. This is used to enumerate DIMMs on platforms not
supported by libtopo, and to supplement DIMM information with data
available only via SMBIOS.

Metadata

Last but not least, there is chassis-specific metadata. Some
components simply don't have FRUID information, either because they are
too simple (fans) or there exists no mechanism to get the information
(most PCI cards). In this situation, we use metadata to provide
vendor, model, and part information as that is generally static for a
particular component within the system. We cannot get information
specific to the component (such as a serial number), but at least the
user will be able to know what it is and know how to order another
one.

Putting it all together

With all of this information tied together under one subsystem, we
can finally present the user complete information about their hardware,
including images showing the physical layout of the system. In addition,
this also forms the basis for reporting problems and analytics (using
labels from metadata), manipulating chassis state (toggling LEDs, setting
chassis identifiers), and making programmatic distinctions about the
hardware (such as whether external HBAs are present). Over the
next few weeks I hope to expound on some of these details in further
blog posts.