Oracle Blog

Blog for robj

Tuesday Nov 30, 2010

In November, Oracle released a sneak peek at the next major release of Solaris in the form of Oracle Solaris 11 Express 2010.11. There are tons of great features and innovations in this release. One of the features I worked on was a new service, smtp-notify, that can be configured to send email notifications in response to various Fault Management events, such as when a hardware component has been diagnosed as faulty. Notifications can be configured for the following FMA event types (the descriptions below are excerpted from the smf(5) man page):

problem-diagnosed

A new problem has been diagnosed by the FMA subsystem. The diagnosis includes a list of one or more suspects, which (where appropriate) may have been automatically isolated to prevent further errors occurring. The problem is identified by a UUID in the event payload, and further events describing the resolution lifecycle of this problem quote a matching UUID.

problem-updated

One or more of the suspect resources in a problem diagnosis has been repaired, replaced or acquitted (or has been faulted again), but there remains at least one faulted resource in the list. A repair could be the result of an fmadm command line (fmadm repaired, fmadm acquit, fmadm replaced) or might have been detected automatically such as through detection of a part serial number change.

problem-repaired

All of the suspect resources in a problem diagnosis have been repaired, resolved or acquitted. Some or all of the resources might still be isolated at this stage.

problem-resolved

All of the suspect resources in a problem diagnosis have been repaired, resolved or acquitted and are no longer isolated (for example, a cpu that was a suspect and offlined is now back online again; this un-isolate action is usually automatic).
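To make the UUID correlation concrete, here is a minimal sketch of how a consumer of these events might keep track of which problems are still open. The four event names come from the excerpt above; the `track` and `open_problems` helpers and the `(event, uuid)` tuple shape are assumptions made for illustration, not part of FMA.

```python
# Hypothetical tracker: correlates FMA lifecycle events by problem UUID.
# Only the event names come from the man page excerpt; the rest is assumed.
LIFECYCLE_EVENTS = {
    "problem-diagnosed",
    "problem-updated",
    "problem-repaired",
    "problem-resolved",
}


def track(events):
    """Return {uuid: latest event name} for a sequence of (event, uuid) pairs."""
    latest = {}
    for name, uuid in events:
        if name not in LIFECYCLE_EVENTS:
            raise ValueError(f"unknown FMA event type: {name}")
        latest[uuid] = name
    return latest


def open_problems(events):
    """UUIDs whose most recent event is anything other than problem-resolved."""
    latest = track(events)
    return sorted(u for u, name in latest.items() if name != "problem-resolved")
```

A problem whose latest event is problem-repaired still counts as open here, because some of its resources might still be isolated at that stage.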

The smtp-notify service is enabled out-of-the-box.

# svcs smtp-notify
STATE          STIME    FMRI
online         Oct_28   svc:/system/fm/smtp-notify:default

You can list the default notification preferences with svcs(1):

# svcs -n
Notification parameters for FMA Events
    Event: problem-diagnosed
        Notification Type: smtp
            Active: true
            reply-to: root@localhost
            to: root@localhost
        Notification Type: snmp
            Active: true
        Notification Type: syslog
            Active: true
    Event: problem-repaired
        Notification Type: snmp
            Active: true
    Event: problem-resolved
        Notification Type: snmp
            Active: true

What does the above output tell us? It tells us that a problem-diagnosed event will result in an email notification being sent to root@localhost, a message being sent to syslog, and an SNMP trap being generated. Additionally, SNMP traps will be generated for problem-repaired and problem-resolved events.
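If you want to check these preferences from a script rather than by eye, the listing is regular enough to parse. The following is a rough sketch that assumes the output keeps the "Event:", "Notification Type:" and "key: value" line shapes shown above; `parse_notify_listing` is a name made up for this example, and the sample text is copied from the listing in this post.

```python
SAMPLE = """\
Notification parameters for FMA Events
    Event: problem-diagnosed
        Notification Type: smtp
            Active: true
            reply-to: root@localhost
            to: root@localhost
        Notification Type: snmp
            Active: true
        Notification Type: syslog
            Active: true
    Event: problem-repaired
        Notification Type: snmp
            Active: true
    Event: problem-resolved
        Notification Type: snmp
            Active: true
"""


def parse_notify_listing(text):
    """Parse `svcs -n` style text into {event: {type: {param: value}}}."""
    result = {}
    event = None   # dict of notification types for the current event
    params = None  # dict of parameters for the current notification type
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # skips blank lines and the heading line
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Event":
            event = result.setdefault(value, {})
            params = None
        elif key == "Notification Type" and event is not None:
            params = event.setdefault(value, {})
        elif params is not None:
            params[key] = value
    return result


if __name__ == "__main__":
    for event_name, types in parse_notify_listing(SAMPLE).items():
        print(event_name, sorted(types))
```

The parser ignores indentation on purpose, so it does not depend on exactly how the real command pads its output.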

What does an example email notification look like? See below:

From noaccess@diffuser.sfbay.sun.com Wed Jul 21 19:58:29 2010
Date: Wed, 21 Jul 2010 19:58:29 -0700 (PDT)
From: No Access User <noaccess@diffuser.sfbay.sun.com>
To: root@localhost
X-FMEV-HOSTNAME: diffuser
X-FMEV-CLASS: list.suspect
X-FMEV-UUID: e82aa706-ce6a-cbbb-a529-ceef1c9b57b0
X-FMEV-CODE: AMD-8000-AV
X-FMEV-SEVERITY: Major
Subject: Fault Management Event: diffuser:AMD-8000-AV

SUNW-MSG-ID: AMD-8000-AV, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Wed Jul 21 19:58:29 PDT 2010
PLATFORM: Sun-Fire-X4200-Server, CSN: 0000000000, HOSTNAME: diffuser
SOURCE: eft, REV: 1.16
EVENT-ID: e82aa706-ce6a-cbbb-a529-ceef1c9b57b0
DESC: The number of errors associated with this CPU has exceeded acceptable levels. Refer to http://sun.com/msg/AMD-8000-AV for more information.
AUTO-RESPONSE: An attempt will be made to remove this CPU from service.
IMPACT: Performance of this system may be affected.
REC-ACTION: Schedule a repair procedure to replace the affected CPU. Use 'fmadm faulty' to identify the module.

Those who've seen the messages that are logged to the console when FMA diagnoses a fault will see that the format is similar. One additional thing to note is that each FMA email notification message also includes the following X-headers, which are there to aid admins who write mail filters:

Header Name        Description
X-FMEV-HOSTNAME    the name of the host on which the event occurred
X-FMEV-CLASS       the event class
X-FMEV-CODE        the Knowledge Article message ID
X-FMEV-SEVERITY    the severity of the event
X-FMEV-UUID        the UUID associated with the event
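As an illustration of the kind of mail filter these X-headers are meant for, here is a small sketch using Python's standard email module. The header names are the ones listed above and the sample values are taken from the example message; the `fmev_fields` and `should_page` helpers, the paging policy, and the severity ordering in `SEVERITY_RANK` are assumptions for the sake of the example.

```python
from email import message_from_string

# Assumed ordering of severities, used only for this example.
SEVERITY_RANK = {"Minor": 1, "Major": 2, "Critical": 3}

SAMPLE = (
    "From: No Access User <noaccess@diffuser.sfbay.sun.com>\n"
    "To: root@localhost\n"
    "X-FMEV-HOSTNAME: diffuser\n"
    "X-FMEV-CLASS: list.suspect\n"
    "X-FMEV-UUID: e82aa706-ce6a-cbbb-a529-ceef1c9b57b0\n"
    "X-FMEV-CODE: AMD-8000-AV\n"
    "X-FMEV-SEVERITY: Major\n"
    "Subject: Fault Management Event: diffuser:AMD-8000-AV\n"
    "\n"
    "SUNW-MSG-ID: AMD-8000-AV, TYPE: Fault, VER: 1, SEVERITY: Major\n"
)


def fmev_fields(raw_message):
    """Return the X-FMEV-* headers as a dict keyed by the lowercase suffix."""
    msg = message_from_string(raw_message)
    prefix = "x-fmev-"
    return {
        name.lower()[len(prefix):]: value
        for name, value in msg.items()
        if name.lower().startswith(prefix)
    }


def should_page(raw_message, threshold="Major"):
    """Hypothetical policy: page on list.suspect events at or above threshold."""
    fields = fmev_fields(raw_message)
    if fields.get("class") != "list.suspect":
        return False
    rank = SEVERITY_RANK.get(fields.get("severity", ""), 0)
    return rank >= SEVERITY_RANK[threshold]


if __name__ == "__main__":
    print(fmev_fields(SAMPLE))
    print(should_page(SAMPLE))
```

Filtering on the headers rather than on the subject line or body means the filter keeps working even if the message text is customized.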

Email notifications for FMA events are highly configurable via svccfg(1m). For example, you can enable or disable them per event type:

# svccfg setnotify problem-diagnosed mailto:active

or

# svccfg setnotify problem-diagnosed mailto:inactive

You can also configure separate lists of one or more email recipients per event type, and you can define your own message template.

Of course, defining your own message template is only really useful if you have a way of referencing information about the actual FMA event in your message. To facilitate this, we support the following expansion macros that can be embedded in message templates:

Macro          Description
%%             expands to a literal % character
%<HOSTNAME>    expands to the hostname on which the event occurred
%<URL>         expands to the URL of the knowledge article associated with this event
%<CLASS>       expands to the event class
%<UUID>        expands to the UUID of the event
%<CODE>        expands to the knowledge article message ID
%<SEVERITY>    expands to the severity of the event
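To show how these macros behave, here is a minimal Python sketch of the expansion rules in the table above. It is not the smtp-notify implementation; the `expand` function and the event dictionary are hypothetical, and leaving unknown macros untouched is an assumption, since the table does not say what happens to them.

```python
import re

# Matches either a literal "%%" or a "%<NAME>" macro.
_MACRO = re.compile(r"%%|%<([A-Z]+)>")


def expand(template, event):
    """Expand %% and %<NAME> macros in template using values from event."""
    def replace(match):
        if match.group(0) == "%%":
            return "%"
        name = match.group(1)
        # Assumption: a macro with no matching value is left as written.
        return event.get(name, match.group(0))

    return _MACRO.sub(replace, template)


if __name__ == "__main__":
    event = {
        "HOSTNAME": "diffuser",
        "URL": "http://sun.com/msg/AMD-8000-AV",
        "CLASS": "list.suspect",
        "UUID": "e82aa706-ce6a-cbbb-a529-ceef1c9b57b0",
        "CODE": "AMD-8000-AV",
        "SEVERITY": "Major",
    }
    print(expand("%<SEVERITY> fault %<CODE> on %<HOSTNAME>, see %<URL>", event))
```

Because the pattern tries %% first, a template containing %%<UUID> comes out as the literal text %<UUID> rather than the event's UUID.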

But wait…there's more!

The smtp-notify service can also be configured to generate notifications for SMF service state transitions. I won't go into the details of that here, but it's all documented in the smf(5), svccfg(1m) and smtp-notify(1m) man pages.