petalert.pl
-----------------------
This is a snmptrapd handler script to alert when Platform Event Traps
(PET) occur. It was written because traptoemail, distributed with
net-snmp-5.3.2.2, is incapable of handling multi-line hex strings and
is restricted to email alerts.

This script operates in two modes, traphandle or embperl. In
traphandle mode, it concatenates the quoted hex string into one long
line, then builds structures that resemble embperl mode. Both modes
then invoke the helper decoder, ipmi-pet(8) from FreeIPMI, parse its
output, and alert in the chosen way, such as email, Nagios external
command, or NSCA.
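
The line-joining step of traphandle mode can be sketched roughly as
follows. This is only an illustration in shell and awk, not the
script's actual Perl code. The function name join_quoted_lines is
hypothetical, and so is the assumption that a continued hex string can
be recognized by an unbalanced double quote on the line.

```shell
# Hypothetical helper: join lines on stdin that belong to one quoted
# string, so that every varbind ends up on a single line.
join_quoted_lines() {
    awk '
        {
            # Append to the pending line if a quoted string is still open.
            line = (pending ? buf " " $0 : $0)
            tmp = line
            n = gsub(/"/, "", tmp)       # count the double quotes so far
            if (n % 2 == 1) {            # odd count: the string is still open
                buf = line; pending = 1
            } else {
                print line; pending = 0
            }
        }
        END { if (pending) print buf }   # flush an unterminated string as-is
    '
}
```

Lines without quotes pass through unchanged, so the hostname and
address lines that snmptrapd sends first are not affected.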

1. REQUIREMENTS

freeipmi-1.1.1 or above is required for the script to function. Both
FreeIPMI and the script assume a Unix-like system, notably GNU/Linux;
Windows is not supported as of this writing, Dec 13, 2011.

Net-SNMP 5.3.2.2 or above is required to make a complete alerting
solution. Strictly speaking, only snmptrapd is involved; it acts as
the trap receiver.

If you prefer to run the script as an embedded Perl handler, your
version of Net-SNMP must have embedded Perl support enabled; see
"Embedded Perl Support" in snmpd.conf(8) for more information. Usually
it is enabled, and you can verify that with the following command:
# net-snmp-config --configure-options | tr ' ' '\n' | grep perl
'--enable-embedded-perl'
If you prefer to invoke the handler directly rather than through
perl(1), make sure the script itself has execute permission. Both
cases require a working Perl installation, preferably Perl-5.8.8.

If you prefer the built-in email alerting, make sure Net::SMTP is
installed.

If you prefer the Nagios monitoring system, make sure the Nagios
process and snmptrapd are on the same host. Usually you don't need to
worry about write permission on the Nagios external command file,
because snmptrapd invokes the handler as root. If that is not the case
for you, ensure the handler has write permission on the command file.
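
When the handler does not run as root, a quick check like the one
below may help. It is a minimal sketch, assuming the command file is
the usual named pipe; the function name can_write_command_file is
hypothetical, and the check must be run as the user that snmptrapd
runs as.

```shell
# Hypothetical helper: succeed only if the given path is a named pipe
# that the current user is allowed to write to.
can_write_command_file() {
    [ -p "$1" ] && [ -w "$1" ]
}
```

For example, "can_write_command_file NAGIOS_COMMAND_FILE || echo 'not
writable' >&2" prints a warning when the check fails.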
You might prefer other alerting methods; the bad news is that none are
implemented yet. Please drop me a mail, and I might take the time to
add plugin support.

The security-conscious might check that firewall rules allow only
traffic from trusted hosts.

2. CONFIGURATION

Put a line like the following in your snmptrapd.conf file (note that
the backslash-newline pairs only break the line for display, so join
the pieces into one line):
traphandle .1.3.6.1.4.1.3183.1.1 /usr/bin/perl \
/usr/share/doc/freeipmi/contrib/pet/petalert.pl --mode=traphandle \
--alert=email --sdrcache SDRCONF -- -f FROM -s SMTPSERVER ADDRESSES
Or, if you prefer embedded perl,
perl do "/usr/share/doc/freeipmi/contrib/pet/petalert.pl";
perl IpmiPET::main(qw(--mode=embperl --trapoid=OID --sdrcache=SDRCONF \
--alert=email -- -s SMTPSERVER -f FROM ADDRESSES));
where only --mode is required; see "petalert.pl -h".

Make sure the community is granted execute permission so that handlers
can run, for example,
authCommunity execute COMMUNITY_STRING
See "ACCESS CONTROL" in snmptrapd.conf(8) for more information. The
bad news is that you have to use numeric OID representation, so also
add "-Of -On" to the snmptrapd options.

You have to enable PET on the IPMI nodes as well, including LAN access,
PEF alerting, community, alert policy and destination. You may use
ipmi-config from FreeIPMI to do the configuration (use --category to
check out the core and pef categories of configuration). See "IPMI
NODES".
You might wish to set up PTR records for the IPMI nodes; otherwise,
snmptrapd reports <UNKNOWN> to traphandle and the script falls back to
using the IP address.

2.1 ACKNOWLEDGE

Platform event traps travel over UDP, so you might worry about trap
loss. The IPMI spec allows the trap receiver to acknowledge a trap. Use
--ack to acknowledge the trap before alerting. You may need workarounds
for acknowledgement; see BUGS. An acknowledge setup might look like
this:
perl do "/usr/share/doc/freeipmi/contrib/pet/petalert.pl";
perl IpmiPET::main(qw(--mode=embperl --trapoid=OID --sdrcache=SDRCONF \
--ack -W malformedack \
--alert=email -- -s SMTPSERVER -f FROM ADDRESSES));

2.2 NAGIOS INTEGRATION

The script can feed the Nagios monitoring system by writing passive
check results to its external command file. See ipminodes.cfg and
check_rmcping for the related Nagios configuration.

Assuming the Nagios process is local, use:
perl do "/usr/share/doc/freeipmi/contrib/pet/petalert.pl";
perl IpmiPET::main(qw(--mode=embperl --trapoid=OID --sdrcache=SDRCONF \
--alert=nagios -- -H short -S PET NAGIOS_COMMAND_FILE));
where "-H short" means that if 10.2.3.4 resolves to foo.example.com,
the Nagios passive check gets foo as the host; use "-H fqdn" to pass
foo.example.com to Nagios. In addition, "-S PET" sets the service
description.
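
To make the effect of these options concrete, here is a minimal shell
sketch of how a passive service check line for the Nagios external
command file could be put together. It is an illustration, not the
script's own code: the helper names short_host and passive_check_line
are hypothetical, and the return code and output text that
petalert.pl actually submits are not specified here.

```shell
# Hypothetical helper mirroring "-H short": keep only the first label
# of a hostname, but leave a bare IPv4 address untouched.
short_host() {
    case $1 in
        *[!0-9.]*) printf '%s\n' "${1%%.*}" ;;   # contains letters: a name
        *)         printf '%s\n' "$1" ;;          # digits and dots only
    esac
}

# Hypothetical helper: format one passive service check result in the
# Nagios external command syntax.
# Arguments: timestamp host service return_code plugin_output
passive_check_line() {
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}
```

A line built with, say, passive_check_line "$(date +%s)" "$(short_host
foo.example.com)" PET 2 "event text" would then be appended to
NAGIOS_COMMAND_FILE.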
If the Nagios process is on a remote host, you normally turn to NSCA,
which consists of the NSCA daemon on the Nagios host and the send_nsca
client program. To alert through send_nsca,
perl do "/usr/share/doc/freeipmi/contrib/pet/petalert.pl";
perl IpmiPET::main(qw(--mode=embperl --trapoid=OID --sdrcache=SDRCONF \
--alert=nsca -- --prog /usr/bin/send_nsca -H short -S PET \
-- -H NAGIOS_HOST -c SEND_NSCA_CONF));
Notice that the standalone -- appears twice in the configuration line,
separating three stages of argument processing: generic args,
alert-specific args, and external helper args.
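
The three-way split can be pictured with the shell sketch below. It
only illustrates the idea of cutting the argument list at the first
two standalone -- markers. The function name split_arg_groups is
hypothetical, and how IpmiPET::main really parses its arguments is not
shown here.

```shell
# Hypothetical helper: print every argument prefixed with its group
# number: 1 = generic, 2 = alert-specific, 3 = external helper.
split_arg_groups() {
    group=1
    for arg in "$@"; do
        if [ "$arg" = "--" ] && [ "$group" -lt 3 ]; then
            group=$((group + 1))     # a standalone -- starts the next group
        else
            printf '%s: %s\n' "$group" "$arg"
        fi
    done
}
```

Called with the arguments of the send_nsca example, it labels
--mode=embperl as group 1, --prog as group 2, and -c SEND_NSCA_CONF as
group 3.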

3. SDR CACHE FILE MAPPING

The underlying helper program ipmi-pet(8) normally depends on an sdr
cache file, either preinitialized or created on demand. If no
credential is supplied, ipmi-pet(8) simply assumes localhost and
creates an sdr cache, which is usually
~/.freeipmi/sdr-cache/sdr-cache-<hostname>.localhost. You may wish to
supply preinitialized ones; in that case use -c sdrmapping.conf to
associate them with IPMI nodes.

The sdr cache config syntax is as follows: every unindented line names
an sdr cache file and is followed by any number of indented lines of
IPMI nodes. Every IPMI node line may list multiple nodes delimited by
whitespace. Comments are shell-style, trailing whitespace is trimmed,
and empty lines are skipped.
For example,
|/path/to/sdr-cache-file-1
| 10.2.3.10 # comment
|
|/path/to/sdr-cache-file-2
| 10.2.3.4 # one node
| 10.2.3.5 10.2.3.6 # two nodes
| 10.2.3.[7-9] # three nodes in range form
|
^-- this is the beginning of lines
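
The syntax rules above can be sketched as a small shell and awk
filter. This is an illustration only, not the parser used by
petalert.pl. The function name sdr_map is hypothetical, and the sketch
assumes that a node token holds at most one [lo-hi] range and that the
"|" margin markers of the example are not part of the real file.

```shell
# Hypothetical helper: read a mapping file on stdin and print one
# "node<TAB>cache-file" pair per line.
sdr_map() {
    awk '
        # Expand a single [lo-hi] range inside a node token, if present.
        function expand(tok, file,    pre, post, b, i) {
            if (match(tok, /\[[0-9]+-[0-9]+\]/)) {
                pre  = substr(tok, 1, RSTART - 1)
                post = substr(tok, RSTART + RLENGTH)
                split(substr(tok, RSTART + 1, RLENGTH - 2), b, "-")
                for (i = b[1] + 0; i <= b[2] + 0; i++)
                    print pre i post "\t" file
            } else {
                print tok "\t" file
            }
        }
        { sub(/#.*/, ""); sub(/[ \t]+$/, "") }  # drop comments, trailing blanks
        $0 == "" { next }                         # skip empty lines
        /^[^ \t]/ { file = $0; next }             # unindented: a new cache file
        { for (k = 1; k <= NF; k++) expand($k, file) }   # indented: node list
    '
}
```

Running "sdr_map < sdrmapping.conf" on the example above would list
seven nodes, with 10.2.3.7 through 10.2.3.9 mapped to
/path/to/sdr-cache-file-2.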

3.1 SDR CACHE INITIALIZATION

The sdr cache file can be initialized with ipmi-sel(8) and its
--sdr-cache-file option.
# ipmi-sel -h 10.2.3.4 -u root -P --sdr-cache-file=/path/to/sdr-cache-file-X
Password:
Caching SDR repository information: /path/to/sdr-cache-file-X
Caching SDR record 125 of 125 (current record ID 125)
ID | Date | Time | Name | Type | Event
1 | Dec-12-2011 | 16:41:51 | SEL | Event Logging Disabled | Log Area Reset/Cleared
...

4. IPMI NODES

For PET to be generated, the IPMI nodes have to be configured,
including LAN access, PEF alerting, trap community, alert policy and
destination. You may use ipmi-config from FreeIPMI to do the
configuration (use --category to check out the core and pef categories
of configuration).

However, before configuring the nodes and facing unexpected firmware
issues, you had better verify that the trap receiver end works
well. Simply modify the following example traphandle input to match
your setup, then feed it to the stdin of petalert.pl like this,
assuming you prefer email alerts:
# perl petalert.pl -D :all --mode=traphandle --sdrcache SDRCONF \
--alert=email -- -f FROM -s SMTPSERVER ADDRESSES < ...
... ; $x=eval "@v"; print join(" ", $x->[0], @{$x->[1]})."\n"'
[
'/usr/sbin/ipmi-pet',
[
'--pet-acknowledge',
'-h',
'10.2.3.4',
'356096',
'44',
'45',
'4C',
'4C',
...
]
]
Ctrl-D
/usr/sbin/ipmi-pet --pet-acknowledge -h 10.2.3.4 356096 44 45 4C 4C ...
Then you could simply paste the command into the shell to send an
acknowledge manually. It appears that acknowledge requests without a
preceding PET are also accepted and answered as usual.

snmptrapd(8) itself can log traps to syslog, which requires log
permission; see "ACCESS CONTROL" in snmptrapd.conf for more
information.

The NSCA daemon logs to syslog; set "debug=1" in nsca.cfg to get
detailed connection handling. Nagios is also able to log to syslog; set
"use_syslog=1" in nagios.cfg to help debug alerts.

6. PET TRAFFIC

In a no-acknowledge setup, there should usually be only one packet per
PET, sent from the IPMI node to the trap receiver. However, a firmware
defect was spotted that results in additional traffic; see BUGS.

In an acknowledge setup, there should be three packets per event: one
PET, one PET acknowledge request from the trap receiver to the IPMI
node, and one PET acknowledge response in the other direction. More
bugs were spotted here; see BUGS.

In either setup, packets can be captured like this:
# tcpdump -i any -nn -vvv -s0 -w pet.pcap 'host 10.2.3.4 and udp'
Then you can browse the interactions with the help of Wireshark.

7. BUGS

The factory default rules of iDRAC Express on Dell PowerEdge R610 were
found not to match software-generated events. You need to create a
catch-all filter rule to report those events. Hardware-generated
events, however, are not subject to this limitation. To verify this,
open the case to produce a hardware-generated intrusion event. Dell
PowerEdge 1950 with BMC has a similar problem. The difference is that
the 1950 has 31 filter rules, so you need not worry about overwriting
an existing one.

Dell PowerEdge 1950 with BMC was found to suffer from clock drift,
notably in SEL timestamps. bmc-device(8) from FreeIPMI can be used to
adjust the SEL time and the SDR repository time:
# bmc-device --set-sdr-repository-time=now
# bmc-device --set-sel-time=now
Notice that 'now' refers to the current timestamp on the host where
the commands are issued. bmc-device(8) works out of band, so simply
issue the commands on a host whose clock is synchronized.

iDRAC Express on Dell PowerEdge R610 was found to generate two traps
per hardware event. Notice that the session ids of the two traps
differ, so they are distinct traps rather than duplicates, even though
the rest of the payload is identical.

iDRAC Express on Dell PowerEdge R610 was found to produce malformed
PET acknowledge responses. In this case, ipmi-pet exits with the
timeout error "ipmi_cmd_pet_acknowledge: message timeout". You may use
'-W malformedack', which is simply passed through, to instruct the
underlying helper ipmi-pet(8) to disable such detection and return
immediately. Timeouts hurt snmptrapd because a slow handler stalls the
main loop. To discover potentially time-consuming cases, use "-D perf"
and observe the log.

Some DNS servers were found to return "localhost" for private IP
addresses rather than NXDOMAIN. In this case, snmptrapd(8) passes
"localhost" as the resolved hostname to petalert.pl, which confuses
the script. You had better switch to a correctly configured DNS
server, or contact the administrator to solve the problem.

Kaiwang Chen
kaiwang.chen@gmail.com