Appendix A
SNMP Support

The Messaging Server supports system monitoring through the Simple Network Management Protocol (SNMP). Using an SNMP client (sometimes called a network manager) such as Sun Net Manager or HP OpenView (not provided with this product), you can monitor certain parts of the Messaging Server. For more information on monitoring the Messaging Server, refer to Chapter 19, "Monitoring the Messaging Server."

This appendix describes how to enable SNMP support for the Messaging Server and gives an overview of the type of information provided by SNMP. It does not describe how to view this information from an SNMP client; refer to your SNMP client documentation for those details. This appendix also describes some of the data available from the Messaging Server SNMP implementation, but complete MIB details are available from RFC 2788 and RFC 2789.

SNMP Implementation

The Messaging Server implements two standardized MIBs, the Network Services Monitoring MIB (RFC 2788) and the Mail Monitoring MIB (RFC 2789). The Network Services Monitoring MIB provides for the monitoring of network services such as POP, IMAP, HTTP, and SMTP servers. The Mail Monitoring MIB provides for the monitoring of MTAs. The Mail Monitoring MIB allows for monitoring both the active and historical state of each MTA channel. The active information focuses on currently queued messages and open network connections (for example, counts of queued messages, source IP addresses of open network connections), while the historical information provides cumulative totals (for example, total messages processed, total inbound connections).

SNMP is supported on Solaris 8 platforms only. Support on other platforms will appear in a later release. The SNMP support on Solaris makes use of the native Solaris SNMP technology, Solstice Enterprise Agents (SEA). Customers do not need to install SEA on Solaris 8 systems: the necessary run-time libraries are already present.

Limitations of the Messaging Server SNMP support are as follows:

Only one instance of Messaging Server per host computer can be monitored via SNMP.

The SNMP support is for monitoring only. No SNMP management is supported.

No SNMP traps are implemented. (RFC 2788 provides similar functionality without using traps.)

SNMP Operation in the Messaging Server

On Solaris platforms, the Messaging Server SNMP process is an SNMP subagent which, upon startup, registers itself with the platform’s native SNMP master agent. SNMP requests from clients go to the master agent. The master agent then forwards any requests destined for the Messaging Server to the Messaging Server subagent process. The Messaging Server subagent process then processes the request and relays the response back to the client via the master agent. This process is shown in Figure A-1.

Figure A-1 SNMP Information Flow

Configuring SNMP Support for the Messaging Server on Solaris 8

Although the overhead imposed by SNMP monitoring is very small, the Messaging Server nonetheless ships with SNMP support disabled. To enable the SNMP support, run the following commands:

Once you have enabled SNMP, the start-msg command, without any parameters specified, will automatically start the SNMP subagent process along with the other Messaging Server processes.

Note that the Solaris native SNMP master agent must be running in order for the Messaging Server SNMP subagent to operate. The Solaris native SNMP master agent is the snmpdx daemon which is normally started as part of the Solaris boot procedure.

The SNMP subagent will automatically select a UDP port on which to listen. Should you require, you can assign a fixed UDP port to the subagent with the following command:

# configutil -o local.snmp.port -v port-number

You may later undo this setting by specifying a value of zero for the port number. A value of zero, the default setting, tells Messaging Server to allow the subagent to automatically select any available UDP port.

Normally, there should be no reason to edit either of these files. The MIBs served out by Messaging Server are read-only and there’s no need to specify a port number in the ims.reg file. If you do specify a port number, then it will be honored unless you also set a port number with the configutil utility. In that case, the port number set with configutil is the port number which will be used by the subagent. If you do edit the files, then you will need to stop and restart the SNMP subagent in order for your changes to take effect:

# stop-msg snmp
# start-msg snmp

Monitoring from an SNMP Client

Point your SNMP client at those two OIDs and access as the “public” SNMP community.

If you wish to load copies of the MIBs into your SNMP client, ASCII copies of the MIBs are located in the msg_svr_base/lib/config-templates directory under the file names rfc2788.mib and rfc2789.mib. For directions on loading those MIBs into your SNMP client software, consult the SNMP client software documentation. The SnmpAdminString data type used in those MIBs may not be recognized by some older SNMP clients. In that case, use the equivalent files rfc2248.mib and rfc2249.mib also found in the same directory.

Co-existence with Other Sun ONE Products on Unix Platforms

Other Netscape or Sun ONE products which provide SNMP support may do so by displacing the platform’s native SNMP master agent. If you will be running such Sun ONE products on the same host as Messaging Server and wish to monitor both via SNMP, then configure the Sun ONE Proxy SNMP Agent as described in Chapter 11 of Managing Servers with iPlanet Console (http://docs.sun.com/source/816-5572-10/11_snmp.htm). This allows the Messaging Server SNMP subagent—a native SNMP subagent—to co-exist with the non-native Sun ONE SNMP subagents in the other Sun ONE products.

Note that on platforms where more than one instance of Messaging Server may be concurrently monitored, there may then be multiple sets of MTAs and servers in the applTable, and multiple MTAs in the other tables.

Note

The cumulative values reported in the MIBs (e.g., total messages delivered, total IMAP connections, etc.) are reset to zero after a reboot.

Each site will have different thresholds and significant monitoring values. A good SNMP client will allow you to do trend analysis and then send alerts when sudden deviations from historical trends occur.
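As a rough illustration of the trend analysis described above, the following Python sketch turns successive readings of a cumulative MIB counter into per-interval deltas and flags a reading that departs sharply from history. The function names, the three-sigma threshold, and the sample readings are assumptions made for this example; they are not part of the Messaging Server or of any SNMP client. A drop in a cumulative counter is treated as a reset, since the cumulative values return to zero after a reboot.

```python
from statistics import mean, stdev


def counter_deltas(samples):
    """Convert successive readings of a cumulative counter into deltas.

    If a reading is lower than the previous one, the counter is assumed
    to have been reset to zero (for example, by a reboot), so the new
    reading itself is used as the delta for that interval.
    """
    deltas = []
    for prev, cur in zip(samples, samples[1:]):
        deltas.append(cur - prev if cur >= prev else cur)
    return deltas


def is_sudden_deviation(history, latest, sigmas=3.0):
    """Return True if `latest` lies more than `sigmas` standard
    deviations from the mean of `history`.

    With fewer than two historical values there is no trend to compare
    against, so the function returns False.
    """
    if len(history) < 2:
        return False
    mu = mean(history)
    sd = stdev(history)
    if sd == 0:
        return latest != mu
    return abs(latest - mu) > sigmas * sd


# Hypothetical readings of a cumulative counter; the drop from 210 to 40
# represents a reset after a reboot.
readings = [100, 150, 210, 40, 90]
print(counter_deltas(readings))
```

The right threshold is site-specific; the value of 3.0 used here is only a starting point.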

applTable

The applTable provides server information. It is a one-dimensional table with one row for the MTA and an additional row for each of the following servers, if enabled: WebMail HTTP, IMAP, POP, SMTP, and SMTP Submit. This table provides version information, uptime, current operational status (up, down, congested), number of current connections, total accumulated connections, and other related data.

The .1, .2, etc. suffixes here are the row numbers, applIndex. applIndex has the value 1 for the MTA, value 2 for the HTTP server, etc. Thus, in this example, the first row of the table provides data on the MTA, the second on the HTTP server, etc.

The name of the Messaging Server instance being monitored. In this example, the instance name is mailsrv-1.

These are SNMP TimeStamp values and are the value of sysUpTime at the time of the event. sysUpTime, in turn, is the count of hundredths of seconds since the SNMP master agent was started.
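Because these timestamps are expressed in hundredths of seconds relative to the start of the master agent, they usually need converting before they are meaningful to an operator. The following Python sketch shows one way to do that; the function names and the sample values are hypothetical and chosen only for illustration.

```python
from datetime import timedelta


def timeticks_to_timedelta(ticks):
    """Convert a count of hundredths of seconds into a timedelta."""
    # One tick is 10 milliseconds; integer arithmetic avoids float rounding.
    return timedelta(milliseconds=ticks * 10)


def time_since_event(sys_uptime_now, event_timestamp):
    """Return how long ago an event occurred.

    Both arguments are in hundredths of seconds since the SNMP master
    agent started: `sys_uptime_now` is the current sysUpTime and
    `event_timestamp` is a TimeStamp value read from the MIB.
    """
    return timeticks_to_timedelta(sys_uptime_now - event_timestamp)


# Hypothetical values: the event happened 6000 ticks (one minute) ago.
print(time_since_event(1060822, 1054822))
```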

The operational status of the HTTP, IMAP, POP, SMTP, and SMTP Submit servers is determined by actually connecting to them via their configured TCP ports and performing a simple operation using the appropriate protocol (for example, a HEAD request and response for HTTP, a HELO command and response for SMTP, and so on). From this connection attempt, the status—up (1), down (2), or congested (4)—of each server is determined.

Note that these probes appear as normal inbound connections to the servers and contribute to the value of the applAccumulatedInboundAssociations MIB variable for each server.

For the MTA, the operational status is taken to be that of the Job Controller. If the MTA is shown to be up, then the Job Controller is up. If the MTA is shown to be down, then the Job Controller is down. This MTA operational status is independent of the status of the MTA’s Service Dispatcher. The operational status for the MTA only takes on the value of up or down. Although the Job Controller does have a concept of “congested,” it is not indicated in the MTA status.

For the HTTP, IMAP, and POP servers the applRejectedInboundAssociations MIB variable indicates the number of failed login attempts and not the number of rejected inbound connection attempts.

applTable Usage

Monitoring server status (applOperStatus) for each of the listed applications is key to monitoring each server.

If it has been a long time since the MTA's last inbound activity, as indicated by applLastInboundActivity, then something may be broken and preventing connections. If applOperStatus=2 (down), then the monitored service is down. If applOperStatus=1 (up), then the problem may be elsewhere.
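The decision logic in this section can be sketched in Python as follows. This is only an illustration: the function name, the returned labels, and the idle threshold are assumptions, and the status map is limited to the three applOperStatus values discussed in this appendix.

```python
# applOperStatus values discussed in this appendix.
OPER_STATUS = {1: "up", 2: "down", 4: "congested"}


def assess_server(oper_status, sys_uptime, last_inbound_activity,
                  max_idle_seconds):
    """Classify one applTable row.

    `sys_uptime` and `last_inbound_activity` are in hundredths of
    seconds; `max_idle_seconds` is a site-chosen threshold (hypothetical).
    Returns "down", "idle", "congested", "ok", or "unknown".
    """
    status = OPER_STATUS.get(oper_status, "unknown")
    if status == "down":
        return "down"
    idle_seconds = (sys_uptime - last_inbound_activity) / 100
    if idle_seconds > max_idle_seconds:
        # The service is not reported as down, so the cause of the
        # inactivity may lie elsewhere (network, DNS, and so on).
        return "idle"
    if status == "up":
        return "ok"
    return status


# Hypothetical readings: server is up but idle for 1000 seconds.
print(assess_server(1, 200000, 100000, max_idle_seconds=600))
```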

assocTable

This table provides network connection information to the MTA. It is a two-dimensional table providing information about each active network connection. Connection information is not provided for other servers.

Below is an example of data from assocTable (mib-2.27.2.1).

assocTable:

assocRemoteApplication.1.1 [1] = 129.146.198.167 [2]

assocApplicationProtocol.1.1 [1] = applTCPProtoID.25 [3]

assocApplicationType.1.1 = peerinitiator(3) [4]

assocDuration.1.1 = 400 [5]

...

Notes:

[1] In the .x.y suffix, x is the application index, applIndex, and indicates which application in the applTable is being reported on: in this case, the MTA. The y serves to enumerate each of the connections for the application being reported on.

[2] The source IP address of the remote SMTP client.

[3] This is an OID indicating the protocol being used over the network connection. applTCPProtoID indicates the TCP protocol. The .n suffix indicates the TCP port in use, and .25 indicates SMTP, which is the protocol spoken over TCP port 25.

[4] It is not possible to know if the remote SMTP client is a user agent (UA) or another MTA. As such, the subagent always reports peer-initiator; ua-initiator is never reported.

[5] This is an SNMP TimeInterval and has units of hundredths of seconds. In this example, the connection has been open for 4 seconds.

assocTable Usage

This table is used to diagnose active problems. For example, if you suddenly have 200,000 inbound connections, this table can let you know where they are coming from.
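To find out where a burst of connections is coming from, the assocRemoteApplication values can be tallied by source address. The Python sketch below assumes the table has been dumped as text lines of the form "name.x.y = value"; that dump format, the function name, and the sample addresses other than the one shown in this appendix are assumptions, since SNMP clients differ in how they present data.

```python
import re
from collections import Counter

# Matches lines such as "assocRemoteApplication.1.1 = 129.146.198.167".
LINE_RE = re.compile(r"^assocRemoteApplication\.(\d+)\.(\d+)\s*=\s*(\S+)$")


def connections_by_source(lines, appl_index=1):
    """Count open connections per remote address for one application.

    `appl_index` selects the applTable row; 1 is the MTA.
    Lines that do not match the expected form are ignored.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and int(match.group(1)) == appl_index:
            counts[match.group(3)] += 1
    return counts


# Hypothetical dump; 192.0.2.10 is a documentation-only address.
dump = [
    "assocRemoteApplication.1.1 = 129.146.198.167",
    "assocRemoteApplication.1.2 = 192.0.2.10",
    "assocRemoteApplication.1.3 = 129.146.198.167",
    "assocDuration.1.1 = 400",
]
print(connections_by_source(dump).most_common(1))
```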

mtaTable

This is a one-dimensional table with one row for each MTA in the applTable. Each row gives totals across all channels (referred to as groups) in that MTA for select variables from the mtaGroupTable.
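The relationship between the two tables can be pictured as a column-wise sum over the channel rows. The following Python sketch is illustrative only: the function name, the dictionary layout, and the per-channel numbers are invented for the example and do not come from a real server.

```python
def mta_totals(group_rows, fields):
    """Sum selected per-channel values across all channels of one MTA.

    `group_rows` maps a channel name to a dict of that channel's values;
    `fields` lists the variables to total. Missing values count as zero.
    """
    return {
        field: sum(row.get(field, 0) for row in group_rows.values())
        for field in fields
    }


# Invented per-channel values for illustration.
channels = {
    "tcp_intranet": {"stored_messages": 12, "stored_volume": 20},
    "ims-ms": {"stored_messages": 5, "stored_volume": 7},
    "tcp_local": {"stored_messages": 2, "stored_volume": 7},
}
print(mta_totals(channels, ["stored_messages", "stored_volume"]))
```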

Below is an example of data from mtaTable (mib-2.28.1.1).

mtaTable:

mtaReceivedMessages.1 [1] = 172778

mtaStoredMessages.1 = 19

mtaTransmittedMessages.1 = 172815

mtaReceivedVolume.1 = 3817744

mtaStoredVolume.1 = 34

mtaTransmittedVolume.1 = 3791155

mtaReceivedRecipients.1 = 190055

mtaStoredRecipients.1 = 21

mtaTransmittedRecipients.1 = 3791134

mtaSuccessfulConvertedMessages.1 = 0 [2]

mtaFailedConvertedMessages.1 = 0

mtaLoopsDetected.1 = 0 [3]

Notes:

[1] The .x suffix provides the row number for this application in the applTable. In this example, .1 indicates this data is for the first application in the applTable. Thus, this is data on the MTA.

[2] Only takes on non-zero values for the conversion channel.

[3] Counts the number of .HELD message files currently stored in the MTA’s message queues.

mtaTable Usage

If mtaLoopsDetected is not zero, then there is a looping mail problem. Locate and diagnose the .HELD files in the MTA queue to resolve the problem.

If the system does virus scanning with a conversion channel and rejects infected messages, then mtaFailedConvertedMessages will give a count of infected messages in addition to other conversion failures.

mtaGroupTable

This two-dimensional table provides channel information for each MTA in the applTable. This information includes such data as counts of stored (that is, queued) and delivered mail messages. Monitoring the count of stored messages, mtaGroupStoredMessages, for each channel is critical: when the value becomes abnormally large, mail is backing up in your queues.

Below is an example of data from mtaGroupTable (mib-2.28.2.1).

mtaGroupTable:

mtaGroupName.1.1 [1] = tcp_intranet [2]

...

mtaGroupName.1.2 [1] = ims-ms

...

mtaGroupName.1.3 [1] = tcp_local

mtaGroupDescription.1.3 = mailsrv-1 MTA tcp_local channel

mtaGroupReceivedMessages.1.3 = 12154

mtaGroupRejectedMessages.1.3 = 0

mtaGroupStoredMessages.1.3 = 2

mtaGroupTransmittedMessages.1.3 = 12148

mtaGroupReceivedVolume.1.3 = 622135

mtaGroupStoredVolume.1.3 = 7

mtaGroupTransmittedVolume.1.3 = 619853

mtaGroupReceivedRecipients.1.3 = 33087

mtaGroupStoredRecipients.1.3 = 2

mtaGroupTransmittedRecipients.1.3 = 32817

mtaGroupOldestMessageStored.1.3 = 1103

mtaGroupInboundAssociations.1.3 = 5

mtaGroupOutboundAssociations.1.3 = 2

mtaGroupAccumulatedInboundAssociations.1.3 = 150262

mtaGroupAccumulatedOutboundAssociations.1.3 = 10970

mtaGroupLastInboundActivity.1.3 = 1054822

mtaGroupLastOutboundActivity.1.3 = 1054222

mtaGroupRejectedInboundAssociations.1.3 = 0

mtaGroupFailedOutboundAssociations.1.3 = 0

mtaGroupInboundRejectionReason.1.3 =

mtaGroupOutboundConnectFailureReason.1.3 =

mtaGroupScheduledRetry.1.3 = 0

mtaGroupMailProtocol.1.3 = applTCPProtoID.25

mtaGroupSuccessfulConvertedMessages.1.3 = 0 [3]

mtaGroupFailedConvertedMessages.1.3 = 0

mtaGroupCreationTime.1.3 = 0

mtaGroupHierarchy.1.3 = 0

mtaGroupOldestMessageId.1.3 = <01IFBV8AT8HYB4T6UA@red.iplanet.com>

mtaGroupLoopsDetected.1.3 = 0 [4]

mtaGroupLastOutboundAssociationAttempt.1.3 = 1054222

Notes:

[1] In the .x.y suffix, x is the application index, applIndex, and indicates which application in the applTable is being reported on: in this case, the MTA. The y serves to enumerate each of the channels in the MTA. This enumeration index, mtaGroupIndex, is also used in the mtaGroupAssociationTable and mtaGroupErrorTable tables.

[2] The name of the channel being reported on. In this case, the tcp_intranet channel.

[3] Only takes on non-zero values for the conversion channel.

[4] Counts the number of .HELD message files currently stored in this channel’s message queue.

mtaGroupTable Usage

A sudden jump in the ratio of mtaGroupStoredVolume to mtaGroupStoredMessages could mean that a large junk mail message is bouncing around the queues.

A large jump in mtaGroupStoredMessages could indicate unsolicited bulk email is being sent or that delivery is failing for some reason.

If the value of mtaGroupOldestMessageStored is greater than the value used for the undeliverable message notification times (notices channel keyword), this may indicate a message that cannot be processed even by bounce processing. Note that bounces are done nightly, so you will want to use mtaGroupOldestMessageStored > (maximum age + 24 hours) as the test.

If mtaGroupLoopsDetected is greater than 0, a mail loop has been detected.
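The checks above can be combined into a small per-channel test, sketched here in Python. The function names, dictionary keys, and the jump factor of 5 are assumptions chosen for the example. The sketch also assumes that mtaGroupOldestMessageStored is reported in hundredths of seconds and that the notices value is expressed in days; confirm both against RFC 2789 and your channel configuration before relying on them.

```python
SECONDS_PER_DAY = 86400


def average_stored_size(stored_volume, stored_messages):
    """Ratio of stored volume to stored messages, or None if queue is empty."""
    if stored_messages == 0:
        return None
    return stored_volume / stored_messages


def oldest_message_overdue(oldest_ticks, max_notice_days):
    """True if the oldest queued message is older than the maximum
    notification age plus 24 hours (bounces are processed nightly)."""
    limit_seconds = max_notice_days * SECONDS_PER_DAY + SECONDS_PER_DAY
    return oldest_ticks / 100 > limit_seconds


def channel_warnings(prev, cur, max_notice_days, jump_factor=5.0):
    """Compare two successive samples of one channel and list concerns."""
    warnings = []
    prev_avg = average_stored_size(prev["stored_volume"], prev["stored_messages"])
    cur_avg = average_stored_size(cur["stored_volume"], cur["stored_messages"])
    if prev_avg and cur_avg and cur_avg > jump_factor * prev_avg:
        warnings.append("average stored message size jumped")
    if prev["stored_messages"] and cur["stored_messages"] > jump_factor * prev["stored_messages"]:
        warnings.append("stored message count jumped")
    if oldest_message_overdue(cur["oldest_ticks"], max_notice_days):
        warnings.append("oldest message exceeds notice age plus 24 hours")
    if cur["loops_detected"] > 0:
        warnings.append("mail loop detected")
    return warnings


# First sample uses the tcp_local values from the example above; the
# second sample is invented to trigger two of the checks.
before = {"stored_volume": 7, "stored_messages": 2, "oldest_ticks": 1103, "loops_detected": 0}
after = {"stored_volume": 900, "stored_messages": 3, "oldest_ticks": 1103, "loops_detected": 1}
print(channel_warnings(before, after, max_notice_days=5))
```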

mtaGroupAssociationTable

This is a three-dimensional table whose entries are indices into the assocTable. For each MTA in the applTable, there is a two-dimensional sub-table. This two-dimensional sub-table has a row for each channel in the corresponding MTA. For each channel, there is an entry for each active network connection which that channel has currently underway. The value of the entry is the index into the assocTable (as indexed by the entry’s value and the applIndex index of the MTA being looked at). This indicated entry in the assocTable is a network connection held by the channel.

In simple terms, the mtaGroupAssociationTable table correlates the network connections shown in the assocTable with the responsible channels in the mtaGroupTable.
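The correlation can be thought of as a join across three tables. The Python sketch below is a minimal illustration, assuming the table contents have already been read into dictionaries keyed by their index suffixes; the function name, the dictionary layout, and the second remote address are hypothetical.

```python
def connections_by_channel(group_names, group_assoc, assoc_remote):
    """Map each channel name to the remote addresses of its connections.

    group_names:  {(applIndex, mtaGroupIndex): channel name}
    group_assoc:  {(applIndex, mtaGroupIndex, n): assocIndex}
    assoc_remote: {(applIndex, assocIndex): remote address}
    """
    result = {}
    for (appl, group, _n), assoc_index in sorted(group_assoc.items()):
        channel = group_names[(appl, group)]
        # applIndex plus the entry's value select the assocTable row.
        remote = assoc_remote.get((appl, assoc_index))
        result.setdefault(channel, []).append(remote)
    return result


# Channel 3 of application 1 (the MTA) holds associations 1 and 2.
names = {(1, 3): "tcp_local"}
links = {(1, 3, 1): 1, (1, 3, 2): 2}
remotes = {(1, 1): "129.146.198.167", (1, 2): "192.0.2.10"}
print(connections_by_channel(names, links, remotes))
```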

Below is an example of data from mtaGroupAssociationTable (mib-2.28.3.1).

mtaGroupAssociationTable:

mtaGroupAssociationIndex.1.3.1 [1] = 1 [2]

mtaGroupAssociationIndex.1.3.2 = 2

mtaGroupAssociationIndex.1.3.3 = 3

mtaGroupAssociationIndex.1.3.4 = 4

mtaGroupAssociationIndex.1.3.5 = 5

mtaGroupAssociationIndex.1.3.6 = 6

mtaGroupAssociationIndex.1.3.7 = 7

Notes:

[1] In the .x.y.z suffix, x is the application index, applIndex, and indicates which application in the applTable is being reported on: in this case, the MTA. The y indicates which channel of the mtaGroupTable is being reported on. In this example, 3 indicates the tcp_local channel. The z serves to enumerate the associations open to or from the channel.

[2] The value here is an index into the assocTable. Specifically, x and this value become, respectively, the values of the applIndex and assocIndex indices into the assocTable. Or, put differently, this is saying that (ignoring the applIndex) the first row of the assocTable describes a network connection controlled by the tcp_local channel.

mtaGroupErrorTable

This is another three-dimensional table which gives the counts of temporary and permanent errors encountered by each channel of each MTA while attempting delivery of messages. Entries with index values of 4000000 are temporary errors while those with indices of 5000000 are permanent errors. Temporary errors result in the message being re-queued for later delivery attempts; permanent errors result in either the message being rejected or otherwise returned as undeliverable.
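To make the indexing concrete, the following Python sketch splits one line of an mtaGroupErrorTable dump into its parts and labels the entry as temporary or permanent. The "name.x.y.z = value" text form, the function name, and the returned dictionary keys are assumptions for illustration; SNMP clients present this data in different ways.

```python
import re

# Matches lines such as "mtaGroupInboundErrorCount.1.1.4000000 = 0".
ERR_RE = re.compile(
    r"^mtaGroup(Inbound|Internal|Outbound)ErrorCount"
    r"\.(\d+)\.(\d+)\.(\d+)\s*=\s*(\d+)$"
)

# Index values described in this section.
KIND = {4000000: "temporary", 5000000: "permanent"}


def parse_error_line(line):
    """Parse one error-count line; return None if it does not match."""
    match = ERR_RE.match(line.strip())
    if not match:
        return None
    direction, appl, group, code, count = match.groups()
    return {
        "direction": direction.lower(),
        "appl_index": int(appl),
        "group_index": int(group),
        "kind": KIND.get(int(code), "unknown"),
        "count": int(count),
    }


print(parse_error_line("mtaGroupInboundErrorCount.1.1.4000000 = 0"))
```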

Below is an example of data from mtaGroupErrorTable (mib-2.28.5.1).

mtaGroupErrorTable:

mtaGroupInboundErrorCount.1.1.4000000 [1] = 0

mtaGroupInboundErrorCount.1.1.5000000 = 0

mtaGroupInternalErrorCount.1.1.4000000 = 0

mtaGroupInternalErrorCount.1.1.5000000 = 0

mtaGroupOutboundErrorCount.1.1.4000000 = 0

mtaGroupOutboundErrorCount.1.1.5000000 = 0

mtaGroupInboundErrorCount.1.2.4000000 [1] = 0

...

mtaGroupInboundErrorCount.1.3.4000000 [1] = 0

...

Notes:

[1] In the .x.y.z suffix, x is the application index, applIndex, and indicates which application in the applTable is being reported on: in this case, the MTA. The y indicates which channel of the mtaGroupTable is being reported on. In this example, 1 specifies the tcp_intranet channel, 2 the ims-ms channel, and 3 the tcp_local channel. Finally, the z is either 4000000 or 5000000 and indicates, respectively, counts of temporary and permanent errors encountered while attempting message deliveries for that channel.

mtaGroupErrorTable Usage

A large jump in an error count likely indicates an abnormal delivery problem. For instance, a large jump for a tcp_ channel may indicate a DNS or network problem. A large jump for the ims-ms channel may indicate a delivery problem to the message store (for example, a partition is full, there is a stored problem, and so on).