Q: Why do I see a frightening number of Java processes/memory allocated to Java with ps or top?

A: From Pete Siemsen and Ben Reed's emails from 3/12/2001:

Pete:

> 3. With the software up, a "ps" command is a little
> frightening, as it looks like there are 400+ processes
> on your machine. Don't worry, it's an artifact of
> Java's thread support, and your system will run fine.

Ben:
Yup. The more technical explanation (as I understand it) is that there is
only one type of process inside the Linux kernel, a kernel process.
While regular user processes map 1-to-1 to kernel processes, each thread in a
threaded application gets its own kernel process, even though, from a
programming perspective, they are still running in the same userland process.
That's why applications like the Java JVM or Mozilla appear to have many more
processes than you would think they should. ps is also very bad at
enumerating memory usage for the same reason. Threads for the most part share
memory, but each one is listed as if it had allocated that much memory
itself.

Mike from OpenNMS pointed out that if you want a quick summary of the number
of threads in each VM, you can use the command:

ps axww | awk '/java/ { print $NF }' | sort | uniq -c | sort -n

Or you can use (on Linux):

pstree root | grep 'java'

However, this command won't tell you which VM has which threads, just the
number of threads in each.

Remember that the thread counts won't match the configuration exactly. This has
something to do with the fact that Java runs extra threads of its own, such as a
ThreadManager thread.

With NPTL (the Native POSIX Thread Library), you will see better thread handling in Linux.
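To tie thread counts to a specific VM, a rough sketch along the following lines may help. It assumes a procps-style ps that accepts -L (one output line per thread); the function name threads_per_pid is made up for illustration and is not part of OpenNMS.

```shell
# Hypothetical helper: summarize threads per process.
# Reads "PID COMMAND" lines (one line per thread) on stdin, such as the
# output of
#   ps -eL -o pid= -o comm=
# on a procps-based Linux system (support for -L is an assumption about
# your ps). Only the first word of the command name is used.
threads_per_pid() {
    # Count how often each PID/command pair appears, then sort by count.
    awk '{ n[$1 " " $2]++ } END { for (k in n) print n[k], k }' | sort -n
}

# Example (not run automatically):
#   ps -eL -o pid= -o comm= | grep java | threads_per_pid
```

Each output line is a thread count followed by the PID and command name, so the busiest VM ends up at the bottom of the list.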

Q: Why are jar_cacheXXXX.tmp files filling up my /tmp?

A: If you have been doing a kill from the command line to stop Tomcat, you may see lots of jar_cacheXXXX.tmp files in your /tmp directory (or whatever your $TMPDIR is set to for Tomcat). When Tomcat is not running, you can safely delete these files.

Tomcat will open 10 to 100 of these files when running, but cleans them all up when shut down properly. If Tomcat is stopped prematurely (by, say, a kill -9), then the files aren't deleted and can fill up your /tmp partition.

This problem affects Tomcat version 4.0.1 and later. It actually originates in the JDK's JarURLConnection class, which Tomcat uses in its internal classloaders.
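As a cautious sketch of the cleanup described above, the helper below only lists leftover files so they can be reviewed before anything is deleted. The function name and the fallback to /tmp are illustrative assumptions; use it only while Tomcat is stopped.

```shell
# Hypothetical helper: list leftover jar_cache*.tmp files in a directory.
# Defaults to $TMPDIR, falling back to /tmp. Nothing is deleted here.
list_jar_cache() {
    dir="${1:-${TMPDIR:-/tmp}}"
    # -maxdepth is supported by GNU and BSD find; it keeps the search
    # from descending into subdirectories.
    find "$dir" -maxdepth 1 -type f -name 'jar_cache*.tmp' -print
}

# Example (not run automatically), once the list looks right:
#   list_jar_cache /tmp | while IFS= read -r f; do rm -f -- "$f"; done
```

Listing first and deleting second keeps a mistyped directory from turning into an accidental cleanup of the wrong place.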

Q: Why do I get "can't parse argument 'RRA:AVERAGE:0.5:1:8928'"?

A: Currently, OpenNMS does not support localization, thus we recommend
setting the locale to "en_US".

However, RRDTool, which we use, does. Thus, if your locale is
not "en_US" you need to make a small change to the datacollection-config.xml
file in /opt/OpenNMS/etc:

Change every line like

RRA:AVERAGE:0.5:12:8928

where there is a 0.5 (zero - dot - five) to:

RRA:AVERAGE:0,5:12:8928

(zero - comma - five).
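If there are many RRA lines, the substitution described above could be scripted roughly as follows. This is a sketch under the assumption that every RRA definition uses exactly 0.5 in that position; the function name localize_rra is invented for this example.

```shell
# Hypothetical filter: rewrite the value 0.5 as 0,5 on RRA definitions.
# Reads text on stdin and writes the result to stdout; lines without an
# RRA definition pass through unchanged.
localize_rra() {
    # Group 1 keeps "RRA:<NAME>:0", group 2 keeps "5:"; only the dot
    # between them is replaced by a comma.
    sed 's/\(RRA:[A-Z]*:0\)\.\(5:\)/\1,\2/'
}

# Example (not run automatically); write to a new file and compare first:
#   localize_rra < datacollection-config.xml > datacollection-config.xml.new
```

Writing to a separate file and comparing it with the original before swapping it in avoids damaging the configuration if the pattern matches something unexpected.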

Q: I installed OpenNMS, and admin/admin Does Not Log Me On, Why?

A: The OpenNMS install process has several steps. First, the product dependencies are installed and configured, such as Postgres, RRDTool and Tomcat. Then the OpenNMS packages are installed (the core, webapps and docs). As these OpenNMS packages are being installed, modifications are made to both Postgres and Tomcat. When everything is complete, you should be able to start up OpenNMS and go.

There are several ways to install OpenNMS, and some people have found that they get the login prompt when going to http://localhost:8080/opennms/ after install, but username="admin", password="admin" doesn't seem to work.

This is due to something going wrong with the install. It has nothing to do with the username and password being wrong (in case you are wondering, they are stored in /opt/OpenNMS/etc/users.xml).

Look in your install.log (it should either be in root's home directory or where you ran the install). Read it. Note any errors.

The most common error has to do with PostgreSQL. First, make sure it is running. "ps -ef | grep postmaster" should return a running postmaster process. If not, start Postgres (on Red Hat Linux you can use /sbin/service postgresql start).

If it is running, attempt to access it with "psql -U opennms opennms". If you can log in, make sure all the tables are there (the psql command \dt lists them).

If not, you probably forgot to change the security settings when running Postgres 7.2 or higher.

If everything looks good, make sure the install.pl script modified the Tomcat server.xml file (the date and time will be different from the rest of the files in that directory).

I found that the failure of the DBI and DBD::Pg modules caused the admin/admin combo not to work (I was installing using RPMs). Since I hadn't discovered the FAQ-o-matic yet, I just installed these modules the Perl way (i.e., perl -MCPAN -e 'install DBI' and perl -MCPAN -e 'install DBD::Pg'). The DBD::Pg install did a lot of complaining about the POSTGRES_INCLUDE and POSTGRES_LIB environment variables, but once it installed cleanly, and I then removed and reinstalled the opennms RPMs, everything seemed to work fine (at least, I am able to get into the opennms web page with admin/admin).

Also, you can check:

grep -i opennms /var/tomcat4/conf/server.xml

The install.pl script is supposed to modify this file so that Tomcat knows how to authenticate requests to the OpenNMS webapp. If this file doesn't include an opennms entry, try rerunning install.pl:

$OPENNMS_HOME/bin/install.pl -q $OPENNMS_HOME/etc/create.sql
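The check and the re-run above could be combined along these lines. This is only a sketch: the function name has_opennms_entry is hypothetical, and the paths in the example are the ones mentioned in this FAQ, which may differ on your system.

```shell
# Hypothetical helper: succeed if the given file (or stdin, when no file
# is given) mentions "opennms" in any letter case.
has_opennms_entry() {
    grep -qi 'opennms' "$@"
}

# Example (not run automatically), using the paths from this FAQ:
#   if ! has_opennms_entry /var/tomcat4/conf/server.xml; then
#       "$OPENNMS_HOME/bin/install.pl" -q "$OPENNMS_HOME/etc/create.sql"
#   fi
```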

Given that the install fails to setup postgres, is there a script to just setup the database? Yes, run
"/opt/OpenNMS/bin/install.pl etc/create.sql".

The reason for the logon failure is probably that PostgreSQL wasn't accepting local requests to access the database when you installed OpenNMS.

(/var/pgsql/data/...) Adapt postgresql.conf so that TCP requests are allowed. Adapt the pg_hba.conf file to allow local and network access, and then either re-install OpenNMS or create the opennms databases yourself.

I had the same problem.

Just a note on the two above install scripts.

$OPENNMS_HOME/bin/install.pl -q $OPENNMS_HOME/etc/create.sql is the script that works

/opt/OpenNMS/bin/install.pl etc/create.sql Doesn't

(Ed. note: install.pl is deprecated in 1.1.4)

If you build from source, you will need to put some links in the $TOMCAT4_HOME/server/lib directory:

Another possible problem here is that if you run Tomcat as any user other than root, which is a good security practice, then the tomcat user may not have access to read/write the config files in /opt/OpenNMS/etc. If that is the case, you cannot log in because Tomcat cannot read the usernames and passwords from users.xml.

So one solution is to:

a) add the tomcat user to the opennms group

b) add read/write permissions to the /opt/OpenNMS/etc/ and /opt/OpenNMS/log directories.

Q: Tomcat won't start, complains about JAVA_HOME, why?

A: Sun has released a new version of the 1.4 JDK since the 1.0.0 RPMs were built. If you install that JDK, Tomcat will probably complain about JAVA_HOME. Edit /etc/tomcat4/conf/tomcat4.conf so that JAVA_HOME points to the correct path.

Q: Why do I get 'FATAL 1: IDENT authentication failed for user "postgres"'?

A: An IDENT error means that a Unix user other than postgres tried to
connect as the 'postgres' database user, which 'ident' authentication does not allow.

The quick fix is to add the line:

host all 127.0.0.1 255.255.255.255 trust

to your "/var/lib/pgsql/data/pg_hba.conf" file. This means that anyone
connecting over TCP/IP from localhost is able to connect as another username
(i.e., 'ben' the Unix user can connect as 'postgres' the PostgreSQL user)
without being asked for a password.

Make sure your pg_hba.conf looks like this. There should not be two "local" entries.
Once OpenNMS is installed, you can change the pg_hba.conf to your liking assuming
you understand the consequences.

# TYPE DATABASE IP_ADDRESS MASK AUTH_TYPE AUTH_ARGUMENT
local all trust
host all 127.0.0.1 255.255.255.255 trust
# Using sockets credentials for improved security. Not available everywhere,
# but works on Linux, *BSD (and probably some others)
#local all ident sameuser

Or, if you like a more secure setup try the following:

host all 127.0.0.1 255.0.0.0 password

This will require a password for all connections over TCP/IP from localhost (such as the connection that opennms uses).

The following has worked with SLES 10 -

# TYPE DATABASE USER CIDR-ADDRESS METHOD
# "local" is for Unix domain socket connections only
local all all trust
# IPv4 local connections:
host all all 127.0.0.1/32 trust
# IPv6 local connections:
host all all ::1/128 trust
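To check the "no two local entries" rule mentioned earlier in this answer, something like the following sketch could be used. The function name count_local_entries is made up for this example, and the path in the usage line is the one quoted above.

```shell
# Hypothetical helper: count active (uncommented) "local" entries in a
# pg_hba.conf file, or on stdin when no file is given.
count_local_entries() {
    # A commented line such as "#local all ident sameuser" has "#local"
    # as its first field, so it is not counted.
    awk '$1 == "local" { n++ } END { print n + 0 }' "$@"
}

# Example (not run automatically):
#   count_local_entries /var/lib/pgsql/data/pg_hba.conf
```

A result greater than 1 suggests the file has more than one active "local" line and is worth reviewing by hand.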

Q: Why does apt complain about zebra and gated?

A: If you get the following error:

Sorry, but the following packages have unmet dependencies:
gated: Obsoletes: zebra but 0.91a-6 is to be installed
zebra: Obsoletes: gated but 3.6-12 is to be installed
E: Unmet dependencies.
Try 'apt-get -f install' with no packages (or specify a solution).

Uninstall either gated or zebra. Some distros (e.g. Red Hat Linux) will allow you to install these conflicting packages, which will confuse apt.

Q: Why Does OpenNMS Say My DNS Server is Down, When It Is Up?

A: Problem: After installing OpenNMS, it discovers DNS servers, but then says they are down.

By default, OpenNMS does a lookup on "localhost". While this returns an error from most DNS servers, the receipt of the error proves that the DNS server is running.

Note: the default configuration of OpenNMS 1.9.x makes response codes 3 (NXDOMAIN) and 5 (REFUSED) fatal to the DNS poller, but one of these is likely to be the desired response to a query containing "localhost." Edit the "fatal-response-codes" parameter in poller-configuration.xml to correct this behavior.

However, Microsoft DNS servers behave differently. In order to get OpenNMS to work with Microsoft DNS servers, edit the poller-configuration.xml file, and change the value of the DNS poller "lookup" parameter to something other than "localhost", such as "opennms.org".

Q: Why are some of my XML files all one line?

A: Why are some of the files in the /opt/OpenNMS/etc directory all one line, instead of being indented? I swear they were indented at one time.

OpenNMS uses castor to parse certain XML files. Any file that gets changed via the GUI, such as the poller and the notifications configuration files, will be written back as a single line.

It is possible to get castor to indent the lines, but it then adds whitespace that causes OpenNMS to fail, such as adding a carriage return after and before .

The task of fixing that is currently available to whoever wants it (grin)

In the meantime, use /opt/OpenNMS/bin/xml.reader.pl to fix your files. The syntax would be something like:

at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processConnection(org.apache.tomcat.util.net.TcpConnection, java.lang.Object[]) (/usr/lib/lib-org-apache-coyote-http11-4.1.27.so)
at org.apache.tomcat.util.net.TcpWorkerThread.runIt(java.lang.Object[]) (/usr/lib/lib-org-apache-tomcat-util-4.1.27.so)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run() (/usr/lib/lib-org-apache-tomcat-util-4.1.27.so)
at java.lang.Thread.run() (/usr/lib/libgcj.so.5.0.0)
at _Jv_ThreadRun(java.lang.Thread) (/usr/lib/libgcj.so.5.0.0)
at GC_start_routine (/usr/lib/libgcj.so.5.0.0)
at __clone (/lib/tls/libc-2.3.3.so)

Then you have Tomcat configured to use gcj instead of Sun's java. Check your tomcat4.conf file and make sure JAVA_HOME is set properly.

Q: Why do I see JDBC related Exceptions in the log files?

A: There have been a couple of problems reported regarding exceptions using Postgres. These are caused by problems in the JDBC Driver for Postgres.

The first exception occurs in 1.1.3 and looks something like the following:

java.lang.StringIndexOutOfBoundsException: String index out of range: 23
at java.lang.String.charAt(String.java:444)
at org.postgresql.jdbc2.ResultSet.toTimestamp(Unknown Source)
at org.postgresql.jdbc2.ResultSet.getTimestamp(Unknown Source)

The other occurs when using OpenNMS with Postgres 7.4 and looks like the following:

java.sql.SQLException: ERROR: SET AUTOCOMMIT TO OFF is no longer supported

In both cases you need to upgrade the JDBC driver for Postgres. Go to [1].

If you are using Postgres 7.2 or 7.3, download the 7.3.x JDBC2 driver.

If you are using PostgreSQL 7.4, use its JDBC2 or JDBC3 driver.

You need to name them postgresql.jar and put them in the following dirs.

$OPENNMS_HOME/lib
$TOMCAT/webapps/opennms/WEB-INF/lib

On Debian you also need to replace postgresql.jar in /usr/share/java as well. (Thanks John.)

Then restart OpenNMS and then Tomcat.

Q: Why do I get node level SNMP information, but no interface level information?

A: I was noticing this on a Fedora Core 1 machine with multiple NICs. The node level information would be produced just fine, such as "Number of Users" and "CPU Utilization", but there was no interface information, such as traffic.

It turns out that this is a bug in net-snmp 5.1-2.1 which is the latest net-snmp for Fedora Core 1. I downloaded the source RPMs for 5.2.1, but was unable to get all of the dependencies worked out, so I found some rpms that did the trick. Now the snmpwalk commands return useful information and I get interface statistics.

Q: OpenNMS stops working after about 1 hour or intermittent servlet crashes are seen in the Web GUI.

A: This issue should be fixed in 1.1.4, but since it had a great impact on OpenNMS the original FAQ entry is left here as a reference.

OpenNMS will stop working after a period of time - this varies with the number of threads configured for the various daemons.

Intermittent servlet errors are encountered in the Web GUI - this seems to vary with the amount of concurrent usage of the GUI.

This has so far been reported with:

Debian Woody/Sid, Red Hat 8.0/(7.x?), SuSE 8.1

OpenNMS 1.0.1

Sun JDK 1.4.0, 1.4.1, 1.4.1_01

Tomcat 4.0.3-0

Check for a file like hs_err_pid9499.log in the directory from which OpenNMS or Tomcat was launched.

Doing a Google search on the error ID shows that it appears with other apps as well.

Here is an excerpt from one article:

I have seen this problem come up with a variety of applications, most notably JBoss 3.X. The way I got around it was specify the -Xrs and -Xint options to the VM before running any application.
Doing a "man java" gives:
-Xint Operates in interpreted-only mode. Compilation to
native code is disabled, and all bytecodes are exe-
cuted by the interpreter. The performance benefits
offered by the Java HotSpot VMs adaptive compiler
will not be present in this mode.
-Xrs Reduce usage of operating-system signals by Java
virtual machine (JVM).
Sun's JVM catches signals to implement shutdown
hooks for abnormal JVM termination. The JVM uses
SIGHUP, SIGINT, and SIGTERM to initiate the running
of shutdown hooks. The JVM uses SIGQUIT to perform
thread dumps.
Applications that embed the JVM frequently need to
trap signals like SIGINT or SIGTERM, and in such
cases there is the possibility of interference
between the applications' signal handlers and the
JVM shutdown-hooks facility.
To avoid such interference, the -Xrs option can be
used to turn off the JVM shutdown-hooks feature.
When -Xrs is used, the signal masks for SIGINT,
SIGTERM, SIGHUP, and SIGQUIT are not changed by the
JVM, and signal handlers for these signals are not
installed.
Note that -X options are non-standard and may change in the future.
Running with "-Xint -Xrs" results in stable operation.

In /opt/OpenNMS/bin/opennms.sh, find "HOTSPOT" and add the "X" flags like this:

Also in /etc/tomcat4/conf/tomcat4.conf change the CATALINA_OPTS line like:

export CATALINA_OPTS="-Xint -Xrs -DTOMCATLAUNCH=true...

That's it.

Here's an update from DJ Gregor:

I just received an email from Sun saying that the crashing problem with Java 1.4.1 (and versions, too) has been assigned a bug number and is being worked on.

See the attached message for details, and here are the useful excerpts (including a workaround):

> This bug is being tracked under the following Bug-ID: 4724356
> ...
> Feel free to check the status of Java(TM) bugs via the JDC at:
> http://developer.java.sun.com/developer/bugParade/index.html
>
> The work around is to increase the amount of memory available
> in the permanent generation (used to store class objects and
> related metadata). Add this specification to the command line
> that is used to launch the JVM:
> -XX:MaxPermSize=128m
> Use larger sizes if necessary.
>
> For more information, refer to this document:
> http://wireless.java.sun.com/midp/articles/garbagecollection2

Over the last few months I have seen the occasional post on OpenNMS dying with a Java Hotspot error in output.log complaining of an "Unexpected Signal 11".

I was recently on a machine that was producing these errors frequently, and the problem had to do with memory.

By default, OpenNMS allocates 256MB to the Java Heap Size for OpenNMS. This, combined with the default 64MB for Tomcat4 can exceed the available memory on some systems (systems with 256MB or 512MB and other processes).

To correct this, edit $OPENNMS_HOME/bin/opennms.sh, search for "HEAP" and lower it. This fixed the problem for me.
I was not out of RAM, but was getting a lot of "Too many open files" messages in my logs, as well as the above crashes.

As well as the above two fixes, I changed in /opt/OpenNMS/bin/opennms.sh the line:

ulimit -s 2048

to:

ulimit -s 8192
ulimit -n 10240

which combined with the above two fixes cleared everything up.

Q: How Can I Best Test My XML Files?

A: From Eric Burghard on the OpenNMS Discuss mailing list.

I played with xml and xsd files this weekend (taken from OpenNMS' CVS tree). My main goal was to be able to validate each of my .xml files. I had to read some specs about XML and XSDs because:

description: Local types are not referenceable because
of the lack of a namespace definition.

workaround:
Add attributes elementFormDefault="qualified" and xmlns:
evt="http://xmlns.opennms.org/xsd/event" to the schema root tag
(change the name for reflecting the target's namespace )
Change all references by prefixing them with the new namespace's alias ref="evt:sometype"

unnecessary type specification

error message: E src-attribute.4: Attribute 'type' have both a type attribute and a
annoymous simpleType child..

Now it's time to validate your .xml file with your valid and well-formed .xsd file. You have to specify the schema file that will be used during validation inside the .xml. Add these attributes to the root tag (change the namespace to reflect the one defined in the .xsd).

More probable: certain SNMP agents fail when an snmpgetnext command is issued.

To test the first scenario, simply do an snmpwalk on the ipAddrTable and ensure that for the interface you are trying to collect on (ipAdEntAddr) there is an ifIndex (ipAdEntIfIndex) that matches an ifIndex in the ifTable.

If one exists, you may have the second issue. Here is a test you can try.

If this is the case, then at the moment there is nothing you can do. The way OpenNMS works is that all the necessary information we need for data collection is contained in one "get" request. The request is sent and the thread closes. When the reply comes back, a new thread is started and the information is added to the database.

Note that this method works fine if the vendor supports SNMP correctly. If you have support with them, please open a ticket and see if they will correct the problem. In order to make these requests individually would require quite a bit of new code to be written, and since it is rare (it occurs mainly on older HP printers) we haven't been able to spare the time.

Q: How are node labels determined?

A: Node labels in OpenNMS are determined in the following order:

User Defined

DNS lookup

SMB (NetBIOS)

SNMP

IP Address

All node labels can be set by the user on the node's page in OpenNMS, and a user defined label supersedes all other methods.

For devices with more than one interface, the lowest numbered interface is used.

If a node supports SNMP, the address of the Primary SNMP interface is the one looked up to determine the node label. Also, in version 1.1 and beyond, the lowest non-127.*.*.* software loopback address will be set as the Primary SNMP interface.

If a node label changes, check out the provisiond.log. You should see a database dump listing what was known about the node before the node scan and what was determined after it.

Note: A node can be written to the database before the SNMP service is discovered. New nodes might see their labels change with the first rescan.

Q: How Do I Log Out of the webUI?

A: Through OpenNMS 1.2, BASIC authentication is used. In order to logout/re-login, you have to close all instances of the browser and start a new session. This issue is also explained in further detail in the archives.

Starting in OpenNMS 1.3.2, OpenNMS uses form-based authentication, and once logged in there is a link in the upper right-hand corner of the screen to log out next to your username.

Q: I upgraded to 1.1.1. Why does "Manage/Unmanage" not work?

A: If you upgrade to OpenNMS 1.1.1, you may get this error when trying to access the Admin page to manage and unmanage interfaces and services:

Q: Why doesn't the dhcpd process ever start?

A: OpenNMS is best run on a server with a static IP address. There are a number of reasons for this (setting the trap destination, for example) but also since the OpenNMS dhcpd process acts like a user, it has to bind to the same port that would be used to set a DHCP address on the server itself.

If you must run using DHCP, edit service-configuration.xml in $OPENNMS_HOME/etc and comment out the section:

Then restart OpenNMS. Note that without dhcpd you will not be able to monitor DHCP servers.

Q: I can snmpwalk a device, but OpenNMS won't collect data on it, why?

A1: While walking a device's MIB using the snmpwalk utility is a good initial test for whether the device's SNMP agent is configured appropriately, it's important to note that the snmpwalk utility included in most modern UNIX-like operating system distributions uses the Net-SNMP libraries, which are much more forgiving of all kinds of SNMP protocol violations than is the SNMP4J library that OpenNMS uses by default. Therefore a successful snmpwalk result is not 100% indicative that an SNMP agent is behaving correctly.

A2: OpenNMS was originally designed to monitor IP services, and as such tends to be IP centric. SNMP data, however, can be pretty free-form, so there had to be a way to associate SNMP data with a particular IP interface.

The way this was done was to use the ipAddrTable and map the ifIndex given there to the ifTable.

In addition, since there is only one SNMP agent per device (usually), rather than poll for SNMP data through each available interface, the concept of a "primary" SNMP interface was introduced. This interface would be used for all SNMP requests to the device.

In order to be a primary SNMP interface, several things must occur.

the IP address for the interface must exist in a collectd package.

the IP address must map to a valid ifIndex (originally in order to map the data to a particular IP address).

if more than one interface qualifies to be a primary interface, the lowest numbered interface is marked as "primary" and the others as "secondary", unless ...

a loopback address exists with a non 127.*.*.* IP address and meets the qualifications above - then it is chosen.

However, several people have reported the need to monitor SNMP on a device that either does not have a valid primary interface candidate or they wish to use another address altogether. At the moment a solution does not exist, although we hope to have something in place soon.

Possible workarounds include directly modifying the database and substituting in an ifIndex. This will work for a while, but may be overwritten during the next Provisiond node scan.

Q: Why Does My Windows DHCP Server Show as Down?

A: To monitor Windows DHCP servers, you need to edit the dhcpd-configuration.xml file and put in the MAC address of the OpenNMS server in the macAddress field.

On *nix machines, /sbin/ifconfig -a will usually show you the MAC address.

Q: Why do I get opennms startup failed?

A: More recent versions of OpenNMS have a "START_TIMEOUT" value set. This can be found in either $OPENNMS_HOME/bin/opennms.sh or $OPENNMS_HOME/etc/opennms.conf (both are listed here because it is not clear which one it ends up in). If you see "opennms startup failed", check its status using "opennms.sh -v status". If you see start_pending, you will likely need to increase the START_TIMEOUT value; 60-75 seconds should work on slower machines.

Q: Looking in output.log I see lots of references to 'java.lang.Exception' that appears to be 'Caused by: org.jrobin.core.RrdException: Bad sample timestamp ..... Last update time was ....., at least one second step is required'

A: OpenNMS stores RRD data in the following manner. Node level data is stored in

$OPENNMS_HOME/share/rrd/snmp/[nodeid]

and interface level data is stored in

$OPENNMS_HOME/share/rrd/snmp/[nodeid]/[ifdescr+MAC]

If you have two interfaces with the same ifDescr and the same MAC address, OpenNMS will collect data on both of them, but then try to write it to the same file, say ifInOctets.rrd. The second write arrives within the same second as the first, which triggers the exception. You can usually safely ignore this error.

Q: I switched to JRobin and now no graphs show up. Trying to view the graphs directly gives me an exception. I already switched on java.awt.headless; what gives?

A: Taken from IRC troubleshooting with DJ Gregor and Mike Huot

Search for the string 'x11' in this page (DJ's suggestion that put me on the right track)

If you do not have a full installation of the X Window System on your OpenNMS server, the Java runtime cannot access some graphics routines that it needs even in "headless" mode. Check whether your JAVA_HOME/jre/lib/PLATFORM/libawt.so can find all the X libraries it needs. On my Linux Ubuntu 5.10 (Breezy Badger) system, server install type, I had most of the X libraries but was missing libXp.so:

The missing libmlib_image.so and libjvm.so seem benign. Running apt-get install libxp6 resolved the libXp.so.6 link failure. After restarting both OpenNMS and Tomcat, JRobin graphs work beautifully. I would expect to see this problem also on Debian GNU/Linux server-profile systems and heavily minimalized Solaris ones.

Another symptom that may indicate you are having this issue is that the outage graphs in the front page in OpenNMS 1.3.0 and later are absent and replaced with text labels.
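The library check described above can be narrowed down to just the unresolved entries with a small filter like this one. It is a sketch that assumes the usual ldd convention of marking unresolved libraries with "not found"; the function name missing_libs is invented for this example.

```shell
# Hypothetical filter: print the names of libraries that ldd reports as
# missing. Reads ldd output on stdin.
missing_libs() {
    # Unresolved lines look like "libXp.so.6 => not found"; the library
    # name is the first field.
    awk '/not found/ { print $1 }'
}

# Example (not run automatically); PLATFORM is a placeholder, as in the
# text above:
#   ldd "$JAVA_HOME/jre/lib/PLATFORM/libawt.so" | missing_libs
```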

Q: SNMP data collection fails when I try to read from port 260/udp in order to collect Check Point data

A: It's likely that the SNMP daemon listening on port 260/udp is providing only the Check Point private enterprise MIB (rooted at 1.3.6.1.4.1.2620) and absolutely nothing else. That means no MIB-2 system table, no ifTable, no ipAddrTable, nothing but Check Point information. The critical item that is missing (from the system table) is the sysObjectID object, which would tell OpenNMS what kind of device it's dealing with. Without this information, there is no way for OpenNMS to determine what data it should collect from the agent.

You can work around this problem by manually hacking the database. DO NOT ATTEMPT UNLESS YOU KNOW EXACTLY WHAT YOU ARE DOING. Update the node table in the OpenNMS database, setting the nodesysoid column for your Check Point hosts to, e.g., .1.3.6.1.4.1.2620.1.1. Do not ask me to give you the exact SQL statement to do this -- if you can't figure it out, you need to have a better understanding before you try something like this. If you have a datacollection package properly configured that matches on this system OID, it should start working after a rescan.

Note that the Check Point SNMP agent (cpsnmpd) is meant to be used only by Check Point SmartView Status, which is why it is so lacking in information that would be useful to OpenNMS or a like product. Also note that the cpsnmpd distributed with Firewall-1 versions prior to R55 (NG with AI) is very fragile and should not be used for anything at all -- the agent is likely to fall over when walked. On FW-1/VPN-1 releases R55 or later running on SecurePlatform (but not Red Hat / RHEL or Crossbeam XSLinux) and possibly Nokia, you may find that there is a master agent on udp/161 that can "pull in" the Check Point MIBs by running cpsnmpd as an AgentX subagent. You can try enabling this functionality in cpconfig. Crossbeam X-Series devices have a separate issue that causes all the VAP interfaces to get reparented to the CPM; you can avoid it by never letting OpenNMS discover the CPM or (advanced topic) by restricting the view exposed by the SNMP daemon on the CPM.

Q: OpenNMS 1.2.x won't start on a Linux system, I get "Could not initialize IcmpSocket" errors

A: The likely problem is that OpenNMS is attempting to run under GIJ, the GNU interpreter for Java. GIJ is not suitable for running OpenNMS; you can verify that this is the problem by running the following command:

`head -1 /opt/OpenNMS/etc/java.conf` -version

If you see gij (GNU libgcj) in the output, then OpenNMS is using GIJ. If you do not have a Sun JDK installed on your system, you will need to get one from Sun (just the latest SE version is fine, you don't need EE or NetBeans). Then re-run /opt/OpenNMS/bin/runjava -s or /opt/OpenNMS/bin/runjava -S to set the Java interpreter to be used for OpenNMS. The startup should now work.
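The GIJ check above can also be scripted. The following is a minimal sketch; the function name is_gij is hypothetical, and the java.conf path in the example is the one used in this answer.

```shell
# Hypothetical helper: succeed if "-version" style output on stdin
# identifies the interpreter as GIJ.
is_gij() {
    # In a basic regular expression the parentheses match literally.
    grep -q 'gij (GNU libgcj)'
}

# Example (not run automatically). The 2>&1 merges stderr into the pipe
# in case the version banner is written there:
#   java_bin="$(head -1 /opt/OpenNMS/etc/java.conf)"
#   if "$java_bin" -version 2>&1 | is_gij; then
#       echo "OpenNMS is using GIJ"
#   fi
```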

Q: Tomcat fails to start properly on 1.3.2 and I see an "out of PermGen space" error in catalina.out (or similar)

Q: I upgraded to 1.3.2 and now my resource graphs aren't showing up

A: If you were using rrdtool via JNI (as opposed to JRobin), it's likely that the change in the default RRD strategy has bitten you. There are two solutions for this problem. The first is to change the RRD strategy for your upgraded installation back to rrdtool / JNI. Edit $OPENNMS_HOME/etc/rrd-configuration.properties and uncomment the line that reads:

# To switch to the JNI implementation uncomment the following line:
#org.opennms.rrd.strategyClass=org.opennms.netmgt.rrd.rrdtool.JniRrdStrategy

The second solution is to use the JRobinConverter to convert your RRD files, as JRobin is unable to read ones created by the rrdtool / JNI strategy.

Q: My thresholds never trigger, even when the value clearly exceeds the threshold

A: If you are using 1.2.8 or above in the 1.2 line, or 1.3.2 in the 1.3 line, you may need to use the range parameter in your thresholding package definition.

Q: I see an ERROR (pollerBackend failing to init) in manager.log and OpenNMS won't start.

A: Check your host name resolution (/etc/hosts, etc.). Your hostname must resolve to your IP address and vice versa. You can use "getent hosts <host name | IP address>" on most modern UNIXes to look this up.

Q: I try to start OpenNMS but I get an error "Caused by: java.lang.OutOfMemoryError: unable to create new native thread"

A: Try commenting-out these lines in $OPENNMS_HOME/bin/opennms.sh:

ulimit -s 8192 > /dev/null 2>&1
ulimit -n 10240 > /dev/null 2>&1

Q: Why is the time zone wrong in OpenNMS?

A: Nothing is wrong--this is normal and is due to the default configuration. OpenNMS will detect SNMP on nodes, but by default the poller isn't configured to monitor the SNMP service. In general, you don't care if the SNMP service fails on a node, other than the fact that you can't collect data over SNMP. It's a bit redundant, anyway, since the data collector ("collectd") sends events that can generate notifications if it fails to collect data from a node.

Note that polling is separate from data collection. The poller monitors a service for simple up or down status, creates an outage when a service goes down, and the outage can trigger notifications to be sent (e.g.: the "service HTTP is down on node BigImportantWebServer" notification that you want to get on your pager). Data collection is separate and collects performance data from SNMP (and other protocols, too, but SNMP is by far the most common).

It is possible to not monitor a service with the poller, like SNMP, and still collect data via that service, which is indeed the default configuration for SNMP.

Q: OpenNMS doesn't startup with the error with the "pollerBackEnd": "Port already in use: 1099"

A: A legacy version of the remote poller back-end that runs in the OpenNMS daemon is trying to use port 1099 and you already have some process listening on port 1099 (likely another Java process). If you aren't doing Remote Monitoring, disable the poller back end and you'll be fine. Edit service-configuration.xml and comment-out the entire <service> section for "OpenNMS:Name=PollerBackEnd". It should look something like this:

The legacy version of the remote poller interface inside OpenNMS uses Java RMI in the poller back-end to listen for and respond to requests from remote pollers. The trick comes in where other Java programs may also be using RMI on the same host, and they are likely using the same port, 1099. These ports can be reconfigured inside opennms.properties with the following system properties:

Q: OpenNMS keeps on running out of memory with errors like "java.lang.OutOfMemoryError: Java heap space" in output.log

A: It can be one of a few things, so the possibilities are listed below in order of likelihood:

Your disk subsystem can't keep up with RRD file writes and the in-memory RRD update queue in the OpenNMS Java daemon grows until the Java heap is full.

Your CPU or disk subsystem can't keep up with event processing or event/alarm persistence/de-duplication and the in-memory event queue in the OpenNMS Java daemon grows until the Java heap is full.

Your OpenNMS system is handling a large number of nodes and interfaces and needs a larger heap to handle everything that's cached in memory within the OpenNMS Java daemon.

For the RRD problem, you can verify if this is the case (and it almost always is the problem) by enabling DEBUG logging for queued.log and looking for lines containing "QS" in that log file. Look for totalOperationsPending and if that's large (in the tens of thousands to hundreds of thousands), then queued RRD updates are likely your problem. A few possible fixes:

Speed up your disk subsystem. On Linux, try to stay away from LVM and put RRD files on their own filesystem with RAID 0+1 (NOT RAID-5!!!). A battery-backed write cache may help if the cache is large enough to cache writes to the same file across multiple update periods (5 minutes by default).

Use storeByGroup.

Increase the size of the heap and the multiple-updates-per-write feature of queued might allow the disks to keep up. Note: this generally gives a 1.5x-2.0x increase, but not much more. Also, you will need to have enough physical RAM and the heap must be large enough to cache samples for multiple sampling periods.

Delete .jrb (or .rrd) files that are not being used. It is unknown why this decreases system load, but it has been demonstrated to decrease load from 20 to 3 on one customer's installation. (find /opt/opennms/share/rrd/snmp/ -name "*.jrb" -mtime +30 -exec rm -rf {} \;)
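Before deleting anything, it may be worth listing the candidates first. The sketch below does only that; the function name list_stale_rrd and the 30-day default are assumptions made for this example, chosen to mirror the find command above.

```shell
# Hypothetical helper: list .jrb and .rrd files that have not been
# modified for more than N days (default 30). Nothing is deleted.
list_stale_rrd() {
    dir="$1"
    days="${2:-30}"
    find "$dir" -type f \( -name '*.jrb' -o -name '*.rrd' \) -mtime +"$days" -print
}

# Example (not run automatically):
#   list_stale_rrd /opt/opennms/share/rrd/snmp/ 30
```

Once the list contains only files you expect, the find command quoted above can be used to remove them.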