Installation

Plugin Installation

The XML Collector is now part of the core package as of OpenNMS Horizon 20. For older releases, an additional package named opennms-plugin-protocol-xml must be installed in order to use this collector.

The stock sample configuration files were created for 3GPP data collection, a special case of XML parsing that relies on advanced features of the XML Collector that most standard deployments can ignore.

So, for standard usage, it is recommended to create the configuration files from scratch and use the stock content only as a configuration example.

Configuration Basics

This collector has been created to extract data from an XML document in order to store it in RRD/JRB files and for threshold processing. The XML can be retrieved in many ways, including HTTP, SFTP, and the local filesystem.

Like other XML-based collectors, it requires the creation of an XML collection inside a file named xml-datacollection-config.xml, and an associated service in a package in collectd-configuration.xml, in order to enable and use this collector. The service needs to be added to the node, either by using a service detector in a provisioning foreign source or as a managed service via a provisioning requisition.

For tabular metrics, the resource types used must be defined in a file in either the etc/datacollection/ or etc/resource-types.d/ directory.

Single Metrics

The whole idea of this collector is to define the proper XPaths to access the relevant information from the XML document.

For example, suppose that the XML document has the following structure:

The xpath attribute of each xml-object is a relative XPath, evaluated from the resource’s XPath, that points to the metric value.

The above example assumes that the XML document will be retrieved using HTTP. The placeholder {ipaddr} will be replaced at runtime with the IP address of the node. For example, if the address is 192.168.0.1, the URL used at runtime will be http://192.168.0.1/stats

In order to create charts for the above configuration, the graph templates should look like this:

The name attribute of the xml-object tag must be used for the columns attribute on the template and for the DEF name as well.

Because these are single metrics, nodeSnmp must be used for the type.
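To illustrate how a relative XPath resolves against the resource's XPath, here is a minimal Python sketch. It is not the collector's implementation: the document, the element names, and the collect helper are all invented for illustration, and Python's ElementTree supports only a subset of XPath, evaluated relative to the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element names are invented for illustration.
DOCUMENT = """
<stats>
  <server>
    <requests>1520</requests>
    <errors>7</errors>
  </server>
</stats>
"""


def collect(document, resource_xpath, object_xpaths):
    """Return {metric name: text} for the first node matching resource_xpath."""
    root = ET.fromstring(document)
    resource = root.find(resource_xpath)
    if resource is None:
        return {}
    values = {}
    for name, relative_xpath in object_xpaths.items():
        # Each metric path is evaluated relative to the resource node.
        node = resource.find(relative_xpath)
        if node is not None and node.text is not None:
            values[name] = node.text.strip()
    return values


metrics = collect(DOCUMENT, "./server", {"requests": "requests", "errors": "errors"})
print(metrics)
```

The resource path selects the node that owns the metrics, and each metric path only needs to describe the remaining steps from that node to the value.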

Tabular Metrics

IMPORTANT

The following information is for educational purposes only. Solaris does not provide the statistics used in the examples in XML format by default.

The difference here is that the XML document is in a tabular form similar to SNMP Table statistics.

This requires defining a custom resource type, which can be added directly to datacollection-config.xml or indirectly through an external reference. The second method is preferred because it simplifies the configuration and avoids confusion with the existing SNMP configuration.

Suppose that we want to process Solaris Zones statistics, which come with the following structure:

Note that the templates were added only for two of the six numeric metrics.

The name attribute of the xml-object tag must be used for the columns attribute on the template and for the DEF name as well.

Because these are tabular metrics, solarisZone (which is the resource type for these particular metrics) must be used for the type.
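To make the tabular idea concrete, the following Python sketch walks a document with one element per row and builds one set of metrics per resource instance, keyed by a value taken from the row itself. The document layout, the metric names (cpu, memory), and the collect_rows helper are assumptions made for illustration; they are not the actual Solaris Zones structure or the collector's code.

```python
import xml.etree.ElementTree as ET

# Hypothetical tabular document: one <zone> element per row.
DOCUMENT = """
<zones>
  <zone><name>global</name><cpu>12.5</cpu><memory>40.1</memory></zone>
  <zone><name>web01</name><cpu>3.2</cpu><memory>10.0</memory></zone>
</zones>
"""


def collect_rows(document, row_xpath, key_xpath, object_xpaths):
    """Return {row key: {metric name: float}} with one entry per matching row."""
    root = ET.fromstring(document)
    rows = {}
    for row in root.findall(row_xpath):
        key = row.findtext(key_xpath)
        if key is None:
            continue  # a row without a key cannot be identified as a resource
        metrics = {}
        for name, relative_xpath in object_xpaths.items():
            text = row.findtext(relative_xpath)
            if text is not None:
                metrics[name] = float(text)
        rows[key.strip()] = metrics
    return rows


rows = collect_rows(DOCUMENT, "./zone", "name", {"cpu": "cpu", "memory": "memory"})
for zone_name, metrics in rows.items():
    print(zone_name, metrics)
```

Each row plays the role of one instance of the custom resource type, much like one row of an SNMP table.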

A Note on Namespaces

If your source XML uses namespaces, you will need to use some special XPath syntax to essentially ignore the namespace elements since the XPath processor is not namespace aware (at least not as of 2014-10-10). Here's an example:
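A common XPath 1.0 idiom for ignoring namespaces is to match every step by its local-name(). Assuming that is the syntax intended here, the hypothetical Python helper below shows how a plain path can be rewritten into that form. It is only a sketch: it handles simple element steps and does not understand predicates, attributes, or axes.

```python
def namespace_agnostic_xpath(path):
    """Rewrite /a/ns:b into /*[local-name()='a']/*[local-name()='b']."""
    prefix = "/" if path.startswith("/") else ""
    steps = [step for step in path.split("/") if step]
    rewritten = []
    for step in steps:
        local_name = step.split(":")[-1]  # drop a namespace prefix if present
        rewritten.append("*[local-name()='%s']" % local_name)
    return prefix + "/".join(rewritten)


expression = namespace_agnostic_xpath("/feed/atom:entry/title")
print(expression)
```

The resulting expression can be used wherever an XPath is expected, at the cost of being more verbose than the namespaced original.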

XML Sources URLs

There are several ways to define the URL used to retrieve the XML data. One important placeholder, {ipaddr}, has already been introduced, but other placeholders can be used inside the URL, such as any asset record and the core elements of the node: nodeId, nodeLabel, foreignSource, and foreignId.

The idea is to parameterize the URL as much as possible and keep node-specific information in the node's asset records or core elements.

The XML can be retrieved using essentially any protocol, but two of them, HTTP and SFTP, have been modified to support basic authentication. Here are some examples using placeholders:

Where {username} and {password} are asset records and the runtime URL should look like this:

http://admin:admin@192.168.0.1/statistics/data.htm?serverId=1234

In a similar way, the following URL is also valid:

sftp://{username}:{password}@{ipaddr}/statistics/data.xml
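The substitution can be pictured with a short Python sketch. The expand_url function is hypothetical and is not how the collector resolves placeholders internally; in particular, leaving unknown placeholders untouched is a choice made for this illustration only.

```python
import re

# A placeholder is a name made of word characters between curly braces.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def expand_url(template, values):
    """Replace each {name} with values[name]; unknown names are left as-is."""
    def substitute(match):
        name = match.group(1)
        return str(values.get(name, match.group(0)))

    return PLACEHOLDER.sub(substitute, template)


# Values that would come from the node and its asset records.
node = {"ipaddr": "192.168.0.1", "username": "admin", "password": "admin"}

sftp_url = expand_url("sftp://{username}:{password}@{ipaddr}/statistics/data.xml", node)
http_url = expand_url("http://{ipaddr}/stats", node)
print(sftp_url)
print(http_url)
```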

3GPP is an example of custom URL handling: because the target file is different on each collection interval, the sample configuration files use a slightly different SFTP protocol named sftp.3gpp.

In addition, you can use cron to collect files from somewhere else and write them to a local file whose path contains the remote node's IP address:

file:///var/tmp/customstats/server-{ipaddr}.xml

XML Collector configuration

This is an example of how to create a collection package with a service that will use the XML collector. This must be included inside collectd-configuration.xml:

There is a service named SolarisZones. The XML Collector will be associated with this service and applied to all nodes according to the filters defined in the package “XML Collection”, using the collection “Solaris Zones” (the collection parameter must match the name of an xml-collection defined in xml-datacollection-config.xml).

Graph Templates

As with all other collectors, a set of graph templates should be defined either by directly modifying snmp-graph.properties or by adding them to an external file, for example, snmp-graph.properties.d/xml-graphs.properties.

Advanced Features

It is important to know that the XML collector for tabular metrics can use a custom StorageStrategy and/or PersistSelectorStrategy, defined with the resource type, to customize how the data is stored and which resources are taken into consideration (skipping unwanted data).

For the Solaris Zone example, suppose that the global zone must be omitted. This case requires the usage of the PersistRegexSelectorStrategy as follows:

In the above example, there are two fixed parameters and one dynamic header. It is dynamic because the value is retrieved from the asset field named comment from the node.

Parsing not well formatted HTML with XPath

HTML is not required to be strict XML in terms of syntax. Most browsers can process HTML that is not well formed, for example:

<p>This is one paragraph<br><p>This is another paragraph<br>

The above HTML is not valid in terms of XML syntax; to be valid, it should be written as follows:

<p>This is one paragraph</p><br/><p>This is another paragraph</p><br/>

Note that both definitions are equivalent.

XPath expects well-formed XML, so if you want to process HTML with XPath and the XML Collector, the HTML must be well formed. If you are not sure whether the HTML is well formed, you can configure your collector as follows:
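The difference matters to an XML parser even though a browser accepts both. The Python sketch below feeds each fragment to a strict XML parser; the is_well_formed helper is hypothetical, and both fragments are wrapped in a single body element because an XML document must have exactly one root.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True when the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Unclosed <p> and <br> tags: fine for a browser, rejected by an XML parser.
loose = "<body><p>This is one paragraph<br><p>This is another paragraph<br></body>"
# Every element is closed, so the fragment is well-formed XML.
strict = "<body><p>This is one paragraph</p><br/><p>This is another paragraph</p><br/></body>"

print(is_well_formed(loose))
print(is_well_formed(strict))
```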

Keep in mind that any non-numeric characters will be removed from the "selected text", so "Document Count: 5" will be parsed as just "5". Floating-point numbers must use the period character as the decimal separator, for example: "45.56".

That means 45,56 is not supported: it will be parsed as 4556. Likewise, something like 4,563.33 will be parsed as 4563.33.
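The rule can be sketched in a few lines of Python. The parse_metric function is a hypothetical illustration of the behavior described above, not the collector's code; it keeps only digits and the period, and how the collector treats signs or text containing several periods is not covered here.

```python
import re


def parse_metric(text):
    """Drop everything except digits and periods, then parse as a float."""
    cleaned = re.sub(r"[^0-9.]", "", text)
    try:
        return float(cleaned)
    except ValueError:
        return None  # nothing numeric left, or more than one period


for sample in ["Document Count: 5", "45.56", "45,56", "4,563.33"]:
    print(repr(sample), "->", parse_metric(sample))
```

Note how the comma is simply discarded, which is why it cannot act as a decimal separator.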

Configuring HTML parsing with CSS Selectors instead of XPath

Dealing with HTML with XPath is probably not the best solution in modern web applications. It is much better to use CSS selectors instead. Take the following HTML as an example:

Pre-process XML data with XSLT 1.0

For some use cases, the raw XML data can be complicated to parse, or it is just too big. You can create an XSLT 1.0 file that is applied to the raw XML data before the XML-Source is processed, in order to simplify the configuration.

Let's say the XSLT lives at /opt/opennms/etc/pre-process.xslt; the configuration to use that file to pre-process the XML data should be:

The above example uses the GET method, but it works with POST as well.

JSON Collector

There are occasions where the data to be collected is not exposed as an XML document. JSON is becoming very popular these days, and it is more common than XML, especially for ReST interfaces.

Now, thanks to Apache Commons JXPath, it is possible to declare the XML-Source configuration using XPath to parse JSON documents.
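The intuition is that a JSON object can be navigated with path steps much like an XML tree. The Python sketch below is only an analogy and does not reflect how JXPath is implemented: the select helper and the JSON layout are invented for illustration.

```python
import json

# Hypothetical JSON payload; the layout is an assumption for this sketch.
SAMPLE = """
{"zones": {"zone": [
    {"name": "global", "cpu": 12.5},
    {"name": "web01", "cpu": 3.2}
]}}
"""


def select(node, path):
    """Follow a slash-separated path, expanding lists at every step."""
    current = [node]
    for step in (s for s in path.split("/") if s):
        matched = []
        for item in current:
            if isinstance(item, dict) and step in item:
                value = item[step]
                # A list behaves like repeated sibling elements in XML.
                matched.extend(value if isinstance(value, list) else [value])
        current = matched
    return current


data = json.loads(SAMPLE)
zones = select(data, "/zones/zone")
names = [select(zone, "name")[0] for zone in zones]
cpu_values = select(data, "/zones/zone/cpu")
print(names, cpu_values)
```

Selecting the rows first and then applying a relative path to each row mirrors the resource and metric XPaths used for XML documents.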

Let's say the example for the Solaris Zones explained above (once again, that is not enabled on Solaris, it is just an example) returns JSON data instead of XML. The configuration for the XML-Source remains the same, assuming the sample JSON data looks like the following:

As with the form fields, the content can be dynamic, but it is extremely important to pay attention to the indentation of the "{" and "}" in the JSON format to avoid confusion with the placeholders, for example: