OpenTSDB Package Installation and Extracting Time Series Data Points

Learn all about OpenTSDB package installation and building client programs using HTTP APIs for loading and extracting time series data.

by Kalpana C

Sep 3, 2013

OpenTSDB is a specialized database for storing sequences of data points generated over a period of time at uniform intervals. It uses HBase as the underlying database in order to handle huge amounts of data. This article addresses the challenges developers face during the build and set up process. It also explains how to leverage OpenTSDB's HTTP APIs to develop client programs, so that users can create their own user interfaces for charts and graphs without depending on the standard features provided by OpenTSDB.

OpenTSDB

OpenTSDB is designed to handle terabytes of data while still maintaining very good performance levels for various types of monitoring needs. It stores data about metrics over a period of time. A typical time series record consists of a metric name, a timestamp and the associated value. OpenTSDB provides features for aggregation and downsampling of large amounts of data and comes with its own default visualization component.

OpenTSDB Architecture

OpenTSDB has three responsibilities: collecting, loading/storing, and querying data. The main objective of this architecture is to write data points into HBase and read them back out. The primary motivations for building on HBase are scalability (for data collection), availability (running multiple TSDs) and consistency. As a large-scale time series database, OpenTSDB relies on HBase for linear scaling, automatic replication and efficient scans. There are many ways to collect the data: we can collect metrics using tcollector, a custom client program, or the stats command for metrics on OpenTSDB itself. The diagram below depicts the interaction and data flow across the various components involved with OpenTSDB.

Figure 1: OpenTSDB Read and Write Path Architecture

Setting up OpenTSDB

Download the tar or rpm files for the dependencies (Gnuplot and an HBase/ZooKeeper instance) and for OpenTSDB itself. Get the recommended versions, place the files under the installation directory path and then extract them.

Installing Gnuplot

OpenTSDB uses Gnuplot by default for plotting graphs, so installing Gnuplot is a prerequisite for installing OpenTSDB. Refer to the Gnuplot documentation for more information.

To verify that Gnuplot installed successfully, type the gnuplot command; it should open the Gnuplot terminal.

Installing HBase

Data put into OpenTSDB is stored in HBase, so HBase needs to be installed before OpenTSDB. HBase must also be running before TSD can create its tables. OpenTSDB supports both a single-node HBase instance and a full cluster set up. Follow the HBase instructions for setting it up.

Installing TSDB

As the build script executes, it creates the build directory, a temporary (tsd.tmp) directory, the static root directory and so on. The build script automatically compiles all the files and deploys the package into the net directory. If compilation succeeds, run the install command from the build directory.
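The build-and-install sequence described above can be sketched as follows; the directory layout is an assumption and the commands are echoed as a dry run rather than executed:

```shell
# Dry-run sketch of the OpenTSDB build-and-install sequence.
# The steps are echoed rather than executed; run them from the
# extracted source directory on a real installation.
BUILD_STEPS="./build.sh
cd build
make install"
echo "$BUILD_STEPS"
```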

Starting TSD (Time Series Daemon)

Once OpenTSDB is installed successfully, start it. There are four flags whose values need to be passed when starting TSD: --port, --staticroot, --cachedir and --zkquorum.

--port: The TCP port for TSD to listen on; the default is 4242.

--staticroot: The web root from which to serve static files (/s URLs).

--cachedir: A temporary directory under which the results of requests are cached. The tsd.tmp directory created when the build script executes can be used, or a new directory can be created for this purpose.

--zkquorum: An optional flag. For a single ZooKeeper instance, specify its host name; for an ensemble, a comma-separated list of host names.
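Putting the four flags together, the start command can be assembled as below; the staticroot and cachedir paths are assumptions and should be adjusted for your installation:

```shell
# Sketch: assembling the TSD start command from the four flags.
# The staticroot and cachedir paths are assumptions.
PORT=4242
STATICROOT=/usr/local/share/opentsdb/static
CACHEDIR=/tmp/tsd                 # e.g. the tsd.tmp directory from the build
ZKQUORUM=localhost                # host(s) of the ZooKeeper ensemble
START_CMD="tsdb tsd --port=$PORT --staticroot=$STATICROOT --cachedir=$CACHEDIR --zkquorum=$ZKQUORUM"
echo "$START_CMD"
```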

The start command can be stored in a shell script, e.g. tsdb-start.sh, and optionally executed with nohup so that OpenTSDB keeps running even if the session dies.

[hadoop@webhost build] $ nohup tsdb-start.sh &

Once the script executes successfully, OpenTSDB is ready to serve. The web-based user interface can be accessed through: http://<machine ip>:4242

Generating TSDB tables in HBase

HBase tables need to be created before loading any metric data through OpenTSDB. The create_table.sh script located in the src directory under opentsdb can be run to create the required tables.

[hadoop@webhost src]$ ./create_table.sh

By default, LZO compression is enabled in the script. To run the script with the LZO option, the required jar file needs to be present in the HBase lib directory. LZO compression is recommended in a production environment; for testing purposes, the compression option can be set to none.
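For a test set up without LZO, the script's compression setting can be overridden through an environment variable. A dry-run sketch (the HBASE_HOME path is an assumption):

```shell
# Dry-run sketch: create the TSDB tables without LZO compression.
# create_table.sh honors the COMPRESSION environment variable;
# the HBASE_HOME path is an assumption.
HBASE_HOME=/usr/lib/hbase
CREATE_CMD="env COMPRESSION=NONE HBASE_HOME=$HBASE_HOME ./src/create_table.sh"
echo "$CREATE_CMD"
```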

Creating Metrics for HBase Schema

Once the tables are created in HBase, we need to register the metric names for which time series data will be added. The mkmetric command is used to register a metric name.
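For example, registering a couple of metric names could look like this; the names are illustrative and the commands are echoed as a dry run:

```shell
# Dry-run sketch: register metric names before loading data points.
# The metric names here are illustrative.
MKMETRIC_CMDS=""
for METRIC in sys.cpu.user tsd.http.latency_50pct; do
  MKMETRIC_CMDS="$MKMETRIC_CMDS
tsdb mkmetric $METRIC"
done
echo "$MKMETRIC_CMDS"
```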

Loading Metrics Data

There are many ways to collect time series data for particular metrics. We can collect metrics using the tcollector system or a custom client program, or use commands to load bulk data from compressed flat files. The stats command can be used to collect metrics on OpenTSDB itself. Here is a sample script to collect data from the TSDB stats metrics: it collects the stats metrics at 5-second intervals and loads them into TSDB with the help of a put command.

The file should contain data in the format metric timestamp value tags (tagk=tagv pairs), e.g. tsd.http.latency_50pct 136436594 56 type=all host=webhost. If the data file is huge, it is recommended to compress it using GZip.
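A small file in this format can be generated like so; the metric name, value and tags are illustrative:

```shell
# Sketch: generate a small import file in "metric timestamp value tags" format.
# The metric name, value and tags are illustrative.
NOW=$(date +%s)
DATAFILE=loadmetrics_datapoints.txt
: > "$DATAFILE"
i=0
while [ "$i" -lt 3 ]; do
  echo "tsd.http.latency_50pct $((NOW + i)) 56 type=all host=webhost" >> "$DATAFILE"
  i=$((i + 1))
done
cat "$DATAFILE"
# For large files, gzip "$DATAFILE" before running 'tsdb import'.
```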

[hadoop@webhost build]$ tsdb import loadmetrics_datapoints.gz

Bulk loading data points from files using OpenTSDB's TextImporter.java

Set the common TSDB options and the file path to import, then run TextImporter.java to load the metric data points.

// Excerpt from OpenTSDB's net.opentsdb.tools.CliOptions class.
final class CliOptions {

  static {
    InternalLoggerFactory.setDefaultFactory(new Slf4JLoggerFactory());
  }

  /** Adds common TSDB options to the given {@code argp}. */
  static void addCommon(final ArgP argp) {
    argp.addOption("--table", "tsdb",
        "Name of the HBase table where to store the time series (default: tsdb).");
    argp.addOption("--uidtable", "tsdb-uid",
        "Name of the HBase table to use for Unique IDs (default: tsdb-uid).");
    argp.addOption("--zkquorum", "127.0.0.1",
        "Specification of the ZooKeeper quorum to use (default: localhost).");
    argp.addOption("--zkbasedir", "/usr/lib/hbase/",
        "Path under which is the znode for the -ROOT- region (default: /hbase).");
  }
}

Loading OpenTSDB self-metrics

Every 5 seconds, the script will collect the data points and send them to the TSD.

[hadoop@webhost build]$ collect_metrics.sh
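One minimal way to self-collect, sketched here under the assumption that TSD's telnet-style interface is listening on localhost:4242, is to pipe the output of the stats command back in as put commands (shown as a dry run):

```shell
# Dry-run sketch of a collect_metrics.sh-style pipeline (host/port assumed).
# Each cycle asks TSD for its own stats and writes them back as 'put' lines;
# in the real script this runs inside: while true; do ...; sleep 5; done
TSD_HOST=localhost
TSD_PORT=4242
PIPELINE="echo stats | nc -w 1 $TSD_HOST $TSD_PORT | sed 's/^/put /' | nc -w 1 $TSD_HOST $TSD_PORT"
echo "$PIPELINE"
```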

HTTP APIs for Getting Time Series Data Points

OpenTSDB comes packaged with a web-based UI for accessing time series data and generating graphs. Often users may want their own custom UI and charting solutions. OpenTSDB provides a set of HTTP-based APIs so that any application can invoke queries, retrieve OpenTSDB data points and draw its own graphs. We have added a sample Java client to read the OpenTSDB data points for certain metrics. Data retrieved from OpenTSDB is a list of timestamps and the data points associated with those timestamps for the given metrics.

Figure 3: TimeSeriesMetricVO

TimeSeriesMetricVO holds the list of TimeSeriesRecords for a particular metric. We have created a sample Java client to fetch the time series metric details. Modify the opentsdb.properties file with the OpenTSDB server URL before running the application, and change the metric name and the dates in TestClient.java as necessary.

TestClient.java is the invoker class for OpenTSDBClient.java. To construct the URL for the HTTP request, certain parameters need to be passed to OpenTSDB, such as the metric name, start time, end time and aggregate function type. The members of the OpenTSDBQueryParameter need to be set before it is passed as an argument to OpenTSDBClient.
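As a rough sketch of what such a client requests (the host, metric name and dates are assumptions), OpenTSDB's /q endpoint can return plain-text data points when the ascii option is set:

```shell
# Sketch: building the /q query URL a client would request.
# The host, metric name and dates are illustrative.
TSD_URL="http://localhost:4242"
START="2013/03/27-00:00:00"
END="2013/03/28-00:00:00"
METRIC="tsd.http.latency_50pct"
AGG="sum"
QUERY_URL="$TSD_URL/q?start=$START&end=$END&m=$AGG:$METRIC&ascii"
echo "$QUERY_URL"
# Fetch with, e.g.:  curl -s "$QUERY_URL"
# The ascii option returns plain "metric timestamp value tags" lines.
```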

This article has addressed the challenges developers face during the build and set up process, and explained how to leverage OpenTSDB's HTTP APIs to develop client programs so that users can create their own user interfaces for charts and graphs without depending on the standard features provided by OpenTSDB. We believe this approach improves the process of developing OpenTSDB and HBase-based applications by enhancing code reusability.

Kalpana C is a Technology Analyst with the ILCLOUD at Infosys Labs. She has a decade of experience in Java/J2EE and Big Data related frameworks and technologies.

Co-Author Priyadarshi Sahoo is a Technology Lead at Infosys Ltd. He has more than 8 years of experience in Java/J2EE related technologies.