My Life as a Sys Admin

Category Archives: logstash

Redis is an open-source, networked, in-memory, key-value data store. It’s being heavily used every where from Web stack to Monitoring to Message queues. Monitoring tools like Sensu already has some good scripts to Monitor Redis. Last Month during PyCon 2014@Plivo, opensourced a new rate limited queue called SHARQ which is based on Redis. So apart from just Monitoring checks, we decided to have a tsdb of what’s happening in our Redis Cluster. Since we are heavily using ELK stack to visualize our infrastructure, we decided to go ahead with the same.

CollectD Redis Plugin

There is a cool CollectD plugin for Redis. It pulls a verity of Data from Redis which includes, Memory used, Commands Processed, No. of Connected Clients and slaves, No. of blocked Clients, No. of Keys stored/db, uptime and challenges since last save. The installation is pretty simple and straight forward.

Now copy the redis python plugin and the conf file to collectd folder.

$ mkdir /etc/collectd/plugin # This is where we are going to place our custom plugins
$ cp /tmp/redis-collectd-plugin/redis_info.py /etc/collectd/plugin/
$ cp /tmp/redis-collectd-plugin/redis.conf /etc/collectd/

By default, the plugin folder in the redis.conf is defined as ‘/opt/collectd/lib/collectd/plugins/python’. Make sure to replace this with the location where we are copying the plugin file, in our case “/etc/collectd/plugin”. Now lets restart the collectd daemon to enable the redis plugin.

$ /etc/init.d/collectd stop
$ /etc/init.d/collectd start

In my previous Blog, i’ve mentioned how to enable and use the ColectD input plugin in Logstash and to use Kibana to plot the data coming from the collectd. Below are the Data’s that we are receiving from the CollectD on Logstash,

Being a DevOps guy, i always love metrics. Visualized metrics gives a good picture of what’s happening in our live battle stations. There are now a quite lot of Open Source tools for monitoring and visualizing. It’s more than a year since i’ve started using Logstash. It never turned me down. ElasticSearch-Logstash-Kibana (ELK) is a killer combination. Though i started Elasticsearch + Logstash as a log analyzer, later StatsD and Graphite took it to the next level. When we have a simple infrastructure it’s easy to monitor. But when the infra starts scaling, it becomes quite difficult to keep track of all the events happening inside each nodes. Though service checks can help, but there is still limitation for it. I faced a lot of scenarios where things breaks but service checks will be fine. Under such scenarios logs are the only hope. They have all these events captured.

At Plivo, we manage a variety of servers from SIP, Media, Proxy, WebServers, DB’s etc. Being a fully Cloud based system, i really wanted to have a system which can keep track of all the live events/status of what’s really happening inside our infra. So my plan was to collect two important stats, 1) Server’s events 2) Application events.

Collectd and Logstash

Collectd is a daemon which collects system performance statistics periodically. Since we have a lot Server’s which handle Realtime Media, it’s a very critical component for us. We need to ensure that the server’s are not getting overloaded and there is no latency in network. I’ve been using Logstash heavily for stashing all my logs. And there is a stable input plugin for collectd to send the all the system metrics to logstash.

First we need to enable the Network Plugin, and then we need to mention our Logstash server IP and port so that collectd can start injecting metrics. Below is a sample colectd configuration.

Now on the Logstash server, we need to add the CollectD plugin on to the input filter in the logstash’s config file.

input {
collectd {
port => "5555" # default port is 25826
}
}

Now we are set. Based the plugins enabled in the collectd config file, collctd will start sending the metrics to Logstash on the Interval mentioned in the config, default is 10s. So in my case, i wanted the Load, CPU usage, Memory usage, Bandiwdth (TX and RX) etc. There are default plugins for all these metrics, which we can just enable it in the config file. We also had some custom plugins to collect some custom metrics. BTW writing custom plugin is pretty easy in Collectd.

Now using the Logstash’s Elasticsearch output plugin, we can keep these metrics in Elasticsearch. Now this where Kibana comes in. We can start visualizing these metrics via Kibana. We need to create a custom Lucene Query. Once we have the query, we can create a custom histogram’s for each of these queries. Below aresome sample Lucene queries that we can use with Kibana.

For Load -> collectd_type:"load" AND host:"test.plivo.com"
For Network usage -> collectd_type:"if_octets" AND host:"test.plivo.com"

Below is the screenshot of histogram for Load and Network (TX and RX)

Log Events

Now next is to collect the events from the application logs. We use SIP protocol for all our VOIP sessions. So all our SIP server’s are very critical for us. SIP is pretty similar to HTTP. The response codes are very similar to HTTP responses, ie 1xx, 2xx, 3xx, 4xx, 5xx, 6xx. So i wrote some custom grok patterns so keep track of all of these responses and stores the same on the Elasticsearch.

The second stats which i was interested was our SIP registrar server. We provide SIP endpoints to our customers so that they can use the same with SIP/Soft phones. So i was more interested on stats like Number of registrations/sec, Auth error rates. Plus using ElasticSearch’s MAP facet’s i can create BetterMap. In my previous blog post’s i’ve mentioned on how to create these bettermaps using Kibana and Elasticsearch. Below bettermap screenshot shows us the SIP endpoint registrations from various locations in the last 2 hours.

Now using the Kibana we can start visualizing all these data’s. Below is a sample of Dashboard that i’ve created using Kibana.

ELK stack proved to be an amazing combination. We are currently injecting 3 million events every day and ElasticSearch was blazingly fast in indexing all theses.

Being in DevOps it’s always Multi tasking. From Regular customer queries it goes through Monitoring, Troubleshooting etc. And offcourse when things breaks, it really becomes core multi tasking. Especially when you have a really scaling infrastructure, we should really understand what’s really happening in our infrastructure. Yes we do have many new generation cloud monitoring tools like Sensu, but what if we have a near real time system that can tell us the each and every events happeing in our infrastructure. Logs are the best places where we can keep track of the events, even if the monitoring tools has missed it. We have a lot of log aggregator tools like tool Logstash, Splunk, Apache Kafka etc. And for log based event collection the common choice will be always Logstash -> StatsD -> Graphite. And ElasticSearch for indexing these.

My requirement was pretty straight. Record the events, aggregate them and keeps track of them in a timely manner. Kibana uses ElasticSearch facets for aggregating the search query results.Facets provide aggregated data based on a search query. So as a first task, i decided to visualize the location of user’s who are registering their SIP endpoints on our SIP registrar server. Kibana gives us a good interface for the 2D heat map as well as a new option called BetterMap. Bettermap uses geographic coordinates to create clusters of markers on map and shade them orange, yellow and green depending on the density of the cluster. So from the logs, i just extracted the register events, and used a custom regex patterns to extract the details like the Source IP, usernames etc using logstash. Using the logstash’s GeoIP filter, the Geo Locations of the IP can be identified. For the BetterMap, we need coordinates, in geojson format. GeoJSON is [longitude,latitude] in an array. From the Geo Locations that we have identified in the GeoIP filter, we can create this GeoJSON for each event that we are receiving. Below is a sample code that i’ve used in logstash.conf for creating the GeoJSON in Logstash.

The above filter will create a GeoJSON array “geoip.coordinates”. This array can be used for creating the BetterMap in Kibana. Below are the settings for creating a BetterMap panel in the Kibana dashboard. While adding a new panel, select “bettermap” as the panel type, and the co-ordinate filed should be the one which contains the GeoJSON data. Make sure that the data is of the format [longitude,latitude], ie Longitude first and then followed by latitude.

Moving ahead, i decided to collect the events happening on our various other server’s. We were one of the earliest companies who started using SIP (Session Initiation Protocol). SIP employs design elements similar to the HTTP request/response transaction model. So similar to web traffic, i’ve decided to collect events related to 4XX, 5XX and 6XX error responses, as it is very important to us. Once the logs are shipped to logstash, i wrote another custom grok pattern, which extracts the Error Code and Error responses, including the server which returned the same. These data’s can be used for future analysis also. So i decided to store these on ElasticSearch. So now we have the real time event data’s stored, but how to visualize. Since i dont have to perform much mathematical analytics with data, i decided to to remove graphite. Kibana has a wonder full GUI for visualizing the data. So decided to go ahead with Kibana. One option is “histogram” panel time. Using histogram we can visualize the data via a regular bar graph, as well as using the area graph. There is another panel type called “terms” which can be used to display the agrregated events via pie chart, bar chart, or a table. And below is what i achieved with Kibana.

This is just an inital setup. I’m going to add more events to this. As of now Kibana + Elasticsearch proves to be a promising combination for displaying all near real time events happening in my Infrastructure.

Being an OPS guy, i love Logs a lot. Logs contains lots of sensitive events recorded in it. Though a lot of people rely on monitoring tools, there are a lot of scenario where we still can’t rely on monitoring. In such scenarios, logs are the best sources to identify those events in a near real time fashion. A common scenario is Web Operations, where we need to count the the various 4xx, 5xx, Auth errors experienced by the user’s. I had a simliar requirement where i need to identify the 4xx, 5xx, 6xx errors and other similar failures on various SIP server’s. But apart from just visualising these error’s i also wanted a notification system which can notify me when the value crosses the threshold.

Logstash and StasD is a perfect combination for aggregating events from the logs. StatsD has a Graphite backend, where it sends the aggreagated metric values for visualizing. But when we have large number graphs, and offcourse when being a multi tasking Ops guy, it’s not possible to sit and watch all these graphs. So we need a notification system which alert’s us when things starts breaking. Here comes RIEMANN. Riemann aggregates events from your servers and applications with a powerful stream processing language. Riemann is pretty light weight, easy to configure monitoring framework. Logstash sents the filtered events from the logs to StatsD output plugin. Based on the flushInterval, statsD iterates through the received events sents the aggregated metric values to the Graphite. There is also a Riemann output plugin for Logstash, but we need to pass the state/metric to the plugin. In my case, logstash filters the event from the log, so i need to converts these events to time based metric values. Since statsD already has these events converted into time series metrics, i decided to write a small backend for statsD that can send these aggregated metrics to Riemann.

The StatsD backend basically requires to main functions, one is ”flush_stats” which will get invoked once the flush interval is reached. This function then iterates over the received metrics and passes these aggregated metrics to another function called ”post_stats”, which sends the metrics to the corresponding aplications. In our case, we need to send the metrics to Riemann. There is a Riemann-Node plugin, which we can utilize here for sending the metrics to Riemann server. Below is the content for the ”flush_stats” and ”post_stats” functions. Currently i’ve added support only for counters. Soo i’ll be adding support for Counters and Timers also.

So here i’m not gonna send the per second metrics. I’m using the default 10 sec flushInterval. So every seconds StatsD will send the incremented metrics to Riemann. The namespace, sender etc are defined in the logstash conf itself. The full plugin file is available in here

To use this Riemann backend, first we need to copy this file into the backend folder of the StatsD repo folder. Then we need to enable this plugin in the StatsD config file. Below is a sample config file which uses both graphite and Riemann backends.

So now StatsD will send out the incremented and the per second metric to Graphite and the Riemann backend will send the incremented metric to the Rieman server. No we can define the metric threshold and the notification method on the reimann config file. Below is my reimann metric threshold and notification.

So whenever the recived metric value is beyond 10, Riemann will notify the same to my Email. I’ve done some dry testing with this setup. So far this setup never turned me down. Though there are some tweaks to be done, but this setup really suited to my requirement. being an OPS guy, my primary focus is to detect the outages at a very early stages to minimize the impact. Hope this guy will be an added defence layer for the same.

For the last few days i was playing around with my two of my favourite tools Logstash and StatsD. Logstash, StatsD, Graphite together makes a killer combination. So i decided to test this combination along with Lumberjack for Real time Monitoring. I’m going to use, Lumberjack as the log shipper from the webserver, and then Logstash will stash the log’s porperly and and using the statsd output plugin i will ship the metrics to Graphite. In my previous blog, i’ve explained how to use Lumberjack with Logstash. Lumberjack will be watching my test web server’s access logs.

By default, i’m using the combined apache log format, but it doesnot have the original response time for each request as well as the total reponse time. So we need to modify the LogFormat, in order to add the two. Below is the LogFormat which i’m using for my test setup.

Once the LogFormat is modified, restart the apache service in order to make the change to be effective.

Setting up Logstash Server

First Download the latest Logstash Jar file from the Logstash site. Now we need to create a logstash conf file. By default there is a grok pattern available for apache log called “COMBINEDAPACHELOG”, but since we have added the tow new fields for the response time, we need to add the same for grok pattern also. So below is a pattern which is going to be used with Logstash.

Now we can access the dashboard using the url, “http://ip-address:8080&#8221;. Once we have started the carbon cache, we can start the Logstash server.

$ java -jar logstash-1.1.13-flatjar.jar agent -f logstash.conf -v

Once the logstash has loaded all the plugins successfully, we can start shipping logs from the test webserver using Lumberjack. Since i’ve enabled the STDOUT plugin, i can see the output coming from the Logstash server. Now we can start accessing the real time graph’s from graphite gui. There are several other alternative for the Graphite GUI like Graphene, Graphiti, Graphitus, GDash. Anyways Logstash-StatsD-Graphite proves to be a wonderfull combination. Sorry that i could not upload any screenshot for now, but i will upload soon

Logstash is one of the coolest projects that i always wanted to play around. Since i’m a sysadmin, i’m forced to handle multiple apps, which will logs in different formats. The most weird part is the timestamps, where most of the app uses it’s own time formats. Logstash helps us to solve such situations, we can remodify the time stamp to a standard time format, we can use the predefined filter’s for filtering out the log’s, even we can create our own filter’s using regex. All the documentations are available in the Logstash website Logstash mainly has 3 parts, 1) INPUT -> from which the log’s are shipped to Logstash, 2) Filter -> for filtering our incoming log’s to suit to our needs, 3) Output -> For storing or relaying the Filtered output log’s to various Applications.

Lumberjack is one such input plugin designed for logstash. Though the plugin is still in beta state, i decided to give it a try. By default we can also use logstash itself for shipping logs to centralized Logstash server, the JVM made it difficult to work with many of my constrained machines. Lumberjack claims to be a light weight log shipper which uses SSL and we can add custom fields for each line of log which we ships.

Setting up Logstash Server

Download the latest the logstash jar file from the logstash website. Now create a logstash configuration file for the logstash instance. In the config file, we have to enable the lumberjack plugin. Lumberjack uses SSL CA to verify the server. So we need to generate the same for the logstash server. We can use the below mentioned command to generate the SSL certificate and key.

Setting up Lumberjack agent

On the machine from which we are going to ship the log’s, clone the Lumberjack github repo.

$ git clone https://github.com/jordansissel/lumberjack.git

Install the fpm ruby gem, which is required to build the lumberjack package.

$ gem install fpm
$ cd lumberjack && make
$ make deb => This will build a debian package of the lumberjack
$ dpkg -i lumberjack_0.0.30_amd64.deb => The package will install all the files to the `/opt/lumberjack`

Now copy the SSL certificate which we have generated at the Logstash server, to the Lumberjack machine. Once the SSL certificte has been copied, we can start the lumberjack agent.

Now we will start getting the output from the Logstash in our screen, since we are using the ‘stdout’ output plugin. A very good detailed documentation about Lumberjack and Logstash can be found here, written by Brian Altenhofel. He had given a talk on this at Drupalcon 2013, Portland. The video for the talk is available here. It’s a very good blog post.

For the last few days i was playing around with Riak, a distributed database. It’s very simple to configure and use and offcourse it supports MapReduce. I wanted to try out the map reduce, and since logstash has a plugin to write data into riak, i decided to use it with logstash on an Ubuntu 12.04 machine.

Configuring Riak

Installing Riak is very simple, it has only a few dependencies.

“apt-get install libssl0.9.8 erlang”

Once the dependencies are being installed, we have to download and install the deb package of Riak from its website.

Once Riak is installed, go to “/etc/riak”, where the config files are available. We can change the name of the riak node by editing the “vm.args” file. By default Riak will listen to “127.0.0.1”, but we can change this by editing “app.config” file. In order to use enable https enable, we need to uncomment the https section in Riak core config. We also have to mention the path of the server key and certificate. Riak comes with a build in Admin console, which currently has very minimal functions. It shows the status of the riak nodes as well as the members in the riak ring. To enable this, open the “app.config” go to “riak_control_config” and change the “enabled,false” to “enabled,true”. The user name and password can be mentioned in the userlist option.

If we have multiple machine we can create a riak cluster using riak-admin tool. Currently i’ve only one machine with Riak installed.

In Riak, data’s are stored in “Buckets”. A Bucket is a container and keyspace for data stored in Riak, with a set of common properties for its contents (the number of replicas, or n_val, for instance). Buckets are accessed at the top of the URL hierarchy under “riak”, e.g. /riak/bucket.

Configuring Logstash

Now we have Riak machine, listening on port port “8098”. Now we need to configure logstash to sendthe data to the riak. This is very simple because logstash has an output plugin which can directly write to riak.

In the output section of logstash config file, add the riak output plugin. It should be like this,

” riak {

bucket => bucketname

type => typename

nodes => [“riakserverip”,”8098″,”riakserverip”,”8098″]

}”

In the nodes section we have to mention the riak node ip’s. Since i’ve only one riak node, i’m mentioning the same ip twice.

That’s it, now we need to start logstash, then logstash will start writing data into the bucket which we mentioned in the conf file.

There is one good GUI for Riak called “rekon“. Just get the source code from github and edit the “install.sh” and change the ip mentioned in to the ip which riak listens to and execute it. Now we can access the GUI using the below url

Using this we can see the buckets inside the riak and also the corresponding key values.

Now testing the “Map Reduce Function”

This is one of the main features of Riak. For example i’m going to write a map reduce function that will display all the keys in my bucket that has the keyword “mylinux”, which is the hostname of my machine. This function will return the key as well as the number of occurrences. Below is a simple MapReduce function.