An Elastic Stack Primer

This article describes my adventures while getting initiated into the Elastic Stack. We'll be building upon the elasticsearch index I've set up for search in an earlier post. We'll upgrade our stack to 5.x, incorporate logstash and kibana, so we can consume and visualize data– besides what our search interface allows; and we'll also hone and troubleshoot our understanding of the Elastic Stack along the way.

Getting started with the Elastic Stack can feel a bit overwhelming. You need to set up elasticsearch, Kibana, and logstash before you even can get to the fun parts – maybe even before you fully understand how the three synergize providing you with formidable and formerly untapped insights into your platform.

I have been running Pony Foo since late 2012, and, although I had set up Google Analytics and Clicky since the very beginning, these kinds of user tracking systems are far from ideal when it comes to troubleshooting. In the years since I launched the blog I tried a couple of instrumentation tools that would provide me with reporting and metrics from the Node.js application for my blog, but these solutions would require me to, well... instrument the Node apps by patching them with a snippet of code that would then communicate with a third party service. Visiting that service I could learn more about the current state of my production application. For a variety of reasons, – too expensive, makes my app too slow, etc. – I ended up ditching every one of these solutions not long after giving them a shot.

Not long ago I wrote about setting up search for a blog, and since I'm already working on the Kibana analytics dashboard team I figured it wouldn't hurt to learn more about logstash so that I could go full circle and finally have a look at some server metrics my way.

Logstash is, in short, a way to get data from one place to another. It helps stream events pulled out of files, HTTP requests, tweets, event logs, or dozens of other input sources. After processing events, Logstash can output them via Elasticsearch, disk files, email, HTTP requests, or dozens of other output plugins.

The above graph shows how Logstash can provide tremendous value through a relatively simple interface where we define inputs and outputs. You could easily use it to pull a Twitter firehose of political keywords about a presidential campaign into Elasticsearch for further analysis. Or to anticipate breaking news on Twitter as they occur.

Or maybe you have more worldly concerns, like streaming log events from nginx and node into elasticsearch for increased visibility through a Kibana dashboard. That's what we're going to do in this article.

Installing the Elastic Stack

I had used Elasticsearch 2.3 for my previous blog post. This represented a problem when it came to using the Elastic Stack, because I wanted to use the latest version of Kibana – 5.0.0-alpha3, at the time of this writing.

As mentioned in the earlier article, we'll have to download and install Java 8 to get Elasticsearch up and running properly. Below is the piece of code we used to install Java 8. Keep in mind this code was tested on a Debian Jessie environment, but it should mostly work in Ubuntu or similar systems.

Next, we'll install the entire pre-release Elastic Stack – formerly ELK Stack, for Elasticsearch, Logstash and Kibana. The following piece of code pulls all three packages from packages.elastic.co and installs them.

After installing all three, we should turn on their services so that they run at startup. This is particularly desirable in production systems. Debian Jessie relies on systemd for services, so that's what we'll use to enable these services.

You want to reduce your "hands on" 👏 production experience as much as possible. Automation is king, etc. Otherwise, why go through the trouble?

Lastly, we start all of the services and log their current status. We pause for ten seconds after booting Elasticsearch in order to give it time to start listening for connections, so that Logstash doesn't run into any trouble.

It's important to note that we'll be expecting the access.log file to have a specific format that Logstash understands, so that we can consume it and split it into discrete fields. To configure nginx with the pattern we'll call metrics, add this to the http section of your nginx.conf file.

Next, in the server section of your site.conf for the site you want to be parsing logs for, just tell nginx to use the metrics logging format we've defined earlier.

access_log /var/log/nginx/ponyfoo.com.access.log metrics;

After exposing the Kibana web app on my nginx server and pointing a DNS CNAME entry for analytics.ponyfoo.com(currently password protected) to my load balancer, I was good to go.

From nginx to Elasticsearch via Logstash

Logstash relies on inputs, filters applied on those inputs, and outputs. We configure all of this in a logstash.conf config file that we can then feed logstash using the logstash -f $FILE command.

Getting Logstash to read nginx access logs is easy. You specify an event type that's later going to be used when indexing events into Elasticsearch, and the path to the file where your logs are recorded. Logstash takes care of tailing the file and streaming those logs to whatever outputs you indicate.

When it comes to configuring Logstash to pipe its output into Elasticsearch, the configuration couldn't be any simpler.

output {
elasticsearch {}
}

An important aspect of using Logstash is using filters to determine the fields and field types you'll use when indexing log events into Elasticsearch. The following snippet of code tells Logstash to look for grok patterns in /opt/logstash/patterns, and to match the message field to the NGINX_ACCESS grok pattern, which is defined in a patterns file that lives in the provided directory.

The contents of the file containing the grok patterns are outlined below, and they should be placed in a file in /opt/logstash/patterns, as specified in the patterns_dir directive above. Logstash comes with quite a few built-in patterns and the general way of consuming a pattern is %{PATTERN:name} where some patterns such as NUMBER also take a type argument, such as %{NUMBER:bytes:int}, in the piece of code below.

Each match in the pattern above means a field will be created for that bit of output, with its corresponding name and type as indicated in the pattern file.

Going back to the filter section of our logstash.conf file, we'll also want a date filter that parses the timestamp field into a proper date. a geoip filter so that the client_ip field gets the geolocation treatment.

You can run logstash -f $FILE to test the configuration file and ensure your pattern works. There's also a website where you can test your grok patterns online.

If you're running Logstash as a service, or intend to, you'll need to place the logstash.conf file in /etc/logstash/conf.d– or change the default configuration directory.

My preference when configuring globally-installed tools such as Logstash is to keep configuration files in a centralized location and then create symbolic links in the places where the tool is looking for those files. In the case of Logstash there's two types of files we'll be using: grok patterns and Logstash configuration files. I'll be keeping them in $HOME/app/logstash and creating links to the places where Logstash looks for each of these kinds of files.

By the way, I had an issue where nginx logs wouldn't be piped into Elasticsearch, and Logstash would fail silently, but only when run as a service. It turned out to be a file permissions issue, one that was easily fixed by ensuring that the logstash user had execute permissions on every directory from / to /var/log/nginx.

You can verify that by executing the following command, which interpolates getfacl / all the way through to the /var/log/nginx directory.

You'll note that in this case, user logstash of group logstash wouldn't have execute rights on /var/log/nginx. This can be easily fixed with the following command, but it gave me quite the headache until I stumbled upon this particular oddity.

sudo chmod o+x /var/log/nginx

Naturally, you'll also need to grant read access to the relevant log files.

I didn't just want to stream nginx logs to Elasticsearch, I also wanted node application level logs to be here. Let's do this!

Throwing node Logs into the Mix

To add Node logs into the mix, we don't need to change the output section of our Logstash file. It has all we needed. We do need to change the input and filter sections. First, we'll add another file to the input section. This should point to the file where your Node.js app is streaming its logs into. In my case that file was /var/log/ponyfoo-production.log, which is where the service running my node app redirected its output.

Now, the node_app log is a bit different. Sometimes, there are multi-line log entries. That happens when I log a stack trace. That's easily fixed by adding a multiline codec to my input processor, where each log entry must start with a NODE_TIME pattern-matching string, or otherwise is merged into the current log entry. Effectively, this means that lines that don't start with a timestamp will be treated as part of the last event, until a new line that matches a timestamp comes along, creating a new entry.

In order for the filter section to play nice with two different input files under different formats, we'll have to update it using conditionals. The following piece of code is the same filter section we had earlier, wrapped in an if that checks whether we are dealing with an event of type nginx_access.

For node_app events, we'll do something quite similar, except there's no geolocation going on, and our main message pattern is named NODE_APP instead of NGINX_ACCESS. Also, make sure that the filter matches the new document type, node_type.

If you restart logstash, – either using the service or by executing the logstash -f logstash.conf command again – it should start streaming logs into Elasticsearch, from both nginx and node, as soon as the Logstash service comes online.

How do we interact with those logs in a friendly manner then? While it's okay for search to be mostly accessible through the web interface of the blog, it'd be pretty pointless to go through all of this trouble just to dump the logs into a table again on the web front-end.

That's where Kibana comes in, helping us visualize the data.

Getting Started with Kibana

Having already installed the Elastic Stack, and after having set up Logstash to channel our logs into Elasticsearch, we're now ready to fire up Kibana and gain some insights into all our data.

The default port Kibana listens on is 5601. I use nginx as a reverse proxy in front of Kibana. You can use proxy_pass to forward requests to the back-end Kibana node app. Here's the full server entry for nginx I use for ponyfoo.com.

Now you should be able to visit your Kibana dashboard at either http://localhost:5601 or your designated port and hostname.

open http://localhost:5601

The first time you load Kibana, you will be greeted with a message about configuring a default index pattern. Kibana suggests we try the logstash-* index pattern. Logstash defaults to storing events on indices named according to the day the event is stored on, such as logstash-2016-06-17, and so on. The * in the pattern acts as a wildcard so that every index starting with logstash- will match.

Once we've set up the index pattern, we might want to visit the Discover tab. Here, we'll be able to pull up Elasticsearch records as data rows, as well as a simple time-based chart displaying the amount of events registered over time. You can use the Elasticsearch query DSL to perform any queries and you can also apply filters on top of that query.

Data tables are nowhere near the most exciting kind of thing you can do in Kibana, though. Let's take a look at a few different charts I've set up, and while we're at it I'll explain the terminology used by the Kibana interface as well as a few statistics terms in layman's language. Articles have been written before that describe in detail everything you can do with Kibana visualizations, but I'll favor a real-world example-based approach.

Let's head to the Visualize tab for our first visualization.

Charting Status Codes and Request Volume

Create a pie chart and select logstash-* for the index pattern. You'll note that the default pie chart visualization is a whole pie containing every matching document and displaying the total count. Fair enough.

Add a Split Slices bucket with a Terms aggregation and choose the status field – then hit Enter. That sentence was overcharged with technical terms. Let's break it down. We are splitting the pie into several slices, using the Terms aggregation type to indicate that we want to identify distinct terms in the status field, such as 200, 302, 404, and grouping documents by that aggregation. We'll end up with an slice for documents with a status code of 200, another slice for documents with a 302 status code, and so on.

Generally speaking, a healthy application should be returning mostly 2xx and 3xx responses. This chart makes it easy to visualize whether that's the case.

You can further specialize the graph by adding a sub-bucket, aggregating again using Terms, this time by the request.raw field. That way, you'll not only get a general ratio of status codes, but you'll also gain insight into which requests are the most popular, as well as which ones are the most problematic, all in one chart.

In the screenshot found below, you can see that the /articles/feed route is getting even more traffic than the home page.

Maybe a little bit of caching is in order, since the RSS feeds don't change all that often. In the same light, earlier on these pie charts helped me identify and fix a bunch of requests for icons and images that were yielding 404 errors. That's something that would've been harder to spot by looking at the dense and raw nginx logs, but it's really easy to take action based on visualizations that help us work with the data.

Save the pie chart using the Save link on the top bar and give it a name such as "Response Stats".

Plotting Responses Over Time, By Status Code

Go back to the Visualize tab and create a new line chart. You'll see that we only get a single dot graphed, because there's no x axis by default. Let's add an X-Axis, using a Date Histogram of @timestamp. This is a fancy way of saying we want to group our data points into 30 minute long buckets. All events within a 30-minute window are grouped – aggregated – into a single bucket displaying the Count (that's the y axis). A Date Histogram makes it possible to render a large dataset into a line chart without having to render several thousands of individual data points. The Date Histogram aggregation allows you to define the time lapse you want to use to group events, or you can use the default Auto value of 30 minutes, as with the example below.

While it's useful to render requests over time, particularly when it comes to determining whether our server is functioning properly under load, experiencing a sudden surge of requests, or not responding to a single request, it's even more useful to understand the underlying sub-segments of data. What are the status codes for each of those responses over time? Was the spike that we saw in the earlier chart the product of 500 errors or just a spike in traffic?

Add a Split Lines aggregation by Terms using the status field. After hitting Enter (or the Play button), we'll note that the chart is now divided into several different lines, each line representing all responses that ended with a given status code. Now it'll be much easier to spot error spikes and discern them from spikes in traffic.

Save this chart as "Response Times", and let's create something else.

Charting Geographic Data

It's really easy to chart geographic data into a map using Kibana, provided that you've set up the geoip filter in Logstash as indicated earlier, so that the IP addresses found in nginx logs are transformed into geolocation data that can be charted into a map. Given that, it's just a matter of going to the Kibana Visualize tab, creating a Tile Map, and adding a Geo Coordinates bucket using the field computed by Logstash.

Save the map as "Request Locations".

Stitching It All Together

Now that we've created three different visualizations, where we have a rough overview of status codes and the hottest endpoints, time-based plots of each of the status codes, and a map showing where requests are coming from, it'd be nice to put all of that in a single dashboard.

Visit the Dashboard tab in Kibana, pick Add from the top-bar navigation and add all of our saved charts. After some rearranging of the charts, you'll get a dashboard like the one below. You can save this one as well, naming it something like "Overview". Whenever you want to get a glimpse into your operations now you can simply visit the Overview dashboard and you'll be able to collect insights from it.

I've also included an "average response time" over time chart in my dashboard, which wasn't discussed earlier. You can plot that on a line chart with a y axis of a Date Histogram with minute or second resolution, and an x axis using an Average aggregation of request_time.

Securing access to your data and Kibana instance would be a most welcome and wise addition. We can do that using security features in X-Pack. Other X-Pack features include cluster health monitoring, alerting on changes in your data, reporting, and graph exploration.

I've set it up myself for my blog, and I must say that X-Pack is pretty awesome. Check out my deployment scripts to learn more about setting it up. In this article, I've decided against providing a more detailed description of the installation process and features in X-Pack. If you want to give it a shot, we offer a month-long trial where you can take all of these extra awesome features for a ride.