How To Monitor Your Ubuntu 16.04 System with Sysdig

Introduction

Sysdig is a comprehensive open-source system activity monitoring, capture and analysis application. It features a powerful filtering language with customizable output, and core functionality that can be extended with Lua scripts called chisels.

The application works by tapping into the kernel, which allows it to see every system call and all of the information passing through the kernel. This also makes it an excellent tool for monitoring and analyzing system activity and events generated by application containers running on a system.

The core Sysdig application monitors the server it is installed on. However, the company behind the project offers a hosted version called Sysdig Cloud that can monitor any number of servers remotely.

The standalone application is available on most Linux distributions, but it's also available, with more limited functionality, on Windows and macOS. Aside from the sysdig command line tool, Sysdig also comes with an interactive UI called csysdig with similar options.

In this tutorial, you'll install and use Sysdig to monitor an Ubuntu 16.04 server. You'll stream live events, save events to files, filter results, and explore the csysdig interactive UI.

Prerequisites

Step 1 – Installing Sysdig Using the Official Script

There's a Sysdig package in the Ubuntu repository, but it's usually a revision or two behind the current version. At the time of publication, for example, installing Sysdig using Ubuntu's package manager will get you Sysdig 0.8.0. However, you can install it using an official script from the project's development page, which is the recommended method of installation. This is the method we'll use.

But first, update the package database to ensure you have the latest list of available packages:

sudo apt-get update

Now download Sysdig's installation script with curl using the following command:

This downloads the installation script to the file install-sysdig in the current folder. You'll need to execute this script with elevated privileges, and it's dangerous to run scripts you download from the Internet. Before you execute the script, audit its contents by opening it in a text editor or by using the less command to display it on the screen:

less ./install-sysdig

Once you're comfortable with the commands the script will run, execute the script with the following command:

cat ./install-sysdig | sudo bash
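
One way to make sure the script you execute is the same one you reviewed is to record its checksum after reading it and compare it again just before running it. The following is a minimal sketch rather than part of Sysdig's tooling: the function names file_digest and digest_matches are hypothetical, and it assumes the sha256sum utility from GNU coreutils, which Ubuntu installs by default.

```shell
# Print the SHA-256 digest of a file (the first field of sha256sum's output).
file_digest() {
    sha256sum "$1" | cut -d ' ' -f 1
}

# Succeed only if the file's current digest equals the digest recorded earlier.
digest_matches() {
    [ "$(file_digest "$1")" = "$2" ]
}
```

For example, after reviewing the script you could store its digest with reviewed=$(file_digest ./install-sysdig), and later run the installer only if digest_matches ./install-sysdig "$reviewed" succeeds.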

The command will install all dependencies, including kernel headers and modules. The output of the installation will be similar to the following:

Now that you've got Sysdig installed, let's look at some ways to use it.

Step 2 – Monitoring Your System in Real-Time

In this section, you'll use the sysdig command to look at some events on your Ubuntu 16.04 server. The sysdig command requires root privileges to run, and it takes any number of options and filters. The simplest way to run the command is without any arguments. This streams every captured system event to your terminal as it happens:

sudo sysdig

But, as you'll see as soon as you run the command, it can be difficult to analyze the data being written to the screen because it streams continuously, and there are lots of events happening on your server. Stop sysdig by pressing CTRL+C.

Before we run the command again with some options, let's get familiar with the output by looking at a sample output from the command:

evt.cpu is the CPU number where the event was captured. In the above output, the evt.cpu is 0, which is the server's first CPU.

proc.name is the name of the process that generated the event.

thread.tid is the TID that generated the event, which corresponds to the PID for single-threaded processes.

evt.dir is the event direction. You'll see > for enter events and < for exit events.

evt.type is the name of the event, e.g. 'open', 'read', 'write', etc.

evt.info is the list of event arguments. In case of system calls, these tend to correspond to the system call arguments, but that’s not always the case: some system call arguments are excluded for simplicity or performance reasons.
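
To make the field layout concrete, here is a small sketch that splits one line of sysdig output into the fields described above. It assumes sysdig's default output format, in which each line starts with an event number and a timestamp before the CPU number. The function name parse_sysdig_line is hypothetical, and the sketch assumes the process name contains no spaces.

```shell
# Split one line of default sysdig output into named fields.
# Assumed layout: number time cpu proc.name (thread.tid) direction type info...
parse_sysdig_line() {
    printf '%s\n' "$1" | {
        # read assigns one word per variable; everything left over goes to info
        read -r num time cpu proc tid dir type info
        tid=${tid#\(}   # strip the leading parenthesis around the TID
        tid=${tid%\)}   # strip the trailing parenthesis
        printf 'cpu=%s proc=%s tid=%s dir=%s type=%s info=%s\n' \
            "$cpu" "$proc" "$tid" "$dir" "$type" "$info"
    }
}
```

Passing it a made-up line such as '253 23:52:45.345 0 nano (2300) > write fd=1 size=20' labels 0 as the CPU, nano as the process, 2300 as the TID, > as the direction, write as the event type, and the remainder as the event arguments.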

There's hardly any value in running sysdig like you did in the previous command because there's so much information streaming in. But you can apply options and filters to the command using this syntax:

sudo sysdig [option] [filter]

You can view the complete list of available filters using:

sysdig -l

There's an extensive list of filters spanning several classes, or categories. Here are some of the classes:

Since it's not practical to cover every filter in this tutorial, let's just try a couple, starting with the syslog.severity.str filter in the syslog class, which lets you view messages sent to syslog at a specific severity level. This command shows messages sent to syslog at the "information" level:

sudo sysdig syslog.severity.str=info

Note: Depending on the level of activity on your server, you might not see any output after typing this command, or it might take a long time before any appears. To force some output, open another terminal emulator and perform an action that generates a syslog message. For example, perform a package update, upgrade the system, or install any package.

Kill the command by pressing CTRL+C.

The output, which should be fairly easy to interpret, should look something like this:

You can also filter on a single process. For example, to look for events from nano, execute this command:

sudo sysdig proc.name=nano

Since this command filters on nano, you won't see any output until you use the nano text editor to open a file. Open another terminal emulator, connect to your server, and use nano to open a text file. Write a few characters and save the file. Then return to your original terminal.

Watching system events in real time is not always the best way to use sysdig. Luckily, there's another way: capturing events to a file for analysis at a later time. Let's look at how.

Step 3 – Capturing System Activity to a File Using Sysdig

Capturing system events to a file using sysdig lets you analyze those events at a later time. To save system events to a file, pass sysdig the -w option and specify a target file name, like this:

sudo sysdig -w sysdig-trace-file.scap

Sysdig will keep saving generated events to the target file until you press CTRL+C. With time, that file can grow quite large. With the -n option, however, you can specify how many events you want Sysdig to capture. After the target number of events has been captured, it will exit. For example, to save 300 events to a file, type:

sudo sysdig -n 300 -w sysdig-file.scap

Though you can use Sysdig to capture a specified number of events to a file, a better approach would be to use the -C option to break up a capture into smaller files of a specific size. And to not overwhelm the local storage, you can instruct Sysdig to keep only a few of the saved files. In other words, Sysdig supports capturing events to logs with file rotation, in one command.

For example, to save events continuously to files that are no more than 1 MB in size, and only keep the last five files (that's what the -W option does), execute this command:

sudo sysdig -C 1 -W 5 -w sysdig-trace.scap
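
The point of combining -C and -W is that the capture has a predictable upper bound on disk usage: roughly the per-file size multiplied by the number of files kept. The following sketch only does that arithmetic. The function name rotation_budget_mb is hypothetical, and the result is an approximation rather than a guarantee made by Sysdig.

```shell
# Estimate the worst-case disk usage, in MB, of a rotating capture.
rotation_budget_mb() {
    file_size_mb=$1   # the value you pass to -C
    file_count=$2     # the value you pass to -W
    echo $((file_size_mb * file_count))
}
```

With the values from the command above, rotation_budget_mb 1 5 prints 5, so that capture should stay at about 5 MB no matter how long it runs.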

List the files using ls -l sysdig-trace* and you'll see output similar to this, with five log files:

As with real-time capture, you can apply filters to saved events. For example, to save 200 events generated by the process nano, type this command:

sudo sysdig -n 200 -w sysdig-trace-nano.scap proc.name=nano

Then, in another terminal connected to your server, open a file with nano and generate some events by typing text or saving the file. The events will be captured to sysdig-trace-nano.scap until sysdig records 200 events.

How would you go about capturing all write events generated on your server? You would apply the filter like this:

sudo sysdig -w sysdig-write-events.scap evt.type=write

Press CTRL+C after a few moments to exit.

You can do a whole lot more when saving system activity to a file using sysdig, but these examples should have given you a pretty good idea of how to go about it. Let's look at how to analyze these files.

Step 4 – Reading and Analyzing Event Data with Sysdig

Reading captured data from a file with Sysdig is as simple as passing the -r switch to the sysdig command, like this:

sudo sysdig -r sysdig-trace-file.scap

That will dump the entire content of the file to the screen, which is not really the best approach, especially if the file is large. Luckily, you can apply the same filters when reading the file that you applied to it while it was being written.
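
When a trace is too large to read line by line, a quick tally of event types can help you decide which filter to apply. The sketch below counts events per type from sysdig's text output. It assumes the default output format, where the event type is the seventh whitespace-separated field, and process names without spaces. The function name count_event_types is hypothetical.

```shell
# Count events per type from default sysdig output read on standard input.
# Field 7 is the event type: number, time, cpu, process, (tid), direction, type.
count_event_types() {
    awk '{ count[$7]++ } END { for (t in count) print count[t], t }' | sort -rn
}
```

You could pipe a saved trace through it, for example with sysdig -r sysdig-trace-file.scap | count_event_types, to list the most frequent event types first.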

For example, to read the sysdig-trace-nano.scap trace file you created, but only look at a specific type of event, like write events, type this command:

sysdig -r sysdig-trace-nano.scap evt.type=write

Now consider the sysdig-write-events.scap file you saved in the previous section. If you captured it while connected over SSH, many of its lines will contain a connection string such as 11.11.11.11:51282->22.22.22.22:ssh. Those are events coming from the external IP address of the client, 11.11.11.11, to the IP address of the server, 22.22.22.22. These events occurred over an SSH connection to the server, so they are expected. But are there other SSH write events that are not from this known client IP address? It's easy to find out.

There are many comparison operators you can use with Sysdig. The first one you saw is =. Others are !=, >, >=, <, and <=. In the following command, fd.rip filters on remote IP address. We'll use the != comparison operator to look for events that are from IP addresses other than 11.11.11.11:

sysdig -r sysdig-write-events.scap fd.rip!=11.11.11.11

In this example, the partial output that follows shows write events from IP addresses other than the client IP address:

Further investigation also showed that the rogue IP address 33.33.33.33 belonged to a machine in China. That's something to worry about! That's just one example of how you can use Sysdig to keep a watchful eye on traffic hitting your server.
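
Sysdig filters can also be combined with the boolean operators and, or, and not, which lets you express a check like this one in a single expression. The sketch below joins individual conditions with and. The helper name join_filters is hypothetical, and it only builds the filter string; it does not run sysdig.

```shell
# Join any number of filter conditions into one expression using "and".
join_filters() {
    filter=""
    for condition in "$@"; do
        if [ -z "$filter" ]; then
            filter="$condition"
        else
            filter="$filter and $condition"
        fi
    done
    printf '%s\n' "$filter"
}
```

For instance, join_filters 'evt.type=write' 'fd.rip!=11.11.11.11' produces a single expression that you can pass, quoted, as the filter argument when reading a trace file with sysdig -r.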

Let's look at using some additional scripts to analyze the event stream.

Step 5 – Using Sysdig Chisels for System Monitoring and Analysis

In Sysdig parlance, chisels are Lua scripts that analyze the Sysdig event stream to perform useful actions. Close to 50 chisels ship with every Sysdig installation, and you can view a list of the chisels available on your system using this command:

sysdig -cl

spy_file: Echo any read or write made by any process to all files. Optionally, you can provide the name of a file to only intercept reads or writes to that file.

httptop: Show the top HTTP requests

For a more detailed description of a chisel, including any associated arguments, use the -i flag, followed by the name of the chisel. So, for example, to view more information about the netstat chisel, type:

sysdig -i netstat

Now that you know how to use the netstat chisel, tap into its power to monitor your system by running:

sudo sysdig -c netstat

If you see ESTABLISHED SSH connections from an IP address other than yours in the Client Address column, that should be a red flag, and you should probe deeper.

A far more interesting chisel is spy_users, which lets you view interactive user activity on the system.

Execute this command:

sudo sysdig -c spy_users

Then, open a second terminal and connect to your server. Execute some commands in that second terminal, then return to the terminal running sysdig. The commands you typed in the second terminal will be echoed in the terminal where you executed the sysdig -c spy_users command.

Next, let's explore Csysdig, a graphical tool.

Step 6 – Using Csysdig for System Monitoring and Analysis

Csysdig is the other utility that comes with Sysdig. It has an interactive user interface that offers the same features available on the command line with sysdig. It's like top, htop and strace, but more feature-rich.

Like the sysdig command, the csysdig command can perform live monitoring and can capture events to a file for later analysis. But csysdig gives you a more useful real time view of system data refreshed every two seconds. To see an example of that, type the following command:

sudo csysdig

That will open an interface like the one in the following figure, which shows event data generated by all users and applications on the monitored host.

At the bottom of the interface are several buttons you can use to access the different aspects of the program. Most notable is the Views button, which is akin to categories of metrics collected by csysdig. There are 29 views available out of the box, including Processes, System Calls, Threads, Containers, Processes CPU, Page Faults, Files, and Directories.

When you start csysdig without arguments, you'll see live events from the Processes view. By clicking on the Views button, or pressing the F2 key, you'll see the list of available views, including a description of the columns. You may also view a description of the columns by pressing the F7 key or by clicking the Legend button. And a summary man page of the application itself (csysdig) is accessible by pressing the F1 key or clicking on the Help button.

The following image shows a listing of the application's Views interface.

Note: For every button, there's a corresponding keyboard shortcut, or hotkey, shown to the left of the button. Pressing a shortcut key twice will take you back to the previous window. Pressing the ESC key will achieve the same result.

Though you can run csysdig without any options and arguments, the command's syntax, as with sysdig's, usually takes this form:

sudo csysdig [option]... [filter]

The most common option is -d, which is used to modify the delay between updates in milliseconds. For example, to view csysdig output updated every 10 seconds, instead of the default of 2 seconds, type:

sudo csysdig -d 10000

You can exclude the user and group information from views with the -E option:

sudo csysdig -E

This can make csysdig start up faster, but the speed gain is negligible in most situations.

To instruct csysdig to stop capturing after a certain number of events, use the -n option. The application will exit after that number has been reached. The number of captured events has to be in the five figures; otherwise you won't even see the csysdig UI:

sudo csysdig -n 100000

To analyze a trace file, pass csysdig the -r option, like so:

sudo csysdig -r sysdig-trace-file.scap

You can use the same filters you used with sysdig to restrict csysdig's output. For example, rather than viewing event data generated by all users on the system, you can filter by user. The following command shows only event data generated by the root user:

sudo csysdig user.name=root

The output should be similar to the one shown in the following image, although the output will reflect what's running on your server:

To view the output for an executable generating an event, pass the filter the name of the binary without the path. The following example will show only events generated by the nano command, so the views will reflect only activity from running nano processes:

sudo csysdig proc.name=nano

There are several dozen filters available, which you can view with the following command:

sudo csysdig -l

You'll notice that this is the same option you used to view the filters available with the sysdig command. So sysdig and csysdig are just about the same. The main difference is that csysdig comes with a mouse-friendly interactive UI. To exit csysdig at any time, press the Q key on your keyboard.

Conclusion

Sysdig helps you monitor and troubleshoot your server. It gives you deep insight into all the system activity on a monitored host, including activity generated by application containers. While this tutorial didn't cover containers specifically, the ability to monitor system activity generated by containers is what sets Sysdig apart from similar applications. More information is available on the project's home page.

Sysdig's chisels are a powerful extension of the core Sysdig functionality. They're written in Lua, so you can always customize them or write one from scratch. To learn more about crafting chisels, visit the project's official chisel page.