Sankey Visualization with Vega in Kibana 6.2

Continuing the series on building custom Vega graphs in Kibana, today's topic is a simple two-level Sankey graph to show network traffic patterns. (Last time, we discussed custom Vega visualization in Kibana.) A Sankey diagram is a type of flow diagram in which the width of each line is proportional to the flow quantity. Each entry in the sample data has a source and a destination country code. The graph will have two modes: all-to-all (the default), and a filtered mode in which users select either a source or a destination country and see only the related traffic.

Prerequisites

Use the makelogs utility to generate sample data. Install it with npm install -g makelogs, then run it with makelogs. You may want to generate a bigger dataset with the -c 100k parameter. This assumes you already have npm installed. Don't do this on a production cluster!

Click on the Management tab (last icon on the left), create a new index pattern: enter logstash-* for the index pattern, click Next step, choose @timestamp for the time filter, click Create index pattern.

Test that everything works by opening the Discover tab, setting the time filter in the upper right corner to the last 1 hour, and observing the randomly generated data.

Data

For this example, I will use the geo.src (as the first stack) and geo.dest (as the second stack) fields from the random data generated by the makelogs utility. The fields represent the traffic's source and destination. The query aggregates by both fields using the Elasticsearch composite aggregation, counts the number of documents for each combination, and returns the first 10.
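As a rough sketch, the request body described above could look like the following, written here as a JavaScript object. The aggregation name table and the source names stk1 and stk2 are labels chosen for illustration; only the two field names, the composite aggregation, and the size of 10 come from the text.

```javascript
// Sketch of the Elasticsearch request body described above.
// "table", "stk1" and "stk2" are illustrative names, not required by Elasticsearch.
function buildTrafficQuery(pageSize = 10) {
  return {
    size: 0, // no individual documents, only the aggregation result
    aggs: {
      table: {
        composite: {
          size: pageSize, // return the first N source/destination combinations
          sources: [
            { stk1: { terms: { field: "geo.src" } } },
            { stk2: { terms: { field: "geo.dest" } } },
          ],
        },
      },
    },
  };
}

console.log(JSON.stringify(buildTrafficQuery(), null, 2));
```

With these names, each returned bucket carries a key object holding one stk1 and one stk2 value, plus a doc_count for that combination.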

Visualization Strategy

We can think of the Sankey graph as having nodes organized into stacks, and edges connecting the nodes. For this graph we only have two stacks: the source and the destination. The two stacks are drawn as vertical bars, one on each side of the screen. Each stack has countries stacked one on top of the other. Each country consists of nodes - portions of the source country that are connected with the nodes in the destination country.

We need to have several data tables to draw the above graph:

nodes This table will not be used for drawing directly, but it will be used as the data source for the other tables. We can visualize nodes as black boxes in the picture above - we have 12 of them here. Each node consists of the document count it represents, the stack ID, the country code, and position within the stack (y0 to y1). Note that this table doesn't have any "screen" coordinates, just the source data. Even stack position is expressed in terms of document counts. First node would go 0..n1-1, second would be n1..n2-1, etc. Also, since we have two stacks, there would be two of the first, second, … nodes, one for each stack.
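To make the nodes table more concrete, here is a minimal sketch of the same idea in plain JavaScript rather than Vega transforms. The sample rows are made up, the field names otherId and size are my own, and the sort order is an assumption; each node gets a half-open range starting at y0 and ending at y1, expressed in document counts.

```javascript
// Sample rows shaped like the flattened aggregation result (values are made up).
const sampleRows = [
  { stk1: "CN", stk2: "US", size: 30 },
  { stk1: "CN", stk2: "IN", size: 10 },
  { stk1: "US", stk2: "IN", size: 20 },
];

// Plain string comparison, used for sorting.
const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

function buildNodes(rows) {
  const nodes = [];
  for (const row of rows) {
    const key = row.stk1 + row.stk2; // identifies the source/destination pair
    // One node per side: the source node and the destination node.
    nodes.push({ key, stack: "stk1", grpId: row.stk1, otherId: row.stk2, size: row.size });
    nodes.push({ key, stack: "stk2", grpId: row.stk2, otherId: row.stk1, size: row.size });
  }
  // Keep nodes of the same country adjacent within each stack (assumed ordering).
  nodes.sort((a, b) =>
    cmp(a.stack, b.stack) || cmp(a.grpId, b.grpId) || cmp(a.otherId, b.otherId));
  // Stack: a running total of document counts, restarted for each stack.
  const offset = {};
  for (const node of nodes) {
    node.y0 = offset[node.stack] || 0;
    node.y1 = node.y0 + node.size;
    offset[node.stack] = node.y1;
  }
  return nodes;
}

console.table(buildNodes(sampleRows));
```

Three input rows produce six nodes, three per stack, and both stacks end at the same total document count.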

edges A list of lines (6 in this case), each connecting a node on the left side with a node on the right side. Each line needs a pair of (x, y) coordinates, a line thickness (strokeWidth), and a line color (same as the source node). This table is generated by taking all stack=="stk1" rows from the nodes table, and looking up the corresponding node for the destination stack. A linkpath transform generates the SVG path string describing the curved line.
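The lookup step for edges can be sketched in plain JavaScript as follows. This is only an illustration in data coordinates: the sample nodes are made up, the output field names (colorKey, sourceY, targetY, thickness) are hypothetical, and the curved path itself is left to Vega's linkpath transform.

```javascript
// Made-up nodes, already stacked (y0/y1 are in document counts).
const sampleNodes = [
  { key: "CNIN", stack: "stk1", grpId: "CN", size: 10, y0: 0, y1: 10 },
  { key: "CNUS", stack: "stk1", grpId: "CN", size: 30, y0: 10, y1: 40 },
  { key: "USIN", stack: "stk1", grpId: "US", size: 20, y0: 40, y1: 60 },
  { key: "CNIN", stack: "stk2", grpId: "IN", size: 10, y0: 0, y1: 10 },
  { key: "USIN", stack: "stk2", grpId: "IN", size: 20, y0: 10, y1: 30 },
  { key: "CNUS", stack: "stk2", grpId: "US", size: 30, y0: 30, y1: 60 },
];

function buildEdges(nodes) {
  // Index destination nodes by key so each source node can find its partner.
  const targets = new Map(
    nodes.filter((n) => n.stack === "stk2").map((n) => [n.key, n]));
  return nodes
    .filter((n) => n.stack === "stk1")
    .map((source) => {
      const target = targets.get(source.key);
      return {
        key: source.key,
        colorKey: source.grpId, // the edge takes the color of its source country
        sourceY: (source.y0 + source.y1) / 2, // vertical center of the source node
        targetY: (target.y0 + target.y1) / 2, // vertical center of the target node
        thickness: source.size, // later scaled to pixels for strokeWidth
      };
    });
}

console.table(buildEdges(sampleNodes));
```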

groups Each group combines all of the nodes for the same country for the same stack. In this graph we have 6 groups, 3 on each side. Each group needs to have a stack ID (stk1 or stk2), the starting and the ending Y value (y0 and y1, similar to the nodes table), and the country code. This table also gets generated from the nodes table, grouping it by stack ID and country code, and stacking it similar to nodes. The y values will align with the nodes table because we sort both tables on the same values.
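A comparable sketch for the groups table, again in plain JavaScript with made-up nodes (this small sample yields 4 groups rather than the 6 in the graph). The field name total is my own; the point is that grouping by stack and country, then stacking in the same sort order, yields ranges that line up with the nodes.

```javascript
// Made-up nodes; only the fields needed for grouping are included.
const sampleNodes = [
  { stack: "stk1", grpId: "CN", size: 10 },
  { stack: "stk1", grpId: "CN", size: 30 },
  { stack: "stk1", grpId: "US", size: 20 },
  { stack: "stk2", grpId: "IN", size: 10 },
  { stack: "stk2", grpId: "IN", size: 20 },
  { stack: "stk2", grpId: "US", size: 30 },
];

// Plain string comparison, used for sorting.
const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

function buildGroups(nodes) {
  // Sum document counts per (stack, country) pair.
  const totals = new Map();
  for (const n of nodes) {
    const id = n.stack + "/" + n.grpId;
    const g = totals.get(id) || { stack: n.stack, grpId: n.grpId, total: 0 };
    g.total += n.size;
    totals.set(id, g);
  }
  // Same ordering as the nodes, so the y ranges align.
  const groups = [...totals.values()].sort(
    (a, b) => cmp(a.stack, b.stack) || cmp(a.grpId, b.grpId));
  // Stack the groups within each stack.
  const offset = {};
  for (const g of groups) {
    g.y0 = offset[g.stack] || 0;
    g.y1 = g.y0 + g.total;
    offset[g.stack] = g.y1;
  }
  return groups;
}

console.table(buildGroups(sampleNodes));
```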

In addition to the tables, I use three scales to project from the "data coordinate space" into the "screen coordinate space". The "x" scale determines the location of the stk1 and stk2 stacks on the screen. The "y" scale maps graph height to the total height of all nodes in the tallest stack (in our case both stacks have the same height). Lastly, the "color" scale assigns a color to each country.
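The role of the three scales can be illustrated with ordinary functions. This is a simplified stand-in, not how Vega implements scales: the x placement (left edge and right edge), the palette, and the function name makeScales are all assumptions made for the example.

```javascript
// Simplified stand-ins for the "x", "y" and "color" scales.
function makeScales(nodes, width, height, palette) {
  const stacks = ["stk1", "stk2"];
  // Height of the tallest stack, in document counts.
  const maxY = Math.max(...nodes.map((n) => n.y1));
  // Stable, sorted list of countries so each one keeps its color.
  const countries = [...new Set(nodes.map((n) => n.grpId))].sort();
  return {
    x: (stack) => stacks.indexOf(stack) * width, // stk1 at 0, stk2 at the right edge
    y: (value) => (value / maxY) * height, // document counts to pixels
    color: (country) => palette[countries.indexOf(country) % palette.length],
  };
}

// Made-up nodes and palette.
const demoNodes = [
  { grpId: "CN", y1: 40 },
  { grpId: "US", y1: 60 },
  { grpId: "IN", y1: 30 },
];
const scales = makeScales(demoNodes, 400, 300, ["red", "green", "blue"]);
console.log(scales.x("stk2"), scales.y(30), scales.color("IN"));
```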

The last step is to convert the data tables to visuals using three marks: the path mark for edges, the rect mark for the stacks, and the text mark for country labels.

Interactions

At this point you will see the expected graph, but often you may want to define how users can interact with the graph. This is done with signals - dynamic variables that change their value based on events, and can be used in various expressions.

When a user clicks on a country group in the source stack, the graph needs to hide all data that is not from the clicked country. Clicking a country on the destination side works the same way. This is done by defining a groupSelector signal. When groupMark is clicked, the signal is set to the country code as either the source or the destination. The nodes table is automatically filtered based on the signal value, and the graph is redrawn as if no other data exists. To go back, I show a rectangle in the middle with a text mark. Clicking the rectangle resets the signal, thus restoring the nodes table to the full list.
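The filtering behavior can be sketched as a plain function. I am assuming here that the selection is represented as an object with either a stk1 or a stk2 property and that a falsy value means nothing is selected; the sample rows are made up, and the filter is applied to the source/destination rows before they are split into nodes.

```javascript
// Made-up rows: one per source/destination combination.
const rows = [
  { stk1: "CN", stk2: "US", size: 30 },
  { stk1: "CN", stk2: "IN", size: 10 },
  { stk1: "US", stk2: "IN", size: 20 },
];

// groupSelector is assumed to be null, { stk1: code } or { stk2: code }.
function filterRows(allRows, groupSelector) {
  if (!groupSelector) return allRows; // nothing selected: all-to-all mode
  return allRows.filter((row) =>
    groupSelector.stk1 === row.stk1 || groupSelector.stk2 === row.stk2);
}

console.log(filterRows(rows, { stk1: "CN" })); // traffic from CN only
console.log(filterRows(rows, { stk2: "IN" })); // traffic to IN only
console.log(filterRows(rows, null)); // reset: the full list
```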

Another interaction is "mouseover" on countries in the stacks. On the groupMark mouseover event, the groupHover signal is set in the same way as groupSelector. The signal is used to control the transparency of the edge lines.
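As an illustration of how a hover value might drive transparency, here is a small sketch. The opacity values 0.9 and 0.3, the function name, and the shape of the hover object (same as the selector: an stk1 or stk2 property) are assumptions for the example.

```javascript
// Returns the opacity for one edge given the current hover state.
// highlighted/dimmed defaults are illustrative values only.
function edgeOpacity(edge, groupHover, highlighted = 0.9, dimmed = 0.3) {
  if (!groupHover) return dimmed; // nothing hovered
  const related =
    groupHover.stk1 === edge.stk1 || groupHover.stk2 === edge.stk2;
  return related ? highlighted : dimmed;
}

const edge = { stk1: "CN", stk2: "US" }; // made-up edge
console.log(edgeOpacity(edge, { stk1: "CN" })); // hovering its source country
console.log(edgeOpacity(edge, { stk2: "IN" })); // hovering an unrelated country
```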

Debugging and Exploring

Understanding Vega code can be challenging at times. Luckily, you can access Vega internal state at any moment using browser debugging tools. In Firefox and Chrome, use Control+Shift+I (Windows, Linux), or Command+Option+I (Mac). Use the console to see table content, e.g. console.table(VEGA_DEBUG.view.data('edges')), or the state of a signal with VEGA_DEBUG.view.signal('height'). You can even view scale internal values with VEGA_DEBUG.view._runtime.scales.y.
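If you find yourself inspecting several tables and signals at once, a small helper can collect them in one go. This is a sketch: snapshotVegaState is a name I made up, and it relies only on the view.data(name) and view.signal(name) calls shown above.

```javascript
// Collects the named data tables and signals from a Vega view into one object.
function snapshotVegaState(view, tableNames, signalNames) {
  const snapshot = { data: {}, signals: {} };
  for (const name of tableNames) snapshot.data[name] = view.data(name);
  for (const name of signalNames) snapshot.signals[name] = view.signal(name);
  return snapshot;
}

// In the browser console, with the visualization open:
//   const s = snapshotVegaState(VEGA_DEBUG.view,
//     ["nodes", "edges", "groups"], ["groupSelector", "groupHover"]);
//   console.table(s.data.edges);
```

Paste the function into the console first, then run the commented lines; the table and signal names must match those defined in your own Vega spec.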

Vega code

See code comments for an explanation of what each line does, or read the Vega documentation. Note that this code uses HJSON, a more readable form of JSON.