Taking KSQL for a Spin Using Real-time Device Data

Evaluating KSQL has been high on my to-do list ever since it was released back in August. I wanted to experiment with it using an interesting, high-velocity, real-time data stream that would allow me to analyse events at the millisecond level, rather than seconds or minutes. Finding such a data source that is free of charge, and not the de facto Twitter stream, is tricky. So, after some pondering, I decided that I'd use my Thrustmaster T300RS Steering Wheel/Pedal Set gaming device as a data source.

The idea was that the data would be fed into Kafka, processed in real time using KSQL and visualised in Grafana.

This is the end to end pipeline that I created...

...and this is the resulting real-time dashboard running alongside a driving game and a log of the messages being sent by the device.

This article will explain how the above real-time dashboard was built using only KSQL...and a custom Kafka producer.

I'd like to point out that although the device I'm using for testing is unconventional, when considered in the wider context of IoT, autonomous driving, smart automotives or any device for that matter, it is clear that the low latency and high throughput of Apache Kafka, coupled with Confluent's KSQL, can be a powerful combination.

I'd also like to point out that this article is not about driving techniques, driving games or telemetry analysis. However, seeing as the data source I'm using is intrinsically tied to those subjects, the concepts will be discussed to add context. I hope you like motorsports!

Writing a Kafka Producer for a T300RS

The T300RS is attached to my Windows PC via a USB cable, so the first challenge was to try and figure out how I could get steering, braking and accelerator inputs pushed to Kafka. Unsurprisingly, a source connector for a "T300RS Steering Wheel and Pedal Set" was not listed on the Kafka Connect web page - a custom producer was the only option.

To access the data being generated by the T300RS, I had 2 options: I could either use an existing Telemetry API from one of my racing games, or I could access it directly using the Windows DirectX API. I didn't want to need a game running in the background in order to generate data, so I decided to go down the DirectX route. This way, the data is raw and available, with or without an actual game engine running.

The producer was written using the SharpDX .NET wrapper and Confluent's .NET Kafka Client. The SharpDX DirectInput API allows you to poll an attached input device (mouse, keyboard, game controllers etc.) and read its buffered data. The buffered data returned within each polling loop is serialized into JSON and sent to Kafka using the .NET Kafka Client library.

A single message is sent to a topic in Kafka called raw_axis_inputs every time the state of one of the device's axes changes. The device has several axes, but in this article I am only interested in the Wheel, Accelerator, Brake and the X button.

{
"event_id":4300415, // Event ID unique over all axis state changes
"timestamp":1508607521324, // The time of the event
"axis":"Y", // The axis this event belongs to
"value":32873.0 // the current value of the axis
}

This is what a single message looks like. In the above message the Brake axis state was changed, i.e. it moved to a new position with value 32873.
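For readers who want to inspect these messages outside KSQL, here is a minimal Python sketch of decoding one event. It assumes the payload on the wire is plain JSON without the explanatory comments shown above. The AxisEvent class and parse_event function are illustrative names of my own, not part of the producer.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AxisEvent:
    event_id: int   # unique over all axis state changes
    timestamp: int  # time of the event, epoch milliseconds
    axis: str       # axis the event belongs to, e.g. "Y" for the brake
    value: float    # current raw value of the axis


def parse_event(payload: str) -> AxisEvent:
    """Decode one JSON message from the raw_axis_inputs topic."""
    doc = json.loads(payload)
    return AxisEvent(
        event_id=int(doc["event_id"]),
        timestamp=int(doc["timestamp"]),
        axis=str(doc["axis"]),
        value=float(doc["value"]),
    )


# The sample message from the article, with the comments removed.
sample = '{"event_id":4300415,"timestamp":1508607521324,"axis":"Y","value":32873.0}'
event = parse_event(sample)
print(event.axis, event.value)
```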

You can see below which inputs map to each reported axis from the device.

You can tell by looking at the timestamps that it's possible to have multiple events generated within the same millisecond. Unfortunately, I was unable to get microsecond precision from the device. When axes "X", "Y" and "RotationZ" are being moved quickly at the same time (a bit like a child driving one of those coin-operated car rides you find at the seaside), the device generates approximately 500 events per second.

Creating a Source Stream

Now that we have data streaming to Kafka from the device, it's time to fire up KSQL and start analysing it. The first thing we need to do is create a source stream. The saying "Every River Starts with a Single Drop" is quite fitting here, especially in the context of stream processing. The raw_axis_inputs topic is our "Single Drop" and we need to create a KSQL stream based on top of it.

With the stream created we can now query it. I'm using the default auto.offset.reset = latest as I have the luxury of being able to blip the accelerator whenever I want to generate new data, a satisfying feeling indeed.

Derived Streams

We now have our source stream created and can start creating some derived streams from it. The first derived stream we are going to create filters out one type of event. When the X button is pressed it emits a value of 128, and when it's released it emits a value of 0.

To simplify this input, I'm filtering out the release event. We'll see what the X button is used for later in the article.
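The filter itself is a simple predicate. As a rough illustration in Python rather than KSQL, and assuming events are dictionaries with the same axis and value fields as the JSON message (the X button is reported as axis "Buttons5"), it could look like this. The drop_button_release name is hypothetical.

```python
from typing import Dict, Iterable, Iterator

Event = Dict[str, object]


def drop_button_release(events: Iterable[Event]) -> Iterator[Event]:
    """Pass every event through except the X button release (value 0)."""
    for e in events:
        if e["axis"] == "Buttons5" and e["value"] == 0:
            continue  # release event: not needed downstream
        yield e


# Made-up sample events for illustration only.
raw = [
    {"axis": "Buttons5", "value": 128.0},  # press
    {"axis": "Buttons5", "value": 0.0},    # release, dropped
    {"axis": "X", "value": 0.0},           # wheel fully left, kept
]
kept = list(drop_button_release(raw))
print(len(kept))
```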

From this stream we are going to create 3 further streams: one for the brake, one for the accelerator and one for the wheel.

All 3 axes emit values in the range 0-65535. The wheel emits a value of 0 when rotated fully left, 65535 when rotated fully right and 32767 when dead centre. The wheel itself is configured to rotate 900 degrees lock-to-lock, so it would be nice to report its last state change in degrees, rather than as a value from a predetermined integer range. For this we can create a new stream that includes only messages where the axis = 'X', with the axis values translated into the range -450 degrees to 450 degrees. With this translation, maximum rotation left equates to 450 degrees, maximum rotation right equates to -450 degrees and 0 is dead centre.
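The arithmetic behind this translation is a linear rescale. Here is a small Python sketch of it, derived only from the figures quoted above (0-65535 raw range, 900 degrees lock-to-lock, fully left = 450 degrees). It is not the KSQL expression used in the stream, and the function name is my own.

```python
AXIS_MAX = 65535.0            # top of the raw axis range
LOCK_TO_LOCK_DEGREES = 900.0  # configured wheel rotation


def wheel_degrees(raw: float) -> float:
    """Map a raw wheel value (0 = fully left, 65535 = fully right)
    to degrees (450 = fully left, -450 = fully right)."""
    return LOCK_TO_LOCK_DEGREES / 2 - (raw / AXIS_MAX) * LOCK_TO_LOCK_DEGREES


print(wheel_degrees(0))      # fully left
print(wheel_degrees(65535))  # fully right
print(wheel_degrees(32767))  # centre, very close to zero
```

Because the raw range has an even number of steps, the centre value of 32767 lands a few thousandths of a degree off zero rather than exactly on it.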

We now need to create 2 more derived streams to handle the accelerator and the brake pedals. This time, we want to translate the values to the range 0-100. When a pedal is fully depressed it should report a value of 100 and when fully released, a value of 0.
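The pedal translation is the same kind of rescale. The article does not say which end of the raw range corresponds to a released pedal, so the sketch below makes that an explicit parameter. The default (released = 65535) is my assumption, and pedal_percent is a hypothetical name.

```python
AXIS_MAX = 65535.0  # top of the raw axis range


def pedal_percent(raw: float, released_at_max: bool = True) -> float:
    """Map a raw pedal value to 0 (released) .. 100 (fully depressed).

    released_at_max=True assumes the device reports 65535 when the pedal
    is released and 0 when fully depressed; pass False for the opposite.
    """
    fraction = raw / AXIS_MAX
    if released_at_max:
        fraction = 1.0 - fraction
    return fraction * 100.0


print(pedal_percent(0))      # fully depressed under the default assumption
print(pedal_percent(65535))  # released under the default assumption
```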

Smooth is Fast

It is a general rule of thumb in motorsports that "Smooth is Fast", the theory being that the fewer steering, accelerator and braking inputs you make while still keeping the car on the desired racing line, the faster the lap time. We can use KSQL to count the number of inputs for each axis over a Hopping Window to try and capture overall smoothness. To do this, we create our first KSQL table.

A KSQL table is basically a view over an existing stream or another table. When a table is created from a stream, it needs to contain an aggregate function and group by clause. It's these aggregates that make a table stateful, with the underpinning stream updating the table's current view in the background. If you create a table based on another table you do not need to specify an aggregate function or group by clause.

The table we created above specifies that data is aggregated over a Hopping Window. The size of the window is 5 seconds and it will advance or hop every 1 second. This means that at any one time, there will be 5 open windows, with new data being directed to each window based on the key and the record's timestamp.
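To make the window arithmetic concrete, here is a Python sketch that lists the windows a single record falls into. It assumes window start times are aligned to multiples of the advance interval since the epoch, which is how I understand Kafka Streams assigns hopping windows. The function name is illustrative.

```python
from typing import List, Tuple


def hopping_windows(timestamp_ms: int,
                    size_ms: int = 5000,
                    advance_ms: int = 1000) -> List[Tuple[int, int]]:
    """Return the [start, end) windows that contain timestamp_ms."""
    # Earliest aligned window start that can still contain the timestamp.
    start = max(0, timestamp_ms - size_ms + advance_ms) // advance_ms * advance_ms
    windows = []
    while start <= timestamp_ms:
        windows.append((start, start + size_ms))
        start += advance_ms
    return windows


# Timestamp taken from the sample message earlier in the article.
wins = hopping_windows(1508607521324)
for w in wins:
    print(w)
```

With a 5 second size and a 1 second advance, every record lands in 5 overlapping windows, which is why each event is counted 5 times across the open windows.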

You can see below when we query the stream, that we have 5 open windows per axis, with each window 1 second apart.

This is the resulting graph in Grafana, with the axes stacked on top of each other to give a visual representation of the total number of events overall and the total per axis. The idea here is that if you can drive a lap with fewer overall inputs or events, then the lap time should be faster.

Calculating Lap Times

To calculate lap times, I needed a way of capturing the time difference between 2 separate events in a stream. Remember that the raw data is coming directly from the device and has no concept of a lap; lap data is handled by a game engine.
I needed a way to inject an event into the stream when I crossed the start/finish line of any given race track. To achieve this, I modified the custom producer to increment a counter every time the X button was pressed and added a new field to the JSON message called lap_number.

I then needed to recreate my source stream and my initial derived stream to include this new field.

The next step is to calculate the time difference between each "Buttons5" event (the X button). This required 2 new tables. The first table below captures the latest values using the MAX() function from the axis_inputs stream where the axis = 'Buttons5'.

This new stream is again based on the axis_inputs stream where the axis = 'Buttons5'. We are joining it to our lap_marker_data table which results in a stream where every row includes the current and previous values at the point in time when the X button was pressed.

A quick query should illustrate this (I've manually added column headings to make it easier to read).

We can now see the time difference, in milliseconds (LAP_TIME_MS), between each press of the X button. This data can now be displayed in Grafana.
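Stripped of the stream and table machinery, the lap time is just the gap between consecutive X button presses. Here is a minimal Python sketch of that calculation. The timestamps are made-up values for illustration and lap_times_ms is a hypothetical name.

```python
from typing import Iterable, List


def lap_times_ms(press_timestamps_ms: Iterable[int]) -> List[int]:
    """Return the time between consecutive X button presses."""
    laps = []
    previous = None
    for ts in press_timestamps_ms:
        if previous is not None:
            laps.append(ts - previous)  # current press minus previous press
        previous = ts
    return laps


# Three presses give two completed laps.
presses = [1000, 61000, 123500]
print(lap_times_ms(presses))
```

The first press only starts the clock, so N presses always yield N - 1 lap times.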

The data is also being displayed along the top of the dashboard, aligned above the other graphs, as a ticker to help visualize lap boundaries across all axes.

Anomaly Detection

A common use case when performing real-time stream analytics is Anomaly Detection, the act of detecting unexpected events, or outliers, in a stream of incoming data. Let's see what we can do with KSQL in this regard.

Driving Like a Lunatic?

As mentioned previously, Smooth is Fast, so it would be nice to be able to detect some form of erratic driving. When a car oversteers, the rear end of the car starts to rotate around a corner faster than you'd like, and quick steering inputs are required to counteract this motion. On a smooth lap you will only need a small part of the total range of the steering wheel to safely navigate all corners. When you start oversteering, you will need to make quick but wider use of the total range of the wheel to keep the car on the track and prevent crashing.

To try and detect oversteer we need to create another KSQL table, this time based on the steering_inputs stream. This table counts steering events across a very short hopping window. Events are counted only if the rotation exceeds 180 degrees (sharp left rotation) or is less than -180 degrees (sharp right rotation).
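The counting rule can be sketched in Python as follows. The article does not state the window length, so the window bounds here are caller-supplied assumptions. The sample events are invented and count_oversteer is a hypothetical name.

```python
from typing import Iterable, Tuple


def count_oversteer(events: Iterable[Tuple[int, float]],
                    window_start_ms: int,
                    window_end_ms: int,
                    threshold_degrees: float = 180.0) -> int:
    """Count steering events inside [window_start_ms, window_end_ms)
    whose rotation is beyond the threshold in either direction."""
    return sum(
        1
        for ts, degrees in events
        if window_start_ms <= ts < window_end_ms
        and abs(degrees) > threshold_degrees
    )


# (timestamp_ms, degrees) pairs, made up for illustration.
steering = [(100, 10.0), (200, 190.5), (300, -200.0), (400, 180.0), (1200, 250.0)]
print(count_oversteer(steering, 0, 1000))
```

An event at exactly 180 degrees is not counted, matching the "exceeds" wording above.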

This data is plotted on the Y axis (we're talking graphs now) on the "Steering inputs" panel in Grafana. The oversteer metric can be seen in red and will spike when steering input exceeds 180 degrees in either direction.

Braking too Hard?

Another anomaly I'd like to detect is when maximum brake pressure is applied for too long. Much like the brake pedal in a real car, the brake pedal I'm using has a very progressive feel, and a fair amount of force from your foot is required to hit maximum pressure. If you do hit maximum pressure, it shouldn't be for long, as you will most likely lock the wheels and skid off the race track. Very embarrassing indeed.

The first thing to do is to create a table that will store the last time maximum brake pressure was applied. This table is based on the brake_inputs stream and filters where the value = 100.

Something worth mentioning is that if I hold my foot on the brake pedal at the maximum pressure for any period of time, only one event is found in the stream. This is because the device only streams data when the state of an axis changes. If I keep my foot still, no new events will appear in the stream. I'll deal with this in a minute.

Next we'll create a new stream based on the brake_inputs stream and join it to our max_brake_power_time table.

For each row in this stream we now have access to all columns in the brake_inputs stream plus a timestamp telling us when max brake power was last reached. With this data we create a new derived column, bi.timestamp - mb.last_max_brake_ts AS time_since_max_brake_released, which gives a running calculation of the difference between the current record timestamp and the last time maximum brake pressure was applied.

For example, when we query the stream we can see that maximum pressure was applied at timestamp 1508772739115 with a value of 100.0. It's the row immediately after this one that we are interested in: 99.90234225 | 1508772740803 | 1508772739115 | 1688.

Remember that while an axis is held at the same value, 100.0 in this case, no more events will appear in the stream until the value changes again. This is why we are interested in the row immediately following the maximum value: it tells us how long the value of 100.0 was applied for. In this case it was held for 1688 milliseconds. Notice that on subsequent rows the time difference keeps increasing, but we are not interested in those rows. In order to isolate what we want, we need another table. This new table takes our previously created stream, brake_inputs_with_max_brake_power_time, and groups it by the last_max_brake_ts column. For each grouping we then get the MIN(time_since_max_brake_released).
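The join-then-minimum logic can be followed step by step in a short Python sketch. It assumes each brake event is compared against the last maximum recorded before that event, so the 100.0 row itself is matched to the previous maximum. That ordering is my assumption about how the join behaves. The first two sample rows use the figures quoted above, the third is invented, and the function name is hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


def max_brake_hold_times(events: Iterable[Tuple[float, int]]) -> Dict[int, int]:
    """events: (brake_percent, timestamp_ms) pairs in arrival order.

    Returns {last_max_brake_ts: shortest time since that maximum}, i.e.
    how long each application of maximum pressure was held.
    """
    last_max_ts: Optional[int] = None
    hold_ms: Dict[int, int] = {}
    for value, ts in events:
        if last_max_ts is not None:
            elapsed = ts - last_max_ts  # time_since_max_brake_released
            if last_max_ts not in hold_ms or elapsed < hold_ms[last_max_ts]:
                hold_ms[last_max_ts] = elapsed  # MIN() per group
        if value == 100.0:
            last_max_ts = ts  # update the "table" after the join
    return hold_ms


brake_events = [
    (100.0, 1508772739115),        # maximum pressure reached
    (99.90234225, 1508772740803),  # first change after the maximum
    (97.5, 1508772740900),         # invented later row, ignored by MIN
]
print(max_brake_hold_times(brake_events))
```

Only the first event after each maximum contributes the minimum, so later rows with larger elapsed times drop out, as described above.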

When we query this table, while stepping hard on the brake pedal for a few seconds at a time, we get the information we want. We can see the timestamp for when maximum brake pressure was reached and for how long it was sustained.

Here's what the above data looks like when visualised in Grafana. The bottom graph shows when maximum brake pressure was hit and for how long it was sustained. I've set a threshold of 1 second against the graph so any extreme braking is clearly identifiable - if you're that hard on the brakes for that long, you're probably going to end up in the scenery.

The Tale of 2 Laps

After putting it all together, it's time to take to the track and see how it looks. This video shows 2 complete laps onboard with the Caterham Seven 620R around Brands Hatch in the UK. The first lap is a relatively smooth one and the second is quite ragged. Notice that the first lap (lap 68) is quicker overall than the second (lap 69). On lap 69, I start to drive more aggressively and oversteer spikes start to appear in the steering input graph. Lap 69 also has significantly more events overall than lap 68 as a result of my more exuberant (slower) driving style. You'll also notice that maximum brake pressure is reached a couple of times on each lap, but for no longer than the threshold of 1 second on each occurrence.

Summary

KSQL is awesome! Although it's only a developer preview at this point, it's impressive what you can get done with it. As it evolves over time and mirrors more of the functionality of the underlying Streams API it will become even more powerful, lowering the barrier to entry for real-time stream processing further and further. Take a look at the road map to see what may be coming next.

Oh, and I recently discovered on the #KSQL community Slack group that you can execute KSQL in Embedded Mode right inside your Java code, allowing you to mix the native Streams API with KSQL - very nice indeed!
