Patterns of Event Stream Processing is a series of articles surveying concepts that underpin streaming systems. In this post, we’ll study a technique called session windowing that’s used for grouping related data over a period of time. We’re going to look at what they are, what they’re used for, and why they’re hard to build from scratch. We’ll finish with our suggested design pattern for building session windows in a scalable manner.

A primer on windowing

Windows are an abstraction for breaking up a (potentially) unbounded stream of events into distinct sections, where the events that fall into each section share some common characteristic. It’s helpful to examine subsections of a stream in isolation since most kinds of analysis only require a subset of the entire stream.

Windowing is a descriptive term that is analagous to being inside a room where the walls are opaque, and only the transparent windows provide views of the world. Your view from the window is clearly a subset of everything going on outside, thus allow you to intentionally restrict your view.

There are different kinds of windows that bucket events in a different ways. For example, “fixed” windows bucket data into fixed size blocks of time. “Sliding” windows bucket data by time, too, but set up their buckets in an overlapping manner to create a “rolling” effect, which is useful for understanding change over time. “Global” windows simply put all data into one single bucket.

Each kind of window has criteria for what falls into an instance of that window. For most kinds, the criteria is statically defined and intuitive to understand. Sessions, on the other handy, are relatively tricky because the criteria changes dynamically in response to what events the window has already seen.

Session window basics

Informally, a session is a group of logically related activity that happens “in one sitting”. Reacting to concentrated levels of user activity is extremely desirable, as business savvy companies can easily analyze and leverage that activity. Consider a motion sensor in a shopping mall that emits data whenever movement is detected. A team might utilize sessions to consolidate periods of motion to understand customer behavior.

For our running example in this article, we’ll check in on our fictional friends at Aubergine Games, makers of the hit indie game Vegetable Towers. Months after launching, Vegetable Towers’ numbers were surging, and the business team saw an opportunity to capitalize on their freemium model. The plan to boost revenue in Q4 was drawn up: an in-game store for users to exchange real currency for weapon and armor upgrades. The in-game store was a text-book sales play. Players browse a catalog of items and put them into their shopping cart. When the player’s ready to purchase to purchase, they enter their payment information and complete the transaction.

The store launched, and the business team excitedly crunched their numbers to see if their campaign was driving up revenue. After about 2 weeks, they concluded that the store was bringing in 70% as much capital as they’d hoped. Not great, but not bad. To bridge their goal, the business team needed actionable information about what was happening when players made their way into and through the store. In other words, they needed a better understanding of in-game store sessions.

The engineering team got to work and began by reading some data out of their event log to see what data they could leverage. The lead engineer pulled up activity for user AllOfR, a fanatical Australian player. They determined that a few days ago she had logged into the store, placed an item in her cart, then logged out - closing the session but never completing the transaction and making a purchase. The business team would probably be intrigued if they could view a metric in real time of the number of users putting items in their carts and then deciding not to purchase. They were halfway to buying - maybe the players would click “buy” if Vegetable Towers could react fast enough and make an enticing offer?

Formally speaking

A session is formally defined by a grouping attribute (sometimes called the session key), the time of a “starting” event, the time of an “ending” event, and the set of all events between the opening and closing events. AllOfR’s session with the store can be broken down as follows:

Grouping attribute

Events taken by AllOfR

Time of first event

Timestamp of login

Time of last event

Timestamp of logout

Event set

{login,put-item-in-cart,logout}

It’s easy to see why these related events all belong to AllOfR’s session, since they’re clearly bounded by the hinting names of “login” and “logout”. They’re also all actions taken by AllOfR, not another player. The grouping attribute provides isolation across sessions. Any actions not taken by AllOfR don’t belong to any of her sessions.

If there were always delimiting events that clearly marked boundaries and they were received in order, most problems involving sessions would be easy. There are two major hitches that make determining the starting and ending bounds of a session much harder in contemporary systems.

Mobile computing blurs the line between “logged in” and “logged out”. The vast majority of interactions with a mobile app never end in a logout event - you simply stop using the device.

In many domains, when an ending boundary and a starting boundary are close enough, they really count as a single logical session and ought to be treated that way for business purposes. For example, if AllOfR enters the store, then exits for 5 seconds by accident, but then re-enters the store, one could argue that AllOfR has entered the store twice, and that’s two sessions. 5 seconds is a really short time to step out (maybe her internet momentarily disconnected). For most analysis purposes, we’d treat this as a single session.

When we can’t rely on the presence of semantic beginning and ending events, sessions must involve another element: a timeout gap. A timeout gap is the minimum duration between any two events that cause them to fall into distinct sessions. The gap serves as measure of idle tolerance to logically split events.

Timeout gaps divide activity periods

Let’s revisit our example with AllOfR and assume that her mobile device isn’t going to reliably transmit logout events when she’s in the store, so instead we’ll timestamp every event. Our goal will be to construct sessions with a 10 minute timeout gap. We’ll start with the follow sequence of events:

event

timestamp

put-item-in-cart

July 5th, 2017 2:30 PM

checkout

July 5th, 2017 2:34 PM

In the absence of all other events, these two actions from AllOfR occur 4 minutes apart, combining them into a session with bounds 2:30 and 2:34. Below you’ll find an interactive example to learn more about how session windows work. The left side is an editor with an array of JSON maps, where each map represents an event. On the right is a zoomable timeline displaying the sessions for those events. Each blue block in the timeline represents one session. The table below indicates the presence of a “checkout” event in each session. Try adding new events, changing the event types, and adjusting the timestamps to see what happens.

Loading interactive content...

The team continued to scour their logs and discovered 3 more events for AllOfR:

event

timestamp

put-item-in-cart

July 5th, 2017 3:03 PM

remove-item-from-cart

July 5th, 2017 3:07 PM

preview-item

July 5th, 2017 3:16 PM

The first event, put-item-in-cart, occurs at 3:03 on the same day. When we view all sessions for AllOfR, we find that we have a session with bounds 2:30 - 2:34 that matches the day. 3:03 is more than 10 minutes after the closing boundary, 2:34, so the put-item-in-cart event constitutes a new session. The bounds of the new session are 3:03 - 3:03, the exact same timestamp.

The second event, remove-item-from-cart, rolls in with a timestamp at 3:07. Looking at our sessions again, we find two of them for July 5th. We compute the acceptable range for all sessions, which is +- 10 minutes (remember we’re using a 10 minute timeout gap), to absorb the new event. Notably, any events within the timeout gap on either side of a session are eligible for capture. Adding 10 minutes to either side makes the ranges 2:20 - 2:44 and 2:53 - 3:13 for each session. Clearly, the timestamp for this event, 3:07, falls into the second session. We now have two sessions with bounds 2:30 - 2:34 and 3:03 - 3:07.

Let’s consider the third event, preview-item. We perform the same steps, finding the relevant sessions for AllOverR and computing their timeout ranges. This time, the sessions can capture events within ranges 2:20 - 2:44 and 2:53 - 3:17. The event, timestamped at 3:16, slots into the second session again, making the new session boundaries 2:30 - 2:34 and 3:03 - 3:16.

That was a lot to take in. We’ve preloaded the data for the story in the next interactive example to match our discussion. Note that there hasn’t yet been a checkout in the second session, which is why it’s marked in red. Try flipping one of the event types to checkout. Try changing the user ID of a few of the events to something else, and watch how the timeline splits off a new session key.

Loading interactive content...

Sessions are a fascinating construct because of their dynamism. The boundaries grow in either direction in reaction to learning new information at runtime. This elastic nature comes at a cost when we increase the problem complexity to account for real software networks that are less friendly to event ordering.

In the world of Vegetable Towers, our fictitious company grew their capacity to understand consumer activity inside the store. With sessions, analysts could now track when each player entered the store, placed an item in their cart, but wandered away without making a purchase. Perhaps the player nearly bought the items in their cart, but ultimately decided it was just a bit too expensive. If sessions are computed with realtime streaming, the business team can play another angle, offering players a coupon to discount their order by 15% to entice a purchase on the spot.

Out of order events makes the problem much harder

When we look at where the complexity lies with respect to computing sessions over event streams, the tricky area is clearly the fact that session window boundaries grow in response to new information. What makes this especially difficult is the fact that event streams frequently deliver their messages in an out of order manner. Sometimes this is intentional - we might parallelize an event stream to increase the rate that consumers can read from it. Other times it’s incidental due to network or geographical reasons. AllOfR might be sitting in a car that her friend is driving, playing Vegetable Towers in the passenger seat. They could drive under a tunnel, cutting off network activity for a few minutes. AllOfR’s events will effectively lag behind for a bit until they come out of the tunnel and reconnect. Robust streaming systems need to be tolerant to receiving old events.

If we total up AllOfR’s event log thus far and order what we’ve seen, we come up with the following sequence:

event

timestamp

put-item-in-cart

July 5th, 2017 2:30 PM

checkout

July 5th, 2017 2:34 PM

put-item-in-cart

July 5th, 2017 3:03 PM

remove-item-from-cart

July 5th, 2017 3:07 PM

preview-item

July 5th, 2017 3:16 PM

We’ve determined, for a 10 minute timeout gap, that this sequence equates to 2 sessions: 2:30 - 2:34 and 3:03 - 3:16 for July 5th. Later in the day, we receive more activity for AllOfR:

event

timestamp

put-item-in-cart

July 5th, 2017 3:29 PM

checkout

July 5th, 2017 3:37 PM

put-item-in-cart

July 5th, 2017 3:25 PM

By this point, you can probably see that the first two events constitute a new session, yielding 3 total sessions for our Australian friend: 2:30 - 2:34, 3:03 - 3:16, and 3:29 - 3:37. The last event, listed at 3:25, shows where stream disorder makes this problem super interesting - we’ve received an event that merges two existing sessions. We’ll go slow with how this works since this is the key to understanding sessions at an implementation level.

When we compute the timeout ranges for the two existing sessions, they turn out to be 2:53 - 3:26 and 3:19 - 3:47. Note that the ending boundary of the first session and the starting boundary of the second session overlap, indicating that any event falling into that time range is a part of both sessions. This is the first time in this article that we’ve seen a situation where an event belongs to two sessions. 3:25 falls into range for the first session, so we tack it on, yielding a session with bounds 2:53 - 3:25. This leaves us with two sessions that have overlapping boundaries, so we must merge them together into exactly one session, with boundaries 2:53 - 3:37.

Let’s look at our 3 original sessions interactively. Try adding the final event in the table to merge the latter two sessions together, watching how they combine in both their boundaries and purchase results.

Loading interactive content...

An additional hoop to jump through during a merge is the matter of the session state. Sessions are usually reducing their contents down to a value based off an aggregate function, like sum, min, count and so forth. In our case, we’re aggregating to determine whether a transaction has been made. When two sessions merge together, their values need to be merged, too. This is sometimes referred to as a “super aggregation” - an aggregate of aggregates. This is a little quirky because sometimes a super aggregation function is the same as its derived aggregate functions, but sometimes not. For example, we merge two “count” aggregates by summing their values together, but we merge two “max” aggregates by taking the max of the two values.

Experiment

Before we move onto the discussion of how sessions can be implemented in streaming architectures, let’s pause to interactively work with a richer data set. The dataset below spans multiple session keys over a wider range of time.

Loading interactive content...

Design anti-pattern: compute sessions on demand with a database

Now that we have a solid grasp on the mechanics of sessions, we turn to the topic of implementing this abstraction over a stream. It’s tempting to quickly settle on using a centralized database, like Amazon Redshift or Postgres to store all incoming events. The idea is it’s possible to insert all events into a table as they we receive them. When an analysis involving sessions need to be computed, we’d query the database for all events that might be relevant, then partition the data into sessions in the application that issued the query.

This approach has the nice property of being able to atomically read the events table, then figure out how to form the sessions outside of the database on demand. It works well if you have a small number of events and infrequently need to compute session boundaries. However, for most real world problems, the naive characteristics of this design quickly start to hurt.

Poor performance in the application code

Production use cases need to deal with the fact that two things tend to increase over time: the volume of events and the frequency that sessions need to be computed. Building sessions at the application level backloads all the work that needs to be done at read time. Sessions are computed from scratch every time they’re queried for. As the volume of events grows, the amount of time that it takes to construct the sessions will also increase. If the sessions support a user-facing feature, this design produces a diminished user experience. The number of use cases in a system tends to grow over time, too, meaning that sessions queries will become more demanding. This compounds the performance problem, and there aren’t many good ways to get around it when you take this approach.

Poor performance at query time in the database

Similarly, when the volumes of events increases, the raw query time against the centralized database to collect all relevant events will also increase. If you’re using a database with transaction levels that perform locking, write performance will degrade, too.

On top of all this, when a sufficient number of events are collected into Redshift or your event store of choice, it will become untenable to return all events back into the application’s memory to construct sessions as there will simply be too much data.

Using a database like Redshift is probably out as the design of choice for computing sessions over an event stream. What other options do we have?

We believe that log-centric architectures provide an excellent foundation for implementing streaming primitives. Using storage like Kafka or Kinesis provides some of the infrastructure we need to implement session windows at scale. Kafka and it’s cousins, however, aren’t enough by themselves. We need an intelligent way to manage session state if we want to avoid performance penalties.

Local state makes operations fast

Most of the problems we’d encounter by using a centralized database and application-level queries stem from laziness. When we compute sessions on demand, we not only delay the cost of determining the answer to the query, but we also defer all of the issues around locking and obtaining a snapshot of the relevant events to the caller.

A better approach is to invert the flow of events and queries. Instead of placing all events into a database and querying to build sessions, we’ll place all events into a log, such as Kafka. We’ll then start a number of parallel consumers to read from the log, sharding events by session key. This guarantees that all events for any given entity end up routing to the same process (e.g. all events for AllOfR).

With this approach to sharding, we’re now capable of creating local state storage on the consumer itself. The idea is that as the consumer receives each event, it will incrementally compute and update sessions, storing the results directly on its own disk. The critical point is that we don’t reach across the network to talk to a centralized database. This technique can be classified as eager, since we’re now paying the cost of maintaining sessions once at the time of writing. The consumer is responsible for creating new sessions, updating existing ones, and merging overlapping sessions. Queries for session state will now return in a constant amount of time since no work needs to be done at the time of reading. We might call this technique “frontloading” the work to be done.

Since all of the events for a given entity shard to the same disk location, session operations are blazingly fast compared to querying Redshift for many production streams. Every consumer has all of the information it will ever need to update its sessions in reaction to a new event. This also removes the contention problems that we discussed, since a consumer never needs to look beyond its own disk to ask a question.

Barriers to success

A log-centric architecture in combination with localized state can form the foundation for a scalable session windowing implementation. There are still a number of challenges that can make this architecture difficult to implement.

Session state likely needs to live on disk

The simplest place to accrete state for a consumer of a log is in memory. This becomes problematic as the amount of data will steadily grow over time and likely exceed the memory limits of the machine. A good strategy for dealing with this problem is to use an embedded key/value store like RocksDB or LMDB to store state on disk. Introducing a key/value store means that we maintain fast local operations, but gain access to a much larger storage area than memory.

The tricky part of working with an embedded key/value store for windowing is deciding how you’ll structure your on-disk data. It’s less of an issue when all the data is sitting in memory because it can be searched and indexed with the full power of your programming language. When the data moves to disk, you’ll need to decide on an efficient approach to querying for the minimum amount of state to avoid a performance hit when doing reads and writes.

Fast window merging strategies are hard

The problem of merging two session windows when events are received out of order is difficult when all the data lives in memory. Searching and updating all adjacent windows needs to be done within an acceptable algorithmic complexity. This becomes more challenging when you switch to storing sessions on disk since the data can’t be arbitrarily reindexed. Creating a suitable on disk format for searching and updating can take a fair amount of trial and error.

Eviction policies

If your consumer buffers up state for a sufficient period of time, you’ll eventually need to choose an eviction policy to discard “irrelevant” state, since disks and memory are of finite size. Choosing a strategy for archiving long-term results is a topic in its own right. The upshot is that log replay is always available to recover evicted sessions.

Handling consumer failures

Given that a consumer’s state is entirely local and not replicated, it would be problematic if that consumer went offline. Applications would have no way of recovering answers. Fortunately, technologies like Kafka make it easy to replay events over a new consumer and rapidly recover state. We recommend a variety of approaches to making replay even more powerful in our previous article, Challenges of Record Replay in Event Streaming Architectures.

30% better at Vegetable Towers

Armed with the power of session windows, Aubergine Games rolled out their new streaming architecture. After a few weeks of gathering data, our friends were able to make predictions about buyer patterns. When a player placed an item in their cart, but then removed it before purchasing, the development team was able to detect the situation and reactively issue a one-time coupon for the item left behind. Sessions windows make it easy to trigger new behaviors based off event stream state.