The SECRET Model - Past Project

SECRET: A Model for Analysis of the Execution Semantics of Stream Processing Systems

There are many academic and commercial stream processing engines (SPEs) today, each of them with its own execution semantics. This variation may lead to seemingly inexplicable differences in query results. SECRET is a descriptive model that we have developed at ETH Zurich, which allows users to analyze the behavior of systems and understand the results of window-based queries (with time- and tuple-based windows) for a broad range of heterogeneous SPEs. The model is the result of extensive analysis and experimentation with several commercial and academic engines. This webpage presents the SECRET model and the details of our experimental setting.

SECRET in a Nutshell

Scope: Scope provides information about potential window intervals. More specifically, Scope defines the interval of the active window (i.e., the open window with the earliest start time or tuple-id) at application time t or tuple-id i, for time- and tuple-based windows, respectively. Scope is based on the window parameters (size ω and slide β) and the start of the first window (t0 or i0).
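To make the notion of an active window concrete, the following Python sketch computes the Scope of a time-based window. It is an illustration only, not the formal definition from the paper: the function name scope_time is hypothetical, and it assumes half-open windows [start, start + ω) that begin at t0, t0 + β, t0 + 2β, and so on. The same arithmetic would apply to tuple-based windows with i and i0 in place of t and t0.

```python
def scope_time(t, size, slide, t0):
    """Return (start, end) of the active window at application time t.

    Assumptions (not taken from the source): windows are half-open
    intervals [start, end) with start = t0 + k * slide, k = 0, 1, 2, ...
    The active window is the open window with the earliest start.
    Returns None if no window is open at t.
    """
    if t < t0:
        return None  # the first window has not started yet
    # Smallest k >= 0 whose window has not yet closed at t,
    # i.e. t0 + k * slide + size > t.
    k = max(0, (t - t0 - size) // slide + 1)
    start = t0 + k * slide
    if start > t:
        return None  # slide > size: t falls in a gap between windows
    return (start, start + size)


if __name__ == "__main__":
    # Windows of size 5 sliding by 2, first window starting at 0.
    for t in range(0, 8):
        print(t, scope_time(t, size=5, slide=2, t0=0))
```

Whether window boundaries are inclusive or exclusive is exactly the kind of detail on which engines may differ, so the half-open choice here should be read as one possible convention.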

Content: Content maps the window intervals provided by Scope into actual window contents. More specifically, Content specifies the set of input tuples that are in Scope as of a given system time and a given application time or tuple-id, for time- and tuple-based windows, respectively.

Report: Report defines the conditions under which the window contents provided by Content become visible for further query evaluation and result reporting. Report can take a logical combination of four different strategies. In each, reporting is done for application time t or tuple-id i only if: (i) content-change (Rcc): the content has changed since the last report; (ii) window-close (Rwc): the active window closes; (iii) non-empty (Rne): the content is not empty; (iv) periodic (Rpr): t or i is a multiple of λ, where λ denotes the reporting frequency.
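The four strategies can be read as predicates that are combined into a single reporting decision. The Python sketch below illustrates this under stated assumptions: the names ReportState and should_report are hypothetical, the individual conditions are supplied as precomputed booleans, and only conjunctions are modelled (the form that appears in combinations such as "Rcc & Rne").

```python
from dataclasses import dataclass


@dataclass
class ReportState:
    """Facts about the current tick; all field names are illustrative."""
    content_changed: bool   # Rcc: content differs from the last report
    window_closed: bool     # Rwc: the active window closes
    content_nonempty: bool  # Rne: the content holds at least one tuple
    t: int                  # application time (or tuple-id) considered


def should_report(state, strategies, period=None):
    """Return True if every selected strategy holds (conjunction only).

    strategies: iterable of "Rcc", "Rwc", "Rne", "Rpr".
    period: the reporting frequency (lambda), needed only for "Rpr".
    """
    checks = {
        "Rcc": state.content_changed,
        "Rwc": state.window_closed,
        "Rne": state.content_nonempty,
        "Rpr": period is not None and state.t % period == 0,
    }
    return all(checks[name] for name in strategies)


if __name__ == "__main__":
    state = ReportState(content_changed=True, window_closed=False,
                        content_nonempty=True, t=4)
    print(should_report(state, ["Rcc", "Rne"]))  # both conditions hold
    print(should_report(state, ["Rwc", "Rne"]))  # window is still open
```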

Tick: Tick defines the condition which drives an SPE to take action on its input. It can be based on one of the following: (i) tuple-driven: react to individual tuples; (ii) time-driven: react to all tuples with the same application time value; (iii) batch-driven: react to subsets of tuples with the same application time value. Note that tuple-driven and time-driven are in fact special cases of batch-driven.
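The remark that tuple-driven and time-driven are special cases of batch-driven can be illustrated with a small Python sketch. The function ticks and its parameters are hypothetical, and batches are formed here as fixed-size consecutive chunks purely for illustration; a real engine may form its batches differently.

```python
from itertools import groupby


def ticks(tuples, mode, batch_size=None):
    """Yield the groups of tuples on which a hypothetical engine acts.

    tuples: list of (time, val) pairs sorted by application time.
    mode: "tuple", "time", or "batch" (batch_size required for "batch").
    """
    for _, group in groupby(tuples, key=lambda tup: tup[0]):
        same_time = list(group)
        if mode == "tuple":
            step = 1                # batch of exactly one tuple
        elif mode == "time":
            step = len(same_time)   # batch of all tuples with this time
        elif mode == "batch":
            step = batch_size       # arbitrary subset size (assumption)
        else:
            raise ValueError(f"unknown mode: {mode}")
        # Batches never span two different application time values.
        for i in range(0, len(same_time), step):
            yield same_time[i:i + step]


if __name__ == "__main__":
    data = [(1, 10), (1, 20), (1, 30), (2, 40)]
    for mode, size in (("tuple", None), ("time", None), ("batch", 2)):
        print(mode, list(ticks(data, mode, size)))
```

In this sketch the three modes differ only in the chunk size applied within each group of equal application time, which is why the first two reduce to the third.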

Tick is the entry point to the control loop of the model, creating a chain reaction by invoking Report, which in turn invokes Content, which builds on Scope (Tick -> Report -> Content -> Scope), as shown in the following figure:

Experiments with SPEs

During our analysis, we ran simple aggregations (summation or average) over time- and tuple-based windows in different SPEs to understand their execution semantics. Namely, we used Coral8 Version 5.5, Oracle CEP Version 11.1, the open-source academic prototype STREAM, and StreamBase Version 6.5. After careful analysis, we obtained the following SECRET parameters for these systems:

SPE          Scope (t0, i0)               Report             Tick
Coral8       ceiling((tt1-ω)/β)β-1, 0     Rcc & Rne          batch-driven
Oracle CEP   ceiling(tt1/β)β-ω, β-ω       Rwc & Rcc          time-driven
STREAM       tt1-ω, β-ω                   Rwc & Rcc & Rne    time-driven
StreamBase   ceiling((tt1-ω)/β)β-1, 0     Rwc & Rne          tuple-driven

In the table, tt1, ω, and β denote the application time of the first tuple, the window size, and the window slide, respectively; ceiling denotes the ceiling function, which maps a real number to the smallest integer greater than or equal to it.

In the following, we provide the details of our experimental setting, including the queries and input streams we used, how we executed them on each of the SPEs we studied, and the results they produced, so that the differences in the SPEs' execution semantics can be better understood. Each of these experiments, and how SECRET explains its results, is discussed in detail in our journal paper.
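As a quick way to see how the Scope parameters in the table above differ across systems, the following Python sketch evaluates the (t0, i0) expressions for concrete values. The function name first_window_start is hypothetical, and the sketch assumes that an entry such as "ceiling((tt1-ω)/β)β-1" is read as ceiling((tt1-ω)/β) multiplied by β, minus 1, with the second entry of each pair being i0.

```python
import math


def first_window_start(spe, tt1, size, slide):
    """Evaluate (t0, i0) from the table for one SPE.

    tt1: application time of the first tuple
    size: window size (omega); slide: window slide (beta)
    The parsing of the table entries is an assumption (see lead-in).
    """
    if spe in ("Coral8", "StreamBase"):
        return (math.ceil((tt1 - size) / slide) * slide - 1, 0)
    if spe == "Oracle CEP":
        return (math.ceil(tt1 / slide) * slide - size, slide - size)
    if spe == "STREAM":
        return (tt1 - size, slide - size)
    raise ValueError(f"unknown SPE: {spe}")


if __name__ == "__main__":
    # First tuple at time 3, window size 5, slide 2.
    for spe in ("Coral8", "Oracle CEP", "STREAM", "StreamBase"):
        print(spe, first_window_start(spe, tt1=3, size=5, slide=2))
```

With tt1 = 3, ω = 5 and β = 2, this reading gives t0 = -3 for Coral8 and StreamBase, -1 for Oracle CEP, and -2 for STREAM, so the same query over the same input starts its first window at a different point in each system.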

General Setup:

Inputs: Various input streams with the following schema:

InStream(Time /* application time of the tuple in seconds */,
         Val  /* an integer value, representing tuple content */)

Queries: Various continuous aggregation queries with time- and tuple-based windows having different size and slide parameters.
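For readers who want to reproduce this kind of setup outside a particular engine, the Python sketch below models the InStream schema and a summation over a tuple-based window. It does not reproduce the semantics of any of the studied SPEs: the function tuple_window_sums is hypothetical, and it assumes that a result is produced only when a window is completely filled, with windows covering tuples 1..ω, 1+β..ω+β, and so on.

```python
from collections import deque
from typing import NamedTuple


class InStream(NamedTuple):
    time: int  # application time of the tuple in seconds
    val: int   # an integer value, representing tuple content


def tuple_window_sums(stream, size, slide):
    """Sum `val` over tuple-based windows of `size` tuples,
    advancing by `slide` tuples (one assumed semantics among many)."""
    window = deque(maxlen=size)  # keeps only the last `size` tuples
    sums = []
    for i, tup in enumerate(stream, start=1):
        window.append(tup)
        # A window ends at tuple i when i = size + k * slide, k >= 0.
        if i >= size and (i - size) % slide == 0:
            sums.append(sum(t.val for t in window))
    return sums


if __name__ == "__main__":
    stream = [InStream(1, 10), InStream(1, 20), InStream(2, 30),
              InStream(3, 40), InStream(3, 50)]
    print(tuple_window_sums(stream, size=3, slide=2))
```

Changing when a window is considered reportable, or how its first start is chosen, changes the output sequence, which is the kind of difference the SECRET parameters are meant to describe.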