This page summarizes the design of the stateful window operator, related to SAMZA-552. This operator is primarily used for stream-stream joins, in which the streams are windowed (Samza does not support infinite windows).

Class APIs

As described in this document, there are two classes extending the WindowOperator interface: AggregatedWindowOperator and FullStateWindowOperator. The difference between the two is that AggregatedWindowOperator does not let you access its windowed messages, whereas FullStateWindowOperator does. Access to the windowed messages is necessary for operations such as joins.
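The distinction can be illustrated with a minimal, self-contained sketch. The names below (SimpleWindowOperator, SumOnlyWindow, FullStateWindow) are hypothetical stand-ins invented for illustration; they are not the proposed Samza classes, and the real WindowOperator interface may look quite different.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified stand-in for the WindowOperator interface.
interface SimpleWindowOperator {
    void process(long value);
}

// Aggregated flavor: keeps only the running aggregate.
// Individual messages are discarded and cannot be read back.
class SumOnlyWindow implements SimpleWindowOperator {
    private long sum = 0;
    private long count = 0;

    @Override
    public void process(long value) {
        sum += value;
        count++;
    }

    long sum() { return sum; }
    long count() { return count; }
}

// Full-state flavor: retains every message so that it can be
// read back later, for example when probing during a join.
class FullStateWindow implements SimpleWindowOperator {
    private final Deque<Long> messages = new ArrayDeque<>();

    @Override
    public void process(long value) {
        messages.addLast(value);
    }

    // Returns a copy of the retained messages in arrival order.
    List<Long> messages() {
        return new ArrayList<>(messages);
    }
}
```

The aggregated flavor needs constant state per window, while the full-state flavor needs state proportional to the number of windowed messages, which is the price paid for being able to join against them.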

Case Studies

Here is a list of example window operations and how they can be implemented.

Windowed Aggregation

The following aggregation on stream Orders:

SELECT STREAM product, AVG(price) AS avg_price
FROM Orders
OVER (ORDER BY time RANGE '10' min PRECEDING)

can be implemented as follows:

class AveragePriceTask implements StreamTask, InitableTask {
  // kept as a field so that process() can use the operator created in init()
  private WindowOperator orderWindows;

  void init() {
    // operatorSpec (construction not shown) should include the aggregation function
    orderWindows = new AggregatedWindowOperator(operatorSpec);
    orderWindows.init();
  }

  void process(Tuple tuple) {
    // if there are any new results generated, they will
    // be sent to output stream automatically
    orderWindows.process(tuple);
  }
}
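To make the semantics of the query concrete, here is a minimal standalone sketch of a 10-minute sliding average per product. It does not use any Samza API: the SlidingAverage class, its method names, and the use of millisecond timestamps are assumptions made for illustration, and the sketch further assumes that timestamps arrive in non-decreasing order for each product. Entries are kept internally only so that expired prices can be subtracted; they are not exposed to the caller.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Samza: average price per product
// over the preceding 10 minutes (inclusive of the current order).
class SlidingAverage {
    private static final long WINDOW_MS = 10 * 60 * 1000L;

    private static final class Entry {
        final long time;
        final double price;
        Entry(long time, double price) {
            this.time = time;
            this.price = price;
        }
    }

    private static final class State {
        final Deque<Entry> entries = new ArrayDeque<>();
        double sum = 0.0;
    }

    private final Map<String, State> byProduct = new HashMap<>();

    // Adds one order and returns the current average for its product.
    // Assumes timestamps are non-decreasing per product.
    double process(String product, long timeMs, double price) {
        State s = byProduct.computeIfAbsent(product, k -> new State());
        s.entries.addLast(new Entry(timeMs, price));
        s.sum += price;
        // Evict orders that fall outside the 10-minute range.
        // The entry just added is never evicted, so the deque stays non-empty.
        while (s.entries.peekFirst().time < timeMs - WINDOW_MS) {
            s.sum -= s.entries.removeFirst().price;
        }
        return s.sum / s.entries.size();
    }
}
```

Keeping a running sum avoids rescanning the window on every message; only the expired entries at the head of the queue are touched.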

Stream-Stream Join

Note first that unbounded stream-stream joins are not supported: the join predicate must include timestamps from both streams. For example, the following stream-stream join is rejected at parse time because its predicate has no time bound.

SELECT STREAM o.time as time, o.id as id, a.value, s.cost
FROM Orders as o
JOIN Shipments as s
ON o.id = s.id
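Why the time bound matters can be seen in a small sketch of one side of a join buffer. This is an illustration under stated assumptions, not the operator's actual design: BoundedJoinBuffer and its methods are hypothetical names, rows are assumed to be added in timestamp order, and probe timestamps are assumed to be non-decreasing. With a bound, rows that can no longer match any future probe are evicted; without one, as in the query above, nothing could ever be evicted and the state would grow without limit.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch, not a Samza API: one side of a time-bounded join buffer.
class BoundedJoinBuffer {
    private static final class Row {
        final long time;
        final String id;
        Row(long time, String id) {
            this.time = time;
            this.id = id;
        }
    }

    private final long boundMs;
    private final Deque<Row> rows = new ArrayDeque<>();

    BoundedJoinBuffer(long boundMs) {
        this.boundMs = boundMs;
    }

    // Buffers a row from this side of the join (assumed to arrive in time order).
    void add(long timeMs, String id) {
        rows.addLast(new Row(timeMs, id));
    }

    // Returns the timestamps of buffered rows with the same id whose time is
    // within boundMs of probeTimeMs. Rows too old to match this or any later
    // probe are evicted first.
    List<Long> probe(long probeTimeMs, String id) {
        while (!rows.isEmpty() && rows.peekFirst().time < probeTimeMs - boundMs) {
            rows.removeFirst();
        }
        List<Long> matches = new ArrayList<>();
        for (Row r : rows) {
            if (r.id.equals(id) && Math.abs(r.time - probeTimeMs) <= boundMs) {
                matches.add(r.time);
            }
        }
        return matches;
    }

    // Number of rows currently retained.
    int size() {
        return rows.size();
    }
}
```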

The following join on streams Orders and Shipments aligns the timestamp boundaries of the two streams: