Stream processing comparison

2.
Stream processing systems comparison
January 20, 2016
2/15
Introduction
The processing model of many big data applications is shifting
from batch processing to stream processing.
Batch processing has the advantage in throughput, while
stream processing has much lower latency.
Stream processing can also reach very high throughput.

7.
Comparison (cont.)
Usage
Storm applications require more work, but in return we get
more flexibility.
Flink provides low-level operators similar to Storm bolts,
such as OneInputStreamOperator and
TwoInputStreamOperator. These operators are not too
complex to use.
Spark Streaming's low-level operators are a little hard to use.
Spark Streaming may also lose some capabilities because of
its micro-batch processing model.

8.
Example
Problem
There are two streams: advertisement(advId, shownTime)
and click(advId, clickTime). How can we produce a stream that
contains every clicked advertisement (advId, shownTime,
clickTime) whose click occurred within 10 minutes after it was shown?
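
To pin down the expected output before looking at any streaming system, the join can be written over two finite lists. This is only a reference sketch: the function name `clicked_within`, timestamps in seconds, and the inclusive 10-minute boundary are assumptions, not part of the slides.

```python
TEN_MINUTES = 10 * 60  # assumption: timestamps are in seconds


def clicked_within(ads, clicks, limit=TEN_MINUTES):
    """Return (advId, shownTime, clickTime) for every click that happened
    at most `limit` seconds after the advertisement was shown."""
    shown_by_id = {}
    for adv_id, shown_time in ads:
        # An advertisement may be shown more than once, so keep a list.
        shown_by_id.setdefault(adv_id, []).append(shown_time)

    result = []
    for adv_id, click_time in clicks:
        for shown_time in shown_by_id.get(adv_id, []):
            if 0 <= click_time - shown_time <= limit:
                result.append((adv_id, shown_time, click_time))
    return result


ads = [("a", 0), ("b", 100), ("c", 200)]
clicks = [("a", 300), ("b", 800), ("d", 250)]
print(clicked_within(ads, clicks))  # only "a" was clicked within 10 minutes
```

A streaming solution should emit the same tuples, but without ever holding the complete input in memory.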

9.
Example
Solution of Storm
Implement a bolt that receives records from both spouts,
caches them, and performs the join.
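
The caching join described here can be sketched in plain Python. This is not the Storm API: the class `CachingJoin`, its method names, and the `expire` policy are hypothetical, and timestamps are assumed to be in seconds. Each side caches its records and probes the other side's cache on arrival, so a match is found regardless of which record arrives first.

```python
from collections import defaultdict

TEN_MINUTES = 10 * 60  # assumption: timestamps are in seconds


class CachingJoin:
    """Plain-Python stand-in for a join bolt fed by two spouts."""

    def __init__(self, limit=TEN_MINUTES):
        self.limit = limit
        self.shown = defaultdict(list)   # advId -> cached shownTime values
        self.clicks = defaultdict(list)  # advId -> cached clickTime values

    def _matches(self, shown_time, click_time):
        return 0 <= click_time - shown_time <= self.limit

    def on_advertisement(self, adv_id, shown_time):
        # Cache the record, then join it against clicks seen so far.
        self.shown[adv_id].append(shown_time)
        return [(adv_id, shown_time, c)
                for c in self.clicks[adv_id] if self._matches(shown_time, c)]

    def on_click(self, adv_id, click_time):
        # Cache the record, then join it against advertisements seen so far.
        self.clicks[adv_id].append(click_time)
        return [(adv_id, s, click_time)
                for s in self.shown[adv_id] if self._matches(s, click_time)]

    def expire(self, now):
        """Drop cached records older than now - limit. This is safe only if
        no later record carries a timestamp earlier than `now`."""
        cutoff = now - self.limit
        for cache in (self.shown, self.clicks):
            for adv_id in list(cache):
                kept = [t for t in cache[adv_id] if t >= cutoff]
                if kept:
                    cache[adv_id] = kept
                else:
                    del cache[adv_id]


join = CachingJoin()
out = []
out += join.on_advertisement("a", 0)
out += join.on_click("a", 300)          # within 10 minutes of being shown
out += join.on_click("b", 400)          # click arrives before its advertisement
out += join.on_advertisement("b", 350)
out += join.on_click("a", 900)          # 15 minutes after shown, no match
print(out)
```

Without a call such as `expire`, the caches grow without bound. Choosing when it is safe to evict is the part that a real bolt has to decide based on how late records can arrive.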

11.
Example (cont.)
Problems of Flink
1. Flink only provides a join over records that fall in the same window
2. Windows that do not slide miss matches that straddle a window boundary
3. Sliding windows can emit the same match more than once
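
Problems 2 and 3 can be reproduced with a small simulation. This is plain Python rather than Flink code: the helper names `window_starts` and `windowed_join` are hypothetical, timestamps are assumed to be in seconds, and windows are assumed to be aligned to multiples of the slide starting at 0.

```python
TEN_MINUTES = 10 * 60  # assumption: timestamps are in seconds


def window_starts(ts, size, slide):
    """Start times of every window [start, start + size) that contains ts."""
    starts = []
    start = ts - ts % slide  # latest window start that is <= ts
    while start > ts - size:
        starts.append(start)
        start -= slide
    return starts


def windowed_join(ads, clicks, size, slide, limit=TEN_MINUTES):
    """Join advertisements and clicks that share a window, emitting one
    result per shared window, as a per-window join would."""
    results = []
    for adv_id, shown in ads:
        for click_id, clicked in clicks:
            if adv_id != click_id or not 0 <= clicked - shown <= limit:
                continue
            shared = (set(window_starts(shown, size, slide))
                      & set(window_starts(clicked, size, slide)))
            results.extend((adv_id, shown, clicked) for _ in shared)
    return results


ads = [("a", 500), ("b", 100)]
clicks = [("a", 700), ("b", 200)]

# 10-minute windows without slide: "a" is shown in [0, 600) but clicked
# in [600, 1200), so the pair is never joined.
tumbling = windowed_join(ads, clicks, size=600, slide=600)

# 20-minute windows sliding every 10 minutes: "a" is now found, but "b"
# sits in two overlapping windows and is emitted twice.
sliding = windowed_join(ads, clicks, size=1200, slide=600)

print(tumbling)
print(sliding)
```

Both pairs satisfy the 10-minute condition, yet neither window configuration returns each of them exactly once.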
Solution of Flink
Implement a custom join operator based on
TwoInputStreamOperator, modeled on
WindowOperator.
The custom operator is in some respects similar to the
Storm solution.