About Abhishek Jain

Abhishek is a lead software engineer with Impetus Technologies, pioneered in outsource product development. He has technical expertise in designing and implementing solutions around distributed application, database technologies, middle-ware technologies and SaaS platform.

The complex (event) world

This blog entry attempts to summarize the technologies in the CEP domain, touching over their prime feature(s) as well as their lackings. It seems sometimes that the term CEP is being overused (as did happen with ‘ESB’) and the write-up below reflects our perception and version of it.

ESPER (http://esper.codehaus.org/) is a popular open source component for complex event processing (CEP) available for java. It includes rich support for pattern matching and stream processing based on sliding time or length windows. Despite the fervent discussion over the term ‘CEP’ (http://www.dbms2.com/2011/08/25/renaming-cep-or-not/), ESPER seem to be a good fit for the term CEP, as it appears to be able to really identify “complex events” from a stream of simple events, thanks to ESPER’s EPL (Event Processing Language).

Recently, while searching for an open source solution for real time CEP, our group stumbled across Twitter’s Storm project (https://github.com/nathanmarz/storm). It claims to be most comparable to Yahoo’s S4, while being in the same space as “Complex Event Processing” systems like Esper and Streambase. I am not sure about Streambase, but digging deeper into the Storm project made it look much different from CEP and from the ESPER solution. Ditto with S4 (http://incubator.apache.org/s4/). While S4 and Storm seem to be good at real time stream processing in a distributed mode and they appeared (as they claim) to be the “Hadoop for Real Time”, they didn’t seem to have provisions to match patterns (and thus to indicate complex events).

Searching for a definition for CEP (that our study can relate to) led to the following bullets, (http://colinclark.sys-con.com/node/1985400) which include the below four as prerequisite for a system/solution to be called a CEP component/project/solution:

Domain Specific Language

Continuous Query

Time or Length Windows

Temporal Pattern Matching

The lack of continuous query supporting time/length windows and temporal pattern matching seem to altogether missing in current versions of the S4 and Storm projects. Probably, it is due to their infancy and they will mature up to include such features in future. As of now, they only seem fit for pre-processing the event stream before passing it over to a CEP engine like ESPER. Their ability to do distributed processing (a la map-reduce mode) can help to speed up the pre-processing where events can be filtered off, or enriched via some lookup/computation etc. There have also been some attempts to integrate Storm with Esper (http://tomdzk.wordpress.com/2011/09/28/storm-esper/).

While the processing systems like S4 and Storm lack important features of CEP, ESPER based systems have the disadvantage of being memory bound. Having too many events or having a large time windows can potentially cause ESPER to run out of memory. If ESPER is used to process real time streams, for e.g. from a social media, there will be lot of data accumulating in the ESPER memory. On a high level, the problem statement is to invent a CEP solution for big data. On a finer level, the problem statement include architecting a CEP solution for handling on-board (batched) as well as in-flight (Real-Time) data.

In DarkStar’s terminology (http://www.eventprocessing-communityofpractice.org/EPS-presentations/Clark_EP.pdf), the requirement is “having matched a registered pattern in real time, discover similar patterns in the database”. Since being memory bound is a limitation, it may prove useful if some mechanism to condense the in-memory events can be arrived at. The condensed data however should still be meaningful and retain the context of the original stream.

Newsletter

Join them now to gain exclusive access to the latest news in the Java world, as well as insights about Android, Scala, Groovy and other related technologies.

Email address:

Recent Jobs

No job listings found.

Join Us

With 1,240,600 monthly unique visitors and over 500 authors we are placed among the top Java related sites around. Constantly being on the lookout for partners; we encourage you to join us. So If you have a blog with unique and interesting content then you should check out our JCG partners program. You can also be a guest writer for Java Code Geeks and hone your writing skills!

Disclaimer

All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners. Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. Examples Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.