One of the less popular category names I deal with is “Complex Event Processing (CEP)”. The word “complex” looks weird, and many are unsure about the “event processing” part as well. CEP does have one virtue as a name, however — it’s concise.

The other main alternative is to base the name on “stream processing” instead.* The CEP-or-whatever industry is split between these choices, with StreamBase currently favoring “CEP” (despite its company name), IBM emphatically favoring “stream”, and Sybase seemingly trying to have things both ways.

*And then, of course, there is “event stream processing”, regarding which please see below.

The more I think about it, the less I like the term “event processing”. Here’s why. Events happen; data is produced; CEP systems most commonly try to identify and categorize the events based on the data. The CEP systems may then do significant further processing, but more often they just pass the information on to another system (most commonly either a persistent DBMS or “real-time” business intelligence). How much of that is really “event processing”? Relatively little, I’d say. And referring specifically to “complex” events doesn’t address my complaints at all.

So I’d like to go with some version of “stream”. But “stream processing” has other computer-related uses, while “Stream management” commonly describes care and planning for small waterways. So “stream” might do best with a modifier, such as “event” or “data”. Of the two, I prefer “data stream” (or “datastream”) to “event stream”; the events aren’t really streaming, but the data is.

So should it be “data stream processing” or “data stream management”? Well, the only one of numerous Wikipedia definitions I’ve actually liked while researching this post is the one for “Data Stream Management System”:

A Data Stream Management System (DSMS) is a set of computer programs that controls the maintenance and querying of data in data streams. The use of a DSMS to manage a data stream is roughly analogous to the use of a Database Management System (DBMS) to manage a conventional database.

A key feature of a DSMS is the ability to execute a continuous query against a data stream. A conventional database query executes once and returns a set of results for a given point in time. In contrast, a continuous query continues to execute over time, as new data enters the stream. The results of the continuous query are updated as new data appears.
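To make the one-shot versus continuous distinction concrete, here is a minimal Python sketch. It is not how any particular DSMS works; the function names, the sliding-window average, and the sample prices are all assumptions made up for illustration.

```python
from collections import deque
from typing import Iterable, Iterator, List


def one_shot_avg(table: List[float]) -> float:
    """Conventional query: runs once over the data present right now."""
    return sum(table) / len(table)


def continuous_avg(stream: Iterable[float], window: int) -> Iterator[float]:
    """Continuous query (illustrative): emits an updated average of the
    last `window` values each time a new value enters the stream."""
    recent = deque(maxlen=window)  # oldest value drops out automatically
    for value in stream:
        recent.append(value)
        yield sum(recent) / len(recent)


# Hypothetical sample data standing in for stored rows and for a live feed.
prices = [10.0, 12.0, 11.0, 15.0]

snapshot = one_shot_avg(prices)                         # one answer, one point in time
updates = list(continuous_avg(iter(prices), window=2))  # one answer per arrival
```

The stored-data query returns a single result, while the generator keeps producing a fresh result for every new value. A real system would read from an unbounded feed rather than a four-element list.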

I think the data stream/database management analogy is spot on. Your queries work a little differently, but otherwise you’re doing pretty much the same things. Indeed, you’re probably even going to persistently store some of the data, and ideally that DBMS capability would be tightly integrated into your CEP system. (In practice they’re apt to be more loosely coupled; for most purposes that works well enough.) Query execution, data ingestion, performance monitoring/tuning, workload prioritization: it’s very DBMS-like stuff. And by the way, “data stream management system” is the term that was used by the researchers (Mike Stonebraker, Stan Zdonik, Dan Abadi, et al.) who wrote a paper describing the project on which StreamBase was based … although some might question whether that particular observation is a strong signal of accuracy. 😉

This reasoning suggests Data Stream Management System is what it should be. The usual kinds of abbreviation, such as datastream (product), datastream manager, and DSMS, would no doubt follow. So should it be “Data Stream”, “Datastream”, or “Data-stream”? At that level of detail, I don’t yet have an opinion.

The only thing is — that’s all pretty wordy compared to CEP. So after all this, I’m still not sure which term(s) I prefer.

Comments

Sorry, the word “stream” just serves to severely narrow the applicability of CEP. And, yes, I come from a vendor perspective (having served to launch Tibco’s CEP product and now with Informatica’s CEP). Gartner would also disagree with you and so would the father of CEP (David Luckham) who has a book coming out on the subject. “Stream processing” is one form of CEP.
We’ve deployed CEP solutions that are customer-driven, meaning some have streams, some run at batch intervals, and some combine both. The key is that they are looking for less-than-simple (albeit not always complex) relationships in a timely manner so they can act before issues grow.

Finally, don’t ever try to turn the acronym into a word like “Sep.” That’s just bad mojo.

Either the data can be (economically/realistically/whatever) persisted before it’s filtered, or it can’t. If it can, it probably should be, so it’s not clear to me why one would use CEP. If it can’t, it’s a lot like a stream.

I share your distaste for the name “Complex Event Processing.” Not only does the name violate Monash’s Law, but it also violates the first law of marketing, which is not to name something “complex” unless you want nobody to buy it.

And as you say, stream processing & event stream processing have their own sets of baggage or specific interpretations. The issue I see with characterizing this as Data Streams is that (maybe in my mind only) it implies something singular and transactive: something happens and a piece of data is generated. OLTP is a form of event processing, as an event generates a set of data that in enterprise operational systems is often processed as transactions. But that’s not what we’re talking about here, and of course that is only the simplest form of event processing. (Of course I’m not implying that transaction processing, with its need for ACID support, is simple!)

Event processing on the complex (or maybe compound) side typically deals not with single happenings that generate individual bits of data, but with combinations of “things” that happen and generate multiple bits of data. Most of what is called CEP strives to isolate patterns of things (avoiding the term “events” here) that happen, and make that information actionable.

So while I like terms that include “stream”, to avoid the baggage of that or other terms, let’s use the term “compound”, as this sets this form of event processing apart from OLTP.

Although possibly on-target for current products and their technical orientation, I disagree with you here in the bigger picture. Events are simply a different primitive than what data represents (things). So I believe the suggested name takes the focus off-target. Perhaps “Event Stream Management” would be better.

Why does this interest me? Sooner or later, techniques and tools for business analysis need to come to grips with business events. This focus is essential for truly supporting business rules and know-how management. “Events” belong right up there with the other five primitives: things, processes, locations, roles, and goals. (Yes, there’s six — it’s a Zachman view.)

If you’re getting votes, another possible voter would be Prof. Michael Franklin at U.C. Berkeley, who has done research on this and, I think, founded a startup making such a product. He presented the product at the New England Database Summit a few years ago.

I agree with both Giles Nelson (Progress / Apama, discovered this post via him on Twitter) and Tony Baer above. The term CEP is problematic at best.

The term ‘Lean Data Management System’ comes to mind.

Darach.

Opher Etzion on
August 26th, 2011 1:02 pm

I also prefer not to use the term “complex event processing”. I also think that three-letter acronyms are typically marketing buzzwords, while disciplines consist of two words: image processing, information retrieval, data mining and more.

The two words I prefer to use are “event processing” and not “stream processing”, since “event” has a semantic meaning of something that happens: the happening is of a certain type, it occurred at a certain time, in a certain place, transitioned certain states, etc., all of which are fundamental to the type of processing. A “stream”, by contrast, is a collection of data in motion, which may be of various types (voice streams, video streams and more) whose processing is somewhat different. It is also interesting to note that products that are descendants of academic projects classified under DSMS label themselves as (complex) event processing.

Google, Yahoo, Microsoft – those are all names that surely were debated with some degree of fist pounding over the lunacy of how they all sound. What happened? They made the name a reality not the other way around.

CEP is just a 3-letter acronym trying to describe a concept – much like a ton of other spaces. Which, by the way, businesses generally don’t care about. We in tech seem to need to throw these TLAs as though there are a host of suitors in waiting.

You could hold BI as a shining example of a successful TLA (two-letter in this case), but the business really knows this as reporting and analysis.

CEP will be more defined by the end-uses than by itself. And, it certainly is NOT only streams of data. There’s a much bigger debate among vendors, methods. I do have to point out that “rules” are a critical component, and I’m surprised that the term has been mentioned just once in this discussion.

You’re mistaking the how with the what when you refer to persistence. Whether or not you persist an event is immaterial.

What is material is what differentiated event stream processing, at least initially, from event processing (where we never persisted the events either – anyone remember NEON, or MQ Series Integrator?). Those areas of differentiation were:

Most of these features can now be found in larger, more well established platforms from firms like Tibco, Progress, and Informatica. Those platforms offer far more than what any CEP engine focused vendor, like Streambase or Sybase, comes close to offering.

CEP isn’t a market. It’s a couple of features.

The firms who have come closest to implementing what David’s original book described have focused more on the business issues – CEP was supposed to get us closer to understanding what was going on in our domain, finding causality, refining our approach, and then implementing incremental changes quickly. It was supposed to give us insight and actionable intelligence (a phrase hijacked by Sybase lately).

As you can see, most people in the world of CEP probably haven’t even read that book. If they had, we’d be focused on things other than speeds and feeds.

Curt … what a delightful discussion you sparked! As you know, I prefer Stream Processing. But, “What’s in a name? That which we call a rose
By any other name would smell as sweet.” To take this further, can CEP detect odors? colors of a flower at a certain time of day? The marketplace will decide what to call these things. What’s more important to customers than a name, is whether or not they can economically solve a business problem.

Curt, I understand Streambase’s rationale from a technical perspective, don’t get me wrong. But defining a sector after your name…huge! It is like being a Kleenex, Xerox, Google…all corporate names that have come to generically define a thing or action. BTW, great seeing you on your last trip, always challenging, insightful and fun.

I like to fondly think that Shivnath Babu and Jennifer Widom came up with the term “Data Stream Management System (DSMS)” in a vision paper we wrote in late 2000 (part of the Stanford STREAM project and published in the 2001 SIGMOD Record: http://ilpubs.stanford.edu:8090/527/). Maybe this is the place to find out that my assumption was wrong.

It is nice to see the interest in data stream processing (or CEP) coming back, in particular in the database research conferences like SIGMOD and VLDB. Data stream management was the hot research topic in the 2000-2004 time frame, and then the focus shifted to stored big data (MapReduce, column stores, etc.). Integrating high-speed stream processing and big data processing in easy ways seems very challenging. A post on that would be timely.