Stream processing: analyze data in real time for a more agile response

Editor’s Note: We continue our Smarter Computing Breakthroughs series this week with a post on Stream Processing from Nagui Halim, IBM Fellow and Director, Chief Architect of Big Data in IBM Software Group. The Breakthroughs Series will introduce you to key technological developments IBM has advanced to strengthen our integrated portfolio of systems, software and services – technologies that are often the unsung drivers behind the IT infrastructure that enables a smarter planet. You can find links to previous Breakthroughs posts at the bottom of this post.

Stream processing takes a different approach — and delivers much faster insights

Big data and analytics solutions are dominant subjects in the IT world right now, and it’s easy to see why. By detecting and acting on previously unseen trends, patterns and information of all kinds — in areas ranging from customer demand to infrastructure performance — organizations can create many compelling forms of new value.

One of the most exciting elements of the data/analytics story, though, concerns a new class of analytics tools and how they can tackle a new type of workload: stream processing.

Stream processing (if implemented effectively) can empower organizations to get insights from data far more quickly, and in far more ways, than ever before — even in cases where insights are needed in real time, or very close to it.

To understand the difference between stream processing and traditional analytics approaches, begin with the way the raw data is handled. In a conventional analytics architecture, the data is presumed to be stored in a repository (such as a relational database, or on a larger scale, a data warehouse). The organization then runs queries against that data: How many of X were sold to customers of Y demographic? What kinds of inventory challenges has X site experienced, following Y marketing campaign, compared to other sites in the last five years?
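To make the store-then-query model concrete, here is a minimal sketch in Python using an in-memory SQLite database. The table, columns, and values are invented for illustration; the point is simply that the data must land in a repository before any question can be asked of it.

```python
import sqlite3

# Hypothetical sales table illustrating the store-then-query model:
# data lands in a repository first, and analysis happens afterwards.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, demographic TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("X", "Y", 3), ("X", "Y", 5), ("X", "Z", 2)],
)

# "How many of X were sold to customers of Y demographic?"
(total,) = conn.execute(
    "SELECT SUM(qty) FROM sales WHERE product = 'X' AND demographic = 'Y'"
).fetchone()
print(total)  # 8
```

However fast the query itself runs, the answer always lags behind the moment the data was generated, because storage comes first.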

Stream processing represents a fundamental break from that paradigm. Instead of collecting and storing the data first, which creates a significant delay, you run analytics against the data as it first arrives at the organization.
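The contrast can be sketched in a few lines of Python. This is not how a stream processing platform is implemented internally; it is a toy illustration of the principle: each event is analyzed the moment it arrives, and a result is emitted continuously, without the full stream ever being stored.

```python
from collections import deque
import statistics

def rolling_mean(events, window=3):
    """Analyze each event as it arrives: emit the mean of the most
    recent `window` readings without ever storing the full stream."""
    buf = deque(maxlen=window)  # only the window is kept in memory
    for value in events:
        buf.append(value)
        yield statistics.mean(buf)

# Simulated live feed; in practice `events` would be a socket or queue.
feed = iter([10, 12, 11, 40, 13])
for mean in rolling_mean(feed):
    print(round(mean, 2))  # a fresh result after every single event
```

Each incoming value immediately updates the answer, which is the essence of the paradigm shift: insight is produced at arrival time, not at query time.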

An incredible range of new use cases for analytics

The new capabilities implied by that change are limitless, and already they’re beginning to transform the way organizations think about data and what they can do with it.

One obvious example: social media. As new products or services are released, the public responds with tweets, Facebook wall posts, LiveJournal blog entries and more. This data, all very valuable to the organization, can be assessed in something very close to real time to understand how satisfied (or unsatisfied) those customers really are.

And because the world is generating more data, of more types, than ever before, stream processing will be applied in more and more ways going forward. Other use cases include:

• Stock trading – new data is created in massive volumes every second; stream-based analysis yields much faster, and thus much more useful, intelligence.

• Government and law enforcement – one obvious application is facial pattern recognition, used to identify known terrorists in public environments such as airports from real-time video feeds. This is a task that would be impractical using traditional store-then-query analytics tools.

• Security – one example would be the analysis of botnet activity. When malware achieves control of target systems on a mass scale, and then orchestrates malicious campaigns such as a Denial of Service attack aimed at a particular government or organizational website, stream processing can help establish just where the attacks are coming from, and how they are being carried out, as they happen.
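The security use case above can be sketched with a simple sliding-window rate check. The packet trace, addresses, and threshold here are invented for illustration; a real system would consume live network telemetry, but the streaming idea is the same: flag a source the instant its request rate spikes, as the traffic arrives rather than after the fact.

```python
from collections import defaultdict, deque

def flag_floods(packets, window_s=1.0, threshold=3):
    """Flag a source address the moment the number of its requests
    inside a sliding time window exceeds `threshold`."""
    recent = defaultdict(deque)  # source -> timestamps still in window
    for ts, src in packets:
        q = recent[src]
        q.append(ts)
        while q and ts - q[0] > window_s:  # expire old timestamps
            q.popleft()
        if len(q) > threshold:
            yield ts, src  # alert emitted as the event arrives

# Hypothetical packet trace: (timestamp in seconds, source address)
trace = [(0.0, "a"), (0.1, "b"), (0.2, "a"), (0.3, "a"),
         (0.4, "a"), (0.5, "b"), (0.6, "a")]
for ts, src in flag_floods(trace):
    print(f"{ts:.1f}s: suspicious burst from {src}")
```

Source "a" is flagged at 0.4s, the very packet that pushes it over the threshold, rather than hours later when logs are batch-analyzed.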

You might think that IBM, as a leader in big data analytics and IT solutions generally, would be interested in stream processing — and you’d be right. At IBM, we’ve been chartered with creating technologies designed not just to enable the concept of stream processing, but also to apply it in as many ways as possible, to maximize the value it can create for our clients and for the world.

InfoSphere Streams is the product that IBM built to host these advanced solutions, and it boasts both powerful programming expressivity and incredible performance. As an example, Streams applications in production at one client are handling ten billion messages a day with sub-second latencies. The implication is that sophisticated applications built on the Streams platform can handle the analysis of the largest data volumes and create actionable results on a continuous basis in very little time.

But it’s also important to understand just how versatile this solution is — a consequence of the fact that IBM gave the development team the freedom to build the best possible solution, from the ground up, in the pursuit of holistic excellence.

InfoSphere Streams, as a result, is much more than just another analytics tool. It is better understood as a platform and execution environment for stream processing analytics — one that can support as many different forms of stream processing as will be needed, and can combine their results to yield exceptionally sophisticated insights.

For instance, imagine leveraging social media, video, audio and stock data simultaneously, not just to understand how customer/analyst opinions are changing following a major product launch, but also to project the impact on the organization’s quarterly results and market capitalization. This is the sort of task that just a few years ago would have been nearly unimaginable for analytics solutions, but which InfoSphere Streams can make a practical reality.

Another example: the City of Dublin has already leveraged InfoSphere Streams to get much clearer insight into its traffic patterns, used both to provide daily travel time estimates for Dublin citizens and to help guide future urban development.

This illustrates the way stream processing has both immediate and long-term potential — potential that organizations are rapidly embracing as they become aware of the possibilities, which will only grow going forward.

