
Amazon To Developers: Real-Time Big Data For Everyone

By Adrian Bridgwater, November 18, 2013

A chance for developers to get creative with real-time data processing applications

Amazon Web Services has launched Amazon Kinesis, a developer-focused tool designed to handle real-time data as it is created. Kinesis also takes care of data replication and onward delivery on behalf of the applications that use it.

The name Kinesis derives from the term used for an animal's non-directional response to a stimulus; for example, humidity.

Described as essentially a big data tool by Andy Jassy, senior vice president of Amazon Web Services, Kinesis is a managed service designed to handle real-time streaming of big data. The tool is billed as capable of accepting any amount of data, from any number of sources, scaling up and down as needed.

The Kinesis client library is an important component of your application, says Amazon. "It handles the details of load balancing, coordination, and error handling. The client library will take care of the heavy lifting, allowing your application to focus on processing the data as it becomes available."

In terms of use and usefulness, Kinesis can collect and process hundreds of terabytes of data per hour from hundreds of thousands of sources. This means that developers will be able to write applications that process information in real time from sources such as website click-streams, social media, operational logs, metering data, or sensors for the Internet of Things.
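To put those figures in perspective, here is a rough back-of-the-envelope calculation. The inputs are assumptions chosen for illustration (exactly 100 terabytes per hour and exactly 100,000 sources, the low end of "hundreds" in each case, with decimal units); they are not numbers published by Amazon.

```python
# Assumed illustrative inputs, not published Amazon figures.
TERABYTE = 10**12                 # decimal terabyte, in bytes
bytes_per_hour = 100 * TERABYTE   # low end of "hundreds of terabytes per hour"
sources = 100_000                 # low end of "hundreds of thousands of sources"

# Aggregate rate across all sources.
bytes_per_second = bytes_per_hour / 3600
total_gb_per_s = bytes_per_second / 10**9

# Average rate each source would contribute if the load were spread evenly.
per_source_kb_per_s = bytes_per_second / sources / 10**3

print(f"aggregate: {total_gb_per_s:.2f} GB/s")
print(f"per source: {per_source_kb_per_s:.1f} KB/s")
```

Under these assumptions the service would be absorbing tens of gigabytes every second in aggregate, while each individual source sends a fairly modest trickle.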

According to Amazon, Kinesis provides developers with client libraries that enable the design and operation of real-time data processing applications. Just add the Amazon Kinesis Client Library to your Java application and it will be notified when new data is available for processing.
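The "it will be notified" model described here is a callback pattern: the application registers a function and the library calls it when records arrive. The sketch below illustrates that pattern only, using a toy in-memory stream. `ToyStream`, `subscribe`, `put`, and `process_records` are hypothetical names invented for this example; this is not the Amazon Kinesis Client Library API.

```python
from typing import Callable, List

RecordHandler = Callable[[List[bytes]], None]


class ToyStream:
    """In-memory stand-in for a stream. Hypothetical; not the Kinesis API."""

    def __init__(self) -> None:
        self._records: List[bytes] = []
        self._handlers: List[RecordHandler] = []

    def subscribe(self, handler: RecordHandler) -> None:
        # Register a callback to be invoked whenever new data arrives.
        self._handlers.append(handler)

    def put(self, data: bytes) -> None:
        # Store the record, then notify every registered handler.
        self._records.append(data)
        for handler in self._handlers:
            handler([data])


received: List[str] = []


def process_records(records: List[bytes]) -> None:
    # Application logic lives here; the "library" decides when to call it.
    for record in records:
        received.append(record.decode("utf-8").upper())


stream = ToyStream()
stream.subscribe(process_records)
stream.put(b"click:/home")
stream.put(b"click:/cart")
print(received)
```

The point of the design is that the application code contains no polling loop of its own; it only says what to do with each batch of records.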

Kinesis works by buffering incoming data and passing it onward into a storage system. This storage stage is built with checkpoints designed to help "ingest data" into the Redshift cloud data warehouse.

The AWS blog states, "Your application can create any number of Kinesis streams to reliably capture, store, and transport data. Streams have no intrinsic capacity or rate limits. All incoming data is replicated across multiple AWS Availability Zones for high availability. Each stream can have multiple writers and multiple readers."
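One way to picture "multiple writers and multiple readers" is an append-only log in which every reader tracks its own position. The sketch below is a toy in-memory model under that assumption; `ToyLog`, `append`, and `read` are hypothetical names, and nothing here reflects how Kinesis is implemented internally.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Entry = Tuple[str, str]  # (writer name, payload)


class ToyLog:
    """Append-only log with one checkpoint per reader. Hypothetical model."""

    def __init__(self) -> None:
        self._entries: List[Entry] = []
        self._checkpoints: Dict[str, int] = defaultdict(int)

    def append(self, writer: str, payload: str) -> int:
        # Any writer may append; the return value is the entry's position.
        self._entries.append((writer, payload))
        return len(self._entries) - 1

    def read(self, reader: str, limit: int = 10) -> List[Entry]:
        # Each reader resumes from its own checkpoint, independent of others.
        start = self._checkpoints[reader]
        batch = self._entries[start:start + limit]
        self._checkpoints[reader] = start + len(batch)
        return batch


log = ToyLog()
log.append("web", "click")
log.append("sensor", "temp=21")
log.append("web", "scroll")

first = log.read("analytics", limit=2)   # first two entries
second = log.read("analytics", limit=2)  # picks up where it left off
archive = log.read("archiver")           # a separate reader sees everything
print(first, second, archive, sep="\n")
```

Because the checkpoints are kept per reader, a slow consumer does not hold back a fast one, and neither affects what the writers can append.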

Amazon CTO Werner Vogels is quoted on VentureBeat explaining that he thinks the Hadoop cluster world is great for analytics, or for actually processing large amounts of data, but says "we need to make it easier for anyone to do real-time operations".

As VentureBeat's Jordan Novet reports, "As data streams in, Kinesis replicates it to three availability zones, or facilities that are separate from each other but close enough to provide low latency among each other. If suddenly a torrent of data comes in, Kinesis can scale up. If there's a quiet period, Kinesis can automatically scale down."
