Why Amazon created AWS Kinesis, its live data processing service

Amazon meters how much data its AWS customers consume so that companies pay only for what they use, explained Amazon’s Ryan Waite onstage at GigaOM’s Structure Data conference on Thursday. Because Amazon customers generate tens of billions of records daily — and lots of apps could benefit from access to that data — Amazon needed a system that could capture those tiny pieces of data in real time.

“This enables us to scale the metering service to new limits and give alerts in real time,” said Waite, who serves as general manager of data services for AWS.

Kinesis enabled Amazon to scale that metering pipeline from millions of files to billions of records — terabytes of data — per hour. All of the data is retained in Kinesis for 24 hours, and it can be offloaded to Amazon S3 for later analysis.

Waite characterized the service as “incredibly cheap” — a million PUT transactions cost a customer 2.8 cents — and immensely useful for AWS customers. He highlighted digital marketing company Bizo, which uses Kinesis to power real-time analytics of its clients’ online ad campaigns.
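To put that quoted rate in perspective, here is a rough back-of-envelope sketch (assuming the 2.8-cents-per-million-PUTs figure above and, hypothetically, one record per PUT transaction):

```python
# Back-of-envelope Kinesis ingest cost at the quoted rate:
# 2.8 cents per million PUT transactions (assumes one record per PUT).
COST_PER_MILLION_PUTS_USD = 0.028

def daily_put_cost(records_per_day: int) -> float:
    """Dollar cost of PUTting records_per_day records in one day."""
    return records_per_day / 1_000_000 * COST_PER_MILLION_PUTS_USD

# Even at ten billion records a day, PUTs come to a few hundred dollars.
print(f"${daily_put_cost(10_000_000_000):,.2f}")  # $280.00
```

At the tens-of-billions-of-records-per-day scale Waite described, that pricing puts daily ingest costs in the hundreds of dollars, which is what makes the “incredibly cheap” claim plausible.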

“With Kinesis, they can sit down side-by-side and watch the campaign as it’s happening,” said Waite.

Waite added, “They can see what’s working in the game and what’s not working in the game and make changes.”

Cloud infrastructure services have been rolling out more data tools for their services over the past couple of months, and Amazon has led the pack with Kinesis. Its nearest competitor is the open-source Apache Storm. By pointing to the value the service generates for customers, Amazon is pressuring competitors to either clone Kinesis — no easy task — or distinguish themselves with special data tools of their own.