AWS Data Ingestion Cost Comparison: Kinesis, AWS IoT, & S3

One question we often face at Trek10 when designing serverless AWS architectures is: what is the most cost-effective and efficient AWS platform service for ingesting data into a new system? Of course, you could always spin up some EC2 servers and pump your data into them, but at Trek10 we push hard to design new systems as “serverless-ly” as possible, using AWS platform services such as Lambda, DynamoDB, and S3 to their fullest extent.

Many use cases face this kind of challenge. IoT is one of the most obvious (getting data from “things” into the cloud), but there are many others: branch offices pushing data to a central system, a slow or “lazy load” migration from your data center, or an always-on integration between legacy environments and a new AWS environment.

So back to the challenge. There are multiple AWS services that are tailor-made for data ingestion, and it turns out that all of them can be the most cost-effective and well-suited in the right situation. We’ll try to break down the story for you here.

(Two brief caveats: this is not intended to be comprehensive; there are a huge number of possibilities, and we think these are the top few. Also, all pricing is for us-east-1.)

A Breakdown of Options

Kinesis Streams

A real-time streaming data queuing service. Kinesis Streams producer apps push data in, and consumer apps pull the data out to process it. AWS Lambda functions can act as consumers, so there is no need to run a server to process and store the data coming out of Kinesis Streams.
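A minimal sketch of the consumer side in Python, assuming the stream is wired to a Lambda function through an event source mapping. Lambda hands the function a batch of records in `event["Records"]`, with each payload base64-encoded under `kinesis.data`. The JSON payload format and the `process` helper are hypothetical placeholders for your own producer format and processing logic.

```python
import base64
import json


def process(reading):
    """Hypothetical processing step: replace with your own logic (e.g. a DynamoDB write)."""
    return reading


def handler(event, context):
    """Lambda entry point for a Kinesis Streams event source mapping."""
    results = []
    for record in event["Records"]:
        # Kinesis delivers each record's payload base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"])
        # Assumption: producers PUT UTF-8 encoded JSON documents.
        reading = json.loads(raw)
        results.append(process(reading))
    return {"processed": len(results)}
```

Because Lambda polls the stream on your behalf, the only code you own is the per-batch handler; scaling follows the number of shards.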

S3

The granddaddy of AWS services: object storage at scale. Because S3 offers read-after-write consistency, you can use it as an “in transit” stage of your ingestion pipeline, not just a final resting place for your data. We described an architecture like this in a previous post.

Pros: objects can be up to 5 TB; very, very simple

Cons: Additional services needed to do any processing
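To illustrate that extra processing service, here is a rough Python sketch of a Lambda function triggered by S3 object-created notifications, a common way to process objects as they land in an in-transit bucket. It only extracts the bucket and key (keys arrive URL-encoded in the notification, hence `unquote_plus`); the fetch-and-process step is left as a comment, and `object_locations` is a hypothetical helper name.

```python
from urllib.parse import unquote_plus


def object_locations(event):
    """Return (bucket, key) pairs from an S3 event notification."""
    locations = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in notifications (e.g. spaces arrive as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        locations.append((bucket, key))
    return locations


def handler(event, context):
    """Lambda entry point for s3:ObjectCreated:* notifications."""
    locations = object_locations(event)
    for bucket, key in locations:
        # Hypothetical next step: read the object (e.g. with boto3's
        # s3.get_object(Bucket=bucket, Key=key)), process it, and write
        # the result to its final destination.
        print(f"received s3://{bucket}/{key}")
    return len(locations)
```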

Everyone knows about S3 storage costs. We’ll ignore that here, as many of these services may end up archiving the data in S3 anyway. The question here is, how do direct S3 PUTs compare to the other options?

$0.005 per 1000 PUTs

One PUT can upload an object of up to 5 GB (multipart uploads let you push a single object of up to 5 TB)
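The request price above translates into a simple monthly estimate. This Python sketch assumes 730 hours per month and counts only PUT requests, ignoring storage and transfer as the text does; `s3_monthly_put_cost` is an illustrative name, not an AWS API.

```python
HOURS_PER_MONTH = 730           # assumption: average hours in a month
S3_USD_PER_1000_PUTS = 0.005    # us-east-1 PUT price quoted above


def s3_monthly_put_cost(puts_per_hour):
    """Monthly S3 request cost for a steady PUT rate (storage and transfer excluded)."""
    puts_per_month = puts_per_hour * HOURS_PER_MONTH
    return puts_per_month / 1000 * S3_USD_PER_1000_PUTS


for rate in (100_000, 1_000_000, 10_000_000):
    print(f"{rate:>12,} PUTs/hr -> ${s3_monthly_put_cost(rate):,.0f}/mo")
```

At these rates the request bill alone is about $365, $3,650, and $36,500 per month, whether each PUT carries 5 KB or 5 MB, since the per-request price does not depend on payload size.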

Rules of Thumb

We’ve run the numbers on all these options to compare costs across various ingestion profiles, in terms of both frequency and size of data. Below are a couple of general conclusions to help you make sense of it all:

If your data producers are power- or compute-constrained, you’ll probably need to use AWS IoT. If your ingestion costs are too high, consider AWS Greengrass to buffer and process data on the edge.
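Greengrass’s own APIs are not shown here. As a generic, hypothetical illustration of why buffering on the edge helps, this Python class packs many small readings into one upstream message, flushing on a count or an approximate byte cap. If your ingestion service bills per message or per request, fewer, larger messages shrink that part of the bill. `EdgeBuffer`, its thresholds, and the `send` callable (a stand-in for whatever publish call your edge runtime provides) are all assumptions.

```python
import json


class EdgeBuffer:
    """Hypothetical edge-side buffer that packs many small readings into one upstream message."""

    def __init__(self, send, max_items=100, max_bytes=100_000):
        self.send = send              # callable that publishes one batch upstream
        self.max_items = max_items    # flush after this many readings
        self.max_bytes = max_bytes    # approximate cap (ignores JSON list separators)
        self.items = []
        self.size = 0

    def add(self, reading):
        encoded_len = len(json.dumps(reading).encode("utf-8"))
        # Flush first if this reading would push the batch past the byte cap.
        if self.items and self.size + encoded_len > self.max_bytes:
            self.flush()
        self.items.append(reading)
        self.size += encoded_len
        if len(self.items) >= self.max_items:
            self.flush()

    def flush(self):
        # Send whatever is buffered as a single JSON array, then reset.
        if self.items:
            self.send(json.dumps(self.items))
            self.items = []
            self.size = 0


sent = []
buffer = EdgeBuffer(sent.append, max_items=100)
for i in range(250):
    buffer.add({"sensor": "s1", "seq": i})
buffer.flush()
print(len(sent), "upstream messages instead of 250")
```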

Below about 100k PUTs/hr at 50 KB per PUT, Streams, Firehose, and S3 all cost in the tens to low hundreds of dollars per month, so cost need not be a key design factor. Pick the service that best fits your architecture.
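A rough check of this rule of thumb in Python. The prices are assumptions: us-east-1 list prices of $0.015 per Streams shard-hour and $0.014 per million 25 KB PUT payload units, Firehose ingestion at $0.029/GB with each record rounded up to 5 KB, and the S3 PUT price quoted earlier. Verify them against current AWS pricing; the model also ignores retention, optional features, and downstream storage.

```python
import math

HOURS_PER_MONTH = 730

# Assumed us-east-1 list prices (check the current AWS pricing pages).
STREAMS_SHARD_HOUR_USD = 0.015
STREAMS_USD_PER_MILLION_PUT_UNITS = 0.014   # one PUT payload unit = 25 KB
FIREHOSE_USD_PER_GB = 0.029                 # each record rounded up to 5 KB
S3_USD_PER_1000_PUTS = 0.005


def streams_cost(puts_per_hour, kb_per_put):
    per_sec = puts_per_hour / 3600
    # A shard ingests up to 1 MB/s (treated as 1000 KB) or 1000 records/s.
    shards = max(1, math.ceil(per_sec * kb_per_put / 1000), math.ceil(per_sec / 1000))
    units = puts_per_hour * HOURS_PER_MONTH * math.ceil(kb_per_put / 25)
    return (shards * STREAMS_SHARD_HOUR_USD * HOURS_PER_MONTH
            + units / 1e6 * STREAMS_USD_PER_MILLION_PUT_UNITS)


def firehose_cost(puts_per_hour, kb_per_put):
    billed_kb = math.ceil(kb_per_put / 5) * 5
    gb = puts_per_hour * HOURS_PER_MONTH * billed_kb / 1e6
    return gb * FIREHOSE_USD_PER_GB


def s3_cost(puts_per_hour):
    return puts_per_hour * HOURS_PER_MONTH / 1000 * S3_USD_PER_1000_PUTS


rate, size_kb = 100_000, 50
for name, cost in (("Streams", streams_cost(rate, size_kb)),
                   ("Firehose", firehose_cost(rate, size_kb)),
                   ("S3", s3_cost(rate))):
    print(f"{name:<9} ${cost:,.0f}/mo")
```

Under these assumptions, 100k PUTs/hr at 50 KB comes to roughly $24/mo for Streams, $106/mo for Firehose, and $365/mo for S3.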

At high PUT volume with payloads in the low hundreds of KB or less, Kinesis Streams is the clear winner: 10M PUTs/hr at 5 KB costs only about $255/mo, and at 50 KB per PUT only about $1,700/mo!
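To see where those figures come from, this Python sketch splits a Streams bill into shard-hours and PUT payload units. It assumes $0.015 per shard-hour, $0.014 per million 25 KB PUT payload units, and per-shard ingest limits of 1 MB/s (treated as 1,000 KB) or 1,000 records/s; check these against current pricing and quotas. `streams_breakdown` is an illustrative name.

```python
import math

HOURS_PER_MONTH = 730
SHARD_HOUR_USD = 0.015             # assumed us-east-1 shard-hour price
USD_PER_MILLION_PUT_UNITS = 0.014  # assumed; one PUT payload unit covers 25 KB
SHARD_KB_PER_SEC = 1000            # 1 MB/s ingest per shard (1 MB treated as 1000 KB)
SHARD_RECORDS_PER_SEC = 1000       # 1000 records/s ingest per shard


def streams_breakdown(puts_per_hour, kb_per_put):
    per_sec = puts_per_hour / 3600
    # Provision enough shards for whichever limit (bytes or records) binds first.
    shards = max(1,
                 math.ceil(per_sec * kb_per_put / SHARD_KB_PER_SEC),
                 math.ceil(per_sec / SHARD_RECORDS_PER_SEC))
    shard_usd = shards * SHARD_HOUR_USD * HOURS_PER_MONTH
    # Each record is billed in 25 KB payload units, rounded up.
    units = puts_per_hour * HOURS_PER_MONTH * math.ceil(kb_per_put / 25)
    put_usd = units / 1e6 * USD_PER_MILLION_PUT_UNITS
    return {"shards": shards, "shard_usd": shard_usd,
            "put_usd": put_usd, "total_usd": shard_usd + put_usd}


for kb in (5, 50):
    b = streams_breakdown(10_000_000, kb)
    print(f"{kb} KB: {b['shards']} shards, ${b['shard_usd']:,.0f} shard-hours "
          f"+ ${b['put_usd']:,.0f} PUT units = ${b['total_usd']:,.0f}/mo")
```

With these inputs, 10M PUTs/hr at 5 KB needs 14 shards and totals about $255/mo; at 50 KB it needs 139 shards and totals about $1,726/mo, in line with the figures above. For comparison, the same 10M PUTs/hr sent directly to S3 would cost about $36,500/mo in requests alone.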

Firehose is very competitive at most volumes, but its cost diverges from Streams once you reach tens of TBs per month. Below that, select Firehose if its simplicity and processing model suit your needs.
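The divergence can be sketched by sweeping volume at a fixed 5 KB record size. This Python assumes Firehose ingestion at $0.029/GB with each record rounded up to 5 KB, and Streams at $0.015 per shard-hour plus $0.014 per million 25 KB PUT payload units, with 1 MB/s or 1,000 records/s per shard; treat the output as illustrative rather than a quote.

```python
import math

HOURS_PER_MONTH = 730
FIREHOSE_USD_PER_GB = 0.029         # assumed us-east-1 ingestion price, 5 KB rounding
SHARD_HOUR_USD = 0.015              # assumed Streams shard-hour price
USD_PER_MILLION_PUT_UNITS = 0.014   # assumed Streams price per million 25 KB units


def firehose_cost(puts_per_hour, kb_per_put):
    billed_kb = math.ceil(kb_per_put / 5) * 5
    return puts_per_hour * HOURS_PER_MONTH * billed_kb / 1e6 * FIREHOSE_USD_PER_GB


def streams_cost(puts_per_hour, kb_per_put):
    per_sec = puts_per_hour / 3600
    shards = max(1, math.ceil(per_sec * kb_per_put / 1000), math.ceil(per_sec / 1000))
    units = puts_per_hour * HOURS_PER_MONTH * math.ceil(kb_per_put / 25)
    return shards * SHARD_HOUR_USD * HOURS_PER_MONTH + units / 1e6 * USD_PER_MILLION_PUT_UNITS


def monthly_tb(puts_per_hour, kb_per_put):
    # Decimal units: 1 TB = 1e9 KB.
    return puts_per_hour * HOURS_PER_MONTH * kb_per_put / 1e9


for rate in (100_000, 1_000_000, 10_000_000):
    fh, ks = firehose_cost(rate, 5), streams_cost(rate, 5)
    print(f"{monthly_tb(rate, 5):6.2f} TB/mo  Firehose ${fh:>8,.0f}  "
          f"Streams ${ks:>6,.0f}  gap ${fh - ks:>6,.0f}")
```

Under these assumptions, at 0.365 TB/mo Firehose is actually slightly cheaper (about $11 vs $12), because Streams bills at least one shard around the clock; at 3.65 TB/mo the gap is about $74/mo, and at 36.5 TB/mo it grows to about $800/mo.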

At high PUT volume with small payloads (tens of KB), S3 is very uncompetitive. But as payloads approach the 1 MB Kinesis record limit, S3 costs look similar: 1M PUTs/hr at 1 MB costs about $3,400/mo with Streams versus $3,600/mo with S3. So for larger objects, consider S3 if its simplicity suits your needs. And above 1 MB, S3 is your only option.
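A small Python sketch of the crossover, holding the rate at 1M PUTs/hr and varying payload size. Prices and limits are assumptions: Streams at $0.015 per shard-hour plus $0.014 per million 25 KB PUT payload units with 1 MB/s per shard, the S3 PUT price quoted earlier, and the 1 MB Kinesis record limit treated as 1,000 KB. `cheaper_option` is an illustrative helper.

```python
import math

HOURS_PER_MONTH = 730
SHARD_HOUR_USD = 0.015              # assumed Streams shard-hour price
USD_PER_MILLION_PUT_UNITS = 0.014   # assumed Streams price per million 25 KB units
S3_USD_PER_1000_PUTS = 0.005        # S3 PUT price quoted earlier
KINESIS_MAX_RECORD_KB = 1000        # 1 MB record limit, treated as 1000 KB


def streams_cost(puts_per_hour, kb_per_put):
    per_sec = puts_per_hour / 3600
    # Shards needed for the 1 MB/s or 1000 records/s per-shard ingest limit.
    shards = max(1, math.ceil(per_sec * kb_per_put / 1000), math.ceil(per_sec / 1000))
    units = puts_per_hour * HOURS_PER_MONTH * math.ceil(kb_per_put / 25)
    return shards * SHARD_HOUR_USD * HOURS_PER_MONTH + units / 1e6 * USD_PER_MILLION_PUT_UNITS


def s3_cost(puts_per_hour):
    # S3 request pricing is per PUT, independent of payload size.
    return puts_per_hour * HOURS_PER_MONTH / 1000 * S3_USD_PER_1000_PUTS


def cheaper_option(puts_per_hour, kb_per_put):
    if kb_per_put > KINESIS_MAX_RECORD_KB:
        return "S3"  # the payload does not fit in a single Kinesis record
    return "Streams" if streams_cost(puts_per_hour, kb_per_put) < s3_cost(puts_per_hour) else "S3"


rate = 1_000_000
for kb in (50, 500, 1000, 2000):
    if kb <= KINESIS_MAX_RECORD_KB:
        streams = f"${streams_cost(rate, kb):,.0f}"
    else:
        streams = "n/a"
    print(f"{kb:>5} KB  Streams {streams:>7}  S3 ${s3_cost(rate):,.0f}  -> {cheaper_option(rate, kb)}")
```

Under these assumptions, Streams costs roughly $174/mo at 50 KB, $1,726/mo at 500 KB, and $3,453/mo at 1 MB, against a flat $3,650/mo for S3; above the record limit, S3 is the only option of the two.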

I hope this is useful. Hit us up on Twitter @trek10inc if you have any questions or ideas of your own about AWS data ingestion options!