Step 7: Finishing Up

Because you are paying to use the Kinesis stream, make sure you delete it and the
corresponding DynamoDB table once you are done with them. Nominal charges accrue on an
active stream even when you aren't sending and getting records, because an active
stream uses resources by continuously "listening" for incoming records and for
requests to get records.

To delete the stream and table

1. Shut down any producers and consumers that you may still have running.
2. Delete the Kinesis stream.
3. Delete the corresponding DynamoDB table.

Summary

Processing a large amount of data in near real time doesn't require writing any
magical code or developing a huge infrastructure. It is as simple as writing logic
to process a small amount of data (like writing processRecord(Record)), then using
Kinesis Streams to scale that logic so it works for a large amount of streaming data.
You don't have to worry about how your processing would scale, because Kinesis Streams
handles it for you. All you have to do is send your streaming records to Kinesis
Streams and write the logic to process each new record received.
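The per-record logic the summary alludes to can be sketched as a small function. This is a hedged illustration, not code from the tutorial: the record payload format and the "ticker" field are assumptions made for the example.

```python
import json

def process_record(record_data: bytes, stats: dict) -> None:
    """Process one Kinesis record: parse its payload and update running stats.

    The JSON layout (a "ticker" field) is a hypothetical example, not
    something the stream itself guarantees.
    """
    trade = json.loads(record_data)
    ticker = trade["ticker"]
    stats[ticker] = stats.get(ticker, 0) + 1

# A worker would call this for every record it receives from its shard:
stats = {}
for payload in [b'{"ticker": "AMZN"}', b'{"ticker": "AMZN"}', b'{"ticker": "INTC"}']:
    process_record(payload, stats)
print(stats)  # {'AMZN': 2, 'INTC': 1}
```

Kinesis Streams handles the fan-out; your application only ever sees one record at a time in a function like this.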

Here are some potential enhancements for this application.

Aggregate across all shards

Currently, the stats you get are aggregated from the data records received by a
single worker from a single shard (a shard cannot be processed by more than one
worker in a single application at the same time). When you scale and have more
than one shard, you may want to aggregate across all shards. You can do this with
a pipeline architecture: the output of each worker is fed into a second stream
with a single shard, which is processed by a worker that aggregates the outputs
of the first stage. Because the data volume from the first stage is limited (one
sample per minute per shard), it is easily handled by one shard.
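The second stage of that pipeline can be sketched as follows: each first-stage worker emits a per-shard summary, and the single second-stage worker merges them into a global view. A minimal sketch; the shape of the summaries is an assumption made for illustration.

```python
from collections import Counter

def merge_shard_summaries(summaries):
    """Second-stage worker: combine per-shard counts into one global view.

    Each summary is what a first-stage worker would send to the
    single-shard aggregation stream (shape assumed for illustration).
    """
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return dict(total)

# One summary per shard per minute -- a tiny volume, easily one shard's worth.
shard_summaries = [
    {"AMZN": 10, "INTC": 3},   # from the worker on shard 0
    {"AMZN": 7, "MSFT": 5},    # from the worker on shard 1
]
print(merge_shard_summaries(shard_summaries))
# {'AMZN': 17, 'INTC': 3, 'MSFT': 5}
```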

Scale processing

When the stream scales up to have many shards (because many producers
are sending data), the way to scale the processing is to add more
workers. You can run the workers in EC2 instances and leverage Auto Scaling
groups.
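The scaling model here, where each shard is processed by exactly one worker and adding workers means fewer shards per worker, can be illustrated with a simple round-robin assignment. This standalone sketch only mimics the balancing; in practice the Kinesis Client Library coordinates shard leases for you through a DynamoDB table.

```python
def assign_shards(shard_ids, worker_ids):
    """Round-robin shards across workers so each shard has exactly one owner.

    Illustration only: the Kinesis Client Library actually manages this
    assignment via shard leases in DynamoDB.
    """
    assignment = {worker: [] for worker in worker_ids}
    for i, shard in enumerate(shard_ids):
        assignment[worker_ids[i % len(worker_ids)]].append(shard)
    return assignment

shards = ["shardId-0", "shardId-1", "shardId-2", "shardId-3"]
print(assign_shards(shards, ["worker-a", "worker-b"]))
# {'worker-a': ['shardId-0', 'shardId-2'], 'worker-b': ['shardId-1', 'shardId-3']}
```

Adding a worker (for example, when an Auto Scaling group launches a new instance) simply redistributes the shards, which is why more workers translate directly into more processing capacity.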

Leverage connectors to S3/DynamoDB/Redshift/Storm

As a stream is continuously processed, its output can be sent to other
destinations. AWS provides connectors to integrate Kinesis Streams with other AWS services and
third-party tools.
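Connectors typically follow a buffer-and-flush pattern: accumulate records until a threshold is reached, then emit the batch to the destination. A hedged sketch of that pattern, with the destination call stubbed out (a real connector would write each batch to S3, DynamoDB, or Redshift; the threshold of 3 is arbitrary for demonstration):

```python
class BufferedEmitter:
    """Buffer records and emit them in batches, as Kinesis connectors do.

    `emit` is a stand-in for the real destination call (e.g. an S3 PUT);
    the flush threshold of 3 is arbitrary for demonstration.
    """
    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.buffer = []
        self.batches_emitted = []

    def add(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if self.buffer:
            self.emit(list(self.buffer))
            self.buffer.clear()

    def emit(self, batch):
        # Stub: a real connector would write this batch to its destination.
        self.batches_emitted.append(batch)

emitter = BufferedEmitter()
for i in range(7):
    emitter.add(f"record-{i}")
emitter.flush()  # drain whatever is left at shutdown
print([len(b) for b in emitter.batches_emitted])  # [3, 3, 1]
```

Batching this way keeps the number of writes to the downstream destination small even when the stream itself carries a high record rate.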