
Sunday, July 20, 2014

Lambda Architecture Overview

Nathan Marz and his team designed a generic, scalable, and fault-tolerant data (#bigdata) processing architecture called the Lambda Architecture (LA), based on his experience with distributed data processing challenges at BackType and Twitter.

The Lambda Architecture's design goals are a robust system that tolerates both human errors and hardware failures, the ability to serve a wide range of use cases and workloads with low (near real-time) latency, and the scalability to grow with the data.

The Lambda Architecture has three layers:

1. Batch layer:

It has two functions: managing the master dataset and precomputing the batch views. The batch layer uses HDFS to store the master dataset and MapReduce to precompute the batch views.
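To make the two batch-layer functions concrete, here is a minimal sketch in plain Python, assuming a hypothetical page-view use case. An in-memory, append-only list stands in for the master dataset in HDFS, and two small functions imitate the map and reduce phases of a MapReduce job that precomputes a "page views per URL" batch view. All names are illustrative; this is not Hadoop code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event type: (url, timestamp). In a real deployment the master
# dataset would live in HDFS; here an in-memory list stands in for it.
Event = Tuple[str, int]

master_dataset: List[Event] = []


def append_event(event: Event) -> None:
    """The master dataset is append-only: events are added, never updated or deleted."""
    master_dataset.append(event)


def map_phase(events: Iterable[Event]) -> Iterable[Tuple[str, int]]:
    """Emit (url, 1) for every event, as a MapReduce mapper would."""
    for url, _timestamp in events:
        yield url, 1


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Group by url and sum the values, as a MapReduce reducer would."""
    counts: Dict[str, int] = defaultdict(int)
    for url, n in pairs:
        counts[url] += n
    return dict(counts)


def compute_batch_view(events: Iterable[Event]) -> Dict[str, int]:
    """Recompute the whole batch view from scratch over the full master dataset."""
    return reduce_phase(map_phase(events))


for e in [("/home", 1), ("/about", 2), ("/home", 3)]:
    append_event(e)

batch_view = compute_batch_view(master_dataset)
print(batch_view)  # {'/home': 2, '/about': 1}
```

Because the batch view is always recomputed from the immutable master dataset, a bug in the batch job can be fixed and the view regenerated, which is one way the architecture tolerates human errors.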

2. Speed layer:

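This layer is responsible for (near) real-time data processing. It compensates for the high latency of the batch layer by processing only the recent data that the batch views do not yet cover. Low-latency systems such as Apache Storm belong in this layer and compute these realtime views with minimal latency.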
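The speed layer's incremental style can be sketched in plain Python as follows. This is an illustrative stand-in for the state a Storm topology might maintain, not Storm's API; the `RealtimeView` class and its methods are hypothetical. Each event updates the view in place rather than triggering a recomputation. Once a batch run has absorbed events up to some timestamp, the corresponding realtime state is discarded. Bucketing by raw timestamp is an assumption made for brevity; a real system would likely use coarser time buckets.

```python
from collections import defaultdict
from typing import Dict, Tuple

Event = Tuple[str, int]  # hypothetical (url, timestamp) event


class RealtimeView:
    """Incrementally maintained page-view counts for recent events (illustrative only)."""

    def __init__(self) -> None:
        # timestamp bucket -> url -> count
        self._buckets: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def update(self, event: Event) -> None:
        # Update the view in place as each event arrives, instead of recomputing.
        url, timestamp = event
        self._buckets[timestamp][url] += 1

    def expire_up_to(self, batch_cutoff: int) -> None:
        """Drop state already covered by a batch run that included events up to batch_cutoff."""
        for ts in [t for t in self._buckets if t <= batch_cutoff]:
            del self._buckets[ts]

    def count(self, url: str) -> int:
        # Sum the url's count across all buckets that are still retained.
        return sum(bucket.get(url, 0) for bucket in self._buckets.values())


rt = RealtimeView()
for e in [("/home", 4), ("/home", 5), ("/about", 5)]:
    rt.update(e)

print(rt.count("/home"))  # 2
rt.expire_up_to(4)        # a batch run has now covered events up to timestamp 4
print(rt.count("/home"))  # 1
```

Keeping the realtime state small and disposable is the design point: if it is ever corrupted, the next batch run makes it unnecessary.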

3. Serving layer:

This can be any NoSQL database or indexing engine that can index the batch views, merge the outputs of the batch and speed layers, and answer ad-hoc queries on that data.
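The merge step can be sketched in plain Python, assuming both views are simple url-to-count mappings. In practice the batch view would be loaded from the serving database. The function names are hypothetical. Summing is only valid here because page-view counts are additive and the two views are assumed to cover disjoint time ranges; other metrics (for example, unique visitors) would need a different merge function.

```python
from typing import Dict


def query_pageviews(url: str, batch_view: Dict[str, int], realtime_view: Dict[str, int]) -> int:
    """Answer a single query by combining the batch view with the realtime view."""
    return batch_view.get(url, 0) + realtime_view.get(url, 0)


def merged_view(batch_view: Dict[str, int], realtime_view: Dict[str, int]) -> Dict[str, int]:
    """Build a combined view that ad-hoc queries (filters, top-N, etc.) can run against."""
    merged = dict(batch_view)
    for url, n in realtime_view.items():
        merged[url] = merged.get(url, 0) + n
    return merged


batch_view = {"/home": 2, "/about": 1}       # precomputed by the batch layer
realtime_view = {"/home": 1, "/contact": 1}  # recent data from the speed layer

print(query_pageviews("/home", batch_view, realtime_view))  # 3
print(merged_view(batch_view, realtime_view))  # {'/home': 3, '/about': 1, '/contact': 1}
```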