
Scrumptious Open-Source Cuisines in the Big Data Landscape 2016

A food critic, a traveler, or an adrenaline junkie would never limit themselves to a single cuisine. Food is more than a day-to-day habit. It is a culture and a reflection of the place you come from. If you want to get involved with a different community, you have to taste its food first, which is why a backpacker will never hold back from trying a new cuisine on the other side of the world.

In this part of the big data landscape, I would love to showcase the most scrumptious open-source cuisines from around the world. How about that?

Framework: An Italian Pizza

In the big data context, these umbrella frameworks provide distributed storage and distributed processing of very large data sets in a computing environment built from commodity hardware. They typically include modules such as a distributed file system, a job scheduler, a resource manager, a streaming data processor, and MapReduce.
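To make the MapReduce module a little more concrete, here is a minimal sketch of the map, shuffle, and reduce steps in plain Python. It runs in a single process and is not Hadoop's actual API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are my own, chosen only for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["pizza pasta pizza", "pasta risotto"]))
```

In a real framework the map and reduce calls would be spread across many machines, and the shuffle would move data over the network. The sketch only shows how the three steps fit together.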

Popular Products:

Hadoop HDFS

Hadoop MapReduce

YARN

Spark

Mesos

Tez

Flink

CDAP

Apache Kylin

Query/Data Flow: The Chinese Roasted Duck

These are query engines that let you structure data and query it using a SQL-like language. For instance, Google Cloud Dataflow is a unified programming model and a managed service for developing and executing a wide range of data processing patterns, including ETL, batch computation, and continuous computation.
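As a small taste of what "structuring data and querying it with a SQL-like language" looks like, here is a sketch that uses Python's built-in `sqlite3` module as a stand-in. It is not Hive, Drill, or Dataflow, and the table, the sample rows, and the function name `average_rating_by_country` are all made up for illustration.

```python
import sqlite3

# Hypothetical sample data: (dish, country, rating) rows.
ROWS = [
    ("pizza", "Italy", 9),
    ("roasted duck", "China", 8),
    ("baguette", "France", 7),
    ("risotto", "Italy", 6),
]


def average_rating_by_country(rows):
    """Load rows into an in-memory table and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        # Give the raw rows a structure (a schema) first.
        conn.execute(
            "CREATE TABLE dishes (name TEXT, country TEXT, rating INTEGER)"
        )
        conn.executemany("INSERT INTO dishes VALUES (?, ?, ?)", rows)
        # Then ask a declarative question instead of writing loops.
        cursor = conn.execute(
            "SELECT country, AVG(rating) FROM dishes "
            "GROUP BY country ORDER BY country"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for country, rating in average_rating_by_country(ROWS):
        print(country, rating)
```

The big data query engines apply the same idea at a much larger scale: you describe the result you want, and the engine decides how to distribute the work.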

Popular Products:

SlamData

Apache Hive

Apache Drill

Google Cloud Dataflow

Data Access: A French Baguette

This category comprises a) non-relational, distributed, scalable, high-performance big data stores such as HBase and MongoDB, and b) frameworks that facilitate the collection and storage of data in real time, such as Flume and Kafka.

Popular Products:

Cassandra

CouchDB

Apache HBase

Flume

Accumulo

MongoDB

Kafka

NiFi

Sqoop

SciDB

OpenTSDB

Riak

Coordination: A Spanish Prawn

Data coordination is the combination of technical and business processes used to combine data from disparate sources into meaningful and valuable information. A complete data coordination solution delivers trusted data from a variety of sources.

Popular Products:

Talend

Oozie

Apache Zookeeper

Apache Ambari

Real-Time: Japanese Sushi

Real-time analytics is the use of, or the capacity to use, all available enterprise data and resources when they are needed. It consists of dynamic analysis and reporting, based on data entered into a system less than one minute before the actual time of use.
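The "less than one minute" idea can be sketched as a sliding time window that only ever reports on recent events. This is a simplified single-process illustration, not how Storm, Spark, or Flink implement windowing. The class name `RecentWindow` and its methods are hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import deque


class RecentWindow:
    """Keep only events that arrived within the last `window_seconds`."""

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        """Record an event; timestamps are assumed to be non-decreasing."""
        self._events.append((timestamp, value))

    def _evict(self, now):
        # Drop events that are window_seconds old or older.
        while self._events and now - self._events[0][0] >= self.window_seconds:
            self._events.popleft()

    def total(self, now):
        """Sum the values of all events still inside the window at `now`."""
        self._evict(now)
        return sum(value for _, value in self._events)


if __name__ == "__main__":
    window = RecentWindow(window_seconds=60.0)
    window.add(0, 5)
    window.add(30, 7)
    window.add(70, 1)
    print(window.total(now=75))
```

A deque keeps eviction cheap because stale events are always at the front, which is why the in-order assumption matters here.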

Popular Products:

Storm

Spark

Flink

Apex

Tachyon

Druid

Stat Tools: Indian Veg Curries

Statistics is an important part of big data analytics, required to build and interpret appropriate models given data that is usually huge and complicated. This includes a wide collection of data mining and machine learning topics, ranging from regularization, support vector machines, and boosting to more recent topics such as network analysis, recommendation systems, and digitized advertising. These tools make these concepts easy to implement and are specifically capable of handling mammoth data volumes.
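To give a flavor of one of these topics, here is a tiny sketch of regularization: a one-variable least-squares fit through the origin with an L2 (ridge) penalty on the slope. It is plain Python rather than R, NumPy, or SciPy, and the function name `ridge_slope` and the sample numbers are my own illustration.

```python
def ridge_slope(xs, ys, penalty=0.0):
    """Slope of a line through the origin, fitted with an L2 penalty.

    Minimises sum((y - b * x) ** 2) + penalty * b ** 2, which has the
    closed-form solution b = sum(x * y) / (sum(x * x) + penalty).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Raises ZeroDivisionError if every x is 0 and penalty is 0.
    return sxy / (sxx + penalty)


if __name__ == "__main__":
    xs = [1, 2, 3]
    ys = [2, 4, 6]
    print(ridge_slope(xs, ys))              # no penalty: ordinary least squares
    print(ridge_slope(xs, ys, penalty=14))  # penalty shrinks the slope toward 0
```

The penalty trades a little bias for stability: the larger it is, the more the fitted slope is pulled toward zero, which helps when data is noisy or variables are many.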

Popular Products:

R

Scala

NumPy

SciPy

Machine Learning: A Greek Salad

Machine learning delivers on the promise of extracting value from big and disparate data sources with far less reliance on human direction. It is data-driven and runs at machine scale. It is well suited to the complexity of dealing with disparate data sources and the huge variety of variables and amounts of data involved. And unlike traditional analysis, machine learning thrives on growing datasets. The more data fed into a machine learning system, the more it can learn and the higher the quality of the insights it can produce.

Data visualization is the presentation of data in a pictorial or graphical format. It enables decision makers to see analytics presented visually, so they can grasp difficult concepts or identify new patterns. With interactive visualization, you can take the concept a step further by using technology to drill down into charts and graphs.

Popular Product:

Zeppelin

How was this mouth-watering range of open-source international cuisines? I hope you grabbed a few bites. Stay tuned for the last portion of the Big Data Landscape 2016.