Saturday, March 15, 2014

Succeeding with Big Data Projects: The Secret Sauce

The architectures and software frameworks used for big data projects are constantly evolving. Modern data lakes consistently use Storm for real-time streaming, NoSQL databases like HBase, Accumulo and Cassandra for low-latency data access, and Kafka for messaging. Open source software such as CentOS, MySQL, Ganglia and Nagios is penetrating deeper into large enterprises. I am also seeing Python and JavaScript become more popular. Linux containers and Docker are being evaluated as a way to increase hardware consolidation and utilization in the future.

Over the next two years we will see a blending of SQL and NoSQL databases. The Stinger project (Hive optimization and Tez) has brought interactive query capability to Hadoop's batch processing environment. This means the way organizations use Hadoop is changing quickly as well. Real-time query and ACID capabilities are next on the list of customer requests. As data lakes define the modern data architecture platform and more and more data gets stored in Hadoop, organizations want to use that data in many different ways.

Successful big data projects follow consistent patterns (the secret sauce). Technical infrastructure teams will be able to work with vendors to get the right hardware, stand up big data platforms and maintain them. However, big data projects can easily become science projects if the following areas are not addressed:

Thought leadership that creates cultural change so an organization can innovate successfully. Big data is about making better business decisions faster and with greater accuracy. There needs to be a sense of urgency.

An environment of collaboration and teamwork where everyone believes in a shared vision. The modern data lake helps eliminate many of the technology and data silos that exist across different platforms and business units. Successful big data project environments also eliminate the social, territorial and political silos that often exist in traditional teams.

A strong emphasis on data/schema design and ETL reference architectures. It's still all about the data. :)

The ability to build a plane while flying it. Big data technologies, environments, frameworks and methodologies are evolving quickly. Organizations need to be able to adapt and learn fast.

As Carl Sagan said, "Extinction is the rule. Survival is the exception." Transforming an organization to embrace big data is one of the biggest challenges it can face. Everyone is concerned with developing the technical skills needed to succeed with big data; however, developing the organization's own people is just as important.