Eight Things to Watch for in Big Data

In this special guest feature, Tom Phelan, co-founder and Chief Architect of BlueData, makes some compelling projections about the big data industry over the next year. Tom has spent the last 25 years as a senior architect, developer, and team lead in the Silicon Valley software industry. Before co-founding BlueData, Tom spent 10 years at VMware as a senior architect and team lead in the core R&D Storage and Availability group. Most recently, he led vFlash, one of the group's key projects, which focused on integrating server-based Flash into the vSphere core hypervisor. Before VMware, Tom was part of the early team at Silicon Graphics that developed XFS, one of the most successful open source file systems. Earlier in his career, he was a key member of the Stratus team that ported the Unix operating system to the company's highly available computing platform.

Over the next year, a growing number of customers will realize the substantial business benefits of Big Data and will deploy Big Data solutions across their organizations. Technical innovations, the rise of Big-Data-as-a-Service (BDaaS), a shifting approach to data locality, platform convergence, and other trends will help drive adoption. The following are eight key things to watch for in the coming months.

Big Data across the enterprise. In 2016, Hadoop and Spark will move beyond pilot and departmental deployments to enterprise-scale production environments. For the organizations making this move, Big Data will transition from a "science project" experiment among a handful of users to a business-critical initiative across the enterprise. Strong data security, multi-tenancy, and resource QoS control will be mandatory.

Spark surpasses MapReduce. Spark was red hot in 2015, and its adoption will continue to accelerate throughout 2016. Spark will continue to replace MapReduce as Hadoop's general-purpose computation engine, driven by Spark's fast in-memory computation and its popularity with data scientists.

Big Data platforms converge. Processing technologies like Hadoop and Spark will increasingly converge with NoSQL platforms. We will see organizations use Hadoop and MongoDB together, or Spark and Cassandra together, rather than just one or the other. Enterprises will no longer commit exclusively to a Hadoop approach or a Cassandra approach, but will instead embrace a combination of data platforms to enable a more cohesive Big Data strategy.

More focus on solutions, not just tools. In 2016 there will be an increasing focus on integrated Big Data solutions, not just point products, services, and tools. Organizations will look to combine and integrate their tools and platforms for information management, analytics, search, and other applications. For example, we expect to see more deployments weaving together Spark, Kafka, and Cassandra for streaming Big Data analytics.

Hadoop data locality becomes obsolete. Throughout 2015, more and more organizations realized the value of separating Hadoop compute from storage. Cloud service providers like AWS and Microsoft, as well as infrastructure vendors like EMC and HP, have demonstrated the flexibility and efficiency benefits of scaling Hadoop compute and storage independently. In 2016, the Hadoop community will recognize that data locality is no longer required for optimal performance.

Storage innovation propels Big Data. This year, we'll see the emergence of new data center infrastructure innovations in fast storage and in-memory technology for Big Data. SSD innovations will make new computing capabilities available to Spark, while new memory technologies will offer persistent storage at the speed of dynamic RAM. As these innovations gain a foothold in the data center, new analytics algorithms will emerge and Big Data processing deployments will change significantly.

Containers take off. The popularity of containers (and Docker in particular) has continued to surge, and the container revolution will continue in 2016. Docker containers will be embraced by enterprise IT infrastructure teams, not just application developers, and this adoption will extend to distributed workloads like Hadoop and Spark. However, the debate over container orchestration options (such as Mesos and Kubernetes) won't be settled until 2017 or 2018.

BDaaS booms. Public cloud vendors like AWS, Google, Microsoft, and IBM will introduce more Big-Data-as-a-Service offerings this year. We'll also see growing interest in on-premises BDaaS among enterprise IT organizations, especially in highly regulated industries such as financial services, government, and healthcare. This will lead to new solutions that enable hybrid cloud support for Big Data, using technologies such as containers for compute workload portability. In 2016, we'll begin to see Big Data implementations that experiment with using the public cloud for compute while keeping their data on-premises.