The 5 Foundational Characteristics of a Next-Gen Big Data Platform

As a big data researcher and practitioner, I've witnessed a dramatic divergence in the form and scope of enterprise computing applications over the last decade. What's become apparent is that enterprise IT organizations are now divided into two distinctly different camps.

On the one hand, legacy systems -- characterized by data warehouse deployments, ETL systems, business intelligence, and relational databases -- continue to carry out mission-critical business processing applications and need to be maintained. On the other hand, there is a growing vanguard of next-gen applications whose technical requirements simply exceed what is possible using legacy technologies. These next-gen applications are not just offloading existing workloads onto cheaper systems; they are net-new applications seeking untapped sources of competitive business advantage, whether through new or incremental revenue streams, increased profit margins, or cost savings from optimization.

Next-gen applications are qualitatively different from traditional enterprise computing solutions. They range from machine-learning algorithms for real-time fraud detection to high-frequency ad placement systems, from connected car and advanced driver assistance applications to large-scale genome and health record analysis for cancer research.

These applications are each characterized by technical challenges that preclude the use of legacy systems: massive data volumes, high-velocity data flows, the aggregation of geographically disparate systems, or some combination of these that traditional enterprise platforms cannot support. As a result, a crop of next-gen technologies has emerged to support these growing requirements. These technologies embody a suite of popular new computational platforms and paradigms, including cloud and hybrid cloud architectures, big data platforms, IaaS/PaaS/SaaS, microservices, agile development frameworks, and NoSQL datastores.

In the midst of all of the buzz and one-off success stories about the development of next-gen technologies and applications, it can be difficult to predict which technology investments today will pay off for years to come. As technologies mature and best practices are established, which technologies and paradigms will dominate enterprise computing for the next 10, 20, or even 30 years?

In my research spanning real-world use cases across hundreds of enterprises and supercomputing research initiatives, I am often asked to describe the technology landscape from a 30,000-foot level in order to identify the disruptors as well as the sustainable trends. What's become clear to me is that the greatest value-add over time comes from the data platform itself. Applications and requirements may evolve over time, so your best bet is to choose a big data platform that can grow with you and has the capability to support a broad array of next-gen applications. An enterprise-grade next-gen platform must, therefore, exhibit several capabilities and features in order to sustain innovation at the application layer.

Here are my five foundational principles of a next-gen big data platform.

1. The platform must have a massive, multi-temperature, reliable global data storage layer with a global namespace.

The trend of ever-increasing data volumes is only going to become more pronounced, so a cost-efficient and scalable storage layer is foremost: one that can ingest massive volumes of data at a high rate, retrieve relevant pieces of data quickly, support sharing of data, and present multiple disparate datasets together for use in a single application.
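To make the "global namespace" idea concrete, here is a minimal, purely illustrative sketch. All paths and dataset names are hypothetical; it stands in a temporary directory for the single logical root under which a real platform would expose physically distributed datasets side by side:

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical global namespace: one logical root under which disparate
# datasets live side by side, regardless of where they are physically stored.
root = Path(tempfile.mkdtemp())

# Two disparate datasets sharing the namespace (illustrative names).
(root / "sales").mkdir()
(root / "telemetry").mkdir()
with open(root / "sales" / "q1.csv", "w", newline="") as f:
    csv.writer(f).writerows([["region", "revenue"], ["east", "100"]])
(root / "telemetry" / "events.json").write_text(
    json.dumps([{"region": "east", "errors": 3}])
)

# A single application reads both datasets through the same namespace
# and presents them together.
with open(root / "sales" / "q1.csv") as f:
    sales = {r["region"]: int(r["revenue"]) for r in csv.DictReader(f)}
events = {e["region"]: e["errors"]
          for e in json.loads((root / "telemetry" / "events.json").read_text())}

combined = {region: (sales[region], events.get(region, 0)) for region in sales}
print(combined)  # {'east': (100, 3)}
```

The point of the sketch is the access pattern, not the storage: both datasets are addressed relative to one root, so a single application can join them without caring where each physically resides.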

2. The platform must support a utility-grade cloud architecture with disaster recovery and workload management, and it must be scalable.

The movement of applications into public and hybrid cloud architectures means that next-gen applications will be designed to leverage a variety of processing architectures; a next-gen data platform should preserve that optionality rather than lock applications into a single deployment model.
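A utility-grade platform would provide disaster recovery natively, but the behavior can be sketched in a few lines. The stores, region names, and synchronous replication strategy below are all illustrative assumptions, not any particular product's design:

```python
# Illustrative failover sketch: writes are replicated to a primary and a
# secondary store; reads fall back to the secondary when the primary is
# unavailable -- a minimal form of disaster recovery.

class Store:
    def __init__(self, name, available=True):
        self.name, self.available, self.data = name, available, {}

    def put(self, key, value):
        if not self.available:
            raise ConnectionError(self.name)
        self.data[key] = value

    def get(self, key):
        if not self.available:
            raise ConnectionError(self.name)
        return self.data[key]

class ReplicatedStore:
    def __init__(self, primary, secondary):
        self.primary, self.secondary = primary, secondary

    def put(self, key, value):
        # Synchronous replication: every write lands in both stores.
        for store in (self.primary, self.secondary):
            store.put(key, value)

    def get(self, key):
        try:
            return self.primary.get(key)
        except ConnectionError:  # fail over on outage
            return self.secondary.get(key)

primary, secondary = Store("us-east"), Store("eu-west")
cluster = ReplicatedStore(primary, secondary)
cluster.put("order:42", "shipped")
primary.available = False          # simulate a data-center outage
print(cluster.get("order:42"))     # prints "shipped" via the secondary
```

In practice the platform, not the application, would own this replication and failover logic; the sketch only shows why the capability matters.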

3. A next-gen big data platform must be able to integrate real-time decision making with deep analytics in order to operationalize insights as events happen.

A platform for innovation will need to support real-time applications that not only provide insights and intelligence, but that also act on those insights in an automated way to impact business as it happens.
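One way to picture this integration: a deep-analytics pass over historical data produces a model, which a real-time loop then applies to each incoming event. The fraud-style scenario, the threshold rule, and all values below are hypothetical simplifications:

```python
import statistics

# "Deep analytics" phase: derive a model from historical data.
# Here the model is just a mean plus a naive anomaly cutoff.
history = [12.0, 15.0, 11.0, 14.0, 13.0]   # historical transaction amounts
threshold = statistics.mean(history) * 3   # flag anything 3x the mean

def decide(amount):
    """Real-time phase: act on each event as it happens."""
    return "block" if amount > threshold else "approve"

stream = [14.5, 120.0, 13.2]               # incoming events
decisions = [decide(amount) for amount in stream]
print(decisions)  # ['approve', 'block', 'approve']
```

A production system would replace the toy threshold with a trained model and the list with a streaming source, but the shape -- offline analytics feeding automated, in-the-moment decisions -- is the characteristic being described.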

4. The platform must support global cloud applications that span multiple private and public data centers and act on worldwide data feeds consistently.

As next-gen applications evolve from being either on-premises or in the cloud to being global cloud applications -- spanning multiple private and public data centers and integrating and acting on data feeds around the world in a consistent way -- these data-layer characteristics are critical.

5. An extensible next-gen platform must support multiple analysis techniques and compute engines working together simultaneously in the same platform.

To keep pace with continuous disruption in the application layer, next-gen applications are typically composed of a set of processes working in concert. A platform capable of supporting these innovative use cases must embrace multi-tenancy and the ability to support all application processes without limit, whether they are file-based, rely on database systems, are based on a microservices paradigm, or leverage containers and virtualization.
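As a toy illustration of multiple engines sharing one platform, the sketch below runs two different "compute engines" -- a batch aggregator and a scan-style filter, both hypothetical stand-ins -- concurrently against the same shared dataset:

```python
from concurrent.futures import ThreadPoolExecutor

# One shared data layer (illustrative records).
shared_records = [
    {"user": "a", "bytes": 10},
    {"user": "b", "bytes": 250},
    {"user": "a", "bytes": 40},
]

def batch_totals(records):
    """Engine 1: batch-style aggregation over the full dataset."""
    totals = {}
    for r in records:
        totals[r["user"]] = totals.get(r["user"], 0) + r["bytes"]
    return totals

def heavy_users(records, cutoff=100):
    """Engine 2: scan/filter over the same dataset."""
    return [r["user"] for r in records if r["bytes"] > cutoff]

# Both engines run simultaneously against the same data.
with ThreadPoolExecutor() as pool:
    totals_future = pool.submit(batch_totals, shared_records)
    heavy_future = pool.submit(heavy_users, shared_records)

print(totals_future.result())  # {'a': 50, 'b': 250}
print(heavy_future.result())   # ['b']
```

In a real platform the "engines" would be heterogeneous systems (SQL engines, streaming processors, ML frameworks) rather than two Python functions, but the multi-tenant pattern -- distinct workloads sharing one data layer without copies -- is the same.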

A Final Word

It's fair to say that in enterprise computing, innovation is the norm. With new tools and applications springing up almost daily, companies need to make strategic decisions about how to sustain growth at the application layer by building a strong foundation on a big data platform that can support both existing and future workloads and that can form the infrastructure of a forward-thinking data-driven business.

Legacy systems are in rapid decline, and the vast majority (as much as 90 percent) of the world's data will be managed by next-gen technologies by 2020. Next-gen platforms will soon be the heart of enterprise computing systems. From that perspective, "the next generation" is now.

About the Author

Crystal Valentine is vice president of technology strategy at MapR, a Silicon Valley-based big data company. She has an extensive background in big data research and practice. She is the author of several academic publications in the areas of algorithms, high-performance computing, and computational biology and holds a patent for Extreme Virtual Memory. Dr. Valentine received her doctorate in Computer Science from Brown University and was a Fulbright Scholar to Italy. You can contact the author at cvalentine@maprtech.com.
