Open Source or Open Architecture? Big Data Needs Both

The act of publishing source code, in and of itself, doesn’t necessarily make a platform more useful. Making that source code extensible matters at least as much, especially in the era of open application programming interfaces (APIs), where many of the most useful apps are made so by other apps. Modern enterprises need both open source software and open architectures to take full advantage of Big Data.

This article will focus on how we reached this point, and provide a blueprint for CIOs who are evaluating open source and Big Data tools.

The Advantage of Extensibility

Think about the applications and services people use every day, and how many of them integrate with one another seamlessly. For example, Google Maps extends Uber to provide location tracking, Uber extends OpenTable to facilitate meal delivery, and Netflix extends the Apple TV interface to broaden viewers’ entertainment options. Openness is crucial for apps that exist in a connected world, creating and consuming information at a pace never before seen. According to a Domo survey of social media usage, each minute around the world:

Instagram users like 1.7 million photos

Tinder users swipe over 590,000 profiles

Vine users play 1 million videos

Facebook users like 4.1 million posts

Twitter users send just over 347,000 tweets

Turning all this data – and other data like it – into useful capabilities for customers and actionable intelligence for the enterprise is a massive task that modern enterprises can't afford to ignore. As a result, Big Data platforms are growing more popular by the day. However, it is important to keep in mind that we're early in the lifecycle of this technology, and tomorrow's platforms could look dramatically different from what we have today.

Growth will come in two ways: from open source development and from the flexible APIs that comprise an open architecture.

The Evolving Conversation

Think of the potential intelligence that exists in social media. When customers willingly reveal to you what they want and how they'll consume it – without commissioning a survey – it can pay to listen. Social data affords us just such an opportunity – if we can extract the signal from the noise. This is just one reason research firm IDC says the overall market for Big Data technology and services is on track to grow 26.4 percent annually through 2018. By that point, the firm expects Big Data spending to top $41.5 billion annually.

Chief executives and their immediate subordinates could drive a big chunk of that growth – as long as it yields a better understanding of the factors that drive their businesses, as well as of their customers, supply chains, operations and competitive environments. In January, the Economist Intelligence Unit (EIU) surveyed 395 such executives and found that 48 percent believe Big Data is a useful tool while 23 percent say the technology will revolutionize the way businesses are managed.

But that can't be done when data is siloed, or when its sheer volume makes the insight inside it harder to extract. What these leaders often fail to realize is that it takes a fully open Big Data system – built on open source software and boasting an open architecture – to deliver the value they crave.

Why Digital Hoarding Isn’t as Bad as It Sounds

Call it the price of never throwing anything away. Sure, storage is getting cheaper. Computers are getting more powerful. Networks are getting faster. None of it matters if your Big Data platform is closed off from accepting information from useful apps – or siloed in a way that prevents you from seeing the interrelationships and cross-correlations between different data sources. However, improving data center economics is making this a less costly, but more complex, problem.

Look at the storage market. The industry is moving toward affordable solid-state flash, which can now be purchased for $1.50 or less per gigabyte. Companies are cashing in by adopting flash storage in greater quantity as it scales to previously unheard-of capacities. On the other side, new applications and devices are generating ever larger volumes of data. In a market where near-infinite storage can be had for so little, there's little to fear from generating too much data – especially with so many startups basing their business models on monetizing that data.

Couple that with the rapid advance of the Internet of Things (IoT) – machines that can create gigabytes in microseconds and networks, wired or wireless, that transmit them just as quickly – and you have the infrastructure of an Idea Economy.

The opportunity is growing, but so are the challenges. Those who most effectively turn raw data into intelligence, at scale and faster than their rivals, will be the large-cap winners of tomorrow.

The Future of Big Data is Open

Next-generation, data-centric businesses (such as Uber, Facebook, Airbnb and so many others) that turn information into products are successful because they're able to quickly gather and process data from a wide variety of sources. Open source development has given us the lowest-cost processing platform in history (i.e., Hadoop) while open architectures ensure that the right data gets to the right place at the right time – and is used in the right capacity to optimize insight.

Think of an open Big Data architecture as a standard combustion engine. Every engine has intakes. For a combustion engine, they are air, gasoline or diesel fuel, and electric power delivered by a battery. Raw fuel becomes motion. In the same way, Big Data platforms have intakes called data sources: traditional enterprise data (ERP, CRM, EDW), machine data (Internet of Things) and human data (social data, audio, video, text). Analytics are what turn the raw material into intelligence – just as a powertrain turns air, fuel and electricity into motion.
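For readers who think in code, the engine analogy can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any real product: the names `Record`, `DataSource`, `ListSource`, `Platform` and `count_by_kind` are all hypothetical, and the in-memory lists stand in for real connectors. The point it tries to show is the "open" part: any source that honors a small intake contract can be plugged in without changing the platform itself.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Protocol, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Record:
    source: str    # name of the intake that produced the record
    kind: str      # "enterprise", "machine" or "human"
    payload: dict  # raw, source-specific content


class DataSource(Protocol):
    """Hypothetical intake contract: anything that yields Records can plug in."""

    def records(self) -> Iterator[Record]: ...


class ListSource:
    """In-memory stand-in for a real connector (CRM export, sensor feed, social API)."""

    def __init__(self, name: str, kind: str, rows: List[dict]) -> None:
        self.name = name
        self.kind = kind
        self._rows = rows

    def records(self) -> Iterator[Record]:
        for row in self._rows:
            yield Record(self.name, self.kind, row)


class Platform:
    """Accepts any number of intakes and hands their combined output to an analytic."""

    def __init__(self) -> None:
        self._sources: List[DataSource] = []

    def register(self, source: DataSource) -> None:
        self._sources.append(source)

    def ingest(self) -> Iterator[Record]:
        # Pull from every registered intake in registration order.
        for source in self._sources:
            yield from source.records()

    def analyze(self, analytic: Callable[[Iterable[Record]], T]) -> T:
        # The analytic plays the role of the powertrain: raw records in, insight out.
        return analytic(self.ingest())


def count_by_kind(records: Iterable[Record]) -> dict:
    """Toy analytic: how many records arrived from each category of source."""
    return dict(Counter(record.kind for record in records))


# Made-up sample data, one intake per category named in the text.
platform = Platform()
platform.register(ListSource("crm", "enterprise", [{"customer": "A"}, {"customer": "B"}]))
platform.register(ListSource("thermostats", "machine", [{"temp_c": 21.5}, {"temp_c": 22.0}, {"temp_c": 19.8}]))
platform.register(ListSource("social", "human", [{"text": "love the new app"}]))

summary = platform.analyze(count_by_kind)
print(summary)
```

Adding a fourth intake here means writing one more class with a `records()` method and calling `register`; nothing in `Platform` or the analytic needs to change, which is the property the engine analogy is reaching for.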

Final Thought

After years of optimizing infrastructures to handle modest volumes of information, we now find that affordable storage and compute, combined with powerful analytics software, make the very idea of tossing away or ignoring data unthinkable.

We need Big Data platforms to process it all, separating the wheat from the chaff and turning activity into insight, and we need these architectures to be both open and integrated. Only by ingesting the widest range of information, processing it at breakneck speed, and delivering the insight enabled by open and integrated systems can businesses hungry for new sources of profit find the answers that they seek. For the growing number of CIOs evaluating Big Data platforms, that’s both a cautionary tale and a call to action. Is your organization ready to heed it?