Let's not repeat past mistakes with Big Data

It's been 15 years since the data management (DM) industry botched the decision support revolution by delivering inflexible data warehouse systems and developing unusable business intelligence (BI) tools.

We forced organisations to change their business processes to suit our own product agendas. Now, in the nascent Age of Big Data, we’re gearing up to do it all over again.

What we need is a conceptual shift from a product-centric to a process-centric orientation.

Stick to the process

An ideal process flow diagram would have as few interposing boxes as possible. This never happens in practice, for many reasons. One of the most important is that software vendors target the product, not the process.

They pursue a strategy that attempts to insert or implant a product as one of several interposing boxes in a process flow. In effect, they design themselves into a process.

The DM industry’s response to big data has been more of the same. In most cases, this means a bouillabaisse of proprietary, stack-centric big data “solutions”, self-serving technological or architectural prescriptions, and not-yet-ready-for-prime-time front-end tools.

But big data is different because it's inescapably multi-disciplinary: it presupposes interconnectedness, interoperability and exchange between and among domains. It is holistic in scope in precisely the way that data management is not.

From a product perspective, a big data-aware tool must operate in a context in which problems, practices and processes are multi-disciplinary. No product will be completely self-sufficient or operate in isolation. But this doesn’t mean you can’t have big data-oriented products that target very specific use cases, or more generalised big data-oriented products that address the practices of a specific process, domain or function. And it doesn’t automatically mean that an entire class of existing products will suddenly become “pre-Big Data”.

More of the same is more of the wrong approach

But most vendors are developing and marketing “Big Data-in-a-Platform” products. The one thing these “solutions” have in common is a product-centric model: each aims to insert or implant itself – as an interposing box – into a process. But each interposing box introduces latency and increases complexity and fragility.
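To put rough numbers on that last point, here is a minimal sketch in Python. It assumes the boxes sit in series, that their delays simply add up, and that they fail independently of one another. The latency and availability figures are hypothetical, chosen only to illustrate the effect; `chain_cost` is an illustrative name, not part of any product.

```python
from math import prod


def chain_cost(boxes):
    """Return (total_latency_ms, end_to_end_availability) for boxes in series.

    Each box is a (latency_ms, availability) pair. Latencies add up;
    availabilities multiply, assuming the boxes fail independently.
    """
    total_latency = sum(latency for latency, _ in boxes)
    availability = prod(avail for _, avail in boxes)
    return total_latency, availability


# Hypothetical figures: every box adds 50 ms and is up 99.9% of the time.
lean_flow = [(50, 0.999)] * 2
crowded_flow = [(50, 0.999)] * 6

lean_latency, lean_availability = chain_cost(lean_flow)
crowded_latency, crowded_availability = chain_cost(crowded_flow)
```

Under these assumptions, the six-box flow is three times slower than the two-box flow and is also less likely to be fully up at any given moment, even though no individual box got any worse.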

Worse still, each interposing box has its own infrastructure. This includes its own vendor-specific support staff with its own esoteric knowledge-base.

At best, this means recruiting armies of Java or Pig Latin programmers, or training up DBAs and SQL programmers in the intricacies of HQL. At worst, it means investing significant amounts of time and money to develop platform-specific knowledge-bases.

Automation is the answer

The way to address this dysfunction is to focus on automating the practices and processes that support and enable a data warehouse environment, such as scoping, warehouse creation, ongoing management, and periodic refactoring.

You could even automate the creation and management of warehouse documentation, diagrams, and lineage information by completely eliminating hand-coding in SQL or in esoteric, tool-specific languages.
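As a rough illustration of documentation that is generated rather than written by hand, the Python sketch below assumes that each warehouse table's sources are recorded as simple declarative metadata. The table names, the `LINEAGE` mapping and both functions are hypothetical, not taken from any particular tool.

```python
# Hypothetical metadata: each warehouse table lists the sources it is built from.
LINEAGE = {
    "stg_orders": ["oltp.orders"],
    "stg_customers": ["oltp.customers"],
    "fact_sales": ["stg_orders", "stg_customers"],
}


def upstream(table, lineage):
    """Return every direct and indirect source of `table`, sorted by name."""
    seen = set()
    stack = list(lineage.get(table, []))
    while stack:
        source = stack.pop()
        if source not in seen:
            seen.add(source)
            # Follow the source's own sources, if it is itself a derived table.
            stack.extend(lineage.get(source, []))
    return sorted(seen)


def document(lineage):
    """Render one plain-text documentation line per table."""
    return "\n".join(
        f"{table}: built from {', '.join(upstream(table, lineage))}"
        for table in sorted(lineage)
    )
```

Because the lineage text is derived from the metadata, it is regenerated whenever the metadata changes and cannot quietly drift out of date the way hand-maintained documents do.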

Big data products do not need their own infrastructure. They should speak the languages and accommodate the idiosyncrasies of OLTP systems, warehouse platforms, analytic databases, NoSQL or big data repositories, BI tools, and all of the other “boxes” that collectively comprise an information ecosystem.

Products should target the disconnects between isolated systems in a process: the points at which a process flow breaks down. This type of breakdown is the inevitable consequence of a product-focused development and marketing strategy. By the looks of it, we’re going to see lots of breakdowns in the big data-scape.

Think of the big data-scape as a kind of free-trade zone in which “trade” is analogous to process: i.e., data moves from box to box, with minimal restriction or interference and without platform-specific embargoes from inessential interposing boxes.

Automation is the answer. Not automation for its own sake, but automation that is integral to process flow: automation that eliminates breakdown, increases responsiveness, lowers costs and empowers IT to focus on value creation.