Big Data's Big Performance Headache

Adding to data storage can give rise to troubling performance issues.

Is there a dark side to big data? The mounting cost of storing data has long been deemed a top issue, but the greatest cost really stems from the performance of the systems tasked with retrieving that data.

"It's a real problem," says David Fetter of Quadron Data Solutions, a provider of technology solutions for bank and insurance broker-dealers. "Historically it's worked better for firms to add more storage when their incoming data exceeds capacity. But eventually it's safe to say that the volume of data and the functionality that's being performed on that data is growing even faster than some of the systems. So really, more than storage itself being expensive, you start to run into performance considerations."

Risks

On the data warehousing side, it's a huge issue for clients who accumulate data and buy more disk but eventually have more data than their systems can handle at acceptable performance.

"We've heard a lot of stories," he says. "There's one about a commission database where it takes them 12 hours to save processed commissions." The firm, with a couple hundred advisors, had a data warehousing commission vendor. Someone would have to grind away for 12 hours or more to pull up commission reports. "That's a case where there's room to optimize, but it's a good example of the data volume problem manifesting itself as a performance issue more than as a cost of storage issue."

What about scaling down and eliminating unnecessary data from the storage systems? According to Fetter, the way applications use data makes it almost impossible to pull data out, even junk. All historic data has got to be there; otherwise it can hurt the integrity of the entire data warehouse. It can play a role in accounts moving forward and in audit trails. "You can't just yank data out of that."

If you look at the way technology has evolved, the answer has always been to build bigger and faster computers to support more complex software. This is a case where the data is so big that it's not about the cost of storage, it's about the performance impact. Even applications customized for complexity end up in a situation where reports that should run in minutes take far longer.

Value Your Strategy

The best defense is a strategy or software solution laid out from the beginning of the data accumulation. As firms begin to build out a data warehouse, the systems they architect must allow for archived data. That can be difficult to add later.

Regardless of the type of data, there's strategy in how you store it, explains Fetter. You can have an ultra-high-performance disk at one price point and less expensive, slower disks at a different price point, and you can archive data that's removable. Either way, "you have got to tier storage requirements any time data gets large. Smaller firms may not look at this, but for server providers, data warehouses, larger firms, that's something that happens early as data grows."
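
To make the tiering idea concrete, here is a minimal sketch of how a system might decide which tier a record belongs on. It is an illustration only, not Quadron's method: the function name `choose_tier`, the tier labels, and the 90-day and two-year cutoffs are all assumptions made for the example, since the article names no specific thresholds.

```python
from datetime import date

# Hypothetical cutoffs in days; the article does not specify any thresholds.
HOT_MAX_AGE_DAYS = 90
WARM_MAX_AGE_DAYS = 730


def choose_tier(last_accessed: date, today: date) -> str:
    """Pick a storage tier from how recently a record was last used."""
    age_days = (today - last_accessed).days
    if age_days <= HOT_MAX_AGE_DAYS:
        return "fast"     # ultra-high-performance disk
    if age_days <= WARM_MAX_AGE_DAYS:
        return "slow"     # less expensive, slower disk
    return "archive"      # archived storage; the record is kept, not deleted


if __name__ == "__main__":
    today = date(2024, 6, 30)
    for last_used in (date(2024, 6, 1), date(2023, 6, 30), date(2019, 1, 1)):
        print(last_used, choose_tier(last_used, today))
```

Note that nothing is ever deleted in this sketch. Old records only move to cheaper storage, which fits Fetter's point that historic data has to remain available for audit trails.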

The key point of the big data dilemma is that the problems don't manifest themselves as cost of storage but as performance issues, and there are a bunch of dynamics in the financial services industry that make it a big problem. For one, there are a lot of niche players that have developed software on the fly that hasn't gotten the level of architecture and performance tuning that it should have. As data volumes grow, performance degrades to completely unacceptable levels. From Quadron's perspective, industry spend and focus are not on making data smaller, but on overcoming performance issues with the data already there.

"Focus has got to come back to how systems are architected from the beginning. It's happening and it's hard to dig out of once you're there," concludes Fetter. "It's a case of pay me now or pay me later."
Becca Lipman is Senior Editor for Wall Street & Technology. She writes in-depth news articles with a focus on big data and compliance in the capital markets. She regularly meets with information technology leaders and innovators and writes about cloud computing, datacenters, ...

I wrote about this a little last year. It is true that big data requires big storage, and that can get unwieldy if you're overly proprietary. Cloud-based (or some sort of offsite) data centers seem to be the popular solution to this issue.

The performance of an unwieldy data warehouse can drag down the value inherent in the Big Data. The example of taking 12 hours to grind out commission reports for several hundred advisors drives home the point that architecture is critical to planning for high data volumes. Running reports should take minutes, not hours.

I would quibble and say the "dark side" of big data has to do with potential misuse of the data (NSA, company invasions of privacy, hackers, etc.). That said, the challenges around data storage and broader data management are very significant, and this analysis highlights that. It's not just about assembling as much information as you can; it's also about providing access that enables smart decision-making, so the storage and architecture issues are huge.