Enterprises struggle to manage big data flows

Enterprises of all sizes face challenges across a range of data performance management issues, from stopping bad data to keeping their data flows operating effectively.

This is according to a survey by data performance management firm StreamSets, in which nearly 90 per cent of respondents report bad data flowing into their data stores, while only 12 per cent consider themselves good at the key aspects of data flow performance management.

Data quality is cited as the most common challenge when managing big data flows (selected by 68 per cent). In addition to bad data flowing into stores, 74 per cent of organisations report currently having bad data in their stores, despite cleansing data throughout its lifecycle. While 69 per cent of organisations consider the ability to detect diverging data values in flow as 'valuable' or 'very valuable,' only 34 per cent rated themselves as 'good' or 'excellent' at detecting those changes.

The areas where respondents felt weakest are detecting performance degradation (only 44 per cent felt confident), error rate increases (44 per cent) and divergent data (34 per cent). The only measure where a majority (66 per cent) felt confident about their capabilities was detecting a 'pipeline down' event. What is common across all performance indicators, though, is the gap between respondents' self-reported capabilities and how valuable they consider each competency.

The survey also identifies problems caused by data drift, meaning unexpected changes in data structure or semantics. Some 85 per cent say this has a substantial impact, and 53 per cent report that they have to alter each data flow pipeline several times a month, with 23 per cent making changes several times a week or more.

Also notable is the continued prevalence of hand coding, with 77 per cent using it to design their data pipelines. Two-thirds also use legacy ETL (Extract, Transform and Load) and data integration tools.

"In today's world of real-time analytics, data flows are the lifeblood of an enterprise," says Girish Pancha, CEO of StreamSets. "The industry has long been solely fixated on managing data at rest and this myopia creates a real risk for enterprises as they attempt to harness big and fast data.

"It is imperative that we shift our mindset towards building continuous data operations capabilities that are in tune with the time-sensitive, dynamic nature of today's data."