Sketching the big picture on big data

Editor's note: The links in this article were updated after its original publication to provide the final version of the report.

Experts may not yet agree on a clear-cut definition of big data, but a new report suggests that this IT buzzword is chock-full of potential if several key challenges can be overcome.

The MeriTalk report, titled "Big Data, Big Brains," aggregates what 17 influential big data leaders in the federal space are saying on the subject, and touches on the hurdles federal agencies will have to overcome if they are to turn big data into big mission success.

How do experts define big data?

The report is humbling in that experts from innovative companies like Cloudera and Oracle, and from tech-savvy agencies like NASA, the Energy Department and the National Oceanic and Atmospheric Administration, do not share a consensus definition of big data.


Most of the experts said big data was the "point at which traditional data management tools and practices" no longer apply, which is roughly in line with how the National Institute of Standards and Technology (NIST) defines the term.

"Where the data volume, acquisition velocity, or data representation limits the ability to perform effective analysis using traditional relational approaches or requires the use of significant horizontal scaling for efficient processing," said Peter Mell, computer scientist for the National Institute of Standards and Technology, providing MeriTalk his definition of big data.

Others defined it differently, however -- some with the commonly used Vs of big data (velocity, variety and volume), and others from a strategic point of view.

"Big data is part of an iterative lifecycle that should be part of an over-arching enterprise information strategy," said Department of Energy Chief Technology Officer Pete Tseronis.

Using feedback from the experts, the survey suggests a new definition of big data altogether: "The set of technical capabilities and management processes for converting vast, fast, and varied data into useful knowledge."

Interestingly, none of the surveyed feds indicated that their agency was currently implementing a big data initiative. That they aren't talking about it doesn't mean agencies aren't using big data -- NOAA, for instance, uses massive amounts of data in real-time from satellites to help forecast the weather -- but it may indicate that large-scale big data initiatives aren't as fully developed as the buzz might lead you to believe.

The hurdles they see

If big data results are to live up to the hype, a few big hurdles will have to be cleared. In the age of sequestration, the first challenge is budgets. Like almost any other federal program, a big data framework is going to cost money. The trick is going to be not wasting limited funds on big projects that ultimately fizzle out.

Van Ristau, CTO for DLT Solutions, said funding and prioritizing the need to solve a real, mission-critical problem were essential to any big data success story. That means pulling value from big data efforts quickly and not getting lost in a sea of information too dense to draw insights from.

Other hurdles include personnel and skill sets -- qualified IT personnel are already in demand, let alone those with data analytics and data science backgrounds -- while data ownership and siloed information remain challenges, too.

Federal policy has yet to catch up to technology's rapid evolution, and "Most of the big data are behind closed doors," said Tsengdar Lee, a program manager for NASA, adding that "accessibility is a big problem."

Once information is shared, particularly between agencies, who owns it becomes another issue altogether. Also of concern is that our ability to produce new data has outpaced our ability to store and examine it.

"Significant technical breakthroughs are needed to meet the rate of data creation and service demands," said Darren Smith, IT specialist at NOAA.

Getting started

Doing big data is not as easy as getting on a bicycle with training wheels, but the analogy holds in one respect: you don't go from traditional data management to big data in one step.

The survey, underwritten by NetApp, suggests agencies first "do no harm," meaning they need to invest in the right infrastructure before they get more data than they can handle.

It also suggests agencies be proactive in addressing the issues of data ownership and data sharing "before the stakes get too high," and they need to begin seeking out the right talent and training for the people who inevitably run future big data initiatives.

Finally, the study suggests federal agencies seek out partnerships in the public sector simply to "try it."

"Pick several data sources and try analysis that you would not have attempted before," said Douglas Neal, research fellow for CSC.

Reader comments

Tue, Apr 16, 2013
DataH

Nice insight Frank. Companies need to face the challenges that arose with the era of big data. We are seeing an increase in companies seeking specialized skills to help address these challenges. The HPCC Systems platform from LexisNexis helps to fill this gap by allowing data analysts themselves to own the complete data lifecycle. Designed by data scientists, ECL is a declarative programming language used to express data algorithms across the entire HPCC platform. Their built-in analytics libraries for Machine Learning and BI integration provide a complete integrated solution from data ingestion and data processing to data delivery. More at http://hpccsystems.com

Tue, Apr 16, 2013
John
Denver

The definition is:
An easily scalable system of unstructured data with accompanying tools that can efficiently pull structured datasets.
