Does It Take a Data Scientist to Find Gold in Big Data?

Any data scientist knows that the true value of data comes from its refinement. In a recent CNET article, Alex Yoder compared the value of data to that of other can’t-live-without assets – oil and gold. However, he says that big data differs in one major way – “data has no intrinsic value.”

Yoder (@yodera) writes, “Gold requires mining and processing before it finds its way into our jewelry, electronics and even the Fort Knox vault. Oil requires extraction and refinement before it becomes the gasoline that fuels our vehicles. Likewise, data requires collection, mining and, finally, analysis before we can realize its true value for businesses, governments and individuals alike.”

What’s interesting about this comparison is that behind the mining, processing and analysis is big science. It takes a scientific approach and processes to find the value in data and precious commodities. Yoder defines the analysis portion of extracting value from data as the science.

The Science of Finding Gold in Data

As Yoder points out, there’s also science in the mining and processing of the data – just go back to our interview with Gregory Piatetsky-Shapiro (@KDNuggets) on the topic of data science and predictive analytics.

Piatetsky-Shapiro says, “What we want to find in data is some understandable knowledge and not just incomprehensible patterns.” He adds, “[A] better understanding of predictive models will contribute to increased trust for such models.”

Piatetsky-Shapiro is a living example of how science can extract value from data. In our interview, he told us he has developed life-changing predictive analytics models, ranging from predicting child support non-payment and attrition, to predicting drug effectiveness, to identifying fraud in online auctions.

Didn’t Gold Mining Start with a Simple Quest for Something More?

While Yoder and Piatetsky-Shapiro both point to the methods, models and software as keys to refining value, they also recognize the human element. Yoder says, “It takes complex algorithms, powerful computing and perhaps most of all, human analysts to build and administer the big science that turns the ‘then and now’ nature of big data into ‘when.’”

He points to a projection from the McKinsey Global Institute that the US needs 140,000 to 190,000 additional deep analytical minds working to derive the value of data. Often labeled as data scientists (quite fitting, given that big data is backed by big science), these experts, Yoder says, are thought to be the next class of “digital and corporate geniuses.”

Piatetsky-Shapiro’s take is that the data scientist role is much like that of an engineer or wizard, with a skill set that combines “code tuning, business insight and knowing how to extract business value from data.”

Our Take: Give Your People Gold Fever

We think there’s a place and a definite need for these data wranglers. However, if an organization is going to join the gold rush that big data promises, don’t you think we have to start with the users? And give them the tools they need – the picks, the pans and the drive or “gold fever” to “head for the hills?”

Their tools are analytics systems (maps) and the “quest for knowledge” that allows them to inspect the data landscape and find the gold nugget-filled streams and hills.

Mark Lorion (@mark_lorion) alludes to why the tools and the questions matter most in a recent interview on DM Radio. He says, “If you measure it, you’ll change it. What’s hiding in your data that you haven’t measured?”

For example, he says when a business user has questions, the tools often get in the way of the answers. He says it’s hard for a user to find the right answers in a spreadsheet or other tools when the presentation is hiding what he’s trying to see. It’s frustrating because the user knows he can beat the competition if he can get reliable information from the data.

Maybe the gold or “knowledge discovery” (as data scientist Piatetsky-Shapiro says) starts with the right tools and the right questions and a little bit of fever.

Next Steps: See how Spotfire version 4.5 empowers users to discover actionable insights hidden in big data and unstructured information in our on-demand webcast, “What’s New with Spotfire 4.5.”
