Chasing the Data Science Unicorn

Data science is an exciting area that promises real benefits to organizations in nearly all industries. It should not be stuck in a frustrating pursuit of imaginary heroes of folklore.

By David Stodder

January 6, 2015

"It's like chasing unicorns" is a phrase I often hear when data professionals talk about trying to land a data scientist. Finding -- much less keeping -- these rare individuals who can perform all that is expected of virtuosos in the nebulous field of data science is difficult. To many, it can seem like hunting for an animal that only exists in folklore. Yet, having seen the value that data science has brought in the past decade to pioneering firms in ecommerce and social media such as eBay, Facebook, Google, Yahoo!, and Twitter, business and IT leaders in many organizations remain undaunted in their quest to land highly skilled data scientists.

Some of the difficulty in finding data scientists may be caused by the lack of a clear definition of data science. One can easily spend many fascinating hours reading the literature about data science without arriving at a clear definition (two of the best places to go are Vincent Granville's online resource, Data Science Central, and Gregory Piatetsky-Shapiro's KDnuggets). Data science job descriptions are so varied that they are hard to compare. Clearly defined or not, the energy around data science is enormous; universities are launching training and research facilities, municipalities such as New York and Seattle are competing to become the center of the data science world, and vendors such as Cloudera have launched data science certification programs.

Data science fuses expertise from a range of fields including statistics, mathematics, operations research, computer science, data mining, machine learning (algorithms that can learn from data), software programming, and data visualization. The key goal of most data science projects is to realize higher value from data through advanced analytics, giving organizations insights they can use to optimize decisions and operations and improve business outcomes. Indeed, gaining the ability to compete on analytics is the biggest reason why most organizations pursue data science. (Note: Thomas Davenport, Research Fellow at the MIT Center for Digital Business, co-author of the seminal book "Competing on Analytics," and one of the most important business thought leaders about analytics, will be keynoting TDWI's conference in Las Vegas on February 23, 2015.)

Some of the pioneers in the field have come from the hard sciences or professions that apply scientific methods to research, such as neurobiology or nuclear physics. Data scientists can have a range of expertise, but a central idea is to use scientific methods to explore and experiment with data. Thus, unless organizations are prepared to experiment with data, test assumptions, and use data insights to think outside the box, they may not succeed with data science.

Exploring All the Data Takes Time

Data scientists also need to love data, particularly in its raw form. The blessing and curse of most business intelligence (BI) applications is that they spare users from having to get their hands dirty with the data. The trend in BI today is toward greater abstraction and ease; users work with dashboards, graphical icons, and other visualizations, not the data itself. Most (if not all) of the data that BI users touch has been carefully scrubbed, profiled, structured, and aggregated. To explore and experiment, many data scientists want the opposite; they want to touch raw, noisy, messy, and unstructured data.

As a result, data scientists often spend a great deal of time learning about the data, including looking for gaps and errors, before they set to work on developing algorithms. Tools, Hadoop (and related technologies), and analytic platforms are critical to making the various phases of raw data exploration move faster, but it can still be painstaking work. Organizations must, therefore, be patient and give data scientists the time they need to explore the data. They cannot expect instant insights or overnight competitive advantages from data science.

Governance to Avoid "Creepiness"

Although removing barriers to data access and exploration is important for data science, organizations still need to govern activities carefully. One doesn't need to look far to find news stories and commentaries detailing what many call the "creepiness factor" of data science. There's widespread concern about privacy violations, online and geolocation tracking, and the use of analytics to interpret consumer behavior and personalize marketing in ways that can feel too close for comfort.

Organizations should ensure that ethics and consumer tolerance are part of data science planning discussions along with adherence to standard data governance policies. Governance policies should be amended to address how to protect sensitive data during data science processes, particularly personally identifiable information. Data science projects should include input from business leaders who may be more cognizant of how consumers and the public in general could react to data science in practice.

Teams, Not Unicorns

Like Dorothy in The Wizard of Oz, most organizations may need to look no further than their own backyard for what they desire with data science. Data science fuses the expertise of many different fields; thus, a good way to move forward with data science could be to put together a team drawn mostly from existing personnel. Data science project teams will need to combine skills in data exploration, analytics, statistics, and communication, the last being vital for articulating findings to business leaders and other nontechnical personnel.

Data science is an exciting area that promises many real benefits to organizations in nearly all industries. It should not be stuck in a frustrating pursuit of imaginary heroes of folklore.