Data Scientist; Too narrow a definition?

If it means a reincarnation of the BI professional, as suggested by this infographic from The Guardian, then No!

If it means a new role that encompasses a broader analytics function within the enterprise, then Yes!

For anyone in the technology sector, Data Scientist is unquestionably the hottest job in town: everyone wants to be one, and every company seems to be looking for one. While the notion of the Data Scientist is not particularly new, the role's growing profile and perceived value within the enterprise are. Data Scientists are increasingly seen as the pixie dust that will transform rusty old business data into a treasure trove of insights.

My initial response to the rise of the Data Scientist was positive, albeit somewhat skeptical of the hype. I mean, how could I not be positive? The title somehow legitimizes the important role that I believe many of us have played over the years in deriving insights from both structured (database) and unstructured (content) data and mapping those insights to tangible business value. It is a role that is highly technical and clearly scientific in nature, one that brings together a broad range of disciplines: computer science, mathematics, linguistics, semantics, and the social, behavioral, and cognitive sciences.

However, the more I look at the current definitions of this sexy new job, definitions that all seem to focus on structured data as though it were the only data that will generate any value for the business, the more I feel excluded.

Last week I was reading an interesting blog post from Gartner's @doug_laney, in which he summarized a recent Gartner Information Management and Analytics Community Twitter Chat on big data, the role of the Data Scientist, and data quality. Worth a read! In the post, Doug included a link to a tag cloud of the top 200 words used in Data Scientist job descriptions. And as they say, “a picture speaks a thousand words…”: the tag cloud shows large, confident words like “algorithms”, “analytics”, “data”, and “statistics” shouting at us, compared with a smaller, more reticent “models”, a really nervous-looking “mining”, and a perfectly terrified “text”, “social”, “relationships”, and “value” trying to slink quietly off the page. Not a pretty picture for those of us working as scientists in the content, social, or semantic space.

However, I believe the reality in the field is completely different. In my discussions with clients about harvesting data to derive business value, I see unstructured data (content, both social and business) playing a critical role. And my concern with the narrow definition of Data Scientist is not just about the data sources.

Anyone coming from a content analytics background learns very quickly that deriving value from content is not just about the analytics end point, the stage when a Business Analyst starts to manipulate, interrogate, and report on correlations, clusters, trends, or deviations. A lot of the hard work lies in preparing the content for analysis, where sophisticated content modelling (NLP, statistics, semantics, rules, inferences, …) transforms, normalizes, and aligns the content with the business, turning it into something amenable to business analytics. This is where we structure an otherwise unintelligible cacophony of noise.
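To make that preparation step a little more concrete, here is a deliberately simple sketch in Python. It assumes a small, hand-written keyword taxonomy (the `TAXONOMY` dictionary, its categories, and the sample comments are all hypothetical). Real content modelling would lean on NLP, statistical models, and semantic rules rather than plain keyword matching; the sketch only shows the shape of the transformation, from free text into structured rows that a Business Analyst could then report on.

```python
import re
from collections import Counter

# Hypothetical business taxonomy: category -> trigger keywords (illustrative only).
TAXONOMY = {
    "billing": {"invoice", "charge", "refund", "bill"},
    "delivery": {"late", "shipping", "delivery", "courier"},
    "product_quality": {"broken", "defect", "faulty", "damaged"},
}


def normalize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def to_record(doc_id: int, text: str) -> dict:
    """Turn one free-text document into a structured row of category flags."""
    tokens = set(normalize(text))
    row = {"doc_id": doc_id}
    for category, keywords in TAXONOMY.items():
        # A category is flagged if any of its keywords appears in the document.
        row[category] = bool(tokens & keywords)
    return row


def summarize(records: list[dict]) -> Counter:
    """Count how many documents mention each category, ready for reporting."""
    counts = Counter()
    for row in records:
        for category in TAXONOMY:
            if row[category]:
                counts[category] += 1
    return counts


if __name__ == "__main__":
    comments = [
        "My invoice shows a double charge, please refund!",
        "Delivery was late and the box arrived damaged.",
        "Great service, thanks.",
    ]
    records = [to_record(i, c) for i, c in enumerate(comments)]
    for row in records:
        print(row)
    print(summarize(records))
```

The point of the design is the hand-off: everything up to `to_record` is content modelling, while `summarize` is the kind of simple aggregation that end-user analytics tools take over once the content has been structured.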

To further complicate things, with the increasing pressure to put business analytics into the hands of business end users, more and more of this content preparation work has to be integrated, in some fashion, with end-user analytics (reporting, discovery, manipulation, etc.). Or at least there needs to be some feedback between the two forms of analysis.

These two types of analysis are equally important, but they differ significantly in what they aim to do and in the skills required to deliver them. There is clearly a lot of overlap, but it's safe to say that the “Data Scientist”, as it's currently defined, is not ideally qualified to perform many of the tasks required to derive insight from unstructured data (content modelling). And as we bring more and more content into the data analytics world, where data science resides, the importance of this content modelling will only increase. Yet most definitions of the Data Scientist, and of the skills attributed to the role, make no mention of it.

So I guess my conclusion is that we either need to broaden the definition of the Data Scientist role, or we need a major reality check, one in which the value of more traditional roles like Content Analyst is put on an equal footing with the new cool kid on the block, the Data Scientist.

3 Responses to “Data Scientist; Too narrow a definition?”

Thanks Marie! My colleague Lisa Kart and I will have a more formal research note published in a matter of days that goes into detail as to what the data scientist role entails, even key soft skills, and other indicators of growth.