The second article (hat tip Chris Wiggins) was published in Forbes, has a great title (“Data Science: Buyer Beware”), an enormously skeptical outlook, and takes quotes from data science celebrities. From the article:

Personally, my bigger concern is that the algorithms that are shaping my digital world are too simple-minded, rather than too smart.

The second article brings up the idea that we’ve been through similar thought and management revolutions before, and that trouble lies with anything considered a silver bullet. Here’s my favorite part:

…data science tries to create value through an economy of counterfeits:

False elites, arising as persons are summarily promoted to high status (viz., “scientist”) without duly earning it or having prerequisite experiences or knowledge: functionaries become elevated to experts, and experts are regarded as gurus,

False roles, arising as gatekeepers and bureaucrats emerge in order to manage numerous newly created administrative processes associated with data science activities, yet whose contributions to core value, efficiency, or effectiveness are questionable,

False scarcity, arising as leaders and influencers define the data scientist role so narrowly as to consist of extremely rare, almost implausible combinations of skills, thereby assuring permanent scarcity and consequent overpricing of skills.

For the record, I’d rather define data science by what data scientists get paid to do, which is how we approached the book. Even better if we talk about data scientists as people who work on data science teams, where the “extremely rare, almost implausible combinations of skills” are represented not by one person but by the team as a whole (agreed wholeheartedly that nobody is everything a typical LinkedIn data scientist job description wants).

The only weird part of the second article is where writer Ray Rivera draws an analogy between data scientists and “icemen”, the guys who used to bring ice to your house daily before the invention of refrigerators. The idea here is, I guess, that you shouldn’t trust a data scientist to admit when he is not necessary because there’s better technology available, nor can you trust a data scientist to invent such technology, nor can you trust a data scientist with your wife.

For whatever reason I get a thrill from the fact that I pose such a sexy threat to Rivera. I’ll end with the poem he quotes:


I grew up in a tropical climate (Hawaii) and the ice man cameth just once a week, not daily. The ice block was close to a cubic foot, about as heavy as the ice man could carry with his tongs. I heard about ice women but never saw one in our neighborhood.

Surely a data scientist is easily defined at a sufficiently consensus level by whether or not a model promoted by such is or is not scientific. Although some “scientific” aspects can be muddied by the use of nonsense terms such as confidence level, double blind, randomized, Bayesian, Frequentist, etc., a simple qualifier is the Popperian Test: Is the model falsifiable?

If not, it has no claim to science, and its creators are not scientists, but just another morphology of numerologists, further polluting the scene still reeking from gaussian financial instruments and wealth-seeking pharma-garbage pretending to be objective testing.

Just a thought / question. I’m looking at data / information / data science from a library point of view. Shouldn’t libraries start employing data scientists to make sure that access to data and the extraction and making sense of data can be a public good? I really enjoyed your post earlier on about NGOs getting access to data through donations of data / infrastructure, but you also spoke of the fragility of the goodwill model.
Surely data now is what books used to be a century ago: elite and hard to get to by the common man / researcher. In the early days of journals on microfiche etc. and databases, if you were researching something you needed to go and pose your question to a librarian (I’m talking the university context), who would then be able to translate that into searchable terms that had a hitting chance in hell of getting you the articles you needed. Go forward 30-40 years, shouldn’t we have someone in our bigger libraries, or at least have our bigger libraries and university libraries form some kind of consortium to employ someone that an NGO or researcher or journalist or joe-soap can go to and say “hey, I need THIS, run me a model”?

richl40t

January 9, 2013 at 9:50 pm

The “poem” is from Ray Charles’ “I’m gonna move to the outskirts of town.”

The “poem” is much older than the Ray Charles recording. The song was written by William Westley Weldon and Roy Jordon. Originally recorded in 1936 by Casey Bill Weldon: http://en.wikipedia.org/wiki/Louis_Jordan

The four falses fit just about any area where denial, and well funded denial at that, are at work:)

thebrasstack

February 3, 2013 at 1:54 pm

My impression is that, at their best, data scientists are there to *orient in an unfamiliar environment.* “Ok, we found this pattern, now what does it mean? How does this tell a story of what’s actually going on?” That is a rare skill. It’s not domain expertise, it’s not any particular kind of math or engineering background, though all those things are good indicators. To be really blunt, you don’t so much want a member of a profession as you want somebody who can think really well.

The media have always tried to formalize “thinking really well” into a set of professional qualifications that anyone can check off one by one; and then the average member of that profession drops in quality. In the end, “analytics,” “consulting,” “data science,” “knowledge management,” whatever you want to call it — the point is, most companies have always paid dearly for people who can think.

There’s a company called Ayasdi that built a cool topological data analysis software tool based on the work of Gunnar Carlsson at Stanford. They have a team of “data scientists” whose job is to work with client companies and *use* the tool to come up with insights through unstructured learning. Now, why would they do that? Why not just sell the software and let the client play with it? Who really needs a middleman? In fact, what customers are paying for is just *thinking*, seeing a meaningful narrative in quantitative information. There may be a data science bubble, but there isn’t a thinking bubble.

Krzysztof

July 26, 2013 at 9:37 am

One nitpick: the point is not overpricing but underpricing of skills. They call three expert roles (programmer o’doom, graduate statistician and database cluster administrator) a “data scientist” and get away with paying $200K for one expert instead of $100K each.

On top of that, your salary is seen as a cost center for the business. This pressure for commodification of distinct expert skills also happens in the quant world, but there you are the business, thinking is the business, and it is paid off adequately by the yearly bonus.