Wednesday, March 15, 2017

OMG help me! It's a simple rule – if it sounds too good to be true, it probably is. I'm hot under the collar right now from reading an academic paper that has fallen hook, line and sinker for an urban myth.

It's hard not to get lost in hype. We all believe what we want to believe. We are hard-wired to 'see' the world in a way that conforms to our existing beliefs (it's called confirmation bias).

In IT, there is plenty of hype and lots of selective (and sometimes unconscious) use of evidence as justification for positions and beliefs. That's just the way the world is, and I love that as an academic I have the freedom to rock the boat and poke fun at some of what goes on in industry from time to time.

I think academics have a lot to offer industry, even though we often use language that is inaccessible to practitioners (largely because we are writing for other academics). We provide thinking that is objective, theory-rich and evidence-based. And when it's explained or presented in the right way, this can give practitioners a useful perspective for dealing with real problems and issues.

However, in the last little while, many academics have been swept up in the hype surrounding Big Data and Data Science. Davenport's famous statement that being a data scientist is 'The Sexiest Job of the 21st Century' is thrown into conversations, papers and presentations uncritically by academics and practitioners alike. We desperately want this to be true, but seriously, if committing R code to a Git repository is sexy, then I'm in need of a whole new definition of sexy. As a result, we've forgotten what our role should be and have unthinkingly dived head-first into Data Science evangelism.

That 'need to believe' is going to cause us problems. Data Science is, of course, great, but it has its limitations, and there are many, many problems with what used to be called the 'normative approach' to decision support. These problems have been well understood since the 70s and 80s and aren't changed by the use of Hadoop or R, or whatever is the new silver bullet technology for crunching large amounts of data. Nobody that I've seen working on Data Science is seriously addressing these issues. (Have a read of Peter Keen's insightful review of the problems facing the approach, written in 1987: "Decision Support Systems: The Next Decade".) The work being done on Data Science is almost exclusively focussed on the development of new technologies and techniques for crunching data – that's nice, and makes for neat applications, but it doesn't change the kinds of problems that will be solved (which are generally narrow, structured, well defined and almost always operational).

In reading about Data Science today, I ran head-on into the kind of mistake that an academic shouldn't make. I was reading a paper in a very good journal – highly rated, peer reviewed – on the ROI of data science. It was the kind of journal where, if you are able to get a paper published, your career is made (at least for a short while). In its abstract and main body of discussion, the paper repeated an often-told story of a large US-based retailer (Target) that used an algorithm to predict which female customers were pregnant, and used this information to send them offers. It's a great anecdote that illustrates the power of predictive algorithms while also showing the ethical line that can be crossed by Big Data analysis. As the paper's authors stated, the predictive algorithm "proved to be an invasion of privacy into the life of a minor [a teenage girl who was correctly identified by the algorithm as being pregnant] and informed her father of her untimely pregnancy prematurely".