Data, domain knowledge, and philosophical inquiry

There’s been some really fun discussion on the internet following this little debate at Strata that Michael Driscoll moderated. For those of you who don’t want to read Michael’s post, here is the controversial topic: “In data science, domain expertise is more important than machine learning skill.” The blog post then goes on to argue that both are important, but shows how results can lead audiences toward machine learning. I find it fascinating and very useful, not because I want to take sides, but because I think it opens the door to actual inquiry into what knowledge is, what useful knowledge is, what is merely information, and so on.

Our CEO shared this great link from Paul Miller entitled “Hubris and the Data Scientist”, which I think does a great job of placing the debate in a larger framework:

This is an extremely worrying attitude, and I can only hope that those who hold it realise the error of their ways before they make a catastrophic mistake that adversely affects the rest of us.

Data scientists are an increasingly capable bunch, and the tools at their disposal sometimes appear almost magical in their capability to derive insight. * * *

But to suggest that simply “letting the numbers speak for themselves” is an effective way to make real decisions is, quite simply, bonkers. Data is merely one input to an effective decision making process. Prior knowledge, policy considerations, and an awareness of experimental bias, sampling error, and quaint notions such as ground truth continue to play a fundamental part.

In responding to Brockmeier’s post, Strata co-chair Alistair Croll also makes an important point: “Of course, understanding which data to apply to a problem, and when to listen to the numbers, is a nuanced thing.”

* * *

Data Science — and the data scientist — are here to stay, and they bring tremendous value with them. But they’re an adjunct to domain knowledge, not a replacement for it.

Paul makes great observations, the first being that you need knowledge and expertise to actually use data properly. I strongly agree with that. What good are numbers if they are not targeted properly (or provocatively, as in Freakonomics or Outliers)? It takes expertise, or at least interest (with some knowledge), to ask the right questions for which you can use data to do interesting things. Otherwise, you just have information; you don’t have knowledge. (This is Alistair Croll’s point.)

Backing up one more step, I’d like to throw out this idea: what is “expertise”? What is “science”? For all the ways we use them, these terms are loosely defined, if at all, and without much precision. Results and data (information) can be supremely interesting for testing the claims of science or expertise, and so, used properly, they become an interesting tool for dismantling “expertise”. But they end up being tools for asking the deeper question of what knowledge, expertise, and science actually are.

And you know what? The best place to turn at that point isn’t the data or the expert or the scientist. It isn’t audience polls. The king of such inquiry is actually the philosopher. I don’t claim to be one, by the way; I just find the field quite fascinating. After all, philosophers don’t even have a great working definition for “science”. Philosophers debate and argue about knowledge in its most fundamental sense, and as such, about every other debate that rests on it (such as the one about data/machine learning vs. domain expertise/science). We don’t need polls. We need more philosophers to properly unravel this question. (I’ll bet they probably have; it’s been over a decade since I’ve taken a philosophy course.)

Epistemology is a great place to start this debate; polling the audience is not.