Four Ways Data Science Goes Wrong and How Test-Driven Data Analysis Can Help

If, as Niels Bohr maintained, an expert is a person who has made all the mistakes that can be made in a narrow field, we consider ourselves expert data scientists. After twenty years of doing what has been variously called statistics, data mining, analytics and data science, we have probably made every mistake in the book: bad assumptions about how data reflects reality; imposing our own biases; unjustified statistical inferences and misguided data transformations; poorly generalized deployment; and unforeseen stakeholder consequences. But at least we’re not alone. We believe that studying the ways we get it wrong suggests a powerful “test-driven” approach that can help us avoid some of the more egregious mistakes in the future. By extending the principles of test-driven development, we can prevent some errors altogether and catch others…