Cognitive Bias in Machine Learning June 8, 2018

I’ve danced around this topic over the last eight months or so, and now think I’ve learned enough to say something definitive.

So here is the problem. A neural network is a stack of layered algorithms. It might have three layers, or it might have over a hundred. These algorithms, which can be as simple as polynomials or as complex as partial derivatives, process incoming data and pass the result up to the next layer for further processing.
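As a minimal sketch of that layered idea (the layer sizes and the tanh transform here are my own illustrative choices, not anything canonical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Three layers of weights: each layer transforms its input and
# passes the result up to the next layer for further processing.
layers = [rng.standard_normal((4, 8)),
          rng.standard_normal((8, 8)),
          rng.standard_normal((8, 1))]

def forward(x, layers):
    """Run input x through each layer in turn."""
    for w in layers:
        x = np.tanh(x @ w)   # a simple nonlinear transform at each level
    return x

x = rng.standard_normal((1, 4))   # one example with 4 input features
out = forward(x, layers)
print(out.shape)                  # a single output value per example
```

Each layer here is just a matrix multiply followed by a squashing function; real networks differ in scale and detail, but the pass-it-up structure is the same.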

Where do these layers of algorithms come from? Well, that’s a much longer story. For the time being, let’s just say they are the secret sauce of the data scientists.

The entire goal is to produce an output that accurately models the real-life outcome. So we run our independent variables through the layers of algorithms and compare the output to the reality.
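One common way to do that comparison is mean squared error; the numbers below are made up purely for illustration:

```python
import numpy as np

# Hypothetical model outputs vs. the real-life outcomes we hoped to match.
predicted = np.array([0.9, 0.2, 0.7])
actual    = np.array([1.0, 0.0, 1.0])

# "Compare the output to the reality": average the squared differences.
# Training adjusts the layers to push this number toward zero.
mse = np.mean((predicted - actual) ** 2)
print(round(mse, 4))
```

The smaller this error, the more closely the model's output tracks the real-life outcome on the training data.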

There is a problem with this. Given a complex enough neural network, it can be trained to produce an acceptable output on almost any data set, even one completely unrelated to the problem domain.
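The effect shows up even with a much simpler model family than a neural network; here a high-degree polynomial stands in for an over-large network, fitting pure noise essentially perfectly:

```python
import numpy as np

rng = np.random.default_rng(42)

# Ten inputs with completely random outputs -- no real relationship at all.
x = np.linspace(0.0, 1.0, 10)
y = rng.standard_normal(10)

# A model with enough capacity (degree-9 polynomial, 10 free coefficients)
# can pass through every training point.
coeffs = np.polyfit(x, y, deg=9)
fitted = np.polyval(coeffs, x)

max_err = np.max(np.abs(fitted - y))
print(max_err)   # tiny: "perfect" training performance on random data
```

The fit tells us nothing about the world; it only tells us the model had enough knobs to memorize the data it was given.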

And that’s the problem. If any random data set will work for training, then choosing a truly representative data set can be a real challenge. Of course, we would never use a random data set for training; we would use something related to the problem domain. And here is where the potential for bias creeps in.

Bias is disproportionate weight in favor of or against one thing, person, or group compared with another. It’s when we make one choice over another for emotional rather than logical reasons. Of course, computers can’t show emotion, but they can reflect the biases of their data and the biases of their designers. So we have data scientists either working with data sets that don’t completely represent the problem domain, or making incorrect assumptions about the relationships between data and results.

In fact, depending on the data, the bias can be drastic. MIT researchers recently demonstrated Norman, the psychopathic AI. Norman was trained on written captions describing graphic images of death from the darkest corners of Reddit, and it sees only violent imagery in Rorschach inkblot cards. And of course there was Tay, the artificial intelligence chatbot originally released by Microsoft Corporation on Twitter. In less than a day, Twitter users discovered that Tay learned from tweets, and trained it to be obnoxious and racist.

So the data we use to train our neural networks can make a big difference in the results. We might pick out terrorists based on their appearance or religious affiliation, rather than any behavior or criminal record. Or we might deny loans to people based on where they live, rather than their ability to pay.

On the one hand, biases may make machine learning systems seem more, well, human. On the other, we want outcomes from our machine learning systems that accurately reflect the problem domain, not our prejudices. We don’t want our human biases to be inherited by our computers.