Tuesday, November 22, 2016

Once more on artificial intelligence and machine learning

In an earlier blog post,
I expressed my scepticism regarding the scientific value of non-transparent machine learning approaches, which provide a result but no
explanation of how they arrived at it. I am aware that,
by bringing the problem up again, I run the risk of appearing to abuse this blog for my own agenda against artificial intelligence and machine learning approaches in the
historical sciences. However, a
recent post in Nature News (Castelvecchi 2016)
further substantiates my original scepticism and adds some interesting
new perspectives on the scientific and practical consequences, so I could not
resist mentioning it in my post for this month.

Research on artificial neural networks, which underlies today's deep learning
approaches to artificial intelligence and machine learning, goes back to the 1950s. These approaches have now become so successful that they
play an increasingly important role in our daily lives, whether
they recommend to us yet another book that other customers bought
along with the one we are about to buy, or allow us to take a little
nap while driving fancy electric cars, saving carbon emissions for our next round-the-world trip. The same holds, of course, for
science, and in particular for biology, where neural networks have been used
for tasks like homolog detection (Bengio et al. 1990) or
protein classification (Leslie et al. 2004). This is even more true
of linguistics, where a complete subfield, usually called natural language
processing, has emerged (see Hladka and Holub 2015 for an
overview), in which algorithms are trained for various tasks related to
language, ranging from word segmentation in Chinese texts (Cai and Zhao
2016) to the general task of morpheme detection, which
seeks to find the smallest meaningful units in human languages (King
2016).

In the post by Castelvecchi, two aspects triggered my
interest. Firstly, the author emphasizes that the answers which machine
learning approaches produce so easily, and often accurately, do not
automatically provide real insights, quoting Vincenzo Innocente, a physicist at CERN,
who says:

As a scientist ... I am not satisfied with just distinguishing cats from
dogs. A scientist wants to be able to say: "the difference is such and
such." (Vincenco Innocente, quoted by Castelvecchi 2016: 22)

This expresses precisely (and much more transparently) what I tried to
emphasize in my earlier blog post, namely that science is primarily concerned
with the questions why? and how?, and only peripherally with the question what?

The other interesting aspect is that these apparently powerful approaches
can, in fact, be easily fooled. Given that they are trained on certain data,
and that it is usually not known, even to the trainers, which aspects of the
training data effectively trigger a given classification, one can in turn use
algorithms to generate input data that deceive an application, forcing it to give
false responses. Castelvecchi mentions an experiment by Mahendran and Vedaldi
(2015) which illustrates how "a network might see wiggly
lines and classify them as a starfish, or mistake black-and-yellow stripes for
a school bus" (Castelvecchi 2016: 23).

Putting aside the obvious consequences of abusing the neural networks that are
used in our daily lives, this problem is surely not unknown to us as
human beings: we, too, can easily be misled by our expectations, be it
in daily life or in science. This, finally, brings us back to networks and
trees, as we all know how difficult it is at times to see the forest behind the
tree that our software gives us, or the tree inside a forest of incompletely sorted lineages.