Automation tools promise to accelerate machine learning

Dreaming up obscure insults might be a good way to pass the time in a bar, but it’s a strange day job. Nonetheless, it’s a serious business if you are trying to train a machine to spot unacceptable online behaviour. Data scientists not only need to provide training data; they also need to describe which language is likely to offend within that data. The process, known as annotation, is just one of the laborious tasks data scientists face that IT firms are promising to make easier with automation. Amazon, Microsoft, Google and IBM are offering a raft of technologies to automate machine learning processes (see box). But smaller firms are providing more niche technologies.

Automating annotation

Explosion AI provides Prodigy, software that automates parts of annotation. It can extrapolate a corpus of relevant terms from a few seed words, and it helps data scientists quickly confirm the targeted language using a Tinder-like graphical interface.
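To make the idea of growing a term list from a few seed words concrete, here is a minimal sketch in Python. It is not Prodigy's API or its actual method: the toy word vectors, the function names suggest_terms and review, and the choice of cosine similarity to the seeds' average vector are all assumptions made for illustration. The accept-or-reject step stands in for the swipe-style confirmation described above.

```python
import math

# Toy 3-dimensional word vectors, invented purely for illustration.
# A real system would load vectors trained on a large text corpus.
TOY_VECTORS = {
    "idiot":   [0.90, 0.10, 0.00],
    "moron":   [0.85, 0.15, 0.05],
    "fool":    [0.80, 0.20, 0.10],
    "clown":   [0.70, 0.30, 0.20],
    "invoice": [0.00, 0.90, 0.30],
    "refund":  [0.10, 0.80, 0.40],
    "weather": [0.10, 0.20, 0.90],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def suggest_terms(seeds, vectors, top_n=3):
    """Rank non-seed terms by similarity to the average of the seed vectors."""
    known = [vectors[s] for s in seeds if s in vectors]
    if not known:
        return []
    dims = len(known[0])
    centroid = [sum(v[i] for v in known) / len(known) for i in range(dims)]
    scored = [
        (term, cosine(centroid, vec))
        for term, vec in vectors.items()
        if term not in seeds
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [term for term, _ in scored[:top_n]]


def review(suggestions, decide):
    """Keep only the suggestions the annotator accepts (hypothetical helper)."""
    return [term for term in suggestions if decide(term)]


if __name__ == "__main__":
    candidates = suggest_terms(["idiot", "moron"], TOY_VECTORS, top_n=2)
    # The lambda stands in for a human clicking accept or reject.
    accepted = review(candidates, lambda term: term != "clown")
    print(candidates, accepted)
```

The design point is the division of labour: the machine proposes likely candidates, and the human makes only fast yes-or-no decisions, which is where the time saving in annotation would come from.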

Co-founder Ines Montani has demonstrated the efficiency of Prodigy in annotating insulting language to help moderate online behaviour, for example in social media posts or ecommerce feedback comments. The tools have also been used to build applications that analyse text in financial services, she says.

“The bottleneck is training data. Companies are amassing data, hoping they can do something with it. While machine learning might provide some good applications, you still have to document and label the data to use it for training machine learning models,” Montani says.

Lindsay Clark is a freelance journalist specialising in business IT, supply chain management, procurement and business transformation. He has worked as news editor at Computer Weekly and several other leading trade magazines. He has also written for The Guardian, The Financial Times and supplements to The Times.