Stories of AI Failure and How to Avoid Similar AI Fails

January 30, 2020


Don’t fall prey to the AI hype machine. These stories of AI failure are alarming for consumers, embarrassing for the companies involved, and an important reality check for us all. This article covers recent, high-profile AI fails, along with advice on how to avoid an AI failure of your own.

Full disclosure if you’re new to Lexalytics: we provide a software platform that uses AI and machine learning to help people analyze text documents, including tweets, reviews and contracts. But the stories and the advice presented here are relevant for anyone involved in AI/machine learning – and anyone else, really.

Fail: IBM’s “Watson for Oncology” Cancelled After $62 million and Unsafe Treatment Recommendations

No AI project captures the “moonshot” attitude of big tech companies quite like Watson for Oncology. In 2013, IBM partnered with The University of Texas MD Anderson Cancer Center to develop a new “Oncology Expert Advisor” system. The goal? Nothing less than to cure cancer.

The first line of the press release boldly declares, “MD Anderson is using the IBM Watson cognitive computing system for its mission to eradicate cancer.” IBM’s role was to enable clinicians to “uncover valuable insights from the cancer center’s rich patient and research databases.”

So, how’d that go?

“This product is a piece of sh–.”

In July 2018, StatNews reviewed internal IBM documents and found that Watson was giving erroneous, downright dangerous cancer treatment recommendations.

According to StatNews, the documents (internal slide decks) largely place the blame on IBM’s engineers. Evidently, they trained the software on a small number of hypothetical cancer patients, rather than real patient data.

The result? Medical specialists and customers identified “multiple examples of unsafe and incorrect treatment recommendations,” including one case where Watson suggested that doctors give a cancer patient with severe bleeding a drug that could worsen the bleeding.

“This product is a piece of s—,” one doctor at Jupiter Hospital in Florida told IBM executives, according to the documents. “We bought it for marketing and with hopes that you would achieve the vision. We can’t use it for most cases.”

In February 2017, Forbes reported that MD Anderson had “benched” the Watson for Oncology project. A special report from University of Texas auditors said that MD Anderson had spent more than $62 million without reaching their goals.

Fail: Microsoft’s AI Chatbot Corrupted by Twitter Trolls

Microsoft made big headlines in March 2016 when they announced their new chatbot, Tay. Writing with the slang-laden voice of a teenager, Tay could automatically reply to people and engage in “casual and playful conversation” on Twitter.

Tay grew from Microsoft’s efforts to improve their “conversational understanding”. To that end, Tay used machine learning and AI. As more people talked with Tay, Microsoft claimed, the chatbot would learn how to write more naturally and hold better conversations.

Microsoft won’t say exactly how the algorithms worked, of course. Perhaps because of what happened next.

Less than 24 hours after Tay launched, internet trolls had thoroughly “corrupted” the chatbot’s personality.

By flooding the bot with a deluge of racist, misogynistic, and anti-Semitic tweets, Twitter users turned Tay – a chatbot that the Verge described as “a robot parrot with an internet connection” – into a mouthpiece for a terrifying ideology.

Microsoft claimed that their training process for Tay included “relevant public data” that had been cleaned and filtered. But clearly they hadn’t planned for failure, at least not this kind of catastrophe.

After a cursory effort to clean up Tay’s timeline, Microsoft pulled the plug on their unfortunate AI chatbot.

Fail: Apple’s Face ID Defeated by a 3D Mask

Apple released the iPhone X (10? Ten? Eks?) to mixed, but generally positive reviews. The phone’s shiniest new feature was Face ID, a facial recognition system that replaced the fingerprint reader as the primary way to unlock your phone.

Apple said that Face ID used the iPhone X’s advanced front-facing camera and machine learning to create a 3-dimensional map of your face. The machine learning/AI component helped the system adapt to cosmetic changes (such as putting on make-up, donning a pair of glasses, or wrapping a scarf around your neck) without compromising on security.

But a week after the iPhone X’s launch, hackers were already claiming to beat Face ID using 3D printed masks. Vietnam-based security firm Bkav found that they could successfully unlock a Face ID-equipped iPhone by gluing 2D “eyes” to a 3D mask. The mask, made of stone powder, cost around $200. The eyes were simple, printed infrared images.

Bkav’s claims, outlined in a blog post, gained widespread attention, not least because Apple had already written that Face ID was designed to protect against “spoofing by masks or other techniques” using “sophisticated anti-spoofing neural networks”.

Not everyone was convinced by Bkav’s work. Publications such as Wired had already tried and failed to beat Face ID using masks. And Wired’s own article on Bkav’s announcement included some skepticism from Marc Rogers, a researcher for security firm Cloudflare. But the work – and this glimpse into the weakness of AI – is fascinating.

Fail: Amazon Axes their AI for Recruitment Because Their Engineers Trained It to be Misogynistic

Artificial intelligence and machine learning have a huge bias problem. Or rather, they have a huge problem with bias. And the launch, drama, and subsequent ditching of Amazon’s AI for recruitment is the perfect poster child.

Amazon had big dreams for this project. As one Amazon engineer told The Guardian in 2018, “They literally wanted it to be an engine where I’m going to give you 100 résumés, it will spit out the top five, and we’ll hire those.”

The trouble was the training data: the model learned from a decade of résumés submitted to Amazon, most of them from men. According to Reuters, the system taught itself to penalize résumés containing the word “women’s” and to downgrade graduates of two all-women’s colleges. Amazon quietly scrapped the project in 2018.

And Amazon’s bias problems don’t stop at recruiting. Researchers have repeatedly shown that Rekognition, the company’s facial recognition service, is racially biased. In one study, University of Toronto and MIT researchers found that every facial recognition system they tested performed better on lighter-skinned faces, with Rekognition misclassifying the gender of darker-skinned women roughly one time in three. For context, that’s a task where you’d have a 50% chance of success just by guessing randomly.

This is, of course, horrifying. It’s not even an “AI fail” so much as a complete failure of the people, processes and organizations that built these systems.
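The failure mode behind Amazon’s recruiting tool – a model that absorbs the prejudices hidden in its training data – can be illustrated with a toy sketch. To be clear, this is a hypothetical, deliberately naive word-weight scorer; none of the names, résumés, or numbers come from Amazon’s actual system:

```python
# Toy sketch (NOT Amazon's system): a naive resume scorer that learns
# word weights from historical hiring decisions. Because the historical
# hires skew male, the word "women's" only ever appears in rejected
# resumes, so the model learns to penalize it.
from collections import Counter

# (resume words, hired?) -- hypothetical history dominated by male hires
history = [
    (["python", "captain", "chess"], 1),
    (["java", "captain", "football"], 1),
    (["python", "debate"], 1),
    (["java", "women's", "chess"], 0),
    (["python", "women's", "debate"], 0),
]

hired = Counter(w for words, y in history if y == 1 for w in set(words))
rejected = Counter(w for words, y in history if y == 0 for w in set(words))
n_hired = sum(1 for _, y in history if y == 1)
n_rejected = sum(1 for _, y in history if y == 0)

def weight(word):
    # How much more often a word appears in hired vs. rejected resumes.
    return hired[word] / n_hired - rejected[word] / n_rejected

def score(words):
    # Sum of learned word weights: higher score = "stronger" candidate.
    return sum(weight(w) for w in words)

print(weight("women's"))                        # negative: learned bias
print(score(["python", "captain"]))             # positive
print(score(["python", "captain", "women's"]))  # dragged down
```

The model never sees gender as a feature; it simply reproduces the imbalance in its history. That is why “we didn’t tell it about gender” is no defense.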

Real Quick: 5 More AI Fails

In one story, Facebook had to shut down their “Bob” and “Alice” chatbots after the computers started talking to each other in their own language. And that’s just the beginning. Srishti continues with more examples from Mitra, Uber and Amazon.

9 More Ways to Guarantee an AI Fail

Francesco’s list is comprehensive, funny, and thought-provoking. It features some classic paths to failure, such as “Cut R&D to save money” and “Work without a clear vision”. But, Francesco says, “there is a plethora of ways to fail with AI”.

My favorite is #2, “Operate in a technology bubble.”

As Francesco points out, AI doesn’t always fail due to technical problems. Sometimes, the problem is a lack of social need or interest.

“Artificial intelligence technologies cannot be built in isolation from the social circumstances that make them necessary,” Francesco writes.

This is a fantastic point. In the rush to stay ahead of the technology curve, companies often fail to consider the impact of their inherent biases. This is particularly dangerous for companies working in data analytics for healthcare, biotechnology, financial services and law.

Why Maintenance is Critical to Avoiding an Embarrassing AI Failure

Just like a car, Paul explains, an AI can tick along for a while on its own. But failing to maintain it can destroy your project or product, and maybe even your company.

As cars become more complex, insurance companies advise owners to keep up with preventative maintenance before the cost of repairs becomes staggering. Similarly, as an AI grows more complex, the risks and costs of AI failure grow larger. And the longer you wait to repair your AI, the more expensive it’ll be.

Just like your car, an AI requires maintenance to remain robust and valuable. And just as with your car, you may face a sudden, catastrophic failure if you don’t keep up with that maintenance.

In this article, Paul explains how data scientists can avoid AI failure by maintaining it with new training data, methods and models.
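One concrete maintenance practice in this spirit is monitoring whether live inputs have drifted away from the training data – a common trigger for retraining. Here is a minimal, hypothetical sketch (the data and thresholds are invented for illustration):

```python
# Minimal sketch of input-drift monitoring: measure how far the mean of
# live data has moved from the training mean, in units of training
# standard deviations. A large shift suggests the model is due for
# retraining on fresh data.
import statistics

def drift_score(train_values, live_values):
    """Absolute mean shift of live data, in training std deviations."""
    mu = statistics.mean(train_values)
    sigma = statistics.stdev(train_values)
    return abs(statistics.mean(live_values) - mu) / sigma

# Hypothetical feature values (e.g., average document length)
train = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8]
live_ok = [10.1, 9.9, 10.4]
live_drifted = [14.0, 15.2, 14.8]

print(drift_score(train, live_ok))       # small: model likely still fine
print(drift_score(train, live_drifted))  # large: time to retrain
```

Real monitoring would track many features and use more robust statistics, but the principle is the same: you can’t fix drift you never measure.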

How to Get Real Value from Artificial Intelligence in 2020 and Beyond

Big AI projects, such as Watson for Oncology and self-driving cars, get most of the press coverage. But as the past few years have shown, moonshots like these are the most likely to fail. And when they fail, they fail spectacularly (as we’ve been discussing).

How, then, can you build an AI system that actually succeeds? The answer is deceptively simple:

Focus on solving a real business problem.

Our own CEO, Jeff Catlin, has spent the past 15 years watching AI and machine learning get over-hyped and under-deliver. In this article on Forbes, he examines a number of business applications where AI solutions can:

Predict customer churn

Create better surveys

Read and handle online reviews

Craft effective messaging

“Building a business case for AI isn’t so different from building one for any other business problem,” Catlin writes. “First, identify a need and a desired outcome (automation and efficiency are common drivers of successful AI projects). Then undertake a feasibility assessment.”

The key is to look for business use cases where AI is already in action, or where it’s emerging as an effective solution.

Jeff puts it best: “With the right business case and the right data, AI can deliver powerful time and cost savings, as well as valuable insights you can use to improve your business.”