When you can’t trust the machines: Beware of bad data in AI engines

This article was contributed as part of Tearsheet’s new Thought Leaders contributor program. Author Giles Nelson is Chief Technology Officer, Financial Services at MarkLogic.

Financial services companies are deploying artificial intelligence faster than other industries, according to research from Econsultancy and Adobe.

In a survey conducted about a year ago, more than 60 percent of 700 senior financial services leaders said they were already using AI or planned to within the year. That put the industry far ahead of other sectors. Among the financial services companies using AI, almost half used it for data analysis, the survey shows.

No doubt, financial services is betting big on AI. Market researcher IDC predicted that global banks would spend more than $4 billion on AI last year alone. That is a huge investment—made for good reason.

Most financial services firms hold a lot of information about individuals. By using AI to analyze this treasure trove of information, financial services firms can better deliver personalized suggestions, tailored financial products and stand out in an industry full of commoditized products.

America’s biggest bank, JPMorgan Chase, has gone so far as to call AI a ‘game-changer’, the bank’s global head of AI, Apoorv Saxena, told Knowledge@Wharton. Saxena had been an AI executive at Google before joining JPMorgan last year.

The problem of bad data
Yet AI efforts will fall short unless the data that goes into the AI engine is clean—and dirty data is an enormous problem.

When different departments of a financial services giant enter data into separate silos, records get duplicated, names are spelled in different ways, and addresses go stale. Transaction dates, account numbers and personal information may also end up in different formats in different silos—making them difficult to reconcile automatically and therefore impossible to analyze accurately. Dirty data can also lie hidden for years, which makes it even harder to deal with when it is finally found.
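To make the reconciliation problem concrete, here is a minimal sketch in Python. The records, field names and formats are hypothetical: they stand in for the same customer as seen by two departmental silos, where a naive comparison sees two different people until the fields are normalized.

```python
from datetime import datetime

# Hypothetical records for the same customer from two departmental silos.
crm_record   = {"name": "Jon Smith",  "dob": "03/04/1980", "account": "12-3456"}
loans_record = {"name": "John Smith", "dob": "1980-04-03", "account": "123456"}

def normalize_account(acct: str) -> str:
    """Strip punctuation so '12-3456' and '123456' compare equal."""
    return "".join(ch for ch in acct if ch.isdigit())

def normalize_dob(dob: str) -> str:
    """Coerce the date formats seen above into a single ISO form."""
    for fmt in ("%d/%m/%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(dob, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {dob}")

# A field-by-field comparison sees two different customers...
print(crm_record == loans_record)   # False

# ...but after normalization the account number and birth date line up,
# flagging the pair as a likely duplicate for review.
same_account = normalize_account(crm_record["account"]) == \
               normalize_account(loans_record["account"])
same_dob = normalize_dob(crm_record["dob"]) == normalize_dob(loans_record["dob"])
print(same_account and same_dob)    # True
```

Real matching systems go much further (fuzzy name matching, address standardization), but even this toy example shows why records that are "the same" to a human fail automated reconciliation.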

MIT Sloan Management Review reported that bad data “is the norm,” and ends up costing businesses an average of 15 percent to 25 percent of revenue. “These costs come as people accommodate bad data by correcting errors, seeking confirmation in other sources, and dealing with the inevitable mistakes that follow,” the review reported.

If this is the data feeding AI engines, the result will be intelligence that cannot and should not be trusted. In fact, one of the biggest pain points, and a key barrier to AI moving from pilot to production, is the lack of well-curated data to train AI solutions, Forrester Research’s Michele Goetz was quoted as saying in CIO.

Even IBM’s highly lauded Watson has struggled with data issues. IBM and Watson promised to transform cancer care with the help of artificial intelligence at the University of Texas MD Anderson Cancer Center. But the project fell far short of expectations, in part because of struggles to integrate AI software with complicated healthcare data.

Get data ready for AI
Before deploying AI solutions, companies need to get their data in order. That includes such things as:

Unifying data. This gives companies a comprehensive view of all their data, making it easier to spot, and then correct, errors and inconsistencies. Nothing kills a marketing effort to sell someone a new mortgage or other financial product faster than a personal offer that gets their name, address or financial details wrong.

Integrating data. By integrating data from different silos, AI will more easily spot useful insights rather than simply crunch one or two data sets that won’t reveal the whole picture of a consumer’s financial preferences and commitments. AI thrives on lots of data. To make AI useful, data from different parts of a financial services company needs to be accessible so AI systems can use it.

Implementing good data governance. This will help build confidence in the data. Too often, data exists in isolation, with no record of its provenance: when it was created, by whom, and whether and by whom it was changed. Such metadata is critical to being able to trust the data that feeds any analysis, whether driven by AI or not.

Securing the data. When data is appropriately secured—which includes such things as having proper access controls, personal information that’s appropriately restricted and security that moves with the data—data can then be safely shared. Being able to share data—whether across companies or even with partners—will enable more analysis and insights to be gleaned.
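The governance and security points above can be sketched in miniature. The class and field names below are hypothetical, not from any real product: a record wrapper that carries provenance metadata (who created and changed the data, and when) and performs a simple role-based access check before the data is shared.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class GovernedRecord:
    """Illustrative only: data plus provenance metadata and access control."""
    data: dict
    created_by: str
    created_at: str
    history: list = field(default_factory=list)    # audit trail of changes
    allowed_roles: set = field(default_factory=set)

    def update(self, user: str, changes: dict) -> None:
        """Apply a change and record who made it, and when."""
        self.history.append({
            "user": user,
            "at": datetime.now(timezone.utc).isoformat(),
            "changes": dict(changes),
        })
        self.data.update(changes)

    def read(self, role: str) -> dict:
        """Return the data only to roles on the record's access list."""
        if role not in self.allowed_roles:
            raise PermissionError(f"role '{role}' may not read this record")
        return self.data

record = GovernedRecord(
    data={"customer_id": "C-1001", "segment": "retail"},
    created_by="crm_import_job",
    created_at="2019-01-15T09:30:00Z",
    allowed_roles={"analyst", "compliance"},
)
record.update(user="jdoe", changes={"segment": "premium"})

print(record.read("analyst"))   # the updated record
print(len(record.history))      # 1 recorded change
```

In production these concerns are handled by the database and identity layers rather than application code, but the principle is the same: every piece of data carries its lineage, and access rules travel with the record rather than with the application reading it.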

Getting data in shape will have benefits for financial services firms beyond AI. Missing, incomplete and inaccurate data can also lead to the wrong trades being made and slower decisions. As data and privacy regulations continue to mount, any effort taken at the front end to get data in shape will also pay off in faster regulatory compliance.

ABN AMRO, for instance, aggregates vast amounts of unstructured and structured trade data into one central operational trade store. With a consistent, transparent record of every order and trade event, ABN AMRO is able to comply with internal and external reporting requirements in a fast and flexible manner. Also, by searching and analyzing all of its trade data in new ways, ABN AMRO can see future benefits such as discovering trends in how trading occurs.

Personalizing financial services
The holy grail of any industry is to improve the customer journey, delight customers and provide them with goods and services they need and may not even know they want until they’re exposed to them.

In the Econsultancy survey, companies said their top priorities were to optimize the customer experience and make data-driven marketing decisions focused on the individual.

AI solutions will no doubt help financial services firms do this better and better. But to achieve its full potential, AI needs the best data possible. And to deliver the best data possible, financial organizations need a holistic view of their data along with agile data models that can evolve as business requirements change.