5 Q’s for Financial Text Analysis Expert Aneet Kumar

The Center for Data Innovation spoke with Aneet Kumar, president of financial text analysis software company Contexxia. Aneet talked about the information overload financial analysts face and some of the unusual insights her company’s software has turned up.

Aneet Kumar: Contexxia is a software product we’ve created to bring transparency to U.S. Securities and Exchange Commission (SEC) filings. We’ve used natural language processing and semantic technologies to provide three different views into filings, thereby allowing the user to grasp hard-to-understand data fairly easily. The users are regulators, investors, academic institutions, auditors, investigative journalists, and others.

Korte: How does it work?

Kumar: We get the SEC filings in real time as they are posted on the SEC’s Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system. We run them through an extensive model we’ve created, which breaks each filing down into individual pieces of text and then puts them back together in an HTML-like format. It looks like the same filing, but each fragment has its own identifier within our tool. This allows us to use natural language processing to compare fragments with previous filings from the same organization fairly easily. We’ve also looked at unstructured data, since we believe there is a host of information in footnotes and other annexes.

People do this kind of analysis on a regular basis, but it takes a lot of effort to find changes a company has made. Sometimes critical pieces of information are even being hidden in these filings, and what Contexxia does is let the user identify all changes that have occurred between this filing and a previous filing. Most of these filings have a lot of repetition of content, so people have content overload. Getting the real value out of these filings is a real problem.
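
As a rough illustration of the fragment-and-compare idea described above, the following Python sketch splits two filings into paragraph-level fragments, gives each fragment an identifier, and lists the fragments in the newer filing that have no counterpart in the older one. This is not Contexxia’s actual model, which the interview does not detail. The function names, the hash-based identifiers, and the sample text are all assumptions made for the example.

```python
import hashlib
import re


def split_into_fragments(filing_text):
    """Split a filing into paragraph-level fragments on blank lines."""
    parts = re.split(r"\n\s*\n", filing_text)
    return [p.strip() for p in parts if p.strip()]


def fragment_id(fragment):
    """Derive an identifier from the fragment's normalized text.

    Assumption: a truncated SHA-256 hash stands in for whatever
    identifier scheme a real tool would use.
    """
    normalized = " ".join(fragment.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]


def changed_fragments(previous_text, current_text):
    """Return fragments of the current filing with no exact counterpart
    (after case and whitespace normalization) in the previous filing."""
    previous_ids = {fragment_id(f) for f in split_into_fragments(previous_text)}
    return [
        f for f in split_into_fragments(current_text)
        if fragment_id(f) not in previous_ids
    ]


# Hypothetical filing excerpts, invented for this example.
previous = """We sell widgets in North America.

Our Western division faces supply constraints."""

current = """We sell  widgets in North America.

Our Western division was sold in March."""

print(changed_fragments(previous, current))
# ['Our Western division was sold in March.']
```

Exact matching after normalization filters out the repeated content and leaves only what is new. It would not catch a paragraph that was lightly reworded, which calls for a similarity measure instead.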

Another view in our tool lets the user see all the events that have occurred over a period of time for a company. This could include lawsuits, acquisitions, mergers, or new management being hired. All those critical events are sorted out so a user can very quickly get to know what’s going on at a company. The third view, which is the numerical view, uses the XBRL data to pull out all the numeric information. Users can see if the company is tagging new concepts in XBRL. If the same data has been tagged in two different ways, this can raise questions, so users might want to look at the information in more detail. We also flag outliers, such as values in the numeric data that have increased by more than 500 percent, so users can spot them quickly.
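
The outlier check in the numerical view can be sketched in a few lines of Python. This is a minimal illustration, assuming the numeric facts from two filings have already been extracted into dictionaries keyed by concept name. The concept names, the values, and the function names are hypothetical; only the 500 percent threshold comes from the interview.

```python
def percent_change(previous, current):
    """Percent change from previous to current, or None if previous is zero."""
    if previous == 0:
        return None
    return (current - previous) / abs(previous) * 100.0


def flag_outliers(previous_values, current_values, threshold=500.0):
    """Return concepts whose value increased by more than `threshold` percent.

    Concepts that appear only in the current filing are skipped here;
    a real tool might report them separately as newly tagged concepts.
    """
    flagged = {}
    for concept, current in current_values.items():
        if concept not in previous_values:
            continue
        change = percent_change(previous_values[concept], current)
        if change is not None and change > threshold:
            flagged[concept] = change
    return flagged


# Hypothetical values, invented for this example.
prior = {"Revenues": 1000.0, "LegalExpenses": 20.0, "Inventory": 300.0}
latest = {"Revenues": 1100.0, "LegalExpenses": 150.0,
          "Inventory": 310.0, "Goodwill": 50.0}

print(flag_outliers(prior, latest))
# {'LegalExpenses': 650.0}
```

Here the legal expense line rises from 20 to 150, a 650 percent increase, so it is the only item flagged.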

Korte: Talk about the information overload. Without using Contexxia, what would the analyst face? Where is the labor saved with this product?

Kumar: If you look at traditional documents, they sometimes run into thousands of pages, but so much of the content is repetitive. Sometimes there is a lot of wordsmithing going on. We had a Fortune 500 CEO come into our office and say, “We spent days trying to hide this information and Contexxia found it right away.” If only the numbers or the wording of a paragraph change but the content is the same, it’s hard to determine which changes in the document are really meaningful. Reading those kinds of documents becomes extremely difficult for an analyst. You could also have the same information mentioned in completely different sections from filing to filing. An analyst using traditional tools would find it very difficult to learn that the two sections refer to the same content. Contexxia finds this information easily.
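
To illustrate how reworded content that has moved between sections might be paired up, the sketch below scores word-level similarity with `difflib.SequenceMatcher` from Python’s standard library. This is a simple stand-in, not the natural language processing Contexxia uses, which the interview does not describe. The section names, the sample text, and the function names are assumptions made for the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Word-level similarity between two fragments, from 0.0 to 1.0."""
    a_words = a.lower().split()
    b_words = b.lower().split()
    return SequenceMatcher(None, a_words, b_words).ratio()


def best_match(fragment, sections):
    """Find the section whose text is most similar to `fragment`.

    `sections` maps a section name to its text.
    Returns a (section_name, score) pair.
    """
    best_section, best_score = None, 0.0
    for name, text in sections.items():
        score = similarity(fragment, text)
        if score > best_score:
            best_section, best_score = name, score
    return best_section, best_score


# Hypothetical fragment from an earlier filing's risk section.
old_fragment = ("Our Western division is experiencing supply "
                "constraints that may affect results.")

# Hypothetical sections from the next filing.
new_sections = {
    "Business": "We sell widgets across North America through retail partners.",
    "Subsequent Events": ("Our Western division was experiencing supply "
                          "constraints that affected results and was sold."),
}

section, score = best_match(old_fragment, new_sections)
print(section, round(score, 2))
```

The earlier fragment is matched to the later section even though the wording and the location both changed. A production system would likely need a more robust measure, since word overlap alone misses paraphrases that share few words.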

Korte: Last time we spoke you mentioned a case study of a company attempting to hide information. Can you describe it further?

Kumar: There was a company that mentioned in a very minor way that there was a division that was running into some kind of issues. Using Contexxia, we saw in the next filing that the division had been sold. This was in an entirely different section: it wasn’t mentioned under risk. We picked out the two related text fragments immediately. We see this fairly often. We are able to find a lot of these fragments where a minor mention of something leads to a major event in the next filing. An analyst looking at changes in a regular filing might not pay as much attention to the minor mention, but Contexxia highlights it.

There was another very big company that went into bankruptcy recently, and in its previous filing it made no mention of any risk. They said they had no risks to report. Then, in the next quarterly filing, they talked about running into serious money issues, and they filed for bankruptcy two months later. We detected it. And everything they talked about was so visible that a class-action suit was filed against them because the company had not disclosed the risk.

Korte: Talk a little about your vision for the future of Contexxia. Do you see the product being useful beyond SEC filings?

Kumar: Already in the filing space we’ve been working with regulators to expand the software. We’ve also started adding new features around pre-filed documents so companies can use this for compliance purposes or audit the filing before they submit it. Other areas we’re looking into in the future are health and contracts, both of which we believe will lend themselves very well to Contexxia’s feature set and model.

Travis Korte is a research analyst at the Center for Data Innovation specializing in data science applications and open data. He has a background in journalism, computer science and statistics. Prior to joining the Center for Data Innovation, he launched the Science vertical of The Huffington Post and served as its Associate Editor, covering a wide range of science and technology topics. He has worked on data science projects with HuffPost and other organizations. Before this, he graduated with highest honors from the University of California, Berkeley, having studied critical theory and completed coursework in computer science and economics. His research interests are in computational social science and using data to engage with complex social systems. You can follow him on Twitter @traviskorte.