Mine your language: Software decodes company reports

Company financial reports don't usually make for thrilling reading, but with the ability to make or break fortunes, they come under intense scrutiny. Now software that can extract information from the nuanced language of such reports could provide investors with the edge they need to stay ahead of the competition.

"Financial statements carry important information about the health of reporting companies," says Chao-Lin Liu at National Chengchi University in Taipei, Taiwan. But companies habitually downplay negative aspects by using ambiguous language and burying nuggets of information in pages of droning prose.

Text-mining techniques generally concentrate on single words: counting the number of negative or positive words in a body of text can give an indication of its overall tone, for example. But some words, such as "increased", cannot be classed as positive or negative in isolation, says team member Yuan-Chen Chang: increased profits are good news, increased costs are not. So the team designed an algorithm to recognise meaningful phrases instead (arxiv.org/abs/1210.3865).
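To see why single-word counting struggles, here is a minimal sketch of the word-count approach. The two tiny word lists are hypothetical and chosen only for illustration; real systems use much larger, finance-specific lexicons. The score is simply (positive count minus negative count) divided by the number of words.

```python
import re

# Hypothetical, tiny sentiment word lists, for illustration only.
POSITIVE = {"growth", "profit", "profits", "improved", "strong", "gain"}
NEGATIVE = {"loss", "losses", "decline", "adverse", "weak", "impairment"}


def tone_score(text: str) -> float:
    """Return (positive words - negative words) / total words."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)  # True counts as 1
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


# "increased" is in neither list, so both sentences score the same,
# even though one is good news and the other is not.
print(tone_score("Revenue increased"))
print(tone_score("Costs increased"))
print(tone_score("Strong growth despite a small loss"))
```

Both "increased" sentences come out neutral, which is exactly the blind spot that motivates looking at phrases rather than single words.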

To do this, Liu and his colleagues use statistical models to automatically identify what they call opinion patterns - subjective phrases paired with an opinion holder. For example, the sentence "The Company believes the profits could be adversely affected" contains the opinion holder "The Company" and the subjective phrases "believes" and "could be adversely affected".
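The sketch below only illustrates the kind of output an opinion pattern represents: a holder plus its subjective phrases. It is not the team's method. They learn patterns with statistical models, whereas this is a hand-written matcher, and the cue lists `HOLDERS` and `SUBJECTIVE` and the function `find_opinion_pattern` are hypothetical names invented for this example.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical cue lists; a statistical model would learn these instead.
HOLDERS = ["the company", "management", "we"]
SUBJECTIVE = ["believes", "expects", "could be adversely affected", "may decline"]


@dataclass
class OpinionPattern:
    holder: Optional[str]
    phrases: List[str] = field(default_factory=list)


def find_opinion_pattern(sentence: str) -> OpinionPattern:
    """Pair the first matching opinion holder with any subjective phrases."""
    holder = None
    for h in HOLDERS:
        # Whole-word, case-insensitive match; keep the sentence's own casing.
        m = re.search(r"\b" + re.escape(h) + r"\b", sentence, re.IGNORECASE)
        if m:
            holder = m.group(0)
            break
    lowered = sentence.lower()
    # Keep the subjective phrases in the order they appear in the sentence.
    phrases = sorted((p for p in SUBJECTIVE if p in lowered), key=lowered.index)
    return OpinionPattern(holder, phrases)


print(find_opinion_pattern(
    "The Company believes the profits could be adversely affected"))
```

Applied to the article's example sentence, it pairs "The Company" with "believes" and "could be adversely affected".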

"Computer linguistics and automated textual-information processing are one of the new frontiers in the world of finance," says Werner Antweiler of the Sauder School of Business at the University of British Columbia in Vancouver, Canada. "This technique adds another tool to our statistical toolbox of text-mining algorithms."

Trading algorithms mostly rely on quantitative information, says Liu, "but it is obvious that textual information should be considered as well". For example, the team's software could flag up phrases that don't appear to tally with a company's stated earnings, prompting a financial analyst to take a closer look.

"Numbers can be used to convey a picture that does not correspond to reality," says Vincent Papa, director of financial-reporting policy at the CFA Institute in London. "They tend not to reveal what really keeps managers awake at night. The tone of a report is a very useful complementary piece of information."

Murray Frank at the University of Minnesota in Minneapolis points out that sophisticated linguistic analysis is a very hard task. The software needs to learn which words are positive, which are negative and which are neither. Phrases need to appear often enough for a statistical-learning algorithm to accurately categorise them. Multi-word phrases might not occur often enough to help. "Bundles of words tend to be rare things," he says.
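Frank's point about rare word bundles can be made concrete by counting phrases of increasing length (n-grams) in a toy corpus and keeping only those seen at least twice. The four sentences and the cut-off `min_count=2` are hypothetical choices for this sketch; a learning algorithm would need far more examples than this.

```python
from collections import Counter

# Hypothetical toy corpus of report-style sentences.
CORPUS = [
    "profits could be adversely affected",
    "revenue could be adversely affected",
    "margins could be materially affected",
    "profits increased",
]


def count_ngrams(sentences, n):
    """Count n-word phrases, never crossing sentence boundaries."""
    counts = Counter()
    for s in sentences:
        toks = s.lower().split()
        counts.update(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return counts


def frequent_ngrams(sentences, n, min_count):
    """Keep only phrases frequent enough to learn from."""
    return {g: c for g, c in count_ngrams(sentences, n).items() if c >= min_count}


# How many phrases of each length survive the frequency cut-off?
for n in range(1, 6):
    print(n, len(frequent_ngrams(CORPUS, n, min_count=2)))
```

Even in this tiny corpus, the number of phrases that pass the cut-off shrinks as phrases get longer, from five single words down to none for five-word phrases.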

The whole point of the financial reporting system is to release information in a way that is fair to all investors, says Frank. "But if you can guess correctly ahead of others, you can make a lot of money." If the team's system provides an edge, it could prove extremely valuable.

Mining text to monitor trends and opinions in the financial world is a rapidly growing field. "Some of the news-feed providers such as Reuters already use sentiment analysis," says Antweiler. But such technology shouldn't be relied on for automated decisions, he warns. "The simple truth is that text mining can be helpful, but it doesn't replace sound judgement and common sense."

Mining for a gaming smash

The success of video games could be predicted by data mining. Christian Bauckhage at the University of Bonn, Germany, and colleagues applied pattern recognition and statistical analysis techniques to data gathered from more than 250,000 players of five blockbuster games in the months after their release. They found that the decline in how often, and for how long, people played each game fitted well-known mathematical models.

If similar monitoring were done during pre-release consumer testing, game publishers could use these models to predict the popularity and lifespan of a new game once it hits the market. The work was presented at the Game/AI conference in Vienna, Austria, last month.
