Taking the Pulse of the World

“Buy the rumour, sell the news” is an old adage that many people in finance might still consider true. In their daily business, finance professionals have to analyze an ever-increasing stream of news, and separating the wheat from the chaff is becoming a formidable challenge. Automation can help substantially with this task and may give an investor an information edge.

To process news feeds, a few companies have started offering products and services that analyze the sentiment of a text, where by sentiment we mean the general mood: bullish or bearish. Sentiment analysis, coupled with some basic keyword-based filtering, can provide an indication of the mood of the market with respect to a given business sector or stock. For example, we can apply a filter to a news stream and select only those news items that contain either the word Apple or AAPL. We can subsequently use a sentiment analysis Software-as-a-Service (SaaS) to evaluate the average bullishness or bearishness of the news and act upon it. But there is more to it, because sentiment analysis is just a special application of the much broader discipline called text classification, a branch of Natural Language Processing.
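As a rough illustration of the filter-then-average idea, here is a minimal Python sketch. It assumes each news item already carries a sentiment score between -1 (bearish) and +1 (bullish), as a sentiment service might return; the item layout, the function names and the sample headlines are hypothetical and exist only for this example.

```python
import re
from statistics import mean


def filter_news(news_items, keywords):
    """Keep only the items whose text mentions at least one keyword.

    Matching is case-insensitive and on whole words, so "Apple"
    does not match "Pineapple".
    """
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [item for item in news_items if pattern.search(item["text"])]


def average_sentiment(news_items):
    """Average the per-item scores; return None when nothing was selected."""
    if not news_items:
        return None
    return mean(item["score"] for item in news_items)


# Made-up headlines with made-up scores (assumption: -1 bearish, +1 bullish).
NEWS = [
    {"text": "Apple beats earnings expectations", "score": 0.8},
    {"text": "AAPL slides after supply chain warning", "score": -0.4},
    {"text": "Oil prices climb on supply fears", "score": -0.2},
]

selected = filter_news(NEWS, ["Apple", "AAPL"])
mood = average_sentiment(selected)
print(len(selected), "items selected, average sentiment:", mood)
```

A plain average treats every item equally; weighting by source or by recency would be a natural refinement, but that is a design choice outside the scope of this sketch.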

Natural Language Processing is a mix of linguistics and machine learning whose main goals include the understanding of human (natural) language. Many things that are easy and straightforward for humans can be difficult to replicate with a computer. Recognizing a face, navigating a building or understanding a sentence are three common examples. These problems can become more manageable by circumscribing their field of application and limiting the range of possible solutions. For this reason, a substantial amount of resources has been devoted to text classification, also called document classification or document categorization. Text classification is the task of assigning a document, for example a piece of news or an email, to one or more classes or categories. Assigning a piece of news to either the positive or the negative class is what we normally refer to as sentiment analysis, but automatic (or algorithmic) text classification is currently employed in a broad range of applications, for example to filter spam, spot topics, route email and identify languages, just to name a few.

Machine learning practitioners usually raise an eyebrow when they hear the term Artificial Intelligence, which is too easily linked to popular science fiction movies, but that is what text classification aims at. Machine learning algorithms, and statistical learning algorithms in particular, automatically learn the rules needed to classify text and in this way mimic human intelligence. Algorithms are trained on a set of examples provided by a human operator. This means that a human operator must compile a list of manually classified text snippets (for example, news), which are then used by the AI algorithm to infer the classification rules. A typical input file would contain from a few hundred to a few thousand lines like the following:

“Positive”, “ACME Inc has just announced excellent result for the last quarter”

“Negative”, “The high price of oil is negatively affecting the operations of ACME Inc”

“Negative”, “The layoff of 1’500 ACME Inc. employees has caused riots in the streets of Utopia”

“Positive”, “The soaring demand for ACME’s widgets is bringing about unexpected financial results for this company”
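To make the idea of learning rules from labelled lines concrete, the sketch below trains a small classifier on the four examples above. The choice of a multinomial naive Bayes model with add-one smoothing is an assumption made for illustration; the text does not say which statistical method is used in practice, and four examples are far too few for real use. The class and function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# The four labelled lines shown above, as (label, text) pairs.
TRAINING = [
    ("Positive", "ACME Inc has just announced excellent result for the last quarter"),
    ("Negative", "The high price of oil is negatively affecting the operations of ACME Inc"),
    ("Negative", "The layoff of 1’500 ACME Inc. employees has caused riots in the streets of Utopia"),
    ("Positive", "The soaring demand for ACME’s widgets is bringing about unexpected financial results for this company"),
]


def tokenize(text):
    """Lower-case the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        self.class_counts = Counter()           # examples seen per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()
        for label, text in examples:
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = NaiveBayes().fit(TRAINING)
print(model.predict("Soaring demand brings excellent results"))
print(model.predict("Riots and layoff at the oil company"))
```

The model never looks at grammar or meaning: it only counts which words tend to appear under which label, which is the kind of simple statistical approach the article contrasts with semantic methods.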

Although some companies advocate the use of semantic methodologies, which assign a meaning to the words in each single sentence and eventually to the sentence as a whole, simple and robust statistical methods can provide a high level of classification performance. Typically, 8 to 9 sentences out of 10 are correctly classified during a standard out-of-sample classifier validation. (In such a validation, a classifier is typically trained on 90% of the examples and tested on the remaining 10%; this procedure is repeated 10 times and the results are averaged.) To gauge the performance of a classifier, just imagine how a junior employee of your company would do on a classification test. Instead of spending months transferring your experience to younger colleagues, you can now transfer most of it (but, beware, in a very limited field) in just a few minutes by training an AI-based classifier.
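The validation procedure described above can be sketched as follows. One common reading of “train on 90%, test on 10%, repeat 10 times” is 10-fold cross-validation, where each example is held out exactly once; that reading is an assumption here. The helper names, the toy word-vote learner and the synthetic data are hypothetical and serve only to make the example runnable.

```python
import random
from collections import Counter, defaultdict


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    if n < k:
        raise ValueError("need at least as many examples as folds")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(examples, train, k=10, seed=0):
    """Return the accuracy averaged over k held-out folds.

    `train(examples)` must return a function mapping a text to a label.
    """
    accuracies = []
    for held_out in k_fold_indices(len(examples), k, seed):
        held = set(held_out)
        train_set = [ex for i, ex in enumerate(examples) if i not in held]
        test_set = [examples[i] for i in held_out]
        predict = train(train_set)
        correct = sum(predict(text) == label for label, text in test_set)
        accuracies.append(correct / len(test_set))
    return sum(accuracies) / len(accuracies)


def train_word_votes(examples):
    """Toy learner: each word votes for the labels it was seen with."""
    votes = defaultdict(Counter)
    for label, text in examples:
        for word in text.lower().split():
            votes[word][label] += 1

    def predict(text):
        tally = Counter()
        for word in text.lower().split():
            tally.update(votes.get(word, {}))
        return tally.most_common(1)[0][0] if tally else None

    return predict


# Synthetic, deliberately trivial data: 10 positive and 10 negative snippets.
DATA = [("Positive", "shares up"), ("Negative", "shares down")] * 10

accuracy = cross_validate(DATA, train_word_votes, k=10)
print("average out-of-sample accuracy:", accuracy)
```

On this trivially separable data the averaged accuracy is perfect; on real news the figure quoted above, 8 to 9 correct out of 10, is the more realistic expectation.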

The current “off-the-shelf” commodity sentiment analysis services offer already-trained classifiers, to relieve their clients of the tedious task of manual classification in preparation for classifier training. Although this makes business sense, pre-trained classifiers have the major shortcoming of being intended as a good-enough solution for a wide range of users and applications, which ends up hurting their performance. In reality, each risk manager and trader might have her own definition of the sentiment to be attached to a piece of news. What is positive for one person might be negative for another, depending on personal experience and on material, objective reasons. On top of that, each person is usually interested in only a subset of all available news, which might be either processed completely automatically, with its aggregated statistics displayed on a dashboard, or automatically selected for a more careful, human analysis.

The Jamie Oliver Version

It is straightforward to conclude that there is no unique, one-size-fits-all solution to the problem of text classification in financial applications. Its future lies instead in the development of tailored or highly customizable solutions, where each user can autonomously train an automated system to, for example, select news relevant to an investment portfolio and analyze its sentiment in the context of the portfolio’s exposures. The difference between text classification done by a “commodity” service and a customizable solution is similar to the difference between going to a fast-food restaurant and having a chef cook your favorite dish at your place. The good thing is that, in the case of text classification, the chef is not much more expensive than the fast food. On a commodity server, the time needed to train a state-of-the-art classifier on a few thousand examples is barely enough to drink an espresso. In both cases, following the right recipe, whether cooked by a data scientist or by Jamie Oliver himself, is the most critical step of the process.

Configurable and customizable text classification is a technology that is not only within reach, but is already being used in production in the context of Customer Experience Management (CEM) to analyze in real time thousands of pieces of feedback from the clients of cable TV and telecommunication operators, airlines and retailers. In a world where competition and technology have commoditized services and products, the personal relationship between a corporation and its customers is increasingly becoming a success factor. Although “personal” and “corporation” are words that, historically, haven’t gone together well, Customer Experience Managers are focusing their efforts on collecting their customers’ comments, the so-called Voice of the Customer (VOC), and turning that information into actionable knowledge. As CTO of CustVox AG, a Zurich-based company specializing in solutions for CEM, I designed and developed a Web API and application for customizable text classification that is gaining wide acceptance in several large corporations.

Although some high-tech hedge funds have been using text classification for quite a few years, its importance in financial applications has been acknowledged only recently. The fields of application are manifold, and the collaboration of practitioners, technologists and researchers is all that is needed to bring it to maturity. Visionaries can already see real-time sentiment indicators focused on specific business and geographical areas flickering on their monitors, taking the pulse of a world where brains and voices enjoy the same level and speed of connectivity that exchanges and electronic marketplaces have. That is quite some food for thought for those who think that moving averages and Bollinger bands are all they need.