Treasure Hunting in the 21st Century: Seeking Alpha in the Infinite Ocean of Data

Mon, 27 Jul 2015 03:34:59 GMT

By Igor Tulchinsky

Many investors rely on traditional research to drive their investment decisions, diving deep into the financial histories and management practices of a relatively small number of companies. As long as such research yields unique insight, the best practitioners of this rigorous, time-tested and time-intensive discipline will continue to earn above-market returns.

But by some estimates, seventy percent of total trade volume is now driven and executed by computer code. Algorithmic activity encompasses most medium-frequency and all high-frequency trading worldwide. This kind of trading analyzes and detects patterns in massive amounts of numerical data, scanning trillions of bits of data in milliseconds. The more data an algorithm can make use of, the argument goes, the greater its effectiveness.

Let’s look at this thought process more carefully. There are three observations to consider and an important conclusion that follows from them.

First, data has indeed become “big data”; in finance it is naturally very big, and growing. The size and growth rate of the digital data universe can only be estimated, and estimates vary widely. One respected source (http://www.emc.com/leadership/digital-universe/2012iview/executive-summary-a-universe-of.htm) projects that from 2005 to 2020, data will have grown by a factor of about 300, from 130 exabytes to 40,000 exabytes, or 40 trillion gigabytes. From 2013 until 2020, the amount of data will roughly double every two years. Putting this last fact in mathematical terms, the growth of Big Data is exponential, proportional to (√2)^t with t the time in years, which is very rapid indeed. Retrieval and storage capacities must grow to match if the data is to be used.
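
To make the arithmetic concrete, here is a minimal Python sketch of the growth figures quoted above. It assumes clean exponential growth, which the estimates only approximate, and the function names are illustrative rather than taken from any source.

```python
import math


def growth_factor(years: float, doubling_time: float = 2.0) -> float:
    """Multiplicative growth over `years` for a quantity that doubles
    every `doubling_time` years (assumes pure exponential growth)."""
    return 2.0 ** (years / doubling_time)


def implied_doubling_time(start: float, end: float, years: float) -> float:
    """Doubling time implied by growing from `start` to `end` in `years`."""
    return years * math.log(2.0) / math.log(end / start)


# Doubling every two years is the same as multiplying by sqrt(2) each year.
annual = growth_factor(1)

# Growth over the seven years from 2013 to 2020 at that pace.
factor_2013_2020 = growth_factor(7)

# Figures quoted in the text: 130 exabytes (2005) to 40,000 exabytes (2020).
overall = 40_000 / 130
doubling = implied_doubling_time(130, 40_000, 15)

print(f"annual growth factor:        {annual:.3f}")
print(f"growth 2013-2020:            {factor_2013_2020:.1f}x")
print(f"growth 2005-2020:            {overall:.0f}x")
print(f"implied doubling time (yrs): {doubling:.2f}")
```

Under these assumptions the 2005-to-2020 figures imply a doubling time somewhat shorter than two years, which is consistent with the "roughly" in the estimate.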

Second, big data requires big data processing capacity (and big data itself derives in part from big processing capacity). Moore’s “Law” tells us that processing capacity doubles roughly every one and a half to three years (though the pace has slowed of late, and unpredictable changes loom as components grow tinier and quantum effects begin to interfere; perhaps one day these effects will be harnessed). Thus the growth of processing capacity is also exponential, and at about the same rate as data growth: between 2^(t/3) and 2^(t/1.5), with t the time in years. That is an annual factor of about 1.26 to 1.59, compared with √2, or about 1.41, for data. It is not clear which is faster, but what matters is that both are exponential in time.

Third, in finance big and growing data is recursive in a way that it is not in other domains. In medical research, for example, previously invisible patterns among widely disparate biological variables lead to new hypotheses about both illness and treatment. These have already yielded a rich harvest of new treatments, and the potential for treatments designed on a per-individual basis. But however large medically pertinent data grows, the underlying universe of all such possible facts is static. It doesn’t change by being used. In finance this is not so. The simplest example of this is the impact of a large trade on the bid-ask spread. But a much larger effect is due to the impact of all trades. Market data by definition reflects trading action, but trading rules and action, especially in the algorithmic domain, depend on that data. Thus the domain of working trading rules is ever-changing.

It follows that alpha (let’s use the plural noun alphas for trading rules that yield superior performance) grows without bound: ever bigger data and computational capacity enlarge the ocean of alphas; they do not exhaust it. Crucially, this ocean is enlarging at a much faster rate than either data or processing capacity.
This is because candidate alphas represent operations on permutations and combinations of the data; and as processing capacity increases, the length and complexity of the time-series data that can be used to formulate alphas both increase. If data grows exponentially, and if processing does, too, then the number of possible alphas grows superexponentially. The number of alphas to hunt and discover is for practical purposes infinite. This is not a mere guess: formal studies of agent-based market models show this explicitly (for example, see the “Minority Game” developed at the Santa Fe Institute and a central object of study in the burgeoning domain of Econophysics: http://www3.unifr.ch/econophysics/). In even the very simplest binary prediction models of just a single time series, “strategy space” (the ocean of all possible alphas) grows with m, the number of prior data points used, as 2^(2^m), i.e., superexponentially.

This is not all. The SEC recently decided to allow companies to release material information via Twitter. But it is not just financial data from new media sources that affects markets. Social chatter of almost any kind can influence the market in milliseconds. While data is growing exponentially, the amount of information that moves prices is virtually endless, limited only by the amount of total activity in the economy, which is itself increasing. The result is a premium for those who digest the data faster, but it is not possible to catch up, that is, to exhaust the search space. There will always be room for more alphas.

Extracting signals from an ever-expanding ocean of noise is a growing challenge. The solution space is non-convex, discontinuous, and dynamic: good signals often arise where least expected. How does one extract such signals?
By limiting the search space, using methods long used by treasure hunters: search in the vicinity of previous discoveries; conserve resources to avoid digging too deep; use validated cues to improve the probability of a find. Yet always allocate some processing power to test wild ideas.

Will exponentially growing data lead to ever diminishing returns? For each individual alpha, the answer is “yes.” For alphas in the aggregate, the answer is “no.”

Who then can use this amount of data? Who can afford to use it? In an ocean of exponentially expanding data and processing, the successful treasure hunter is the one with the most alphas, who harnesses the latest machines to find and use new signals faster than the competition. The complexity and dimensionality of the alpha search game will keep increasing. The proportion of easy-to-find alphas will keep decreasing. The advantage will increasingly go to the trader who sees the whole picture, who can combine millions and billions of ever-fainter and subtler signals. It will take ever faster, bigger and better-equipped ships to search the ever-growing ocean successfully. What will follow is sharp differentiation, with a few sophisticated and technologically well-equipped players able to demonstrate an ever-adaptive, unbounded capacity to discover new alphas.
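
As a footnote to the strategy-space count quoted earlier, the following minimal Python sketch enumerates every strategy of the simplest binary model for small m and checks the total against 2^(2^m). It assumes that a strategy is a lookup table assigning one binary prediction to each of the 2^m possible m-bit histories, which is the usual Minority Game setup; the function names are hypothetical.

```python
from itertools import product


def count_strategies(m: int) -> int:
    """Closed-form count: one of 2 predictions for each of 2**m histories."""
    return 2 ** (2 ** m)


def enumerate_strategies(m: int):
    """Yield every strategy as a dict mapping an m-bit history (a tuple
    of 0s and 1s) to a predicted next bit."""
    histories = list(product((0, 1), repeat=m))
    for predictions in product((0, 1), repeat=len(histories)):
        yield dict(zip(histories, predictions))


# Brute-force enumeration is only feasible for very small m,
# which is itself an illustration of the superexponential growth.
for m in range(1, 5):
    enumerated = sum(1 for _ in enumerate_strategies(m))
    assert enumerated == count_strategies(m)
    print(f"m={m}: {enumerated} strategies")

# Beyond that, only the closed form is practical.
print(f"m=8: a {len(str(count_strategies(8)))}-digit number of strategies")
```

Even at m = 5 the count already exceeds four billion, so exhaustive search gives out almost immediately and some way of limiting the search space becomes unavoidable.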