Introduction

Perhaps more than any other kind of time series data, financial markets have been scrutinized by countless mathematicians, economists, investors, and speculators over hundreds of years. Even today, despite all scientific advances, the effort to predict future movements of the stock market sometimes still resembles the ancient alchemical aspiration of turning base metals into gold. That is not to say that there is no genuine scientific effort in studying financial markets, but distinguishing serious research from charlatanism (or even fraud) remains remarkably difficult.

We neither aspire to develop a crystal ball for investors nor expect to contribute to the economic and econometric literature. However, we find the wealth of financial market data to be fertile ground for experimenting with knowledge discovery algorithms and for generating knowledge representations in the form of Bayesian networks. This domain can serve as a practical demonstration of the capabilities of Bayesian networks, because we can quickly compare machine-learned findings with our own understanding of market dynamics. For instance, the prevailing opinions among investors about the relationships between major stocks should be reflected in any structure our algorithms discover.

More specifically, we will use the unsupervised and supervised learning algorithms of the BayesiaLab software package to automatically generate Bayesian networks from daily stock returns over a six-year period. We will examine the 459 stocks of the S&P 500 index for which observations are available over the entire timeframe. We chose the S&P 500 as the basis for our study because the companies in this index are presumably among the best-known corporations worldwide, so even a casual observer should be able to critically review the machine-learned findings. In other words, we are trying to machine-learn the obvious, so that any mistakes in this process would immediately become evident. Experts often react to such machine-learned findings with, “well, we already knew that.” That is precisely the point we want to make: machine learning can, within seconds, catch up with human expertise accumulated over years and then rapidly expand beyond what is already known.

The power of such algorithmic learning will be even more apparent in entirely unknown domains. However, if we were to machine-learn the structure of a foreign equity market for expository purposes in this paper, many readers would likely be unable to judge immediately whether the resulting structure is plausible.

In addition to generating human-readable and interpretable structures, we want to illustrate how machine-learned Bayesian networks can immediately serve as “computable knowledge” for automated inference and prediction. Our objective is to gain both a qualitative and a quantitative understanding of the stock market by using Bayesian networks. In the quantitative context, we will also show how BayesiaLab can reliably carry out inference with multiple pieces of uncertain and even conflicting evidence. The inherent ability of Bayesian networks to perform computations under uncertainty makes them highly suitable for a wide range of real-world applications.

Continuing the practice established in our previous white papers, we attempt to present the proposed approach in the style of a tutorial, so that each step can be immediately replicated (and scrutinized) by any reader equipped with the BayesiaLab software. This reflects our desire to establish a high degree of transparency regarding all proposed methods and to minimize the risk of Bayesian networks being perceived as a black-box technology.