Each year, the field of computer science becomes more sophisticated as new technologies hit the market. Despite this progress, the problem of developing intelligent agents that accurately simulate human brain activity remains unsolved. One of the most prominent models of intelligent agents built in computer memory is the neural network (NN). Thus, in this article, the reader will be introduced to the basics of NNs, along with a prediction pattern that can be used successfully in different types of "smart" applications. Specifically, a financial predictor based upon neural networks will be explored.

During my intellectual trip into the world of artificial intelligence, I was fascinated by how "magically" a correctly constructed artificial neural network (specifically, a feed-forward network) can predict values according to those specified at the input. This "forecasting" capability makes them a perfect tool for several types of applications:

Function interpolation and approximation

Prediction of trends in numerical data

Prediction of movements in financial markets

All the examples are actually very similar, because in mathematical terms you are trying to define a prediction function F(X1, X2, ..., Xn) which, according to the input data (vector [X1, X2, ..., Xn]), is going to "guess" (interpolate) the output Y. The most exciting domain of prediction lies in the field of financial markets. An investment strategy based on computer intelligence sounds like a very promising and interesting field of study. Next, I'm going to describe a relatively simple program which will attempt to predict the S&P500, DOW, and NASDAQ Composite indexes, and the Prime Interest Rate, according to the input data which will be described shortly. Before going into details, I would like to warn you that the entire article is written for educational purposes, thus the described application cannot be used in a real-world scenario.

The data that will be fed to the neural network at the input represents historical data of the S&P500, DOW, NASDAQ Composite, and Prime Interest Rate. In general terms, these are leading indicators of stock market activity that share a common fluctuation pattern.

The S&P500 is a free-float capitalization-weighted index, published since 1957, of the prices of 500 large-cap common stocks actively traded in the United States. The stocks included in the S&P500 are those of large publicly held companies that trade on either of the two largest American stock exchanges: the NYSE Euronext and the NASDAQ OMX. The S&P500 is one of the most widely followed indexes of large-cap American stocks. It is considered a bellwether for the American economy, and is included in the Index of Leading Indicators. S&P500 index fluctuations depend upon a lot of factors, thus the entire prediction pattern is very complex. In this application, the input data is represented only by historical items of 4 important economic indicators. It is essential to mention that if you want a better predictor, you should feed your neural network with more indicators that are more or less important for the entire interpolation.

As you can see in Figure 1, the value of the S&P500 has generally increased over time, with a significant decrease during the years 2000-2005.

The Dow Jones Industrial Average (DJIA), also referred to as the Industrial Average, the Dow Jones, the Dow 30, or simply the Dow, is a stock market index, and one of several indices created by Wall Street Journal editor and Dow Jones & Company co-founder Charles Dow. It is an index that shows how 30 large, publicly owned companies based in the United States have traded during a standard trading session in the stock market. Along with the NASDAQ Composite, the S&P500 Index, and the Russell 2000 Index, the Dow is among the most closely watched benchmark indices tracking targeted stock market activity. To calculate the DJIA, the sum of the prices of all 30 stocks is divided by a Divisor, the Dow Divisor. The divisor is adjusted in case of stock splits, spinoffs or similar structural changes, to ensure that such events do not in themselves alter the numerical value of the DJIA.

The NASDAQ Composite is a stock market index of the common stocks and similar securities listed on the NASDAQ stock market, meaning that it has over 3,000 components. It is highly followed in the U.S. as an indicator of the performance of stocks of technology companies and growth companies. Since both U.S. and non-U.S. companies are listed on the NASDAQ stock market, the index is not exclusively a U.S. index.

Prime rate, or prime lending rate, is a term applied in many countries to a reference interest rate used by banks. The term originally indicated the rate of interest at which banks lent to favored customers, i.e., those with high creditworthiness, though this is no longer always the case. Some variable interest rates may be expressed as a percentage above or below the prime rate. Generally, the prime interest rate is a significant determinant in the world of financial markets. This is because monetary policy is aimed at influencing domestic interest rates, which drive currency rates relative to other currencies with different interest rates. Domestic interest rates also influence overall economic activity, with lower interest rates typically stimulating borrowing, investment, and consumption, while higher interest rates tend to reduce borrowing and increase saving over consumption. Below is shown the Federal Funds Rate history graph; this data will be used in the current application.

Neural networks have been used with computers since the 1950s. Through the years, many different models have been presented. The perceptron is one of the earliest neural networks; it was an attempt to understand human memory, learning and cognitive processes. To construct a computer capable of "human-like thought", researchers used the only working model they had available - the human brain. However, the human brain as a whole is far too complex to model; rather, the individual cells that make up the human brain are studied. The following introduces the schema of the most widely used artificial neural network.

For the task of predicting the indexes, we'll be using the so-called multilayer feed-forward network, which is the best choice for this type of application. In a feed-forward neural network, neurons are only connected forward: each layer of the neural network contains connections to the next layer, but there are no connections back. Typically, the network consists of a set of sensory units (source nodes) that constitute the input layer, one or more hidden layers of computation nodes, and an output layer of computation nodes. In common use, most neural networks have one hidden layer, and it's very rare for a neural network to have more than two. The input signal propagates through the network in a forward direction, on a layer-by-layer basis. These neural networks are commonly referred to as multilayer perceptrons (MLPs). Shown below is a simple MLP with 4 inputs, 1 output, and 1 hidden layer.
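To make the forward-only, layer-by-layer data flow concrete, here is a minimal plain-Java sketch (not the article's Encog code; the weights and inputs are purely illustrative) of a 4-input MLP with one hidden layer of 3 neurons and 1 output:

```java
// Minimal forward pass through an MLP: input -> hidden -> output.
// Signals only flow forward; there are no backward connections.
public class MlpForward {
    static double sigmoid(double v) { return 1.0 / (1.0 + Math.exp(-v)); }

    // One layer: out[i] = sigmoid(sum_j w[i][j] * in[j] + b[i])
    static double[] layer(double[] in, double[][] w, double[] b) {
        double[] out = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            double v = b[i];
            for (int j = 0; j < in.length; j++) v += w[i][j] * in[j];
            out[i] = sigmoid(v);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] input = {0.2, 0.5, 0.1, 0.9};            // 4 input units
        double[][] wHidden = { {0.1, -0.3, 0.2, 0.4},     // illustrative weights
                               {0.5, 0.1, -0.2, 0.3},
                               {-0.4, 0.2, 0.6, -0.1} };
        double[] bHidden = {0.1, -0.1, 0.05};
        double[][] wOut = { {0.3, -0.5, 0.8} };
        double[] bOut = {0.2};

        double[] hidden = layer(input, wHidden, bHidden); // input layer -> hidden layer
        double[] output = layer(hidden, wOut, bOut);      // hidden layer -> output layer
        System.out.println(output[0]);
    }
}
```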

The input layer is the conduit through which the external environment presents a pattern to the neural network. Once a pattern is presented to the input layer, the output layer will produce another pattern. In essence, this is all the neural network does - it matches the input pattern to the one that best fits the training output. It is important to remember that the inputs to the neural network are floating-point numbers, represented as the C# double type (most of the time you'll be limited to this type).

The output layer of the neural network is what actually presents a pattern to the external environment (the result of the computation). The number of output neurons should be directly related to the type of work that the neural network is to perform.

There are really two decisions that must be made regarding the hidden layers: how many hidden layers to have in the network, and how many neurons will be in each of these layers. Problems that require two hidden layers are rarely encountered, and there is currently no theoretical reason to use neural networks with more than two hidden layers; almost all problems solved by neural networks today are fine with just one. Even though the hidden layers do not directly interact with the external environment, they have a tremendous influence on the final output, so you should carefully choose the number of neurons within them. Using too few neurons in the hidden layers results in so-called "under-fitting", which occurs when the hidden layers are not able to adequately detect the signals in a complicated data set. The opposite problem, "over-fitting", can occur when the neural network has so much information-processing capacity that the limited amount of information contained in the training set is not enough to train all of the neurons in the hidden layers. There are many rule-of-thumb methods for determining the correct number of neurons to use in the hidden layers; here are just a few of them:

The number of hidden neurons should be between the size of the input layer and the size of the output layer.

The number of hidden neurons should be 2/3 the size of the input layer plus the size of the output layer.

The number of hidden neurons should be less than twice the size of the input layer.
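As a quick illustration, here is a plain-Java sketch applying these rules of thumb to the 40-input, 4-output network used later in this article (keep in mind these are heuristics, not exact formulas):

```java
// The three rule-of-thumb hidden-layer sizes quoted above,
// evaluated for a 40-input / 4-output network.
public class HiddenSizeRules {
    // Rule 2: 2/3 of the input layer size plus the output layer size.
    static int twoThirdsRule(int inputs, int outputs) {
        return (2 * inputs) / 3 + outputs;
    }

    public static void main(String[] args) {
        int inputs = 40, outputs = 4;
        // Rule 1: somewhere between the output size and the input size.
        System.out.println("rule 1 range: [" + outputs + ", " + inputs + "]");
        // Rule 2: 2/3 * 40 + 4 = 30 hidden neurons.
        System.out.println("rule 2: " + twoThirdsRule(inputs, outputs));
        // Rule 3: fewer than twice the input size, i.e. under 80.
        System.out.println("rule 3 upper bound: " + (2 * inputs));
    }
}
```

Interestingly, the 41-neuron hidden layers chosen later in the article sit within the bounds of rules 1 and 3, though above the rule-2 estimate.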

Multilayer perceptrons have been applied successfully to some difficult and diverse problems by training them in a supervised manner with a highly popular algorithm known as the error back-propagation algorithm (described further below). Please note that in our application we will be using the Resilient Propagation algorithm, which is closely related to back-propagation. The neural network itself will be composed of neurons (the main information-processing units, analogous to neurons within a human brain) of the same kind, placed within different layers. They all exhibit the same characteristics; hence, if you understand how one neuron is designed, you will have no problem understanding how the entire network works. Generally, the model of a neuron can be summarized in the following block diagram:

One can see that there are 3 basic elements of a neuronal model:

A set of synapses, or connecting links, each characterized by a weight or strength of its own: inputs X1, X2, ..., Xm with corresponding weights Wk1, Wk2, ..., Wkm. As you will see further on, the weights represent the "knowledge" that the neural network contains about specific training data. Their values directly affect the output of the neural network.

An adder for summing the input signals, weighted by the respective synapses of the neuron: Vk = ∑(WkjXj) + bk, where k=[1,r] (r = number of neurons), j=[1,m] (m = number of input synapses), and bk is the bias term. Simply speaking, each input signal Xj is multiplied by the weight Wkj and summed in the adder with all the other items. The result of this summation, Vk, goes to the input of the activation function.

An activation function for limiting the output of a neuron: Yk = Φ(Vk). The activation function has an important role in the schema of a neuron: it generates the output according to the summed input signal calculated in the adder. Summarized, the output signal of each neuron can be defined as follows: Yk = Φ(∑(WkjXj) + bk). It is important to emphasize that if you want to use the Back Propagation learning algorithm for training, then your activation function must be differentiable. This requirement comes from the fact that the method computes the gradient of the error function at each iteration step, so we must guarantee the continuity and differentiability of the error function. A commonly used non-linearity that satisfies this requirement is the sigmoid non-linearity defined by the logistic function: Φ(v) = 1/(1+exp(-αv)), where α is the slope parameter of the sigmoid function. By varying the parameter α, we obtain sigmoid functions of different slopes, as illustrated in the following figure (3 different α values):
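The neuron model above (adder followed by logistic activation) can be sketched in a few lines of plain Java; the weights, inputs and α values here are arbitrary, chosen only to show the slope effect:

```java
// One artificial neuron: Vk = Σ Wkj·Xj + bk, then Yk = Φ(Vk) = 1 / (1 + e^(-αVk)).
public class Neuron {
    // The adder block: weighted sum of the inputs plus the bias.
    static double adder(double[] x, double[] w, double bias) {
        double v = bias;
        for (int j = 0; j < x.length; j++) v += w[j] * x[j];
        return v;
    }

    // The logistic activation with slope parameter alpha.
    static double logistic(double v, double alpha) {
        return 1.0 / (1.0 + Math.exp(-alpha * v));
    }

    public static void main(String[] args) {
        double[] x = {1.0, 0.5, -0.25};   // input signals
        double[] w = {0.4, -0.2, 0.1};    // synaptic weights
        double v = adder(x, w, 0.05);
        // A larger slope alpha pushes the output closer to 0 or 1 for the same Vk.
        for (double alpha : new double[]{0.5, 1.0, 4.0})
            System.out.println("alpha=" + alpha + " -> " + logistic(v, alpha));
    }
}
```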

Training is the means by which the weights and threshold values of a neural network are adjusted to give desirable outputs, making the network adjust its response to the values that best fit the training data. Propagation training is a form of supervised training, where the expected output is given to the training algorithm. It can be a very effective form of training for feed-forward, simple recurrent and other types of neural networks. There are several forms of propagation training; we will analyze two of them.

The Back Propagation algorithm is by far one of the most commonly used learning algorithms. It is a supervised learning method, and is a generalization of the delta rule. It requires a teacher that knows, or can calculate, the desired output for any input in the training set.

Generally, it can be summarized in the following main steps:

Present a training sample to the neural network.

Compare the network's output to the desired output from that sample. Calculate the error in each output neuron.

For each neuron, calculate what the output should have been, and a scaling factor indicating how much the output must be adjusted (lower or higher) to match the desired output. This is the local error.

Adjust the weights of each neuron to lower the local error.

Assign blame for the local error to neurons at the previous level, giving greater responsibility to neurons connected by stronger weights.

Repeat from step 3 on the neurons at the previous level, using each one's blame as its error.
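Steps 2-4 above can be sketched for a single sigmoid output neuron as follows (a plain-Java illustration of the underlying delta rule, not the article's Encog code; the learning rate, inputs and target are arbitrary):

```java
// One delta-rule update for a single sigmoid neuron: compute the output error,
// scale it by the sigmoid derivative (the "local error"), then adjust the weights.
public class DeltaRuleStep {
    static double sigmoid(double v) { return 1.0 / (1.0 + Math.exp(-v)); }

    // Returns updated weights after one training sample (x, target).
    static double[] update(double[] w, double[] x, double target, double lr) {
        double v = 0;
        for (int j = 0; j < x.length; j++) v += w[j] * x[j];
        double y = sigmoid(v);                  // step 1: present the sample
        double error = target - y;              // step 2: output error
        double delta = error * y * (1 - y);     // step 3: local error (scaled by sigmoid derivative)
        double[] updated = w.clone();
        for (int j = 0; j < x.length; j++)
            updated[j] += lr * delta * x[j];    // step 4: adjust weights to lower the local error
        return updated;
    }

    public static void main(String[] args) {
        double[] w = {0.2, -0.4};
        double[] x = {1.0, 1.0};
        for (int i = 0; i < 100; i++) w = update(w, x, 1.0, 0.5);
        System.out.println("output after training: " + sigmoid(w[0] * x[0] + w[1] * x[1]));
    }
}
```

Steps 5-6 (assigning blame backwards through the hidden layers) are what turn this single-neuron rule into full back-propagation.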

In the figure below, one can visualize the process by which the neural network is trained to work as an XOR logical gate.

Generally, the XOR problem is considered the "Hello World" application in this field of science. The purpose is very straightforward: we will make our neural network "smart enough" to solve the XOR problem.

Truth table:

X1 | X2 | Result
---+----+-------
 0 |  0 |   0
 0 |  1 |   1
 1 |  0 |   1
 1 |  1 |   0

The structure of the neural network is very simple: the input layer consists of 2 elements (the XOR gate needs 2 Boolean values as input parameters, thus the input is of size 2). The hidden layer contains 3 neurons, and finally the output layer has one neuron, which represents the result of the XOR operation. At its initial stage (Iteration 0), the weights between the neurons are assigned random values, so the network does not yet contain any valuable information. Once we start using the Back Propagation algorithm (Iterations 1-59), the weights between the neurons are adjusted in a manner that decreases the error rate and generates the output we expect. By Iteration 59 we achieve an acceptable error rate, the training process ends, and we can proudly say that the network contains enough "knowledge" to solve the XOR problem. By visualizing the way the values change, you can observe that during the initial iterations they fluctuate dramatically on each step (mathematically speaking, the algorithm tries to find the steepest descent of the error function). Once the error value starts decreasing significantly (Iterations 30-59), the weights of the neural network are adjusted in a more granular fashion. The network was trained with the 4 input combinations of the XOR gate. Because of the 2D limitation, the figure itself shows only 1 training sample: (True, True), encoded as (1, 1), which ultimately should generate False (encoded as 0) at the output. If you are interested in more details about this algorithm, please consult any available material on it. I will not discuss the mathematics behind the Back Propagation algorithm, because we'll use a framework which already has this algorithm implemented (the Encog framework).
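The whole procedure can be condensed into a self-contained plain-Java trainer for the same 2-3-1 XOR topology (a sketch rather than the Encog implementation the article actually uses; the seed, learning rate and epoch count are arbitrary choices):

```java
import java.util.Random;

// Plain-Java backpropagation trainer for XOR: 2 inputs, 3 hidden neurons, 1 output.
public class XorBackprop {
    static double sigmoid(double v) { return 1.0 / (1.0 + Math.exp(-v)); }

    double[][] wH = new double[3][2]; double[] bH = new double[3]; // hidden layer
    double[] wO = new double[3]; double bO;                        // output neuron

    XorBackprop(long seed) {
        // Iteration 0: random weights, the network "knows" nothing yet.
        Random r = new Random(seed);
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 2; j++) wH[i][j] = r.nextDouble() - 0.5;
            bH[i] = r.nextDouble() - 0.5;
            wO[i] = r.nextDouble() - 0.5;
        }
        bO = r.nextDouble() - 0.5;
    }

    double predict(double[] x, double[] hOut) {
        for (int i = 0; i < 3; i++)
            hOut[i] = sigmoid(wH[i][0] * x[0] + wH[i][1] * x[1] + bH[i]);
        return sigmoid(wO[0] * hOut[0] + wO[1] * hOut[1] + wO[2] * hOut[2] + bO);
    }

    // One backprop pass over a single sample; lr is the learning rate.
    void train(double[] x, double target, double lr) {
        double[] h = new double[3];
        double y = predict(x, h);
        double dOut = (target - y) * y * (1 - y);           // local error, output layer
        for (int i = 0; i < 3; i++) {
            double dHid = dOut * wO[i] * h[i] * (1 - h[i]); // blame assigned backwards
            wO[i] += lr * dOut * h[i];
            wH[i][0] += lr * dHid * x[0];
            wH[i][1] += lr * dHid * x[1];
            bH[i] += lr * dHid;
        }
        bO += lr * dOut;
    }

    static double mse(XorBackprop net, double[][] xs, double[] ts) {
        double s = 0; double[] h = new double[3];
        for (int k = 0; k < xs.length; k++) {
            double e = ts[k] - net.predict(xs[k], h);
            s += e * e;
        }
        return s / xs.length;
    }

    public static void main(String[] args) {
        double[][] xs = {{0,0},{0,1},{1,0},{1,1}}; // the 4 XOR combinations
        double[] ts = {0, 1, 1, 0};
        XorBackprop net = new XorBackprop(42);
        double before = mse(net, xs, ts);
        for (int epoch = 0; epoch < 20000; epoch++)
            for (int k = 0; k < 4; k++) net.train(xs[k], ts[k], 0.5);
        System.out.println("MSE before: " + before + ", after: " + mse(net, xs, ts));
    }
}
```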

One of the problems with the Back Propagation training algorithm is the degree to which the weights are changed. In order to better understand the way the error decreases, consider the following error surface:

Our initial point resides where the error value is highest. The goal of any training algorithm is to minimize the error function. In an ideal case, the algorithm will choose the path (from an infinite number of paths) to the global minimum, thus achieving the best possible adjustment of the weight components. Unfortunately, the Back Propagation algorithm doesn't handle well scenarios where the error surface contains local minima. There is a high probability that the chosen path will lead the error decrease in the direction of a local minimum. Once it reaches the point where the error cannot decrease anymore (getting stuck in the depression), it stops looking for new paths (simply speaking, it won't be able to "jump" out of the local minimum "hole"). In order to use a "smarter" way of searching for the global minimum, the Resilient Propagation algorithm was introduced. Whereas the Back Propagation algorithm can often apply too large a change to the weight matrix (the delta parameter being too big, which may significantly alter the path chosen in the direction of error decrease), the Resilient Propagation training algorithm only uses the sign of the gradient, not its value (which minimizes the chance of falling into the local minimum trap). Once the magnitude is discarded, it is only important whether the gradient is positive, negative or near zero. The Resilient Propagation (RPROP) algorithm is usually the most efficient training algorithm provided by Encog (the framework used in this application) for supervised feed-forward neural networks. One particular advantage of the RPROP algorithm is that it requires no parameters to be set before using it: there are no learning rates, momentum values or update constants that need to be determined. This is good, because it can be difficult to determine the exact learning rate that might be optimal.
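The sign-only update rule can be sketched for a single weight as follows. This is a simplified illustration: the full RPROP algorithm also reverts the previous weight change on a sign flip, and the η+ = 1.2 / η- = 0.5 acceleration constants used below are the commonly published defaults, assumed here rather than taken from the article:

```java
// Simplified RPROP-style update for one weight: only the gradient's sign is
// used; the per-weight step size grows while the sign repeats and shrinks
// when the sign flips (an overshoot past a minimum).
public class RpropStep {
    double step = 0.1;      // initial per-weight step size
    double prevSign = 0;    // sign of the previous gradient

    // Returns the weight change for the current gradient.
    double update(double gradient) {
        double sign = Math.signum(gradient);
        if (sign * prevSign > 0)                           // same direction: accelerate
            step = Math.min(step * 1.2, 50.0);
        else if (sign * prevSign < 0)                      // sign flipped: back off
            step = Math.max(step * 0.5, 1e-6);
        prevSign = sign;
        return -sign * step; // the gradient's magnitude is deliberately ignored
    }

    public static void main(String[] args) {
        RpropStep r = new RpropStep();
        System.out.println(r.update(0.8));  // first step
        System.out.println(r.update(0.3));  // same sign -> larger step
        System.out.println(r.update(-0.1)); // sign flip -> smaller step, reversed
    }
}
```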

As stated earlier, we are going to feed the neural network with the historical data of the indexes described above. One important aspect of the input data is that it is formed of 10 consecutive values (sorted by date) of each of the 4 parameters (S&P500, NASDAQ Composite, DOW, Prime Interest Rate) - 40 input values in total, corresponding to a granularity of 10 business days. The network will try to predict the 11th value, corresponding to the next day in the row, of each of the indexes (4 output values). Speaking mathematically, 10 previous points are used to interpolate the next coordinate through which the functions of the NASDAQ Composite, DOW, S&P500 and Prime Interest Rate will pass.
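This windowing scheme can be sketched as follows (plain Java; for brevity a single series is windowed, whereas the article concatenates 4 of them into 40 inputs, and most of the sample values below are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Builds (input, target) training pairs from a time series: each window of
// 10 consecutive values is an input, and the 11th value is the target.
public class SlidingWindow {
    static List<double[][]> makePairs(double[] series, int window) {
        List<double[][]> pairs = new ArrayList<>();
        for (int start = 0; start + window < series.length; start++) {
            double[] input = new double[window];
            System.arraycopy(series, start, input, 0, window);
            double[] target = {series[start + window]}; // the "11th" value
            pairs.add(new double[][]{input, target});
        }
        return pairs;
    }

    public static void main(String[] args) {
        // 11 closing values (mostly invented for illustration).
        double[] closes = {2288.55, 2301.66, 2295.10, 2310.42, 2299.80,
                           2305.33, 2312.71, 2308.26, 2220.04, 2231.65, 2246.93};
        List<double[][]> pairs = makePairs(closes, 10);
        System.out.println(pairs.size() + " training pair(s)");
    }
}
```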

Pairs used in prediction:

#       | 1        | 2        | ... | 10       | 11
--------+----------+----------+-----+----------+----
NASDAQ  | 2288.55  | 2301.66  | ... | 2231.65  | ?
DOW     | 12376.72 | 12319.73 | ... | 12350.61 | ?
S&P500  | 1110.88  | 1112.92  | ... | 1099.5   | ?
PIR     | 3.25     | 3.25     | ... | 3.25     | ?

One of the important heuristics for making the neural network perform better relates to input normalization. Each input variable should be preprocessed so that its mean value, averaged over the entire training set, is close to zero, or else small compared to its standard deviation. The ranges of the indexes vary considerably, as their domains are totally different. In order to bring each index into a common range, we are going to use the following simple min-max formula: Index(x) = (Index(x) - Min(Index))/(Max(Index) - Min(Index)), which maps each index to [0, 1]. Thus each of the input variables will lie in the same range. Take a look at the regression plot of the training set. Each of the figures corresponds to a specific target from the output array. As all the R parameters are very close to 1, the correlation between the outputs and the targets is very high (the regression plot can be generated using the Neural Network Toolbox from MATLAB).
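The normalization step can be sketched in plain Java as follows. Note that the min-max formula as written maps values to [0, 1]; multiplying the result by 2 and subtracting 1 would shift it to [-1, 1] if needed:

```java
// Min-max normalization per the formula above:
// Index(x) = (Index(x) - Min(Index)) / (Max(Index) - Min(Index)), range [0, 1].
public class Normalize {
    static double[] minMax(double[] values) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double v : values) { min = Math.min(min, v); max = Math.max(max, v); }
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++)
            out[i] = (values[i] - min) / (max - min); // min maps to 0, max maps to 1
        return out;
    }

    public static void main(String[] args) {
        // The three S&P500 values from the table above.
        double[] sp500 = {1110.88, 1112.92, 1099.5};
        for (double v : minMax(sp500)) System.out.println(v);
    }
}
```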

It is hard to overestimate the importance of normalization. If it is not used, you will most probably fail to train your network, because the weights won't be able to adjust accordingly.

The 4 outputs correspond to the indexes at the input (S&P500, DOW, NASDAQ Composite, and Prime Interest Rate). The neural network's job will be to find hidden patterns in the input data that influence the overall output. After training the network using a 40-41-41-4 topology (40 input units, 2 hidden layers with 41 units each, 4 outputs) and trying to predict the values, the following results have been obtained:

As one can see, the network is able to interpolate the results fairly well. The error rate summed over the entire training session decreased to a value of ~0.008. Of course, you cannot consider this data as an input to your investment strategy, since past information does not really indicate future returns (a more granular approach should be developed, as the fluctuations depend on much other strategic data), but for academic purposes we can consider this a good result.

Neural networks have a great capability of finding hidden patterns and trends if they are provided, in the training session, with a reasonable amount of input data and desired outputs. As the number of input parameters increases, the quality of prediction increases as well. Thus, for a better index predictor, you would want to use more parameters than just the prime interest rate and the indexes' historical data. Anyway, as the purpose of this article is simplicity, we'll feed the neural network just the parameters specified above. During the application development, the Encog framework was used to build and train the neural network. Personally, I consider this library to be the best choice for applications that use the Java or .NET platform and require AI constructs. It has a lot of already-written algorithms, so it can greatly help you in developing applications of this kind.

As you can see, the training procedure runs until the error rate becomes less than the MaxError constant. In any case, one can abort the training session if such a need arises. The network creation method is very straightforward: you specify the number of hidden units and layers at the input and get a new BasicNetwork created. Each layer within the newly created network will have a hyperbolic tangent activation function - ActivationTANH. You can view the CreateNetwork method below:

The constants used in this method can be inferred from the article's description: INPUT_TUPLES = 10 (the number of pairs used in prediction, corresponding to 10 business days), INDEXES_TO_CONSIDER = 4 and OUTPUT_SIZE = 4 (4 indexes at the input/output).
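Since the CreateNetwork listing itself is not reproduced here, below is a dependency-free plain-Java sketch that derives the same 40-41-41-4 layer sizes from these constants (the article's actual method builds an Encog BasicNetwork with ActivationTANH layers; this stand-in only illustrates the topology):

```java
// Derives the 40-41-41-4 layer sizes from the article's constants.
public class NetworkTopology {
    static final int INPUT_TUPLES = 10;       // 10 business days per window
    static final int INDEXES_TO_CONSIDER = 4; // S&P500, DOW, NASDAQ, PIR
    static final int OUTPUT_SIZE = 4;         // one prediction per index

    static int[] layerSizes(int hiddenLayers, int hiddenUnits) {
        int[] sizes = new int[hiddenLayers + 2];
        sizes[0] = INPUT_TUPLES * INDEXES_TO_CONSIDER; // 40 input units
        for (int i = 1; i <= hiddenLayers; i++) sizes[i] = hiddenUnits;
        sizes[sizes.length - 1] = OUTPUT_SIZE;
        return sizes;
    }

    public static void main(String[] args) {
        int[] sizes = layerSizes(2, 41); // the topology used in the article
        StringBuilder sb = new StringBuilder();
        for (int s : sizes) sb.append(s).append(' ');
        System.out.println(sb.toString().trim()); // 40 41 41 4
    }
}
```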

In this article, the topic of neural networks and their prediction capabilities has been analyzed. Feed-forward neural networks proved to be a reliable solution for applications that need to predict something. Generally speaking, function interpolation is one of the major fields of study in the stock market environment. A strategy based upon technical indicators can really help you achieve good trading results. Of course, the application presented in this article cannot be used in a real-world environment, because normally you would need not only an almost precise prediction, but also a program that performs the market analysis in short bursts (every 15-30 seconds), as opposed to the values predicted in this application (closing stock values). In order to achieve better results, you would rather combine a classical trading strategy with one based upon real-time technical indicators. As for studying purposes, the main objective has been achieved. It is important to mention that the Encog framework for neural networks was used while developing the application. In my opinion, it is the best choice one can make while choosing an API for NNs. Thanks for reading.


About the Author

Interested in computer science, math, research, and everything that relates to innovation. Fan of agnostic programming, don't mind developing under any platform/framework if it explores interesting topics. In search of a better programming paradigm.

Hi,
It's really a very nice implementation and demonstration of the usage of neural networks.
I tried to use it to actually predict the next day's open, close, high and low prices... and finally I found that there is a major flaw here. For every prediction, you always consider the last 10 days of actual data... ideally it should be the predicted data (if there is no prediction for that date, then we can take the actual data). As we are taking the actual data, the percentage difference between the original and the prediction will always be smaller...

Nice article and source code. I have a question for you: are you still programming, and are you still programming in the financial sector? I was wondering if you could lead me in the right direction. How can I construct an AI to identify what number(s) are currently leading? In other words, there are numbers (0-9); if I needed to find out the top 80%, or even the top 60%, of numbers currently leading, how would I go about doing this?

I understand the aim of your article is to provide an explanation of neural networks and in this respect you've done a great job. That said, the manner in which you've approached the prediction problem leaves a lot to be desired. Your inputs are non-stationary and the network is way too big in terms of number of neurons. For example, you could generate a random walk and use lagged values to predict future values and you'll get a great fit but that doesn't mean very much. Try calculating log returns for both inputs and target. If you do you'll find the results almost random - i.e. very little evidence of any predictive ability.

Firstly, thanks for pointing me in the direction of Encog - it has saved me loads of time!
I have searched the Encog site for an answer to the question I am about to ask.

The purposes of this application as an analysis tool are cool, but how can you predict the next day's figures? I note the csv files go up to a certain date, but how can you predict beyond that? I haven't seen a suitable answer to this anywhere on the Encog forums, and help would be appreciated.

The Financial Predictor is performing predictions within the date range of the input data. It fails to predict results beyond the last date. What is the best way to modify this code to perform predictions into the near future?

You mention in your article on ANN financial prediction that it is important to normalize the input values to a range [-1, 1] and you have provided a simple formula: Index(x) = (Index(x) - Min(Index))/(Max(Index) - Min(Index)).
This formula appears to produce results in the range [0, 1]. Could you please tell me if this is the intended result and, if so, how you then convert the values to the desired range [-1, 1].
Thanks
Steve

Hi Steve, indeed it somehow slipped past me that in the final version of the app I've used normalization ranging in [0, 1]. Anyway, it doesn't change much (because it's not a big deviation from the activation function's range). You could still try normalizing to [-1, 1]; the corresponding formulas can be found here[^]. The Min/Max formula used in the article can be easily adapted to the new range: e.g., 1 - 2*(existing formula), or other adaptations that can be derived ad hoc. Also, if you change the normalization procedure, it should also be changed in the Predict method of the PredictIndicator class.
Regards

Nice article Ciumac. Have you also considered using Support Vector Machines? My understanding is that they are the next generation of machine learning algorithms and don't have the local minima problems?

Thanks George,
SVMs are designed for classification and regression analysis, thus they are better suited for pattern recognition. You might adapt them for prediction purposes, but that won't guarantee you'll get rid of false negatives. If you'd like to explore this topic in greater detail, I'd rather suggest taking a look at Markov chains (SVMs can also be combined with hidden Markov models). Using their theoretical constructs, you can build a predictor that is much closer to real-world scenarios.
Regards

In summary, on finance.yahoo.com you can find almost anything related to stock market activity. Keep in mind that the data fed to the neural network resides in the "Adj Close" column of the spreadsheet data (as in the sample .csv file provided with the code). If this column is missing, the app won't be able to train/predict.

Have you tried this with a recurrent neural network, where an output feeds back into the input? If future financial measures are dependent on past values, you might be able to get improved predictions.