Predicting Stock Prices With Linear Regression

Linear regression is widely used throughout finance in a plethora of applications. In previous tutorials, we calculated a company's beta relative to an index using the ordinary least squares (OLS) method. Now, we will use linear regression to estimate stock prices.

Linear regression is a method used to model the relationship between a dependent variable (y) and an independent variable (x). In simple linear regression there is only one independent variable; when there are several, the technique falls under multiple linear regression. In our case, we have a single independent variable: the date. The date will be represented by an integer starting at 1 for the first date and increasing up to the length of the vector of dates, which varies with the time-series data. Our dependent variable, of course, will be the price of a stock. To understand linear regression, you only need a fairly elementary equation you probably learned early on in school:

y = a + bx

Where:

y = the predicted value (the dependent variable)

b = the slope of the line (the coefficient on x)

x = the independent variable

a = the y-intercept
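As a concrete sketch of this setup, the snippet below fits y = a + bx with NumPy, using a short series of hypothetical closing prices (the numbers are illustrative, not real market data) and integer dates 1 through n as the independent variable:

```python
import numpy as np

# Hypothetical closing prices; in practice these would come from
# a time-series data source.
prices = np.array([101.2, 102.5, 101.8, 103.1, 104.0, 103.6, 105.2])

# Independent variable: integer dates 1..n, as described above.
x = np.arange(1, len(prices) + 1)

# Fit y = a + b*x by ordinary least squares. np.polyfit returns the
# highest-degree coefficient first, so the slope b comes before a.
b, a = np.polyfit(x, prices, 1)

# Estimate the price for the next date in the sequence.
next_price = a + b * (len(prices) + 1)
print(f"intercept a = {a:.4f}, slope b = {b:.4f}")
print(f"estimated next price = {next_price:.4f}")
```

With an upward-trending series like this one, the fitted slope b is positive and the extrapolated price sits above the sample mean.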

Essentially, this will constitute our line of best fit through the data. Conceptually, the OLS process considers many candidate lines drawn through the dataset, and its goal is to find the line that minimizes the sum of squared errors (SSE) between the actual stock price (y) and our predicted stock price over all the points in the dataset. For each candidate line, there is a difference between each point in the dataset and its corresponding predicted value from the model; each of these differences is squared, and the squares are summed to produce the SSE. The line with the minimum sum is our line of best fit. Consider the diagram below:
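The minimization can be checked numerically: the sketch below (again using hypothetical prices) defines the SSE for an arbitrary line a + bx, fits the OLS line, and verifies that nudging the fitted intercept or slope in any direction only increases the SSE:

```python
import numpy as np

# Hypothetical closing prices and integer dates 1..n.
prices = np.array([101.2, 102.5, 101.8, 103.1, 104.0, 103.6, 105.2])
x = np.arange(1, len(prices) + 1)

def sse(a, b):
    """Sum of squared errors between actual prices and the line a + b*x."""
    residuals = prices - (a + b * x)
    return np.sum(residuals ** 2)

# OLS fit (slope first, then intercept).
b_ols, a_ols = np.polyfit(x, prices, 1)
best = sse(a_ols, b_ols)

# Perturbing the fitted line in any direction increases the SSE,
# which is exactly what "line of best fit" means here.
for da, db in [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert sse(a_ols + da, b_ols + db) > best
print(f"minimum SSE = {best:.4f}")
```

In practice you never enumerate candidate lines yourself; OLS has a closed-form solution, and `np.polyfit` (or a statistics library) computes the minimizing a and b directly.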