Concepts 10: Linear Regression

Linear regression is a statistical tool for projecting future outcomes from historical data. A good example would be using patrons' average wager and win/loss data to project how much a casino would make over a particular period.

Here’s the formula:

Outcome = (Variable Multiplier x Variable) + Constant

Or

Y = bX + a

This means that to project the win/loss (Y) for a certain period, we take a Variable (X, here the average wager), scale it by a multiplier (b), and add a Constant (a).

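For Y and X, we start with the means of both sets of data (the mean of the win/loss and the mean of the average wager). The fitted line always passes through that pair of means, which is what lets us solve for a once b is known.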
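To make the Y = bX + a idea concrete, here is a minimal Python sketch. It uses the Win/Loss and Average Wager columns from the data table in this article, and it assumes the usual least-squares formula for b (co-variation of X and Y divided by the variation of X). The function name fit_line is just an illustrative choice.

```python
from statistics import mean

# Average wager (X) and win/loss (Y), taken from the data table in this article.
avg_wager = [5, 10, 15, 20, 25]
win_loss = [-15, -30, -45, -65, -80]

def fit_line(x, y):
    """Return (b, a) for the least-squares line Y = bX + a."""
    x_bar, y_bar = mean(x), mean(y)
    # b: how X and Y vary together, divided by how much X varies on its own.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    # a: chosen so the line passes through (mean of X, mean of Y).
    a = y_bar - b * x_bar
    return b, a

b, a = fit_line(avg_wager, win_loss)
print(f"Y = {b:.2f}X + {a:.2f}")
```

On this data the multiplier comes out at about -3.3, the same b used in the T-Value calculation.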

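Now, if we square r, we get r2 = -0.99863 x -0.99863 = 0.997253 (using the unrounded r), or 99.7253%. This is the share of the variation in Y that is explained by its linear relationship with X. It is not a probability, and it does not prove that X causes Y. What it tells us is that average wager accounts for about 99.7% of the variation in win/loss in this data: a higher average wager goes hand in hand with a bigger loss!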
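If you want to check r and its square yourself, a short Python sketch follows. It assumes r is the standard Pearson correlation coefficient and uses the Win/Loss and Average Wager columns from the data table in this article. The name pearson_r is only for illustration.

```python
from math import sqrt
from statistics import mean

# Average wager (X) and win/loss (Y), taken from the data table in this article.
avg_wager = [5, 10, 15, 20, 25]
win_loss = [-15, -30, -45, -65, -80]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

r = pearson_r(avg_wager, win_loss)
r_squared = r ** 2  # share of the variation in Y explained by X
print(f"r = {r:.5f}, r squared = {r_squared:.6f}")
```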

This means that our estimate of b carries a standard error of 0.1: with a different sample of patrons, the multiplier in our linear model could plausibly come out higher or lower by roughly that much.

T-Value = b/standard error

= -3.3 / 0.1 = -33

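We now compare -33 against the T table, with our degrees of freedom being 5-2 = 3 (5 data points, minus the 2 values we estimated, b and a).

For 3 degrees of freedom, the two-tailed cutoff at the 5% level is 3.182, so our reading of -33 is way off the charts! That means it is ALMOST CERTAIN that average wager correlates to a loss.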
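Here is a rough Python sketch of the T-Value check. It assumes the usual formula for the standard error of b in a one-variable regression (residual variation, divided by the degrees of freedom and by the variation of X, then square-rooted), and it hardcodes the T table cutoff of 3.182 for 3 degrees of freedom rather than computing it. The data comes from the table in this article.

```python
from math import sqrt
from statistics import mean

# Average wager (X) and win/loss (Y), taken from the data table in this article.
avg_wager = [5, 10, 15, 20, 25]
win_loss = [-15, -30, -45, -65, -80]

n = len(avg_wager)
x_bar, y_bar = mean(avg_wager), mean(win_loss)
sxx = sum((x - x_bar) ** 2 for x in avg_wager)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(avg_wager, win_loss))

b = sxy / sxx            # multiplier
a = y_bar - b * x_bar    # constant

# Squared gaps between the actual win/loss and what the line predicts.
sse = sum((y - (b * x + a)) ** 2 for x, y in zip(avg_wager, win_loss))

df = n - 2                        # degrees of freedom: 5 - 2 = 3
se_b = sqrt(sse / df / sxx)       # standard error of b (assumed formula)
t_value = b / se_b

T_CUTOFF_DF3 = 3.182  # two-tailed 5% cutoff read off a T table for df = 3

print(f"standard error = {se_b:.2f}, T-Value = {t_value:.1f}")
print("significant:", abs(t_value) > T_CUTOFF_DF3)
```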

Multivariate Regression

How then do we calculate a linear regression model with 2 variables? How would we calculate the win/loss of patrons by their average wagers AND time spent at the table? Here’s our data again, but with an additional factor, TIME.

Win/Loss    Average Wager    Time
-15         5                15
-30         10               30
-45         15               35
-65         20               45
-80         25               60

Now, our formula is a bit different.

Y = b1X1 + b2X2 + a

The means of Y, X1 and X2 are still calculated the same way as before.

a can be found once we have the b values for X1 and X2.

So, what about b, then? There are now 2 b values, one multiplier each for X1 and X2. Let’s call them b1 and b2.

b1 = r1 x (standard deviation of Y/standard deviation of X1)

b2 = r2 x (standard deviation of Y/standard deviation of X2)

Now that more than 1 variable is being used, r is a lot different.

r1 = (r(X1Y) – r(X2Y) x r(X1X2)) / (1 – r(X1X2)²)

r2 = (r(X2Y) – r(X1Y) x r(X1X2)) / (1 – r(X1X2)²)
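The formulas above look scarier on paper than in code. Below is a minimal Python sketch that plugs the table data into them. It assumes each r(...) is the ordinary pairwise Pearson correlation, and that a is found as the mean of Y minus b1 times the mean of X1 minus b2 times the mean of X2. The variable name time_spent and the function name pearson_r are illustrative choices only.

```python
from math import sqrt
from statistics import mean, stdev

# Columns from the data table in this article.
win_loss = [-15, -30, -45, -65, -80]   # Y
avg_wager = [5, 10, 15, 20, 25]        # X1
time_spent = [15, 30, 35, 45, 60]      # X2 (the TIME column)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# The three pairwise r values.
r_x1y = pearson_r(avg_wager, win_loss)
r_x2y = pearson_r(time_spent, win_loss)
r_x1x2 = pearson_r(avg_wager, time_spent)

# r1 and r2: each variable's r with Y, adjusted for the overlap between X1 and X2.
denominator = 1 - r_x1x2 ** 2
r1 = (r_x1y - r_x2y * r_x1x2) / denominator
r2 = (r_x2y - r_x1y * r_x1x2) / denominator

# b1 and b2: scale each adjusted r by the ratio of standard deviations.
b1 = r1 * stdev(win_loss) / stdev(avg_wager)
b2 = r2 * stdev(win_loss) / stdev(time_spent)

# a: assumed here to be the mean of Y minus each b times the mean of its X.
a = mean(win_loss) - b1 * mean(avg_wager) - b2 * mean(time_spent)

print(f"Y = {b1:.4f}X1 + {b2:.4f}X2 + {a:.4f}")
```

Because X1 and X2 are themselves strongly correlated in this data, the denominator is small, so tiny rounding differences in the pairwise r values can move b1 and b2 noticeably. Keeping full precision until the end avoids that.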

I know, like WTF? Yeah, I thought so too at first. But remember that we KNOW how to get r for 2 different sets of values: