Sunday, 14 February 2016

LINEAR REGRESSION ROUNDUP

Hello, and welcome to my blog. In my previous posts, I talked about
linear regression, a technique used to find the relationship between
one or more explanatory variables (also called independent variables)
and a response variable (also called the dependent variable) using a
straight line. I also said that when we have more than one explanatory
variable it is called multiple linear regression. Finally, I implemented
both types of regression using Python.

As a roundup I will just mention some precautions that should
be taken when applying linear regression. Here are some tips to remember:

Make sure the relationship between your variables is linear. This is very
important because using linear regression to explain the association between
variables that do not have a linear relationship is just plain wrong. As you
may remember, linear regression fits a straight line to the data. If the
relationship is better explained by a curved line, the straight line will fit
poorly and its predictions will be misleading. This can be taken care of
using polynomial regression (which I will talk about in another post).
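
To make this concrete, here is a small sketch using made-up data (the numbers
below are synthetic and are not from any of my previous posts). It fits a
straight line and a second-degree curve to data that follows a curve, then
compares how much of the variation each one explains (R-squared). The helper
r_squared is my own name for illustration.

```python
import numpy as np

# Synthetic data: y depends on x through a curve (x squared), plus noise.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = x**2 + rng.normal(0, 0.5, size=x.size)

def r_squared(y_true, y_pred):
    """Fraction of the variation in y_true explained by y_pred."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Fit a straight line (degree 1) and a curve (degree 2) by least squares.
line_coefs = np.polyfit(x, y, deg=1)
curve_coefs = np.polyfit(x, y, deg=2)

r2_line = r_squared(y, np.polyval(line_coefs, x))
r2_curve = r_squared(y, np.polyval(curve_coefs, x))

print(f"R-squared, straight line: {r2_line:.3f}")
print(f"R-squared, curve:         {r2_curve:.3f}")
```

On data like this the straight line explains almost none of the variation
while the curve explains most of it. Plotting your data before fitting is
usually the quickest way to spot this kind of problem.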

Another very important tip to remember: the fact that you see an association
between two variables in your regression analysis does not mean a change in
the independent variable causes a change in the dependent variable. This may
seem counter-intuitive, so allow me to clarify. For example, the consumption
of ice-cream (pints per person) and the number of murders in New York are
positively correlated. That is, consumption of ice-cream increases with the
number of murders in New York. Strange but true! This is because regression
will show a positive association between any two variables that are both
generally increasing, even when they are increasing independently of each
other. So a key takeaway is that correlation does not imply causation. Using
our ice-cream example, we can say ice-cream sold in New York is positively
correlated with murders in New York, but it would be wrong to say an increase
in ice-cream consumption causes more murders in New York (or vice versa).
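
Here is a rough sketch of how two unrelated variables can end up correlated
just because both are rising over time. The figures are invented for
illustration (they are not real New York data), and detrend is a helper name
I made up for this example.

```python
import numpy as np

# Two synthetic series that both drift upward over 120 months.
# They are generated independently: neither one influences the other.
rng = np.random.default_rng(1)
months = np.arange(120)
ice_cream = 2.0 + 0.01 * months + rng.normal(0, 0.2, months.size)
murders = 300 + 1.5 * months + rng.normal(0, 30, months.size)

# Correlation of the raw series.
r_raw = np.corrcoef(ice_cream, murders)[0, 1]

def detrend(series):
    """Subtract the best-fit straight-line time trend from a series."""
    slope, intercept = np.polyfit(months, series, deg=1)
    return series - (slope * months + intercept)

# Correlation after the upward trend is removed from each series.
r_detrended = np.corrcoef(detrend(ice_cream), detrend(murders))[0, 1]

print(f"Correlation of raw series:       {r_raw:.2f}")
print(f"Correlation of detrended series: {r_detrended:.2f}")
```

The raw series are strongly correlated, but once the upward trend is taken
out of each one, very little association is left. The correlation came from
the trend the two series share, not from any link between them.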

Also beware of confounding variables, that is, variables that are associated
with both the explanatory and the response variable. Let me give another
example. Research has shown (in America) that the number of cars a family has
is positively associated with the SAT scores of children from that family.
While the association may be real, it leaves out a very important variable:
family income. Children from higher income families tend to do better in the
SAT than children from lower income families, because their families can
afford things like textbooks, after-school tutors and other resources that
help them get higher scores. Also, the more a family earns, the more cars it
can afford to buy. So you see that in this example, family income is a
confounding variable because it is associated with both our explanatory and
response variables. When carrying out regression analysis (especially
multiple regression), make sure confounding variables are well accounted for
before you jump to conclusions.
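
The cars and SAT example can be sketched with simulated data. Everything
below is an assumption made for illustration: the numbers are invented, cars
is treated as a continuous quantity to keep things simple, and ols is a small
helper of my own built on NumPy's least squares solver. In the simulation,
income drives both cars and SAT scores, while cars has no direct effect on
SAT scores at all.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# Hypothetical family income in thousands of dollars.
income = rng.normal(60, 15, n)
# Richer families own more cars (simplified to a continuous number).
cars = 0.03 * income + rng.normal(0, 0.5, n)
# SAT score depends on income only; cars plays no part here.
sat = 1000 + 5 * income + rng.normal(0, 50, n)

def ols(columns, y):
    """Least squares fit; returns [intercept, coefficient per column]."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

naive = ols([cars], sat)             # SAT ~ cars
adjusted = ols([cars, income], sat)  # SAT ~ cars + income

print(f"Cars coefficient, income left out: {naive[1]:.1f}")
print(f"Cars coefficient, income included: {adjusted[1]:.1f}")
```

With income left out, cars appears to have a large positive effect on SAT
scores. Once income is included as a second explanatory variable, the cars
coefficient shrinks towards zero, which is what we built into the simulation.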

Finally, another important thing to remember: do not extrapolate beyond your
data, and always keep the interpretation of your model in context. As an
example, in the previous post we generated a regression model to predict the
prices of houses in King County. Imagine if we tried to use that model to
predict the price of houses in Abuja. This would be a big mistake because the
model was not built from data on house sales in Abuja, so it would perform
very poorly there.
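
A quick sketch of what extrapolation can do, again with invented data (this
is not the King County dataset). The true relationship here is a square-root
curve, but within the narrow range we observe it looks close to a straight
line. The names true_relationship, error_inside and error_outside are my own.

```python
import numpy as np

rng = np.random.default_rng(3)

def true_relationship(x):
    """The (normally unknown) real relationship: a flattening curve."""
    return 10 * np.sqrt(x)

# We only observe x between 1 and 5.
x_train = np.linspace(1, 5, 100)
y_train = true_relationship(x_train) + rng.normal(0, 0.5, x_train.size)

# Fit a straight line to the observed range.
slope, intercept = np.polyfit(x_train, y_train, deg=1)

def predict(x):
    return slope * x + intercept

# Compare a prediction inside the observed range with one far outside it.
error_inside = abs(predict(3.0) - true_relationship(3.0))
error_outside = abs(predict(25.0) - true_relationship(25.0))

print(f"Error at x = 3  (inside the data):  {error_inside:.2f}")
print(f"Error at x = 25 (outside the data): {error_outside:.2f}")
```

The line does well inside the range it was fitted on and badly far outside
it, because nothing in the data told the model how the relationship behaves
out there.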

I hope these tips help you make sensible interpretations using your linear
regression model. The tips given above are by no means exhaustive. If you
want to read more on this, I recommend Chapter 12 of the book Naked
Statistics by Charles Wheelan.

This is the end of the post. I hope you liked it. If you didn’t, leave a
comment suggesting how I can make it better. If you need any extra
clarification on what I have just said, leave a comment as well. I will be
happy to answer any questions you have. Have a wonderful week ahead. Cheers!!!