
Machine Learning with Python - Linear Regression

Thursday, October 27, 2011

Hi all,

I decided to start a new series of posts focusing on general machine learning, with several snippets that anyone can use on real problems or real datasets. Since I am studying machine learning again through a great online course offered this semester by Stanford University, one of the best ways to review the content is to write some notes about what I learned. The best part is that the notes will include examples with Python, NumPy and SciPy. I hope you enjoy these posts!

Linear Regression

In this post I will implement linear regression and see it work on data. Linear regression is one of the oldest and most widely used predictive models in machine learning. The goal is to fit a straight line to a set of data points by minimizing the sum of the squared errors. (You can find further information on Wikipedia.)

The linear regression model fits a linear function to a set of data points. The form of the function is:

Y = β0 + β1*X1 + β2*X2 + … + βn*Xn

where Y is the target variable, X1, X2, ..., Xn are the predictor variables, and β1, β2, ..., βn are the coefficients that multiply the predictor variables. β0 is the constant (intercept) term.

For example, suppose you are the CEO of a large shoe store franchise and are considering different cities for opening a new store. The chain already has stores in various cities, and you have profit and population data for those cities. You would like to use this data to help you select which city to expand to next. You could use linear regression to estimate the parameters of a function that predicts profits for the new store.

The final function would be:

Y = -3.63029144 + 1.16636235 * X1

There are two main cases of linear regression: with one variable and with multiple variables. Let's see both!

Linear regression with one variable

Considering our last example, we have a file that contains the dataset of our linear regression problem. The first column is the population of the city and the second column is the profit of having a store in that city. A negative value for profit indicates a loss.

Before starting, it is useful to understand the data by visualizing it. We will use a scatter plot, since the data has only two properties to plot (profit and population). Many real-life problems are multi-dimensional and cannot be plotted on a 2-D plot.

If you run the plotting code (you need the Matplotlib package installed to display the plots), you will see the scatter plot of the data, as shown in Figure 1.

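Now you must fit the linear regression parameters to our dataset using gradient descent. The objective of linear regression is to minimize the cost function:

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

where m is the number of training examples and the hypothesis h_θ(x) is given by the linear model:

$$h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1$$

The parameters of your model are the θ values. These are the values you will adjust to minimize the cost J(θ). One way to do it is to use the batch gradient descent algorithm. In batch gradient descent, each iteration performs the update (simultaneously for every j):

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$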

With each step of gradient descent, your parameters θ come closer to the optimal values that achieve the lowest cost J(θ).
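As a rough illustration of a single batch update, here is a minimal NumPy sketch. The function name gradient_step and the toy data are my own assumptions for this example, not the post's original code or dataset.

```python
import numpy as np


def gradient_step(X, y, theta, alpha):
    """Perform one batch gradient descent update on theta.

    X is an (m, n) design matrix whose first column is all ones,
    y is an (m,) target vector, theta is an (n,) parameter vector.
    """
    m = len(y)
    errors = X @ theta - y           # h_theta(x_i) - y_i for every sample
    gradient = (X.T @ errors) / m    # gradient averaged over the whole batch
    return theta - alpha * gradient  # every theta_j is updated simultaneously


# Hypothetical toy data generated from y = 1 + 2x (not the post's dataset).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

theta = np.zeros(2)
for _ in range(5000):
    theta = gradient_step(X, y, theta, alpha=0.1)

print(theta)  # approaches [1, 2]
```

The learning rate of 0.1 is chosen only because it suits this tiny dataset; other data may need a smaller value.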

For our initial inputs, we start with our initial fitting parameters θ and our data, adding another dimension to the data to accommodate the θ0 intercept term. We also set the learning rate alpha to 0.01.
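The setup described above might look like the following sketch. The data rows are made-up stand-ins for the population/profit file, and the variable names are only assumptions for illustration.

```python
import numpy as np

# Made-up stand-in for the two-column dataset: population, profit.
data = np.array([
    [6.0, 15.0],
    [5.0, 8.0],
    [8.0, 14.0],
    [7.0, 12.0],
])

x = data[:, 0]   # predictor: population
y = data[:, 1]   # target: profit
m = len(y)       # number of training examples

# Prepend a column of ones so that theta[0] acts as the intercept term.
X = np.column_stack([np.ones(m), x])

theta = np.zeros(2)   # initial fitting parameters
alpha = 0.01          # learning rate

print(X.shape, theta, alpha)
```

Starting θ at zero is an assumption here; the post only says that initial fitting parameters are used.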

As you perform gradient descent to minimize the cost function J(θ), it is helpful to monitor the convergence by computing the cost at each iteration.

A good way to verify that gradient descent is working correctly is to look at the value of J(θ) and check that it is decreasing with each step. It should converge to a steady value by the end of the algorithm.
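One way to do this monitoring is sketched below: the cost is recorded after every update and then checked for a steady decrease. The function names compute_cost and gradient_descent, the toy data and the iteration count are assumptions made for this example.

```python
import numpy as np


def compute_cost(X, y, theta):
    """Return J(theta) = 1/(2m) * sum((X @ theta - y) ** 2)."""
    m = len(y)
    errors = X @ theta - y
    return np.sum(errors ** 2) / (2 * m)


def gradient_descent(X, y, theta, alpha, num_iters):
    """Run batch gradient descent and record the cost after every step."""
    m = len(y)
    history = []
    for _ in range(num_iters):
        theta = theta - alpha * (X.T @ (X @ theta - y)) / m
        history.append(compute_cost(X, y, theta))
    return theta, history


# Hypothetical toy data, only to exercise the functions.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])
X = np.column_stack([np.ones(len(x)), x])

theta, history = gradient_descent(X, y, np.zeros(2), alpha=0.01, num_iters=1500)

# With a small enough learning rate the cost never goes up between steps.
decreasing = all(b <= a for a, b in zip(history, history[1:]))
print(decreasing, history[0], history[-1])
```

If the check fails on your own data, the learning rate is probably too large.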

Your final values for θ will be used to make predictions on profits in areas of 35,000 and 70,000 people. For that we will use some matrix algebra functions from SciPy and NumPy, powerful Python packages for scientific computing.
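Here is a small sketch of that prediction step, using the θ values reported in this post. It assumes that population and profit are both stored in units of 10,000, which the post does not state explicitly, so treat the scaling as an assumption.

```python
import numpy as np

# Final parameters reported in the post: intercept and slope.
theta = np.array([-3.63029144, 1.16636235])

# Assumption: population and profit are both expressed in units of 10,000,
# so 35,000 people -> 3.5 and 70,000 people -> 7.0.
populations = np.array([3.5, 7.0])

# Add the intercept column, then apply the hypothesis h(x) = X @ theta.
X_new = np.column_stack([np.ones(len(populations)), populations])
predictions = X_new @ theta

print(predictions * 10000)  # roughly [4519.77, 45342.45]
```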

Our final values are shown below:

Y = -3.63029144 + 1.16636235 * X1

Now you can use this function to predict your profits! If you plot this function over our data, you will see the fitted line running through the scatter plot.

Another interesting visualization is the contour plot, which shows how J(θ) varies with changes in θ0 and θ1. The cost function J(θ) is bowl-shaped and has a single global minimum.

This minimum is the optimal point for θ0 and θ1, and each step of gradient descent moves closer to this point.
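To get a feel for the bowl shape without plotting, you can evaluate J(θ) on a grid of θ0 and θ1 values and look for the lowest cell. This is only a sketch on made-up data that lies exactly on a line; the grid ranges are arbitrary choices.

```python
import numpy as np


def compute_cost(X, y, theta):
    """Return J(theta) = 1/(2m) * sum((X @ theta - y) ** 2)."""
    errors = X @ theta - y
    return np.sum(errors ** 2) / (2 * len(y))


# Hypothetical toy data lying exactly on y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x
X = np.column_stack([np.ones(len(x)), x])

# Evaluate J on a grid of (theta0, theta1) pairs with a step of 0.1.
theta0_vals = np.linspace(-3.0, 5.0, 81)
theta1_vals = np.linspace(-2.0, 6.0, 81)
J = np.zeros((len(theta0_vals), len(theta1_vals)))
for i, t0 in enumerate(theta0_vals):
    for j, t1 in enumerate(theta1_vals):
        J[i, j] = compute_cost(X, y, np.array([t0, t1]))

# The lowest grid cell sits at the bottom of the bowl.
i_min, j_min = np.unravel_index(np.argmin(J), J.shape)
print(theta0_vals[i_min], theta1_vals[j_min])  # close to 1 and 2
```

The grid J can then be handed to a contour-plotting routine such as the one in Matplotlib.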

OK, but what about when you have multiple variables? How do we handle them with linear regression? This is where linear regression with multiple variables comes in. Let's see an example:

Suppose you are selling your house and you want to know what a good market price would be. One way to do this is to first collect information on recent houses sold and make a model of housing prices.

Our training set of housing prices in Recife, Pernambuco, Brazil is formed by three columns (three variables). The first column is the size of the house (in square feet), the second column is the number of bedrooms, and the third column is the price of the house.

But before going directly to the linear regression, it is important to analyze our data. Looking at the values, note that house sizes are about 1000 times the number of bedrooms. When features differ by orders of magnitude, it is important to perform feature scaling, which can make gradient descent converge much more quickly.

The basic steps are:

Subtract the mean value of each feature from the dataset.

After subtracting the mean, additionally scale (divide) the feature values by their respective “standard deviations.”

The standard deviation is a way of measuring how much variation there is in the range of values of a particular feature (most data points will lie within ±2 standard deviations of the mean); this is an alternative to taking the range of values (max-min).
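The two scaling steps above can be sketched as follows. The function name feature_normalize and the sample rows are assumptions, and using the sample standard deviation (ddof=1) is my own choice; NumPy's default population standard deviation would scale the features almost as well.

```python
import numpy as np


def feature_normalize(X):
    """Scale each column of X to zero mean and unit standard deviation.

    Returns the scaled matrix together with the per-feature mean and
    standard deviation, which are needed later to scale new inputs.
    """
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)   # sample standard deviation (assumption)
    return (X - mu) / sigma, mu, sigma


# Made-up rows: house size in square feet, number of bedrooms.
X = np.array([
    [2100.0, 3.0],
    [1600.0, 3.0],
    [2400.0, 4.0],
    [1400.0, 2.0],
])

X_norm, mu, sigma = feature_normalize(X)
print(X_norm.mean(axis=0))          # close to 0 for every feature
print(X_norm.std(axis=0, ddof=1))   # close to 1 for every feature
```

Keep mu and sigma around: any house you want to price later has to be scaled with the same values.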

Now that you have your data scaled, you can implement the gradient descent and the cost function.

Previously, you implemented gradient descent on a univariate regression problem. The only difference now is that there is one more feature in the matrix X. The hypothesis function and the batch gradient descent update rule remain unchanged.

In the multivariate case, the cost function can also be written in the following vectorized form:

$$J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)$$
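To convince yourself that the vectorized form matches the sum over samples, you can compute both and compare. The function names and the small design matrix below are hypothetical and only serve this check.

```python
import numpy as np


def cost_loop(X, y, theta):
    """J(theta) computed sample by sample."""
    m = len(y)
    total = 0.0
    for i in range(m):
        total += (X[i] @ theta - y[i]) ** 2
    return total / (2 * m)


def cost_vectorized(X, y, theta):
    """J(theta) = 1/(2m) * (X theta - y)^T (X theta - y)."""
    residual = X @ theta - y
    return (residual @ residual) / (2 * len(y))


# Hypothetical design matrix: intercept plus two already-scaled features.
X = np.array([[1.0, 0.5, -1.0], [1.0, -0.5, 0.0], [1.0, 0.0, 1.0]])
y = np.array([3.0, 1.0, 2.0])
theta = np.array([2.0, 1.0, -0.5])

print(cost_loop(X, y, theta), cost_vectorized(X, y, theta))  # both about 0.0833
```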

After running our code, we obtain the following values of θ:

215810.61679138, 61446.18781361, 20070.13313796

Gradient descent runs until convergence to find the final values of θ, which we will then use to predict the price of a house with 1650 square feet and 3 bedrooms. In vectorized form, each gradient descent iteration applies the update:

$$\theta := \theta - \alpha \frac{1}{m} X^T (X\theta - y)$$
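A minimal sketch of the vectorized update for several features is shown here. It runs on randomly generated, already-scaled features with known coefficients, so the result can be checked; the function name, the seed, the learning rate and the iteration count are all assumptions for this example.

```python
import numpy as np


def gradient_descent_multi(X, y, alpha, num_iters):
    """Vectorized batch gradient descent; X must include the ones column."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        theta = theta - (alpha / m) * (X.T @ (X @ theta - y))
    return theta


# Hypothetical, already-scaled features generated from known coefficients.
rng = np.random.default_rng(0)
features = rng.standard_normal((50, 2))
X = np.column_stack([np.ones(50), features])
true_theta = np.array([4.0, 2.0, -1.0])
y = X @ true_theta   # noise-free, so the exact answer is true_theta

theta = gradient_descent_multi(X, y, alpha=0.1, num_iters=2000)
print(theta)  # approaches [4, 2, -1]
```

Note that the same function also handles the one-variable case, since nothing in it depends on the number of columns in X.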

Predicted price of a 1650 sq-ft, 3 br house: 183865.197988
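One detail that is easy to miss: a new house has to be scaled with the training-set mean and standard deviation before θ is applied. The sketch below shows the idea with made-up statistics and parameters, so its output is not the price quoted above.

```python
import numpy as np


def predict_price(house, mu, sigma, theta):
    """Predict a price for raw features using training-set scaling."""
    scaled = (np.asarray(house, dtype=float) - mu) / sigma
    x = np.concatenate(([1.0], scaled))   # prepend the intercept term
    return x @ theta


# Made-up scaling statistics and parameters, for illustration only.
mu = np.array([2000.0, 3.0])        # mean size, mean number of bedrooms
sigma = np.array([500.0, 1.0])      # std of size, std of number of bedrooms
theta = np.array([200000.0, 50000.0, 10000.0])

price = predict_price([1650, 3], mu, sigma, theta)
print(price)  # about 165000 with these made-up numbers
```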

If you plot the convergence of gradient descent, you will see that the cost J(θ) decreases as the number of iterations grows.

The code for linear regression with multiple variables is available here.

Extra Notes

The SciPy package comes with several tools to help you with this task, including a function that implements linear regression for you!

The function is scipy.stats.linregress. It computes the least-squares fit for a single predictor variable directly, rather than updating the θ parameters iteratively as gradient descent does. Check more about it here.
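A quick, hedged example of calling it on toy data (the data is made up and lies exactly on a line, so the fit is perfect):

```python
import numpy as np
from scipy import stats

# Hypothetical toy data lying exactly on y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

result = stats.linregress(x, y)
print(result.slope, result.intercept)   # close to 2 and 1
print(result.rvalue)                    # close to 1 for a perfect fit
```

Since linregress takes a single x array, it covers the one-variable case from this post, not the multi-variable one.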

Conclusions

The goal of regression is to determine the values of the β parameters that minimize the sum of the squared residuals (the differences between the predicted and the observed values) for the set of observations. Since linear regression is restricted to fitting linear (straight line/plane) functions to data, it is not as well suited to some real-world data as more general techniques such as neural networks, which can model non-linear functions. But linear regression has some interesting advantages:

Linear regression is one of the most widely used methods, and it is well understood.

Training a linear regression model is usually much faster than methods such as neural networks.

Linear regression models are simple and require minimum memory to implement, so they work well on embedded controllers that have limited memory space.

By examining the magnitude and sign of the regression coefficients (β) you can infer how predictor variables affect the target outcome.

It is one of the simplest algorithms and is available in several packages, even Microsoft Excel!

I hope you enjoyed this simple post, and in the next one I will explore another field of machine learning with Python! You can download the code at this link.

I'm pretty sure gradient descent isn't actually linear regression; it's a more general solver that's actually more advanced and used with non-linear data. Linear regression will fit only the simplest models, but it's FAST. Gradient descent is far slower.

How can I use that code in MongoDB? Sorry, I am quite new. I have a MongoDB database with a collection of "documents". How can I run this code against my collection? The objects are only filled with 2 attributes, and the attributes are numeric. I want to run a linear regression over these :) Thanks


Marcel Caraciolo

I am a Brazilian data scientist, entrepreneur, Python hacker and technology consultant. Nowadays I work with data-centric applications, especially in machine learning, recommender systems and bioinformatics. I am also interested in distributed computing, high performance and data visualization, as well as educational and bioinformatics ventures.

Until 2013 I was the co-founder of two companies: Atepassar.com, a social network for students in Brazil, and PyCursos, an online startup for Python training and online courses. In 2014, I took a new position as CTO at Genomika Diagnósticos, a Brazilian genetic testing laboratory.