Predicting Housing Prices in Ames: A Machine Learning Project

Introduction

In this project we use machine learning techniques to attempt to predict housing prices in Ames, Iowa. The data comes from the Kaggle competition "House Prices: Advanced Regression Techniques". The team consists of Gregory Brucchieri, Billy Fallon and Adrian Phillips-Samuels. We are The Fighting Mongooses.

The Data

The training set contains 1460 observations of 79 features plus the target variable, Sale Price. The features are a mix of 28 continuous and 51 categorical variables, and 34 of them contain missing values. We imputed these values with a variety of techniques, usually guided by each variable's meaning and its entry in the data description; the details can be seen in the final code. We treated categorical variables as ordinal whenever possible. We observed two outliers with an unusual price-to-square-footage ratio and chose robust scaling to limit their influence. We also engineered a number of additional features and replaced categorical variables with one-hot encodings.
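To make the preprocessing steps concrete, here is a minimal sketch of ordinal encoding, one-hot encoding and robust scaling using only the Python standard library. It is not the code from our repo: the function names, the quality mapping and the choice to encode a missing label as 0 are assumptions made for illustration. Robust scaling here means centering on the median and dividing by the interquartile range, which is also what scikit-learn's RobustScaler does by default.

```python
from statistics import median, quantiles

# Hypothetical ordering for a quality-style feature. The mappings actually
# used in the project live in the final code and are not reproduced here.
QUALITY_ORDER = {"Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5}


def encode_ordinal(labels, order, missing=0):
    """Map ordered category labels to integers.

    Labels not found in `order` (including None) get `missing`.
    Treating a missing value as 0 is an assumption for this sketch.
    """
    return [order.get(label, missing) for label in labels]


def one_hot(labels):
    """Return the sorted category names and one 0/1 row per input label."""
    categories = sorted(set(labels))
    rows = [[1 if label == cat else 0 for cat in categories] for label in labels]
    return categories, rows


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR).

    The median and IQR barely move when a few extreme values are present,
    so outliers do not compress the rest of the feature's range.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    med = median(values)
    if iqr == 0:
        # Constant (or nearly constant) feature: nothing to scale by.
        return [0.0 for _ in values]
    return [(v - med) / iqr for v in values]
```

Compared with mean and standard deviation scaling, a single very large house shifts the median and quartiles only slightly, so the bulk of the observations keep a sensible spread after scaling.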

The Models

A number of models were used to explain and predict the sale price, including Random Forest, Gradient Boosting, XGBoost and linear modeling. In the end, a weighted ensemble of Random Forest, Gradient Boosting and XGBoost provided our best model. We received a Kaggle score of 0.1232.
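As a rough illustration of how a weighted ensemble combines model outputs, the sketch below blends per-model predictions with a weighted average and scores the result with the competition's metric, the root mean squared error between the logarithms of predicted and actual prices. The function names, weights and prices are placeholders chosen for the example, not the values behind our submission.

```python
import math


def weighted_ensemble(predictions, weights):
    """Blend per-model predictions with a weighted average.

    `predictions` maps a model name to its list of predicted prices and
    `weights` maps the same names to non-negative weights. The weights are
    normalized here, so they do not need to sum to 1.
    """
    total = sum(weights.values())
    n_rows = len(next(iter(predictions.values())))
    return [
        sum(weights[name] * predictions[name][i] for name in weights) / total
        for i in range(n_rows)
    ]


def rmse_log(y_true, y_pred):
    """Root mean squared error between log(predicted) and log(actual)."""
    squared_errors = [
        (math.log(pred) - math.log(true)) ** 2
        for true, pred in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Placeholder predictions and weights, for illustration only.
example_predictions = {
    "random_forest": [100000.0, 200000.0],
    "gradient_boosting": [120000.0, 180000.0],
    "xgboost": [110000.0, 220000.0],
}
example_weights = {"random_forest": 0.5, "gradient_boosting": 0.25, "xgboost": 0.25}

blended = weighted_ensemble(example_predictions, example_weights)
score = rmse_log([105000.0, 210000.0], blended)
```

In practice the weights would be chosen on held-out data, for example by comparing the blended score across a small grid of candidate weights.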

All code and results can be seen in our GitHub repo. The final prediction code is in the Project_Consolidated.py file.

About Authors

Gregory Brucchieri

Gregory has a Master of Arts in Economics from NYU. He is a former business analyst with Humana, Inc, where he maintained provider relations and contract databases for smaller, local networks Humana had paired with. He is driven...