Automobile Price Prediction_Boosted Decision Tree - Bhanu Surnilla

Predicted Variable: Price
Train Data/Score Data: 70/30
**Data Manipulations performed:**
- Used median values (as the data is not normally distributed) to fill in missing data for the following variables: bore, stroke, horsepower, peak-rpm.
  Reason: If we dropped the records, we might not get accurate predictions for a few of the 'makes'. The data might be insufficient for these makes, and overall we would get a lower coefficient of determination.
- Used the custom value 'four' to fill in missing data for the following variable: num-of-doors.
  Reason: This is a categorized variable, so I used the most frequent value. If we dropped the records, there would be less data for a couple of makes, which might result in incorrect predictions or a low coefficient of determination.
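
The two fills described above could be sketched in pandas roughly as follows. This is only an illustration under assumptions: the original work may have been done in a different tool, and the small data frame here is made up to stand in for the automobile data. Only the column names and the fill rules come from the write-up.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the automobile data (values are made up).
df = pd.DataFrame({
    "bore": [3.47, np.nan, 3.19, 2.68],
    "stroke": [2.68, 3.40, np.nan, 3.47],
    "horsepower": [111.0, 154.0, np.nan, 102.0],
    "peak-rpm": [5000.0, np.nan, 5500.0, 5500.0],
    "num-of-doors": ["two", "four", None, "four"],
})

numeric_cols = ["bore", "stroke", "horsepower", "peak-rpm"]

# Fill each numeric column with its own median instead of dropping rows.
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# Fill the categorical column with the custom value 'four'.
df["num-of-doors"] = df["num-of-doors"].fillna("four")

print(df)
```

Filling rather than dropping keeps every row, which matters most for makes that have only a few records.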
**Linear Regression observations:**
Coefficient of Determination: 0.82
Tested with 3 different records and successfully got accurate values.
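
As a rough sketch of this step, a 70/30 split, a linear regression fit, and the coefficient of determination could look like this in scikit-learn. The data is synthetic and the single `horsepower`-like feature is an assumption made for illustration, so the score printed here is unrelated to the 0.82 reported above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): one horsepower-like feature and a noisy linear price.
rng = np.random.default_rng(0)
X = rng.uniform(50, 250, size=(100, 1))
y = 100 * X[:, 0] + 2000 + rng.normal(0, 500, size=100)

# 70/30 split between training data and scoring data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

# Coefficient of determination (R^2) on the held-out 30%.
r2 = r2_score(y_test, model.predict(X_test))
print(f"R^2 on scoring data: {r2:.2f}")
```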
**Boosted Regression Tree observations:**
Trees generated: 100
Coefficient of Determination: 0.72
Tested with 3 different records and successfully got accurate values.
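
For comparison, a boosted tree model with 100 trees could be sketched with scikit-learn's `GradientBoostingRegressor`. This is a stand-in, not necessarily the same implementation used for the results above, and the data and the two features are synthetic assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): two numeric features and a noisy price.
rng = np.random.default_rng(0)
X = rng.uniform(50, 250, size=(200, 2))
y = 100 * X[:, 0] + 50 * X[:, 1] + rng.normal(0, 500, size=200)

# Same 70/30 split as used for the linear model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# n_estimators=100 matches the "Trees generated: 100" setting.
gbr = GradientBoostingRegressor(n_estimators=100, random_state=0)
gbr.fit(X_train, y_train)

gbr_r2 = r2_score(y_test, gbr.predict(X_test))
print(f"Trees: {len(gbr.estimators_)}, R^2 on scoring data: {gbr_r2:.2f}")
```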
Recommendation: As there are many dependent/categorized variables in the data, the results would be difficult to analyse if decision tree algorithms are used. Using a linear regression algorithm might give more accurate predictions, as there are more dependent variables and fewer correlated variables. This is consistent with the higher coefficient of determination observed for linear regression (0.82 versus 0.72).