Saturday, March 05, 2016

A Young Data Scientist and Kaggle Competition Top 5% Winner: Yuyu Zhou

Yuyu Zhou is a graduate student in Analytics at the University of New Hampshire. His team finished in the top 3% and top 5% of two Kaggle prediction competitions, respectively. In an interview, I asked him how their predictive models performed so well. Yuyu said,

"One of the keys to the success is that we spend tremendous amount of time working on building feature variables. Those variables are usually the results of combining several raw variables. For example, the ratio between the body weight and height is a better variable in predicting a patient's health than using body weight or height alone."

"My training in computer science is extremely helpful in these projects. I am able to write Java, Python and SQL scripts to perform tasks such as data cleansing, data merge, and data transform, etc. As we know, more than 80% of time in a project is typically spent on those tasks before we start building predictive models."

"We have tried many type of predictive models and found that gradient boosting trees have consistently perform the best."

The following is a summary of Yuyu's contributions to those two projects.

Developed statistical models to predict the risk level of properties that Liberty Mutual Inc. is going to protect.

Led the team and conducted cost-benefit analyses of new ideas.

Implemented ideas using Python statistical packages.

Prediction accuracy ranked 71st out of 2,236 teams.

Yuyu is currently looking for a full-time job in data analytics. Please feel free to contact him if you are hiring. He can be reached by email at yuyu.zhou@hotmail.com or by phone at (508) 933-7311. Here is his LinkedIn profile.