Kaggle and R

Following up on last week’s post about entering a Kaggle competition, I decided to see if I could explore the data further in R on my local desktop. The competition is about analyzing a large set of house claims and giving each one a risk score.

So there are clearly diminishing returns here. As of this writing, the leader is at 40%, which works out to about 20,400 of the 51,000 entries. So if you could identify all of the ones correctly, you should get 37% of the way there. To test this, I submitted a file to Kaggle containing only ones.
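For reference, here is a minimal R sketch of how an all-ones submission file could be put together. The column names `Id` and `Prediction`, the sequential ids, and the output file name are assumptions for illustration only, not the competition's actual format; the 51,000 row count is the entry count quoted above.

```r
# Hypothetical sketch: column names, ids, and file name are assumptions,
# not the competition's actual submission format.
n_entries <- 51000                  # number of entries quoted in the post

# Arithmetic check: a 40% score expressed as a count of entries
leader_share <- 0.40
leader_count <- leader_share * n_entries   # about 20,400

# Every entry gets the same prediction: 1
submission <- data.frame(
  Id         = seq_len(n_entries),  # placeholder ids; real ids would come from the test file
  Prediction = rep(1, n_entries)
)

# Write to a temporary location so the sketch does not touch the working directory
out_file <- file.path(tempdir(), "all_ones_submission.csv")
write.csv(submission, out_file, row.names = FALSE)
```

In practice the ids would be read from the competition's test file rather than generated, and the prediction column would be named whatever the sample submission uses.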

LOL, so they must penalize incorrect answers, since that submission scored the same as the “all 0” benchmark. Going back, I know that if I can predict the ones correctly and make a reasonable guess at the rest, I might be OK. I went back and tuned my model a bit to get out of the bottom 25%, and then let it be. I assume there is something obvious or industry-standard that I am missing, because there are so many people between my position and the top 25%.