Machine Learning Throwdown: The Reckoning

As you, our faithful readers, know, we compared some machine learning services several months ago in our machine learning throwdown. In another recent blog post, we talked about the power of ensembles, and how your BigML models can be combined into an even more powerful classifier when many of them are learned over samples of the data. With this in mind, we decided to re-run the performance tests from the fourth throwdown post using BigML ensembles as well as single BigML models.
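The idea behind that kind of ensemble (bagging) is simple enough to sketch in a few lines: train each model on a bootstrap sample of the data, then let the models vote. The sketch below uses a deliberately weak stand-in base learner (1-nearest-neighbour on a single feature) rather than BigML's actual trees, so the names and the toy dataset here are our own illustration, not BigML's API:

```python
import random
from collections import Counter

def bootstrap(data, rng):
    """Draw a sample of the same size as the data, with replacement."""
    return [rng.choice(data) for _ in data]

def nn_predict(train, x):
    """Stand-in base learner: 1-nearest-neighbour on one numeric feature."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def bagged_predict(data, x, n_models=25, seed=0):
    """Train n_models base learners on bootstrap samples; majority-vote."""
    rng = random.Random(seed)
    votes = [nn_predict(bootstrap(data, rng), x) for _ in range(n_models)]
    return Counter(votes).most_common(1)[0][0]

# Toy data: (feature, label) pairs, with one noisy label at feature 9.
data = [(1, 0), (2, 0), (3, 0), (4, 1), (6, 1), (7, 1), (8, 1), (9, 0)]
print(bagged_predict(data, 2.5))  # class 0
print(bagged_predict(data, 7.5))  # class 1
```

Because each model sees a slightly different resampling of the data, the noisy point at feature 9 only sways a minority of the votes, which is exactly why bagged trees tend to beat a single tree.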

You can see the results in an updated version of the throwdown details file. As you'll be able to see, the ensembles of classifiers (Bagged BigML Classification/Regression Trees) almost always outperform their solo counterparts. In addition, if we update our "medal count" table tracking the competition among our three machine learning services, we see that the BigML ensembles now lead in the number of "wins" over all datasets:

| Contender | Gold | Silver | Bronze | Total |
| --- | --- | --- | --- | --- |
| BigML (with Bagging) | 12 | 11 | 5 | 28 |
| Google Prediction API | 10 | 13 | 5 | 28 |
| Prior Knowledge (acquired by Salesforce) | 6 | 4 | 11 | 21 |

Are we saying this just to raise our self-esteem by bringing down others? Yes, absolutely. In your face, Google Prediction API! On an admittedly limited sample of datasets in a wide variety of domains, ensembles of trees tend to outperform all other off-the-shelf classifiers. Oh, you don’t believe us? Well then why don’t you go and ask SCIENCE?

As we’ve said before, the differences between classifiers are fairly small, and performance alone probably shouldn’t drive your decision to use one service or another. Perhaps more important than its performance is the fact that BigML gives you the best of both worlds: a fully white-boxed, downloadable model that is easy to interpret (and even beautiful), plus the power of ensembles to kick things into high gear for maximum performance.