Getting Started with the Google Prediction API

The Google Prediction API is a Google Labs project that can help with many kinds of predictive analysis and content recommendation. The nature of the software makes it difficult to explain exactly what it is and how it works. In fact, Google itself does not supply a concrete definition, only examples of its uses.

In a nutshell, you supply Google with a file full of historical data points that influence a single “answer” result. Google then applies machine learning techniques to predict a likely future outcome.

The first (and very important) step is to figure out the “answer” you need the Prediction API to return. The answer is the first data field in each row of your training data and can be thought of as the prediction. There are two types of answers: categorical (text based) and regression (number based). In a radio scenario, for example, the answer might be a radio program name. Spend a good deal of time considering the possible options for your answer, since you can only have one. A good answer tells you as much as possible about a prediction.
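To make the categorical versus regression distinction concrete, here is a minimal Python sketch. The helper name `answer_type` is my own, and the rule it applies (anything that parses as a number counts as regression) is a simplifying assumption for illustration, not a description of how Google classifies answers.

```python
def answer_type(value):
    """Guess the answer type of a training value given as a string.

    Assumption for illustration only: a value that parses as a number
    is treated as a regression answer, anything else as categorical.
    """
    try:
        float(value)
    except ValueError:
        return "categorical"
    return "regression"


# A program name is text, so it is a categorical answer.
print(answer_type("NPR"))
# Something like minutes listened is numeric, so it is a regression answer.
print(answer_type("42.5"))
```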

Once you have defined your answer, you need to create your training model. The training model should consist of known scenarios, each including a known answer and all the data points related to that decision. Again, take care when designing this model, as the prediction is only as good as the data you train it on: garbage in, garbage out. Another important consideration is that you should only use data points in the model that you will also have access to when requesting a prediction (excluding the answer, of course).

For example, say you have a website that allows users to listen to radio programs online. A user listened to an NPR program first, then five others. NPR is your known answer and the other programs are the related data points:

"NPR","Program1","Program2","Program3","Program4","Program5"

You would then create another CSV line for every known matchup you currently have recorded (typically this is an automated process).
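As a rough idea of what that automated process might look like, here is a small Python sketch that turns recorded listening sessions into training rows, with the answer in the first column. The function name `build_training_csv`, the shape of the `sessions` list, and the second "BBC" row are all made up for illustration. It only builds the CSV text and does not talk to Google at all.

```python
import csv
import io


def build_training_csv(sessions):
    """Return CSV text with one row per session: answer first, then data points."""
    buffer = io.StringIO()
    # QUOTE_ALL wraps every field in double quotes, like the example row.
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for answer, features in sessions:
        writer.writerow([answer, *features])
    return buffer.getvalue()


# Hypothetical recorded sessions: (known answer, related data points).
sessions = [
    ("NPR", ["Program1", "Program2", "Program3", "Program4", "Program5"]),
    ("BBC", ["Program6", "Program7", "Program8", "Program9", "Program10"]),
]

training_csv = build_training_csv(sessions)
print(training_csv)
```

In practice you would pull the sessions from your own logs or database and write the result to a file instead of printing it.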

Once you have a file full of these rows, you can train your model on that file. Once your model is trained, you can ask it to predict. A prediction request is basically the same as a training row, except you leave off the first data point, since that is what you’re asking to be predicted. If a new user comes to your site and listens to Program1 through Program5 (or close to it) and you submit a prediction request with the five items they have listened to, the system would return “NPR”, which you could then recommend to the user.
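To show the shape of the train-then-predict flow without touching the real service, here is a toy stand-in in Python. To be clear, this is not the Prediction API and not Google's algorithm: `predict_by_overlap` is a hypothetical helper that simply returns the answer whose training rows share the most data points with the request. The "BBC" row is made up as well.

```python
from collections import Counter


def split_training_row(row):
    """Split a training row into (answer, data points); the answer comes first."""
    return row[0], row[1:]


def predict_by_overlap(training_rows, query_features):
    """Toy stand-in for a prediction call (NOT Google's method).

    Scores each answer by how many data points its training rows share
    with the request and returns the best one, or None if nothing overlaps.
    """
    query = set(query_features)
    scores = Counter()
    for row in training_rows:
        answer, features = split_training_row(row)
        scores[answer] += len(query & set(features))
    best = scores.most_common(1)
    if best and best[0][1] > 0:
        return best[0][0]
    return None


training_rows = [
    ["NPR", "Program1", "Program2", "Program3", "Program4", "Program5"],
    ["BBC", "Program6", "Program7", "Program8", "Program9", "Program10"],
]

# A prediction request is a training row without the first (answer) field.
request = ["Program1", "Program2", "Program3", "Program4", "Program6"]
print(predict_by_overlap(training_rows, request))
```

The request here only comes "close to" the NPR row (four of five programs match), which is enough for this toy scorer to pick NPR. The real service applies machine learning techniques that this sketch does not attempt to reproduce.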

The technical requirements for actually accomplishing this are fairly complicated. I’ll design a tutorial for creating a working prediction example if I get any requests for one.

If you’ve got an hour to burn, check out the Google I/O discussion on the prediction API: