Applications of R at Google

At a talk I saw at the useR!2012 conference last month, Googler Karl Millar estimated that there are at least 200 active R users at Google, plus another 300+ occasional users participating in Google’s internal R support list. But what are all these Google employees doing with R? A post from the Google Research team published on Google+ yesterday sheds some light:

At Google we use Statistics every day to improve products, optimize infrastructure, and understand users. We’ve built a number of engineering systems to process and store massive amounts of data. These systems often use thousands of computers in parallel to process and manipulate the data. For many of our statisticians and data analysts, however, such systems provide only the first step of an interactive data analysis workflow that also involves filtering, classifying, modeling, visualizing, and forecasting quantitative data across all aspects of our business.

According to Karl Millar, R is the main language for statistics at Google. Here are some of the specific applications of R at Google mentioned in the post:

The same framework is used to study the effectiveness of search advertising at Google, revealing that search ads drive an additional 89% of web traffic (compared to organic search results alone).

Google uses R for large-scale, computationally intensive forecasting (as presented in a talk at the R/Finance 2012 conference).

Google uses an integration of R and FlumeJava to do very large-scale structured data analysis. (In his useR!2012 presentation, Karl Millar said such analyses are at the terabyte scale today and will reach the petabyte scale within two years.) This lets Googlers run large-scale statistical analyses with code that “reads like R, and scales like Map-Reduce”, and that runs at 90% of the speed of code hand-written directly in Java MapReduce (JavaMR). (Karl will be talking about Scaling R to Internet Scale Data at the JSM 2012 conference.)
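To make "scales like Map-Reduce" a little more concrete, here is a minimal single-machine sketch of the map/shuffle/reduce pattern, written in Python. It is not Google's R/FlumeJava interface, which the post does not describe; the `map_reduce` and `mean` helpers and the click log are hypothetical. The point is that the analyst writes only the per-record logic (map) and the per-group logic (reduce). In a real MapReduce job those steps run in parallel across many machines, and the framework handles the grouping (shuffle) between them.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Hypothetical single-process illustration of map -> shuffle -> reduce."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each record emits zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: combine each key's values independently of other keys.
    return {key: reducer(values) for key, values in groups.items()}


def mean(values):
    return sum(values) / len(values)


# Hypothetical click log of (region, clicks) records.
log = [("us", 120), ("uk", 40), ("us", 80), ("de", 30), ("uk", 60)]

# Average clicks per region: the mapper keys each record by region.
result = map_reduce(log, lambda rec: [(rec[0], rec[1])], mean)
print(result)  # {'us': 100.0, 'uk': 50.0, 'de': 30.0}
```

Because each key's reduction is independent, the groups can be processed on different machines, which is what lets this style of code scale from a laptop to a cluster.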