Beginner's guide to R: Introduction

R is hot. Whether measured by more than 10,000 add-on packages, the 95,000+ members of LinkedIn's R group or the more than 400 R Meetup groups currently in existence, there can be little doubt that interest in the R statistics language, especially for data analysis, is soaring.

Why R? It's free, open source, powerful and highly extensible. "You have a lot of prepackaged stuff that's already available, so you're standing on the shoulders of giants," Google's chief economist told The New York Times back in 2009.

Because it's a programmable environment that uses command-line scripting, you can store a series of complex data-analysis steps in R. That lets you re-use your analysis work on similar data more easily than if you were using a point-and-click interface, notes Hadley Wickham, author of several popular R packages and chief scientist with RStudio.

That also makes it easier for others to validate research results and check your work for errors -- an issue that cropped up in the news recently after an Excel coding error was among several flaws found in an influential economics analysis report known as Reinhart/Rogoff.

The error itself wasn't a surprise, blogs Christopher Gandrud, who earned a doctorate in quantitative research methodology from the London School of Economics. "Despite our best efforts we always will" make errors, he notes. "The problem is that we often use tools and practices that make it difficult to find and correct our mistakes."

Sure, you can easily examine complex formulas on a spreadsheet. But running multiple data sets through spreadsheet formulas to check results is far harder than putting several data sets through a script, he explains.

Indeed, the mantra of "Make sure your work is reproducible!" is a common theme among R enthusiasts.

Who uses R?

Relatively high-profile users of R include:

Facebook: Used by some within the company for tasks such as analyzing user behavior.

Google: According to David Smith at Revolution Analytics, there are more than 500 R users at Google, doing tasks such as making online advertising more effective.

National Weather Service: Flood forecasts.

Orbitz: Statistical analysis to suggest best hotels to promote to its users.

Why not R? Well, R can appear daunting at first. That's often because R syntax is different from that of many other languages, not necessarily because it's any more difficult than others.

"I have written software professionally in perhaps a dozen programming languages, and the hardest language for me to learn has been R," writes consultant John D. Cook in a Web post about R programming for those coming from other languages. "The language is actually fairly simple, but it is unconventional."

And so, this guide. Our aim here isn't R mastery, but giving you a path to start using R for basic data work: Extracting key statistics out of a data set, exploring a data set with basic graphics and reshaping data to make it easier to analyze.

Copyright 2018 IDG Communications. ABN 14 001 592 650. All rights reserved. Reproduction in whole or in part in any form or medium without express written permission of IDG Communications is prohibited.