R for beginners: How to transition from Excel to R

Data Editor

Switching from Excel to R for data analysis can seem daunting. Over time, the open-source statistical programming language has consistently grown in popularity among those who work with numbers, with thousands of user-created libraries to expand on its power.

Though it was first created primarily to make it easier to create statistical models and output very basic visuals to explore data, it’s expanded to the point that people can use R to do a multitude of advanced processes such as scrape websites, communicate with APIs, and publish beautiful interactive charts and maps.

All with just a few lines of code.

The practice of “Reproducible Research” has been spreading outside the world of academia to other areas, like non-profits and journalism. It’s the idea that analyses should be published with the original data, as well as the methodology or software code so that others can verify or build on them.

Others can reproduce your practice, but the big draw is that for other projects, so can you.

In Excel, a user might run a formula, do some sorting, create a couple of pivot tables— but the work you did on that data cannot be replicated quickly on another spreadsheet with a similar data structure. The whole process has to be repeated step by step.

The point of doing data analysis in R is that a user can write a script to slice up and analyze a spreadsheet, then it can be saved, and then brought back to be used on another spreadsheet with just a few tweaks in the code.

Difficulty level

Easy.

This tutorial is based on a presentation I gave at the Boston University Storytelling with Data Workshop. It’s meant to ease seasoned Excel users into the world of R. With GIFs.

Setting up R

Navigating RStudio

We will be working out of RStudio, which is a powerful shell that runs on top of R.

Data structures and variables in R

Get acquainted with the different types of data structures within R.

Spreadsheets in Excel are usually recognized as Matrices and Dataframes. When working in R, it’s important to understand how they’re made up by individual vectors and lists.

Start by creating some arrays and assigning them to variables using the arrow.
The c() command stands for combine.

Type in sports in the console and this is what you get: A dataframe made up of different types of arrays (String, Number, Boolean).

Opening a file

Note: You can type out or copy the code in this tutorial into RStudio’s console (bottom left window) line by line but the point is to reproduce this in the future. That means you should put all your code in a script in the top left window. File > New File > R Script

First, you have to set the directory to where your files (like csvs) are.

Or, you can set it via the RStudio menu up top: Session > Set Working Directory > Choose Directory

Looking up number of observations

Focus on first five rows

Checking structure of the data

If you look closely at this column in Excel, you see the $ symbol disappears when you click into it. The raw data has no $ sign but only puts it there for cosmetic purposes. In R, it doesn’t intuitively know that a $ means currency. That will cause errors when you try to run math on it because R won’t let you perform calculations on strings. Here’s how to check and see what kind of variables are in your dataframe.

Focusing on Columns or Rows or Cells

Sometimes you want to look at a single cell, column, or row. To do so, add brackets to the dataframe in the console, like earnings[1,3].

Think “Data RoCks.” As in, Dataframe[Rownumber,Columnnumber]

Most of the time, though, it’s difficult to reference columns by number — especially when there’s many, many columns. Here’s an alternative: dataframe$NameOfColumn

Changing format of a column

Sort a data frame based on a column

Creating a formula

Filtering data by column content

Calculations on columns

Data to Columns in Excel

What if you want to split up a name into separate columns? It’s pretty easy to do in Excel.

It’s a bit more complicated in R, but here’s how. The gist of it is: Creating new columns by copying the original name column and deleting everything before or after a comma.

Pivot table for simple count

How many employees are there per department?

Advanced pivot table

What’s the sum of income for all employees in each department? Here’s how to do it in R.

Other options other than “sum” include “median” and “mean.”

Check it and save file as a spreadsheet (CSV)

You could also save as a text file, if you want.

All set for now

Hopefully, this gets you on your way to working with R. If you wanted to run the same analysis on the second spreadsheet included in the files, you’d only have to adjust some of the column names, but you could run the entire thing and it would export you a new CSV in seconds.

Some of the steps in R were more complicated to pull off than in Excel. There are actually some great libraries that are more efficient, but I wanted to show you how to execute the basics before we get into the shortcuts that people have put together in packages.

The next R tutorial will expand on translating advanced Excel techniques to R.

Andrew is a data editor at TrendCT.org and the Connecticut Mirror. He teaches data visualization at Central Connecticut State University as well intro to data journalism at Wesleyan University as a Koeppel Fellow.
He was a founding producer of The Boston Globe's Data Desk where he used a variety of methods to visualize or tell stories with data. Andrew also was an online producer at The Virginian-Pilot and a staff writer at the South Florida Sun-Sentinel. He’s a Metpro Fellow, a Chips Quinn Scholar, and a graduate of the University of Texas.

What do you think?

sir, My name is Amit i have been recently learning Excel, Sql and R but i am not getting any project to implement it….Please guide as how i should start this journey as a data analyst as i am not able to get a proper guidance for this domain.PLEASE HELP SIR….

Limbu Malongo Limbu

Loving it, very helpful!

Kelum Perera

Its so helpful to understand how to use r for excel users, thank you very much.

I have a large data data set in excel where i need to do a calculation as follows. Can i do this calculation in R?

So want to get “1” to cell B2 & i use this fourmular =IF($A2″”,IFERROR(MID($A2,(FIND((B$1&”:”),$A2)+LEN(B$1)+1),IFERROR((FIND((“+”),$A2,(FIND((B$1&”:”),$A2)+LEN(B$1)+1))-(FIND((B$1&”:”),$A2)+LEN(B$1)+1)),LEN($A2))),””),”

More details
In my data set , column A i have similar data to A2, which has the details of products.(each parameter & its answer separated by a “:”, different parameters separated by “+”.
A3 = Code:2+Name:Pen+Size:Standard+Price:10+Vendor:Abooks
A4 = Code:3+Name:Bag+Size:Standard+Price:250+Vendor:TopBags

List of parameters are stated in the Row no 1 from cell no B1 onwards.
C1= Name , D1=Size , E1=Price, D1=Vendor