Pivot Tables in R with aggregate

I noted that a previous post about Pivot Tables in R with melt and cast has gained quite some popularity, so I thought to write on the same topic but using this time a different R function: aggregate. While aggregate is not as powerful as melt and cast from the reshape package, it may result sometimes simpler and more intuitive to use, so it is worth to learn how it works.

The steps to obtain, prepare and read into R the data we need to work with aggregate are exactly the same as in the article mentioned previously, however I will include them below for your convenience.

1. Download the sample data file

Use the link at the bottom of this article and save the file somewhere on your hard drive. So far so good!

2. Open the file in Excel and save a CSV copy of the first sheet

Double-click the file from the location where you saved it to open it inside Excel. The Excel worksheet has 3 tabs. We will be using the first one, containing the source data, and the last one, which contains a ready-made Pivot Table to play with. If you do not know how to make Pivot Tables in Excel, I recommend you read the Microsoft tutorial. In this article we will focus on how to obtain the same results with R and we will use the ready-made pivot table on sheet 3.

Once you are on the first sheet of the sample data set (the one with the source data), save a copy of it as CSV (Comma Separated Values) file. This will generate a file which is easier to import into R. R has packages which allow you to read directly Excel files without converting them to CSV, but for the sake of simplicity we will stick to CSV.

Before saving as CSV, change the format of column E (Order Amount) to General so that the orders amounts appear as numbers with 2 decimals and not with a $ sign in front and comma separated thousands. If you export to CSV without doing this first, there will be extra steps required in R to convert the sales amounts to numeric (see the note at the end of the article).

3. Load the CSV file in R

This is an easy step accomplished with the read.csv function as follows.

Load the data file into R

R

1

data<-read.csv("file_name.txt",header=TRUE)

file_name.txt is the name of the CSV file you created from Excel. If it is not in the current working directory in R, you can either change R’s working directory with setwd or move the file to the current working directory or add a path to the file before the file name.

At this point you should have the same base data open in Excel and loaded into R. In R the first lines will look like this:

First lines of the loaded data file

R

1

2

3

4

5

6

7

8

>head(data)

Country Salesperson Order.Date OrderID Order.Amount

1UK Suyama37812102491863.40

2USA Peacock37813102523597.90

3USA Peacock37814102501552.60

4UK Dodsworth37817102552490.50

5USA Leverling3781710251654.06

6UK Buchanan3781810248440.00

Now we are ready to get to work with aggregate.

4. How aggregate works

As mentioned above, aggregate is a much simpler function than melt and cast. While it can have multiple syntaxes, the one we will work with, which applies well to data frames, is the following:

aggregate syntax

R

1

aggregate(formula,data,FUN,...,subset,na.action=na.omit)

The most relevant argument to understand is formula. A formula as applicable to aggregate is something like this:

A formula in aggregate

R

1

what_to_aggregate~aggregate_by

In the case of our data and following the example in the previous post, we want to aggregate the Order.Amount by Salesperson, therefore the formula we need is simply:

Aggregate formula for our example

1

Order.Amount~Salesperson

Beside the formula, we need to specify the data argument, which will be equal to our data frame data (I know, sorry for haven chosen a data frame name that is the same of an argument name! I hope you can bear with me anyway…), and the FUN argument, which is the function we want to use for the aggregation. In the case of our example, we want to aggregate by summing up all sales order for each sales person, therefore FUN=sum is what we need.

The extra arguments subset allows us to subset the data before aggregating them. Say, for example, you want to make a pivot only for the sales persons in USA, you can use subset=Country==”USA”. Remember that the assignment operator = is different than the comparison operator ==. If it is confusing to you how to read the subset argument, you can also write it like this: subset=(Country==”USA”).

Finally the argument na.action allows us to specify what to do with missing (N/A) values. The default is to omit them from the aggregation.

Ok, enough with the explanations, now we are ready to make our first pivot table with aggregate. Here is the command:

First pivot table with aggregate

1

data.a<-aggregate(Order.Amount~Salesperson,data=data,FUN=sum)

And this is the result (which, as expected, is identical to that from melt and cast):

Output of the aggregate command

R

1

2

3

4

5

6

7

8

9

10

11

>data.a

Salesperson Order.Amount

1Buchanan68792.25

2Callahan123032.67

3Davolio182500.09

4Dodsworth75048.04

5Fuller162503.78

6King116962.99

7Leverling201196.27

8Peacock225763.68

9Suyama72527.63

Congratulations on your first pivot table with aggregate! 🙂

5. Adding Grand Totals

Sorry to delude you, but this is where aggregate falls short of melt and cast. In fact there is no option to add Grand Totals directly through aggregate. Yes, there are workarounds available which I may cover them in a future post, but no direct methods.

6. How to filter

We briefly touched on filtering while describing the subset argument above. Just add a subsetting criteria to the call to aggregate to pre-filter the data on which to aggregate. For example, wanting to aggregate only the sales order for the sales persons based in USA the new syntax is:

Congratulations! You have just learned another easy way to make Pivot Tables in R using aggregate. 🙂

You can stop here or, if you feel brave enough and want to move to the next level, read the next optional paragraph.

7. Aggregating more columns and with more criteria at once

You may wonder about the syntax of the formula inside aggregate in case you want (or need) to aggregate more columns at once (the left side of the formula) according to multiple criteria (the right side of the formula). Well, here is the syntax you can use for 3 columns and 3 criteria (by):

Aggregation of multiple columns with multiple criteria

R

1

cbind(column_1,column_2,column_3)~criteria_1+criteria_2+criteria_3

The aggregation function specified by FUN will be applied to each column specified on the left side of the formula after the data have been partitioned (grouped) according to the criteria specified on the right side of the formula.

To make a simpler example that fits with our data (which only have one numerical column), let’s say we want to aggregate on the Order Amount primarily by Order Data and secondarily by Country. We can easily issue to R a command like this:

Aggregation on two criteria

R

1

data.a<-aggregate(Order.Amount~Order.Date+Country,data=data,FUN=sum)

This will produce a Pivot Table where the Order Amount for each Order Date and Country combination has been aggregated using the sum function.

Hi Tom,
thanks for informing me!
Luckily enough I could find on my hard-drive a copy of the original file. I posted it at the bottom of the first post of the series (Pivot Tables in R with melt and cast). Feel free to grab it from there.
Best Regards,
Marco.