Add two columns

Figure 1 shows some numbers in two columns and the start of adding those two columns to each other in a third column.

Figure 1: Adding two columns in a spreadsheet.

The next step is to fill the addition formula down the column.

It is not so different to do the same thing in R. First create two objects that are equivalent to the two columns in the spreadsheet:

A <- c(32.5, -3.8, 15.9, 22.5)
B <- c(48.1, 19.4, 46.8, 14.7)

In those commands you used the c function, which combines objects. You have created two vectors. The rules for a vector are:

it can be any length (up to a very large value)

all the elements are of the same type — all numbers, all character strings or all logicals

the order matters (just like it matters which row a number is in within a spreadsheet)

To summarize: they’re in little boxes and they all look just the same.
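Those rules can be sketched at the R prompt; x and y here are throwaway names chosen for illustration:

```r
# order matters: the same values in a different order are a different vector
x <- c(1, 2, 3)
y <- c(3, 2, 1)
identical(x, y)     # FALSE

# all elements share one type: mixing types silently coerces everything
c(1, "a", TRUE)     # a character vector: "1" "a" "TRUE"
```

The coercion in the last line is worth remembering: there is no error, the numbers and logicals simply become character strings.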

You have two R vectors holding your numbers. Now just add them together (and assign that value into a third name):

C <- A + B

This addition is precisely what is done in the spreadsheet: the first value in C is the first value in A plus the first value in B, the second value in C is the second value in A plus the second value in B, and so on.

See the values in an object by typing its name:

> C
[1] 80.6 15.6 62.7 37.2

The “> ” is the R prompt; you type only what comes after it: C (followed by the Return or Enter key).

Also note that R is case-sensitive — C and c are different things:

> c
function (..., recursive = FALSE) .Primitive("c")

(Don’t try to make sense of what this means other than that c is a function.)

Multiply by a constant

One way of multiplying a column by a constant is to multiply the values in the column by the value in a single cell. This is illustrated in Figure 2.

Figure 2: Multiplying a column by the value in a single cell, shown before filling down column E.

Another way of doing the same thing is to fill the value in D1 down column D and then multiply the two columns.

Do this operation in R with:

> C * 33
[1] 2659.8 514.8 2069.1 1227.6

In this command you didn’t create a new object to hold the answer.
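If you do want to keep the answer, assign the result to a name just as before (D here is simply a name chosen for illustration):

```r
# recreate the vectors from earlier in the post
A <- c(32.5, -3.8, 15.9, 22.5)
B <- c(48.1, 19.4, 46.8, 14.7)
C <- A + B

# multiply by a constant and keep the result
D <- C * 33
D
# [1] 2659.8  514.8 2069.1 1227.6
```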

You can think of R as doing either of the spreadsheet methods, but the fill-down image might be slightly preferable.

Recycling in R

The R recycling rule generalizes the idea of a single value expanding to the length of the vector. It is possible to do operations with vectors of different lengths where both have more than one element:

> 1:6 + c(100, 200)
[1] 101 202 103 204 105 206

Figure 3 illustrates how R got to its answer.

Figure 3: Equivalent of the example of R’s recycling rule.

Column F shows how column G was created: use the ROW function and fill it down the column. That sequence of numbers was created in R with the `:` operator.

Note how the shorter vector is replicated to the length of the longer one. Each value is used in order, and when it reaches the end it goes back to the beginning again.

You are free to think this is weird. However, it is often useful.
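One caution worth knowing: when the longer length is not a multiple of the shorter one, R still recycles but issues a warning.

```r
# recycling with a remainder: works, but warns
1:5 + c(100, 200)
# [1] 101 202 103 204 105
# Warning message:
# longer object length is not a multiple of shorter object length
```

That warning is often the only hint that two vectors you expected to be the same length are not.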

Functions

Table 1 translates between spreadsheet and R functions. The spreadsheets consulted were Excel, Works and OpenOffice. Note there is some variation between spreadsheets.

LCM: scm. scm is in the schoolmath package; for more than two numbers you can do Reduce(scm, numVector).

LEFT: substr

LEN: nchar (Excel, OpenOffice)

LENGTH: nchar (Works)

LINEST: use lm

LN: log. Danger: the default base of log in R is e.

LOG: log. Danger: the default base of LOG in spreadsheets is 10.

LOG10: log10

LOGINV: qlnorm

LOGNORMDIST: plnorm

LOWER: tolower

MATCH: match or which. match only does exact matches. Since MATCH demands a sorted set of values for type 1 or -1, MATCH(x, vec, 1) is sum(vec <= x) and MATCH(x, vec, -1) is sum(vec >= x) when vec is sorted as MATCH assumes.

MAX: max or pmax. max returns a single value; pmax returns a vector.

MDETERM: det

MEDIAN: median

MID: substr

MIN: min or pmin. min returns a single value; pmin returns a vector.

MINVERSE: solve

MMULT: %*%

MOD: %%

MODE: the table function does the hard part. A crude approximation to MODE(x) is as.numeric(names(which.max(table(x)))).

MUNIT: diag. diag is much more general.

N: as.numeric. The correspondence is for logicals; as.numeric is more general.

NEGBINOMDIST: dnbinom

NORMDIST, NORMSDIST: pnorm or dnorm. pnorm when cumulative is true, dnorm when false.

NORMINV, NORMSINV: qnorm

NOT: !

NOW: date or Sys.time

OR: any. The or operators in R are | and ||.

PEARSON: cor

PERCENTILE: quantile

PERCENTRANK: similar to ecdf, except that PERCENTRANK removes the argument from the distribution.

PERMUT: function(n, k) {choose(n, k) * factorial(k)}

PERMUTATIONA: PERMUTATIONA(n, k) is n^k

PHI: dnorm

POISSON: ppois or dpois. ppois if cumulative, dpois if not.

POWER: ^

PROB: you can use the Ecdf function in the Hmisc package (the probabilities in the spreadsheet are the weights in Ecdf), then take the difference of that at the two limits.
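Two of the trickier entries above can be checked on a toy vector; vec and x are names made up for this sketch:

```r
vec <- c(10, 20, 30, 40)   # sorted ascending, as MATCH type 1 assumes
sum(vec <= 25)             # 2, matching the spreadsheet MATCH(25, vec, 1)
match(30, vec)             # 3: R's match does exact matching only

x <- c(3, 5, 5, 7)
as.numeric(names(which.max(table(x))))   # 5: the crude MODE(x) approximation
```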

Comments

Nice post! I think you touched on one of the most negative sides of R: the default recycling to the length of the longer vector (or column). I would guess that many errors go unnoticed when doing column-wise summary statistics; e.g., a mean will be completely wrong if the original entry was shorter than expected. I think the default should be to fill with NAs, and I have heard many other people feel the same.

Thank you. Your post finally helped me get an overview of the Excel functions.

I do not really like Excel, and I use R (at home). However, at work I mostly do VBA, and while it is possible to keep a nice repository of frequently used functions, some Excel functions are handy, say all the distributions used for hypothesis testing. I would rather use the Excel version, which is most likely slow, than code it up in VBA.

I think just about anything is better than Excel for complex modeling, and R is a very good alternative. However, one main problem with R is that it presents itself as linear, procedural code. Working through a comprehensive understanding of a model with a client, such as a CFO, is very difficult.

I attempted to address these issues in my introductory tutorial “Business Case Analysis with R” (https://leanpub.com/bizanalysiswithr) by introducing the idea of using an influence diagram in parallel with communicating the flow of R logic in complex business case models.

However, the modeling application Analytica (http://www.lumina.com) actually resolves just about all of the problems with Excel and many of the remaining problems with R, namely that Analytica uses an integrated influence diagram to demonstrate the flow of logic of a model and that Analytica uses a technology called Intelligent Arrays that operates much more intuitively than R’s array system. I wrote a review for INFORMS here (http://www.incitedecisiontech.com/anareviewredirect.html).

Of course, Analytica doesn’t have anywhere near the broad array of libraries and classes that R does, and it is sold commercially (although there is a free version), but as far as communicating logic, auditing logic, and extending logic once a base model has been created, Analytica is definitely worth the time to consider.

After thinking a bit more about my previous comment, I think it’s worth adding that we should always try to use the best tool available for the task/problem at hand. If data analysis is the task at hand, R is probably among the best tools available. However, if business simulation/systems engineering is the task at hand, and those analyses require recursive time dependencies, multiple parallel threads of logic, and multi-dimensional spaces that may need to be extended (both in size and number of dimensions) easily with little to no additional programming, something like Analytica is the best tool. When it comes to the graphical presentation of results, I would recommend marrying R and Analytica together, as Analytica’s charting environment is still a little primitive (although very useful). The number and quality of useful R charting packages available for producing beautiful and compelling graphics is unparalleled.