Preface

This is a tutorial (previously known as “Some hints for the R beginner”) for beginning to learn the R programming language. It is a tree of pages — move through the pages in whatever way best suits your style of learning.

You are probably impatient to learn R — most people are. That’s fine. But note that trying to skim past the basics that are presented here will almost surely take longer in the end.

This page has several sections, which fall into four categories: General, Objects, Actions, Help.

Introduction

The primary purpose of this tutorial is — in the first few days of your contact with R — to help you become as comfortable with R as possible.

I asked R users what their biggest stumbling blocks were in learning R. A common answer that surprised me was:

The biggest stumbling block was thinking that R is hard.

On reflection perhaps I shouldn’t have been so surprised by that answer. The vastness of the functionality of R can be quite intimidating (even to those of us who have been around it for years), but doing a single task in R is a logical and often simple process.

What happens at R startup

R is mainly used as an interactive program — you give R a command and it responds to that command. The result may influence the next command that you give R.

Between the time you start R and it gives you the first prompt, any number of things might happen (depending on your installation). But the thing that always happens is that some number of “packages” are “attached” to the “search list”. (The quotation marks indicate words that are used in a technical sense — that is, the words in quotes are part of the R jargon.)

You can see what those packages are in your case with the command:

> search()

(You don’t type the “> ” — that is the R prompt, but you do hit the return key at the end of the line.)

The first item on the search list is the “global environment”. This is your work space where the objects that you create during the R session will be.
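As a small illustration of the global environment as a work space, the sketch below creates an object and then lists what is there. The name my_value is made up for this example.

```r
# The global environment is always the first item on the search list
search()[1]        # ".GlobalEnv"

# Create an object; it lands in the global environment
my_value <- 42

# List the names of the objects currently in the global environment
ls()
```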

You quit R with the command:

> q()

R will ask you if you want to save or delete the global environment when you quit. (At that point it is all or nothing — see Saving objects for how to save just some of the objects.)

If you do save the global environment, then you can start another R session with those objects in the global environment at the start of the new session. You are saving the objects in the global environment; you are not saving the session. In particular, you are not saving the search list.

Key objects

An important strength of R is that it is very rich in the types of objects that it supports. That strength is rather a disadvantage when you are first learning R.

But to start, you only need to get your head around a few types of objects.

basic objects

Here are three important basic objects:

“atomic vector”

“list”

NULL

atomic vector

There are three varieties of atomic vector that you are likely to encounter:

“numeric”

“logical”

“character”

The thing to remember about atomic vectors is that all of the elements in them are of a single type. There cannot be an atomic vector that has both numbers and character strings, for instance.
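Here is a minimal sketch of the three varieties, using the c function to build each vector. The object names are made up for this example. The last command shows what happens if you try to mix types: R quietly converts everything to one type.

```r
# One vector of each common variety
num_vec <- c(1.5, 2, 10)
lgl_vec <- c(TRUE, FALSE, TRUE)
chr_vec <- c("apple", "banana")

class(num_vec)   # "numeric"
class(lgl_vec)   # "logical"
class(chr_vec)   # "character"

# Trying to mix a number and a string gives a character vector
mixed <- c(1, "a")
class(mixed)     # "character"
```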

list

Lists can have different types of items in different components. A component of a list is allowed to be another list as well as an atomic vector (and other things).
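A small sketch of a list whose components have different types, including another list. The component names (numbers, label, nested) are invented for illustration.

```r
# A list with a numeric vector, a character string and a nested list
my_list <- list(
  numbers = c(1, 2, 3),
  label   = "abc",
  nested  = list(flag = TRUE)
)

length(my_list)         # 3 components
class(my_list$numbers)  # "numeric"
class(my_list$nested)   # "list"

# str gives a compact overview of the structure
str(my_list)
```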

NULL

The final object in the list above is NULL. This is an object that has zero length. Virtually all of the other objects that you deal with will have length greater than zero.
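You can check the zero length for yourself. The name nothing is made up for this example.

```r
# NULL has length zero
nothing <- NULL
length(nothing)    # 0
is.null(nothing)   # TRUE

# Compare with an ordinary vector
length(c(3, 1, 4)) # 3
```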

derived objects

There are three important types of what might be called derived — or non-basic — objects.

matrix

data frame

factor

matrix and data frame

Matrices and data frames are both rectangular data objects. The difference between them is that everything in a matrix has to be of the same atomic type, but data frames can have different types in different columns. Each column of a data frame has to be of a single type.

A matrix can look exactly like a data frame, but they are implemented entirely differently.

Sometimes it doesn’t matter whether you have a matrix or a data frame. Other times it is very important to know which you have.
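The sketch below builds one of each and asks R which is which. The object and column names are made up, and stringsAsFactors = FALSE is given explicitly so that the result does not depend on your version of R.

```r
# A matrix: every element is of the same atomic type
m <- matrix(1:6, nrow = 2)

# A data frame: different columns can have different types
df <- data.frame(id = 1:3, name = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

is.matrix(m)        # TRUE
is.data.frame(df)   # TRUE
is.matrix(df)       # FALSE

# The type of each column of the data frame
sapply(df, class)

# Converting this data frame to a matrix forces a single type
class(as.matrix(df)[1, 1])   # "character"
```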

Key actions

Three basic actions in R are assignment, subscripting and random generation.

assignment

The action in R is precipitated by function calls. Most functions return a value (that is, some data object). You will often want to assign that result to a name. There are two ways of doing that. You can do:

meanx <- mean(x)

or

meanx = mean(x)

Once you have executed one of those commands, then meanx will be an object in your global environment.

There is a shocking amount of controversy over which form of assignment to use. My position here is to use whichever one you are more comfortable with. There are ways of running into trouble with either one, but using the arrow surrounded by spaces is probably the safest approach by a slight margin.

Note that R is case-sensitive. The two names meanx and Meanx are different.
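To try the assignment yourself you need an x to take the mean of. In this sketch the values in x are made up.

```r
# Some made-up data
x <- c(2, 4, 6, 8)

# Assign the result of a function call to a name
meanx <- mean(x)
meanx             # 5

# Names are case-sensitive: meanx exists, Meanx does not
exists("meanx")   # TRUE
exists("Meanx")   # FALSE in a fresh session
```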

subscripting

Subscripting is important. This is the act of extracting pieces from objects. Subscripting is done with square brackets:

x[1]

extracts the first element from x.

The command:

x[1, 3]

extracts the element in the first row and third column of a matrix or data frame.
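A small sketch of both forms of extraction, with made-up values. Note that the matrix function fills the matrix column by column.

```r
# Subscripting a vector
x <- c(10, 20, 30)
x[1]       # 10

# Subscripting a matrix: row first, then column
m <- matrix(1:6, nrow = 2)   # first row is 1 3 5, second row is 2 4 6
m[1, 3]    # 5
```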

Subscripting also includes replacing pieces of an object. The command:

graphics

Reading data into R

Transferring data from one place to another is always fraught with danger. Expecting it to always be smooth is just setting yourself up for disappointment. But sometimes getting data into R does go smoothly.

If you are trying to get rectangular data (something that looks like a matrix or a data frame) into R, then the read.table function or one of its relatives will be what you want to use. This function returns a data frame. Note: a data frame, not a matrix.
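Here is a minimal sketch of read.table at work. So that it runs without a file on disk, the data are given through the text argument. The data themselves are invented. With a real file you would give the file name as the first argument instead.

```r
# Made-up rectangular data with a header line
raw_text <- "name height
Ann 1.62
Bob 1.80"

# header = TRUE says that the first line holds the column names
dat <- read.table(text = raw_text, header = TRUE)

is.data.frame(dat)   # TRUE
is.matrix(dat)       # FALSE
names(dat)           # "name" "height"
```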

Errors and such

Hint: the universe doesn’t collapse into a singularity just because of an error in R. Actually, it builds character — see Make mistakes on purpose.

R produces errors and warnings. Both errors and warnings write a message — the difference is that errors halt the execution of the command but warnings do not.
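The difference can be seen with two deliberately bad commands. This is only a sketch: try is used so that the error does not stop the rest of the commands from running, and the object names are made up.

```r
# A warning: the command still finishes and returns a value
w_result <- as.numeric("abc")
w_result           # NA

# An error: the command is halted, so the addition returns no value.
# try catches the error and returns an object describing it.
e_result <- try(1 + "a", silent = TRUE)
class(e_result)    # "try-error"
```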

We’ll categorize errors into three types: syntax errors, object-not-found errors, and all the rest.

syntax errors

If you get a syntax error, then you’ve entered a command that R can’t understand. Generally the error message is pretty good about pointing to the approximate point in the command where the error is.

Common syntax mistakes are missing commas, unmatched parentheses, and the wrong type of closing brace [for example, an opening square bracket but a closing parenthesis).

object not found

Errors of the object-not-found variety can have one of several causes:

the name is not spelled correctly, or the capitalization is wrong

the package or file containing the object is not on the search list

something else (let your imagination run wild)
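The first cause is easy to reproduce. In this sketch the object is created as total_count but then asked for as Total_count. tryCatch is used only to capture the error message as a character string instead of halting. Both names are made up.

```r
# Create an object with an all-lowercase name
total_count <- 10

# Ask for it with the wrong capitalization and capture the message
msg <- tryCatch(Total_count, error = function(e) conditionMessage(e))
msg
```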

other errors

There are endless other ways of getting an error. Hence some detective work is generally necessary — think of it as a crossword puzzle that needs solving.

It should become a reflex reaction to type:

> traceback()

whenever you get an error.

The results might not mean much to you at the moment, but they will at some point. The traceback tells you what functions were in effect at the time of the error. This can give you a hint of what is going wrong.

warnings

A warning is not as serious as an error in that the command runs to completion. But ignoring a warning can be very, very serious if it is telling you that the answer you got was bogus.

It is good policy to understand warning messages to see if they indicate a real problem or not.

Graphics

In order to have a picture, you need a canvas for it to be on. In R such a canvas is called a “graphics device”. If you are just making graphs interactively, you don’t need to worry about graphics devices — R will start a default device for you. If you want to save graphs to share, then you will need to decide on a graphics device.

The main function for creating a graph is plot. Often a command like:

> plot(x)

will work. It might not be the picture that you most want to see, but often it does something at least semi-sensible.

A plot doesn’t need to be created all in one command — you can add to plots. For instance:

> abline(0, 1)

adds a line of slope 1 and intercept 0 to the current plot (but, depending on the plot, it might not be visible).
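The pieces above can be put together in one sketch that saves a graph to a file. It assumes you want a PDF file, so it uses the pdf graphics device. The data and the temporary file name are made up for this example.

```r
# Made-up data that lie close to the line y = x
x <- 1:10
y <- x + c(0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, 0.4, -0.1, 0)

# Open a PDF graphics device that writes to a temporary file
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

plot(x, y)     # create the plot
abline(0, 1)   # add a line with intercept 0 and slope 1

# Close the device so that the file is completely written
dev.off()

file.exists(plot_file)   # TRUE
```

When you are working interactively you can leave out the pdf and dev.off commands, and the plot will appear on the default device.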

Vectorization

R is a vector language. An object is unlikely to be just one number or character string or logical value. More likely there will be multiple values in the object — sometimes dozens, sometimes millions.

Vectorization is when an operation treats the object as a whole rather than treating each value separately. For example:

> x + 2

adds 2 to each value in x. It doesn’t matter if there is one value in x or two thousand.
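A quick sketch with made-up values, followed by the loop that the vectorized command saves you from writing.

```r
x <- c(1, 5, 10)

# Vectorized: one command handles every value
x + 2              # 3 7 12

# The same result with an explicit loop over the values
result <- numeric(length(x))
for (i in seq_along(x)) {
  result[i] <- x[i] + 2
}
identical(result, x + 2)   # TRUE
```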

How to read a help file

The first point about help files is that they are not novels. You shouldn’t feel compelled to read them from start to finish.

Focusing on the examples to start may be a good strategy. (Though this has the obvious weakness that it depends on there being good examples in the help file.)
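One way to get at the examples is the example function. The sketch below uses the help file for mean, purely as an illustration.

```r
# Open the help file itself with either of these (not run here):
#   ?mean
#   help("mean")

# Get the example code from the help file as text, without running it
ex_lines <- example("mean", give.lines = TRUE)
head(ex_lines)

# Run the examples and see the commands along with their results
example("mean")
```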

It may not be wise to expect yourself to understand everything before you use the function. Try it out and see if it looks like it will be useful to you; only then should you invest a lot of time in understanding the details.