Data Analysis and Visualization Using R: Lesson 1

David Robinson, 1/27/14

How to Read These Slides

In these slides, we show blocks of R code, which are immediately followed by their output:

print("hello world")

[1] "hello world"

The gray box shows the original R code, which you can copy and paste into your own R console to try yourself. The white box shows the code's output: you can compare it to your own results (or just trust us that that's the output).

Numeric variables

Assigning a variable

You store a value in a variable using the = operator:

x = 42

This gives the variable x a value of 42. You can show the value of x with:

print(x)

[1] 42

You can also assign a variable with <-, which is equivalent:

x <- 42

Variable names

Variable names consist of letters, digits, periods and underscores (_). A name cannot start with a digit or an underscore. The convention in these slides is to use periods where you would otherwise put spaces.

Legal variable names include:

my.variable

my_variable

Illegal names include:

my-variable

dave's.variable

2ndvariable
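
If you are unsure whether a name is legal, one way to check is the built-in make.names() function, which rewrites any string into a syntactically valid name. The sketch below assumes that a name which make.names() leaves unchanged is legal; the vector name candidates and the result name is.legal are made up for this illustration.

```r
# The five example names from the lists above, stored as strings
candidates = c("my.variable", "my_variable", "my-variable",
               "dave's.variable", "2ndvariable")

# make.names() repairs invalid names, so a name that comes back
# unchanged was already syntactically valid
is.legal = make.names(candidates) == candidates
print(is.legal)
# [1]  TRUE  TRUE FALSE FALSE FALSE
```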

Using R like a scientific calculator

You can perform mathematical operations using +, -, *, and /:

x = 6 + 4
print(x)

[1] 10

x / 2

[1] 5

y = 4
x / y

[1] 2.5

You can raise a number to a power with ^, or calculate the natural log with log():

x^2

[1] 100

y^3

[1] 64

log(x)

[1] 2.303
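
A few more calculator-style operations, as a short sketch rather than a complete list. All of the functions and operators used here are built into R; the variable names ending in .result are made up for this illustration.

```r
x = 10

log(x)                          # natural log (base e), about 2.3026
log10.result = log10(x)         # base-10 log of 10 is 1
log2.result = log(8, base = 2)  # log in another base: 3
back.again = exp(log(x))        # exp() undoes the natural log: 10
root.result = sqrt(x)           # square root, same as x^0.5

remainder = 7 %% 3              # remainder after division: 1
quotient = 7 %/% 3              # integer division: 2
```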

Assigning variables: FAQ

What is the difference between <- and =?

In 99% of cases, they act exactly the same, so it's personal preference. The rare cases where they differ mostly arise inside function calls, where = names an argument instead of assigning a variable.
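
A minimal sketch of one such rare case: inside a function call, = names an argument, while <- still creates a variable. The function double.it and the name my.input are hypothetical, defined only for this illustration.

```r
# A hypothetical helper, defined only for this illustration
double.it = function(my.input) my.input * 2

# Here = just names the argument; no variable called my.input is created
double.it(my.input = 21)
exists("my.input", inherits = FALSE)   # FALSE (in a fresh session)

# Here <- assigns 21 to my.input first, then passes that value along
double.it(my.input <- 21)
exists("my.input", inherits = FALSE)   # TRUE
```

Both calls return 42; the only difference is whether my.input exists afterwards.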

When do you need print(x) to display a variable, and when is x alone enough?

When working in the R interactive terminal, the result of each line is displayed after it is evaluated, so print is unnecessary. When you source a .R file, you need print(x) on that line or the value won't be displayed.
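
You can see the difference for yourself with a throwaway script. This sketch writes three lines to a temporary file and then sources it with the default settings; the names script and shown are made up for this illustration.

```r
# Write a small script to a temporary file
script = tempfile(fileext = ".R")
writeLines(c("x = 42",
             "x",          # a bare x: displayed interactively, silent when sourced
             "print(x)"),  # an explicit print: always displayed
           script)

# Capture whatever the sourced script displays
shown = capture.output(source(script))
print(shown)
# [1] "[1] 42"
```

Only one line of output comes back, from print(x); the bare x on the second line displays nothing.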

Why is there a [1] before each result?

You'll find out in the next section!

Vectors

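You may have noticed the [1] at the start of each result. That's because every number in R is actually represented as a vector of length 1. The [1] is the index of the first element printed on that row of output, which becomes useful once a vector is long enough to span several rows.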

For example, you can use : to create a long vector of consecutive integers:
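
The original example is not shown here, so the following is a sketch with an arbitrarily chosen range of 1 to 30; the variable name v is also made up. When v is printed, each row of output starts with the index of its first element (how many numbers fit on a row depends on the width of your console).

```r
v = 1:30

length(v)    # 30 elements
length(42)   # 1: a single number is a vector of length 1
v[1]         # the first element, 1
v[30]        # the last element, 30

print(v)     # each printed row begins with a bracketed index such as [1]
```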

data.table

data.table is a third-party package that improves in many ways on the built-in data.frame.

We'll go over some of its advantages on Wednesday and Friday, but today we will focus on just one: how it makes filtering more convenient.

Turn a data.frame into a data.table

Since data.table is a third-party package, you need to install it first, for example with install.packages("data.table"). Once it is installed, you still have to load it into R:

library("data.table")

(You'll have to rerun that line each time you restart R.) Then convert your data.frame to a data.table:

mtcars.dt = as.data.table(mtcars)
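
One thing to be aware of: as.data.table() drops row names by default, so the car names in mtcars are lost in the conversion. The sketch below assumes data.table is already installed and uses the keep.rownames argument to store them in a column instead. The names mtcars.named and model are made up for this illustration.

```r
library("data.table")

# The conversion from above: the result is still a data.frame underneath
mtcars.dt = as.data.table(mtcars)
class(mtcars.dt)
# [1] "data.table" "data.frame"

# Keep the row names (the car models) as a column called model
mtcars.named = as.data.table(mtcars, keep.rownames = "model")
head(mtcars.named$model, 3)
# [1] "Mazda RX4"     "Mazda RX4 Wag" "Datsun 710"
```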

Filtering a data.table

A data.table looks identical in many ways to a data.frame, but has some useful features. One is that when you're filtering, you don't need to write mtcars$ each time inside the brackets. You can just refer to the column names:
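
The original example is not shown here, so the following is a sketch with an arbitrarily chosen filter (cars with mpg above 30), comparing the data.frame and data.table versions side by side. It assumes data.table is installed; the names efficient.df, efficient.dt and small.efficient are made up for this illustration.

```r
library("data.table")
mtcars.dt = as.data.table(mtcars)

# data.frame: the column needs the mtcars$ prefix
efficient.df = mtcars[mtcars$mpg > 30, ]

# data.table: refer to the column by name inside the brackets
efficient.dt = mtcars.dt[mpg > 30]

nrow(efficient.df)   # 4
nrow(efficient.dt)   # 4

# Conditions on several columns combine the same way
small.efficient = mtcars.dt[cyl == 4 & mpg > 30]
```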