################################################################
#
#
# INTRO TO R
#
# The R website:
#
# http://cran.r-project.org
#
# (Google "R" -> one of the first entries)
#
# Downloading R:
#
# -> Sidebar "Download, Packages": CRAN
# -> any site in the US
# -> Windows
# -> base
# -> "Download R-2.... for Windows (32 bit build)"
# -> installation dialog in setup wizard
#
# The setup program should self-install and create an icon on your desktop.
# Clicking the icon should open up an R interpreter window ("R Gui").
#
# The base is really just the base. There are many contributed
# library packages whose binaries can be downloaded from
#
# -> Packages
#
# You will not have to download them explicitly, though;
# there are R functions that allow you to get them while running R.
# In the R Gui you can also go through the "Packages" item in the toolbar.
#
################
#
# OPERATION OF R:
#
# * For non-Emacs users:
#
# 0) go to the class web page and download this file:
# www-stat.wharton.upenn.edu/~buja/STAT-541/Stat-541-R-intro.R
# Open an editor (Word, Wordpad,...) on this file (change .R to .txt)
# AND open up an R GUI window by clicking on the R icon.
# Reduce the size of both windows so both are accessible.
#
# 1) Copy R code from this file into the R interpreter window.
# Use shortcuts: In the editor highlight lines, hit <Ctrl>-C,
# then move to the R window and hit <Ctrl>-V.
# Examples:
1+2
1:10
2^(1:20)
runif(10)
rnorm(10)
1:10 + rnorm(10)
#
# 2) Experiment with R code
# by editing THIS file in the editor window, or
# by editing the command line in the R window (if it's one-liners).
#
# Commands for line editing in the R interpreter window:
# Note: "^p" means you hold down the <Ctrl> modifier key and hit "p",
# just like the <Shift> modifier key used for capitalization
# ^p get back the previous command line for editing and executing
# repeating ^p goes further back in the command history
# ^b step back one character in the command line
# ^f step forward one character in the command line
# ^a move to the beginning of the line
# ^e move to the end of the line
# ^d or <Delete> to delete the character under the cursor
# ^h or <Backspace> to delete the previous character
# ^k kill the rest of the line starting from the cursor
# otherwise: you are in insert mode
# (These editing commands are borrowed from the Emacs editor.)
#
#
# * For Emacs users:
#
# download the ESS macros ("Emacs Speaks Statistics") from r-project.org:
# -> R GUIs -> Emacs (ESS) -> Downloads
# Download the latest zip or tar.gz file.
# Unpack and install; ESS should work right away. Skip to "Operation:" and try.
# If it doesn't work right away,
# you may have to put these lines in your .emacs file:
## (setq inferior-R-program-name "c:/Program Files/R/R-2.7.1pat/bin/Rterm.exe")
# ^^^^ path to your R executable ^^^^
## (load-file "c:/EMACS/ESS/ess-5.3.0/lisp/ess-site.el")
# ^^^path to the file "ess-site.el"^^^
#
# Operation:
# - Split the Emacs window into two windows:
# ^x 2
# - Edit THIS file in the upper window.
# ^x ^f filepath
# - Start R in the lower window:
# ^x o (move the cursor to the lower window)
# <Alt>-x R (start R inside Emacs)
# - If you like to shrink one of the windows, put this line in your .emacs:
# (global-set-key "\M-o" (lambda () (interactive) (shrink-window 1)))
# Then <Alt>-o will shrink the present window and expand
# the other window by one line.
# - There are macros to copy and execute lines, functions, and regions
# from the upper buffer into the lower buffer:
# ^c ^j (execute current line and leave the cursor in place)
# ^c ^n (execute current line and move to next line of R code)
# ^c ^f (execute function, assuming the cursor is
# inside its body)
# ^c ^r (execute region)
# - A small nuisance is that the lower (R) window does not move
# the bottom line to the center after executing an expression.
# This can be fixed by putting the following in your .emacs:
# (global-set-key "\M-s" "\C-xo\M->\C-l\C-xo")
# Then <Alt>-s will move the bottom line to the center.
#
#
##################
#
#
# * R is an INTERPRETED LANGUAGE:
# Users type expressions and see results immediately.
# Example:
for(i in 1:10) { if(i%%2==0) print(i) }
# As opposed to:
# - ... languages (C, Fortran)
# - ... software (such as SAS' JMP)
#
#
# * R is HIGH-LEVEL:
# It operates on complex data structures such as
# vectors, matrices, arrays, lists, dataframes,
# as opposed to C and Fortran that operate on individual numbers only.
# (This requires some getting used to for C programmers.)
#
#
# * PRIMARY BEHAVIOR: Whatever is typed, print the results.
2
print(2) # same
"a"
print("a") # same
# (Q: Why is there '[1]' preceding the results? A: ...)
# Vector of length greater than 1:
1:3
print(1:3) # same
#
#
# * SYNTAX:
# - Largely scientific/math notation; base 10.
# - A wealth of functions.
# - Comments run from a "#" to the end of the line; no multiline comments.
# - Spaces are irrelevant, except inside strings:
2+3; 2 + 3; "ab"; "a b"
# - Statements can run over multiple lines:
2 + 3 + # \
4 # / One statement
# But if a statement is syntactically complete at
# the end of the line, it won't continue:
2 + 3 # \
+ 4 # / Two statements
# - Statements can be separated by ";".
2; 3^3; sqrt(9)
#
#---
#
# * BASIC DATA TYPES:
#
#
# - NUMERIC: double precision by default (How many bytes?)
# Integers are represented as doubles, although the print function
# shows them as integer:
-2.000
1E5
2E-3
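# A quick check of the storage claim above (a sketch; 'typeof' reports the
# internal storage type, and the 'L' suffix or the ':' operator yield
# true integers rather than doubles):
typeof(2)                    # "double"
typeof(2L)                   # "integer"
typeof(1:3)                  # "integer"
.Machine$double.eps          # spacing of doubles near 1, about 2.2e-16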
# The usual unary and binary operations and analytic functions:
# +, -, *, /, %%, %/%, ^, log, sqrt, sin, acos...
2+3 # Add.
5.3*1E10 # Multiply.
10%%3 # Modulo.
exp(1) # Exponentiation.
log(2.718282) # Log of the number 'e'; 'log' is e-based.
log10(10) # 10-based log
pi # That's the number 'greek pi', 3.14159
sin(pi/2) # Angles are to be given in radians, not degrees.
sin(pi) # Ditto. (The result is about 1.2e-16, not exactly 0: rounding error.)
acos(0) # This is the inverse of cos, arccos, hence pi/2.
pi/2 # Same value as acos(0). 'pi' is the only hardwired numeric constant.
#
#
# - STRINGS: can be single or double quoted, but the print function
# uses double quotes.
'a'; "a"; 'abc'; "abc"
# (In C and Python strings are sequences of characters.
# In R strings are basic types; there is no single character type.
# Characters are just strings of length 1.
# There is no indexed access to individual characters and
# substrings in R; one uses the "substring" function instead.)
substring("abcxyz",4,6)
# Other basic string manipulations:
paste("my","word")
nchar("Supercalifragilistikexpialidocious")
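# A few variations on the functions above, as a sketch (the argument
# values are made up for illustration):
paste("my", "word", sep="")            # no separator: "myword"
paste("x", 1:3, sep="")                # vectorized: "x1" "x2" "x3"
substring("abcxyz", 1:3, 3:5)          # vectorized: "abc" "bcx" "cxy"
toupper("abc")                         # "ABC"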
# There are two hardwired character vectors that contain the lower and
# upper case letters:
letters
LETTERS
#
#
# - LOGICAL values: have two names each, but the print function
# always uses the longer.
TRUE; FALSE; T; F
# They are implemented as the values 1 and 0 for T and F, respectively.
# They are the result of the usual comparisons: <, >, <=, >=, ==, !=
1<2; 1>2; "ab" <= "abcd"
"ab" > "ac"; "ab" != "AB"
"ab" != 2; 0==F; 1==T
#
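# Because logicals behave as 1 and 0 in arithmetic, they can be counted
# and averaged (a sketch; the vector 'ex.v' is made up for illustration):
ex.v <- c(3, 8, 1, 9, 4)
sum(ex.v > 3)                # number of elements greater than 3: 3
mean(ex.v > 3)               # proportion of such elements: 0.6
TRUE + TRUE                  # 2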
#
# - MISSING and special values NA, NaN, Inf, -Inf:
NA; NaN; Inf; -Inf; 1/0; Inf==1/0; 0/0
# Watch out: the following does not give T!!!
NA==1
# If you want to test for NA, you must use the function is.na():
is.na(NA)
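# NA propagates through comparisons and arithmetic; a sketch with a
# made-up vector 'ex.na':
ex.na <- c(1, NA, 3)
ex.na == 1                   # TRUE NA FALSE
is.na(ex.na)                 # FALSE TRUE FALSE
sum(ex.na)                   # NA
sum(ex.na, na.rm=TRUE)       # 4
is.nan(0/0); is.na(0/0)      # both TRUE: NaN counts as missing, too
is.nan(NA)                   # FALSE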
#
#
# - FUNCTIONS:
# * R is a FUNCTIONAL LANGUAGE:
# Functions return values that in turn can be arguments to functions.
# Expressions evaluate inside out, e.g., log(2*2.5)^3:
2.5; 2*2.5; log(2*2.5); log(2*2.5)^3
#
#
# * STATEMENTS/EXPRESSIONS:
# There are two types of expressions: assignments and side effects.
# 1) Assignments allocate data structures and
# make variables point to them.
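# x <- value (the right-pointing form 'value -> x' also exists)
# assign("x",...)
# Examples:
x <- 2; 2 -> x # This can be used, too, if you must...
# - Vectors can be indexed with logical vectors:
a <- c(1, 3, 5) # Assumed example vector of length 3, used below.
a > 2; a[a>2] # !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
b <- (a > 2); a[b] # ditto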
# Caution: If the index vector is not of equal length,
# it will be cyclically repeated:
a[F] # c(F,F,F) 'F' is repeated 3 times
a[T] # c(T,T,T)
a[c(T,F)] # c(T,F,T)
a[c(T,T,F,F,F)] # If too long, the extra F's select nothing (an extra T would yield NA).
(1:12)[c(T,T,F)] # Leave out every third item.
# (The above scheme can be used to create arbitrary repeat patterns.)
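# Two more repeat patterns built the same way (a sketch):
(1:12)[c(T,F)]               # odd positions: 1 3 5 7 9 11
(1:12)[c(F,F,T)]             # every third item: 3 6 9 12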
# - Vectors can be indexed repeatedly:
a[c(1,3)][2] # Select item two of a[c(1,3)], i.e. item 3 of 'a'.
# (Looks like a matrix element in C, but isn't!!)
(a[c(1,3)])[2] # This is what the previous expression really means.
# Think of a[c(1,3)] as the result of some selection function.
a[c(1,3)][c(1,2,1,2)]
# - Vector indexing and subsetting can be used for assignment:
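a[1:2] <- c(10, 20); a # Assumed replacement values.
((1:10) > 5) # The logical vector that serves as index in the next line.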
(1:10)[(1:10) > 5]
#
# - Random numbers:
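x <- runif(10); x # Assumed example: 10 uniform random numbers.
#
################
#
# * MATRICES: vectors whose elements are arranged in
# COLUMN MAJOR ORDER, as in Fortran, but unlike in C.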
# Reformatting as a matrix is achieved by giving the vector
# a dimension attribute consisting of the numbers of rows and cols.
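# The dimension attribute can also be set by hand; a sketch with a
# made-up vector 'ex.m':
ex.m <- 1:12
dim(ex.m) <- c(3, 4)         # now a 3 x 4 matrix, filled column by column
ex.m
ex.m[2, 3]                   # row 2, column 3: 8
ex.m[8]                      # the same element, addressed in the underlying vector
dim(ex.m)                    # 3 4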
#
# - Reformatting vectors as matrices by filling successive cols or rows:
matrix(1:12, ncol=4) # Column major (default)
matrix(1:12, nrow=3) # Same; ncol is inferred
matrix(1:12, ncol=4, byrow=T) # Row major, forced with "byrow".
matrix(1:12, nrow=3, byrow=T) # Same
matrix(0:1, nrow=2, ncol=4) # What happened?
matrix(0, nrow=2, ncol=4) # "
matrix(letters, ncol=2) # Elements are now of type 'character'.
matrix(paste("Letter",letters), ncol=2)
# When reading data in text files, 'byrow=T' is needed
# for row-by-row input (download 'laser.dat' from the course page first):
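# m <- matrix(scan("laser.dat"), ncol=..., byrow=T) # fill in the file's number of columns
# - Binding vectors and matrices with cbind(): shorter args are cyclically extended.
cbind(1:3, 1:2) # lengths do not match evenly -> warning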
cbind(1:3,matrix(11:14,2)) # clipping: the second arg dictates nrow
# Don't rely on cyclic extension except for the simplest cases
# such as repeating constants.
# - Coercion of matrices to vectors:
# A matrix can always be treated as a vector.
# The following does not create an error message:
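m <- matrix(1:12, 3); m[5] # Assumed example: single-index access into the underlying vector.
# - Logical matrices built from row() and col() can select parts of a matrix:
x <- matrix(1:12, 3) # Assumed example matrix.
row(x) > col(x) # TRUE below the diagonal.
x[row(x)>col(x)]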
#
################
#
# * ARRAYS: the generalization of matrices to more than 2 indexes
#
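a <- array(1:24, dim=c(2,3,4)); a # Assumed example: a 2 x 3 x 4 array.
#
################
#
# * DATAFRAMES:
# Data tables often mix numeric and non-numeric (e.g., character) variables.
# Matrices cannot accommodate variables of both types...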
# Solution: Data frames. They are similar to matrices,
# but columns may differ in basic data types.
# (The entries have to be basic, not complex.)
#
# Main use of dataframes: data tables with mixed-type variables
#
# Dataframes are printed like matrices, but they are internally
# implemented as lists.
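# A sketch of what the list implementation means in practice (the
# dataframe 'ex.df' and its column names are made up for illustration):
ex.df <- data.frame(id=1:3, grp=c("a","b","a"), val=c(2.5, 1.0, 4.2))
is.list(ex.df)               # TRUE
length(ex.df)                # 3: the number of columns, i.e., list components
ex.df$val                    # a column, accessed like a list component
ex.df[[2]]                   # the same idea with list indexing
sapply(ex.df, class)         # the columns differ in basic type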
#
# - The function "data.frame()" can bundle conforming vectors,
# matrices, other dataframes into a single dataframe:
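myframe <- data.frame(num=1:3, chr=c("a","b","c")); myframe # Assumed example columns.
#
################
#
# * FLOW CONTROL:
# - if-statement:
if(length(m) > 15) { print("length(m) is greater than 15") }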
# With "else" clause:
if(length(m) > 15) {
print("length(m) > 15")
} else {
print("length(m) <= 15")
}
# This sort of thing is most useful inside loops; see below.
# - The vectorized "ifelse()" function:
# Not a flow control construct, but often replaces a combination of
# for-loop and if-else statements.
ifelse(c(T,F,T), c(1,2,3), c(-1,-2,-3))
# The function runs down the three arguments in parallel,
# checks each element in the first argument,
# if true, picks the corresponding element in the second argument,
# if false, picks the corresponding element in the third argument,
# returns a vector/matrix/array of the size of the first argument.
# If the second or third argument is not conforming,
# they are cyclically repeated, as in this implementation of
# 10 Bernoulli trials:
ifelse(runif(10) > 0.5, "H", "T")
# - Note the difference between 'if' and 'ifelse()':
# * 'if' is a syntactic element that dispatches execution
# depending on a single logical value.
# * 'ifelse()' is a function that takes a logical vector
# and selects from two other vectors depending on the logicals.
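# To see the correspondence, here is the same computation done twice,
# once with a for-loop and 'if', once with 'ifelse()' (a sketch;
# 'ex.u', 'ex.loop' and 'ex.vec' are made-up names):
ex.u <- c(0.2, 0.7, 0.5, 0.9)
ex.loop <- character(length(ex.u))
for(i in 1:length(ex.u)) {
  if(ex.u[i] > 0.5) { ex.loop[i] <- "H" } else { ex.loop[i] <- "T" }
}
ex.vec <- ifelse(ex.u > 0.5, "H", "T")
ex.loop                      # "T" "H" "T" "H"
identical(ex.loop, ex.vec)   # TRUE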
# - for-loop: runs over the elements of a vector.
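for(i in c(10,100,1000)) { j <- 2*i; print(j) } # Assumed loop body.
# - repeat-loop: runs until a 'break' is executed.
repeat { rnum <- runif(1); if(rnum > 0.9) { print(c("Found one:",rnum)); break } }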
# - while-loop:
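str <- "abcdef"; while(nchar(str) > 1) { print(str); str <- substring(str, 2) } # Assumed example.
#
################
#
# * CHANGING THE WORKING DIRECTORY (Windows R Gui):
# - for the current session: [toolbar] File -> Change Dir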
# - permanently: R-click on R icon -> Edit Properties -> Start in:
#
################################################################