Generation of normals

The mean and sd arguments control the mean and standard deviation (they default to 0 and 1).
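Here is a minimal sketch of rnorm with those arguments. The seed value, the sample size and the variable name xnorm are arbitrary choices for illustration.

```r
# Fix the seed so this sketch gives the same numbers each time it is run
set.seed(1)

# 1000 draws from a normal with mean 10 and standard deviation 2
xnorm <- rnorm(1000, mean = 10, sd = 2)

# The sample mean and standard deviation should be close to 10 and 2
mean(xnorm)
sd(xnorm)
```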

Two types of uniform

You can have a distribution in which all numbers in some range are equally likely: a continuous uniform. Alternatively, you can have a distribution in which each of a finite set of objects, such as a range of integers, is equally likely: a discrete uniform.

Continuous uniform

You can generate 100 numbers that are continuously uniform between 0 and 1 with:

> xcontu <- runif(100)

You will get different numbers in xcontu if you do the command again.

The min and max arguments change the range (they default to 0 and 1).
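A small sketch of changing the range. The bounds -5 and 5 and the name xrange are just for illustration.

```r
# 1000 numbers uniformly distributed between -5 and 5
xrange <- runif(1000, min = -5, max = 5)

# All values fall inside the requested range
range(xrange)
```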

Discrete uniform

Use the sample function to generate uniformly from some set of integers (or other types of objects). For example:

> xdiscu <- sample(1:100, 4, replace=TRUE)

selects 4 numbers between 1 and 100, inclusive, with replacement.

You will get different numbers in xdiscu if you do the command again.
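By default sample draws without replacement (replace = FALSE). Every selected value is then distinct, and you cannot ask for more items than there are to choose from. The sketch below contrasts the two cases. The variable names are only for illustration.

```r
# Without replacement (the default): 4 distinct integers from 1 to 100
xnorep <- sample(1:100, 4)
anyDuplicated(xnorep)   # 0, meaning no repeats

# With replacement, asking for more items than the population is fine
xrep <- sample(1:3, 10, replace = TRUE)

# Without replacement the same request is an error
res <- tryCatch(sample(1:3, 10), error = function(e) "error")
res
```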

You can get a random color from among the named colors with the command:

> sample(colors(), 1)

The prob argument to sample allows you to give different probabilities to the elements of the vector that is being selected from. Thus sample will perform non-uniform sampling as well.
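For example, a biased coin can be simulated by giving prob one probability per element. The seed, the 0.7/0.3 split and the name flips are arbitrary choices for this sketch.

```r
set.seed(2)

# A biased coin: heads with probability 0.7, tails with probability 0.3
flips <- sample(c("H", "T"), 1000, replace = TRUE, prob = c(0.7, 0.3))

# Proportion of heads, which should be near 0.7
mean(flips == "H")
```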

Random permutations

The sample function also does random permutations. In fact, that is its default behavior:

> xpermute <- sample(x)
> sample(1:9)
[1] 1 9 3 8 5 2 6 4 7

You will get a different order in xpermute if you do the command again. (We are assuming here that x is a vector with more than one element. If x is a single number n that is at least 1, sample(x) instead returns a permutation of 1:n.)
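The sketch below shows why the length-one case matters, along with a way to permute a vector of any length by indexing with sample.int. The resample wrapper is adapted from the examples on R's help page for sample. Treat it as a sketch, not a built-in function.

```r
# A single number n is treated as 1:n, so this permutes 1 to 5
sample(5)

x <- c(10, 20, 30)
x1 <- 10

# Indexing with sample.int(length(x)) always permutes the elements of x,
# even when x has length 1
resample <- function(x, ...) x[sample.int(length(x), ...)]

resample(x)   # some order of 10, 20, 30
resample(x1)  # always 10
sample(x1)    # a permutation of 1:10, probably not what was intended
```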

Seed setting

In all of the commands above, you get different answers each time you repeat them. That is pretty much the point of them. However, it can be useful to get the same answers again even though you are generating random numbers. You can do that by setting the random seed.

In R there is an object called .Random.seed that controls random generation. Once you have generated something random, there will be a .Random.seed object in your global environment. (It doesn’t show up in ls() because its name starts with a dot. You can see such objects with ls(all.names=TRUE).)

Calls to random functions change the value of .Random.seed. That is, these calls not only return a value, they also have the side effect of changing .Random.seed.

But if the random seed is the same at the start of a call, then the results will be the same. There are two ways of setting the seed: you can save the seed and later assign it back, or you can use set.seed.
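The first approach, saving the seed and assigning it back, can be sketched as follows. This assumes the default generator. It uses assign so the saved state goes back into the global environment even if the code is run inside a function. The name saved_seed is only for illustration.

```r
# .Random.seed is only created once the generator has been used,
# so draw one number first to make sure it exists
invisible(runif(1))

# Save the current state of the generator
saved_seed <- .Random.seed
first <- runif(3)

# Put the saved state back in the global environment, where R looks for it
assign(".Random.seed", saved_seed, envir = globalenv())
second <- runif(3)

identical(first, second)  # TRUE
```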

The preferred method is to use set.seed. You can just give a number as the first argument, for example set.seed(42).
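This short sketch shows that the same seed reproduces the same numbers. The seed values 42 and 43 and the variable names are arbitrary.

```r
set.seed(42)       # any integer works as the seed
a <- rnorm(3)

set.seed(42)       # the same seed again
b <- rnorm(3)      # so the same three numbers come out

identical(a, b)    # TRUE

set.seed(43)       # a different seed gives a different sequence
c3 <- rnorm(3)
```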

Probability distributions

R has functions for a number of probability distributions. In general, there are four functions for each distribution as shown in Table 1.

Table 1

Function name    Description
rxxx             random generation
dxxx             density function
pxxx             cumulative probability function
qxxx             quantile function

For example, rnorm is the random generation function for the normal distribution, and dnorm is the density for the normal. pnorm is the cumulative probability function for the normal; that is, it gives the probability of being less than or equal to a given quantile. qnorm is the quantile function, the inverse of the probability function: it returns a quantile given a probability. For discrete distributions such as the binomial, the d function gives the probability of each value.
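A few calls make the relationships between the normal functions concrete. The particular values (0, 1.96, 0.3 and so on) are just convenient examples.

```r
# Density of the standard normal at 0 equals 1 / sqrt(2 * pi)
dnorm(0)

# Probability of being less than or equal to 1.96, about 0.975
pnorm(1.96)

# qnorm inverts pnorm: the 0.975 quantile is about 1.96
qnorm(0.975)

# Round trip: pnorm(qnorm(p)) gives back p
p <- 0.3
pnorm(qnorm(p))

# The same functions take mean and sd arguments
pnorm(12, mean = 10, sd = 2)   # same as pnorm(1)
```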

Table 2 shows a few of the distributions that are available in R.

Table 2

Distribution     Functions
Uniform          runif   dunif   punif   qunif
Normal           rnorm   dnorm   pnorm   qnorm
Student’s t      rt      dt      pt      qt
F                rf      df      pf      qf
Exponential      rexp    dexp    pexp    qexp
Log normal       rlnorm  dlnorm  plnorm  qlnorm
Beta             rbeta   dbeta   pbeta   qbeta
Binomial         rbinom  dbinom  pbinom  qbinom
Poisson          rpois   dpois   ppois   qpois
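The same naming pattern carries over to the other distributions in the table. This sketch checks a couple of the relationships for the binomial and the exponential. The parameter values (10 trials, probability 0.5, rate 2) are arbitrary.

```r
# For a discrete distribution such as the binomial, the d function gives
# the probability of each value: P(X = k) for 10 trials with p = 0.5
probs <- dbinom(0:10, size = 10, prob = 0.5)
sum(probs)  # the probabilities of all possible values sum to 1

# The p function is the running total of the d function
pbinom(3, size = 10, prob = 0.5)
sum(dbinom(0:3, size = 10, prob = 0.5))

# The exponential cumulative probability has the closed form 1 - exp(-rate * x)
pexp(1, rate = 2)
1 - exp(-2)
```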

You can see a more complete list with the command:

> ??distribution

The ecdf function takes a data vector as an argument and returns a function that is the cumulative probability function of the data.
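For example, the result of ecdf can be called like any other function. The data vector here is made up for illustration.

```r
xdata <- c(3, 1, 2, 2)
Fn <- ecdf(xdata)   # Fn is itself a function

Fn(2)     # 0.75: three of the four values are <= 2
Fn(0)     # 0: no values are <= 0
Fn(3)     # 1: all values are <= 3
Fn(2.5)   # 0.75: the function is a step function
```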

Pseudorandomness

In a certain sense most of what is said on this page is a lie. When you use a function like rnorm or sample, you are not generating randomness at all. These are pseudorandom functions. Technically you are generating chaos when you use them, not randomness. There are two main reasons to use pseudorandomness rather than randomness.

The first is convenience. In the early days of computing there was no practical way to get truly random values, so pseudorandom methods had to be invented. Now it is possible to use truly random values, but it is generally harder to do and seldom offers an advantage.

The second reason to prefer pseudorandomness is reproducibility. Random numbers (by definition) are not reproducible. A program without reproducible results is a program that cannot be debugged.

It is largely accidental that we have pseudorandom functions and not truly random functions. It’s a happy accident.