Merging data frames

In R, there is often the need to merge two data.frame objects (say, one with individual samples and the other with population coordinates). The merge() function is pretty awesome, though it may take a little getting used to. Here are some things to remember:

You need to have two data.frame objects to merge.

The first one in the function call is the one merged onto; the columns of the second one are added to those of the first.

Each will need a column to use as an index, that is, a column whose values are used to match rows of data. If both data.frame objects use the same column name for it, the function will find it automagically (by default it matches on every column name the two have in common). If no common names are found in the names() of the two data.frame objects, you can specify the columns using the optional by.x= and by.y= function arguments.
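A minimal sketch of the points above, using two small made-up data.frame objects. The names samples, coords, and coords2 are hypothetical, and the values are invented for illustration rather than taken from the popgraph data.

```r
# Hypothetical toy data: individual samples and population coordinates.
samples <- data.frame(
  ID         = c("s1", "s2", "s3", "s4"),
  Population = c("A", "A", "B", "C"),
  stringsAsFactors = FALSE
)
coords <- data.frame(
  Population = c("A", "B", "C"),
  Latitude   = c(24, 27, 30),
  Longitude  = c(-110, -112, -114),
  stringsAsFactors = FALSE
)

# By default merge() matches on the column names the two share.
print(intersect(names(samples), names(coords)))   # [1] "Population"
merged <- merge(samples, coords)
print(merged)

# The same coordinates, but with the key column named differently.
coords2 <- coords
names(coords2)[names(coords2) == "Population"] <- "Pop"

# No shared name now, so tell merge() which column to use in each table.
merged2 <- merge(samples, coords2, by.x = "Population", by.y = "Pop")
print(merged2)
```

In the result, the key column comes first, followed by the remaining columns of the first data.frame and then those of the second. With by.x/by.y, the key column keeps the name used in the first data.frame.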

Here is an example. I’m going to load in some data from the popgraph library. First, I’ll load up the library and then grab the population metadata for the Lophocereus data set we analyzed in Dyer & Nason (2004).

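require(popgraph)   # provides the baja and lopho data sets
data(baja)          # population metadata: Region, Population, Latitude, Longitude
summary(baja)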

    Region     Population    Latitude       Longitude
 Baja  :16   BaC    : 1   Min.   :22.93   Min.   :-114.7
 Sonora:13   Cabo   : 1   1st Qu.:24.45   1st Qu.:-112.6
             CP     : 1   Median :27.95   Median :-111.8
             Ctv    : 1   Mean   :27.33   Mean   :-111.8
             ELR    : 1   3rd Qu.:29.59   3rd Qu.:-110.7
             IC     : 1   Max.   :31.95   Max.   :-109.5
             (Other):23

The graph itself has populations as its nodes, and perhaps we are interested in plotting node size as a function of spatial location. We can grab the node names and sizes from the popgraph object (it is a kind of igraph object) by:

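library(igraph)   # V() comes from igraph; a popgraph is a kind of igraph object
data(lopho)
# One row per node: its population name and its size attribute.
# stringsAsFactors = TRUE reproduces the factor summary below (R >= 4.0 defaults to FALSE).
df.nodes <- data.frame(Population = V(lopho)$name, Size = V(lopho)$size,
                       stringsAsFactors = TRUE)
summary(df.nodes)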

   Population      Size
 BaC    : 1   Min.   : 2.500
 CP     : 1   1st Qu.: 5.291
 Ctv    : 1   Median : 6.860
 LaV    : 1   Mean   : 7.770
 LF     : 1   3rd Qu.:10.925
 Lig    : 1   Max.   :16.001
 (Other):15
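The two summaries suggest the tables are not the same length (baja lists 16 + 13 = 29 populations, while the graph has 21 nodes), and by default merge() keeps only rows whose key appears in both. Here is a small sketch of checking the keys before merging, and of the all.x = TRUE argument, which keeps every row of the first data.frame and fills the unmatched columns with NA. The tables nodes and sites are hypothetical toy data, not the popgraph objects.

```r
# Hypothetical toy tables: graph nodes and site coordinates.
nodes <- data.frame(
  Population = c("A", "B", "D"),
  Size       = c(3, 5, 8),
  stringsAsFactors = FALSE
)
sites <- data.frame(
  Population = c("A", "B", "C"),
  Latitude   = c(24, 27, 30),
  stringsAsFactors = FALSE
)

# Node names with no matching row in the coordinate table.
missing_coords <- setdiff(nodes$Population, sites$Population)
print(missing_coords)   # [1] "D"

# Default (all = FALSE): node "D" and site "C" are both dropped.
inner <- merge(nodes, sites)
print(inner)

# all.x = TRUE: every node is kept; "D" gets NA for Latitude.
keep_nodes <- merge(nodes, sites, all.x = TRUE)
print(keep_nodes)
```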

Now we have baja and df.nodes as two data.frame objects and can merge them by their common column, Population. If we merge df.nodes onto baja, then we get the new data.frame: