3. Organizing data

4 points possible

2pts - fix data

2pts - describe problems

The original data file had duplicated column names (length and diameter) and relied on another column to identify the anatomical element. This is a no-no. Each column name must stand alone (e.g. tibia_diameter).

The rows containing species names violate the 1 row = 1 observation principle. These rows should be turned into a column that records the species for each observation.
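
One possible way to make both fixes in R is sketched below. The toy data frame, its column layout, and the specimen labels are invented for illustration and are not the actual homework file; the sketch only assumes that species header rows can be recognized because they carry no measurements.

```r
# Hypothetical messy layout (NOT the real file): species names sit in their
# own rows, and 'length'/'diameter' appear twice, once per element.
messy <- data.frame(
  specimen = c("Gorilla gorilla", "A1", "A2", "Pan troglodytes", "B1"),
  element  = c(NA, "tibia", "tibia", NA, "tibia"),
  length   = c(NA, 301, 295, NA, 254),
  diameter = c(NA, 32, 30, NA, 25),
  element  = c(NA, "femur", "femur", NA, "femur"),
  length   = c(NA, 352, 348, NA, 298),
  diameter = c(NA, 39, 38, NA, 29),
  check.names = FALSE,        # keep the duplicated names, as in the messy file
  stringsAsFactors = FALSE
)

# Fix 1: give every column a name that stands alone
names(messy) <- c("specimen", "element1", "tibia_length", "tibia_diameter",
                  "element2", "femur_length", "femur_diameter")

# Fix 2: turn the species header rows into a species column.
# Assumption: a header row is any row with no tibia_length value.
is_header <- is.na(messy$tibia_length)
# cumsum() counts headers seen so far, so each row picks up the latest species
messy$species <- messy$specimen[is_header][cumsum(is_header)]

# Drop the header rows and the now-redundant element columns
tidy <- messy[!is_header, c("species", "specimen",
                            "tibia_length", "tibia_diameter",
                            "femur_length", "femur_diameter")]
rownames(tidy) <- NULL
tidy
```

After this, each row of tidy is one specimen, and no column depends on another column to explain what it measures.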

4. Manipulation

7 pts possible

1pt gorilla

2pts sex

1pt table

1pt hist

1pt small

1pt big

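#read the gorilla data (the first line of the file holds the column names)
gorilla <- read.table("http://hompal-stats.wabarr.com/datasets/gorilla_sizes.txt", header=TRUE)
#figure out sex
#make a new column for sex, with one NA per row (59 in this dataset) as a placeholder
gorilla$sex <- rep(NA, nrow(gorilla))
#replace the NA values with 'Male' or 'Female' with the help of grep()
#grep() returns the row numbers whose specimen ID contains the pattern
gorilla$sex[grep("m", gorilla$specimen)] <- "Male"
gorilla$sex[grep("f", gorilla$specimen)] <- "Female"
#count how many specimens of each sex there are
table(gorilla$sex)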
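
A slightly more defensive variant of the same idea is sketched below on a toy data frame. The specimen IDs here are hypothetical, and the "_m_" / "_f_" patterns are an assumption about how sex might be encoded; the real file's naming scheme may differ. The point is that a more specific pattern avoids matching a stray "m" or "f" elsewhere in an ID, and asking table() to show NA values makes unassigned specimens visible.

```r
# Hypothetical specimen IDs (NOT taken from gorilla_sizes.txt)
toy <- data.frame(
  specimen = c("gor_m_01", "gor_f_02", "gor_f_03", "gor_x_04"),
  stringsAsFactors = FALSE
)

# Start with NA for every row; a single NA is recycled to the full length
toy$sex <- NA_character_

# grepl() returns TRUE/FALSE per row; fixed = TRUE treats the pattern literally
toy$sex[grepl("_m_", toy$specimen, fixed = TRUE)] <- "Male"
toy$sex[grepl("_f_", toy$specimen, fixed = TRUE)] <- "Female"

# useNA = "ifany" adds an NA column when some specimen matched neither pattern
table(toy$sex, useNA = "ifany")
```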