
G-test of Independence
  Examples in Summary and Analysis of Extension Program Evaluation
  Packages used in this chapter
  When to use it
  G-test example with functions in DescTools and RVAideMemoire
  Post-hoc tests
  Post-hoc pairwise G-tests with RVAideMemoire
  Post-hoc pairwise G-tests with pairwise.table
  Examples
  G-tests with DescTools and RVAideMemoire
  How to do the test
  G-test of independence with data as a data frame

Analysis of Covariance
  How to do the test
  Analysis of covariance example with two categories and type II sum of squares
  Analysis of covariance example with three categories and type II sum of squares

Miscellany
  Chapters Not Covered in this Book

Other Analyses
  Contrasts in Linear Models
  Contrasts within linear models
  Example for single degree-of-freedom contrasts
  Example with lsmeans
  Example with multcomp
  Example for global F-test within a group of treatments
  Tests of contrasts with lsmeans
  Tests of contrasts with multcomp
  Tests of contrasts within aov

Purpose of This Book

This book is intended to be a supplement for The Handbook of Biological Statistics by John H. McDonald. It provides code for the R statistical language for some of the examples given in the Handbook. It does not describe the uses of, explanations for, or cautions pertaining to the analyses. For that information, you should consult the Handbook before using the analyses presented here.

The Handbook for Biological Statistics

This Companion follows the .pdf version of the third edition of the Handbook of Biological Statistics.

The Handbook provides clear explanations and examples of some of the most common statistical tests used in the analysis of experiments. While the examples are taken from biology, the analyses are applicable to a variety of fields.

The Handbook provides examples primarily with the SAS statistical package, and with online calculators or spreadsheets for some analyses. Since SAS is a commercial package that students or researchers may not have access to, this Companion aims to extend the applicability of the Handbook by providing the examples in R, which is a free statistical package.

The .pdf version of the third edition is available at www.biostathandbook.com/HandbookBioStatThird.pdf.

Also, the Handbook can be accessed without cost at www.biostathandbook.com/. However, the reader should be aware that the online version may have been updated since the third edition of the book.

Or, a printed copy can be purchased from http://www.lulu.com/shop/johnmcdonald/handbook-of-biological-statistics/paperback/product-22063985.html.

About the Author of this Companion

I have tried in this book to give the reader examples that are both as simple as possible, and that show some of the options available for the analysis. My goal for most examples is to make things comprehensible for the user without extensive R experience. The reader should realize that these goals may be partially frustrated either by the peculiarities in the R language or by the complexity required for the example.


ABOUT R

AN R COMPANION FOR THE HANDBOOK OF BIOLOGICAL STATISTICS

I am neither a statistician nor an R programmer, so all advice and code in the book come without guarantee. I’m happy to accept suggestions or corrections. Send correspondence to mangiafico@njaes.rutgers.edu.

About R

R is a free, open source, and cross-platform programming language that is well suited for statistical analyses. This means you can download R to your Windows, Mac OS, or Linux computer for free. It also means that, in theory, you can look at the code behind any of the analyses it performs to better understand the process, or to modify the code for your own purposes.

R is being used more and more in educational, academic, and commercial settings. A few advantages of working with R as a student, teacher, or researcher include:

• R functions return limited output. This helps prevent students from sorting through a lot of output they may not understand, and in essence requires the user to know what output they’re asking R to produce.
• Since all functions are open source, the user has access to see how pre-defined functions are written.
• There are powerful packages written for specific types of analyses.
• There are lots of free resources available online.
• It can also be used online without installing software.

For a brief summary of some of the advantages of R from the perspective of a graduate student, see https://thetarzan.wordpress.com/2011/07/15/why-use-r-a-grad-students-2-cents/.

It is also worth mentioning a few drawbacks with using R. New users are likely to find the code difficult to understand. Also, I think that while there is a plethora of examples for various analyses available online, it may be difficult for a beginner to adapt these examples to her own data. One goal of this book is to help alleviate these difficulties for beginners. I have some further thoughts below on avoiding pitfalls in R.

Obtaining R

Standard installation

To download and install R, visit cran.r-project.org/. There you will find links for installation on Linux, Mac OS, and Windows operating systems.

A FEW NOTES TO GET STARTED WITH R


R Studio

I also recommend using R Studio. This software is an environment for R that makes it easier to see code, output, datasets, plots, and help files together on one screen. It is available at www.rstudio.com/products/rstudio/. It is also possible to install R Studio as a portable application.

Portable application

R can be installed as a portable application. This is useful in cases where you don’t want to install R on a computer, but wish to run it from a portable drive. See portableapps.com/node/32898 or sourceforge.net/projects/rportable/. My portable installation of R with a handful of added packages is about 250 MB. The version of R Studio I have is about 400 MB. So, 1 GB of space on a USB drive is probably sufficient for the software along with additional installed packages and projects.

R Online: R Fiddle

It is also possible to access R online, without needing to install software. One example of this is R Fiddle: www.r-fiddle.org/. R Fiddle also works with common add-on packages, though I have had it refuse to use a couple of less common ones.

A Few Notes to Get Started with R

Packages used in this chapter

The following commands will install these packages if they are not already installed:

if(!require(dplyr)){install.packages("dplyr")}
if(!require(psych)){install.packages("psych")}

A cookbook approach

The examples in this book follow a “cookbook” approach as much as possible. The reader should be able to modify the examples with her own data, and change the options and variable names as needed. This is more obvious with some examples than others, depending on the complexity of the code.

Color coding in this book

The text in blue in this book is R code that can be copied, pasted, and run in R. The text in red is the expected result, and should not be run. In most cases I have truncated the results and included only the most relevant parts. Comments are in green. It is fine to run comments, but they have no effect on the results.

Copying and pasting code


From the website

Copying the R code pieces from the website version of this book should work flawlessly. Code can be copied from the webpages and pasted into the R console, the R Studio console, the R Studio editor, or a plain text file. All line breaks and formatting spaces should be preserved.

The only issue you may encounter is that if you paste code into the R Studio editor, leading spaces may be added to some lines. This is not usually a problem, but a way to avoid this is to paste the code into a plain text editor, save that file as a .R file, and open it from R Studio.

From the pdf

Copying the R code from the pdf version of this book may work less perfectly. Formatting spaces and even line breaks may be lost. Different pdf readers may behave differently.

It may help to paste the copied code into a plain text editor to clean it up before pasting into R or saving it as a .R file. Also, if your pdf reader has a select tool that allows you to select text in a rectangle, that works better in some readers.

A sample program

The following is an example of code for R that creates a vector called x and a vector called y, performs a correlation test between x and y, and then plots y vs. x.

This code can be copied and pasted into the console area of R or R Studio, or into the editor area of R Studio or R Fiddle and run. You should get the output from the correlation test and the graphical output of the plot.

x = c(1,2,3,4,5,6,7,8,9)      # create a vector of values and call it x
y = c(9,7,8,6,7,5,4,3,1)

cor.test(x,y)                 # perform correlation test

plot(x,y)                     # plot y vs. x

You can run fairly large chunks of code with R, though it is probably better to run smaller pieces, examining the output before proceeding to the next piece.

This kind of code can be saved as a file in the editor section of R Studio, or can be stored separately as a plain text file. By convention, files for R code are saved as .R files. These files can be opened and edited with either a plain text editor or with the R Studio editor.

Assignment operators

In my examples I will use an equal sign, =, to assign a value to a variable.

height = 127.5

In examples you find elsewhere, you will more likely see a left arrow, <-, as the assignment operator.

height <- 127.5


These are essentially equivalent, but I think the equal sign is more readable for a beginner.
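As a quick check of that equivalence, the following sketch assigns the same value with each operator and compares the results. The names height1 and height2 are chosen here only for illustration.

```r
height1 = 127.5      # assignment with the equal sign
height2 <- 127.5     # assignment with the left arrow

identical(height1, height2)    # compare the two stored values
```

One difference worth knowing: inside a function call, = names an argument rather than creating a variable, so the two operators are interchangeable only for ordinary assignment like that shown here.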

Comments

Comments are indicated with a number sign, #. Comments are for human readers, and are not processed by R.

Installing and loading packages

Some of the packages used in this book do not come with R automatically, but need to be installed as add-on packages. For example, if you wanted to use a function in the psych package to calculate the geometric mean of x in the sample program above:

x = c(1,2,3,4,5,6,7,8,9)

First you would need to install the package psych:

install.packages("psych")

Then load the package:

library(psych)

You may then use the functions included in the package:

geometric.mean(x)

[1] 4.147166

In future sessions, you will need only to load the package; it should still be in the library from the initial installation.

If you see an error like the following, you may have misspelled the name of the package, or the package has not been installed.

library(psych)

Error in library(psych) : there is no package called ‘psych’
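To find out ahead of time whether a package is installed, rather than waiting for the error above, something like the following sketch may help. The function name is_installed is hypothetical and not part of R; it wraps the base R function requireNamespace, which reports whether a package's namespace can be loaded without attaching the package.

```r
# Hypothetical helper (not part of R): TRUE if the package can be loaded
is_installed = function(pkg) {
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

is_installed("stats")                      # stats ships with R
is_installed("notARealPackageName123")     # a made-up name, assumed not to exist
```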

Data types

There are several data types in R. Most commonly, the functions we are using will ask for input data to be a vector, a matrix, or a data frame. Data types won’t be discussed extensively here, but the examples in this book will read the data as the appropriate data type for the selected analysis.
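As a rough illustration of these three types, the sketch below builds one small object of each kind and asks R what it is. The object names v, m, and d and the values in them are made up for this example.

```r
v = c(175, 176, 162, 165)                      # a numeric vector
m = matrix(c(10, 20, 30, 40), nrow = 2)        # a 2 x 2 matrix
d = data.frame(Sex    = c("male", "female"),   # a data frame with
               Height = c(175, 162))           #   two columns

is.vector(v)        # TRUE
is.matrix(m)        # TRUE
is.data.frame(d)    # TRUE
is.matrix(d)        # FALSE: a data frame is not a matrix
```

The last line is the kind of mismatch that can cause trouble when a function expects a matrix and is handed a data frame.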

Creating data frames from a text string of data

For certain analyses you will want to select a variable from within a data frame. In most examples using data frames, I’ll create the data frame from a text string that allows us to arrange the data in columns and rows, as we normally visualize data.


Here, Input is just a text string that will be converted to a data frame with the read.table function. Note that the text for the table is enclosed in simple double quotes and parentheses.

read.table is pretty tolerant of extra spaces or blank lines. But when we convert a data frame to a matrix with as.matrix, which we will do later, I’ve had errors from trailing spaces at the ends of lines.

Values in the table that will have spaces or special characters can be enclosed in simple single quotes (e.g. 'Spongebob & Patrick').

Input =("
Sex     Height
male    175
male    176
female  162
female  165
")

D1 = read.table(textConnection(Input),header=TRUE)

D1

     Sex Height
1   male    175
2   male    176
3 female    162
4 female    165
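The single-quote rule can be tried with a small sketch like the one below. The column names and values are invented for illustration, and the data frame name D3 is arbitrary. It uses the text= argument of read.table, which is an alternative to wrapping the string in textConnection.

```r
# Made-up example data; the first value contains spaces and an ampersand
Input = ("
Character              Height
'Spongebob & Patrick'  12
Squidward              15
")

D3 = read.table(text = Input, header = TRUE)

D3

nrow(D3)    # the quoted value is read as one field, so there are 2 rows
```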

Reading data from a file

R can also read data from a separate file. For longer data sets or complex analyses, it is helpful to keep data files and R code files separate. For example,

D2 = read.table("male-female.dat", header=TRUE)

would read in data from a file called male-female.dat found in the working directory. In this case the file could be a space-delimited text file:

Sex
male
male
female
female

R Studio also has an easy interface in the Tools menu to import data from a file.

The getwd function will show the location of the working directory, and setwd can be used to set the working directory.

getwd()

[1] "C:/Users/Salvatore/Documents"

setwd("C:/Users/Salvatore/Desktop")

Alternatively, file paths or URLs can be designated directly in the read.table function.
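A minimal sketch of reading from a full path follows. So that it can run anywhere, it first writes a small file to R's temporary directory; the file contents are made up, and in practice the path would point to your own data file.

```r
# Build a full path in R's temporary directory (location chosen by R)
path = file.path(tempdir(), "male-female.dat")

# Write a small made-up data file to that path
writeLines(c("Sex Height",
             "male 175",
             "female 162"),
           path)

# Read it back by full path, without relying on the working directory
D2 = read.table(path, header = TRUE)

D2
```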

Variables within data frames

For the data frame D1 created above, to look at just the variable Sex in this data frame:

D1$ Sex           # Note: the space is optional

[1] male   male   female female
Levels: female male

Note that D1$Height is a vector of numbers.

D1$ Height

[1] 175 176 162 165

So if you wanted the mean for this variable:

mean(D1$ Height)

[1] 169.5
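The $ syntax is not the only way to reach a variable. The sketch below, which rebuilds the same data with data.frame so that it stands alone, shows the double-bracket form and uses the base R function tapply to get the mean of Height for each value of Sex.

```r
D1 = data.frame(Sex    = c("male", "male", "female", "female"),
                Height = c(175, 176, 162, 165))

D1[["Height"]]                     # same vector as D1$Height

mean(D1[["Height"]])               # 169.5, as above

tapply(D1$Height, D1$Sex, mean)    # mean Height within each group of Sex
```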


Using dplyr to create new variables in data frames

The standard method to define new variables in data frames is to use the data.frame$variable syntax. So if we wanted to add a variable to the D1 data frame above which would double Height:

D1$ Double = D1$ Height * 2       # Spaces are optional

D1

     Sex Height Double
1   male    175    350
2   male    176    352
3 female    162    324
4 female    165    330

Another method is to use the mutate function in the dplyr package:

library(dplyr)

D1 =
mutate(D1,
       Triple = Height*3,
       Quadruple = Height*4)

D1

     Sex Height Double Triple Quadruple
1   male    175    350    525       700
2   male    176    352    528       704
3 female    162    324    486       648
4 female    165    330    495       660

The dplyr package also has functions to select only certain columns in a data frame (select function) or to filter a data frame by the value of some variable (filter function). It can be helpful for manipulating data frames.

In the examples in this book, I will use either the $ syntax or the mutate function in dplyr, depending on which I think makes the example more comprehensible.
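A short sketch of select and filter, assuming the dplyr package is installed. The data frame is rebuilt here so the example stands alone, and the names Tall and HeightOnly, along with the cutoff of 170, are chosen only for illustration.

```r
library(dplyr)

D1 = data.frame(Sex    = c("male", "male", "female", "female"),
                Height = c(175, 176, 162, 165))

Tall = filter(D1, Height > 170)    # keep only rows where Height exceeds 170
Tall

HeightOnly = select(D1, Height)    # keep only the Height column
HeightOnly
```

Note that the stats package that comes with R also has a function called filter, so dplyr needs to be loaded for the call above to behave as described.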

Extracting elements from the output of a function

Sometimes it is useful to extract certain elements from the output of an analysis. For example, we can assign the output from a binomial test to a variable we’ll call Test.

Test = binom.test(7, 12, 3/4,
                  alternative="less",
                  conf.level=0.95)

To view the upper confidence limit from Test:

Test$ conf.int[2]

[1] 0.8189752
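To see which elements are available for extraction, the names function can be applied to the stored result. The sketch below repeats the binomial test so that it runs on its own, then pulls out two other elements by name.

```r
Test = binom.test(7, 12, 3/4,
                  alternative = "less",
                  conf.level  = 0.95)

names(Test)       # lists the elements stored in the result

Test$p.value      # p-value of the test
Test$estimate     # observed proportion of successes, 7/12
```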

Exporting graphics

R has the ability to produce a variety of plots. Simple plots can be produced with just a few lines of code. These are useful to get a quick visualization of your data or to check on the distribution of residuals from an analysis. More in-depth coding can produce publication-quality plots.

In the R Studio Plots window, there is an Export icon which can be used to save the plot as an image or pdf file. A method I use is to export the plot as pdf and then open this pdf with either Adobe Photoshop or the free alternative, GIMP (www.gimp.org/). These programs allow you to import the pdf at whatever resolution you need, and then crop out extra white space.

AVOIDING PITFALLS IN R


The appearance of exported plots will change depending on the size and scale of the exported file. If there are elements missing from a plot, it may be because the size is not ideal. Changing the export size is also an easy way to adjust the size of the text of a plot relative to the other elements.

An additional trick in R Studio is to change the size of the plot window after the plot is produced, but before it is exported. Sometimes this can get rid of problems where, for example, words in a plot legend are cut off.

Finally, if you export a plot as a pdf, but still need to edit it further, you can open it in Inkscape, ungroup the plot elements, adjust some plot elements, and then export as a high-resolution bitmap image. Just be sure you don’t change anything important, like how the data line up with the axes.
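Plots can also be exported from code rather than from the Export icon, which makes the size explicit and repeatable. The following is a minimal sketch using the pdf device that comes with R. The file name and the 5 by 4 inch size are arbitrary choices for illustration, and the file is written to R's temporary directory so that nothing is added to the working directory.

```r
x = c(1,2,3,4,5,6,7,8,9)
y = c(9,7,8,6,7,5,4,3,1)

# Arbitrary file name, placed in R's temporary directory
plotfile = file.path(tempdir(), "xy-plot.pdf")

pdf(plotfile, width = 5, height = 4)   # open a pdf device; size is in inches
plot(x, y)                             # draw the plot into the file
dev.off()                              # close the device to finish writing

file.exists(plotfile)                  # check that the file was created
```

Changing width and height here has the same kind of effect on relative text size as changing the export size in R Studio.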

Avoiding Pitfalls in R

Grammar, spelling, and capitalization count

Probably the most common problems in programming in any language are syntax errors, for example, forgetting a comma or misspelling the name of a variable or function.

Be sure to include quotes around names requiring them; also be sure to use straight quotes ( " ) and not the smart quotes that some word processors use automatically. It is helpful to write your R code in a plain text editor or in the editor window in R Studio.

Data types in functions

Probably the biggest cause of problems I had when I first started working with R was trying to feed functions the wrong data type. For example, if a function asks for the data as a matrix, and you give it a data frame, it won’t work.

A more subtle error I’ve encountered is when a function is expecting a variable to be a factor vector, and it’s really a character (“chr”) vector.

For instance, if we create a variable in the global environment with the same values as Sex and call it Gender, it will be a character vector.

Gender = c("male", "male", "female", "female")

str(Gender)       # What is the structure of this variable?

chr [1:4] "male" "male" "female" "female"

In the data frame, by contrast, Sex was read in as a factor vector. This was the default behavior of read.table in versions of R before 4.0.0; in R 4.0.0 and later, text columns are read in as character vectors unless stringsAsFactors=TRUE is specified.

str(D1$ Sex)

Factor w/ 2 levels "female","male": 2 2 1 1


HELP WITH R


One of the nice things about using R Studio is that it allows you to look at the structure of data frames and other objects in the Environment window.

Data types can be converted from one data type to another, but it may not be obvious how to do some conversions. Functions to convert data types include as.factor, as.numeric, and as.character.
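The sketch below shows two such conversions. The first turns the character vector Gender into a factor. The second illustrates a conversion that is not obvious: applying as.numeric directly to a factor returns its internal level codes, so the factor is converted with as.character first to recover the original numbers. The names GenderFactor and Codes and the values in Codes are made up for this example.

```r
Gender = c("male", "male", "female", "female")

GenderFactor = as.factor(Gender)      # character vector to factor
levels(GenderFactor)                  # "female" "male"

Codes = as.factor(c("10", "20", "10"))

as.numeric(Codes)                     # 1 2 1: the internal level codes
as.numeric(as.character(Codes))       # 10 20 10: the original values
```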

Style

There isn’t an established style for programming in R in many respects, such as whether variable names should be capitalized. But there is a Google R Users Style Guide, for those who are interested (google.github.io/styleguide/Rguide.xml). I don’t necessarily agree with all the recommendations there, and in practice people use different style conventions.

Help with R

It’s always a good idea to check the help information for a function before using it. Don’t necessarily assume a function will perform a test as you think it will. The help information will give the options available for that function, and often those options make a difference with how the test is carried out.

Help in R

In order to see the help file for the chisq.test function:

?chisq.test

In order to specify the chisq.test function in the stats package, you would use:

?stats::chisq.test

or

help(chisq.test, package=stats)

In order to search all installed packages for a term:

??"chi-square"

In order to view the help for a package:

help(package=psych)
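Two other base R functions can give a quick look at a function without opening its help page. In the sketch below, args prints the arguments of chisq.test with their default values, and apropos lists the names of available objects that contain a given string. For chisq.test, the output of args shows the option correct = TRUE, which is an example of a default that affects how the test is carried out.

```r
args(chisq.test)      # argument names and their default values

apropos("chisq")      # names of available objects containing "chisq"
```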


R TUTORIALS


CRAN documentation

Documentation for packages is also available in .pdf format, which may be more convenient than using the help within R. Also very helpful, some packages include vignettes, which describe how a package might be used.

For a list of available packages, visit cran.r-project.org/web/packages/available_packages_by_name.html.

Clicking on the link for the psych package, for example, will bring up a page with a link for the .pdf documentation, two .pdf vignettes, and other information.

Summary and Analysis of Extension Education Program Evaluation in R

Most of the analyses in this book are also presented in Summary and Analysis of Extension Education Program Evaluation in R (SAEEPER). It may be useful for the reader to consult that book for additional examples and discussion.

Other online resources

Since there are many good resources for R online, an internet search for your question or analysis including the term “r” will often lead to a solution. The reader is cautioned, however, to always check the original R documentation on functions to be sure it will perform an analysis as the user desires.

A convenient tool is the RSiteSearch function, which will open a browser window and search for a term in functions and vignettes across a variety of sources:

RSiteSearch("chi-square test")

This tool can also be accessed from: http://search.r-project.org/nmz.html.

R Tutorials

The descriptions of importing and manipulating data and results in this section of this book don’t even scratch the surface of what is possible with R. Going further than this very brief introduction, however, is beyond the scope of this book. I have tried to provide only enough information so that the reader unfamiliar with R will find the examples in the rest of the book comprehensible.

Luckily, there are many resources available for users wishing to better understand how to program in R, manipulate data, and perform more varied statistical analyses.

One free online resource I’ve found helpful is Quick-R (www.statmethods.net/).

CRAN hosts a collection of R manuals (cran.r-project.org/manuals.html). One that might be helpful is An Introduction to R by Venables.

FORMAL STATISTICS BOOKS


CRAN also hosts a collection of contributed documentation (cran.r-project.org/other-docs.html), in several languages, which may prove helpful.

If readers wish to purchase a more comprehensive and well-written textbook, The R Book by Michael Crawley is one option.

Formal Statistics Books

When describing a particular statistical analysis, especially one that your readers may not be familiar with, it’s a good idea to cite an authoritative statistical source. A few that may be useful for this purpose:

• Biostatistical Analysis by Jerrold Zar
• Introduction to Biostatistics by Sokal and Rohlf
• Categorical Data Analysis by Alan Agresti
• Mixed-Effects Models in S and S-Plus by José Pinheiro and Douglas Bates