Introduction to biostatistics

About

An introductory course on data handling and biostatistics for students studying towards a Bachelor of Health Sciences (BHSc) Honours in Physiology at the University of the Witwatersrand, South Africa. The course is based around the statistical programming language R.

The course aims to introduce participants to the basics of data wrangling, plotting, and reproducible data analysis and reporting. These aims are explored using the statistical computing language R in the RStudio integrated development environment (IDE), and git (with the GitHub web-based git repository hosting service) for version control. I chose these tools because they are free (as in beer and as in speech) and have well-established, active user and developer communities. You need a basic working knowledge of the command line, R, and git to complete the course, so if you are not familiar with these tools, I suggest that you complete some free online courses before starting (see examples below).

Course assessment

The year mark for the course will constitute 40% of the total course mark, and will be assessed by a series of 6 short assignments, each worth 10 marks. The biostatistics examination will constitute the remaining 60% of the total mark for the course. Assignments must be submitted by 23:59 on the due date. No extensions will be granted, and 10% will be deducted from the assignment mark for each day the assignment is late.

The table below provides a link to the assignments and indicates the due date for each assignment.

Tutorials

These tutorials do not count for course credit, but give you a chance to get hands-on experience applying what you learn in the lectures. The tutorials will take place with the course instructor in the computer laboratory immediately after the relevant lecture has finished. You may work alone or in groups. You may also work through the tutorials in your own time.

The majority of the tutorials are deployed through the R package swirl. The swirl package was developed by the Swirl Development Team, and includes a suite of step-by-step interactive training courses on R, which are aimed primarily at the novice and intermediate R user.

Follow the instructions below to access swirl courses:

# Re-type or copy and paste the text below into the R console,
# pressing 'Enter' after each step.
# If you haven't already installed swirl
install.packages('swirl')
# Load the 'swirl' package
library(swirl)
# Launch a 'swirl' session and follow the prompts
swirl()

To install swirl courses:

# Re-type or copy and paste the text below into the R console,
# pressing 'Enter' after each step.
# Load the 'swirl' package
library(swirl)
# Download a course from the 'swirl' GitHub repository
install_from_swirl('Course Name Here')
# Launch a 'swirl' session and follow the prompts
swirl()

Resources

Visualizing statistics

I strongly recommend that all students explore the interactive plots at Seeing Theory, a project designed and created by Daniel Kunin with support from Brown University’s Royce Fellowship Program and National Science Foundation group STATS4STEM. The goal of the project is to make statistics more accessible to a wider range of students through interactive visualizations.

Once you have downloaded and installed R and RStudio, I recommend that you install the following R packages (you may need others during the course, but the suggested packages will get you through all activities in the course):

Offline installation of recommended packages and swirl courses

If you are working behind a corporate proxy, you may experience problems installing packages from the CRAN servers. To help you get the packages required for this course, I have written a package that installs the packages and swirl tutorials from a local source.

The package is called biostatSetup. To reduce the package size (it’s essentially a mini CRAN repository), I have created three versions, one for each of the major operating systems: biostatSetupSrc for Linux, biostatSetupMacOS for Mac, and biostatSetupWindows for Windows.

Please note that the package was developed for R v3.3. If you have an older version of R, please upgrade before installing the package.

git

Miscellaneous

Configuring git

Global configuration

You need to configure git after you install it. If you are going to be the only one using the computer, then open Terminal (OSX and Linux) or Git Bash (Windows) and enter the following text (substituting your username and email address as required):
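A minimal sketch of the global configuration, assuming the standard git settings user.name and user.email. The name and email address shown are placeholders, not values from the course, so replace them with your own:

```shell
# Placeholder values: replace with your own name and email address
git config --global user.name "Your Name"
git config --global user.email "your.name@example.com"

# Check the values that git has stored
git config --global --get user.name
git config --global --get user.email
```

The last two commands only read the settings back, so you can confirm that they were saved as intended.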

If you configure your computer using the --global flag, you only have to enter this information once. Thereafter, git will assume that all commands are being entered by you. As you may expect, configuring your user details with the --global flag is therefore not a good idea if the computer you use has multiple users working, for example, through a ‘Guest Account’. In that situation, set the user configuration individually for each directory (project) you initiate, as follows:

Open Terminal (OSX and Linux) or Git Bash (Windows) and navigate to the directory you want to initiate as a repository;

Enter the following text (substituting your username and email address as required):
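A minimal sketch of the per-repository configuration, assuming the same user.name and user.email settings without the --global flag. The directory name my-project, the name, and the email address are hypothetical placeholders:

```shell
# Hypothetical project directory: substitute the path to your own project
mkdir -p my-project
cd my-project

# Initiate the directory as a git repository
git init

# Without --global, the details are stored for this repository only
git config user.name "Your Name"
git config user.email "your.name@example.com"

# Check the values stored for this repository
git config --local --get user.name
git config --local --get user.email
```

Because these settings live in the repository's own configuration, they need to be repeated for each new project you initiate on a shared computer.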