Lesson 2: Basic Plotting

by David Beskow

In this lesson, we will continue to use the birth data in order to learn some basic plotting skills. Let’s start by reading the birth data into our Global Environment:

births<-read.csv("births.csv",as.is=TRUE)

Line Plot

Let’s start by learning how to make a line plot. An informative line plot for this data is the average APGAR score for each gestation period. Intuition suggests that premature babies have lower APGAR scores. Let’s see if this is correct. To check, we first need to calculate the mean APGAR score for each gestation period. We use the aggregate command to do this (aggregate is similar to a pivot table in Excel):

apgar<-aggregate(APGAR5~ESTGEST,data=births,mean)
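To see what aggregate returns, here is a minimal sketch on a small made-up data frame. The data frame toy and its values are invented for illustration and are not taken from births.csv; only the column names match the lesson. The formula APGAR5 ~ ESTGEST can be read as "APGAR5, grouped by ESTGEST".

```r
# Toy data, invented for illustration only (not from births.csv)
toy <- data.frame(
  ESTGEST = c(38, 38, 39, 39, 40),
  APGAR5  = c(8, 9, 9, 9, 10)
)

# One row per distinct ESTGEST value, with the mean APGAR5 for that group
toy_means <- aggregate(APGAR5 ~ ESTGEST, data = toy, FUN = mean)
print(toy_means)
```

The result is itself a data frame with one row per gestation value, which is why its columns can be passed straight to plot.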

Line charts are created with the function plot(x, y, type=) where x and y are numeric vectors of (x,y) points to connect. type= can take the following values:

type    description
p       points
l       lines
o       overplotted points and lines
b, c    points (empty if “c”) joined by lines
s, S    stair steps
h       histogram-like vertical lines
n       does not produce any points or lines
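One way to get a feel for these values is to draw the same data once per type. The following is a rough sketch using made-up vectors x and y (they are assumptions for illustration, not part of the birth data):

```r
# Toy data, invented for illustration only
x <- 1:10
y <- c(2, 4, 3, 6, 5, 8, 7, 9, 8, 10)

# Draw the same data once per type value in a 3 x 3 grid of panels
types <- c("p", "l", "o", "b", "c", "s", "S", "h", "n")
old_par <- par(mfrow = c(3, 3), mar = c(2, 2, 2, 1))
for (t in types) {
  plot(x, y, type = t, main = paste0('type = "', t, '"'))
}
par(old_par)  # restore the previous layout settings
```

The panel for type "n" shows only the axes, which is useful when you want to set up an empty graph and add to it afterwards.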

The points( ) function adds information to an existing graph. It cannot produce a graph on its own. Usually it follows a plot(x, y) command that produces a graph.
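As a minimal sketch of this plot-then-add pattern, the example below draws one series with plot and then overlays a second with points. The vectors weeks, group1, and group2 are invented for illustration and do not come from the birth data.

```r
# Toy data, invented for illustration only
weeks  <- 36:41
group1 <- c(8.1, 8.4, 8.7, 8.9, 9.0, 9.0)
group2 <- c(7.8, 8.0, 8.5, 8.6, 8.8, 8.9)

# plot() opens the graph; ylim is set so both series fit
plot(weeks, group1, type = "l", ylim = range(group1, group2),
     xlab = "Gestation (weeks)", ylab = "Mean score")

# points() draws onto the graph that plot() already created
points(weeks, group2, pch = 19)
```

If points is called before any plot call in a fresh session, R raises an error because there is no graph to add to.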

Let’s plot the APGAR data:

plot(apgar$ESTGEST,apgar$APGAR5)

This is generally what we expect, but the data appear to contain some errors. One data point has a gestation of 100 weeks, which would mean that the woman carried the baby for about 23 months; this is not biologically plausible. Let’s delete this data point and replot. We also seem to have bad data at the lower end of the range: the plot shows a very healthy baby born at 12 weeks of gestation, which is also impossible.