A scatterplot is a useful way to visualize the relationship between two variables. Similar to correlations, scatterplots are often used to make initial diagnoses before any statistical analyses are conducted. This tutorial will explore the ways in which R can be used to create scatterplots.

Tutorial Files

Before we start, you may want to download the sample data (.csv) used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains pretest and posttest scores for 66 subjects on a series of reading comprehension tests (Moore & McCabe, 1989). Note that all code samples in this tutorial assume that the data have already been read into an R variable and that the variable has been attached.
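The setup assumed above could look roughly like the following sketch. The temporary file and the four rows of values are made up so that the snippet runs on its own; they are not the tutorial's sample data, and with the real file you would pass its name to read.csv() instead.

```r
# Stand-in for the sample file: a few invented rows written to a temporary CSV.
# These values are illustrative only and are NOT the real dataset.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("PRE1,POST1",
             "4,5",
             "6,9",
             "9,5",
             "12,8"), csv_path)

# Read the file into an R variable (a data frame)
datavar <- read.csv(csv_path)

# Attach the data frame so its columns can be referenced by name
attach(datavar)

# The columns are now visible without the datavar$ prefix
mean(PRE1)
```

When you are finished, detach(datavar) removes the data frame from the search path again.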

Plotting Two Variables

The simplest way to create a scatterplot is to directly graph two variables using the default settings. In R, this can be accomplished with the plot(XVAR, YVAR) function, where XVAR is the variable to plot along the x-axis and YVAR is the variable to plot along the y-axis. Suppose that we want to get a picture of the relationship between pretest 1 (PRE1) and posttest 1 (POST1). The following example demonstrates how to use the plot(XVAR, YVAR) function to visualize this relationship.

#create a scatterplot of Y on X using plot(XVAR, YVAR)

#what does the relationship between pretest 1 and posttest 1 look like?

plot(PRE1, POST1)

The output of the preceding function is pictured below.

Plotting All Variables

When beginning to analyze a dataset, researchers often want to get a complete picture of all relationships, rather than just a single one. Conveniently, the plot() function can also be run on an entire set of data. The format for this operation is plot(DATAVAR), where DATAVAR is the name of the R variable containing the data. Suppose now that our interest is in visualizing all of the scatterplots at once, in order to diagnose the various relationships present in our data. The following example demonstrates how to use the plot(DATAVAR) function.

#create scatterplots of all variables using plot(DATAVAR)

#what do all of the relationships in the data look like?

plot(datavar)

The output of the preceding function is pictured below.

Note that the image above has been resized to fit on this page. In the R Quartz Window, the scatterplots could be made much larger for easier viewing.
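For readers without the sample file, the same idea can be tried on simulated data. This is only a sketch: the values are random, and the PRE2 column name is a hypothetical addition used to give the matrix a third variable.

```r
# Simulated stand-in data for 66 subjects; values are random, and PRE2 is a
# hypothetical column name used only for illustration
set.seed(1)
datavar <- data.frame(
  PRE1  = round(rnorm(66, mean = 10, sd = 3)),
  PRE2  = round(rnorm(66, mean = 5, sd = 2)),
  POST1 = round(rnorm(66, mean = 8, sd = 3))
)

# For a data frame with more than two columns, plot() draws a scatterplot matrix
plot(datavar)

# pairs() draws the same kind of matrix and can be limited to chosen columns
pairs(datavar[, c("PRE1", "POST1")])
```

Restricting the columns passed to pairs() is one way to keep each panel large enough to read when a dataset has many variables.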

Custom Plotting

Additional plot() Arguments

Up to this point, we have been using the default values for all of our scatterplots' elements. However, R also allows for the customization of scatterplots. In addition to x and y axis variables, the plot() function also accepts the following arguments ("The Default Scatterplot Function", n.d.).

The plot() function accepts even more arguments. Take a look at the referenced page if you wish to explore further options.

Now let's recreate the plot of posttest 1 on pretest 1, this time with customized aesthetic parameters.

#create a scatterplot of Y on X incorporating the custom aesthetic parameters of the plot() function
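A minimal sketch of such a call is given here, assuming simulated stand-in data and a handful of standard plot() arguments (main, xlab, ylab, xlim, ylim, col, pch). The titles, limits, and colors are illustrative choices rather than required values.

```r
# Simulated stand-in scores; with the attached sample data, PRE1 and POST1
# would already be available
set.seed(1)
PRE1  <- round(runif(66, min = 2, max = 16))
POST1 <- round(runif(66, min = 1, max = 14))

plot(PRE1, POST1,
     main = "Posttest 1 on Pretest 1",   # plot title
     xlab = "Pretest 1 Score",           # x-axis label
     ylab = "Posttest 1 Score",          # y-axis label
     xlim = c(0, 20),                    # x-axis range as a two-value vector
     ylim = c(0, 20),                    # y-axis range as a two-value vector
     col  = c("red", "green", "blue"),   # colors, recycled across points in order
     pch  = 16)                          # solid circles as the point symbol
```

Because col is recycled in data order, the three colors in this sketch simply alternate from point to point and carry no meaning about the subjects.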

Note that the c() function is used for a number of the parameters in the plot function above. It combines multiple values into a "vector" that can be passed to a single argument. For example, if one wanted to use only a single color, then col = "red" would be acceptable. However, to use multiple colors, all items must be placed into a vector, such as col = c("red", "green", "blue"). Writing col = "red", "green", "blue" without c() would cause an error, because the colors would be treated as separate arguments rather than as a single entity.
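The difference can be checked with a small sketch like the one below. The x and y values are made up, and tryCatch() is used only so that the failing call does not stop the script.

```r
# Small made-up data for illustration
x <- 1:6
y <- c(2, 4, 3, 6, 5, 7)

# A single color needs no vector
plot(x, y, col = "red")

# Several colors must be combined into one vector with c()
plot(x, y, col = c("red", "green", "blue"))

# Without c(), "green" and "blue" are passed as separate positional arguments
# and end up matched to other plot() parameters, so the call fails
result <- tryCatch(
  {
    plot(x, y, col = "red", "green", "blue")
    "no error"
  },
  error = function(e) "error"
)
result
```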

Complete Plot Examples

To see a complete example of how scatterplots can be created in R, please download the plot examples (.txt) file.

Even More Visualizations

R has much more sophisticated graphic capabilities than have been demonstrated in this tutorial. In fact, opportunities exist to make very complex and unique visuals. To see examples of the kinds of charts that can be generated with R, I recommend that you visit the R Graph Gallery (François, 2006).

I'm a bit confused by the colour-coded scatter plot example here. I would have thought colours would be assigned according to a categorical variable (i.e. "Group" in this case), but it seems that the colours here are assigned randomly to data points? Could you perhaps post an example of how to assign colours based on a third variable? I think this would have far more practical application... thanks!

I think using a grouping variable would help in the format that Heretic explains. The grouping variable would be the column in your dataset that identifies which group each object belongs to. For example, with students you might have freshman, sophomore, junior, and senior in your "class" grouping variable. Since each student in your dataset has a class, R could use this information to plot a color for each student that matches his/her class.
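One way this might be done is sketched below. The data frame, the Group levels A, B, and C, and the color choices are all hypothetical; the idea is to convert the grouping factor to integer codes and use them to index a vector of colors.

```r
# Hypothetical data: 12 subjects in three groups (values are simulated)
set.seed(1)
scores <- data.frame(
  PRE1  = round(runif(12, min = 2, max = 16)),
  POST1 = round(runif(12, min = 1, max = 14)),
  Group = factor(rep(c("A", "B", "C"), each = 4))
)

# One color per factor level, in the order given by levels(scores$Group)
palette_by_group <- c("red", "green", "blue")

# as.integer() on a factor gives each row's level index (1, 2, or 3),
# which selects the matching color for that row
point_cols <- palette_by_group[as.integer(scores$Group)]

plot(scores$PRE1, scores$POST1, col = point_cols, pch = 16,
     xlab = "Pretest 1 Score", ylab = "Posttest 1 Score")

# Legend mapping each group to its color
legend("topright", legend = levels(scores$Group),
       col = palette_by_group, pch = 16)
```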

Hello, thanks for the tutorial. How can I remove the plot borders and keep only the x and y axes? I have tried the function frame.front, but it does not work; x and y appear separately. Please help me.
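A possible approach, sketched with made-up data: the box around the plot region is controlled by the bty graphical parameter, and plot() also accepts a frame.plot argument. Either option below should leave the two axes drawn without the surrounding border.

```r
# Made-up data for illustration
x <- 1:10
y <- c(3, 5, 4, 7, 6, 9, 8, 11, 10, 12)

# bty = "n" suppresses the box drawn around the plot region;
# the x and y axes are still drawn
plot(x, y, bty = "n")

# frame.plot = FALSE has the same effect for the default plot method
plot(x, y, frame.plot = FALSE)
```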

Hi, I have two questions.

1) How do I change the background color of plots?

2) More precisely, how can we paste an image as the background picture of a plot? I mean, I have GPS coordinates, and I plot them (y axis = lat, x axis = long). I thus get a scattered point pattern on a 500x500m white scatterplot. I want the background to be the Google Earth image of the landscape on which the GPS coords were recorded, not the default. I was thinking that if we can make the background of the plot transparent, then I can superpose the images, if they are well georeferenced. Any script for this? Any ideas?
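Both questions can be approached with base graphics, as in the rough sketch below. The coordinates are invented, and a generated grayscale gradient stands in for the landscape picture. A real image would first have to be read into R (for example with readPNG() from the png package, which is an assumption about your setup), and its corners would need to be placed at the georeferenced extents instead of the plot limits used here.

```r
# 1) Background color: set the bg graphical parameter before plotting
old_par <- par(bg = "lightyellow")
plot(1:10, (1:10)^2, pch = 16)
par(old_par)  # restore the previous setting

# 2) Image as background: draw the image first, then add the points on top
# Invented coordinates for illustration
long <- c(10.001, 10.002, 10.003, 10.004)
lat  <- c(50.001, 50.003, 50.002, 50.004)

# Stand-in image: a 20 x 20 grayscale gradient with values between 0 and 1
img <- as.raster(matrix(seq(0.6, 1, length.out = 400), nrow = 20))

# Set up the coordinate system without drawing any points
plot(long, lat, type = "n", xlab = "long", ylab = "lat")

# par("usr") holds the plot region limits as c(x1, x2, y1, y2)
lims <- par("usr")

# Fill the plot region with the image (xleft, ybottom, xright, ytop)
rasterImage(img, lims[1], lims[3], lims[2], lims[4])

# Draw the points over the image
points(long, lat, pch = 16, col = "red")
```

Drawing the image before the points avoids the need for a transparent background, since later drawing calls are layered on top of earlier ones.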