I have added 2 important new variables, quality and batch. The motivation for these variables is akin to an RNAseq analysis set where you have a quality measure like read depth, and where the data were processed in different batches. The y variable is based both on the batch effect and the quality.

We construct the ggplot2 object for plotting x against y as follows:

g = ggplot(data, aes(x = x, y=y)) + geom_point()
print(g)

Coloring by a 3rd Variable (Discrete)

Let's plot the x and y data by the different batches:

print({ g + aes(colour=batch)})

We can see from this example how to color by another third discrete variable. In this example, we see that the relationship is different by each batch (each are shifted).

Coloring by a 3rd Variable (Continuous)

Let's color by quality, which is continuous:

print({ gcol = g + aes(colour=quality)})

The default option is to use black as a low value and blue to be a high value. I don't always want this option, as I cannot always see differences clearly. Let's change the gradient of low to high values using scale_colour_gradient:

print({ gcol + scale_colour_gradient(low = "red", high="blue") })

This isn't much better. Let's call the middle quality gray and see if we can see better separation:

I'm not a fan that the default for stat_density is that the geom = "area". This creates a line on the x-axis that closes the object. This is very important if you want to fill the density with different colors. Most times though, I want simply the line of the density with no bottom line. We can achieve this with:

print({gdens2 = g+ geom_line(stat = "density")})

What if we set the option to fill the lines now? Well lines don't take fill, so it will not colour each line differently.

Now, regardless of the coloring, you can't really see the difference in people since there are so many. What if we want to do the plot with a subset of the data and the object is already constructed? Again, use the %+% operator.

Overlaid Densities with Alpha Blending

Let's take just different subsets of groups, not people, and plot the densities, with alpha blending:

print({ group_dens = ggroup+ geom_density(alpha = 0.3) })

That looks much better than the histogram example for groups. Now let's show these with lines:

Conclusion

Overall, this post should give you a few ways to play around with densities and such for plotting. All the same changes as the previous examples with scatterplots, such as facetting, can be used with these distribution plots. Many times, you want to look at the data in very different ways. Histograms can allow you to see outliers in some ways that densities do not because they smooth over the data. Either way, the mixture of alpha blending, coloring, and filling (though less useful for many distributions) can give you a nice description of what's going on a distributional level in your data.

PS: Boxplots

You can also do boxplots for each group, but these tend to separate well and colour relatively well using defaults, so I wil not discuss them here. My only note is that you can (and should) overlay points on the boxplot rather than just plot the histogram. You may need to jitter the points, alpha blend them, subsample the number of points, or clean it up a bit, but I try to display more DATA whenever possible.