ggstatsplot: ggplot2 Based Plots with Statistical Details

Overview

ggstatsplot is an extension of the ggplot2 package for creating graphics with details from statistical tests included in the plots themselves. It is targeted primarily at the behavioral sciences community and provides one-line code to produce information-rich plots. In a typical exploratory data analysis workflow, data visualization and statistical modeling are two different phases: visualization informs modeling, and modeling in turn can suggest a different visualization method, and so on. The central idea of ggstatsplot is simple: combine these two phases into one in the form of graphics with statistical details, which makes data exploration simpler and faster.

Currently, it supports only the most common types of statistical tests: parametric, nonparametric, robust, and Bayesian versions of t-test/ANOVA, correlation analyses, contingency table analysis, and regression analyses.

It therefore produces a limited set of plot types for the supported analyses:

violin plots (for comparisons between groups or conditions),

pie charts and bar charts (for categorical data),

scatterplots (for correlations between two variables),

correlation matrices (for correlations between multiple variables),

histograms and dot plots/charts (for hypotheses about distributions),

dot-and-whisker plots (for regression models).

In addition to these basic plots, ggstatsplot also provides grouped_ versions of most functions, which make it easy to repeat the same analysis for any grouping variable.

Future versions will include other types of statistical analyses and plots as well.

Statistical reporting

For all statistical tests reported in the plots, the default template abides by the APA gold standard for statistical reporting. For example, here are results from Yuen’s test for trimmed means (robust t-test):

Summary of supported statistical analyses

The table below summarizes all the different types of analyses currently supported in this package:

| Functions | Description | Parametric | Non-parametric | Robust | Bayes Factor |
|---|---|---|---|---|---|
| ggbetweenstats | Between group/condition comparisons | Yes | Yes | Yes | Yes |
| gghistostats, ggdotplotstats | Distribution of a numeric variable | Yes | Yes | Yes | Yes |
| ggcorrmat | Correlation matrix | Yes | Yes | Yes | No |
| ggscatterstats | Correlation between two variables | Yes | Yes | Yes | Yes |
| ggpiestats, ggbarstats | Association between categorical variables | Yes | No | No | Yes |
| ggpiestats | Proportion test | No | No | No | No |
| ggcoefstats | Regression model coefficients | Yes | No | Yes | Yes |

Effect sizes and confidence intervals available

ggstatsplot provides a wide range of effect sizes and their confidence intervals.

If you are not using the RStudio IDE and you get an error related to “pandoc”, you will need to either remove the argument build_vignettes = TRUE (to avoid building the vignettes) or install pandoc. If you have the rmarkdown R package installed, you can check whether pandoc is available by running rmarkdown::pandoc_available() in R.

Usage

ggstatsplot relies on non-standard evaluation (NSE), i.e., rather than looking at the values of arguments (x, y), it instead looks at their expressions. This means that you shouldn’t pass arguments using the $ operator while setting data = NULL (e.g., data = NULL, x = data$x, y = data$y); you must always specify the data argument for all functions. On the plus side, you can enter arguments either as a string (x = "x", y = "y") or as a bare expression (x = x, y = y); both forms work. To read more about NSE, see http://adv-r.had.co.nz/Computing-on-the-language.html
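To make the idea concrete, here is a minimal base-R sketch of non-standard evaluation. The helper show_column is hypothetical and is not part of ggstatsplot; it only illustrates why a bare name or a string can both work while data$x does not.

```r
# Hypothetical helper (NOT a ggstatsplot function): it inspects the
# *expression* supplied for `x` rather than its value, then looks up the
# column of that name in `data`.
show_column <- function(data, x) {
  x_expr <- substitute(x)  # capture the unevaluated argument
  x_name <- if (is.character(x_expr)) x_expr else deparse(x_expr)
  data[[x_name]]
}

bare   <- show_column(data = iris, x = Sepal.Length)    # bare expression
quoted <- show_column(data = iris, x = "Sepal.Length")  # string
identical(bare, quoted)  # TRUE

# With `$`, the captured expression is `iris$Sepal.Length`, which is not a
# column name, so the lookup finds nothing.
is.null(show_column(data = iris, x = iris$Sepal.Length))  # TRUE
```

The actual implementation inside ggstatsplot may differ; the point is only that the function receives an expression, not a vector of values.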

ggstatsplot is a very chatty package and will by default print helpful notes on assumptions about linear models, warnings, etc. If you don’t want your console to be cluttered with such messages, you can turn them off by setting the argument messages = FALSE in the function call.

Here are examples of the main functions currently supported in ggstatsplot.

Note: If you are reading this on the GitHub repository, the documentation below is for the development version of the package, so you may see some features here that are not yet present in the stable version on CRAN. For documentation relevant for the CRAN version, see:

ggbetweenstats

This function creates either a violin plot, a box plot, or a mix of the two for between-group or between-condition comparisons with results from statistical tests in the subtitle. The simplest function call looks like this:
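A minimal call might look like the following sketch. It assumes that ggstatsplot is installed and uses the built-in iris dataset, which is an illustrative choice rather than something prescribed by the package.

```r
library(ggstatsplot)

# `x` is the grouping variable and `y` the numeric outcome; both are
# given as bare column names of `data`
p <- ggbetweenstats(
  data = iris,
  x = Species,
  y = Sepal.Length
)

p  # printing the object draws the plot
```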

Note that this function returns a ggplot2 object and thus any of the graphics layers can be further modified.
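As a sketch of what such a modification can look like, the example below adds ordinary ggplot2 components to the returned object. The dataset, labels, and axis limits are illustrative assumptions.

```r
library(ggstatsplot)
library(ggplot2)

p <- ggbetweenstats(data = iris, x = Species, y = Sepal.Length)

# standard ggplot2 components are added with `+`, as for any ggplot object
p2 <- p +
  labs(title = "Sepal length by species", y = "Sepal length (cm)") +
  coord_cartesian(ylim = c(3, 8))
```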

The type (of test) argument also accepts the following abbreviations: "p" (for parametric) or "np" (for nonparametric) or "r" (for robust) or "bf" (for Bayes Factor). Additionally, the type of plot to be displayed can also be modified ("box", "violin", or "boxviolin").
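For instance, a nonparametric test can be requested with the abbreviation "np", as in the sketch below (iris is again an illustrative dataset). The argument that controls the plot type is not named in the text above, so it is left out here.

```r
library(ggstatsplot)

# same plot as the default call, but with a nonparametric test in the subtitle
p_np <- ggbetweenstats(
  data = iris,
  x = Species,
  y = Sepal.Length,
  type = "np"
)
```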

A number of other arguments can be specified to make this plot even more informative or change some of the default options.

In case of a parametric t-test, setting bf.message = TRUE will also attach results from Bayesian Student’s t-test. That way, if the null hypothesis can’t be rejected with the NHST approach, the Bayesian approach can help index evidence in favor of the null hypothesis (i.e., BF01).

By default, the Bayes Factor quantifies the support for the alternative hypothesis (H1) over the null hypothesis (H0) (i.e., BF10 is displayed). Natural logarithms are shown because BF values can be very large. This also makes it easy to compare evidence in favor of the alternative (BF10) versus the null (BF01) hypothesis (since log(BF10) = - log(BF01)).
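The sign relationship follows from BF01 being the reciprocal of BF10. A small base-R check, using a made-up Bayes factor of 20 purely for illustration:

```r
bf10 <- 20        # hypothetical Bayes factor for H1 over H0
bf01 <- 1 / bf10  # the same evidence, expressed for H0 over H1

log(bf10)  # positive: the evidence favors H1
log(bf01)  # same magnitude, opposite sign

all.equal(log(bf10), -log(bf01))  # TRUE
```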

Additionally, there is also a grouped_ variant of this function that makes it easy to repeat the same operation across a single grouping variable:
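A sketch of the grouped variant is shown below. It assumes that the function is called grouped_ggbetweenstats and takes the grouping variable through a grouping.var argument; check the documentation of your installed version, since argument names may differ between releases. The ToothGrowth dataset is an illustrative choice.

```r
library(ggstatsplot)

# ToothGrowth ships with R; `dose` is converted to a factor so that it
# can serve as the grouping variable
tooth <- transform(ToothGrowth, dose = factor(dose))

# one between-group comparison (supp vs. len) per level of `dose`
p_grouped <- grouped_ggbetweenstats(
  data = tooth,
  x = supp,
  y = len,
  grouping.var = dose
)
```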

A within-subjects variant of this function, ggwithinstats, is currently in development. You can still use ggbetweenstats on within-subjects data just to prepare the plot for exploratory data analysis, but the statistical details displayed in the subtitle will be incorrect. You can remove them by adding + ggplot2::labs(subtitle = NULL) to your function call.

As a temporary solution, you can use the helper function from ggstatsplot to display results from the within-subjects version of the test in question. Here is an example:

ggpiestats

This function creates a pie chart for categorical or nominal variables with results from contingency table analysis (Pearson’s chi-squared test for between-subjects design and McNemar’s test for within-subjects design) included in the subtitle of the plot. If only one categorical variable is entered, results from a one-sample proportion test will be displayed as a subtitle.

This function can also be used to study an interaction between two categorical variables. The basic plot can be customized with additional arguments, and because the function returns a ggplot2 object, it can be modified further with ggplot2 syntax:

ggcorrmat

ggcorrmat makes a correlogram (a matrix of correlation coefficients) with a minimal amount of code. The defaults alone produce publication-ready correlation matrices. But, for the sake of exploring the available options, let’s change some of the defaults. For example, multiple aesthetics-related arguments can be modified to change the appearance of the correlation matrix.

Note that if there are NAs present in the selected dataframe, the legend will display the minimum, median, and maximum number of pairs used for the correlation matrices.

Alternatively, you can use it just to get the correlation matrices and their corresponding p-values (in a tibble format). Also, note that if cor.vars are not specified, all numeric variables will be used.
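A minimal sketch of the plotting use, with the variables selected explicitly through cor.vars; iris is an illustrative dataset. The argument for returning the coefficients or p-values as a tibble is not named above and may vary by version, so it is not shown.

```r
library(ggstatsplot)

# correlation matrix for the four numeric columns of iris; per the text
# above, omitting `cor.vars` would select all numeric variables
p_corr <- ggcorrmat(
  data = iris,
  cor.vars = c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)
)
```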

combine_plots

The full power of ggstatsplot can be leveraged with a functional programming package like purrr, which replaces for loops with code that is both more succinct and easier to read; purrr should therefore be preferred 😻. (Another old-school option is the plyr package.)

In such cases, ggstatsplot contains a helper function combine_plots to combine multiple plots, which can be useful for combining a list of plots produced with purrr. This is a wrapper around cowplot::plot_grid and lets you combine multiple plots and add a combination of title, caption, and annotation texts with suitable defaults.
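The workflow can be sketched as follows, assuming purrr is installed and that combine_plots accepts the list of plots through a plotlist argument (as cowplot::plot_grid does). The per-species histograms are an illustrative choice.

```r
library(ggstatsplot)

# build one histogram per species: split the data, then map over the pieces
plots <- purrr::map(
  split(iris, iris$Species),
  function(df) gghistostats(data = df, x = Sepal.Length)
)

# assumption: the list of plots is passed via `plotlist`
combined <- combine_plots(plotlist = plots)
```

Title, caption, and annotation arguments are left at their defaults here because their names depend on the installed version.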

theme_ggstatsplot

All plots from ggstatsplot have a default theme: theme_ggstatsplot. You can change this theme by using the argument ggtheme for all functions.
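For example, a stock ggplot2 theme can be supplied through ggtheme, as in this sketch (the dataset and the choice of theme_bw are illustrative):

```r
library(ggstatsplot)

# swap the default theme for ggplot2's black-and-white theme
p_bw <- ggbetweenstats(
  data = iris,
  x = Species,
  y = Sepal.Length,
  ggtheme = ggplot2::theme_bw()
)
```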

It is important to note that irrespective of which ggplot2 theme you choose, ggstatsplot adds a new layer in the background with its own idiosyncratic theme settings, chosen to make the graphs more readable or aesthetically pleasing. Let’s see an example with gghistostats and see how a certain theme from the hrbrthemes package looks with and without the ggstatsplot layer.

Using ggstatsplot helpers to display text results

Sometimes you may not like the default plot produced by ggstatsplot. In such cases, you can use other custom plots (from ggplot2 or other plotting packages) and still use the ggstatsplot (subtitle) helper functions to display results from the relevant statistical test. For example, in the following chunk, we will use pirateplot from the yarrr package and use a ggstatsplot helper function to display the results.

Contributing

I’m happy to receive bug reports, suggestions, questions, and (most of all) contributions to fix problems and add features. I prefer the GitHub issues system over other ways of reaching me (personal e-mail, Twitter, etc.). Pull requests for contributions are encouraged.