pingouin.anova

data : pandas.DataFrame

DataFrame. Note that this function can also be used directly as a
pandas method, in which case this argument is no longer needed.

dv : string

Name of column in data containing the dependent variable.

between : string or list with N elements

Name of column(s) in data containing the between-subject factor(s).
If between is a single string, a one-way ANOVA is computed.
If between is a list with two or more elements, an N-way ANOVA is
performed.
Note that Pingouin will internally call statsmodels to calculate
ANOVA with 3 or more factors, or unbalanced two-way ANOVA.

ss_type : int

Specifies how the sums of squares are calculated for unbalanced designs
with 2 or more factors. Can be 1, 2 (default), or 3. This has no effect
on one-way designs or on N-way ANOVA with balanced data.

The classic ANOVA is very powerful when the groups are normally distributed
and have equal variances. However, when the groups have unequal variances,
it is best to use the Welch ANOVA (pingouin.welch_anova()), which
better controls the type I error rate (Liu 2015). The homogeneity of
variances can be assessed with the pingouin.homoscedasticity() function.

The main idea of ANOVA is to partition the variance (sums of squares)
into several components. For example, in one-way ANOVA:
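In standard textbook notation, the one-way decomposition can be written as follows. The symbols are assumptions introduced here: k groups, n_i observations in group i, N observations in total, group means \bar{Y}_i and grand mean \bar{Y}.

```latex
\begin{align}
SS_{\text{total}}  &= SS_{\text{effect}} + SS_{\text{error}} \\
SS_{\text{total}}  &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2 \\
SS_{\text{effect}} &= \sum_{i=1}^{k} n_i (\bar{Y}_i - \bar{Y})^2 \\
SS_{\text{error}}  &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2 \\
F &= \frac{SS_{\text{effect}} / (k - 1)}{SS_{\text{error}} / (N - k)}
\end{align}
```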

The default effect size reported in Pingouin is the partial eta-square.
However, one should keep in mind that for one-way ANOVA
partial eta-square is the same as eta-square and generalized eta-square.
For more details, see Bakeman 2005; Richardson 2011.
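The equality between eta-square and partial eta-square in the one-way case can be checked by hand. The following sketch uses only the standard library and a small made-up dataset. It follows the usual textbook definitions rather than Pingouin's internal code.

```python
from statistics import mean

# Hypothetical data: three groups of three observations each.
groups = {
    "a": [4.0, 5.0, 6.0],
    "b": [7.0, 8.0, 9.0],
    "c": [1.0, 2.0, 3.0],
}
all_values = [x for values in groups.values() for x in values]
grand_mean = mean(all_values)

# Between-group (effect) sum of squares.
ss_effect = sum(
    len(values) * (mean(values) - grand_mean) ** 2
    for values in groups.values()
)
# Within-group (error) sum of squares.
ss_error = sum(
    (x - mean(values)) ** 2
    for values in groups.values()
    for x in values
)
# Total sum of squares around the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

eta_sq = ss_effect / ss_total
partial_eta_sq = ss_effect / (ss_effect + ss_error)

# In a one-way design, SS_total = SS_effect + SS_error,
# so both effect sizes coincide.
print(eta_sq, partial_eta_sq)
```

With more than one factor, SS_total also contains the other effects, so the two denominators differ and partial eta-square is at least as large as eta-square.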

Note that missing values are automatically removed. Results have been
tested against R, Matlab and JASP.

Warning

Versions of Pingouin below 0.2.5 gave wrong results for
unbalanced N-way ANOVA. This issue has been resolved in
Pingouin>=0.2.5: for unbalanced N-way designs, the ANOVA is now
calculated via an internal call to the statsmodels package.

The same analysis, but using a standard eta-squared instead of a partial
eta-squared effect size. Also note that here we’re calling the anova
function directly as a method of our pandas DataFrame. In that case,
we don’t have to specify data anymore.