Thursday, August 28, 2014

Monte Carlo simulations suggest that the minimum sample size in PLS-SEM can be reliably and conservatively estimated with the inequality below:

N > ( 2.48 / Abs(bm) ) ^ 2

In this inequality, N is the required sample size, and Abs(bm) is the absolute value of the path coefficient with the minimum expected magnitude in the model. This inequality assumes that:

- One-tailed P values are used for hypothesis testing. A previous post discusses this issue in more detail.

- The threshold for P values is .05. That is, P values should be equal to or lower than .05.

- Effect sizes (ESs), as calculated by WarpPLS, are also used for hypothesis testing.

- The threshold for ESs is .02. That is, ESs should be equal to or greater than .02.

- Acceptable statistical power is equal to or greater than .8.

- The latent variables in the model are not collinear, when both lateral and vertical collinearity are considered. That is, the full collinearity VIFs calculated by WarpPLS for all latent variables are equal to or lower than 3.3.

This inequality highlights the fact that path coefficient strength is a much stronger determinant of statistical power in Monte Carlo simulations than the configuration of the structural model.

The inequality is proposed as an alternative to the widely used (and discredited) "10 times rule". It yields minimum sample sizes that are consistent with Cohen's power tables for multiple regression.

For example, let us say one has a model where the path coefficient with the minimum expected magnitude is .3. Then the required sample size is:

N > ( 2.48 / .3 ) ^ 2 = 68.34

The minimum required sample size is thus:

Nm = 69
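The calculation above can be sketched in a few lines of Python. This is only an illustration of the inequality, not something WarpPLS provides; the function name min_sample_size and its parameters are made up for this example. Because the inequality is strict, the sketch returns the smallest whole number that is strictly greater than the bound.

```python
import math


def min_sample_size(min_abs_path_coef, constant=2.48):
    """Smallest integer N satisfying N > (constant / Abs(bm)) ^ 2.

    min_abs_path_coef: the path coefficient with the minimum expected
    magnitude (bm); its sign is ignored.
    constant: 2.48, the value used in the inequality in this post.
    """
    bm = abs(min_abs_path_coef)
    if bm == 0:
        raise ValueError("The path coefficient must be nonzero.")
    bound = (constant / bm) ** 2
    # Strict inequality: take the next whole number above the bound.
    return math.floor(bound) + 1


if __name__ == "__main__":
    # The example from the text: bm = .3 gives a bound of about 68.34.
    print(min_sample_size(0.3))  # 69
```

Since the bound grows with the inverse square of Abs(bm), halving the minimum expected path coefficient roughly quadruples the required sample size.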

The above assumes a pre-analysis minimum sample size estimation, where the path coefficient with the minimum expected magnitude is set prior to the analysis.

A post-analysis minimum sample size estimation, on the other hand, would be based on the results of a full PLS-SEM analysis. Generally, pre-analysis estimation is recommended over post-analysis estimation.

The latter, post-analysis estimation, can only confirm that an appropriate sample size was used.
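A post-analysis check could look like the sketch below. It assumes that the check uses the smallest absolute path coefficient found in the analysis, and compares the sample size actually used against the bound from the inequality. The function name, the path coefficients, and the sample sizes are all hypothetical, chosen only to illustrate the idea.

```python
import math


def post_analysis_check(path_coefs, sample_size, constant=2.48):
    """Return (required_n, is_sufficient) for an already-run analysis.

    path_coefs: estimated path coefficients from the analysis.
    sample_size: the sample size that was actually used.
    Assumption: the smallest absolute estimated coefficient plays the
    role of Abs(bm) in N > (constant / Abs(bm)) ^ 2.
    """
    smallest = min(abs(b) for b in path_coefs)
    if smallest == 0:
        raise ValueError("A zero path coefficient gives no finite bound.")
    bound = (constant / smallest) ** 2
    required_n = math.floor(bound) + 1  # strict inequality
    return required_n, sample_size >= required_n


if __name__ == "__main__":
    # Hypothetical results: three paths, 100 cases.
    coefs = [0.45, -0.25, 0.38]
    print(post_analysis_check(coefs, 100))  # (99, True)
```

With these made-up numbers, the smallest absolute coefficient is .25, the bound is about 98.41, and a sample of 100 is confirmed as appropriate.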

Ned Kock
