Wednesday, August 16, 2017

I would like to thank the participants in the workshop on PLS-SEM with WarpPLS, conducted on 12-13 August 2017. The workshop took place in beautiful Penang, Malaysia. The group was very smart and inquisitive, making the workshop highly interactive.

Several of the participants were expert users of WarpPLS, and highly knowledgeable about structural equation modeling (SEM) issues, from both applied and philosophical perspectives. We used WarpPLS version 6.0 (see blog post below).

The full User Manual is also available for download from the web site above separately from the software.

Some important notes for users of previous versions:

- There is no need to uninstall previous versions of WarpPLS to be able to install and use this new version.

- Users of previous versions can use the same license information that they already have; it will work for version 6.0 for the remainder of their license periods.

- Project files generated with previous versions are automatically converted to version 6.0 project files. Users are notified of that by the software, and given the opportunity not to convert the files if they so wish.

- The MATLAB Compiler Runtime 7.14, used in this version, is the same as the one used in versions 2.0-5.0. Therefore, if you already have one of those versions of WarpPLS installed on your computer, you should not reinstall the Runtime.

WarpPLS is a powerful PLS-based structural equation modeling (SEM) software. Since its first release in 2009, its user base has grown steadily, now comprising more than 7,000 users in over 33 countries.

Some of the most distinguishing features of WarpPLS are listed below. At the beginning of the User Manual you will see the full list of new features in this version; the User Manual also has more details on how these new features can be useful in SEM analyses.

- Factor-based PLS algorithms building on consistent PLS. There has been a long and in some instances fairly antagonistic debate between proponents and detractors of the use of Wold’s original PLS algorithms in the context of SEM. This debate has been fueled by one key issue: Wold’s original PLS algorithms do not deal with actual factors, as covariance-based SEM algorithms do, but with composites, which are exact linear combinations of indicators. The previous version of this software offered various factor-based PLS algorithms to address this limitation. Those algorithms use the Cronbach’s alpha coefficient as a basis to estimate measurement error and true composite weights. This version of the software continues this tradition by offering the following new factor-based PLS algorithms: Factor-Based PLS Type CFM3, Factor-Based PLS Type CFM2, Factor-Based PLS Type REG2, and Factor-Based PLS Type PTH2. A common characteristic of these new factor-based PLS algorithms is that they build on Dijkstra's consistent PLS (a.k.a. PLSc) technique, whose reliability measure appears to be, in many contexts, a better approximation of the true reliability than the reliability measures usually reported in PLS-based SEM contexts – the composite reliability and Cronbach’s alpha coefficients.

- Statistical power and minimum sample size requirements. The new menu option “Explore statistical power and minimum sample size requirements” now allows you to obtain estimates of the minimum required sample sizes for empirical studies based on the following model elements: the minimum absolute significant path coefficient in the model (e.g., 0.21), the significance level used for hypothesis testing (e.g., 0.05), and the power level required (e.g., 0.80). Two methods are used to estimate minimum required sample sizes, the inverse square root and gamma-exponential methods. These methods simulate Monte Carlo experiments, and thus produce estimates that are in line with the estimates that would be produced through the Monte Carlo method.
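To give a rough sense of how the inverse square root method behaves, here is a sketch based on the formulation published by Kock and Hadaya; the software's actual implementation may include refinements beyond this simple formula, so treat the code below as an approximation, not a reproduction of WarpPLS's output:

```python
from math import ceil
from statistics import NormalDist

def min_sample_size_isr(beta_min, alpha=0.05, power=0.80):
    """Inverse square root approximation of the minimum required sample
    size, given the minimum absolute significant path coefficient
    (beta_min), a one-tailed significance level, and a power level."""
    z = NormalDist().inv_cdf(1 - alpha) + NormalDist().inv_cdf(power)
    return ceil((z / abs(beta_min)) ** 2)

# With the example values from the text (|beta| = 0.21, alpha = 0.05,
# power = 0.80), the z terms sum to about 2.486.
n_min = min_sample_size_isr(0.21)
```

The key intuition is that the minimum required sample size grows with the inverse square of the smallest path coefficient one expects to detect.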

- T ratios and confidence intervals. While P values are widely used in PLS-based SEM, as well as in SEM in general, the statistical significances of path coefficients, weights and loadings can also be assessed employing T ratios and/or confidence intervals. These can now be obtained through the new menu option “Explore T ratios and confidence intervals”, which also allows you to set the confidence level to be used.
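WarpPLS derives its standard errors from resampling; the sketch below only shows the arithmetic relating a coefficient, its standard error, a T ratio, and a normal-theory confidence interval, with illustrative input values:

```python
from statistics import NormalDist

def t_ratio_and_ci(coef, se, confidence=0.95):
    """Return the T ratio and a normal-theory confidence interval for
    an estimated coefficient and its standard error."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return coef / se, (coef - z * se, coef + z * se)

# Illustrative values: a path coefficient of 0.25 with SE 0.08.
t, (ci_lo, ci_hi) = t_ratio_and_ci(0.25, 0.08)
```

A coefficient is significant at the chosen level exactly when its confidence interval excludes zero, which is why the two presentations are interchangeable.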

- Conditional probabilistic queries. If an analysis suggests that two variables are causally linked, yielding a path coefficient of 0.25 for example, this essentially means in probabilistic terms that an increase in the predictor variable leads to an increase in the conditional probability that the criterion variable will be above a certain value. Yet, conditional probabilities cannot be directly estimated based on path coefficients; and those probabilities may be of interest to both researchers and practitioners. By using the “Explore conditional probabilistic queries” menu option, users of this software can now estimate conditional probabilities via queries including combinations of latent variables, unstandardized indicators, standardized indicators, relational operators (e.g., > and <=), and logical operators (e.g., & and |).
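The idea behind such a query can be sketched on simulated data: estimate the conditional probability as a relative frequency over the cases satisfying the condition. The coefficients and variable names below are illustrative only, not the software's estimation procedure:

```python
import random

random.seed(0)

# Simulate standardized predictor scores; "success" is driven by the
# two predictors plus noise (coefficients are illustrative).
n = 50_000
rows = []
for _ in range(n):
    projmgt = random.gauss(0, 1)
    jsat = random.gauss(0, 1)
    success = 0.3 * projmgt + 0.3 * jsat + random.gauss(0, 1)
    rows.append((projmgt, jsat, success))

# Conditional query: P(Success > 0 | Projmgt > 1 & JSat > 1),
# estimated as a relative frequency over the qualifying cases.
subset = [s for p, j, s in rows if p > 1 and j > 1]
p_cond = sum(s > 0 for s in subset) / len(subset)
```

With positive path coefficients, raising both predictors above one standard deviation pushes the conditional probability of a positive criterion well above the unconditional 50 percent.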

- Full latent growth. Sometimes the actual inclusion of moderating variables and corresponding links in a model leads to problems; e.g., increases in collinearity levels, and the emergence of instances of Simpson’s paradox. The new menu option “Explore full latent growth” now allows you to completely avoid these problems, and estimate the effects of a latent variable or indicator on all of the links in a model (all at once), without actually including the variable in the model. Moreover, growth in coefficients associated with links among different latent variables and between a latent variable and its indicators, can be estimated; allowing for measurement invariance tests applied to loadings and/or weights.

- Multi-group analyses and measurement invariance assessment. The new menu options “Explore multi-group analyses” and “Explore measurement invariance” now allow you to conduct analyses where the data is segmented into various groups, all possible combinations of pairs of groups are generated, and each pair of groups is compared. In multi-group analyses path coefficients are normally compared, whereas in measurement invariance assessment the foci of comparison are loadings and/or weights. The grouping variables can be unstandardized indicators, standardized indicators, and labels. As mentioned above, these types of analyses can now also be conducted via the new menu option “Explore full latent growth”, which presents several advantages (as discussed in the WarpPLS User Manual).

- Analytic composites and instrumental variables. Analytic composites are weighted aggregations of indicators where the relative weights are set by you, usually based on an existing theory. The new menu option “Explore analytic composites and instrumental variables” allows you to create analytic composites. This new menu option also allows you to create instrumental variables. Instrumental variables are variables that selectively share variation with other variables, and only with those variables.
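A minimal sketch of the analytic-composite idea, assuming only that the researcher supplies fixed relative weights (the exact weighting and standardization scheme used by the software may differ):

```python
def analytic_composite(indicators, weights):
    """Aggregate standardized indicator scores into a composite using
    researcher-set relative weights, normalized to sum to one."""
    if len(indicators) != len(weights):
        raise ValueError("one weight per indicator is required")
    total = sum(weights)
    return sum(w / total * x for w, x in zip(weights, indicators))

# A theory might state that the first indicator matters twice as much:
score = analytic_composite([0.5, -0.2, 1.1], [2, 1, 1])
```

The point of contrast with ordinary PLS composites is that here the weights come from theory, not from the estimation algorithm.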

- Endogeneity assessment and control. Instrumental variables can now be used to test and control for endogeneity, which occurs when the structural error term for an endogenous variable is correlated with any of the variable’s predictors. For example, let us consider a simple population model with the following links A > B and B > C. This model presents endogeneity with respect to C, because variation flows from A to C via B, leading to a biased estimation of the path for the link B > C via ordinary least squares regression. Adding a link from A to C could be argued as “solving the problem”, but in fact it creates the possibility of a type I error, since the link A > C does not exist at the population level. A more desirable solution to this problem is to create an instrumental variable iC, incorporating only the variation of A that ends up in C and nothing else, and revise the model so that it has the following links: A > B, B > C and iC > C. The link iC > C can be used to test for endogeneity, via its P value and effect size. This link (i.e., iC > C) can also be used to control for endogeneity, thus removing the bias when the path coefficient for the link B > C is estimated via ordinary least squares regression. To create instrumental variables to test and control for endogeneity you should use the sub-option “Single stochastic variation sharing”, under the new menu option “Explore analytic composites and instrumental variables”.

- Reciprocal relationships assessment. Instrumental variables can also be used to estimate reciprocal relationships. For this, you should use the sub-option “Reciprocal stochastic variation sharing”, under the new menu option “Explore analytic composites and instrumental variables”. To illustrate the sub-option “Reciprocal stochastic variation sharing” let us consider a population model with the following links: A > C, B > D, C > D and D > C. To test the reciprocal relationship between C and D you should first control for endogeneity in C and D, due to variation coming from B and A respectively, by creating two instrumental variables iC and iD via the sub-option “Single stochastic variation sharing” and adding these variables to the model. Next you should create two other instrumental variables through the sub-option “Reciprocal stochastic variation sharing”, which we will call here iCrD and iDrC, referring to the conceptual reciprocal links C > D and D > C respectively. (No links between C and D should be included in the model graph, since reciprocal links cannot be directly represented in this version of this software.) The final model, with all the links, will be as follows: A > C, iC > C, B > D, iD > D, iDrC > D and iCrD > C. Here the link iDrC > D represents the conceptual link C > D, and can be used to test this conceptual link; and the link iCrD > C represents the conceptual link D > C, and can similarly be used to test this conceptual link.

- Numeric-to-categorical conversion. The new menu option “Explore categorical-numeric-categorical conversion” now allows you to perform numeric-to-categorical conversions. In a numeric-to-categorical conversion one or more of the following are converted into a single data label variable: latent variable, standardized indicator, or unstandardized indicator. This option is useful in multi-group analyses where the investigator wants to employ more than one numeric field for grouping. For example, let us assume that the following two unstandardized indicators are available: C, with the values 1 and 0 referring to individuals from the countries of Brazil and New Zealand; and G, with the values 1 and 0 referring to females and males. By using a numeric-to-categorical conversion a researcher could create a new data label variable to conduct a multi-group analysis based on four groups: “C=1G=1”, “C=1G=0”, “C=0G=1” and “C=0G=0”.
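The label construction described above can be sketched in a few lines; the dictionary-based representation of a data row is an assumption for illustration, not the software's internal format:

```python
def group_label(row, fields):
    """Build a combined data label such as "C=1G=0" from two or more
    numeric grouping indicators in a data row."""
    return "".join(f"{name}={int(row[name])}" for name in fields)

# A Brazilian (C=1) male (G=0) team member:
team = {"C": 1, "G": 0}
label = group_label(team, ["C", "G"])
```

Applying this to every row yields the four-group labeling (“C=1G=1”, “C=1G=0”, “C=0G=1”, “C=0G=0”) needed for the multi-group analysis.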

- Categorical-to-numeric conversion. The new menu option “Explore categorical-numeric-categorical conversion” also allows you to perform categorical-to-numeric conversions. In a categorical-to-numeric conversion a user can convert a categorical variable, stored as a data label variable, into a numeric variable that is added to the dataset as a new standardized indicator. This new variable can then be used as a new indicator of an existing latent variable, or as a new latent variable with only one-indicator. Three categorical-to-numeric conversion modes, to be used under different circumstances, are available: anchor-factorial with fixed variation, anchor-factorial with variation diffusion, and anchor-factorial with variation sharing.

- Dijkstra's consistent PLS outputs. The new menu option “Explore Dijkstra's consistent PLS outputs” now allows you to obtain key outputs generated based on Dijkstra's consistent PLS (a.k.a. PLSc) technique. These outputs include PLSc reliabilities for each latent variable, also referred to as Dijkstra's rho_a's, which appear to be, in many contexts, better approximations of the true reliabilities than the measures usually reported in PLS-based SEM contexts – the composite reliability and Cronbach’s alpha coefficients. Also included in the outputs generated via this menu option are PLSc loadings; along with the corresponding standard errors, one-tailed and two-tailed P values, T ratios, and confidence intervals.

- Fit indices comparing indicator correlation matrices. The new menu option “Explore additional coefficients and indices” now allows you to obtain an extended set of model fit and quality indices. The extended set of model fit and quality indices includes the classic indices already available in the previous version of this software, as well as new indices that allow investigators to assess the fit between the model-implied and empirical indicator correlation matrices. These new indices are the standardized root mean squared residual (SRMR), standardized mean absolute residual (SMAR), standardized chi-squared (SChS), standardized threshold difference count ratio (STDCR), and standardized threshold difference sum ratio (STDSR). As with the classic model fit and quality indices, the interpretation of these new indices depends on the goal of the SEM analysis. Since these indices refer to the fit between the model-implied and empirical indicator correlation matrices, they become more meaningful when the goal is to find out whether one model has a better fit with the original data than another, particularly when used in conjunction with the classic indices. When assessing the model fit with the data, several criteria are recommended. These criteria are discussed in the WarpPLS User Manual.
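As a sketch of the best-known of these indices, the SRMR is conventionally the root mean square of the differences between the unique entries of the empirical and model-implied correlation matrices; WarpPLS's exact computation may differ in details such as diagonal handling:

```python
from math import sqrt

def srmr(empirical, implied):
    """Standardized root mean squared residual between an empirical and
    a model-implied indicator correlation matrix (unique entries of
    symmetric p x p matrices, including the diagonal)."""
    p = len(empirical)
    residuals = [
        (empirical[i][j] - implied[i][j]) ** 2
        for i in range(p) for j in range(i, p)
    ]
    return sqrt(sum(residuals) / len(residuals))

# Toy two-indicator example: empirical vs. model-implied correlations.
r_emp = [[1.0, 0.42], [0.42, 1.0]]
r_mod = [[1.0, 0.40], [0.40, 1.0]]
fit = srmr(r_emp, r_mod)
```

A perfect reproduction of the empirical correlations yields an SRMR of zero; larger residuals inflate the index.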

- New reliability measures. The new menu option “Explore additional coefficients and indices” now also allows you to obtain an extended set of reliabilities. The extended set of reliabilities includes the classic reliability coefficients already available in the previous version of this software, plus the following, for each latent variable in your model: Dijkstra's PLSc reliability (also available via the new menu option “Explore Dijkstra's consistent PLS outputs”), true composite reliability, and factor reliability. When factor-based PLS algorithms are used in analyses, the true composite reliability and the factor reliability are produced as estimates of the reliabilities of the true composites and factors. They are calculated in the same way as the classic composite reliabilities available from the previous version of this software, but with different loadings. When classic composite-based (i.e., non-factor-based) algorithms are used, both true composites and factors coincide, and are approximated by the composites generated by the software. As such, true composite and factor reliabilities equal the corresponding composite reliabilities whenever composite-based algorithms are used.
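The shared calculation mentioned above — the classic composite reliability formula, into which the software substitutes different loadings for the different reliability variants — can be sketched as follows, assuming standardized loadings and uncorrelated measurement errors:

```python
def composite_reliability(loadings):
    """Composite reliability from standardized loadings: the squared sum
    of loadings divided by itself plus the summed error variances
    (1 - loading^2), assuming uncorrelated measurement errors."""
    s = sum(loadings) ** 2
    e = sum(1 - l ** 2 for l in loadings)
    return s / (s + e)

# Three indicators, each loading 0.7 on the latent variable:
cr = composite_reliability([0.7, 0.7, 0.7])
```

Feeding PLSc or factor loadings into the same formula gives the corresponding reliability variant.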

Sunday, July 9, 2017

Structural equation modeling (SEM), or path analysis with latent variables, is one of the most general and comprehensive statistical analysis methods. Path analysis, multiple regression, ANCOVA, ANOVA and other widely used statistical analysis methods can be seen as special cases of SEM.

WarpPLS 6.0 is a very user-friendly and powerful SEM software tool, arguably the first of its kind to implement linear and nonlinear algorithms. It provides one of the most extensive sets of SEM outputs. Among other things it automatically calculates indirect and total effects and respective P values, as well as full collinearity estimates.

This workshop (details below) is aimed at beginner and intermediate SEM practitioners. Among possible participants are those who are interested in: (a) being productive co-authors or research collaborators, even if not doing SEM analyses themselves; (b) conducting basic SEM analyses occasionally in the future; (c) conducting SEM analyses of intermediate complexity on a regular basis.

Participants will receive a one-year license of WarpPLS 6.0. See the link below for more details on this software.

The main goal of this workshop is to give participants a practical understanding of how to use the software WarpPLS to conduct variance-based SEM. The workshop is very hands-on and covers linear and nonlinear applications. Many topics will be covered directly or indirectly, depending on the questions received from the participants, who are welcome to bring their own datasets to the workshop. Below is a tentative overview of the topics covered:

Wednesday, April 12, 2017

This is just a thank you note to those who participated, either as presenters or members of the audience, in the 2017 PLS Applications Symposium:

http://plsas.net/

As in previous years, it seems that it was a good idea to run the Symposium as part of the Western Hemispheric Trade Conference. This allowed attendees to take advantage of a subsidized registration fee, and also participate in other Conference sessions and the Conference's social event.

I have been told that the proceedings will be available soon from the Western Hemispheric Trade Conference web site.

Also, the full-day workshop on PLS-SEM using the software WarpPLS was well attended. This workshop was fairly hands-on and interactive. Some participants had a great deal of expertise in PLS-SEM and WarpPLS. It was a joy to conduct the workshop!

As soon as we define the dates, we will be announcing next year’s PLS Applications Symposium. Like this year’s Symposium, it will take place in Laredo, Texas, probably in mid-April as well.

The partial least squares (PLS) method has increasingly been used in a variety of fields of research and practice, particularly in the context of PLS-based structural equation modeling (SEM). The focus of this Symposium is on the application of PLS-based methods, from a multidisciplinary perspective. For types of submissions, deadlines, and other details, please visit the Symposium’s web site:

http://plsas.net/

On 5 April 2017 a full-day workshop on PLS-SEM will be conducted by Dr. Ned Kock, using the software WarpPLS. Dr. Kock is the original developer of this software, which is one of the leading PLS-SEM tools today; used by thousands of researchers from a wide variety of disciplines, and from many different countries. This workshop will be hands-on and interactive, and will have two parts: (a) basic PLS-SEM issues, conducted in the morning (9 am - 12 noon); and (b) intermediate and advanced PLS-SEM issues, conducted in the afternoon (2 pm - 5 pm). Participants may attend either one, or both of the two parts.

Tuesday, March 7, 2017

How do we interpret the results of a model with an endogenous dichotomous variable? Let us use the model below to illustrate the answer to this question. In this model we have one endogenous dichotomous variable, “Success”, which is significantly and directly affected by two predictors: “Projmgt” and “JSat”. The direct effect of a third predictor, “ECollab”, is relatively small and only borderline significant.

Let us assume that the unit of analysis is a team of people. The variable “Success” is coded as 0 or 1, meaning that a team is either successful or not. After standardization, the 0 and 1 will be converted into a negative and a positive number. The standardized version of the variable “Success” will have a mean of zero and a standard deviation of 1.
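As a quick illustration of that standardization, using a hypothetical 0/1 outcome column (the data below is made up for illustration):

```python
from statistics import mean, pstdev

# Hypothetical team outcomes: 1 = successful, 0 = not successful.
success = [1, 1, 0, 1, 0, 0, 0, 1, 1, 0]

m, s = mean(success), pstdev(success)
standardized = [(x - m) / s for x in success]
# After standardization, all 0s map to a single negative value and all
# 1s to a single positive value; the mean is 0 and the SD is 1.
```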

One way to interpret the results is the following. The probability that a team will be successful (i.e., that “Success” > 0) is significantly affected by increases in the variables “Projmgt” and “JSat”.

In version 6.0 of WarpPLS, users will be able to calculate conditional probabilities as shown below, without having to resort to transformations based on assumed underlying functions, such as those performed by logistic regression. In this screen shot, only latent variables are used, and they are all assumed to be standardized.

In the screen shot above, we can see that the probability that a team will be successful (i.e., that “Success” > 0), if “Projmgt” > 1 and “JSat” > 1, is 52.2 percent. Stated differently, if “Projmgt” and “JSat” are high (greater than 1 standard deviation above the mean), then the probability of success is slightly greater than chance.

A probability of 52.2 percent is not that high. The reason it is not higher, in the context of the conditional probabilistic query above, is that we are not including the variable "ECollab" in the mix. Even so, “Projmgt” and “JSat” being high do not seem to be sufficient conditions for success, although they may be necessary conditions.

Consider a different set of conditional probabilities. If a team is successful (i.e., if “Success” > 0), what is the probability that “Projmgt” and “JSat” are low for that team? The answer, shown in the screen below, is 1.3 percent. That is a very low probability, suggesting that “Projmgt” and “JSat” matter as necessary, but not sufficient, elements for success.

These are among the conditional probabilistic queries that users will be able to make in version 6.0 of WarpPLS, which should be released in a few months. Bayes’ theorem is used to produce the answers to the queries.
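How Bayes' theorem answers such a query can be sketched in a few lines. The input probabilities below are hypothetical, chosen only so the worked example is consistent with the 1.3 percent figure discussed above; they are not values produced by the software:

```python
def bayes_conditional(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical inputs: suppose 10% of teams have both predictors low,
# 6.5% of such teams are successful, and 50% of all teams succeed.
p_low_given_success = bayes_conditional(0.065, 0.10, 0.50)
```

That is, the theorem inverts an easily estimated conditional probability, P(success | predictors low), into the one the query asks for, P(predictors low | success).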

Ned Kock

About Me

I am a researcher, software developer, consultant, and college professor. Two of my main areas of research are nonlinear variance-based structural equation modeling, and evolutionary biology as it applies to the study of human-technology interaction. My degrees are in engineering (B.E.E.), computer science (M.S.), and business (Ph.D.). I am interested in the application of science, statistics, and technology to the understanding of human health and behavior. Here I blog about statistics, and more specifically about nonlinear variance-based structural equation modeling and WarpPLS, the first software to enable this type of analysis. My personal web site contains my contact information and freely available articles related to the topic of this blog: nedkock.com.


Copyright

The contents of this blog may be used with proper attribution. Most of the issues covered here are also covered in the latest version of the WarpPLS User Manual. Therefore, you can cite the Manual to refer to issues covered here in this blog.