analysis, visualisation and playing around with data

lavaan

Structural equation models are visualised with path diagrams. They are an important means of giving your audience easier access to the system of equations that represents the theory you want to test. A path diagram is a kind of flow chart that uses arrows to show direct and indirect causal links between your exogenous and endogenous variables, as well as between your latent and observed variables. Because structural equation models can become complex and contain many parameters describing the relationships between observed and latent variables, visualising them properly is an important step. The automatically produced path diagrams are often good enough while you work out your model, but they're not polished enough for publication. In this post, I'll show a selection of tools and their output.

There are many software solutions for structural equation modeling: LISREL, AMOS, MPLUS, STATA, SAS, EQS, the R packages sem, OpenMx and lavaan, and the standalone tool Onyx, just to name the most popular ones. Most of them have a built-in way to visualise their models. AMOS is a special case, because the model itself is specified by drawing a path diagram. Onyx can do this, too. This can make things easier, especially for beginners. Sometimes you can find these AMOS path diagrams published in articles.

In my experience, the other SEM tools (LISREL, MPLUS, STATA) don't produce very appealing diagrams, especially if your model is a little bigger. The R packages make considerably better attempts at visualising structural equation models. As a third option, you can use ordinary graphics software and type in the parameter estimates by hand. It seems to me that, at this point, this produces the highest-quality path diagrams.

Path diagrams consist of rectangles for observed variables, ellipses for latent variables, curves with arrowheads on both ends for correlations and, most importantly, straight lines with an arrowhead on one end as paths that link a predicting and a predicted variable. Here is an example of what it could look like:

In the rest of this blog entry, I will show you examples of path diagrams:

I don't have much experience with the semPlot package, but I think it offers a fast and good solution for CFA path diagrams or small SEM path diagrams. Bigger path diagrams will need more work. Here's a little example for a two-factor CFA:
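A minimal sketch of how such a diagram can be produced, assuming the lavaan and semPlot packages are installed. The model and the HolzingerSwineford1939 data set that ships with lavaan are stand-ins chosen for illustration; the layout and label settings are just one reasonable choice.

```r
library(lavaan)
library(semPlot)

# Two-factor CFA on the HolzingerSwineford1939 data bundled with lavaan
# (an illustrative stand-in model, chosen only for this sketch)
cfa_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
'
cfa_fit <- cfa(cfa_model, data = HolzingerSwineford1939)

# Draw the path diagram with standardized estimates as edge labels
semPaths(cfa_fit, what = "paths", whatLabels = "std",
         layout = "tree", edge.label.cex = 0.9)
```

For bigger models, the layout usually needs manual tuning before the diagram becomes readable.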

For the sem package by John Fox, there is a function named "pathDiagram()", which produces Graphviz/dot code that can be imported into Graphviz. The dot code is a description that defines the latent and manifest variables as nodes and the connections between them as edges of a diagram.
The semPlot package also supports the sem package.
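To make the idea of dot code more concrete, here is a minimal base-R sketch that writes such a description by hand. The helper `edges_to_dot()`, the variable names and the edge labels are hypothetical and only illustrate the node/edge structure; this is not the exact output of `pathDiagram()`.

```r
# Hypothetical helper: turn a small edge list into Graphviz dot code
edges_to_dot <- function(edges, latent) {
  nodes <- unique(c(edges$from, edges$to))
  # Latent variables are drawn as ellipses, observed variables as boxes
  shapes <- ifelse(nodes %in% latent, "ellipse", "box")
  node_lines <- sprintf('  "%s" [shape = %s];', nodes, shapes)
  edge_lines <- sprintf('  "%s" -> "%s" [label = "%s"];',
                        edges$from, edges$to, edges$label)
  paste(c("digraph sem {", "  rankdir = LR;", node_lines, edge_lines, "}"),
        collapse = "\n")
}

# Made-up paths and labels, for illustration only
edges <- data.frame(
  from  = c("latent_a", "latent_a", "latent_a", "latent_a"),
  to    = c("x1", "x2", "x3", "latent_b"),
  label = c("0.6", "0.6", "0.6", "0.321"),
  stringsAsFactors = FALSE
)

dot_code <- edges_to_dot(edges, latent = c("latent_a", "latent_b"))
cat(dot_code, "\n")

# To render it, save the text and run Graphviz on the file, e.g.
# writeLines(dot_code, "model.dot")   # then: dot -Tpdf model.dot -o model.pdf
```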

OpenMx: For OpenMx, a free SEM software that runs in R, exporting the model to dot code and plotting it with Graphviz is the recommended workflow.

Onyx: Onyx by Andreas Brandmaier is a free standalone SEM tool. It offers an AMOS-like graphical interface to specify the model and is capable of importing OpenMx code, but not lavaan code.

DiagrammeR: Twitter user @timelyportfolio (thank you!) recommended the R package DiagrammeR by Richard Iannone to me. It doesn't import fitted models from SEM packages, but its strengths are an easy syntax and a fast-growing feature list. I think it's well worth a try, because the path diagrams are not as hard to do as with Graphviz but are still reproducible.

UPDATE: Richard Iannone produced this example for me on Stack Overflow:

R

# development version from GitHub; a release is also available on CRAN
devtools::install_github("rich-iannone/DiagrammeR")
library(DiagrammeR)

grViz("
digraph SEM {

  graph [layout = neato,
         overlap = true,
         outputorder = edgesfirst]

  node [shape = rectangle]

  # nodes: circles are error terms, ellipses are latent variables
  a [pos = '-4,1!', label = 'e1', shape = circle]
  b [pos = '-3,1!', label = 'ind_1']
  c [pos = '-3,0!', label = 'ind_2']
  d [pos = '-3,-1!', label = 'ind_3']
  e [pos = '-1,0!', label = 'latent a', shape = ellipse]
  f [pos = '1,0!', label = 'latent b', shape = ellipse]
  g [pos = '1,1!', label = 'e6', shape = circle]
  h [pos = '3,1!', label = 'ind_4']
  i [pos = '3,-1!', label = 'ind_5']
  j [pos = '4,1!', label = 'e4', shape = circle]
  k [pos = '4,-1!', label = 'e5', shape = circle]

  # edges: paths with their estimates as labels
  a->b
  e->b [label = '0.6']
  e->c [label = '0.6']
  e->d [label = '0.6']
  e->f [label = '0.321', headport = 'w']
  g->f [tailport = 's', headport = 'n']
  d->c [dir = both]
  f->h [label = '0.6', tailport = 'ne', headport = 'w']
  f->i [label = '0.6']
  j->h
  k->i
}
")

This produces this path-diagram:

update on DiagrammeR for SEM
Recently, Tristan Mahr blogged a proof of concept showing that it's possible to convert a lavaan data frame into node and edge data frames for DiagrammeR. Wow, I'm really curious whether this approach will be pursued any further. Here is the link: https://rpubs.com/tjmahr/sem_diagrammer
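The general idea can be sketched in base R. The small parameter table below is typed by hand in the column layout of lavaan's `parameterEstimates()` (lhs, op, rhs, est) and its values are made up; with a fitted model you would use that function's output instead. This is only an illustration of the conversion step, not Tristan Mahr's code.

```r
# Hand-written stand-in for lavaan::parameterEstimates(fit); values are made up
params <- data.frame(
  lhs = c("latent_a", "latent_a", "latent_a", "latent_b", "latent_b", "latent_b"),
  op  = c("=~", "=~", "=~", "=~", "=~", "~"),
  rhs = c("x1", "x2", "x3", "y1", "y2", "latent_a"),
  est = c(1.00, 0.85, 0.92, 1.00, 0.78, 0.32),
  stringsAsFactors = FALSE
)

# Keep loadings and regressions only (a real table also holds "~~" rows)
params <- params[params$op %in% c("=~", "~"), ]

# Nodes: every variable once; latent variables are those left of "=~"
latent     <- unique(params$lhs[params$op == "=~"])
node_names <- unique(c(params$lhs, params$rhs))
nodes <- data.frame(
  node  = node_names,
  shape = ifelse(node_names %in% latent, "ellipse", "rectangle"),
  stringsAsFactors = FALSE
)

# Edges: loadings point from factor to indicator (lhs -> rhs),
# regressions point from predictor to outcome (rhs -> lhs)
is_loading <- params$op == "=~"
edges <- data.frame(
  from  = ifelse(is_loading, params$lhs, params$rhs),
  to    = ifelse(is_loading, params$rhs, params$lhs),
  label = sprintf("%.2f", params$est),
  stringsAsFactors = FALSE
)

nodes
edges
```

The two data frames hold everything a graph package needs: one row per variable with its shape, and one row per arrow with its label.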

another update on pathdiagrams in R
Stas Kolenikov from the University of Missouri published another example of SEM path diagrams in R on his website http://staskolenikov.net/graphviz_sem.html. Instead of DiagrammeR he uses Graphviz. A problem he encountered concerns displaying covariances as curved two-headed arrows. It's possible to do this, but as he writes, "their aesthetic appeal is probably not that great".

3. other / graphics software (selection)
If you want to use Graphviz or TikZ, you'll get very good-looking diagrams, but you'll also have to learn the "dot language" (for Graphviz) or TikZ's LaTeX-based syntax. If you have to do a lot of diagrams it can be worth learning, but for my purposes it's kind of overkill.
Here are some Graphviz examples: pathdiagram with Graphviz

This leads us to "normal" multi-purpose graphics software. Drawing the graphs with an office suite is pretty straightforward and self-explanatory. On the other hand, I wouldn't trust an office suite to keep everything in its place when I move the diagram around in a document.
Inkscape is a tool that's often mentioned by SEM analysts. At the moment, I'm giving yEd a try, which seems to be easy to use and quick at producing good-looking graphs. Dia could also be an alternative, but I haven't tried it yet.

request for tips
I'm really looking for best practices in drawing path diagrams for structural equation models. Please leave a comment if you know another tool that isn't listed, or if you have a workflow that can be adopted by others. I think there's a gap between working-state path diagrams and diagrams suitable for publication.

The R package lavaan is my favourite tool for fitting structural equation models (SEM). Its biggest advantages: it's free, it's open source and its range of functions is growing steadily. Before lavaan, I used MPLUS, which still has the widest functionality of all SEM tools and is the most sophisticated software for latent variable modeling. The Muthéns and their MPLUS team offer incredibly good support and documentation. The only problem is that the software isn't free, and without a license you can't get any of the support.
For me, one drawback of lavaan is that it can't fit latent class models or mixture models ... yet! Yves Rosseel is planning to add this in the next two years.

lavaan stands for "latent variable analysis". The package is available via CRAN and has a good tutorial on the lavaan project homepage. Models are specified via syntax. Thankfully, the lavaan syntax is kept pretty simple. At least, it's a lot easier than the syntax of LISREL (the first and original SEM software). But it's not as easy as drawing a path model in AMOS, the SPSS module. Anyway, once you get to slightly more complex models, you'll find working with syntax a lot more efficient. If you don't like working with syntax, I recommend having a look at Onyx, a graphical interface for structural equation modeling by Andreas Brandmaier. It's a free tool in which you can draw your SEM as a path diagram and generate the lavaan syntax from it. But when you do SEM, the syntax will be the least complicated thing you have to learn, so I don't think that will be a problem at all.

Install lavaan
If you want to use survey weights, you have to install lavaan, the survey package and lavaan.survey. lavaan is the package used for modeling, and the survey package converts your data into a survey-design object. After you have specified the model in a lavaan fit object and generated a survey-design object from your data, these two objects are passed to the lavaan.survey function, which calculates the weighted model.

First, you install the packages:

R

# install lavaan
install.packages("lavaan", dependencies = TRUE)
library(lavaan)

# install lavaan.survey
install.packages("lavaan.survey")
library(lavaan.survey)

# install the survey package
install.packages("survey")
library(survey)

Generate the survey-design object
After the packages and the data are loaded, a svydesign object is generated from our data. It's no surprise that with "id = ~ID" the column "ID" in the data frame is used as the id variable. With "weights = ~weight_trunc" the column that holds the survey weights is defined, and with "data = data" the data frame is chosen.

R

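library(survey)  # load the survey package

# read the data
data <- read.csv(file = "data.csv", header = TRUE, sep = ",")

# if necessary: recode the missing-value code 9 to NA
# (this hits every column, so make sure 9 is not a valid value anywhere)
data[data == 9] <- NA

# generate the survey-design object
svy.df <- svydesign(id = ~ID,
                    weights = ~weight_trunc,
                    data = data)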

Specifying the model
I'll use a simple structural equation model with two latent variables, measured by three and two indicator variables. The exogenous latent variable "latent_a" is measured by three indicators (F09_a to F09_c), the endogenous latent variable "latent_b" by two (F12_a and F12_b). The variable "latent_b" is regressed on (predicted by) "latent_a".

R

library(lavaan)

model_1 <- '
  # measurement model
  latent_a =~ F09_a + F09_b + F09_c
  latent_b =~ F12_a + F12_b

  # regressions
  latent_b ~ latent_a
'

lavaan.fit <- sem(model_1,
                  data = data,
                  estimator = "MLR",  # robust fit / when you have missing data
                  missing = "ml",     # FIML for missing data
                  mimic = "Mplus")

# you can run the model (unweighted) at this point and inspect it
summary(lavaan.fit, fit.measures = TRUE, standardized = TRUE)

Normally, I would use MLM as the estimator to get robust estimates (robust against non-normality of the endogenous variables), but in this case I chose MLR, because FIML is not available with MLM.
FIML (Full Information Maximum Likelihood, requested with missing="ml") is regarded as equally efficient as multiple imputation in handling item nonresponse. Still, it can be a good idea to do multiple imputation anyway, because bootstrapping the standard errors is only available with the ML estimator. On the other hand, it's an advantage that with FIML it's not necessary to model the missingness explicitly, because FIML uses the already specified SEM.
When using the lavaan.survey package, you can't use FIML (yet). If you have missing values, you have to do a multiple imputation of your data, and instead of MLR, lavaan.survey uses MLM as its default.
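The difference between the two estimators in the presence of missing values can be tried out on simulated data. The following is a minimal sketch, assuming lavaan is installed; the population loadings, the sample size and the variable names x1-x3 and y1-y2 are made up for illustration.

```r
library(lavaan)

# Simulate data from a population model (made-up loadings, for illustration)
pop_model <- '
  latent_a =~ 1*x1 + 0.8*x2 + 0.7*x3
  latent_b =~ 1*y1 + 0.9*y2
  latent_b ~ 0.5*latent_a
'
set.seed(123)
sim <- simulateData(pop_model, sample.nobs = 300)

# Set 30 values of one indicator to NA to mimic item nonresponse
sim$x2[sample(nrow(sim), 30)] <- NA

model <- '
  latent_a =~ x1 + x2 + x3
  latent_b =~ y1 + y2
  latent_b ~ latent_a
'

# MLR + FIML: incomplete cases still contribute to the estimation
fit_fiml <- sem(model, data = sim, estimator = "MLR", missing = "ml")

# MLM needs complete data, so incomplete cases are dropped listwise
fit_mlm <- sem(model, data = sim, estimator = "MLM", missing = "listwise")

# Number of cases used by each fit
c(fiml = lavInspect(fit_fiml, "nobs"), mlm = lavInspect(fit_mlm, "nobs"))
```

With FIML all 300 simulated cases are used, while the MLM fit is based on the 270 complete cases only.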

Fitting the model
When the model is fitted with lavaan.survey, the covariance matrix is estimated using the svyvar object generated by the survey package. lavaan then fits the model to this weighted covariance matrix with the MLM estimator. MLM is not compatible with missing="fiml", so if your data has missing values you have to do multiple imputation first and pass your imputed data frames as a list to svydesign() from the survey package, so that they become a survey-design object which can be used as data in lavaan.survey. The resulting parameters, fit indices and statistics will be adjusted for the sampling design. Also, if MLM is used, the chi-square (likelihood-ratio) test statistic will be transformed to a Satorra-Bentler corrected chi-square. [This information stems from the lavaan.survey documentation.] In lavaan, you can choose the form of your output. Because I worked a lot with MPLUS, I prefer the MPLUS output.
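Put together, the whole workflow can be sketched as follows. This is a minimal example under stated assumptions: lavaan, survey and lavaan.survey are installed and work together in your R setup, and the data, the ID column and the weights are simulated stand-ins (complete data, so no imputation step is shown).

```r
library(lavaan)
library(survey)
library(lavaan.survey)

# Simulated stand-in data with a made-up ID and weight column
pop_model <- '
  latent_a =~ 1*x1 + 0.8*x2 + 0.7*x3
  latent_b =~ 1*y1 + 0.9*y2
  latent_b ~ 0.5*latent_a
'
set.seed(42)
dat <- simulateData(pop_model, sample.nobs = 500)
dat$ID <- seq_len(nrow(dat))
dat$weight_trunc <- runif(nrow(dat), min = 0.5, max = 2)

model <- '
  latent_a =~ x1 + x2 + x3
  latent_b =~ y1 + y2
  latent_b ~ latent_a
'

# 1. unweighted lavaan fit (complete data, so MLM can be used)
fit <- sem(model, data = dat, estimator = "MLM")

# 2. survey-design object holding the weights
des <- svydesign(id = ~ID, weights = ~weight_trunc, data = dat)

# 3. refit the model, taking the sampling design into account
fit_svy <- lavaan.survey(lavaan.fit = fit, survey.design = des,
                         estimator = "MLM")

summary(fit_svy, fit.measures = TRUE, standardized = TRUE)
```

For multiply imputed data, the data argument of svydesign() would be an imputation list (for example built with the mitools package) instead of a single data frame.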