Addressing large correlation matrices with missing information in MASEM with metaSEM

My apologies if I am double-posting this topic, as I am new to the site. Our team is currently conducting a MASEM analysis of 40 studies assessing a multiple-mediator model with treatment condition as a dichotomous predictor, posttreatment symptoms as the outcome, multiple mediators at posttreatment, and pretreatment symptoms and mediators as covariates. Some studies reported data for only some of the mediators, others reported no mediator data at all, but all reported data for the outcome. We know that too much missing data can lead to a nonpositive definite pooled correlation matrix. We are therefore coding all possible mediators now but intend to drop or combine mediator categories that have too little data. We would appreciate guidance on how to assess how much data is enough, within and between studies' correlation matrices, to run our analyses, so that we can decide what to drop or combine.

For example, should each cell of the pooled correlation matrix contain at least X% out of the 40 studies?

Should each study provide data in at least Y% of the cells in the correlation matrix?

We are grateful for any estimates for X and Y. This will help us to streamline our coding process.
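To make these two checks concrete, here is how such coverage figures could be tabulated. This is a plain-Python sketch of our own, independent of metaSEM; the helper name and the toy study data are hypothetical, with `None`-style omission represented by simply leaving a cell out of a study's dictionary.

```python
# Sketch: per-cell (X) and per-study (Y) coverage of a pooled correlation matrix.
# Each study is a dict mapping a cell (i, j), i < j, to the correlation it reports;
# cells a study did not report are simply absent.

def coverage(studies, p):
    """Return (cell coverage, study coverage) for p variables."""
    cells = [(i, j) for i in range(p) for j in range(i + 1, p)]
    # X: for each cell, the share of studies reporting that correlation
    cell_cov = {c: sum(c in s for s in studies) / len(studies) for c in cells}
    # Y: for each study, the share of cells it fills
    study_cov = [len(s) / len(cells) for s in studies]
    return cell_cov, study_cov

# Toy example: p = 3 variables (3 distinct correlations) and 3 studies
studies = [
    {(0, 1): 0.30, (0, 2): 0.25, (1, 2): 0.40},  # full matrix
    {(0, 1): 0.28},                               # one cell only
    {(0, 1): 0.35, (1, 2): 0.45},                 # two cells
]
cell_cov, study_cov = coverage(studies, p=3)
print(cell_cov)   # cell (0, 1) is covered by all studies; (0, 2) by one; (1, 2) by two
print(study_cov)  # studies fill 3/3, 1/3, and 2/3 of the cells, respectively
```

With real data, the two outputs are exactly the quantities asked about above: the cell coverage is the X% per pooled-matrix cell, and the study coverage is the Y% per study.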

In case this is helpful, we have attached syntax and data files with sample sizes and correlations for 3 studies (a test sample).

Replies to This Discussion

It is not easy to give a simple rule. You may consider a random-effects meta-analysis as a simple first analysis for estimating the means and the covariance matrix of the effect sizes.

There are 7 variables in your model, so there is a total of 7*6/2 = 21 effect sizes (correlation coefficients); the meta-analysis treats these 21 correlations as its variables. A random-effects meta-analysis then needs to estimate 21 means and 21 variances, even assuming independence among the effect sizes. There is no way to estimate these 42 parameters from only 3 studies with lots of missing values, as in your example.
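That parameter count can be verified with a few lines of plain Python (the helper below is for illustration only and is not part of metaSEM):

```python
# Back-of-the-envelope parameter count for the stage-1 random-effects model,
# assuming independence among the effect sizes as described above.

def masem_param_count(p):
    """Distinct correlations among p variables, and the parameters a
    random-effects model with independent effects must estimate."""
    k = p * (p - 1) // 2   # distinct correlations = pooled means to estimate
    return k, 2 * k        # plus one heterogeneity variance per correlation

k, n_params = masem_param_count(7)
print(k, n_params)  # 21 effect sizes, 42 parameters to estimate
```

Comparing the 42 parameters against the number of available studies (and their non-missing cells) makes the infeasibility of the 3-study example immediate.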

Thank you for your quick response and for directing us to the related thread. The thread provides useful suggestions on what to do when missing data lead to a nonpositive definite matrix. What we need now is general guidance on how to estimate the minimum amount of data likely to be sufficient for estimating the means and covariance matrix of the effect sizes in a model with X variables. We can simplify our model by reducing the number of variables if we don't have enough data, or we can drop studies that have too many missing cells in the correlation matrix. But we are unsure how much data is enough, or how many missing cells are too many. We would be grateful if you could suggest some guidelines or point us to helpful resources.

In our earlier post, we created a sample dataset of 3 studies to test out the syntax, but we have a total of 40 studies that contribute at least 1 correlation to the matrix. We are currently in the process of extracting correlations from the other studies, and it would be helpful to know at this point if we should reduce the number of variables in our model so we can adjust our data extraction procedures.

Thank you so much!

Best,

Mei Yi

Mike Cheung said:

Dear Mei Yi Ng & Katherine,

It is not easy to give a simple rule. You may consider a random-effects meta-analysis as a simple first analysis for estimating the means and the covariance matrix of the effect sizes.

There are 7 variables in your model, so there is a total of 7*6/2 = 21 effect sizes (correlation coefficients); the meta-analysis treats these 21 correlations as its variables. A random-effects meta-analysis then needs to estimate 21 means and 21 variances, even assuming independence among the effect sizes. There is no way to estimate these 42 parameters from only 3 studies with lots of missing values, as in your example.

I am afraid that I don't know of any simple guidelines. In fact, I don't think it is easy to derive such guidelines, as several factors affect the answer: for example, the number of variables, the percentage of missing data, and the pattern of missingness. Your strategy sounds reasonable.

Thanks for letting us know that there are no simple or clear guidelines. I guess we will have to make a decision now based on what is practical for extracting data from studies, and then use a trial-and-error approach later to reduce the number of variables in our model and/or drop studies that contribute too few correlations to the matrix. We will be glad to provide an update when we have a larger dataset to work with.

Best wishes,

Mei Yi

Mike Cheung said:

Dear Mei Yi,

I am afraid that I don't know of any simple guidelines. In fact, I don't think it is easy to derive such guidelines, as several factors affect the answer: for example, the number of variables, the percentage of missing data, and the pattern of missingness. Your strategy sounds reasonable.