I am working with Demographic and Health Survey (DHS) data. DHS uses a two-stage cluster sampling design. In the first stage, clusters (primary sampling units, PSUs) are randomly selected with probability proportional to their size. In the second stage, 20-30 households are randomly selected from each cluster.
Most DHS surveys have more than 300 clusters. However, large countries such as India (included in this analysis) have far more clusters (25,000).
I have pooled 3-4 waves from each of 25 countries, giving 95 surveys in total. No PSU information is missing.

About the Model

My dependent variable is neghaz (the negative of height-for-age (cm/months)), which is continuous. The regression specification includes several control variables, including squared terms and interaction terms. It also includes variables calculated at the PSU/cluster level (mean employment rate in the cluster, etc.) and at the country level (GDP, average life expectancy, etc.). I have already de-normalized the weights.

The model, with random effects for both PSUs and surveys, failed to converge. I then tried a null model, which also failed to converge. I do not understand why the null model fails to converge when there are 95 surveys and every survey has at least 300 clusters.
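
To make the setup concrete, the null model I have in mind can be sketched as follows. The notation is my own, not taken from any Stata output, and it assumes a standard three-level random-intercept structure (child $i$ in PSU $j$ in survey $k$) with normally distributed, mutually independent effects:

$$
\text{neghaz}_{ijk} = \beta_0 + v_k + u_{jk} + e_{ijk}, \qquad v_k \sim N(0, \sigma_v^2), \quad u_{jk} \sim N(0, \sigma_u^2), \quad e_{ijk} \sim N(0, \sigma_e^2)
$$

Here $\beta_0$ is the overall intercept, $v_k$ is the survey effect, $u_{jk}$ is the PSU effect, and $e_{ijk}$ is the child-level residual.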

I also tried the null model after converting neghaz into a dichotomous variable, stunted, which takes the value 1 if the child is stunted (chronic malnutrition), and fitting it with xtmelogit. Convergence failed again.

Afterwards, I ran two separate two-level models, one with PSUs as the grouping level and one with surveys. Both models ran with the full control set. However, the standard errors differed between the two models.

ICC for model with PSU – 0.98; ICC for model with survey – 0.02
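
For reference, I am assuming the usual ICC definition for a two-level random-intercept linear model; this is my assumption, not something quoted from the Stata output. With $\sigma_g^2$ the variance of the group-level intercept (PSU or survey, depending on the model) and $\sigma_e^2$ the residual variance:

$$
\text{ICC} = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}
$$

Under this definition, 0.98 would mean that nearly all of the unexplained variance lies between PSUs, and 0.02 that very little lies between surveys.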

Questions

Can somebody help me understand why convergence is failing and how to fix it?

Can I safely neglect the survey random effects in this case?

Is there any other way of combining survey effects (random or fixed) with the PSU random effects?

I also tried models with only survey fixed effects (i.survey in plain OLS). However, the standard errors were different. Which model should I use in the end?

It's been a while since I used Stata, but did you try the difficult option?
– Robert Long, Feb 10 at 20:14

Yes, I did. It didn't work out.
– Mayank Agrawal, Feb 11 at 9:07

Then I would set this up as a Bayesian hierarchical model using BUGS, JAGS, Stan, etc. Stata has interfaces to these, and it's a good way to investigate convergence problems.
– Robert Long, Feb 11 at 9:59

Thanks a lot for the suggestion. But I have never used Bayesian statistics before. Can you please point me to a source where I can learn about the programs you mentioned (BUGS, JAGS and Stan)? I found that Stata also has a Bayes module for mixed models (bayes: mixed....). Can I use this directly for this analysis? Will it make sense? Sorry if the questions sound noobish.
– Mayank Agrawal, Feb 11 at 19:53

Yes, you can use that directly. The Stata documentation is very thorough; another resource is Bayesian Data Analysis by Gelman et al.
– Robert Long, Feb 11 at 20:07