Structural equation modeling using gllamm, confa, and gmm

Stas Kolenikov

University of Missouri–Columbia

In this talk, I introduce the main ideas of structural equation models
(SEMs) with latent variables and Stata tools that can be used for such
models. The two approaches most often used in applied work are numeric
integration of the latent variables and covariance structure modeling. The
first approach is implemented in Stata via gllamm, which was
developed by Sophia Rabe-Hesketh. The second approach is currently
implemented in confa for confirmatory factor analysis models. Also, the
introduction of the generalized method of moments (GMM) estimation and
testing framework in Stata 11 makes it possible to estimate SEMs by using
moderately complex parameter and matrix manipulations. I provide working
examples with some popular datasets (Holzinger–Swineford factor
analysis model and Bollen’s industrialization and political democracy
model).

Although cluster-robust standard errors are now recognized as
essential in a panel-data context, official Stata only supports clusters
that are nested within panels. This rules out the possibility of defining
clusters in the time dimension, and modeling contemporaneous dependence of
panel units’ error processes. We build upon recent analytical
developments that define two-way (and conceptually, n-way) clustering, and the
2010 implementation of two-way clustering in the widely used ivreg2
and xtivreg2 packages. We illustrate the utility of one-way and two-way
clustering using Monte Carlo techniques, compare them with alternative
approaches to modeling error dependence, and consider tests for
clustering of errors.
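
As an illustration of the two-way idea, the following minimal Python sketch
computes a two-way cluster-robust variance for the simplest possible
estimator, a sample mean (a regression on a constant only). It uses the
additive form from the analytical literature: the variance clustered on units,
plus the variance clustered on time, minus the variance clustered on their
intersection. The function names and the toy data are hypothetical, no
small-sample corrections are applied, and this is not the ivreg2 or xtivreg2
code.

```python
from collections import defaultdict


def cluster_meat(resid, ids):
    """Sum over clusters of the squared within-cluster residual total."""
    totals = defaultdict(float)
    for u, c in zip(resid, ids):
        totals[c] += u
    return sum(s * s for s in totals.values())


def twoway_var_of_mean(y, unit, time):
    """Two-way cluster-robust variance of the sample mean of y.

    Assumption: no finite-sample adjustment; the 'bread' is simply 1/n.
    """
    n = len(y)
    mean = sum(y) / n
    resid = [v - mean for v in y]
    both = list(zip(unit, time))  # intersection clusters (unit, time)
    meat = (cluster_meat(resid, unit)
            + cluster_meat(resid, time)
            - cluster_meat(resid, both))
    return meat / n ** 2


# Toy balanced panel: two units observed in two periods (hypothetical data).
y = [1.0, 2.0, 3.0, 4.0]
unit = [1, 1, 2, 2]
time = [1, 2, 1, 2]
v = twoway_var_of_mean(y, unit, time)
```

Because the intersection term is subtracted, the result is not guaranteed to
be positive in small samples.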

Implementation of a multinomial logit model with fixed effects

Klaus Pforr

Mannheim Center for European Social Research (MZES)

Fixed-effects models have become increasingly popular in the field of
sociology. The possibility of controlling for unobserved heterogeneity makes
these models a prime tool for causal analysis.

To date, fixed-effects estimators have been derived and implemented in many
statistical software packages for continuous, dichotomous, and count-data
dependent variables. For other important and popular statistical models,
however, only population-average estimators are available; models for
multinomial categorical dependent variables are one example. Chamberlain
(1980) derived such a model in a seminal paper.
Possible applications would be analyses of effects on employment status with
special consideration of part-time or irregular employment and analyses of
the effects on voting behavior that implicitly control for longtime party
identification rather than having to measure it directly. This model has not
yet been implemented in any statistical software package.

In this presentation, I show a first version of an ado-file that closes this gap.
The implementation draws on the native Stata multinomial logit and
conditional logit model implementations. The actual ml evaluator
utilizes Mata functions to implement the conditional likelihood function.
To show the numerical stability and computational speed of the
implementation, comparison results with the built-in clogit
are shown, as well as some basic results with simulated data.

Plagiarism in student papers and cheating on exams: Results
from surveys using special techniques for sensitive questions

Ben Jann

University of Bern

Eliciting truthful answers to sensitive questions is an age-old problem in
survey research. Respondents tend to underreport socially undesired or
illegal behaviors, while overreporting socially desirable ones. To combat
such response bias, various techniques have been developed that are geared
toward providing the respondent greater anonymity and minimizing the
respondent’s feelings of jeopardy. Examples of such techniques are the
randomized response technique, the item count technique, and the crosswise
model. I will present results from several surveys, conducted among
university students, that employ such techniques to measure the prevalence of
plagiarism and cheating on exams. User-written Stata programs for analyzing
data from such techniques are also presented.
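
To make the logic of such techniques concrete, here is a minimal Python
sketch of a prevalence estimator for one common randomized response variant,
the forced-response design. The abstract does not say which designs were used,
so the design, the function name, and the numbers are assumptions for
illustration only. Each respondent answers truthfully with probability
p_truth and is otherwise instructed by a random device to say "yes" (overall
probability p_forced_yes) or "no".

```python
import math


def forced_response_estimate(n_yes, n, p_truth, p_forced_yes):
    """Estimate the prevalence of a sensitive trait from randomized answers.

    The observed 'yes' share equals p_truth * prevalence + p_forced_yes,
    so the prevalence is recovered by inverting that relation.
    """
    if not 0 < p_truth <= 1:
        raise ValueError("p_truth must lie in (0, 1]")
    share_yes = n_yes / n
    prevalence = (share_yes - p_forced_yes) / p_truth
    # Binomial standard error of the 'yes' share, scaled by 1 / p_truth.
    se = math.sqrt(share_yes * (1 - share_yes) / n) / p_truth
    return prevalence, se


# Hypothetical survey: 300 'yes' answers among 1,000 respondents.
est, se = forced_response_estimate(300, 1000, p_truth=0.75, p_forced_yes=0.125)
```

The division by p_truth shows the price of the added anonymity: the smaller
the share of truthful answers, the larger the standard error.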

orderalpha: Nonparametric order-α efficiency analysis for Stata

Harald Tauchmann

RWI

Despite their frequent use in applied work, nonparametric approaches to
efficiency analysis, namely data envelopment analysis (DEA) and free
disposal hull (FDH), have a bad reputation among econometricians. This is
mainly due to DEA and FDH representing deterministic approaches that are
highly sensitive to outliers and measurement errors. Recently, however,
so-called partial frontier approaches have been developed, namely order-m
(Cazals, Florens, and Simar, 2002, Journal of Econometrics 106: 1–25) and
order-α (Aragon, Daouia, and Thomas-Agnan, 2005, Econometric Theory 21:
358–389). They generalize FDH by allowing for super-efficient observations to
be located beyond the estimated production-possibility frontier. Although
these methods
are purely nonparametric too, sensitivity to outliers is substantially
reduced by partial frontier approaches enveloping just a subsample of
observations. I present the new Stata command orderalpha that
implements order-α efficiency analysis in Stata. The command allows for
several options, such as statistical inference based on subsampling
bootstrap. In addition, I present the accompanying Stata command
oaoutlier, which is an explorative tool that employs
orderalpha for detecting potential outliers in data meant for
subsequent efficiency analysis using DEA.
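
The following Python sketch illustrates the idea behind an input-oriented
order-α score. It is a simplified reading of the quantile-based definition,
not the code of the orderalpha command, and the function name, the quantile
convention, and the toy data are assumptions. For unit i, the peers are all
units producing at least as much of every output. Each peer's largest input
ratio relative to i is computed, and the score is the smallest ratio such that
more than a share 1 − α of the peers lie at or below it. With α = 1 this is
the FDH score (the minimum ratio).

```python
import math


def order_alpha_input(inputs, outputs, i, alpha=1.0):
    """Input-oriented order-alpha efficiency score of unit i (sketch).

    inputs, outputs: lists of per-unit lists of strictly positive numbers.
    alpha: a value in (0, 1]; alpha = 1 reproduces the FDH score.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    xi, yi = inputs[i], outputs[i]
    ratios = []
    for xj, yj in zip(inputs, outputs):
        # Peer condition: unit j produces at least as much of every output.
        if all(b >= a for a, b in zip(yi, yj)):
            ratios.append(max(b / a for a, b in zip(xi, xj)))
    ratios.sort()
    m = len(ratios)
    # Smallest rank k with k / m > 1 - alpha (tolerance guards rounding).
    k = min(m, math.floor(m * (1 - alpha) + 1e-9) + 1)
    return ratios[k - 1]


# Hypothetical data: four units, one input, identical output.
inputs = [[2.0], [4.0], [3.0], [1.0]]
outputs = [[1.0], [1.0], [1.0], [1.0]]
fdh_best = order_alpha_input(inputs, outputs, 3, alpha=1.0)
partial_best = order_alpha_input(inputs, outputs, 3, alpha=0.75)
```

In this toy example the unit with the smallest input is FDH-efficient at
α = 1 and obtains a score above 1 at α = 0.75, which is what
"super-efficient" means here.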

Investigating the effects of factor variables

Jeff Pitblado

StataCorp

Stata has a rich set of operators for specifying factor variables in linear
and nonlinear regression models. I will show how to test for the effects
of factor variables in these models. I will also show how to compare and
contrast these effects using linear combinations of the model coefficients.

Correlation metric

Kristian B. Karlson

Danish National Center for Social Research and the Center for Research in Compulsory Schooling

The logit model is a widely used regression technique in social research.
However, the use and interpretation of coefficients from logit models have
proven contentious. Problems arise because the mean and the variance
of discrete variables cannot be separated. Logit coefficients are identified
relative to an arbitrary scale, which makes the coefficients difficult both
to interpret and to compare across groups or samples.

Do differences in coefficients reflect true differences or differences in
scales? This cross-sample comparison problem raises concerns for comparative
research. We suggest a new correlation metric, derived from logit models,
that gives a new interpretation to the estimates of logit models (log
odds-ratios). The metric leads the way to a reorientation of the use of
logit models, because it helps to clarify what logit coefficients are and
how and when logit coefficients can (or cannot) be used in comparative
research. The metric recovers the correlation between a predictor variable x
and a continuous latent outcome variable y* assumed to underlie a binary
observed outcome y. This metric is truly invariant to differences in the
marginal distributions of x and y* across groups or samples, making it
suitable for situations met in real applications in comparative research.
Our derivations also extend to the probit and to ordered and multinomial
models. The new metric is implemented in the Stata command nlcorr.
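
As a rough illustration of what such a metric measures, the Python sketch
below converts a logit coefficient into the implied correlation between x and
the latent y*. It assumes a single predictor and the usual latent-variable
reading of the logit, y* = a + b·x + e, with a standard logistic error
(variance π²/3). The function name is hypothetical, and the abstract does not
say whether nlcorr computes exactly this quantity.

```python
import math


def latent_correlation(b, sd_x):
    """Correlation between x and latent y* implied by a logit slope b.

    Assumes y* = a + b * x + e with e standard logistic, so that
    Var(y*) = b**2 * Var(x) + pi**2 / 3.
    """
    var_logistic = math.pi ** 2 / 3
    return b * sd_x / math.sqrt(b ** 2 * sd_x ** 2 + var_logistic)


# Hypothetical estimate: a log odds-ratio of 0.8 for x with standard deviation 1.5.
r = latent_correlation(0.8, 1.5)
```

Unlike b itself, this quantity does not depend on the arbitrary scale of y*,
because the scale cancels between numerator and denominator.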

Comparing coefficients between nested nonlinear probability models

Ulrich Kohler

WZB

In a series of recent articles, Karlson, Holm, and Breen have developed a
method for comparing estimated coefficients of nested nonlinear probability
models. The KHB method is a general decomposition method that is unaffected
by the rescaling or attenuation bias that arises in cross-model comparisons
in nonlinear models. It recovers the degree to which a control variable Z
mediates or explains the relationship between X and a latent outcome
variable Y* underlying the nonlinear probability model. It also
decomposes effects of both discrete and continuous variables, applies to
average partial effects, and provides analytically derived statistical
tests. The method can be extended to other models in the generalized linear
model family. This presentation describes the method and the user-written
program khb that implements it.
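
The core of the rescaling argument can be sketched in a few lines. The
notation is introduced here for illustration and is not taken from the
abstract: subscript F marks the full model, subscript R the reduced model,
σ_F is the scale of the full-model error, and z̃ is the residual from a linear
regression of Z on X.

```latex
\begin{align}
  y^* &= \alpha_F + \beta_F x + \gamma_F z + \varepsilon,
      & b_F &= \beta_F / \sigma_F \\
  y^* &= \alpha_R + \beta_R x + \gamma_F \tilde{z} + \varepsilon,
      & \tilde{b}_R &= \beta_R / \sigma_F \\
  \tilde{b}_R - b_F &= (\beta_R - \beta_F) / \sigma_F
\end{align}
```

Because z̃ is uncorrelated with x by construction, adding it to the reduced
model leaves the total effect β_R unchanged while keeping the error term, and
hence the scale, identical to that of the full model. The difference of the
two estimated coefficients therefore reflects mediation only, not rescaling.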

Currently, we observe in the social and behavioral sciences an increasing
demand for complex longitudinal household survey data for national and
cross-national analyses. The state of the art (for national as well as
international comparative data collections) provides two types of solutions:
either the full presentation of all original wave-specific variables over
time or the creation of fixed variables according to common time-consistent
standards. The first type of solution leaves it to the researcher to choose
how to encapsulate differing categories over time, and thus, it is rather
time-consuming. The second type of solution is very easy to use; however,
it does not provide the user with information on possibly necessary annual
extensions or modifications for specific years. In both cases, the
researcher has no further information on potential changes of variables over
time. This paper addresses the topic of how complex representative
longitudinal data can be disseminated for analyses in the social and
behavioral sciences such that the amount of time for data preparation is
reduced to a minimum while information on consistency and changes of
variables over time remains fully available. It turns out that if we want
to monitor changes in living conditions by permanent, regular observations
using panel surveys, adaptations in variables seem to be the rule rather
than the exception. Therefore, our solution for the restructuring of
longitudinal data fulfils the requirements of permanently ongoing
adaptations in variables as a reflection of adapted measures according to
new social conditions, new theoretical backgrounds, or improved conceptual
measures when monitoring changes in living conditions directly over time.

Using Stata, we provide a conceptual and technical solution for how to
restructure the full set of SOEP variables with a complete documentation of
all adaptations over time. Our Stata programs generate two output files: one
covering the restructured data and another one for the full documentation on
the consistency of the variables over all waves. SOEPlong was first released
in 2010 as a beta version, together with the usual data
dissemination on DVD for the full set of SOEP variables for 26 waves of
data. While the paper is specifically addressed to the German
Socio-Economic Panel (SOEP) study, our general approach on how to deal with
complex household panel data might well be applied to other national and
cross-national longitudinal household surveys.