Tricks with Hicks: Stata gmm code for nonlinear GMM

Carl Nelson

University of Illinois–Urbana–Champaign

In a June 2009 American Economic Review article entitled
“Tricks with Hicks: The EASI demand system”, Arthur Lewbel and
Krishna Pendakur proposed the exact affine Stone index (EASI) demand system. This
system allows for Engel curves of rank higher than 3, demographic effects, and
unobserved heterogeneity in tastes. The American Economic Review web supplement for the article
provides Stata code to estimate linear and iterative linear versions of
the model, but the full nonlinear system instrumental-variables estimates
were obtained with the TSP econometric software, using its frml command to obtain
nonlinear three-stage least-squares estimates. I present Stata code that uses the gmm command
to estimate the nonlinear exact affine Stone index demand system. This is an example of the
important estimation extensions made possible by the
introduction of the gmm command.

xtmixed and denominator degrees of freedom: Myth or magic

Phil Ender

UCLA Statistical Consulting Group

I review issues and controversy surrounding F-ratio denominator degrees
of freedom in linear mixed models. I will look at the
history of denominator degrees of freedom and survey their use in
various statistical packages.

Using the margins command to estimate and interpret adjusted predictions
and marginal effects

Richard Williams

University of Notre Dame

As Long and Freese show, it can often be helpful to compute
predicted and expected values for hypothetical or prototypical cases. Stata 11
introduced new tools—factor variables and the margins
command—for making such calculations. These can do many of the things
that were previously done by Stata’s own adjust and mfx
commands, as well as Long and Freese’s spost9 commands like
prvalue. Unfortunately, the complexity of the margins syntax, the
daunting 50-page reference manual entry that describes it, and a lack of
understanding about what margins offers over older commands may have
dissuaded researchers from using it. This paper therefore shows how
margins can easily replicate analyses done by older commands. It
demonstrates how margins provides a superior means for dealing with
interdependent variables (for example, X and X²; X1,
X2, and X1 × X2; multiple dummies created from a
single categorical variable), and is also superior for data that are
svyset. The paper explains how the new asobserved option works
and the substantive reasons for preferring it over the atmeans
approach used by older commands. The paper primarily focuses on the
computation of adjusted predictions, but also shows how margins has
the same advantages for computing marginal effects.
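The practical difference between the asobserved and atmeans approaches can be seen in a small example. The following minimal Python sketch is not Stata's margins code. It assumes a fitted logit model with hypothetical coefficients and two hypothetical observations. It contrasts the prediction at the mean of x (atmeans) with the average of the predictions at each observed x (asobserved).

```python
import math

def logistic(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted logit model: Pr(y = 1) = logistic(b0 + b1 * x)
b0, b1 = -1.0, 2.0
x_obs = [0.0, 2.0]  # hypothetical observed covariate values

# atmeans: plug the sample mean of x into the model once
x_mean = sum(x_obs) / len(x_obs)
pred_atmeans = logistic(b0 + b1 * x_mean)

# asobserved: predict for every observation at its own x, then average
pred_asobserved = sum(logistic(b0 + b1 * x) for x in x_obs) / len(x_obs)

print(f"atmeans:    {pred_atmeans:.4f}")
print(f"asobserved: {pred_asobserved:.4f}")
```

Because the logistic function is nonlinear, the two numbers differ (roughly 0.73 versus 0.61 here). A prediction at the means need not describe any actual case in the data.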

Using margins to test for group differences in growth
trajectories in generalized linear mixed models

Sarah Mustillo (with L.R. Landerman and K.C. Land)

Purdue University, Duke University School of Medicine, and Duke University

To test for group differences in growth trajectories in mixed (fixed and
random-effects) models, researchers frequently interpret the coefficient of
group-by-time product terms. While this practice is straightforward in
linear mixed models, testing for group differences in generalized linear
mixed models is more complex. Using both an empirical example and simulated
data, we show that the coefficients of group-by-time product terms in mixed
logistic and Poisson models estimate the multiplicative change with respect
to the baseline rates, while researchers often are more interested in
differences in the predicted rate of change between groups. The latter can
be obtained by using the margins command in Stata. This may be
especially desirable when the mean of the outcome variable is low and
marginal change differs from multiplicative change. We propose and
illustrate the use of margins to interpret group differences in rates
of change over time following estimation with generalized linear mixed models.
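The gap between multiplicative and additive group differences can be shown with a short computation. This minimal Python sketch uses only the fixed part of a Poisson model, log(mu) = b0 + b1*time + b2*group + b3*group*time, ignoring the random effects. All coefficient values are hypothetical. The product-term coefficient is set to zero, so the multiplicative change is identical across groups. The predicted change in the rate still differs by group.

```python
import math

def poisson_mean(b, group, time):
    """Expected count from log(mu) = b0 + b1*time + b2*group + b3*group*time."""
    b0, b1, b2, b3 = b
    return math.exp(b0 + b1 * time + b2 * group + b3 * group * time)

# Hypothetical coefficients: baseline rate 0.1, the rate doubles per time unit,
# group 1 starts at three times the rate of group 0, and NO product-term effect.
b = (math.log(0.1), math.log(2.0), math.log(3.0), 0.0)

change = {}
for g in (0, 1):
    # Change in the predicted rate from time 0 to time 1 within each group
    change[g] = poisson_mean(b, g, 1) - poisson_mean(b, g, 0)

rate_ratio_of_ratios = math.exp(b[3])   # multiplicative group difference
diff_in_change = change[1] - change[0]  # additive group difference

print("ratio of rate ratios:", rate_ratio_of_ratios)
print("difference in predicted change:", round(diff_in_change, 4))
```

In this example, exp(b3) = 1 suggests no group difference. However, group 1's predicted rate rises by about 0.3 against about 0.1 for group 0. A margins-style comparison of predicted changes is designed to expose this kind of additive difference.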

Graphics tips for all

Nicholas J. Cox

Durham University, United Kingdom

Stata’s graphics were completely rewritten for Stata 8, with further
key additions in later versions. The official graphics commands have, as usual, been
supplemented by a variety of user-written programs. The resulting variety
presents even experienced users with a system that undeniably is large,
often appears complicated, and sometimes seems confusing. In this talk, I
provide a personal digest of graphics strategy and tactics for Stata users;
I emphasize details large and small that, in my view, deserve to be known by
all.

Stata as a data-entry management tool

Ryan Knight

Innovations for Poverty Action

It is increasingly common for social scientists to be involved in primary
data collection, whether through the administration of unique survey
instruments or the execution of field experiments. Novel datasets present
novel challenges for researchers, who may find themselves responsible for
ensuring that any information collected is entered into the computer
accurately. This presentation discusses why and how one might use Stata as a
tool for data-entry management and introduces three new user-written
commands that streamline the data-entry process. The commands are:
cfout, which is an extension of the cf command that outputs a user-friendly
list of all discrepancies between two datasets (for example, the first and second
entry of a double-entered dataset); readreplace, which makes many
replacements to a dataset, based on a corrected list of the discrepancies
generated by cfout; and mergeall, which merges many files without
loss of information due to string and numeric differences. This suite of
commands can help reduce the cost and increase the accuracy of primary
data collection, and it extends Stata’s data-management capabilities to
include the management of data entry.
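The double-entry workflow pairs a comparison step with a correction step. This minimal Python sketch is not the cfout or readreplace code. The function names compare_entries and apply_corrections are hypothetical, and each dataset is assumed to be a list of records keyed by a unique ID. The first function lists every discrepancy between two entries of the same records. The second applies a list of verified corrections.

```python
def compare_entries(first, second, id_key):
    """List every (id, variable, first value, second value) that disagrees.

    first, second: lists of dicts, one dict per record, keyed by variable name.
    Records present in only one dataset are skipped here for brevity.
    """
    second_by_id = {rec[id_key]: rec for rec in second}
    discrepancies = []
    for rec in first:
        other = second_by_id.get(rec[id_key])
        if other is None:
            continue
        for var, value in rec.items():
            if var != id_key and other.get(var) != value:
                discrepancies.append((rec[id_key], var, value, other.get(var)))
    return discrepancies

def apply_corrections(records, corrections, id_key):
    """Replace values in place using a list of (id, variable, corrected value)."""
    by_id = {rec[id_key]: rec for rec in records}
    for rid, var, new_value in corrections:
        by_id[rid][var] = new_value
    return records

# Hypothetical double-entered data: the second entry mistyped one age
first = [{"id": 1, "age": 34, "village": "A"}, {"id": 2, "age": 51, "village": "B"}]
second = [{"id": 1, "age": 43, "village": "A"}, {"id": 2, "age": 51, "village": "B"}]

diffs = compare_entries(first, second, "id")
print(diffs)

# After checking the paper form, the correct value is recorded and applied
fixed = apply_corrections(second, [(1, "age", 34)], "id")
```

Keeping the discrepancy list as data, rather than fixing values by hand, leaves an auditable record of every change made to the dataset.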

Universal and mass customization of tables in Stata

Roy Wada

University of Illinois–Chicago

There is a strong demand for a systematic and uniform approach to
table-making, yet such an approach is currently believed to be implausible or
nonexistent in Stata. There is also an impression that tabulation tables
are inherently different from summary tables or regression tables. This
presentation shows that it is possible to design a programmatic, universal
solution once the similarities between the apparently different types of
tables are understood. The universal approach to table-making is implemented
in the latest version of outreg2. Thus various types of tables, including
cross-tabulations and stub-and-banner tables, can be mass-customized and
readily produced in Stata.

In this talk, I will discuss ways of using Stata to fit fractional
response models when explanatory variables are not exogenous. Two questions
are of primary concern: First, how does one account for endogenous
explanatory variables, both continuous and discrete, when the response
variable is fractional and may take values at the corners? Second, how can
we incorporate unobserved heterogeneity in panel-data fractional models when
the panel might be unbalanced? I will draw on Papke and Wooldridge (2008,
Journal of Econometrics 145: 121–133) and two unpublished
papers of mine, “Quasi-maximum likelihood estimation and testing for
nonlinear models with endogenous explanatory variables” and
“Correlated random effects models with unbalanced panels”. One
practically important conclusion is that by expanding the scope of existing
Stata commands to allow fractional responses—in particular, the
ivprobit, biprobit, hetprob, and (user-written)
gllamm commands—flexible fractional response models can easily
be fit.
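These fractional response models build on the Bernoulli quasi-likelihood used by Papke and Wooldridge. That objective is well defined for any outcome in [0, 1], including the corners. This minimal Python sketch evaluates the objective for a fractional probit mean E(y|x) = Phi(x'beta) and maximizes it by a crude grid search. It ignores endogeneity and unobserved heterogeneity, uses hypothetical data, and is not the Stata implementation.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bernoulli_quasi_loglik(beta, X, y):
    """Bernoulli quasi-log-likelihood for a fractional probit mean Phi(x'beta).

    y may take any value in [0, 1], including the corners 0 and 1.
    """
    total = 0.0
    for xi, yi in zip(X, y):
        index = sum(b * v for b, v in zip(beta, xi))
        p = min(max(norm_cdf(index), 1e-12), 1 - 1e-12)  # guard against log(0)
        total += yi * math.log(p) + (1.0 - yi) * math.log(1.0 - p)
    return total

y = [0.0, 0.1, 0.4, 1.0, 0.0]  # hypothetical fractional outcomes, including corners
X = [[1.0] for _ in y]         # intercept-only design, to keep the example small

# Crude grid search over the intercept (a real estimator would use Newton steps)
grid = [i / 1000.0 for i in range(-3000, 3001)]
beta_hat = max(grid, key=lambda b: bernoulli_quasi_loglik([b], X, y))

# With only an intercept, the maximizer satisfies Phi(beta) = mean(y), here 0.3
print(round(beta_hat, 3), round(norm_cdf(beta_hat), 3))
```

The estimator is consistent whenever the conditional mean is correctly specified, even though y is not binary. This is why commands written for binary outcomes can be extended to fractional ones.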

Causal inference for binary regression with observational data

Austin Nichols

Urban Institute

Special problems arise when trying to do causal inference for binary
regression with observational data; we will examine some of these problems
and critically assess several common and not-so-common solutions.

Estimating the parameters of simultaneous-equations models with the sem command in Stata 12

David M. Drukker

StataCorp

In this talk, I introduce Stata 12’s new sem command for
estimating the parameters of
simultaneous-equations models. Some of the models considered
contain unobserved factors. Estimation methods include maximum likelihood
and the generalized method of moments.

Calculating bronchiolitis severity using ordinal regression with a new function in Stata

Carl Mitchell (with Paul Walsh)

Kern County Medical Center Department of Emergency Medicine/UCLA

A new command has been developed implementing a previously validated tool
for describing bronchiolitis severity. Bronchiolitis is one of the most
common causes of hospital admission for infants, and it is widely studied.
This command classifies predicted severity of illness by using an ordinal
regression model. Optionally, the user can obtain the predicted probability of
hospital admission and the probability of an infant falling into a
severity-of-illness classification different from the one predicted.

Teaching statistics with Stata in emergency medicine (EM) journal club

Muhammad Waseem

Lincoln Medical and Mental Health Center

Residency training is an important period in which a physician learns and
acquires the skills needed to search for, evaluate, and apply medical
knowledge. The journal club is an academic event and an important forum for
this purpose. Its objective is to develop the skills needed to
find, appraise, and implement practice-changing advancements in the
medical literature. We report our experience using Stata in journal club to
teach emergency medicine residents statistics in addition to critical appraisal
skills. To understand and use the current literature effectively, an
understanding of basic statistical methods is essential. We introduced Stata
while discussing the methods and results sections of an article in the
journal club to teach the application of some common statistical tests.
Published studies were selected to illustrate commonly used statistical
concepts and provide insight into them. We noted that improved understanding of
statistics increased residents' interest in and enthusiasm for
participating in journal club. Integrating a statistical software program such
as Stata into journal club can serve as an important tool to enhance learning.
Further studies should be conducted to fully utilize these
opportunities for enhanced learning of in-training physicians.

Use of cure fraction models for the survival analysis of uterine cancer patients

Noori Akhtar-Danesh (with Alice Lytwyn and Laurie Elit)

McMaster University

In population-based cancer studies, a cure fraction model
classifies patients into those who survive the cancer and those who
encounter excess mortality risk compared with the general population
(2007, Stata Journal 7: 1–25). In
this presentation, we report the proportion cured and the relative survival
pattern for patients diagnosed with uterine cancer in Canada over the period
of 1992–2005. We used a nonmixture cure fraction model to estimate
the cure fraction rate and the relative survival among “uncured”
patients (2007, Stata Journal 7: 1–25). Then we predicted the cure fraction rate and median survival
for each age group based on the year of diagnosis. Relative
survival and cure fraction rate decreased with age but increased gradually
over time. Relative survival in Eastern Canada and Ontario was lower
than in the other regions, and the same pattern held for cure fraction
rates across the geographical regions. This is
the first study to use a cure fraction model for the analysis of uterine cancer.
Although there are some limitations attached to this model, it is flexible
enough to be used with different parametric distributions and to include
different link functions for relative survival analysis.
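In the nonmixture cure fraction model, relative survival is R(t) = pi^F(t). Here pi is the cure fraction and F is a distribution function. Relative survival among the "uncured" is then (pi^F(t) - pi) / (1 - pi). This minimal Python sketch assumes a Weibull form for F and uses hypothetical parameter values. It is not the Stata estimation code and does not fit anything to data. It evaluates these quantities and finds the median survival time of the uncured by bisection.

```python
import math

def weibull_cdf(t, lam, gamma):
    """Weibull distribution function F(t) = 1 - exp(-lam * t**gamma)."""
    return 1.0 - math.exp(-lam * t ** gamma)

def relative_survival(t, pi, lam, gamma):
    """Nonmixture cure model: R(t) = pi ** F(t); R(0) = 1 and R(t) -> pi."""
    return pi ** weibull_cdf(t, lam, gamma)

def uncured_survival(t, pi, lam, gamma):
    """Relative survival among the 'uncured': (pi**F(t) - pi) / (1 - pi)."""
    return (relative_survival(t, pi, lam, gamma) - pi) / (1.0 - pi)

def uncured_median(pi, lam, gamma, lo=0.0, hi=100.0, tol=1e-8):
    """Median survival time of the uncured, by bisection on S_u(t) = 0.5."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if uncured_survival(mid, pi, lam, gamma) > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical parameter values, for illustration only
pi, lam, gamma = 0.8, 0.3, 1.2
print(relative_survival(5.0, pi, lam, gamma), uncured_median(pi, lam, gamma))
```

In a real analysis, pi and the distribution parameters would be modeled as functions of covariates such as age group and year of diagnosis. Predictions like those in this study would then be computed from the fitted values in the same way.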

Texas A&M Health Science Center School of Rural
Public Health and University of Texas School of Public Health

Modern genome-wide association studies typically rely on
single nucleotide polymorphism (SNP) chip technology to determine hundreds
of thousands of genotypes for an individual sample. Once these genotypes are
ascertained, each SNP (alone or in combination) is tested for association
with outcomes of interest, such as disease status or severity. Project Heartbeat!
was a longitudinal study conducted in the 1990s that explored changes in
lipids and hormones, as well as morphological changes, in children aged 8 to 18
years. A genome-wide association study is currently being conducted to look
for SNPs that are associated with these developmental changes. While there
are specialty programs available for the analysis of hundreds of thousands
of SNPs, they are not capable of modeling longitudinal data. Stata is
well-equipped for modeling longitudinal data but cannot load hundreds of
thousands of variables into memory simultaneously. This talk will briefly
describe the use of Mata to import hundreds of thousands of SNPs from the
Illumina SNP chip platform and how to load those data into Stata for
longitudinal modeling.

Graphics tricks for models

Bill Rising

StataCorp

Visualizing interactions and response surfaces can be difficult. In this
talk, I will show how to visualize the former by graphing adjusted means and the
latter by rolling together contour plots. I will demonstrate
this for both linear and nonlinear models.

Malmquist productivity analysis using DEA frontier in Stata

Choonjoo Lee

Korea National Defense University

This presentation describes a procedure and an illustrative
application of a user-written Malmquist productivity analysis (MPA) command using
the data envelopment analysis (DEA) frontier in Stata. MPA measures
productivity changes of units between time periods. When panel data are
available, MPA has been widely used to assess productivity changes in the
public and private sectors, for example, in banks, airlines, hospitals,
universities, defense firms, and manufacturers. The MPA using the DEA frontier
will allow Stata users to take not only the stochastic approach to
productivity analysis, using stochastic frontier analysis, but also the nonstochastic
approach using the DEA frontier, also suggested by the author. The user-written
MPA command also points to possible future extensions of Stata
programming in productivity analysis.
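The Malmquist index compares a unit's distance to the DEA frontiers of two periods. This minimal Python sketch is not the author's Stata command. It covers only the special case of one input, one output, and constant returns to scale. In that case the DEA frontier reduces to the best observed output/input ratio, so no linear program is needed. The data are hypothetical. The index is split into efficiency change (catching up to the frontier) and technical change (shift of the frontier).

```python
import math

def crs_distance(x, y, frontier):
    """Output distance function under CRS with one input and one output.

    frontier: list of (input, output) pairs defining the period's DEA frontier;
    the result is <= 1 for units on or inside that frontier.
    """
    best = max(out / inp for inp, out in frontier)
    return (y / x) / best

def malmquist(unit_t, unit_t1, frontier_t, frontier_t1):
    """Output-oriented Malmquist index (geometric-mean form) from t to t+1.

    Returns (index, efficiency_change, technical_change) with index = EC * TC;
    values above 1 indicate productivity growth.
    """
    d_t_t = crs_distance(*unit_t, frontier_t)      # D^t(x^t, y^t)
    d_t_t1 = crs_distance(*unit_t1, frontier_t)    # D^t(x^{t+1}, y^{t+1})
    d_t1_t = crs_distance(*unit_t, frontier_t1)    # D^{t+1}(x^t, y^t)
    d_t1_t1 = crs_distance(*unit_t1, frontier_t1)  # D^{t+1}(x^{t+1}, y^{t+1})
    index = math.sqrt((d_t_t1 / d_t_t) * (d_t1_t1 / d_t1_t))
    efficiency_change = d_t1_t1 / d_t_t
    technical_change = math.sqrt((d_t_t1 / d_t1_t1) * (d_t_t / d_t1_t))
    return index, efficiency_change, technical_change

# Hypothetical (input, output) data for three units in two periods
frontier_t = [(2.0, 4.0), (4.0, 6.0), (5.0, 5.0)]
frontier_t1 = [(2.0, 5.0), (4.0, 7.0), (5.0, 6.0)]
unit_t, unit_t1 = (4.0, 6.0), (4.0, 7.0)  # the second unit in both periods

index, ec, tc = malmquist(unit_t, unit_t1, frontier_t, frontier_t1)
print(round(index, 4), round(ec, 4), round(tc, 4))
```

In this single-input, single-output CRS case, the index equals the ratio of the unit's own productivities (1.75 / 1.5). With several inputs and outputs, each distance would instead come from a DEA linear program.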

An interpretation and implementation of the Theil–Goldberger
“mixed” estimator

Christopher Baum

Boston College and DIW Berlin

In the early 1960s, Theil and Goldberger proposed a
generalized least-squares approach to “mixing” sample
information and prior beliefs about the coefficients of a regression
equation. Their “mixed” estimator may be considered as a
stochastic version of constrained least squares (Stata’s
cnsreg). Although based on frequentist statistics, the Theil–Goldberger estimator
is identical to that used in a Bayesian estimation approach when an
informative prior density is employed. It may also be
viewed as a one-shot application of the Kalman filter,
providing an updating equation for point and interval coefficient estimates based on
prior and sample information. I discuss the
motivation for the estimator and my implementation in Stata code,
tgmixed, and give illustrations of how it might be usefully employed.
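In matrix form, the mixed estimator combines the sample equation y = Xb + e, with Var(e) = s²I, and the stochastic prior r = Rb + v, with Var(v) = Psi. The estimate is b = (X'X/s² + R'Psi⁻¹R)⁻¹(X'y/s² + R'Psi⁻¹r). This minimal Python sketch is not the tgmixed code. It handles the scalar case of a single slope with no intercept, treats s² as known, and uses hypothetical data. The result is a precision-weighted average of the sample information and the prior. This matches the posterior mean under a normal prior.

```python
def tg_mixed_scalar(x, y, sigma2, r, psi):
    """Theil-Goldberger mixed estimate for y = beta*x + e with prior r = beta + v.

    sigma2: error variance of the sample equation (treated as known here)
    r, psi: prior mean and prior variance for beta
    Returns (estimate, variance).
    """
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    precision = sxx / sigma2 + 1.0 / psi  # sample precision + prior precision
    estimate = (sxy / sigma2 + r / psi) / precision
    return estimate, 1.0 / precision

# Hypothetical data and prior
x = [1.0, 2.0, 3.0, 4.0]
y = [2.2, 3.9, 6.1, 8.0]
ols = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

est, var = tg_mixed_scalar(x, y, sigma2=0.25, r=1.5, psi=0.01)
print("OLS:", round(ols, 4), "mixed:", round(est, 4), "variance:", round(var, 6))
```

As the prior variance psi grows, the prior's weight vanishes and the mixed estimate approaches OLS. As psi shrinks toward zero, the estimate approaches the exact restriction imposed by constrained least squares.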

Multilevel regression and poststratification in Stata

Maurizio Pisati (with Valeria Glorioso)

University of Milano–Bicocca and Harvard School of Public Health

Sometimes, social scientists are interested in determining whether, and to
what extent, the distribution of a given variable of interest Y
varies across the categories of a second variable D. When the number of
valid observations within one or more categories of D is small or the
collected data are affected by selection bias, relatively accurate estimates
of E(Y|D) can be obtained by properly combining multilevel regression
modeling and poststratification, known as the multilevel regression and poststratification (MRP)
approach (Gelman and Little 1997, Survey Methodology 23: 127–135; Gelman and Bafumi 2004, Political Analysis 12: 375–385; and Lax and Phillips 2009, American Journal of Political Science 53: 107–121). The purpose of this talk is to illustrate the main features
and applications of mrp, a new user-written program that implements
the MRP approach in Stata.
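The poststratification step reweights model-based cell predictions by known population counts. This minimal Python sketch is not the mrp program. It assumes that predictions for each demographic cell already come from a multilevel regression fitted beforehand. The cells, predictions, and counts are all hypothetical. The estimate of E(Y|D = d) is the population-weighted average of the predictions over the cells in category d.

```python
def poststratify(cell_predictions, cell_counts, cell_to_group):
    """Poststratified estimate of E(Y | D = d) for each category d.

    cell_predictions: cell -> model-based prediction of Y for that cell
    cell_counts: cell -> population count N_c (e.g., from census data)
    cell_to_group: cell -> category d of D that the cell belongs to
    """
    num, den = {}, {}
    for cell, pred in cell_predictions.items():
        d = cell_to_group[cell]
        num[d] = num.get(d, 0.0) + cell_counts[cell] * pred
        den[d] = den.get(d, 0.0) + cell_counts[cell]
    return {d: num[d] / den[d] for d in num}

# Hypothetical cells defined by (region, age group); D is region
preds = {("north", "young"): 0.6, ("north", "old"): 0.4,
         ("south", "young"): 0.5, ("south", "old"): 0.3}
counts = {("north", "young"): 100, ("north", "old"): 300,
          ("south", "young"): 200, ("south", "old"): 200}
groups = {cell: cell[0] for cell in preds}

estimates = poststratify(preds, counts, groups)
print(estimates)
```

Weighting by population counts rather than sample counts is what corrects for cells that are over- or underrepresented in the collected data.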

Mata, the missing manual

William W. Gould

StataCorp

Mata is Stata’s matrix programming language. StataCorp provides
detailed documentation on it but so far has failed to give users, especially
users who add new features to Stata, any guidance on when
and how to use the language. In this talk, I provide what has been missing.
In practical ways, I show how to include Mata code in Stata ado-files,
reveal when to include Mata code and when not to, and provide an
introduction to the broad concepts of Mata—the concepts that will make the
Mata Reference Manual approachable.

Stata Graph Library for network analysis

Hirotaka Miura

Federal Reserve Bank of San Francisco

Network analysis is a multidisciplinary research method that is fast
becoming a popular and exciting field of study. Though a number of
statistical programs possess sophisticated packages for analyzing networks,
similar capabilities have yet to be made available in Stata. In an effort to
motivate the use of Stata for network analysis, I designed in Mata the Stata
Graph Library (SGL), which consists of algorithms that construct matrix
representations of networks, compute centrality measures, and calculate
clustering coefficients. Performance tests comparing C++ and SGL
implementations reveal gross inefficiencies in the current SGL routines, which
make SGL impractical for large networks. These obstacles are,
however, welcome challenges in the effort to spread the use of Stata as an
instrument for analyzing networks, and future development will focus on
reducing computational time complexity as well as integrating additional
capabilities into SGL.
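Two of the measures mentioned, degree centrality and the local clustering coefficient, follow directly from an adjacency-matrix representation. This minimal Python sketch is not SGL's Mata code. It assumes an undirected, unweighted graph stored as a nested list and uses a small hypothetical network. The nested loop over neighbor pairs also shows where computational cost grows with node degree.

```python
def degree_centrality(adj):
    """Normalized degree centrality, degree / (n - 1), for an undirected graph."""
    n = len(adj)
    return [sum(row) / (n - 1) for row in adj]

def local_clustering(adj):
    """Local clustering coefficient: links among a node's neighbors / possible links."""
    n = len(adj)
    coeffs = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j]]
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)  # undefined for degree < 2; reported as 0 by convention
            continue
        links = sum(adj[a][b] for idx, a in enumerate(nbrs) for b in nbrs[idx + 1:])
        coeffs.append(links / (k * (k - 1) / 2))
    return coeffs

# Hypothetical network: a triangle 0-1-2 plus node 3 attached to node 2
adj = [[0, 1, 1, 0],
       [1, 0, 1, 0],
       [1, 1, 0, 1],
       [0, 0, 1, 0]]

print(degree_centrality(adj))
print(local_clustering(adj))
```

For large sparse networks, adjacency lists would avoid scanning full matrix rows. That is one of the usual ways to reduce the time complexity of such routines.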

Filtering and decomposing time series in Stata 12

David M. Drukker

StataCorp

In this talk, I introduce new methods in Stata 12 for filtering and
decomposing time series and I show how to implement them. I
provide an underlying framework for understanding and comparing the
different methods. I also present a framework for interpreting the
parameters.