Abstracts

midasinla: midas goes Bayesian via R-INLA

Ben Adarkwa Dwamena

University of Michigan Medical School

Integrated nested Laplace approximation (INLA) has been
developed as a computationally fast, deterministic alternative to Markov
chain Monte Carlo (MCMC)-based Bayesian modeling. An R interface to the
C-based INLA (R-INLA) program is available with extensive and diverse
applications, including diagnostic test accuracy meta-analysis. In this
presentation, I discuss the INLA methodology briefly and, in more
detail, an illustrated application of the user-written ado-file
midasinla, a deterministic Bayesian version of midas
(a comprehensive and medically popular module for diagnostic test accuracy
meta-analysis). This Stata routine provides R-INLA estimation of the
bivariate random-effects model for diagnostic accuracy meta-analysis
with data pre- and post-processing within Stata. A dataset of studies
evaluating axillary staging performance of positron emission tomography
in breast cancer patients is provided for illustration of the omnibus
capabilities of midasinla.

In this presentation, I introduce four new modules:
treatoprobit, switchoprobit, treatoprobitsim, and
switchoprobitsim. Each of these routines estimates a model in which a
binary endogenous variable affects an ordered outcome. treatoprobit
and switchoprobit estimate treatment and outcome under the assumption
that the error terms in the selection and outcome process are
distributed as bivariate normal. treatoprobitsim and
switchoprobitsim allow researchers to relax this assumption by
estimating models in which a latent factor with a potentially nonnormal
distribution accounts for the correlation between treatment and outcome.
treatoprobit and treatoprobitsim operate under the assumption of a
single outcome regime for treated and untreated groups; switchoprobit
and switchoprobitsim work under (and test) the assumption that outcome
processes for treated and untreated ought to be handled as distinct. The
presentation will introduce the modules, show Monte Carlo evidence
regarding their performance, and offer an example of their use. This
presentation is based on an article that is currently under review at
the Stata Journal.

Panel data make it possible both to control for unobserved
confounders and to include lagged, endogenous regressors. Trying to do
both at the same time, however, leads to serious estimation
difficulties. In the econometric literature, these problems have been
solved by using lagged instrumental variables together with the
generalized method of moments (GMM). In Stata, commands such as xtabond
and xtdpdsys have been used for these models. Here we show that the same
problems can be addressed via maximum likelihood estimation implemented
with Stata's structural equation modeling (sem) command. We show that
the ML (sem) method is substantially more efficient than the GMM method
when the normality assumption is met and suffers less from finite sample
biases. We introduce a command named xtdpdml with syntax similar to
other Stata commands for linear dynamic panel-data estimation. xtdpdml
simplifies the SEM model-specification process, makes it possible to
test and relax many of the constraints that are typically embodied in
dynamic panel models, and takes advantage of Stata's ability to use full
information maximum likelihood (FIML) for dealing with missing data.

15 years a consultant

Phil Ender

UCLA Statistical Consulting Group (Ret)

I present the origins and evolution of the UCLA Statistical
Consulting Group. The presentation will cover the history of the UCLA
Statistical Consulting Group as well as one approach to the practice of
statistical consulting in an academic environment. UCLA Statistical
Consulting provides services to faculty, graduate students, and campus
researchers. Additionally, the group maintains a website popular not
only with Stata users but also with users of other statistical packages.

Robust inference in regression-discontinuity designs

Matias Cattaneo

University of Michigan

Sebastian Calonico

University of Miami

Rocio Titiunik

University of Michigan

In this presentation, I will review the main methodological
results from the regression-discontinuity (RD) design literature and
illustrate them using the Stata rdrobust package provided by the
authors. More information about the Stata package and background
methodological and theoretical papers may be obtained here:
https://sites.google.com/site/rdpackages/rdrobust.
If time permits, I will also discuss two ongoing research projects on
RD methods and their corresponding Stata implementations. The first
project focuses on RD inference under a local randomization assumption,
while the second project discusses a new manipulation test for RD
designs.

In this presentation, I introduce a new user-written Stata
command, xtregarp. This command considers the problem of estimation in a
panel-data model with both individual effects and AR(p) remainder
disturbances. It utilizes a simple exact transformation for the AR(p)
time-series process derived by Baltagi and Li (1994) and obtains the
generalized least-squares estimator for this panel model as a
least-squares regression. This command allows the individual effects to
be either random effects or fixed effects. The performance of this
estimator is illustrated using an empirical example.
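
As a minimal sketch of the kind of exact transformation involved, the code below shows only the familiar AR(1) special case (the Prais-Winsten transformation) applied to one individual's series. It is an illustration under that assumption, not the general AR(p) transformation of Baltagi and Li (1994) and not the code of xtregarp; the function name and the value of rho are hypothetical.

```python
from math import sqrt


def prais_winsten_ar1(series, rho):
    """Transform one individual's series so AR(1) disturbances become serially uncorrelated.

    First observation: sqrt(1 - rho^2) * y_1
    Later observations: y_t - rho * y_{t-1}
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    first = sqrt(1.0 - rho ** 2) * series[0]
    rest = [series[t] - rho * series[t - 1] for t in range(1, len(series))]
    return [first] + rest


# Hypothetical usage: the same transformation would be applied to the
# dependent variable and to each regressor, individual by individual,
# before running least squares on the transformed data.
transformed = prais_winsten_ar1([1.0, 2.0, 3.0], rho=0.5)
```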

Item response theory models in Stata

Rebecca Pope

Health Econometrician, StataCorp

Stata 14 provides several new commands for fitting item
response theory (IRT) models. IRT has a long history in test development
and psychometrics and is now being adopted more broadly in fields such
as health services research. In this presentation, I will provide an
overview of IRT, demonstrate fitting models with binary and categorical items,
and discuss postestimation tools such as plotting characteristic curves and
information functions.
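
To make the characteristic curves and information functions concrete, here is a minimal sketch for a single binary item, assuming the standard two-parameter logistic (2PL) form. The function names and the parameter values are hypothetical and are not taken from the presentation.

```python
from math import exp


def icc_2pl(theta, discrimination, difficulty):
    """Probability of a correct response at ability theta under a 2PL item."""
    return 1.0 / (1.0 + exp(-discrimination * (theta - difficulty)))


def information_2pl(theta, discrimination, difficulty):
    """Item information at theta: a^2 * P * (1 - P)."""
    p = icc_2pl(theta, discrimination, difficulty)
    return discrimination ** 2 * p * (1.0 - p)


# Hypothetical item with discrimination 1.5 and difficulty 0.2,
# evaluated on a coarse grid of ability values.
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
curve = [(theta, icc_2pl(theta, 1.5, 0.2), information_2pl(theta, 1.5, 0.2))
         for theta in grid]
```

Under this form, the item is most informative where ability equals the difficulty parameter, which is why information functions peak there.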

Meta-analysis on the effects of interviewer supportiveness on the accuracy of children's reports

Christine Wells

Statistical Consulting Group, UCLA

Karen Saywitz, PhD

UCLA

Rakel Larson, MA

University of California, Riverside

Sue Hobbs, PhD

University of California, Davis

Increasingly, children are called upon to participate in
decisions that affect their welfare, from providing testimony in court
to providing input to public policies. However, many questions remain
regarding how to elicit accurate, reliable information from children. A
meta-analysis was conducted to investigate the effect of a supportive
interviewer on the accuracy of information provided by children (ages 4
to 12). The interviewers asked both neutral and misleading questions in
both supportive and nonsupportive conditions. Our results suggest that
interviewer supportiveness, when provided in a nonsuggestive manner,
bolsters the reliability of children's reports, and that supportiveness
lowers children's errors on misleading questions. Despite the importance
of this topic, only eight randomized controlled studies were identified
for inclusion in the meta-analysis. These studies hail from the psychology
literature and were published over 18 years. These two facts introduced
some interesting challenges in preparing the data for the meta-analysis.
The analyses included the meta-analysis, investigation into possible
nonindependence, a search for outliers, and cumulative meta-analyses.
The current guidelines for publishing a meta-analysis in the
psychological literature, specifically the MARS guidelines, will be
discussed as well as the user-written commands and their options used
to perform these analyses.

tetrad: A program for confirmatory tetrad analysis

Shawn Bauldry

University of Alabama at Birmingham

Kenneth Bollen

University of North Carolina at Chapel Hill

Confirmatory tetrad analysis (CTA) is a method of testing and
comparing the fit of structural equation models (SEMs) based on tetrads
(differences in the products of pairs of covariances of observed
variables). CTA has a few benefits over alternative methods of testing
SEM model fit, including (1) some underidentified SEMs are still
testable using their vanishing tetrads, (2) some SEMs are nested in
their vanishing tetrads and can be directly compared while they are not
nested using alternative estimators, and (3) researchers can perform
tests on parts of the model as well as the whole model. We have
developed a Stata command that conducts CTA based on the approach
outlined in Bollen (1990) and Bollen and Ting (1993). The approach
involves four steps: (1) identify vanishing tetrads (tetrads that equal 0)
for a given model, (2) compute the asymptotic covariance matrix for the
vanishing tetrads, (3) identify nonredundant vanishing tetrads, and (4)
compute the tetrad test statistic. The Stata command takes as input the
set of observed variables and an implied covariance matrix from a
hypothesized model (or two implied covariance matrices if a nested test)
that can be obtained following the sem command and then returns the
tetrad test statistic.
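
As a minimal sketch of what a vanishing tetrad is, the code below computes the three tetrads for four observed variables from a covariance matrix and checks that they all vanish under a one-factor model. It covers only the first of the four steps and is not the tetrad command itself; the function names, loadings, and unique variances are hypothetical, and the factor variance is assumed to be 1.

```python
def tetrads(cov, g, h, i, j):
    """Return the three tetrads for variables g, h, i, j.

    Each tetrad is a difference of products of pairs of covariances,
    for example tau_ghij = cov[g][h] * cov[i][j] - cov[g][i] * cov[h][j].
    """
    return (
        cov[g][h] * cov[i][j] - cov[g][i] * cov[h][j],
        cov[g][i] * cov[j][h] - cov[g][j] * cov[i][h],
        cov[g][j] * cov[h][i] - cov[g][h] * cov[j][i],
    )


def implied_cov_one_factor(loadings, unique_variances):
    """Model-implied covariance matrix for one factor with variance 1 (assumed)."""
    n = len(loadings)
    return [
        [loadings[a] * loadings[b] + (unique_variances[a] if a == b else 0.0)
         for b in range(n)]
        for a in range(n)
    ]


# Hypothetical one-factor model with four indicators: every tetrad vanishes.
sigma = implied_cov_one_factor([1.0, 0.8, 0.6, 0.5], [0.5, 0.4, 0.7, 0.6])
vanishing = [abs(t) < 1e-12 for t in tetrads(sigma, 0, 1, 2, 3)]
```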

Postestimation parameter recentering and rescaling

Douglas Hemken

Social Science Computing Cooperative, University of Wisconsin–Madison

Recoding data prior to model estimation is a frequent part of
analysis. For linear models, this can be thought of as a change of basis
that is common to the data and the model. Where the change of basis in
the data is linear, the change in the model is also linear. We can
calculate the transformed parameters (and the transformed parameter
variance–covariance matrix) without actually recoding our data. The same
mathematics that is used to design factorial experiments or design
contrasts that include interactions can be extended to include
recentering and rescaling continuous variables in models with
interaction terms. This gives us a general solution to such problems as
calculating standardized coefficients, or converting models expressed in
American units of measure to international units, regardless of whether
the models include interaction terms or whether we have access to the
original data. This is implemented here as a Stata program, stdParm,
that produces centered or standardized parameters and precision
matrices, postestimation.
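
The idea can be illustrated with a minimal sketch, assuming a model y = b0 + b1*x + b2*w + b3*x*w in which x is recoded as (x - m) / s. The transformed coefficients are a linear map A of the original ones, and the variance-covariance matrix transforms as A V A'. This is an illustration of the algebra only, not the stdParm program; all names and numbers are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def recenter_rescale(beta, vcov, m, s):
    """Coefficients and covariance after recoding x as (x - m) / s.

    beta is ordered [constant, x, w, x*w]. Substituting x = m + s * x_new
    into the model gives the linear map A below.
    """
    A = [
        [1.0, m, 0.0, 0.0],    # new constant = b0 + m * b1
        [0.0, s, 0.0, 0.0],    # new x slope  = s * b1
        [0.0, 0.0, 1.0, m],    # new w slope  = b2 + m * b3
        [0.0, 0.0, 0.0, s],    # new x*w term = s * b3
    ]
    new_beta = [sum(A[i][k] * beta[k] for k in range(4)) for i in range(4)]
    new_vcov = matmul(matmul(A, vcov), transpose(A))
    return new_beta, new_vcov


def predict(beta, x, w):
    b0, b1, b2, b3 = beta
    return b0 + b1 * x + b2 * w + b3 * x * w
```

Because the recoding is only a change of basis, predictions are unchanged: evaluating the original coefficients at x and the transformed coefficients at (x - m) / s gives the same fitted value.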

Statistical process control charts

Barbara Williams

Virginia Mason Medical Center

Statistical process control (SPC) charts are used to assess
outcomes measured over time, usually with the purpose of detecting
improvement or maintaining a high level of performance. Traditionally
used in industrial engineering for quality control, these methods are
now frequently employed in healthcare and are the standard method of
analysis for quality improvement work. In this presentation, I define
methods to improve on current Stata syntax to generate useful and
reader-friendly SPC charts. I build on existing Stata cchart (count),
pchart (proportion), rchart (range), and xchart (average) commands to
produce SPC charts with a clear, easy-to-read visual display. This
presentation will explore default and edited pchart and xchart examples
using health services research data, including the syntax for creating
these graphs. Graphic elements include customized axis labels, text,
colors, lines, notes, fonts, and titles. Under this approach, Stata can
replace current SPC chart generators, including macros for Excel and
stand-alone programs.
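
For readers unfamiliar with how a proportion chart is built, here is a minimal sketch of the conventional three-sigma control limits for a p chart. It assumes the usual textbook formula and uses made-up counts for illustration; it is not the Stata pchart code, and the function name is hypothetical.

```python
from math import sqrt


def pchart_limits(events, sizes):
    """Center line and per-subgroup 3-sigma limits for a proportion chart.

    Returns the overall proportion and, for each subgroup, a tuple of
    (proportion, lower limit, upper limit, out_of_control flag).
    """
    pbar = sum(events) / sum(sizes)
    rows = []
    for e, n in zip(events, sizes):
        se = sqrt(pbar * (1.0 - pbar) / n)
        lower = max(0.0, pbar - 3.0 * se)   # limits are truncated to [0, 1]
        upper = min(1.0, pbar + 3.0 * se)
        p = e / n
        rows.append((p, lower, upper, not (lower <= p <= upper)))
    return pbar, rows


# Made-up monthly counts of adverse events out of 100 cases each.
center, chart = pchart_limits([5, 4, 6, 20], [100, 100, 100, 100])
```

The limits widen for small subgroups and narrow for large ones, so each point is judged against limits appropriate to its own sample size.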

Data workflows with Stata and Python

Stephen Childs

Education Policy Research Initiative, University of Ottawa

Dejan Pavlic

Education Policy Research Initiative, University of Ottawa

Python is a general purpose programming language with a large
library of packages that extend into domains that Stata does not touch.
In this presentation, I will identify the key packages from Python that
will allow it to work with Stata, primarily the pandas framework. Pandas
is a relatively new, but extremely powerful, package for data
preparation and analysis that works well with Stata, including support
for categorical variables. I will discuss some new tools that have been
developed to make it easier to connect Stata to Python. I will also
discuss using Stata with the IPython Notebook, a tool that allows
researchers to combine code and text in an easy-to-access document.
During their work with the Education Policy Research Initiative, the
authors have successfully transitioned much complex data preparation
from Stata to Python while still supporting Stata's powerful analytical
tools. This presentation is ideal for those interested in incorporating
some Python into their workflow or planning a larger transition.

In this presentation, I demonstrate how to implement two
recent semiparametric estimators for binary response models in Stata.
These estimators do not require parametric assumptions on the
distribution of the error term, unlike the logit and probit models, and
they allow for general forms of heteroskedasticity. I begin with a short
introduction to binary response models and the various known identifying
assumptions, including the weak conditional median independence
assumption that the two estimators of interest are based on. Then, I
focus on two recently proposed semiparametric estimators: a sieve
nonlinear least-squares estimator and a local nonlinear least-squares
estimator. I demonstrate how both estimators can be easily implemented
in Stata via simple modifications to the standard probit objective
function, and I give several applied examples and Monte Carlo results.
Finally, I introduce the dfbr package by Blevins and Khan (2013,
Stata Journal, st0310) for distribution-free estimation of binary response
models. Although the estimators can be implemented by hand using
standard Stata commands, this package provides a standard Stata
interface for the user, automates constructing the modified probit
objective functions, and calculates bootstrap standard errors.

A comparison of modeling scales in flexible parametric models

Noori Akhtar-Danesh

McMaster University

Cox regression and parametric survival models are quite
common in the analysis of survival data. Recently, flexible parametric
models (FPM) have been introduced that are extensions of the parametric
models such as the Weibull (hazard-scale) model, the loglogistic
(odds-scale) model, and the lognormal (probit-scale) model. In this
presentation, I aim to statistically compare these modeling scales. I
used the Stata command stpm2 to compare flexible parametric models based on
these three different scales. I used two subsets of the U.S. National
Cancer Institute's Surveillance, Epidemiology, and End Results (SEER)
dataset for this illustration: one on ovarian cancer diagnosed between
1991 and 2010 and one on colorectal cancer diagnosed in men between 2001
and 2010. The ovarian and colorectal datasets included data from 13,810
and 42,002 patients, respectively. Patients were classified into
different age groups. I present results using graphs to compare
survival curves, trends in one-year and five-year survival rates, and
mortality rates. In general, there were no substantial differences
among the three modeling scales, although the probit scale showed
better fit based on the Akaike information criterion (AIC) for both
datasets.
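
The three scales differ in the transformation of the survival function that is modeled, and candidate fits are ranked by AIC. The following is a minimal sketch of those transformations and of the AIC formula, assuming the usual definitions (log cumulative hazard, log cumulative odds, and negative inverse normal of survival). The function names and scale labels are hypothetical and do not reproduce stpm2.

```python
from math import log
from statistics import NormalDist


def scale_transform(survival, scale):
    """Transform a survival probability S(t) onto one of three modeling scales."""
    if not 0.0 < survival < 1.0:
        raise ValueError("survival must lie strictly between 0 and 1")
    if scale == "hazard":    # log cumulative hazard: ln(-ln S)
        return log(-log(survival))
    if scale == "odds":      # log cumulative odds of failure: ln((1 - S) / S)
        return log((1.0 - survival) / survival)
    if scale == "probit":    # negative inverse standard normal of S
        return -NormalDist().inv_cdf(survival)
    raise ValueError("unknown scale: " + scale)


def aic(log_likelihood, n_parameters):
    """Akaike information criterion; smaller values indicate better fit."""
    return 2.0 * n_parameters - 2.0 * log_likelihood
```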

Estimating Markov-switching regression models in Stata

Ashish Rajbhandari

Senior Econometrician, StataCorp

Many datasets are not well characterized by linear autoregressive
moving-average (ARMA) models. In this presentation, I will describe the
new mswitch command, which implements Markov-switching regression models,
which characterize many of these datasets well. Markov-switching
regression models allow the time series to switch between unobserved
states according to a Markov process. mswitch can estimate the
parameters of the Markov-switching dynamic regression (MSDR) model and
Markov-switching autoregressive (MSAR) model. This talk outlines the
models, discusses the relative advantages of MSDR and MSAR models, and
discusses examples of how to interpret mswitch output and its
postestimation features.
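
To show what switching between unobserved states according to a Markov process means, here is a minimal simulation sketch of a two-state series whose mean depends on the current state. It only generates data and does not estimate anything, so it is not a substitute for mswitch; the function name, state means, and staying probabilities are hypothetical.

```python
import random


def simulate_switching_mean(n, means, p_stay, sigma, seed=0):
    """Simulate a series whose mean switches between two hidden states.

    means  : (mean in state 0, mean in state 1)
    p_stay : (probability of staying in state 0, probability of staying in state 1)
    sigma  : standard deviation of the Gaussian noise in both states
    """
    rng = random.Random(seed)
    state = 0
    states, series = [], []
    for _ in range(n):
        states.append(state)
        series.append(means[state] + rng.gauss(0.0, sigma))
        # Leave the current state with probability 1 - p_stay[state].
        if rng.random() > p_stay[state]:
            state = 1 - state
    return states, series


# Hypothetical example: a persistent low-mean and a persistent high-mean state.
hidden, observed = simulate_switching_mean(200, (0.0, 3.0), (0.95, 0.90), 1.0)
```

In practice only the observed series is available, and the states and transition probabilities must be inferred from it.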

Between and beyond: Irregular series, interpolation, variograms, and smoothing

Nicholas Cox

Department of Geography, Durham University

Time series (and similar one-dimensional series) are more
often irregularly spaced than many methods texts or courses admit. Even
with a plan of regular measurements, gaps can arise for many human or
inhuman reasons, while some series are naturally irregular.
Interpolation of values between known values is a centuries-old need
but one neglected by official Stata, which offers only linear
interpolation and cubic spline interpolation (in Mata). I review
additional user-written commands for interpolation, including those for
cubic, nearest neighbor, and piecewise cubic Hermite methods available
from SSC. Beyond interpolation of irregular series lie the questions of
characterizing the structure of such series and smoothing in various
ways. One useful tool standard in spatial statistics is the variogram,
which relates dissimilarity as squared differences between values to
their separation in time or distance in space. Diggle and others have
shown uses for variograms in time-series and longitudinal data analysis.
I discuss user-written Stata commands for variogram calculation,
plotting, and use in relation to exploratory data analysis on the one
hand and smoothing on the other.
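
As a minimal sketch of the calculation, the code below computes an empirical variogram for an irregularly spaced series by averaging half the squared differences between all pairs of values, grouped by their separation in time. The half factor follows the common semivariance convention, which is an assumption here; the function name and the binning rule are hypothetical and do not reproduce the user-written Stata commands.

```python
from collections import defaultdict


def empirical_variogram(times, values, bin_width):
    """Average semivariance by lag bin for an irregularly spaced series.

    Bin k collects pairs whose separation lies in [k * bin_width, (k + 1) * bin_width).
    Returns {bin index: (mean semivariance, number of pairs)}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            lag = abs(times[j] - times[i])
            k = int(lag // bin_width)
            sums[k] += 0.5 * (values[j] - values[i]) ** 2
            counts[k] += 1
    return {k: (sums[k] / counts[k], counts[k]) for k in sorted(sums)}


# Made-up irregular series: observations at times 0, 1, and 3.
vario = empirical_variogram([0, 1, 3], [1.0, 2.0, 4.0], bin_width=1)
```

Because every pair of observations contributes, no interpolation to a regular grid is needed before the structure of the series can be examined.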

Public program sensitivity: Using ROC curves to characterize classification efficiency of state Medicaid systems

Lisa Frazier

John Glenn College of Public Affairs, The Ohio State University

Despite being the largest single source of health care
coverage in the U.S., Medicaid fails to capture all eligible citizens.
This is a well-known problem among means-tested programs like Medicaid;
discussions of take-up and churning attend to this failure. Cases of
fraud in programmatic enrollments represent another classification
failure in these systems. Reports on rates of fraud, take-up, and churn
rarely acknowledge that such outcomes are ultimately features of the
same tradeoff function: the sorting of citizens into benefit groups on
the basis of membership to some a priori category. This research
elucidates the implicit tradeoffs being made in the Medicaid
citizen-sorting mechanism by using administrative data to construct ROC
curves for each state Medicaid system before and after the passage of
the Affordable Care Act.

Small-sample inference for linear mixed-effects models

Xiao Yang

Senior Statistician and Software Developer, StataCorp

Researchers are often interested in making inferences about
fixed effects in a linear mixed-effects model. For a large sample, the
null sampling distributions of the test statistics can be approximated
by a normal distribution for a one-hypothesis test and a chi-squared
distribution for a multiple-hypotheses test. For a small sample, these
large-sample approximations may not be appropriate, and t and F
distributions may provide better approximations. In this presentation,
I will describe five denominator-degrees-of-freedom (DDF) methods available
with mixed in Stata 14, including the Satterthwaite and Kenward–Roger
methods, and I will demonstrate examples of when and how to use these methods.

Development of a project-based statistics course for applied biostatistics using Stata

Frank Snyder

Purdue University

Project-based learning is an instructional approach that is
designed to build students' skills and offer real-world activities, such
as defining a research question and using nationally representative data
to find an answer (Dierker et al. 2012). The purpose of this
presentation is to describe an innovative, project-based statistics
course for applied biostatistics using Stata. The semester-long course
is designed as a graduate-level introductory biostatistics course;
however, it could easily be adapted for use in an undergraduate public
health program. The course combines two textbooks (Acock 2014; Bush
2012) and traditional lecture and assessment with computer lab
activities and a research project. The project-based course structure
offers students the opportunity to directly apply course content to
their unique research question, with the intent to increase students'
motivation and interest in statistics. Each student's culminating
experience is a 15-minute presentation or poster that explains his or her
research and results to classmates or an alternative audience. Course
evaluation data demonstrate that students rate the course as excellent,
and students strongly agree the course encourages learning. A course
syllabus, lab activities, Stata do-files, and a description of the
research project and final presentation will be available upon request.

Brewing color schemes in Stata: Making it easier for end users to customize Stata graphs

William Buchanan

Mississippi Department of Education

Although Stata graphs can be created to satisfy customized needs, it can
be time consuming to specify all the unique options required to create
clean customized graphs. Graph schemes provide a method to help
alleviate this difficulty, but customizations to graph schemes are typically
fixed for a single scheme. In this presentation, I will be discussing a
new Stata program, brewscheme, that allows end users to generate
customized graph schemes using color palettes available from
www.colorbrewer2.org. The program allows users to specify a single color
palette for all graph types, unique color palettes for individual graph
types, or a combination (for example, to specify a color palette and the
number of colors to select from it for scatterplots and to set a default
color palette for the other graph types). Additionally, the schemes
generated by the program also set clean graph defaults (for example, all
white backgrounds and foregrounds and no grid lines), orient axis
labels horizontally, and remove boxes around legends. The program
brewmeta also allows users to quickly access metadata about
specific palettes (for example, colorblindness, LCD display, print, and
photocopier friendliness).

Colombian industrial structure behavior and its regions between 1974 and 2005

Luis Fernando Lopez Pineda

Chamber of Commerce of Cartagena

This presentation analyzes the behavior of the industrial structure of
Colombia and its regions between 1974 and 2005 to determine whether the
liberal reform at the end of the 20th century caused industrial
stagnation and a lack of diversification. The evidence shows that the
"slowdown" of industrial growth and the stagnation of
productive transformation were caused by the greater competition that
national industry faced after the adoption of an economic opening model. The process
was not similar in all regions covered in the study. The more industrial
regions, specifically, Antioquia, Atlantico, Valle, and Bogota, suffered
from deindustrialization. The less industrial regions, like Bolivar
and Cundinamarca, became industrial regions.