These pages were first made available in January 2000, when they were based on Stata version 6. This June 2008 release is based on Stata version 10.

Aims

To provide an introduction to the analysis of spell duration data (‘survival analysis’); and

To show how the methods can be implemented using Stata, a program for statistics, graphics and data management.

The focus of the Lessons is on models for single-spell survival time data with no left censoring or left truncation (see the Lecture Notes for more details about these issues).

How to use these resources

These materials are a do-it-yourself learning resource. Work through the Lessons below in parallel with reading the draft book manuscript (see below). Each Lesson contains material to read followed by exercises. Stata do files (names prefixed by ‘ex’) provide code that reproduces the material shown in the Lessons and works through the exercises. You are encouraged to run the do files yourself by typing do filename, preferably after attempting the exercises on your own!

You can download module materials from here. There are Lessons and related materials (PDF files), Exercises (Stata do files, i.e. plain ASCII text), and Data Sets (Stata dta files). See below. University of Essex readers: we recommend that you create a new subdirectory called ‘ec968’ in your ‘home’ directory (drive m: on the University of Essex network) and then download all the files to m:\ec968. (Use a name other than ‘ec968’ if you prefer.)

Stata resources

Stata programs for survival analysis written by S.P. Jenkins

pgmhaz(8)

This is a program for discrete time proportional hazards regression. It estimates the models proposed by Prentice and Gloeckler (Biometrics 1978) and Meyer (Econometrica 1990), and was circulated in the Stata Technical Bulletin STB-39 (insert ‘sbe17’). pgmhaz runs with Stata version 5 or later. Users with version 8.2 or later should use pgmhaz8.

Get pgmhaz by typing net describe sbe17, from(http://www.stata.com/stb/stb39), or get pgmhaz8 by typing ssc install pgmhaz8, in an up-to-date Stata.
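The installation commands above can be collected in a short do-file fragment. This is a sketch that assumes an internet connection and that both archives are still reachable; the net install and help lines are additions not given in the text.

```stata
* pgmhaz: describe, then install, STB-39 insert sbe17
net describe sbe17, from(http://www.stata.com/stb/stb39)
net install sbe17, from(http://www.stata.com/stb/stb39)    // assumption: not in the original text

* pgmhaz8: install from the SSC archive
ssc install pgmhaz8

* Read the help file before organising the data
help pgmhaz8
```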

The program estimates, by maximum likelihood (ML), two discrete time (grouped duration data) proportional hazards regression models, one of which incorporates a gamma mixture distribution to summarize unobserved individual heterogeneity (or ‘frailty’). Covariates may include regressor variables summarizing observed differences between persons (either fixed or time-varying), and variables summarizing the duration dependence of the hazard rate. With suitable definition of covariates, models with a fully non-parametric specification for duration dependence may be estimated; so too may parametric specifications. Your data must be suitably organised before using the program: see the help file after installation, the STB article, or Lesson 3. The program is used in Lesson 8.
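As a rough illustration of what ‘suitably organised’ means, the sketch below expands a toy data set with one row per person into one row per person per interval at risk. All variable names (id, T, event, x, j, y, logj) and the data values are hypothetical. The commented-out pgmhaz8 call reflects the syntax as recalled from the help file and should be checked against help pgmhaz8 before use.

```stata
* Toy data, one row per person (hypothetical names and values):
*   T     = number of intervals the person was observed at risk
*   event = 1 if the spell ended in the last interval, 0 if censored
clear
input id T event x
1 3 1 0
2 2 0 1
3 4 1 1
end

* One row per person per interval at risk
expand T
bysort id: generate j = _n                              // interval sequence number
bysort id: generate byte y = (event == 1 & _n == _N)    // 1 only in the final interval of a completed spell
generate logj = ln(j)                                   // one parametric choice for duration dependence

list id j y x logj, sepby(id)

* With a real data set (three spells are far too few to estimate anything):
* pgmhaz8 x logj, id(id) dead(y) seq(j)
```

Replacing logj with a set of interval-specific dummy variables would give the fully non-parametric specification of duration dependence mentioned above.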

Note: the likelihood ratio test of whether the gamma variance is equal to zero that pgmhaz reports does not take account of the fact that the null distribution is not the usual chi-squared(d.f. = 1). It is instead a 50:50 mixture of a chi-squared(d.f. = 0) variate (which is a point mass at zero) and a chi-squared(d.f. = 1) variate. See Gutierrez et al. (2001) for more details (Gutierrez, R.G., Carter, S., and Drukker, D., ‘On boundary-value likelihood-ratio tests’, insert sg160, Stata Technical Bulletin, STB-60, StataCorp, College Station TX). In the meantime, note that the LR test statistic reported by pgmhaz is correct, but the correct p-value for the test is half the reported p-value. pgmhaz8 reports the correct p-value.
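The halving rule can be checked by hand with Stata's chi2tail() function. The LR statistic used below (2.9) is a made-up value for illustration, not output from any real model.

```stata
* Hypothetical LR statistic for H0: gamma variance = 0
scalar lr = 2.9

* p-value that treats the null distribution as chi-squared(1)
scalar p_naive = chi2tail(1, lr)

* p-value under the 50:50 mixture of chi-squared(0) and chi-squared(1);
* valid for lr > 0 (when lr == 0 the p-value is 1)
scalar p_mix = 0.5*chi2tail(1, lr)

display "chi-squared(1) p-value: " %6.4f p_naive
display "mixture p-value:        " %6.4f p_mix
```

The choice of 2.9 is deliberate: it lies between the 5% critical value of the mixture distribution (about 2.71) and that of the chi-squared(1) distribution (about 3.84), so the two p-values fall on opposite sides of 0.05.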

Discrete time hazard models with Normally distributed unobserved heterogeneity (rather than Gamma) can now be estimated in Stata. See also Lesson 7.
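One built-in candidate for such a model is xtcloglog, which fits a random-effects complementary log-log model with a Normally distributed person-specific effect. Whether this is the command the author had in mind is an assumption. The sketch below simulates person-period data with made-up parameter values so that it runs on its own; runiform() and rnormal() may not be available in older Stata releases.

```stata
* Simulated person-period data (all names and parameter values are hypothetical)
clear
set seed 2008
set obs 500
generate id = _n
generate byte x = (runiform() < 0.5)          // binary covariate
generate u = rnormal(0, 0.5)                  // Normal unobserved heterogeneity

expand 10                                     // at most 10 intervals per person
bysort id: generate j = _n
generate byte y = (runiform() < (1 - exp(-exp(-2 + 0.5*x + u))))

* Keep each person's rows up to and including the first event
bysort id (j): generate nfail = sum(y)
drop if nfail - y > 0

* Random-effects cloglog model: a discrete time proportional hazards model
* with a Normally distributed person-specific effect
xtset id
xtcloglog y x
```

The simulated baseline hazard is constant, so only an intercept is needed here. With real data the duration dependence variables would be added to the covariate list.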

spsurv

This is a program for estimating ‘split population’ survival models, otherwise known in biostatistics as ‘cure’ models. Like pgmhaz, spsurv is for discrete time (grouped duration) data. It runs with Stata version 6 or later. The data need to be organised in the same way as for pgmhaz (see above) and one may also use time-varying covariates or non-parametric duration dependence in the same way.

A copy of the presentation about the program, given at the 7th UK Stata Users’ Group meeting (May 2001), can be downloaded from here (UKSUG7-spsurv.pdf).

In the standard survival model, all cases are assumed to fail within finite time. The split population model generalises this to suppose that an estimable fraction of the population never fails. Thus there is a form of mover-stayer heterogeneity within the population.
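The idea can be written as a likelihood contribution for a single spell. The notation below is an assumption of this sketch and is not taken from the spsurv documentation, which may parameterise the model differently.

```latex
% Notation (hypothetical):
%   c      = fraction of the population that never fails
%   h_{ik} = discrete time hazard for person i in interval k, among those who can fail
%   T_i    = number of intervals for which person i is observed
%   d_i    = 1 if the spell ends in failure in interval T_i, 0 if it is censored
\[
S_i(T_i) = \prod_{k=1}^{T_i} \left( 1 - h_{ik} \right)
\]
\[
L_i = \left[ (1 - c)\, h_{iT_i} \prod_{k=1}^{T_i - 1} \left( 1 - h_{ik} \right) \right]^{d_i}
      \left[ c + (1 - c)\, S_i(T_i) \right]^{1 - d_i}
\]
```

A censored spell may belong either to someone who never fails (probability c) or to someone who can fail but has not yet done so. Setting c = 0 recovers the standard model in which everyone eventually fails.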

hshaz

This is a program for discrete time proportional hazards regression but, unlike pgmhaz8, hshaz assumes that the mixture distribution summarizing frailty is a discrete one, following Heckman and Singer (1984). The distribution is characterised by a number of ‘mass points’ and associated probabilities. (The locations of the mass points and their probabilities are estimable parameters; the number of mass points may be chosen by the user, with two being the default.)
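A discrete mixture of this kind can be sketched as follows, assuming the complementary log-log form of the discrete time proportional hazards model. The symbols are assumptions of this sketch, and the normalisation that hshaz actually uses should be checked in its help file.

```latex
% Notation (hypothetical):
%   m_j, p_j = location and probability of mass point j, for j = 1, ..., M
%   x_{ik}   = covariates for person i in interval k
%   \gamma_k = duration dependence term for interval k
%   T_i, d_i = number of intervals observed, and failure indicator, for person i
\[
h_{ik}(m_j) = 1 - \exp\!\left[ -\exp\!\left( x_{ik}'\beta + \gamma_k + m_j \right) \right]
\]
\[
L_i(m_j) = \left[ \frac{h_{iT_i}(m_j)}{1 - h_{iT_i}(m_j)} \right]^{d_i}
           \prod_{k=1}^{T_i} \left[ 1 - h_{ik}(m_j) \right]
\]
\[
L_i = \sum_{j=1}^{M} p_j \, L_i(m_j), \qquad \sum_{j=1}^{M} p_j = 1, \quad 0 \le p_j \le 1
\]
```

With the default M = 2, each person belongs to one of two latent types. If the model includes an intercept, one mass point has to be normalised (for example m_1 = 0) for the remaining locations to be identified.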