Title

Author

Date of Award

12-2002

Degree Name

Doctor of Philosophy

Department

Statistics

First Advisor

Dr. Joseph W. McKean

Second Advisor

Dr. Michael Stoline

Third Advisor

Dr. Joshua Naranjo

Fourth Advisor

Dr. Jerry Sievers

Abstract

This study discusses rank-based robust methods for parameter estimation and hypothesis testing in the generalized linear models (GLM) and generalized estimating equations (GEE) setting. The robust estimates are obtained by minimizing a Wilcoxon dispersion function for linear or nonlinear regression models. GLM and GEE models are generalizations of linear and nonlinear models. They allow for both nonlinear mean functions and heteroscedasticity of their random errors, which makes them quite useful in practice. Rank-based inference has been developed for linear models over the last thirty years. This inference is both robust and highly efficient, and it can be extended to estimates which have high breakdown. It has recently been extended to nonlinear models. In this work, we extend this inference to GLM and GEE models. The robust estimates of the mean function are obtained by minimizing a norm based on Wilcoxon scores, in much the same way that least-squares-type estimates are obtained by minimizing the Euclidean norm. For the heteroscedasticity problem, where the errors are independent but have non-constant variances, we show that these robust estimates retain their consistency and asymptotic normality provided scale is consistently estimated. We further develop asymptotic theory for robust testing based on both Wald-type tests and drop-in-dispersion tests. In addition, diagnostic tools for outliers and influential observations are developed. We discuss extensions to high-breakdown estimates. We also discuss a robust estimate of the variance-covariance matrix for the autoregressive structure used for the GEE models. Examples and simulation studies illustrate the robustness of the procedure and its superiority over the classical statistical techniques currently used. Data for the examples include a multiple sclerosis longitudinal trial and cholesterol data from randomly selected individuals in the Framingham study.
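
As an illustration of the idea of minimizing a norm based on Wilcoxon scores, the following is a minimal sketch for the plain linear model only, not the GLM/GEE procedure developed in the dissertation. It assumes the standard Wilcoxon score function sqrt(12)(u - 1/2) applied to the ranks of the residuals, uses a general-purpose optimizer as a stand-in for whatever algorithm the author used, and estimates the intercept separately as the median of the residuals (the dispersion itself does not depend on the intercept). The function names wilcoxon_dispersion and wilcoxon_fit are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import rankdata


def wilcoxon_dispersion(beta, X, y):
    """Rank-based dispersion: sum of Wilcoxon scores times residuals."""
    e = y - X @ beta                      # residuals (no intercept column in X)
    n = len(e)
    # Wilcoxon scores a(i) = sqrt(12) * (i / (n + 1) - 1/2), evaluated at the ranks
    scores = np.sqrt(12.0) * (rankdata(e) / (n + 1) - 0.5)
    return float(np.sum(scores * e))


def wilcoxon_fit(X, y):
    """Slope estimate minimizing the dispersion; intercept from the residual median."""
    # Least-squares slopes on centered data serve only as a starting value (assumption).
    start = np.linalg.lstsq(X - X.mean(axis=0), y - y.mean(), rcond=None)[0]
    # The dispersion is convex but piecewise linear, so a derivative-free method is used.
    result = minimize(wilcoxon_dispersion, start, args=(X, y), method="Nelder-Mead")
    slope = result.x
    intercept = float(np.median(y - X @ slope))
    return intercept, slope
```

Because the scores sum to zero, gross outliers in the response enter the criterion only through their ranks rather than their magnitudes, which is the source of the robustness contrasted above with the Euclidean norm.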