In longitudinal data analysis, masking and swamping (MS) are two common effects that can cause severe problems. Identifying MS effects is essential to both outlier detection and longitudinal data analysis, because ignoring them can render the conclusions of an analysis meaningless or misleading.
This thesis proposes the forward search of the generalized estimating equation (GEE) method (FSGEE), a statistical method for analyzing and diagnosing longitudinal data sets. Starting from an outlier-free initial subset of the data selected using a robust method, FSGEE progresses to the next subset by expanding the current one according to the distance of the observations from the GEE model fitted to the current subset.
Statistical diagnostics are monitored throughout the forward search, and forward plots are produced by plotting these diagnostics against the sizes of the forward search subsets. MS effects can then be discovered by inspecting the forward plots of residuals. When the inclusion of an observation significantly affects the model and the diagnostics of other points, the observation is suspected to be an outlier. When necessary, a deeper understanding of the observation can be gained by examining the forward plots of various statistical diagnostics, for example changes in the values of the coefficients after the observation is included, or changes in the diagnostics of other observations when the suspected outlier is removed from the data set. This understanding helps in deciding whether the observation is a true outlier or just a non-outlying observation with relatively high leverage. Through simulation studies and the analysis of seizure data and hormone data, the forward search of the GEE method is shown to provide a wealth of information for guiding both outlier detection and the identification of MS effects.
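The forward search loop described above can be illustrated with a minimal Python sketch. It is not the thesis's FSGEE implementation and rests on several assumptions: an ordinary least-squares line stands in for the GEE fit (the point estimates coincide with GEE only under an independence working correlation and identity link), each observation is treated as its own unit rather than as part of a subject's repeated measurements, the squared residual serves as the distance, and the robust start is a least-median-of-squares style search over pairs of points. The function names and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import median


def fit_line(points):
    """Least-squares fit of y = a + b*x (stand-in for a GEE fit); returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def residuals(points, coef):
    """Raw residuals of every observation under the fitted line."""
    a, b = coef
    return [y - (a + b * x) for x, y in points]


def initial_subset(points, size):
    """Assumed robust start: the pair-based line with the smallest median
    squared residual, then the `size` observations closest to that line."""
    best = None
    for i, j in combinations(range(len(points)), 2):
        if points[i][0] == points[j][0]:
            continue  # slope undefined for a vertical pair
        coef = fit_line([points[i], points[j]])
        crit = median(r * r for r in residuals(points, coef))
        if best is None or crit < best[0]:
            best = (crit, coef)
    res = residuals(points, best[1])
    order = sorted(range(len(points)), key=lambda k: res[k] ** 2)
    return sorted(order[:size])


def forward_search(points, start_size=3):
    """Grow the subset one observation at a time, recording diagnostics."""
    subset = initial_subset(points, start_size)
    history = []
    while True:
        coef = fit_line([points[k] for k in subset])
        res = residuals(points, coef)  # residuals of ALL observations
        history.append({"size": len(subset), "subset": subset,
                        "coef": coef, "residuals": res})
        if len(subset) == len(points):
            return history
        # Next subset: the (m + 1) observations closest to the current fit.
        order = sorted(range(len(points)), key=lambda k: res[k] ** 2)
        subset = sorted(order[:len(subset) + 1])


# Hypothetical toy data: y is roughly 1 + 2x, with a gross outlier at index 4.
ys = [1.1, 2.8, 5.0, 7.2, 29.0, 10.9, 13.1, 14.9, 17.2, 18.8]
points = [(float(x), y) for x, y in enumerate(ys)]
history = forward_search(points)

# Forward-plot data: diagnostics as a function of subset size.
sizes = [step["size"] for step in history]
intercepts = [step["coef"][0] for step in history]
outlier_residuals = [step["residuals"][4] for step in history]
```

Plotting `intercepts` or `outlier_residuals` against `sizes` gives a simple forward plot. In this toy data the outlying observation stays out of the subset until the final step, and the fitted intercept shifts noticeably once it enters, which is the kind of signal the forward plots are meant to expose.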

dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights, (such as patent rights) and the right to use in future works.