Approaches to Modelling Heterogeneity in Longitudinal Studies

Abstract:

This thesis examines estimation bias in longitudinal data when the explanatory variables are
correlated with the individual effect. We first introduce longitudinal data and the
estimation methods commonly used for the general linear model: least squares and maximum
likelihood. We apply these methods to three simple models that are commonly used to analyse
longitudinal data. Second, we use frequentist and Bayesian analysis to explore the estimation
bias theoretically and empirically, with an emphasis on the heterogeneity bias. This
bias occurs when random effect estimation is used to analyse data with nonzero correlation
between the explanatory variables and the individual effect. We then compare the estimates
empirically with the true values. In this way, we demonstrate and verify the theoretical
formula that determines the size of the bias [Mundlak, 1978]. To avoid this bias, fixed
effect estimation should be used when the correlation is nonzero. The Hausman test is used
to confirm this.
However, the bias occurs not only in frequentist analysis but also in Bayesian estimation of
the random effect model. Finally, following the idea of Mundlak [1978], we define a special
Bayesian model that can serve both as a Hausman test and as a model for comparison. We also
prove that it is the best-fitting model among the random effect, fixed effect and pooled
models when the explanatory variables are correlated with the individual effect. Throughout
this thesis, we illustrate these ideas using examples based on real and simulated data.
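
As a minimal sketch of the heterogeneity bias described above, the following Python simulation compares a random effect (quasi-demeaned GLS) slope estimate with a fixed effect (within) estimate when the explanatory variable is correlated with the individual effect. The data-generating process, the function names, the parameter values and the assumption that the variance components are known are all illustrative choices made here; they are not taken from the thesis.

```python
import numpy as np


def simulate_panel(n=2000, t=5, beta=1.0, rho=0.8, seed=0):
    """Simulate a balanced panel (hypothetical design, for illustration only).

    alpha_i ~ N(0, 1) is the individual effect and
    x_it = rho * alpha_i + noise, so x and alpha are correlated when rho != 0.
    """
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n)
    x = rho * alpha[:, None] + rng.normal(size=(n, t))
    y = beta * x + alpha[:, None] + rng.normal(size=(n, t))
    return x, y


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x with an intercept."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / (xc * xc).sum())


def fixed_effect_slope(x, y):
    """Within estimator: subtract each individual's own mean, then run OLS."""
    xw = x - x.mean(axis=1, keepdims=True)
    yw = y - y.mean(axis=1, keepdims=True)
    return float((xw * yw).sum() / (xw * xw).sum())


def random_effect_slope(x, y, sigma_a2=1.0, sigma_e2=1.0):
    """Random effect GLS via quasi-demeaning.

    Assumes the variance components sigma_a2 (individual effect) and
    sigma_e2 (idiosyncratic error) are known, which is a simplification.
    """
    t = x.shape[1]
    theta = 1.0 - np.sqrt(sigma_e2 / (sigma_e2 + t * sigma_a2))
    xq = x - theta * x.mean(axis=1, keepdims=True)
    yq = y - theta * y.mean(axis=1, keepdims=True)
    return ols_slope(xq, yq)


if __name__ == "__main__":
    x, y = simulate_panel()
    print("true slope:          ", 1.0)
    print("pooled OLS estimate: ", ols_slope(x, y))
    print("random effect (GLS): ", random_effect_slope(x, y))
    print("fixed effect (within):", fixed_effect_slope(x, y))
```

Under this design the fixed effect estimate should lie close to the true slope, while the random effect and pooled estimates should be pulled away from it, because the individual effect left in the error term is correlated with the regressor. Setting rho to zero in simulate_panel removes the correlation, in which case all three estimates should roughly agree.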