Abstract

This thesis considers two issues. First, the traditional problem of optimal sample design is examined without the usual constraint that the same set of data items be collected from every unit selected in the survey. In particular, this thesis allows any set of data items to be collected from a given unit. Such surveys, referred to as Split Questionnaire Designs (SQDs), have historically been used to manage respondent burden. While addressing the issue of respondent burden, this thesis develops an approach for finding the sets of data items to be collected in a survey, and the number of sample units from which to collect each set, so as to achieve the optimal trade-off between the cost of the design and the accuracy of the estimates. The parameters that determine the accuracy and cost of an SQD are clearly identified. The estimation of means, regression coefficients and the probabilities associated with a contingency table is considered.

Second, estimation of and inference about the fixed and random effects of linear mixed models (LMMs) with missing continuous covariates are considered. Missing data occur commonly in practice. It is well known that restricting the analysis to observations with no missing values, called the complete case approach, can lead to biased estimates. The thesis develops a method of estimation and inference that is easy to implement and can significantly improve the reliability of inferences compared with those obtained from the complete cases alone. Developing closed-form expressions for the accuracy of parameter estimates in a mixed model, for a given allocation, is a major step towards optimal SQD allocation for mixed model analysis.