Outline

Background: In clinical trials with longitudinal counts as the primary endpoint, interest often focuses on the entire profile (e.g., the difference in slope between groups) rather than on a single time point. Accordingly, the primary efficacy analysis of longitudinal counts rests on a carefully pre-defined statistical model that must be specified in the statistical section of the protocol or in the statistical analysis plan. This poses challenges for model choice and model validation, since the presumed model may affect the conclusions about the key scientific parameters. Besides the choice of the random- and fixed-effects structure, several distributions can be assumed for the count efficacy data: Poisson, zero-inflated Poisson, negative binomial (NB), or a normal distribution after a variance-stabilizing transformation. As frequentist strategies for model assessment and diagnosis are cumbersome in random-effects settings and have several limitations, it is of interest to explore Bayesian tools that provide the needed decision support.
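To make the overdispersion issue concrete, the following sketch simulates longitudinal NB counts with a subject-level random intercept via the Poisson-gamma mixture. All names and parameter values (effect sizes, dispersion, numbers of subjects and visits) are illustrative assumptions, not values taken from the trial.

```python
import numpy as np

rng = np.random.default_rng(42)

n_subjects, n_visits = 50, 6
beta0, beta_slope = 1.0, -0.1   # intercept and time slope on the log scale (assumed)
sigma_b = 0.5                   # SD of the subject-level random intercept (assumed)
disp = 2.0                      # NB dispersion: Var = mu + mu**2 / disp (assumed)

t = np.arange(n_visits)
b = rng.normal(0.0, sigma_b, size=n_subjects)       # random intercepts
mu = np.exp(beta0 + b[:, None] + beta_slope * t)    # subject-by-visit means

# NB counts via the Poisson-gamma mixture: gamma-distributed rates, then Poisson
rates = rng.gamma(shape=disp, scale=mu / disp)
y = rng.poisson(rates)

print(y.shape)             # (50, 6)
print(y.var() > y.mean())  # overdispersion: marginal variance exceeds the mean
```

A plain Poisson model would force the variance to equal the mean, which the random intercept and the NB dispersion both violate here.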

Material and Methods: We perform a simulation study to investigate the discriminatory power of Bayesian tools for model criticism and apply them to an open trial on vertigo attacks. A novel Bayesian approach based on integrated nested Laplace approximations (INLA), proposed by Rue et al. [1], is used for fitting generalized linear mixed models (GLMMs). With this approach the posterior marginal distributions are accurately approximated in a fully automated way. To challenge a model, the INLA methodology enables the computation of leave-one-out predictive measures without refitting the posited model. We evaluate rival GLMMs for count outcomes according to the deviance information criterion (DIC), the probability integral transform (PIT), the log marginal likelihood, and proper scoring rules, which allow ranking and validating the various model alternatives.
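As a toy illustration of two of the predictive checks named above (not the INLA implementation itself), the sketch below computes the logarithmic score and the randomized PIT for a discrete outcome under a hypothetical Poisson leave-one-out predictive distribution; the predictive mean and the observed count are made-up values.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability mass function."""
    return math.exp(-mu) * mu**y / math.factorial(y)

def poisson_cdf(y, mu):
    """Poisson CDF by summing the pmf (fine for small counts)."""
    return sum(poisson_pmf(k, mu) for k in range(y + 1))

def log_score(y_obs, mu):
    """Logarithmic score: negative log predictive probability of the
    observation (in this orientation, smaller is better)."""
    return -math.log(poisson_pmf(y_obs, mu))

def randomized_pit(y_obs, mu, u):
    """Randomized PIT for discrete outcomes: a draw uniform on
    [F(y-1), F(y)], so a well-calibrated model yields U(0,1) values."""
    lo = poisson_cdf(y_obs - 1, mu) if y_obs > 0 else 0.0
    hi = poisson_cdf(y_obs, mu)
    return lo + u * (hi - lo)

# Example: observed count 3 under a Poisson(2.5) predictive
print(round(log_score(3, 2.5), 3))          # 1.543
print(0.0 <= randomized_pit(3, 2.5, 0.5) <= 1.0)
```

In practice these quantities come out of the fitted model's leave-one-out predictive distributions; histograms of the PIT values far from uniformity flag miscalibration.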

Results: The ability of the logarithmic scoring rule and the DIC to discriminate between Poisson and NB mixed-effects models was illustrated in a simulation study with a longitudinal NB data-generating process that mimics characteristics of the clinical trial. The simulation procedure assumed varying degrees of overdispersion and sample sizes. Under the given conditions, it becomes obvious that the naïve choice of a random-effects Poisson model is often inappropriate for real-life count outcomes [2].
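A minimal version of such a comparison, assuming a heavily overdispersed NB data-generating process and evaluating both models at the true mean (a deliberate simplification of the fitted mixed models in the study), can be sketched as:

```python
import math
import numpy as np

rng = np.random.default_rng(7)

mu, r = 4.0, 1.0   # assumed NB mean and dispersion; Var = mu + mu**2 / r = 20
n = 5000

# Draw overdispersed counts from the NB via its Poisson-gamma mixture
y = rng.poisson(rng.gamma(shape=r, scale=mu / r, size=n))

def pois_logpmf(k, lam):
    """Poisson log pmf."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def nb_logpmf(k, mu, r):
    """NB log pmf, parameterized by mean mu and dispersion r."""
    p = r / (r + mu)
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log(1.0 - p))

# Mean logarithmic score (negative orientation: smaller is better)
ls_pois = -np.mean([pois_logpmf(k, mu) for k in y])
ls_nb = -np.mean([nb_logpmf(k, mu, r) for k in y])
print(ls_nb < ls_pois)  # the NB model is rewarded on overdispersed data
```

The Poisson model's thin tail is penalized heavily by the log score whenever large counts occur, which is exactly the behavior the simulation study exploits.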

Discussion: We conclude that Bayesian methods are not only appealing for inference but also provide better insight into different aspects of model performance, such as forecast verification, calibration checking, and model selection. The mean logarithmic score provides a robust tool for model ranking that is not sensitive to sample size.