Abstract

This paper deals with Bayesian inferences of animal models using Gibbs sampling. First, we suggest a general and efficient method for updating additive genetic effects, in which the computational cost is independent of the pedigree depth and increases linearly only with the size of the pedigree. Second, we show how this approach can be used to draw inferences from a wide range of animal models using the computer package Winbugs. Finally, we illustrate the approach in a simulation study, in which the data are generated and analyzed using Winbugs according to a linear model with i.i.d errors having Student’s t distributions. In conclusion, Winbugs can be used to make inferences in small-sized, quantitative, genetic data sets applying a wide range of animal models that are not yet standard in the animal breeding literature.

INTRODUCTION

The beginning point for the development of new models in quantitative genetics often originates with a fixed model that assumes individual animals to be independent. The idea is to include the effect of additive gene action on a linear additive scale, together with other explanatory variables, and then to assume a multivariate normal distribution for the unobserved additive genetic effects. Examples include the mixed model for ordered categorical data (Sorensen et al., 1995), which is based on the threshold liability concept first proposed by Wright (1934), and the mixed model for survival data (Ducrocq and Casella, 1996), which is based on the Cox proportional hazards model (Cox, 1972).

Although well-documented software often exists for the fixed analog of the additive genetic model, and in some cases also a simple mixed model, the applicability of such software in animal breeding is often limited. The reason is that it is impossible to set up the additive genetic relationship matrix for models with a pedigree structure more complex than that of a sire model. Hence, the limiting factors for applying new models suggested in the statistical literature to analyze quantitative genetic data are the time and resources that necessarily must be allocated to the development of new software.

In this paper the goal was first to develop a computationally efficient method for updating additive genetic effects, and second to show how this method could be used to analyze genetic data using Bayesian probability models with the flexible computer package Winbugs (Spiegelhalter et al., 2006). The idea was first proposed by Mrode and Thompson (1989), in which they parameterized the mixed linear model in terms of the transformed additive genetic effects based on the usual partition of the additive genetic relationship matrix (Henderson, 1976). Finally, the approach was illustrated in a simulation study, in which the data were generated according to a linear model, with i.i.d errors having Student’s t distributions.

MATERIALS AND METHODS

Animal Care and Use Committee approval was not obtained for this study because only no animals were used.

The Bayesian Additive Genetic Model

Bayesian probability models are constructed by specifying the joint distribution, say p(y, 𝛉), of the observable y, which is the data, and the unobservable 𝛉, which comprises the model parameters, missing data, and so on. As soon as the data are observed, model inferences are based on the posterior distribution p(y | 𝛉). Using Bayes’s theorem, the posterior distribution is, up to proportionality, given by the product of the sampling distribution (likelihood) p(y | 𝛉) and the prior p(𝛉):

For the underlying genetic model, we assume an additive genetic infinitesimal model (Bulmer, 1980). We will classify the parameter vector into a vector of additive genetic effects, a, and associated covariance matrix, G, and the remaining parameters, η. Moreover, we adapt the often-used assumption of conditional independence, which implies that the data are independent given the model parameters. A priori, we assume that p(𝛉) = p(η)p(a | G)p(G). According to quantitative genetic theory, a | G is assumed to be multivariate normally distributed, with a mean vector 0 and a covariance matrix G ⊗ A, where A is the additive genetic relationship matrix. The prior specification for η and G will depend on the problem at hand and will be left unspecified until, in the last part of this paper, an example is considered. Under the above assumption, a Bayesian additive genetic model may be represented by

Reparameterized Model

In the following model, the additive genetic effects are reparameterized according to a = (L ⊗ TD1/2)γ, where L is the lower triangular Cholesky decomposition of G (G = LL′), and T and D correspond to the factorization A = TDT′ (Henderson, 1976). This implies that the vector γ is standard, multivariate normally distributed. By the rules for the multidimensional change of random variables, the posterior distribution for the reparameterized model is, up to proportionality, given by

where a has been exchanged by (L ⊗ TD1/2)γ, and the prior distribution for L is given by the prior distribution for G and the transformation G = LL′. For G to be positive definite, the diagonal elements of L must be greater than zero.

The parameter vector 𝛉̃ is given by (η, γ,L). Therefore, if Markov chain Monte Carlo methods are used for the inferences, then posterior samples are obtained of γ and L (Metropolis et al., 1953; Hastings, 1970). These are, however, easily transformed to posterior samples of a and G, and Bayesian inferences including posterior point estimates and credibility intervals are done as usual.

It is important to note that the prior distribution is not generally invariant to the reparameterization. As an example, consider the situation in which we have only a direct additive genetic effect, set G = σ2a, from which it follows that . If here we assume a priori a bounded uniform prior for L on the interval (a, b) with 0 < a < b, then the prior distribution for σ2a is given by ½ (σ2a)−1/2/(b2 − a2) on the interval (a2,b2), which is bounded but not uniform.

Gibbs Updating

In a Gibbs sampler, posterior samples are obtained by repeatedly updating from the full conditionals (Sorensen and Gianola, 2002). These are easily derived, at least up to proportionality, by retaining the parameters of interest from the posterior distribution.

From the reparameterized posterior distribution (equation [2]), it follows that updating from the full conditional for γ requires the computation of (L ⊗ TD1/2)γ . Because the element Tij of T is the fraction of genes of animal i inherited by animal j, the average number of nonzero elements in one row of T is the average number of ancestors per animal. The T matrix is therefore relatively sparse, and it would be an inefficient approach to form T and evaluate (L ⊗ TD1/2)γ explicitly. Fortunately, Quaas (1989) presented a recursive algorithm for calculating v = Ts (where s is an arbitrary vector), working directly from a list of sires and dams and therefore with cost independent of the pedigree depth. Thus, the matrix T is never formed. Assuming that animals have been sorted from the oldest to the youngest and renumbered from 1 to q (parents precede offspring), the recursive formula is

where the subscripts si and di are the sire and dam index of animal i. If a sire (dam) is unknown, then vsi = 0 (vdi = 0). This result easily extends to the calculation of a = (L ⊗ TD1/2)γ.

Clearly, this result is useful for updating the elements of γ = (γ1, ..., γq), irrespective of whether it is done jointly or univariately. We will focus on the latter, because this allows us to use the standard computer package Winbugs. In Table 1, the general idea of how to update the elements γ1, ..., γq for the univariate case is sketched. A specific example of how to fit an animal model in Winbugs is given in the next section. For ease of representation, one trait with a direct additive genetic effect is considered, in which case L is a scalar and equal to . Extension to several additive genetic effects and multivariate models is straightforward but notationally cumbersome and will not be considered here.

Different strategies for univariate updating of parameters can be applied if the normalizing constant cannot be found. These include adaptive rejection sampling (Gilks and Wild, 1992; Gilks and Best, 1995) and a Metropolis-Hastings algorithm (Metropolis et al., 1953; Hastings, 1970). In situations in which it is possible to find the normalizing constant, direct updating using standard methods is an obvious choice. Winbugs frees the user from these computational details. This advantage makes Bayesian inference applicable for a wider audience. However, it is still the analyst’s responsibility to ensure that the model assumptions are satisfied; otherwise, misleading inferences may be obtained. In the future, model validation will be even more important because computational advances will enable the fitting of increasingly complex models.

EXAMPLE

Student’s t Animal Model

To illustrate the method, we generate data according to an animal model with errors independently and identically distributed as t(0, σ2, ν). Here, σ2 is a positive scale parameter and ν is a degrees of freedom parameter. If ν is ≤ 2, then the variance of the Student’s t process is infinite. When ν goes toward infinity, the Student’s t process tends toward a normal process (e.g., t(0, σ2, ν) → N(0, σ2) for ν → ∞). Compared with the normal distribution, the Student’s t distribution has thicker tails and is therefore less sensitive to outliers, which may have a marked impact on inferences in the normal model (Zellner, 1976). The use of t distributed errors in models for analyzing quantitative genetic data was addressed only recently (Strandén and Gianola 1999). I am not aware of any easy-to-use software for inferring this model, and the possibility of using Win-bugs is therefore relevant.

Let Y = (Y1,...,Yn) denote the random vector of observations. The sampling distribution is given by

where xi′ is the ith row of the n × p design matrix X, β is the p × 1 vector of fixed effects, zi′ is the ith row of the n × q design matrix Z, and a is the q × 1 vector of additive genetic effects that, according to quantitative genetic theory, is assumed to be multivariate normally distributed with mean vector 0 and covariance matrix Aσ2a. The matrix A is the additive genetic relationship matrix and σ2a is the additive genetic variance in the base population 𝛉 is the complete parameter vector, and a priori we assume that β1, ..., βp, (a, σ2a), σ2, and ν are independent.

Data Simulation

The simulation study was designed to mimic a pedigree structure from a potential study in pig breeding. More specifically, the pedigree was generated by selecting a multiplier herd from Danbred’s database, which is the major pig breeding organization in Denmark, and tracing the sows with first farrowing in the period from January 1, 2002, to December 31, 2002, back 3 generations. Altogether, the pedigree included 2,284 animals and the records were simulated for the 2,084 oldest animals according to model [4]. In addition to σ2 = 100 and ν = 10, the model included a fixed effect with 2 levels equally assigned to the animals (β1 = 100, β2 = 120), and an animal effect with additive genetic variance (σ2a = 30).

Data Analysis

The data were analyzed with the same model as used to generate the data. The assumptions were vague, normally distributed priors for fixed effects [N(0,1000)], gamma distributed priors for the additive genetic variance component and the scale parameter [gamma(0.001,0.001)], and a bounded uniform prior for the degrees of freedom parameter [uniform(2,100)].

By studying the instructive examples given in the manual for Winbugs (Spiegelhalter et al., 2006), it follows that data and model specification can be done in various ways. Data for the present analysis are given in 2 files (Table 2). The first data file included the actual data for animals without and with records, in which a zero indicated missing data or an unknown parent. Because the “if-then” statement does not exist in Win-bugs and animals without and with records are handled differently, the second data file included 2 vectors that pointed to the position of the animals without and with records in the first data file.

Except for the line numbers, Table 3 is a copy of the model specification in Winbugs that was used to analyze the simulated data. The model was fitted by running 3 independent chains for a total of 10,000 iterations, in which the model parameters were saved every 10th iteration. After the initial values for levels of fixed effects, genetic variance, scale parameter, and degree of freedom parameter were assigned, initial values for additive genetic effects were generated by Winbugs. Different beginning values were used in each chain (see Table 4). Although convergence for additive genetic effects was not checked, it was concluded, based on plots of the Gibbs chains for the remaining parameters, that an acceptable degree of convergence had been obtained. This was further supported by the fact that posterior summary statistics were very similar for the 3 chains (Table 4). The scale parameter and the degree of freedom parameter were the most poorly behaved parameters in terms of mixing, with an autocorrelation at lag 20 (saved iterations) equal to 0.18 and 0.44, respectively, for all 3 chains. Note that those parameters also are the parameters for which the posterior summary statistics varied the most, and one may consider extending the Gibbs chain to decrease the Monte Carlo error. For posterior summarization, the first 100 saved iterations were discarded and results for the 3 chains are given in Table 4.

Model Extensions

A clear advantage with the model specification in Winbugs is that very few modifications are necessary if one is interested in fitting alternative models. If, for example, we want to run an animal model with normal errors, then we only need to change “dt(.)” in line 3 to “dnorm(.)” in line 3. If we want to fit additional fixed or random effects, then these should be included on line 4, and an obvious place to include their prior specification is after line 17. Finally, if we want to estimate heritability, then the formula for this variable can be defined after line 22. Note that the second argument in the “dnorm” statement, which is the code for a normal distribution, is not the variance, but is one over the variance.

In the post-Gibbs analysis, one may take advantage of the tools available in Winbugs. Alternatively, one can use the facility for outputting Gibbs samples in a format that is compatible with the post-Gibbs analysis computer package, CODA (Best et al., 1997).

Results

The posterior mean values and 95% highest posterior density regions given in Table 4 show that the parameters can be correctly inferred using the proposed approach in Winbugs. In all cases, the 95% highest posterior density regions assign relatively high probability mass to values of the parameters in the neighborhood of the true values.

DISCUSSION

A general and computationally efficient method for updating additive genetic effects in Bayesian inferences of animal models using Gibbs sampling was presented. It also was shown how the method can be used to draw inferences from animal models using the flexible computer package Winbugs (Spiegelhalter et al., 2006).

The idea sketched here for updating additive genetic effects is computationally efficient and relevant not only for fitting models in Winbugs. The reason is that the cost associated with updating additive genetic effects is independent of the pedigree depth and increases linearly only with the size of the pedigree. The idea of parameterizing an animal model in terms of transformed additive genetic effects is not new (Mrode and Thompson, 1989). In a Bayesian implementation of a genetically structured residual variance model, Sorensen and Waagepetersen (2003) used Langevin-Hastings (Besag, 1994) to jointly update transformed additive genetic effects. According to these authors, it is easier to obtain a well-mixing Markov chain Monte Carlo algorithm for the posterior distribution of the transformed variables than the original genetic effects. We have had a similar experience in a Bayesian implementation of the semiparametric Cox model with time varying genetic effects (Damgaard, 2006).

The flexibility of Winbugs is best illustrated by looking at the long list of examples provided with the documentation. Among many others, these examples include models for continuous data, counting data, binary and categorical data, and survival data. Following the example in this paper, it should be straightforward to extend these models to animal models and use Winbugs to make additive genetic inferences. In addition to the possibility of fitting a wide range of standard Bayesian models, it is also possible for the user to specify his or her own distribution at each step of Bayesian model building.

As soon as data have been set up for Winbugs, it is very easy to fit alternative models. This feature enables the analyst to consider a large group of competing models and use Bayesian model selection criteria to select the best one for drawing inferences (e.g., Sorensen and Gianola, 2002). This is clearly an advantage in that it will decrease the risk of using an incorrect or inadequate model to interpret data. The reason is that a larger group of models will tend to be considered, compared with a situation in which resources must be allocated to implement new software to fit alternative models.

Unfortunately, the use of Winbugs in quantitative genetics is limited to relatively small data sets, beginning from a few thousand records. The analysis performed in this paper took about 1 h on a Pentium computer (1.6 GHz, 512 MB RAM). Hence, it is unlikely that we will see Winbugs used for genetic evaluation. Its applicability is limited to inference in small populations and experiments designed with the number of records, for example, limited by a high recording cost, such as in many behavioral studies. In addition, Winbugs may in turn be useful for educational purposes.