Development of algorithms and tools for a decision support system to assist sheep breeders to design crossbreeding programs

Abstract

This study is part of a sheep crossbreeding decision support system (DSS) project that aimed to provide quality, decision-oriented information to help NZ sheep breeders choose appropriate crossbreeding systems. Unlike much 'mainstream' research, this study is based mainly on data from the published literature, with only limited data collected from farm trials. The main objective of the study was to develop algorithms and tools that support the development of the DSS and its delivery of quality information. These algorithms and tools can be categorised into four groups.
The first group was developed for systematically reviewing the published literature, categorising reported results and collating qualified data. A relational database was developed to store and manipulate the sheep crossbreeding data, using modern computing technologies and tools. Crossbreeding data, mainly least squares means (local means) and standard errors (SE), are organised in the database by genotype, trait, source and environment, to allow easy management and searching as well as further genetic analyses. Compatibility with the NZ Sheep Improvement Limited (SIL) database was also considered in the DSS database development to allow data exchange between the two.
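The organisation described above can be illustrated with a minimal sketch of one table keyed on genotype, trait, source and environment. The table name, column names and all rows below are hypothetical and are not taken from the DSS database or from SIL.

```python
import sqlite3

# Hypothetical single-table layout; the actual DSS schema is not given in the abstract.
SCHEMA = """
CREATE TABLE local_mean (
    id         INTEGER PRIMARY KEY,
    genotype   TEXT NOT NULL,   -- e.g. a pure breed or a cross
    trait      TEXT NOT NULL,
    source     TEXT NOT NULL,   -- publication or trial identifier
    farm_class TEXT NOT NULL,   -- environmental identifier
    year       INTEGER,
    mean       REAL NOT NULL,   -- least squares (local) mean
    se         REAL NOT NULL    -- standard error of the mean
);
"""

def build_database(rows):
    """Create an in-memory database and load (genotype, trait, source, farm_class, year, mean, se) rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO local_mean (genotype, trait, source, farm_class, year, mean, se) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        rows,
    )
    return conn

def find_local_means(conn, genotype, trait, farm_class):
    """Return (mean, se) pairs for one genotype, trait and farm class."""
    cur = conn.execute(
        "SELECT mean, se FROM local_mean "
        "WHERE genotype = ? AND trait = ? AND farm_class = ? ORDER BY id",
        (genotype, trait, farm_class),
    )
    return cur.fetchall()

# Made-up example rows, for illustration only.
example_rows = [
    ("BreedA", "weaning_weight", "study_1", "class_4", 1985, 24.1, 0.6),
    ("BreedA", "weaning_weight", "study_2", "class_4", 1990, 25.3, 0.9),
    ("BreedB", "weaning_weight", "study_1", "class_4", 1985, 26.0, 0.7),
]
conn = build_database(example_rows)
breed_a = find_local_means(conn, "BreedA", "weaning_weight", "class_4")
```

Keeping the SE alongside each mean means later weighted analyses can be run directly from a query result.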
The NZ farm class classification, which has been used by the NZ Meat and Wool Service for years to categorise NZ farms by topography, soil type, management style and regional location, was chosen as the environmental identifier for the crossbreeding data to be collated. It was recognised that this classification is less than perfect: by its nature it cannot differentiate within-farm or within-trial environmental influences, and it cannot reflect environmental changes over time. However, it was the closest available environmental classification that could provide useful categories for combining local results from different trials according to their relevance to particular farming systems, and such categories are necessary for the development of the other algorithms and tools in this study.
Using this algorithm, literature from sheep crossbreeding studies within NZ from 1972 onwards was reviewed intensively, and the useful data identified were collated into the database. It was found that most sheep crossbreeding experiments conducted in NZ to date were introductory studies that were unrelated to one another, small in scale, and designed mainly for breed comparisons rather than for explicitly studying heterotic effects. Data from these experiments covered a wide range of breeds, traits and NZ environments, but were generally sparse. In many cases, estimates of the performance merit of a genotype for a trait differed widely between studies and were, consequently, hard to use. The review also identified considerable information gaps, not only in the trait performance of the genotypes under consideration but also in the heterosis estimates. To assist the study, literature on decision support system development in agriculture, meta-analysis methodology for combining data, and methodology for estimating crossbreeding effects was also reviewed.
The second group was developed, consequently, for combining the local, replicated estimates from different studies into generalised means for each available genotype per farm class. The generalised results, also termed regional means, are the genotypic effects averaged across particular farm classes, ignoring specific within-region genotype x environment interactions, and are therefore applicable to the whole farm class or associated region. A weighted least squares approach was used in the data-combining process, where 1/SEi² was used as the weight for the ith local mean, breed (genotype) and year effects were fitted as factors, and a covariate was used for linear adjustment when necessary. Because a number of farm classes had too few local means, farm class could not be fitted as a factor in the analyses, and consequently the genotype x environment interaction at the farm class level was also ignored.
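The weighting scheme can be sketched in its simplest form: combining several local means of one genotype using weights of 1/SEi². This is a reduced illustration only; it omits the year factor and covariate adjustment mentioned above, and the function name and input values are hypothetical.

```python
import math

def combine_local_means(estimates):
    """Inverse-variance weighted mean of (mean, se) pairs.

    Each local mean is weighted by 1/SE^2. The returned SE is
    sqrt(1 / sum of weights), which holds when the local means are
    independent and no other factors are fitted.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    regional_mean = sum(w * m for w, (m, _) in zip(weights, estimates)) / total
    regional_se = math.sqrt(1.0 / total)
    return regional_mean, regional_se

# Made-up local means for one genotype in one farm class.
local = [(30.0, 1.0), (32.0, 2.0)]
regional_mean, regional_se = combine_local_means(local)
```

The more precise estimate (SE of 1.0) pulls the combined mean towards 30.0, which is the intended behaviour of inverse-variance weighting.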
Using the meta-analysis algorithm, analyses were performed per farm class (if possible) and the regional means and associated SEs for each available genotype were estimated within the farm class. These results were to be used in further analyses of crossbreeding effects. From these analyses, conflicts existing in the published papers were also detected, and research areas that needed further study were identified as well.
A detailed discussion was given of the factors that were likely to bias the analysis results. This covered the suitability of farm class as the environmental identifier, within-study and within-trial biases, the year effect, the weighting policy of using 1/SEi², the linear adjustment method when a covariate was used, and other unaccounted factors such as animal age and publication bias. The generalised data produced were therefore regarded as preliminary results, and further work on the process was suggested.
The third group was developed for the estimation of crossbreeding effects based on the generalised data. A computer program, named HeterosisEstimator, was developed specifically as a DSS tool to estimate crossbreeding effects for different crossbreeding plans (models). The program implements an algorithm for analysing a large number of crossbreeding plans, a genetic model that accounts for additive, dominance and additive x additive epistatic effects, and a weighted least squares routine (with 1/SE² of each regional mean used as the weight). Automation mechanisms for reading input data, forming the relationship matrix, calculating crossbreeding parameters and writing results to specified Microsoft Excel files were also implemented in the program, which considerably improved the speed and efficiency of the estimation analyses.
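As an illustration of this kind of estimation, the sketch below fits a weighted least squares model to regional means for a two-breed system. It is not the HeterosisEstimator code. It assumes a dominance-only model with parameters for the two breed effects and direct heterosis, and it ignores maternal and epistatic effects; all names and numbers are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def weighted_least_squares(design, means, ses):
    """Return estimates b = (X'WX)^-1 X'Wy with W = diag(1/SE^2)."""
    p = len(design[0])
    w = [1.0 / se ** 2 for se in ses]
    xtwx = [[sum(wi * row[j] * row[k] for wi, row in zip(w, design))
             for k in range(p)] for j in range(p)]
    xtwy = [sum(wi * row[j] * y for wi, row, y in zip(w, design, means))
            for j in range(p)]
    return solve(xtwx, xtwy)

# Columns: proportion of breed A, proportion of breed B, expected direct heterosis.
design = [
    [1.00, 0.00, 0.0],  # pure breed A
    [0.00, 1.00, 0.0],  # pure breed B
    [0.50, 0.50, 1.0],  # F1 (A x B)
    [0.75, 0.25, 0.5],  # backcross (F1 x A)
]
# Made-up regional means and SEs, constructed from g_A = 50, g_B = 60, h = 4.
means = [50.0, 60.0, 59.0, 54.5]
ses = [0.5, 0.6, 0.8, 1.2]
g_a, g_b, h_direct = weighted_least_squares(design, means, ses)
```

With fewer genotypes than parameters the normal equations become singular, which corresponds to the situation in which a parameter such as maternal heterosis cannot be estimated from the available data.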
A large number of estimates of underlying crossbreeding parameters, in particular direct and maternal heterosis, were produced from the estimation analyses. These estimates were based on existing crossbreeding data from the NZ sheep industry that had seldom been reused before, and should therefore be regarded as an extension of the available information. It was found that, for many breed combinations, the estimates of heterotic effects (either direct or maternal) ranged in value across different crossbreeding plans, indicating that ranges rather than single point values should be used in the subsequent prediction of crossbred performance. It was also found that maternal heterosis could not be estimated for many breed combinations owing to insufficient input data, which introduced uncertainty into the prediction of crossbred performance. The algorithm and the factors likely to affect the quality of the estimates of crossbreeding effects were discussed.
The fourth group was developed to demonstrate how the simulation algorithms and model can be used to explore the variation in crossbred performance predictions arising from different crossbreeding plans and from uncertain parameter estimates. A simplified dominance model, together with quantitative risk analysis concepts and technologies, was used in the simulation under a set of assumptions. Crossbreeding parameter estimates were fitted with normal distribution functions to cope with uncertainties. The simulation model was built using a Microsoft Excel add-in package named @Risk and a predefined design matrix for the genotypes under consideration. The relevant generalised regional means were used as target data for the simulation model, and the deviations of the predicted means from the target means were monitored during simulation to meet a predefined criterion.
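The idea of propagating parameter uncertainty into a performance prediction can be sketched as a small Monte Carlo run. This is a stand-in written in Python rather than the @Risk model, it covers a single F1 genotype under a dominance-only model, and it does not reproduce the monitoring of deviations against target means. All names and values are hypothetical.

```python
import random
import statistics

def simulate_f1(g_a, g_b, h_mean, h_sd, n_iter=10000, seed=1):
    """Draw direct heterosis from a normal distribution and predict F1 performance.

    Prediction per iteration: 0.5 * g_a + 0.5 * g_b + h, where h ~ N(h_mean, h_sd).
    Breed effects are held fixed here to keep the sketch short.
    """
    rng = random.Random(seed)
    return [0.5 * g_a + 0.5 * g_b + rng.gauss(h_mean, h_sd) for _ in range(n_iter)]

# Made-up breed effects and heterosis distribution.
draws = simulate_f1(g_a=50.0, g_b=60.0, h_mean=4.0, h_sd=1.0)
mean_pred = statistics.mean(draws)
sd_pred = statistics.stdev(draws)
```

The spread of `draws` gives a range for the predicted F1 performance rather than a single point value, which is the kind of output a risk analysis is meant to expose.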
The normal distribution functions were the inputs to the simulation model, and each simulation run consisted of many iterations. At the end of each run, sensitivity results for the monitored variables were analysed and the distribution functions were adjusted accordingly for the next run. When the criterion was met, the simulation was complete and the simulated results, including the point values of parameters and the predicted performance merits for genotypes within the current crossbreeding system, were produced as part of the information that the DSS is expected to provide to decision makers. An algorithm to calculate aggregate economic returns for each genotype, and consequently to identify the best genotypes within the current system, was also demonstrated using arbitrary relative economic weights for each trait under consideration. The simulation algorithms, genetic model and simulation model, together with the issues that arose during simulation and their solutions, were also discussed.
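The aggregate economic return can be read as a weighted sum of predicted trait merits. The sketch below assumes that simple linear form, which the abstract does not spell out; the trait names, relative economic weights and predicted merits are arbitrary, hypothetical values.

```python
def aggregate_return(merits, economic_weights):
    """Sum of (relative economic weight x predicted merit) over traits."""
    return sum(economic_weights[trait] * value for trait, value in merits.items())

def rank_genotypes(predicted, economic_weights):
    """Return (genotype, aggregate return) pairs sorted from best to worst."""
    scores = {g: aggregate_return(m, economic_weights) for g, m in predicted.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Arbitrary relative economic weights per unit of each trait (hypothetical).
economic_weights = {"weaning_weight": 1.0, "fleece_weight": 3.0}

# Made-up predicted performance merits for three genotypes.
predicted = {
    "BreedA": {"weaning_weight": 25.0, "fleece_weight": 4.0},
    "BreedB": {"weaning_weight": 28.0, "fleece_weight": 3.0},
    "F1":     {"weaning_weight": 27.0, "fleece_weight": 3.8},
}
ranking = rank_genotypes(predicted, economic_weights)
best_genotype = ranking[0][0]
```

Because the ranking depends directly on the chosen weights, changing them can change which genotype comes out best.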
A general discussion was given of the contributions that this study has made to the DSS and the NZ sheep industry, the advantages and disadvantages of the algorithms and tools, and suggestions for further development of the DSS. The major conclusions drawn are: a) This study has contributed to an improved understanding of sheep crossbreeding in NZ as a whole, through the systematic review of published literature and collation of identified results, the approach for combining local data to allow estimates of underlying crossbreeding parameters to be obtained, and the incorporation of these estimates into a simulation model that uses genetic prediction algorithms and risk analysis procedures to evaluate variation in crossbred performance predictions; b) The algorithms and tools developed in this study are important to the DSS and can be incorporated into it in the future; c) The DSS is a good solution for providing quality, decision-oriented information to NZ sheep breeders and helping with their crossbreeding practice; d) Merging the local data into regional means is crucial to the entire study. This is an open-ended and iterative process, as are the estimation and simulation processes.