Received 2018 January 18; Revised 2018 March 20; Accepted 2018 May 15.

Abstract

Objective

The purpose of this study was to investigate a single step genome-wide association study (ssGWAS) for identifying genomic regions affecting reproductive traits in Landrace and Large White pigs.

Methods

The traits included the number of pigs weaned per sow per year (PWSY), the number of litters per sow per year (LSY), pigs weaned per litters (PWL), born alive per litters (BAL), non-productive day (NPD) and wean to conception interval per litters (W2CL). A total of 321 animals (140 Landrace and 181 Large White pigs) were genotyped with the Illumina Porcine SNP 60k BeadChip, containing 61,177 single nucleotide polymorphisms (SNPs), while multiple traits single-step genomic BLUP method was used to calculate variances of 5 SNP windows for 11,048 Landrace and 13,985 Large White data records.

Results

The outcome of ssGWAS on the reproductive traits identified twenty-five and twenty-two SNPs associated with reproductive traits in Landrace and Large White, respectively. Three known genes were identified to be candidate genes in Landrace pigs including retinol binding protein 7, and ubiquitination factor E4B genes for PWL, BAL, W2CL, and PWSY and one gene, solute carrier organic anion transporter family member 6A1, for LSY and NPD. Meanwhile, five genes were identified to be candidate genes in Large White, two of which, aldehyde dehydrogenase 1 family member A3 and leucine rich repeat kinase 1, associated with all of six reproduction traits and three genes; retrotransposon Gag like 4, transient receptor potential cation channel subfamily C member 5, and LHFPL tetraspan subfamily member 1 for five traits except W2CL.

Conclusion

The genomic regions identified in this study provided a start-up point for marker assisted selection and estimating genomic breeding values for improving reproductive traits in commercial pig populations.

INTRODUCTION

The genetic improvement is one approach to improve reproductive performance. However, the reproduction traits are characteristic of low heritability and difficult using conventional selection method to improve. The conventional selection method may provide a lower accuracy, in comparison to the whole genome selection [1]. In the current development of molecular technique, such as single nucleotide polymorphism (SNP) chip has been widely used in for genome-wide association study (GWAS) to be a powerful tool in the identification of genomic regions or quantitative trait loci (QTL) related to an economically important trait. Single-step GWAS (ssGWAS) is the new GWAS approach which utilized all data (genotypes, phenotypes, and pedigree information) jointly in one step, proposed by Wang et al [2]. This approach can use for many models and computing is fast and simplicity [2]. In pigs have been GWAS study using SNP chip in reproduction traits especially litter trait such as the number of born alive (NBA), total number born, mummy (MUM), stillborn (SB) and total litter birth weight [3,4]. While, no previous literature has studied the GWAS of pig weaned per sow per year (PWSY), litter per sow per year (LSY), pigs weaned per litters (PWL), born alive per litters (BAL), non-productive day (NPD), and wean to conception interval per litters (W2CL). Therefore, finding the genomic regions and candidate genes in the regions of significant SNPs from the whole genome that related to the reproduction trait can be used as a powerful tool for selection to obtain the high reproductive performance. The objective of this study was to investigate a GWAS for identifying novel regions affecting on the reproduction trait in Landrace and Large White pigs.

MATERIALS AND METHODS

Animal care

Institutional Animal Care and Use Committee (IACUC) approval was not obtained for this study because the data were obtained from an existing database on pig breeding.

Animals and data

The data used for this study collected from Thailand commercial herds; extracted from the SowTracker (Version 3.4.7) reproductive data management software. The number of animals considered in the analyses was different for each trait because of the lack of data for some animals. A total of 11,048 Landrace and 13,985 Large White pigs were collected data records with 13,351 and 16,731 pedigree records respectively. These sows were raised in six farms. The reproduction traits included PWSY, LSY, NPD, W2CL, BAL, and PWL were recorded to a maximum of 10 parities between 2006 through 2015. PWSY was calculated as litter per sow per year multiplied by pigs weaned per litter; LSY was calculated as (the number days of gestating divided by 115 days) divided by (the number of days in the breeding herd divided by 365 days); W2CL, BAL, and PWL were collected on every litter and divided by total litters; the NPD was calculated as 365 minuses productive days multiplied by LSY, productive days was the total number of days that all gilts and sows were either gestating or lactating. Then checks the normal distribution before basic statistical analyses such as mean, standard deviation and coefficient of variation.

Genotype data and quality control

Used 321 animals; 140 Landrace and 181 Large White pigs were genotyped with the Illumina Porcine SNP 60k BeadChip, contained 61,177 SNPs. The quality control for genotypes SNPs of each breed following criteria was: SNP call rates <0.90, genotype call rates <0.90, minor allele frequencies <0.05, Monomorphic and checks parent-progeny Mendelian conflicts were selected for further analysis. A total of 129 Landrace and 175 Large White pigs with a total of 47,590 and 47,865 SNPs respectively were available for the genome-wide association analyses in this study.

Genome-wide association analysis

Single-step genome-wide association study

The genome-wide association analysis was estimated by using single-step genomic BLUP (ssGBLUP) [2]. GWAS by ssGBLUP can be called ssGWAS. In this methodology, multivariate and separate breed analyzed the data. The statistical model was used:

y=Xβ+Zu+e

Where y represented a vector of observations (PWL, BAL, W2CL, LSY, PWSY, and NPD), β is a vector of fixed effects. The fixed factors used in this study were last farrow-month, last farrow-year, last parity and farm, u is a vector of additive genetic effects, which was assumed to be distributed N(0,σu2), e is a vector of residual effects, which was assumed to be distributed as N(0,Iσe2), X is the incidence matrix related records to fixed effects; and Z is the incidence matrix related records to additive genetic effects.

The genetic variance component was obtained by using the restricted maximum likelihood (REML) and all analyses for REML, BLUP, and ssGWAS were run by using the BLUPF90 software [5]. In the animal model, the inverse of the numerator relationship matrix (A−1) was replaced by H−1 that combines the pedigree and genomic information [6].

Where,H-1=A-1+[000G-1-A22-1]

Where G−1 is the inverse of the genomic relationship matrix and A22-1 is the inverse of the pedigree-based relationship matrix for genotyped animals. The G matrix is a genomic relationship that constructed weighting each SNP effect by its expected variance in an iterative procedure can be created by following [7] as:

G=ZDZ′q

Where Z is a matrix relating genotypes of each locus (0, 1, or 2) adjusted for allele frequencies, D is a diagonal matrix of weights for variances of SNP effects (initially D = I), and q is a weighting factor. The weighting factor ensuring the average diagonal in G which is close to that of A22 [8]. The SNP effects and weight for ssGWAS can be derived as follows [2]:

Let D = I in the first step and calculate G matrix; G = ZDZ’q

Calculate GEBVs for the entire animal in the data set using ssGBLUP.

Convert GEBVs to SNP effects (û): û = qDZ’[ZDZ’q]−1âg, where û is a vector of SNP marker effects, and âg is the animal effects of genotyped animals.

Calculate weights for variances of SNP effects: di=u^i22Pi(1-Pi), where di is the genetic additive variance by each SNP marker, u^i2 is the square of the i-th SNP marker effect, Pi is the allele frequency of the second allele of the ith marker in the current population [9].

Normalized SNP weight to remain the total variance constant.

Calculate G matrix; G = ZDZ’q

Exit or Loop to step 2

The iterative process was repeated two times from step 2 to 7, and the percentage of genetic variance explained by i-th consecutive SNPs (SNP window) was calculated as described by Wang et al [2]:

Var(ai)σa2×100%=Var(Σj=i5Zju^j)σa2×100%

Where, ai is the genetic value of the i-th region that consists of consecutive 5 SNPs, σa2 is the total genetic variance, Zj is a vector of the gene content of the j-th SNP for all individuals, and ûj is marker effect of the j-th SNP within the i-th region.

In this study used 5 SNP window because it has been reported by Beissinnger et al [10] that both sliding windows of 5 or 10 SNPs had the most favorable ratio of detection rate to false-positive rate than larger window sizes.

Candidate gene search

The consecutive SNPs which explained 1% or more than of genetic variance were selected as SNPs that considered to associated with reproduction traits. These regions were used to determine possible putative QTL or candidate genes based on the regions those within the gene. Gene search was carried out by using the Sus scrofa Build 11.1 assembly database release on August 2017 and GeneCards (https://www.genecards.org/) for identification of biology function of the associated genes. In this study used HGNC gene symbols referenced from the HUGO Gene Nomenclature Committee. If the genes were not found in these regions, then it was considered flanking regions about 2.0 Mb upstream or downstream of QTL regions to possibly represent the locus [11]. Previously identified QTL in the pig genome was evaluated by using the PigQTLdb (http://www.animalgenome.org/cgi-bin/QTLdb/SS/index) [12].

RESULTS AND DISCUSSION

This study is preliminary of research using genomic information to apply for field data in commercial pigs of Thailand and need to beware of using small data size. However, we use ssGWAS because this approach utilizes all available information jointly in one step and has been validated using field data which more precise estimates of variance components by including non-genotyped animals if the number of genotyped animals is limited [13].

Genetic parameters estimation

Heritabilities calculated from the variance components are shown in Table 1. The estimated of heritabilities were low to moderate for all traits, ranging from 0.07 to 0.25. The results indicated that Landrace pigs had a heritability of six traits slightly lower than Large White pigs. The heritabilities of PWL, BAL, W2CL, PWSY, LSY, and NPD were 0.09, 0.12, 0.08, 0.17, 0.13, and 0.18, respectively in Landrace. Meanwhile, heritabilities in Large White pigs were 0.12, 0.14, 0.07, 0.25, 0.18, and 0.25, respectively. The genetic and phenotypic correlation estimates for reproduction traits are given in Table 2. The genetic correlations between NPD and the five reproduction traits in Landrace and Large White were −0.25 and −0.26 for PWL, −0.13 and −0.08 for BAL, 0.35 and 0.53 for W2CL, −0.99 and −0.99 for LSY, −0.67 and −0.65 for PWSY. The phenotypic correlation between NPD and PWL, BAL, W2CL, LSY, and PWSY were 0.06, 0.04, 0.35, −0.99, and −0.35, respectively in Landrace. For Large White the phenotypic correlation between NPD and PWL, BAL, W2CL, LSY and PWSY are 0.08, 0.05, 0.39, −0.99, and −0.31, respectively. Which, the phenotypic correlation of NPD and PWL and BAL has low magnitudes.

Genetic correlations (above the diagonal) and phenotypic correlations (below the diagonal) for reproduction traits in Landrace and Large White pigs

Genome-wide association study

The ssGWAS results of the 6 measured traits of Landrace and Large White were shown in Figures 1, 2, respectively. Figure 1 and 2 showed the plots of genetic variances explained by each 5-SNP sliding windows showed in a Manhattan plot. Different shades represented SNP on a different chromosome from Sus scrofa chromosome (SSC)1 (left) to X and unmapped (right). In total, there were 47,590 and 47,865 regions in Landrace and Large White, respectively.

Manhattan plot of genetic variance contributed by an SNP window of 5 consecutive SNP for reproduction traits in Large White pigs. SNP, single nucleotide polymorphism.

Genomic regions were found to be associated with reproduction traits for Landrace and Large White pigs in Table 3 and Table 4, respectively, together with the candidate genes and associated SNPs within each region. In PigQTLdb has been reported QTL effect on reproduction traits, 144 QTL were identified for NBA and 4 QTL for the number of weaned (November 2017). Our study had detected the new genomic regions for pig reproduction traits which did not overlap with QTL intervals previously reported from PigQTLdb (http://www.animalgenome.org/cgi-bin/QTLdb/SS/index).

The regions of 5 SNP windows which explained >1% of genetic variance for six reproduction traits in Large White, with a list of annotated genes

For Landrace pigs, the QTL regions were identified on SSC 2, 6, 14, and X. A total 60 regions were significantly associated (SNP windows that explained more than 1% of genetic variance) with reproduction traits for all six traits which were included in 11, 14, 13, 8, 6, and 8 regions for PWL, BAL, W2CL, LSY, PWSY and NPD, respectively. When the consideration of overlapped of QTL regions from all traits, it was found that a total of 25 SNPs was associated with reproduction traits. The candidate genes which had the highest of genetic variance of 5 adjacent SNPs for each trait were retinol binding protein 7 (RBP7) and ubiquitination factor E4B (UBE4B) gene located on SSC6 for PWL (3.36%), BAL (3.06%) W2CL (5.69%) and PWSY (1.67%); solute carrier organic anion transporter family member 6A1 (SLCO6A1) gene located on SSC2 for LSY (2.05%) and NPD (2.20%). From the results, some regions and genes showed a significant associated with more than one trait within a breed. It means that the presence of a variant may affect the multiple traits. For instance, the region that strongly associated with PWL, BAL, W2CL, and PWSY was rs81320475. The RBP7 and UBE4B gene have highly associated with four traits were PWL, BAL, W2CL, and PWSY.

It was found that a total of 11 genes (located on SSC2, SSC6, SSC14, and SSCX) associated with reproduction traits in Landrace pigs (Table 5). All of them, SLCO6A1, RBP7, and UBE4B have the highest percentage of genetic variance and rather cover associated with all reproduction traits. The SLCO6A1 is organic anion transporting polypeptide family, located on SSC2, which associated with W2C, LSY, NPD, and PSY. This gene strongly expressed in human testis [14,15] and has been identified as a cancer/testis antigen expressed in human lung cancer [15].

The RBP7 is a member of the retinol binding protein family. The retinoids play roles in vision, growth, reproduction, and cellular differentiation beginning in early development [16]. This gene located on SSC6, which associated with PWL, BAL, W2CL, and PWSY. In the pig, RBP7 gene has high expression in fat and higher expression in the endometrium on day 15 of estrous cycle compared to the pregnancy of day 15 [17]. According to Hu et al [18] reported the RBP7 plays a role in regulating peroxisome proliferator-activated receptor gamma transcriptional activity and that adiponectin (AdipoQ) might be a potential downstream target of RBP7 in the endothelium.

The UBE4B gene located on SSC6, which associated with PWL, BAL, W2CL, and PWSY. The vertebrate, UBE4B gene is also known as ubiquitin fusion degradation (UFD2a), homolog in yeast. In 2005, Kaneko-Oshikawa et al [19] showed that mice lacking UFD2a or deletion of ubiquitin enzymes effected on embryo-lethal and apoptosis in the heart. According to Zage et al [20] reported the UBE4B overexpression reduced neuroblastoma tumor cell proliferation which neuroblastoma is a type of cancer that found in an embryo or fetus in human.

For Large White pigs, a total of 69 putative QTL regions with in the significant regions (SNP windows explained more than 1% of genetic variance) were included 10, 11, 11, 11, 10, and 16 regions for PWL, BAL, W2CL, LSY, PWSY, and NPD respectively. When consideration of overlapped of QTL regions from all traits, it was found that a total of 22 SNPs was associated with reproduction traits. The candidate genes have the highest of genetic variance of 5 adjacent SNPs for each trait were aldehyde dehydrogenase 1 family member A3 (ALDH1A3) gene located on SSC1 for PWL (4.27%); leucine rich repeat kinase 1 (LRRK1) gene located on SSC1 for BAL (4.86%) and W2CL (3.66%); retrotransposon Gag like 4 (RTL4) located on SSCX for LSY (1.64%) and PWSY (1.86%). Finally, an uncharacterized gene (ENSSSCG00000022384) or the known gene was RTL4 gene found on SSCX (1.57 and 1.44% respectively) was the highest percentage of genetic variance for NPD. This result indicated that some regions showed significant associated with more than one trait within a breed. It means that the presence of a variant may affect multiple traits. For examples, the region that strongly associated with PWL, BAL, W2CL and PWSY was rs80830052. The ALDH1A3 and LRRK1 gene have highly associated with all reproduction traits.

It was found that a total of 9 genes (located on SSC1, SSC18, and SSCX) associated with reproduction traits in Large White pigs (Table 5). There are five gene that have the highest percentage of genetic variance and rather cover associated with all reproduction traits. The ALDH1A3 and LRRK1 were found on every trait in Large White pigs, which located on SSC1. The aldehyde dehydrogenase (ALDH) family is the important enzyme for the aldehyde metabolism which plays an important role in embryo formation and development, cell proliferation and differentiation [21]. ALDH1A3 is primarily responsible for oxidizing all-trans retinal to retinoic acid (RA) and active derivative of vitamin A (retinol) [21,22], which has been reported that ALDHLA3 knockout in mouse suppresses RA synthesis, vitamin A-deficient fetuses and cause malformations restricted to ocular and nasal regions, which is responsible for respiratory distress and death at birth [23]. Moreover, RBP7 and ALDH1A3 may enhance the growth and proliferation of muscle cells, possibly early in embryonic development, and contribute to the phenotypic expression of high feed efficiency through stimulation of the Jnk pathway [24]. The information above indicated that these genes were important for reproduction traits and might be applicable to the screening of candidate genes related to prolificacy and implantation rate in pig production.

The LRRK1 are large multidomain proteins containing kinase, GTPase and multiple protein-protein interaction domains which play a role in the regulation of bone mass in human mutation of LRRK1 [25] and lead to osteosclerotic metaphysical dysplasia and causes a severe osteopetrosis.

The RTL4 gene or mammalian retrotransposon transcripts also called Sushi-ichi-related retrotransposon homologue 11/zinc finger CCHC domain-containing 16 (SIRH11/ZCCHC16), located on SSCX. It is expressed in the brain, kidney, testis, and ovary in adult mice [26] but undetectable expressed in placental stages indicated no role during mouse placentogenesis [27]. The deletion of SIRH11/ZCCHC16 gene leads to abnormal behaviors related to cognition which, including attention, impulsivity and working memory, possibly via the noradrenergic system [26].

Transient receptor potential (TRP) channels play fundamental roles in sensory biology. The short transient receptor potential channel 5 (TRPC5) gene, plays an important role in maintaining blood pressure stability [28], may provide manipulate the activity of key neurons involved in the regulation of energy balance and glucose metabolism [29]. In addition, TRPC5 also associated with the weight of the biceps brachii muscle, which related to leg weakness in pigs [30]. The function of this gene was related to health which associated with reproduction traits by indirectly. In the current study, this gene located on SSCX and it was associated with PWL, BAL, LSY, PSY, and NPD trait in Large White pigs.

LHFPL tetraspan subfamily member 1 (LHFPL1) is a member of the lipoma HMGIC fusion partner (LHFP) gene family. It is expressed widely in all tissues, especially high in lung, thymus, skeleton muscle, colon, and ovary [31] but the function has not been determined.

Most of the QTL identified in this study are new genomic regions. Previously detected QTL for the total NBA has been reported at the SSC1, SSC2, and SSC6; for the number weaned has been reported at the SSC1 and SSC2 [12] which are same chromosome found in this study, but the QTL is in different locus or reported other reproduction traits such as SB, MUM, and gestation length. Meanwhile, other reproduction traits in this study have not been previously detected QTL from PigQTLdb.

The GWAS by using single-step identified twenty-five and twenty-two regions were associated with reproduction traits in Landrace and Large White, respectively. Among them, we focus on eight genes that associated with reproduction traits in both breeds which these regions located within the gene and had the highest of genetic variance of 5 adjacent SNPs. Three known genes were identified to be the candidate genes included two genes were RBP7 and UBE4B for PWL, BAL, W2CL, and PWSY and one gene was SLCO6A1 for LSY and NPD in Landrace pigs. Meanwhile, five genes were identified to be candidate genes in Large White, which associated with all of six reproduction traits included ALDH1A3 and LRRK1 and five traits except W2CL were RTL4, TRPC5, and LHFPL1.

The ssGWAS suitable for complex models like multiple traits and small size of genotypes animals because using all information (genotypes animals, non-genotypes animal, phenotypes, and pedigree information) to estimate genomic value for ssGWAS analysis which classical GWAS use only information from genotyped animals. However, the ssGWAS still weakness is cannot provide the p-value for each SNPs. Although the p-value can use the normalizing each SNP solution to a t-like statistical analysis; nonetheless, it difficult to apply to multiple SNPs and the future research may provide the level of significance.

ACKNOWLEDGMENTS

The authors gratefully thank the NSTDA University Industry Research Collaboration (NUI-RC) through the National Science and Technology Development Agency (NSTDA) for providing the financial support, Research and Development Center Betagro Group Company in Thailand for providing the genotypes and phenotypes, Khon Kaen University (KKU) for providing the research place. This work was also supported by Research and Development Network Center for Animal Breeding (Native Chicken) (NCAB), Khon Kaen University.

Notes

CONFLICT OF INTEREST

We certify that there is no conflict of interest with any financial organization regarding the material discussed in the manuscript. Taharnklaew R is an employee of Betagro Group and Tuangsithtanon K is an employee of Betagro Hybrid International Company Limited.

Article information Continued

This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Gene locations on the Sus scrofa Build 11.1 assembly and the upstream and downstream of regions that possibly associated with each reproduction traits. Gene names represent on Ensembl (http://asia.ensembl.org/Sus_scrofa/Info/Index).

Gene locations on the Sus scrofa Build 11.1 assembly and the upstream and downstream of regions that possibly associated with each reproduction traits. Gene names represent on Ensembl (http://asia.ensembl.org/Sus_scrofa/Info/Index).