Australian Centre for Plant Functional Genomics, School of Agriculture and Food Sciences, University of Queensland, St. Lucia, QLD 4072, Australia. School of Plant Biology, University of Western Australia, WA 6009, Australia.

This article has a correction. Please see:

The genomic origins of rape oilseed

Many domesticated plants arose through the meeting of multiple genomes through hybridization and genome doubling, known as polyploidy. Chalhoub et al. sequenced the polyploid genome of Brassica napus, which originated from a recent combination of two distinct genomes approximately 7500 years ago and gave rise to the crops of rape oilseed (canola), kale, and rutabaga. B. napus has undergone multiple events affecting differently sized genetic regions where a gene from one progenitor species has been converted to the copy from a second progenitor species. Some of these gene conversion events appear to have been selected by humans as part of the process of domestication and crop improvement.

Abstract

Oilseed rape (Brassica napus L.) was formed ~7500 years ago by hybridization between B. rapa and B. oleracea, followed by chromosome doubling, a process known as allopolyploidy. Together with more ancient polyploidizations, this conferred an aggregate 72× genome multiplication since the origin of angiosperms and high gene content. We examined the B. napus genome and the consequences of its recent duplication. The constituent An and Cn subgenomes are engaged in subtle structural, functional, and epigenetic cross-talk, with abundant homeologous exchanges. Incipient gene loss and expression divergence have begun. Selection in B. napus oilseed types has accelerated the loss of glucosinolate genes, while preserving expansion of oil biosynthesis genes. These processes provide insights into allopolyploid evolution and its relationship with crop domestication and improvement.

The Brassicaceae are a large eudicot family (1) and include the model plant Arabidopsis thaliana. Brassicas have a propensity for genome duplications (Fig. 1) and genome mergers (2). They are major contributors to the human diet and were among the earliest cultigens (3).

Fig. 1. Genomic alignments between the basal angiosperm Amborella trichopoda (24), the basal eudicot Vitis vinifera (25), and the model crucifer A. thaliana, as well as B. rapa (9), B. oleracea (10, 11), and B. napus, are shown. A typical ancestral region in Amborella is expected to match up to 72 regions in B. napus (69 were detected for this specific region). Gray wedges in the background highlight conserved synteny blocks with more than 10 gene pairs.

B. napus (genome AnAnCnCn) was formed by recent allopolyploidy between ancestors of B. oleracea (Mediterranean cabbage, genome CoCo) and B. rapa (Asian cabbage or turnip, genome ArAr) and is polyphyletic (2, 4), with spontaneous formation regarded by Darwin as an example of unconscious selection (5). Cultivation began in Europe during the Middle Ages and spread worldwide. Diversifying selection gave rise to oilseed rape (canola), rutabaga, fodder rape, and kale morphotypes grown for oil, fodder, and food (4, 6).

The assembled Cn subgenome (525.8 Mb) is larger than the An subgenome (314.2 Mb), consistent with the relative sizes of the assembled Co genome of B. oleracea (540 Mb, 85% of the ~630-Mb genome) and the Ar genome of B. rapa (312 Mb, 59% of the ~530-Mb genome) (9–11). The B. napus assembly contains 34.8% transposable elements (TEs), less than the 40% estimated from raw reads (tables S11 to S14) (7), with asymmetric distribution in the An and Cn subgenomes (table S12) as in the progenitor genomes (9–11). A small TE fraction has proliferated since B. napus separated from its progenitors (7), at lower rates in the B. napus subgenomes than the corresponding progenitor genomes (table S14 and figs. S9 and S10).

Fig. 2. The genome comprises 9 chromosomes belonging to the Cn subgenome and 10 to the An subgenome, scaled on the basis of their assembled lengths. Tracks displayed are (A) gene density (nonoverlapping, window size = 100 kb for all tracks). Positions showing loss of one or more consecutive genes are displayed (triangles) along with homeologous exchanges, detected as missing genomic segments that have been replaced by duplicates of corresponding homeologous segments (red rectangles). (B and C) Transcription states estimated by RNA-seq in leaves (B) and roots (C) (in nonoverlapping 100-kb windows). (D) DNA transposon density. (E) Retrotransposon density. (F) CpG methylation in leaves (green) and roots (brown); both curves are overlapping. (G) Centromeric repeats (densities exaggerated for visual clarity). Homeologous relationships between An and Cn chromosomes are displayed with connecting lines colored according to the Cn chromosomes.

The B. napus An and Cn subgenomes are largely colinear to the corresponding diploid Ar and Co genomes, with asymmetric gene distribution (42,320 and 48,847, respectively) and 93% of the diploid gene space in orthologous blocks (fig. S12) (7). We identified 34,255 and 38,661 orthologous gene pairs between the An and Cn subgenomes and their respective progenitor genomes (fig. S13). Comparison of An-Ar and Cn-Co orthologous gene pairs suggested a divergence 7500 to 12,500 years ago (fig. S14), indicating formation of B. napus after this date. Synteny with Arabidopsis (table S19) confirmed the triplicated mesoploid structure (9–11) of the An and Cn subgenomes, with the recent allopolyploidy conferring on B. napus an aggregate 72× genome multiplication since the origin of angiosperms (Fig. 1) (7).
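The aggregate 72× figure can be read as a product of successive polyploidy events. The short sketch below assumes the commonly cited decomposition (a triplication shared by core eudicots, two older duplications in the Arabidopsis lineage, the Brassica triplication, and the B. napus allopolyploidy); the event labels and the `EVENTS` list are illustrative assumptions and are not taken from the text.

```python
from math import prod

# Assumed multiplier contributed by each polyploidy event since the
# origin of angiosperms. Labels are illustrative, not from the article.
EVENTS = [
    ("gamma triplication (core eudicots)", 3),
    ("beta duplication", 2),
    ("alpha duplication", 2),
    ("Brassica whole-genome triplication", 3),
    ("B. napus allopolyploidy (An + Cn)", 2),
]


def aggregate_multiplication(events):
    """Return the product of the per-event multipliers."""
    return prod(multiplier for _, multiplier in events)


if __name__ == "__main__":
    running = 1
    for name, multiplier in EVENTS:
        running *= multiplier
        print(f"{name}: x{multiplier} -> {running}x")
```

Under these assumptions the first three events give the multiplication expected for A. thaliana, and the last two account for the additional factor in B. napus.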

Most orthologous gene pairs in B. rapa and B. oleracea remain as homeologous pairs in B. napus (tables S19 to S25 and figs. S12 to S17) (7). DNA sequence analysis (7) confirmed the loss of 112 An and 91 Cn genes in B. napus ‘Darmor-bzh’ (tables S21 to S26), ~2.6 times as many as the 41 and 37 genes lost in B. rapa ‘Chiifu’ and B. oleracea ‘TO1000’, respectively (tables S26 and S27; χ² test, P = 5.3 × 10⁻¹⁴). Further analyses of a Brassica diversity set showed that ~47% of Darmor-bzh An and 31% of Cn deleted genes were also deleted in at least one additional progenitor genotype (tables S28 and S29), indicating that their deletion probably predated allopolyploidization of B. napus (7). A high proportion (27% to 54%) of the remaining Darmor-bzh deleted genes were also deleted from diverse B. napus genotypes (tables S28 and S29).
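The fold difference in gene loss follows directly from the counts quoted above. The sketch below recomputes it and, purely as an illustration, runs a one-degree-of-freedom χ² goodness-of-fit test against equal expected losses. That null model is an assumption made here; the test reported in the article may use different expected values (for example, scaled by gene number), so the P value from this sketch need not match the published one.

```python
from math import erfc, sqrt

# Gene-loss counts quoted in the text.
NAPUS_LOST = 112 + 91       # An + Cn genes lost in B. napus 'Darmor-bzh'
PROGENITOR_LOST = 41 + 37   # B. rapa 'Chiifu' + B. oleracea 'TO1000'


def fold_difference(a, b):
    """Ratio of two counts."""
    return a / b


def chi2_equal_expectation(a, b):
    """Chi-square goodness of fit (1 d.f.) against equal expected counts.

    Assumed null model, not necessarily the one used in the article.
    Returns (statistic, p_value).
    """
    expected = (a + b) / 2
    stat = (a - expected) ** 2 / expected + (b - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    print(f"fold difference: {fold_difference(NAPUS_LOST, PROGENITOR_LOST):.2f}")
    stat, p = chi2_equal_expectation(NAPUS_LOST, PROGENITOR_LOST)
    print(f"chi2 = {stat:.2f}, P = {p:.2e}")
```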

Homeologous exchanges (HEs), including crossovers and noncrossovers, are frequent between B. napus subgenomes and range in size from large segments to single SNPs (7) (Fig. 3, figs. S17 to S24, and tables S30 to S39).

Fig. 3. (A) Coverage depth obtained along the An2 chromosome after mapping Illumina sequence reads from seven natural and one resynthesized B. napus genotypes (named on the right) to the reference genome of B. napus ‘Darmor-bzh.’ (B and C) Coverage depth obtained for Ar2 and Co2 chromosomes, respectively, after mapping >21 genome-equivalents of Illumina sequence reads from B. napus Darmor-bzh on the B. rapa and B. oleracea genome assemblies concatenated together. (D) Similar to (A), where the Cn2 chromosome of Darmor-bzh is displayed. Segmental HEs are revealed based on sequence read coverage analysis, where a duplication (red) is revealed by significantly greater coverage for a given segment than the rest of the genome (black) and a deletion (blue) by little or no coverage for the corresponding homeologous segment. Sizes of chromosomes are indicated in Mb. An-to-Cn converted genes (at 60% or more conversion sites) are plotted as blue dots on Ar2 (B) and red dots on Co2 (C). Cn-to-An converted genes are plotted as blue dots on Co2 (C) and red dots on Ar2 (B). Open circles denote entirely converted genes using the same color code. Light gray lines connecting (A), (B), (C), and (D) indicate orthology relationships, and dark gray lines highlight segmental HEs in Darmor-bzh (names and descriptions detailed in table S31). Further HEs occurring between other homeologous chromosomes are shown in fig. S19. Black arrows in (A) indicate HEs involving GSL and FLC genes.

At the chromosome segment level, HEs are characterized by replacement of a chromosomal region with a duplicated copy from the corresponding homeologous subgenome (7). We identified 17 HEs, 14 Cn to An and 3 An to Cn (Fig. 3, fig. S19, and tables S30 and S31). Sequences from seven diverse B. napus genotypes revealed both shared and specific segmental HEs. These are of varying sizes and are most frequent between chromosomes An1-Cn1, An2-Cn2, and An9-Cn9 (table S32, Fig. 3, and fig. S19). Larger HEs found in the synthetic B. napus H165 affect, for example, most of chromosomes An1-Cn1 and An2-Cn2 (Fig. 3 and fig. S19). Functional annotation of genes within HEs suggests some have experienced selection, contributing to the diversification of winter, spring, and Asian types of oilseed rape, rutabaga, and kale vegetables (Fig. 3B, fig. S19, and table S33).
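The coverage signature of a segmental HE (elevated depth on one subgenome paired with little or no depth on the homeologous segment) can be illustrated with a minimal sketch. It assumes per-window read depths that have already been paired by homeology, so that window i on An corresponds to window i on Cn. The function names and the thresholds `dup_ratio` and `del_ratio` are hypothetical choices for illustration, not the authors' pipeline or parameters.

```python
from statistics import median


def classify_windows(depths, dup_ratio=1.5, del_ratio=0.25):
    """Label each window by its depth relative to the median depth.

    Thresholds are illustrative assumptions.
    """
    baseline = median(depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    labels = []
    for depth in depths:
        ratio = depth / baseline
        if ratio >= dup_ratio:
            labels.append("duplicated")
        elif ratio <= del_ratio:
            labels.append("deleted")
        else:
            labels.append("normal")
    return labels


def candidate_exchanges(an_depths, cn_depths, **thresholds):
    """Return (window index, direction) for paired windows where one
    subgenome looks duplicated and its homeolog looks deleted."""
    an_labels = classify_windows(an_depths, **thresholds)
    cn_labels = classify_windows(cn_depths, **thresholds)
    calls = []
    for i, (a, c) in enumerate(zip(an_labels, cn_labels)):
        if a == "duplicated" and c == "deleted":
            calls.append((i, "An copy replaces Cn"))
        elif a == "deleted" and c == "duplicated":
            calls.append((i, "Cn copy replaces An"))
    return calls


if __name__ == "__main__":
    an = [20, 20, 41, 20, 0, 20]   # toy depths per window on An
    cn = [20, 20, 1, 20, 40, 20]   # toy depths on the homeologous Cn windows
    print(candidate_exchanges(an, cn))
```

Requiring both signals in the paired windows is a deliberate choice in this sketch: a duplication or deletion on its own is treated as uninformative about exchange between subgenomes.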

We also identified 37 Cn to An and 56 An to Cn whole-gene conversions (12) (table S34).

At the single-nucleotide scale, exchanges between homeologous subgenomes account for ~86% of allelic differences between B. napus and its progenitors, with ~1.3 times more conversions from the An to the Cn subgenome than the reverse (χ², P < 1.6 × 10⁻¹⁶) (tables S35 and S36). A total of 16,938 An and 13,429 Cn genes (with 10,258 from homeologous pairs) had at least two conversion sites (table S37); 842 An and 579 Cn genes were highly converted with 60 to 90% conversion sites (table S37).

An and Cn homeologs contributed similarly to gene expression for 17,326 (58.3%) gene pairs (χ², P < 0.01) (fig. S27 and table S41). Both tissues showed higher expression for 4665 (15.7%) An homeologs and 5437 (18.3%) Cn homeologs (fig. S28 and table S41). There were 1062 gene pairs (3.6%) with higher expression of the An homeolog over the Cn homeolog in leaves, whereas the reverse was true in roots (fig. S28 and table S41). Conversely, for 966 gene pairs (3.3%), An homeologs had lower expression than Cn homeologs in leaves, with the pattern inverted in roots (fig. S28 and table S41).
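The expression categories described above can be expressed as a simple decision rule per homeologous pair. The sketch below is only illustrative: it uses a fixed fold-change cutoff (`fold`, a hypothetical parameter) to call a bias in each tissue, whereas the article relies on a statistical test, and the function name and category labels are assumptions introduced here.

```python
def classify_pair(an_leaf, cn_leaf, an_root, cn_root, fold=2.0):
    """Assign a homeologous pair to an expression category.

    Inputs are expression values (e.g. normalized read counts) for the
    An and Cn copies in leaves and roots. The fold cutoff is an assumed
    stand-in for a proper statistical test.
    """

    def bias(an, cn):
        if an > 0 and an >= fold * cn:
            return "An"
        if cn > 0 and cn >= fold * an:
            return "Cn"
        return "equal"

    leaf, root = bias(an_leaf, cn_leaf), bias(an_root, cn_root)
    if leaf == root:
        return {
            "equal": "similar",
            "An": "An higher in both tissues",
            "Cn": "Cn higher in both tissues",
        }[leaf]
    if leaf == "An" and root == "Cn":
        return "An higher in leaves, Cn higher in roots"
    if leaf == "Cn" and root == "An":
        return "Cn higher in leaves, An higher in roots"
    return "mixed"  # biased in one tissue only


if __name__ == "__main__":
    print(classify_pair(10, 10, 5, 5))
    print(classify_pair(30, 10, 5, 20))
```

Pairs biased in only one tissue fall into a separate "mixed" category here, which is one possible way to handle cases that the five categories in the text do not cover.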

Gene expression is generally inversely related to CpG, CHG, and CHH cytosine DNA methylation levels (p is phosphate, implying a C is directly followed by a G, and H is A, C, or T) (7). Methyl bisulfite sequencing in Darmor-bzh (figs. S30 to S32 and tables S43 to S45) showed 4 to 8% higher methylation in Cn genes than in their homeologous An genes (table S44), possibly because of greater transposon density in the Cn subgenome (Fig. 2F). Of the ~3100 gene pairs with differential gene body and/or untranslated region methylation between An and Cn homeologs in both roots and leaves, 51% were equally expressed. Only ~34% showed higher expression for the less-methylated homeologs, and the remaining ~15% showed the opposite pattern (table S45).

It is interesting that partitioning of homeolog gene expression is largely established in B. napus with patterns of both genome dominance and genome equivalence. The absence of significant bias toward either subgenome of the recent B. napus allopolyploid contrasts with many old and recent polyploids (13–17) but concurs with other old polyploids (18).

The expansion of B. napus lipid biosynthesis genes exceeds that known in other oilseed plants, with 1097 and 1132 genes annotated in the An and Cn subgenomes, respectively (7) (tables S46 to S48). Most lipid biosynthesis genes identified in the progenitor genomes are conserved in B. napus. For 18 acyl lipid orthologs, 3 and 2 genes appeared to be deleted from An and Cn subgenomes, respectively. Another 13 have been converted by HEs, nine from An to Cn and four from Cn to An (tables S47 and S48) (7).

Genetic variation for reduced seed glucosinolates (GSLs) also appears to be under breeding-directed selection. GSLs are sulfur-rich secondary metabolites important for plant defense and human health (19); however, high levels in seeds form toxic breakdown products in animal feeds (20). All 22 GSL catabolism genes identified in B. rapa and B. oleracea (10) are conserved in B. napus (7), and orthologs of only three Co and one Ar GSL biosynthesis genes are missing (table S49). One deleted homeologous pair, corresponding to orthologs of B. oleracea Bo2g161590 and B. rapa Bra02931, colocates with two quantitative trait loci (QTLs) for total aliphatic GSL content (21) and corresponds to a HE in which a segment of An2, with a missing GSL gene, has replaced the Cn2 homeolog (Fig. 3). Two additional QTLs for aliphatic GSL content (21) colocalize with a deletion of the B. rapa Bra035929 ortholog on An9 and its nondeleted homeolog on Cn9 (BnaC09g05300D, fig. S17).

We identified 425 nucleotide binding site leucine-rich repeat (NBS-LRR) sequences encoding resistance gene homologs (245 on Cn and 180 on An). Of these, 75% (153 An and 224 Cn) are syntenic to Ar and Co progenitors (7) (table S50 and figs. S33 and S34). We confirmed the absence of five NBS-LRR genes from the An subgenome, three from the Cn subgenome, and three from B. rapa (Ar), with none absent from B. oleracea (Co). This variation may reflect differential selection for resistance to diseases.

B. napus morphotypes show broad adaptation to different climatic zones and latitudes. A key adaptive gene controlling vernalization and photoperiod responses, FLOWERING LOCUS C (FLC), is expanded from one copy in A. thaliana to four in B. rapa and B. oleracea and nine or more in B. napus (7) (table S51). Different FLC homologs lie within HEs, from Cn2 to An2 in the Asian semiwinter oilseed forms Yudal and Aburamasari (Fig. 3) and Cn9 to An10 in late-flowering swedes (fig. S19 and table S51). These loci correspond to important QTLs for vernalization requirement and flowering time (22).

Human cultivation and breeding of B. napus morphotypes may have selected favorable HEs, causing subgenome restructuring of regions containing genes controlling valuable agronomic traits such as those shown here for oil biosynthesis, seed GSL content, disease resistance, and flowering. Because B. napus is a young allopolyploid beginning gene loss and genome reorganization, further partitioning of expression may become a key determinant for the long-term preservation of its duplicated genes (23). The integrative genomic resources that we report provide unique perspectives on the early evolution of a domesticated polyploid and will facilitate the manipulation of useful variation, contributing to sustainable increases in oilseed crop production to meet growing demands for both edible and biofuel oils.

A comparative transcriptomic study of an allotetraploid and its diploid progenitors illustrates the unique advantages and challenges of RNA-seq in plant species. Am. J. Bot. 99, 383–396 (2012). doi:10.3732/ajb.1100312; pmid:22301896

Shifts in the evolutionary rate and intensity of purifying selection between two Brassica genomes revealed by analyses of orthologous transposons and relics of a whole genome triplication. Plant J. 76, 211–222 (2013). pmid:23869625

Assessment of FAE1 polymorphisms in three Brassica species using EcoTILLING and their association with differences in seed erucic acid contents. BMC Plant Biol. 10, 137 (2010). doi:10.1186/1471-2229-10-137; pmid:20594317

Characterization of metabolite quantitative trait loci and metabolic networks that control glucosinolate concentration in the seeds and leaves of Brassica napus. New Phytol. 193, 96–108 (2012). doi:10.1111/j.1469-8137.2011.03890.x; pmid:21973035

Acknowledgments: Sequence Read Archive accession numbers of B. napus sequencing data are ERP005275 and PRJEB6069, and those of B. rapa and B. oleracea data are PRJNA248388 and PRJNA158027. The B. napus assembly is available at ENA (European Nucleotide Archive), in the WGS section for contigs (accession numbers CCCW010000001 to CCCW010044187) and the CON section for scaffolds, chromosomes, and annotation (accession numbers LK031787 to LK052685). The B. napus genome is also available at CoGe (https://genomevolution.org/CoGe/) and at the Genoscope Genome Database (www.genoscope.cns.fr/brassicanapus), with additional tools for comparative genomic analysis. The B. napus segregating populations Darmor-bzh × Yudal and Darmor × Bristol are available at INRA-IGEPP, Rennes, France, under a material transfer agreement. This project was funded by the French ANR (Agence Nationale de la Recherche, www.agence-nationale-recherche.fr) 2009 (ANR-09-GENM-021) to B.C., P.W., D.B., and R.D., with additional funding from Sofi-Proteol for bioinformatic personnel (J.J.); the National Basic Research Program of China (2011CB109300) for S.L., Y.Z., C.G., and W.H.; and the Canadian Canola Genome Sequencing Initiative (http://aafc-aac.usask.ca/canseq/, Genome Alberta, and industry partners) for I.A.P.P., A.G.S., C.K., and C.S. Research leaders are B.C., S.L., I.A.P.P., X.W., I.B., R.D., J.B., D.E., Y.Z., W.H., A.G.S., A.H.P., C.G., and P.W. Additional acknowledgements and author contributions are included in the supplementary materials.