Abstract

Background

Previous studies suggest that the metabolic pathway structure influences the selection and evolution rates of involved genes. However, most of these studies have exclusively considered a single gene copy encoding each enzyme in the metabolic pathway. Considering multiple-copy encoding enzymes could provide direct evidence of gene evolution and duplication patterns in metabolic pathways. We conducted a detailed analysis of the phylogeny, synteny, evolutionary rate and selection pressure of the genes in the isoflavonoid metabolic pathway of soybeans.

Results

The results revealed that 1) only the phenylalanine ammonia-lyase (PAL) gene family most upstream from the pathway preserved all of the ancient and recent segmental duplications and maintained a strongly conserved synteny among these duplicated segments; gene families encoding branch-point enzymes with higher pleiotropy tended to retain more types of duplication; and genes encoding chalcone reductase (CHR) and isoflavone synthase (IFS) specific for legumes retained only recent segmental duplications; 2) downstream genes evolved faster than upstream genes and were subject to positive selection or relaxed selection constraints; 3) gene members encoding enzymes with high pleiotropy at the branching points were more likely to have undergone evolutionary differentiation, which may correspond to their functional divergences.

Conclusions

We reconciled our results with existing controversies and proposed that gene copies at branch points with higher connectivity might be under stronger selective constraints and that the gene copies controlling metabolic flux allocation underwent positive selection. Our analyses demonstrated that the structure and function of a metabolic pathway shapes gene duplication and the evolutionary constraints of constituent enzymes.

Keywords

Background

Isoflavonoid phytoalexins are phenolic secondary metabolites. An increasing number of studies have demonstrated that isoflavonoid phytoalexins play an important role in plant defense against pathogens. Isoflavonoid phytoalexins (e.g., medicarpin or glyceollin) are distributed predominantly in leguminous plants and are synthesized through the central phenylpropanoid pathway and legume-specific isoflavonoid branch pathways [1]. To date, plant genetics and biochemical studies have resulted in the isolation and characterization of most of the structural genes involved in phytoalexin production [2, 3]. Although the isoflavonoid phytoalexin pathway (to simplify, we refer to it as the isoflavonoid pathway) is one of the most studied secondary metabolic pathways in plants, systematic molecular evolution analyses of isoflavonoid pathway genes in soybeans remain scarce.

In the process of legume divergence, the soybean (2n = 4× = 40) has undergone two rounds of whole genome duplication (WGD) events. The ancient WGD event most likely predated the split between soybean and Medicago truncatula (2n = 2× = 16) approximately 50 to 60 million years ago (mya) [4–7], which may have contributed to survival after the Cretaceous-Tertiary extinction event [8]. The recent WGD was found to be an allopolyploidization event that occurred independently in the soybean approximately 5 to 15 mya [8–14]. When whole-genome or chromosome segmental duplication occurs, genes are either lost or retained as repeat genes. Regarding the duplication retention patterns of gene families in metabolic pathways, studies have shown that gene families that encode highly connected enzymes tend to retain more duplicated copies than the gene families related to enzymes with fewer connections [15]. However, no detailed investigation regarding the correlation between gene duplication patterns and the gene’s position or function in metabolic pathways is available.

A primary objective of molecular evolutionary research is to elucidate the driving forces that dominate the variation and mechanisms of molecular evolution. Recently, many studies have focused on how selection acts on the genes involved in metabolic pathways [16, 17]. However, controversies related to inconsistencies among research findings exist. For example, several studies have found that genes encoding upstream enzymes were subject to stronger selective constraints and therefore evolved more slowly than genes encoding downstream enzymes [18–22]. However, investigations into the phenylpropanoid pathway in Arabidopsis thaliana[23], the gibberellin pathway in the Oryzeae tribe [24] and the starch pathway in Oryza sativa[25] failed to provide evidence of a correlation between the positions of genes in the pathway and selective constraints or evolutionary rates. In addition, in terms of the branch-point enzymes acting at the center of the metabolic pathways, some theoretical analyses and empirical studies have concluded that adaptive substitutions tend to be concentrated in branch-point genes and therefore tend to be subject to positive selection [26–28]. One opposing observation is that nonsynonymous substitution occurs less frequently in branch-point genes and thus reflects greater selective constraint in these genes [24].

Most of the abovementioned evolutionary pattern studies have exclusively considered a single gene copy that encoded each enzyme in the metabolic pathway. However, it remains to be determined whether the above conclusions still apply when multiple copies encoding each enzyme are considered. Homologous copies could incur functional differentiation after gene duplication, including pseudogenization [29], sub-functionalization and [30] neo-functionalization [31], which might lead to evolutionary divergence. If only single-copy encoding enzymes are considered in the study of the relationship between evolutionary patterns and the positions of enzymes in the metabolic pathway, bias toward the evolution of metabolic pathway genes might result. Experimental studies have suggested that several gene copies encoding branch-point enzymes located at the central isoflavonoid biosynthetic pathway in soybeans exhibit functional differentiation, such as 4-coumarate:CoA ligase (4CL) and chalcone isomerase (CHI) [32, 33]. This phenomenon is expected in soybean due to polyploidy; however, whether there are differential evolution patterns among these functionally differential copies and whether differential evolution patterns are relevant to pleiotropy or to the connectivity of divergent gene copies remain unknown. Above all, considering multiple-copy encoding enzymes in the study of differential evolutionary patterns of genes in metabolic pathways could provide clearer evidence of the way selection power acts on genes in metabolic pathways than previous studies have offered.The isoflavonoid phytoalexin pathway in soybeans provides an excellent system for investigating the influence of metabolic pathway structure on the duplication and evolutionary patterns of enzymes. First, multiple copies of gene-encoding enzymes in the isoflavonoid biosynthetic pathway were retained as the soybean experienced two rounds of WGD events. Second, the pathway contains nine major enzymes acting at different positions and four metabolic nodes of the isoflavonoid biosynthesis pathway. Therefore, the network topology is suitable to investigations of the effect that the positions and functions of enzymes in the pathway on gene duplication and evolutionary patterns (Figure 1). Metabolic node substances are defined as the substances that participate in two or branching pathways. Branch-point enzymes are the enzymes that catalyze these reactions and primarily include4-coumarate:CoA ligase (4CL), chalcone synthase (CHS), chalcone reductase (CHR), CHI and isoflavone synthase (IFS). In general, 4CL participates in most of the pathways involved in the biosynthesis of lignin, flavone, flavonol, anthocyanin and isoflavonoid. CHS and CHI also participate in these pathways, with the exception of the lignin biosynthesis pathway. In contrast, CHR and IFS only control isoflavonoid biosynthesis. Therefore, we grouped 4CL, CHI and CHS into enzymes with greater pleiotropy and CHR and IFS into enzymes with less pleiotropy in the branching pathway. To avoid ambiguous boundaries in the branching pathway, the most upstream enzymes, phenylalanine ammonia-lyase (PAL) and cinnamate 4-hydroxylase (C4H), are classified as the upstream enzymes, whereas the most downstream enzymes, isoflavone O-methyltransferase (IOMT) and isoflavone reductase (IFR), are downstream enzymes.

Figure 1

The isoflavonoid phytoalexin synthesis pathway in soybeans. Terminal productions of the phenylpropanoid pathway are in brackets. The most upstream and downstream enzymes are indicated in blue. The branch-point enzymes are indicated in red. Metabolic node substances are labeled in bold. Dotted arrows represent multiple or unclear steps.

Considering these enzymes together, we identified the isoflavonoid biosynthesis pathway genes in the soybean genome to investigate duplication and evolutionary patterns by analyzing the phylogenies, synteny, evolutionary rate and selection pressure of the genes in each gene family. We sought to examine the following: 1) whether different duplicate retention patterns exist in different gene families and how pathway position, node connectivity and pleiotropy affect the duplicate retention patterns; 2) whether positive selection signatures and evolutionary rate heterogeneity exist in genes that encode the enzymes involved in the isoflavonoid biosynthesis pathway and how they are distributed in this pathway; and 3) whether differential evolutionary patterns exist among multiple gene copies that encode each enzyme involved in the isoflavonoid biosynthesis pathway in soybeans.

Results

Gene duplication patterns in isoflavonoid pathway gene families

The isoflavonoid pathway gene sequences were used to search against seven plant genome databases. PAL, C4H, 4CL, CHS and CHI were detected in all flowering plants surveyed, whereas CHR, IFS, IOMT and IFR were specific to legumes (Table 1). The copy number for each gene varied among plants. A maximum of a five-fold size difference existed in CHS, and 4 and 21 homologs were found in A. thaliana and M. truncatula, respectively. Diverse copy numbers were also detected among genes. There were 7 PAL, 3 C4H, 6 4CL, 12 CHS and 6 CHI homologous genes, on average, in the plants we surveyed. In addition, the copy numbers of CHR, IFS, IOMT and IFR were 4, 3, 10 and 6, respectively. Clearly, the number of CHR and IFS genes was significantly less than the number of IOMT and IFR genes.

Table 1

The number of genes of each gene family in the isoflavonoid pathway in various plants and the number of ancestral genes in the most recent common ancestor (MRCA) of different lineages

Glycine max

Phaseolus vulgaris

Medicagotruncatula

Cicerarietinum

Arabidopsis thaliana

Vitisvinifera

Oryza sativa

Average

Legumesa

Dicotsb

Dicot/monocotc

PAL

8

6

6

5

4

11

9

7

4

1

1

C4H

4

3

2

3

1

2

4

3

3

2

2

4CL

9

5

4

6

6

4

7

6

5

4

3

CHS

13

11

21

4

4

14

15

12

4

3

2

CHI

8

7

9

5

5

3

4

6

5

3

3

CHR

2

2

4

6

0

0

0

4

1

0

0

IFS

2

3

3

2

0

0

0

3

1

0

0

IOMT

17

6

12

4

0

0

0

10

3

0

0

IFR

8

7

4

6

0

0

0

6

5

0

0

aAncestral genes in the MRCA of legumes.

bAncestral genes in the MRCA of dicots.

cAncestral genes in the MRCA of dicots and monocots.

Phylogenetic analyses allow us to identify the evolutionary history of the isoflavonoid pathway genes. Genes from legumeswere clustered in one clade. Homologs from A. thaliana and Vitis vinifera were located outside of legumes, and homologs from rice gathered in the outermost regions. Therefore, the gene trees containing different flowering plants were congruent with the species phylogeny. To better understand how these genes evolved in plants, we estimated the number of isoflavonoid pathway genes in the most recent common ancestor (MRCA) of dicots and monocots, that of dicot and that of legumes, respectively (Table 1, Figure 2 and see Additional file 1: Figure S1). Reconciliation of the gene trees for PAL, C4H, 4CL, CHS and CHI revealed 1, 2, 3, 2 and 3 ancestral genes, respectively, in the MRCA of dicots and monocots. When the numbers of ancestral genes were compared with the gene numbers in each plant, it appeared that the expansion was uneven among gene families. The average size of C4H, 4CL and CHI increased approximately 1.4- to 2-fold after the dicot/monocot split ~145 mya. In contrast, dramatic expansions occurred in PAL and CHS, the size of which increased 7- and 6-fold, respectively, since the divergence of dicots and monocots. The numbers of PAL, C4H, 4CL, CHS and CHI genes in the MRCA of dicots are similar to those in the MRCA of dicots and monocots, indicating that no major expansion occurred before the divergence of dicots. When the numbers of isoflavonoid pathway genes in the MRCA of legumes were compared with those in each legume, uneven expansions were also detected among gene families. CHS, CHR and IOMT have tripled in size since the split of legumes ~54 mya [34]. In contrast, the sizes of other families have been relatively stable since the legume split.

Figure 2

Phylogenic relationship of thePALgene family members from 7 species. Each species was labeled with different shapes as presented in the figure. Genes reported from the NCBI are highlighted with a blue triangle. The genes sequenced in our study are highlighted with a red dot. The red circles at the nodes represent ancestral genes in the MRCA of legumes. The red triangles and rectangles represent ancestral genes in the MRCA of dicots and those of dicots and monocots, respectively. The nodes with bootstrapping lower than 50% are not shown. Each red square bracket represents one recent segmental duplication; each blue square bracket represents one old segmental duplication.

To more precisely investigate the duplication patterns of isoflavonoid pathway genes, the physical locations of all homologs of each family were positioned and categorized as segmental or tandem duplications. On one hand, we conducted synteny analysis and dated segmental duplication events (Figure 2, Table 2 and see Additional file 2: Figure S2). Segmental duplications following the first round of WGD (50 to 60 mya) and the second round of WGD (5 to 15 mya) were classified into old and recent segmental duplications, respectively. On the other hand, we classified tandem duplications that occurred before Leguminosae differentiation (duplications shared by soybeans and M. truncatula) as old tandem duplications and those that were specific to soybeans as recent tandem duplications. We summarized the recent and old duplications among each gene family (Table 3). Old segmental duplications were exclusively detected in the most upstream gene family, PAL. Recent duplications occurred at each of the old segmental duplicated genes. Finally, the PAL gene family retained both old and recent segmental duplications. Meanwhile, a perfect synteny relationship was detected among these duplicated genes (Figure 3). Recent segmental duplication occurred approximately four times in the CHS gene family and three times each in PAL, 4CL and CHI; in contrast, the CHR and IFS gene families only retained copies from one to two recent segmental duplications. Old tandem duplication occurred once and twice in the CHI and IFR gene families, respectively, and all of the old duplications were retained in both soybeans and M. truncatula during the Leguminosae evolution process. One old tandem duplication was detected in the 4CL gene family, but one of the tandem duplicated genes was lost in M. truncatula. Recent tandem duplication occurred five times in both the CHS and IOMT gene families; thus, those families contained the greatest number of gene copies. The recent tandem duplication of the IOMT gene family occurred before the second segmental duplication event. Unlike the IOMT gene families, however, the recent tandem duplications in CHS, PAL, 4CL and IFR occurred after the recent segmental duplication.

Comparing π, dN and dSin upstream and downstream genes

Partial coding sequences of the twenty-one gene members from the eight gene families involved in the isoflavonoid biosynthesis pathway were isolated from 33 sampled Chinese soybean accessions; additional sequence information for two gene copies of IFS was also used in this study [35]. The π, dN and dS values in the upstream and downstream genes are listed in Table 4. Upon comparison of polymorphism and evolutionary rates among the different gene copies encoding each upstream and downstream enzyme, similar levels of these parameters were observed among these multiple copies. Next, we compared the evolutionary patterns of the upstream and downstream genes. The π values ranged from 0.00 (cinnamate 4-hydroxylase [C4H]) to 0.16 (PAL1) in the upstream gene copies. In contrast, the values ranged from 0.20 (IFR) to 0.44 (IOMT) in the downstream gene copies. The average π value of all of the downstream gene copies (0.28) was four-fold higher than that of all upstream gene copies (0.07) in the pathway (Table 4). This finding suggests that the rate of nucleotide polymorphism in the downstream genes was significantly increased compared with the upstream genes (P < 0.05). The dN value also varied greatly between the upstream and downstream genes; values ranged from 0.18 (IOMT) to 0.37 (IOMT) in the downstream genes compared with 0 for three gene members to 0.02 (for PAL2) in the the upstream genes. The average dN value of all of the downstream gene copies (0.29) was approximately 30-fold higher than all upstream gene copies (0.01) in the pathway. However, the average dS values were similar for the upstream (0.28) and downstream genes (0.26).

Regarding branch-point enzymes that are centrally located in the metabolic pathway, we observed divergent evolutionary patterns among different gene copies encoding each high pleiotropic branch-point enzyme (4CL, CHS, CHI) regardless of the level of polymorphism or the evolutionary rate (Table 5). For example, the π values of multiple copies encoding CHI varied greatly from 0.00 (CHI1B2 and CHI4) to 0.79 (CHI3), and the dN/dS ratio of multiple copies encoding CHS ranged widely from 0.00 (CHS8) to 1.49 (CHS2). In contrast, no significant differences in evolutionary pattern between different copies encoding the less-pleiotropic branch-point enzymes (CHR, IFS) were observed.

Table 5

Evolutionary pattern of genes encoding for branch-point enzymes

Pleiotropy

Enzyme

Gene locus

π(%)

dN(%)

dS(%)

dN/dS

LRT

Higher

4CL

Glyma17g07190(4CL1)

0.18

0.14

0.30

0.48

0.00

Glyma13g44950(4CL2)

0.40

0.39

0.40

0.96

28.98**

Glyma11g01240(4CL3)

0.09

0.12

0.00

0.12e-2/0

66.60**

CHS

Glyma05g28610(CHS2)

0.17

0.18

0.12

1.49

112.46**

Glyma01g43880(CHS7)

0.03

0.04

0.00

0.04e-2/0

0.00

Glyma11g01350(CHS8)

0.07

0.00

0.29

0.00

0.00

CHI

Glyma10g43850(CHI1B2)

0.00

0.00

0.00

0/0

0.00

Glyma20g38580(CHI2)

0.40

0.23

0.97

0.24

60.41**

Glyma06g14820(CHI4)

0.00

0.00

0.00

0/0

0.00

Glyma13g33730(CHI3)

0.79

0.58

1.49

0.39

0.40

Lower

CHR

Glyma02g47750

0.27

0.00

1.29

0.00

0.00

Glyma14g00870

0.11

0.07

0.25

0.30

0.00

IFS

Glyma07g32330(IFS1)

0.22

0.12

0.56

0.21

0.00

Glyma13g24200(IFS2)

0.17

0.12

0.32

0.38

0.00

LRT statistics are twice the log-likelihood differences between M7 and M8.

Significant positive sites are labeled in bold.

*0.01 < P < 0.05; **0.001 < P < 0.01.

Detection of positive selection

A significant rate of heterogeneity among the isoflavonoid biosynthesis pathway genes, particularly for dN/dS ratios that span more than one order of magnitude, might result either from the intense purifying selection on slowly evolving genes (e.g., PAL) or from the frequent episodes of positive selection or relaxed purifying selection on fast-evolving genes (e.g., 4CL, CHS). Instead of overall dN/dS ratios that detect accumulated mutations across the entire gene region, maximum likelihood analyses of ω conducted by PAML [36] can effectively reveal positive selection acting among specified sites. This analysis revealed that significantly positive sites existed in many of the genes that encode enzymes in the isoflavonoid biosynthesis pathway, such as 4CL2, 4CL3, CHS2, CHI2 and IFR2 (Tables 4 and 5). The existence of these sites indicates that positive selection tended to occur in branch-point enzymes, especially those with higher pleiotropy.

Discussion

Duplication pattern of gene families in the isoflavonoid pathway

Our study investigated the retention patterns of duplicated genes involved in the isoflavonoid biosynthesis pathway following two WGD events in soybeans and attempted to determine whether any correlation exists between a gene family’s duplication pattern and its position or function in the pathway. In our study, only the PAL gene family, which was the most upstream of the entire pathway, preserved all of the ancient and recent segmental duplications and maintained a strongly conserved synteny among these duplicated segments. PAL is the entry point enzyme of the phenylpropanoid pathway and directly controls the production of multiple secondary metabolites downstream. PAL plays important roles in plant development and variable environment responses. Regarding the perfect synteny maintained among PAL gene regions in soybean, we suggest that the chromosome regions flanking the PAL genes were very stable to ensure the gene’s functional stability.

Although the recent segmental duplication was preserved in all gene families in the soybean isoflavonoid synthetic pathway, the number of duplications was asymmetrically distributed in the branching pathway. Gene families encoding enzymes with higher pleiotropy tend to retain more recent segmental duplications (3 to 4) than those with lower pleiotropy (1 to 2). In addition to recent segmental duplications, gene families with higher pleiotropy reserved more gene copies from other types of duplication, such as the old tandem duplication in 4CL and CHI and therecent tandem duplication in 4CL and CHS. In comparison, branch-point gene families with lower pleiotropy, such as CHR and IFS, only retained 4 and 2 copies, respectively, from recent segmental duplication. We hypothesized that enzymes with higher pleiotropy catalyze metabolic node substances that are responsible for a greater range of downstream production and thus required more copies to reinforce their function. However, both CHR and IFS, which are legume-specific genes, play a pivotal role in isoflavonoid biosynthesis. An increased number of copies and copy duplication types were detected in one of the downstream enzymes, IOMT (14 copies). A previous study revealed that recombinant M. truncatula IOMTs had the ability to catalyze differential conformational substrates [37]; thus, we inferred that more copies were needed to maintain distinct catalytic properties.

Relationship of substitution rate and selection to pathway structure

Evolutionary patterns between upstream and downstream genes were significantly distinct; the average π value and nonsynonymous substitution rate were significantly increased in downstream genes, whereas the synonymous substitution rate was similar between the upstream and downstream genes. This result indicated that downstream genes evolved faster than upstream genes, and this phenomenon is potentially not attributable to mutation rates given the similar synonymous substitution rates. A possible interpretation is that downstream enzymes are subject to positive selection or relaxed selection. Consistently, positive selection was detected in the gene-encoding downstream IFR enzyme. Increased nonsynonymous substitution rates in downstream genes compared with upstream genes were also observed in anthocyanin pathway genes in Ipomoea. This finding was attributed to relaxed constraints on the downstream genes rather than positive selection, as the investigators failed to detect positive selection in this pathway [20, 21]. In comparison, these two selection pressures were thought to influence the nucleotide patterns in the carotenoid and terpenoid pathway [18, 22]. Two explanations potentially account for the phenomenon that upstream enzymes evolved more slowly than downstream enzymes and were subject to strong purifying selection [22]. First, upstream enzymes evolve to exert more control over metabolic fluxes than downstream enzymes [17, 38]. Second, upstream enzymes influence a larger number of pathway end products than downstream enzymes [16, 18, 19]. Nevertheless, stronger selective constraints on upstream genes cannot be applied to all metabolic pathways. For example, no correlation was detected between pathway position and various estimates of nucleotide divergence of the genes in the gibberellin and starch pathways in rice [24, 25] or in the phenylpropanoid metabolism pathway in A.thaliana[23]. This absence of correlation was potentially observed because the researchers considered branch-point enzymes and did not compare the most upstream enzymes with the most downstream enzymes. In our study, the different gene copies encoding branch-point enzymes tended to exhibit differential evolution rates and selective constraints. For this reason, considering a single copy when branch-point enzymes are coded by multiple copies might affect the comparison.

Evolutionary pattern of branch-point enzymes

According to our study, the gene members encoding the enzymes (such as 4CL, CHS, and CHI) with high pleiotropy at the branching points tend to retain more gene copies. These duplicated copies are more likely to have evolutionary differences, corresponding to their functional divergences. Divergent members of the 4CL gene family have been identified in soybeans and exhibited pronounced differences in the ability of the isoenzymes to catalyze different substrates [32]. For example, 4CL catalyzes the reaction of p-coumarate to produce the final products flavone, flavonol, anthocyanin and isoflavonoid. In addition, the recombinant 4CL1 isoform could utilize several other substrates (such as ferulate and sinapate) to channel flux to the lignin biosynthetic pathway. Given that it accepted the broadest range of substrates, 4CL1 was shown to be under purifying selection. In contrast, positive selection was detected in 4CL2 and 4CL3, which acted on fewer substrates with great efficiency.

Previous efforts to compare global gene expression in two soybean cultivars with different seed isoflavonoid contents have demonstrated that the CHS7 and CHS8 genes play significant roles in isoflavonoid synthesis [39]. The gene loci corresponding to CHS7 and CHS8 were subject to strong purifying selection. In comparison, CHS2 was under positive selection, as demonstrated by a dN/dS value greater than 1 and the existence of positive sites in PAML analysis.

Functional divergence has also been experimentally proven to exist in the CHI enzyme, as indicated by differences in the level of expression and kinetics among the various types of soybeans [33]. For instance, CHI1B2 members belonging to Type II CHI use a variety of chalcone substrates and are coordinately regulated with an isoflavonoid-specific gene, whereas CHI2 members belonging to Type I CHI exclusively use naringenin chalcone as the substrate and are coordinately regulated with other flavonoid-specific genes. In our study, CHI1B2 underwent purifying selection, whereas CHI2 was under positive selection. This observation was consistent with the conclusion that gene copies with an ability to utilize a larger range of substrates (4CL1,CHI1B2) underwent purifying selection, whereas those using a narrower range of substrates (4CL2, 4CL3 and CHI2) underwent positive selection. A reasonable explanation could be that the gene copies at branch points with wide-ranging catalytic activities and high connectivity are under stronger selective constraint [15, 24]. In addition, greater pleiotropy, which is often believed to experience stronger selective constraint [18], potentially plays a role in the higher connectivity in branch-point enzymes in the isoflavonoid pathway. The gene copies (4CL2, 4CL3 and CHI2) that catalyze a narrower range of substrates, mainly leading to the production of 4-coumaroyl-CoA and naringenin chalcone in the metabolic branch, potentially play more important roles in flux allocation [22]. Our study results reinforced previous findings that branch-point enzymes were the targets of positive selection in the central metabolism of Drosophila[27], the starch pathway in maize [28] and the asparagine N-glycosylation metabolic pathway among human populations [40]. These results provided further evidence that positive selection events were asymmetrically distributed in multiple-copy genes encoding branch-point enzymes and were concentrated on gene copies with more powerful control over metabolic flux allocation. We hypothesized that these positive selection events contribute to higher isoflavonoid content in soybean or legumes. This hypothesis requires further investigations comparing a large number of legumes or non-legume species.

We originally hypothesized that legume-specific genes that directly contribute to isoflavonoid content should be subject to positive selection during legume evolution. However, we failed to detect positive sites in the genes encoding the CHR and IFS enzymes, which play a pivotal role in isoflavonoid biosynthesis and have been reported to be specific to legumes [41, 42]. A possible explanation could be that we only conducted positive selection tests in Glycine without considering other taxonomic groups. To adapt to new environments, legume-specific genes might be subject to positive selection during the early evolution stage. Thereafter, purifying selection might affect these genes and create functional stability with the completion of function evolution [43–46].

Surprisingly, we also observed that the gene copies that were more heavily involved in isoflavonoid synthesis (CHI1B2, CHS7 and CHS8) and that the legume-specific genes (CHR and IFS) directly controlling the production of isoflavonoids were all subject to purifying selection. We propose that these enzymes in the isoflavonoid pathway experienced convergent evolution and were subject to similar selection. This hypothesis requires substantial functional evidence in plants for confirmation.

Conclusions

Although numerous evolutionary studies on pathway genes have been performed, evidence on the duplication pattern of multiple-copy gene families is lacking. The results of this study provide evidence that there exists correlation between a gene family’s duplication as well as evolutionary patterns and its position or function in the pathway. It is noteworthy that gene copies encoding branch-point enzymes with high pleiotropy tend to possess evolutionary divergence and undergo many duplication events. This result underscores the need for multicopy-based approaches in studies of the molecular evolution of metabolic pathways. Interestingly, the evolutionary differentiation of gene copies located at branch points potentially corresponds to their functional divergences (e.g., the gene copies closely related to isoflavonoid synthesis were all subject to purifying selection). More intensive molecular evolution studies on multiple gene copies involved in this pathway would offer profound insight for engineering isoflavonoid composition in soybean.

Methods

Sequence collection

Seven plant genomes were used to identify the isoflavonoid synthesis pathway genes. Their sequences and corresponding annotations were downloaded from online databases (see Additional file 3: Table S1). The sequences used to generate the BLAST results were the reported coding DNA sequences (CDS) of putative soybean isoflavonoid synthesis genes; the exception was isoflavone O-methyltransferase (IOMT) gene, which has been identified only in M. truncatula (see Additional file 3: Table S2). Both BLASTn and BLASTp were conducted for non-legumes with cutoff E values of 1e-20 and 1e-100, respectively; BLASTn was conducted for legumes, with a cutoff E value of 1e-20.

Phylogenetic analyses

We conducted phylogenetic analyses using collected sequences of each gene family (excluding pseudogenes). Protein sequences were initially aligned using ClustalW 1.83 with the default options and MEGA Version 5.0 [47, 48] for manual alignment corrections. The amino acid alignments were then used to guide the alignments of nucleotide coding sequences (CDSs). Phylogenetic trees were constructed based on the bootstrap neighbor-joining (NJ) method with a Kimura 2-parameter model by MEGA 5.0. The stability of internal nodes was assessed using bootstrap analysis with 1000 replicates (Figure 2 and see Additional file 1: Figure S1).

Synteny analyses

Syntenic information for chromosome segments that include genes in the isoflavonoid biosynthesis pathway within the soybean genome was collected from the Plant Genome Duplication Database [49]. To establish synteny between soybeans and M. truncatula, target genes and flanking region genes extending 100 kb in each direction were blasted againstM. truncatula genome sequences, and alignments with an E-value < 1e−10 were considered significant matches.

Calculating KSand dating the duplication event

Protein sequences of the gene pairs were aligned using Jalview, and the results were used to guide CDS alignments in Pal2Nal [50]. KS, the number of synonymous substitutions per site, was calculated using the CodeML program in PAML 4.3 with all alignment gaps excluded [36].

In dating segmental duplication events, six or fewer consecutive anchor points on each side of the isoflavonoid synthesis genes were chosen and used to calculate the average KS after the minimum and maximum were removed. The approximate date of the segmental duplication event was calculated using the mean synonymous substitution rate (λ), which was 6.1 × 1e−9 for Fabaceae according to the formula T = Ks/2λ [51].

Analysis of polymorphism and positive selection

Several identified gene copies within each gene family participating in isoflavonoid biosynthesis were chosen and sequenced in 33 Chinese soybean accessions, the collection locations of which are described in Cheng et al. [52]. Seeds from all of the accessions were obtained from Germplasm Storage of the Chinese National Center for Soybean Improvement (Nanjing Agricultural University, Nanjing, China). The sequenced regions of these genes were the most polymorphic regions based on 31 resequencing wild and cultivated soybean genomes [53]. For multi-copy genes, PCR primers were designed for specific regions flanking the sequencing regions, and sequencing primers were designed for sequencing regions (see Additional file 3: Table S3). PCR was conducted in a 50-μL reaction volume using KOD FX Neo polymerase (Toyobo, Japan) for 1 cycle of 3 min at 94°C; followed by 33 cycles of 30 s at 94°C for denaturation, 30 s at the annealing temperature for the respective primer pairs and 1 min at 68°C for extension; followed by 1 cycle of 10 min at 68°C using a PTC-225 thermal cycler (MJ Research, Watertown, MA). The PCR products were purified using the AxyPrep DNA Gel Extraction Kit (AxyGEN, Hangzhou China) and then sequenced on an ABI 3100A automated sequencer. All DNA sequences have been submitted to the GenBank databases (accession numbers KJ010826-KJ010858 and KM012193-KM012842).

The average pairwise nucleotide sequence diversity parameter (π) was calculated with DnaSP v5.0 using the Jukes and Cantor correction [54]. To determine whether any of these enzymes exhibited evidence of positive selection, we calculated the ratio of nonsynonymous (dN) to synonymous (dS) nucleotide substitution rates (dN/dS). We also used a maximum likelihood (ML) method to reveal the sites with significant positive selection. The ω ratio was calculated using the CodeML procedure in the phylogenetic analysis from PAML [36]. The likelihood ratio test (LRT) statistic for positive selection was conducted based on the comparison of M7 and M8 codon substitution models.

Notes

Shanshan Chu, Jiao Wang contributed equally to this work.

Abbreviations

PAL:

Phenylalanine ammonia-lyase

CHR:

Chalcone reductase

IFS:

Isoflavone synthase

WGD:

Whole genome duplication

mya:

Million years ago

4CL:

4-coumarate: CoA ligase

CHS:

Chalcone synthase

CHI:

Chalcone isomerase

IOMT:

Isoflavone O-methyltransferase

IFR:

Isoflavone reductase

C4H:

Cinnamate 4-hydroxylase

MRCA:

Most recent common ancestor.

Declarations

Acknowledgements

We thank Dr. Sihai Yang of Nanjing University for his critical reading of the manuscript. This work was supported in part by National Basic Research Program of China (973 Program) (2010CB125906), Key Transgenic Breeding Program of China (2013ZX08004-003), National Natural Science Foundation of China (31271749, 31301342) and Nanjing Agricultural University Youth Science and Technology Innovation Fund (KJ2013002).

Electronic supplementary material

12863_2013_1266_MOESM1_ESM.pdfAdditional file 1: Figure S1: The phylogenetic trees of genes in the isoflavonoid synthesis pathway from 7 species. Each species was labeled with different shapes as shown in the figure. Genes reported in NCBI are highlighted with a blue triangle. The genes sequenced in our study are highlighted with a red dot. The red circles at the nodes represent ancestral genes in the MRCA of legumes. The red triangles and rectangles represent ancestral genes in the MRCA of dicots and those of dicots and monocots, respectively. The nodes with bootstrapping lower than 50% are not shown. Each red square bracket represents one recent segmental duplication. a, C4H gene family. b, 4CL gene family. c, CHS gene family. d, CHI gene family. e, CHR gene family. f, IFS gene family. g, IOMT gene family. h, IFR gene family. (PDF 804 KB)

12863_2013_1266_MOESM3_ESM.pdfAdditional file 3: Table S1: All species used and their online databases. Table S2. The sequences used to conduct BLAST. Table S3. Primers used in this study. (PDF 159 KB)

Below are the links to the authors’ original submitted files for images.

Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.