Abstract

Background

Fermentative bacteria offer the potential to convert lignocellulosic waste-streams into biofuels such as hydrogen (H2) and ethanol. Current fermentative H2 and ethanol yields, however, are below theoretical maxima, vary greatly among organisms, and depend on the extent of metabolic pathways utilized. For fermentative H2 and/or ethanol production to become practical, biofuel yields must be increased. We performed a comparative meta-analysis of (i) reported end-product yields, and (ii) genes encoding pyruvate metabolism and end-product synthesis pathways to identify suitable biomarkers for screening a microorganism’s potential of H2 and/or ethanol production, and to identify targets for metabolic engineering to improve biofuel yields. Our interest in H2 and/or ethanol optimization restricted our meta-analysis to organisms with sequenced genomes and limited branched end-product pathways. These included members of the Firmicutes, Euryarchaeota, and Thermotogae.

Results

Bioinformatic analysis revealed that the absence of genes encoding acetaldehyde dehydrogenase and bifunctional acetaldehyde/alcohol dehydrogenase (AdhE) in Caldicellulosiruptor, Thermococcus, Pyrococcus, and Thermotoga species coincides with high H2 yields and low ethanol production. Organisms containing genes (or activities) for both ethanol and H2 synthesis pathways (i.e. Caldanaerobacter subterraneus subsp. tengcongensis, Ethanoligenens harbinense, and Clostridium species) had relatively uniform mixed product patterns. The absence of hydrogenases in Geobacillus and Bacillus species did not confer high ethanol production, but rather high lactate production. Only Thermoanaerobacter pseudethanolicus produced relatively high ethanol and low H2 yields. This may be attributed to the presence of genes encoding proteins that promote NADH production. Lactate dehydrogenase and pyruvate:formate lyase are not conducive to ethanol and/or H2 production. While the type(s) of encoded hydrogenases appear to have little impact on H2 production in organisms that do not encode ethanol producing pathways, they do influence reduced end-product yields in those that do.

Conclusions

Here we show that composition of genes encoding pathways involved in pyruvate catabolism and end-product synthesis pathways can be used to approximate potential end-product distribution patterns. We have identified a number of genetic biomarkers for streamlining ethanol and H2 producing capabilities. By linking genome content, reaction thermodynamics, and end-product yields, we offer potential targets for optimization of either ethanol or H2 yields through metabolic engineering.

Background

Fuel derived from waste-stream lignocellulosic biomass via consolidated bioprocessing is a renewable and carbon-neutral alternative to current petroleum-based fuels [1–3]. Consequently, considerable effort is being made to characterize species capable of efficiently converting lignocellulosic substrates into biofuels. An ideal biofuel producing microorganism should possess several key features, including: (i) high yields of the desired product, (ii) simultaneous utilization of sugars (cellulose, hemicellulose, pectin), (iii) growth at elevated temperatures, and (iv) low product inhibition. Recent studies have focused on the characterization of numerous cellulose and hemicellulose degrading species of bacteria [4–6]. To fully exploit the biofuel producing potential of these organisms, several genomes have been sequenced and are now available for analysis (http://genome.jgi-psf.org/). While some hemicellulolytic or cellulolytic microorganisms are capable of hydrogen (H2) or ethanol production via fermentation, end-product yields typically are far lower than their maximum theoretical values (4 mol H2 or 2 mol ethanol per mol glucose) when cells are grown in pure culture. This is due to the presence of branched catabolic pathways that divert carbon and/or electrons away from a particular desired end-product [7]. Strategies that optimize yields for a single biofuel (H2 or ethanol) can only be developed through a detailed knowledge of the relationships between genome content, gene and gene product expression, pathway utilization, and end-product synthesis patterns.

Given that our primary focus is to optimize H2 and/or ethanol yields, we restricted our meta-analysis to sequenced organisms with limited branched end-product pathways (i.e. organisms that do not produce butyrate, butanol, propionate, propanol, and acetoin) for which end-product data was available. These included members of the Firmicutes (Clostridium, Caldicellulosiruptor, Thermoanaerobacter, Caldanaerobacter, Ethanoligenens, Geobacillus, and Bacillus species), Euryarchaeota (Thermococcus and Pyrococcus species), and Thermotogae (Thermotoga species). A list of species analyzed and corresponding GenBank accession numbers are summarized in Table 1. With the exception of Caldanaerobacter subterraneus subsp. tengcongensis, Thermoanaerobacter pseudethanolicus, Pyrococcus furiosus, Geobacillus thermoglucosidasius, and Bacillus cereus, all organisms were capable of cellulose and/or xylan saccharification.

We focused on the various metabolic branches involved in pyruvate formation from phosphoenolpyruvate (PEP) and subsequent catabolism of pyruvate into end-products. Although studies comparing the H2 and ethanol-producing potential of several cellulose degrading bacteria have been previously published [8–10], a comprehensive comparison of the major biofuel producing pathways at the genome level has not yet been reported. Here we present a comparison of the genes encoding proteins involved in (i) pyruvate metabolism, (ii) ethanol synthesis, and (iii) H2 metabolism, in order to rationalize reported end-product yields. Results indicate that the presence or absence of specific genes dictating carbon and electron flow towards end-products may be used to infer end-product synthesis patterns and help develop informed metabolic engineering strategies for optimization of H2 and ethanol yields. Furthermore, certain genes may be suitable biomarkers for screening novel microorganisms’ capability of producing optimal H2 or ethanol yields, and may be suitable targets for metabolic engineering strategies for optimization of either ethanol or H2 yields.

Methods

Comparative analysis of genome annotations

All sequence data and gene annotations were accessed using the Joint Genome Institute’s Integrated Microbial Genomes (IMG) database [11]. Gene annotations presented in this paper reflect the numbering of the final assembly or most recent drafts available (July, 2012). Comparative analyses were performed using the IMG database. In brief, analyses of all genomes (Table 1) were conducted using three annotation databases independently: (i) Clusters of Orthologous Groups (COGs) [12], (ii) KEGG Orthology assignments (KO) [13], and (iii) TIGRFAMs [14]. Genes identified using a single database were cross-referenced against the others to identify genes of interest. Functional annotations of the identified genes were evaluated on a case-by-case basis and decisions regarding the annotation accuracy were made using a combination of manual analysis of genomic context, literature searches, and functional prediction through RPS-BLAST using the Conserved Domain Database website [15].
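The cross-referencing step described above can be illustrated with a minimal sketch. It assumes that each annotation database yields a set of candidate gene identifiers; the function names and locus tags below are hypothetical placeholders and do not reproduce the actual IMG workflow.

```python
from typing import Dict, List, Set


def cross_reference(hits: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each candidate gene to the set of databases that identified it."""
    support: Dict[str, Set[str]] = {}
    for database, genes in hits.items():
        for gene in genes:
            support.setdefault(gene, set()).add(database)
    return support


def needs_manual_review(support: Dict[str, Set[str]], n_databases: int) -> List[str]:
    """Return genes not supported by every database, sorted by name."""
    return sorted(g for g, dbs in support.items() if len(dbs) < n_databases)


# Hypothetical hits per annotation database (illustrative identifiers only).
hits = {
    "COG": {"gene_A", "gene_B", "gene_C"},
    "KO": {"gene_A", "gene_B"},
    "TIGRFAM": {"gene_A", "gene_D"},
}
support = cross_reference(hits)
review = needs_manual_review(support, len(hits))
print(review)
```

Genes flagged in this way would then go to the case-by-case evaluation (genomic context, literature, RPS-BLAST) described in the text.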

Hydrogenases were classified based on phylogenetic relationships of hydrogenase large subunits according to Calusinska et al. [16]. The evolutionary history was inferred using the Neighbor-Joining method [17]. The bootstrap consensus tree inferred from 1000 replicates is taken to represent the evolutionary history of the taxa analyzed [18]. The evolutionary distances were computed using the Poisson correction method [19] and are in the units of the number of amino acid substitutions per site. The analysis involved 50 amino acid sequences. All ambiguous positions were removed for each sequence pair. There were a total of 863 positions in the final dataset. Evolutionary analyses were conducted in MEGA5 [20]. Thermodynamic calculations were performed using values provided by Thauer et al. [21] and the CRC Handbook of Chemistry and Physics [21, 22]. BioEdit v.7.0.9.0 [23] was used to perform sequence alignments.
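For readers unfamiliar with the Poisson correction, the distance between two aligned amino acid sequences is d = −ln(1 − p), where p is the proportion of compared sites that differ. The following is a minimal sketch with pairwise removal of ambiguous positions; the set of characters treated as ambiguous and the example sequences are assumptions made for illustration, and the code is not the MEGA5 implementation.

```python
import math

# Characters treated as gaps or ambiguous residues (an assumption of this sketch).
AMBIGUOUS = {"-", "X", "?"}


def poisson_distance(seq_a: str, seq_b: str) -> float:
    """Poisson-corrected distance (substitutions per site) for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        # Pairwise deletion: skip sites that are ambiguous in either sequence.
        if a in AMBIGUOUS or b in AMBIGUOUS:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = differing / compared
    if p >= 1.0:
        raise ValueError("distance is undefined when all compared sites differ")
    return -math.log(1.0 - p)


# Hypothetical aligned fragments: 7 comparable sites, 2 differences.
d = poisson_distance("MKVLA-GT", "MKILAAGS")
print(round(d, 4))
```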

Results and discussion

Survey of End-product yields

A literature survey of end-product yields (normalized to mol end-product per mol hexose equivalent) of the species surveyed in this study is summarized in Table 2. While it is difficult to perform a direct comparison of end-product yields from available literature due to different growth conditions employed (e.g. growth substrate, carbon loading, reactor conditions, etc.), and further difficult to validate these data due to incomplete end-product quantifications and lack of corresponding carbon balances and oxidation/reduction (O/R) ratios, it still provides a good approximation of molar end-product yields based on substrate utilization. Calculated end-product yields reveal that the Caldicellulosiruptor, Pyrococcus, Thermococcus, and Thermotoga species surveyed produced, in most cases, near-maximal H2 yields with concomitant CO2 and acetate production, and little or no ethanol, formate, and lactate [24–40]. It is important to note that while some studies [29–31, 34, 35, 39] report lower overall end-product yields, likely due to a large amount of carbon flux being directed towards biomass production under a given growth condition, H2:ethanol ratios remain high. Cal. subterraneus subsp. tengcongensis, E. harbinense, and Clostridium species displayed mixed end-product fermentation patterns, with comparatively lower H2, CO2, and acetate yields, higher ethanol yields, and generally low formate and lactate yields [10, 41–47]. Ta. pseudethanolicus produced the highest ethanol yields of the organisms surveyed with little concomitant H2, acetate, and lactate production, and no formate synthesis [48–50]. G. thermoglucosidasius and B. cereus produced the highest lactate and formate yields, moderate ethanol and acetate yields, and low H2 and CO2 yields [51, 52].

The assemblage of genes encoding proteins involved in pyruvate metabolism and end-product synthesis dictates, in part, how carbon and electron flux is distributed between the catabolic, anabolic, and energy producing pathways of the cell. The flow of carbon and electrons from PEP towards end-products may be separated into branch-points or nodes which include (i) the PEP/oxaloacetate/pyruvate node, (ii) the pyruvate/lactate/acetyl-CoA node, (iii) the acetyl-CoA/acetate/ethanol node, and (iv) the ferredoxin/NAD(P)H/H2 node [73]. Several different enzymes may be involved in the conversion of intermediate metabolites within these nodes. These enzymes, and the presence of corresponding genes encoding these proteins in each of the organisms surveyed, are summarized in Figure 1. The oxidation of electron carriers (NADH and/or reduced ferredoxin) is required for maintaining glycolytic flux and leads to the ultimate production of reduced products (ethanol, lactate, and H2). Thus, distribution of carbon and electron flux among different pathways can influence levels of reduced electron carrier pools, which in turn can dictate end-product distribution patterns. Genome content can be used to resolve the relationship between carbon and electron flux with end-product distribution.

Genes involved in pyruvate synthesis

All organisms considered in this study utilize the Embden-Meyerhof-Parnas pathway for conversion of glucose to PEP, with the following notable variations. Alignments of key residues of phosphofructokinase (PFK) according to Bapteste et al. [74, 75] suggest that P. furiosus, Th. kodakaraensis, Cal. subterraneus subsp. tengcongensis, E. harbinense, G. thermoglucosidasius, and B. cereus encode an ATP-dependent PFK, while Thermotoga, Caldicellulosiruptor, Clostridium, and Thermoanaerobacter species encode both an ATP-dependent PFK and a pyrophosphate (PPi)-dependent PFK [74, 75] (Additional file 1). Furthermore, while bacteria catalyze the oxidation of glyceraldehyde-3-P to 3-phosphoglycerate (yielding NADH and ATP) with glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and phosphoglycerate kinase (PGK), archaea (P. furiosus and Th. kodakaraensis) preferentially catalyze the same reaction via glyceraldehyde-3-phosphate ferredoxin oxidoreductase (GAPFOR). This enzyme reduces ferredoxin (Fd) rather than NAD+ and does not produce ATP [76].

In contrast to the generally conserved gene content required for the production of PEP, a number of enzymes may catalyze the conversion of PEP to pyruvate [73] (Figure 1; Table 3). PEP can be directly converted into pyruvate via an ATP-dependent pyruvate kinase (PPK), or via an AMP-dependent pyruvate phosphate dikinase (PPDK). All strains considered in this review encode both ppk and ppdk, with the exception of C. thermocellum strains, which do not encode a ppk, and E. harbinense, G. thermoglucosidasius, and B. cereus, which do not encode ppdk. Given that the formation of ATP from ADP and Pi is more thermodynamically favorable than from AMP and PPi (△G°’ = 31.7 vs. 41.7 kJ mol-1), production of pyruvate via PPK is more favorable than via PPDK [21].

Flux balance analysis integrated with RNAseq data suggests higher carbon and electron flux in C. thermocellum ATCC 27405 is directed through enzymes capable of direct, rather than indirect, conversion of PEP to pyruvate [77]. However, C. cellulolyticum mutation studies suggest that a portion of PEP can also be converted to pyruvate via the “malate shunt” [78]. This PPK/PPDK bypass system utilizes either (i) phosphoenolpyruvate carboxykinase (PEPCK), malate dehydrogenase (MDH), and malic enzyme (MalE), or (ii) PEPCK and oxaloacetate decarboxylase (OAADC), for the interconversion of PEP and pyruvate (Figure 1). While PEPCK provides a pathway for energy conservation via ATP (or GTP) production, MDH and MalE permit transhydrogenation from NADH to NADP+ [71], generating additional reducing equivalents required for biosynthesis. G. thermoglucosidasius, B. cereus, C. thermocellum (ATCC 27405), and C. cellulolyticum contain pepck, mdh and malE, suggesting that they are capable of transhydrogenation using these proteins. Although the draft genome of C. thermocellum DSM 4150 does not include genes encoding MDH and MalE, we have verified their presence via PCR amplification (unpublished results). Deletion of mdh in C. cellulolyticum resulted in significant increases in lactate, and to a lesser extent ethanol yields, and reduced acetate production when grown on cellulose, demonstrating carbon and electron flux through MDH in wild type strains [78]. It seems evident that in the absence of MDH, transhydrogenation was reduced, and thus the resulting increase in NADH:NADPH ratios promotes lactate and ethanol production, while decreasing NADPH levels for biosynthesis.

A number of organisms analyzed encode pepck and oaadc (Ca. bescii, Ca. saccharolyticus, C. cellulolyticum, C. phytofermentans, and C. thermocellum), also allowing for indirect conversion of PEP to pyruvate via an oxaloacetate intermediate. While the redirection of carbon and electron flux through this pathway likely has little effect on product yields, synthesis of GTP, versus ATP, may promote transcription and protein synthesis. Finally, Cal. subterraneus, E. harbinense, P. furiosus, Th. kodakaraensis, Ta. pseudethanolicus, and Thermotoga species do not encode all of the proteins required for a “malate shunt” and consequently the conversion of PEP to pyruvate must be achieved via PPK and/or PPDK.

Genes involved in pyruvate catabolism

The pyruvate/lactate/acetyl-CoA node plays an important role in regulating carbon flux and electron distribution and dramatically affects end-product distribution. The NADH-dependent reduction of pyruvate to lactate via fructose-1,6-bisphosphate activated lactate dehydrogenase (LDH) [56] diverts reducing equivalents away from biofuels such as H2 and ethanol. Alternatively, the oxidative decarboxylation of pyruvate to acetyl-CoA via pyruvate dehydrogenase (pdh) or pyruvate:ferredoxin oxidoreductase (pfor) generates NADH or reduced Fd, respectively. These reducing equivalents may then be oxidized during the production of H2 or ethanol (Figure 1). Pyruvate may also be catabolised to acetyl-CoA via pyruvate:formate lyase (pfl), yielding formate in the process. In some enterobacteria, formate is further oxidized to CO2, releasing H2, through the action of a multisubunit formate hydrogen lyase (FHL) complex [79]. However, FHL was not encoded in any of the organisms analysed.

With the exception of Cal. subterraneus subsp. tengcongensis, P. furiosus, and Th. kodakaraensis, ldh genes were identified in all organisms studied (Table 4). Surprisingly, while the production of lactate from pyruvate is highly favorable thermodynamically (△G°’ = −26.1 kJ mol-1), only B. cereus, G. thermoglucosidasius, and, under some conditions, Ta. pseudethanolicus and T. neapolitana produce high yields of lactate (> 0.5 mol mol-glucose-1). In all other organisms surveyed, lactate was either a minor end-product, not detected, or not reported under the growth conditions used (Table 2). This suggests that the presence of ldh cannot be used to predict lactate production.

LDH is, in fact, allosterically activated by fructose-1,6-bisphosphate in C. thermocellum ATCC 27405, Ca. saccharolyticus, and Thermoanaerobacter brockii [56, 57, 62, 80]. While enzyme assays reveal high LDH activity in C. thermocellum [10, 72], most studies report only trace amounts of lactate. Islam et al. [46], however, demonstrated that lactate production was triggered in stationary-phase batch cultures only under excess cellobiose conditions. In Thermoanaerobacter brockii, Ben-Bassat et al. reported elevated lactate production as a consequence of accumulated intracellular fructose-1,6-bisphosphate (FDP) when cultures were grown on glucose compared to starch [62]. Finally, Willquist and van Niel [57] reported that LDH in Ca. saccharolyticus was activated by FDP and ATP, and inhibited by NAD+ and PPi. An increase in fructose-1,6-bisphosphate, NADH:NAD+ ratios, and ATP:PPi ratios was observed during the transition from exponential to stationary phase in Ca. saccharolyticus cultures, and was accordingly accompanied by lactate production [57].

All organisms analyzed encode either pdh or pfor, but not both (Table 4). While G. thermoglucosidasius and B. cereus encode pdh, all other organisms analyzed encode pfor. Although Caldicellulosiruptor, Clostridia, and Thermoanaerobacter species studied appear to encode a putative pdh, there has been no enzymatic evidence to support the presence of PDH in these species. Thus far, only PFOR activity has been verified in C. cellulolyticum [58, 60] and C. thermocellum [10, 72]. The putative E1, E2, and E3 subunits of the pdh complex (Csac_0874-0872) in Ca. saccharolyticus were designated simply as a keto-acid dehydrogenase by van de Werken et al. [81]. Similarly, while genes encoding a putative pdh (Teth_0790-0793) are present in Ta. pseudethanolicus, genomic context strongly supports that this putative pdh is part of an acetoin dehydrogenase complex, despite the absence of reported acetoin production. In Clostridia species, putative pdh’s (Cthe_3449-3450, Cthe_1543) may actually encode 2-oxo acid dehydrogenase complexes, which share a common structure and homology to pyruvate dehydrogenase. These include 2-oxoglutarate dehydrogenase, branched-chain alpha-keto acid dehydrogenase, acetoin dehydrogenase complex, and the glycine cleavage complex. All organisms that encode a pfor also encode a Fd-dependent hydrogenase (H2ase), bifurcating H2ase, and/or a NADH:Fd oxidoreductase (NFO), and are thus capable of reoxidizing reduced Fd produced by PFOR. Conversely, G. thermoglucosidasius and B. cereus, which encode pdh but not pfor, do not encode enzymes capable of reoxidizing reduced Fd, and thus do not produce H2. While the presence of PDH allows for additional NADH production that could be used for ethanol production, G. thermoglucosidasius and B. cereus end-product profiles suggest that this NADH is preferentially reoxidized through lactate production rather than ethanol production. Pyruvate decarboxylase, a homotetrameric enzyme that catalyzes the decarboxylation of pyruvate to acetaldehyde, was not encoded by any of the species considered in this study.

Given the requirement of reduced electron carriers for the production of ethanol/H2, the oxidative decarboxylation of pyruvate via PDH/PFOR is favorable over PFL for the production of these biofuels. Genome analyses revealed that a number of organisms, including P. furiosus, Ta. pseudethanolicus, Cal. subterraneus subsp. tengcongensis, and all Caldicellulosiruptor and Thermotoga species considered, did not encode PFL. In each of these species, the production of formate has neither been detected nor reported. Unfortunately, many studies do not report formate production, despite the presence of PFL. This may be a consequence of the quantification methods used for volatile fatty acid detection. When formate is not produced, the total oxidation value of 2 CO2 per mole glucose (+4) must be balanced with the production of H2 and/or ethanol. Thus, the “total molar reduction values of reduced end-products (H2 + ethanol)”, termed RVEP, should be −4, provided that all carbon and electron flux is directed towards end-product formation and not biosynthesis. Indeed, RVEPs (in absolute value) were usually greater than 3.5 in organisms that do not encode pfl (T. maritima, Ca. saccharolyticus), and below 3.5 in those that do encode pfl (C. phytofermentans, C. thermocellum, G. thermoglucosidasius, and B. cereus; Table 2). In some studies, RVEPs were low due to a large amount of carbon and electron flux directed towards biosynthesis. In G. thermoglucosidasius and B. cereus, RVEPs of H2 plus ethanol ranged from 0.4 to 0.8 due to higher reported formate yields. The large differences in formate yields between organisms that encode pfl may be due to regulation of pfl. In Escherichia coli [82, 83] and Streptococcus bovis [84, 85], pfl expression has been shown to be negatively regulated by AdhE. Thus, the presence of pfl alone is not a good indicator of formate yields.
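The RVEP calculation can be made concrete with a short sketch. It assumes the conventional fermentation-balance O/R values of −1 per mol H2 and −2 per mol ethanol, which are consistent with the +4 assigned to 2 CO2 in the text but are not listed explicitly there. The function reports the magnitude of the summed reduction value so that it can be compared directly with the 3.5 threshold quoted above; the mixed-fermentation yields are hypothetical.

```python
from typing import Dict

# Magnitude of the O/R value per mol of each reduced end-product
# (assumed conventional values: H2 = -1, ethanol = -2).
REDUCTION_VALUE = {"H2": 1.0, "ethanol": 2.0}


def rvep(yields: Dict[str, float]) -> float:
    """Magnitude of the total reduction value of H2 plus ethanol.

    `yields` holds end-product yields in mol per mol hexose; products other
    than H2 and ethanol are ignored.
    """
    return sum(value * yields.get(product, 0.0)
               for product, value in REDUCTION_VALUE.items())


# Theoretical maxima from the Background: 4 mol H2 or 2 mol ethanol per mol glucose.
max_h2 = rvep({"H2": 4.0})
max_ethanol = rvep({"ethanol": 2.0})

# Hypothetical mixed fermentation (illustrative numbers, not taken from Table 2).
mixed = rvep({"H2": 1.0, "ethanol": 0.8, "acetate": 1.1, "formate": 0.3})
print(max_h2, max_ethanol, round(mixed, 2))
```

Both theoretical maxima give a magnitude of 4, matching the +4 contributed by 2 CO2, whereas diverting flux to formate or biosynthesis lowers the value.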

Genes involved in acetyl-CoA catabolism, acetate production, and ethanol production

The acetyl-CoA/acetate/ethanol node represents the third major branch-point that dictates how carbon and electrons flow towards end-products (Figure 1). Acetyl-CoA may be converted to acetate, with the concomitant production of ATP, either indirectly through an acetyl phosphate intermediate using phosphotransacetylase (pta) and acetate kinase (ack), or directly via acetate thiokinase (atk). Although both reactions produce ATP, the former uses ADP and Pi whereas the latter uses AMP and inorganic PPi as substrates for ATP synthesis. As a result, acetate production via pta and ack is more thermodynamically favorable than via atk (△G°’ = −3.9 vs. +6.0 kJ mol-1, respectively), which is typically used for acetate assimilation. Of the organisms surveyed, E. harbinense, G. thermodenitrificans, C. cellulolyticum, both C. thermocellum strains, and G. thermoglucosidasius contain all three genes capable of converting acetyl-CoA to acetate (Table 5). Conversely, Cal. subterraneus subsp. tengcongensis, Thermotoga and Caldicellulosiruptor species, C. phytofermentans, Ta. pseudethanolicus, and B. cereus encode only pta and ack, whereas P. furiosus and Th. kodakaraensis encode only atk.

Alternatively, acetyl-CoA may be converted into ethanol, during which 2 NADH (or NADPH) are oxidized, either directly via a fused acetaldehyde/alcohol dehydrogenase encoded by adhE, which has been proposed to be the key enzyme responsible for ethanol production [86, 87], or indirectly through an acetaldehyde intermediate via acetaldehyde dehydrogenase (aldH) and alcohol dehydrogenase (adh). While all organisms surveyed encoded multiple class IV Fe-containing ADHs (Table 5), the functions of these ADHs may vary with respect to substrate specificity (aldehyde length and substitution), coenzyme specificity (NADH vs. NADPH), and the catalytic directionality favored (ethanol formation vs. consumption) [10, 57–59, 72, 88–91]. Although there are reports of in silico determinations of substrate and cofactor specificity amongst ADHs, in our experience such resolutions are problematic [92, 93]. Often times, the gene neighborhoods of identified ADHs were suggestive that the physiological role of many enzymes was not ethanol production. This is evident in Ca. saccharolyticus, which does not produce ethanol despite reported NADPH-dependent ADH activity [57].

P. furiosus, Th. kodakaraensis, and all Thermotoga and Caldicellulosiruptor species do not encode adhE or aldH, and therefore produce negligible or no ethanol. Given the absence of ethanol producing pathways in these species, reducing equivalents are disposed of through H2 production via H2ases and/or lactate production via LDH. Surprisingly, while Cal. subterraneus subsp. tengcongensis also does not appear to encode aldH or adhE, NADPH-dependent AldH and both NADH and NADPH-dependent ADH activities, as well as ethanol production, have been reported by Soboh et al. [42]. Similarly, Caldicellulosiruptor obsidiansis, which does not encode aldH or adhE, does produce trace levels of ethanol, suggesting that the various encoded ADHs may have broad substrate specificities [94]. Although C. cellulolyticum and Ta. pseudethanolicus do not encode aldH, they do encode adhE, and thus are capable of ethanol production. Of the organisms surveyed, only G. thermoglucosidasius and C. cellulolyticum encoded aldH and adh but no adhE, and produced moderate amounts of ethanol (~0.4 mol per mol hexose). Conversely, a number of organisms (E. harbinense, C. phytofermentans, both C. thermocellum strains, G. thermoglucosidasius, and B. cereus) encoded aldH, adh, and adhE, all of which produce varying ethanol yields.

Hydrogenases

In addition to disposal of reducing equivalents via alcohol and organic acid production, electrons generated during conversion of glucose to acetyl-CoA can be used to produce molecular hydrogen via a suite of [FeFe] and/or [NiFe] H2ases. The incredible diversity of H2ases has been extensively reviewed by Vignais et al. and Calusinska et al. [16, 95, 96]. H2ases may be (i) monomeric or multimeric, (ii) can catalyze the reversible production of H2 using various electron donors, including reduced Fd and NAD(P)H, or (iii) can act as sensory H2ases capable of regulating gene expression [97]. While most H2ases can reversibly shuttle electrons between electron carriers and H2, they are typically committed to either H2-uptake or evolution, depending on reaction thermodynamics and the requirements of the cell in vivo [95]. While Fd-dependent H2 production remains thermodynamically favorable at physiological concentrations (△G°’ ~ −3.0 kJ mol-1), potential production of H2 from NAD(P)H (△G°’ = +18.1 kJ mol-1) becomes increasingly unfavorable with increasing hydrogen partial pressure [98]. Hence, Fd-dependent H2ases are associated with H2 evolution, whereas NAD(P)H-dependent H2ases are more likely to catalyze H2 uptake. Recent characterization of a heterotrimeric “bifurcating” H2ase from Thermotoga maritima demonstrated that it can simultaneously oxidize reduced Fd and NADH to H2 (△G°’ ~ +7.5 kJ mol-1), which drives the endergonic production of H2 from NADH by coupling it to the exergonic oxidation of reduced Fd [99].
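The dependence of NAD(P)H-driven H2 production on hydrogen partial pressure can be sketched with the standard relation △G’ = △G°’ + RT ln Q. The example below assumes the reaction NADH + H+ → NAD+ + H2 at pH 7, takes Q as the product of the H2 partial pressure (in atm) and the NAD+:NADH ratio, and uses 298.15 K with the △G°’ of +18.1 kJ mol-1 quoted above. The temperature, the partial pressures, and the neglect of any temperature dependence of △G°’ are simplifying assumptions; many of the organisms surveyed grow at considerably higher temperatures.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ mol-1 K-1


def delta_g_prime(delta_g0_prime: float,
                  p_h2_atm: float,
                  oxidized_to_reduced: float = 1.0,
                  temperature_k: float = 298.15) -> float:
    """Free energy change (kJ mol-1) of carrier-dependent H2 production.

    delta_g0_prime      standard free energy change at pH 7 (kJ mol-1)
    p_h2_atm            H2 partial pressure in atm
    oxidized_to_reduced ratio of oxidized to reduced carrier, e.g. NAD+:NADH
    """
    q = p_h2_atm * oxidized_to_reduced
    return delta_g0_prime + R_KJ_PER_MOL_K * temperature_k * math.log(q)


# NADH-dependent H2 production at three illustrative H2 partial pressures.
for p_h2 in (1.0, 1e-3, 1e-4):
    print(p_h2, round(delta_g_prime(18.1, p_h2), 2))
```

Under these assumptions the reaction is endergonic at 1 atm H2 and only becomes exergonic at very low H2 partial pressures, or when the NAD+:NADH ratio falls, which is in line with the argument in the text.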

With the exception of P. furiosus and Th. kodakaraensis, which encode only Fd-dependent and putative F420-dependent [NiFe] H2ases, all other H2ase encoding organisms surveyed are capable of H2ase-mediated oxidation/reduction of both Fd and NAD(P)H. This seems fitting given that P. furiosus and Th. kodakaraensis preferentially catalyze the oxidation of glyceraldehyde-3-P via GAPFOR rather than GAPDH and PGK, and thus must reoxidize reduced Fd, rather than NADH, during fermentative product synthesis. All other H2ase encoding organisms produce NADH during glycolysis and reduced Fd via PFOR. In these organisms, the oxidation of these electron carriers may be carried out using various different types of H2ases. All of these species encoded at least a single putative bifurcating H2ase (Table 6). The majority of these bifurcating H2ases were found downstream of dimeric or monomeric sensory [FeFe] H2ases that may be involved in their regulation (Table 6). Soboh et al. have demonstrated that NADH-dependent H2ase activities in Cal. subterraneus subsp. tengcongensis are affected by H2 partial pressures [42], suggesting possible regulation of these H2ases via a two-component signal transduction mechanism in response to changes in redox levels [16, 97]. It is important to note that these NADH-dependent H2ase activities may reflect bifurcating H2ase activities given that Cal. subterraneus subsp. tengcongensis encodes only a Fd-dependent and a putative bifurcating H2ase, and no NAD(P)H-dependent H2ases.

While Ta. pseudethanolicus only encodes a bifurcating H2ase, all other organisms that encode a bifurcating H2ase also encode Fd-dependent H2ases. Putative Fd-dependent, [NiFe] Ech/Mbh-type H2ases were identified in the genomes of Cal. subterraneus subsp. tengcongensis, P. furiosus, Th. kodakaraensis, and all Caldicellulosiruptor and Clostridium species (Table 6). A pair of putative Fd-dependent [FeFe] H2ases was identified in both E. harbinense and C. phytofermentans. With the exception of Ta. pseudethanolicus, Cal. subterraneus subsp. tengcongensis, and Caldicellulosiruptor species, all organisms surveyed containing a bifurcating H2ase also appear to be capable of NADH and/or NADPH oxidation using NADH/NADPH-dependent H2ases. As with ADHs, however, we could not determine H2ase cofactor specificity exclusively using in silico sequence analysis, stressing the importance of activity characterization of enzyme substrate specificity. While C. cellulolyticum achieves NAD(P)H oxidation using a putative H2-uptake [NiFe] H2ase, E. harbinense, Thermotoga species, and C. thermocellum ATCC 27405 achieve this using [FeFe] H2ases. Although the draft genome of C. thermocellum DSM 4150 does not encode an NAD(P)H-dependent H2ase, our proteomic and microarray data reveal the presence of Cthe_3003/Cthe_3004 homologues (Rydzak, unpublished results).

In addition to H2ase-mediated electron transfer between Fd and/or NADH and H2, electrons may be transferred directly between Fd and NAD(P)H via an Rnf-like (Rhodobacter nitrogen fixation) NADH:ferredoxin oxidoreductase (NFO), a membrane-bound enzyme complex capable of generating a sodium motive force derived from the energy difference between reduced Fd and NADH. Only Thermotoga species, C. phytofermentans, C. thermocellum, and Ta. pseudethanolicus encode a putative NFO. Proteomic analysis of C. thermocellum, however, revealed low, or no, expression of NFO subunits, suggesting it does not play a major role in electron exchange between Fd and NADH [100].

While the presence/absence of genes encoding pathways that lead to reduced fermentation products (i.e. formate, lactate, and particularly ethanol) is a major determinant of H2 yields, we can make some inferences with respect to H2 yields based on the types of H2ases encoded. Given the thermodynamic efficiencies of H2 production using different cofactors, we can say that Fd-dependent H2ases are conducive to H2 production while NAD(P)H-dependent H2ases are not. However, organisms that do not encode ethanol-producing pathways (i.e. Caldicellulosiruptor and Thermotoga species) may generate high intracellular NADH:NAD+ ratios, making NADH-dependent H2 production thermodynamically feasible under physiological conditions. Conversely, in organisms capable of producing both H2 and ethanol (Ethanoligenens, Clostridium, and Thermoanaerobacter species), the presence of Fd-dependent H2ases appears to be beneficial for H2 production. For example, E. harbinense and Clostridium species, which encode Fd-dependent, as well as bifurcating and NAD(P)H-dependent H2ases, produce much higher H2 yields when compared to those of Ta. pseudethanolicus, which encodes only one bifurcating H2ase and no Fd or NAD(P)H-dependent H2ases. Interestingly, organisms that do not encode H2ases (G. thermoglucosidasius and B. cereus) produce low ethanol and high lactate (and/or formate) yields, suggesting that H2 production can help lower NADH:NAD+ ratios, and thus reduce flux through LDH.

Influence of overall genome content on end-product profiles

The presence and absence of genes encoding proteins involved in pyruvate metabolism and end-product synthesis may be used as an indicator of end-product distribution. By comparing genome content to end-product yields, we identified key markers that influence ethanol and H2 yields. These include (i) MDH, (ii) LDH, (iii) PFL vs. PFOR and/or PDH, (iv) AldH and AdhE, and (v) bifurcating, Fd-dependent, and NAD(P)H-dependent H2ases.

While it is difficult to elucidate how differences in “malate shunt” genes affect end-product synthesis patterns by comparing reported yields, eliminating MDH has been shown to increase lactate and ethanol production, and decrease acetate production in C. cellulolyticum [78]. The elimination of this transhydrogenation pathway may increase NADH:NAD+ ratios for reduced end-product synthesis and reduce NADPH:NADP+ ratios for biosynthesis. While the presence of LDH is not a good predictor of lactate yields, LDH, when activated, diverts reducing equivalents away from H2 and ethanol. In contrast to PFL, PFOR and PDH produce additional reducing equivalents (reduced Fd and NADH, respectively), and thus promote reduced end-product synthesis. Organisms that do not encode pfl generally produce more ethanol and H2 (based on sum redox value) compared to those that do encode pfl. Of the organisms surveyed, those that encoded (or expressed) neither adhE nor aldH produced near-maximal H2 yields and little to no ethanol. While the type(s) of encoded H2ases appear to have little impact in organisms that do not encode ethanol-producing pathways, they do seem to influence reduced end-product yields in those that do. For example, Ta. pseudethanolicus, which encodes an adhE, NFO, and a single bifurcating H2ase, but no discernible Fd- or NAD(P)H-dependent H2ases, generates low H2 and near-optimal ethanol yields. The inability to oxidize reduced Fd via Fd-dependent H2ases may elevate reduced Fd levels, which in turn can be used by NFO to produce additional NADH for ethanol synthesis. Interestingly, in the absence of H2ases, lactate production was favoured over ethanol production, suggesting that H2 production can help lower NADH:NAD+ ratios, and thus reduce flux through LDH.
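The qualitative trends described above can be sketched as a simple rule-based screen. This is a minimal illustration only, not a validated classifier: the gene labels (`h2ase_fd`, `h2ase_bifurcating`, `h2ase_nadph`, `nfo`, and so on) are hypothetical names chosen for this example rather than locus tags or database identifiers, and the rules encode only the coarse tendencies noted in the text, ignoring expression levels and growth conditions.

```python
# Illustrative gene labels (assumed names, not real annotation identifiers).
H2ASES = {"h2ase_fd", "h2ase_bifurcating", "h2ase_nadph"}
ETHANOL_GENES = {"adhE", "aldH"}


def predict_profile(genes):
    """Return (tendency, notes): a coarse, qualitative end-product tendency."""
    genes = set(genes)
    h2ases = genes & H2ASES
    notes = []
    if "pfl" in genes:
        notes.append("pfl present: fewer reducing equivalents than via PFOR/PDH")
    if "ldh" in genes:
        notes.append("ldh present: may divert reducing equivalents when activated")

    if not h2ases:
        # No H2ases: lactate (and/or formate) favoured over ethanol.
        tendency = "lactate-leaning"
    elif not (genes & ETHANOL_GENES):
        # H2ases but neither adhE nor aldH: high H2, little ethanol.
        tendency = "H2-leaning"
    elif h2ases == {"h2ase_bifurcating"}:
        # Ethanol pathway plus only a bifurcating H2ase: ethanol favoured.
        tendency = "ethanol-leaning"
    else:
        tendency = "mixed H2/ethanol"
    return tendency, notes


# Gene content for Ta. pseudethanolicus as summarized in the text.
ta_pseudethanolicus = {"adhE", "nfo", "h2ase_bifurcating"}
print(predict_profile(ta_pseudethanolicus)[0])
```

A screen of this kind would only prioritize candidates for experimental characterization; reported yields still vary with growth conditions.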

Given the impact that MDH, PFL, AldH, AdhE, and the different H2ases have on end-product yields, screening for these biomarkers can streamline assessment of the ethanol- and H2-producing potential of sequenced and novel organisms through in silico gene mining and the use of universal primers, respectively. Furthermore, by understanding how end-product yields are affected by (i) the framework of genes encoding pathways that convert pyruvate into end-products, and (ii) the thermodynamic efficiencies of these reactions, we can begin to develop informed metabolic engineering strategies for optimization of either ethanol or H2 (Figure 2). For example, in order to optimize either ethanol or H2, we would recommend elimination of ldh and pfl in order to allow accumulation of additional reducing equivalents. Given that ethanol and H2 compete for reducing equivalents, elimination of one product should direct carbon and/or electron flux towards the other.

For optimization of H2 yields (Figure 2A), deletion of aldH and adhE is likely most effective. Although conversion of pyruvate to acetyl-CoA is more thermodynamically favorable using PDH versus PFOR (ΔG°’ = −33.4 vs. −19.2 kJ mol-1), production of H2 from NADH is highly unfavorable compared to the use of reduced Fd (ΔG°’ = +18.1 vs. −3.0 kJ mol-1). This in turn indicates that reduction of Fd via PFOR and subsequent H2 production via a Fd-dependent H2ase (ΔG°’ = −21.2 kJ mol-1) is more favorable than NADH production via PDH and subsequent H2 production via NAD(P)H-dependent H2ases (ΔG°’ = −15.3 kJ mol-1). Therefore, we propose that conversion of pyruvate to acetyl-CoA via PFOR is favorable for H2 production, and pdh (and pfl) should be deleted. Given that 2 NADH (per glucose) are produced during glycolysis in most anaerobic microorganisms, the presence of a bifurcating H2ase, which would simultaneously oxidize the 2 NADH generated during glycolysis and the 2 reduced Fd produced by PFOR, would be required to achieve theoretically maximal H2 yields of 4 mol per mol glucose. A Fd-dependent H2ase would also be conducive to H2 production during times when reducing equivalents generated during glycolysis are redirected towards biosynthetic pathways, resulting in a disproportionate ratio of reduced ferredoxin to NAD(P)H. Alternatively, in organisms such as P. furiosus and Th. kodakaraensis, which generate high levels of reduced Fd and low levels of NADH, the presence of Fd-dependent H2ases, rather than bifurcating H2ases, would be more conducive to H2 production. In all cases, NFO and NAD(P)H-dependent H2ases should be deleted to prevent oxidation of reduced Fd and uptake of H2, respectively, which would generate NAD(P)H.
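The route comparison above amounts to adding the standard free energies of the two sequential steps. The sketch below does this with the per-step values quoted in the text, assuming they are expressed on a common basis so that they can be summed directly; the dictionary and function names are illustrative. Summing the quoted step values gives −22.2 kJ mol-1 for the PFOR route, slightly different from the −21.2 kJ mol-1 quoted above (the difference may reflect rounding or the underlying source values), while the PDH route reproduces −15.3 kJ mol-1. The ranking of the two routes is the same either way.

```python
# Standard Gibbs free energies (kJ/mol) as quoted in the text.
DG_PYRUVATE_TO_ACETYL_COA = {"PDH": -33.4, "PFOR": -19.2}
DG_H2_FROM_CARRIER = {"NADH": 18.1, "Fd": -3.0}

# Electron carrier reduced by each pyruvate-oxidizing enzyme.
CARRIER_OF = {"PDH": "NADH", "PFOR": "Fd"}


def route_dg(enzyme):
    """Sum the two step free energies for pyruvate -> acetyl-CoA -> H2."""
    carrier = CARRIER_OF[enzyme]
    total = DG_PYRUVATE_TO_ACETYL_COA[enzyme] + DG_H2_FROM_CARRIER[carrier]
    return round(total, 1)


for enzyme in ("PFOR", "PDH"):
    print(enzyme, route_dg(enzyme))
```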

The metabolic engineering strategies employed for optimization of ethanol (Figure 2B) differ substantially from those used for the production of H2. First, adhE and/or aldH and adh genes that encode enzymes with high catalytic efficiencies in the direction of ethanol formation should be heterologously expressed. Given that ethanol production is NAD(P)H dependent, NADH production should be maximized, while Fd reduction should be eliminated. Through deletion of pfl and pfor, and expression of pdh, up to 4 NADH can be generated per glucose, allowing for the theoretical maximum of 2 mol ethanol per mol glucose to be produced. To prevent NADH reoxidation, lactate and H2 production should be eliminated by deleting ldh and NAD(P)H-dependent H2ases. While this strategy is theoretically sound, low AldH/Adh catalytic efficiencies may cause NADH/NAD+ ratios to rise so high that they may impede glycolysis. In these situations, the presence of an NFO or NAD(P)H-dependent H2ase may intermittently alleviate these high NADH/NAD+ ratios through generation of reduced Fd pools or H2 production, respectively, albeit at the cost of reducing equivalents available for ethanol production.
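The theoretical maxima cited for both strategies follow from simple redox bookkeeping per glucose. The sketch below is a minimal illustration under stated assumptions: every reducing equivalent is channelled to the target product, each NADH and each reduced Fd (counted as in the text) donates two electrons, each H2 requires two electrons, and each acetyl-CoA requires 2 NADH for reduction to ethanol. The function names are hypothetical.

```python
ELECTRONS_PER_CARRIER = 2  # assumption: NADH and reduced Fd each donate 2 electrons
ELECTRONS_PER_H2 = 2


def max_h2_per_glucose(nadh, fd_red):
    """Upper bound on H2 if all reducing equivalents are oxidized by H2ases."""
    electrons = ELECTRONS_PER_CARRIER * (nadh + fd_red)
    return electrons / ELECTRONS_PER_H2


def max_ethanol_per_glucose(nadh, acetyl_coa=2):
    """Upper bound on ethanol; each acetyl-CoA needs 2 NADH to reach ethanol."""
    return min(acetyl_coa, nadh / 2)


# H2 strategy: 2 NADH from glycolysis + 2 reduced Fd from PFOR.
print(max_h2_per_glucose(nadh=2, fd_red=2))   # 4.0

# Ethanol strategy: 2 NADH from glycolysis + 2 NADH from PDH.
print(max_ethanol_per_glucose(nadh=4))        # 2.0

# With PFOR instead of PDH, only the 2 glycolytic NADH are available.
print(max_ethanol_per_glucose(nadh=2))        # 1.0
```

The last case illustrates why routing pyruvate through PDH rather than PFOR is proposed for ethanol: without the additional NADH, at most half of the acetyl-CoA could be reduced to ethanol unless reduced Fd is converted to NADH by other means.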

While some attempts to increase H2 and/or ethanol yields through genetic engineering have been successful in a number of lignocellulolytic organisms (reviewed elsewhere; [101]), engineering of the strains discussed here has only been marginally successful. Heterologous expression of Zymomonas mobilis pyruvate decarboxylase and Adh in C. cellulolyticum increased cellulose consumption and biomass production, and decreased lactate production and pyruvate overflow due to a more efficient regulation of carbon and electron flow at the pyruvate branchpoint [102]. However, despite higher levels of total ethanol produced, ethanol yields (per mol hexose consumed) actually decreased when compared to the wild-type strain. Similarly, deletion of PTA in C. thermocellum drastically reduced acetate production, but had minimal impact on lactate or ethanol production [103]. This suggests that genome content alone cannot exclusively dictate the extent of end-product yields observed in the literature, and thus growth conditions must be optimized in order to moderate regulatory mechanisms that direct carbon and electron flux. This can only be achieved through a thorough understanding of the regulatory mechanisms that mediate gene and gene-product expression and activity levels under various growth conditions, through a combination of genomics, transcriptomics, proteomics, metabolomics, and enzyme characterization.

Conclusions

Fermentative bacteria offer the potential to convert biomass into renewable biofuels such as H2 and ethanol through consolidated bioprocessing. However, these bacteria display highly variable, branched catabolic pathways that divert carbon and electrons towards unwanted end products (i.e. lactate, formate). In order to make fermentative H2 and/or ethanol production more economically feasible, biofuel production yields must be increased in lignocellulolytic bacteria capable of consolidated bioprocessing. While the cellulolytic and, to a lesser extent, H2 and ethanol producing capabilities of cellulolytic bacteria have been reviewed [8, 9, 44], a comprehensive comparison between genome content and corresponding end-product distribution patterns has not been reported. While reported end-product yields vary considerably in response to growth conditions, which may influence gene and gene product expression and metabolic flux, we demonstrate that the composition of genes encoding pyruvate catabolism and end-product synthesis pathways alone can be used to approximate potential end-product distribution patterns. We have identified a number of genetic biomarkers, including (i) MDH, (ii) LDH, (iii) PFL vs. PFOR and/or PDH, (iv) AldH and AdhE, and (v) bifurcating, Fd-dependent, and NAD(P)H-dependent H2ases, that can be used to streamline assessment of H2 and/or ethanol producing capabilities in sequenced and novel isolates. By linking genome content, reaction thermodynamics, and end-product yields, we offer potential targets for optimization of either ethanol or H2 yields via metabolic engineering. Deletion of LDH and PFL could potentially increase both H2 and ethanol yields. While deletion of ethanol producing pathways (aldH, adh, adhE), increasing flux through PFOR, overexpression of Fd-dependent H2ases, and elimination of potential H2-uptake (NAD(P)H-dependent) H2ases could lead to increased H2 production, eliminating H2 production and redirecting flux through PDH would be beneficial for ethanol production. Although gene and gene-product expression, functional characterization, and metabolomic flux analysis remain critical in determining pathway utilization, insights regarding how genome content affects end-product yields can be used to direct metabolic engineering strategies and streamline the characterization of novel species with potential industrial applications.

Acknowledgements

This work was supported by funds provided by the Natural Sciences and Engineering Research Council of Canada (NSERC), through a Strategic Programs grant (STPGP 306944–04), by Genome Canada, through the Applied Genomics Research in Bioproducts or Crops (ABC) program for the grant titled, “Microbial Genomics for Biofuels and CoProducts from Biorefining Processes”, and by the Province of Manitoba, Agricultural and Rural Development Initiative (ARDI), grant 09–986.

Authors’ contributions

TR and CRC co-authored the manuscript. TV, CRC and TR performed genomic meta-analysis. TR performed end-product comparisons and thermodynamic calculations. CRC performed phylogenetic analysis. RS, NC, and DBL conceived of the study, participated in its design, and helped draft the manuscript. All authors read and approved the final manuscript.