Thank you for visiting nature.com. You are using a browser version with
limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off
compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site
without styles and JavaScript.

Subjects

Abstract

As global temperatures rise, large amounts of carbon sequestered in permafrost are becoming available for microbial degradation. Accurate prediction of carbon gas emissions from thawing permafrost is limited by our understanding of these microbial communities. Here we use metagenomic sequencing of 214 samples from a permafrost thaw gradient to recover 1,529 metagenome-assembled genomes, including many from phyla with poor genomic representation. These genomes reflect the diversity of this complex ecosystem, with genus-level representatives for more than sixty per cent of the community. Meta-omic analysis revealed key populations involved in the degradation of organic matter, including bacteria whose genomes encode a previously undescribed fungal pathway for xylose degradation. Microbial and geochemical data highlight lineages that correlate with the production of greenhouse gases and indicate novel syntrophic relationships. Our findings link changing biogeochemistry to specific microbial lineages involved in carbon processing, and provide key information for predicting the effects of climate change on permafrost systems.

Acknowledgements

This study was funded by the Genomic Science Program of the United States Department of Energy (DOE) Office of Biological and Environmental Research (BER), grants DE-SC0004632, DE-SC0010580 and DE-SC0016440. B.J.W. and P.N.E. are supported by Australian Research Council Discovery Early Career Research Awards #DE160100248 and #DE170100428, respectively. C.M.S. and J.A.B. are supported by Australian Government Research Training Program (RTP) Scholarships, and G.W.T. is supported by Australian Research Council Future Fellowship FT170100070. A portion of the research was performed using Environmental Molecular Sciences Laboratory (EMSL), a DOE Office of Science User Facility, and a portion was performed under the Facilities Integrating Collaborations for User Science (FICUS) initiative with resources at both the DOE Joint Genome Institute and EMSL. Both facilities are sponsored by the Office of BER and operated under contracts DE-AC02-05CH11231 (JGI) and DE-AC05-76RL01830 (EMSL). We thank the IsoGenie 1 and 2 Project Teams and the 2010–2012 field teams for sample collection, particularly T. Logan, as well as the Abisko Scientific Research Station for sampling infrastructure and support. We thank P. Hugenholtz, D. Parks, S. Robbins, B. Kemish, M. Chuvochina, S. Low and M. Butler for helpful discussion and infrastructure support.

Reviewer information

Nature thanks S. Allison, J. Jansson and the other anonymous reviewer(s) for their contribution to the peer review of this work.

a, Relative abundance of each phylum estimated through the recovery of 16S rRNA gene reads, averaged within each thaw stage. The 15 phyla with the highest relative abundance across all samples are shown. b, Number of MAGs recovered from each of the phyla in a, showing that broadly, MAGs recovered are from lineages highest in abundance. c, Principal coordinates analysis of weighted UniFrac compositional differences between samples, based on average coverage of each recovered genome of reads mapped to the dereplicated genome set. Colours indicate thaw stage: brown = palsa (P), green = sphagnum/bog (S), blue = eriophorum/fen (E). Depth: S = surface, M = mid-depth, D = deep, X = extra-deep. Goodness of fit was 0.57 for PCoA 1 and 0.65 for PCoA 2. Sample numbers: n = 53, 65 and 70 biologically independent samples for palsa, bog and fen, respectively. d, Quantitative PCR analysis of samples taken in 2012. The number of cells per gram of soil is shown for three depths at the three thaw stages, after correcting for 16S rRNA gene copy number variation (see Methods). Fen samples contained significantly more cells per gram of soil than bog and palsa samples (average 2.6×, P = 7 × 10–8, n = 103, two-sided Mann–Whitney U-test). Sample numbers: n = 8, 9, and 8 for biologically independent samples palsa surface, mid and deep, respectively, n = 9, 8, 9 and 10, 7 and 9 for bog and fen, respectively. e, f, Relative abundances of phyla and classes within the Proteobacteria across the thaw gradient, respectively. The depth of each sample is indicated by the colour of the box (surface: red, mid-depth: green, deep: blue, extra-deep: purple). Each data point is the sum of relative abundances of all lineages assigned to the phylum in a sample after adding a 0.1% pseudocount to all phyla (so the y axis is not dominated by small values visually). Box plots are shown plotted on a log-scale y axis, with phyla and classes ordered by decreasing average relative abundance across all samples. Relative abundance was calculated based on the fraction of the community with recovered genomes (see Methods). Sample numbers: n = 53, 65 and 70 biologically independent samples for palsa, bog and fen, respectively.

a, b, Cellulose; c, d, xylanase; e, f, β-glucosidase. Samples analysed with metatranscriptomics are described by the date of sampling, core number and depth. a, c, e, Relative contribution of each phylum to the total TPM of the enzyme class observed in the metatranscriptomes. b, d, f, Total TPM of all expressed genes in the sample.

a, As in Fig. 2, 97% dereplicated MAGs are shown as circles (‘MAG abundance’), where the radius of the circle represents the average relative abundance of that genome in the palsa, bog or fen. b, As in Fig. 2, the total relative abundance of genomes encoding the pathway is shown among the entire community. Sample numbers: n = 53, 65 and 70 biologically independent samples for palsa, bog and fen, respectively.

a, Diagram of xylose degradation pathways. b, Venn diagram showing how each xylose breakdown pathway is shared among the Stordalen Mire MAGs. Percentages represent the proportion compared to all Stordalen genomes encoding a xylose degradation pathway. In the metaproteomes, genomes Acidobacteria_bog_390, Actinobacteria_fen_455 and Actinobacteria_bog_808 expressed a protein specific to oxidoreductase pathways and a protein specific to the isomerase pathway. In the metatranscriptomes, Acidobacteria_palsa_248, Acidobacteria_bog_370, Acidobacteria_bog_390, Actinobacteria_fen_455, Actinobacteria_bog_586, Actinobacteria_bog_808 and Planctomycetes_fen_1346 expressed a protein specific to oxidoreductase pathways and a protein specific to the isomerase pathway. c–h, Gene expression of xylose degradation pathways. Average expression of genes in the canonical bacterial xylose isomerase (c, d), oxidoreductase (e, f) and xylanate dehydratase pathways (g, h) are depicted across the thaw gradient. Samples analysed with metatranscriptomics are described by the date of sampling, core number and depth. c, e, g, Relative contribution of each phylum to the total TPM of the enzyme class observed in the metatranscriptomes. d, f, h, Total TPM of all expressed genes in the sample.

Samples analysed with metatranscriptomics are described by the date of sampling, core number and depth. a, c, e, g, Total TPM of each fermentation pathway in the metatranscriptomes. b, d, f, h, Relative contribution of each phylum to the total TPM of each pathway.

a, CO2 and CH4 concentrations in porewater derived from the bog and fen. The blue line shown is a line of best fit, forced through the origin. Dots indicate the samples, with colours indicating the sample depth. The concentrations are correlated, and the CH4 concentrations are much lower than the CO2 concentrations in both sites. Sample numbers: n = 51 (bog) and 61 (fen) biologically independent samples. b, Methanogenesis versus methanotrophy rates. Each point represents the average relative abundance of methanotrophs and methanogens across all samples in a single core, multiplied by the rate of methane generation or consumption inferred from previous culture-based measurements (2.345 and 20.1 fmol CH4 h–1 per cell of methanogenesis and methanotrophy, respectively, see Methods). The line represents the 1:1 ratio. Inferred fluxes were calculated using relative abundance of methanogenic or methanotrophic lineages so rates are only intended for comparison between the x and y axes, rather than as an absolute measure of CH4 flux. Methanotrophy appears to mitigate a significant proportion of the CH4 generated in the bog sites. c, Correlation of the relative abundance of Ca. ‘Methanoflorens stordalenmirensis’ with the isotopic fractionation of methane (αC) dissolved in paired porewater samples taken from the bog. Previously observed in 2011 using 16S rRNA gene amplicon sequencing33, the correlation is confirmed here using genome-centric metagenomic techniques on the 2011 samples, as well as in a new year of sampling in 2012. Sample numbers: n = 23 (2011) and 24 (2012) biologically independent samples. d, e, Expression of methanogenesis marker gene mcrA across the thaw gradient. Samples analysed with metatranscriptomics are described by the date of sampling, core number and depth. d, Relative contribution of each methanogenic order to the total TPM. e, Relative contribution of all mcrA genes in the metatranscriptome. Metaproteomes revealed the expression of 289 hydrogenotrophic McrA proteins across 13 samples, as well as 78 acetoclastic McrA proteins across eight samples (Supplementary Data 2). f, Linear regression analysis for predicting effective fractionation (αc) of CH4 from environmental variables and Ca. ‘Methanoflorens stordalenmirensis’ abundances in the bog. Ca. ‘Methanoflorens stordalenmirensis’ abundance exceeds bulk geochemical parameters in predicting the effective fractionation of CH4. Each line is the result of a linear regression of the specified measurement against the αc of CH4 in bog porewater samples taken in 2011 and 2012 (n = 47 biologically independent samples).

a, Total relative abundance of the genus Ca. ‘Changshengia’ correlated with the fraction of the concentration of C mineralized to CO2 versus CH4 in the bog porewater samples (R2 = 0.19, P = 0.001, n = 51 biologically independent samples). Each point represents an individual sample from 2012, with its colour representing the depth in the core from which the sample was taken. b, Metabolic reconstruction of genomes belonging to the candidate phylum AD3 genus Ca. ‘Changshengia’ correlating with the CH4:CO2 concentration ratio in porewater from 2012 bog samples. Genomes from four clades within the AD3 were assembled from across Stordalen Mire. Enzyme colour indicates the families that share that metabolic potential, as outlined in the legend on the left. Arrow colouring indicates whether expression was detected (red arrows) or not detected (black arrows) for genes encoding the enzyme in any of the 24 metatranscriptomes. Orange stars indicate detection of protein expression in any of the 22 metaproteomes from the Ca. ‘Changshengia’ and related genomes. All four lineages encode the potential to oxidize glycerol anaerobically through glycerol transporter (glpF), glycerol kinase (glpK) and a membrane-bound glycerol-3-phosphate dehydrogenase (glpABC), entering glycolysis via dihydroxyacetone phosphate processed to glyceraldehyde-3-phosphate by the triosephosphate isomerase (tpiA). Other glycerol derivatives such as glycerol-3-phosphate could be imported (glpT) by this and other family members, and dihydroxyacetone phosphate can also be processed using the PTS-dependent dihydroxyacetone kinase (dhaLMK) complex. Sinks for the electrons generated from the oxidation of glycerol also varied between the different lineages, with Ca. ‘Changshengia’ and clade 1 having a H+-translocating complex I NADH:oxidoreductase, while clade 1 also has a high affinity cytochrome oxidase complex IV, clade 2 genomes encode only a nitrate reductase (narGHI) and clade 4 genomes only a fumarate reductase (sdhABCD). These differences are likely to lead to the differentiation of the niches that each lineage occupies across different sites and depths of the mire. Lineages were considered positive for genes or complexes based on the presence of sequences with 80% homology in 50% of the genomes. c, Phylogenetic subtree showing the family groupings of AD3 for the metabolic analysis. Representative genomes from the 97% average nucleotide identity (ANI) dereplication are indicated in red. Bootstrap support is indicated at the nodes for values over 70% or 90% in grey and black, respectively. Blue clade indicates cluster of seven UBA and RefSeq genomes.

Extended Data Table 1 Genomes with high prevalence in specific sites and depths

Proteins identified as expressed through metaproteomics, including the sample, the name of each protein, where the first 3 elements in snake case refer to the genome, and the final number refers to the protein ID within that genome, peptide and assigned annotation (KEGG orthology group or hydrolysis enzyme annotation).

Abundances of representative lineages across the thaw gradient. Each column represents the relative abundance of a lineage in a particular sample. The column 'coverage' indicates the trimmed mean pileup (BamM 'tpmean') coverage of reads, 'relative_abundance' represents the fraction of the total community that lineage is estimated to represent, and 'relative_abundance_of_recovered' represents the fraction of the recovered part of the community that genome represents.