Marine Microbial Populations

PNAS PLUS
Pattern and synchrony of gene expression among
sympatric marine microbial populations
Elizabeth A. Ottesena, Curtis R. Younga, John M. Eppleya, John P. Ryanb, Francisco P. Chavezb, Christopher A. Scholinb,
and Edward F. DeLonga,c,1
Departments of aCivil and Environmental Engineering and cBiological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139;
and bMonterey Bay Aquarium Research Institute, Moss Landing, CA 95039
Contributed by Edward F. DeLong, December 19, 2012 (sent for review December 3, 2012)
Planktonic marine microbes live in dynamic habitats that demand combined with the growing number of environmentally relevant
rapid sensing and response to periodic as well as stochastic environ- reference genomes, now enable elucidation of genome-wide tran-
mental change. The kinetics, regularity, and speciﬁcity of microbial scription proﬁles of abundant microbial groups represented in
responses in situ, however, are not well-described. We report here metatranscriptomic datasets (6–8). In situ sampling of discrete, co-
simultaneous multitaxon genome-wide transcriptome proﬁling in herent microbial populations over time represents another formi-
a naturally occurring picoplankton community. An in situ robotic dable challenge. This problem is especially apparent in aquatic
sampler using a Lagrangian sampling strategy enabled continuous environments, where, because of hydrodynamic processes, sam-
tracking and repeated sampling of coherent microbial populations ples collected at a ﬁxed location tend to convolute temporal
over 2 d. Subsequent RNA sequencing analyses yielded genome-wide variability with spatial heterogeneity (6, 9). To overcome these
transcriptome proﬁles of eukaryotic (Ostreococcus) and bacterial challenges and better assess microbial community environmental
(Synechococcus) photosynthetic picoplankton as well as proteorho- responses and dynamics in situ, we combined an automated La-
dopsin-containing heterotrophs, including Pelagibacter, SAR86-clus-
MICROBIOLOGY
grangian sampling approach with microbial community tran-
ter Gammaproteobacteria, and marine Euryarchaea. The photo- scriptome analyses to generate a high-resolution 2-d time series
synthetic picoplankton exhibited strong diel rhythms over thousands of transcriptional activity among sympatric marine picoplankton
of gene transcripts that were remarkably consistent with diel cycling populations.
observed in laboratory pure cultures. In contrast, the heterotrophs
did not cycle diurnally. Instead, heterotrophic picoplankton popula- Results and Discussion
tions exhibited cross-species synchronous, tightly regulated, tempo- Marine surface water microbes were collected and preserved by
rally variable patterns of gene expression for many genes, a robotic system (10) suspended beneath a free-drifting buoy
ENVIRONMENTAL
particularly those genes associated with growth and nutrient acquisi- deployed off the coast of northern California (Fig. 1). Over a 2-d
SCIENCES
tion. This multitaxon, population-wide gene regulation seemed to re- sampling period, the instrument drifted 50.3 km along the warm
ﬂect sporadic, short-term, reversible responses to high-frequency side of a front that was generated by coastal upwelling to the east
environmental variability. Although the timing of the environmental (Fig. 1). Samples for community RNA sequencing were collected
responses among different heterotrophic species seemed synchro- and preserved every 4 h. Portions of the sampling track were
nous, the speciﬁc metabolic genes that were expressed varied from marked by strong vertical gradients in salinity and temperature
taxon to taxon. In aggregate, these results provide insights into the (Fig. 1 C and D). However, current velocity data suggest that the
kinetics, diversity, and functional patterns of microbial community water sampled for microbial transcriptomic analysis was relatively
response to environmental change. Our results also suggest a means stable horizontally (SI Appendix, Fig. S1 and Table S1). The
by which complex multispecies metabolic processes could be coordi- taxonomic representation in the metatranscriptome sampled over
nated, facilitating the regulation of matter and energy processing in
a dynamically changing environment. Signiﬁcance
|
autonomous sampling environmental microbiology | Microbial communities regulate the cycling of energy and
| |
metatranscriptomics microbial oceanography community ecology
matter in the environment, yet how they respond to environ-
mental change is not well-known. We describe here a day in
P lanktonic microbial communities are characterized by high
productivity and rapid turnover. As a result, they respond
rapidly to environmental ﬂuctuations, and minor perturbations
the life of wild planktonic microbial species using robotic
sampling coupled with genome-wide gene expression analysis.
Our results showed that closely related populations, as well as
have the potential to trigger large and rapid shifts in ecosystem very different bacterial and archaeal species, displayed re-
properties and function (1). Characterizing the dynamics of natural markably similar time-variable synchronous patterns of gene
microbial communities on relevant temporal and spatial scales expression over 2 d. Our results suggest that speciﬁc environ-
however, remains challenging. Consequently, few data are available mental cues may elicit cross-species coordination of gene ex-
on the timing, magnitude, and speciﬁc biological details of response pression among diverse microbial groups, potentially enabling
and regulation among diverse microbial taxa experiencing envi- multispecies coupling of metabolic activity.
ronmental ﬂuctuations on short time scales in situ.
Acquiring accurate and detailed assessments of microbial dy- Author contributions: E.A.O., C.A.S., and E.F.D. designed research; E.A.O. performed re-
search; J.M.E. and C.A.S. contributed new reagents/analytic tools; E.A.O., C.R.Y., J.M.E.,
namics in natural ecosystems poses several challenges. One chal- J.P.R., F.P.C., C.A.S., and E.F.D. analyzed data; and E.A.O. and E.F.D. wrote the paper.
lenge is the paucity of methods available for estimating the disparate The authors declare no conﬂict of interest.
activities and environmental responses of diverse microbes inhabit-
Freely available online through the PNAS open access option.
ing complex communities. A recent approach that addresses this
Data deposition: The sequences reported in this paper have been deposited in the Gen-
challenge involves the use of community RNA sequencing (e.g., Bank database (accession no. SRA062433), and the CAMERA (http://camera.calit2.net)
metatranscriptomics) to facilitate simultaneous transcriptional pro- database repository (accession no. CAM_P_0001026).
ﬁling of co-occurring taxa within a microbial assemblage (2–5). Initial 1
To whom correspondence should be addressed. E-mail: delong@mit.edu.
studies using this approach focused on changes in overall community This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.
metabolic proﬁles. Recent advances in sequencing technologies, 1073/pnas.1222099110/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.1222099110 PNAS Early Edition | 1 of 10
Fig. 1. Sampling locations and sample characteristics. (A) The ESP drift track imposed over a map showing average sea surface temperature [Polar orbiting
environment satellites, advanced very high resolution radiometer, local area coverage. Western United States, day/night, 5 × 5-pixel (5 × 5-km) median-
ﬁltered composite from September 21, 2010; cloud cover precluded satellite observation during the sampling period]. Inset shows the sampling location
relative to western North America. (B) Transcript abundances of major taxa represented as percent of sequences with matches in the NCBInr peptide da-
tabase. (C) Environmental conditions in the immediate vicinity of the ESP (measurements taken by ESP-mounted instruments). Grey bars represent sample
collection times. (D) Integrated depth proﬁle showing salinity gradients surrounding the sampling location (dotted line). Measurements were taken near the
ESP by a ship-deployed instrument. The arrow in A indicates the direction of the drift.
this time also remained relatively constant (Fig. 1). Of the over abundance within each speciﬁc taxonomic population, independent
2.4 million sequences that were assigned to 8,117 unique National of ﬂuctuations in its abundance relative to the total community
Center for Biotechnology Information (NCBI) Taxonomy IDs, transcriptome. (Table 1 and SI Appendix, Figs. S3–S11 and Tables S2
∼98% matched taxa that were detected in all 13 samples. Addi- and S3 have details of transcript mapping and annotation.)
tionally, our samples showed greater overall similarity in taxo- To conﬁrm that complex transcriptional dynamics could be ob-
nomic composition to one another across all of the time points served within transcriptional proﬁles extracted from community-
than did samples collected over a 24-h period at a ﬁxed spatial wide gene expression datasets, we used cluster analysis to assess
location in Monterey Bay (6) (SI Appendix, Fig. S2). global transcriptional patterns among the ﬁve most highly repre-
Our analyses focused on transcriptional dynamics among ﬁve sented taxonomic groups (Fig. 2 and SI Appendix, Figs. S12–S17).
abundant microbial populations in the sampled community, in- Within each taxon, we identiﬁed groups of genes that shared similar
cluding Ostreococcus, Synechococcus, Pelagibacter, SAR86 cluster transcriptional proﬁles using the geneARMA software package
Gammaproteobacteria (SAR86), and marine group II Euryarchaea (16), which uses an autoregressive moving average model, ARMA
(MGII) (Table 1). These populations represent widely distributed (p,q), for the longitudinal covariance structure and Fourier series
and ecologically important clades of marine picoplankton (11–15). functions to model gene expression patterns. As anticipated, a large
Transcripts mapping to each of these groups were identiﬁed and number of coexpressed genes with apparent 24-h periodicity were
annotated using a newly developed computational workﬂow (SI identiﬁed among Ostreococcus and Synechococcus populations.
Appendix, Fig. S3) that assigned sequences to speciﬁc taxon bins Additionally, principal components analysis clearly separated the
based on best-scoring matches within the full NCBI database. Within transcriptome proﬁles of these taxa based on time of day (Fig. 2B).
each taxon bin, transcript counts for genes shared between multiple Pure cultures of both Ostreococcus and Synechococcus are known to
reference genomes of the same taxa were combined, and analyses of have fully functional circadian clocks that coordinate large-scale
transcriptional dynamics focused on changes in relative transcript transcriptional dynamics (17, 18), and those rhythms were readily
2 of 10 | www.pnas.org/cgi/doi/10.1073/pnas.1222099110 Ottesen et al.
PNAS PLUS
Table 1. Assignment of sequences to taxon bins
Ostreococcus Synechococcus Pelagibacter SAR86 cluster MGII Archaea
Sample CDS Reads Orthologs Reads Orthologs Reads Orthologs Reads Orthologs Reads Orthologs
9/16 2:00 PM 234,860 35,590 4,661 18,777 2,535 14,262 1,278 10,053 1,499 12,600 1,195
9/16 6:00 PM 208,667 21,292 4,148 15,286 2,437 11,275 1,199 7,207 1,383 10,185 1,097
9/16 10:00 PM 198,687 32,218 5,112 16,412 2,282 12,850 1,235 6,665 1,173 11,635 1,229
9/17 2:00 AM 269,745 31,412 4,908 6,551 1,749 12,508 1,253 7,990 1,438 12,072 1,160
9/17 6:00 AM 209,305 29,903 4,402 6,908 1,618 11,339 1,238 8,003 1,441 10,574 1,233
9/17 10:00 AM 384,870 54,840 4,271 5,579 1,279 22,572 1,377 8,698 1,341 24,340 1,384
9/17 2:00 PM 209,634 22,695 4,037 8,714 1,955 13,707 1,304 9,057 1,490 9,309 1,127
9/17 6:00 PM 183,482 14,357 3,443 5,046 1,406 12,056 1,192 6,804 1,268 8,512 1,105
9/17 10:00 PM 176,778 15,692 4,231 9,261 2,070 9,161 1,180 7,678 1,480 7,643 1,142
9/18 2:00 AM 220,569 23,025 4,501 13,138 2,035 13,859 1,284 9,456 1,527 6,922 1,067
9/18 6:00 AM 222,828 21,064 3,239 6,384 1,306 15,609 1,217 8,717 1,309 11,038 1,240
9/18 10:00 AM 208,782 23,215 3,641 9,730 1,818 11,393 1,195 9,010 1,521 6,954 1,040
9/18 2:00 PM 232,718 27,704 3,699 14,811 2,216 14,639 1,098 8,973 1,451 5,692 925
Total 2,960,925 353,007 7,491 136,597 3,950 175,230 1,810 108,311 2,226 137,476 1,691
The total number of putative coding sequences (unique nonrRNA sequence reads with at least one hit in the NCBInr database with bit score >50) identiﬁed
in each sample is listed. For each taxon bin, the number of sequence reads assigned to that group and the number of ortholog clusters with at least one
assigned sequence within each sample are listed. CDS, coding sequence.
MICROBIOLOGY
apparent in the transcriptome proﬁles of wild populations as well as some, and Calvin cycle transcripts largely exhibited a morning peak
the behavior of individual transcripts (see below). In contrast, in expression, whereas cytochrome oxidase transcripts peaked in
cluster analysis of transcription among the three proteorhodopsin- the evening. In contrast to observed patterns among Ostreococcus,
expressing heterotrophic populations did not exhibit evidence of only one photosystem gene (Cluster 8842 psaK) and relatively few
signiﬁcant diel regulation of gene expression (Fig. 2C). Neverthe- antenna proteins (4 of 25) were identiﬁed as signiﬁcantly periodic
less, transcripts recovered from these populations, particularly in our dataset. Genes in these categories did, however, show high-
transcripts from Pelagibacter, did reveal variable and coordinated amplitude and highly variable expression, suggesting that they may
ENVIRONMENTAL
regulatory patterns among a large variety of different gene suites be differentially regulated based on environmental factors. Also
SCIENCES
and metabolic pathways (Fig. 2 and SI Appendix, Figs. S15–S17). unlike Ostreococcus, evidence for diel periodicity in growth and
division among this Synechococcus population was relatively weak,
Diel Rhythms in Gene Expression Among Ostreococcus and Syne- with no cell cycle or DNA replication transcripts showing signiﬁ-
chococcus. To further explore diel transcriptional rhythms, we cantly periodic expression. However, given the complicated re-
used harmonic regression-based analyses to identify individual lationship between nutrient status and the timing of DNA
transcripts that followed a sinusoidal curve with 24-h periodicity replication and cell division in Synechococcus (19, 20), the un-
(Fig. 3). Using this approach, 2,097 of 7,491 Ostreococcus tran- known growth state of this population, and the apparent presence
scripts and 130 of 3,950 Synechococcus transcripts were identiﬁed in our dataset of two ecologically distinct Synechococcus clades (SI
as signiﬁcantly periodic. None of the transcripts from the three Appendix, Fig. S8), a lack of a clear periodic signal in population-
heterotrophic populations were identiﬁed as signiﬁcantly periodic averaged transcript abundance for individual cell cycle and DNA
using this method. replication genes is perhaps not surprising.
Ostreococcus population transcripts showing strong periodic Both the Ostreococcus and Synechococcus data reﬂected broad
trends in transcript abundance included the master clock genes trends previously observed in laboratory monocultures. The di-
Circadian Clock Associated 1 and Timing of Cab expression 1 and urnal timing of expression of transcripts in wild Ostreococcus
genes associated with major metabolic functions (Fig. 3). Ribosomal populations in particular was remarkably consistent with gene
protein gene expression peaked in the early morning followed by an expression patterns observed in microarray-based laboratory
increase in gene transcripts associated in carbon ﬁxation, a sub- studies of O. tauri cultures grown in 12:12-h light:dark cycles (18)
sequent maximum in photosynthesis gene expression around mid- (Fig. 4). However, there were also a number of differences be-
day, and ﬁnally, cell cycle and DNA replication gene expression, tween results obtained for our natural populations and those
which reached a maximum at the end of the day. Although only one results observed in laboratory analyses of pure cultures.
mitochondrial gene (NADH dehydrogenase I subunit 6) was iden- A direct comparison of our ﬁeld study with any previously
tiﬁed as signiﬁcantly periodic, 49 of 61 total plastid genes exhibited published laboratory studies of Synechococcus was problematic,
24-h periodicity, with a predawn peak of biosynthetic genes (ribo- because most existing datasets (17) focus on S. elongatus, a fresh-
somal proteins and RNA polymerase) and a midafternoon peak of water species, and were performed under constant light illumi-
photosynthesis genes. Interestingly, whereas most genes involved in nation. It is worth noting, however, that our study identiﬁed
carbon ﬁxation were identiﬁed as signiﬁcantly periodic with peak fewer Synechococcus transcripts that exhibited periodic trends
expression around 8:00 AM, the large subunit of RuBisCO (the only in expression compared with laboratory studies of the freshwa-
carbon ﬁxation gene still encoded by the Ostreococcus plastid ge- ter Synechococcus species. However, the orthologs identiﬁed in
nome) did not show cyclical trends in transcript abundance. our ﬁeld populations do not simply represent a high-amplitude
Among Synechococcus population transcripts, two of three subset of the periodically expressed transcripts detected in labo-
kaiABC clock genes as well as genes involved in oxidative phos- ratory studies. Of 69 periodically expressed Synechococcus or-
phorylation, photosynthesis, respiration, and carbon ﬁxation (Fig. thologs in our ﬁeld study that could be mapped to probes in the
3) exhibited periodic trends in transcript abundance. The third laboratory-based microarray study, 24 orthologs were not iden-
clock gene, kaiB, was not identiﬁed as signiﬁcantly periodic, but its tiﬁed as signiﬁcantly periodic under laboratory conditions.
relatively low coverage level (zero to ﬁve copies per library) may For Ostreococcus, the populations reported here and a previous
have precluded accurate quantiﬁcation. ATP synthase, carboxy- laboratory study (18) identiﬁed ∼2,000 periodically expressed
Ottesen et al. PNAS Early Edition | 3 of 10
Fig. 2. Global transcriptional proﬁles from phototrophic and heterotrophic taxon bins. Heat maps depict third-order Fourier series models from geneARMA
clustered transcripts within phototrophic (A) and heterotrophic (C) taxa. Heat maps show model amplitude at the 13 time points (columns) for each of the
geneARMA clusters (rows). GeneARMA cluster models were normalized to an amplitude of one. Membership information and individual gene transcript
traces are shown in SI Appendix, Figs. S13–S17. (B) Principal components analysis of Ostreococcus and Synechococcus transcriptomes. Axes represent the ﬁrst
and second principal components and are labeled with the percent of total variance explained.
Ostreococcus transcripts. However, of 1,683 signiﬁcantly periodic studies have suggested single-time point day/night differences in
orthologs in our ﬁeld populations that could be mapped to probes the overall transcriptional proﬁles of marine microbial commu-
in the O. tauri microarray, 881 orthologs were not identiﬁed as nities, our analyses here provide a much higher-resolution picture
signiﬁcantly periodic in the laboratory study. Reprocessing the of genome-wide diel transcriptional dynamics among different
laboratory microarray data using our regression-based approach microbial populations in a natural microbial community in situ.
(with a Gaussian error model) increased the overlap in signiﬁ-
cantly periodic genes, but this analysis still yielded 393 genes Transcriptional Dynamics Within Pelagibacter Population Transcripts.
identiﬁed as periodic in our ﬁeld data but not in the laboratory The naturally occurring Pelagibacter populations that we sampled
study. We did not identify any obvious biological trends among did not exhibit strong circadian rhythms of gene expression. We
Ostreococcus transcripts that were identiﬁed as periodically did however observe evidence for well-orchestrated, genome-
expressed in the ﬁeld but not the laboratory. Although some of wide transcriptional regulation within this group. Hierarchical
these differences in gene expression patterns may be the result of clustering of samples and pathways showed a large degree of
methodological differences, many are likely to represent re- covariance between some major metabolic pathways (Fig. 5A). In
sponses to cues present in the natural environment but not within particular, the pathway-level signal for ribosomal proteins and
the relatively static laboratory environment. oxidative phosphorylation showed strong positive correlation with
In sum, these analyses validate our approach and conﬁrm that one another (correlation coefﬁcient = 0.98, P value = 1 × 10−8)
complex transcriptional patterns within distinct populations can be and were negatively correlated with many transport gene tran-
resolved within bulk community RNA proﬁles. Although previous scripts, including the ATP binding cassette (ABC) transporter
4 of 10 | www.pnas.org/cgi/doi/10.1073/pnas.1222099110 Ottesen et al.
PNAS PLUS
MICROBIOLOGY
ENVIRONMENTAL
SCIENCES
Fig. 3. Periodic gene expression in Ostreococcus- and Synechococcus-assigned transcripts. (A and B) 48-h time series of observed (points) and ﬁtted (lines)
transcript abundances is shown for selected transcripts from Ostreococcus (A) and Synechococcus (B) populations. Fitted values with solid lines represent
transcripts with signiﬁcantly periodic expression, whereas dotted lines represent best-ﬁt curves for transcripts not passing signiﬁcance cutoffs. For reference,
plots of relative light levels are shown. (C and D) Plots showing peak expression times for all orthologs (grey circles) and signiﬁcantly periodic orthologs (red
circles) assigned to major cellular functions in Ostreococcus (C) and Synechococcus (D). KEGG pathways for photosynthesis proteins and antenna proteins were
combined for the purposes of this plot along with purine and pyrimidine metabolism pathways. Ostreococcus (OC) and Synechococcus (SC) ortholog cluster
designations for transcripts in A and B: ATPF0A, ATP synthase subunit A, OC 9555 (plastid-encoded), SC 1180; Circadian Clock Associated 1 (CCA1) and Timing
of Cab expression 1 (TOC1), Ostreococcus clock genes OC 3107 and 7575; COX1, coxA, cytochrome c oxidase subunit I, OC 9595 (mitochondrial), SC 1503; Cyclin
B, mitotic cyclin B, OC 658; kaiA, -B, and -C, Synechococcus clock genes SC 332, 3370, and 334; ND1, ndhA, NADH dehydrogenase I subunit 1, OC 9600
(mitochondrial), SC 210; PAR, photosynthetically available radiation; psaA, PSI apoprotein A1, OC 9562 (plastid-encoded), SC 2040; psbA, PSII reaction center
D1, OC 9541 (plastid-encoded), SC 1091; rbcS, rbcL, RuBisCo large and small subunits, OC 6808, SC 130.
family (correlation coefﬁcient = −0.88, P value = 1 × 10−5 for In many microbial species, the abundance of ribosomal pro-
ribosomal proteins vs. ABC). Principal components analysis sug- teins and their transcripts is tightly regulated with respect to
gested that these metabolic signals explain more of the variability cellular growth rate (22–24), and the relative abundance of ri-
observed in wild Pelagibacter transcript proﬁles than any of the bosomal protein transcripts for a given taxon has been proposed
measured environmental parameters (Fig. 5B). Furthermore, as a metric for assessing in situ growth rates (8). The trends in
a Poisson regression-based analysis found that 101 of 1,810 ob- Pelagibacter gene expression that we report here however, sug-
served Pelagibacter transcripts were signiﬁcantly correlated with gest a very dynamic and rapidly ﬂuctuating reallocation of cel-
pathway-level signals for either ribosome biosynthesis or ABC lular resources between growth and nutrient acquisition. In this
transport (SI Appendix, Tables S4 and S5). A total of 74 of these context, cell populations exhibiting decreased ribosomal protein
transcripts was identiﬁed as up-regulated in tandem with the synthesis and increased transporter activity are most likely indi-
ribosome synthesis pathway (and down-regulated with transport cators of sporadically limiting substrate availability in the ambient
up-regulation), including not only transcripts coding for ribo- environment. The broad range of transporters that were expressed
somal proteins but also genes associated with C1 metabolism when ribosomal synthesis was down-regulated does not suggest
(21), secretion, ATP synthase (six of nine subunits), and proton- limitation by any single nutrient. Additionally, neither set of genes
translocating pyrophosphatase. The 27 transcripts that followed seems to reﬂect stationary-phase responses previously reported in
the opposite trend (up-regulated with transporters and down- laboratory cultures of Candidatus P. ubique (25) (SI Appendix,
regulated with ribosome synthesis) seemed to represent a gen- Tables S4 and S5), suggesting that none of these populations have
eralized transport signal, encompassing not only ABC trans- entered a starvation state. Instead, these trends reﬂect highly dy-
porters of amino acids, polyamines, and phosphonates but also namic and variable transcriptional responses (and potentially,
TRAP (Tripartite ATP-independent Periplasmic) transporters metabolic and growth rate variability) over short time scales, that
of carboxylic acids, an ammonium transporter, and an Na+/solute seem to be dictated by surrounding environmental and nutrient
symporter. variability.
Ottesen et al. PNAS Early Edition | 5 of 10
for small peptides, osmolytes, and dicarboxylic acids at high levels,
consistent with genomic analyses (26). SAR86 transcripts included
a large number of TonB-dependent receptors, previously hypoth-
esized to mediate uptake and metabolism of large polysaccharides
and lipids in this organism (14). Finally, the MGII transcriptome
was dominated by large cell surface proteins and amino acid
transporters, consistent with a hypothesized ability to metabolize
large proteins (15). Therefore, although these three very different
populations exhibited similar trends in expression of pathways in-
volved in growth and energy metabolism, they did not show similar
trends in most metabolic pathways (SI Appendix, Table S6). This
trend suggests that the synchronous transcriptional dynamics
observed for these groups may reﬂect bulk changes in the
Fig. 4. Comparison of peak expression times for periodically expressed
Ostreococcus orthologs in ﬁeld populations vs. a laboratory pure culture.
Each point represents 1 of 1,290 transcripts detected as signiﬁcantly periodic
in our ﬁeld study reported here and a previous microarray study of O. tauri
(18). For this comparison, microarray data (as reported in Gene Expression
Omnibus accession no. GSE16422) were reprocessed using our harmonic re-
gression method with a Gaussian error model.
Synchronous Transcriptional Dynamics Within Pelagibacter, SAR86,
and MGII. Although Pelagibacter exhibited the strongest and most
coherent patterns in transcript abundance among the three proteo-
rhodopsin-bearing heterotrophic populations examined, SAR86 and
MGII populations also exhibited transcriptional changes over the
2-d time series. Interestingly, these patterns seemed to reﬂect syn-
chronous responses to the same cryptic environmental changes that
appeared to be driving Pelagibacter transcription dynamics. The de-
gree of similarity between samples for the three organisms was sig-
niﬁcantly related (Fig. 6A), suggesting that the overall transcriptional
proﬁles of these groups were changing simultaneously, or nearly so.
This relationship was even more evident when Procrustes tests were
used to compare only the ﬁrst two axes of population-speciﬁc prin-
cipal components analyses (Fig. 6B), restricting the comparison to
the strongest trends in sample-to-sample variability. Furthermore,
the relative abundance of transcripts involved in ribosome bio-
synthesis and oxidative phosphorylation was positively correlated
across the 13 time points for the three groups (Fig. 6C and SI Ap-
pendix, Table S6). Similarly, independent geneARMA analyses of Fig. 5. Analysis of Pelagibacter transcriptional proﬁles. (A) Heat map
the three taxa identiﬁed gene cluster models exhibiting similar trends showing relative abundance of major metabolic pathways among Pelagi-
bacter-assigned sequences. Hierarchical clustering of samples and pathways
in transcript abundance across the three datasets (SI Appendix, Fig.
used average-linkage clustering based on Pearson correlation coefﬁcients.
S18). Altogether, these analyses suggest that all three populations For each pathway, the fraction of transcripts assigned to each pathway that
were responding to a common environmental signal, exhibiting is signiﬁcantly correlated (based on Poisson regression) with the overall
global, synchronous changes in taxon-level taxonomic proﬁles. pathway-level signal is listed. (B) Principal components analysis of Pelagi-
Among heterotrophic marine picoplankton, Pelagibacter, bacter transcriptional proﬁles. Axes represent the ﬁrst two components and
SAR86, and MGII have been hypothesized to catabolize different are labeled with the proportion of variance explained by each. Vector ﬁts for
types of carbon molecules (14, 15, 21, 26). Pelagibacter seems to use selected KEGG pathways (ABC, ABC transporters; OxP, oxidative phosphor-
simple peptides, amino acids, osmolytes, and single carbon com- ylation; Rib, ribosomal proteins) were highly signiﬁcant (P < 0.0001) and are
shown in red. Of the environmental data collected, only surface PAR (blue;
pounds, whereas the SAR86 and MGII groups have been hy-
P = 0.003) was signiﬁcantly correlated (P < 0.05) with the ordination. (Inset)
pothesized to specialize in the consumption of larger, more Loadings of Pelagibacter transcripts on the principal component axes.
complex polymers, such as proteins, polysaccharides, and lipids. Transcripts signiﬁcantly correlated with either ribosome or ABC transport
Consistent with these predictions, the most abundant transcripts pathways are colored based on their relationship with those pathways (cyan
for each of the taxon bins reﬂected very different metabolic proﬁles for orthologs positively correlated with ribosome and/or negatively corre-
(SI Appendix, Fig. S19). Pelagibacter expressed ABC transporters lated with ABC transport; magenta for the opposite relationship).
6 of 10 | www.pnas.org/cgi/doi/10.1073/pnas.1222099110 Ottesen et al.
PNAS PLUS
Fig. 6. Synchronous transcriptional dynamics among three heterotrophic populations. (A) Mantel test showing a signiﬁcant relationship in transcriptome
dissimilarity for Pelagibacter vs. SAR86 (Upper) and MGII (Lower) populations. Comparisons used pairwise Euclidean distances (square root of the sum of
squared differences in abundance for all transcripts). (B) Procrustes analysis revealing a large degree of congruence in sample clustering patterns for the three
MICROBIOLOGY
heterotrophic populations. In Procrustes tests, the results of principal components analyses are rotated and scaled to identify similarities in clustering patterns
while maintaining relationships between samples. A smaller distance between points corresponding to a single sample reﬂects a more similar clustering
pattern. Rotated and scaled SAR86 (blue) and MGII (green) analyses are overlaid on the Pelagibacter (red) results from Fig. 5B. Samples are labeled according
to position in the time series (9/16 2:00 PM is sample 1). Procrustes correlation (m12 ) and permutation-based signiﬁcances are shown for each comparison. (C)
The relative abundance of transcripts for ribosomal proteins (Upper) and genes associated with oxidative phosphorylation (Lower) within each taxon bin at
each time point. Pearson correlation coefﬁcient and P value are listed for relative abundances of these pathways within the Pelagibacter vs. their abundances
in SAR86 and MGII transcriptomes.
ENVIRONMENTAL
SCIENCES
availability of a broad range of carbon-based substrates rather S6). The metabolic proﬁles of SAR86 and MGII, although in-
than responses to the availability of a single common nutrient or dicative of different substrate preferences, share the commonality
substrate limitation. of the binding and hydrolysis of large polymeric substrates, such
Overall, the Pelagibacter population showed a stronger global as cell wall and membrane components. The overall abundances
transcriptional response, involving a larger number of transcripts, of SAR86 and MGII transcripts in the metatranscriptomes are
than the transcriptional responses observed for the SAR86 and negatively correlated (correlation coefﬁcient = −0.55, P value =
MGII populations. Although some of this difference may be due 0.047), which may suggest some degree of niche overlap and
to the higher sequence coverage of Pelagibacter (because of its competition between these organisms. In contrast, Pelagibacter
smaller genome and greater overall abundance), SAR86 and MGII specializes in a very different fraction of the substrate pool, in-
showed less sample-to-sample variation between transcriptional cluding low-molecular weight monomers, such as carboxylic acids
proﬁles than did Pelagibacter, even when all datasets were resam- and amino acids, that are likely to be generated as a byproduct of
pled to represent an even coverage level. Pelagibacter species have both SAR86 and MGII metabolic activities.
been shown to have highly streamlined genomes with reduced Altogether, our results revealed unexpected interspecies syn-
regulatory machinery (26). As a result, these α-proteobacteria may chronicity in the regulation of some pathways, as well as a sur-
exhibit a smaller range of transcriptional dynamics in response to prising degree of heterogeneity in the transcriptional proﬁles
complex environmental cues, resulting in strong global transcrip- among photoheterotrophic picoplankton populations. Tran-
tional dynamics. In contrast, SAR86 and MGII, with larger scripts encoding ribosomal proteins and genes involved in oxida-
tive phosphorylation were previously identiﬁed as highly variable
genomes and corresponding increased metabolic and regulatory
between samples collected at distant geographic locations (5).
versatility, might be expected to exhibit more complex time- and
Notably, we observed as much variability in transcript abundance in
location-speciﬁc behaviors not as easily distinguished using cluster-
these pathways over only a few hours time and in the same water
and correlation-based approaches. Alternatively, the SAR86 and
mass, as had been previously reported in transoceanic meta-
MGII populations may simply respond less strongly to the envi- genomic surveys. These data suggest that activity levels and respi-
ronmental cues that elicited strong Pelagibacter responses. It is also ration rates for heterotrophic populations may be spatially and
possible that higher-molecular weight, more-complex substrates temporally patchy in marine surface waters, potentially due to
preferred by SAR6 and MGII had a more patchy distribution than episodic substrate releases, such as small-scale lytic events and
the simple peptides and osmolytes used by Pelagibacter. As a result, other stochastic environmental processes. Additional exploration
gene expression in SAR86 and MGII populations might be of these behavioral patterns and the environmental cues that
expected exhibit a larger degree of cell-to-cell variability, resulting control them is likely to provide signiﬁcant insight into niche spe-
in weaker or less synchronized transcriptional dynamics when av- cialization of key microbial groups in the planktonic environment.
eraged at the temporal (∼40 min) and spatial (∼1 L) scale of our
sample collections. Implications. Episodic environmental variation and subsequent
Notably, although many pathway-level signals from SAR86 microbial responses play signiﬁcant roles in shaping marine bio-
and MGII populations were individually correlated with path- geochemical cycles (1). To better understand and predict micro-
way-level signals from Pelagibacter, these populations showed bial community responses to such events, it is critical to observe
signiﬁcantly less congruence to each other (SI Appendix, Table them on relevant temporal and spatial scales in situ. Here, we
Ottesen et al. PNAS Early Edition | 7 of 10
show that Lagrangian sampling combined with microbial com- on longer-term community assembly, structure, and functional
munity transcriptome analyses can resolve microbial dynamics on patterns. Future studies using the approaches we describe here
time scales of hours to days, yielding robust transcriptional ex- have potential to yield deeper insight into microbial environ-
pression patterns. Within a given taxon, the presence of re- mental interactions and their ecological consequences in a dy-
producible temporal patterns in the genome-wide transcription namic and constantly changing environment.
proﬁle indicated that a large fraction of individual cells within
a given population was responding synchronously, at least within Methods
the temporal resolution of our measurements. Furthermore, dis- Sample Collection. Seawater samples (1 L) were collected along the central
parate heterotrophic taxa within the community also seemed to be California coast in September of 2010 using an Environmental Sample Pro-
simultaneously responding to similar environmental cues but cessor (ESP) (6, 10) suspended beneath a free-drifting surface ﬂoat at 23-m
depth. Microbes in the 0.22- to 5-μm size fraction were collected and pre-
expressed different functional gene suites in response to them,
served as previously described (6) but with a reduced incubation time in
suggesting a potential means by which multispecies metabolic and RNALater (Ambion) of 2 min per wash, which yields RNA of similar integrity.
biogeochemical processing might be coordinated. The instrument was recovered on September 19, 2010, and sample ﬁlters
The speciﬁc environmental factors that inﬂuence the observed were moved to individual vials for long-term storage at −80 °C within 36 h.
synchronized transcriptional regulation of diverse heterotrophic
microbial species are unknown at present. It may be that each Library Preparation and Sequencing. Approximately one-half of each ﬁlter was
species population is responding independently to the same (or used for extraction of total community RNA and subtractive hybridization of
simultaneously occurring) physicochemical environmental cues. rRNAs as previously described (28). Synthesis of antisense rRNA probes used
However, we cannot rule out that these transcriptional patterns DNA extracted from 5.8- to 7.1-L seawater samples collected using a rosette
are partly inﬂuenced by species-to-species communication and sampler at 23-m depth near the ESP at 10:00 AM on September 15, 17, and 19.
signaling cascade events. It is well-known that speciﬁc auto- PCR products from the three dates were pooled for use as templates for
synthesis of bacterial, archaeal, and eukaryotic large- and small-subunit rRNA
inducer molecules can elicit complex regulatory responses within
probes. Approximately 150 ng total community RNA were hybridized with
and between disparate bacterial species (27). It, therefore, seems 300 ng each bacterial, 100 ng each archaeal, and 150 ng each eukaryotic
possible that speciﬁc physicochemical environmental cues might small- and large-subunit probes. Probe removal used two successive 5-min
be sensed initially by only one or a few species. These cues might incubations with 75 μL washed Streptavidin beads (NEB) in a ﬁnal volume of
then be indirectly broadcast to other species by small-molecule 50 μL. Puriﬁed and concentrated message was linearly ampliﬁed and con-
signaling, thereby transmitting the response to other community verted to cDNA as described previously (2).
members. Future work, using higher-frequency sampling of mi- A GS FLX Titanium system (Roche) was used to sequence cDNA. Library
crobial community transcriptional proﬁles, may provide the tem- preparation followed the Titanium Rapid Library Preparation protocol. To
poral resolution necessary to distinguish between these different improve the retention of smaller cDNA molecules, adaptor-ligated libraries
cross-community sensing and response modalities. Regardless were not diluted before size selection with AMPure XP beads. Libraries were
quantiﬁed using the Titanium Slingshot kit (Fluidigm) and added to emulsion
of the speciﬁc mechanism(s) of multipopulation environmental
PCR reactions at 0.1 molecules per bead. Sequencing and quality control
sensing and response, both the above possibilities could elicit the followed the manufacturer’s recommendations.
multispecies transcriptional events that we observed. This tem-
poral entrainment could conceivably serve to coordinate down- Sequence Analysis and Annotation. Our analytical pipeline for sequence an-
stream biogeochemical processing and nutrient regeneration. For notation is summarized in SI Appendix, Fig. S3. Metatranscriptomic sequence
example, hydrolysis of higher-molecular weight organic com- libraries were screened for rRNA-derived transcripts and duplicates as pre-
pounds by SAR86 and GII Euryarchaea could produce monomers viously described (28). Putative coding sequences with bit scores ≥ 50 were
that were subsequently processed by Pelagibacter. initially identiﬁed by BLASTX against the NCBI nonredundant peptide data-
Very little is known about the actual metabolic rates of speciﬁc base as downloaded on May 31, 2010. After this initial analysis, additional
heterotrophic picoplankton species in the environment. Most in reference sequences became available for SAR86 cluster Gammaproteobac-
situ growth estimates have been derived from bulk averaged teria and MGII Euryarchaea. All unique nonrRNA sequences were again
compared by BLASTX with these newly released genome sequences, retain-
radioisotope incorporation into DNA or protein across entire
ing those sequences with bit scores ≥50 that were greater than or equal to
(heterogeneous) assemblages. Complex predator–prey dynamics their best match in the previous NCBInr database search (SI Appendix, Fig. S3).
involving phages and protists further complicate the measure- Sequence classiﬁcation and annotation used the highest-scoring database
ment of species-speciﬁc growth rates in situ. The transcriptional match and followed the NCBI taxonomy (with the exception of α-proteo-
proﬁles of heterotrophic marine picoplankton that we observed bacterium HIMB114, which we included within the Pelagibacter). For
suggested that disparate species were responding rapidly to en- sequences matching equally well to multiple genes within the database, all
vironmental variability with frequent and synchronized up- or matches were required to fall within the Chlorophyta for assignment to
down-regulation of transcripts in many pathways. In particular, Ostreococcus, the Cyanobacteria for Synechococcus, and the SAR11 cluster
gene transcript abundance in growth-related pathways, like ri- for Pelagibacter. All top-scoring matches were required to fall within the
SAR86 cluster or the MGII Euryarchaea for assignment to those taxon bins.
bosome biosynthesis and oxidative phosphorylation, varied sig-
Sequences were mapped to a single reference gene for annotation purposes,
niﬁcantly over the 2-d sampling period in these populations. with preference given to references that were abundant in the dataset and
Notably, we did not observe gene expression patterns that would references derived from sequenced genomes. Data ﬁles containing all taxon-
suggest transition into the stationary phase over the 2-d sampling speciﬁc transcript sequences for Ostreococcus, Synechococcus, Pelagibacter,
period. In total, these data suggest that frequent periods of SAR86, and Group II Euryarchaea are available from the authors on request.
metabolic acceleration and deceleration, even over the time span Within each major taxonomic bin, sequence counts for genes present in
of only one doubling, may be a common modality in heterotro- multiple reference genomes were compiled to generate ortholog cluster-based
phic marine picoplankton species in situ. transcript abundances. This approach was implemented to avoid artiﬁcial di-
The kinetics and regularity as well as quantitative and quali- vision of transcript pools from environmental organisms among multiple im-
tative attributes of microbial response dynamics in situ have perfectly matched reference sequences. Pairwise reciprocal best BLAST hits
implications beyond biogeochemical considerations. Short time- between translated coding sequences of reference genomes were compiled to
generate ortholog cluster assignments. Identiﬁcation of shared genes in
scale microbe–environment and microbe–microbe interactions
Ostreococcus used the previously described e-value–based signiﬁcance cutoff
ultimately give rise to microbial population variation, functional of 10−8 (29), whereas Synechococcus, Pelagibacter, and MGII comparisons used
variability, microbial community succession, and large-scale tax- an e-value cutoff of 10−5 and required 30% alignment identity over 80% of
onomic shifts over days, weeks, and months and across seasons. the longer sequence; SAR86 comparisons used the 10−5 e-value and 30%
A better understanding of short time-scale ecological microbial identity cutoffs, but (because of the highly fragmented nature of the SAR86
community processes should therefore provide a new perspective assemblies) only 50% of the longer sequence was required to align. Func-
8 of 10 | www.pnas.org/cgi/doi/10.1073/pnas.1222099110 Ottesen et al.
PNAS PLUS
tional annotation of ortholog clusters used the Kyoto Encyclopedia of Genes clusters for Pelagibacter, 7 clusters for SAR86, and 9 clusters for MGII (SI
and Genomes (KEGG) (30) annotations where available. Key species lacking Appendix, Fig. S12).
curated annotations were analyzed using the KEGG automated annotation
pipeline (31). In some cases, metatranscriptomic sequences were mapped to Regression Tests for Count Data. Gene-by-gene tests to identify transcripts
reference genes that were not derived from sequenced genomes (i.e., envi- exhibiting sinusoidal periodicity or covariance with pathway-level functions
ronmental clones). Where possible, these references were assigned to used Poisson log-linear regression as implemented in the R software package
ortholog clusters based on single-directional peptide BLAST (signiﬁcance
(34). Library size offsets were based on the total number of transcripts
cutoffs as above). Full lists of ortholog cluster membership, annotation, and
assigned to a given taxon at each time point. For periodicity tests, the si-
results of statistical analyses are available in Datasets S1, S2, and S3. À2π Á
nusoidal function xt ¼ Acos 24t þ ω , where A represents the amplitude, ω is
the phase, and t is the midpoint of the sampling time in hours, was reduced
Functional Clustering of Transcriptional Proﬁles. Cluster-based analyses were À Á À2π Á
used to examine global patterns of transcription within and across taxon bins. to the linear equation xt ¼ αcos 2π t þ βsin 24t , where α ¼ A cos ω and
24
Because many of these analyses assume normally distributed data, a variance- β ¼ − A sin ω.
stabilizing transformation (32) was applied before analysis: The signiﬁcance of each model ﬁt was assessed using both a χ2 test (as
implemented in the anova.glm function) and a permutation test. Permuta-
8 rﬃﬃﬃﬃ 9
>
> c >
> tion P values were calculated as the fraction of randomized datasets with
>
< sin
−1
if c > 0 >
=
N a model ﬁt (evaluated using the difference between the null and residual
x¼ rﬃﬃﬃﬃﬃﬃﬃ ;
> −1 1
> sin > deviance) as good or better than model ﬁts of the actual experimental data.
>
: if c ¼ 0 >
>
;
4N To optimize computational resources, permutations continued until at least
10 randomized datasets with likelihood ratios equal to or exceeding the
where c is the count of each transcript and N is the library size at each time observed data had been identiﬁed (500–50,000 permutations). False dis-
point. Mantel tests, principal components analyses, and Procrustes tests used covery rate-corrected (35) P values of at least 0.1 from both tests were re-
variance-stabilized transcript abundances and were carried out using the quired for a relationship to be considered signiﬁcant.
vegan software package (33).
The geneARMA (16) package was used to identify global patterns of ACKNOWLEDGMENTS. This manuscript is dedicated to the memory of Carl
R. Woese, who profoundly and forever changed our understanding of the
MICROBIOLOGY
coregulation among Ostreococus, Synechococcus, Pelagibacter SAR86, and
MGII transcripts. This algorithm performs model-based soft clustering of evolution and nature of life on Earth. We thank the Environmental Sample
transcriptional proﬁles using an autoregressive moving average model, Processor team for their help in planning and implementing this experiment,
especially Gene Massion, Brent Roman, Scott Jensen, Roman Marin III, Chris
ARMA(p,q), for the longitudinal covariance structure and Fourier series
Preston, Jim Birch, and Julie Robidart. We also thank the engineering
functions to model gene expression patterns. Data were ﬁltered before technicians and machinists at Monterey Bay Aquarium Research Institute
analysis such that the maximum count for a row (ortholog time series) was (MBARI) for their help and dedication to instrument development, and the
greater than 5 and the sum of the row was greater than 10. For each crew of the R/V Western Flyer for their support and expertise during ﬁeld
dataset, the algorithm was run multiple times (100–400 iterations) from operations. This work is a contribution to the Center for Microbial Ecology:
random initializations for P = 1–2 autocorrelation terms, q = 0–1 moving Research and Education (C-MORE). Development and application of the
ENVIRONMENTAL
average terms, K = 1–3 Fourier function terms, and J = 1–40 clusters. The Environmental Sample Processor has been supported by National Science
SCIENCES
iteration possessing the highest likelihood for each parameter combination Foundation Grant OCE-0314222 (to C.A.S.), National Aeronautics and Space
Administration Astrobiology Grants NNG06GB34G and NNX09AB78G (to
was used for ﬁnal inference, and the Akaike Information Criterion was used
C.A.S.), the Gordon and Betty Moore Foundation (C.A.S.), and the David
to assess model complexity. The optimal model conﬁguration, as identiﬁed and Lucile Packard Foundation. This work was supported by National Science
by Akaike Information Criterion, for all ﬁve datasets was an ARMA(1,1) Foundation Science and Technology Center Award EF0424599 (to C.A.S. and
covariance structure and a three-term Fourier series mean function, and it E.F.D.), a grant from the Gordon and Betty Moore Foundation (to E.F.D.),
included 39 clusters for Ostreococcus, 25 clusters for Synechococcus, 13 and a gift from the Agouron Institute (to E.F.D.).
1. Karl DM (2002) Nutrient dynamics in the deep blue sea. Trends Microbiol 10(9): 17. Ito H, et al. (2009) Cyanobacterial daily life with Kai-based circadian and diurnal
410–418. genome-wide transcriptional control in Synechococcus elongatus. Proc Natl Acad Sci
2. Frias-Lopez J, et al. (2008) Microbial community gene expression in ocean surface USA 106(33):14168–14173.
waters. Proc Natl Acad Sci USA 105(10):3805–3810. 18. Monnier A, et al. (2010) Orchestrated transcription of biological processes in the marine
3. Gilbert JA, et al. (2008) Detection of large numbers of novel sequences in the picoeukaryote Ostreococcus exposed to light/dark cycles. BMC Genomics 11:192.
metatranscriptomes of complex marine microbial communities. PLoS One 3(8):e3042. 19. Binder BJ, Chisholm SW (1995) Cell cycle regulation in marine Synechococcus sp.
4. Poretsky RS, et al. (2009) Comparative day/night metatranscriptomic analysis of strains. Appl Environ Microbiol 61(2):708–717.
microbial communities in the North Paciﬁc subtropical gyre. Environ Microbiol 11(6): 20. Binder B (2000) Cell cycle regulation and the timing of chromosome replication in
1358–1375. a marine Synechococcus (cyanobacteria) during light- and nitrogen-limited growth. J
5. Hewson I, Poretsky RS, Tripp HJ, Montoya JP, Zehr JP (2010) Spatial patterns and light- Phycol 36(1):120–126.
driven variation of microbial population gene expression in surface waters of the 21. Sun J, et al. (2011) One carbon metabolism in SAR11 pelagic marine bacteria. PLoS
oligotrophic open ocean. Environ Microbiol 12(7):1940–1956. One 6(8):e23973.
6. Ottesen EA, et al. (2011) Metatranscriptomic analysis of autonomously collected and 22. Keener J, Nomura M (1996) Regulation of ribosome synthesis. Escherichia coli and
preserved marine bacterioplankton. ISME J 5(12):1881–1895. Salmonella: Cellular and Molecular Biology, eds Neidhardt FC, et al. (ASM Press,
7. Marchetti A, et al. (2012) Comparative metatranscriptomics identiﬁes molecular bases Washington, DC), 2nd Ed.
for the physiological responses of phytoplankton to varying iron availability. Proc Natl 23. Fazio A, et al. (2008) Transcription factor control of growth rate dependent genes in
Acad Sci USA 109(6):E317–E325. Saccharomyces cerevisiae: A three factor design. BMC Genomics 9:341.
8. Gifford SM, Sharma S, Booth M, Moran MA (2012) Expression patterns reveal niche 24. Hendrickson EL, et al. (2008) Global responses of Methanococcus maripaludis to
diversiﬁcation in a marine microbial assemblage. ISME J, 10.1038/ismej.2012.96. speciﬁc nutrient limitations and growth rate. J Bacteriol 190(6):2198–2205.
9. Martin AP, Zubkov MV, Burkill PH, Holland RJ (2005) Extreme spatial variability in 25. Smith DP, et al. (2010) Transcriptional and translational regulatory responses to iron
marine picoplankton and its consequences for interpreting Eulerian time-series. Biol limitation in the globally distributed marine bacterium Candidatus pelagibacter
Lett 1(3):366–369. ubique. PLoS One 5(5):e10487.
10. Preston CM, et al. (2011) Underwater application of quantitative PCR on an ocean 26. Giovannoni SJ, et al. (2005) Genome streamlining in a cosmopolitan oceanic
mooring. PLoS One 6(8):e22522. bacterium. Science 309(5738):1242–1245.
11. Morris RM, et al. (2002) SAR11 clade dominates ocean surface bacterioplankton 27. Ng WL, Bassler BL (2009) Bacterial quorum-sensing network architectures. Annu Rev
communities. Nature 420(6917):806–810. Genet 43:197–222.
12. Scanlan DJ, et al. (2009) Ecological genomics of marine picocyanobacteria. Microbiol 28. Stewart FJ, Ottesen EA, DeLong EF (2010) Development and quantitative analyses of
Mol Biol Rev 73(2):249–299. a universal rRNA-subtraction protocol for microbial metatranscriptomics. ISME J 4(7):
13. Demir-Hilton E, et al. (2011) Global distribution patterns of distinct clades of the 896–907.
photosynthetic picoeukaryote Ostreococcus. ISME J 5(7):1095–1107. 29. Palenik B, et al. (2007) The tiny eukaryote Ostreococcus provides genomic insights
14. Dupont CL, et al. (2012) Genomic insights to SAR86, an abundant and uncultivated into the paradox of plankton speciation. Proc Natl Acad Sci USA 104(18):7705–7710.
marine bacterial lineage. ISME J 6(6):1186–1199. 30. Kanehisa M, Goto S (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic
15. Iverson V, et al. (2012) Untangling genomes from metagenomes: Revealing an Acids Res 28(1):27–30.
uncultured class of marine Euryarchaeota. Science 335(6068):587–590. 31. Moriya Y, Itoh M, Okuda S, Yoshizawa AC, Kanehisa M (2007) KAAS: An automatic
16. Li N, et al. (2010) Functional clustering of periodic transcriptional proﬁles through genome annotation and pathway reconstruction server. Nucleic Acids Res 35(Web
ARMA(p,q). PLoS One 5(4):e9894. Server issue):W182–W185.
Ottesen et al. PNAS Early Edition | 9 of 10
32. Liu Z, Hsiao W, Cantarel BL, Drábek EF, Fraser-Liggett C (2011) Sparse distance-based 34. Team RDC (2012) R: A Language and Environment for Statistical Computing
learning for simultaneous multiclass classiﬁcation and feature selection of (R Foundation for Statistical Computing, Vienna).
metagenomic data. Bioinformatics 27(23):3242–3249. 35. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate—a practical and
33. Okansen J, et al. (2011) Vegan: Community Ecology Package, R Package Version 2.0-2. powerful approach to multiple testing. J R Stat Soc Series B Stat Methodol 57(1):
Available at http://cran.r-project.org/web//packages/vegan/index.html. 289–300.
10 of 10 | www.pnas.org/cgi/doi/10.1073/pnas.1222099110 Ottesen et al.