Today we are thrilled to publish the first phase of the Earth Microbiome Project in Nature: "A communal catalogue reveals Earth's multiscale microbial diversity". The "we" in this statement represents a vast number of individuals whose collective efforts have led to this milestone. It is our hope that many more people will join in the next phases of the project. But today I want to look back on the contributions of so many who brought us to this point.

In July 2010, a group of 26 researchers convened for the Terabase Metagenomics Workshop in Snowbird, Utah. Meeting leader Rick Stevens tasked the group with a simple but bold question: "What could you learn about microbial ecology if you had a trillion-base-pair sequencing run?" The researchers surmised they could run amplicon sequencing and metagenomes for 200,000 samples—and the vision of the Earth Microbiome Project (EMP) was born. The only scientists crazy enough to follow up on the idea were Jack Gilbert, Janet Jansson, and Rob Knight, who became the project's founders.

Learn more about the Earth Microbiome Project at earthmicrobiome.org and github.com/biocore/emp, and follow us on Twitter at twitter.com/earthmicrobiome.

Jack, Janet, and Rob sent out the call to microbial ecologists: send us your proposals for samples to have sequenced by 16S ribosomal RNA amplicon sequencing. Provided there was a valid study design with good metadata, the EMP labs would extract DNA and sequence the samples, using a standard protocol. The samples started to pour in from hundreds of researchers in dozens of countries. From the Arctic Circle to Antarctica, the labs received water, soils, sediments, lots of swabs, and (of course) poop.

Fast forward to June 2012. Jack was speaking about the EMP at King Abdullah University of Science and Technology (KAUST) in Saudi Arabia, where I happened to be a postdoc at the time. I had dinner with Jack and was excited about the project, but I didn't think much more about it after that. In August of that year, I attended the ISME meeting in Copenhagen, and Rob gave a talk about the EMP. The project was audacious, but the community was clearly excited about it. Later that year, Rob also visited KAUST, and I had a chance to introduce myself. I expressed an interest in working in his lab, and the following September I started a postdoc in the Knight Lab.

When I arrived in Boulder, I had never worked with amplicon data, didn't know how to use QIIME, and didn't know Python—all critical parts of doing bioinformatics in the Knight Lab. The EMP was a successful project by then, generating data from thousands of samples for dozens of individual studies, but the vision of combining these data into a single analysis was not yet realized. However, the project started to gain momentum after the Knight Lab moved to the University of California San Diego, where we assembled a core analysis team, with help from additional researchers around the United States. Because of the sheer scale of the dataset, we limited our initial meta-analysis to the first 97 studies—this was still 27,751 samples and over 2 billion sequences!

We quickly realized that nearly every software tool we used had to be rewritten to handle the scale of the EMP dataset. This included the indispensable QIIME software (led by Greg Caporaso) and the associated online server and database Qiita (Antonio González, Jose Navas, Gail Ackermann, and others). Analysis of beta-diversity patterns required retooled versions of UniFrac (Daniel McDonald) and Emperor (Yoshiki Vázquez-Baeza). Additionally, an entirely new method, Deblur (Amnon Amir and Daniel McDonald), was developed that resolves exact sequences instead of clustering reads into traditional OTUs; this approach was central to the EMP meta-analysis. A search tool, Redbiom (Daniel McDonald), was developed to allow researchers to query the EMP catalogue and search by metadata values for particular samples or by sequences for their favorite microbes.

Simultaneously, the metadata from those tens of thousands of samples had to be wrangled to enable cross-study comparisons, a painstaking task. Luckily I had been tapped to teach data analysis at the Scripps Institution of Oceanography, forcing me to finally master Python and the powerful data science package Pandas. This gave me the tools to get a handle on the all-important metadata. In another coincidence, Jon Sanders and I had come up with an ontology (technically, a structured categorical variable) to classify sample types of new samples coming into the project. We realized that this EMP Ontology (EMPO) would be a useful way to categorize existing EMP samples, building on the Environment Ontology. We were continually amazed by how well it captured different measures of microbial diversity.
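To make the idea of a "structured categorical variable" concrete, here is a minimal sketch of how free-text sample types from different studies could be mapped onto a small hierarchy. It uses only the Python standard library so that it runs anywhere; the same logic carries over to a Pandas DataFrame. The hierarchy, the mapping table, the sample IDs, and the function name `classify` are all assumptions made for illustration and should not be read as the published EMPO or the actual EMP code.

```python
from collections import Counter

# Hypothetical three-level hierarchy, keyed by the most specific label.
# These labels are illustrative and may not match the published EMPO.
HIERARCHY = {
    "Water (saline)": ("Free-living", "Saline"),
    "Soil (non-saline)": ("Free-living", "Non-saline"),
    "Animal distal gut": ("Host-associated", "Animal"),
}

# Hypothetical lookup from the free-text terms used by individual studies
# to the most specific label in the hierarchy.
RAW_TO_LEVEL3 = {
    "seawater": "Water (saline)",
    "ocean water": "Water (saline)",
    "soil": "Soil (non-saline)",
    "feces": "Animal distal gut",
    "stool": "Animal distal gut",
}


def classify(raw_sample_type):
    """Return the three category levels for a free-text sample type,
    or None if the term is not in the lookup table."""
    key = raw_sample_type.strip().lower()
    level3 = RAW_TO_LEVEL3.get(key)
    if level3 is None:
        return None
    level1, level2 = HIERARCHY[level3]
    return {"level_1": level1, "level_2": level2, "level_3": level3}


# Toy metadata from three made-up studies with inconsistent conventions.
samples = [
    {"sample_id": "studyA.1", "sample_type": "Seawater"},
    {"sample_id": "studyB.7", "sample_type": " stool "},
    {"sample_id": "studyC.3", "sample_type": "soil"},
    {"sample_id": "studyC.4", "sample_type": "biofilm"},
]

# Attach the structured category to every sample.
annotated = [{**s, "category": classify(s["sample_type"])} for s in samples]

# Samples that need manual curation because their term is unknown.
unclassified = [s["sample_id"] for s in annotated if s["category"] is None]

# Cross-study summary at the coarsest level.
level1_counts = Counter(
    s["category"]["level_1"] for s in annotated if s["category"] is not None
)

print(unclassified)
print(dict(level1_counts))
```

Keeping the lookup table separate from the hierarchy means that a new study's vocabulary only adds rows to the lookup, while terms that are not recognized are flagged for manual review instead of being silently guessed.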

Two examples of EMP Trading Cards, showing how exact 90-bp fragments of the 16S rRNA gene are distributed across the EMP dataset.

We are only just scratching the surface of the "Earth microbiome", which of course is really a collection of countless microbiomes. However, with the framework introduced here, we are starting to get a handle on the factors driving the composition of microbial communities in different environments. We can now flip the question around and ask, "Where in the world is my favorite microbe found?" The EMP Trading Cards introduced in our paper (examples shown above) give a teaser of what this future might look like.
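The flipped question can be sketched in a few lines: given a table of exact-sequence counts per sample and an environment label for each sample, compute the fraction of samples in each environment where a sequence occurs. This is a toy illustration only. The data, the short placeholder sequences (real EMP fragments are 90 bp), and the function name `prevalence_by_environment` are assumptions, and this is not how Redbiom or the Trading Cards are implemented.

```python
from collections import defaultdict

# Hypothetical toy data: sample ID -> (environment label, {sequence: read count}).
samples = {
    "s1": ("soil", {"ACGT": 12, "GGCA": 0}),
    "s2": ("soil", {"ACGT": 0, "GGCA": 4}),
    "s3": ("seawater", {"ACGT": 5}),
    "s4": ("seawater", {"ACGT": 3, "GGCA": 1}),
    "s5": ("gut", {"GGCA": 9}),
}


def prevalence_by_environment(samples, sequence):
    """Fraction of samples in each environment containing the sequence."""
    present = defaultdict(int)
    total = defaultdict(int)
    for environment, counts in samples.values():
        total[environment] += 1
        # A sequence counts as present if at least one read was observed.
        if counts.get(sequence, 0) > 0:
            present[environment] += 1
    return {env: present[env] / total[env] for env in total}


print(prevalence_by_environment(samples, "ACGT"))
```

Prevalence (presence or absence per sample) is used here rather than read counts because it is less sensitive to differences in sequencing depth between samples; a real analysis would also need to account for uneven sampling across environments.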

As we continue to add new samples from old and new studies alike, and expand our analyses to metagenomics and metabolomics, one thing is certain: the EMP will continue to be a collaborative and communal effort that everyone can take part in. We are excited to see where it leads!

Luke Thompson studies microbial distributions and processes in marine and terrestrial environments using both 'omics tools and traditional methods. He holds a BS from Stanford University in Biological Sciences and a PhD from MIT in Microbiology.
