Taxonomic and functional profiling of microbiomes

Metagenomics has brought new challenges to bioinformatics. We address them with cloud computing, which gives us scalable, real-time, on-demand capacity for analyzing massive metagenomics datasets.

We offer fast, high-quality analysis of massive data, from taxonomic profiling of 16S rRNA samples to functional profiling of complex metagenomics samples using shotgun sequencing approaches.

We provide a complete solution covering every step of your microbiome project:

The sample processing

The sequencing

The bioinformatics analysis

The interpretation, delivered as rich reports with charts and visualizations.

Era7 Bioinformatics is especially committed to providing services for microbiome analysis. We adapt our metagenomics data analysis service to the requirements of each project and design solutions for specific goals, such as detecting genes with specific functionalities or enzymatic activities, comparing communities, identifying microbiome differences before and after a treatment, studying the abundance of microbial species under varying environmental conditions, or analyzing metabolic pathways.

What sequencing technologies?

We provide sequencing and bioinformatics analysis of microbiomes using Illumina and PacBio sequencing technologies.

Our microbiome bioinformatics analysis methods come in two versions, one specifically designed for Illumina sequences and another for PacBio sequences.

What is MG7?

MG7 is a complete metagenomics data analysis tool developed by Era7 Bioinformatics and designed to provide taxonomic assignment results for large sets of sequences. The MG7 analysis pipelines are continuously updated with the newest approaches to metagenomics analysis.

Our reference database DB7

We have built our reference database DB7 of 16S and 18S sequences from the complete RNAcentral. RNAcentral is a general database for all types of non-coding RNA, maintained by the RNAcentral Consortium: http://rnacentral.org/expert-database

RNAcentral includes the 16S and 18S sequences from the most important databases for metagenomics data analysis:

Silva

GreenGenes

RDP

ENA (all non-coding RNA included in ENA)

RefSeq (all non-coding RNA included in RefSeq)

We have manually curated the database and have designed systematic curation approaches that allow us to curate new RNAcentral releases rapidly.

Exhaustive taxonomic assignment for each read

We compare each read against all the sequences in our DB7 database (see above). The taxonomic assignment for each read is based on an exhaustive BLASTN of each read against our DB7 database of 16S and 18S sequences.
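As a rough illustration of this step (a sketch under stated assumptions, not the MG7 implementation), the snippet below builds a `blastn` command line that requests tabular output and parses the resulting rows. The options `-query`, `-db`, `-outfmt` and `-out`, and the column names `qseqid`, `sseqid`, `pident`, `qcovs`, `bitscore` and `staxids`, are standard BLAST+ names. The file names, the database name `db7` and the helper names `blastn_command` and `parse_hits` are hypothetical.

```python
import csv
import io
from typing import List, NamedTuple

# Columns requested from BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "qcovs", "bitscore", "staxids"]


class Hit(NamedTuple):
    read_id: str
    subject_id: str
    pident: float    # percentage of identical positions
    qcovs: float     # percentage of the query covered by the subject
    bitscore: float
    taxon_ids: str   # may hold several IDs separated by ';'


def blastn_command(query_fasta: str, db: str, out_path: str) -> List[str]:
    """Build (but do not run) the argument list for one blastn search."""
    return [
        "blastn",
        "-query", query_fasta,
        "-db", db,
        "-outfmt", "6 " + " ".join(FIELDS),
        "-out", out_path,
    ]


def parse_hits(tabular_text: str) -> List[Hit]:
    """Parse tab-separated BLAST rows laid out in the order given by FIELDS."""
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        qseqid, sseqid, pident, qcovs, bitscore, staxids = row
        hits.append(Hit(qseqid, sseqid, float(pident), float(qcovs),
                        float(bitscore), staxids))
    return hits
```

The command could be passed to `subprocess.run` on a machine where BLAST+ and a formatted database are available; the parsed hits are the input for the assignment approaches described next.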

We perform a specific taxonomic assignment for each read, without a prior binning or clustering step. Some assignment methods compare the sequences only against a small rRNA database, or reduce the computational cost by clustering or binning the sequences first and then assigning only the representative sequence of each cluster. Assignment based on the direct similarity of each read, one by one, against a sufficiently broad database is a very exhaustive assignment method [Segata-2013][Morgan-2012].

We use two different taxonomic assignment approaches: Best Blast Hit (BBH) and Lowest Common Ancestor (LCA).

Best Blast Hit (BBH) approach

The taxonomic assignment is based on the Best BLAST Hit obtained in the BLASTN of each read against the DB7 database.

Each read is assigned to the taxon corresponding to the Best Blast Hit. Only hits above a similarity threshold (perc_identity) and with an aligned region above a threshold percentage of the query length (qcovs) are considered. After that filtering, we select for each read only those hits reaching the maximum percentage of identity obtained for that read and, among them, we select the hit with the highest bitscore as the BBH.

The parameters can be adapted for each project depending on the length of the reads, on the error rate of the sequencing technology, and even on the rareness of the organisms that are expected in a sample.
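The BBH selection described above can be sketched as follows. This is a minimal illustration, not the MG7 code: the `Hit` record, the function name `best_blast_hits` and the parameter names `min_pident` and `min_qcovs` are hypothetical, and treating the thresholds as inclusive (`>=`) is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Hit:
    read_id: str
    taxon_id: str
    pident: float    # percentage of identity
    qcovs: float     # percentage of the query length aligned
    bitscore: float


def best_blast_hits(hits: Iterable[Hit], min_pident: float,
                    min_qcovs: float) -> Dict[str, Hit]:
    """Return the Best Blast Hit per read; reads with no passing hit are absent."""
    by_read: Dict[str, List[Hit]] = {}
    for h in hits:
        # Step 1: keep only hits passing both thresholds (inclusive: an assumption).
        if h.pident >= min_pident and h.qcovs >= min_qcovs:
            by_read.setdefault(h.read_id, []).append(h)

    best: Dict[str, Hit] = {}
    for read_id, kept in by_read.items():
        # Step 2: keep the hits reaching the maximum identity for this read.
        top = max(h.pident for h in kept)
        # Step 3: among those, take the hit with the highest bitscore.
        best[read_id] = max((h for h in kept if h.pident == top),
                            key=lambda h: h.bitscore)
    return best
```

Note that a hit with a higher bitscore but lower identity is not chosen, because identity is compared first and bitscore only breaks ties among the most similar hits.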

Lowest Common Ancestor (LCA) approach

In this case the taxonomic assignment is based on the Lowest Common Ancestor paradigm. We follow the same filtering protocol as for BBH assignment, selecting only the hits above a similarity threshold and above a threshold percentage of the query length aligned (qcovs). We select for each read only those hits reaching the maximum percentage of identity obtained for that read. We then obtain their taxonomic assignments and search the taxonomy tree for the lowest node that includes all of them, which is their Lowest Common Ancestor taxon. Some reads may not find sequences with enough similarity in the database; these are classified as reads with no hits.

The LCA approach has been adopted by advanced metagenomics analysis tools such as the latest version of MEGAN [Huson-2013]. We have adopted an assignment algorithm very similar to the one used in MEGAN.

Our LCA algorithm for taxonomic assignment

The goal of the algorithm is to assign each read to a node of the taxonomy tree. For each read we perform the following steps:

Select all hits with qcovs (percentage of the query sequence aligned to the subject sequence) over the defined threshold

Select the hits with the maximum perc_identity for each read

Calculate, for each read, the Lowest Common Ancestor (LCA), sensu stricto, of the taxonomic assignments of the selected hits with the maximum perc_identity
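The three steps above can be sketched in a few lines. This is a simplified illustration, not the MG7 or MEGAN implementation. It assumes the taxonomy is given as a child-to-parent dictionary with a single root, and that each hit is a `(taxon, perc_identity, qcovs)` tuple. The names `lineage`, `lowest_common_ancestor` and `assign_read` are hypothetical, and the thresholds are treated as inclusive.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Parent = Dict[str, Optional[str]]   # taxon -> parent taxon (None for the root)
Hit = Tuple[str, float, float]      # (taxon, perc_identity, qcovs)


def lineage(taxon: str, parent: Parent) -> List[str]:
    """Path from the root down to the given taxon."""
    path = [taxon]
    while parent.get(path[-1]) is not None:
        path.append(parent[path[-1]])
    path.reverse()
    return path


def lowest_common_ancestor(taxa: Iterable[str], parent: Parent) -> Optional[str]:
    """Deepest node shared by the lineages of all the given taxa."""
    lineages = [lineage(t, parent) for t in set(taxa)]
    if not lineages:
        return None
    lca = None
    # Walk down from the root while every lineage agrees on the node.
    for nodes in zip(*lineages):
        if len(set(nodes)) != 1:
            break
        lca = nodes[0]
    return lca


def assign_read(hits: Iterable[Hit], parent: Parent,
                min_qcovs: float) -> Optional[str]:
    """Assign one read to a taxonomy node, or None for a read with no hits."""
    # Step 1: keep the hits with qcovs at or above the threshold.
    kept = [(taxon, pident) for taxon, pident, qcovs in hits if qcovs >= min_qcovs]
    if not kept:
        return None
    # Step 2: keep the hits with the maximum perc_identity.
    top = max(pident for _, pident in kept)
    # Step 3: LCA of the taxa of those hits.
    return lowest_common_ancestor([t for t, p in kept if p == top], parent)
```

When all top hits point to the same taxon, the read is assigned to that taxon; when they point to different taxa, the read moves up the tree to the first node that contains all of them.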

MG7 steps in the process of analysis of the reads

The reads are analyzed in the cloud through a complex process that uses advanced parallelization methods and takes advantage of the capabilities of Amazon Web Services (AWS).

In this process each read is assigned to a taxon based on its sequence similarity to the sequences in the DB7 database. This is achieved by running massive BLASTN tasks with MG7, developed by Era7. (See the MG7 preprint at http://biorxiv.org/content/early/2015/09/28/027714).
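One common way to parallelize per-read searches is to split the reads into batches that can be processed as independent tasks. The sketch below illustrates only that general idea: it is an assumption about how such a split might look, not a description of how MG7 distributes work on AWS, and the names `read_fasta` and `chunks` are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (read identifier, sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def chunks(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Group records into batches of at most `size` reads."""
    batch: List[Record] = []
    for rec in records:
        batch.append(rec)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # last, possibly smaller, batch
```

Because each read is assigned independently of the others, each batch could be searched against the database on a separate worker and the per-read results merged afterwards.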

Microbiome Applications

Traditional microbial genome sequencing relies on cultivated clonal cultures, but the new era of genomics faces a new challenge: metagenomic analysis.

Genomic analysis of the microbial communities contained in an environmental sample is one of the applications of Next Generation Sequencing data, both in 16S or 18S metagenomics and in shotgun metagenomics.

The number of publications about metagenomics is growing exponentially.
