If you would like to be notified about new versions, new features, or any other news related to HUMAnN, please join our mailing list: the HUMAnN Google Group.

HUMAnN is a pipeline for efficiently and accurately determining the presence/absence and abundance of microbial pathways in a community from metagenomic data. Sequencing a metagenome typically produces millions of short DNA/RNA reads. HUMAnN takes these reads as inputs and produces gene and pathway summaries as outputs:

The abundance of each orthologous gene family in the community. Orthologous families are groups of genes that perform roughly the same biological roles. HUMAnN uses the KEGG Orthology (KO) by default, but any catalog of orthologs (COG, NOG, etc.) can be employed with minor changes.

The presence/absence of each pathway in the community. HUMAnN refers to pathway presence/absence as "coverage," and defines a pathway as a set of two or more genes. HUMAnN uses KEGG pathways and modules by default, but again can easily be modified to use GO terms or other gene sets.

The abundance of each pathway in the community, i.e. how many "copies" of that pathway are present.
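To make the three outputs concrete, here is a minimal toy sketch of how coverage and abundance relate to gene family abundances. It is not HUMAnN's actual algorithm: the functions `pathway_coverage` and `pathway_abundance`, the gene identifiers, and the simple "fraction of genes seen" and "mean abundance" definitions are all assumptions made for illustration.

```python
# Toy illustration only; HUMAnN's real scoring is more involved.
# All names and numbers below are hypothetical.

def pathway_coverage(gene_abundance, pathway_genes, min_abundance=0.0):
    """Fraction of the pathway's genes observed above min_abundance."""
    present = [g for g in pathway_genes
               if gene_abundance.get(g, 0.0) > min_abundance]
    return len(present) / len(pathway_genes)

def pathway_abundance(gene_abundance, pathway_genes):
    """Mean abundance over the pathway's genes; missing genes count as zero."""
    total = sum(gene_abundance.get(g, 0.0) for g in pathway_genes)
    return total / len(pathway_genes)

# Gene family abundances for one community (made-up values).
gene_abundance = {"gene_a": 4.0, "gene_b": 2.0, "gene_c": 0.0}

# A pathway is treated here as a set of two or more genes.
pathway = {"gene_a", "gene_b", "gene_c", "gene_d"}

coverage = pathway_coverage(gene_abundance, pathway)
abundance = pathway_abundance(gene_abundance, pathway)
print(coverage, abundance)
```

In this toy setup, two of the four genes are detected, so the pathway is only partially covered, while the abundance value summarizes how many "copies" the detected genes suggest.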

HUMAnN can thus be used in tandem with any translated BLAST program, with out-of-the-box support for NCBI BLAST, USEARCH, MBLASTX, and MAPX. The pipeline converts sequence reads into coverage and abundance tables summarizing the gene families and pathways in one or more microbial communities. This lets you analyze a collection of metagenomes as a matrix of gene/pathway abundances, just like you might analyze a collection of microarrays.
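As a rough sketch of the "matrix of abundances" view, the snippet below merges per-sample tables into a features-by-samples matrix and converts each sample to relative abundances. The in-memory dictionary layout, the sample and module names, and the helper functions are assumptions for illustration and do not reflect HUMAnN's output file format.

```python
# Hypothetical per-sample abundance tables: sample -> {feature: abundance}.
samples = {
    "stool": {"module_1": 3.0, "module_2": 1.0},
    "tongue": {"module_2": 2.0},
}

def build_matrix(samples):
    """Return (features, sample_names, rows) with one row per feature."""
    names = sorted(samples)
    features = sorted({f for table in samples.values() for f in table})
    # Features missing from a sample are filled with zero.
    rows = [[samples[n].get(f, 0.0) for n in names] for f in features]
    return features, names, rows

def normalize_columns(rows):
    """Scale each sample (column) so its abundances sum to one."""
    totals = [sum(col) for col in zip(*rows)]
    return [[v / t if t else 0.0 for v, t in zip(row, totals)]
            for row in rows]

features, names, rows = build_matrix(samples)
relative = normalize_columns(rows)

for feature, row in zip(features, relative):
    print(feature, row)
```

Once the data are in this shape, each column is a community and each row a gene family or pathway, so the usual matrix-based methods (clustering, differential abundance tests, and so on) apply.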

We are aware that KEGG is now commercial, and we have updated HUMAnN accordingly. In brief, we include the derived files and information needed for normal HUMAnN operation, but creating and evaluating synthetic metagenomes is impeded without a KEGG license. Please contact the KEGG developers if this is an inconvenience for you, and contact us at the HUMAnN Google Group for assistance in evaluating HUMAnN output if necessary.

Many thanks to the NIH and to the entire Human Microbiome Project team for making the HMP possible, and to the many collaborators who helped to make HUMAnN a reality. Sahar Abubucker and Makedonka Mitreva (Washington University) co-led the Metabolic Reconstruction group, and Nicola Segata (Harvard School of Public Health) performed many HMP-specific analyses. The pipeline incorporates software from Yuzhen Ye (Indiana University), Beltran Rodriguez-Mueller (SDSU), and Pat Schloss (University of Michigan). Specific contributors include Alyx Schubert (University of Michigan), Jeremy Zucker (Broad Institute), Brandi Cantarel (UMD), Qiandong Zeng (Broad Institute), Johannes Goll (JCVI), and many others.

An overview of HUMAnN

Metabolic modules differentially abundant in one or more body sites of the human microbiome

Synthetic mock communities for validation

We generated four synthetic metagenomes to aid in evaluating HUMAnN's predictive accuracy: two high-complexity (HC, 100 organisms) metagenomes called HC1 and HC2, and two low-complexity (LC, 20 organisms) metagenomes called LC1 and LC2. HC1 and LC1 have even distributions (all organisms present at equal abundance), while HC2 and LC2 have staggered distributions (organisms have random, log-normally distributed abundances). Organisms included in the LC metagenomes were manually selected from KEGG v54-curated reference genomes associated with the human microbiome, while organisms included in the HC metagenomes were randomly selected from all manually curated bacterial genomes.
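The difference between even and staggered communities can be sketched as follows. This is an illustrative reconstruction, not the procedure used to build HC1, HC2, LC1, or LC2: the log-normal parameters `mu` and `sigma`, the fixed seed, and the function names are assumptions, since the text does not specify them.

```python
import random

def even_abundances(n_organisms):
    """Every organism is present at the same relative abundance."""
    return [1.0 / n_organisms] * n_organisms

def staggered_abundances(n_organisms, mu=0.0, sigma=1.0, seed=0):
    """Draw log-normal abundances and rescale them to sum to one.

    mu, sigma, and seed are placeholder values chosen for this sketch.
    """
    rng = random.Random(seed)
    raw = [rng.lognormvariate(mu, sigma) for _ in range(n_organisms)]
    total = sum(raw)
    return [x / total for x in raw]

# Low-complexity (20 organisms) examples in the spirit of LC1 and LC2.
lc_even = even_abundances(20)
lc_staggered = staggered_abundances(20)

print(min(lc_even), max(lc_even))
print(min(lc_staggered), max(lc_staggered))
```

The even profile gives identical minimum and maximum abundances, whereas the staggered profile spreads organisms across a wide range, which is what makes it a harder test of abundance estimation.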