There is considerable interest in how human microbial communities reveal and affect our health. Investigators are now performing numerous studies that contrast the microbiomes of healthy individuals with those of individuals affected by specific diseases. They often observe differences between healthy and disease states, and these differences can be used to build a disease signature. But a key question remains: How well do these signatures hold up across datasets?

Pasolli et al. performed a series of parallel analyses to answer this question. They constructed and evaluated metagenomic signatures across a series of studies of human microbial communities. Some of their findings should not surprise many researchers: It is relatively easy to differentiate the metagenomes derived from microbial communities in different body sites, and signatures tend to work best within the study from which they were generated, even when proper separation of training and testing data is preserved. Their results from evaluations incorporating healthy individuals from multiple studies are more surprising. When the team used metagenomic information to predict whether or not a subject had type 2 diabetes, the signature was more successful in an independent study if healthy individuals from other studies were included during signature construction. Including healthy individuals from multiple studies, particularly those performed at different institutions and in different populations, may provide the algorithms with a broader survey of healthy microbial communities. The authors also flipped the perspective of disease signatures: instead of trying to build a disease predictor, they tried to build a health predictor. They built a cross-disease signature of healthy individuals and applied it to a held-out disease. The results from this analysis suggest that there may also be a microbiome signature of health.
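The cross-study design described above, in which a signature is built in one study, optionally augmented with healthy individuals from other studies, and then tested on an independent study, can be sketched as a data split. The following is a minimal illustration only, not the authors' pipeline: the `Sample` record, the `cross_study_split` function, and the study and sample names are all hypothetical, and no classifier is trained.

```python
from collections import namedtuple

# Hypothetical record for one metagenomic sample: its identifier, the study
# it came from, and its label ("healthy" or "disease").
Sample = namedtuple("Sample", ["sample_id", "study", "label"])


def cross_study_split(samples, train_study, test_study, add_external_controls=False):
    """Return (train, test) sample lists for a cross-study evaluation.

    train: every sample from train_study, optionally augmented with healthy
           samples from studies other than train_study and test_study.
    test:  every sample from test_study, none of which appear in train.
    """
    if train_study == test_study:
        raise ValueError("train and test studies must differ")
    train = [s for s in samples if s.study == train_study]
    if add_external_controls:
        # Add healthy controls only from studies that are neither the
        # training study nor the held-out test study.
        train += [
            s for s in samples
            if s.label == "healthy" and s.study not in (train_study, test_study)
        ]
    test = [s for s in samples if s.study == test_study]
    return train, test


# Made-up toy data: three studies with two samples each.
samples = [
    Sample("a1", "study_A", "disease"),
    Sample("a2", "study_A", "healthy"),
    Sample("b1", "study_B", "disease"),
    Sample("b2", "study_B", "healthy"),
    Sample("c1", "study_C", "healthy"),
    Sample("c2", "study_C", "disease"),
]

train, test = cross_study_split(
    samples, "study_A", "study_B", add_external_controls=True
)
print([s.sample_id for s in train])  # ['a1', 'a2', 'c1']
print([s.sample_id for s in test])   # ['b1', 'b2']
```

In this sketch the held-out study contributes nothing to training, so any gain from the added controls would reflect the broader sample of healthy communities rather than leakage from the test study.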

Our understanding of the composition of a healthy microbiome, and in particular of cause and effect between microbiome composition and health, is still in its early stages. Researchers who aim to generate metagenomic disease signatures should consider including healthy controls from additional studies. As the collection of public metagenomic data grows, we can look forward to more diverse views into the composition of healthy human-associated microbial communities.