Horizontal gene transfer is not a hallmark of the human genome

Crisp et al. recently reported that 145 human genes have been horizontally transferred from distant species. Here, I re-analyze those genes listed by Crisp et al. as having the highest certainty of having been horizontally transferred, as well as 17 further genes from the 2001 human genome article, and find little or no evidence to support claims of horizontal gene transfer (HGT).

A recent study by Crisp et al. [1] re-examined a claim, originally made in the landmark 2001 human genome paper, that bacteria had horizontally transferred 223 genes into a vertebrate ancestor of humans [2]. That claim was refuted soon after the original report [3, 4]. Using an alignment-based scoring scheme, the study by Crisp et al. [1] reported that 145 human genes, including 17 of those from the 2001 study, had been horizontally transferred from distant species. Here, I describe a re-analysis of these 17 genes and of the 28 highest-confidence genes newly claimed by Crisp et al. [1] to have been horizontally transferred, taking a more skeptical perspective, and find little or no evidence to support claims of horizontal gene transfer (HGT).

Hundreds of eukaryotic genomes and thousands of bacterial genomes have been sequenced in the 15 years since the human genome was published. In their recent report, Crisp et al. [1] argue that, with the availability of this far larger collection of genomes, the likelihood of false HGT findings that are actually the result of gene loss is now greatly reduced. Their reanalysis, which was based on a combination of BLAST searches and phylogenetic trees, identified hundreds of “foreign” genes in animals; this led them to claim that HGT “has occurred on a previously unsuspected scale in metazoans” and that it is a significant factor in animal evolution.

In this study, I re-examined the claims of Crisp et al. [1], focusing on the human genes. Instead of using a large-scale, automated analysis, which by its very nature could enrich the results for artifactual findings, I looked at each human gene individually to determine whether the evidence is sufficient to support the conclusion that HGT occurred. An important principle here is that extraordinary claims require extraordinary evidence: there is no doubt that the vast majority of human genes owe their presence in the human genome to the normal process of inheritance by vertical descent. Thus, if other, more mundane processes can explain the alignments of a human gene sequence, these explanations are far more likely than HGT.

For my re-analysis, I re-aligned the 17 human genes that were originally reported as having undergone bacterial-vertebrate transfer (BVT), a finding that has been rejected by our work [3] and that of others [4, 5], but re-claimed by Crisp et al. [1] (Table 1). I found that the evidence does not support HGT for any of them. (One important point worth noting here is that Crisp et al. listed some of these genes as “confirmed” by Salzberg et al. [3]. This was not the case; our previous study invalidated most of the previously claimed HGT events, but was not able to dismiss all of them. Our study made it clear that we did not consider the presence of the remaining genes to be the result of HGT events.) Crisp et al. [1] reported a total of 145 human genes that they claimed to be the result of HGT; 39 of these are labeled in their highest confidence group, class A. Of these 39, 11 are included in the first group of 17, leaving 28 newly claimed HGT events. I examined these 28 class A genes (Table 2) and again found no evidence for HGT. A detailed, gene-by-gene description of these analyses can be found in Additional file 1 and the sequences of the genes in Tables 1 and 2 can be found in Additional file 2.

The HGT index, defined by Crisp et al. [1] as the best bitscore of a BLAST match to a non-metazoan species minus the best bitscore of a match to a metazoan species, is shown along with the bitscore of the best metazoan match. The best metazoan match excluded any matches to the phylum Chordata for these human genes. All of the genes in this table were reported by Crisp et al. [1] as high-confidence (class A) HGT. The recomputed HGT index (last column) is computed by subtracting the bitscore of the best non-chordate metazoan match found by the new searches reported here from that of the best non-metazoan match found by Crisp et al. [1]. “No hits” means that no significant alignments were found to any non-chordate metazoans.

a For PRAME family members 1, 6, and 15, the protist alignment found by Crisp et al. [1] is a false positive caused by contamination. See main text for details.
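The recomputation described in the table notes can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: the sign convention (non-metazoan minus metazoan) is taken from the bitscore test described in the text, the names `GeneScores`, `hgt_index`, and `passes_bitscore_test` are hypothetical, the cutoff `min_index` is an assumed parameter rather than the threshold applied by Crisp et al. [1], and the example genes and bitscores are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeneScores:
    """Best bitscores for one human gene (hypothetical container)."""
    gene: str
    best_non_metazoan: float        # best bitscore against any non-metazoan
    best_metazoan: Optional[float]  # best non-chordate metazoan bitscore; None = "No hits"


def hgt_index(scores: GeneScores) -> Optional[float]:
    """Best non-metazoan bitscore minus best non-chordate metazoan bitscore.

    Returns None when there is no metazoan hit, because the index is
    undefined in that case.
    """
    if scores.best_metazoan is None:
        return None
    return scores.best_non_metazoan - scores.best_metazoan


def passes_bitscore_test(scores: GeneScores, min_index: float = 0.0) -> bool:
    """True if the gene remains an HGT candidate under the bitscore test.

    Assumption: a gene with no metazoan hit cannot be rejected by this
    test alone, so it is kept as a candidate.
    """
    index = hgt_index(scores)
    return index is None or index > min_index


# Invented example values, for illustration only.
examples = [
    GeneScores("GENE_X", best_non_metazoan=250.0, best_metazoan=300.0),
    GeneScores("GENE_Y", best_non_metazoan=250.0, best_metazoan=180.0),
    GeneScores("GENE_Z", best_non_metazoan=120.0, best_metazoan=None),
]

for g in examples:
    print(g.gene, hgt_index(g), passes_bitscore_test(g))
```

A negative index means that a non-chordate metazoan matches the human gene better than any non-metazoan does, which is the situation in which a gene fails the test.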

Of the 17 genes from the original human genome paper that Crisp et al. [1] claim are true examples of HGT, my analysis finds that 12 genes fail to pass the authors' own BLAST-based test for HGT, because their closest metazoan match has a bitscore that is greater than the best non-metazoan match (Table 1). Of the 28 genes representing new claims of HGT (Table 2), 26 fail the initial screen for HGT candidates, either because they fail the original BLAST bitscore test, because they represent contaminants in draft genomes, or because they are known mitochondrial or retrotransposed genes. The remaining seven genes (five from Table 1 and two from Table 2) include three close paralogs (HAS1–3) and thus represent five hypothesized HGT events. A combination of gene loss and evolutionary rate variation is more than adequate to explain these genes: among other reasons, the alignments and bitscores are the result of screening more than 20,000 human genes, and one might expect a few genes from this large set to be lost (or to have evolved slightly more rapidly) in the non-chordate genomes.
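The scale of this screening argument can be made concrete with a short calculation. The sketch below is only illustrative: the gene count of 20,000 and the seven remaining genes come from the text, whereas the per-gene probability `P_ANOMALY` is a purely hypothetical value, and the calculation assumes (unrealistically) that genes behave independently.

```python
import math

N_GENES = 20_000      # approximate number of human genes screened (from the text)
REMAINING_GENES = 7   # genes that still pass the bitscore test (from the text)

# Hypothetical per-gene probability that loss or rapid divergence in the
# non-chordate metazoans produces an HGT-like bitscore pattern by chance.
P_ANOMALY = 1e-4


def expected_anomalies(n_genes: int, p_anomaly: float) -> float:
    """Expected number of spurious candidates under independence."""
    return n_genes * p_anomaly


def prob_at_least_one(n_genes: int, p_anomaly: float) -> float:
    """Probability that the screen yields one or more spurious candidates."""
    return 1.0 - (1.0 - p_anomaly) ** n_genes


observed_fraction = REMAINING_GENES / N_GENES

print(f"observed fraction of genes remaining: {observed_fraction:.5f}")
print(f"expected spurious candidates: {expected_anomalies(N_GENES, P_ANOMALY):.1f}")
print(f"P(at least one spurious candidate): {prob_at_least_one(N_GENES, P_ANOMALY):.3f}")
```

Under these assumptions, even a very small per-gene probability of loss or accelerated divergence yields several candidates across the whole gene set, so a handful of surviving genes is not by itself surprising.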

One reason that better BLAST results were found in the current study could well be that this study used data from May 2016, whereas Crisp et al.'s study used data from January 2013. A large number of additional genomes have been deposited in public archives during the three years between the two analyses. These species were not available to the previous study and thus the orthologous genes from these taxa were missed. Insofar as this explanation is correct, it strengthens the argument for gene loss as the explanation for the (very few) human genes that still have better BLAST matches in non-metazoans than in non-chordate metazoans.

Another factor is that because only non-chordates are considered, the alignments and bitscores between a human gene and these very distant relatives are necessarily quite weak. This distant relationship makes it more likely that some genes will not be found simply because the sequence has diverged too much for a pairwise alignment to detect it.

This study focuses only on human genes, but recent claims of high levels of HGT in other animals have also been reported. The most dramatic claim was the recent report that up to one-sixth of the genes in the tardigrade (Hypsibius dujardini) had been laterally transferred from other species [6], but that claim was quickly shown to be a false result due primarily to contamination of the genome assembly [7]. In Crisp et al. [1], contamination seems to be a likely explanation for the three human genes (PRAME family members 1, 6, and 15) reported as high-confidence HGT events, and a closer scrutiny of other automatically identified HGT candidates might reveal other cases. (Contamination has been reported to create false signals of HGT as far back as 2002 [8].) My re-examination here suggests that HGT is very rare rather than widespread in vertebrate genomes, and that every hypothesized HGT event needs to be subjected to careful scrutiny.

As we wrote in 2001 [3], “the argument for lateral gene transfer is essentially a statistical one, necessarily so because of the inherent impossibility of observing events that may have occurred in the distant past”. When searching a large set of genes against an even larger database, one must recognize that such large-scale, automated searches will inevitably find unusual results that include genes that were lost or evolved more rapidly in multiple lineages. Because HGT is such an unlikely event, the results of automated searches should be subjected to individual, close scrutiny with an eye toward explaining them through more mundane processes before concluding that these anomalies represent novel biological discoveries. As demonstrated here, a re-analysis using the latest genome databases finds little or no evidence that genes have been horizontally transferred into the human genome, other than through the well-known mitochondrial genome transfer and retrovirus-mediated events.

Ensembl identifiers for all genes proposed as examples of HGT were obtained from Crisp et al. [1] and validated by retrieving them from the Ensembl database (www.ensembl.org). Genomes and protein sequences were obtained from the National Center for Biotechnology Information (NCBI; www.ncbi.nlm.nih.gov) and UniProt (www.uniprot.org). Protein sequences were aligned individually using the blastp program and the non-redundant protein database, nr, available through the BLAST server at NCBI (https://blast.ncbi.nlm.nih.gov) or for direct download from the same source. To aid analysis, searches were run against the entire database and again with the phylum Chordata (taxon 7711) excluded from the results, which did not affect bitscores.
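As a rough illustration of how the best matches inside and outside Metazoa might be extracted from search results, the following sketch processes tab-separated BLAST hits. It is not the procedure used in this study, which relied on the NCBI BLAST server. It assumes four columns per row (query, subject, bitscore, and a single subject taxon ID) and a caller-supplied `lineage` table mapping each taxon ID to the set of its ancestor taxon IDs; the function names, the sample hits, and the simplified lineage table are all hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, Optional, Set, Tuple

METAZOA = 33208   # NCBI taxon ID for Metazoa
CHORDATA = 7711   # NCBI taxon ID for Chordata (as given in the text)

Hit = Tuple[str, str, float, int]  # query, subject, bitscore, subject taxon ID


def read_hits(tabular_text: str) -> Iterator[Hit]:
    """Parse tab-separated rows: query, subject, bitscore, taxon ID."""
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore, taxid = row[:4]
        yield query, subject, float(bitscore), int(taxid)


def best_bitscores(
    hits: Iterable[Hit], lineage: Dict[int, Set[int]]
) -> Tuple[Optional[float], Optional[float]]:
    """Return (best non-metazoan, best non-chordate metazoan) bitscores.

    Hits to chordates are ignored, as are hits whose taxon is missing
    from the lineage table. None means that no hit was found.
    """
    best_non_metazoan: Optional[float] = None
    best_non_chordate_metazoan: Optional[float] = None
    for _query, _subject, bitscore, taxid in hits:
        ancestors = lineage.get(taxid)
        if ancestors is None:
            continue  # unknown taxon: cannot be classified
        if METAZOA not in ancestors:
            if best_non_metazoan is None or bitscore > best_non_metazoan:
                best_non_metazoan = bitscore
        elif CHORDATA not in ancestors:
            if (best_non_chordate_metazoan is None
                    or bitscore > best_non_chordate_metazoan):
                best_non_chordate_metazoan = bitscore
    return best_non_metazoan, best_non_chordate_metazoan


# Invented hits for one query; the lineage sets are simplified to the two
# ancestors that matter here (9606 human, 7227 fruit fly, 562 E. coli).
SAMPLE = (
    "q1\tsubjA\t500.0\t9606\n"
    "q1\tsubjB\t210.0\t7227\n"
    "q1\tsubjC\t180.0\t562\n"
    "q1\tsubjD\t95.5\t562\n"
)
LINEAGE = {9606: {METAZOA, CHORDATA}, 7227: {METAZOA}, 562: set()}

result = best_bitscores(read_hits(SAMPLE), LINEAGE)
print(result)
```

Filtering by taxon after the search, rather than searching a reduced database, leaves each hit's bitscore unchanged, which is consistent with the observation above that excluding Chordata did not affect bitscores.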

Acknowledgements

This work was supported in part by the US National Institutes of Health under grants R01-GM083873 and R01-HG006677.

Competing interests

The author declares that he has no competing interests.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.