Opinion: What Is the Human Genome?

The human genome that researchers sequenced at the turn of the century doesn’t really exist as we know it.

By Ken Weiss | August 17, 2012

cosmin4000, Istockphoto

The Human Genome Project sequenced “the human genome” and is widely credited with setting in motion the most exciting era of fundamental new scientific discovery since Galileo. That’s remarkable, because in important ways “the human genome” that we have labeled as such doesn’t actually exist.

Plato essentially asserted that things like chairs and dogs, which we observe in this physical world, and even concepts like virtues, are but imperfect representations or instances of some ideal that exists, but not in the material world. Such a Platonic ideal is “the human genome,” a sequence of about 3 billion nucleotides arrayed across a linear scale of position from the start of chromosome 1 to the end of the sex chromosomes. Whether it was obtained from one person or several has so far been shrouded in secrecy for bioethical reasons, but it makes no real difference. What we call the human genome sequence is really just a reference: it cannot account for all the variability that exists in the species, just like no single dog on earth, real or imagined, can fully incorporate all the variability in the characteristics of dogs.

Nor is the human genome we have a “normal” genome. What would it mean to be “normal” for the nucleotide at position 1,234,547 on chromosome 11? All we know is that the donor(s) had no identified disease when bled for the cause, but sooner or later some disease will arise. Essentially all available whole genome sequences show potentially disease-producing variants, even including nonfunctional genes, in donors who were unaffected at the time.

Furthermore, the current reference genome sequence is haploid, which means that even if it were compiled from just one donor, the single reference sequence does not report the variation at millions of nucleotide positions between the donor’s two copies (except for X and Y) that we know exist. I understand that the DNA template is being resequenced, to be reported as a diploid sequence, which is progress. Hopefully this will be done in a way that produces phased sequence, in which each chromosome is reported separately, rather than just identifying the two alleles at each variable site along the genome without specifying on which chromosome each lies. Only the former format will represent sequences as they actually exist in the sequenced person, identifying which alleles go together on a chromosome, and are thus linked evolutionarily.
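The difference between phased and unphased reporting can be made concrete with a toy sketch. The positions, alleles, and function names below are hypothetical and chosen only for illustration; they do not come from any real reference data or genomics library. Two individuals carry the same pair of alleles at each of two variable sites, so their unphased genotypes are identical, yet the chromosomes they actually carry differ.

```python
# Hypothetical toy data: two heterozygous sites on one chromosome pair.
# Each phased genotype is an ordered pair:
# (allele on chromosome copy 1, allele on chromosome copy 2).
person_a = {101: ("A", "G"), 205: ("C", "T")}
# Same alleles at each site, but opposite phase at site 205.
person_b = {101: ("A", "G"), 205: ("T", "C")}


def to_unphased(phased_sites):
    """Drop phase: keep only the unordered set of alleles at each site."""
    return {pos: frozenset(alleles) for pos, alleles in phased_sites.items()}


def haplotypes(phased_sites):
    """Read off the two chromosome copies as allele strings, in position order."""
    positions = sorted(phased_sites)
    copy1 = "".join(phased_sites[p][0] for p in positions)
    copy2 = "".join(phased_sites[p][1] for p in positions)
    return copy1, copy2


# Unphased reports cannot tell the two people apart...
print(to_unphased(person_a) == to_unphased(person_b))
# ...but the chromosomes they carry are different.
print(haplotypes(person_a), haplotypes(person_b))
```

In this sketch only the phased form records which alleles travel together on a chromosome, which is the information the paragraph above says an unphased diploid report would lose.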

Even so, the reference human genome will keep changing! Corrections and refinements of problematic regions that are technically difficult to sequence are made, though nobody claims it will ever identify 100 percent of the 3.1 billion nucleotides without mistakes. But forgetting such minor errors, if such a diploid sequence were obtained from a single person, rather than a composite of several, one might think we finally have an actual set of sequences rather than a non-existent Platonic ideal. That would then be like the authorized type specimens of real plants and animals in museums.

Of course, biologists realize that it’s only a reference sequence, and they think of each of us carrying “copies” of the human genome referent, with some variants of that sequence. But even that idea is wrong. Calling them copies would be Platonic, as if our individual sequences came directly, if imperfectly, from this ideal as their shared template. More accurately, we should use a term like “instance” rather than copy. But a fundamental point is that the resemblance among instances is not due to descent from a single ideal, but for the evolutionary reason that they are homologous, that is, are from a chain of descent from the gene’s common ancestor. Homology is not the manifestation of an ideal, because the original ancestral instance really did exist.

Biologists take advantage of this fundamental fact of life when inferring ancestral sequences from the observed variation in today’s populations. One might suggest that instead of a rather arbitrary reference sequence from some donor, “the human genome” sequence should be this inferred ancestral sequence. But that doesn’t work either. The ancestral sequence for human genes usually goes back far beyond the origin of humans, and the ancestral sequences for each gene will have existed in vastly different times, places, individuals, and species. The intervening noncoding sequences between such genes, which are generally less constrained by natural selection, vary so much that we often can’t really guess their ancestral state. Further, genes have been rearranged among the chromosomes over time, so that genes A and B that are chromosomal neighbors in human genomes today may have been on entirely different chromosomes in the past, or vice versa. Finally, the ancestral gene may have been so different from today’s that using that as our reference would not serve the biomedical research community from a functional point of view.

The same is true to a lesser extent even among modern human genomes: in addition to single nucleotide variation, millions of bits of DNA large and small have been deleted, inserted, inverted, or rearranged in every human genome instance. This variation, and the variation that will continue to accrue in the human population, distances us from any single reference sequence even further.

Reference sets?

If a single reference sequence, even the ancestral sequence that really did exist, is problematic, could there be a better way? An appealing possibility is to use a set of DNA sequences, perhaps all known instances, to characterize human genomes. Instead of a single string, suppose we represented each part in the format of what is known as a gene or sequence “logo.” Here is an example:

This shows the relative frequency of each nucleotide at every point along the sequence. One would have to add a way to visualize insertions and deletions and so on, but computer technologies should be up to that task. If “The Human Genomes” sequence was portrayed in this way, we might replace our arbitrary type-specimen with more natural, biologically accurate, population thinking. Efforts are under way to create a biological reference along these lines.
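The numbers behind such a representation can be sketched in a few lines. The following is a minimal illustration, assuming a small set of already-aligned, equal-length instances; the four sequences and the function name position_frequencies are invented for the example. It computes only the relative frequency of each nucleotide at each position, as described above. Published sequence logos typically also scale letter heights by information content, which this sketch omits, and insertions and deletions are not handled.

```python
from collections import Counter


def position_frequencies(sequences, alphabet="ACGT"):
    """Return, for each aligned position, the relative frequency of each base.

    Assumes every sequence has the same length and is already aligned;
    symbols outside `alphabet` are ignored in the output.
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must all have the same length")

    frequencies = []
    # zip(*sequences) walks the alignment one column (position) at a time.
    for column in zip(*sequences):
        counts = Counter(column)
        total = len(column)
        frequencies.append({base: counts[base] / total for base in alphabet})
    return frequencies


# Hypothetical instances of the same short stretch of sequence.
instances = ["ACGT", "ACGA", "ATGT", "ACGT"]

for position, freqs in enumerate(position_frequencies(instances), start=1):
    print(position, freqs)
```

Each added instance simply shifts the frequencies at the positions where it varies, so a table like this describes a population of sequences rather than singling out one of them as the type specimen.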

Of course, a reference like this would have to be constantly updated, and still could not keep up with the changing frequencies at each position as people die and babies are born all the time. But there’s a more important and even deeper problem, one with Platonic implications. Every time an individual cell divides, new mutations arise; no two cells even within any individual have the identical sequence. Because of this somatic mutation, the single sequence obtained from each individual is an imperfect representation even of that person’s genome. We can never know the variants in each of his/her billions of cells.

Coming to terms with Plato

We routinely use an arbitrary reference and/or ancestral sequences in our daily research. We develop phylogenies, and identify variation responsible for traits, including disease. We comparatively consult arbitrary references for humans and mice to design experiments that work only because of our evolutionary relationship. As limited human beings, we cannot grasp everything in our heads, and representations and reference guidelines are immensely useful.

In fact, in many ways, the human reference genome is an ideal, but not in the way Plato had envisioned ideals. In a deep and interesting way, he had things backward. His idea was that we are only able to see imperfect images of ideals that really have some separate existence. But actually, the ideals are neural constructs built inside our very material heads, and it is they that are imperfect representations of the actual world, not the other way round as Plato had it.

Thus, while any human reference genome may be far from perfect, it’s what we have to work with today, and it helps shed light on all aspects of human biology. Representations are fundamental to science. The danger is if we don't understand them and they become misrepresentations.

Ken Weiss is a geneticist and evolutionary biologist at Penn State University. A fuller discussion of these points is available at The Mermaid’s Tale, a blog to which Weiss is a contributor.

We currently have not only a human reference genome but also a model of how to compare it to the genomes of other species. Within the confines of that model, we know that the epigenetic effects of nutrient chemicals and pheromones cause changes in intracellular signaling and stochastic gene expression. The changes in gene expression allow sensory input from the environment to cause speciation. We also know that olfaction and odor receptors provide a clear evolutionary trail that can be followed from unicellular organisms to insects to humans.

Clearly, the genetic predisposition of the first living cell allowed the required receptor-mediated events for food acquisition and stochastic gene expression, which are linked directly to the de novo production of additional chemical (i.e., odor) receptors. Those receptors are unequivocally required for adaptive evolution via ecological, social, neurogenic, and socio-cognitive niche construction.

If the molecular biology that is common to all species did not ensure that the epigenetic effects of nutrient chemicals and pheromones altered gene expression in every species, we would have limited genetic variation instead of the variation that is obviously required to link us -- via adaptive evolution -- to the origins of the human reference genome. That is the representation we now have to work with. What we observe is genetic diversity that can be sniffed out, even if it cannot be seen.

As Descartes probably meant: I think I need to eat; therefore I am. And I am tired of misrepresentations and constructs that do not incorporate the molecular biology that is common to all species. The human reference genome did not suddenly materialize so that it could be sequenced. Did it?

I spoke with a member of the 1000 Genomes Project analysis team a week ago about this topic. There is active work on refining the human reference sequence, along with an active discussion of whether there should be separate ethnicity reference sequences, which makes sense.

Ken, it's way worse than that. The genome that was sequenced is a partial one and we know it. We didn't sequence any chromosome in the region near the centromere because it was too hard. That's about 5%-10% of the genome. Is that area inactive? Probably not. We depend on BLAST to be right, but know that there are parts of the genome where it is most certainly wrong. And since the first sequencings, all further sequences use the baselines as templates to "speed up assembly".

In other words, the genome we have is wrong and we know it. And we are using the first, known to be incorrect version to "correct" and as a scaffolding for everything after it.

But the "story" of completion is just too good, too compelling, to tell the truth. Wouldn't be so good for grant writers would it? And there was that infamous competition too.

Unfortunately, this is not the case for Aboriginal or Native Americans. The US government is using blood quantum to determine how much 'Indian blood' a person has and this determines how much Native American rights can be accessed.

Just imagine if this method was used to determine how much 'American blood' every American person has that has to be linked to the people who came over with Christopher Columbus and this then determined the American rights you can access.

As a First Nations person, I have nothing to hide, but it has never been in my best interest to allow the government to take control.

Then kindly go ahead Dan and get it catalogued. There are people who have nothing to hide and yet they know people, with fascist tendencies (that too has a DNA profile), will use it against them. Once you label something, true or untrue, as DNA, or at the DNA level, trying to beat it is almost impossible. That way you can destroy from within. I would love to see genes like yours catalogued and analyzed. I'd like to see what a Nazi-like (of whatever religion or non-religion) looks like at the DNA level. People who love domination, control, lies and organized hurtful behaviors should be catalogued. Kindly go ahead and catalogue your DNA.