I saw this cited in another SDMB thread and it made me curious. This site says,

Quote:

humans and chimps share 99.4 percent of their DNA, the molecule that codes for life.

Pardon my skepticism, but I would have an easier time believing this if a chimpanzee said it. I have heard this figure cited before, and it has always made me wonder whether (a) it is perfectly clear but doesn't make sense to me because genetics is just not my strong suit or (b) it actually has a basis in fact but it has been watered down so much for the hoi polloi that it has lost most of its meaning. I, for one, have never shared any of my DNA with a chimpanzee.

Does the 99.4% figure imply that humans share 100% of their DNA with other humans? I thought that the difference between identical and fraternal twins was that identical twins shared 100% of their DNA with each other whereas fraternal twins shared no more than average siblings. How much of their DNA would two random humans be expected to have in common? 99.9%?

I have a distinct and slightly Kafkaesque feeling that there are some words missing here. Do people mean that chimps and humans could share 99.4% of their DNA? Does the fact that we and the chimps have different numbers of chromosomes factor in at all? Or do I have a completely whacked definition of DNA, thinking incorrectly that DNA is actual genetic information when actually it just contains genetic information?

If my questions aren't clear enough, maybe you can complete these sentences:
A. Identical twins are two individuals that share 100% of their ____.
B. Two members of the same species share 100% of their ____.

I think you may be misunderstanding the perspective of how much is in the genome and how much potential for variety there is. Chimps may seem more different than the slim margin you quote suggests, but when you compare humans to sea urchins it looks a little different.

Why would the fact that chimps and humans share 99.4% of their DNA automatically imply that 2 humans share 100% of their DNA? There are a HUGE number of nucleotide base pairs in human DNA, so two people could share 99.9999999% of their DNA and still differ by several hundred genes.

Originally posted by Q.E.D. Why would the fact that chimps and humans share 99.4% of their DNA automatically imply that 2 humans share 100% of their DNA? There are a HUGE number of nucleotide base pairs in human DNA, so two people could share 99.9999999% of their DNA and still differ by several hundred genes.

There are an estimated 3 billion base pairs in the human genome, so using Q.E.D.'s percentage gives 3 base pairs different - got carried away with the 9s, I think. Still, the point is correct - two genomes can be very, very similar as a percentage, but still differ in many ways.
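For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The function name and the use of `Decimal` are my own choices for illustration; the only inputs taken from the thread are the 3 billion base pair estimate and the percentages being discussed.

```python
from decimal import Decimal

# Genome size estimate quoted in the thread (an approximation, not a measurement).
HUMAN_GENOME_BP = 3_000_000_000


def differing_base_pairs(percent_identical, genome_bp=HUMAN_GENOME_BP):
    """Number of base pairs that differ, given a percent identity.

    percent_identical is passed as a string so that Decimal keeps every
    digit exactly; a binary float would blur a value like 99.9999999.
    """
    percent_different = Decimal(100) - Decimal(percent_identical)
    return genome_bp * percent_different / Decimal(100)


if __name__ == "__main__":
    for pct in ("99.9999999", "99.9", "99.4"):
        print(pct, "% identical ->", differing_base_pairs(pct), "base pairs differ")
```

With these inputs, 99.9999999% identity leaves 3 differing base pairs, 99.9% leaves 3 million, and 99.4% leaves 18 million.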

Of course, a certain number of those base pairs don't appear to code for any specific proteins. They may have more subtle purposes, or they may be "noise" introduced by our evolutionary past. Which makes those base pairs that do determine proteins all the more important, making variations in them more significant.

But even sharing "junk" is significant from an evolutionary standpoint. Perhaps even more significant than sharing "non-junk," given that "junk" wouldn't feel much evolutionary pressure to change over time.

1) To say that humans are 99.9% genetically identical to one another indicates that they differ on average by one out of every 1000 base pairs (not genes) in the genome. (I'm not going to go into the differences in percentages between coding and non-coding, so-called "junk," DNA here. Suffice it to say that coding regions, being constrained by natural selection, show less variability than non-coding regions.)

2) The average gene consists of 2000-3000 base pairs, so one may expect about 2-3 differences per gene. However, some differences in base pairs are "synonymous," or redundant, and have no effect at all on the gene product produced. Others may result in a difference in the amino acid sequence in the enzyme produced, but have no effect on the actual enzyme function.

3) The upshot of all this is, that a 0.1% difference in base pairs will result in (very approximately) a 10% difference in gene products; that is, one out of every 10 genes will be functionally different in some way.

4) It should be evident from this, that a 0.6% difference between chimps and humans will produce a correspondingly greater difference in gene products.
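The per-gene arithmetic in points 1 to 4 can be sketched as below. This is only an illustration under a simplifying assumption: it applies the genome-wide average rate uniformly to a gene, which ignores the coding versus non-coding difference mentioned in point 1. The function name is made up for the example, and the 0.6% figure is simply 100% minus the 99.4% quoted at the top of the thread.

```python
from fractions import Fraction


def expected_differences(gene_length_bp, percent_different):
    """Expected number of differing base pairs in a gene of the given length.

    percent_different is passed as a string (e.g. "0.1") so the Fraction is
    exact. Assumes differences are spread evenly along the genome.
    """
    return gene_length_bp * Fraction(percent_different) / 100


if __name__ == "__main__":
    for label, pct in (("human vs. human", "0.1"), ("human vs. chimp", "0.6")):
        low = expected_differences(2000, pct)
        high = expected_differences(3000, pct)
        print(f"{label}: about {low}-{high} differences per gene")
```

Under that assumption, 0.1% gives the 2-3 differences per gene mentioned in point 2, and 0.6% gives roughly 12-18. How many of those differences change the gene product is a separate, empirical question that this sketch does not address.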

Originally posted by Colibri
2) The average gene consists of 2000-3000 base pairs, so one may expect about 2-3 differences per gene. However, some differences in base pairs are "synonymous," or redundant, and have no effect at all on the gene product produced. Others may result in a difference in the amino acid sequence in the enzyme produced, but have no effect on the actual enzyme function.

3) The upshot of all this is, that a 0.1% difference in base pairs will result in (very approximately) a 10% difference in gene products; that is, one out of every 10 genes will be functionally different in some way.

Colibri: Can you clarify the math in #3? Are you making an estimate based on the fact that even though every gene is likely to have some base pair differences, only about 10% will have base pair differences that exhibit themselves in the phenotype? Is the 10% number your guess, or is it generally accepted by geneticists? Any actual data to confirm the number?

BTW, I have referenced your explanation (back in my "99.9% thread") quite frequently in the last few weeks. It's been very helpful in the GD forum even. Not sure if you visit there often.

Originally posted by John Mace Colibri: Can you clarify the math in #3? Are you making an estimate based on the fact that even though every gene is likely to have some base pair differences, only about 10% will have base pair differences that exhibit themselves in the phenotype? Is the 10% number your guess, or is it generally accepted by geneticists? Any actual data to confirm the number?

I'll refer you to this article that I cited in the other thread. The figure is based on both empirical observations and calculations based on observed substitution rates.

Quote:

Classical studies of human variation have explored the manifestation of genetic expression — either as antigenic or charge differences in soluble proteins. Variation has been assessed in two ways — either by the proportion of polymorphic proteins or the average gene diversity (heterozygosity expected under random mating). The first comprehensive human study, carried out by Harry Harris, showed that about 30% of human proteins are polymorphic and the average human gene is heterozygous no more than 10% of the time. These data suggest that, at the protein level, two alleles differ at no more than 0.1 substitutions and an estimated 25% of these result in charge differences.

And:

Quote:

A typical human cDNA is about 3 kb, with approximately 1 kb of synonymous sites and 2 kb of nonsynonymous sites, and expected gene diversities of 33% and 25% at synonymous and nonsynonymous sites, respectively. At the protein level, synonymous changes go undetected and only a fraction (25% of the total or 38% of nonsynonymous sites) of the remainder would lead to charge differences, giving an expected protein heterozygosity rate of 38% x 25%, or 9.4% — similar to the value of 10% observed by Harris.

Bolding mine.
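The quoted "38% x 25%, or 9.4%" looks off if you multiply the rounded figures (that gives 9.5%). One reading that reproduces 9.4% is that the 38% is a rounded 37.5%, i.e. 25% of the whole 3 kb cDNA expressed as a share of the 2 kb of nonsynonymous sites. The sketch below works through that reading; it is my interpretation of the quote, not something the article spells out, and the variable names are invented for the example.

```python
from fractions import Fraction

# Figures taken from the quoted passage.
SYNONYMOUS_KB = Fraction(1)
NONSYNONYMOUS_KB = Fraction(2)
CHARGE_CHANGING_SHARE_OF_TOTAL = Fraction(25, 100)  # "25% of the total"
NONSYNONYMOUS_GENE_DIVERSITY = Fraction(25, 100)    # "25% at ... nonsynonymous sites"


def protein_heterozygosity():
    """Return (charge-changing share of nonsynonymous sites, expected heterozygosity).

    Assumption: "38%" in the quote is 25% of the full cDNA length re-expressed
    as a share of the nonsynonymous sites only.
    """
    total_kb = SYNONYMOUS_KB + NONSYNONYMOUS_KB
    share_of_nonsynonymous = CHARGE_CHANGING_SHARE_OF_TOTAL * total_kb / NONSYNONYMOUS_KB
    heterozygosity = share_of_nonsynonymous * NONSYNONYMOUS_GENE_DIVERSITY
    return share_of_nonsynonymous, heterozygosity


if __name__ == "__main__":
    share, het = protein_heterozygosity()
    print(f"share of nonsynonymous sites: {float(share):.1%}")
    print(f"expected protein heterozygosity: {float(het):.3%}")
```

On this reading the unrounded product is 37.5% x 25% = 9.375%, which rounds to the 9.4% in the quote.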

Quote:

Originally posted by John Mace It's been very helpful in the GD forum even. Not sure if you visit there often.

I sometimes read the threads but rarely post. If I started to get involved in debates I'd never get anything done.

Actually, I just submitted a paper to Journal of Biological Chemistry that touches on this. His estimation is excessive. In general, looking at codon to amino acid correspondences, a broad cross section of coding sequences (a gene is really more than just the protein coding sequence) could come up to about 50% "silent mutation" tolerance. Thus, roughly 50% of the basepair mutants would have a different amino acid sequence.
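As a rough way to explore what "silent mutation" tolerance means at the codon level, here is a small sketch that counts, over the standard genetic code, how many single-base substitutions in a sense codon leave the amino acid unchanged. To be clear about the assumptions: it weights every codon and every substitution equally, it is not the method used in the paper mentioned above, and real coding sequences have uneven codon usage and mutation rates, so its result should not be expected to match the roughly 50% estimate.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_substitutions():
    """Return (silent, total) single-base substitutions over all sense codons.

    A substitution is silent if the mutant codon encodes the same amino acid.
    Stop codons are skipped as starting points; a change that creates a stop
    codon counts toward the total but is not silent.
    """
    silent = 0
    total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                if CODON_TABLE[mutant] == aa:
                    silent += 1
    return silent, total


if __name__ == "__main__":
    silent, total = count_substitutions()
    print(f"{silent} of {total} substitutions are silent ({silent / total:.1%})")
```

With equal weighting this comes out to 134 silent substitutions out of 549, or about 24%. Different weightings (codon usage, transition versus transversion rates, which sequences are sampled) would shift that number.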

HOWEVER, unless the amino acid sequence is the "phenotype", it should also be noted that a "non-silent" mutation at the amino acid sequence level could still end up being functionally identical at the folded protein level.
