Post subject: can someone dumb down how we use DNA analysis to distinguish

Posted: March 15th, 2012, 11:17 am

Joined: June 8th, 2010, 12:54 pm Posts: 259 Location: Colorado

Sorry, i ran out of available characters in the subject line...

Can someone dumb down how we use DNA analysis to determine if members of a population deserve their own species status? I only have basic bio education and i do remember learning that there were different ways of determining if a population represented a distinct species. The one i was always comfortable with was the sexual reproduction one, whereby if they could mate and produce viable offspring then they were the same species. I realize this is dated now and hell, we even know that individuals of different genera can mate and produce fertile young, but still i sorta cling to that definition.

My question stems from this whole "cryptic species" phenomenon... such as that leopard frog in the NYC area. I know that they used DNA analysis to determine it was in fact a separate species, but how exactly do they determine that from DNA? Do they look for some random gene mutations and say, "this population has the mutation but this one doesn't so they're distinct"... i doubt it. Seems as arbitrary and meaningless as looking at something morphological. Do they say, hey their genes are X percent different? Do they say, if the populations split more than 10,000 yrs ago we consider them distinct? I'm clueless and hoping someone can dumb down how exactly DNA analysis reveals that population A and population B belong to separate species...

Post subject: Re: can someone dumb down how we use DNA analysis to disting

Posted: March 15th, 2012, 12:00 pm

Joined: June 7th, 2010, 7:36 am Posts: 549 Location: Sydney, Australia

I'm sure someone more knowledgeable on the subject will chime in, but this:

Quote:

Do they look for some random gene mutations and say, "this population has the mutation but this one doesn't so they're distinct"... i doubt it. Seems as arbitrary and meaningless as looking at something morphological. Do they say, hey their genes are X percent different? Do they say, if the populations split more than 10,000 yrs ago we consider them distinct? I'm clueless and hoping someone can dumb down how exactly DNA analysis reveals that population A and population B belong to separate species...

...is the general gist of how it works. They look at the allelic differences at specific sections of genetic code to determine the rates at which different single-locus mutations sort among populations. The reason this is theoretically better than morphological characters is that genetic characters are the actual biological components transmitted among individuals during reproduction. Therefore, more closely related individuals should share more genetic code than less closely related individuals. I'm not exactly sure how they statistically set cut-offs between species. In contrast, morphological characters may vary tremendously even if the determining genes are identical- some of this is due to plasticity within the environment (i.e. gene expression varies in different developmental environments, etc.). In addition, multiple alleles at a given locus might code for identical traits, which is one of the reasons why you can have cryptic species- they look identical, but genetically they are different- the genes just code for similar traits.

One of the issues that is getting better is choice of DNA sampling. Many of these studies have used mitochondrial DNA in the past, which is only transmitted from mothers to offspring. Therefore, it only represents half of the reproductive assortment of genes that can occur. More recently, microsatellite markers have been used. These are short tandem repeats of genetic code. They tend to be neutral, so selection isn't acting heavily on them, but they sort well with relatedness- the same types of markers are used to establish paternity in humans.

That's pretty dumbed down from an eco-physiologist. I'll defer to the opinions of some of the phylogenetics folks, especially if I screwed anything up.

Post subject: Re: can someone dumb down how we use DNA analysis to disting

Posted: March 15th, 2012, 12:46 pm

Joined: June 8th, 2010, 12:54 pm Posts: 259 Location: Colorado

Ok Van... as you might suspect, that did help a lot, but also opened up more questions. I definitely get what you're saying about how observing the allelic differences is a more accurate measure of how closely related two specimens are, and also how in certain cases, morphological differences can be misleading and not be a reflection of genetic differences.

Quote:

I'm not exactly sure how they statistically set cut-offs between species

- This part was a major question of mine... so that basically tells me that it's a judgement call on the part of the observer to determine at what point enough mutations warrant species distinction. So doesn't that get messy and lead to disagreements? Let's say a group is researching these markers and determines that population A and population B are distinct in that the markers are present in one and not the other... can't some other researchers come along and say, "not so fast fuckfaces... those don't constitute ENOUGH of a difference to count them as different species!" Also, if it were left up to the observer to determine when enough is enough to warrant species distinction... wouldn't there be cases where the genetic diversity WITHIN one species was greater than that of another situation where two populations were considered separate species?

Also, is there any way to tell how long the markers being observed have been present in one population and not the other? What if it happened twenty years ago?

Also also, do they always know if those mutations represent any real difference in the animals' morphology, behavior, sexual behaviors etc...? I mean, so what if they find mutations in one population and not the other... what if it was recent and doesn't really affect anything? Should that mean they are not the same species?

For me, unless they are on independent evolutionary trajectories... that is, they won't ever be mating again and exchanging genes, they could always potentially exchange genes and so on. Like if you had a population on an island somewhere and it had been sexually isolated, but at some point a mainland individual rafted out there or vice versa.

Sorry... i know i'm completely all over the place with this. I apologize for that. I'm just having a hard time wrapping my mind around it all.

Joe, excellent question, although not an easy answer. The truth is, there is no single way that scientists delimit species boundaries. There are new computer programs (Bayesian species delimitation) out now that "objectively" identify species given some genetic data and a phylogeny. However, these are very new and still controversial, especially among the "old school" morphologists.

The reality is that biologists have been arguing over what a species is since the time of Darwin, even earlier. Some folks subscribe to the biological species concept, others to the phylogenetic species concept, etc. The phylogenetic species concept (evolutionary species concept) defines species as independently evolving monophyletic groups. The biological species concept focuses on whether different populations can interbreed. I think there are over 30 different definitions of "species", but those two concepts are the most popular. The result is that taxonomy is constantly changing as species are split apart and lumped together as new data come in, especially when one scientist sequences, let's say, mtDNA, and another group comes in and sequences nDNA later. Yes, scientists argue about this stuff all the time. Yet this is a very exciting time to be studying taxonomy as the technology advances, especially now that it is becoming easier to sequence whole genomes.

Post subject: Re: can someone dumb down how we use DNA analysis to disting

Posted: March 15th, 2012, 2:50 pm

Joined: April 23rd, 2011, 6:49 pm Posts: 248

I have the same questions as Joe. I am quite skeptical of the current genetic testing for species differentiation. Why is it even logical to use sections of DNA that have less selection or are "slowly evolving"? If there is little change to a section of DNA like microsatellites, then maybe its function isn't for speciation but essential life functions like metabolism. The idea of knowing when a species splits by measuring allelic mutations seems arrogant and naive to me. Speciation can happen in just a couple generations. Has anyone read The Beak of the Finch? Polyploidy can create new species in plants but that's not even a mutation and can happen in one generation. Using tiny sections of DNA to determine the evolutionary history of an organism seems no more likely to me than guessing the entire ongoings of a day in a given place by seeing a single photograph. Please someone correct and/or educate me because this has bothered me for a couple years now.

Post subject: Re: can someone dumb down how we use DNA analysis to disting

Posted: March 15th, 2012, 5:36 pm

Joined: June 7th, 2010, 7:36 am Posts: 549 Location: Sydney, Australia

Quote:

- This part was a major question of mine... so that basically tells me that it's a judgement call on the part of the observer to determine at what point enough mutations warrant species distinction. So doesn't that get messy and lead to disagreements? Let's say a group is researching these markers and determines that population A and population B are distinct in that the markers are present in one and not the other... can't some other researchers come along and say, "not so fast fuckfaces... those don't constitute ENOUGH of a difference to count them as different species!" Also, if it were left up to the observer to determine when enough is enough to warrant species distinction... wouldn't there be cases where the genetic diversity WITHIN one species was greater than that of another situation where two populations were considered separate species?

Again, I'm not a phylogeneticist, but what I can tell you from other biological fields is that there are agreed-upon rules for statistical analysis, like to say that two populations differ in a given trait (let's say limb length, or something like that) the statistical comparison of the average has to show that the probability that trait A (ie from species A) appears in species B must be less than 5%. I don't know how the phylogenetic stats work, but I'd assume there are similar agreed-upon rules.
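To make the 5% rule a bit more concrete, here is a minimal sketch of one common way to run that kind of comparison, a permutation test on the difference in mean limb length. The measurements, the function name, and the choice of a permutation test are all assumptions for illustration; nothing here comes from a phylogenetics package or from the studies discussed in this thread.

```python
import random


def permutation_p_value(sample_a, sample_b, n_permutations=10000, seed=1):
    """Two-sided permutation test on the difference in sample means.

    Returns the fraction of random relabelings whose mean difference is
    at least as large as the one actually observed.
    """
    rng = random.Random(seed)
    n_a = len(sample_a)
    observed = abs(sum(sample_a) / n_a - sum(sample_b) / len(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_b = len(pooled) - n_a
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Shuffle the labels: if the populations do not differ,
        # any split of the pooled data is equally plausible.
        rng.shuffle(pooled)
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / n_b
        if abs(mean_a - mean_b) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


# Hypothetical limb lengths in mm (made-up numbers, illustration only).
pop_a = [41.2, 39.8, 40.5, 42.1, 40.9, 41.7, 39.5, 40.2]
pop_b = [44.0, 43.1, 45.2, 44.6, 43.8, 44.9, 45.5, 43.3]

p = permutation_p_value(pop_a, pop_b)
print(f"p = {p:.4f}; differ at the 5% level: {p < 0.05}")
```

Note that a test like this only says the two samples differ in the trait. It says nothing by itself about whether the difference is big enough to call them species, which is the judgement call Joe is asking about.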

Quote:

Also, is there any way to tell how long the markers being observed have been present in one population and not the other? What if it happened twenty years ago?

Sometimes- there are molecular "clocks" that can be used to estimate divergence rates, but we're usually talking on the order of millions of years. Not sure what time necessarily has to do with the question of speciation- if two species are defined to be separated, does it matter if it happened a million years ago or 10 years ago?
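For a rough feel of how a molecular clock turns sequence difference into time, here is a sketch of the simplest (strict clock) version. The 4% divergence and the rate of 1e-8 substitutions per site per year are assumed, illustrative numbers rather than values for any real gene or taxon, and the function names are hypothetical.

```python
def divergence_time_years(pairwise_divergence, rate_per_site_per_year):
    """Strict-clock estimate of time since two lineages split.

    Both lineages accumulate substitutions independently after the split,
    so the pairwise divergence grows at twice the per-lineage rate.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return pairwise_divergence / (2.0 * rate_per_site_per_year)


def expected_differences(years, rate_per_site_per_year, n_sites):
    """Expected number of differing sites between two lineages after a split."""
    return 2.0 * rate_per_site_per_year * years * n_sites


# Assumed inputs: 4% sequence divergence, 1e-8 substitutions/site/year/lineage.
t = divergence_time_years(0.04, 1e-8)
print(f"Estimated split: about {t:,.0f} years ago")

# The same assumed rate applied to a 20-year-old split in a 1000 bp fragment.
recent = expected_differences(20, 1e-8, 1000)
print(f"Expected differences after 20 years: {recent:.4f}")
```

With those assumed numbers the 4% difference works out to roughly two million years, while a 20-year-old split is expected to leave far less than one substitution in a 1000 bp fragment. Under this simple model, a split that recent would be essentially invisible in slowly evolving sequence data.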

Quote:

Also also, do they always know if those mutations represent any real difference in the animals' morphology, behavior, sexual behaviors etc...? I mean, so what if they find mutations in one population and not the other... what if it was recent and doesn't really affect anything? Should that mean they are not the same species?

Most of the genes they actually sample are ones that evolve slowly and may not face strong selection. Good examples of this are the genes that are often used in mitochondrial DNA analysis, like cytochrome B. Cytochrome B is a gene that is found in all eukaryotes and many prokaryotes, and is therefore nearly universal to life. It is a critical component of mitochondrial DNA that has been around since the beginning of life. The reason investigators focus on these genes is that selection does not act strongly on them, and they could (statistically) be some of the least likely to change, given how canalized they are. Highly labile genes that face higher selection pressures could actually change within populations that have not yet speciated- this is how different populations of a single species adapt to their local conditions. These genes can be terrible markers for speciation because they can actually be quite different within a single species that has interbreeding populations that are each adapted to their own environment (look up "ring" species as an example).

Quote:

For me, unless they are on independent evolutionary trajectories... that is, they won't ever be mating again and exchanging genes, they could always potentially exchange genes and so on. Like if you had a population on an island somewhere and it had been sexually isolated, but at some point a mainland individual rafted out there or vice versa.

This is one of the difficult things to assess. Boundaries to breeding can be pre-zygotic (i.e. non-overlapping geographic ranges or habitat selection, different breeding behaviors/times of year, incompatible genitalia or gametes, etc.) or post-zygotic (failed embryonic development, hybrid sterility, etc.). The trick is that species can have pre-zygotic constraints on hybridization without having post-zygotic constraints. An example is the hybridization between California kingsnakes and corn snakes. These species (or populations) are clearly separated by geography, behavior, etc., but if you trick them into mating you can still get viable offspring. The bottom line is that there are few hard and fast rules for defining species, and animals are actually much, much easier than plants.

Quote:

Why is it even logical to use sections of DNA that have less selection or are "slowly evolving"? If there is little change to a section of DNA like microsatellites, then maybe its function isn't for speciation but essential life functions like metabolism. The idea of knowing when a species splits by measuring allelic mutations seems arrogant and naive to me. Speciation can happen in just a couple generations. Has anyone read The Beak of the Finch?

See my point to Joe above- microsatellites are good markers exactly because they are not rapidly evolving. Think of it from a paternity perspective- if microsatellites evolved within generations, then there would be no way to determine if a child and father were related. From a species perspective, the same holds true- analyzing rapidly evolving genes would have you splitting species every time you found the slightest difference in genetic structure due to local selective pressures. This would be akin to splitting snake species based purely on minute differences in their color patterns- these are highly labile traits that don't necessarily indicate relatedness. The genes that are highly canalized across species (like Cyt B) or are essentially neutral and random (like microsats) are likely to be the very last things to change during speciation, and should actually provide the most conservative estimates of separation between populations. Furthermore, when these genes change, it will likely be due to genetic drift (ie random drift across essentially neutral alleles) rather than selection, and the point is that if two populations are not interbreeding enough (if that makes sense) that genetic drift can act independently on each, then they are likely biologically separated either by pre- or post-zygotic means, and should be termed species.

Post subject: Re: can someone dumb down how we use DNA analysis to disting

Posted: March 15th, 2012, 8:12 pm

Joined: June 8th, 2010, 6:05 am Posts: 223

This is a rather complicated topic, and one of much contention among experts in the field. As Squam mentioned, there isn't even universal agreement on exactly what a species is, and some will argue that species are not even real, but rather constructs we make to help us cope with and organize biodiversity. I don't agree with the argument that species are entirely constructs, there is some reality to species, but the process of speciation is not instantaneous. That is, in most cases, it takes time for lineages to diverge sufficiently to be considered distinct species. It also can take a lot of time for some of the criteria utilized to define species to evolve. I think many (certainly not all, but many) people tend to think of species under what de Queiroz termed the 'unified species concept.' Basically, all these criteria used by various species concepts (reproductive isolation, diagnostic characters, monophyly (all members of one species should be more closely related to each other than to any other species), etc.) are traits that evolve as lineages diverge, but as these all take varying amounts of time, they don't necessarily arise at speciation. Rather, what is important is that species are independently evolving lineages. Of course, it's rather difficult to determine if two lineages are in fact evolving independently. Most agree that in most situations, if two populations are completely reproductively isolated, and any offspring between the two species are always completely sterile, they are two different species. However, many people also admit that in some or many cases, valid, distinct species are capable of producing fertile offspring, because reproductive isolation often takes time to evolve. Even Ernst Mayr, who originally proposed the biological species concept (i.e. species are entities that are reproductively isolated) did admit in more than a few places that good animal species certainly hybridize in some cases. 
Anyway, the point is that there isn't really universal agreement even on what species are, much less on how best to identify them.

OK, so although what a species is isn't even generally agreed on, why use DNA as opposed to morphology? Well, first of all, I do think that in many cases morphology is extremely informative and useful. However, as Van mentioned, one difficulty with morphology is plasticity, traits affected by the environment. Plastic traits are interesting, but not useful for examining phylogenetic relationships or identifying species because they do not track history (since they are affected by the environment). Another important consideration is independence of characters. I was recently at a talk by a mammalogist who discussed scoring >100 characters for a single molar. When scoring that many characters for a single tooth, clearly some of those are affected by the same gene (or genes), which can bias analyses attempting to delimit species or identify how species are related by weighting that change more. Further, a single gene change, such as in a regulatory gene (e.g. a gene that controls how much of some protein is produced) can cause drastic changes in morphology, particularly if that change alters the rate or timing of development of some body part. So morphology has its limitations in possible plasticity, and possible drastic morphological changes resulting from limited genetic changes. Additionally, morphology is impacted by selection, as selection affects the phenotype of an organism, what it physically looks like, but doesn't directly affect a change at the DNA sequence level. That is, if there is a mutation that changes an A to a C in the DNA sequence, selection might act on some phenotypic change caused by that mutation in the DNA, even if it's as simple as the mutation changed one amino acid in a protein that made it slightly less efficient (granted in this case it would be extremely weak selection), but selection doesn't act on that change at the DNA level directly. 
So selection can also cause some difficulties with morphology, particularly since it's difficult to determine exactly what the selection is, and how it's impacting things. DNA has its benefits in that it isn't affected by the environment (and thus is not plastic, even if the expression of genes is), and, in most cases, the change in, say, the first base of a stretch of DNA doesn't affect a change in the next base. So, changes in different bases in a stretch of DNA are independent.

So, how can we use DNA to identify species? This is the million dollar question (OK, not really, but it is a big question), and it's actually a major focus of my dissertation research. Often people use some divergence threshold. So, if two populations are X% divergent, they are different species. However, firstly, selecting X is rather arbitrary, and is a source of possible dissent. Also, the level of divergence is heavily impacted by the population size of the populations. It is also heavily impacted by any gene flow. Granted, some argue that any gene flow is indicative of two populations being the same species, but what if one individual a generation gets confused, mates with an individual of the other species, and produces fertile offspring that also mate with the other species. If hybridization is as rare as once a generation, should they really be considered one species? Probably not, but this can still impact the divergence between populations. Also, due to a process called incomplete lineage sorting, the history of a single short stretch of DNA may or may not reflect the true evolutionary history. I won't get into exactly why this is (unless you want more information), but this is a big reason why studies relying solely on mitochondrial DNA are lacking something, you need stretches of DNA scattered throughout the genome to get a good idea of the evolutionary history. Some analyses used to try to delimit species use multiple loci to try to cluster the individual samples that are most similar at these many loci (this is one analysis they used in the NYC frog paper). However, as I'm sure you can imagine, the clustering can be somewhat hierarchical, so it can be difficult to determine what level of clustering is appropriate. Another analysis often used uses statistical models to estimate both the timing of divergence and level of gene flow. Low or no gene flow is suggestive of distinct species, as is a really old divergence. 
However, it's difficult to give some level of gene flow that is indicative of distinct species. It's also impossible to say that two things must be so many years divergent to be different species, as the time it takes for lineages to speciate varies drastically (depending on what happens in the system, populations can become completely reproductively isolated within a few generations, or they may take millions of years). There are also other statistical models being worked on to try to use DNA sequences from multiple loci to determine if things represent two species or not, like the Bayesian species delimitation Squam mentioned, but we aren't at the point where you can really say do this, then that, and then you'll know if it's one species or two. I think the Bayesian species delimitation is a step in the right direction, and I was super excited when I first saw that paper. However, upon reading the paper, and using the method on some datasets, it's got some issues which would be rather difficult to explain unless you are somewhat versed in Bayesian statistics, so I won't get into those. But suffice it to say, we've got some analyses that can support the hypothesis of distinct species, but no one clear, easy way to an answer (but then, what fun would it be if we could just sequence X loci from Y samples, put it in program Z and have the answer).
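The "X% divergent" idea above can be sketched in a few lines. This is only an illustration of the simplest possible measure (the uncorrected proportion of differing sites between two aligned sequences); the sequences and both thresholds are made up, and real studies use many samples, multiple loci, and substitution models rather than a raw count.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Hypothetical aligned fragments from two populations (made-up sequences).
pop_a_seq = "ATGACCAACATTCGAAAATC"
pop_b_seq = "ATGACTAACATCCGAAAGTC"

d = p_distance(pop_a_seq, pop_b_seq)
print(f"divergence = {d:.1%}")

# Two arbitrary cut-offs give two different answers from the same data.
for threshold in (0.10, 0.20):
    verdict = "distinct" if d > threshold else "same species"
    print(f"threshold {threshold:.0%}: {verdict}")
```

The toy sequences differ at 3 of 20 sites, so the two cut-offs disagree about the same pair of populations, which is exactly why picking X is a source of dissent.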

Quote:

Why is it even logical to use sections of DNA that have less selection or are "slowly evolving"?

Typically, we try to select regions of DNA that are neutral (i.e. no selection) for several reasons. We have a really good understanding of how sequences evolve under neutrality, so the statistical models we can use are relatively realistic if the DNA is evolving neutrally. Also, while some types of selection don't necessarily 'screw up' phylogenetic analyses, other types of selection can cause huge problems and really mess anything up. Further, exactly how selection is impacting the system can be really difficult to robustly estimate. So in general, if what you're interested in is how things are related, neutral stretches of DNA are more useful. We don't necessarily target 'slowly evolving' loci, but rather the rate of evolution is more a matter of the question being answered. Van mentioned microsatellites a few times. If what you're interested in is parentage, or something on a really fine scale (i.e. something within the past few decades or hundred years), microsatellites are great. However, on an evolutionary timescale, and relative to mitochondrial DNA or the rest of the nuclear genome, microsatellites evolve super fast (not so fast that a parent and child would have different microsatellite alleles, but on an evolutionary timescale super fast). That's exactly what makes them useful for parentage analyses, but it also makes them generally less useful for higher level things, such as trying to figure out how different species or genera are related. So the type of DNA you target depends on whether you're looking at relatively recent things (such as the past few hundred years), or much older things (such as several million years ago).

OK, that turned out far more long-winded than I initially intended. This is a focal area of my research, so if it weren't so late, and if I didn't cut myself off now, I could go on for far longer. But hopefully it helps at least a little bit; I'm happy to do what I can to try to explain this DNA stuff. -Eric

Post subject: Re: can someone dumb down how we use DNA analysis to disting

Posted: March 16th, 2012, 7:43 am

Joined: June 8th, 2010, 9:06 am Posts: 737 Location: Montana

Awesome post. Thanks, Van and Eric, for taking the time to do this. You guys rock. I'd like to add a question or two, if you don't mind. Eric, this is along the lines of something we've discussed before. I suppose the papers on Lampropeltis and members of the genus Pantherophis are good working points, too. I'm not a phylogeneticist, but I have a moderate understanding of the biochemistry and mechanics of evolution. Though my formal education is biological in nature (with an emphasis on ecology, evolution, and biochemistry), my past five years or so have been spent working on the in situ remediation of chlorinated solvent spills. In short, I'm not up on my game with this stuff. It's going to take some CLR to get the rust off of this machine. I'm fully aware that there are some "issues" with identifying and isolating usable nDNA in many Lepidosaurs, so these questions aren't meant as an "attack" on non-nuclear DNA phylogenetic work, but rather a search for clarification. So, here we go:

1. Couldn't/shouldn't a discordance of mtDNA data (such as that from Cytochrome b) with other available data (geographical, morphological, ecological, and even other biochemical data) just as easily signal why it should not be relied upon so heavily as signal the "wrong-ness" of all other data? (Yes, I know "not all data are created equal") The example given in question 4 would apply here, too, though I could dig up other examples if needed.

2. Also, couldn't/shouldn't geographic regions where more than one mtDNA "lineage" (as determined by Cytochrome b analysis, for example) is observed be considered direct evidence that the lineages are NOT evolving independently (a concept akin to an intergrade zone between "subspecies" [a concept I'm aware is fraught with problems])? A decent example of this is given in question 3.

3. What about where sampling near the Cytochrome b lineage "boundaries", for example, is scant or nonexistent? (An example I'm thinking of here is the relationship between splendida and holbrooki in the Lampropeltis getula complex: http://www.naherpetology.org/pdf_files/1302.pdf).

4. How about when members of the same population don't cluster together in an mtDNA phylogeny? Couldn't/Shouldn't that signal that the "preferred tree" ought not to be preferred? (an example here is http://www.naherpetology.org/pdf_files/711.pdf). I guess my question boils down to what is the null in these situations? Shouldn't the default be to reject the discordant or incomplete data set?

Thanks in advance for taking the time to answer - it's truly appreciated.

Post subject: Re: can someone dumb down how we use DNA analysis to disting

Posted: March 16th, 2012, 7:57 am

Joined: June 8th, 2010, 12:54 pm Posts: 259 Location: Colorado

Cole Grover wrote:

Thanks, Van and Eric, for taking the time to do this. You guys rock. ..

Indeed! I sincerely appreciate you guys taking the time to explain this and in a way even i can understand. I read Van's and Eric's replies several times each, and it's really helped to give me an idea of how this process is employed and what the strengths and weaknesses are. Thanks for your post too Squam, reptiluvr and Cole. I'm going to chew on this a bit more and i'll get back to you guys with more questions as they come up.

Post subject: Re: can someone dumb down how we use DNA analysis to disting

Posted: March 16th, 2012, 9:07 am

Joined: August 26th, 2010, 9:56 am Posts: 81

And thanks to you too Joe. I'm sure your question has helped hundreds of others, including myself, start to understand the current scientific method being used for all the rather recent name changes. Art

Post subject: Re: can someone dumb down how we use DNA analysis to disting

Posted: March 16th, 2012, 10:45 am

Joined: April 23rd, 2011, 6:49 pm Posts: 248

Eric: I really like your lengthy post. Just enough to not go over my head! I am still unclear on the usefulness of neutral or low-selection DNA segments. I could probably write this more precisely but it boils down to this: how can one be sure or even begin to identify a segment that DOES delineate species? If a group of individuals mutates a gene controlling metabolism so that one geno/phenotype is better at or prefers digesting different foods, is that a new species in development? For example, I have a thought that skinks, which I found out can be morphologically similar on a global scale, could potentially have speciated on the premise of diet. Our sympatric Eumeces of the southeast are so morphologically similar that maybe dietary niches created distinction. But with similar habitat choice and geographic range and unreliable scale counts as identifiers, is that enough to call them distinct species?

First off I would like to say that I agree with just about everything ritt wrote, that was a very good explanation of a very complex topic, thanks for breaking it down in a clear way without dumbing it down.

Cole Grover wrote:

1. Couldn't/shouldn't a discordance of mtDNA data (such as that from Cytochrome b) with other available data (geographical, morphological, ecological, and even other biochemical data) just as easily signal why it should not be relied upon so heavily as signal the "wrong-ness" of all other data? (Yes, I know "not all data are created equal") The example given in question 4 would apply here, too, though I could dig up other examples if needed.

Yes, these days I think people are leaning away from relying on cytb (or any single gene), instead phylogeneticists now tend to rely on multiple nuclear loci scattered around the genome. This is because the appearance of any one gene tree (and mtDNA is essentially one linked locus without recombination) doesn't perfectly match the actual phylogeny or "species tree" due to stochastic factors like incomplete lineage sorting (in other words when an ancestral population has a lot of genetic variation, the gene tree might not match the true history). 10 years ago people relied much more on mtDNA because there were few published primers for nDNA in most non-model organisms. Nowadays with genomics and next-generation sequencing it is becoming easy to sequence multiple nDNA loci in almost any animal.
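To put a number on how often a single gene tree can disagree with the species tree because of incomplete lineage sorting, here is a sketch using the standard coalescent result for the simplest case of three species: the chance of a mismatch is (2/3) * exp(-T), where T is the time between the two splits measured in units of 2 * Ne generations. The population size and split times below are assumed values, the function name is hypothetical, and the formula assumes a neutral diploid nuclear locus, constant population size, and no gene flow after the splits.

```python
import math


def discordance_probability(generations_between_splits, effective_pop_size):
    """Chance that a gene tree mismatches a three-species tree due to
    incomplete lineage sorting.

    The internal branch length is converted to coalescent units
    (generations / (2 * Ne)) for a diploid nuclear locus.
    """
    if effective_pop_size <= 0:
        raise ValueError("effective population size must be positive")
    if generations_between_splits < 0:
        raise ValueError("branch length cannot be negative")
    branch_length = generations_between_splits / (2.0 * effective_pop_size)
    return (2.0 / 3.0) * math.exp(-branch_length)


# Assumed ancestral effective population size of 10,000.
for generations in (1_000, 20_000, 200_000):
    p = discordance_probability(generations, effective_pop_size=10_000)
    print(f"{generations:>7} generations between splits: P(mismatch) = {p:.3f}")
```

The pattern to notice is that short gaps between splits and large ancestral populations push the mismatch chance toward two in three, which is why a single locus (including mtDNA, which behaves as one linked locus) can mislead and why multiple independent loci help.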

Quote:

2. Also, couldn't/shouldn't geographic regions where more than one mtDNA "lineage" (as determined by Cytochrome b analysis, for example) is observed be considered direct evidence that the lineages are NOT evolving independently (a concept akin to an intergrade zone between "subspecies" [a concept I'm aware is fraught with problems])? A decent example of this is given in question 3.

3. What about where sampling near the Cytochome b lineage "boundaries", for example, is scant or non-existant? (An example I'm thinking of here is the relationship between spendida and holbrooki in the Lampropeltis getula complex: http://www.naherpetology.org/pdf_files/1302.pdf).

When it comes to hybrid zones, I agree with you that it is also important to get nDNA data. MtDNA is maternally inherited and only represents female dispersal. The nDNA has an effective population size 4x that of mtDNA and thus may tell a different story. Using both data types, as well as morphology and ecology, is the best way to go IMO for hybrid zones. Dense geographic sampling is also needed. Unfortunately many studies are lacking sufficient sampling at hybrid zones, for a variety of reasons.

An excellent example of a recent hybrid zone study that used nDNA, mtDNA, morphology, and ecology on the Ensatina contact zone in San Diego County was published by Deavitt et al., BMC Evolutionary Biology 2011, 11:24
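For anyone wondering where the "4x" comes from, here is a small sketch of the usual textbook argument. Nuclear autosomal genes are carried as two copies by every breeder of both sexes, while mtDNA is effectively one copy passed on by females only. The function name and the breeder counts are hypothetical, and the nuclear term uses the standard sex-ratio formula Ne = 4·Nm·Nf / (Nm + Nf).

```python
def nuclear_to_mtdna_copy_ratio(n_females, n_males):
    """Ratio of effective nuclear gene copies to effective mtDNA copies."""
    # Effective size for autosomal loci with unequal numbers of each sex.
    nuclear_ne = 4.0 * n_females * n_males / (n_females + n_males)
    nuclear_copies = 2.0 * nuclear_ne   # diploid: two copies per individual
    mtdna_copies = float(n_females)     # haploid, maternally inherited
    return nuclear_copies / mtdna_copies


# Equal sex ratio gives the familiar factor of four.
print(nuclear_to_mtdna_copy_ratio(500, 500))
# A strongly female-biased breeding population shrinks the ratio.
print(nuclear_to_mtdna_copy_ratio(900, 100))
```

So the 4x figure is a rule of thumb that assumes an even breeding sex ratio; it is worth keeping in mind that skewed mating systems can move it quite a bit.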

Quote:

4. How about when members of the same population don't cluster together in an mtDNA phylogeny? Couldn't/Shouldn't that signal that the "preferred tree" ought not to be preferred? (an example here is http://www.naherpetology.org/pdf_files/711.pdf). I guess my question boils down to what is the null in these situations? Shouldn't the default be to reject the discordant or incomplete data set?

This might imply that there are cryptic species, or a lot of genetic diversity, in the group you are studying. Again I would prefer to get nDNA data as well.

I don't mean to knock mtDNA, it contains a lot of useful information, but it has its limits as well. I think it is best used in conjunction with nDNA, ecology, geography, morphology, etc... the more lines of evidence you have, the more compelling your argument. It is important to remember that species relationships and definitions are basically hypotheses that can change as more data comes in.

Post subject: Re: can someone dumb down how we use DNA analysis to disting

Posted: March 16th, 2012, 1:12 pm

Joined: June 8th, 2010, 9:06 am | Posts: 737 | Location: Montana

Squam8,

Thanks for the reply. Your answers are pretty much what I was expecting, so perhaps I'm not as rusty as I thought. I'm also not knocking mitochondrial DNA analyses - they're chock full of information about matrilineal descent. It is frustrating at times, though, to see good data mis-used and abused. LOL

Cryptic species are an infinitely interesting subject. It'll be fascinating to see how many more are discovered and which ones hold up under scrutiny as our technologies improve.

Post subject: Re: can someone dumb down how we use DNA analysis to disting

Posted: March 18th, 2012, 5:11 am

Joined: June 8th, 2010, 6:05 am | Posts: 223

Excellent responses, Squam. And excellent questions and comments everyone. It's great to have a mature, respectful, and thought-provoking debate on here.

To expand a bit on Squam's responses, in addition to the greater availability of mtDNA primers than nDNA primers, each cell has numerous copies of the mitochondrial genome, but only a single copy of the nuclear genome. This greater abundance of mtDNA makes it a lot easier to isolate and sequence. But, methods have improved, more nuclear primers are available, so it's getting a lot easier to sequence nuclear DNA. Also, new, 'next-generation' sequencing technologies are now available that allow us to sequence millions of fragments of DNA simultaneously, and get far more data than was possible using previous sequencing technologies (these next-gen technologies are also a huge part of why whole genome sequences are becoming available for more species).

mtDNA does typically do a pretty good job, in part because it evolves relatively rapidly, so you can get a well resolved tree, but also because it usually isn't experiencing the types of selection that can really screw up phylogenetic analyses. For example, say two closely related species (but NOT sister species; other species are more closely related to each) both experience directional selection for increased body size. If you use a gene involved in that increased body size to try to estimate the phylogeny, they may well have similar mutations due to similar selection for larger body size. Thus, it's possible that you recover an inaccurate phylogeny with those two larger species as sister taxa because they experienced similar selective pressures for larger body size. There are also some genes, such as Major Histocompatibility Complex (MHC), where there is strong 'balancing selection' - selection for variability (MHC is involved in immune response, so individuals with lots of different MHC alleles are better able to fight off infection than individuals with a bunch of copies of the same MHC alleles). Balancing selection can also really screw up analyses, and result in a big comb phylogeny with little support for relationships. The patterns caused by balancing or directional selection really screw with analyses, and it can be really difficult to detect them and account for them, hence why genes that are either selectively neutral or under weak purifying selection (i.e. mutations that would change the protein being coded for are selected against) tend to be a lot better for estimating phylogeny - we can model how neutral sequence evolves really well, and other kinds of selection can really mess things up. Anyhow, getting back to mtDNA: it tends to work well because it doesn't typically experience the types of selection that can screw up phylogenies.
That being said, it's still a single locus, a single estimate of the relationships, so it's got the issues of incomplete lineage sorting that Squam explained. There are also cases, not super common, but it does happen that mitochondrial genomes introgress from one species to another following a hybridization event. Again, this movement of mitochondrial genomes from one species to another is relatively uncommon, so it's definitely not something that should be latched onto without strong additional data, such as from multiple nuclear loci, but it is something that happens. So, we're moving more and more away from mtDNA only studies, and relying more and more on multiple loci, and using methods that account for the possible differences between gene trees to estimate the species-level phylogeny (i.e. the 'species tree' methods Squam mentioned).

Anyway, when you do have discordance between, say, the mtDNA and morphology (such as between mtDNA and coloration in the eastern Pantherophis), I think it depends heavily on what the other data are. If it's a character that likely evolves fairly rapidly and is likely to change a lot, like coloration, I'd rely more heavily on the mtDNA, but if it's something that is unlikely to change much, such as the structure of a bone or the place where some muscle attaches, discordance with mtDNA would probably raise my eyebrows and get me thinking in more detail about what's going on. So, for example, the eastern Pantherophis, their color pattern probably evolves fairly rapidly. I say this for several reasons. Firstly, much of the range of the 'Black rats' was covered in ice 12,000 years ago, so they probably weren't in that area until relatively recently. It's also easy to hypothesize on the selective advantage of darker coloration further north, and how this may have evolved independently in the three Pantherophis lineages. The amount of black varies a lot among individual black rats. Down south, how dark individuals are also varies quite a bit within populations. So the amount of black is likely controlled by several genes: individuals with a lot of 'black' alleles at all these genes might be very dark, whereas individuals with fewer 'black' alleles may be lighter in color. Given that in the south you still can have some darker individuals, it's possible that all the alleles necessary for a solid black rat snake are present, but some may be rare, and all may be pretty uncommon, so you might never have an individual with all 'black' alleles to result in a solid black snake, even if the alleles necessary for a solid black snake are in the gene pool in the south.
As the rat snakes migrated northward following glacial cycles, potentially along three independent routes based off the mtDNA data (east of the Appalachians, between the Appalachians and the Mississippi River, and west of the Mississippi River), there could be a selective advantage for darker individuals in the cooler climate further north, because the darker individuals could thermoregulate more efficiently and warm up quicker on a cool spring morning. So, as they moved north, the alleles responsible for black snakes could increase in frequency to the point where it's uncommon to have lighter snakes further north, hence the discordance between coloration and mtDNA. Note that this scenario doesn't invoke any mutations; it's possible that black ratsnake coloration could evolve simply from changing the frequency of alleles already in the population of non-black rats, so it's not unlikely that this scenario could happen three times in the different lineages. It's still possible that some mutations also occurred that further increased the amount of black pigment further north, but this is mainly a possible explanation as to why mtDNA should outweigh some characters that may evolve rapidly.
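The standing-variation idea above can be illustrated with a toy calculation. This is only a sketch under loud assumptions: four unlinked loci, Hardy-Weinberg proportions, a "solid black" snake needing two 'black' alleles at every locus, and a simple deterministic selection model in which a 'black' allele has relative fitness 1 + s. The starting frequencies, s, and the number of generations are invented for illustration and are not estimates for Pantherophis.

```python
def prob_all_black(allele_freqs):
    """P(an individual is homozygous 'black' at every locus)."""
    prob = 1.0
    for p in allele_freqs:
        prob *= p * p  # Hardy-Weinberg homozygote frequency at one locus
    return prob


def frequency_after_selection(p, s, generations):
    """Allele frequency after deterministic selection, relative fitness 1 + s."""
    for _ in range(generations):
        p = p * (1.0 + s) / (1.0 + p * s)
    return p


south = [0.2, 0.2, 0.2, 0.2]  # hypothetical 'black' allele frequencies
north = [frequency_after_selection(p, s=0.05, generations=200) for p in south]

print(f"chance of a solid black snake, south: {prob_all_black(south):.2e}")
print(f"chance of a solid black snake, north: {prob_all_black(north):.3f}")
```

No new mutation appears anywhere in the calculation; shifting the frequencies of alleles that were already present is enough to take solid black snakes from vanishingly rare to the norm, and the same thing could happen separately in each lineage.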

Quote:

3. What about where sampling near the Cytochrome b lineage "boundaries", for example, is scant or non-existent? (An example I'm thinking of here is the relationship between splendida and holbrooki in the Lampropeltis getula complex: http://www.naherpetology.org/pdf_files/1302.pdf ).

Sampling gaps are always a pain, and it's difficult to tell if you've got a gradient from one mtDNA lineage to another, or if you'll end up with a nice clear boundary between the two with better sampling. It's certainly a sign that more sampling is necessary to tease apart what exactly is happening, but when you have two well supported clades that are as deeply divergent as in the case of the getula complex (I didn't see a table with the divergence in the getula paper, but they estimate, using a fossil calibration, that they diverged about 4 Mya, so they're pretty divergent, probably something on the order of 6-10%), it's more likely that, with better sampling, you'll see a relatively clean boundary between the two, possibly a hybrid zone, or even a nice clean boundary with no evidence of hybridization. When they're that divergent, I find it unlikely that it would be a gradient from one mtDNA lineage to the other, and I think most evidence from squamates supports this. Now if it were a more shallow divergence, say 2-4%, then I'd strongly question any inferences on what's happening there. Any sampling gap between clades is definitely a sign that more work is needed, but I don't think it's always a reason to refrain from making inferences about the history of the populations that are sampled. In the case of the getula paper, that is a pretty deep divergence, so the two lineages were almost certainly isolated for a while, and I don't think it's super premature to split them up despite the sampling gap between splendida and holbrooki (although I would really have preferred seeing a more thorough morphological examination in the paper splitting up the complex). But more work, with better sampling and more loci, is definitely necessary to determine exactly where the boundary between the two species is, and whether or not the two are hybridizing.
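For readers new to the "percent divergence" figures being thrown around, this is roughly how an uncorrected pairwise distance is computed from two aligned sequences. It is a bare-bones sketch: the function name and the toy sequences are made up, and real studies usually apply a substitution-model correction rather than counting raw differences.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip gaps and ambiguity codes; only compare unambiguous bases.
        if a in "ACGT" and b in "ACGT":
            compared += 1
            differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Toy 10-base alignment with a single difference (10% divergence).
print(p_distance("ACGTACGTAC", "ACGTACGTAA"))

# The guess in the post (6-10% after about 4 My) implies a pairwise
# rate of roughly 1.5-2.5% per million years.
for divergence in (0.06, 0.10):
    print(f"{divergence:.0%} over 4 My -> {divergence / 4:.2%} per My")
```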

Quote:

4. How about when members of the same population don't cluster together in an mtDNA phylogeny? Couldn't/Shouldn't that signal that the "preferred tree" ought not to be preferred? (an example here is http://www.naherpetology.org/pdf_files/711.pdf ). I guess my question boils down to what is the null in these situations? Shouldn't the default be to reject the discordant or incomplete data set?

I don't think multiple mtDNA lineages in a single population necessarily means the lineages are not evolving independently, or that the mtDNA tree should be rejected. A big thing to consider first is how easy it is for the individuals to move around. If it's something like a bird that can move around easily, the second mtDNA lineage may represent migrants that may or may not be breeding with the 'local' mtDNA lineage. If it's something like a snake that probably isn't moving long distance, it may represent secondary contact. So consider a wide ranging species. Suppose the climate changes, such that this species is isolated into two smaller pockets of habitat, and individuals can't move back and forth between the two patches of habitat. Those two populations can diverge due to local selection or even just random chance (genetic drift), resulting in the divergent mtDNA lineages. Perhaps they remain isolated long enough that they speciate, or perhaps not and the mtDNA just diverges. If the climate then shifts again, such that both populations expand and meet again, that meeting following the period of isolation is called secondary contact, and could result in both mtDNA lineages occurring where the two populations meet. What happens in those populations depends on all sorts of factors: time in isolation, whether barriers to reproduction evolved, etc. They may be completely reproductively isolated cryptic species if the barriers to reproduction are strong enough. Or if there are some barriers to reproduction, but they are not complete, the two lineages may hybridize and form a stable hybrid zone. Or, if there are no barriers to reproduction, the two might theoretically fuse into a single species. Testing among these is impossible with mtDNA alone as it is maternally inherited, but with numerous nuclear loci it is possible, and in any of these cases you could end up with multiple mtDNA lineages in a single population.
When such a pattern is observed, it's sort of a signal that there's something going on there, not necessarily that they are not independently evolving, and not necessarily that the mtDNA is screwy; it means that more data and intense sampling are necessary to tease apart what actually is happening.

Post subject: Re: can someone dumb down how we use DNA analysis to disting

Posted: March 21st, 2012, 10:59 am

Joined: June 7th, 2010, 2:23 pm | Posts: 104

I haven't wandered into FHF much recently, but since this is a topic of interest to me...

ritt wrote:

mtDNA does typically do a pretty good job, in part because it evolves relatively rapidly, so you can get a well resolved tree, but also because it usually isn't experiencing the types of selection that can really screw up phylogenetic analyses. For example, say two closely related species (but NOT sister species; other species are more closely related to each) both experience directional selection for increased body size. If you use a gene involved in that increased body size to try to estimate the phylogeny, they may well have similar mutations due to similar selection for larger body size. Thus, it's possible that you recover an inaccurate phylogeny with those two larger species as sister taxa because they experienced similar selective pressures for larger body size. There are also some genes, such as Major Histocompatibility Complex (MHC), where there is strong 'balancing selection' - selection for variability (MHC is involved in immune response, so individuals with lots of different MHC alleles are better able to fight off infection than individuals with a bunch of copies of the same MHC alleles). Balancing selection can also really screw up analyses, and result in a big comb phylogeny with little support for relationships. The patterns caused by balancing or directional selection really screw with analyses, and it can be really difficult to detect them and account for them, hence why genes that are either selectively neutral or under weak purifying selection (i.e. mutations that would change the protein being coded for are selected against) tend to be a lot better for estimating phylogeny - we can model how neutral sequence evolves really well, and other kinds of selection can really mess things up. Anyhow, getting back to mtDNA: it tends to work well because it doesn't typically experience the types of selection that can screw up phylogenies.
That being said, it's still a single locus, a single estimate of the relationships, so it's got the issues of incomplete lineage sorting that Squam explained.

I disagree. Most of the commonly used mitochondrial markers are coding loci and are unquestionably under selection... and it isn't just directional or diversifying selection that can cause problems, purifying selection can also screw up phylogenetic analyses. AFAIK, few people--if any--make an effort in phylogenetic studies to quantify how strong that selection is, so I don't think saying that it is typically weak and therefore unproblematic is warranted. Purifying selection's main effect is to limit the available character space--most of the possible sequences are not available because they would not yield a functioning protein. As available character space goes down, the likelihood of homoplasy--and thus poor phylogenetic inference--increases. The inverse is true for mutation rates; as mutation rates go up, the likelihood of homoplasy increases. Commonly-used mtDNA markers have both small available character space and high mutation rates. Among sufficiently closely related terminals, this is probably not a significant problem; for deeper nodes, it's a big problem.
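The link between character space, mutation rate, and homoplasy can be sketched with a simple symmetric model (an assumption for illustration, not something taken from any of the papers discussed): every site can take one of k equally likely states, and d is the true number of substitutions per site separating two sequences. The observed proportion of differing sites is then ((k-1)/k)·(1 - exp(-k·d/(k-1))), which is the Jukes-Cantor formula when k = 4. The gap between d and the observed difference is the change hidden by repeated hits.

```python
import math


def expected_observed_difference(true_subs_per_site, n_states):
    """Expected proportion of differing sites under a symmetric k-state model."""
    k = n_states
    return ((k - 1) / k) * (1.0 - math.exp(-k * true_subs_per_site / (k - 1)))


# Fewer usable states (constraint) or more substitutions (fast rate, deep
# nodes) both widen the gap between true and observed change.
for k in (4, 2):
    for d in (0.05, 0.5, 2.0):
        observed = expected_observed_difference(d, k)
        print(f"states={k}  true={d:<4}  observed={observed:.3f}  hidden={d - observed:.3f}")
```

With little true change, almost nothing is hidden whichever k is used, which matches the point that this matters less among closely related terminals and much more at deeper nodes.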

Quote:

There are also cases, not super common, but it does happen that mitochondrial genomes introgress from one species to another following a hybridization event. Again, this movement of mitochondrial genomes from one species to another is relatively uncommon, so it's definitely not something that should be latched onto without strong additional data, such as from multiple nuclear loci, but it is something that happens. So, we're moving more and more away from mtDNA only studies, and relying more and more on multiple loci, and using methods that account for the possible differences between gene trees to estimate the species-level phylogeny (i.e. the 'species tree' methods Squam mentioned).

I don't think this is nearly as rare as people think it is. It isn't reported too frequently--but then, datasets involving strong nuclear and mitochondrial topologies that would allow us to see this kind of anomalous pattern in mtDNA are still uncommon (and downright rare in the herp world), so the safest statement would probably be that we know it occurs but do not have enough data to allow us to confidently infer that it is either rare or common.

Quote:

Anyway, when you do have discordance between, say, the mtDNA and morphology (such as between mtDNA and coloration in the eastern Pantherophis), I think it depends heavily on what the other data are. If it's a character that likely evolves fairly rapidly and is likely to change a lot, like coloration, I'd rely more heavily on the mtDNA, but if it's something that is unlikely to change much, such as the structure of a bone or the place where some muscle attaches, discordance with mtDNA would probably raise my eyebrows and get me thinking in more detail about what's going on. So, for example, the eastern Pantherophis, their color pattern probably evolves fairly rapidly.

OTOH, mtDNA also evolves rapidly; that isn't really a difference here. AFAICT, the main unique problem of rapidly evolving morphological characters is generally smaller available character space (purifying selection dramatically reduces character space compared to randomly evolving loci, but even if variation is largely limited to third codon positions we're still talking about a much greater range of possible sequences than, e.g., the 4-5 basic kinds of coloration in Pantherophis obsoletus). When directional selection is likely (which does seem to be the case in Pantherophis obsoletus) this is an additional problem, as directional selection (at least when observed in the phenotype; it's less clear when we're talking about genotypes, as there may be a whole slew of different possible mutations that can lead independently to a single observed change in phenotype, making directional selection on a phenotype much less likely to result in homoplasy in nucleotide sequences) is more likely to actively create homoplasy rather than merely increasing the likelihood of chance homoplasy.

Quote:

Sampling gaps are always a pain, and it's difficult to tell if you've got a gradient from one mtDNA lineage to another, or if you'll end up with a nice clear boundary between the two with better sampling. It's certainly a sign that more sampling is necessary to tease apart what exactly is happening, but when you have two well supported clades that are as deeply divergent as in the case of the getula complex (I didn't see a table with the divergence in the getula paper, but they estimate, using a fossil calibration, that they diverged about 4 Mya, so they're pretty divergent, probably something on the order of 6-10%), it's more likely that, with better sampling, you'll see a relatively clean boundary between the two, possibly a hybrid zone, or even a nice clean boundary with no evidence of hybridization. When they're that divergent, I find it unlikely that it would be a gradient from one mtDNA lineage to the other, and I think most evidence from squamates supports this.

There are two cases (references at the end of this post) I know of in which people have looked closely at contact zones after putative species were defined based on mtDNA topologies. In both cases there were not nice clean divisions between the putative species, but instead mtDNA haplotypes from each species were extensively mixed within populations. I haven't been reading the recent herp literature much, so maybe there are now cases in which those mtDNA boundaries do hold up on closer examination... but the only studies I've seen indicate the opposite.

It's also not entirely clear that we should expect the degree of introgression in contact zones to scale with mtDNA divergence. MtDNA sequence variation at the commonly-used markers would not, in itself, be expected to have any influence on reproductive isolation. We have to assume that it is correlated with other changes in the genome that do influence reproductive isolation. That seems like a reasonable guess, but I'm not sure it's much more than that or that we can justify a particular percent-divergence cutoff.

Directly addressing a couple of Cole's questions:

Cole Grover wrote:

1. Couldn't/shouldn't a discordance of mtDNA data (such as that from Cytochrome b) with other available data (geographical, morphological, ecological, and even other biochemical data) just as easily signal why it should not be relied upon so heavily as signal the "wrong-ness" of all other data? (Yes, I know "not all data are created equal") The example given in question 4 would apply here, too, though I could dig up other examples if needed.

Discordance doesn't necessarily mean any particular source of information is "wrong"... but when dealing with very closely-related individuals, it does suggest that these individuals all belong to the same species. Independent sorting of different characters is what we should expect within a species. It should not (at least, not without various mitigating circumstances--e.g., incomplete lineage sorting) be occurring in a set of distinct species, and all else equal should be taken as evidence against the hypothesis that we are in fact looking at distinct species.

Quote:

4. How about when members of the same population don't cluster together in an mtDNA phylogeny? Couldn't/Shouldn't that signal that the "preferred tree" ought not to be preferred? (an example here is http://www.naherpetology.org/pdf_files/711.pdf ). I guess my question boils down to what is the null in these situations? Shouldn't the default be to reject the discordant or incomplete data set?

My answer here is about the same as above--if members of the same population contain divergent mtDNA haplotypes, that is not evidence that there is something wrong with the mtDNA tree. All else equal, our assumption should be that the mtDNA topology is correctly telling us to reject the hypothesis that we're dealing with distinct species rather than variation within a single species.

The authors of that paper basically get it right:

Bryson et al. wrote:

Our results suggest that the current recognition of L. mexicana and L. triangulum may be incongruent with the evolutionary history of these two groups.

Probably Lampropeltis alterna and Lampropeltis ruthveni should be included in that statement (which may have been their intent; it's not entirely clear in a quick reading if they mean just Lampropeltis mexicana in the narrow sense, or the L. mexicana group, including L. alterna and L. ruthveni). However, the authors are understandably (although, for mtDNA studies, unusually) reluctant to make taxonomic changes from a single locus topology and give the standard "more research is needed":

Quote:

Further research using nuclear markers to assess gene flow among these lineages will be necessary to determine if the currently recognized taxa do represent species and if the mtDNA data are indeed in error.

Post subject: Re: can someone dumb down how we use DNA analysis to disting

Posted: March 21st, 2012, 12:06 pm

Joined: June 7th, 2010, 2:23 pm | Posts: 104

Squam8 wrote:

Joe, excellent question, although not an easy answer. The truth is, there is no single way that scientists delimit species boundaries. There are new computer programs (Bayesian species delimitation) out now that "objectively" identify species given some genetic data and a phylogeny. However, these are very new and still controversial, especially among the "old school" morphologists.

In that particular case, Bayesian species delimitation ("BSD") should be controversial to anyone who's read the paper (citation below; unfortunately, the paper is a bit dense & confusing!). The problem is twofold:

1) The results of BSD analysis (specifically, how many species it tells you there are) depend on estimates of a couple of parameters (population size & age of a common ancestral population for the individuals included in the study) for which we simply do not have good estimates. So... Leaché & Fujita guessed.

2) The results of BSD analysis also depend on a guide tree fed into the method. The short version is this: you produce a tree of inferred relationships among all individuals. Then you lump individuals together into groups and simplify the tree to represent only the inferred relationships between those groups. How that lumping process works in the Leaché & Fujita paper varies between different analyses they conducted, from fairly objective & straightforward (each site is treated as a group, yielding 10 groups) to "why on earth would you want to do this?" (running the data through a separate program, Structurama, to infer the number of species and relationships between them, and then feeding those inferences into BSD, which apparently results in BSD confirming the results you already got from Structurama... in this case, BSD doesn't tell you anything new, but just spits back whatever you already told it). The second, baffling process is what Leaché & Fujita decided to go with in naming putative species.

The short version is that what you get out of BSD analysis depends largely on what you put into it. Some of that stuff you put into it is guesswork, some of it comes from baffling and rather circular methodology. To a large extent this is due to the basic oddity of Bayesian statistics. In any Bayesian analysis, the results you get back depend on your prior estimates (or guesses) of what's going on. That's not a bug, it's a feature--it's what this approach to statistics is explicitly designed and intended to do. Bayesian statistics is intended to address the following kind of question: I think I already know what's going on here; should I change my opinion based on these data? In this context, that isn't the right question.
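To make the "what you get out depends on what you put in" point concrete, here is a bare-bones application of Bayes' rule to two competing hypotheses ("two species" versus "one species"). This is not the BSD algorithm itself, just the underlying arithmetic, and the likelihood values are invented for illustration.

```python
def posterior_two_species(prior_two_species, likelihood_two, likelihood_one):
    """Posterior probability of 'two species' from a prior and two likelihoods."""
    numerator = prior_two_species * likelihood_two
    denominator = numerator + (1.0 - prior_two_species) * likelihood_one
    return numerator / denominator


# Same (made-up) data, three different starting opinions.
for prior in (0.1, 0.5, 0.9):
    post = posterior_two_species(prior, likelihood_two=0.3, likelihood_one=0.1)
    print(f"prior = {prior:.1f} -> posterior = {post:.2f}")
```

The data term is identical in all three runs, yet the conclusion moves with the prior. With strong data the prior washes out; with modest data and guessed priors, as described above, it may not.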

(There are also sampling issues in the Leaché & Fujita paper. The number of populations per putative species is very low--either one or three. That problem doesn't bear on the applicability of the method, but would be a reason for caution in accepting the results even if we were sure the method itself was fine.)

Post subject: Re: can someone dumb down how we use DNA analysis to disting

Posted: March 21st, 2012, 2:36 pm

Joined: June 8th, 2010, 9:06 am | Posts: 737 | Location: Montana

Eric, Squam8, Patrick - thanks for the responses. This thread is killer.

Patrick,

paalexan wrote:

Cole Grover wrote: 1. Couldn't/shouldn't a discordance of mtDNA data (such as that from Cytochrome b) with other available data (geographical, morphological, ecological, and even other biochemical data) just as easily signal why it should not be relied upon so heavily as signal the "wrong-ness" of all other data? (Yes, I know "not all data are created equal") The example given in question 4 would apply here, too, though I could dig up other examples if needed.

Discordance doesn't necessarily mean any particular source of information is "wrong"... but when dealing with very closely-related individuals, it does suggest that these individuals all belong to the same species. Independent sorting of different characters is what we should expect within a species. It should not (at least, not without various mitigating circumstances--e.g., incomplete lineage sorting) be occurring in a set of distinct species, and all else equal should be taken as evidence against the hypothesis that we are in fact looking at distinct species.

I think I'm on the same page with you here - what I was trying to say is that a discordance between mtDNA and "other data" doesn't necessarily mean that the "other data" are wrong (which has been an often-arrived-at conclusion in many of these recent herp phylogeny papers).

Post subject: Re: can someone dumb down how we use DNA analysis to disting

Posted: March 21st, 2012, 2:55 pm

Joined: June 7th, 2010, 2:23 pm | Posts: 104

Cole Grover wrote:

I think I'm on the same page with you here - what I was trying to say is that a discordance between mtDNA and "other data" doesn't necessarily mean that the "other data" are wrong (which has been an often-arrived-at conclusion in many of these recent herp phylogeny papers).

Yeah, absolutely. Conflict between datasets doesn't mean any of them are wrong, and the existence of that kind of conflict is an additional source of biologically meaningful information.

To determine that you're looking at distinct species, what you really want are independent and congruent sources of information.