In 2006 I aligned some intronic sequences obtained by Margarita Beltrán and Kanchon Dasmahapatra from Heliconius erato and melpomene group species, together with Laparus and Neruda. They suggest that Laparus and especially Neruda may have basal sequences at this locus, which was interesting at the time.

However, the power was very low for such low amounts of sequence data, and there was not very strong support for the assertion that Laparus and Neruda are basal to Heliconius. This remained unpublished.

In view of current genomics alignment discussions I thought it could be useful to share these early low-resolution attempts with you all permanently here.

Although butterflies are often thought to be very visual organisms, they also rely heavily on chemical information from their environment. Chemical information can be crucial in important decisions such as choosing a potential mate, or when a female chooses where to lay her eggs. Recent work in Heliconius has characterized some of the compounds used during mating .

Led by Adriana Briscoe, genomic approaches have been used to identify and characterise the genes involved in chemical detection. Odorant-binding proteins (OBPs), chemosensory proteins (CSPs) and odorant receptors (ORs) are families of genes known to play an important role in chemosensation. These are large gene families whose diversity is hypothesized to reflect ecological niches occupied by different insects. To explore how ecological divergence between moths and butterflies might be reflected in these groups of proteins we have comprehensively surveyed OBPs, CSPs, and ORs in H. melpomene as well as B. mori and D. plexippus. The gene predictions published in the genome paper are available here.

Ongoing work is now underway to study another important family of proteins, the Gustatory Receptors.

This is to announce that there will be a meeting of the Consortium on July 12-13 in Cambridge Mass. organised by Jim Mallet. Please contact Jim if you would like to attend.

This will be the first meeting since we published the genome paper last year, and the goal is to start the ball rolling for the future – in particular which genomes should we sequence next, how will we deal with databasing and data sharing issues in the era of multiple genomes, and in general to promote discussion about ongoing projects in all the Heliconius labs.

This morning there was a great article in the NY Times about the history of mimicry research and recent discoveries in Heliconius by Sean Carroll, the evo-devo guru. A great pat on the back for the whole Heliconius community!

In the last few years, the Heliconius community has generated a large quantity of next generation sequencing data, mostly from Illumina sequencers. One of the first steps in the analysis of this data is the alignment of short reads to reference sequences, whether BAC sequences, transcriptomes or whole genomes. There are over 70 read alignment tools available (Fonseca 2012), which has been thought to be too many. Which one(s) should we use for Heliconius butterfly data?

Several popular, fast aligners such as BWA and Bowtie are known to work well on human genomes. But Heliconius genomes, radiating over around 10 million years, are considerably more divergent than human genomes. In July 2011, I ran some aligner tests to see what the impact of divergence was on aligner performance. In those tests, Stampy performed considerably better than other aligners. Since then, we have mostly used Stampy for read alignment, notably for (I think) all of the alignments in the Heliconius genome + introgression paper.

Stampy has a serious drawback, which is that it is extremely slow compared to almost all other aligners – four or five times slower than Bowtie. This has been very inconvenient in the past and is only getting worse as data set size increases. I am about to analyse 1.8 billion reads from 70 whole genomes. Aligning this amount of data with Stampy will take many weeks even over many tens of compute cores. There have been several new aligner tools released in the time since I did the last test, so I thought it was time for another comparison.

Be warned, this is a very basic comparison done over a few days, with many flaws. If it was good enough to publish properly I would do so. It really only gets the ball rolling again, but I don’t really have the data to come to any firm conclusions, just to outline what the problems are.

Sequence Data

Three whole genome sequences were used, one from a Heliconius melpomene melpomene x Heliconius melpomene rosina cross, one from Heliconius ethilla (09-49) and one from Heliconius erato erato (NCS_2556). These three genomes should be increasingly divergent from the Heliconius melpomene melopmene reference genome (see tree at Tree of Life). Each sample has 4m paired end 100bp HiSeq reads or just over. This is only about 3 times coverage of the genome, so we don’t expect to see every base covered or very accurate SNPs called. But using such limited data means the aligners can be tested relatively quickly. (I leave for another day the serious issue of what we might expect from an alignment of H. erato and H. melpomene, and how much sense it makes to align erato or silvaniform clade genomes to the H. melpomene reference, given their somewhat large divergence.)

Analysis

1. Align with mostly default options.

2. Convert to sorted BAM with readgroups using Picard AddOrReplaceReadGroups.

4. Run GATK UnifiedGenotyper to call reference and SNP bases. Use -out_mode EMIT_ALL_CONFIDENT_SITES, -stand_call_conf=1 and -stand_emit_conf=1; the samples used are very low coverage so good SNPs are likely to have qualities lower than the default threshold of 30. 1 is too low, but it may reveal weaknesses in the aligners.

5. Hack coverage and SNPs at various different qualities using one line Perl scripts.

This analysis is OK as far as it goes, but there are several problems with it.

1. It should be possible to improve aligner performance by tuning parameters. I did try a handful of alternative running modes (Bowtie 2’s end-to-end and local, for example) but have not searched numerical parameter spaces at all (see comment below on BWA).

2. Reporting what GATK makes of an alignment does not necessarily reflect the quality of the alignment (see Mihaela Pertea and Steven Salzberg’s takedown of a similar comparison).

3. Mapping quality scores and things like ‘properly paired’ counts are assigned by the aligners themselves and may not reflect actual quality; it would be possible to write an aligner that dumped reads randomly but assigned maximum mapping qualities, which would come out quite well on many of the following metrics (although it might be expected that GATK would call a large quantity of spurious SNPs, of which more later).

4. Performance times are unlikely to be accurate. All jobs were run on a 48-core server (4x 12-core 2.2GHz AMD Opteron 6174s) with 514 Mb RAM. Alignment jobs were run in parallel on 36 cores (for all jobs except bwa sampe, which can only be run on a single core). The server is quiet, but it is quite possible that other jobs or IO bottlenecks slowed down the alignment jobs, which were only run once. Or input data may have been loaded into RAM for later alignments. Still, I expect the orders of magnitude to be correct.

5. There are many other metrics one can think of to assess aligners, many of which have been used in other aligner comparisons, but I don’t like simulated data, and we do not have a well-validated set of real reads with known positions to test against at present.

Aligners

With over 70 aligners available, which to choose? I have, of course, been far from comprehensive. I really just wanted to try Bowtie 2, but I did attempt a mildly systematic search for other potentially useful aligners. I used the HTS Mappers page and limited myself to DNA aligners that process Illumina data, are quality aware, handle paired ends and run multithreaded. Note that I probably ignored several good mappers by doing this; for example, I would have excluded Stampy because it is listed as not being multithreaded, as this feature has only recently been added. I then excluded several aligners because previously published data showed they were likely to be inferior to other aligners (for example, the SeqAlto paper shows SNAP to be very fast but perform poorly against most other aligners, particularly on divergent data, so it was unlikely to be useful for Heliconius data). I also ignored a few because they appeared to be abandoned or were commercial (CLC, Novoalign – Novoalign tests in July 2011 showed it to perform worse than Stampy for our purposes, as also shown in the Stampy paper). If I have missed your favourite aligner and you think it is worth testing on divergent data, please do let me know.

BWA is used throughout as a kind of control, as it is what we used to use and what is still widely used by many others (and recommended by GATK for example). It does not perform very well in any test, but perhaps it could be tuned to do much better, as it uses a seed of 32 bases which is very large for divergent data. But as it is often tempting to simply run using defaults, it is worth seeing just how bad the performance can be with divergent genomes.

Results

H. m. melpomene x H. m. rosina read mappings

Most of the aligners have been tested on the H. m. melpomene x H. m. rosina cross individual, as that’s the dataset I’m most interested in now. I took only the best performing aligners on to the H. ethilla and H. erato comparisons, assuming that aligners that perform badly on nearby genomes will do worse on divergent genomes.

[table id=1 /]

Timing notes: the BWA timings show times for the aln step (one for each read) plus the sampe step. Early versions of Stampy could use BWA in a hybrid mode, rapidly mapping easy reads with BWA and then mapping the remaining unmapped reads with Stampy. Recent versions can now take any BAM file as input for this hybrid mapping, so any aligner can be used in this way. The X + Stampy rows show this hybrid mapping, with the time for aligner X given first and the subsequent Stampy run second.

Looking at numbers of reads mapped is not particularly informative, because exact mappers like BWA and Bowtie 2 with the default end-to-end mapping setting (where the whole read has to map) will inevitably do worse than the partial mappers like Stampy and Bowtie 2 with the local mapping setting (where only part of a read has to map). But it does indicate that Bowtie 2 with local and very-sensitive settings and SMALT are both competitive with Stampy, whereas SeqAlto is not, on this data at least. It also shows that small improvements can be achieved by running Stampy in hybrid mode with any other aligner, but the performance improvement is not that great. Sadly, running Stampy in fast mode achieved nothing like the 50% speed up the manual claims, with slightly degraded results. Both Stampy runs lag way, way behind Bowtie 2, SMALT and SeqAlto; it will make life much easier if we can convince ourselves that one of these aligners is more accurate.

H. ethilla and H. erato read mappings

Only BWA (as a control), Bowtie 2, SMALT and Stampy were tested on the more divergent genomes. Bowtie 2 was run with the local and verysensitive options as these options did not degrade performance very much in the H. melpomene tests.

[table id=3 /]

[table id=4 /]

Clearly BWA fails horribly on H. erato and does badly on H. ethilla. Bowtie 2, SMALT and Stampy all appear to be competitive, and cannot be separated on these metrics. While SMALT maps more reads, Bowtie 2 has more properly paired. As expected, Stampy’s performance is poor; Bowtie 2 is fastest, but SMALT is acceptable.

How many bases mapped?

As mentioned above, counting number of reads mapped is not very informative when partial mapping is allowed. It is much better to count number of bases mapped. The following charts show the total numbers of bases mapped for H. melpomene, H. ethilla and H. erato, broken down by mapping quality as recorded by GATK in the MQ field. Each bin value is a lower bound, so ’30’ means all bases with mapping qualities from 30 to 39.

This chart shows that, with data from similar genomes, all aligners do reasonably well. Actually, quite a lot of bases are not covered across the genome (around 200 million out of 273 million), but that’s presumably because only 4 million reads have been aligned. SMALT actually performs slightly better than Stampy (an extra half a megabase aligned), but any of the aligners would do a good job. Using Stampy in hybrid mode does not beat the SMALT result, except when paired with SMALT, when an additional megabase can be aligned, at the cost of quadrupling the running time.

Also, intriguingly, the chart shows how varied mapping qualities can be. BWA and SMALT don’t call anything over 60, Bowtie 2 doesn’t call anything over 40, but Stampy hands out a large amount of qualities in the 90s. Clearly there is some degree of bullshitting here, if we interpret mapping qualities as true phred scores – a mapping quality of 99 would mean there is a 1 in 10 billion chance of it being wrong. However, I don’t necessarily see a problem with this as long as thresholds are chosen appropriately. For example, in the Heliconius genome paper, we used a threshold of 67 for mapping qualities. This would have excluded all base calls from BWA, but it worked well for Stampy.

This effect also means that it is possible comparisons such as Heng Li’s ROCs are not entirely appropriate; for example, perhaps these curves retain a lot of Stampy bases at Qs of 20 or 30, which Stampy considers to be poor quality bases that should be rejected, and if a higher threshold was used it would produce a much better curve. But then, perhaps it shouldn’t output such high mapping qualities.

For H. ethilla and H. erato, there is a reduction in number of bases aligned as diversity increases, as expected, but Stampy maps several million more bases than any other aligner. Both Bowtie 2 and Smalt do reasonably well, however, and, depending on the analysis, it may be preferable to trade coverage for performance.

How many SNPs mapped?

The variations in a genome are usually our useful data points, rather than the number of bases. The following charts show the number of SNPs called by GATK and their mapping qualities.

Interpreting this chart is not straightforward because we have no way of deciding whether the SNPs are real or not, so it is not clear what is better. The Stampy alignment produces the most SNPs, but if it places a lot of reads incorrectly with highly exaggerated SNP calls, then GATK will call a lot of high quality SNPs from that alignment.

It may help to compare the number of bases mapped to the number of SNPs called. We might expect that the number of SNPs called should shrink or grow proportionally to the number of bases called. However, although Stampy aligns half a million more bases than SMALT, it also produces half a million more SNPs than SMALT, which is considerably more than we might expect proportionally. This makes me suspect that SMALT might be a sensible, conservative alternative here, rather than simply going for the aligner with the most SNPs. But then, maybe BWA would be an even more sensible option, as it maps most of the bases but apparently creates many fewer SNPs, with lower mapping qualities overall.

This indicates that the extra few percent of mapping that Stampy does is actually mostly erroneous, at least for nearby genomes. So to reduce false positives, SMALT appears to be a better option. But this is a test using simulated data by the authors of the tool, so may not be true in the wild (and may be falling foul of the mapping quality exaggeration issue discussed above – perhaps we could easily exclude the erroneous mappings by just raising our mapping quality threshold).

For H. ethilla and H. erato, the number of SNPs called is more in line with the number of bases aligned, and Stampy has a considerable edge over the other aligners, as it did for number of bases mapped. (As we would hope, H. ethilla has more SNPs than H. melpomene. However, H. erato actually has less, presumably because the most divergent regions simply don’t align.)

Discussion

At present, the data above don’t give us enough information to make a choice once and for all. I think I would still prefer to use Stampy over other aligners if I was working with silvaniform clade or erato clade genomes and I had to align them to the H. melpomene genome. But for aligning melpomene clade genomes, I will probably use SMALT for now, and am using SMALT for the current whole genome alignments. It provides the best tradeoff between performance and coverage (and apparently accuracy, but it is too early to confirm that). It means I can get this set of alignments done in a week, rather than two to three weeks with Stampy.

There is obviously a lot more that could be done here, but I wonder if it is worth doing. Obviously if erroneous data is getting into our analyses, that’s a problem, especially if tools are labelling erroneous data with inflated quality scores. But I think I would prefer to figure out better biological filters to apply to the data after alignment than taking several weeks of work to assess aligners in more detail.

However, I would be keen to at least pool opinions on this – should we validate our aligners in more detail? Have we seen any evidence of systematic errors getting into final data sets because of aligner problems? Are there other simple tests we could or should do that would settle the issue?

1) Mallet redefined species to allow hybridization between them! p. 4: The relevance of one third of these, hybrids between H. himera and H. erato, is thrown into question by Brower on the basis that himera was only elevated to a species by Mallet himself, after designing a new species concept to allow hybridization.

An email to Brower queried this criticism: surely himera is separate from erato under most peoples’ ideas of species? His reply, on 14 Dec 2012: “No dispute that the himera x erato specimens are hybrids, or even that the two are different species under your (or my) species concept.” So Brower agrees that these are hybrids between good species, in spite of the snarky suggestion he puts forward.

As this is one of the first points made by Brower in his 2012 critique, one immediately wonders how well the rest of his criticisms will hold up. As we shall see, not very well.

2) Some butterfly collectors were dishonest! p. 4: Older specimens in the database often have poor locality information, and may have been collected by dishonest butterfly-mad people like Anton Fassl, who stole butterfly specimens from the Vienna Museum in 1906. [This criticism could be divided into two, based (i) on faulty locality data, and based (ii) on dishonesty of collectors].

Are hybrid specimens collected, or maybe stolen by dishonest butterfly enthusiasts thereby made less hybrid? I don’t think so.

Brower produces no evidence at all that these older museum hybrids were fraudulently produced in captivity. The example of kleptomania by a butterfly collector in Vienna (as documented by Brower) is no more relevant to the finding of hybridization in the wild than is the dishonesty of bankers or politicians.

A number of key hybrids in the dataset were named as separate species. They were not even recognized as hybrids before the systematics of Heliconius was properly sorted out in the 1960s and 1970s. If these earlier entomologists didn’t know that these rare specimens were hybrids, how could they have known that it was possible to produce such specimens via hybridization in captivity?

3) The rest of the hybrids were also artificially produced! p. 4: For newer specimens (post 1960, according to Brower), “it is not unreasonable to suspect that many of these oddities could have been captive-reared for the butterfly trade, despite [Mallet et al.‘s] assurances to the contrary”.

To obtain hybrids between species in captivity, one must have a number of species flying around as adults in suitable multi-generation breeding facilities for long periods of time. Interspecific hybrids occur in small cultures only extremely rarely.

No such suitable enclosures existed, except in scientific establishments (and there only since the 1960s), prior to the 1980s. I believe the first commercial butterfly house to open world-wide was the London Butterfly House at Syon Park, London, in 1981. It was the brainchild of property magnate Clive Farrell (see Wikipedia for details). Hybrid specimens occurring after the mid-1980s have been reared in captivity. However, this potential problem was clearly outlined in the paper by Mallet et al. (2007), who selected only specimens with bona fide credentials. A number of specimens were excluded as they were clearly laboratory-reared hybrids (See supplementary information in Mallet et al. 2007).

J. Mallet and M. Linares have personally met some of the Colombian collectors of these hybrids (Mallet et al. 2007). Many of the hybrids in Ernesto Schmidt-Mumm’s collection, for example, were collected by Ernesto himself before he died, as he personally described to us. Ernesto never reared any Lepidoptera at all — they were all caught on the wing. At the time he was actively collecting, he was the only butterfly expert capable of distinguishing a hybrid, given that knowledge of the taxonomy of Heliconius was in its infancy in Colombia at the time. Keith Brown also records personally collecting a cydno-melpomene hybrid at Victoria, Caldas, Colombia, where Dr. Schmidt-Mumm had his ranch (Brown & Mielke 1972 p. 10).

A number of hybrids have in fact been collected by Heliconius biologists in the field, in spite of Brower’s contrary assertion. See comments on an ethilla x melpomene hybrid from Peru in (4) below. Mathieu Joron (pers. comm.) and Chris Jiggins (pers. comm.) have also collected hybrids in the wild recently, but only Brower felt that their existence was controversial, so no systematic investigations were felt necessary. I figure below one of these recent hybrids, a melpomene x numata hybrid.

The idea that many or most of these hybrids, collected by so many entomologists in so many different localities, not just in Colombia, but in a variety of locations from Mexico to the extreme South of Brazil, were produced fraudulently is laughable.

Even if some interspecific hybrids documented by Mallet et al. (2007) post-1980 were produced in captivity, this cannot apply to hybrids before this date.

4) Therefore, no hybrids ever occurred in the wild at all! p. 4: Conclusion to this section “In sum, so many of the Mallet et al. [(2007)] records are dubious, at least for H. cydno-H. melpomene ‘hybrids,’ that this dataset must be discounted as convincing evidence of widespread natural hybridization between those species, or more generally for ‘the species boundary as a continuum.’ ”

Even if some of the hybrids were produced in insectaries, this conclusion is absurd. As shown above, Brower’s conclusion is supported by no evidence whatsoever, but is merely a sort of paranoid conspiracy theory.

In the above-mentioned email (dated 14 Dec 2012), Brower agrees to the veracity of a fairly distant hybrid between Heliconius ethilla and H. melpomene netted by field biologists in Peru, and confirmed as an F1 hybrid using molecular genetic tools (Dasmahapatra et al. 2007). However, he doubts this hybrid is fertile. This particular hybrid is similar to another specimen from Colombia named as a new species, Heliconius hippola, by Hewitson in 1867. A number of similar hybrids between Heliconius ethilla and H. melpomene occur elsewhere, and as is usual in the melpomene-silvaniform group of Heliconius, backcross phenotypes also exist from wild collections and form a substantial fraction of existing hybrid specimens. Even if these other supposed wild caught specimens were artificially created in insectaries (for which there is no evidence), these back-cross specimens indicate fertility of F1 individuals.

Furthermore, there is evidence from insectary crosses as distant as this that backcrossing takes place and can be used to transfer colour patterns across the entire melpomene-silvaniform group. For a series of crosses involving Heliconius hecale x atthis x melpomene (with some x cydno), see Jean-Pierre Vesco’s photos in Mallet et al. (2007). See also other examples in Gilbert (2003).

For the sake of argument, let us suppose that ALL the Mallet et al. (2007) wild-caught hybrids were artificially created in captivity by fraudulent dealers. (Again, I reiterate that the idea is laughable, and only Brower would seem to argue for this). Then even so, these specimens still provide excellent evidence for the possibility of hybridization and backcrossing in the field.

5) Allele sharing gives no evidence of introgression! p. 4: “… Explained at least as well by retention of ancestral polymorphism as by recent introgressive hybridization”. “Thus, oft-cited claims of ongoing, evolutionarily significant gene flow between H. cydno and H. melpomene should be viewed with circumspection.”

All of us are well aware of the possibility that ancestral polymorphism may be confused with true genealogical reticulation due to gene flow. This is well recognized in the coalescent-based IM (isolation and migration) algorithm proposed by Jody Hey, Rasmus Nielsen and others to investigate evidence for DNA sequence flow among species. A number of studies on Heliconius have used this methodology on gene fragments revealed by PCR and Sanger sequencing (Bull et al. 2006, Kronforst et al. 2006, Mavárez et al. 2006, Kronforst 2008). These studies have concluded that some, but not all loci do indeed flow among closely related species.

The interpretation of such data is fraught. Apparent genealogical reticulation may result from ancestral polymorphisms, rather than gene flow, and deviations from the very strict assumptions used in the algorithm may indeed give spurious results.

Nonetheless, an inference of gene flow using IM does provide some evidence for gene flow. It is certainly better to use an IM algorithm which acknowledges the possibility that allele sharing may be due to ancestral polymorphism as well as gene flow, than to ignore the possibility altogether!

A typical argument against an IM-based inference of gene flow might be that the locus Mpi, to give a concrete example, is under balancing selection. Therefore related species may inherit multiple balanced alleles from a common ancestor. If so, is there evidence that this is taking place? In the loci studied by Bull et al., Kronforst et al., and Mavárez et al., and inferred to be flowing between species to date, no evidence for such balancing selection is forthcoming. Brower produces no new evidence to refute either the neutrality of variation at these loci, or the inference of gene flow.

C) Heliconius heurippa as a homoploid hybrid species, with red markings coming from H. melpomene, and yellow markings from H. cydno or a related species

7) If the allelic sharing of a new species with its parents isn’t exactly 50%, we must conclude it is not hybrid speciation! p. 5: ” ‘Classic’ homoploid hybrids are expected to exhibit mosaic genomes composed of blocks of DNA from the two parental species.” “Mallet and Jiggins et al., recognizing that H. heurippa’s widespread genomic affinity to H. cydno does not fit that model, have relaxed their concepts of HHS.”

Brower again argues here that certain authors, in the course of proposing a novel hypothesis, have tailored their concepts and definitions to fit the data, and then claimed that the hypothesis is proved. This, in Brower’s view, apparently rules out the inferences made by these authors.

Whatever the truth of this allegation about concepts, all parties must surely agree that what was proposed by Mavárez et al. was not that the genomic constitution of Heliconius heurippa consists of 50% melpomene and 50% cydno. Instead, they suggested that H. heurippa is a cydno-related form with some evidence of introgression from melpomene. The introgressed fraction of the genome includes those regions which determine red patches in the forewing band. Definitions, and the precise distinction between “hybrid speciation” and “hybrid trait speciation” are unimportant in this debate.

Keith Brown, after visiting localities near Villavicencio, Colombia, was the first modern author to recognize H. heurippa as a separate species, and to suggest hybrid origin involving cydno and melpomene. It flies alongside H. melpomene, and Brown was able thereby to refute Emsley’s earlier suggestion that heurippa merely represented an infraspecific hybrid within H. melpomene (Brown & Mielke 1972: 10).

Brower’s argument in this case, as in so many others, is simply a red herring.

8) Red patterns ancestral in H. timareta explain the red markings of H. heurippa, not hybridization! p. 5: H. heurippa is somewhat closer genomically to populations currently designated as H. timareta (many races of which have red markings) than to H. cydno (none of which have red markings) (see Nadeau et al. 2012), and therefore red markings may already be present in the cydno/timareta-like ancestor of H. heurippa.

The whole cydno/timareta group is monophyletic (Heliconius Genome Consortium 2012). However, a monophyletic group of populations, mainly West of the Andes and more Northern, is currently referred to as H. cydno. The other monophyletic group includes H. heurippa and populations Southwards, many of which do indeed have red markings, on the eastern and more Southern slopes of the Andes, which are traditionally referred to as H. timareta.

The nearest populations of H. timareta/cydno to the South of H. heurippa (which occurs near Villavicencio, Meta, Colombia) are the “H. cydno cognate” (with red basal spots on the underside, this is certainly a timareta, and is from Río Pato, Colombia) and the rayed H. t. florencia (occurring at Florencia, Caquetá, Colombia) (Giraldo et al. 2008, Fig. 6). The nearest red-banded form of H. timareta sensu lato is distributed farther South, and was named by Brower himself as a separate species H. tristero (Mocoa, Putumayo, Colombia). Since the adjacent races to the South of H. heurippa, the Río Pato taxon and H. t. florencia have yellow, not red-banded forewings, it is not clear how Brower’s argument would explain the origin of H. heurippa’s red band.

9) Even if hybridization was a likely origin of H. heurippa, there was no reason for it to happen, since H. heurippa is non-mimetic! p. 5: H. heurippa is not a mimic; therefore it had no mimicry reason to acquire red patterns from H. melpomene.

This argument is spurious, another red herring. However H. heurippa evolved, it would face the same problem of being probably non-mimetic.

The empirical evidence that H. heurippa acquired its colour pattern via hybridization cannot be trumped by a theoretical argument based on mimicry that it should not occur.

H. heurippa does have a passing resemblance to the co-occuring local mimicry ring consisting of Heliconius numata messene, H. hecale ithaca, Melinaea marsaeus messenina, Melinaea isocomma isocomma, Mechanitis (mazaeus) messenoides, and a number of other species. This may have helped in the establishment of the colour pattern via a weak mimicry effect.

10) Too much data! p. 6: “It is easy to be intimidated by the overwhelming quantity of data and elaborate analyses in genomics publications.” In 2011 Brower had the same problem with overabundance of data: “Salazar et al. (2010) is an example of several disturbing emergent trends in genomic-era publications. The paper alludes to analysis of an enormous amount of data: nearly 45 kb of sequence from 30 individuals, representing nearly 3,000 individual GenBank accessions. The sheer quantity of data … makes results difficult to evaluate critically.”

However, Salazar’s (2010) phenomenally large dataset was achieved via PCR and Sanger sequencing, as used by Brower himself.

In any case, increased amounts of data must surely give more power to those who would like to reject incorrect hypotheses, rather than being a cause for concern as a “disturbing emergent trend.” Brower’s argument seems silly.

11) Homoplasy is the new parsimony! p.5-6: Some homoplasious colour patterns in the genus Heliconius have evolved via selection for mimicry, as opposed to being maintained by selection after hybrid transfer from another species. De novo truly homoplasious mimicry evolution is therefore considered by Brower to be the most parsimonious mode of origin for all similar patterns.

A basic principle of parsimony seems violated by Brower’s argument here. The principle of parsimony is that ad hoc assumptions (such as novel evolution of mimetic phenotypes) should be minimized. It doesn’t mean that parallel evolution never occurs via selection for mimicry. For example, almost certainly the same colour patterns were invented twice in H. erato and H. melpomene. The species that achieved the parallelism was probably H. melpomene, the mimic of H. erato. No hybrids between species as distant as erato and melpomene are known in Heliconius.

But when the same pattern co-occurs in related species known to hybridize and backcross in the wild and in captivity, as here, one suspects that transfer of patterns is possible and indeed likely. The principle of parsimony doesn’t prove transfer, but suggests that the hypothesis of transfer would obviate the need for parallel evolution of the same nucleotide sites.

12) All next-gen sequencing data is suspect! p. 6: “There are issues both of data quality and analytical rigour that raise concerns” in next-gen sequencing data, in general [and therefore by implication as applied to Heliconius as well].

Brower’s angst about newer genomic data in general does not seem a reasonable criticism of the question at hand. See also (10) above.

13) We must ignore any evidence from partial data! p. 6: “Unfortunately, these data matrices contain an enormous amount of missing data”.

The reason for the apparent “missing data” is that very strict filters were applied so as to reject low coverage or poor quality alignments to ensure clean data. All of the original Illumina reads are available for those who would prefer to work with original data.

But after this strict alignment procedure, surely it’s the filtered data that do support these findings that are more important? Given there are now thousands of times more data than we had before, aren’t these next-gen sequencing data thousands of times more powerful than those from PCR and Sanger sequencing?

Inferences must always be made with the data available, even if these form only a sample of the genomic information.

14) I couldn’t see any pattern in their data! p. 6: “Support for grouping of the taxa by wing pattern reveals that they are rife with ambiguity: there is not a single fixed character state difference”

Brower should look at the data more carefully. The ABBA-BABA peaks shown in Fig. 4b,c of Heliconius Genome Consortium (2012) are based entirely on fixed sites. In any case, although some sites may not follow the major phylogenetic patterns shown in Fig. 4d, Brower must be aware that conflicting signals (“rife with ambiguity”, as he calls it here) are entirely normal in phylogenetics, and do not negate results of a well supported analysis. In Fig. 4d, bootstrap support in the centre of the colour-pattern genomic region, which suggests hybrid transfer, is 100% for both postman and rayed colour patterns.

In 2011, Brower argued: “It … may not be possible to obtain a clear understanding of the evolution of mimetic phenotypes in these butterflies until we are able to examine gene genealogies for the genes that are responsible for the wing pattern elements themselves. I predict that the allele producing a red band on the forewing of H. heurippa will not be homologous (IBD) to that of sympatric H. melpomene melpomene, a pattern that would lay to rest the H. heurippa [hybrid speciation] hypothesis.”

We’ve now disproved Brower’s (2011) prediction with two different colour pattern loci in two races of H. timareta, and also in H. elevatus (Heliconius Genome Consortium 2012). There is similar evidence from the red forewing band locus in H. heurippa itself (Pardo-Diaz et al. 2012). But Brower is not satisfied, and is now apparently claiming that these regions must, in effect, be completely identical and fixed at all divergent bases, and not just genealogically identical by descent (“IBD”), to disprove his hypothesis of independent origin. Brower’s view of the purity of species is proving to be a moving target, which can never be falsified.

15) I didn’t understand what kind of phylogenetic analysis they used! p. 6: “Published trees do not make clear how many or what kind of characters support these patterns, nor what models were used to produce the trees.”

Phylogeneticists often criticize other phylogeneticists’ conclusions on the basis of methodology, especially if different kinds of analyses give different results. Here Brower criticizes our methods, and tries, but fails to show that any other method would give a different result. In fact, he agrees that “phylogenetic analyses of various walk segments do yield the published topologies [of the Heliconius Genome Consortium]”.

Neighbor-joining and maximum likelihood methods based on the nucleotide data were clearly indicated in the methods sections of the Heliconius Genome Consortium (2012) paper. The model used in maximum likelihood analysis was standard — GTR + gamma — a complex model useful due to the large amount of data available for each tree.

16) I don’t understand this other analysis either! p. 6: Use of D-statistics to infer introgression “assumes neutrality”, which is not true in regions affecting colour patterns.

D-statistics were used only in the genome-wide analysis of Fig. 3b, where it can be assumed that most of the variable sites were approximately neutral. In any case, it is hard to imagine a model of selection bias towards variable sites that are preferentially shared between the local races of timareta and melpomene in non-colour pattern regions, when this contravenes the species-wide phylogeny, unless those sites have actually been transferred via gene flow locally. Thus, these data provide genome-wide evidence of polymorphic allele sharing between local races of the two species.

Given that this is the case, it becomes an unnecessary hypothesis to argue that the fixed ABBA-BABA sites shared within the colour pattern regions (in Fig. 4b,c) were not transferred with the rest. Given that they will be shared during occasional hybridization events, an influx of adaptive variation with potential value for mimicry must have taken place.

17) Allele sharing gives no evidence of introgression, again, and yet again!! p. 6: “A similar problem [to that in (15)] occurs with use of the statistical program IM and linkage-disequilibrium tests to infer interspecific gene flow.” “When the traits of interest are under selection, as genes responsible for wing patterns manifestly are, then inferences drawn from coalescent-based methods for inferring gene flow that assume neutrality may be unreliable”

See also (5) above. Genomic evidence for allelic sharing in Heliconius Genome Consortium (2012) does not depend on coalescent-based estimates such as the use of the program IM or analysis of linkage disequilibria. The ABBA-BABA tests and D-statistics used are essentially parsimony-based tests of site patterns, and do not employ strict assumptions of neutrality, unlike coalescent-based methods such as IM.

E) “Paradigms and paradoxes:” This section of Brower’s (2012) paper depends on his strong prior bias against introgression. Brower explains that he cannot imagine how introgression might occur under his interpretations of mimicry, homoploid hybrid speciation, and natural selection for mimicry. This, in his view, militates against the empirical, genetic data which suggests it does.

18) Rare events are not possible! p. 6: “How can wing mimetic pattern alleles flow from one species to another (and apparently be the only gene regions that do so), when it has been shown that wing patterns are perhaps the key adaptations responsible for intrinsic maintenance of species boundaries by mate choice?”

First of all, colour pattern regions are not the only genomic regions to show exchange. See Fig. 3 in Heliconius Genome Consortium (2012) and point (16) above, which Brower apparently misunderstands.

Wing patterns are indeed involved in mate choice. (However, the colour pattern itself may not be the strongest barrier; instead other traits such as behaviour and pheromones may be more important). In spite of rather strong barriers to hybridization, hybrids certainly do occur, both in the wild and in captivity. Although hybrids suffer many problems, including female sterility and mimetic disadvantages, male hybrids can and do backcross both in the wild and in captivity. Hybrids effectively bridge the gap between species, and the mating barriers to backcrossing are much weaker than in the original hybrid mating (Naisbit et al. 2001). Colour pattern genes can thus readily be crossed and backcrossed into rather distant species (Gilbert 2003, Mallet et al. 2007).

19) Selection for mimicry disproves speciation! p. 6: “How can wing pattern alleles spread from one species to another when such introgression does not occur across intraspecific hybrid zones in geographically differentiated species in which there are no barriers to interracial hybridization?”

Actually, abundant hybridization and introgression do occur across intraspecific hybrid zones. The whole Amazon basin, for instance is a mass of such clinal polymorphism in most Heliconius species found there (Rosser et al. 2012).

It is of course true that disfavoured colour patterns are usually selected against. Nonetheless, novel patterns do sometimes become established, with the genus Heliconius perhaps holding the record for the evolution of multiple novel warning colour patterns (Mallet 2010).

20) Non-mimetic phenotypes can never establish, and therefore non-mimetic species cannot arise via hybridization either! p. 6: “How can ‘non-mimetic’ phenotypes arise and become fixed as a result of interspecific gene flow when there is a strong selective advantage to phenotypic conformity due to Müllerian mimicry?”

However, non-mimetic patterns do occasionally become established in Heliconius including in the polymorphic H. timareta timareta in Ecuador (see photos above). The argument by Brower applies to any mode of origin of the novel Heliconius heurippa colour pattern, whether via hybridization or not. This particular argument of Brower’s therefore does not single out introgression as a less likely means of heurippa‘s evolution. Another red herring.

Note that this is merely a repeat of Brower’s argument (9) above.

21) Recombination is not possible! p. 6: Non-mimetic phenotypes like Heliconiusheurippa are not “explained by the introgression hypothesis: if selection for mimicry drives the process of introgression, then phenotypes resulting from introgressed alleles should be identical to those of the species from which they came.”

This would be true if mimicry was driving the fixation of introgressed alleles. However, as Brower points out, in this case, H. heurippa is probably non-mimetic, and therefore mimicry would not explain the establishment of its novel pattern. So Brower’s argument is a red herring, as the establishment of the novel colour pattern in H. heurippa has never been dependent on a hypothesis of perfect mimicry.

The possibility of introgression depends on hybridization between species, not on mimicry. The genetic evidence for introgression of colour patterns in Heliconius is independent of any mimicry hypothesis. The establishment of any introgressed colour pattern alleles may sometimes be helped by mimicry, but could also be dependent on other factors for hybrid taxa such as H. heurippa or H. timareta timareta, such as mate choice.

As already noted (see (9) and (20) above), it is possible that the establishment of the hybrid colour pattern in H. heurippa was helped along by approximate similarity of the large Melinaea, Mechanitis, and some other Heliconius with which it co-occurs.

22) Introgression is unlikely because, if it occurred, all Heliconius butterflies would share the same pattern! p. 6: “If wing patterns are promiscuously shared across species boundaries, then why has this not led to fixation of a single, shared aposematic pattern, which would represent a stable, selectively advantageous global optimum for all Heliconius butterflies?”

Once again, a very interesting question, and one which is a recurring theme in our research.

However, this argument is relevant to the evolution of novel colour patterns, whether or not hybridization was involved, and is therefore another red herring in this context. See also (9), (20), and (21) above.

23) Absence in small samples of allelic sharing indicates absence of introgression! p. 6-7: “The absence of introgressed neutral loci (e.g. microsatellites) between H. heurippa and H. melpomene does not fit the pattern of shared genetic material expected if significant hybridization had taken place between those two species or the ‘parental’ H. cydno and H. melpomene populations.”

Note, this argument appears to be the precise converse of (5), (6) and (17)! If these microsatellite loci did show allelic sharing, then presumably the “ancestral polymorphism” argument of (5), (6) and (17) would instead be deployed by Brower against the idea that allelic sharing suggests introgression. Brower’s arguments are constructed so as to be irrefutable whatever the results!

In any case, Brower does not apparently understand the use of Bayesian STRUCTURE analysis in cluster assignments, as used in Mavárez et al. (2006). Although the full microsatellite data may cluster individuals into separate taxa, this does not preclude the existence of multiple alleles at those same microsatellite loci shared between those taxa via introgression. All that is required to obtain STRUCTURE evidence for different clusters is that the allele frequencies of at least some of the loci should be different in each cluster.

24) De novo homoplasious evolution of mimetic colour patterns in different populations is more likely than introgression from another species! p.7: “Further, the biogeographical pattern, with six or seven different H. cydno cognates east of the Andes exhibiting at least three different H. melpomene-like mimetic phenotypes, implies that the extremely unusual genetic phenomena proposed to produce them must have occurred independently in multiple populations.”

Given that hybridization and introgression occur in many different locations where H. cydno or H. timareta overlap with H. melpomene, it would hardly be surprising if occasional hybrid transfers resulted in the promiscuous sharing of colour pattern loci that we observe in the genomic data.

Conclusion

Brower remains unconvinced by the abundant specimens showing evidence of hybridization and introgression among species of Heliconius in nature, and in captivity. Uniquely, for a systematic biologist, Brower argues that data from museum specimens collected over hundreds of years are not valid.

Brower displays a strong preference for what he was perhaps taught as an undergraduate, that species are reproductively isolated, and that species never exchange genes after separation. This belief leads him to discount all evidence ever produced to suggest hybridization and gene flow among Heliconius species. One correspondent wrote to me in an email “it’s almost entertaining to read; in fact, you find yourself wondering with curiosity what he will find to discredit [in] the topic outlined in the title of each paragraph!”

Brower displays such a fanatic determination to dismiss this evidence that it leads him to make many errors of logic in interpreting the genetic and genomic data recently revealed by the Heliconius community.

It is unlikely that anyone will ever persuade Brower that he is wrong, and maybe it’s best not to try. Nonetheless, it seemed to this author important to document for a wider audience, at least informally, just how mistaken all of Brower’s arguments are.

To dismiss the simple finding of hybridization and introgression, Brower has to use so many different arguments against different aspects of the extensive data that it is impossible to answer them all in a reasonable-length article. However, the multiplicity of arguments he uses itself tells against his theme. One wonders why Brower doesn’t come to the much simpler, alternative conclusion that explains all of the data: that hybridization and introgression do indeed occur.

Brower’s articles and views on this topic are in my view becoming unreasonable.