Abstract

We sequenced 665 bp of the Cytochrome C Oxidase I (COI) barcoding marker for 257 specimens and 482 bp of Elongation Factor 1-α (EF1-α) for 237 specimens belonging to the leaf-mining subgenus Ectoedemia (Ectoedemia) in the basal Lepidopteran family Nepticulidae. The dataset includes 45 out of 48 West Palearctic Ectoedemia s. str. species and several species from Africa, North America and Asia. Both COI and EF1-α proved reliable as an alternative to conventional species identification for the majority of species and the combination of both markers can aid in species validation. A clear barcode gap is not present, and in some species large K2P intraspecific pairwise differences are found, up to 6.85% in COI and 2.9% in EF1-α. In the Ectoedemia rubivora species complex, the species E. rubivora, E. arcuatella and E. atricollis share COI barcodes and could only be distinguished by EF1-α. Diagnostic base positions, usually third codon positions, are in this and other cases a useful complement to distance methods for species delimitation. Ectoedemia albifasciella COI barcodes fall into two distinct clusters not related to other characters, whereas these clusters are absent in EF1-α, possibly caused by mtDNA anomalies or hybridisation. In the Ectoedemia subbimaculella complex, both sequences fail to unequivocally distinguish the species E. heringi, E. liechtensteini, E. phyllotomella and one population of E. subbimaculella. DNA barcodes confirm that North American Ectoedemia argyropeza are derived from a European introduction. We strongly advocate the use of a nuclear marker in addition to the universal COI barcode marker for better identifying species, including cryptic ones.

Introduction

The Lepidoptera are one of the megadiverse groups of organisms, with currently more than 157,000 described species (van Nieukerken et al., 2011), but also a group for which there are many specialists and ongoing taxonomic studies (Kristensen et al., 2007). It is therefore no surprise that Lepidoptera were considered particularly useful for DNA barcoding studies, and they have figured importantly in studies on DNA barcoding since the idea was launched in 2003 (Hebert et al., 2003a, 2003b, 2004a; Janzen et al., 2005; Hajibabaei et al., 2006a). The All Lepidoptera barcoding campaign (http://www.lepbarcoding.org/) has resulted in an increasing database of Lepidoptera barcodes (559,920 on October 21, 2011), particularly derived from geographic campaigns in North America, Australia and Europe, and from global campaigns covering the families Sphingidae, Saturniidae and Geometridae (Hebert et al., 2010; Wilson et al., 2011). The method has proven successful for identifying most morphologically recognised species and has many interesting applications. The most frequently cited application is the recognition of cryptic species (Hebert et al., 2004a; Hausmann et al., 2009; Janzen et al., 2009), but conversely barcodes can also confirm that a polyphagous species is indeed one species (Hulcr et al., 2007). DNA barcoding also makes matching of unknown immatures with adults possible (Miller et al., 2006; Janzen et al., 2009), or matching the sexes of dimorphic species (Janzen et al., 2009). Further it allows identification of food remains from gut contents (Matheson et al., 2007) and can even be used to identify specimens in collections that have lost important characters. Despite this success story, there has been criticism of the use of a mitochondrial marker and of this particular part of COI. Roe and Sperling (2007) concluded that the barcoding region does not discriminate species better than other parts of the COI-COII genes and suggested the use of longer sequences.
In Australian Elachistidae several recently diverged species could not be recognised by another 700 bp part of COI (Kaila and Ståhls, 2006) and in some Hesperiidae the differences between species were just three nucleotides (Burns et al., 2007). In general, mitochondrial DNA has particular issues related to the nature of mitochondrial biology: reduced effective population size and introgression, maternal inheritance, inconsistent mutation rate, heteroplasmy, compounding evolutionary processes and nuclear pseudogenes are some of the cited causes for problems in species discrimination (Rubinoff et al., 2006). In such cases mtDNA based clusters can be composed of specimens belonging to different species through introgression (Ballard and Whitlock, 2004; Ballard and Rand, 2005; Stone et al., 2007) and variability within a species can be far larger than between species, thus incorrectly suggesting the presence of cryptic species (Stone et al., 2007).

Various authors have suggested the use of additional genes for DNA barcoding, particularly nuclear genes (Sonnenberg et al., 2007; Zakharov et al., 2009). For discriminating two cryptic species of Crypsiphona (Geometridae), Õunap and Viidalepp (2009) used both COI and EF1-α, but did not comment on differences between the two genes. Dasmahapatra et al. (2009) found that the COI DNA barcode recognised more haplogroups than they could recover with techniques such as AFLP.

We are not aware of any study that tried to compare species discrimination throughout any animal taxonomic group with two or more barcoding markers. We realised that the dataset that we obtained during a phylogenetic study of the subgenus Ectoedemia Busck, 1907 s. str. (Lepidoptera: Nepticulidae) provides an ideal set for just this type of comparison. When using a mitochondrial marker as well as a nuclear marker the concerns with both markers can hopefully be reduced. We studied the COI barcode and a part of the nuclear gene Elongation Factor 1-α, a frequently studied gene in Lepidoptera, providing much phylogenetic signal (Caterino et al., 2000).

The Nepticulidae are one of the most speciose, basal, non-Ditrysian, Lepidopteran families, with currently slightly over 800 named species (van Nieukerken et al., 2011). They comprise tiny moths of only 3-10 mm wingspan of which the larvae are plant-miners, the majority feeding in leaves. The species are almost invariably monophagous or at most oligophagous, and feed particularly on woody plants in the Rosid clade of the Eudicots, with the notion that related species often use related plants (Menken et al., 2010). The genus Ectoedemia is one of the larger genera, divided into a number of subgenera (van Nieukerken, 1986). The subgenus Ectoedemia is mainly Holarctic, with around 90 known species, feeding on a small number of tree families. The 48 Western Palearctic species (including one that is still unnamed) have been fully revised (van Nieukerken, 1985; van Nieukerken et al., 2010), 20 Eastern Palearctic species have been described (Puplesis, 1994), and 18 species were recorded from North America (Wilkinson, 1981; Wilkinson and Newton, 1981), including the European E. argyropeza. Outside the Holarctic region, five species are known from southern Africa (Scoble, 1978, 1979; Mey, 2004) and two from Central America (Puplesis and Robinson, 2000), while some unnamed species from the Oriental region are recorded here.

Nepticulidae are an ideal group for barcode studies since larvae are easily collected within their leafmines, simultaneously providing information on their host plants. Reared adults provide further tests of species identity, but in many cases identification of larvae and leafmines is possible. The subgenus Ectoedemia provides an interesting mix of species that are straightforward to identify in all stages and sexes, species only identifiable by genitalia, and a few species complexes in Europe of which the members are hard to identify at all.

The Ectoedemia angulifasciella complex comprises four species feeding on Rosaceae: E. angulifasciella mainly on Rosa, E. rubivora on Rubus, E. arcuatella on Fragaria and E. atricollis on several Rosaceous trees (Wilkinson et al., 1983; van Nieukerken, 1985). Since they differ in only a few morphological characters, these species are most easily identified by their hostplant. Only E. angulifasciella can be safely identified by its male genitalia.

In the Fagaceae-feeding Ectoedemia subbimaculella group, two complexes occur: the E. albifasciella complex with four nominal species and the E. subbimaculella complex with two to five species (van Nieukerken, 1985; van Nieukerken et al., 2010). The species of the Ectoedemia albifasciella complex can only be identified easily by their female genitalia; males can only be identified from a combination of hostplant, larva and leafmine data when reared, and two species then still cannot be separated. The species of the subbimaculella group are almost inseparable as adults, with a slight difference in the female genitalia that distinguishes E. subbimaculella from the other species (van Nieukerken, 1985). Ectoedemia subbimaculella can also be distinguished from the others by a conspicuous behavioural character in the leafmine, in which the larva makes a slit to prevent waterlogging, but otherwise a combination of hostplant, larva and adult is usually the only way to get an acceptable identification. It would thus be interesting to test the ability of DNA barcoding to separate species in these complexes and to assess whether the taxonomic decisions on the species level are also supported by molecular data. In fact, the identity of at least one of the species, E. liechtensteini, has been questioned (van Nieukerken, 1985; van Nieukerken et al., 2010). Allozyme studies could not separate three of the studied species of the subbimaculella complex (Menken, 1990).

Our phylogenetic studies (Doorenweerd and van Nieukerken, in prep.) have shown that most of the species groups recognised in Europe (van Nieukerken et al., 2010) are recovered as monophyletic, when the small occultella group is included in the angulifasciella group. Here we will therefore use the European group names with the exception of the occultella group. All Western Palearctic species, except three, and in addition three from southern Africa, four from North America, eight from the East Palearctic (including two trans-palearctic species) and approximately five species from Southeast Asia were analysed.

Material and methods

Material

Material was either collected for this project, present in the collections of NCB Naturalis or received from third parties. Typically larvae were collected by searching occupied leafmines, after which individual larvae were immediately conserved in 96 or 100% ethanol (occasionally without ethanol) and later kept in a freezer at minus 80°C. Additional specimens were reared to the adult stage in the laboratory. Cocoons with hibernating larvae of European species were kept in polystyrene rearing containers in an unheated shed and taken indoors at ambient temperatures from March onwards, where the adults finally emerged. Tropical species were reared in temperatures around 25°C and high humidity in a climate cabinet. From small samples often all larvae were preserved directly. The leafmines from which larvae were taken were dried and kept as vouchers. Larvae were identified as far as possible on the basis of external larval characters (such as body colour, head colour, presence of plates) and the larval feeding pattern combined with host-species identity. After rearing, the samples were re-identified on the basis of the adults, by dissection of genitalia if needed. In many cases when larvae could not be identified with certainty, sequences were used for a final identification check. In such species, sequences from correctly identified adults serve as barcoding standard.

The dataset includes 45 out of the 48 known West Palearctic species of Ectoedemia s. str. (van Nieukerken et al., 2010). All names and full authorities are given in Appendix 1. We were unable to get amplifiable DNA from only three species: E. hexapetaleae (Szőcs, 1957), E. similigena Puplesis, 1994 and E. albida Puplesis, 1994, for which we only had relatively old specimens. In addition to the European species we used material of several species from Asia, Africa or North America, including some undescribed ones and six species of the subgenus Ectoedemia (Zimmermannia) Hering, 1940 as outgroup. All species analysed are listed with full nomenclature in Appendix 1. Some unnamed species are indicated with tentative names based on hostplants and/or distribution.

For the species of the species complexes, we sampled up to 18 specimens each during focussed collecting. For the remaining species we aimed to use at least two, but preferably more specimens, with the largest possible geographic distances between them to observe representative intraspecific variation (Lukhtanov et al., 2009). The majority of adults studied were at the same time examined for a taxonomic revision (van Nieukerken et al., 2010), and the data are included in the supplementary data of that paper, using the same registry numbers, and are also available on GBIF (http://data.gbif.org/welcome.htm). The material includes type material of several species that were described in the cited paper. All specimens received a registry number for our collection, whether extracted destructively or not. Specimens not belonging to the RMNH collection (NCB Naturalis collection, former Leiden Zoology collections) also received such a number for practical reasons: in that case the number represents the DNA extract, of which the remaining aliquots are kept in our DNA collection. In our laboratory an extra sequence tracking number was added to each extract. Tissue samples of larvae were usually kept in 96% ethanol in a freezer at minus 80°C; used adults are kept as mounted specimens in the dry collection, with separate permanent genitalia slides.

All sequences are publicly available on the Barcoding of Life Database (BOLD – www.barcodinglife.com) under the project Nepticulidae of the World – Ectoedemia Public Records, with full collection data and images when available. In online Table S3 we provide for each sequenced specimen the identification, sample IDs, Process IDs, GenBank accession numbers and the GBIF data portal URL plus some data on occurrence. Further details can be seen on the BOLD site.

Extraction, amplification and sequencing

Genomic DNA was extracted with the Qiagen DNeasy Blood & Tissue kit. Different types of tissue were used for extraction, depending on whether the number of available specimens allowed destructive extraction or not. Hindleg(s) were cut in small pieces with a scalpel prior to digestion with proteinase K; larvae were homogenised with a disposable pestle. Non-destructive extractions from the abdomen (Knölke et al., 2005, slightly modified) were used to combine genitalia preparations with DNA extractions; some larvae were treated in a similar way in order to be able to mount the larval cuticle on a slide.

Primers

For the list of primers see Table 1. We used part of mitochondrial Cytochrome C Oxidase I (COI) – the selected barcoding marker for animals (Hebert et al., 2003a), and amplified a part of 665 bp in length with the Lep primers (Hebert et al., 2004a). We also sequenced a section of 482 bp of the nuclear Elongation Factor 1-α (EF1-α) marker for most of the specimens. Initial attempts to amplify a 1240 bp fragment of this gene by using the primers (five sets) of Cho et al. (1995) largely failed. Only primer M44-1 with rcM52.6 (Cho et al., 1995) amplified a 701 bp fragment consistently for at least five different genera of Nepticulidae (Ectoedemia, Enteucha, Parafomoria, Trifurcula and Stigmella). Based on these results, Nepticulidae-specific primers, EF-NepF and EF-NepR (Table 1) that amplified a 482 bp fragment of this gene, were designed and used throughout this study.

Table 1. Primers used. The names are those that are used on the BOLD site. T-primers are tailed primers, in forward direction tailed with a T7 promoter, in reverse with a T3 tail (in bold). The first two published primers are denoted short, because the version most used on BOLD has three more bases in either primer than these.

For many specimens we used T7 promoter and T3 tailed primers for both COI and EF1-α, as this speeds up the work-flow and may improve results (Regier and Shi, 2005; Wahlberg and Wheat, 2008). For some older museum specimens, the DNA was too degraded for amplifying sections over 400 bp long. For these we used internal primers for COI (Hajibabaei et al., 2006a, 2006b) and EF1-α (specially designed for Nepticulidae). As a consequence, for some specimens there is only a shorter sequence available, denoted with (p) for partial. These shorter sections are, respectively, for COI a 310-bp amplicon and for EF1-α a 251-bp amplicon.

PCR

The PCR cycle consisted of 3 minutes initial denaturation at 94°C, 15 seconds cycle denaturation at 94°C, 30 seconds cycle at annealing temperature, 40 seconds cycle extension at 57°C for 40 cycles. A final extension at 57°C for 5 minutes occurred after all cycles had finished. The annealing temperature for COI was 50°C, for EF1-α 57°C. PCR was performed in volumes of 25 µl. For many samples the product was purified using the Promega Wizard Genomic Purification kit using the manufacturer's ‘spin column protocol’, for others the purification was done by Macrogen with a Montage purification kit (Millipore). All samples were sequenced in both directions on an ABI 3730 by Macrogen.

Alignment

Sequencher 4.2 software was used to align the forward and reverse sequences, to manually check for ambiguities in the chromatograms and to export contigs. In EF1-α heterozygous bases are scored with ambiguity codes. The sequences of both markers contain no gaps or stop codons and were aligned by eye in BioEdit 7.0.9.0 (Hall, 2004).
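The scoring of heterozygous bases with ambiguity codes can be illustrated with a minimal sketch. The mapping is the standard IUPAC code for two-base ambiguities; the function name `score_heterozygous` and the two example alleles are hypothetical and are not taken from the study, in which ambiguities were checked manually in the chromatograms.

```python
# Standard IUPAC nucleotide codes for two-base ambiguities.
IUPAC_TWO_BASE = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def score_heterozygous(allele_a: str, allele_b: str) -> str:
    """Merge two equal-length allele sequences into one consensus string,
    writing an IUPAC ambiguity code wherever the two alleles differ.
    Only the bases A, C, G and T are handled in this sketch."""
    if len(allele_a) != len(allele_b):
        raise ValueError("alleles must have the same length")
    consensus = []
    for a, b in zip(allele_a.upper(), allele_b.upper()):
        consensus.append(a if a == b else IUPAC_TWO_BASE[frozenset((a, b))])
    return "".join(consensus)


# Invented example alleles: they differ at site 3 (G/A) and site 6 (C/T).
example = score_heterozygous("ACGTAC", "ACATAT")
print(example)
```

A consensus scored this way keeps both alleles visible in a single sequence, at the cost that distance programs must decide how to treat the ambiguous sites.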

Neighbor joining trees and distances

Neighbor joining trees were created in Paup* 4.0b10 (Swofford, 2003) using Kimura 2 Parameter distance, the algorithm also used for species identification in the BOLD datasystems. Ten thousand bootstrap replicates were performed with Paup*, and bootstrap and distance values are shown on the respective branches present in a Neighbor joining tree (Figs S1-S2). For the trees of different clusters we ran separate analyses. Distances were also calculated using Kimura 2 Parameter distance, either by BOLD tools or with MEGA5 (Tamura et al., 2011). As outgroup we used species of the subgenus Ectoedemia (Zimmermannia), which is the sistergroup of Ectoedemia s.s. in both ongoing unpublished family level phylogenetic studies (Hoare and van Nieukerken, in prep; van Nieukerken et al., in prep.) and based on morphological characters (van Nieukerken, 1985; van Nieukerken et al., 2010).
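For readers unfamiliar with the Kimura 2 Parameter distance, the following is a minimal sketch of the textbook formula, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions. It is not the Paup*, BOLD or MEGA5 implementation; the function name `k2p_distance` and the decision to skip sites with non-ACGT symbols are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences.

    Sites where either sequence has a symbol other than A, C, G or T are
    skipped (an assumption; programs differ in how they treat such sites).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Invented 10-site example with a single C/T transition.
d = k2p_distance("ACGTACGTAC", "ACGTACGTAT")
print(f"{100 * d:.2f}%")
```

Multiplying the result by 100 gives the percentage values used throughout the Results.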

Diagnostic positions

In cases of closely related species, where sufficient sequences were available, we also analysed the sequences for mutually exclusive diagnostic base positions. They were also defined as ‘simple character attributes’ in the Character Attribute Organisation System (DeSalle et al., 2005; DeSalle, 2006; Rach et al., 2008). Results are depicted in table form (Tables 4, 5, 6), in which we indicate whether these positions are third codon positions or not.
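The search for diagnostic positions can be sketched as follows. This is one possible reading of a mutually exclusive diagnostic position: a site where one species is fixed for a base that no other species shows. It is not the Character Attribute Organisation System software; the function name `diagnostic_positions`, the parameter `first_codon_index` (the index of a first codon position, which sets the reading frame) and the toy alignment are all hypothetical.

```python
from typing import Dict, List, Tuple


def diagnostic_positions(
    seqs_by_species: Dict[str, List[str]],
    first_codon_index: int = 0,
) -> List[Tuple[int, str, str, bool]]:
    """Return (1-based position, species, base, is_third_codon_position)
    for every site where a species is fixed for a base that is absent
    from all other species in the alignment."""
    length = len(next(iter(seqs_by_species.values()))[0])
    hits = []
    for i in range(length):
        # Bases observed per species at this site, ignoring non-ACGT symbols.
        observed = {
            sp: {s[i] for s in seqs if s[i] in "ACGT"}
            for sp, seqs in seqs_by_species.items()
        }
        for sp, bases in observed.items():
            if len(bases) != 1:
                continue  # species is variable (or has no data) at this site
            others = set().union(*(b for o, b in observed.items() if o != sp))
            (base,) = bases
            if base not in others:
                third = (i - first_codon_index) % 3 == 2
                hits.append((i + 1, sp, base, third))
    return hits


# Invented toy alignment: three species, two specimens each.
toy = {
    "species_A": ["ACGTTA", "ACGTTA"],
    "species_B": ["ACATTA", "ACATTA"],
    "species_C": ["ACATTG", "ACATTG"],
}
hits = diagnostic_positions(toy)
for hit in hits:
    print(hit)
```

In the toy alignment no single site separates all three species, but two sites together do, which is the kind of combination a table of diagnostic positions records.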

Results

In total we obtained 262 COI sequences (ten partial) belonging to ca. 64 species of Ectoedemia sensu stricto, including five sequences (one partial) of four species in the subgenus Zimmermannia. Further we obtained 240 EF1-α sequences (25 partial) of ca. 62 Ectoedemia s. str. and three sequences of three Zimmermannia species (one partial).

Quality material

Material collected for molecular studies, usually larvae, kept in 96% or 100% ethanol in a -80 freezer, was almost always successful. In total 85.8% of the successfully extracted material yielded the full COI barcode and 80.4% the EF1-α sequence (see Table 2). Dried collection material was also successful when only a few years old, with a progressive decline for older material, but still some full barcodes were obtained from 19 year old specimens, in all cases extracting DNA from abdomens. Shorter barcodes (335 bp) and a shorter part of EF1-α (251 bp) were obtained from 3-25 year old material. Older larvae kept in 70% ethanol (collected for morphological studies), were partly successful: from 54% we obtained at least a short barcode for material up to 28 years after collecting; this material was less successful for EF1-α: out of 10 larvae we got just one full and one partial sequence.

COI versus EF1-α

In COI 148 out of 658 basepairs are variable in our dataset (22.3%) and in EF1-α 152 out of 482 (31.5%). The effectiveness of COI and EF1-α for barcoding this group was compared by plotting the maximum intraspecific distance against the minimum interspecific distance for 39 species pairs of which more than one specimen was available for both markers (Fig. 1a). If the minimum distance between species is larger than the maximum distance within species, specimens can reliably be assigned to a species and it can be said that there is a ‘barcoding gap’ present. The vast majority of data points are well above the 1:1 barcoding gap line, indicating that the COI and EF1-α sections we used are reliable barcoding markers. The data points below the 1:1 line are from species pairs belonging to the complexes treated in detail below. The graph also shows that the maximum intraspecific distance of COI for these species can be as high as 3.5%, whereas EF1-α values remain below 2.0%. The effectiveness was further examined by plotting the pairwise distances of COI and EF1-α between specimens in a scattergram (Fig. 1b). If both markers evolved at exactly the same rate, all data points would be expected to be on the 1:1 diagonal. If the rate at which mutations accumulate differed consistently between COI and EF1-α, all data points would fall either below or above the diagonal. The latter clearly is not the case with our data: EF1-α and COI evolve at roughly the same rate. However, the best-fitting trend line, a polynomial with an R² of 0.75, indicates that pairwise distances between closely related specimens (i.e. within species) are higher in COI than in EF1-α, and pairwise distances between more distantly related specimens (i.e. between species groups or subgenera) are higher in EF1-α than in COI. So, even though the mutation rates of the two markers do not differ linearly, this indicates that the two markers nonetheless evolve differently.
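The barcoding gap criterion described above can be made concrete with a short sketch. It works per species rather than per species pair, which is a simplification of the comparison plotted in Fig. 1a; the function name `barcode_gap`, the specimen labels and all distance values are invented for illustration.

```python
from itertools import combinations
from typing import Dict, Tuple


def barcode_gap(
    species_of: Dict[str, str],
    dist: Dict[frozenset, float],
) -> Dict[str, Tuple[float, float, bool]]:
    """For each species with at least two specimens, return
    (maximum intraspecific distance, minimum interspecific distance,
    gap present), where a gap is present if the minimum distance to any
    other species exceeds the maximum distance within the species."""
    max_intra: Dict[str, float] = {}
    min_inter: Dict[str, float] = {}
    for a, b in combinations(species_of, 2):
        d = dist[frozenset((a, b))]
        sa, sb = species_of[a], species_of[b]
        if sa == sb:
            max_intra[sa] = max(max_intra.get(sa, 0.0), d)
        else:
            for sp in (sa, sb):
                min_inter[sp] = min(min_inter.get(sp, float("inf")), d)
    return {
        sp: (max_intra[sp], min_inter[sp], min_inter[sp] > max_intra[sp])
        for sp in max_intra
        if sp in min_inter
    }


# Invented specimens and pairwise distances (in percent).
species_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
pairs = {
    ("a1", "a2"): 0.5, ("b1", "b2"): 3.0,
    ("a1", "b1"): 4.0, ("a1", "b2"): 4.5, ("a2", "b1"): 4.2, ("a2", "b2"): 4.4,
    ("c1", "a1"): 6.0, ("c1", "a2"): 6.1, ("c1", "b1"): 2.0, ("c1", "b2"): 2.5,
}
dist = {frozenset(k): v for k, v in pairs.items()}
result = barcode_gap(species_of, dist)
for sp, values in result.items():
    print(sp, values)
```

In this toy set species A shows a gap, whereas species B does not, because its nearest specimen of another species is closer than its own most divergent pair; species C has a single specimen and is therefore left out.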

Fig. 1a. Comparison of minimum interspecific divergence versus maximum intraspecific divergence percentages between 39 Ectoedemia s. str. species pairs for which multiple sequences for both markers were available. EF1-α reaches the same minimum interspecific distances as COI, but the maximum intraspecific divergence is much lower. Species pairs below the barcoding gap line involve species belonging to the complexes.

Fig. 1b. Scattergram containing the 21,945 pairwise distance values of COI and EF1-α between all specimens. The polynomial trend line gave the best fit for the data, with an R² of 0.75. The data do not show that either marker evolves at a higher rate in general, but closely related specimens show larger distances in COI, whereas more distantly related specimens show the opposite pattern, with larger distances in EF1-α.

Species recognition

All specimens of a single species, except the species in the Ectoedemia rubivora complex (E. arcuatella, E. atricollis and E. rubivora), the E. albifasciella complex and part of the E. subbimaculella complex, form distinct clusters in both the COI and the EF1-α neighbor joining trees with all taxa included (Figs S1-S2). They are unambiguously distinguishable from other species by using distance methods for either marker. BOLD highlights intraspecific pairwise differences that exceed 2% as potentially containing cryptic species (Ratnasingham and Hebert, 2007). Several species in our dataset exceed this threshold; their maximum distances, their respective values in EF1-α and the maximum geographic distance between the samples are shown in Table 3. Below we will discuss species for which we have sequenced more than one specimen from more than one locality.
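Applying the 2% threshold amounts to a simple filter, sketched below. The function name `flag_deep_divergence` is hypothetical and this is not the BOLD implementation; the three COI values are those quoted for these species in the sections that follow.

```python
from typing import Dict, List

BOLD_THRESHOLD_PERCENT = 2.0  # threshold cited in the text


def flag_deep_divergence(
    max_intraspecific: Dict[str, float],
    threshold: float = BOLD_THRESHOLD_PERCENT,
) -> List[str]:
    """Return, in alphabetical order, the species whose maximum
    intraspecific distance (in percent) exceeds the threshold."""
    return sorted(sp for sp, d in max_intraspecific.items() if d > threshold)


# COI distances (percent) as quoted in the text for these three species.
max_coi = {
    "E. spiraeae": 6.47,
    "E. intimella": 6.74,
    "E. angulifasciella": 0.77,
}
flagged = flag_deep_divergence(max_coi)
print(flagged)
```

A flag of this kind only marks a species for closer inspection; whether a cryptic species is involved has to be judged from additional evidence, as the cases below show.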

The Ectoedemia angulifasciella group

Species in this group largely feed on Rosaceae, with the exception of the Betulaceae feeders E. occultella and E. minimella.

The taxonomic status of three specimens found feeding on Rubus in Vietnam and Borneo, and a fourth specimen caught at light in Vietnam, is uncertain (Fig. 2). The external morphology, geographic region, host species and feeding pattern of the Rubus miners suggest that they might belong to a single species, but insufficient material is available to conduct a conclusive morphological study. Both COI and EF1-α results indicate that they likely represent several species, with pairwise differences between 5.6% and 6.2% in COI and 2.1% and 4.0% in EF1-α. The results also suggest that the specimen collected at light is a different species closely related to the aforementioned three, and likely to be found mining Rubus sp.

Fig. 2. K2P Neighbor joining trees containing possibly four species in the Ectoedemia angulifasciella group. Specimen numbers are RMNH.INS registry numbers. All specimens were collected in Vietnam or Borneo, at light as adult or as larva on Rubus spp. 2a: The COI tree. The distances between specimens are large, indicating that they are likely to represent several species. 2b: The EF1-α tree. One specimen less was included here, but the tree also shows relatively long distances between specimens.

In another Rubus feeder, E. erythrogenella, specimens from Spain, France, Sardinia and Greece hardly show differences, whereas the single specimen from Turkey shows a pairwise difference in COI with the others greater than 2%, but less than 0.5% in EF1-α (Table 3). In Ectoedemia spinosella, the single specimen from Greece and from another hostplant (Prunus webbii) differs considerably in both genes from the completely identical sequences from specimens from France, Italy and the Netherlands (all feeding on sloe, Prunus spinosa). In the closely related E. mahalebella, specimens from southwest France, Italy and Croatia hardly show any variation.

The two specimens of E. spiraeae studied, one from Europe (Slovakia) and one from China, show a distance of 6.47% in COI and 2.9% in EF1-α, which can be correlated with a very large geographic distance. The E. spiraeae species cluster is the only one that did not get a bootstrap support over 60 in both markers (online Figs S1-S2). Ectoedemia spiraeae has a scattered distribution from eastern Europe through Siberia to China and Japan, with a relatively large gap between Europe and Asia (van Nieukerken et al., 2010). These results suggest that possibly different species are involved, a possibility to investigate by a morphological and molecular study of more material from a wider range of populations.

Specimen RMNH.INS.23741 was discovered in Norway and provisionally described as Ectoedemia sp. n. (Bengtsson et al., 2008). Barcodes show that it is almost identical to larvae and reared adults from Rosa from France. This species has now been described as E. rosae van Nieukerken and Berggren, 2011.

The Ectoedemia angulifasciella complex (Fig. 3). Although this complex was originally established as a complex containing four cryptic species based on morphological characters (Wilkinson et al., 1983; van Nieukerken, 1985), E. angulifasciella can easily be discriminated from the other three by 23 diagnostic basepairs: 3.5% of the entire sequence (Fig. 3a); this is also the only species that can be separated by at least one character in the male genitalia.

Fig. 3. K2P Neighbor joining trees of Ectoedemia angulifasciella and the E. rubivora complex with bootstrap values. The colours denote the different species. The annotation starts with the RMNH.INS registry number, followed by ISO coded country of origin and host. Outgroup for these trees is E. terebinthivora, bootstrap values represent 10,000 replicates. 3a: The COI tree. There is a large distance between E. angulifasciella and the rubivora complex, but within E. angulifasciella there is little variation and none that can be correlated with different hostplants. E. rubivora, E. arcuatella and E. atricollis cannot be distinguished in COI. 3b: The EF1-α tree. As in COI, E. angulifasciella seems only distantly related to the others. E. rubivora, E. atricollis and E. arcuatella group in species-specific clusters and can thus be distinguished using EF1-α, albeit based on only two positions (see also Table 4).

By contrast, there are no diagnostic base positions in COI at all that discriminate between the three remaining species E. arcuatella, E. rubivora and E. atricollis. Since the distance from E. angulifasciella to the other species is also large in EF1-α (Fig. 3b), this strongly suggests that E. angulifasciella should not be regarded as part of this complex; we therefore suggest renaming this the E. rubivora complex. The four species can be distinguished by the host plant they are found on and some minor morphological characters. They have been lumped or split in the past depending on the emphasis on biological data versus morphological data (for a review see Wilkinson et al., 1983). Five sequences of COI, belonging to specimens of E. atricollis and E. rubivora, were completely identical (RMNH.INS numbers 11278, 17626, 17782, 12818 and 12803); a few haplotypes in this complex do not coincide with species boundaries at all. Where COI fails to distinguish species, we found that E. atricollis, E. rubivora and E. arcuatella can be distinguished molecularly based on two diagnostic synonymous mutations at third codon positions in EF1-α (Table 4, Fig. 3b). For all three species we have included material originating from a large part of their European range.

Table 4. Simple character attribute positions within EF1-α to distinguish three closely related species of the Ectoedemia rubivora complex. Both positions are 3rd codon positions, the substitutions are synonymous.

Eighteen specimens of E. angulifasciella were sequenced, in order to test whether populations on different hosts can be differentiated by their barcodes. This species feeds mainly on Rosa species, but also locally on Filipendula vulgaris and Sanguisorba species. The fact that the species in Öland (Sweden) can be abundant on Filipendula and completely absent from Rosa in the same locality, and vice versa in other localities, suggests that there might be different, morphologically cryptic species specialising on these different hosts (see Bengtsson et al., 2008). Sanguisorba feeders have also been described as several different species in the past (synonymised by van Nieukerken, 1985). However, the molecular results do not show any difference for material from various hosts, but show a rather invariable E. angulifasciella throughout Europe with a maximum intraspecific pairwise difference of 0.77% in COI and 0.84% in EF1-α, thus confirming the morphological findings.

The Ectoedemia suberis group

All species in this oak mining group show little or no intraspecific variation. The species E. hendrikseni, E. phaeolepis and E. heckfordi have recently been discovered and belong to a morphologically difficult complex that also includes E. andalusiae and E. suberis (van Nieukerken et al., 2010). All are found in West and Southwest Europe with partly overlapping geographic ranges. COI and EF1-α support their full species status, with interspecific genetic distances varying between 2.6% and 6.8% in COI and distances between 2.7% and 4.0% in EF1-α, comparable to the distances between other species. All species form monophyletic clusters with high bootstrap support (Fig. 4). The branching pattern between these species differs significantly between the two markers, but they always group together as four. Besides confirming the species status of these five species, these results show that COI and EF1-α can readily be used to distinguish these species.

Fig. 4. K2P Neighbor joining trees containing species of the Ectoedemia suberis complex. Specimen numbers are RMNH.INS registry numbers. All hosts are Quercus spp. and all species are only found in South-West Europe. Outgroup for this analysis was E. terebinthivora, bootstrap values represent 10,000 replicates. 4a: The COI tree. The species form their own clusters with high support and distances comparable to those between other species. 4b: The EF1-α tree. Much the same as the COI tree, with high support for separating the species and good distances. The positions of the clusters here are completely different from those in COI.

The Ectoedemia populella group

All species feed on Salicaceae. A few specimens from North America are included. The pairwise distances within E. intimella are very large, correlated with a very large geographic distance. There was one E. intimella specimen included from Japan, with a distance of 6.74% from the others. This female specimen is morphologically indistinguishable from European specimens (van Nieukerken et al., 2010), but since we have not seen other Japanese material, nor any intermediate populations, no taxonomic conclusions can be based on this finding.

A North American subspecies of Ectoedemia argyropeza, E. argyropeza downesi Wilkinson & Scoble, 1979, has been described on the basis of slight morphological differences. Wilkinson and Scoble (1979) did not consider the possibility that the North American populations are introduced from Europe. Later Menken and Wiebosch Steeman (1988) concluded on the basis of allozymes that this is most likely the case. Five of the European COI sequences of E. argyropeza are also 100% identical to several North American specimens registered on BOLD when using the BOLD identification engine, corroborating the earlier findings.

The Ectoedemia subbimaculella group

The E. subbimaculella group is the second group specialised on Fagaceae (Quercus), although probably some species feeding on other hosts belong here as well. It includes the two species complexes discussed separately below.

For E. alnifoliae and E. pseudoilicis we found intraspecific pairwise differences in COI greater than 2%, but less than 0.5% in EF1-α (Table 3). In the case of E. pseudoilicis it is the Turkish specimen that differs from the Greek ones, but in E. alnifoliae the sampled populations that show these differences (in Turkey) are just 12 km apart.

In E. haraldi we found a maximum pairwise difference of 4.26% in COI, and 2.53% in EF1-α (Table 3). Four specimens of this species were included; the two western European ones (from France and Spain) and the two eastern ones (Greece and Turkey) form separate clades. All four specimens also have large pairwise differences between them, with a minimum of 1.2% in COI and an average of 3.16% (Table 3). Superficially the eastern and western populations are similar in morphology and biology; a detailed morphological analysis should be carried out to see if the molecular differences are paired with morphological differences.
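As an illustration of how such minimum, average and maximum pairwise values can be derived, the following minimal Python sketch computes Kimura 2-parameter (K2P) distances between aligned sequences. The function names, the toy sequences and the treatment of gaps and ambiguous bases (pairwise deletion) are assumptions made for this example; they do not describe the software or settings used in this study.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitions
    q = transversions / compared  # proportion of transversions
    # math.log raises ValueError when the sequences are too divergent
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def summarise_pairwise(seqs):
    """Return (minimum, mean, maximum) K2P distance over all pairs."""
    dists = [k2p_distance(a, b) for a, b in combinations(seqs, 2)]
    return min(dists), sum(dists) / len(dists), max(dists)


if __name__ == "__main__":
    toy = ["ACGTACGTAC", "GCGTACGTAC", "ACGTACGTAC"]  # hypothetical data
    print(summarise_pairwise(toy))
```

Distances are returned as proportions; multiplying by 100 gives percentages of the kind reported in Table 3.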

Ectoedemia heringella shows two clusters in COI, both including specimens from Italy and Great Britain. This may indicate that the introduced British population (van Nieukerken et al., 2010) is genetically structured, and not the result of the introduction of a single specimen. The COI clusters are not paralleled in EF1-α.

The Ectoedemia albifasciella complex. This complex comprises four Quercus (Fagaceae) mining species: E. albifasciella, E. pubescivora, E. contorta and E. cerris (van Nieukerken, 1985; van Nieukerken et al., 2010). These species can be distinguished using COI, but there are also two distinct clusters (2.17% difference) in E. albifasciella, making it polyphyletic (Fig. 5a). There are no indications that NUMTs (nuclear mitochondrial inserts, Zhang and Hewitt, 1996) are the cause of this, since the chromatograms contain no double signals, and a translation into amino acids showed no difference between both haplotypes nor the presence of stop codons within these sequences. In EF1-α these two haplotypes are not recovered; we believe this different haplotype is the result of a mitochondrial anomaly (Ballard and Whitlock, 2004; Rubinoff et al., 2006). There are no biological or morphological data to support the split of the two COI haplotypes, but the second haplotype has up to now only been recovered from immature specimens. Only the geographic data suggest that the rare haplotype might have a limited distribution in the Netherlands and adjacent West Germany, but more specimens will have to be included to confirm this. The distribution of the more common haplotype also includes this area, and some samples from a single locality contain both haplotypes. By looking at mutually exclusive diagnostic base positions or simple character attributes, specimens can be identified, even though E. albifasciella appears polyphyletic. Apart from one, all these differences are synonymous (Table 5). It should be noted that for the other three species relatively few specimens are sequenced. It is thus possible that simple character attributes might disappear when more specimens are examined and show intraspecific variation. In EF1-α E. albifasciella is paraphyletic relative to the other three species. E. contorta, E. cerris and E. pubescivora are represented as a clade in the Neighbor joining tree (Fig. 5b), but there are no simple character attributes to distinguish them. A single studied male E. albifasciella from Morocco (not in Fig. 5) is identical in EF1-α, but possibly belongs to another COI haplotype with 1.4% difference, not grouping with other E. albifasciella in the NJ trees; it has 13 out of the 15 diagnostic basepairs of E. albifasciella.
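The notion of a simple character attribute can be made concrete with a short Python sketch: a position is reported as diagnostic for a species when all its sequences share one base there and that base occurs in no sequence of any other species at that position. The function name, the input layout and the toy alignment are hypothetical and serve only as an illustration under these assumptions; this is not the procedure used to compile Table 5.

```python
def diagnostic_positions(aligned, target):
    """Find simple character attributes for one species.

    aligned: dict mapping species name -> list of aligned sequences
             (all of equal length; an assumption of this sketch).
    target:  the species for which diagnostic positions are sought.
    Returns a dict {1-based position: base}.
    """
    length = len(aligned[target][0])
    result = {}
    for i in range(length):
        target_bases = {seq[i] for seq in aligned[target]}
        if len(target_bases) != 1:
            continue  # variable within the target species
        base = next(iter(target_bases))
        other_bases = {
            seq[i]
            for species, seqs in aligned.items()
            if species != target
            for seq in seqs
        }
        # fixed in the target and absent from every other species
        if base in "ACGT" and base not in other_bases:
            result[i + 1] = base
    return result


if __name__ == "__main__":
    toy = {  # hypothetical alignment
        "sp_A": ["ACGTAC", "ACGTAC"],
        "sp_B": ["ACGTGC", "ACATGC"],
        "sp_C": ["ACGTGT", "ACGTGT"],
    }
    for name in toy:
        print(name, diagnostic_positions(toy, name))
```

Because the result depends entirely on the sequences sampled, adding specimens can only remove positions from the output, which is the caveat raised above for the species with few sequenced specimens.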

Table 5. Simple character attribute positions in COI to distinguish species within the Ectoedemia albifasciella complex. Most positions are 3rd, apart from 389 and 584. All substitutions, except at 389, are synonymous.

Fig. 5. K2P Neighbor joining trees of the Ectoedemia albifasciella complex with bootstrap values. The annotation starts with the RMNH.INS registry number, followed by ISO coded country of origin and host, when collected as larva. All hosts are Quercus spp. The colours denote the different species and the two haplotype clusters of E. albifasciella. Outgroup for this analysis was E. rufifrontella, bootstrap values represent 10,000 replicates. 5a: The COI tree. The species have their own clusters with high support and distances over 1.2%. There are haplotype clusters in E. albifasciella, with a 2.17% distance. 5b: The EF1-α tree. Only E. contorta and E. pubescivora can be distinguished in EF1-α. The two E. albifasciella haplotypes found in COI are not found here. The specimen from Morocco (see text) is not included in this analysis.

Fig. 6. K2P Neighbor joining trees of the Ectoedemia subbimaculella complex with bootstrap values. The annotation starts with the RMNH.INS registry number, followed by ISO coded country of origin and host, when collected as larva. All hosts are Quercus spp., except for specimen 17618 which was collected on Castanea sativa. Sequences of E. rufifrontella were used as outgroup for both trees, bootstrap values represent 10,000 replicates. The colours denote the different species and the aberrant forms of E. heringi on Quercus ithaburensis and E. subbimaculella on Q. cerris. 6a: The tree based on COI data. A large E. subbimaculella cluster; the ‘E. subbimaculella’ specimens on Q. cerris group with the ‘E. heringi’ specimens on Quercus ithaburensis. 6b: The tree based on EF1-α data. Although there are many clusters, they have low bootstrap values.

The Ectoedemia subbimaculella complex. This is the second complex found in this group (van Nieukerken, 1985; van Nieukerken et al., 2010). It contains the widespread E. subbimaculella, as well as E. heringi, E. phyllotomella and E. liechtensteini, all three of which are restricted to southern or eastern Europe, the last two specialising on Quercus cerris. Ectoedemia subbimaculella forms a well supported cluster in the COI tree, but its cluster in EF1-α is unsupported, with a bootstrap value of 52. Also, RMNH.INS.23514, identified as E. heringi (by COI!), falls inside the E. subbimaculella clade in EF1-α; this may be a case of introgression in mtDNA. The other species are more problematic. Our results for E. liechtensteini are inconclusive; we therefore cannot assess the usability of either marker for this putative species. Ectoedemia phyllotomella cannot be satisfactorily distinguished in either marker.

Especially interesting is the clade consisting of two specimens identified as E. subbimaculella from Hungary (feeding on Quercus cerris) and two specimens identified as E. heringi from Greece (found on Quercus ithaburensis), further complicating this complex. They share five third codon positions in COI where they differ from all other complex members (Table 6). In EF1-α these two supposed E. subbimaculella specimens are placed basally to the whole group, although without support. The general picture for this complex is that E. heringi and E. phyllotomella cannot be distinguished on the basis of COI or EF1-α barcodes, but that E. subbimaculella can, when the Quercus cerris feeding form is excluded. E. subbimaculella differs by three simple character attributes in COI from the others and one position in COI distinguishes E. liechtensteini (taking into consideration that only three specimens have been sequenced). In EF1-α there is just a single character attribute: E. subbimaculella (including the misidentified RMNH.INS.23514) has an A in position 221, where the others have a G. All of these are synonymous substitutions at third codon positions. By barcoding we could also identify a peculiar colour aberration reared from Quercus as E. subbimaculella: RMNH.INS.23671, see photograph on BOLD.
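Whether a substitution falls on a third codon position and is synonymous can be checked along the lines of the Python sketch below. It assumes the standard genetic code (NCBI translation table 1) for the nuclear EF1-α and the invertebrate mitochondrial code (table 5) for COI. The function names and the frame_offset parameter (the number of bases preceding the first complete codon of a fragment) are hypothetical; the reading frame of the fragments is not stated here, so the example uses toy sequences only.

```python
BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order.
# 1 = standard code, 5 = invertebrate mitochondrial code (NCBI numbering).
AMINO_ACIDS = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    5: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG",
}


def codon_table(table_id):
    """Map each of the 64 codons to its one-letter amino acid."""
    codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    return dict(zip(codons, AMINO_ACIDS[table_id]))


def classify_substitution(seq, position, new_base, table_id, frame_offset=0):
    """Return (codon position 1-3, True if the substitution is synonymous).

    position is 1-based within seq; frame_offset is an assumed parameter
    giving the number of bases before the first complete codon.
    """
    index = position - 1 - frame_offset
    if index < 0:
        raise ValueError("position lies before the first complete codon")
    codon_pos = index % 3 + 1
    start = position - codon_pos  # 0-based start of the affected codon
    codon = seq[start:start + 3].upper()
    if len(codon) < 3:
        raise ValueError("position lies in an incomplete final codon")
    mutated = codon[:codon_pos - 1] + new_base.upper() + codon[codon_pos:]
    table = codon_table(table_id)
    return codon_pos, table[codon] == table[mutated]


if __name__ == "__main__":
    # Toy nuclear fragment ATG GCT GAA: T->C at position 6 keeps alanine.
    print(classify_substitution("ATGGCTGAA", 6, "C", table_id=1))
    # ATA->ATG is synonymous under table 5 but not under table 1.
    print(classify_substitution("ATA", 3, "G", table_id=5))
    print(classify_substitution("ATA", 3, "G", table_id=1))
```

The choice of translation table matters: the same change can be synonymous in the mitochondrial code and non-synonymous in the standard code, as the last two calls show.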

Discussion

One or two genes

Ever since the original studies on DNA barcoding in animals (Hebert et al., 2003a, 2003b, 2004b), the COI section amplified by Folmer primers and derivatives has become the standard barcoding marker for animals, at least for insects and vertebrates (CBOL Database Working Group, 2009). The use of a single marker is in contrast with other groups of organisms, where the selection of barcoding markers has been, or still is, a lengthy process. In land plants the choice for two markers, the chloroplast genes rbcL and matK, has taken several years (Chase et al., 2005; CBOL Plant Working Group, 2009). Mycologists seem finally to settle for a single gene, ITS, after many years of discussion (Schoch et al., 2011; Santamaria et al., 2009). The choice for COI as single standard marker has enormous advantages, but even in the animal kingdom it is no longer the only standard, since LSU and SSU are used frequently for e.g. nematodes (Blaxter, 2004; Blaxter et al., 2005). Despite scientific objections (Will and Rubinoff, 2004; Rubinoff et al., 2006; Roe and Sperling, 2007; Song et al., 2008), the choice for COI has been accepted from the onset. With the present size of the barcode database with 1.4 million records, changing the barcode marker is obviously no option, and not something we would like to advocate, but using additional markers may well be the way forward.

Our study of Ectoedemia barcodes shows that COI is able to identify the majority of Ectoedemia species, but the three species in the rubivora complex share barcodes and show variation independent of species boundaries. In other species complexes, species can be distinguished, but often show interspecific distances far below the ‘ideal’ threshold value of 2%. In contrast, in several species the intraspecific variation is high, much larger than the 2% threshold, and in E. albifasciella there is even a deep split. In other words, a clear barcoding gap does not exist in parts of the genus, whereas it may be present in other parts.
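The comparison underlying a barcoding gap can be sketched in Python as follows: for each species the maximum intraspecific distance is set against the distance to the nearest neighbour belonging to another species, and a gap is present only when the former is smaller than the latter. The function names, the input layout (a list of species labels and a square distance matrix) and the toy values are assumptions made for this illustration.

```python
def barcode_gap(labels, dist):
    """Summarise intra- and interspecific distances per species.

    labels[i] is the species of specimen i; dist[i][j] is the pairwise
    distance between specimens i and j (e.g. K2P, as a proportion).
    Returns {species: (max intraspecific distance or None,
                       distance to nearest neighbour or None,
                       species of that nearest neighbour or None)}.
    """
    summary = {}
    n = len(labels)
    for species in sorted(set(labels)):
        max_intra = None
        nn_dist = nn_species = None
        for i in range(n):
            if labels[i] != species:
                continue
            for j in range(n):
                if i == j:
                    continue
                d = dist[i][j]
                if labels[j] == species:
                    if max_intra is None or d > max_intra:
                        max_intra = d
                elif nn_dist is None or d < nn_dist:
                    nn_dist, nn_species = d, labels[j]
        summary[species] = (max_intra, nn_dist, nn_species)
    return summary


def has_gap(max_intra, nn_dist):
    """True when all conspecifics are closer than any other species."""
    return max_intra is not None and nn_dist is not None and max_intra < nn_dist


if __name__ == "__main__":
    labels = ["A", "A", "B", "B"]  # hypothetical specimens
    dist = [
        [0.00, 0.01, 0.05, 0.06],
        [0.01, 0.00, 0.04, 0.05],
        [0.05, 0.04, 0.00, 0.07],
        [0.06, 0.05, 0.07, 0.00],
    ]
    for species, (intra, nn, nn_sp) in barcode_gap(labels, dist).items():
        print(species, intra, nn, nn_sp, has_gap(intra, nn))
```

In the toy data species A shows a gap, whereas species B does not, because its two specimens differ more from each other than from the nearest specimen of A.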

The partial sequence of Elongation Factor 1-α is also able to identify the majority of species, in this case including the rubivora complex species, but not most species in the subbimaculella complex. In contrast to COI, the intraspecific variation is much smaller (up to 2.5%) and the extra haplotype of E. albifasciella is not present in EF1-α.

Because the intraspecific variation is much larger in COI (up to 6.85%) than in EF1-α (up to 2.53%), while the interspecific variation is rather similar in both genes, the latter in fact might be considered more suitable as ‘barcoding marker’ for Nepticulidae. The excess intraspecific variation in COI could reflect mitochondrial anomalies (Ballard and Whitlock, 2004; Rubinoff et al., 2006), since most of these differences are not observed in EF1-α. The increase in pairwise distances between more distantly related specimens in EF1-α compared to COI could be explained by a higher level of saturation in COI, and thus loss of information in this marker at this taxonomic level. However, both genes have their limitations and cannot identify all species on their own. Taken together the resolution becomes much better and almost all species are straightforward to identify. Our results support the idea that two barcoding markers are better than one, and we will routinely continue to use these two genes, and in addition usually also the D2-D3 region of the nuclear ribosomal marker 28S. The latter and EF1-α provide better phylogenetic resolution and therefore can more easily place unknown taxa in their correct phylogenetic position. In practice we find that COI sequences often lead to taxonomic mismatches, in contrast to a recent study on sphingid moths (Wilson et al., 2011).

The nuclear EF1-α gene was amplified almost as easily as COI from museum material (80 versus 85%), indicating it can be a useful alternative marker. However, we do not advocate that EF1-α should be a second universal barcoding marker throughout the Metazoa or even the insects; in too many cases introns and multiple copies make this gene less suitable (Caterino et al., 2000; Djernæs and Damgaard, 2006). For Lepidoptera this might be the ideal addition, since introns and multiple copies are as yet unknown and the gene is routinely sequenced for phylogenetic studies (Caterino et al., 2000; Wahlberg and Wheat, 2008). However, even in Lepidoptera specific primers for several subgroups are needed (Cho et al., 1995; Yamamoto and Sota, 2007).

Barcoding gap

The fact that a barcoding gap exists is an interesting biological phenomenon. However, as also has been observed earlier, when larger geographic areas are sampled, the gap may disappear (Lukhtanov et al., 2009) and this is what we see in some of our examples.

In the cases of the widespread species E. intimella and E. spiraeae, where we have a gap in observations, we are unable to decide on the basis of the barcoding results whether these are widespread species or should be split into two taxa. In the first case we do not have enough material to check this by morphology: the single female from Japan is inseparable from European ones, although the barcode would clearly suggest two different species. We suspect, however, that this species has a continuous range from Europe to Japan. In the case of E. spiraeae more than one species may be present, because of the large gap between European and Siberian populations (although this needs to be checked); here too, more morphological study is needed, as well as molecular analyses from intermediate populations.

In other cases where both markers show a deep split, there may be cryptic species present (the oriental Rubus feeders, E. haraldi) that need to be scrutinised.

In some species we found the largest distances in COI between specimens from Turkey and Europe, a phenomenon known in various other animal groups and thought to be originating from different glacial refuges (e.g. oak galls, Stone et al., 2007). However, the amount of Turkish material we have seen is too limited for further conclusions.

Species complexes

In the species complexes previously defined by morphology (van Nieukerken, 1985), several species could not be distinguished by DNA barcoding, and when they could, the interspecific distances are far below the 2% threshold. In the E. rubivora complex, three species share the same barcode variation, but they differ in EF1-α by two simple character attribute positions. The species in this complex have been lumped or split in the past depending on the emphasis on biological data or morphological data (for a review see Wilkinson et al., 1983). This complex has been of molecular interest before and Wilkinson et al. (1983) used allozyme analysis to clarify their specific status. They found evidence for two pairs of sibling species, with E. angulifasciella closest to E. atricollis and E. arcuatella closest to E. rubivora. We, however, found in both COI and EF1-α large pairwise differences between E. angulifasciella specimens and the other species and thus the complex is reduced to just the other three species (the rubivora complex). Whether it is due to recent speciation and incomplete lineage sorting that COI cannot be used to separate these species, or because there has been hybridisation remains an open question. The latter possibility is suggested by the fact that part of the COI haplotypes seem to group by species.

In the E. albifasciella complex we found species clusters for all four species in COI, but also two distinct haplotypes for E. albifasciella. Aside from this issue, the species can be distinguished using both distance and simple character attribute methods in COI. In EF1-α however three species have only poorly supported species clusters, and E. albifasciella is paraphyletic with regard to the other three. This could be because EF1-α is more conservative and has not yet accumulated the differences needed to distinguish species. Both markers combined suggest that there has been recent allopatric speciation and possibly secondary sympatry after post glacial dispersion. We consider the possibility that the second haplotype of E. albifasciella represents a separate species as rather unlikely, but admittedly, we only got this sequence from larvae, and thus have no information on the adult morphology.

In the E. subbimaculella complex, distance methods cannot be used to confidently distinguish species, despite the overall genetic distances in the complex. There is some clustering for species, but because we were only able to include few specimens of E. liechtensteini and E. phyllotomella we cannot yet conclude if they can be distinguished using these markers or not. There seem to be large clusters for E. heringi and E. subbimaculella but there are also specimens that are placed at the base of the tree instead of in those clusters. However, from the limited data there appear at least to be some simple character attributes for parts of this complex. The molecular results have not brought us much closer to the understanding of this complex than morphological methods did 25 years ago (van Nieukerken, 1985), but at least these results provide a basis for a detailed genetic analysis of this intriguing complex of two or more species, that possibly show extensive hybridisation after secondary sympatry. The specimen 23514 shows the danger of barcode identification: the nuclear gene identifies this specimen as E. subbimaculella, whereas COI recognises it as E. heringi. Because of the widespread occurrence of introgression in mtDNA (Ballard and Whitlock, 2004), the first identification seems to be the more likely one.

Introduced species

Another useful application of DNA barcoding is the possibility to identify potential introduced species, since we expect that barcode haplotypes of the introduced population will form a subset of the source haplotypes. We confirmed that COI sequences of European E. argyropeza are identical to North American sequences already on BOLD, corroborating earlier findings with allozymes (Menken and Wiebosch Steeman, 1988). Already many cases of overlooked introductions from Europe to North America and vice versa have been found by this method (V. Nazari, pers. comm.).
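The expectation that haplotypes of an introduced population form a subset of the source haplotypes can be expressed as a simple set comparison. The Python sketch below is a minimal illustration under the assumption that all sequences are aligned and trimmed to the same region, so that identical haplotypes are identical strings; the function name and the toy sequences are hypothetical and this is not the BOLD identification engine.

```python
def compare_haplotypes(source_seqs, introduced_seqs):
    """Compare haplotypes of a putative introduced population with a source.

    Both arguments are iterables of aligned sequences of equal length
    (an assumption of this sketch). Returns a dict with the shared
    haplotypes, those private to the introduced population, and whether
    the introduced haplotypes form a subset of the source haplotypes.
    """
    source = {seq.upper() for seq in source_seqs}
    introduced = {seq.upper() for seq in introduced_seqs}
    return {
        "shared": introduced & source,
        "private": introduced - source,
        "is_subset": introduced <= source,
    }


if __name__ == "__main__":
    europe = ["ACGTAC", "ACGTAT", "ACGCAC"]     # hypothetical source
    north_america = ["ACGTAC", "acgtac"]        # hypothetical introduced
    print(compare_haplotypes(europe, north_america))
```

A subset relationship is consistent with an introduction but does not prove it, since limited sampling of either population can hide private haplotypes.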

Conclusion

In conclusion, COI and EF1-α are both, and preferably in combination, useful as barcoding markers for most Ectoedemia s. str. species, including some cryptic species with small genetic distances. In addition diagnostic base pair positions are helpful for identification in both genes. In species complexes a combination of both markers will usually identify the species, by distance methods or diagnostic basepairs. Whereas in the past thresholds for species delimitation were often proposed and used (Hebert et al., 2003a; DeSalle et al., 2005), this method has been highly criticised and is now only used as a first indication of the possible presence of cryptic species. For future studies we suggest using two different (independent) markers with comparable resolution, preferably from different genomes, in concord with the advice by Zakharov et al. (2009). This will make it possible to rule out artefacts and anomalies caused by one marker, and strengthen patterns when both markers show the same topology. It will also provide additional phylogenetic information when the correct methods are used. In Nepticulidae and even in Lepidoptera in general, EF1-α is a good candidate as a second marker.

Acknowledgements

René Glas and Bastian Reijnen (NCB Naturalis, Leiden) assisted with work in the molecular lab. The BOLD team at the Canadian Centre for DNA Barcoding (Guelph), particularly Rodolphe Rougerie, Megan Milton, Kara Layton and Claudia Steinke, are acknowledged for their assistance with the BOLD database and advice. Gerard van der Velde (Nijmegen, Netherlands) jointly supervised with EJvN the MSc thesis by CD that serves as basis for this paper. We are grateful to Barbara Gravendeel, Jan van Tol (both NCB Naturalis, Leiden), Felix Sperling (Edmonton, Canada) and two anonymous referees for comments on earlier versions of the manuscript.

Puplesis R, Robinson GS. 2000. A review of the Central and South American Nepticulidae (Lepidoptera) with special reference to Belize. Bulletin of the Natural History Museum London. Entomology 69: 3-114.

Scoble MJ. 1979. A new species of Ectoedemia Busck from Botswana with observations on its imaginal skeletal anatomy (Lepidoptera: Nepticulidae). Annals of the Transvaal Museum 32: 35-54. [available at: http://hdl.handle.net/10499/AJ1828].

Song H, Buhay JE, Whiting MF, Crandall KA. 2008. Many species in one: DNA barcoding overestimates the number of species when nuclear mitochondrial pseudogenes are coamplified. Proceedings of the National Academy of Sciences of the United States of America 105: 13486-13491. doi: 10.1073/pnas.0803076105.

Wilkinson C. 1981. A supplement to the genus Ectoedemia Busck (Nepticulidae: Lepidoptera) in North America, dealing with some difficult species and also some new ones. Tijdschrift voor Entomologie 124: 93-110. http://biodiversitylibrary.org/page/28244986.

Appendix

List of Ectoedemia species studied, with complete nomenclature, hostplants and distribution data. For both COI and EF1-α, mean and maximum intraspecific K2P distances are provided as well as the distance to the nearest neighbour (K2P model) and the name of the nearest neighbour (species). Non-applicable values are given as a dash. Darker colours indicate values that are below (Max Intra-sp) or above (Distance to NN) the barcode threshold of 2%.

S3. All sequenced samples used with identification, Sample ID (= voucher registry), BOLD processed, GenBank codes, GBIF portal URL and details on Stage, Country, province, collection, note and number of traces and images present. Further images will be posted to this site at a later date. All details can be consulted on the BOLD site (http://www.barcodinglife.com/) under the public project ‘Nepticulidae - Ectoedemia – Public records [NEPEC]’.