Research Catalysis: Data-Driven Analysis of Lepidopteran Diversity and Associations

In crafting the LepNet narrative, we have emphasized arguments related to the overall mission of the ADBC program (Sections 1 & 3), where the need to digitize major insect/herbivore lineages is disproportionately urgent, and in particular our community’s substantive prior efforts (e.g., SCAN) and infrastructural preparedness (technology, personnel, practices) that demonstrate the feasibility of LepNet’s ambitious digitization and outreach objectives (Sections 4-7, 9). Because LepNet is fundamentally lineage-based, we do not position our TCN goals mostly, or even exclusively, in relation to a limited set of specific research hypotheses. Nevertheless, by providing unprecedented access to voucher-based lepidopteran data, LepNet will be uniquely positioned to facilitate derivative research that advances our understanding of the wide-ranging factors influencing the evolution and ecology of Lepidoptera and plant-herbivore interactions. Specifically, LepNet-based data will facilitate comparative studies of phenological change [14, 66], conservation ecology [12, 67], and biogeography [68]. These data will also document responses to climate change [69] and habitat destruction [70]. Such analyses will benefit from broad and deep data structures across this megadiverse and biologically impactful order. They will therefore allow for comparisons within and among taxonomic and ecological subgroups, across regions, and over time scales ranging from the mid-1800s to the present. LepNet’s extensive georeferenced and imaged information environment, totaling more than 3 million records, will allow for subsequent high-throughput ecological niche modeling (Figure 5), quantification of phenotypic variation both within and among taxa and regions, and morphological character extraction for phylogenetic analyses, taxon descriptions, and phenotype annotations in this highly visible group [71].

We have established a research working group within LepNet, which includes numerous scientists and collaborative projects with world-class expertise and publication records in lepidopteran biology and diversity [17, 41, 72-77]. We will extend this membership to engage a range of ecologists and evolutionary biologists in developing and following through with specific hypothesis-driven studies. To this end, we will offer virtual seminars to discuss research topics and host a workshop at the beginning of the fourth year, with a subset of LepNet collaborators and invited evolutionary ecologists, biogeographers, and informatics experts, to finalize further research applications of LepNet data.

Our digitization priorities (Section 5) directly respond to the need to translate LepNet’s outcomes into novel, data-driven inferences (Figure 5). Taxa prioritized for digitization are not just “on hand” but span the evolutionary and ecological diversity of the entire order, from rare and highly specialized groups (e.g., Eriocraniidae) to nocturnal pollinators (Sphingidae) and densely documented butterfly species. Thus, diverse research questions that can materialize based on LepNet’s data include the evolution of host-plant use [72, 74, 78, 79] and niche conservatism, hybridization [19], color pattern evolution [80] across large phylogenetic scales [81], and identification of cryptic species and mimicry complexes [20, 82-86]. LepNet will both catalyze such question-driven analyses and motivate future specimen-based research to address newly evident gaps, for instance in the context of correlating species’ responses to recent environmental and land-use change [87].

The taxonomic imbalance in our national digitization legacy is dramatic, and there are reasons to expect that data from organisms with unique and important biological features will influence and even change existing paradigms. In comparison to insect herbivores, most plant species have large numbers of occurrence records; e.g., iDigBio and GBIF serve 6.7 and 12.1 million plant records, respectively, for North America, and the Forest Inventory & Analysis program provides 39 million occurrences for U.S. tree species. No comparable herbivore datasets have been assembled, and these are needed to link herbivore occurrence data with the corresponding knowledge of host plants [30]. The LepNet lead informatics team will pursue full inter-portal integration as a longer-term research facilitation strategy (e.g., through an NSF-ABI proposal planned for 2016).

In summary, LepNet will provide access to more than 3 million “research ready” specimen records that include species-level identifications and reliable latitude-longitude coordinates, and more than 255,000 specimen habitus images [88]. Only 49% of the 300,000 lepidopteran records currently in SCAN have both species identifications and geographic coordinates, underscoring the challenge of producing fully research-ready data for this lineage at this scale. In combination with observation data from other portals and existing specimen data, we will substantially increase the number of species that can be used for ecological niche modeling (Figure 5) [89]. By 2020, we expect to have at least 5,000 species, each with more than 30 occurrences.
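
To make the two criteria above concrete, the following is a minimal sketch of how a downloaded record set could be screened for research-ready records and for species that exceed the 30-occurrence threshold. It assumes records are plain dictionaries keyed by the Darwin Core terms scientificName, decimalLatitude, and decimalLongitude with numeric coordinates; the function names and the "binomial name" test for a species-level identification are illustrative assumptions, not part of LepNet or Symbiota.

```python
from collections import Counter


def is_research_ready(record):
    """Return True if a record has a species-level name and valid coordinates.

    Assumption: a name with at least two words (genus + epithet) is treated
    as a species-level identification. This is a crude illustrative proxy.
    """
    name = (record.get("scientificName") or "").strip()
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    if len(name.split()) < 2:
        return False
    if lat is None or lon is None:
        return False
    return -90 <= lat <= 90 and -180 <= lon <= 180


def modelable_species(records, min_occurrences=30):
    """Return (fraction of research-ready records, species above the threshold).

    A species qualifies only if it has MORE than `min_occurrences`
    research-ready records, matching the "> 30 occurrences" criterion.
    """
    ready = [r for r in records if is_research_ready(r)]
    counts = Counter(r["scientificName"].strip() for r in ready)
    species = sorted(s for s, n in counts.items() if n > min_occurrences)
    fraction = len(ready) / len(records) if records else 0.0
    return fraction, species
```

Applied to a portal export, the first return value corresponds to the kind of percentage quoted above for SCAN, and the second is the list of species that would be candidates for niche modeling.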

LepNet (via Symbiota), either natively or through standard-compliant and API-facilitated data output options, has the capacity to produce custom data sets for research projects that emerge from LepNet. For instance, we can model the distributions of 1,000 species per day [90, 91], as well as their climate niche space (see NAU facilities). All lepidopteran data on iDigBio and GBIF (including observations) will also be available via the portal, allowing researchers to take advantage of the platform’s visualization and data manipulation tools. The GIS-based habitat suitability maps and estimates of climate niche space will help visualize precise species distributions [92] and aid in selecting species of interest for further research [93]. We will provide overlap statistics for multiple species [94], including host plant species. We will also use modeling output to create ecological filters, including geographic range and flight times, that LepSnap can use to distinguish similar-looking species (Figure 5, inset). Researchers can also use other modeling programs interchangeably (e.g., LifeMapper, BIOMOD2). Because a major focus of the recently funded ButterflyNet project is modeling butterfly species ranges and niche space, we will provide a data pipeline between LepNet and ButterflyNet and focus on custom moth LepNet datasets for ecological niche modeling. LepNet co-PIs Kawahara and Pierce are also co-PIs on ButterflyNet.
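
As an illustration of what an overlap statistic between modeled distributions can look like, the sketch below computes Schoener's D, one widely used niche-overlap index, for pairs of suitability grids. The choice of Schoener's D is an assumption made for this example; the text does not specify which statistic will be provided. The function names and the flat-list grid representation are likewise hypothetical. Each grid is normalized to sum to 1, and D = 1 - 0.5 * sum(|p_a - p_b|) over cells, so D ranges from 0 (no overlap) to 1 (identical distributions).

```python
from itertools import combinations


def schoener_d(suit_a, suit_b):
    """Schoener's D between two suitability grids given as equal-length lists.

    Both grids must cover the same cells in the same order and contain
    non-negative suitability scores with a positive total.
    """
    if len(suit_a) != len(suit_b):
        raise ValueError("grids must cover the same cells")
    total_a, total_b = sum(suit_a), sum(suit_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each grid needs a positive total suitability")
    # Normalize each grid to a probability surface, then compare cell by cell.
    diff = sum(abs(a / total_a - b / total_b) for a, b in zip(suit_a, suit_b))
    return 1.0 - 0.5 * diff


def pairwise_overlap(suitability):
    """Map each unordered pair of names to its Schoener's D.

    `suitability` is a dict of name -> suitability grid (hypothetical input).
    """
    names = sorted(suitability)
    return {
        (a, b): schoener_d(suitability[a], suitability[b])
        for a, b in combinations(names, 2)
    }
```

The same function applies unchanged to a herbivore and its host plant, provided both suitability grids are defined over the same set of cells.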

Figure 5. Number of species that can be used for ecological niche modeling. We expect that at least 5,000 species will have more than 30 occurrences by 2020 (i.e., sufficient for modeling). The inset shows the distributions of two allopatric moths, one diurnal and one nocturnal, which might hybridize in response to climate change.