Current Knowledge Of Lepidoptera Genomes And Future Directions

Science often advances through the development of new technologies. Since the sequencing of the first human genome, DNA sequencing technologies have progressed considerably. We can now sequence complete genomes at relatively low cost, and much of the analysis can be done within a small research group. As a result, genomes are being sequenced across many taxonomic groups and research in genetics is quickly moving to a genomic scale. Studies that once relied on a few genetic markers now use data from complete genomes, an approach that expands the scope of scientific questions that can be addressed.

To highlight the current state of genome sequencing within the Arthropoda, the journal Current Opinion in Insect Science published an issue dedicated to reviews of selected insect taxa. This series of articles focused on available genome sequences and the future work needed to accelerate the use of genomic technologies in entomological research. One review covered research within the Lepidoptera, the insect order comprising butterflies and moths. It not only reviews the current state of Lepidoptera genome sequencing but also emphasizes future challenges, including suggestions for storing and distributing genomic data to the arthropod research community.

The Lepidoptera (butterflies and moths) is one of the most ecologically diverse insect orders, with more than 157,000 species described in 43 superfamilies. Most Lepidoptera belong to the taxonomic grouping Ditrysia, which contains approximately 98% of the described species (Figure 1). Their genomes are relatively small (~200–800 Mb, at most about one quarter the size of the human genome) and lack structural complexity. However, genomes have been sequenced and assembled for fewer than 80 species in fewer than 10 superfamilies, most of them from butterfly families and a few from moth families (Figure 1).

Figure 1. Phylogeny of Lepidoptera showing relationships among the major superfamilies and the number of assembled genomes (modified from Mitter et al. [4]). Orange highlights indicate superfamilies with at least one genome with a functional gene annotation; yellow indicates superfamilies with only a single genome and no functional annotation. The graph on the upper left shows the number of annotated genomes published per year since 2008. Republished with permission from Elsevier from https://doi.org/10.1016/j.cois.2017.12.004.

The growth of genome sequencing has led to larger phylogenomic datasets, but because many Lepidoptera families still lack complete genome assemblies, truly robust datasets cannot yet be compiled. Similarly, the function of many genes, particularly among insects, remains untested, though novel gene editing technologies are emerging quickly. Community support for Lepidoptera genomics is growing, with better management and dissemination of data. The field would still benefit from more consistent database standardization and from additional genome sequences that are more evenly distributed throughout the group.

One central repository for Lepidoptera genomes is lepbase, which provides associated assembly statistics and gene annotations [1]. Platforms such as the i5k Workspace@NAL [2] host arthropod genomes and provide analytical assistance for users with limited bioinformatic experience. A number of other valuable databases are available, and users often need to search multiple sources to find a genome assembly of interest. A further complication is that sequencing projects can run in parallel without the researchers being aware of related work. To avoid potential conflicts, it is recommended that members of lepbase or i5k be informed of genome sequencing projects so that the community stays updated.

As for long-term data storage, it is good practice to archive completed or draft genome assemblies with the National Center for Biotechnology Information (NCBI) to ensure that the data are screened and assigned an accession number for reference. It can be difficult to determine when a genome is “complete”, and several versions of a single species’ genome can be released at different draft stages, which often makes comparisons difficult. With an assigned accession number, if improvements are made to a released genome, users can archive different versions of the same genome sequence and ensure that downstream analyses are completed on a standardized set of genomic data.
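To make the point about versions concrete, here is a minimal Python sketch of how an analysis script might record exactly which assembly version it used. It assumes the versioned accession style NCBI uses for assemblies (a GCA or GCF prefix, nine digits, then a dot and a version number). The accession strings in the usage note are made-up placeholders, not real Lepidoptera assemblies, and the helper names are hypothetical.

```python
import re
from typing import Dict, Iterable, NamedTuple, Tuple

# Assumption: assembly accessions look like "GCA_000000000.1"
# (GCA = GenBank, GCF = RefSeq, nine digits, then ".version").
ACCESSION_RE = re.compile(r"^(GC[AF])_(\d{9})\.(\d+)$")


class AssemblyAccession(NamedTuple):
    prefix: str   # "GCA" or "GCF"
    number: str   # nine-digit identifier shared by all versions
    version: int  # increases when an updated assembly is released


def parse_accession(text: str) -> AssemblyAccession:
    """Split a versioned accession into prefix, number, and version."""
    match = ACCESSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a versioned assembly accession: {text!r}")
    prefix, number, version = match.groups()
    return AssemblyAccession(prefix, number, int(version))


def format_accession(acc: AssemblyAccession) -> str:
    """Rebuild the accession string, version included."""
    return f"{acc.prefix}_{acc.number}.{acc.version}"


def latest_versions(
    accessions: Iterable[str],
) -> Dict[Tuple[str, str], AssemblyAccession]:
    """Keep the highest version seen for each (prefix, number) pair."""
    latest: Dict[Tuple[str, str], AssemblyAccession] = {}
    for text in accessions:
        acc = parse_accession(text)
        key = (acc.prefix, acc.number)
        if key not in latest or acc.version > latest[key].version:
            latest[key] = acc
    return latest
```

Given the placeholder list `["GCA_000000001.1", "GCA_000000001.3", "GCA_000000002.1"]`, `latest_versions` keeps version 3 of the first assembly and version 1 of the second. Writing the full string, version included, into a methods section or analysis log is what allows two studies to confirm they worked from the same release.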

The first Lepidoptera genome sequenced was that of the domesticated silkworm, Bombyx mori [3], a model species important for commercial silk production. Most Lepidoptera genomes have been sequenced within the past 5 years (Figure 1), and the number continues to grow as sequencing costs decrease and sequencing technologies improve. Broader sampling across major phylogenetic lineages is needed for the field of Lepidoptera genomics to move forward. Moreover, scientists should continue to make genomes publicly available, along with metadata that describe the assembly process and note any limitations, so that the genomes can be used more efficiently.
