Scientists created a cheap, accurate way to identify insects (and wrote a great big data explainer in the process)

Researchers from the University of California, Riverside, have developed a method for classifying insects that they say blows away all previous methods in terms of accuracy, speed and practicality. The keys to their success: off-the-shelf laser pointers and a thorough understanding of how to use big data.

The paper, which is available for download here, reads like a how-to guide for applied big data, even for people unfamiliar with the mathematical and statistical concepts involved. The authors explain how they collected their data, why more data matters and how it helped improve the accuracy of their model. They clearly describe the type of model they used (a Bayesian classifier), as well as the effects of adding or removing features to aid in classification, and how it compares with other approaches in terms of both performance and flexibility.
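To make the idea of a Bayesian classifier and of "adding a feature" concrete, here is a minimal sketch in Python. It is not the authors' code or data: the two species, the wingbeat frequencies, the activity hours and the Gaussian naive Bayes formulation are all assumptions chosen for illustration, and the hour of day is treated as an ordinary number rather than as a point on a 24-hour clock.

```python
import math
import random

def fit(rows, labels):
    """Estimate a prior and a per-feature mean/variance for each class."""
    model = {}
    for cls in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == cls]
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[cls] = (len(members) / len(rows), stats)
    return model

def predict(model, row):
    """Return the class with the highest log posterior (naive Bayes rule)."""
    def log_posterior(cls):
        prior, stats = model[cls]
        score = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

def accuracy(model, rows, labels):
    hits = sum(predict(model, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def make_insects(n):
    """Synthetic (wingbeat Hz, hour of activity) pairs; all numbers are made up."""
    rows, labels = [], []
    for _ in range(n):
        label = rng.choice(["species_a", "species_b"])
        if label == "species_a":
            rows.append((rng.gauss(400.0, 40.0), rng.gauss(6.0, 2.0)))
        else:
            rows.append((rng.gauss(450.0, 40.0), rng.gauss(20.0, 2.0)))
        labels.append(label)
    return rows, labels

train_rows, train_labels = make_insects(400)
test_rows, test_labels = make_insects(400)

def freq_only(rows):
    """Keep only the first feature (wingbeat frequency)."""
    return [(r[0],) for r in rows]

acc_freq = accuracy(fit(freq_only(train_rows), train_labels),
                    freq_only(test_rows), test_labels)
acc_both = accuracy(fit(train_rows, train_labels), test_rows, test_labels)

print(f"frequency only:   {acc_freq:.2f}")
print(f"frequency + hour: {acc_both:.2f}")
```

Because the two made-up species overlap heavily in wingbeat frequency but are active at different hours, the second feature separates them far better than frequency alone. Whether a given feature helps this much on real insects depends entirely on the data.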

Of course, the research — which focused largely on mosquitoes and flies — is potentially very useful, too. Here’s the short version.

Decades of previous research into insect classification, the authors explained, have relied on microphones to capture the sounds insects make when they fly by. Unfortunately, microphones capture so much ambient noise that unless an insect flies within the ideal distance of the microphone under ideal conditions, it can be difficult to capture useful data. Small datasets, often gathered under very unnatural conditions in order to maximize data collection, can result in predictive models that prove less accurate once they’re applied to new data that wasn’t part of the study (a problem generally referred to as overfitting).
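Overfitting is easy to reproduce on a toy scale. The sketch below is an illustration only, not something taken from the paper: it uses synthetic one-dimensional "wingbeat frequency" readings for two invented classes and a nearest-neighbor rule (chosen here because it memorizes its training set), then compares accuracy on the small training set with accuracy on fresh data.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

def sample(n):
    """Synthetic (frequency, label) readings for two made-up, overlapping classes."""
    data = []
    for _ in range(n):
        label = rng.choice(["A", "B"])
        center = 400.0 if label == "A" else 450.0
        data.append((rng.gauss(center, 40.0), label))
    return data

def nearest_label(train, x):
    """Predict the label of the single closest training reading."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def accuracy(train, data):
    hits = sum(nearest_label(train, x) == y for x, y in data)
    return hits / len(data)

small_train = sample(20)   # a deliberately tiny study
fresh = sample(1000)       # new data that was not part of the study

train_acc = accuracy(small_train, small_train)
fresh_acc = accuracy(small_train, fresh)

print(f"accuracy on the data it was built from: {train_acc:.2f}")
print(f"accuracy on new data:                   {fresh_acc:.2f}")
```

The model looks perfect on the 20 readings it memorized, since each reading is its own nearest neighbor, and does noticeably worse on readings it has never seen. That gap is the symptom the researchers were trying to avoid by collecting far more data.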