
Case Western Reserve University researchers have developed statistical techniques that improve the chances of detecting a signal in large data sets. Beyond searching for the "needle in the haystack" in particle physics, the techniques have potential applications in discovering new galaxies, monitoring transactions for fraud and security risks, identifying the carrier of a virulent disease among millions of people, and detecting cancerous tissue in mammograms.


Case faculty members Ramani Pilla and Catherine Loader in statistics and Cyrus Taylor in physics report their findings in the article "A New Technique for Finding Needles in Haystacks: A Geometric Approach to Distinguishing between a New Source and Random Fluctuations," published December 2 in the journal Physical Review Letters.

"As haystacks of information grow ever larger--and the needles ever smaller--the search for a signal becomes increasingly difficult to find using traditional approaches. There is a need for sophisticated new statistical methods," the researchers report.

Researchers working with large amounts of data face the fundamental problem of distinguishing a real signal from random variation in the data. In many practical problems, a suspected signal may be only a small blip against a noisy experimental background.

The Case team developed a technique built on the principle of comparing a set of summary characteristics for any subregion of the observations with the background variation. Using these characteristics, the method searches for small regions that differ significantly from the background, by an amount that cannot plausibly be attributed to random chance.
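The idea of comparing subregions against the background can be illustrated with a generic sliding-window scan over binned counts. This is a minimal sketch, not the Case team's published method: it assumes Poisson counts per bin, a known (positive) background expectation per bin, and a simple likelihood-ratio score for an excess. The function names `poisson_excess_llr` and `scan_for_excess` are hypothetical.

```python
import math


def poisson_excess_llr(observed, expected):
    """Poisson log-likelihood-ratio score for an excess over background.

    Returns 0 when the window shows no excess, so only upward
    fluctuations count as candidate signals. Assumes expected > 0.
    """
    if observed <= expected:
        return 0.0
    return 2.0 * (observed * math.log(observed / expected) - (observed - expected))


def scan_for_excess(counts, background, max_width):
    """Slide windows of width 1..max_width over binned data.

    counts: observed events per bin.
    background: expected events per bin under the background-only model.
    Returns (largest score, start index of that window, window width).
    """
    best = (0.0, None, None)
    n_bins = len(counts)
    for width in range(1, max_width + 1):
        for start in range(n_bins - width + 1):
            obs = sum(counts[start:start + width])
            exp = sum(background[start:start + width])
            score = poisson_excess_llr(obs, exp)
            if score > best[0]:
                best = (score, start, width)
    return best


if __name__ == "__main__":
    background = [10.0] * 50          # hypothetical flat background
    counts = [10] * 50
    counts[20:23] = [25, 25, 25]      # injected bump in bins 20-22
    score, start, width = scan_for_excess(counts, background, max_width=5)
    print(f"max score {score:.2f} at bins {start}..{start + width - 1}")
    # prints: max score 47.44 at bins 20..22
```

The largest score alone does not establish a discovery: how surprising it is depends on how many windows were scanned, which is the critical cut-off problem the researchers describe.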

"Methods used in high-energy particle physics problems traditionally have searched for any departure from a background model; that is, anything that is not a haystack," said Pilla, the project leader. "Our method efficiently incorporates information about the type of disorder expected, thereby enabling us to find the signal of interest more accurately."

At the core of the breakthrough is the idea of posing the problem within a "hypothesis-based testing" paradigm to detect statistical disorder in the data. The method further exploits the flexibility of a long-established geometric formula to create a technique that significantly improves the ability to distinguish a signal.

The researchers said the challenge is twofold: defining efficient test statistics and determining the critical cut-off, the threshold that separates random variation from a genuine signal. Because the detection problem involves a large number of comparisons, the researchers caution that experimentalists must take care not to mistake random variation for a discovery.
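When a search makes many comparisons, the critical cut-off can be calibrated against the largest statistic found in background-only data rather than against a single test; this controls the experiment-wise error rate. A minimal Monte Carlo sketch, assuming independent standard-normal z-scores for each comparison (real detector data are usually correlated, and `max_statistic_cutoff` is a hypothetical name):

```python
import random
from statistics import NormalDist


def max_statistic_cutoff(n_tests, alpha=0.05, n_sims=10000, seed=1):
    """Monte Carlo cutoff for the largest of n_tests background-only z-scores.

    Each simulated experiment contains no signal. Taking the (1 - alpha)
    quantile of the per-experiment maxima gives a cutoff such that a
    signal-free experiment produces a false discovery with probability
    about alpha.
    """
    rng = random.Random(seed)
    maxima = sorted(
        max(rng.gauss(0.0, 1.0) for _ in range(n_tests)) for _ in range(n_sims)
    )
    index = min(int((1.0 - alpha) * n_sims), n_sims - 1)
    return maxima[index]


if __name__ == "__main__":
    n_tests = 100  # hypothetical number of comparisons (bins, windows, ...)
    single = NormalDist().inv_cdf(0.95)  # appropriate for one test only
    experiment_wise = max_statistic_cutoff(n_tests)
    # For independent tests, P(max <= c) = Phi(c) ** n_tests gives an exact check.
    exact = NormalDist().inv_cdf(0.95 ** (1.0 / n_tests))
    # Chance of at least one false alarm if the single-test cutoff were used.
    false_alarm = 1.0 - 0.95 ** n_tests
    print(f"single-test cutoff:     {single:.2f}")
    print(f"experiment-wise cutoff: {experiment_wise:.2f} (exact {exact:.2f})")
    print(f"false-alarm rate at single-test cutoff: {false_alarm:.3f}")
```

With 100 independent comparisons, the single-test 5% cutoff of about 1.64 would raise at least one false alarm in roughly 99.4% of signal-free experiments, while the experiment-wise cutoff sits near 3.28.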

"The experimenter wants to control the experiment-wise error rate: if there is nothing in the data, then there must be minimal probability of falsely discovering a signal. On the other hand, we want to maximize our chance of discovering any real signal that may be present in the massive data set," said Loader.

"The probabilistic problem associated with this scenario is reduced to one of finding the areas of certain regions on the surface of high-dimensional spheres," explains Pilla.

The Case researchers then applied the geometric methods pioneered in 1939 by Harold Hotelling and Hermann Weyl. To test the techniques, they used computer-simulated particle physics experiments that mimic real collider experiments, and demonstrated that the new technique significantly increased detection probabilities.
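The reduction to areas on high-dimensional spheres can be seen in the simplest case Hotelling treated. Suppose a one-parameter family of normalized signal templates traces a curve of length L on the unit sphere in R^n, and the data contain only Gaussian noise, so the normalized data vector U is uniformly distributed on that sphere. The probability that the best-matching template correlates with the data above w is then the fraction of the sphere covered by a "tube" around the curve, which Hotelling's formula gives as L/(2*pi) * (1 - w^2)^((n-2)/2) plus an end-cap term. The sketch below evaluates that formula and checks it by simulation for a great-circle arc. It is a textbook illustration under these assumptions, not the authors' multi-parameter implementation, and all function names are hypothetical.

```python
import math
import random


def cap_probability(n, w, steps=2000):
    """P(<U, e> >= w) for U uniform on the unit sphere in R^n (n >= 3).

    <U, e> has density c * (1 - x^2)^((n - 3) / 2) on [-1, 1]; the tail
    is integrated numerically with Simpson's rule (steps must be even).
    """
    c = math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2)) / math.sqrt(math.pi)
    h = (1.0 - w) / steps
    total = 0.0
    for i in range(steps + 1):
        x = w + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * max(0.0, 1.0 - x * x) ** ((n - 3) / 2)
    return c * total * h / 3.0


def hotelling_tail(n, length, w):
    """Hotelling's tube formula for P(max_t <U, gamma(t)> >= w).

    gamma is a curve of the given length on the unit sphere in R^n. The
    result is exact for a great-circle arc shorter than pi; for a general
    smooth curve it is exact only while the tube does not overlap itself,
    i.e. for w close enough to 1.
    """
    if n < 3 or not 0.0 < w < 1.0 or length <= 0.0:
        raise ValueError("sketch assumes n >= 3, 0 < w < 1, length > 0")
    tube_body = length / (2 * math.pi) * (1.0 - w * w) ** ((n - 2) / 2)
    end_caps = cap_probability(n, w)  # two half-caps, one at each end
    return tube_body + end_caps


def simulate_tail(n, length, w, n_sims=100000, seed=2):
    """Monte Carlo check for the arc gamma(t) = cos(t) e1 + sin(t) e2, 0 <= t <= length."""
    if not 0.0 < length < math.pi:
        raise ValueError("arc length must lie in (0, pi)")
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(v * v for v in z))
        a, b = z[0] / norm, z[1] / norm  # first two coordinates of U = z / |z|
        phi = math.atan2(b, a)
        # <U, gamma(t)> = r cos(t - phi): it peaks at r if phi lies on the arc,
        # otherwise the maximum over the arc is at one of its endpoints.
        if 0.0 <= phi <= length:
            best = math.hypot(a, b)
        else:
            best = max(a, a * math.cos(length) + b * math.sin(length))
        hits += best >= w
    return hits / n_sims


if __name__ == "__main__":
    n, length, w = 5, 1.0, 0.6  # illustrative values
    print(f"tube formula {hotelling_tail(n, length, w):.4f}")  # prints 0.1855
    print(f"simulation   {simulate_tail(n, length, w):.4f}")
```

The appeal of the geometric route is that the tail probability of the maximum over a continuum of templates comes from a closed-form area rather than from millions of simulated background-only experiments.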

"In high-energy particle physics and astrophysics problems, chi-square goodness-of-fit tests are widely employed, although they have relatively low power to detect the signal," notes Taylor. "Through my collaborative work with Professors Pilla and Loader, we will be able to develop powerful statistical tests for detecting a signal from noisy data with high probability, a fundamental problem encountered in many scientific disciplines."

Taylor added that "conducting experiments in a particle collider may cost tens of millions of dollars. Improving efficiency in the analysis of experimental results can lead to enormous cost savings. Furthermore, we can obtain the same results with much smaller experiments, or effectively find much smaller departures from the background model."
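Taylor's point about the low power of chi-square goodness-of-fit tests can be checked by simulation. This sketch assumes 100 bins of standardized Gaussian residuals with a small bump in three bins, and compares an omnibus chi-square test (cutoff from the Wilson-Hilferty approximation) with a test that projects the data onto the known signal shape. The targeted test here is told where the bump is; when the location is unknown, one must take the maximum over many candidate positions and raise the cutoff accordingly, which is where the Hotelling-Weyl geometry comes in. Parameter values and function names are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def chi_square_cutoff(df, alpha=0.05):
    """Wilson-Hilferty approximation to the (1 - alpha) quantile of chi-square(df)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    k = 2.0 / (9.0 * df)
    return df * (1.0 - k + z * math.sqrt(k)) ** 3


def compare_power(n_bins=100, signal_bins=(40, 41, 42), amplitude=1.5,
                  alpha=0.05, n_sims=4000, seed=3):
    """Estimate the power of two tests on the same simulated experiments.

    Each bin holds a standardized residual z_i ~ N(mu_i, 1), with
    mu_i = amplitude in signal_bins and 0 elsewhere. The chi-square test
    uses sum(z_i^2); the targeted test projects z onto the signal shape.
    Returns (chi-square power, targeted power).
    """
    rng = random.Random(seed)
    chi_cut = chi_square_cutoff(n_bins, alpha)
    targeted_cut = NormalDist().inv_cdf(1.0 - alpha)
    shape_norm = math.sqrt(len(signal_bins))  # norm of the 0/1 signal template
    chi_hits = targeted_hits = 0
    for _ in range(n_sims):
        z = [rng.gauss(0.0, 1.0) for _ in range(n_bins)]
        for i in signal_bins:
            z[i] += amplitude
        chi_hits += sum(v * v for v in z) > chi_cut
        # Projection onto the template is N(0, 1) under the background-only model.
        targeted_hits += sum(z[i] for i in signal_bins) / shape_norm > targeted_cut
    return chi_hits / n_sims, targeted_hits / n_sims


if __name__ == "__main__":
    chi_power, targeted_power = compare_power()
    print(f"chi-square power: {chi_power:.2f}")
    print(f"targeted power:   {targeted_power:.2f}")
```

With these settings the targeted test has a theoretical power of about 0.83, that is 1 - Phi(1.645 - 1.5*sqrt(3)), while the chi-square test detects the bump far less often, because it spreads its sensitivity over every possible departure from the background.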

"Detecting a real signal (the needle) present in random and chaotic data (the haystack) will lead to scientific success," conclude the researchers.

Case Western Reserve University is among the nation's leading research institutions. Founded in 1826 and shaped by the unique merger of the Case Institute of Technology and Western Reserve University, Case is distinguished by its strengths in education, research, service, and experiential learning. Located in Cleveland, Case offers nationally recognized programs in the Arts and Sciences, Dental Medicine, Engineering, Law, Management, Medicine, Nursing, and Social Sciences. http://www.case.edu.
