Lab taps universe to test 'data mining'

Pretend you wanted to study the universe's bent-double galaxies. First, you would have to accomplish the not-too-small task of finding those galaxies by sifting through some 32,000 images, each containing 2 million pixels.

You had best get started tonight.

Or on Monday morning, you could call on Chandrika Kamath, who carries a reputation around Lawrence Livermore National Laboratory as something of a patent czar: three patents pending and three in the process of being filed.

Those patents flow from her work in data mining, the art of extracting useful information from, typically, mountains of computer data.

Kamath oversees the Sapphire Project, a 4-year-old program to research data mining algorithms, develop software for use by the Lab and find outside applications for the software.

She's applied Sapphire's power to the galaxy hunt and pointed astronomers to about 15,000 galaxies with bent-double features, of which about 2,500 are likely bent-double galaxies. That saves scientists with the Faint Images of the Radio Sky at Twenty-cm, or FIRST, astronomical survey the eyeball-numbing task of searching through 1 million galaxies.

The FIRST application lasted two years and was Sapphire's first real-world test. Kamath continues to refine Sapphire, gaining insights from each new application of the technology.

"It is tailored to scientific data. The focus is on large data sets," she said, explaining that part of what makes Sapphire unique is the amount of data it can sift through (terabytes, or a thousand billion bytes) and its ability to handle extremely complex data.

"We are end-to-end," she said, explaining that other data mining systems may handle only the pattern recognition, after the data has been pre-processed and the features to be searched for have been identified.

Kamath has turned Sapphire from looking at the universe to looking at Earth: specifically, the planet's temperature. Her team is looking at global warming and trying to isolate the effects of volcanic eruptions and El Niño from long-term changes in the Earth's climate that can be attributed to other factors.

In the commercial realm, Kamath said, the data mining technology could be used to detect credit card fraud, and in customer relationship management applications to spot customers who are likely to stop using a product or service. It also has applications related to medical imaging.

Though there is commercial potential, Kamath points out that data mining in the scientific world has unique issues. For example, the need for a high degree of accuracy and precision in scientific work is greater than in commercial data mining. Also, scientific data is often in the form of images, making it difficult to extract features that are the foundation of the data mining. One of the key aspects of Kamath's work is Sapphire's pre-processing of data through what is called dimension reduction, a process to reduce the number of features used to identify an object.
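For readers curious what dimension reduction looks like in practice, here is a minimal sketch in Python. It uses principal component analysis (PCA), one common technique; the article does not say which method Sapphire uses, so treat the choice of PCA, the function name reduce_dimensions, and the synthetic data as illustrative assumptions rather than a description of the Lab's software.

```python
import numpy as np


def reduce_dimensions(features, n_components):
    """Project each row of `features` onto its top principal components.

    Hypothetical helper for illustration only; not part of Sapphire.
    Returns the reduced data and the fraction of variance each kept
    component explains.
    """
    X = np.asarray(features, dtype=float)
    centered = X - X.mean(axis=0)  # PCA works on mean-centered data
    # Rows of `vt` are the principal directions, ordered by importance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    reduced = centered @ components.T
    variance_share = singular_values**2 / np.sum(singular_values**2)
    return reduced, variance_share[:n_components]


# Synthetic stand-in for extracted image features (assumption): 200 objects
# described by 10 features that are really driven by 2 hidden factors,
# plus a little noise.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
features = hidden @ mixing + 0.01 * rng.normal(size=(200, 10))

reduced, explained = reduce_dimensions(features, n_components=2)
print(reduced.shape)    # each object is now described by 2 numbers
print(explained.sum())  # share of the original variance that was kept
```

In this toy case, two derived features capture nearly all of the variation in the original ten, which is the sense in which dimension reduction cuts the number of features needed to identify an object.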

Kamath hopes to reach a point where the software can be licensed, though she says it would have to be in conjunction with consulting work because the technology must be customized for any given task.

Sapphire's technology is not being used commercially today, but, Kamath said, a couple of people expressed interest in the software following a presentation at a Tri-Valley technology seminar in May.