Hip Hop Offers Lessons on Life Science Data Integration

By Salvatore Salamone

Feb 15, 2006 | Hip-hop artists often combine sections of several songs to create a new piece of music. The technique is known as a mashup, since it mashes together disparate sounds from different sources into one recording.

A similar mashup technique is now getting the attention of scientists as a way to quickly bring together disparate informatics, biological, chemical, and imaging information when conducting research.

The idea behind mashups is simple: use information that is available on the Web or in company databases and some relatively simple programming techniques to combine data so that they potentially offer more insight into a problem than when the data are kept or viewed separately.
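As a rough illustration of that idea, the sketch below joins records from two sources on a field they share. The function name `mash_up`, the field names, and the sample records are all hypothetical and stand in for whatever a real Web site or company database would return.

```python
def mash_up(primary, secondary, key):
    """Join two lists of dict records on a shared key field."""
    # Index the secondary source by the shared key for quick lookup.
    index = {}
    for record in secondary:
        index.setdefault(record[key], []).append(record)

    combined = []
    for record in primary:
        for match in index.get(record[key], []):
            merged = dict(record)   # copy so the input records stay untouched
            merged.update(match)    # add the fields from the second source
            combined.append(merged)
    return combined


# Hypothetical records standing in for two separate data sources.
sequence_hits = [
    {"species": "Gallus gallus", "sequence_count": 3},
    {"species": "Anas platyrhynchos", "sequence_count": 1},
]
image_hits = [
    {"species": "Gallus gallus", "image": "chicken.jpg"},
]

combined = mash_up(sequence_hits, image_hits, "species")
print(combined)
```

Only records present in both sources appear in the result; a real mashup might instead keep unmatched records and leave the missing fields blank.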

The idea of aggregating data in this way is not new. What is drawing attention to mashups these days is that public databases are increasingly making their contents available in formats that are easier to aggregate. At the same time, some programming aids and utilities are making it easier for nontechnical people to pull these data together.

Over the past six months, mashups have been getting a lot of publicity, mainly because of Google, which offers an API (application programming interface) that makes it relatively easy to overlay geographical data on a map. Essentially, with Google Maps, each data point is displayed as a virtual stickpin on a map.

A July 2005 BusinessWeek article noted that many people were using Google Maps mashups to pull together data as varied as real-estate listings and neighborhood crime statistics.

This technique was seized upon last November, when the most recent list of the world’s most powerful supercomputers was announced at the SC05 conference in Seattle (see Supercomputing Show, Dec./Jan. 2006 Bio•IT World, page 36). At that time, Top500.org published its traditional top 500 list, but the group also created an interactive map displaying the locations of the world’s 100 most powerful computer systems. Moving a cursor over a stickpin on the map produces a bubble with information about that particular computer installation.

The ability to display data in this manner has many life science applications. A January article in Nature* noted that this technique could be used to track the progression of an infectious disease or study global health and disease patterns. To emphasize this point, Nature created its own mashup tracking avian-flu outbreaks by combining information from the World Health Organization and the U.N. Food and Agriculture Organization into a Google map.
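A minimal sketch of the data-preparation step behind such a map follows. It does not use the Google Maps API; instead it merges reports from two sources into GeoJSON, a generic format for map markers that many mapping tools can read. The function name `to_stickpins`, the field names, and every record are placeholders made up for illustration, not real outbreak data.

```python
import json


def to_stickpins(records):
    """Turn located records into a GeoJSON FeatureCollection of point markers."""
    features = []
    for rec in records:
        features.append({
            "type": "Feature",
            # GeoJSON lists longitude first, then latitude.
            "geometry": {"type": "Point", "coordinates": [rec["lon"], rec["lat"]]},
            # The properties hold the text a map would show in the pop-up bubble.
            "properties": {
                "title": rec["place"],
                "description": f'{rec["source"]}: {rec["note"]}',
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Placeholder reports from two hypothetical sources.
source_a_reports = [
    {"place": "Site A", "lat": 10.0, "lon": 20.0, "source": "Source A", "note": "placeholder report"},
]
source_b_reports = [
    {"place": "Site B", "lat": -5.5, "lon": 30.25, "source": "Source B", "note": "placeholder report"},
]

collection = to_stickpins(source_a_reports + source_b_reports)
print(json.dumps(collection, indent=2))
```

Each feature carries its own source label, so a viewer can tell which organization a given stickpin came from.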

The article also noted that mashups are not limited to aggregating geographical data onto maps: the data in many life science databases, such as GenBank, are easily accessible and could be combined with other information.

An example cited was of the mashup iSpecies.org. Upon entering a species into what looks like a regular query search line, the mashup returns a page with NCBI genomics information, Yahoo images of the species, and articles culled from Google Scholar.

Bright Future

A limiting factor to using mashups has been that much of the data in public databases is not machine-readable. Typically, a person has to manually cut and paste data from a Web site for it to be used by another application. This approach will not work with a mashup.

Some sites are addressing this problem (and not just for the sake of mashups) by enhancing the way data are accessed. For example, many sites are moving from traditional command line interfaces and onscreen queries to exposing a site’s data to applications via a Web services interface.
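To show why a Web services interface matters, the sketch below parses a small XML response into records a program can use directly, with no cutting and pasting. The XML layout, the element names, and the `parse_records` helper are assumptions made for illustration; they do not reflect any particular site's actual service.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body, as a Web service might return it.
SAMPLE_RESPONSE = """<results>
  <record id="R1"><organism>Gallus gallus</organism><length>1200</length></record>
  <record id="R2"><organism>Anas platyrhynchos</organism><length>950</length></record>
</results>"""


def parse_records(xml_text):
    """Convert the XML response into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    records = []
    for node in root.findall("record"):
        records.append({
            "id": node.get("id"),
            "organism": node.findtext("organism"),
            "length": int(node.findtext("length")),
        })
    return records


records = parse_records(SAMPLE_RESPONSE)
for record in records:
    print(record["id"], record["organism"], record["length"])
```

In practice the response text would come from an HTTP request to the service rather than from a string in the program.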

Another approach that would greatly expand the amount of data available for mashups and other applications would be to use Semantic Web technology such as RDF (Resource Description Framework). Sites that publish their data in RDF format make those data computer-readable. This makes the data easier to find, search, save, and access, and as such makes it easier to incorporate them into a mashup or other application.
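RDF describes data as subject-predicate-object statements, often called triples. The sketch below models that idea with plain Python tuples and a simple pattern-matching function; it is not a real RDF library, and the `example.org` identifiers, the predicate names, and the `match` helper are all hypothetical.

```python
EX = "http://example.org/"  # placeholder namespace, not a real vocabulary

# Each statement is a (subject, predicate, object) triple.
triples = [
    (EX + "seq/1", EX + "fromSpecies", EX + "species/gallus_gallus"),
    (EX + "seq/2", EX + "fromSpecies", EX + "species/gallus_gallus"),
    (EX + "species/gallus_gallus", EX + "commonName", "chicken"),
]


def match(statements, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in statements
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# Find every sequence linked to the species, then look up its common name.
species = EX + "species/gallus_gallus"
sequences = [t[0] for t in match(triples, predicate=EX + "fromSpecies", obj=species)]
names = [t[2] for t in match(triples, subject=species, predicate=EX + "commonName")]
print(sequences, names)
```

Because every statement has the same three-part shape, triples published by different sites can be pooled into one list and queried together, which is what makes the format convenient for mashups.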

The combination of new tools such as the Google Maps API and increased adoption of Web services and Semantic Web technology will give researchers new ways to view and aggregate their data in the coming year.