GUS/Strategies-WDK

Managing biomedical big data so that it can be accessed, visualized, and mined intuitively is an ongoing challenge. Using the GUS/Strategies-WDK system, our group has developed and deployed web resources that support functional genomics data for diverse user communities, including the NIAID EuPathDB Bioinformatics Resource Center (http://eupathdb.org), the NIDDK Beta Cell Biology Consortium Genomics Resource (http://genomics.betacell.org), and the NIA NIAGADS Genomics database (www.niagads.org/genomics). Although it has so far been used primarily for functional genomics datasets, the system generalizes beyond omics data, for example to clinical records.
Our philosophy for building a web resource

Identify relevant data types for the target community.

Define the information that describes each such data type.

Formulate meaningful data-specific questions.

Facilitate discovery and insight through an intuitive interface.

Correspondingly:
In our system

Developers define records of interest.

Developers specify each record’s attributes and how to display them.

For each record type, developers specify a list of parameterizable questions.

Users build complex queries (strategies) in a graphical interface.
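
The four points above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Strategies-WDK API: the class names `RecordType` and `Question`, the `ask` method, and the in-memory `GENES` table are all hypothetical stand-ins for a record definition, a parameterizable question, and a database query.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Toy in-memory data standing in for a database table (illustrative only).
GENES = {
    "g1": {"symbol": "INS", "chromosome": "11"},
    "g2": {"symbol": "GCG", "chromosome": "2"},
    "g3": {"symbol": "APOE", "chromosome": "19"},
}


@dataclass
class Question:
    """A named, parameterizable search that returns a set of record IDs."""
    name: str
    search: Callable[..., Set[str]]


@dataclass
class RecordType:
    """A record of interest, the attributes to display, and its questions."""
    name: str
    attributes: List[str]
    questions: Dict[str, Question] = field(default_factory=dict)

    def add_question(self, question: Question) -> None:
        self.questions[question.name] = question

    def ask(self, question_name: str, **params: str) -> Set[str]:
        # Look up the question by name and run it with the given parameters.
        return self.questions[question_name].search(**params)


def genes_by_chromosome(chromosome: str) -> Set[str]:
    """Hypothetical question: IDs of all genes on the given chromosome."""
    return {gid for gid, rec in GENES.items() if rec["chromosome"] == chromosome}


# A developer defines the record type, its attributes, and its questions.
gene = RecordType(name="gene", attributes=["symbol", "chromosome"])
gene.add_question(Question("by_chromosome", genes_by_chromosome))

# A user then runs a question with a chosen parameter value.
result = gene.ask("by_chromosome", chromosome="11")
```

Here `result` holds the IDs of the matching records. The attribute list on the record type tells the interface which fields to show for each ID.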

Using the GUS/Strategies-WDK System
We have started a major effort to update how the GUS/Strategies-WDK system is provided and documented. Ultimately, we plan to consolidate access to the code, issue trackers, wiki, and documentation in a single GitHub project. In the meantime, we will use this page to post updates on current status, pointers to the old documentation and code repository, and links to new resources.
We are planning to provide:

A virtual machine preloaded with a GUS4 database in PostgreSQL and a template Strategies-WDK web site linked to it.

A plug-in to load NCBI Entrez genes into GUS4 along with documentation on how to use it for loading human genes.

A plug-in to load the Gene Ontology (GO) along with documentation on how to use it and how to obtain the files. We will also provide a plug-in and documentation to load GO associations.

A plug-in to load an annotated genome along with documentation on how to use it for loading the human genome.

This list will be extended and the virtual machine updated to include preloaded genes, GO, and genome.

GUS
GUS (Genomics Unified Schema) is a relational database schema that has been deployed in Oracle and PostgreSQL. The schema is modular and has been updated (GUS4) to better cover investigational studies, results from high (and not so high) throughput technologies, the technology used, biological sequences, metadata and associated standards, pathways and networks, and data control. In particular, deep phenotyping (e.g., epidemiological results) can be easily captured and interpreted through associations to ontology terms. Objectives, study design, protocols, and results are all linked in a manner consistent with established standards (MAGE-TAB, Ontology for Biomedical Investigations).

The GUS source is maintained in a Subversion-powered repository. You may check the code out directly using a command such as:
svn checkout https://www.cbil.upenn.edu/svn/gus/GusSchema/branches/4.0/ GusSchema

Some GUS installation documentation can be found at http://www.gusdb.org/documentation.php. We plan to replace it with updated documentation, either at this site or on GitHub.

Strategies-WDK
The data in GUS can be mined using the Strategies-WDK (Fischer et al., Database 2011), which provides a workspace for generating, combining, saving, and sharing “strategies.” Strategies are graphical workflows of database searches of record types (such as genes, SNPs, studies) that can be selected (favorites, baskets), combined, and transformed (SNPs to genes, genes to pathways, pathways to chemical compounds). This system has proved highly popular and successful in enabling sophisticated data mining by a diverse group of end users. It has recently been updated to support browsing of extensive metadata (e.g., clinical epidemiology variables), inspired by the Harvest data discovery platform (Pennington et al., JAMIA 2014). Also new is an analysis tab for the records returned by Strategy searches. For example, a list of returned genes can be analyzed for Gene Ontology or KEGG pathway enrichment without leaving the Strategies workspace page. Future plans include refactoring the combined GUS/Strategies-WDK system so that the package is easier for other projects and communities to install and customize.
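
As a rough illustration of what combining and transforming search results means, the sketch below treats each search result as a set of record IDs. It assumes that combining two searches amounts to a set operation and that a transform follows a mapping from one record type to another. The IDs, the `GENE_TO_PATHWAYS` mapping, and the `transform` helper are hypothetical and are not part of the Strategies-WDK code.

```python
from typing import Dict, Set

# Results of two hypothetical gene searches, as sets of record IDs.
genes_from_expression_search: Set[str] = {"g1", "g2", "g3"}
genes_from_snp_search: Set[str] = {"g2", "g3", "g4"}

# Hypothetical mapping used to transform genes into pathways.
GENE_TO_PATHWAYS: Dict[str, Set[str]] = {
    "g2": {"p1"},
    "g3": {"p1", "p2"},
    "g4": {"p3"},
}


def transform(ids: Set[str], mapping: Dict[str, Set[str]]) -> Set[str]:
    """Map each input ID to its related records and pool the results."""
    related: Set[str] = set()
    for record_id in ids:
        related |= mapping.get(record_id, set())
    return related


# Step 1: start from the first search.
step1 = genes_from_expression_search
# Step 2: combine with the second search (here, keep genes found by both).
step2 = step1 & genes_from_snp_search
# Step 3: transform the surviving genes into pathways.
step3 = transform(step2, GENE_TO_PATHWAYS)
```

Each step takes the previous step's result as input, which is what lets a strategy be drawn as a left-to-right workflow. Replacing `&` with `|` or `-` gives the union or difference of two searches instead.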