Sockeye Version 1.0 User’s Guide

This document describes some of the advanced details of Sockeye’s design and usage. For "How-to" documentation, FAQs, and the latest version of Sockeye’s help documents, visit the Sockeye Documentation page at http://www.bcgsc.ca/gc/bomge/sockeye/docs.

Importing custom annotation

For users who need to visualize information that is not part of the genomes or annotations offered by EnsEMBL, Sockeye can import custom annotations from the local file system. Sockeye currently supports GFF (version 2) files. Users can preview a summary of their data before importing it, and can import GFF data either into an existing EnsEMBL sequence track or into an entirely new sequence track.

The architectural design for importing user annotation allows for the rapid expansion of Sockeye to other file formats. Expanding Sockeye to other file formats requires a developer to specify the newly supported file extension and write a method that extracts TrackFeature objects from the file’s annotation format.
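As an illustration of the two steps described above, the following is a minimal sketch of what such an extension point might look like. The interface name AnnotationParser, the class Gff2Parser, and the fields given to TrackFeature here are assumptions made for this example; Sockeye's actual TrackFeature class and method signatures are not shown in this guide and may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Sockeye's TrackFeature; the real class may differ.
final class TrackFeature {
    final String seqName;
    final String type;
    final int start;
    final int end;
    final char strand;

    TrackFeature(String seqName, String type, int start, int end, char strand) {
        this.seqName = seqName;
        this.type = type;
        this.start = start;
        this.end = end;
        this.strand = strand;
    }
}

// Hypothetical extension point: one implementation per supported file format.
interface AnnotationParser {
    String fileExtension();                        // the file extension this parser handles
    List<TrackFeature> parse(List<String> lines);  // extract features from the file's lines
}

// Sketch of a GFF version 2 parser: eight mandatory tab-separated
// columns, optionally followed by an attributes column.
final class Gff2Parser implements AnnotationParser {
    @Override
    public String fileExtension() {
        return "gff";
    }

    @Override
    public List<TrackFeature> parse(List<String> lines) {
        List<TrackFeature> features = new ArrayList<>();
        for (String line : lines) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comment lines
            }
            String[] cols = line.split("\t");
            if (cols.length < 8) {
                continue; // not a complete GFF record
            }
            // Columns: seqname, source, feature, start, end, score, strand, frame
            char strand = cols[6].isEmpty() ? '.' : cols[6].charAt(0);
            features.add(new TrackFeature(
                    cols[0], cols[2],
                    Integer.parseInt(cols[3]), Integer.parseInt(cols[4]),
                    strand));
        }
        return features;
    }
}
```

Under this sketch, supporting another format would mean writing one more AnnotationParser implementation that returns its own extension and feature list.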

Navigation

Using the navigation toolbar, a user can zoom in and out of regions on a sequence track to either expand their view of particular sequence characteristics or to reduce it. This mechanism allows a user of Sockeye to move easily from whole chromosomal views to single nucleotide views. Additionally, the user has the option of moving their view along a sequence track in either the 3’ or 5’ direction. These features allow the user to explore a chromosome or sequence contig in the context of previously loaded data, without having to enter new sequence coordinates.

Navigation in Sockeye is also designed to be feature-centric. A user can select an annotation on the sequence track and center it in the 3D viewport. This is extended to allow a user to visually align annotation from various sequence tracks.

Sockeye’s navigation mechanisms have been designed in such a way that a new query is performed only if necessary.
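This guide does not describe how Sockeye decides whether a new query is needed. One plausible approach, sketched below under that caveat, is to remember the coordinate range already loaded for a track and query only when the requested view extends beyond it. The class LoadedRegion and its methods are hypothetical names introduced for this example.

```java
// Hypothetical sketch: remembers the coordinate range already loaded for a track.
final class LoadedRegion {
    private int start; // inclusive
    private int end;   // inclusive

    LoadedRegion(int start, int end) {
        this.start = start;
        this.end = end;
    }

    // True only if the requested view extends beyond the data already in memory.
    boolean needsQuery(int viewStart, int viewEnd) {
        return viewStart < start || viewEnd > end;
    }

    // Grow the loaded range once a query has fetched the missing data.
    void extendTo(int viewStart, int viewEnd) {
        start = Math.min(start, viewStart);
        end = Math.max(end, viewEnd);
    }
}
```

With a region loaded from 1000 to 5000, zooming to 2000-3000 would need no query, while panning to 4000-6000 would.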

3D Genomic Annotation

Each type of genomic annotation imported into Sockeye is displayed in 3D by mapping its name to a stored VRML (Virtual Reality Modelling Language) file. We use VRML to specify the 3D models for individual annotation objects; however, we restrict the use of environment-modifying commands, which would adversely affect the 3D viewport (Web3D Consortium). This design allows a user to import new annotations with their own custom 3D models. This flexibility has allowed us to import new annotations without having to program new 3D models directly. Instead, these models are handwritten in VRML or generated with any simple 3D modelling program that supports VRML.

Sockeye comes with a model directory that contains pre-built VRML objects like rectangles, cones, spheres, and cylinders. This has allowed us to rapidly visualize the display characteristics of new annotations before designing more complicated models to represent them.

In some situations, sequence tracks become saturated with annotation. This is predominantly the case when a user imports an annotation file containing thousands of entries. Sockeye allows a user to handle this by using a 3D distribution object. This object shows how the scores associated with a particular type of annotation change along the sequence. To display the abundance of a certain type of prediction, a user must create GFF tables containing feature count information. We have observed that using this object to display 3000 annotations requires only 1/25 of the memory needed to view the same annotations as individual objects.
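The feature count information mentioned above can be derived by binning feature positions along the sequence. The following is a minimal sketch of that calculation only; the class and method names are hypothetical, positions are assumed to be 1-based, and writing the resulting counts into a GFF table is left out.

```java
// Hypothetical sketch: summarise feature positions as per-bin counts.
final class FeatureDistribution {
    // Returns counts[i] = number of positions that fall in the i-th bin of
    // width binSize along a sequence of the given length (1-based positions).
    static int[] binCounts(int[] positions, int sequenceLength, int binSize) {
        int bins = (sequenceLength + binSize - 1) / binSize; // ceiling division
        int[] counts = new int[bins];
        for (int pos : positions) {
            if (pos < 1 || pos > sequenceLength) {
                continue; // ignore positions outside the sequence
            }
            counts[(pos - 1) / binSize]++;
        }
        return counts;
    }
}
```

Each count could then be stored as the score of one GFF record spanning the corresponding bin, so that a track needs one object per bin rather than one per annotation.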

Common annotations extracted from EnsEMBL are frequently endowed with functionality in Sockeye that will aid a user’s exploratory analysis. As mentioned, genes with multiple transcripts are marked with spheres above the first exon. Genes also have the embedded functionality of being able to extract their known orthologues to new sequence tracks. These orthologues are derived from reciprocal BLAST analysis and served from the EnsEMBL compara database. This feature aids a user who wants to perform cross-species comparisons of specific genes. All additional functionality embedded in an annotation is available by right-clicking on the object in the viewport.

Extracting genomic sequence

Sockeye allows the user to extract primary sequence information. To retrieve sequence from any EnsEMBL sequence track, region, or annotation, a user must right-click on it in the 3D viewport and select "Retrieve Sequence". Alternatively, this action can be performed from the sequence track descriptions in the sequence track selection tree. A sequence panel appears that allows the user to view the primary sequence. The user can export the entire sequence or a selected region of the sequence to a FASTA file.
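For readers unfamiliar with the FASTA format, the sketch below shows how a sequence could be formatted as a FASTA record: a header line beginning with ">" followed by the sequence wrapped to a fixed line width. The class name FastaWriter and the header text used with it are illustrative assumptions, not Sockeye's actual export code.

```java
// Hypothetical sketch: format a sequence as a single FASTA record.
final class FastaWriter {
    // Builds ">header" followed by the sequence wrapped at lineWidth characters.
    static String toFasta(String header, String sequence, int lineWidth) {
        StringBuilder sb = new StringBuilder();
        sb.append('>').append(header).append('\n');
        for (int i = 0; i < sequence.length(); i += lineWidth) {
            int stop = Math.min(i + lineWidth, sequence.length());
            sb.append(sequence, i, stop).append('\n');
        }
        return sb.toString();
    }
}
```

Exporting a selected region rather than the entire sequence would amount to passing the corresponding substring to the same method.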

Configuring Sockeye with XML

Sockeye is configurable through XML start-up files. In these start-up files, we use XML to specify everything from annotation information to database connection parameters. The most frequent use of Sockeye's configuration files is to add new information specifying a genomic annotation. Our XML specification for features allows a user to map an annotation name (like "gene") to a VRML model and a colour. The user can also specify how thick the model should be on the track, how close it should be to the centre of the track, its transparency setting, and whether it resides on the track or floats above. This allows a user to rapidly define the visual characteristics of new data without having to generate new Java code. Furthermore, our XML feature specification is also used to dynamically create the feature selection tree in Sockeye. This allows a user to arbitrarily specify groups of related annotation objects and, without any coding, have them appear in the GUI. In Sockeye, we use this feature to group certain types of annotation objects into classes like "Variation," which contain the SNP selection boxes.
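To make the feature specification more concrete, the sketch below parses a small XML fragment that maps annotation names to VRML models and groups them into classes. The element and attribute names (features, group, feature, name, model, colour, transparency) and the model file names are invented for this example; Sockeye's real configuration schema is not reproduced in this guide and may differ.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class FeatureConfig {
    // Hypothetical configuration fragment; not Sockeye's actual schema.
    static final String EXAMPLE =
          "<features>"
        + "<group name='Variation'>"
        + "<feature name='snp' model='sphere.wrl' colour='1 0 0' transparency='0.0'/>"
        + "</group>"
        + "<group name='Genes'>"
        + "<feature name='gene' model='cylinder.wrl' colour='0 0 1' transparency='0.5'/>"
        + "</group>"
        + "</features>";

    // Returns a map from annotation name to VRML model file name.
    static Map<String, String> modelsByName(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("feature");
            Map<String, String> models = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element feature = (Element) nodes.item(i);
                models.put(feature.getAttribute("name"), feature.getAttribute("model"));
            }
            return models;
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid configuration XML", e);
        }
    }
}
```

In a layout like this, adding a new annotation type would only require another feature element, and the enclosing group elements could drive the feature selection tree.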

Our XML specification also holds database and website connectivity information. All EnsEMBL database servers that are offered to a user have their connection parameters defined in XML. A user is able to add a new EnsEMBL server by adding a new connection tag in the XML configuration file. Each external web linkage offered in Sockeye is also defined in XML. When a new pop-up tag is defined, the user will be able to connect to that web resource for the specified annotation objects.

A predominant goal of Sockeye development has been to allow the user to control their 3D environment and the 3D characteristics of the new data they would like to add to their installation. XML configuration files give the user this control and reduce the dependency on code modification when adding custom annotation.

Saving and Loading Sessions

When analyzing complex sets of data from divergent sources, it is onerous to repeatedly request and assemble the information. Sockeye provides facilities for saving and loading data once it has been aggregated. We use XML to store all currently loaded annotation and its state of visibility. The advantage of this is that the user needs to set up their data only once and save it. This allows the user to review the state of their last session regardless of whether they are still connected to the original sources of the data. Furthermore, this allows users of Sockeye to share their data and viewing arrangements.