Adding a Tool to GenomeSpace (version beta 1.3)

Introduction

GenomeSpace is an environment that supports the use and management of genomic analysis tools (see http://www.genomespace.org for full details). Its primary functions are to support the launching of tools, to provision data to the tools, and to transform the data files to the formats required by the different tools. The tools themselves are independent from GenomeSpace. This document describes the specification and process for adding a new tool into the GenomeSpace environment.

Types of Tools

GenomeSpace can include numerous different types of tools including web-based applications and desktop applications. If you want to add command-line tools to GenomeSpace, we recommend doing so by adding them to GenePattern (as modules) or Galaxy (as tools).

Levels of Integration

GenomeSpace can include tools at several different levels of integration. The amount of effort to integrate the tools increases with the level of integration, as does the level of service provided by GenomeSpace to the tools.

Launch

Basic Launch

GenomeSpace can launch the tool from the GenomeSpace User Interface or from other GenomeSpace tools. Once launched, the tool is independent of GenomeSpace and has no connection back to it.

Parameterized Launch

The tool's analysis features and associated parameters are available to GenomeSpace. This enables GenomeSpace to launch the tool with an analysis feature selected and a URL for a data file specified. For example, GenomeSpace can launch a web browser on GenePattern in such a way as to select a particular analysis module and provide that module with a URL to a GenomeSpace data file in an appropriate analysis parameter input field.

Data Transformation

The tool's acceptable file formats are available to GenomeSpace. GenomeSpace verifies that the file format of a selected data file is appropriate for the tool selected. If the format is not correct, GenomeSpace searches a database of transformation scripts and transforms the data file to a tool-appropriate format before the download begins.
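The check-then-transform behavior described above can be pictured with a small sketch. Everything named here is hypothetical: the format labels, the TRANSFORMS table standing in for the database of transformation scripts, and the function names are illustrative only and do not describe the actual GenomeSpace implementation.

```python
import csv
import io
from typing import Callable, Dict, Set, Tuple


def csv_to_tsv(text: str) -> str:
    """Toy transformation script: rewrite comma-separated text as tab-separated."""
    rows = csv.reader(io.StringIO(text))
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical stand-in for the database of transformation scripts,
# keyed by (source format, target format).
TRANSFORMS: Dict[Tuple[str, str], Callable[[str], str]] = {
    ("csv", "tsv"): csv_to_tsv,
}


def prepare_for_tool(text: str, file_format: str, accepted: Set[str]) -> Tuple[str, str]:
    """Return (format, content) in a form the tool accepts, transforming if needed."""
    # The file is already in an acceptable format: pass it through unchanged.
    if file_format in accepted:
        return file_format, text
    # Otherwise look for a transformation into any accepted format.
    for target in sorted(accepted):
        transform = TRANSFORMS.get((file_format, target))
        if transform is not None:
            return target, transform(text)
    raise ValueError(
        f"no transformation from {file_format!r} to any of {sorted(accepted)}"
    )
```

In this sketch a file that is already acceptable is passed through untouched, and a file with no known transformation is rejected before any download would start.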

Authentication

No Authentication

The tool either does not require any authentication of the user who is running it, or the tool has its own internal authentication mechanism that is not connected to GenomeSpace authentication in any way.

Shared GS Login

The tool allows a user to authenticate to GenomeSpace and retains the user’s GenomeSpace authentication token. This allows all interactions with GenomeSpace to be done in the context of one GenomeSpace session. The tool may (or may not) also maintain a separate login to its own internal authentication scheme.

GenomeSpace Single Sign-on

The tool integrates the GenomeSpace authentication with its own internal authentication system, allowing single sign-on between tools in the GenomeSpace system. For example, if a user logs into GenomeSpace, that authentication token can be used by a GenomeSpace tool to authenticate the user for that tool. Currently, this works essentially as "double sign-on": one token for web-based applications and another token for desktop applications.

Data Upload

No Upload

The tool cannot send a local data file or data from memory back to GenomeSpace.

Upload Data to the Data Manager

The tool can send local data to the GenomeSpace Data Manager (DM). This allows users to save datasets they are working on to their GenomeSpace folders so that they can access them again with that or another tool (with Parameterized Launch).

Send Data to Other Tool

The tool can launch other GenomeSpace tools and send them data. This is the same as Parameterized Launch, except that a GenomeSpace tool, rather than GenomeSpace itself, is the launcher.

How to add GenomeSpace support to my tool

The following sections describe the steps necessary to modify your tool to add GenomeSpace support for each of the levels of integration (above). The sections for "No Authentication" and "No Upload" are omitted as they are entirely within the purview of the tool itself and there is nothing special to be done for GenomeSpace for these at this time.

Launch

Basic Launch

All GenomeSpace tool launches are performed using HTTP URLs. This means your tool needs to be launch-able via a web URL that you can put into a web browser.

For web applications (e.g., GenePattern, Galaxy, the UCSC Genome Browser) this requires no additional work, as the URL for the web application will suffice.

For desktop applications (e.g., Cytoscape, Genomica, IGV) a URL to launch the application must be created. For Java applications, the simplest way to do this is to use Java Web Start (http://www.oracle.com/technetwork/java/javase/overview-137531.html). Java Web Start is a protocol that allows the launching of Java applications over the web. Please refer to the URL (above) for details, but to summarize, the steps required are to:

Create a .jnlp file for your application. JNLP stands for Java Network Launch Protocol. It is a small file that defines the name of your application, the main class, and the list of all libraries (.jar files) it needs to run.

Place the .jnlp file and .jar files on a web server that is publicly accessible. You can test on any web server you can reach, but the web server must be visible to the Internet in order to be included in GenomeSpace.
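As an illustration of the first step, the following sketch generates a minimal .jnlp file. The element names (jnlp, information, resources, jar, application-desc) are standard JNLP, but the codebase URL, application name, vendor, main class, .jar names, and Java version shown in the usage are placeholders. A real deployment may need further elements, such as a security section for signed applications.

```python
import xml.etree.ElementTree as ET
from typing import List


def build_jnlp(codebase: str, jnlp_name: str, title: str, vendor: str,
               main_class: str, jars: List[str]) -> str:
    """Return the text of a minimal JNLP descriptor for a Java application."""
    root = ET.Element("jnlp", {"spec": "1.0+", "codebase": codebase, "href": jnlp_name})

    # Name of the application and who provides it.
    info = ET.SubElement(root, "information")
    ET.SubElement(info, "title").text = title
    ET.SubElement(info, "vendor").text = vendor

    # Java version and the libraries (.jar files) needed to run.
    resources = ET.SubElement(root, "resources")
    ET.SubElement(resources, "j2se", {"version": "1.6+"})  # placeholder version
    for index, jar in enumerate(jars):
        attrs = {"href": jar}
        if index == 0:
            attrs["main"] = "true"  # first jar is assumed to hold the main class
        ET.SubElement(resources, "jar", attrs)

    # Entry point of the application.
    ET.SubElement(root, "application-desc", {"main-class": main_class})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_jnlp("http://example.org/mytool", "mytool.jnlp", "My Tool",
                     "Example Lab", "org.example.Main",
                     ["mytool.jar", "lib/util.jar"]))
```

The generated text would be saved as the .jnlp file and placed next to the .jar files on the web server.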

For non-Java desktop applications, or Java applications that use custom classloaders that make launching via Java Web Start problematic, we suggest the use of a Web Start helper application.

The basic idea of using a JNLP helper application to launch tools that are not Web Start-enabled via GenomeSpace is:

If your tool is installed but not running, the helper uses a command line to start your tool and then uses JRAC to tell it to open a file.

If your tool is not installed, the helper opens the user's default browser on the web page with installation instructions for your tool.

This approach avoids classloader issues because the helper, rather than your tool, is started via JNLP. It is, however, up to the user to install your tool using your existing installer.
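The helper's decision logic can be sketched as a small function that maps the state of the tool to the steps the helper takes. The action names are hypothetical labels, and the branch for a tool that is already running is an assumption (the text above only spells out the other two cases); how the helper actually detects the tool or talks to it is not covered here.

```python
from enum import Enum
from typing import List


class Action(Enum):
    START_TOOL = "start the tool from the command line"
    SEND_OPEN_REQUEST = "tell the running tool to open the file"
    SHOW_INSTALL_PAGE = "open the installation instructions in the browser"


def plan_helper_actions(installed: bool, running: bool) -> List[Action]:
    """Return the ordered steps a launch helper would take for a given tool state."""
    if running:
        # Assumption: an already-running tool only needs the open request.
        return [Action.SEND_OPEN_REQUEST]
    if installed:
        # Installed but not running: start it, then ask it to open the file.
        return [Action.START_TOOL, Action.SEND_OPEN_REQUEST]
    # Not installed: send the user to the installation instructions.
    return [Action.SHOW_INSTALL_PAGE]
```

Keeping the decision separate from the side effects (starting processes, opening a browser) makes the helper easier to test on machines where the tool is not installed.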

For further information or assistance please contact us at gs-help@broadinstitute.org and we will help you find a solution for making a web launch-able version of your tool.

Parameterized Launch

First, all of the requirements of the Basic Launch must be satisfied.

A parameterized launch both launches a tool and sends it parameters in the URL. These parameters can specify an analysis to pre-load and, optionally, a data file as a parameter for the analysis. For example, consider the following GenePattern URL, which launches GenePattern and provides two input files as parameters (note that your browser must be logged into the public GenePattern for this URL to work; registration is free):

GenomeSpace requires tools to provide HTTP GET URLs to perform the parameterized launches for both web and desktop applications.
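As a rough illustration of what such an HTTP GET URL might look like, the sketch below assembles one from an analysis name and a list of input file URLs. The query parameter names "analysis" and "input" and the example host are hypothetical; each tool defines its own parameter names.

```python
from typing import List
from urllib.parse import parse_qs, urlencode, urlsplit


def build_launch_url(base_url: str, analysis: str, input_files: List[str]) -> str:
    """Build a GET URL carrying an analysis name and one or more input file URLs."""
    # A repeated key carries multiple input files; urlencode escapes each value.
    query = [("analysis", analysis)] + [("input", url) for url in input_files]
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(query)


if __name__ == "__main__":
    url = build_launch_url(
        "http://example.org/mytool/launch",
        "MyAnalysis",
        ["http://example.org/data/first.gct", "http://example.org/data/second.cls"],
    )
    print(url)
    # The receiving tool can recover the parameters from the query string.
    print(parse_qs(urlsplit(url).query))
```

Percent-encoding the input file URLs matters because they themselves contain characters such as ":" and "/" that would otherwise be ambiguous inside a query string.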

Web applications should implement a URL that allows the analysis name and one (or more) input files to be used to launch the application.

Java desktop applications should implement a web site that acts as a dynamic JNLP file generator that takes the input parameters and then creates a custom JNLP file (as in Basic Launch) to launch the tool. An example of this approach is documented at http://www.broadinstitute.org/software/igv/ControlIGV. In addition, this requires that the application itself take the parameters for the analysis and input file URLs on the command line. The application must then also be able to download and open the input URLs provided.

For non-Java desktop applications, please contact us at gs-help@broadinstitute.org and we will help you find a solution for making a web launch-able version of your tool.
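For the Java desktop case, the dynamic JNLP generator can be sketched as a function that reads the launch parameters from the request's query string and writes them into the JNLP file as command-line arguments. The query parameter names ("analysis", "input"), the "--analysis" and "--input" argument flags, and the default title and vendor are hypothetical; the tool decides what its command line looks like. The href attribute is left off the jnlp element here on the assumption that the file is generated per request.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs


def jnlp_for_request(query_string: str, codebase: str, main_class: str,
                     main_jar: str, title: str = "My Tool",
                     vendor: str = "Example Lab") -> str:
    """Return a JNLP descriptor whose arguments come from the request's query string."""
    params = parse_qs(query_string)

    root = ET.Element("jnlp", {"spec": "1.0+", "codebase": codebase})
    info = ET.SubElement(root, "information")
    ET.SubElement(info, "title").text = title
    ET.SubElement(info, "vendor").text = vendor
    resources = ET.SubElement(root, "resources")
    ET.SubElement(resources, "j2se", {"version": "1.6+"})  # placeholder version
    ET.SubElement(resources, "jar", {"href": main_jar, "main": "true"})

    # Each <argument> element becomes one command-line argument of the application.
    app = ET.SubElement(root, "application-desc", {"main-class": main_class})
    for analysis in params.get("analysis", []):
        ET.SubElement(app, "argument").text = "--analysis"
        ET.SubElement(app, "argument").text = analysis
    for file_url in params.get("input", []):
        ET.SubElement(app, "argument").text = "--input"
        ET.SubElement(app, "argument").text = file_url
    return ET.tostring(root, encoding="unicode")
```

A web handler would return this text with the content type application/x-java-jnlp-file so that the browser hands it to Java Web Start. The application still has to parse these arguments and download the input URLs itself.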

Once a parameterized URL has been created, send the GenomeSpace development team the following information:

URL

Analysis name

Analysis description

Input file parameter name(s)

Input file parameter format(s)

Input file parameter description(s)

The GenomeSpace development team will test the URL and add the tool to the GenomeSpace Analysis and Tool Manager (ATM) catalog. Once it is added, it will become available in the GenomeSpace UI and GenomeSpace Client Development Kit (CDK).
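Before sending this information, it may help to collect it in one structured record and check that nothing is missing. The sketch below is only a checklist aid: the class and field names are hypothetical and do not correspond to any GenomeSpace submission format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InputFileParameter:
    name: str
    formats: List[str]
    description: str


@dataclass
class LaunchSubmission:
    url: str
    analysis_name: str
    analysis_description: str
    inputs: List[InputFileParameter]

    def missing_items(self) -> List[str]:
        """Return the names of the requested items that are still empty."""
        missing = []
        if not self.url:
            missing.append("URL")
        if not self.analysis_name:
            missing.append("Analysis name")
        if not self.analysis_description:
            missing.append("Analysis description")
        if not self.inputs:
            missing.append("Input file parameter name(s)")
        for param in self.inputs:
            label = param.name or "<unnamed>"
            if not param.name:
                missing.append("Input file parameter name(s)")
            if not param.formats:
                missing.append(f"Input file parameter format(s) for {label}")
            if not param.description:
                missing.append(f"Input file parameter description(s) for {label}")
        return missing
```

An empty list from missing_items() means every item in the list above has a value.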

Authentication

Shared GS Login

A shared GS login is required to allow a launched tool to communicate back to the GenomeSpace servers (e.g., to request additional files, launch other tools, etc.).

To log into GenomeSpace, web-based applications and desktop applications must authenticate against the GenomeSpace identity server. The tool must provide a user interface element that allows the user to indicate they want to log into GenomeSpace. Typically this would be the addition of a GenomeSpace menu or its equivalent, with a Login menu item.

For Java applications, the easiest method is to use the GenomeSpace Client Development Kit (CDK). Details for using the CDK to log in (and other features) can be found in the CDK Developers Guide. The CDK provides a basic login dialog (Java Swing) that can be used to log in. Tool developers may also opt to create a tool-specific login dialog and then use the login API in the GsSession object to log in.

For non-Java tools, use OpenID and the GenomeSpace OpenID Provider to log in. Details can be found in OpenId Integration.

Data Upload

Send Data to the GenomeSpace Data Manager

This feature allows a tool to send data that the tool is operating on to the GenomeSpace Data Manager (DM). Data in the DM can be used again later in Parameterized Launches or can undergo Data Transformation in order to be sent to another tool.

First, the tool needs to implement UI functionality that allows the user to specify that a data file is to be sent to GenomeSpace. Typically this would be the addition of a GenomeSpace menu or its equivalent, with a Send to GenomeSpace menu item.

Data must be sent to the DM in the form of a file (i.e., a collection of bytes with a defined start and end). One aspect of this that is unique for each tool is how to decide what file is to be sent. In the case of file-oriented tools (e.g., GenePattern, Galaxy) this is straightforward. For tools that allow interactive sub-setting of data (e.g., IGV, Genomica) it may make sense to send just the current subset or the entire data set. It is up to the tool developer to choose what is appropriate for their context.

Once the appropriate data have been selected, they should be written to a file and sent to GenomeSpace. For Java applications, the easiest way is to use the GenomeSpace Client Development Kit (CDK). Details for using the CDK to upload data to the DM (and other features) can be found in the CDK Developers Guide.

For non-Java tools, use the GenomeSpace REST API to upload data to the DM. Details can be found at: DM RESTful API.
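The step of turning the selected data into a file can be sketched independently of how the file is then uploaded. The example below assumes, purely for illustration, that the tool holds its data as a list of rows and that a tab-delimited file is a suitable format; the function name and parameters are hypothetical, and the upload itself (via the CDK or the DM RESTful API) is not shown.

```python
import csv
import os
import tempfile
from typing import List, Optional, Sequence


def write_selection_to_file(rows: Sequence[Sequence[str]],
                            selected: Optional[List[int]] = None,
                            suffix: str = ".tsv") -> str:
    """Write either the whole data set or the selected rows to a temporary file.

    Returns the path of the file, which is then ready to be sent to GenomeSpace.
    """
    # selected=None means "send the entire data set"; otherwise send the subset.
    chosen = rows if selected is None else [rows[i] for i in selected]
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(chosen)
    return path
```

The caller is responsible for deleting the temporary file once the upload has finished.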

Send Data to Other Tool

This feature allows a tool to send data that the tool is operating on to another GenomeSpace tool. The other GenomeSpace tool must have implemented Parameterized Launch.

As with sending data to the DM, the tool needs to implement UI functionality that allows the user to specify that data is to be sent to another tool. Typically this would be the addition of a GenomeSpace menu or its equivalent, with a Send to submenu item. This submenu would then list the tools (and possibly subanalyses) that are available for parameterized launch. The list of these tools is available from the GenomeSpace Analysis and Tool Manager (ATM).

As with sending data to the DM, it must be sent in the form of a file (i.e., a collection of bytes with a defined start and end). One aspect of this that is unique for each tool is how to decide what file is to be sent. In the case of file-oriented tools (e.g., GenePattern, Galaxy) this is straightforward. For tools that allow interactive sub-setting of data (e.g., IGV, Genomica) it may make sense to send just the current subset or the entire data set. It is up to the tool developer to choose what is appropriate for their context.

Once the appropriate data have been selected, they should be written as a file and sent to GenomeSpace. Then the tool needs to obtain, from the ATM, a parameterized launch URL for the target tool with the uploaded data. Finally, the tool should point the local browser to this URL. For Java applications, the easiest way is to use the GenomeSpace Client Development Kit (CDK), which includes a single method to perform all three steps in one call. Details for using the CDK to send data to other GenomeSpace tools (and other features) can be found in the CDK Developers Guide.

For non-Java tools, you can use the GenomeSpace REST API to perform these steps. Details can be found in the ATM RESTful API.
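The three steps (upload the file, obtain a parameterized launch URL, open the browser) can be sketched as one function. The upload and get_launch_url callables are hypothetical placeholders for whatever the tool uses to talk to the DM and the ATM; their signatures are assumptions made for this example and are not taken from the GenomeSpace REST API.

```python
import webbrowser
from typing import Callable


def send_to_tool(local_path: str,
                 tool_name: str,
                 upload: Callable[[str], str],
                 get_launch_url: Callable[[str, str], str],
                 open_url: Callable[[str], object] = webbrowser.open) -> str:
    """Upload a file, look up the launch URL for a target tool, and open it."""
    # Step 1: send the file to GenomeSpace; the placeholder returns its URL.
    file_url = upload(local_path)
    # Step 2: ask for a parameterized launch URL for the target tool and file.
    launch_url = get_launch_url(tool_name, file_url)
    # Step 3: point the local browser at the launch URL.
    open_url(launch_url)
    return launch_url
```

Passing the three operations in as callables keeps the sequence itself testable without a network connection or a browser.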

How to Make My Tool Available from the Public GenomeSpace UI

Once you have added GenomeSpace support to your tool, you can request that the GenomeSpace team add your tool to the GenomeSpace Analysis and Tool Manager (ATM). To do this, send the following to the GenomeSpace support team:

A one-to-three paragraph description of your tool for the GenomeSpace website.

A contact email address.

Links (as many as you have) to help documentation, forums, webcasts, etc. (for the GenomeSpace website).

An image of your logo: larger than 200x200 pixels with a transparent background in .png format (for the GenomeSpace UI to display).

One (or more) protocols/recipes describing an example GenomeSpace workflow using your tool and one or more existing GenomeSpace tools to illustrate to biologists how your tool can be used in the GenomeSpace environment (for the GenomeSpace website).

Upon receipt, your tool will be added to the ATM in the GenomeSpace test environment where our QA team, beta testers, and developers will validate the GenomeSpace functionality in your tool. Once it has been accepted, the tool will be promoted into the GenomeSpace production environment and the details you provide will be used to create a tool landing page on the GenomeSpace website, making the tool available to all GenomeSpace users.