API function overview

The Workflow Transformation API exposes a service that transforms workflows into research objects. Workflows are often complex data structures that embed data such as their sub-resources, annotations and provenance. The service described by this API creates a research object that exposes these data according to the RO model.

The service API allows a client to create a transformation job, which the client can subsequently monitor. The output of the transformation job lists the resources that were created from the workflow and aggregated in the research object.

API usage

Input:

t2flow workflow

RO identifier

ROSRS URI

OAuth access token

Output:

HTTP Status code: 200 OK or other

Service algorithm:

Extract the UUID from the workflow bundle, which is usually the same as the UUID of the main workflow.

Check if the RO exists in ROSRS:

If yes, do nothing. In the future, previous workflow versions may be deleted or preserved, together with their annotations.

If no, create a new one.

Upload the workflow bundle to ROSRS, with the UUID as the resource identifier.

Generate a wfdesc description of the workflow bundle (RDF graph). This describes all workflows inside the bundle, including relations between them.

Upload the wfdesc description as a workflow bundle annotation.

Generate a roevo description of the workflow bundle (RDF graph). This includes the chains of UUIDs of all workflows in the bundle.

Upload the roevo description as a workflow bundle annotation.
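The ordering of the steps above can be sketched in Java as follows. This is only an illustration: RoStore and Bundle are hypothetical interfaces invented for this sketch and are not part of ROSRS, scufl2 or the service's actual code.

```java
/** Hypothetical storage interface standing in for a ROSRS client (assumption, not a real API). */
interface RoStore {
    boolean roExists(String roId);
    void createRo(String roId);
    void uploadResource(String roId, String resourceId, byte[] content);
    void uploadAnnotation(String roId, String targetResourceId, String rdfGraph);
}

/** Hypothetical view of an already parsed workflow bundle (assumption). */
interface Bundle {
    String uuid();        // UUID of the bundle, usually that of the main workflow
    byte[] content();     // serialized workflow bundle
    String wfdescGraph(); // wfdesc description as an RDF graph
    String roevoGraph();  // roevo description as an RDF graph
}

final class TransformationSketch {
    /** Runs the steps in the order listed above and returns the resource identifier used. */
    static String transform(Bundle bundle, String roId, RoStore store) {
        String uuid = bundle.uuid();
        if (!store.roExists(roId)) {
            store.createRo(roId); // an existing RO is left untouched
        }
        // The bundle is stored with its UUID as the resource identifier.
        store.uploadResource(roId, uuid, bundle.content());
        // Both descriptions are attached as annotations of the bundle.
        store.uploadAnnotation(roId, uuid, bundle.wfdescGraph());
        store.uploadAnnotation(roId, uuid, bundle.roevoGraph());
        return uuid;
    }
}
```

Keeping the storage calls behind an interface makes it possible to exercise the ordering with an in-memory fake, without a running ROSRS instance.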

Create a conversion job: POST /jobs/

The client sends the transformation job parameters in a POST request, requesting that all resources be extracted to the specified folders.

The extract key and its subkeys are optional. If no extract is given, then only the main workflow is extracted, and it will be aggregated without being added to an RO folder.

How to create the conversion job in Java using org.apache.commons.httpclient.methods.PostMethod and Form data, here only extracting main workflow and scripts:
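A minimal sketch of such a request is shown here. It uses the JDK's java.net.http API (Java 11+) rather than commons-httpclient's PostMethod, so that it has no external dependencies. It assumes that "Form data" means an application/x-www-form-urlencoded body. The form field names are supplied by the caller, because the field names accepted by the service are not listed on this page.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

final class JobFormRequest {
    /** Encodes the fields as application/x-www-form-urlencoded, in map iteration order. */
    static String encodeForm(Map<String, String> fields) {
        return fields.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
    }

    /** Builds (but does not send) the POST request that creates a job. */
    static HttpRequest build(URI jobsUri, Map<String, String> fields) {
        return HttpRequest.newBuilder(jobsUri)
            // Assumption: the service accepts URL-encoded form bodies.
            .header("Content-Type", "application/x-www-form-urlencoded")
            .POST(HttpRequest.BodyPublishers.ofString(encodeForm(fields)))
            .build();
    }
}
```

A caller would typically pass a LinkedHashMap holding fields such as resource, format, ro and token, and send the request with HttpClient.send. The fields that select which parts to extract are left out because their names are not given here.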

Check job status: GET /jobs/{id}

The job status may be retrieved with a GET request to the job URI.

Job running

Job finished

When the job has finished, the resources added or a reason for the job's failure is indicated:

When the job has finished, the service may keep providing its status for an arbitrary amount of time, long enough to allow clients to check that the job has finished. Retrieving the job status after that time will result in 404 Not Found.
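A client-side polling loop for this behaviour might look like the sketch below. The fetchStatus callback is hypothetical: it is assumed to perform the GET on the job URI and to return an empty Optional when the service answers 404 Not Found. The attempt limit and delay are illustrative parameters, not values prescribed by the API.

```java
import java.util.Optional;
import java.util.function.Supplier;

final class JobPoller {
    /**
     * Polls until the status is no longer "running", the status has expired
     * (empty Optional, i.e. 404), or maxAttempts is reached.
     * Returns the last status seen.
     */
    static Optional<String> waitForCompletion(Supplier<Optional<String>> fetchStatus,
                                              int maxAttempts, long delayMillis) {
        Optional<String> last = Optional.empty();
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            last = fetchStatus.get();
            if (last.isEmpty() || !last.get().equals("running")) {
                return last; // job ended, or its status is no longer available
            }
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return last;
            }
        }
        return last; // still running after maxAttempts polls
    }
}
```

Because a finished job's status eventually disappears, a client that polls too rarely cannot distinguish "finished long ago" from "never existed", so the delay should be kept well below the retention time.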

Invalid resource

If the workflow resource is not valid, e.g. it can't be found or is not a supported workflow definition, the status is invalid_resource:

Job failed

If the job failed, the status is runtime_error and reason shows the error message.
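The status values described in these sections could be interpreted on the client as in the following sketch. It assumes the job description has already been parsed from JSON into a map of strings. The class name and the returned messages are illustrative only.

```java
import java.util.Map;

final class JobOutcome {
    /** Summarises a job description that has already been parsed into a map. */
    static String describe(Map<String, String> job) {
        String status = job.getOrDefault("status", "");
        switch (status) {
            case "running":
                return "still running";
            case "done":
                return "finished";
            case "invalid_resource":
                return "workflow resource missing or unsupported";
            case "runtime_error":
                // The reason attribute carries the error message.
                return "failed: " + job.getOrDefault("reason", "no reason given");
            default:
                return "unrecognised status: " + status;
        }
    }
}
```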

Cancel a job

The service MAY support canceling a running job by sending a DELETE request.
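A cancellation request could be built as sketched below with java.net.http (Java 11+). Since support is optional, the client has to be prepared for a refusal. Treating any 2xx code as success is an assumption, because the response codes are not specified here.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class JobCancel {
    /** Builds (but does not send) a DELETE request for the job URI. */
    static HttpRequest buildCancel(URI jobUri) {
        return HttpRequest.newBuilder(jobUri).DELETE().build();
    }

    /** Assumption: any 2xx response means the job was cancelled. */
    static boolean cancelled(int statusCode) {
        return statusCode >= 200 && statusCode < 300;
    }
}
```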

Link relations

Creating the transformation job is done by a request to the service URI, and all other requests are done using the URI returned by the first one.
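How the job URI is returned is not stated here. The sketch below assumes it arrives in a Location response header, possibly as a relative reference, and resolves it against the service URI. Both the header choice and the helper name are assumptions.

```java
import java.net.URI;

final class JobLinks {
    /** Resolves the returned job reference against the service URI. */
    static URI resolveJobUri(URI serviceUri, String location) {
        // URI.resolve leaves absolute URIs unchanged and resolves relative ones.
        return serviceUri.resolve(location);
    }
}
```

All later GET and DELETE requests would then be sent to the resolved URI, rather than to a URI the client constructs itself.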

HTTP methods

The API uses a POST method to create a transformation job, GET to retrieve the status of a job and DELETE to cancel a running job.

Resources and formats

A job description is a JSON object with the following attributes:

resource: URI of the workflow that is transformed

format: MIME type of the workflow

ro: URI of the research object to which the service saves the resources

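status: Job status. Allowed values are "running", "done" and "failed". The job status sections above additionally use "invalid_resource" and "runtime_error" for jobs that did not succeed.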

token: OAuth 2.0 Bearer token of the research object owner.
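For illustration, the attributes a client supplies could be serialized as in the sketch below. It builds the JSON by hand to avoid a library dependency and only escapes quotes and backslashes, so a real client should prefer a proper JSON library. The status attribute is omitted on the assumption that it is set by the service rather than by the client.

```java
final class JobDescriptionJson {
    /** Wraps a value in double quotes, escaping backslashes and quotes only. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Serializes the client-supplied attributes of a job description. */
    static String build(String resource, String format, String ro, String token) {
        return "{\"resource\": " + quote(resource)
            + ", \"format\": " + quote(format)
            + ", \"ro\": " + quote(ro)
            + ", \"token\": " + quote(token) + "}";
    }
}
```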

Cache considerations

Caching can be used when retrieving the statuses of jobs that have not changed.
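One way to benefit from caching is a conditional GET, sketched below with java.net.http (Java 11+). This assumes that the service returns an ETag header with the job status and answers 304 Not Modified to a matching If-None-Match. Neither is confirmed by this page.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class JobStatusCache {
    /** Builds a status request, made conditional when a previous ETag is known. */
    static HttpRequest buildStatusRequest(URI jobUri, String previousEtag) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(jobUri).GET();
        if (previousEtag != null) {
            builder.header("If-None-Match", previousEtag);
        }
        return builder.build();
    }

    /** 304 Not Modified means the cached status is still current. */
    static boolean unchanged(int statusCode) {
        return statusCode == 304;
    }
}
```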

Discussion

Questions and answers:

Question 1: A t2flow can have many workflows (right?), each with a UUID. Do we assign each UUID to a resource (wf dcterms:identifier uuid), and assign no id to the RO itself?

Answer: Yes, but one and only one of them will be the 'main' workflow, and its UUID is used for making the WorkflowBundle URI.

Question 2: WorkflowBundle#getMainWorkflow#getWorkflowIdentifier returns a URI. How is it related to the UUID, and why should we use the UUID rather than this URI?

Answer: It's constructed from the t2flow UUID. However, there will be both the workflow bundle ID and the workflow ID; both URIs contain the same UUID from the main workflow, but with different prefixes. We should use the WorkflowBundle URI (that's WorkflowBundle.getGlobalBaseURI()) as the identifier for the RO, and the individual workflow's identifier (Workflow.getWorkflowIdentifier) for the wfdesc.

Question 3: Is there a javadoc for the scufl2 API? I see that the WorkflowBundleIO can save to file, but I'll need something else.

Answer: The easiest way in Eclipse is to press F3 to get the source code. Otherwise see http://mygrid.github.com/scufl2/api/0.9/

The authorization algorithm is rather weak - an access token is shared between the caller and the service. A better solution would be one of the following:

The caller sends a single-use authorization code, which is exchanged by the service for an access token. This is safer but requires constant reauthorization, which is especially difficult for offline clients such as ro-manager.

The service is considered trusted and has its own access token - currently not supported by RODL (actually I'm not sure that this one is better)

Considerations:

The translator might behave differently for different formats, like Galaxy, SCUFL2 .wfbundle, WINGS (arguably this could come from the Content-Type of the resource, but that might only work for single-resource workflows!)

The given ro might or might not exist. RODL API should support PUT to create.

Translation might take some time, so a status is returned

Cache headers tell us

Comments from Piotr

Requests to the service don't need to be OAuth-authorized. That would make sense only if the user had an account with the service and wanted a 3rd party application (i.e. RO Portal) to act on his behalf. What we need instead is to authorize the service to interact with RODL on the user's behalf.

A typical flow to achieve the above goal would be:

The user makes an unsigned request to the service.

The service recognizes that it needs an access token, so it redirects the user to the RODL User Management Application.

The user logs in, accepts and is redirected again to the service (with the access token / authorization code).

The service makes a signed request to RODL.

Problems with the above:

Difficult to handle for offline clients (how should ro-manager handle a 302 response?)

Unless the service stores the access token for some time, the flow requires constant user authorization.

I suggest using OAuth 2.0 instead of OAuth 1.0, which is used below. OAuth 2.0 is much simpler and, unlike OAuth 1.0, is supported by RODL. However, it's secure only when used over HTTPS.

The API does not allow sending a workflow bundle as a request body. Does the workflow always have to be a web resource?

To cancel a job, shouldn't the request be a DELETE rather than GET?

What should be the service response if job parameters are incorrect? In particular, what if the workflow can't be downloaded?