Since then, we’ve worked with many AWS customers and APN partners to implement this solution in genomics and in other workloads. Today, we want to highlight a new feature in Step Functions that simplifies how customers and partners can build high-throughput genomics workflows on AWS.

Step Functions now supports native integration with AWS Batch, which simplifies how you can create an AWS Batch state that submits an asynchronous job and waits for that job to finish.

Before, you needed to build a state machine building block that submitted a job to AWS Batch, and then polled and checked its execution. Now, you can just submit the job to AWS Batch using the new AWS Batch task type. Step Functions waits to proceed until the job is completed. This reduces the complexity of your state machine and makes it easier to build a genomics workflow with asynchronous AWS Batch steps.

The new integrations include support for the following API actions:

AWS Batch SubmitJob

Amazon SNS Publish

Amazon SQS SendMessage

Amazon ECS RunTask

AWS Fargate RunTask

Amazon DynamoDB: PutItem, GetItem, UpdateItem, DeleteItem

Amazon SageMaker: CreateTrainingJob, CreateTransformJob

AWS Glue: StartJobRun

You can also pass parameters to the service API. To use the new integrations, the role that you assume when running a state machine needs to have the appropriate permissions. For more information, see the AWS Step Functions Developer Guide.
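As a rough illustration of what "appropriate permissions" can mean for the AWS Batch integration, the following Python sketch builds an IAM policy document for the state machine's execution role. The action list and the managed rule name reflect our reading of the Step Functions Developer Guide for synchronous AWS Batch jobs and should be verified there; the function name, Region, and account ID are placeholders chosen for this example.

```python
import json


def batch_sync_policy(region: str, account_id: str) -> dict:
    """Build a policy document for a role that runs AWS Batch jobs from Step Functions.

    Assumption: these actions and the managed EventBridge rule name match the
    Developer Guide; confirm them before using this in a real account.
    """
    rule_arn = (
        f"arn:aws:events:{region}:{account_id}:"
        "rule/StepFunctionsGetEventsForBatchJobsRule"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Submit the job, check on it, and cancel it if the execution stops.
                "Effect": "Allow",
                "Action": [
                    "batch:SubmitJob",
                    "batch:DescribeJobs",
                    "batch:TerminateJob",
                ],
                "Resource": "*",
            },
            {
                # Let Step Functions manage the rule it uses to learn that a job finished.
                "Effect": "Allow",
                "Action": [
                    "events:PutTargets",
                    "events:PutRule",
                    "events:DescribeRule",
                ],
                "Resource": [rule_arn],
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder Region and account ID, for illustration only.
    print(json.dumps(batch_sync_policy("us-east-1", "123456789012"), indent=2))
```

In practice you would likely narrow the first statement's Resource to specific job queues and job definitions rather than leaving it as a wildcard.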

Using a job status poller

In our 2017 post series, we created a job poller “pattern” with two separate Lambda functions. When the job finishes, the state machine proceeds to the next step and operates according to the necessary business logic. This is a useful pattern to manage asynchronous jobs when a direct integration is unavailable.

The steps in this building block state machine are as follows:

1. A job is submitted through a Lambda function.

2. The state machine queries the AWS Batch API for the job status in another Lambda function.

3. The job status is checked to see if the job has completed. If the job status equals SUCCEEDED, the final job status is logged. If the job status equals FAILED, the execution of the state machine ends. In all other cases, the state machine waits 30 seconds and goes back to Step 2.
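The control flow of the steps above can be sketched in plain Python. This is not the sample project's code: wait_for_job, its parameters, and the max_polls safety limit are names invented for this illustration, and the status lookup is passed in as a function so the loop can run without AWS access. Note that AWS Batch reports a successful job with the status SUCCEEDED.

```python
import time
from typing import Callable

# Terminal AWS Batch job statuses; every other status means "keep waiting".
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def wait_for_job(
    get_status: Callable[[], str],
    wait_seconds: float = 30,
    sleep: Callable[[float], None] = time.sleep,
    max_polls: int = 1000,
) -> str:
    """Poll a job until it reaches a terminal status and return that status.

    get_status is a hypothetical callable. Against the real service it could wrap
    boto3's batch client, for example:
        client.describe_jobs(jobs=[job_id])["jobs"][0]["status"]
    """
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        # Not finished yet: wait, then query the status again.
        sleep(wait_seconds)
    raise TimeoutError(f"job still running after {max_polls} status checks")


if __name__ == "__main__":
    # Fake status sequence standing in for the AWS Batch API.
    fake_statuses = iter(["SUBMITTED", "RUNNABLE", "RUNNING", "SUCCEEDED"])
    final = wait_for_job(lambda: next(fake_statuses), sleep=lambda s: None)
    print(final)
```

In the state machine version of this pattern, the loop body is split across a Task state, a Choice state, and a Wait state, which is exactly the plumbing that the service integration removes.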

Both the Submit Job and Get Job Lambda functions are available as example Lambda functions in the console. The job status poller is available in the Step Functions console as a sample project.

With Step Functions service integrations

With Step Functions service integrations, it is now simpler to submit and wait for an AWS Batch job, or any other supported service.

The following code block is the JSON representing the new state machine for an asynchronous batch job. If you are familiar with the AWS Batch SubmitJob API action, you may notice that the parameters are consistent with what you would see in that API call. You can also use the optional AWS Batch parameters in addition to JobDefinition, JobName, and JobQueue.
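As an illustrative sketch only, the following Python builds a single-state definition of this shape and prints it as JSON. The state name, the input paths ($.jobName and $.batchParams), the ResultPath, and the job definition and queue names are hypothetical. The submitJob.sync resource ARN and the ".$" suffix for values read from the state input follow the Amazon States Language conventions for service integrations.

```python
import json


def batch_job_state_machine(job_definition: str, job_queue: str) -> dict:
    """Return a one-state definition that submits an AWS Batch job and waits for it."""
    return {
        "Comment": "Submit an AWS Batch job and wait for it to complete",
        "StartAt": "RunBatchJob",
        "States": {
            "RunBatchJob": {
                "Type": "Task",
                # The .sync suffix makes Step Functions wait until the job finishes.
                "Resource": "arn:aws:states:::batch:submitJob.sync",
                "Parameters": {
                    # Same field names as the AWS Batch SubmitJob API action.
                    "JobDefinition": job_definition,
                    "JobQueue": job_queue,
                    # Keys ending in ".$" are resolved from the state input.
                    "JobName.$": "$.jobName",
                    "Parameters.$": "$.batchParams",
                },
                # Hypothetical location for the job result in the state output.
                "ResultPath": "$.batchResult",
                "End": True,
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical job definition and queue names.
    print(json.dumps(batch_job_state_machine("my-job-def", "my-queue"), indent=2))
```

An execution of this definition would then be started with input such as {"jobName": "sample-1", "batchParams": {"input": "s3://..."}}, where the keys match the paths chosen above.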

The key-value parameters passed into the workflow are mapped using Parameters.$ to the values in the job definition using the keys. Value substitutions do take place. The Docker run looks like the following:
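To make the substitution concrete, here is a small local imitation in Python of how placeholders in a job definition's command can be filled from the submitted parameters. AWS Batch job definitions use Ref::name placeholders for this; the command, the parameter names, and the substitute_refs helper below are hypothetical, and the sketch only handles placeholders that occupy a whole command token.

```python
from typing import Dict, List

REF_PREFIX = "Ref::"


def substitute_refs(command: List[str], parameters: Dict[str, str]) -> List[str]:
    """Replace whole-token Ref::name placeholders with values from parameters.

    This approximates the substitution for illustration; it is not AWS Batch code.
    Placeholders without a matching parameter are left untouched.
    """
    resolved = []
    for token in command:
        name = token[len(REF_PREFIX):] if token.startswith(REF_PREFIX) else None
        if name is not None and name in parameters:
            resolved.append(parameters[name])
        else:
            resolved.append(token)
    return resolved


if __name__ == "__main__":
    # Hypothetical command from a job definition and parameters from the workflow input.
    command = ["align.sh", "--sample", "Ref::sample", "--reference", "Ref::reference"]
    parameters = {"sample": "NA12878", "reference": "hg38"}
    print(" ".join(substitute_refs(command, parameters)))
```

The keys in the workflow input therefore need to match the placeholder names used in the job definition.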

Genomics workflow: Before and after

Overall, service integrations dramatically simplify your genomics workflow. The following workflow is a simple genomics secondary analysis pipeline, which we highlighted in our original post series.

The first step aligns the sample against a reference genome. When alignment is complete, variant calling and QA metrics are calculated in two parallel steps. When variant calling is complete, variant annotation is performed. Before, our genomics workflow looked like this:
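For comparison, with service integrations the same pipeline can be expressed as a compact state machine in which every step is a single AWS Batch task. The following Python sketch builds such a definition under stated assumptions: the state names, job definition names, and the genomics-queue job queue are hypothetical, and this is not the definition from the open-source project.

```python
import json
from typing import Optional


def batch_task(job_definition: str, next_state: Optional[str] = None) -> dict:
    """Build a Task state that runs one AWS Batch job and waits for it to finish."""
    state = {
        "Type": "Task",
        "Resource": "arn:aws:states:::batch:submitJob.sync",
        "Parameters": {
            "JobDefinition": job_definition,
            "JobName": job_definition,
            "JobQueue": "genomics-queue",  # hypothetical queue name
        },
    }
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


def genomics_workflow() -> dict:
    """Alignment, then variant calling (followed by annotation) in parallel with QA metrics."""
    return {
        "StartAt": "Alignment",
        "States": {
            "Alignment": batch_task("alignment", "PostAlignment"),
            "PostAlignment": {
                "Type": "Parallel",
                "Branches": [
                    {
                        # Annotation starts only after variant calling completes.
                        "StartAt": "VariantCalling",
                        "States": {
                            "VariantCalling": batch_task("variant-calling", "VariantAnnotation"),
                            "VariantAnnotation": batch_task("variant-annotation"),
                        },
                    },
                    {
                        "StartAt": "QAMetrics",
                        "States": {"QAMetrics": batch_task("qa-metrics")},
                    },
                ],
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(genomics_workflow(), indent=2))
```

Each step that previously needed a submit, poll, check, and wait loop is reduced to one Task state here.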

Conclusion

AWS Step Functions service integrations are a great way to simplify creating complex workflows with asynchronous steps. While we highlighted the use case with AWS Batch today, there are many other ways that healthcare and life sciences customers can use this new feature, such as with message processing.

For more information about how AWS can enable your genomics workloads, be sure to check out the AWS Genomics page.

We’ve updated the open-source project to take advantage of the new AWS Batch integration in Step Functions. You can find the changes in the aws-batch-genomics/tree/v2.0.0 folder.
