Serverless and Step-Functions at DAZN

For the past year, my client with YLD has been DAZN — a hot-off-the-press sports streaming web service, esteemed by many as the ‘Netflix for Sports’. This has been a fascinating project, and has presented several exciting challenges which have required innovative solutions.

In this article, I’m going to walk you through the problem of offering a user multiple payment providers, each with their own nuances, in a seamless user subscription process.

The Problem

We want to offer valued customers many ways of paying for DAZN, so that they can subscribe in a way they feel most comfortable, familiar, and secure with. However, this means we need to display a single user-interface, while wiring up different payment providers behind the scenes.

How do we handle each payment provider separately, whilst maintaining the same UX flow? And how do we do all this while setting up the user’s subscription to DAZN?

The Solution

Enter AWS Step Functions

We use AWS Lambdas to create the user’s subscription with each payment provider. To coordinate the message flow through these Lambdas, we use Step Functions. If you haven’t tried AWS Step Functions, I would thoroughly recommend doing so.

We configure and provision our Lambdas using the Serverless Framework, which reads a YAML-based configuration file (serverless.yml) and provisions the AWS Lambdas and other resources we require for our workflow.

Thankfully, Serverless also supports Step Functions, so to implement them we only need to add to our serverless.yml file. The job of Step Functions is essentially to run the Lambdas in the configured order.
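To make "run the Lambdas in the configured order" concrete, here is a minimal sketch of a linear state machine. In our setup the definition lives in serverless.yml; this sketch instead writes it as a TypeScript object that mirrors the Amazon States Language fields (StartAt, States, Type, Resource, Next, End). The state names, the ARN prefix, and the sequence helper are all hypothetical and only there for illustration.

```typescript
// Minimal subset of Amazon States Language (ASL) needed for this sketch.
interface TaskState {
  Type: "Task";
  Resource: string; // ARN of the Lambda to invoke
  Next?: string;    // name of the state to run afterwards
  End?: boolean;    // true on the final state
}

interface StateMachine {
  StartAt: string;
  States: Record<string, TaskState>;
}

// Hypothetical ARN prefix; real ARNs come from your own AWS account.
const ARN_PREFIX = "arn:aws:lambda:eu-west-1:123456789012:function:";

// Hypothetical helper: chain Lambdas so each one runs after the previous succeeds.
function sequence(lambdaNames: string[]): StateMachine {
  const first = lambdaNames[0];
  if (first === undefined) {
    throw new Error("A state machine needs at least one state");
  }
  const states: Record<string, TaskState> = {};
  lambdaNames.forEach((name, i) => {
    const next = lambdaNames[i + 1];
    states[name] = next === undefined
      ? { Type: "Task", Resource: ARN_PREFIX + name, End: true }
      : { Type: "Task", Resource: ARN_PREFIX + name, Next: next };
  });
  return { StartAt: first, States: states };
}

// Hypothetical step names for the first part of the sign-up flow.
const signUp = sequence([
  "CheckEmail",
  "CheckExistingSubscription",
  "CreateSubscription",
]);

console.log(JSON.stringify(signUp, null, 2));
```

Each state names its successor explicitly, which is what lets us later swap a plain Next for a branch or an error handler without touching the Lambdas themselves.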

We begin with a few steps surrounding the user’s subscription: is the email taken? Does the user already have an active subscription? And so on.

Once we have confirmed they are a legitimate user wanting to sign up, we can start to think about payments. 💸

From here, we can determine what kind of payment the user would like to make.

AWS Step Functions has another handy feature to help us here:

Choices

A Choice state is basically equivalent to a ‘switch’ statement: based on the value of a variable, it does something different. For example, if the variable containing the user’s payment type is equal to ‘PayPal’, we set the next ‘step’ to be the PayPal Lambda, which does everything specific to setting up recurring payments in PayPal.
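As a rough illustration, a Choice state for the payment type could look like the object below. The fields (Type, Choices, Variable, StringEquals, Next, Default) follow Amazon States Language; the state names, the "$.paymentType" path, and the second provider are assumptions. The resolveNext function is not part of Step Functions: it is a toy evaluator, handling only simple "$.field" paths, included purely to show the switch-like behaviour.

```typescript
interface ChoiceRule {
  Variable: string;     // JSONPath into the state input, e.g. "$.paymentType"
  StringEquals: string; // value to compare against
  Next: string;         // state to run when the rule matches
}

interface ChoiceState {
  Type: "Choice";
  Choices: ChoiceRule[];
  Default?: string;     // state to run when no rule matches
}

// Hypothetical state names and input field.
const choosePaymentProvider: ChoiceState = {
  Type: "Choice",
  Choices: [
    { Variable: "$.paymentType", StringEquals: "PayPal", Next: "SetUpPayPal" },
    { Variable: "$.paymentType", StringEquals: "CreditCard", Next: "SetUpCreditCard" },
  ],
  Default: "UnsupportedPaymentType",
};

// Toy evaluator (an illustration only, not the Step Functions runtime).
function resolveNext(state: ChoiceState, input: Record<string, unknown>): string {
  for (const rule of state.Choices) {
    const field = rule.Variable.replace(/^\$\./, "");
    if (input[field] === rule.StringEquals) {
      return rule.Next;
    }
  }
  if (state.Default === undefined) {
    throw new Error("No rule matched and no Default state is set");
  }
  return state.Default;
}

console.log(resolveNext(choosePaymentProvider, { paymentType: "PayPal" }));
// prints "SetUpPayPal"
```

Rules are checked in order and the first match wins, so adding a new payment provider means adding one rule and one Lambda rather than editing the existing branches.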

After that, we are free to complete any finishing touches on the user’s subscription, and the user is off watching DAZN!

Nice one 🎉 Sort of, buuuuut…

Errors

Oh yes. How could we forget? We’ve only considered the ‘happy path’. What happens if something goes wrong?

Here I’ll cover a few different error scenarios, and some ideas we’ve had to prevent or handle them.

Timeouts et al.

We’re calling many external APIs, so connections are going to be slow, lost, or reset from time to time.

On each ‘step’ you can specify the number of times to retry, along with an exponential backoff strategy. This won’t affect your flow: Step Functions simply retries the Lambda up to n times, waiting longer between attempts, and continues to the next step after a successful attempt. It also allows you to handle different kinds of errors differently.
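A sketch of what this can look like on a single step is below. Retry and Catch, with their ErrorEquals, IntervalSeconds, MaxAttempts, BackoffRate and Next fields, are Amazon States Language; States.Timeout and States.ALL are built-in error names. Everything else (the state names, the ARN, the PermanentPaymentError name, and the chosen numbers) is assumed for illustration. The retryDelays helper is not part of Step Functions; it only shows the waits that a given retry rule implies.

```typescript
interface Retrier {
  ErrorEquals: string[];   // error names this rule applies to
  IntervalSeconds: number; // wait before the first retry
  MaxAttempts: number;     // maximum number of retries
  BackoffRate: number;     // multiplier applied to the wait after each retry
}

interface Catcher {
  ErrorEquals: string[];
  Next: string;            // state to move to once retries are exhausted or skipped
}

interface TaskWithErrorHandling {
  Type: "Task";
  Resource: string;
  Retry: Retrier[];
  Catch: Catcher[];
  Next: string;
}

// Retry timeouts a few times, backing off exponentially.
const timeoutRetry: Retrier = {
  ErrorEquals: ["States.Timeout"],
  IntervalSeconds: 2,
  MaxAttempts: 3,
  BackoffRate: 2,
};

// Hypothetical step: names, ARN and the custom error name are assumptions.
const setUpPayPal: TaskWithErrorHandling = {
  Type: "Task",
  Resource: "arn:aws:lambda:eu-west-1:123456789012:function:SetUpPayPal",
  Retry: [timeoutRetry],
  Catch: [
    // A known, permanent failure goes to a dedicated notification step.
    { ErrorEquals: ["PermanentPaymentError"], Next: "NotifyPaymentFailed" },
    // States.ALL matches anything else; it has to be the last catcher.
    { ErrorEquals: ["States.ALL"], Next: "NotifyUnknownError" },
  ],
  Next: "FinaliseSubscription",
};

// Illustration only: the wait (in seconds) before each retry for a given rule.
function retryDelays(rule: Retrier): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < rule.MaxAttempts; attempt++) {
    delays.push(rule.IntervalSeconds * Math.pow(rule.BackoffRate, attempt));
  }
  return delays;
}

console.log(retryDelays(timeoutRetry)); // [2, 4, 8]
```

Keeping the retry policy in the state definition means the Lambda itself stays free of retry loops: it either succeeds or throws, and the state machine decides what happens next.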

Permanent failures

Sometimes, a user might attempt to subscribe with a card that has insufficient funds, or a fraudulent card. In this case, we don’t want a subscription to be created; we just want to tell the user that they will need to use a different payment method.

For this, we have a Lambda which notifies the appropriate services (e.g. via SNS) that Dr Bloggs hasn’t provided a valid payment method, and so can’t watch DAZN. 😭

Weird unknown failures

Did somebody say ‘Edge case’!?

It goes without saying that we will endeavour to cover every documented error scenario. However, sometimes provider x returns error code xyz, which isn’t documented (*shakes fist* ✊).

{ success: false, code: 42, message: "This isn’t gonna work"}

So what do we do here? Retry? Maybe… But it might be an error which will never be fixed (see: Permanent Failures). So actually in this case, the system doesn’t know what to do, and neither do we. Therefore some human intervention is required!

In this case we can use an ‘unknown errors’ step to notify the appropriate people that we need to take action on a user’s subscription. Once we understand the problem, we can update the service and handle that error within the system correctly.
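One way to feed these three categories (retryable, permanent, unknown) into the state machine is to have each Lambda turn a provider response into a named error. The sketch below assumes the response shape from the example above; the code tables, the error names, and both functions are hypothetical. It also assumes that the name set on the thrown error is what the Retry and Catch rules match on, which is worth checking against the runtime you use.

```typescript
// Shape of the provider response, based on the example above.
interface ProviderResponse {
  success: boolean;
  code?: number;
  message?: string;
}

type Outcome = "ok" | "retry" | "permanent" | "unknown";

// Hypothetical code tables; real values depend on each provider's documentation.
const RETRYABLE_CODES: number[] = [503, 504];
const PERMANENT_CODES: number[] = [4001, 4002];

// Anything we have not explicitly listed is treated as unknown.
function classify(res: ProviderResponse): Outcome {
  if (res.success) return "ok";
  if (res.code !== undefined && RETRYABLE_CODES.indexOf(res.code) !== -1) return "retry";
  if (res.code !== undefined && PERMANENT_CODES.indexOf(res.code) !== -1) return "permanent";
  return "unknown";
}

// Hypothetical error names for Retry and Catch rules to match on.
const ERROR_NAMES = {
  retry: "TransientProviderError",
  permanent: "PermanentPaymentError",
  unknown: "UnknownProviderError",
};

// Throw a named error for any failed response so the state machine can route it.
function assertSuccessful(res: ProviderResponse): void {
  const outcome = classify(res);
  if (outcome === "ok") return;
  const err = new Error(res.message !== undefined ? res.message : "Provider call failed");
  err.name = ERROR_NAMES[outcome];
  throw err;
}

console.log(classify({ success: false, code: 42, message: "This isn't gonna work" }));
// prints "unknown"
```

Once an unknown code has been investigated, handling it is a matter of moving it into one of the two tables, after which the existing retry or permanent-failure path takes over.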

The Final Step

Step Functions is pretty easy to get started with, and the documentation is good too.

It’s certainly not a tool that needs to be used in every service, and it can be fairly pricey if you implement something complex, such as a recursive function, because you pay not only for the Lambda usage but also for each step transition.

Using Step Functions at DAZN has simplified our system, and enhanced our ability to debug and resolve issues enormously!