1. Preventing overlapping job runs through locks

This release introduces a locking mechanism that prevents two jobs from running at the same time. This is useful if, for example, you have an ETL process that must run as a singleton, or several jobs that each need exclusive access to the same database.

With this feature, Dataflow Runner acquires a lock before starting the job. The lock is released when:

- the job has terminated (whether successfully or with failure), if the --softLock flag was used
- the job has succeeded, if the --lock flag ("hard lock") was used

As the above implies, if a job fails while the --lock flag is in use, the lock will have to be cleaned up manually.
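The release semantics above can be sketched as follows. This is an illustrative model only, not Dataflow Runner's actual implementation; the function name and signature are invented for the example.

```go
package main

import "fmt"

// shouldRelease models the release semantics of the two flags:
// with --softLock (softLock == true) the lock is released whenever
// the job terminates, success or failure; with --lock (hard lock)
// it is released only on success, so a failed job leaves the lock
// in place until it is cleaned up manually.
// Hypothetical helper for illustration only.
func shouldRelease(softLock bool, jobSucceeded bool) bool {
	if softLock {
		return true
	}
	return jobSucceeded
}

func main() {
	fmt.Println(shouldRelease(true, false))  // soft lock, failed job: released
	fmt.Println(shouldRelease(false, false)) // hard lock, failed job: kept
	fmt.Println(shouldRelease(false, true))  // hard lock, succeeded job: released
}
```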

Two strategies for storing the lock have been made available: local and distributed.

1.1 Local lock

You can leverage a local lock when launching your playbook with ./dataflow-runner run using:
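A minimal invocation might look like the following. Only ./dataflow-runner run and the --lock/--softLock flags come from this post; the playbook flag name and the lock path are illustrative assumptions.

```shell
# Illustrative only: --emr-playbook and path/to/lock are assumed
# placeholders; --lock names the file used as the local lock.
./dataflow-runner run \
  --emr-playbook playbook.json \
  --lock path/to/lock

# With --softLock instead, the lock is released even if the job fails:
./dataflow-runner run \
  --emr-playbook playbook.json \
  --softLock path/to/lock
```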

However, unlike the cluster configuration tags, which actually tag the EMR cluster, playbook tags have no effect in EMR.

Note that, compared with version 0.2.0 of Dataflow Runner, the playbook schema version has changed to 1-0-1. Version 1-0-1 is fully backward compatible, so if you do not wish to use the tags introduced in this release, you do not need to change anything.