One of our customers wanted to set up a time-based job scheduling system, similar to cron, to reliably schedule critical tasks. They were already using sidekiq to run some tasks in the background.

To improve fault tolerance, we tried running sidekiq-scheduler (a popular scheduling library) in a distributed setup. However, we found synchronization issues [1] that could lead to jobs being scheduled multiple times in some scenarios.

Along the way, we had to understand some of the finer aspects of distributed systems, in particular how processes synchronize using shared memory. We plan to write about what we learned in subsequent notes.

About Sidekiq and Exq

Sidekiq is a job processing library. It has two components, a client and a worker, and it uses a redis LIST as storage. A job instruction is a JSON object complying with a schema. A sidekiq client creates a job instruction in the specified schema and pushes it to a queue (a redis LIST). A sidekiq worker listening to the queue receives this instruction and performs the related task.
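The client and worker flow described above can be sketched roughly as follows. This is a minimal illustration, not sidekiq's or exq's actual code: `fake_redis`, `build_job`, `push` and `pop` are hypothetical names, and an in-memory deque stands in for the redis LIST so the sketch runs without a server. The field names follow the commonly documented sidekiq payload (`class`, `args`, `queue`, `jid`, and so on); the exact fields and timestamp formats vary between sidekiq versions.

```python
import json
import secrets
import time
from collections import deque

# In-memory stand-in for redis (an assumption, so the sketch runs without a server).
# Each key maps to a deque; the left end plays the role of the head of a redis LIST.
fake_redis = {}


def build_job(worker_class, args, queue="default"):
    """Build a job instruction using the field names sidekiq commonly uses."""
    now = time.time()
    return {
        "class": worker_class,          # name of the worker that should run the job
        "args": args,                   # positional arguments, JSON-serializable
        "queue": queue,
        "jid": secrets.token_hex(12),   # 24 hex character job id
        "retry": True,
        "created_at": now,
        "enqueued_at": now,
    }


def push(job):
    """Client side: serialize the job and push it onto the queue's list (like LPUSH)."""
    key = "queue:" + job["queue"]
    fake_redis.setdefault(key, deque()).appendleft(json.dumps(job))
    return job["jid"]


def pop(queue="default"):
    """Worker side: take the oldest job from the other end of the list (like RPOP)."""
    items = fake_redis.get("queue:" + queue)
    if not items:
        return None
    return json.loads(items.pop())
```

Calling `push(build_job("HardWorker", ["bob", 5]))` and then `pop()` returns the same instruction as a dictionary. A real worker would then look up the class named in `class` and call it with `args`.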

The client and worker can be written in any language, as long as they work with the same schema as implemented by sidekiq. Exq is an Elixir library that complies with the same storage schema. In other words, you could use a sidekiq client with an exq worker, or vice versa, or even use exq for both client and worker.

exq-scheduler is a sidekiq client that enqueues job instructions as per a time schedule.

Its API is consistent with sidekiq and sidekiq-scheduler. This enables any compatible job runner to pick up instructions, and it also allows use of the sidekiq and sidekiq-scheduler web UI.

It can be deployed with existing Exq workers without major changes to the deployment setup.
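To illustrate the duplicate-scheduling problem mentioned earlier, here is one way several scheduler nodes could agree to enqueue a given tick only once. This is a sketch under stated assumptions, not exq-scheduler's actual implementation: `shared_store`, `set_if_absent`, `tick_key` and `maybe_enqueue` are hypothetical names, and a dictionary stands in for a shared redis instance. Every node derives the same key from the schedule name and the time the job was due, then tries a set-if-absent write (similar in spirit to redis `SET ... NX`).

```python
from datetime import datetime, timezone

# In-memory stand-ins for a shared redis instance and a job queue (assumptions).
shared_store = {}
enqueued = []


def set_if_absent(key, value):
    """Mimic a set-if-absent write: only the first writer for a key succeeds."""
    if key in shared_store:
        return False
    shared_store[key] = value
    return True


def tick_key(schedule_name, scheduled_at):
    """Build a deterministic key from the schedule name and the due time.

    Expects a timezone-aware datetime. The time is converted to UTC and
    truncated to the minute, so nodes with slightly different clock readings
    within the same minute still derive the same key.
    """
    due = scheduled_at.astimezone(timezone.utc).replace(second=0, microsecond=0)
    return f"schedule:{schedule_name}:{due.strftime('%Y-%m-%dT%H:%MZ')}"


def maybe_enqueue(node, schedule_name, scheduled_at):
    """Enqueue the job for this tick only if no other node has claimed it."""
    key = tick_key(schedule_name, scheduled_at)
    if set_if_absent(key, node):
        enqueued.append((schedule_name, key, node))
        return True
    return False
```

In this sketch, claiming the key and enqueueing the job are two separate steps, so a node that fails between them would silently drop that tick. A real implementation would need to make the two steps atomic or recover from such failures.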

Stability

exq-scheduler is currently used in production by one of our customers. The library is backed by tests that introduce various faults and verify that invariants are always maintained.

[1] The issues were caused by a lack of determinism when building a unique key for scheduled tasks, and by not handling edge cases in which the scheduler fails after acquiring a lock. See https://github.com/moove-it/sidekiq-scheduler/issues/156 and https://github.com/moove-it/sidekiq-scheduler/issues/181.