Yep. Starting last week, from Monday morning to Friday afternoon, working with Winston to teach him Ansible, I authored almost 1,300 lines of Ansible.

The really interesting part of this was using Ansible as a development tool with Rails. The use case in question was automating the production of large-scale data processing jobs. Exactly what these jobs did is highly proprietary, but they basically shared an architecture like this:

Execute on a developer’s workstation

Verify state of the application by calling an API server side

Change the thread count as needed for the right amount of concurrency

Clear the Rails log

Set the right Redis server to isolate Sidekiq from other concurrently executing jobs

Clear the Sidekiq log

Make an AMI of the EC2 instance

Launch N instances of the AMI to do the needed data processing

Fill the Sidekiq queue

Display the count of items in the Sidekiq queue

There was a series of eight different data processing jobs, six of which matched the above list and two of which were slightly different. Each of the stages above was represented by a small Ansible playbook, and the coordination between the playbooks was handled by a bash script that called each stage in succession.

Classically, Ansible is a devops tool for provisioning boxes, but last week really illustrated to me the power of Ansible as a development tool. Ansible’s idempotent, state-based approach of modeling the world as a succession of YAML files can definitely be funky, but the model works.

Adding Status Tracking

Early Friday I realized that once these jobs were running, the developer running them would need to understand the status of the job at a highly granular level. Historically I’ve done this by directly querying the database and just understanding the objects involved and the tables that represent them. But that comes from a huge amount of internal domain knowledge that Winston didn’t have.

This status would need to include:

Number of items left in the Sidekiq queue

Number of records produced for each job

Sure, in an idealized world this would be a pretty, graphical dashboard available on the web to everyone in the company. Practically speaking, the following is sufficient:

Ansible Role

My first pass on all this had the output listed as a jumbled mess (typical of output captured by Ansible). Winston correctly pointed out something to the effect of “Looks a bit like arse”. Well, a quick Google search led to this Stack Overflow post, where this technique:

msg: “”

could be applied. In our case the captured output was registered as result, so it was just a matter of replacing msg with result.stdout. And, almost magically, that jumbled mess came into razor-sharp focus. My thanks to Winston for recognizing that this was an issue. I was so close to the problem that I didn’t even perceive it.

This approach was based on some internal analysis logic where we had an array of table names that we used for generating some SQL code dynamically. It took about 5 minutes to convert that list of tables into this. The table.classify.constantize call takes the name of the table and first converts it to a model name (classify) and then converts that model name to a constant that represents the class itself. Once you have a class that inherits from ActiveRecord, you can call .where on it and then .count to get a count. Finally, you inject the original table name and the count back into a hash to store the results.
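
Here is a minimal sketch of that table-counting idea, written in plain Ruby so it runs outside Rails. It is not our real code: FakeModel, Order, LineItem, job_id, model_for and record_counts are all hypothetical names made up for illustration. Inside a Rails app, model_for would simply be table.classify.constantize and the models would be real ActiveRecord classes.

```ruby
# Minimal stand-in for an ActiveRecord model (hypothetical): keeps rows in
# memory and supports where(...).count so the sketch runs without Rails.
class FakeModel
  def self.rows
    @rows ||= []
  end

  def self.where(conditions)
    rows.select { |row| conditions.all? { |key, value| row[key] == value } }
  end
end

# Hypothetical models and data, purely for illustration.
class Order < FakeModel; end
class LineItem < FakeModel; end

Order.rows.push({ job_id: 1 }, { job_id: 1 }, { job_id: 2 })
LineItem.rows.push({ job_id: 1 })

# Plain-Ruby approximation of "line_items".classify.constantize:
# naively singularize (strip a trailing "s"), camelize, then look up the class.
# In Rails you would use ActiveSupport's classify and constantize instead.
def model_for(table_name)
  class_name = table_name.sub(/s\z/, "").split("_").map(&:capitalize).join
  Object.const_get(class_name)
end

# Build a hash of table name => matching record count.
def record_counts(table_names, conditions)
  table_names.each_with_object({}) do |table, counts|
    counts[table] = model_for(table).where(conditions).count
  end
end

TABLES = %w[orders line_items].freeze
counts = record_counts(TABLES, { job_id: 1 })
counts.each { |table, count| puts "#{table}: #{count}" }
```

Printing one table per line keeps the output easy to read once Ansible captures it.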

Conclusion

I’ve been managing this job production process for almost two years now, and most of my approaches to making it better have revolved around different rake tasks and some fairly bad internal documentation. By pulling Ansible into this, for the first time, we actually have a solution which: