Effective Automation

Automation is a big deal right now, and whilst it’s an important part of the relatively new DevOps culture, it has actually been an integral part of many, if not all, IT roles for a long, long time. Automation done well provides many benefits, but it isn’t a straightforward process that can simply be rushed into. There are careful considerations to make, and skipping them can leave automation projects less effective than initially envisaged. What we’re saying here is that there is a big difference between automation and effective automation.

Perhaps the biggest factor in approaching task automation is the net benefit, which is often measured in terms of time spent vs time saved. As a simple example: if you spend 3 days working on an automation project that will ultimately save you 5 minutes per month, then clearly that isn’t as effective as spending the same time on a project that’s going to save you a couple of days of effort every month.
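To put rough numbers on that comparison, here is a minimal sketch of a break-even calculation. The function name and the 8-hour working day are my own assumptions for illustration; swap in whatever figures match your environment.

```python
def break_even_months(build_hours: float, saved_hours_per_month: float) -> float:
    """Return how many months pass before time saved equals time spent building."""
    if saved_hours_per_month <= 0:
        raise ValueError("saved_hours_per_month must be positive")
    return build_hours / saved_hours_per_month


# Assumption: one working day is 8 hours.
HOURS_PER_DAY = 8
build_hours = 3 * HOURS_PER_DAY  # the 3 days spent on the automation project

# Project A saves 5 minutes per month, project B saves 2 days per month.
small_saving = break_even_months(build_hours, 5 / 60)
large_saving = break_even_months(build_hours, 2 * HOURS_PER_DAY)

print(f"5 minutes/month: pays back after about {small_saving:.0f} months")
print(f"2 days/month: pays back after about {large_saving:.1f} months")
```

Under those assumptions the first project takes roughly 288 months (24 years) to pay for itself, whilst the second does so in about a month and a half.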

Then there’s task priority: are we automating to relieve a particular pain point, or offloading a critical task to an automated solution? Automating a critical maintenance activity (clue in the name) is much more vital than, say, producing a daily report that falls into the category of nice to have. That isn’t to say the report is being totally ruled out, just that it’s not going to get immediate attention because there are more important things to focus on. Automating a restore/consistency check process is a great example of this. Done manually it can be an incredibly time-consuming task, and in terms of priority it’s absolutely essential for housekeeping and auditing reasons, which makes it a prime candidate for automation.

There is, though, much more to automation than, well, the automation itself, so you also have to consider some additional bits and pieces. What happens if the process breaks? Could it interrupt other processes? How are you even going to be able to tell if it’s working correctly? Whilst one of the aims of automation is to reduce, or indeed eliminate, human error, it’s important to remember that human error mainly occurs at the design phase. Inevitably, a poorly designed automated process will have just the same amount of success, or lack of it, as a manual one.

Monitoring therefore becomes a key component that requires some serious consideration. Let’s say you implement a new process that has greatly reduced the time spent on a daily task, but to check that things are working as expected you must then spend significant time monitoring the solution. In terms of net benefit, again, this just isn’t an effective use of time.

One example I often use is checking SQL Agent job history for failures. Rather than manually opening the job history each morning to check for errors, we could have each production server email the status of each job at a set time. It might be quicker, but by how much (net benefit, again), and is there a better way to do it? Opening each mail and scanning for failures is still going to be time consuming. Perhaps we can alter the process to only report on failures, so we don’t have to manually filter results anymore, but that’s still slower than having a centralised process check each server and then consolidate all failures into one single email.
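As a rough sketch of that last step, the snippet below filters a set of job results down to failures and builds one consolidated email. Everything here is illustrative: the `JobRun` structure, the server and job names, and the example.com addresses are assumptions, and the results are hard-coded rather than queried from real servers. The message is only built, not sent.

```python
from dataclasses import dataclass
from email.message import EmailMessage


@dataclass(frozen=True)
class JobRun:
    """One job outcome gathered by the central process (hypothetical shape)."""
    server: str
    job: str
    succeeded: bool
    message: str = ""


def collect_failures(runs):
    """Keep only the failed runs so nobody has to filter results by eye."""
    return [run for run in runs if not run.succeeded]


def build_summary(failures, sender, recipient):
    """Build a single email covering failures from every server."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"SQL Agent job failures: {len(failures)}"
    if failures:
        ordered = sorted(failures, key=lambda run: (run.server, run.job))
        lines = [f"{run.server} | {run.job} | {run.message}" for run in ordered]
    else:
        lines = ["No failures reported."]
    msg.set_content("\n".join(lines))
    return msg


# Hard-coded sample data standing in for results collected from each server.
runs = [
    JobRun("SQL01", "Backup", True),
    JobRun("SQL02", "Backup", False, "Disk full"),
    JobRun("SQL01", "Index Maintenance", False, "Timeout"),
    JobRun("SQL03", "Backup", True),
]

failures = collect_failures(runs)
summary = build_summary(failures, "monitor@example.com", "dba-team@example.com")
print(summary["Subject"])
print(summary.get_content())
```

The point of the design is that the filtering and consolidation happen once, centrally, so the morning check becomes reading one short email instead of opening one per server.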

But should we leave it there? Despite being automated, the process could still contain manual follow-up tasks. Imagine a DBA who receives the job history but then has to forward failures on to the relevant audience, who are tasked with fixing the issue. This type of conversation happens all the time:

How long-winded was that?! Seriously, just writing it was painful enough, yet I see this sort of thing over and over again. Here we may have automated “part” of a process, namely capturing the information, but overall the entire process is still in need of some tuning. Using this example: could we add Person 2 or their team to the failure notification in the first place? A response process handled (and acted upon) correctly achieves everything that the long-winded conversation does, but it is obviously much quicker, it happens at the exact time of the failure and it is actually quite a straightforward change to make. Would this replace an overall report? Probably not, but it does highlight how taking a holistic view of any process is an important piece of the approach.
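One way to picture that change is a simple lookup from job name to the people who need to act on a failure. This is only a sketch under my own assumptions: the job names, team addresses and the `recipients_for` helper are all hypothetical, and a real setup would keep the mapping wherever your notifications are configured.

```python
# Hypothetical address that should always receive failure notifications.
DBA_TEAM = "dba-team@example.com"

# Hypothetical mapping of job name to the teams tasked with fixing it.
JOB_OWNERS = {
    "Nightly ETL Load": ["data-team@example.com"],
    "Finance Export": ["finance-apps@example.com", "data-team@example.com"],
}


def recipients_for(job, owners=JOB_OWNERS, always=DBA_TEAM):
    """Return the DBA team plus any owners registered for the failed job."""
    extra = owners.get(job, [])
    # dict.fromkeys preserves order whilst dropping duplicate addresses.
    return list(dict.fromkeys([always, *extra]))


print(recipients_for("Nightly ETL Load"))
print(recipients_for("Index Maintenance"))
```

Jobs without a registered owner still reach the DBA team, so nothing is silently dropped, whilst jobs that do have an owner skip the forwarding conversation entirely.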

When approaching automation there’s lots to consider: whether it’s actually going to be beneficial, what other components are involved, and how we should approach the testing, measuring and monitoring of a proposed solution. It might seem counter-productive that whilst we’re working towards an overall time saving we’re also making implementations seem a lot more complex, but actually this isn’t the case. Stepping back and asking some key questions can make a huge difference, improving the benefits and making our automation a much more effective solution.