Tuesday, 6 May 2014

We like good documentation, but why don't we like to write it?

'...The problem with Software A is that the documentation is really
lacking. Look - there's no reference for these commands, and the online
reference is outdated. I don't think the product is mature.'

'Yeah, it takes too long to work out how to do something. Have a look at
Product B, the features are similar but they've got tutorials on common
tasks and the online docs are much better.'

When we're evaluating a new piece of software, this type of conversation
is common. But when we need to create documentation ourselves, the
conversation is quite different.

'Mate, I've got a customer who wants to configure a new public-facing
web server in a DMZ. Is there documentation for this?'

'Well, what you have to do is talk to Networks and get them to set it up.
Actually no, they need information about customer VLANs, and the new
server needs to be in the same IP range as their other stuff.'

'Ok, where are the IP ranges documented? And how does it work behind the
load balancer?'

'For the IP ranges, there's a document on the 'S drive' but I can't
remember exactly where it is. Not sure about the load balancer, check
with John, he set it up. He might have some notes somewhere...'

Our attitude towards internal documentation and what we expect from
third parties couldn't be more different. The key impacts of poor
internal documentation are:

Takes longer to resolve incidents on average
- For example, if a website becomes unavailable when a load balancer
is failed over, it's probably down to missing configuration somewhere.
But if the engineer doesn't know or can't remember how the website was
provisioned, she will have to work out from scratch how to provision a
website.

Changes take longer
- Modifying or improving services takes longer because you have to
repeatedly determine the specifics of how a service works.

Changes are more likely to cause incidents
- In the real world, an engineer has a limited amount of time to
determine the possible effects of a change. Without adequate
documentation about the service, she is less likely to understand fully
what the change will do and therefore more likely to cause unintended
effects.

Engineers' time is wasted repeatedly explaining the same thing
- In a team, one person may have to repeat an explanation to different
engineers as they require it. But then people often forget, especially
if they don't do that thing frequently, so they'll have to ask again,
consuming time from both engineers and delaying work.

So why don't we make documentation better?

Too much focus on resolving incidents instead of preventing them
- Live incidents get a lot of attention and once they're resolved it's
on to the next thing. But is this incident a one-off or has it happened
before? Wouldn't writing the solution down be useful to the next
engineer? Was the incident a result of a change that wasn't implemented
correctly because the implementer didn't know exactly what to do? If we
start trying to prevent incidents, the value of documentation becomes
more obvious.

A belief that memorizing details is the way to increase knowledge
- Some are of the opinion that with more experience, an engineer should
be able to remember enough about the environment to manage it
effectively. This approach simply doesn't scale well. Once an engineer
stops working on something for a while, the details begin to slip away.
How many times have you come back to a script or service you built and
struggled to remember how it worked?

The cost of poor documentation is not obvious
- The biggest effect is loss of productivity: less work gets done in
more time. But the average team probably doesn't track how long tasks
take. Without a weekly review of incidents and changes, it stays hidden
that people are spending too much time on tasks that could be much
quicker.

Lack of skills to solve non-technical problems
- Most engineers are geared towards analyzing technical issues. But there
isn't much training or focus on how to identify and resolve
non-technical challenges like knowledge sharing or incident prevention.
Companies end up recruiting technically proficient teams but don't
recruit people who can see the non-technical side of the work.

IMHO, the fact that many engineers don't like to write docs is not the
real problem. The real issue is that in the culture of IT Operations,
there isn't a strong understanding of how capturing knowledge helps make
us better engineers.