Hail the Heroes!

If you were asked to identify the biggest killer of productivity for a software
engineer, what would you say it is?

You might say “email” or “drop-in managers,” but those two things have the same
root cause: disruptions, the mother of all flow-killing workplace activities.

Disruptions can obliterate an engineer’s productivity. After each disruption is
over, it takes about 23 minutes to refocus on the task at hand [1]. As a result,
the average developer gets only two hours of disruption-free time per day [2].

We all know how this feels: even after a long day, you keep wondering, “What
did I even do today?” If you find yourself asking that question a lot, I have
great news for you. You need a hero.

Note: This is the post version of the talk given at Denver Startup Week 2016.
See this post for
the slides and information on the talk.

A little background

Before I answer what a hero is and how it can help you, let me give some
background on where we started. I lead the SRE team at
InVision and we were plagued with disruptions.
Understandably, we had many days in which we didn’t know “where the time went.”
We needed to remove these disruptions so that we could build and maintain the
platform that the entire company depended on.

What was disrupting us? A quick, informal survey showed that we had many of the
typical disruptions that most systems administrators or IT help desks would
have. That would be fine, except we weren’t a help desk; we were an SRE group.
Building the infrastructure that would power the future of our exciting SaaS
offering was being neglected while we were just helping with
printers.

When we looked at how critical the disruptions were, the results were fairly
clear: the disruptions were necessary, but they were still killing our
productivity.

We solved the problem by leveraging a concept that we already were familiar
with: being on call. Except this wasn’t the kind of on call that wakes you up at
3am with that horrible PagerDuty voice. This was a different kind of on call and
we wanted to set that apart while giving the role a dose of dignity and
awesomeness. The result is what we call our Heroes!

What is a Hero?

Our hero isn’t the kind that wears tight pants while flying around. Our hero is
the team’s representative to the rest of the company. Our hero lets the rest of
us focus on making progress with our forward-thinking work, knowing that the
disruptions are being handled.

The hero role rotates like an on call rotation would and it also has a few other
expectations. As a hero:

You’re responsible for the random questions and small change requests from
the organization as a whole.

You’re responsible for automating the issues/questions that come up.

The first responsibility of a hero helps a little, but it doesn’t go very far on
its own. As the organization grows, without the second responsibility, hero
work keeps piling up until it overwhelms the hero. The second responsibility is
key: by automating recurring requests, we can decrease both the raw number of
disruptions and the time it takes to complete them.

Mechanics

Since we wanted to approach the hero role as an experiment and only roll out an
MVP until we were sure it would work, we wanted to keep it very light on the
implementation side. What we ended up with is a fully SaaS solution that serves
its purpose while adding very little overhead for either the requesting teams or
the hero.

To start, we use Slack as our primary communication tool. One of the things that
Slack gives you is the ability to define a “slash” command that will post to any
endpoint that you give it. The slash command “/hail-hero” is the first part of
our hero workflow. The receiving end of the webhook that the slash command
posts to is hosted on a SaaS-based rules engine, Zapier. We used Zapier for the
initial ease of setup for the MVP, but anything that can accept and post
webhooks will work here. By using Zapier, we can tie the receipt of that
webhook into our other tools, most notably JIRA and PagerDuty.
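
For anyone replacing Zapier with their own receiver, here is a minimal sketch of the first step: reading what the slash command posts. It assumes Slack's standard form-encoded slash command payload (fields such as command, user_name, channel_name and text). The function name parse_slash_command and the example values are made up for illustration; this is not the Zapier setup we actually run.

```python
from urllib.parse import parse_qs


def parse_slash_command(body: str) -> dict:
    """Parse the form-encoded body that Slack posts for a slash command.

    Returns only the fields a hero needs. The name and return shape are
    hypothetical and exist only for this sketch.
    """
    # parse_qs returns a list per key; a slash command sends one value each.
    fields = {key: values[0] for key, values in parse_qs(body).items()}
    if fields.get("command") != "/hail-hero":
        raise ValueError(f"unexpected command: {fields.get('command')!r}")
    return {
        "requester": fields.get("user_name", "unknown"),
        "channel": fields.get("channel_name", "unknown"),
        # Blank text is dropped by parse_qs, so default to an empty string.
        "request": fields.get("text", "").strip(),
    }


# Example body, shaped like what Slack would send (values are invented).
example_body = (
    "command=%2Fhail-hero&user_name=alice"
    "&channel_name=general&text=Please+reset+my+VPN+access"
)
hail = parse_slash_command(example_body)
```

Rejecting any command other than “/hail-hero” keeps the endpoint from silently accepting posts that were meant for something else.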

Why both JIRA and PagerDuty? We use JIRA day in and day out for tracking our
other work, so it was a natural fit. Also, by using JIRA we can create and link
any other work that would take more time than we wanted to allocate in the hero
process. PagerDuty was used because it helps us easily create and use a rotation
of users who will be on hero duty. As an added benefit, PagerDuty can escalate
hero tickets if they aren’t being responded to in a timely fashion.
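
To make the fan-out concrete, here is a rough sketch of the two request bodies a homegrown receiver could build from one hail. It assumes JIRA's REST “create issue” endpoint (POST /rest/api/2/issue) and PagerDuty's Events API v2 (POST https://events.pagerduty.com/v2/enqueue). Zapier's built-in integrations did this for us, so the function names, the “HERO” project key, and the placeholder routing key are all hypothetical. Nothing is sent over the network here.

```python
def build_jira_issue(project_key: str, requester: str, request: str) -> dict:
    """Build a JSON body for JIRA's create-issue endpoint.

    The project key and the "Task" issue type are assumptions; use
    whatever your own JIRA project defines.
    """
    return {
        "fields": {
            "project": {"key": project_key},
            # Keep the summary short; the full text goes in the description.
            "summary": f"Hero request from {requester}: {request[:80]}",
            "description": request,
            "issuetype": {"name": "Task"},
        }
    }


def build_pagerduty_event(routing_key: str, requester: str, request: str) -> dict:
    """Build a JSON body for a PagerDuty Events API v2 trigger event.

    The routing key comes from the PagerDuty service whose schedule is
    the hero rotation; the value used below is a placeholder.
    """
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            # Truncate defensively so very long requests are not rejected.
            "summary": f"Hero request from {requester}: {request}"[:1024],
            "source": "slack:/hail-hero",
            "severity": "info",
        },
    }


issue = build_jira_issue("HERO", "alice", "Please reset my VPN access")
event = build_pagerduty_event("PLACEHOLDER_KEY", "alice", "Please reset my VPN access")
```

Keeping the payload builders separate from the HTTP calls makes them easy to test and easy to move into a chat bot later.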

Outcome

Overall, the hero system has worked phenomenally well. The business continuity
is being maintained and our team has a sense of focus. We are making progress on
our most substantial goals. In addition to this being quite successful for the
SRE team, four other teams have adopted this system and we have more looking to
get their own heroes.

Future Plans

Since this has been a huge success, we’re looking at moving away from using
Zapier as the logical glue and moving towards integrating this into a chat bot
that we use for other business functions. As mentioned earlier when explaining
the choice of Zapier, it will be a fairly easy component to replace.