Oracle Blog

Steve White's Weblog

Wednesday Feb 22, 2006

A short while ago I was involved in the most interesting SGRT
facilitation of my career so far – interesting because Sun had been
invited to provide a Rational Troubleshooting facilitation even though
our equipment was not involved in the root cause of the customer
problems. The account manager in Sun is a huge supporter of this
process, and offered our services to the customer to help them manage a
very gnarly problem. The customer (to whom I'd presented SGRT a couple
of years earlier) was interested and I volunteered.

We did have Sun equipment at the customer site, and it
was having problems, but it had already been identified that the Sun
equipment was a victim of a much more subtle problem to do with the
links between two computer sites. The customer was hugely advanced in
their understanding of the problem, and the answer to “Where on the
object” was really clear. They had excellent "what" data, and
acknowledged that there was no chance of getting the lifecycle
information as the problem occurred 100 times during terabytes of
data transfer over days of full production usage.

Having all the suppliers in one room, providing their view of the
problem was enlightening to all – some suppliers had a view of the
application, some of the underlying network infrastructure and others
had the physical rack and cable view.

The facilitation did not identify a resolution to the problem, but it
did produce many actions to take and, more importantly, many
actions that no longer needed taking because they were not
essential to the resolution of the problem. The possible causes
appeared to be centered on one supplier's hardware not doing quite what
it should, so the lens of attention was focused tightly on that
equipment.

Oh, and the very nice man from IBM clearly recognised what
troubleshooting method I was using and came up with some marvelously
incisive questions to forward our understanding of the symptoms still
further. It felt really good to have a peer supplier in the room
recognise the troubleshooting process we were using and actively engage
in it.

For me the key learning points were:

the reinforcement that
getting the right people in the room is not enough – following a
structured analytic technique saved us all time, made the problem
very clear and took the audience with the technical staff so that
everyone understood by the end of the day what the issues were. Even
I understood them.

that the capability of SGRT can be used as a Customer
Relationship offering to assist our customers with the management of
problems (Incidents in ITIL language).

For the End-to-End implementation of KT-Resolve / SGRT / [whatever
the process is known by in the client company] throughout the computer
industry to fully succeed we need customers to call for the use of a
rational approach. It should no longer be a matter of serial trial
fixes tacking toward a lucky break - big companies concentrate on what
their customers demand. Sun's customers should demand a rational
approach to problem management (and some already do), and Sun now has
the capability worldwide to handle problem / incident management in a
rational manner.

Friday Feb 03, 2006

Kepner-Tregoe have a model of human behaviour called the "Performance
System" model, and we use this concept extensively in the management of
the installation of the SGR Troubleshooting process (Sun branding for
the Kepner-Tregoe Resolve process, KT-Resolve) in Sun. The project
office is not empowered to provide consequences, and we make it our job
to get management to understand that; what we do provide, and get
involved in, is the infrastructure that drives the feedback loops.

If we maintain that feedback is provided to an individual to improve
their performance for the next time they see the same situation, the
feedback - to be effective - needs to be timely, accurate and
targeted. We have a rule in the project office that feedback provided
more than 7 elapsed days after the event is not worth either the coach
completing or the engineer receiving. This drives tight loops. Most of
the feedback loops are of less than 48 elapsed hours, and it's that
long because we have a global organisation and any report that runs
once in a 24 hour period arrives in someone's timezone while they are
not at work.
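
As a rough illustration of the 7-day rule above, here is a minimal Python sketch. Only the 7-elapsed-day cut-off comes from our rule; the function name, argument names and sample dates are made up for the example and are not taken from any Sun tool.

```python
from datetime import datetime, timedelta, timezone

# The 7-day cut-off is the project office rule; everything else is hypothetical.
MAX_FEEDBACK_AGE = timedelta(days=7)


def feedback_still_useful(event_time: datetime, now: datetime) -> bool:
    """Return True if feedback given at `now` is within 7 elapsed days of the event."""
    return now - event_time <= MAX_FEEDBACK_AGE


# Illustrative dates only. Timezone-aware values keep "elapsed" meaningful
# for a global organisation.
event = datetime(2006, 2, 1, 9, 0, tzinfo=timezone.utc)
two_days_later = event + timedelta(hours=48)
nine_days_later = event + timedelta(days=9)

print(feedback_still_useful(event, two_days_later))
print(feedback_still_useful(event, nine_days_later))
```

Comparing elapsed time rather than calendar dates is deliberate: a report that lands overnight in someone's timezone still counts against the same clock.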

The primary feedback loops we run are as described in a blog entry
below. We have built a number of secondary feedback loops to begin to
measure and reinforce good behaviour.

Daily Coaching Loop

Every day the coaches that are assigned a group of mentees (who are
often the colleagues that they have trained) receive an invitation
(and key)
to assess the intent and quality of the work that is passing from their
group to the next group. This could be thought of as a daily survey of
the quality of work that is passing between engineers. We are assessing
the quality of the documentation.

Over time we can see whether engineers are improving in the quality of
their documentation or not, and can take action to provide additional
coaching or support for engineers who are not reaching the required
standard of internal documentation quality.

Reputation Feedback

Given that we now have "End-To-End" installation of SGRT almost
everywhere in the Customer Facing organisations and in the backline
support organisations, we can begin to get engineers to measure
engineers by reputation. A loop recently installed (and being used as a
pilot for the Betty Support Model) is asking for process usage by
reputation.

Part of the main coaching loop has process coaches assessing the intent
behind an escalation. The trigger for the reputation feedback loop is
the closure of a case that had "Cause Unknown" set as its intent by
the process coach. This tells us that the subject of the escalation was
a "Problem" (using the classic definition provided by Problem Analysis
thinking) to the Escalation Generator. Given that it was a "Problem",
it should have been specified using PA thinking and the process of
Problem Analysis continued by the Handling engineer. On escalation
closure, both the Escalation Generator and the Escalation Handler are
offered a survey of how the other engineer did.
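
The trigger described above could be sketched along these lines in Python. This is an assumption-laden illustration: the Escalation record, its field names and the function are all hypothetical, and only the "Cause Unknown" intent and the two-way survey come from the loop as described.

```python
from dataclasses import dataclass


@dataclass
class Escalation:
    """Hypothetical record of one escalation; the field names are illustrative."""
    case_id: str
    generator: str      # engineer who raised the escalation
    handler: str        # engineer who worked it
    coach_intent: str   # intent assessed by the process coach
    closed: bool


def surveys_on_closure(esc: Escalation) -> list[tuple[str, str]]:
    """Return (recipient, engineer being assessed) pairs for a closed escalation.

    Surveys are only offered when the coach marked the intent "Cause Unknown",
    which is what tells us the escalation was a Problem in the PA sense.
    """
    if not esc.closed or esc.coach_intent != "Cause Unknown":
        return []
    return [(esc.generator, esc.handler), (esc.handler, esc.generator)]


# Made-up example case.
example = Escalation("C-1", "alice", "bob", "Cause Unknown", closed=True)
print(surveys_on_closure(example))
```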

This has been an extremely useful loop in an unexpected way - apart
from providing an opportunity to build up a picture of coachable
opportunities, the comments field is exposing further opportunities to
work even more effectively in Sun.

Process Escape Loops

From time to time things go wrong, and to handle those situations where
effective call handling goes astray we have set up a process escape
loop. This loop can operate in the forward direction and the backward
direction, and always involves a process expert to assess the situation
and provide coaching where necessary.

Why are we doing this?

Simply put, because it's more effective to do so. Sun is striving
toward providing a better quality customer experience by standardising
on the troubleshooting method we use throughout the support
organisation. It's less expensive to have engineers all use the same
troubleshooting process than it is to have them inventing new processes
every time. It results in more consistent (think reduction in
variation in terms of manufacturing or Sigma measurement) support by
reducing the standard deviation on elapsed time metrics, and reduces
average elapsed time metrics.

The opportunities that this offers Sun and its customers are many, and
include the possibility of reaching out to our customers who use, or
are interested in using KT-Resolve. Imagine a time when customers,
empowered with the same troubleshooting method as Sun, perform a clear
Situation Appraisal, identify the Object with the problem and the
Defect that it is seeing and have spent a few minutes gathering
accurate data surrounding a problem. When they pass that info to Sun,
it can be immediately routed to the most likely person to solve the
problem, and if that person can't solve the problem they can continue
the same troubleshooting process. This has to be cheaper for our
customers, and provide a better level of service to their business.

Monday Feb 28, 2005

Individual Program Leaders for Sun Global Resolution
Troubleshooting occasionally tie up with Program Leaders in other
companies. Recently two colleagues of mine were invited to the
offices of Cisco in San Francisco to talk about the challenges of
instituting this in their company. Sadly I wasn't there (I had
planned to attend and something else got in the way), and I heard
from my colleagues that the Cisco Program Leaders were particularly
interested in the process improvement feedback loop we're using in
Sun. One day I hope to meet you; until then, this is a drawing of the
basic operation of one of the feedback loops we use.

In the call flow there are people who are generating work for
other people. In this model I'm calling the people who are
generating work “Escalation Generators” and the handlers
of that work “Escalation handlers”. Bear in mind that
escalation handlers can also be escalation generators if they then
pass work to others. Every day we get a dump out of the case
management system of all the transfer of work movements between one
group of people and another. A Process Coach (either a Program
Leader or a Process Facilitator) is associated with a group of
engineers. This can be the staff that the Program Leader
trained themselves – it provides continuity following the
training course.

A batch program associates all the work from the generators
with their respective process coach, and creates a personalised html
form for the coach, making it easy for the coach to visit the work
of their coaching group.
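
For anyone wanting to build something similar, the batch step might look roughly like this in Python. It is a sketch under stated assumptions, not the program we run: the record fields (case_id, generator), the coach lookup table and the form layout are all invented for illustration.

```python
from collections import defaultdict
from html import escape


def group_by_coach(transfers, coach_of):
    """Group transfer records under the coach of the engineer who generated them.

    `transfers` is a list of dicts with hypothetical keys "case_id" and
    "generator"; `coach_of` maps an engineer to their process coach.
    """
    by_coach = defaultdict(list)
    for transfer in transfers:
        coach = coach_of.get(transfer["generator"])
        if coach is not None:  # skip engineers with no assigned coach
            by_coach[coach].append(transfer)
    return dict(by_coach)


def render_form(coach, transfers):
    """Build a small personalised HTML form listing the work for one coach."""
    rows = "\n".join(
        f'<li>{escape(t["case_id"])} from {escape(t["generator"])} '
        f'<select name="{escape(t["case_id"])}">'
        "<option>well done</option><option>could do better</option>"
        "</select></li>"
        for t in transfers
    )
    return f"<form><h1>Work to review for {escape(coach)}</h1>\n<ul>\n{rows}\n</ul></form>"


# Made-up daily dump and coach assignments.
dump = [
    {"case_id": "C-1", "generator": "alice"},
    {"case_id": "C-2", "generator": "carol"},
    {"case_id": "C-3", "generator": "alice"},
]
coaches = {"alice": "dave", "carol": "erin"}

grouped = group_by_coach(dump, coaches)
for coach, work in grouped.items():
    print(render_form(coach, work))
```

Keying the grouping on the generator rather than the handler matches the loop as described: the coach reviews what their group passes on to the next group.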

The coach visits the html form, takes a look at the quality
of the documentation and finds coachable moments, both “well
done”, or “could do better”.

There are two stages to the coaching. The first stage
is getting the end users to recognise when they should use a part of
KT's process; we call this the “triggers for use”. Once
we see individuals using a process to document the work, we are
looking for “Good use of Process”. Program Leaders
should be able to recognise Good Use of Process when they see it –
if not, KT have a definition you can recycle.

Feedback or consequences are provided to the individuals one
to one, either by email, a phone call or in person.

Typical emails that we send are:

Poor quality from the escalation generator

It is perfectly reasonable to ask the escalation generator for a
problem statement and specification on this escalation if it
would assist you in the resolution of the problem.

The escalating engineer has been trained, and should provide you with
a specification.

Ensure you have cycled this escalation through "Received
Incomplete" at some point in its life to alert Management to
the lower than expected quality of the documentation.

A specification from the escalation generator when it was not
necessary

Thank you for providing the problem statement and specification on
the above escalation.

The provision of a clear description of the problem in a standard
format is very helpful in the speedy understanding of a customer
problem, and overall will result in a shorter time to resolution, more
chance of a first time fix and more satisfied customers.

Please note that there are certain escalation types that do not
mandate an SGR specification. For all cases where you know the cause of
the problem:

Reproducible Test Case (do this and this happens)

Known Bug

Request for Backport

Technical Question

you do not need to also provide a specification. It's fine to do so,
but it is not necessary.

For further details see .... (url for further details)

A good specification from the escalation generator

Thank you for providing the problem statement and specification on
the above escalation.

The provision of a clear description of the problem in a standard
format is very helpful in the speedy understanding of a customer
problem, and overall will result in a shorter time to resolution, more
chance of a first time fix and more satisfied customers.

A specification that had coachable moments.

Hand crafted by process experts every time.

Once the reports are completed by the coaches they are
archived (in a mail archive as it happens).

We can then do data mining and compare the performance of
cases where good quality documentation was provided with cases
where the documentation was not such good quality.
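
To give a flavour of that comparison, here is a minimal Python sketch that summarises elapsed time by documentation quality. The field names and the figures are invented purely so the example runs; they are not real case data or results.

```python
from statistics import mean, stdev


def compare_by_quality(cases):
    """Return mean and standard deviation of elapsed hours per quality rating.

    `cases` is a list of dicts with hypothetical keys "doc_quality" and
    "elapsed_hours". Groups with fewer than two cases are skipped because a
    sample standard deviation needs at least two values.
    """
    groups = {}
    for case in cases:
        groups.setdefault(case["doc_quality"], []).append(case["elapsed_hours"])
    return {
        quality: {"mean": mean(hours), "stdev": stdev(hours)}
        for quality, hours in groups.items()
        if len(hours) >= 2
    }


# Invented numbers, for illustration only.
sample_cases = [
    {"doc_quality": "good", "elapsed_hours": 10},
    {"doc_quality": "good", "elapsed_hours": 20},
    {"doc_quality": "good", "elapsed_hours": 30},
    {"doc_quality": "poor", "elapsed_hours": 40},
    {"doc_quality": "poor", "elapsed_hours": 80},
]

summary = compare_by_quality(sample_cases)
print(summary)
```

Looking at the standard deviation as well as the mean matters here, since the aim is less variation in elapsed time and not only a lower average.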

Tuesday Feb 22, 2005

One of the many enjoyable things about the troubleshooting method
job I have with Sun at the moment is talking to our customers about
the use of a rational process in handling their issues.

The challenge is to get the decision makers in companies to
understand that it's the installation of a capability, not just a
training course. With a training course you go, learn the new stuff
and use it straight away, and for technical training on a product
you've been assigned to support you have “no choice”, the
reinforcement of the training is built into the job.

With a thought process an attendee has two choices, either to stay
the same or use the new process, and if rational process installation
is considered to be training the attendees stay the same. A long time
ago I drew this hamburger to illustrate to management in Sun that
while the training may be
considered the meat in this offering, without the lettuce, tomato,
mushroom and bun it's all a pile of greasy sausage.