
The PeP Project: Evaluating the Responsiveness of SAP Applications
from a User-Centered Perspective

On
this Website, we have published quite a few articles discussing performance
and responsiveness issues in software applications over the past two years
(see the Human
Performance at the Computer highlight topic for a compilation of these
articles). These more general articles are the by-products of a project that
was initiated by SAP User Experience at the beginning of 2008 – the
Perceived Performance project, or "PeP" project for short. This article is
devoted to the PeP project itself; it reports briefly on the project's goals
and methodological approach, work and cooperation with other groups within
SAP, and possible future directions.

Background to the Perceived Performance (PeP) Project

In his editorial What Matters Most?, one of the authors expressed his belief
that performance issues are the number one usability issue. Even if you find
this statement too strong, there is general agreement that solving performance
issues – or, more precisely, responsiveness issues – is of utmost importance
for software companies (see
the appendix or Human
Performance at the Computer – Part 1: Introduction for the difference
between performance and responsiveness): Poor performance degrades
user efficiency and thus the efficiency of the business processes that
depend on the software. The usual step is, of course, to approach these issues
from a technical perspective. At SAP AG, for example, there are dedicated
technical teams that measure the responsiveness of SAP applications in clearly
defined test environments. For this purpose, they have created standardized
step-by-step scenarios that allow them to compare different software versions
and thus evaluate the effects of technical fine-tuning to improve the
system's responsiveness.

The problem with a purely technical approach, however, is that these measurements
tell us little about how users experience an application's responsiveness,
which areas require a greater investment of effort from a user's perspective,
and where the system is already responsive enough. In order to gain a better
overall understanding of these issues, SAP's User Experience team initiated
the Perceived Performance (PeP) project at the beginning of 2008. The primary
goal was to devise a user-centered evaluation method that could be applied
to the scenario-based measurements made by the technical teams. Further goals
were to apply this methodology to dedicated SAP applications, roll out the
insights gained within the company to increase awareness of responsiveness
issues, and to publish them externally via channels such as the SAP Design
Guild Website and conference appearances.

When the technical performance teams at SAP evaluate the responsiveness of
applications, they monitor a large number of parameters – one of which
is the overall response time for user-initiated user interface (UI) events.
For this parameter, the teams apply a one-second threshold as a criterion for
whether an application achieves SAP's performance goals. However, this rule
does not reflect the full variety of user expectations and behaviors:
Some actions should take much less than a second, while others may take longer
without annoying users. Thus, the PeP team's challenge was to develop an
evaluation method that provides better insight into the actual user experience
and helps identify areas that need improvement. The concept of human time
ranges that originates from Allen Newell's time scales of human action looked
promising as a starting point for developing such a methodology, because these
time scales refer to the psychological dimensions of perception, operations,
and cognition (thinking, attention, motivation) (see Table 1 below; for details
see Human
Performance at the Computer – Part 2: Making Applications More Responsive and
Waloszek and Kreichgauer, 2009). In their most basic and most frequently cited
form, the time ranges are approximately 0.1 second (perception), 1 second
(operations), and 10 seconds (cognition).

PeP Application of Time Ranges

The PeP team integrated two further categories into its adaptation of the time ranges:

Shneiderman and Plaisant (2004) mention an additional category of "common
tasks" of about three seconds, which marks two effects that "waiting" has
on users: After three seconds, (1) users start to feel that the system is
slow and (2) they begin to lose their task focus (they can maintain a degree
of focus for up to 10 seconds).

The authors also report that users become annoyed after waiting 15 seconds.

Time Range (Variation) | Psychological Dimension | System Action | Effect on the User
0.1 sec. (0-0.2) | Perception | – | Perception of smooth animations and cause-and-effect relationship breaks down
1.0 sec. (0.2-2.0) | Dialog, action | Presents result of simple task | Engaged user-system dialog breaks down
3 sec. (2.0-5.0) | Cognition, attention, motivation | Presents result of common task | User has time to think – the system is perceived as slow, the user's focus starts to wander, and the user may turn to other tasks
10 sec. (5.0-15) | Cognition, attention, motivation | Presents result of complex tasks | User loses focus on task and may turn to other tasks
>15 sec. | Cognition, attention, motivation | Presents result of very complex task | User becomes annoyed – the system is detrimental to productivity and motivation

Table 1: PeP adaptation of the human time ranges table, including variations
in parentheses

The next question was how the time ranges could be utilized for a user-centered
evaluation of response times. The PeP team's answer to this question
was to classify observed response times according to time ranges, and thus
the psychological effects on users of waiting. This required switching from
discrete times to ranges by extending and connecting the time ranges from 0
to beyond 15 seconds, without leaving any gaps (see the graphic
in the appendix). To define the ranges, the PeP team adopted Shneiderman's
and Plaisant's (2004) values for the variation of the time ranges wherever
possible, but a few decisions could not be backed up with data from the literature.
We therefore initially set fairly conservative upper limits for the time ranges
(see the first column in Table 1 or the graphic
in the appendix).
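
To make the classification step concrete, here is a minimal sketch in Python
(a language chosen for illustration; the PeP team's tooling is not described
here). It applies the range boundaries from Table 1; assigning a response time
that falls exactly on a boundary to the lower range is our assumption, as the
article does not specify this.

    # Sketch: map an observed response time (in seconds) to a PeP time range.
    # Boundary values come from Table 1; counting a value that falls exactly on
    # a boundary toward the lower range is an assumption made for this example.

    PEP_RANGE_LIMITS = [
        (0.2, "perception range (up to 0.2 s)"),
        (2.0, "simple task range (0.2-2.0 s)"),
        (5.0, "common task range (2.0-5.0 s)"),
        (15.0, "complex task range (5.0-15 s)"),
    ]

    def classify_response_time(seconds):
        """Return the PeP time range label for an observed response time."""
        for upper_limit, label in PEP_RANGE_LIMITS:
            if seconds <= upper_limit:
                return label
        return "very complex task range (> 15 s)"

    if __name__ == "__main__":
        for t in (0.1, 0.9, 3.4, 12.0, 22.5):
            print(f"{t:5.1f} s -> {classify_response_time(t)}")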

PeP Assignment of UI Events to Time Ranges

Measuring response times and classifying them according to the time ranges
does not, however, provide the complete picture. As already mentioned,
some UI events need to be blazingly fast, while others may take longer without
annoying users. Thus, to derive guidance from the evaluations, it is also
necessary to know which response (or waiting) time users expect (and tolerate)
for certain types of UI events. Assigning UI events to time ranges makes it
possible to compare and evaluate observed and expected response times and
to identify which events conform to users' expectations and which do
not (and thus require improvement). As there was very little guidance in
the literature, the PeP team drew up its own list assigning typical UI event
types to tolerable time ranges for practical use in its evaluations.
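
Since the PeP team's actual list is not reproduced here, the following sketch
uses hypothetical UI event types and tolerable limits purely to illustrate the
idea of checking observed times against expected ones; the event names and
limits are invented for this example, not taken from the PeP list.

    # Illustration only: the event types and tolerable upper limits below are
    # hypothetical examples, not the PeP team's actual assignment.

    TOLERABLE_LIMITS = {
        "echo keystroke": 0.2,            # should feel instantaneous
        "open dropdown list": 2.0,        # simple task
        "save document": 5.0,             # common task
        "generate monthly report": 15.0,  # complex task
    }

    def meets_expectation(event_type, observed_seconds):
        """True if the observed response time stays within the tolerable range."""
        return observed_seconds <= TOLERABLE_LIMITS[event_type]

    if __name__ == "__main__":
        print(meets_expectation("open dropdown list", 1.4))  # True
        print(meets_expectation("save document", 7.2))       # False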

The PeP Methodology in Short

Finally, we put together the ingredients of an evaluation method for
response time. In short, the PeP methodology is based on three steps:

Preparation: We break the use scenarios into task steps, or technically,
UI events. We then categorize them according to what response time would
be tolerable for users. This (preliminary) assignment is based on the complexity
of the interactions, that is, on the computer workload that experienced
users would expect them to cause.

Measurement: We time the UI events and assign them to the time
ranges. This assignment is based on the events' actual duration, and
thus on the users' perception, not their expectations.

Evaluation: This data leads to a frequency matrix of tolerable
versus observed time ranges (see Table 2), which can be interpreted from
a user’s perspective.

The time ranges have distinct
implications (directness, appropriateness, slowness, waning or lost focus,
annoyance) for users' perceptions and reactions. Therefore, the PeP evaluation
matrix provides a more refined picture of how users perceive the performance
of a software application than checking response times against one fixed time
limit. The PeP evaluation is particularly valuable if an application is
considerably slower than expected or exhibits wide response-time variations.

The PeP team measured many standardized
scenarios, the data for which was provided by the technical performance teams.
The (fictional) example in Table 2 below shows a scenario with a fulfillment
rate of 30.1% for simple tasks; this is assumed to have a strong negative impact
on user satisfaction.
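
The article does not spell out how the fulfillment rate is computed, but the
figures in Table 2 are consistent with it being the percentage of measured UI
events whose observed response time does not exceed the upper limit of their
tolerable range (for example, 22 of 73 simple-task events, or 30.1%). The
sketch below builds such a matrix under this assumption; the sample data is
invented.

    # Sketch of a PeP evaluation matrix, assuming "fulfillment" means that the
    # observed response time does not exceed the tolerable range's upper limit.
    from collections import defaultdict

    RANGE_LIMITS = [2.0, 5.0, 15.0, float("inf")]  # upper limits in seconds
    RANGE_LABELS = ["0.2-2.0 s", "2.0-5.0 s", "5.0-15 s", "> 15 s"]

    def range_index(seconds):
        """Index of the observed-range column a response time falls into."""
        # Values below 0.2 s are counted in the first column here.
        for i, limit in enumerate(RANGE_LIMITS):
            if seconds <= limit:
                return i
        return len(RANGE_LIMITS) - 1

    def evaluation_matrix(measurements):
        """measurements: iterable of (tolerable_range_index, observed_seconds) pairs."""
        rows = defaultdict(lambda: [0] * len(RANGE_LABELS))
        for tolerable_idx, observed in measurements:
            rows[tolerable_idx][range_index(observed)] += 1
        for tolerable_idx in sorted(rows):
            row = rows[tolerable_idx]
            total = sum(row)
            fulfilled = sum(row[: tolerable_idx + 1])  # within or below tolerable range
            print(f"tolerable range {RANGE_LABELS[tolerable_idx]}: counts {row}, "
                  f"total {total}, fulfillment {100.0 * fulfilled / total:.1f}%")

    if __name__ == "__main__":
        # Tiny fictional sample: 0 = simple tasks (0.2-2.0 s), 1 = common tasks (2.0-5.0 s)
        evaluation_matrix([(0, 0.8), (0, 3.1), (1, 4.0), (1, 17.5)])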

Tolerable Range (Type of Interaction) | 0.2-2.0 s | 2.0-5.0 s | 5.0-15 s | > 15 s | Total | Fulfillment Rate (%)
Simple Tasks (0.2-2.0 s) | 22 | 26 | 20 | 5 | 73 | 30.1
Common Tasks (2.0-5.0 s) | 3 | 13 | 9 | 9 | 34 | 47.1
Complex Tasks (5.0-15.0 s) | 0 | 1 | 2 | 1 | 4 | 75.0
Overall | 25 | 40 | 31 | 15 | 111 | 36.9

Table 2: Example of a PeP evaluation matrix (fictional data); the four
observed-range columns give the number of times a response was measured
in that range

Short Overview of the PeP Team's Work

One of the PeP project's major tasks was, of course, to learn about and gain
an understanding of responsiveness issues from a user's perspective. As shown,
this was essential for developing an evaluation method, and it was also the
basis for consulting other teams at SAP. But the PeP team also had to gain
a basic understanding of the technical constraints underlying application responsiveness
issues. For this purpose, the team attended the internal SAP Performance Focus
Days, for example. In addition, cooperating closely with the technical performance
teams at SAP was mandatory for the PeP team; we already mentioned that the
technical teams provided the data for most of the PeP evaluations.

After the PeP team had developed a user-centered evaluation approach and performed
a number of evaluations based on data provided by the technical teams – about
10 evaluation reports were delivered by the PeP team in 2008 – the team
was also able to roll out information within SAP: PeP members gave a number
of presentations to other teams that were interested in the topic, prepared
a presentation for an internal SAP Developers conference, and took part in
numerous discussions and several work groups, consulting the teams from a user-centered
perspective. Issues that arose during these discussions were, for example:
When should feedback be given and what form should it take? Should pages load
incrementally or completely? Further topics included application startup time,
speed of autocomplete, and the influence of server roundtrips and WANs. It
turned out that the PeP team's time ranges provided a good heuristic for answering
such questions, thus extending their usefulness beyond their original
application in the PeP evaluation method.

At the beginning of this article, we mentioned that the PeP team also had
the goal of making information available outside of SAP. This was accomplished
by publishing articles on the SAP Design Guild Website. These are additionally
compiled in the Human
Performance at the Computer highlight topic for easier access. They
contain general information and are largely independent of the PeP project.
However, Ulrich Kreichgauer and Gerd Waloszek presented the PeP team's user-centered
evaluation method at the INTERACT 2009 conference in Uppsala, Sweden. This
method will also form part of a keynote that Dan Rosenberg will give at the
20th FQS-Forschungstagung (research congress of the German Quality Research
Community) in October 2009 in Frankfurt am Main, Germany.
Last but not least, this article delivers some details of the PeP team's work
to the public.

Future Directions

Because it was set up as a project, PeP has a limited time scope. Many
questions were addressed and answered during the project's time span, while
others need further research and clarification and may lie beyond the current
project scope. First of all, the assumptions underlying
the PeP evaluations need to be validated further. UI events are currently assigned
to time ranges on a heuristic basis and call for more thorough investigation.
In addition, the transition points between the time ranges rely on data from
the literature and on heuristic assumptions. Systematic experiments involving
users who rate the timeliness of selected UI events could help to define the
points more reliably.