Is it better to apologize or to offer a gift? Uber provides the answer

Toward an understanding of the economics of
apologies: evidence from a large-scale natural
ﬁeld experiment ∗
Basil Halperin†
Benjamin Ho‡
John A. List§
Ian Muir¶
September 2018
Abstract: We use a theory of apologies to analyze a nationwide ﬁeld
experiment involving 1.5 million Uber ridesharing consumers who experienced
late rides. Several insights emerge. First, apologies are not a panacea: the
eﬃcacy of an apology and whether it may backﬁre depend on how the apology
is made. Second, across treatments, money speaks louder than words – the
best form of apology is to include a coupon for a future trip. Third, in some
cases sending an apology is worse than sending nothing at all, particularly
for repeated apologies. For ﬁrms, caveat venditor should be the rule when
considering apologies.
Keywords: apologies, ridesharing, ﬁeld experiment
JEL Classiﬁcation Numbers: D80, D91, Z13
∗ This manuscript was not subject to prior review by any party, as per the research
contract signed at the outset of this project. The views expressed here are solely those
of the authors. Thanks to seminar participants at AFE 2017, AEA 2018, and Williams
College, and to Courtney Rosen for helpful comments and assistance. AEA Registry number:
AEARCTR-0002342. All errors are our own.
† MIT Department of Economics and Uber Technologies, Inc.
‡ Vassar College. 124 Raymond Ave. Poughkeepsie, NY 12604 650-867-8270
beho@vassar.edu
§ University of Chicago, NBER, and Uber Technologies, Inc.
¶ Uber Technologies, Inc.

“Virtually every commercial transaction has within itself an element of
trust... It can be plausibly argued that much of the economic backwardness in
the world can be explained by the lack of mutual conﬁdence.” Arrow (1972)
1 Introduction
Economists have come to recognize the importance of trust, reciprocity,
and other social preferences for explaining human behavior: people are self-
interested, but also are often concerned about the payoﬀs of others (e.g., Rabin
(1993), Charness and Rabin (2002), Fehr and List (2004)). Additionally, as
Arrow (1972) and Sen (1977) have argued, networks of trust and reciprocity
are essential for undergirding all economic exchange. However, relatively less
is known about the consequences of violations of trust or reciprocity. What
actions can be taken to avoid the deterioration of mutual conﬁdence when
trust has been compromised?
One common action to avoid the collapse of a relationship after a viola-
tion of trust or an unfortunate incident is to deliver an apology. The act
of apology is an important thread running through households, friendships,
and employer-employee relationships. Recent research has lent important in-
sights into apologies in lab contexts and small-scale ﬁeld experiments (see,
e.g., Gilbert et al. (2017), Ho (2012)), but much remains poorly understood. For
instance, why do ﬁrms apologize? Do customers actually value apologies?
With these questions as motivating examples, we begin by outlining a
principal-agent model of trust violation and apologies. In the model, a
customer (the principal) purchases output that provides a noisy signal of the
underlying trustworthiness of a firm (the agent). Depending on the stochastic
quality of the output, the firm may choose to apologize by sending a
(potentially costly) signal to the consumer in an attempt to signal trustworthiness
and restore the relationship. Several insights emerge from the model, includ-
ing that in order for the apology to be an eﬀective signal, the apology must
be accompanied by a real cost (Ho, 2012).
We worked with the Uber ridesharing platform to identify a natural setting
to lend insights into the underpinnings of the model. Uber is naturally con-
cerned that inaccurate estimates of trip duration may lead to decreased trust
in the platform and decreased spending in the Uber marketplace. Because
Uber services 15 million rides each day (Bhuiyan, 2018), even an extremely
small fraction of rides being late could have large repercussions. Indeed, our
analysis suggests that, absent any apology, a rider who experienced a late trip
spends 5-10% less on the platform relative to a counterfactual rider, suggesting
that there are material consequences to precisely the breach of trust described
above.
With this substantial loss in revenues as a backdrop, we conduct the ﬁrst
large-scale, natural ﬁeld experiment to measure the importance of apologies
as a method for restoring trust in a relationship. Partnering with Uber, we
conduct an experiment across the United States over several months, sending
real-time apology emails following a late trip, as deﬁned by the actual trip time
compared to the initial time estimate shown to the rider. We leverage rich cus-
tomer data from Uber, the customer-ﬁrm relationship history, and situational
context to test speciﬁc predictions of the model. Our primary goal is to mea-
sure the role of apologies in maintaining relationships between a ﬁrm and its
customers who have received a bad trip experience, measured by the level of
future spending with the ﬁrm. The main set of treatments varies whether a
customer receives an apology, as well as the size of the promotional coupon the
customer receives as part of that apology ($5 or zero). We complement these
treatments with a secondary set of treatments that send up to two additional
apologies following a second and third delayed trip.
We report several interesting insights. First, a costly apology after a bad
ride – in the form of a $5 coupon for a future trip – is an eﬀective signal that
increases demand for future trips. By contrast, we find that a signal in
the form of an apology without a promotion (i.e. words alone) had no eﬀect or
was even sometimes counterproductive. As a placebo check, we ﬁnd that the
$5 coupon administered directly after a bad ride is more eﬀective than a $5
coupon administered at a random time and unrelated to a rider’s experience.
Second, we ﬁnd that repeated apologies in the case of repeated bad ex-
periences make things worse relative to fewer apologies. This result is also
consonant with our model. Apologies can restore trust but consumers who
receive an apology hold ﬁrms to a higher standard in the future. If that future
standard is violated, apologies backﬁre.
Finally, we ﬁnd that characteristics of trips and individuals aﬀect the im-
pact of apologies. The eﬃcacy of an apology depends on the severity of the
unsatisfactory service – in this case measured by how late the ride was, in
minutes. In particular, we ﬁnd a U-shaped relationship between severity of
the unsatisfactory experience and apology eﬀectiveness: for slightly bad qual-
ity and severely poor experiences, apologies are eﬀective. Yet, for moderately
poor experiences, apologies may actually make things worse. Moreover, the
eﬃcacy of an apology critically depends on a user’s familiarity with the service.
Apologies are less eﬀective for users who are very familiar with the product
and are much more eﬀective for a given Uber product when the user typically
uses a diﬀerent Uber product. We show that both these results arise naturally
in our model.
Our study ﬁts in nicely with several strands of related work. First, it
extends the social preference literature into an area that considers how trust
can be restored after it is compromised. As Levitt and List (2007) summarize,
lab and ﬁeld experiments with the canonical trust game, dictator game, and
other games have shown that the concepts of trust and reciprocity are essential
for explaining human behavior. Rabin (1993), Charness and Rabin (2002), and
Dufwenberg and Kirchsteiger (2004) formally model these concepts. Second,
the extant literature on the economics of apologies has primarily been limited
to small scale ﬁeld and lab experiments (e.g. Aaker et al. (2004), Abeler et al.
(2010), Fischbacher and Utikal (2010), Gilbert et al. (2017), Chaudhry and
Loewenstein (2017)), or diﬀerence-in-diﬀerence analysis of policy interventions
(e.g. Ho and Liu (2011)). We extend the literature by testing the model in the
ﬁeld, with detailed customer and situational data, and we follow the subjects
for three months after the apology to measure how eﬀects persist over time.
Our data complement this work, showing that methodologically the lab studies
have given us a key ﬁrst look at the eﬃcacy of apologies.
The remainder of our paper proceeds as follows. We ﬁrst introduce the
principal-agent model that guided the experimental design. Then we pro-
vide details of the experimental design, brieﬂy describe the Uber ridesharing
platform, and discuss the empirical results. We conclude with a discussion
exploring how ﬁrms and individuals can use our results to further their under-
standing of apologies.
2 Theoretical Motivation
Our theoretical framework is based on the Ho (2012) principal-agent model
of a customer-ﬁrm relationship and apologies that formalizes many of the
ﬁndings about apologies in the psychology literature.1
The model is a two-
player game between a ﬁrm (the agent) and a consumer (the principal). Firms
can be a good “high” type (e.g. high trustworthiness) or bad “low” type
(e.g. low trustworthiness), θ ∈ {θH, θL}. The ﬁrm produces output y for the
consumer, generating utility for the consumer. The quality of the output –
how long the ride takes to arrive to the destination relative to expectations in
our case – depends on ﬁrm type θ as well as external circumstance, ω ∈ Ω,
that is uncorrelated with ﬁrm type (e.g. unexpected weather). Bad outcomes
(i.e. low-quality output) can result from a ﬁrm with bad intentions, θ = θL,
or alternatively from a bad draw from the state of nature ω. The consumer is
only aware of the overall quality of output y = y(θ, ω). We can think of the
firm's intent as the expected output over all possible external circumstances
ω, which the firm does not know in advance, holding the firm's actual type
θ fixed: $E_{\hat{\omega} \in \Omega}\, y(\theta, \hat{\omega})$. The type, θ, is known to the firm but unknown to the
consumer. Type is deﬁned so that higher types have “better” intentions.
Within the context of the rideshare industry, the timeline of the baseline
game proceeds as follows (Figure 1). The consumer begins with a prior p on
the probability that the firm is high type. She then experiences a good or
bad outcome for a ride, y(θ, ω) ∈ R. Next, the firm chooses to apologize or
not, a ∈ {0, 1}. Finally, given the quality of the ride y and apology or
non-apology a, the consumer updates her beliefs about the firm's type, learns that
an outside option is of high type with probability p_out, and then chooses to
stay with the firm or to go with the outside option.
1
For example in lab experiments, Ohtsubo et al. (2012) and de Cremer et al. (2011) find
that costly apologies can work better than cheap apologies; Skarlicki et al. (2004) and Kim
et al. (2004) find that apologies can backfire; and many find that the efficacy of an apology
depends on the type of offense (e.g. Maddux et al. (2011)).
Figure 1: Timeline of the Apology Game
[Timeline, in order: (t = 1) agent produces y = y(θ, ω); apology/no apology a ∈ {0, 1} at cost c(a, θ, ω); principal updates posterior b(a, y); outside option revealed, p_out; principal chooses stay / leave; (t = 2) old/new agent produces output.]
The consumer cares only about maximizing her consumption of rides, where
ride quality y(θ, ω) is a function of the ﬁrm’s type θ and external circumstances
ω such as traﬃc or weather. The consumer’s choice, x, is simply whether to
purchase from the same rideshare ﬁrm in period two or to take an outside
option (e.g. switch to a competitor or take public transit).
$U_{\text{consumer}}(x) = \sum_{t=0,1} y(\theta_t(x), \omega_t)$
To keep things simple for this application, the ﬁrm’s problem is simply to
decide whether or not to apologize.2
The ﬁrm receives a proﬁt per customer
of π and pays apology cost (potentially zero) given by c(a|θ, ω1), which can
depend on its type and the state of nature.
$U_{\text{firm}}(a) = \pi \cdot x - c(a \mid \theta, \omega_1)$
For the moment, assume the cost of apologies is constant: c(1|θ, ω1) = κ.
We will discuss other cost functions and cheap apologies (i.e. c(1|θ, ω1) = 0)
below.
2
As in Ho (2012), we abstract away from the ﬁrm’s choice of eﬀort in the determination
of output quality. Under fairly general assumptions, speciﬁcally that choice of eﬀort is
supermodular with respect to ﬁrm’s type θ, a model that includes costly eﬀort is equivalent
to a reduced form model where cost of eﬀort is subsumed into the apology cost function
which depends only on type. See Ho (2012) for details.
Given this simple framework, the consumer observes a sequence of signals
about the firm's type, H, which in this case includes the firm output y and the
firm apology a. The consumer chooses to stay with the firm provided her
posterior belief, given by b(H) ≡ Pr[θ = θH|H], is greater than the quality of
the outside option: b(H) > p_out. The quality of the outside option is drawn
from some known distribution F(·). The firm chooses to apologize if and only
if:
$\pi \cdot [F(b(y, 1)) - F(b(y, 0))] > \kappa$
The eﬃcacy of an apology, ∆b ≡ b(a = 1) − b(a = 0), is the impact
the apology has on the customer’s beliefs (i.e. the ﬁrm’s reputation) and
thus the likelihood that the customer will stay with the ﬁrm. The model
provides several useful predictions about apology eﬃcacy, ∆b, that inform our
experiment. Below, we discuss how apology eﬃcacy is aﬀected by uncertainty,
the costliness of the apology, and the severity of the bad outcome. We also
discuss predictions regarding repeat apologies.
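The firm's decision rule can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the quality of the outside option is drawn from a Uniform(0, 1) distribution, so that F(q) = q on [0, 1]. The function names and the numbers in the usage lines are hypothetical and do not come from the paper or from Ho (2012).

```python
def uniform_cdf(q: float) -> float:
    """CDF F of the outside option's quality (ASSUMPTION: Uniform(0, 1))."""
    return min(max(q, 0.0), 1.0)


def apology_efficacy(b_apology: float, b_no_apology: float) -> float:
    """Delta-b: shift in the consumer's posterior caused by the apology."""
    return b_apology - b_no_apology


def firm_apologizes(profit: float, b_apology: float, b_no_apology: float,
                    cost: float, cdf=uniform_cdf) -> bool:
    """Apologize iff pi * [F(b(y, 1)) - F(b(y, 0))] > kappa."""
    retention_gain = cdf(b_apology) - cdf(b_no_apology)
    return profit * retention_gain > cost


# Hypothetical numbers: pi = 100, posteriors 0.6 (apology) vs 0.5 (none).
print(firm_apologizes(100.0, 0.6, 0.5, cost=5.0))   # True
print(firm_apologizes(100.0, 0.6, 0.5, cost=15.0))  # False
```

With these illustrative values the expected retention gain is worth roughly 10, so the firm apologizes when κ = 5 but not when κ = 15.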
2.1 Role of Uncertainty and the Role of Costs
A separating equilibrium where apologies signal higher type exists given the
usual conditions. From Proposition 2 in Ho (2012), there are three existence
conditions that allow a separating equilibrium to exist: 1) it is cheaper for
high types to apologize, 2) continuing the relationship is more beneﬁcial for
high types, or 3) high types fail in diﬀerent situations than low types. In the
case of a repeated customer relationship, the second condition is most likely
to hold as repeat customers will ultimately learn the ﬁrm’s type just from
repeat experience with the product. Therefore, the continuation value is lower
for low quality ﬁrms since customers will eventually discover they are inferior
and switch to the outside option. Accordingly, high types are more likely to
maintain a lasting relationship. We examine the data for evidence for the other
two existence conditions by exploring the value of implicit promises and the
role of situation on the eﬃcacy of the apology. We return to these questions
in the Discussion.
In a separating equilibrium, three properties about the efficacy of an
apology follow straightforwardly from Bayes' rule (see Ho (2012) for details):
1. Apologies are more effective when there is greater uncertainty in the
relationship (when the prior p is bounded away from 0 and 1)
2. Apologies are more eﬀective early in relationships
3. Apologies are more eﬀective the greater the apology cost. Further, apolo-
gies are only eﬀective when there is a cost (c(a = 1) > 0)
Property 1 comes from the fact that when the prior belief, p, about the
ﬁrm’s type is close to 0 or close to 1, then the posterior belief is unlikely to
change much given a single additional signal (the apology) and therefore the
apology is likely to be ineﬀective. Apologies move beliefs the most when the
customer is most uncertain. Property 2 follows from Property 1. A customer
receives more and more signals about a ﬁrm’s type over time. As the history
of signals, H, lengthens, beliefs converge to either 0 or 1. Therefore, apology
eﬃcacy is greater early in a relationship.
Finally, Property 3 is based on the cost of apologies. If apologies increase
reputation then all ﬁrms will want to apologize. If costs are too low, then all
types of ﬁrms will apologize. If all ﬁrms apologize with the same frequency then
the equation for ∆b above reduces to $\frac{p}{p + (1 - p)} - \frac{p}{p + (1 - p)} = 0$, and so the efficacy of
apologies is zero. Apologies need to be costly in order to ensure good ﬁrms and
bad ﬁrms apologize at diﬀerent rates, which creates the separation in beliefs
necessary for apologies to function.
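The Bayesian updating behind Properties 1 and 3 can be sketched in a few lines of Python. The sketch assumes each firm type apologizes with a fixed probability after a bad outcome (`rate_high`, `rate_low`). These rates, the function names, and the treatment of zero-probability events are illustrative assumptions rather than part of the model in Ho (2012).

```python
def posterior_high(prior: float, rate_high: float, rate_low: float,
                   apologized: bool) -> float:
    """Pr[theta = theta_H | a] by Bayes' rule.

    rate_high / rate_low are the (hypothetical) probabilities that a
    high-type / low-type firm apologizes.
    """
    like_high = rate_high if apologized else 1.0 - rate_high
    like_low = rate_low if apologized else 1.0 - rate_low
    evidence = prior * like_high + (1.0 - prior) * like_low
    if evidence == 0.0:
        # ASSUMPTION: keep the prior after a zero-probability event.
        return prior
    return prior * like_high / evidence


def efficacy(prior: float, rate_high: float, rate_low: float) -> float:
    """Delta-b = b(a = 1) - b(a = 0)."""
    return (posterior_high(prior, rate_high, rate_low, True)
            - posterior_high(prior, rate_high, rate_low, False))


# Property 3: if both types apologize equally often, the apology is uninformative.
print(abs(efficacy(0.5, 0.8, 0.8)) < 1e-12)               # True
# Property 1: beliefs move more when the prior is far from 0 and 1.
print(efficacy(0.5, 0.8, 0.4) > efficacy(0.95, 0.8, 0.4))  # True
```

Under these assumed rates, an apology shifts beliefs by about 0.42 at a prior of 0.5 but only about 0.11 at a prior of 0.95.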
2.2 Severity of Outcomes
We can apply the above results to also make predictions about how the
eﬃcacy ∆b of an apology varies according to the outcome y. Apologies are
more eﬀective when there is greater uncertainty about the ﬁrm’s type. Con-
sider the distribution of possible outcomes (as measured by minutes late) for
a ﬁrm with good intentions θH versus a ﬁrm with bad intentions θL. Here we
suppose that the lateness of a trip is given by a normal distribution, with a
lower mean for high-type ﬁrms than for low-type ﬁrms, and common variance
(Figure 2).
Figure 2: Distribution of Lateness of Outcomes (minutes)
Assume lateness of outcome is normally distributed, with high-type firms having a lower
mean than low-type firms, and common variance (e.g. weather or traffic).
In this example, certainty that the ﬁrm’s intentions are bad is maximized at
the mean of the θL distribution. There is more uncertainty when the ride is less
late since the firm is more likely to have had good intentions. Similarly, when
the ride is very late, the lateness is more likely to be due to the common shock
(e.g. weather or traﬃc). As a result we would predict apologies to be least
eﬀective for intermediate values of lateness and more eﬀective when barely late
or extremely late.
2.3 Repeated Apologies
It is also useful to apply the above theory to make predictions regarding
the eﬃcacy of repeated apologies. Repeat apologies should be less and less
eﬀective as the customer gains experience with the ﬁrm. The customer is
acquiring more and more information, and therefore is becoming more certain
about the ﬁrm’s type. Therefore the eﬃcacy of an apology should diminish
with increased interaction with the ﬁrm. In fact, Ho (2012) predicts that an
apology could even begin to backﬁre if we assume apologies are cheap but
imply a promise for better behavior.
A cheap talk model of repeat apologies can lead to a backﬁre eﬀect if we
believe that an apology implies an implicit promise to do better in the future
and repeated failure breaks that promise (as seen in the trust game experiment
by Schweitzer et al. (2006)). A promise kept signals higher ﬁrm quality while
a promise broken is worse than no apology at all. This can be seen in a simple
screening contract extension to the baseline model.
Imagine the principal (consumer) oﬀers the agent (ﬁrm) a menu that says
the following: If the ﬁrm apologizes, then the relationship will be continued;
however, if the ﬁrm is late again, the relationship will be immediately termi-
nated in favor of the outside option. A separating equilibrium exists where
good-intention ﬁrms apologize and accept the threat of immediate termination
while bad-intention ﬁrms do not apologize and are judged in the future solely
based on their performance (see Ho (2012) Online Appendix for details). In
the framework of Charness and Dufwenberg (2006), a broken promise signals a lack
of guilt aversion, which serves as a second negative signal about the firm's type.
2.4 Hypotheses
In sum, the hypotheses from the model that are applicable to our setting
include:
Hypothesis 1 The eﬃcacy of an apology is higher when apologies are more
costly.
Hypothesis 2 The eﬃcacy of an apology is lowest for intermediate severities
of adverse outcomes when the variance of outcomes within types exceeds the
variance of outcomes between types.
Hypothesis 3 The eﬃcacy of an apology is higher early in a customer-ﬁrm
relationship, when there is greater uncertainty about the ﬁrm’s type.
Hypothesis 4 The eﬃcacy of an apology decreases with repeated use and can
backﬁre if overused.
The model defines apology efficacy as the change in beliefs, ∆b, that arises in
response to an apology. While we do not observe the beliefs of our experimental
subjects, we do observe their future decision of whether to stay with the firm
or to choose an outside option: Pr[b(H) > p_out]. It is this outcome variable
that we will use to test our main hypotheses.
3 Experimental Design
To test the hypotheses from this model, we conducted a natural ﬁeld exper-
iment (see Harrison and List (2004)) on the Uber ridesharing platform. The
Uber platform connects riders with drivers willing to provide trips at posted
rates. A rider provides her desired pickup and dropoﬀ location through a phone
app, and is oﬀered a price, an estimated time to pickup, and an estimated time
to destination (ETD). She then may choose to request an Uber ride and will
be picked up and transported to the destination. At the end of the trip, the
rider has the option to tip the driver (see also Chandar et al. (2018a), Chandar
et al. (2018b)). This describes the standard “UberX” product oﬀering which
is the focus of our experiment, but Uber oﬀers products that slightly vary this
experience. For example, UberPOOL oﬀers a discounted price but may involve
trip detours to pick up multiple riders traveling along a similar route.3
One measure of platform quality is the accuracy of the ETD provided to
riders. Rideshare firms such as Uber are justifiably concerned that inaccurate
estimates may lead to decreased trust and consequently decreased
spending with Uber. As mentioned above, we completed an analysis using a
matching methodology to identify the causal effect on future spending of
experiencing a late trip – a “bad ride” – relative to a statistically
identical customer who took an identical ride that arrived on time. This
analysis, which helped to motivate the present study, found that riders in the
right tail of the lateness distribution spend 5-10% less on the platform relative
to the counterfactual. These results are available upon request.
3
Cohen et al. (2016) also use Uber data to study the demand side of the ridesharing
market. A number of other papers use Uber data to examine the supply side, see e.g. Cook
et al. (2018), Hall et al. (2017).
To attenuate the costs of bad trips and to test the power of apologies,
we designed a natural ﬁeld experiment. Our ﬁeld experiment was conducted
over the course of several months in 2017. We selected many of Uber’s largest
markets to ensure a mix of cities with diﬀering levels of competition between
Uber and competing ridesharing platforms, and separately to ensure large
enough ridesharing markets to generate a suﬃcient sample size. 1.5 million
subjects passed through the experiment across the eight treatment groups
described below.
Riders entered the treatment upon experiencing a bad ride, deﬁned as an
UberX trip which arrived at the destination n minutes later than the ETD
initially displayed to riders when choosing whether to request a trip. The
threshold n varied by city based on the city’s historical distribution of lateness.
The threshold was set so that in expectation only the 5% latest trips would be
classiﬁed as late in each city, which generally implied a 10-15 minute threshold.
An hour after the end of a bad ride, a customer in a treatment group would
receive an email, the content of which varied depending on the treatment
group. We then follow all of the customer’s future interactions with Uber for
84 days.
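The city-specific lateness threshold can be sketched as a percentile computation. The snippet below is a minimal illustration, assuming a nearest-rank percentile over each city's historical lateness and a strict inequality for classifying a trip as late; the paper does not specify either detail, and the function and variable names are hypothetical.

```python
import math
from collections import defaultdict


def late_thresholds(trips, late_percent=5):
    """Per-city threshold n such that roughly the latest `late_percent`%
    of historical trips exceed it.

    trips: iterable of (city, minutes_late) pairs (hypothetical layout).
    ASSUMPTION: nearest-rank percentile on each city's own history.
    """
    by_city = defaultdict(list)
    for city, minutes_late in trips:
        by_city[city].append(minutes_late)

    thresholds = {}
    for city, values in by_city.items():
        values.sort()
        rank = max(1, math.ceil((100 - late_percent) * len(values) / 100))
        thresholds[city] = values[rank - 1]
    return thresholds


def is_bad_ride(city, minutes_late, thresholds):
    """ASSUMPTION: a trip is 'bad' if it is strictly later than the threshold."""
    return minutes_late > thresholds[city]


history = [("A", m) for m in range(1, 101)]  # synthetic data: 1..100 minutes late
n = late_thresholds(history)
print(n["A"])                                             # 95
print(sum(is_bad_ride("A", m, n) for m in range(1, 101)))  # 5
```

In the synthetic example, exactly 5 of 100 trips are classified as late, matching the 5% target.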
Following our theory, subjects were divided among eight treatment groups
(Figure 3). Half received a $5 promo code while the other half received no
promo code. The promo code conditions were crossed with four diﬀerent apol-
ogy types:
1. No apology.
2. Basic apology: e.g., “Oh no! Your trip took longer than we estimated.”
3. Status apology: e.g., “We know our estimate was oﬀ.”
4. Commitment apology: e.g., “We’re working hard to give you arrival
times that you can count on.”
The wording of each email was in the spirit of our model and followed Ho
(2012). The messages were sent as emails, with subject lines that suggested
the nature of the apology and highlighted the $5 promotion if attached. Full
message details along with the theoretical motivations for each apology type
are found in Appendix A and B.
Figure 3: Treatments
The experiment was a 4x2 design with 4 apology message types crossed with either a no
promo code condition or a $5 coupon condition.
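The 4x2 factorial structure can be enumerated directly. The sketch below is illustrative only: the arm labels are hypothetical, and the uniform random draw mirrors only the equal-probability assignment, not the balancing procedure applied to riders who had signed up before launch.

```python
import random
from itertools import product

APOLOGY_TYPES = ["none", "basic", "status", "commitment"]
PROMO_DOLLARS = [0, 5]

# Cross the four message types with the two promo conditions: 8 arms.
ARMS = list(product(APOLOGY_TYPES, PROMO_DOLLARS))


def assign_arm(rng: random.Random):
    """Draw one arm with equal probability (ASSUMPTION: no balancing step)."""
    return rng.choice(ARMS)


print(len(ARMS))             # 8
print(("none", 0) in ARMS)   # True: the control group
```

Here `("none", 0)` corresponds to the control group and `("none", 5)` to the "Just promo" arm.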
Treatment groups were balanced on eight dimensions:
1. Average fare previously faced by a rider (in all of 2017 prior to the
experiment launch)
2. Days since signing up with Uber
3. Lifetime dollars spent on Uber (up until experiment launch)
4. Lifetime trip count (up until experiment launch)
5. (Number of UberPOOL trips taken in life)/(Number of UberX + Uber-
POOL trips taken in life)
6. Number of UberPOOL trips taken (in the month before experiment
launch)
7. Number of UberX trips taken (in the month before experiment launch)
8. Number of support tickets ﬁled (in all of 2017 prior to the experiment
launch)
Technological limitations meant balancing could only be done for sub-
jects who had signed up for the Uber platform before the start of the ex-
periment. Subjects who joined after the start date were randomly assigned
with equal probability to one of the treatment groups. As a result, and because of
the large number of subjects, some means were significantly different in t-tests
between groups, but the differences were economically small, as reported in Table
1. Appendix B contains further details on experimental design, including the
language and imagery contained in the apology email.
Table 1: Balance Check – Mean Rider Characteristics by Treatment
Treatment                  | avg fare | days since signup | lifetime billings | lifetime pool share | lifetime trips | n recent pool trips | n recent x trips | n tix
Basic apology              | -14.318  | 757.443           | 1973.183          | 0.124               | 130.023        | 1.249               | 5.254*           | 0.877
Basic apology + promo      | -14.366  | 761.054           | 1984.287          | 0.123               | 131.039        | 1.27                | 5.317            | 0.883
Commitment apology         | -14.276  | 759.031           | 1963.447          | 0.124               | 129.578        | 1.317               | 5.289            | 0.87
Commitment apology + promo | -14.383  | 757.146           | 1979.742          | 0.123               | 129.618        | 1.242               | 5.279            | 0.863
Control                    | -14.339  | 762.622           | 1990.844          | 0.124               | 131.44         | 1.277               | 5.363            | 0.896
Just promo                 | -14.368  | 762.392           | 1994.212          | 0.124               | 131.528        | 1.281               | 5.351            | 0.886
Status apology             | -14.309  | 757.866           | 1974.789          | 0.122               | 129.4          | 1.234               | 5.222***         | 0.87
Status apology + promo     | -14.356  | 761.665           | 1995.218          | 0.124               | 131.619        | 1.28                | 5.377            | 0.893
* indicates significance of pairwise t-test versus the control group at the 5% level, with the Bonferroni correction applied. ** indicates the same at the 1% level and *** at the 0.1% level.
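A pairwise comparison with a Bonferroni correction can be sketched as follows. This is a rough illustration, not the authors' procedure: it uses a normal approximation to the t-test (reasonable only for large samples), and the number of comparisons is left as a parameter because the paper does not state how many tests the correction covers.

```python
from statistics import NormalDist


def two_sided_p(mean_a: float, se_a: float, mean_b: float, se_b: float) -> float:
    """Two-sided p-value for a difference in means.

    ASSUMPTION: independent groups and a normal approximation to the t-test.
    """
    z = (mean_a - mean_b) / (se_a ** 2 + se_b ** 2) ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def bonferroni(p_value: float, n_comparisons: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_comparisons)


# Hypothetical inputs: identical means give p = 1 before and after adjustment.
p = two_sided_p(5.0, 0.1, 5.0, 0.1)
print(p)                  # 1.0
print(bonferroni(p, 7))   # 1.0
```

An adjusted p-value below 0.05 would correspond to a single asterisk in the table's notation.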
In general we report results for future spending net of any promotions
applied, including but not limited to our $5 promo. For example, if a rider
took a single $8 trip in the seven days following treatment, but used a $5
promotion on that trip, her level of spending would be reported as $3. The
analysis using gross spending yields similar results. We also consider future
trip count, future tipping, and the extensive margin of whether the rider took
any future trips as outcome variables.
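The distinction between net and gross spending can be made concrete with a short sketch. The record layout and function names are hypothetical, and flooring a trip's net amount at zero is an assumption that the paper does not discuss.

```python
def net_spending(trips) -> float:
    """Total spending net of promotions.

    trips: iterable of (fare, promo_applied) pairs in dollars (hypothetical layout).
    ASSUMPTION: a promotion cannot push a trip's net amount below zero.
    """
    return sum(max(fare - promo, 0.0) for fare, promo in trips)


def gross_spending(trips) -> float:
    """Total spending including the value covered by promotions."""
    return sum(fare for fare, _promo in trips)


# The example from the text: one $8 trip with a $5 promotion applied.
week = [(8.0, 5.0)]
print(net_spending(week))    # 3.0
print(gross_spending(week))  # 8.0
```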
4 Results
We begin by presenting the unadjusted means of our main outcome vari-
able, net spending, across the seven treatment groups versus the control group.
Figure 4a presents average spending by riders over the seven days following
the bad ride. The ﬁgure can be read as follows: we have 186,584 customers in
the control group who had a bad trip. On average, these customers spent (net
of promotions) $45.42 in the seven days after the bad trip. Comparing this to
the basic apology group, which had 191,825 subjects, we ﬁnd that those who
received our basic apology spent $45.86 in the seven days subsequent to a bad
trip. This result is signiﬁcant at the p < 0.05 level using a standard t-test of
means.
Figure 4: Mean spending by treatment group
Panel (a) presents raw mean spending by treatment arm. Panel (b) aggregates the re-
sults across message types into four treatment categories since the content of the messages
themselves was found to be insigniﬁcant.
Another interesting ﬁnding in the raw data is that we ﬁnd no statistically
signiﬁcant diﬀerences between the diﬀerent message types. F-tests show that
mean spending within the set of three “Just apology” treatments was
statistically indistinguishable (ANOVA p = 0.27), as was mean spending within the
set of three “Promo + apology” treatments (ANOVA p = 0.63). Accordingly,
for ease of comparison, we aggregate the treatments into four categories, shown
in Figure 4b. The categories are: the control group, the treatment group
that received just the $5 promo code (“Just promo”), the three treatment
groups that received just an apology email (“Just apology”), and the three
treatment groups that received both a $5 promo and an apology (“Promo +
apology”). The figure shows the same patterns as the disaggregated
data: coupons are an important promotional tool, and apologies alone have at
most a marginal effect.
To complement this visualization of the raw data, we provide Table 2,
which reports summary statistics for the full set of outcome variables, again
at the seven-day horizon. Note that the eﬀect of treatments on trip count and
whether a rider takes a future trip are consistent with the eﬀect on spending.
Additionally, the eﬀects of diﬀerent treatments on future tipping behavior are
indistinguishable from zero.
Table 2: Means (Std Errs) by Treatment Category (7d)
Treatment       | Total spending (net of promos) | Trip count    | Total spending (incl. promos) | Total tips    | Took another trip
Control         | 45.424 (0.152)                 | 2.848 (0.009) | 46.6 (0.154)                  | 0.977 (0.008) | 0.674 (0.001)
Just apology    | 45.748 (0.100)                 | 2.851 (0.006) | 46.924 (0.101)                | 0.991 (0.006) | 0.672 (0.001)
Just promo      | 45.741 (0.154)                 | 2.877 (0.009) | 47.219 (0.156)                | 0.994 (0.009) | 0.680 (0.001)
Promo + apology | 45.649 (0.101)                 | 2.879 (0.006) | 47.166 (0.102)                | 0.994 (0.006) | 0.679 (0.001)
Note: Outcome variables at a seven-day horizon are presented here, but data were collected at horizons up to and beyond 84 days after the initial bad ride.
To supplement the raw data observations, we conduct a series of regres-
sions. Our main empirical speciﬁcation regresses the outcome variables of
interest for each subject i on the set of eight treatment dummies indexed by j,
controlling for the variables X on which we balanced in addition to city, date,
and hour-of-week ﬁxed eﬀects:
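$\ln(\text{Outcome}_i) = \sum_j \alpha_j \cdot \text{Treatment}_j + \beta \cdot X_i + \gamma_{\text{city}} + \delta_{\text{date}} + \eta_{\text{hour}} + \varepsilon_i \quad (1)$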
Regression results for the eﬀect of apologies on net spending, estimated
using this speciﬁcation, are presented in Table 3. Each column estimates the
treatment eﬀect on net spending over progressively longer horizons (7, 14, 28,
56, 84 days). A main feature to note is that the apology by itself (without a
promotion) has no statistically signiﬁcant eﬀect at conventional levels. In fact,
while the eﬀect of an apology is largely not signiﬁcant, if anything the presence
of the apology in and of itself has a negative eﬀect over longer time horizons
(56 to 84 days). Table 4 presents the same specification but with the number of
future rides as the outcome variable. It shows the same basic pattern, so
we focus our attention on net spending as the outcome variable.
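To build intuition for the α coefficients in specification (1), note that in a regression of ln(Outcome) on treatment dummies alone, with no controls or fixed effects, each coefficient equals the difference in mean log outcome between that arm and the omitted control group. The sketch below computes that simplified quantity. It is not the paper's estimator: it drops X and all fixed effects, and it assumes strictly positive outcomes so that the logarithm is defined. All names are hypothetical.

```python
import math
from collections import defaultdict


def log_mean_differences(records, control="control"):
    """Simplified alpha_j: mean ln(outcome) in arm j minus the control mean.

    records: iterable of (arm, outcome) pairs with outcome > 0.
    ASSUMPTION: no covariates or fixed effects, unlike specification (1).
    """
    logs = defaultdict(list)
    for arm, outcome in records:
        logs[arm].append(math.log(outcome))
    means = {arm: sum(vals) / len(vals) for arm, vals in logs.items()}
    return {arm: m - means[control] for arm, m in means.items() if arm != control}


# Synthetic data: the promo arm spends 10% more than control.
data = [("control", 10.0), ("control", 10.0), ("promo", 11.0), ("promo", 11.0)]
alphas = log_mean_differences(data)
print(round(alphas["promo"], 4))  # 0.0953
```

A coefficient of about 0.095 in logs corresponds to roughly a 10% increase in the outcome, which is how percent changes such as those plotted against the horizon can be read.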
Figure 5 plots the estimated coeﬃcients on the treatment dummies from
our main empirical speciﬁcation estimated over the same horizons described
above. We ﬁnd persistent eﬀects of treatments that include a promotion as
far out as three months after the apology was sent.
Figure 5: Percent change in spending over time
We plot the α coeﬃcient on each treatment dummy from model (1), with total spending as
the outcome, between the date of the bad ride and some future date 7, 14, 28, 56, and 84
days in the future.
One possible explanation for the persistence of the eﬀect is intertemporal
complementarities in consumption. In other words, if taking an additional
ride today increases a rider’s chance of taking a ride tomorrow, then simply
inducing a customer to take an additional ride in the ﬁrst week could have
persistent eﬀects. While this result is intuitively appealing, it should be tem-
pered in that if complementarities were the only force driving the persistence,
one would expect the eﬀect size to get smaller over time. In fact, if anything
the eﬀect (of a promotional coupon alone) stays steady or increases (albeit not
signiﬁcantly) by day 84.
What is especially notable in the results is that the eﬀect of an apology by
itself becomes more negative over time. An apology alone (with no coupon)
becomes signiﬁcantly negative by day 84, in contrast to the eﬀect of an apology
with a promo. This conﬁrms Hypothesis 1 that apologies are more eﬀective
when the cost associated with the apology is higher.
4.1 Heterogeneity by Severity of Lateness
Recall Hypothesis 2 that an apology would be least eﬀective for mod-
erate levels of lateness, since this is when the poor experience is most likely
attributable to the ﬁrm itself. On the other hand, apologies would be more
eﬀective for low levels of lateness (when the ﬁrm is more likely to be of the
high type) and high levels of lateness (where the most severe delays can be
attributed to external factors like weather).
This prediction is consistent with the pattern observed in the data. Fig-
ure 6 provides the estimated coeﬃcient for the aggregated treatment variable
interacted with indicators for the degree of lateness as measured by decile.
Since there is signiﬁcant variation in the distribution of lateness for each city,
we measure lateness relative to other rides from the same city, although other
speciﬁcations produce the same pattern. As predicted, apologies are least
eﬀective (or most damaging) for intermediate degrees of lateness.
Figure 6: Eﬃcacy of Apology by Severity of Outcomes
The coeﬃcient on the treatment variable interacted with the decile of how late the ride was
as measured by number of minutes relative to the other rides in the sample from the same
city.
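The within-city lateness measure described above can be sketched as follows. This is only an illustration of ranking rides against other rides from the same city; the function name, the tie-breaking rule, and the data are assumptions, not the procedure used in the experiment.

```python
from collections import defaultdict


def within_city_deciles(rides):
    """Assign each ride a lateness decile (1-10) relative to its own city.

    `rides` is a list of (ride_id, city, minutes_late) tuples.
    Ties are broken by input order, which is a simplification.
    """
    by_city = defaultdict(list)
    for ride_id, city, minutes_late in rides:
        by_city[city].append((minutes_late, ride_id))

    deciles = {}
    for city_rides in by_city.values():
        city_rides.sort(key=lambda r: r[0])  # stable sort keeps input order on ties
        n = len(city_rides)
        for rank, (_, ride_id) in enumerate(city_rides):
            deciles[ride_id] = rank * 10 // n + 1
    return deciles


# Hypothetical rides: city B is systematically later than city A.
rides = [(f"a{i}", "A", float(i)) for i in range(1, 11)]
rides += [(f"b{i}", "B", float(i + 10)) for i in range(1, 11)]
deciles = within_city_deciles(rides)
```

In this hypothetical data, a ride that is 10 minutes late falls in the top decile in city A, while a ride that is 11 minutes late falls in the bottom decile in city B, which is the sense in which lateness is measured relative to the same city.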
We test this relationship formally by estimating our main speciﬁcation
(1) with the addition of interaction terms between the treatment dummies
and the percentile of lateness and the percentile squared. We ﬁnd that the
quadratic interaction term is statistically signiﬁcant for the “promo + apology”
treatment at the p < 0.05 level and for the “just promo” treatment at the
p < 0.10 level.
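As a sketch of what the quadratic interaction implies, suppose the treatment effect at lateness percentile p takes the form a + b·p + c·p². When c > 0 the effect is U-shaped and is smallest at p* = −b/(2c). The coefficient values below are hypothetical and chosen only to illustrate the shape; they are not estimates from the experiment.

```python
def treatment_effect(p, a, b, c):
    """Implied treatment effect at lateness percentile p (0 to 100)."""
    return a + b * p + c * p * p


def turning_point(b, c):
    """Percentile at which the quadratic effect is smallest (needs c > 0)."""
    if c <= 0:
        raise ValueError("a U-shape requires a positive quadratic coefficient")
    return -b / (2.0 * c)


# Hypothetical coefficients, for illustration only.
a, b, c = 0.02, -0.0012, 0.000012
p_star = turning_point(b, c)
effect_at_minimum = treatment_effect(p_star, a, b, c)
```

With these illustrative values the effect is lowest near the middle of the lateness distribution and higher at both extremes, which is the pattern Hypothesis 2 predicts.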
4.2 Heterogeneity by Rider History
We now turn to Hypothesis 3 that apologies should be most eﬀective
when there is the greatest degree of uncertainty and therefore we would expect
greater eﬃcacy for new users of the ridesharing platform. Here we present the
eﬀect of apologies within subsamples of riders based on quartiles of riders’
number of rides before having the bad experience.
Figure 7: Rider heterogeneity
Treatment eﬀect on net spending within subsamples deﬁned by the quartile of the number
of past rides in the customer’s history.
As shown in Figure 7, our results are mixed. For the joint promo and
apology treatment, the point estimates indicate that the treatment eﬀect is
highest for the newest quartile of users (those with 0 to 10 lifetime trips) and
lowest for the most experienced users (those with greater than 157 trips), with
the eﬀectiveness decreasing across quartiles. However, for those who received
just an apology, the point estimates are mixed, and in fact the treatment eﬀect
estimate is smallest for the newest users and somewhat higher for the most
experienced users. In both cases, the conﬁdence intervals are wide.
Looking instead at a diﬀerent measure of unfamiliarity and uncertainty, the
frequency of UberPOOL usage relative to UberX, we ﬁnd results more consis-
tent with the hypothesis (Figure 8). The two most popular services provided
by Uber are UberPOOL and UberX. Since our experiment was conducted ex-
clusively on UberX riders, we expect riders who have mostly used UberPOOL
in the past to be more uncertain about the quality of UberX. Indeed, our point
estimates indicate that riders who mostly used UberPOOL in the past were
much more likely to be positively inﬂuenced by an apology than riders who
mostly used UberX, although the conﬁdence intervals are again large.
Figure 8: UberPOOL Riders vs UberX Riders
Treatment eﬀect on net spending within subsamples deﬁned by whether a rider is a “consis-
tent UberX user” (≥ 75% of trips in the preceding three months on UberX), a “consistent
UberPOOL user” (≥ 75% of trips in the preceding three months on UberPOOL), or the
intermediate case with no consistent product.
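The subsample definition in the caption can be written as a simple rule. This is a minimal sketch: the function name is hypothetical, and the handling of riders with no trips in the preceding three months, or with trips on other products, is an assumption that the text does not specify.

```python
def classify_rider(uberx_trips, uberpool_trips, other_trips=0, threshold=0.75):
    """Classify a rider by product share over the preceding three months."""
    total = uberx_trips + uberpool_trips + other_trips
    if total == 0:
        # Assumption: riders with no recent trips have no consistent product.
        return "no consistent product"
    if uberx_trips / total >= threshold:
        return "consistent UberX user"
    if uberpool_trips / total >= threshold:
        return "consistent UberPOOL user"
    return "no consistent product"


# Hypothetical riders, for illustration only.
mostly_x = classify_rider(uberx_trips=9, uberpool_trips=3)
mostly_pool = classify_rider(uberx_trips=3, uberpool_trips=9)
mixed = classify_rider(uberx_trips=5, uberpool_trips=5)
```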
4.3 Impact of Repeat Apologies
Finally, consider our last hypothesis, Hypothesis 4: that repeat apologies
would be less and less eﬀective over time and may even be counterproductive.
For a subsample of riders, we conducted the following secondary experiment. Among
subjects who experienced a second late trip in the following weeks, we split the
sample and offered a second apology to half, leaving the other half as a control
(having only received one apology). For the subsample who received two
apologies we split the sample again for those who took a third late trip, oﬀering
half a third apology and leaving the other half as one ﬁnal control (who only
received two apologies).
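The nested design described above can be sketched as two successive sample splits. This is an illustration only: the rider identifiers are hypothetical, and splitting by a seeded random shuffle is an assumption about how the halves were formed.

```python
import random


def split_in_half(rider_ids, rng):
    """Shuffle rider IDs and split them into (treated, control) halves."""
    shuffled = sorted(rider_ids)  # sort first so the result does not depend on set order
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return set(shuffled[:half]), set(shuffled[half:])


rng = random.Random(2017)  # fixed seed so the illustration is reproducible

# Hypothetical riders who experienced a second late trip.
second_late = {f"rider_{i}" for i in range(100)}
second_apology, second_control = split_in_half(second_late, rng)

# Hypothetical subset of twice-apologized riders who then took a third late trip.
third_late = set(sorted(second_apology)[:20])
third_apology, third_control = split_in_half(third_late, rng)
```

The second split is drawn only from riders who already received two apologies, so the third-apology comparison is between riders with identical apology histories up to that point.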
Figure 9: Treatment Eﬀect of Repeat Apologies
Panel (a) reports the marginal treatment eﬀect over the ﬁrst apology of a second apology
treatment for a second bad experience with Uber, compared to the relevant control group.
Panel (b) reports the marginal eﬀect of a third apology relative to the second. Eﬀects are
at a seven-day horizon.
As before, a cheap-talk apology alone without the $5 promotion remains
largely ineﬀective. However, whereas the short term eﬀect of the ﬁrst apology
with a $5 promotion yielded a 2% increase in net spending, the net eﬀect
on spending of the second apology is not signiﬁcantly diﬀerent compared to
someone who had a second bad ride but received no new apology message.
For the third bad ride, the apology on its own is insigniﬁcant again, while the
third apology with a promotion has a signiﬁcantly negative eﬀect on future net
spending relative to someone who experienced three bad rides but only received
two apologies with a promotion. In fact, this negative eﬀect shows up not just
in terms of future net spending but also in terms of the number of future rides
taken and in terms of future gross spending.
This backﬁre eﬀect we observe is consistent with an apology acting as a
promise. An apology can temporarily restore a customer’s loyalty after an
adverse outcome. However, an apology acts as a promise that the adverse out-
come was due to unexpected external factors, and that the customer should
therefore expect better outcomes in the future. When those higher expectations
go unmet, the firm’s reputation suffers more than if no apology had been
tendered at all. Apologies should therefore be used sparingly and ideally only
after unexpectedly bad outcomes that are unlikely to recur in the near
future.
5 Discussion
Since our principal ﬁnding is that it is primarily a promotional coupon
that can be used for a future trip that restores the ﬁrm’s reputation and not
the apology itself, one can ask: is this an “apology eﬀect” or just a “promo
eﬀect”? One approach to answer this query is to compare our estimated eﬀect
sizes with the eﬀect of a generic $5 promotion sent out randomly by Uber,
which will have no apology connotation.
Running concurrently in the cities where our experiment was conducted
(between the months of June and October of 2017), another experiment tested
the eﬀects of randomly sending a $5 promo against a control group that re-
ceived no promotion. While this serves as an important comparison experi-
ment, we should note that this natural ﬁeld experiment is not a perfect ana-
logue to our main apology experiment for two reasons. First, this experiment
proactively targeted the entire Uber rider population whereas our own exper-
iment targeted only those who had received a late ride. Having a late ride is
more likely to happen to more frequent riders simply by chance: more trips
implies a higher chance of at least one bad draw. To make treatment eﬀects
comparable, we restrict consideration to just those riders who experienced at
least one bad ride during 2017. It is important to note that while these riders
experienced a bad ride, the random $5 promos were not sent because of this
ride and could have been sent months before (or after) the experience.
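The selection point above, that more trips imply a higher chance of at least one bad draw, can be illustrated with a short calculation. The sketch assumes each trip is late independently with the same probability; that assumption and the probability value are hypothetical and used only for illustration.

```python
def prob_at_least_one_late(n_trips, p_late):
    """Probability of at least one late ride in n_trips independent trips."""
    if n_trips < 0 or not 0.0 <= p_late <= 1.0:
        raise ValueError("need n_trips >= 0 and 0 <= p_late <= 1")
    return 1.0 - (1.0 - p_late) ** n_trips


# Hypothetical per-trip lateness probability, for illustration only.
p = 0.05
occasional = prob_at_least_one_late(5, p)
frequent = prob_at_least_one_late(50, p)
```

Under these assumptions the frequent rider is far more likely than the occasional rider to have experienced at least one late ride, which is why the comparison sample is restricted to riders with at least one bad ride.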
A second limitation of our comparison experiment is that this generic
promo was usable multiple times and limited to a single week, whereas our
promotion was one-time use in the next three months. Therefore, we might ex-
pect this generic promotion to be much more eﬀective at the seven-day horizon
than our apology promotion.
In fact, while the sample size is small (n = 27,203), we find that our “just
promotion” treatment in the aftermath of a bad experience is statistically sig-
niﬁcantly more eﬀective than the randomly-timed generic promotion. Stacking
the generic promotion data with our “just promotion” and control data, we
estimate:
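$$\ln(\text{Outcome}_i) = \alpha_1 \cdot \text{is\_generic} + \alpha_2 \cdot \text{is\_treated} + \alpha_3 \cdot \text{is\_generic} \cdot \text{is\_treated} + \beta \cdot X_i + \gamma_{\text{city}} + \delta_{\text{date}} + \varepsilon_i, \tag{2}$$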
where the coeﬃcient on the interaction term α3 is the treatment eﬀect of receiv-
ing a generic (randomly-timed) promotion, compared to receiving a promotion
in the aftermath of a late trip.
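To illustrate how the interaction coefficient in (2) can be read, the sketch below uses a simplified version with no controls or fixed effects. Under that assumption, the coefficient on the interaction equals a difference-in-differences of mean log outcomes across the four cells. The function names and data are hypothetical and not taken from either experiment.

```python
import math


def mean_log(values):
    """Mean of the natural log of strictly positive outcomes."""
    return sum(math.log(v) for v in values) / len(values)


def interaction_coefficient(generic_treated, generic_control,
                            late_treated, late_control):
    """Difference-in-differences of mean log outcomes.

    Assumption: no controls or fixed effects, so this equals the coefficient
    on is_generic * is_treated in a saturated two-by-two regression.
    """
    generic_effect = mean_log(generic_treated) - mean_log(generic_control)
    late_effect = mean_log(late_treated) - mean_log(late_control)
    return generic_effect - late_effect


# Hypothetical spending data, for illustration only.
alpha_3 = interaction_coefficient(
    generic_treated=[20.0, 20.0],
    generic_control=[20.0, 20.0],
    late_treated=[22.0, 22.0],
    late_control=[20.0, 20.0],
)
```

A negative value means the generic promotion raised log spending by less than the promotion sent after a late trip did.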
Estimating (2) with net spend as the outcome variable at the seven-day
horizon, and using the same set of controls as in the previous analyses, we
ﬁnd that the randomly-timed promotion has a signiﬁcantly negative eﬀect of
-8.3% (p-value < 0.001) on future net spending, which is in contrast to the
positive eﬀects of the $5 “just promotion” without an apology. Importantly,
this suggests that it matters that the act of remediation occurred after an
adverse event, a breach of trust. This is, at least, consonant with the idea
that our $5 “just promotion” treatment had an extra impact after a bad trip
compared to the eﬀect observed after a generic $5 promotion is received.
This ﬁnding also lines up with the ﬁndings of an experiment that was
run independently, and concurrently, by the ridesharing company Via (Cohen
et al., 2018). This study also found that while a $5 promo after a bad ride
was eﬀective at increasing net spending, a $5 promo randomly given had an
insigniﬁcant eﬀect on gross and net spending. Cohen et al. (2018) also ﬁnd
that a cheap apology (without a promotional coupon) had no signiﬁcant eﬀect.
This replication with a diﬀerent company is encouraging in that it suggests our
results generalize to rideshare ﬁrms beyond Uber, which had perhaps a unique
reputation at the time our experiment was conducted. There are a couple of
diﬀerences observed between the Cohen et al. (2018) paper and our own that
are worth noting. They ﬁnd that apologies mostly matter for late pickups,
whereas our experiment focused on late arrivals. Indeed they ﬁnd null results
for late arrivals. They also ﬁnd that their apologies are most eﬀective for their
most frequent customers whereas we found indistinguishable treatment eﬀects
on users by frequency. These diﬀerences are likely due to Via’s model which
emphasizes shared rides. When a user hails a ride with Via, she knows that the
driver will pick up other riders along the way. Thus, she does not necessarily
have the same expectation for an on-time arrival.
The Via study, occurring in a diﬀerent geography and diﬀerent setting, is
also informative because apologies are undoubtedly context-dependent. Abeler
et al. (2010), who study apologies on an auction website, are similarly
complementary. Interestingly, they find that cheap apologies were more effective
than monetary compensation. We have two possible explanations for the
incongruence between our results and Abeler et al.’s insights. First, their
outcome variable was the customer’s rating of the seller on the auction website,
which is relatively costless for the customer to change. Second, their monetary
compensation was offered as a quid pro quo payment to the customer
to change the rating (which may have been construed as a bribe) whereas in
our case, the monetary compensation was offered as a gift. Of course, our
thoughts are merely speculative, and further experiments are needed to pre-
cisely identify the role of norms and context.
Future research can also better identify the mechanisms that determine
how apologies work. Apologies can contain monetary restitution, admission of
guilt, promises about the future, expression of empathy, or even excuses (see
Appendix A for more details). The experiment was designed to test different
apology mechanisms by varying the message that accompanied the apology
and by estimating the eﬀect of apologies in diﬀerent traﬃc and weather situa-
tions. While some of the eﬀects of diﬀerent apology messages were directionally
consistent with predictions from theory, the signiﬁcance of their eﬀect was not
consistently robust, perhaps because the email text associated with promotions
was not carefully read by customers (email open rates averaged approximately
30%). Similarly, the eﬃcacy of apologies did vary by weather and traﬃc, but
not in any systematic way discernible through the lens of theory.
6 Conclusion
We present results from a large-scale ﬁeld experiment on the eﬀects of
apologies on restoring trust within a principal-agent relationship between cus-
tomers of a ridesharing ﬁrm and the ﬁrm itself. We oﬀer not just evidence
that apologies matter, but also insight into how apologies matter. Our results
have implications both for ﬁrms deciding how, and when, to apologize and
for understanding how trust can be repaired in economic relationships more
generally.
We ﬁnd that the most eﬀective apology was the provision of a $5 coupon,
with or without any accompanying apology text. Giving such a coupon after a
bad ride was more cost-eﬀective than $5 coupons given at random. Apologizing
without oﬀering a coupon has a potentially negative eﬀect on future spending.
We further examine dimensions of customer characteristics and characteristics
of the adverse outcome that could help provide guidance for more eﬀective
apologies going forward, such as the customer’s familiarity with the product.
Furthermore, apologizing repeatedly to the same person who had multiple
bad experiences in a three-month period reduced future spending, relative
to someone who also had repeated bad rides but did not receive repeated
apologies.
We articulate a game-theoretic rationale for how customers respond to
apologies, which provides a model that helps organize our results. In this way, our
data provide empirical support for interesting aspects of the general apology
model. While previous lab studies have served to provide important insights,
our data demonstrate the value of the signaling view of apologies by showing
that its predictions hold up in the ﬁeld. Our analysis also provides useful advice
for ﬁrms on the ifs, whens, wheres, and hows to best apologize. We ﬁnd that
while apologies can be an effective way to restore and prolong the customer
relationship, the reason apologies are not more frequent is that they
are costly and could potentially backﬁre. Firms often do not apologize because
apologizing is diﬃcult. The safest way to remediate a bad experience is a
simple promotion applied to future purchases. We ﬁnd that money spent in
this way, after an adverse event, yields a positive return for the ﬁrm even when
promotions sent at other times do not.
There are several opportunities to expand on our experiment. Future work
should explore the impact of apologies in other industries and include greater
variation in the cost dimension. In particular, we remain interested in ex-
ploring the role of diﬀerent kinds of apologies where the implicit promises
associated with an apology are made even more explicit.
References
Aaker, J., Fournier, S., and Brasel, S. A. (2004). When Good Brands Do Bad.
Journal of Consumer Research, 31(1):1–16.
Abeler, J., Calaki, J., Andree, K., and Basek, C. (2010). The power of apology.
Economics Letters, 107(2):233–235.
Arrow, K. J. (1972). Gifts and Exchanges. Philosophy & Public Aﬀairs,
1(4):343–362.
Battaglini, M. (2002). Multiple Referrals and Multidimensional Cheap
Talk. Econometrica, 70(4):1379–1401.
Bhuiyan, J. (2018). Uber powered four billion rides in 2017. It wants to do
more – and cheaper – in 2018.
Chandar, B., Muir, I., List, J. A., and Gneezy, U. (2018a). Towards an Under-
standing of the Economics of Tipping: Evidence from a Nationwide Field
Experiment at Uber. Working paper.
Chandar, B., Muir, I., List, J. A., Wooldridge, J. M., and Hortaçsu, A. (2018b).
Inference in Cluster-Randomized Experiments with Panel Data: Evidence
from the Launch of Tipping at Uber. Working paper.
Charness, G. and Dufwenberg, M. (2006). Promises and Partnership. Econo-
metrica, 74(6):1579–1601.
A Appendix A: Diﬀerent Kinds of Apologies
A key part of the original design of the experiment was to test diﬀerent
kinds of apologies by modifying the text of the apology message. The intent
was to identify evidence for the diﬀerent mechanisms identiﬁed by Ho (2012),
which classiﬁed apologies into one of ﬁve categories:
1. Costly apology: “I’m sorry, here’s $5.” An apology that involves a
tangible cost.
2. Commitment apology: “I’m sorry, I won’t do it again.” An apology that
promises to do better in the future. Based on a screening contract.
3. Status apology: “I’m sorry, I’m an idiot.” An apology that admits
incompetence. Based on two-dimensional type.
4. Empathy apology: “I’m sorry, I see that you are hurt.” An apology that
recognizes the other’s pain. Based on information partitions.
5. Excuses: “I’m sorry, it wasn’t my fault.” An apology that blames exter-
nal factors. Based on veriﬁable cheap talk.
Our study was designed to focus on the ﬁrst three. Empathy was thought
to be too diﬃcult for a corporation to communicate over an email while excuses
would have been technically more diﬃcult and potentially had greater negative
consequences.
The ideas behind the three types of apologies were conveyed to Uber’s marketing
department, which designed messages consistent with the intent of the theory but
also consistent with Uber’s marketing practices.
A.1 Commitment Apologies
The theoretical basis of the commitment apology is a screening contract.
As noted in the main text, the principal (consumer) oﬀers the agent (ﬁrm) a
menu that says the following: If the ﬁrm apologizes for the breach of trust,
then the relationship will be continued; however, if the ﬁrm is late again, the
relationship will be immediately terminated in favor of the outside option.
In each round the principal has the option of staying with the current
agent or choosing an outside option. In a commitment apology, the princi-
pal commits to a menu of rewarding the agent using its future decisions to
stay with the current ﬁrm based on whether they apologized or not and their
trustworthiness in future periods.
Table 5: Continuation values for commitment apologies.
Agent Behavior                Cont. Value
Apologize, then good ride     $v_1^g$
Apologize, then bad ride      $v_1^b$
No apology, then good ride    $v_0^g$
No apology, then bad ride     $v_0^b$
If good-intention ﬁrms are more likely to have good rides in the future, a
separating equilibrium exists where good-intention ﬁrms apologize and accept
the threat of immediate termination while bad-intention ﬁrms do not apologize
and are judged in the future based on their performance (See Ho (2012) Online
Appendix for details). The principal must commit to future behavior such that
$$v_1^g > v_0^g > v_0^b > v_1^b.$$
Note this is not renegotiation-proof since once a firm apologizes it reveals
itself to be of good intentions. This suggests a role for emotional motivations
that maintain the equilibrium behavior.
A.2 Status Apologies
An alternate contract theory-inspired model for how apologies restore trust
in a relationship is based on the idea that intrinsic type, θ, is two-dimensional.
A ﬁrm can have good intentions but they may be unreliable for some types
of tasks (Chaudhry and Loewenstein (2017) provide recent evidence on how
apologies rely on the trade-off in the agent’s perception of the principal’s
competence versus the principal’s warmth). Suppose the distribution of external
shocks, ω, is correlated with the agent’s type. The principal can choose
which tasks to assign to the agent depending on her beliefs about the agent’s
type.
For example, suppose some ﬁrms are better suited for rides to the airport,
while other ﬁrms are better suited for rides downtown. Here, the principal can
oﬀer a screening contract after a bad airport ride, where if the agent apologizes,
they implicitly acknowledge their own inadequacy at airport rides, and if the
agent doesn’t apologize, they implicitly admit to having poor intentions. A
separating equilibrium can be enforced if the principal can assign similar tasks in
the future to agents who do not apologize but different tasks to agents who
do apologize.
An agent with good intentions but who admits to being bad at providing
airport rides will get more city rides in the future. An agent with bad intentions
will not apologize because they find the airport rides to be more lucrative. The
agent with good intentions would not withhold an apology, because they
know they are bad at airport rides. As in Battaglini (2002), the presence of multiple
dimensions of type allows the principal to get fully revealing information from
a cheap signal.
In our experiment we expected to ﬁnd evidence for status-based apologies
if a rider responded to an apology by decreasing future spending for similar
rides but increasing it for dissimilar rides. However, testing for such eﬀects for
airport versus non-airport rides, weekday versus weekend rides, and rush-hour
versus non-rush-hour rides returned statistically indistinguishable treatment
eﬀects.
B Appendix B: Experiment Details
As discussed in Appendix A, riders who were sent an apology email received
one of three types – a “basic apology”, a “status apology”, or a “commitment
apology” – that either included a $5 promotion or did not. A screenshot of the
basic apology email, with a promotion, is shown in Figure 10a. Additionally,
one treatment group received an email with the $5 promotion, but no explicit
statement of apology, shown in Figure 10b.
Figure 10
(a) Example screenshot of one of the apology emails sent to riders: the “basic” apology with a $5 promotion.
(b) Screenshot of the “just promo” email sent to riders, i.e. not including an explicit apology.
The apology emails differed in their subject lines and body paragraphs in the following way:
Basic apology:
• Subject line: “Oh no! Your trip took longer than we estimated”
• Body: “Your trip took longer than we estimated, and we know that’s
not ok. We want you to have the best experience possible, and we hate
that your latest trip fell short.”
Commitment apology:
• Subject line: “We can do better.”
• Body: “Your trip took longer than expected, and you deserve better.
This time we missed the mark, but we’re working hard to give you arrival
times that you can count on.”
Status apology:
• Subject line: “We know our estimate was oﬀ.”
• Body: “We underestimated how long your trip would take – and that’s
our fault. Every trip should be the best experience possible, and we
recognize that your latest trip fell short.”