Incorporating reputational concerns in public sector reform: it may be effective but needs creative monitoring

Public sector reforms often attempt to mimic the “discipline” of the market in order to spur better performance among service providers. Examples include the numerous variants of performance-based financing for health, where health providers are compensated monetarily for achieving specified health targets. At the heart of this approach is the standard view of economic agents induced to modify behavior through pecuniary incentives. Yet incentives do not have to take the form of money, and reforms that attempt to modify behavior by appealing to an agent’s concern for reputation have also found some success.

A quasi-experimental evaluation of a hospital quality ratings system instituted among U.S. hospitals in the state of Wisconsin systematically varied the public release of assessment reports. Low-scoring hospitals in the publicized group took steps to improve performance in time for the next round of assessment. However, hospitals with the same low score in the non-publicized group did not address their shortcomings. The motivation for redress among low performers in the publicized group did not appear to be directly pecuniary, as there was no disciplining device of consumer demand – poor reports didn’t affect patient choice because patients could not process the released information, at least according to hospital managers. What motivated reform efforts was management’s concern with hospital reputation.

(Of course an agent may care about reputation because of pride, or because reputation can ultimately affect earnings and careers – it’s difficult to separate these two channels. Regardless of the mechanism, the wide dissemination of rating scores is an integral feature of a “ratings” reform.)

The U.K. National Health Service adopted a universal benchmark for ambulance organizations: success was defined as responding to at least 75% of life-threatening emergency calls within 8 minutes. However, the public release of these ratings was left to the discretion of the different U.K. regions: in England the ambulance ratings were made available to the public starting in 2001, while in Wales the same rating system was revealed only to the assessed health organizations, for use as a managerial tool.

Publicizing the English low performers was explicitly meant to motivate behavior through “naming and shaming,” while high-performing organizations were celebrated. And shame-avoidance motivation again appeared to work: for English ambulances, the timely response rate increased significantly from 1999 to 2003 – the percentage of priority emergency calls serviced within 8 minutes rose from 55% to 75% – while the rate for Welsh ambulances remained flat at 55%.

But all is not rosy with these relatively simple interventions of informational tracking and release. For one, any reform needs to ensure that the selected ratings measure what matters. For example, school ratings are often based on test performance, but test results are not a complete measure of education, and rating systems risk teachers focusing their efforts on “teaching to the test” at the expense of a broader education.

In the U.K. case, the metric of prompt ambulance response to emergency calls is a salient (although not complete) measure of effective service – for example, the heart attack survival rate is very sensitive to prompt treatment. However, there were anecdotes that ambulance services relocated closer to urban areas in order to respond more promptly to the majority of calls. While this type of response to the rating system does increase the number of patients promptly attended to, it also raises issues of equity and illustrates the perhaps unanticipated consequences of such a reform.

Another key challenge is “gaming”: agents’ manipulation of information reporting for their own benefit.

After public disclosure, the English ambulance response time distribution exhibited a sharp discontinuity at exactly 8 minutes, whereas before 2001 there was no such discontinuity. This suggests the deliberate reclassification of response times to just below the 8-minute threshold when the true response time actually exceeded it. Subsequent analysis indicates that one-third of the reported performance gain was due to gaming rather than actual improvements in service. This is a serious challenge to any performance rating system, whether or not it is explicitly tied to monetary incentives.
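The bunching pattern described above can be checked with a simple back-of-the-envelope test: if reported times come from a smooth underlying distribution, roughly as many calls should fall just below the threshold as just above it. The sketch below (a Python toy with entirely invented data and a made-up gaming rate, not the actual U.K. analysis) illustrates the idea:

```python
import random

random.seed(0)

# Hypothetical illustration: simulate "true" response times, then mimic
# gaming by reclassifying some times just over the 8-minute threshold
# to just under it, and look for the resulting bunching.
THRESHOLD = 8.0

def simulate(n=10_000, gaming_rate=0.0):
    times = []
    for _ in range(n):
        t = random.uniform(2.0, 14.0)  # smooth "true" distribution (minutes)
        # A gamed record just over the threshold gets reported as 7.9
        if THRESHOLD < t < THRESHOLD + 1.0 and random.random() < gaming_rate:
            t = 7.9
        times.append(t)
    return times

def bunching_ratio(times, width=0.5):
    """Ratio of reports just below vs. just above the threshold.
    With a smooth distribution this should be close to 1."""
    below = sum(THRESHOLD - width <= t < THRESHOLD for t in times)
    above = sum(THRESHOLD <= t < THRESHOLD + width for t in times)
    return below / above

print(f"honest ratio: {bunching_ratio(simulate(gaming_rate=0.0)):.2f}")  # near 1
print(f"gamed ratio:  {bunching_ratio(simulate(gaming_rate=0.5)):.2f}")  # well above 1
```

A formal analysis would use a density-discontinuity test rather than this crude two-bin comparison, but the intuition is the same: a ratio far above 1 flags mass shifted from just over the threshold to just under it.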

How can the rating system be made more “game-proof”? Well, a highly entertaining paper by Bevan and Hood mentions one approach used widely in a system that overwhelmingly relied on targets to galvanize behavior – the Soviet example of “hanging the admirals,” i.e. liquidating the managers who were caught gaming their targets. So certainly an excessive or extreme sanction may give pause to potential gamers.

But, more realistically for the settings we work in today, a robust auditing system that introduces unannounced, and hence uncertain, audit activities may reduce gaming behavior. Bevan and Hood give the example of traffic cameras that record speeding cars. Drivers may know the location of the cameras but not whether any particular camera is operating or the precise speed that trips the camera into action. If these parameters were known, then drivers would be able to “game” the system and drive right up to the trip speed in the presence of the camera (and speed elsewhere). Introducing uncertainty around features such as the timing of audits or the targets and performances assessed is likely to be key in efforts to reduce gaming.
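To see why uncertainty matters, the deterrence logic can be reduced to a one-line expected-value calculation. The numbers below are purely illustrative (not drawn from any of the cases above):

```python
# Hypothetical illustration: with unannounced audits, an agent deciding
# whether to misreport faces an *expected* penalty even though any given
# record is audited with only small probability. All numbers are invented.

def expected_gain_from_gaming(benefit, penalty, audit_prob):
    """Expected net gain from misreporting one record:
    the benefit is kept for sure; the penalty hits only if audited."""
    return benefit - audit_prob * penalty

# Say reclassifying one late call is worth 1 unit of reputational benefit.
benefit = 1.0

# Announced audits: agents game only records they know won't be checked,
# so the effective audit probability on gamed records is zero.
print(expected_gain_from_gaming(benefit, penalty=30.0, audit_prob=0.0))   # 1.0

# Unannounced audits: even a 5% audit probability, paired with a stiff
# penalty, makes gaming a losing proposition in expectation.
print(expected_gain_from_gaming(benefit, penalty=30.0, audit_prob=0.05))  # -0.5
```

The point of randomization is that it stretches scarce audit capacity: a small audit probability applied unpredictably across all records can deter gaming that a fully announced (and hence avoidable) audit cannot.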

Comments

This is an excellent analysis of measurement traps and pitfalls that we don't routinely figure into measuring outcomes. It's fairly intuitive to expect people to be super-prepped for official reviews. For example, in American hospitals, we always cleaned up our act when we knew that the Joint Commission for Accreditation of Hospitals was due for a visit. There are other, not-so-sinister risks of relying on recorded data with no secondary oversight or robust auditing (beyond gaming the system the way you describe above, which can result in huge inequities), such as just plain laziness in recording patient vital signs and monitoring instrumentation settings. The temptation to just write what the person before you wrote is strong - especially if the patient is in no distress and appears the same as s/he did in your last visit. That's not to say that I did that, but it was a standing joke in the field. Here is another good example of secondary monitoring influencing behavior change - this time in handwashing: http://www.nytimes.com/2012/02/07/health/research/when-watched-and-cheered-on-icu-workers-wash-hands-more.html
Thank you for writing so clearly and succinctly, thus maximizing the impact and penetration of the knowledge you share. Will definitely post this blog in this month's HNPFLASH: http://newsletters.worldbank.org/newsletters/listnl.htm?nl=hnpflash.

Homira, thanks very much for the stories (and links). I think this general (optimal monitoring) work can benefit from more formal modeling, so I will be looking for more of this in the future... Best, Jed

This is a great blog. However, it rests on a double fallacy: first, it assumes that Performance-Based Financing for Health is only about mimicking the discipline of the market to improve health provider performance, and second, it implicitly equates rich-country Results-Based Financing schemes with those in low-income countries. Performance-Based Financing in low-income countries is a health system reform with many different aspects: it introduces performance frameworks at all levels of the health system, not only at the health facility level; PBF systems introduce new forms of public oversight at the health facility, district, and national levels; and PBF systems use new and intense ways of looking at performance data, including benchmarking and yardstick competition. In summary, although well-designed PBF systems in low-income countries do attach a significant financial signal to quantity and quality performance, they also ensure coaching of individual health facilities in improving their performance, they strengthen district and national levels to carry out their functions better, and they systematically analyze health facility and health administration performance, including benchmarking and yardstick competition between health facilities and health administrations. Whilst nodding at George Bernard Shaw's words that ‘lack of money is the source of all evil’, such PBF systems also aim at getting more value for money using all sorts of instruments. Benchmarking and yardstick competition are part and parcel of PBF approaches in low-income countries, but certainly (and luckily) not the only instruments.

Hey Gyuri, thanks very much for your comments/thoughts, and I certainly agree with your characterization of PBF - it is often a whole series of bundled interventions including training, a refocus on reporting mechanisms, devolved autonomy, etc. (i.e., it is a program) - and I certainly wouldn't want to mischaracterize it - but I'm not sure I do. A full description of PBF is not the focus of the post, hence its title, which instead references the broader issue of public sector reform and one particular challenge of effective monitoring.
That said, PBF is not called "Performance-Based Financing" simply because it's an elegant title... tying some portion of system financing to agent reporting (and very often self-reporting) is a central motivating concept. Hence the issue discussed here is germane to PBF as well... please comment more often!