Pay-for-Performance: An Overview

Written By: Jason Shafrin - Jan 20, 2011

Pay-for-performance schemes have been attracting much attention of late. A RWJF policy paper from 2007 provides a nice overview of various P4P design considerations. Below, I summarize some of the key findings of this article.

Timeline

Early 1990s. Cost control measures in the early 1990s began a shift toward rewarding physicians for specific actions. In a survey of HMOs conducted in 1992, about 20 percent of responding organizations said their payments to physicians incorporated some reimbursement for performance on quality of care, with 20 percent also reporting physician payments tied to consumer satisfaction measures.

1991. The first version of the Healthcare Effectiveness Data and Information Set (HEDIS) was produced. Version 2.0 was developed two years later.

2001. The Institute of Medicine (IOM) report on “Crossing the Quality Chasm” galvanized purchasers and physician organizations around the challenge of improving quality.

2004. A study by Reschovsky and Hadley found that just 28% of primary care physicians in group practices reported quality-based incentives in their compensation arrangements, modestly higher than the share reporting such incentives in 1996/1997 (26%).

2010. Health reform (ACA) provisions call for a number of demonstrations to incorporate value-based purchasing into the reimbursement system.

Key P4P Design Issues

Improvement vs. Attainment. Rewarding improvement maximizes the potential quality of care gains, since low-performing physicians are the ones who can improve the most. On the other hand, rewarding attainment seems fairer, as physicians who deliver superior care deserve to be rewarded for their efforts.

Absolute vs. Relative Ratings. Absolute ratings reward all physicians who meet a certain threshold. Although absolute thresholds are conceptually appealing, they require significant expenditures to research the correct benchmark to set. Further, absolute rating scales can get expensive if the majority of providers pass the threshold. Relative ratings, by contrast, are easier to administer and can be budget neutral. However, if a measure is “topped out” and most physicians have near 100% ratings, relative ratings may be uninformative.
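
To make the budget point concrete, here is a minimal Python sketch of the two payout rules. The function names, the 90% threshold, the $1,000 amounts, and the four physician scores are all hypothetical values chosen for illustration; they do not come from the RWJF paper.

```python
def absolute_payout(scores, threshold, bonus):
    """Pay a fixed bonus to every physician whose score meets the threshold."""
    return {name: (bonus if s >= threshold else 0.0) for name, s in scores.items()}


def relative_payout(scores, top_fraction, pool):
    """Split a fixed pool equally among the top-ranked fraction of physicians."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_winners = max(1, int(len(ranked) * top_fraction))
    winners = set(ranked[:n_winners])
    share = pool / n_winners
    return {name: (share if name in winners else 0.0) for name in scores}


# Hypothetical scores on a nearly "topped out" measure.
scores = {"A": 0.99, "B": 0.98, "C": 0.97, "D": 0.60}

abs_pay = absolute_payout(scores, threshold=0.90, bonus=1000.0)
rel_pay = relative_payout(scores, top_fraction=0.25, pool=1000.0)

print(sum(abs_pay.values()))  # total cost grows with the number of passers
print(sum(rel_pay.values()))  # total cost is capped at the pool size
print(rel_pay)                # only A is paid, despite B and C being close
```

Under the absolute rule, three of the four physicians pass, so the payer's cost is three bonuses. Under the relative rule the cost is fixed, but the one-point gap between A and B decides who gets paid, which illustrates why relative rankings say little once a measure is topped out.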

Risk Adjustment vs. Exemptions. Typically, physician scores are risk adjusted to reflect differences in case mix. Risk adjustment, however, may be imprecise or even inapplicable to certain pay-for-performance metrics (e.g., influenza vaccinations). As an alternative, some P4P programs allow physicians to exclude patients who have certain pre-specified characteristics from performance measurement. Allowing for these exemptions, however, has led to gaming in such P4P schemes as the U.K.’s Quality and Outcomes Framework (QOF) program.

Small Sample Size. Often, physicians cannot be scored because they have too few patients who are eligible to be scored for a given metric. If a physician has only 2-3 eligible patients, the physician's score is not very informative and can be heavily affected by a single outlier. Methods of dealing with this issue include not scoring physicians on metrics where they have a small sample size, scoring physicians on a moving average across 2 or 3 years of data to increase the sample size, or rating by practice rather than by individual physician. Only 13 percent of responding HMOs in Rosenthal et al. (2006) focused incentives on individual physicians.
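
The first two remedies can be sketched together in a few lines of Python. This is only an illustration. The function name and the 30-patient minimum are hypothetical, and pooling counts across years is a patient-weighted variant of the moving average described above, not a method taken from the source.

```python
def pooled_score(yearly_counts, min_eligible=30):
    """Pool several years of data for one physician on one metric.

    yearly_counts: list of (patients_meeting_metric, eligible_patients),
                   one tuple per year in the window.
    min_eligible:  hypothetical minimum sample size; below it no score is issued.

    Returns the pooled rate, or None when the sample is too small to score.
    """
    met = sum(m for m, _ in yearly_counts)
    eligible = sum(e for _, e in yearly_counts)
    if eligible < min_eligible:
        return None  # do not score: too few eligible patients
    return met / eligible


# One year with 3 eligible patients: a single patient swings the rate by 33 points.
print(pooled_score([(2, 3)]))

# Three years pooled: 33 eligible patients, enough to clear the hypothetical minimum.
print(pooled_score([(8, 12), (9, 11), (7, 10)]))
```

Pooling trades timeliness for stability: the score reacts more slowly to real changes in a physician's performance, but a single outlier patient no longer dominates it.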

Multiple Providers. Oftentimes, patients are treated by multiple physicians. In these cases, especially for outcome measures, it is often difficult to attribute the metric to the physician responsible. Episode groupers have been proposed as a solution to this problem, but the current grouping software is far from perfect, especially for patients with multiple chronic conditions.

Unintended Consequences. P4P may lead to a number of unintended consequences. For instance, P4P may result in better documentation of care, without a concurrent improvement in actual care. In addition, physicians may move their practices to areas where they believe patients can more effectively manage their own care; coordination of care could decline, especially for patients with multiple illnesses; physicians might focus on improving care only in areas addressed by financial rewards; and practice administrative costs could increase.

Cost. There are two key cost components on the payer's side: the value of payouts and the costs of program administration. Providers also incur costs from the additional labor hours needed to achieve the quality metrics, additional costs to document the care provided, and extra administrative costs to distribute the P4P bonuses from the group practice to individual physicians. Sometimes P4P is cost effective, as was the case for diabetes care (see Curtin et al. 2006), but not all P4P has been shown to be cost effective, especially in cases where there is little behavioral response.

Premier HQID

One example of P4P in practice is the Premier Hospital Quality Incentive Demonstration (HQID). A paper by Lindenauer et al. (2007) reviews the demonstration. A total of 613 hospitals participated in the demonstration over a two-year period. These hospitals were judged on 33 quality measures for five clinical conditions: heart failure, acute myocardial infarction, community-acquired pneumonia, coronary-artery bypass grafting, and hip and knee replacement. Ten of these measures were already being used for public reporting on the Hospital Compare website.

To be eligible for payment, hospitals needed to have a minimum of 30 cases per condition annually. “For each of the clinical conditions, hospitals performing in the top decile on a composite measure of quality for a given year received a 2% bonus payment in addition to the usual Medicare reimbursement rate. Hospitals in the second decile received a 1% bonus. Bonuses averaged $71,960 per year and ranged from $914 to $847,227. These additional payments are anticipated to be partially offset by financial penalties ranging from 1 to 2% of Medicare payments for hospitals that by the end of the third year of the program had failed to exceed the performance of hospitals in the lowest two deciles, as established during the program’s first year.”
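
The bonus rule in the quoted passage can be sketched in Python as follows. The 2% and 1% rates for the top two deciles come from the quote. Everything else is an assumption made for illustration: the function name, assigning deciles by simple rank with arbitrary tie-breaking, and the hospital names, scores, and payment figure. The year-three penalties are not modeled.

```python
def hqid_bonus_rates(composite_scores):
    """Map each hospital to its bonus rate for one condition in one year.

    composite_scores: dict of hospital -> composite quality score
                      (higher is better). Deciles are assigned by rank,
                      which is an assumption of this sketch.
    """
    ranked = sorted(composite_scores, key=composite_scores.get, reverse=True)
    n = len(ranked)
    rate_by_decile = {0: 0.02, 1: 0.01}  # top decile 2%, second decile 1%
    rates = {}
    for rank, hospital in enumerate(ranked):
        decile = rank * 10 // n  # 0 = top decile, 9 = bottom decile
        rates[hospital] = rate_by_decile.get(decile, 0.0)
    return rates


# Twenty hypothetical hospitals, H00 scoring highest and H19 lowest.
scores = {f"H{i:02d}": 100 - i for i in range(20)}
rates = hqid_bonus_rates(scores)

# Hypothetical Medicare reimbursement for the condition at hospital H00.
medicare_payment = 1_000_000
bonus_h00 = rates["H00"] * medicare_payment

print(rates["H00"], rates["H02"], rates["H04"])
print(bonus_h00)
```

With 20 hospitals, each decile holds two of them, so only four hospitals receive any bonus. This is a relative rating scheme: a hospital's payout depends on where it ranks among its peers rather than on clearing a fixed threshold.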

The paper by Lindenauer and co-authors examines whether public reporting or public reporting plus payment really makes the difference. As any economist would suspect, payment matters. The authors found the following results:

“Baseline performance was inversely associated with improvement; in pay-for-performance hospitals, the improvement in the composite of all 10 measures was 16.1% for hospitals in the lowest quintile of baseline performance and 1.9% for those in the highest quintile (P<0.001). After adjustments were made for differences in baseline performance and other hospital characteristics, pay for performance was associated with improvements ranging from 2.6 to 4.1% over the 2-year period.”