P4P: Do we really want to go there?

Peter Elias

28 Jun 2013

In health care, replacing volume-based economics with value-based economics is essential. The 'do-more-bill-more-earn-more' approach is unsustainable. However, pay-for-performance (P4P), a tool being proposed to control costs and improve value, has its own hazards. Let me give you an example.

Imagine an institution is about to roll out a new compensation package in which clinician pay will be cut if the clinician fails to reach defined quality targets. Salary drops 2% unless certain preventive targets are met and an additional 3% if certain diabetes targets are not met. (And 1% of clinician salary depends on the clinician being more popular with his or her patients than 75% of other clinicians, a true example of the Lake Wobegon paradox. I won't waste time here discussing the absurdity of this.) Here's how the targets might look:

The preventive targets:

76% of eligible women between the ages of 42 and 69 must undergo mammography every 24 months
87% of eligible women between the ages of 21 and 64 must have a Pap smear every 3 years.
49% of eligible women between the ages of 16 and 24 must undergo an annual Chlamydia screen.
70% of eligible patients over 50 must have an annual influenza vaccine.
80% of eligible patients over the age of 65 must receive a pneumococcal vaccine.

The diabetes targets:

85% of diabetics must have an A1c (http://en.wikipedia.org/wiki/Glycated_hemoglobin) of < 9
40% of diabetics must have an A1c of < 7
80% of diabetics must have a documented foot exam within the past 12 months
100% of diabetics must have a PCP visit within the last 12 months
60% of diabetics must have a dilated retinal exam in the last 12 months
85% of diabetics (without known renal disease) must have microalbumin (https://en.wikipedia.org/wiki/Microalbuminuria) screening for early diabetic renal disease
85% of diabetics must have their smoking status documented or reassessed in the last 12 months
65% of diabetic smokers must have been given smoking cessation advice.
65% of diabetics must have a blood pressure of < 140/90

Don't get me wrong. These are all excellent and unassailable goals and therefore deserve to be selected for financial consequences. Maybe. Or maybe not...

These metrics attempt the impossible: turning complex issues into simple binary questions, with standardized solutions for individualized patients. There isn't much clinical difference between an A1c of 9.1 and 8.9, and smoking cessation and blood pressure control probably have more impact on diabetic complications than lowering the A1c below 9. Tight control of diabetes (e.g. below an A1c of 6.5) can be harmful. Even control to an A1c of 7.0 is problematic in some settings, especially in the elderly, where low sugars are associated with increased risk of falls, fractures, heart attacks and strokes. One must have an expected life span of 10 years to benefit (statistically) from tight control, and in the setting of end stage kidney disease or severe vision impairment, there is no evidence of benefit to lowering the A1c. The ADA explicitly states that standardized A1c targets should not be applied to all diabetics (or used to grade clinicians), and that they are especially inappropriate in the elderly.

Screening for microalbumin is pointless unless a positive result leads to changes in treatment, but the P4P system rewards only screening; one gets full credit by screening but ignoring abnormal results. (A comprehensive discussion of diabetes management is available. It shows how nuanced quality care of diabetes is - and how inappropriate binary incentives are.) The smoking cessation advice target can be satisfied by clicking a box and saying: "You know that smoking isn't good for you, right?"

If the patient has a recent diagnosis of cancer (or a spouse with that diagnosis), is depressed, has had a back injury and is out of work, has a child with special needs, has a high deductible, or has a work setting that puts him at risk of unemployment if he misses work - then a focus on these institution-centric goals may not be appropriate, and time spent on them may actually worsen outcomes.
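To make the problem with the A1c cutoffs concrete, here is a minimal Python sketch. The two cutoffs and required percentages come from the diabetes targets listed earlier; the two patient panels, the function names, and the individual A1c values are made up purely for illustration.

```python
from statistics import mean

# Targets taken from the diabetes list: 85% of diabetics below 9, 40% below 7.
TARGETS = {9.0: 0.85, 7.0: 0.40}

def share_below(a1c_values, cutoff):
    """Fraction of patients whose A1c is strictly below the cutoff."""
    return sum(v < cutoff for v in a1c_values) / len(a1c_values)

def meets_targets(a1c_values):
    """Map each cutoff to True/False: did the panel reach the required share?"""
    return {cutoff: share_below(a1c_values, cutoff) >= required
            for cutoff, required in TARGETS.items()}

# Hypothetical panels of 10 patients each (invented numbers, not real data).
panel_a = [8.9] * 5 + [6.9] * 4 + [9.6]
panel_b = [7.1] * 7 + [6.9] + [9.1] * 2

for name, panel in (("A", panel_a), ("B", panel_b)):
    print(name, round(mean(panel), 2), meets_targets(panel))
```

In this invented example, panel B has the lower average A1c yet misses both targets, while panel A meets both because its patients happen to sit a tenth of a point under each cutoff. The binary metric cannot see that difference.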

On top of that, tying pay to these targets can contribute to a delusional pursuit of metrics (mistaking the map for the territory).

Why do I see this as a problem? Four reasons.

First, and most importantly, it is unethical because it undermines patient autonomy.

Patient autonomy is one of the four core principles of medical ethics. Some argue that it is THE core principle, and that the other three - beneficence, non-maleficence, and justice - are derivative principles. Patient autonomy is the right or ability of a competent patient to make independent decisions based on accurate information and using their own preferences and values. It assumes honesty and the absence of coercion or manipulation, which are often listed as secondary principles, necessary for patient autonomy.

P4P is an attempt to align the incentives of the clinician with the incentives of their employer, most often to reflect some form of revenue need or public health policy. Note the absence of any patient input!

Tying clinician pay to patient decisions changes the role of the clinician in a devastating way: it pressures the clinician to elevate personal or institutional financial goals, or public policy goals, ahead of the goals of the patient. Instead of providing accurate information and then helping the patient make an individual decision based on the patient's values, preferences and current context, the provider is expected to sell a particular decision. Instead of an unbiased advocate working for the patient, the clinician becomes a biased advocate working for the institution and himself. Replacing patient-centric goals with institution-centric (or public policy) goals will increase unwarranted care and widen the gulf between patients and their clinicians.

This fosters a reversion to paternalism. Two simple but illustrative examples:

Instead of asking what the patient knows about pneumococcal vaccine, explaining the potential benefits and harms, and then asking what the patient would like to do, the clinician will say: "You're 65 now, so you need the pneumococcal vaccine."
Instead of discussing the uncertain ratio of benefits and risks of mammography in the 44-year-old woman, the clinician will say: "You are due for a screening mammogram."

Second, it is a barrier to patient engagement.

The time available during an office visit is already severely limited and rarely adequate to cover all the issues. It is routinely necessary to prioritize and defer, and patient issues tend to be ignored; it is hard for patients to bring their issues to the fore while the clinician is focused on the blood pressure, diabetes or medication reconciliation. When 5% of a clinician's pay requires accomplishing a defined list of tasks (regardless of the patient's preferences), and with no additional time or staff set aside for these tasks, it will be harder for the patient to contribute to (let alone set) the agenda at a visit. ("I know you're having trouble with your abdominal pain, Mrs. Smith, and you don't like the side effects from your BP medicine, but first we need to talk about flu and pneumonia shots and get your systolic under 140 so they don't cut my pay.")

Third, it disrupts other care processes.

A serious but insidious and often overlooked impact is the diversion of resources. Medical offices are the poster child for infinite demands on a system with finite resources. Every time a new task is mandated, an old task may be dropped. This is rarely a conscious decision based on needs or priorities, but usually happens without being noticed…until a problem occurs. ("When did we stop doing lead screenings at ages 12 and 24 months?") The need to achieve a defined and limited set of metrics to maximize reimbursement (or avoid penalties) generates an entire new collection of time- and energy-consuming systems. Staff must mine the database and print lists of 'eligible patients' and then work on reaching the threshold for quality. (Multiple people are assigned pieces of this and work independently of each other, resulting in considerable duplication of effort.) Staff must go through charts looking for exclusions to lower the denominator. Staff must go through charts to improve the numerator: searching alternative databases, calling patients, and contacting other offices in order to find patients who have had the interventions but where our non-interoperable electronic system has failed to capture them. All this time and energy must come from somewhere, and the result is that there is less time and energy spent caring for patients and managing the practice. No staff meetings to discuss and improve processes, no system to track and recall patients who cancel appointments, no cancellation list, no system for ongoing staff training or education about appointment scheduling or nursing protocols, no system for secure electronic communication with patients, no quality review or updating process for triage or other high-risk office work flows. If leadership is asked about this, the answers may boil down to a lack of time/staff and the potential impact on productivity.
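The numerator and denominator work described above can be sketched in a few lines of Python. The 76% mammography target comes from the preventive list earlier; the panel size, the number of exclusions found, and the number of outside records found are hypothetical figures chosen only to show the arithmetic.

```python
def meets_target(numerator, denominator, target_pct):
    """True if numerator/denominator reaches target_pct (integer math, no rounding)."""
    return numerator * 100 >= target_pct * denominator

TARGET_PCT = 76  # mammography target from the preventive list

# Hypothetical starting point: 200 'eligible' women, 140 with a mammogram on file.
eligible, screened = 200, 140
print(round(100 * screened / eligible, 1), meets_target(screened, eligible, TARGET_PCT))

# Chart review finds 10 exclusions, which shrinks the denominator.
eligible -= 10
print(round(100 * screened / eligible, 1), meets_target(screened, eligible, TARGET_PCT))

# Calls to other offices find 6 mammograms our system never captured.
screened += 6
print(round(100 * screened / eligible, 1), meets_target(screened, eligible, TARGET_PCT))
```

In this made-up example the panel moves from missing the target to meeting it without a single additional patient being screened. Only the documentation changed.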

And there is no question that this disruption can cause serious harm.

(You would think these three would be enough to make institutions think twice about P4P. But wait. There's more.)

Fourth, it doesn't work. (At least, that’s what the evidence shows.)

There is little doubt that P4P changes behavior. In fact, one perspective is that a large part of our current problem is that we are currently using P4P, but using it to reward clinicians for volume rather than value.

There is weak evidence that P4P can change metrics when applied at the hospital level. There is no evidence that P4P improves quality when applied at the clinician level. We should understand the behavioral science and published evidence and practice EBM (evidence-based management).

A 2011 Cochrane Review found "insufficient evidence to support or not support the use of financial incentives to improve the quality of primary health care." A 2012 review in the Annals of Family Medicine found the cost of P4P high, the benefits unclear and modest at best. The authors noted considerable uncertainty about adverse effects, and suggested limited use, preferably only as part of research into the efficacy and safety of P4P. A more recent 2013 Cochrane review found the studies too poor and the evidence too weak to support P4P as a quality activity.

An excellent review and commentary by Ariely and Woolhandler discusses in some detail the mounting number of studies that fail to show that P4P works in medicine (or other fields) and reviews it in the context of behavioral economics and the science of performance and rewards. Their review concludes that clinician level P4P does not improve quality.

The single major US trial (Medicare’s Premier Hospital Quality Incentives Demonstration involving 200 hospitals) failed to show significant improvement in metrics after the first year and showed no improvement in quality. A major study of P4P and hypertension in primary care also failed to find an impact on quality.

An excellent paper by Berenson at the Robert Wood Johnson Foundation reviews the experience to date and finds that P4P is only reasonable when applied at the institutional level (not for individual clinicians). It suggests that quality metrics should focus on outcomes rather than processes and should be used as part of the rapid learning cycle to evaluate the efficacy of interventions to improve quality, not to evaluate individual clinician behavior.

The failure of studies of P4P to show improvement in outcomes is discussed frequently by Frakt and Carroll, two economists who specialize in health care systems.

To be fair, there is one study that clearly shows P4P can be effective at improving quality. The differences between the approach described in this paper and the approach taken by most institutions (including mine) are important. The UK study involved much larger bonuses which were not paid to clinicians but were re-invested in quality processes within the system. It also involved regular face-to-face meetings of front line staff to discuss and modify clinical processes.

Like any clinical intervention, P4P can have negative consequences. Replacing intrinsic motivation (based on moral and social contracts inherent to patient care) with a market economy (based on metrics tied to penalties/bonuses) rarely improves quality and usually has unpredictable negative consequences. A recent (2012) analysis in the BMJ suggested that P4P financial incentives are likely to do more harm than good and provides a checklist for doing P4P correctly to minimize this risk. The authors' first recommendation is that P4P be limited to circumstances where good quality randomized controlled trials show that changed behavior will improve outcomes, and they single out A1c as NOT meeting this criterion. They also discuss the risk of common unintended consequences, including attention shift, gaming, and harm to the patient-clinician relationship. An accompanying editorial argues that P4P is incompatible with quality improvement.

Deming, the Founding Father of quality and process improvement science, argued strongly against P4P:

“[It] nourishes short-term performance, annihilates long-term planning, builds fear, demolishes teamwork, [and] nourishes rivalry and politics. It leaves people bitter, crushed, bruised, battered, desolate, despondent, dejected, feeling inferior, some even depressed, unfit for work for weeks after receipt of rating, unable to comprehend why they are inferior. It is unfair, as it ascribes to the people in a group differences that may be caused totally by the system that they work in.”

The predictable failure of P4P is also discussed frequently in the non-medical management literature, such as this article about how incentives are irresistible but likely to backfire.

In summary, then, P4P is unethical because it undermines patient autonomy, inappropriate because it interferes with patient centeredness, destructive because it impairs the health care process, and foolish because the evidence is clear that it doesn't work and has harmful side effects. I am inclined to agree with Don Berwick, who argued (1995: The Toxicity of Pay for Performance):

“Stop it. [Such pay] is destructive of what we need most in our healthcare industry – teamwork, continuous improvement, innovation, learning, pride, joy, mutual respect, and a focus of all of our energies on meeting the needs of those who come to us for help. We can find better ways to decide on how we pay each other and better uses for our energies than in the study and management of carrots and sticks.”

Where does that leave us?

Should we abandon metrics and the quest for quality? Absolutely not. But we should do QI right.

Various more sensible uses of P4P have been suggested: put real money at risk, offer the incentives to institutions rather than individuals, change the metrics to reflect outcomes rather than process, align the outcomes with patient goals, make the process open and transparent, invest in fixing barriers to quality, and stop being prescriptive in order to allow/encourage innovation.

Instead of using quality metrics as a determinant of pay (under the assumptions that the barrier to quality is that clinicians are inadequately motivated and that quality is a single variable phenomenon) we should be applying standard and well established improvement strategies:

Identify what outcome we think needs improvement. Let's use screening for colorectal cancer as an example, but the process is broadly applicable.
Mine the database for eligible patients and notify them, offering screening using current approaches.
Identify the patients who were NOT screened and investigate why. The possibilities based on my conversations with patients might include: knowledge deficit, logistics, barriers to scheduling, transportation, fear, lost time from work, poor insurance coverage, other issues in the patient's life that are more immediately important.
Stratify the list of identified barriers according to frequency and remediability.
Create test interventions to address the most important barriers: phone call with education, offer transportation, find funding sources, expand endoscopy hours, discuss non-colonoscopy alternatives.
Use the metrics to determine which of these work and which do not.
Review and modify those that do not work, and retest.
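The stratification step in this list can be sketched in Python. Everything in the example is hypothetical: the counts of reasons, the 1-to-3 remediability scores, and the choice to rank by frequency multiplied by remediability are assumptions made for illustration, not a validated method.

```python
from collections import Counter

# Hypothetical reasons gathered from unscreened patients (invented counts).
reasons = (["knowledge deficit"] * 8 + ["transportation"] * 6 +
           ["lost time from work"] * 5 + ["fear"] * 4 +
           ["poor insurance coverage"] * 3)

# Assumed remediability scores: 3 = easy for the practice to fix, 1 = hard.
REMEDIABILITY = {
    "knowledge deficit": 3,
    "transportation": 2,
    "lost time from work": 2,
    "fear": 2,
    "poor insurance coverage": 1,
}

def rank_barriers(reasons, remediability):
    """Rank barriers by frequency x remediability, highest priority first."""
    counts = Counter(reasons)
    scored = [(barrier, n, n * remediability[barrier])
              for barrier, n in counts.items()]
    return sorted(scored, key=lambda row: row[2], reverse=True)

ranked = rank_barriers(reasons, REMEDIABILITY)
for barrier, n, priority in ranked:
    print(f"{barrier:25} n={n:2} priority={priority}")
```

The top of the ranked list is where the first test interventions would go. The same screening metric, measured before and after each intervention, then shows which of them work.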

This kind of self-conscious quality program would improve colorectal screening rates by finding and addressing the barriers. This is basic improvement science. It is also the standard in teaching and coaching.

What many institutions are doing with P4P is not quality improvement work. It is revenue enhancement and forensic documentation work. It may enhance revenue and make documentation conform to the demands of auditors and payors, but it will most emphatically NOT benefit our patients. If medicine must do P4P, so be it. (I personally believe that if we do true quality improvement work we will meet and exceed the quality metrics as a side benefit of improving quality.) But we should not lie to ourselves or others by calling it a quality improvement strategy. It is not.
