STATS ARTICLES 2008
How bad is American health care?
Nirit Weiss MD, MBA, December 2, 2008
The Commonwealth Fund’s National Scorecard has been hailed in the media for diagnosing America’s health care woes and for offering a roadmap to recovery. But now that reform is a political priority, it’s time to ask, does the Scorecard do what it claims it does, or is it in need of urgent reform itself?
In 2005, the Commonwealth Fund established a commission to help, as it put it, “move the U.S. toward a higher performing health care system that achieves better access, improved quality, and greater efficiency.” The commission’s first task was to develop a way of measuring health care performance – to set benchmarks that could be evaluated from year-to-year and thus provide a way of tracking improvement or decline. The commission said its scorecard was the most comprehensive attempt to rate health care across so many areas; and so, when the first results were published in 2006, the conclusions – that American health care was falling far short of what might be achieved, and what other countries were achieving – received widespread coverage in the media.
On July 17, 2008, the Commonwealth Fund commission published its second Scorecard report, Why Not the Best? Results from the National Scorecard on U.S. Health System Performance, 2008. Its findings, that U.S. health care performance had not improved since 2006 and that access to health care had significantly declined, were again reported widely in the media and in a way that lent credence to the Scorecard’s conclusions (see sidebar). As the New York Times noted, “The findings are likely to provide supporting evidence for the political notion that the nation’s health care system needs to be fixed.”
"This is a real wake-up call," Paul Ginsburg, president of the Center for Studying Health System Change, a nonpartisan research group in Washington, told Dow Jones MarketWatch. "It's really telling us that because our delivery system is so fragmented [and] disorganized with the wrong payment incentives that our country is really suffering from that."
Summary of 2008 Scorecard findings
The 2008 Scorecard’s conclusions can be summarized as follows:
- The U.S. falls short in performance in 37 indicators across five dimensions of health system performance.
- National and collaborative efforts to measure and report performance have led to improved health care delivery.
- The U.S. health-care system provides a poor return-on-investment given its high costs.
- Eliminating inefficiencies would drastically improve quality of health care, at a small fraction of the current cost.
- Improved primary care delivery would lead to better outcomes, and lower costs.
- There are already many countries currently providing significantly higher quality of care, with significantly lower costs.
- A universal, single-payer insurance system would provide higher quality care, at reduced costs.
However, the 2008 Scorecard must be interpreted with caution. In attempting to diagnose the ills of America’s healthcare system, the Scorecard suffers from serious flaws that challenge the validity of its conclusions – flaws that were, essentially, ignored by the authors of the study and completely missed by the media coverage. These flaws fall into three categories:
- The methodology by which the data were collected and the studies were designed to address specific questions.
- Arbitrary definitions and metrics used to define the concept of “quality” in health care.
- Sweeping, broad conclusions that are unsubstantiated by the findings of the study.
1. Flawed methodology
The 2008 Scorecard reviewed and synthesized numerous individual studies in order to compare U.S. and international performance on 37 “indicators” across five “dimensions” of health system performance. U.S. performance was compared with a “benchmark” performance for each indicator, which was based upon scores achieved in any of the following: the top 10 percent of U.S. states, or regions, or hospitals, or health plans, or other providers, or the international community. In a number of calculations, “benchmark” performance was simply based upon “logical policy goals, such as 100 percent of the population to be adequately insured.”
Unfortunately, the Scorecard is based upon multiple disparate studies, using various methodologies, non-uniform definitions of “benchmark,” and arbitrary assumptions as to what “logical policy goals” are, and what “adequately insured” actually means. The Scorecard attempts to draw meaningful conclusions based on a summation of individual studies with varying sample sizes, varying performance comparisons, and varying data collection techniques.
In peer-reviewed, scientific literature, it is invalid to lump together the results of multiple studies, using multiple methodologies, in the same charts, graphs, and conclusions, without assigning relative weight to the results of the studies. Adding even more to the confusion, many of the reported data are not directly referenced to published studies, so it is impossible to trace and evaluate the sources of the information. A substantial number of the individual analyses were merely described as “conducted by the authors,” limiting the reader’s ability to evaluate the quality and validity of the studies.
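To see why relative weighting matters, consider a minimal sketch in Python. The three "studies" below, with their scores and sample sizes, are invented purely for illustration and are not taken from the Scorecard or its sources. Weighting by sample size is only one simple scheme; formal meta-analyses often weight by the inverse of each study's variance instead.

```python
# Hypothetical illustration only: these scores and sample sizes are invented
# and do not come from the Scorecard or any study it cites.
studies = [
    # (reported score as a percentage, sample size)
    (90.0, 50),
    (60.0, 5000),
    (70.0, 1000),
]


def unweighted_mean(results):
    """Average the study scores as if every study carried equal weight."""
    return sum(score for score, _ in results) / len(results)


def sample_size_weighted_mean(results):
    """Average the study scores, weighting each by its sample size."""
    total_n = sum(n for _, n in results)
    return sum(score * n for score, n in results) / total_n


if __name__ == "__main__":
    print(f"Unweighted mean:           {unweighted_mean(studies):.1f}")
    print(f"Sample-size-weighted mean: {sample_size_weighted_mean(studies):.1f}")
```

In this made-up case the smallest study pulls the unweighted average well above the weighted one, which is the kind of distortion that can arise when disparate studies are combined without regard to their relative size or reliability.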
Perhaps more troubling are those data which can be traced back to their source studies, and turn out to be based on patient self-reporting. For example, in its section on “quality,” the Scorecard quantified mistakes made in health care delivery according to patient interpretation and self-reporting! Patients were asked how often they felt mistakes were made in their medical care, such as errors in laboratory testing, or medication errors.
The problem with this approach is clear: Identifying mistakes in health care requires a great deal of medical training, insight, and experience. Even among highly trained specialists, what constitutes a medical error is often hotly debated, and this has been made clear in multiple studies, including the Institute of Medicine’s 1999 report To Err is Human. Patient perception of medical error may or may not prove to be a valid metric for quantifying patient satisfaction as a consumer of healthcare; however, it simply is not a measure of true medical error, and would never be accepted as such for publication in a peer-reviewed scientific or medical journal. Surely, any nationally-distributed report striving for policy-changing influence should be held to the same standards?
In addition to medical error, the Scorecard findings on activity limitations due to health problems, “dissatisfaction” with the health care system, ease of access to after-hours care, and unnecessary repetition of medical tests, were also quantified based on patient self-reporting. Again, this is a highly problematic metric, because of the questionable ability of a non-medically-trained patient to determine what is appropriate medical testing, and because the scoring of these self-reported measures depends on the relationship between the reality of health care delivery and the expectations of the American healthcare consumer.
Americans’ expectations of their healthcare system, and of the state of medicine as it exists in 2008, differ from the expectations of those in other nations, and are often not achievable even with the highest quality of delivery at the present time. These discrepancies bias any data drawn from studies based on patient self-reporting, with the potential for U.S. patients to report lower satisfaction rates despite receiving higher quality of care.
2. Arbitrary metrics used to define “quality”
The 37 “indicators” across five “dimensions” are modeled after those used in studies of industry, and focus on health care delivery systems performance, which is indeed one component of value and return-on-investment.
But it is not the largest determinant of what most Americans would define as “quality.” Assessing the quality of delivery of goods by studying uniformity, for example, is appropriate when evaluating the transformation of undifferentiated inputs into uniform outputs, each machined to be identical to the other.
In other words, the conclusions of the study are dependent on the authors’ assumption that all patients with a given diagnosis, say diabetes, are otherwise identical, and should have no difference in outcomes. This input/output calculation disregards the fact that all inputs, such as patients with diabetes, have other comorbidities, and cannot be expected to have the same outcome or outputs.
In the practice of medicine, one deals with people and disease processes which are unique on a case-by-case basis, and the outcomes are not expected to be identical between patients. A large contribution toward patient outcomes depends upon the population of individual patients being treated, and there is tremendous variability in the degree of responsibility patients are willing to assume or are capable of bearing.
Another problem with the metrics under evaluation is that they are taken to be uniformly applicable throughout the entire health care system. Clearly, primary care and the treatment of related chronic diseases affect the largest number of U.S. patients, consume most of our health care resources, and should be optimized; however, there are a large number of highly specialized, costly, life-altering treatments and other subspecialty healthcare interventions which should not simply be ignored. Americans expect access to the latest surgical techniques, the most effective medications, and cutting-edge technology, without government restriction or attempts at rationing. This represents a huge cultural difference between patients in the United States and other nations. While uniformity of health care delivery has some value in being studied, additional metrics need to be designed and validated for use in quantifying these aspects of the U.S. healthcare system.
3. Unsubstantiated conclusions
The authors of the 2008 Scorecard emphasize the point that what receives attention, and what gets measured and reported, gets improved. That is likely true, as hospitals and physicians scramble to meet compliance requirements with ever-changing guidelines. Numeric targets are set, and vast resources are dedicated to reaching “benchmark” goals. The result is that the follow-up measurements of these metrics likely do gradually improve. However, achieving the quantified, targeted scores does not necessarily mean that quality has improved.
This places the greatest burden upon those who study the U.S. healthcare system to define and select metrics which truly reflect the values shared by the American public, and which have the greatest impact on “quality” of care. Indeed, many physicians perceive that the quality of care they deliver is decreasing as a result of recently imposed regulations, some of which have been structured in response to reports of questionable validity. As a result, physicians are reluctantly leaving the practice of medicine in record numbers, leading to regional shortages, and decreased patient access to high-quality medical care. Given the potentially far-reaching implications of publications such as the 2008 Scorecard on policy-making and financing of healthcare, the authors and sponsors of these studies must be held to the same high standards as are physicians and scientists when reporting results of their investigations. This responsibility must be shared by the media, who control the dissemination of information, and must present the results of such studies in as objective and informative a manner as their audience deserves.
How the media covered the 2008 Scorecard