Heiner C. Bucher, Gordon H. Guyatt, Deborah J. Cook, Anne Holbrook, Finlay A. McAlister, for the Evidence-Based Medicine Working Group
Based on the Users' Guides to Evidence-based Medicine and reproduced with permission from JAMA. (1999;282(8):771-778). Copyright 1999, American Medical Association.
You are a physician seeing a 62-year-old woman with postmenopausal osteoporosis. Her bone mineral density as measured by dual-energy X-ray absorptiometry is 2.5 standard deviations below the mean value in premenopausal women and, though she does not suffer from back pain, a spinal radiograph shows an old vertebral fracture. Although she has not yet experienced problems as a result of her vertebral fracture, she is disturbed by the prospect that she may end up like her mother whose osteoporotic fractures have resulted in severe, chronic, back pain. The patient suffers from reflux esophagitis and a past endoscopy has revealed non-specific gastritis. A specialist had prescribed alendronate which the patient had to stop after several weeks because of dyspepsia. She has searched the Web, and discovered a new drug, raloxifene, and wonders whether this drug might be an alternative. You know that this drug has been licensed for the prevention of postmenopausal osteoporosis. You promise to examine the literature and to get back to her.
Using Medline you identify a study of raloxifene for the treatment of osteoporosis demonstrating an effect on bone mineral density [1]. You are wondering whether this warrants administration to lower your patient's risk of osteoporotic fracture.
Ideally, clinicians making treatment decisions should always refer to methodologically strong clinical trials examining the impact of therapy on clinically important outcomes. By clinically important outcomes we mean outcomes that are important to patients such as health-related quality of life, morbid end points such as stroke or myocardial infarction, or death. Often, however, conducting these trials requires such large sample size, or long patient follow-up, that their feasibility becomes questionable. Substituting surrogate end points for the target event allows conduct of shorter and smaller trials, thus offering an apparent solution to the dilemma.
A surrogate end point may be defined as 'a laboratory measurement or a physical sign used as a substitute for a clinically meaningful end point that measures directly how a patient feels, functions or survives' [2]. Surrogate end points include physiologic variables (such as bone mineral density as a surrogate for long-bone fractures, blood pressure for stroke, LDL cholesterol levels for myocardial infarction, and CD4 cell count for AIDS and AIDS-related mortality) or measures of subclinical disease (such as degree of atherosclerosis on coronary angiography).
The use of surrogate end points is indispensable for drug evaluation in phase II and early phase III trials geared to establishing a drug's promise of benefit. In many countries, companies may obtain drug approval by demonstrating a positive impact on surrogate end points. The use of surrogate end points for regulatory purposes reflects drug approval decisions that regulators must make in the face of public health exigencies.
Reliance on surrogate end points may be beneficial or harmful. On the one hand, use of the surrogate end point may lead to the rapid and appropriate dissemination of new treatments. For example, the Food and Drug Administration's (FDA) decision to approve new antiretroviral drugs based on information from trials using surrogate end points recognized the enormous need for effective therapies for patients with HIV infection. Subsequently, several of these drugs have proved effective in randomized trials focusing on clinically important outcomes [3] [4] [5] [6].
On the other hand, reliance on surrogate endpoints may lead to excess morbidity and mortality. For example, while dihydropyridine calcium channel blockers are efficacious in lowering blood pressure, their effects on clinically important outcomes such as stroke, myocardial infarction, or death are less certain. While at least one of these agents has been shown to be beneficial in the treatment of hypertension [7], preliminary evidence from comparative trials of other dihydropyridines versus antihypertensive drugs from other classes suggests that they may be less beneficial than diuretics or angiotensin converting enzyme inhibitors [8] [9] [10] [11].
How are clinicians to distinguish between these two situations? A surrogate outcome will be reliable only if there is a validated causal connection between change in surrogate and change in the clinically important outcome, and if the surrogate fully captures all of the effects of treatment on that outcome. In this Users' Guide, we build on previous discussions of how one can establish a causal relationship [12] and present an approach to critical appraisal of studies using surrogate end points and application of their results to managing individual patients.
As our discussion will make evident, the clinician needs to assess far more than a single study to make the decision about the adequacy of a surrogate. Evaluation may require a comprehensive review of observational studies of the relation between the surrogate and the target, and of some or all of the randomized trials that have evaluated treatment impact on both the surrogate and the target. While most clinicians would hesitate to conduct such an investigation, our guidelines will allow them to evaluate experts', or pharmaceutical industry's, arguments for prescribing treatments on the basis of their effect on surrogate end points.
In this guide, we follow the framework of previous articles in the series [13] and ask three sorts of questions: Are the results valid? (In particular, is the surrogate end point a valid substitute for a clinically important outcome?); What were the results?; and Will the results help me in caring for my patients? [Table 1]
Table 1: Users' guide for a surrogate end point trial
To provide a valid substitute for an important target outcome, the surrogate must be associated or correlated with that target. In general, researchers choose surrogate end points because they have found a correlation between a surrogate and a target outcome in observational studies, and their understanding of the biology makes it plausible that changes in the surrogate will invariably lead to changes in the important outcome. The stronger the association, the more likely the causal link between the surrogate and the target. Many biologically plausible surrogates are only weakly associated with clinically important outcomes. For example, measures of respiratory function in patients with chronic lung disease, or conventional exercise tests in patients with heart and lung disease, are only weakly correlated with capacity to undertake activities of daily living [14] [15]. When correlations are low, the surrogate is likely to be a poor substitute for the target outcome.
In addition to the strength of the association, one's confidence in the validity of the association depends on whether it is consistent across different studies and after adjustment for known confounders. For example, ecologic studies such as the Seven Countries Study [16] suggested a strong correlation between serum cholesterol levels and coronary heart disease mortality even after adjusting for other predictors such as age, smoking, and systolic blood pressure. Subsequent cohort studies confirmed this association and suggested that long-term reductions in serum cholesterol of 0.6 mmol/L would lower the risk of coronary heart disease by approximately 30% [17].
Similarly, cohort studies have consistently revealed that a single measurement of plasma viral load predicts the subsequent risk of AIDS or death in HIV infected patients [18] [19] [20] [21] [22] [23]. For example, in one study the proportion of patients that progressed to AIDS after 5 years in the lowest through the highest quartiles of viral load was 8, 26, 49, and 62%, respectively [23]. Moreover, this association retained its predictive power after adjustment for other potential predictors such as CD4 cell count [18] [19] [20] [21] [22].
Returning to the scenario, you are wondering if you can substitute bone mineral density for fractures or health-related quality of life in considering whether to recommend raloxifene. A large cohort study investigated risk factors for hip fracture [24]. Postmenopausal women with a calcaneal bone density in the highest third had a hip fracture rate of 9.4 per 1000 woman-years, while women in the middle and lowest thirds had fracture rates of 14.7 and 27.3 per 1000 woman-years, respectively. Furthermore, after considering other risk factors for osteoporotic hip fractures including maternal history of hip fracture, previous fractures from any site, poor self-rated health, use of long-acting benzodiazepines, impaired visual function, and reduced physical activity, bone mineral density continued to predict the risk of hip fracture [24].
These findings are consistent across studies looking at the association between bone density and fracture risk [25] [26]. Thus, bone mineral density is a moderately strong, independent predictor of fracture, and meets our first criterion for an acceptable surrogate end point.
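As a rough illustration of the strength of this association, the hip fracture rates quoted above from the cohort study [24] can be expressed as crude rate ratios relative to the highest third of bone density. This is a minimal sketch: the function and variable names are hypothetical, and the ratios are unadjusted, so they do not reproduce the adjusted analysis reported by the investigators.

```python
# Hip fracture rates per 1000 woman-years by third of calcaneal bone density,
# as quoted in the text from the cohort study [24].
rates_per_1000_woman_years = {"highest": 9.4, "middle": 14.7, "lowest": 27.3}


def rate_ratios(rates, reference):
    """Return each group's rate divided by the reference group's rate.

    These are crude (unadjusted) ratios, for illustration only.
    """
    ref = rates[reference]
    return {group: rate / ref for group, rate in rates.items()}


ratios = rate_ratios(rates_per_1000_woman_years, reference="highest")
for group, ratio in ratios.items():
    print(f"{group} third: rate ratio {ratio:.2f}")
```

On these figures, women in the lowest third of bone density had roughly three times the hip fracture rate of women in the highest third.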
While meeting this first criterion is necessary, it is not sufficient to support reliance on a surrogate outcome. As we will emphasize below [Table 1], before offering an intervention on the basis of effects on a surrogate outcome, the clinician should note a consistent relation between surrogate and target in randomized trials; the effect of the intervention on the surrogate must be large, precise and lasting; and the benefit-risk tradeoff must be clear.
Given the possibility of effects unrelated to the surrogate end point, pathophysiologic studies, ecological studies, and cohort studies are insufficient to establish that the link between surrogate and clinically important outcomes is ironclad. Surrogate end points can only be considered validated when their relationship with the clinically important outcome has been firmly established in long-term randomized trials showing that modification of the surrogate is associated with concomitant modifications in the target outcome of interest. For example, although ventricular ectopic beats are associated with adverse prognosis in patients with myocardial infarction [27] and class I antiarrhythmic agents effectively suppress ventricular arrhythmias in animals and humans [28], these drugs have proved to increase mortality when evaluated in randomized trials [29]. In this case, reliance on the surrogate end point of suppression of non-lethal arrhythmias led to the deaths of tens of thousands of patients [30].
The treatment of heart failure provides another instructive example. Trials of angiotensin converting enzyme (ACE) inhibitors in heart failure have demonstrated parallel increases in exercise capacity [31] [32] [33] [34] and a decrease in mortality [35], suggesting clinicians may be able to rely on exercise capacity as a valid surrogate. Milrinone [36] and epoprostenol [37] have both demonstrated improved exercise tolerance in patients with symptomatic heart failure. However, when these drugs were evaluated in randomized controlled trials, both showed an increase in cardiovascular mortality, which in one instance was statistically significant [38] and in the second led to the early termination of the study [39]. Thus, exercise tolerance is inconsistent in predicting improved mortality and is therefore an unsatisfactory substitute. Other suggested surrogate end points in heart failure have included ejection fraction, heart rate variability, and markers of autonomic function [40]. The dopaminergic agent ibopamine positively influences all three surrogate end points, and yet a randomized trial demonstrated that the drug increases mortality in heart failure [41].
CD4 cell count is an example of a surrogate end point which has been validated in randomized trials. A number of trials comparing different classes of antiretroviral therapies have demonstrated that patients randomized to more potent drug regimens had higher CD4 cell counts and were less likely to progress to AIDS or death [6] [42]. While there is no guarantee that the next trial using a different class of drugs will show the same pattern, these results greatly strengthen our inference that if therapy for HIV infection increases CD4 count, a reduction in AIDS-related mortality will result.
Returning to our scenario, trials of etidronate [43] [44] and alendronate [45] for the prevention of osteoporotic fractures in postmenopausal women have shown parallel increases in bone mineral density and reduced incidence of new vertebral fractures. This would suggest that clinicians might rely on bone density to evaluate new drugs in osteoporosis making the assumption that if they saw increases in bone density, decreases in fractures would follow.
However, another secondary prevention trial in postmenopausal women using sodium fluoride showed divergent results [46]. Although sodium fluoride increased bone mineral density at the lumbar spine by 35% over 5 years, more vertebral and non-vertebral fractures occurred in the intervention group than in the placebo group (163 and 72 in 101 women with sodium fluoride versus 136 and 24 in 101 women with placebo). In another randomized trial, fluoride again showed a large increase in bone density without any change in fracture rate [47]. Inferences on the basis of unchanged bone density may also be problematic: a study of calcium and vitamin D in the elderly showed virtually no change in bone density, but a reduction in fracture risk of approximately 50% [48]. Thus, increase in bone mineral density as a surrogate end point has shown an inconsistent relation to osteoporotic fractures.
Clinicians are in a stronger position relying on surrogate end points if the new drug they are considering is from a class of drugs in which the relation between changes in the surrogate and changes in the target has been verified in randomized trials. For instance, thiazide diuretics and beta blockers have both been shown to reduce blood pressure and clinically important outcomes such as stroke in hypertensive patients. Thus, we would be much more comfortable relying on reduction in blood pressure to justify administering a new beta blocker or thiazide diuretic than to justify offering a novel antihypertensive agent from another class [49]. As alluded to in the Introduction, four trials have now shown that calcium channel blockers are less efficacious than thiazides or angiotensin converting enzyme inhibitors in preventing hard clinical endpoints despite exerting similar degrees of blood pressure lowering [8] [9] [10] [11].
We will consider the example of cholesterol reduction as a surrogate for cardiovascular outcomes such as myocardial infarction and death in Part B of this Users' Guide [50]. Briefly, several large trials of primary and secondary prevention of coronary heart disease with statins have consistently shown that these drugs reduce cardiovascular outcomes [51]. We could therefore make a strong case that a new statin with a similar LDL-cholesterol lowering potency will also reduce clinically important outcomes. However, we would be very reluctant to generalize to another class of lipid-lowering agents, since trials of one such class (the fibrates) have shown that these drugs reduce the incidence of myocardial infarction but increase the risk of mortality from other causes (with no impact on overall mortality) [51] [52] [53].
This criterion is complicated by the variable definitions of drug class. A manufacturer of a drug related to a class of agents with a consistent positive association between modification of a surrogate end point and modification of the target (such as a beta blocker) will naturally argue for a broad definition of class. Manufacturers of agents that are related to drugs with known or suspected adverse effects on target events (clofibrate, or some calcium antagonists) are likely to argue, on the other hand, that the chemical or physiological connection is not sufficiently close to consider the new drug to be in the same class as the harmful agent. Part B will address these issues more fully [50].
Returning to the scenario, we have established that, because of the inconsistent relation between increase in bone mineral density and fracture reduction, we would be reluctant to offer patients a new anti-osteoporotic agent solely on the basis of evidence of its effect on the surrogate end point. Raloxifene, the drug we are considering for our patient, is a nonsteroidal benzothiophene and a selective estrogen-receptor modulator, and thus represents a new class of drugs for the prevention of osteoporosis-related bone fractures. Its mechanisms of action are therefore likely to differ considerably from those of bisphosphonates, and an assumption that similar reductions in loss of bone density will lead to parallel reductions in clinical fractures is not warranted. In Table 2, we apply our validity criteria to a number of controversial examples of the use of surrogate end points.
Table 2: Selected examples of applied validity criteria for the critical evaluation of studies using surrogate end points
We are interested not only in whether an intervention alters a surrogate end point, but also in the magnitude, precision, and duration of the effect. If an intervention shows large reductions in the surrogate end point, the 95% confidence intervals around those large reductions are narrow, and the effect persists over a sufficiently long period, our confidence that the target outcome will be favourably affected increases. Positive effects that are smaller, with wider confidence intervals, and shorter duration of follow-up, leave us less confident.
We have already cited evidence suggesting that CD4 counts may be an acceptable surrogate for mortality in patients with HIV infection. A randomized controlled trial of immediate versus delayed zidovudine therapy in HIV-infected asymptomatic individuals declared a positive result for immediate therapy, largely on the basis of a greater proportion of treated patients with CD4 cell counts above 350 per mm³ at a median follow-up of 1.7 years [54]. Subsequently, the Concorde study addressed the same question in a randomized trial with a median follow-up of 3.3 years [55]. The Concorde investigators found a continuous decline in CD4 cells in both treated and control groups, but the median difference of 30 cells per mm³ in favour of treated patients at study termination was statistically significant. However, the study showed no effect of zidovudine in terms of reduced progression to AIDS or death. The median CD4 cell difference was insufficient to affect clinically important outcomes. The Concorde authors concluded that the small but highly significant and persistent difference in CD4 count between the groups was not translated into a significant clinical benefit, and that this "called into question the uncritical use of CD4 cell counts as a surrogate endpoint". Had the Concorde analysis that showed significantly shorter times to reach a CD4 count of 350 per mm³ in the control group been regarded as fundamental, the trial might have been stopped early with a false positive result.
Returning to our scenario, the trial of raloxifene in osteoporotic women demonstrated that, after two years of treatment, patients in the group receiving the highest dose of raloxifene showed an increase in bone mineral density at the lumbar spine of 2.2% (SE 0.3), compared with a slight decrease of 0.8% (SE 0.3) in the control group. This difference in change over time was statistically significant (p < 0.03). Ideally, the investigators would have provided us with a confidence interval around the 3.0% difference in percentage change in bone mineral density in the treatment and control groups. As we will illustrate when we consider weighing benefits and harms, the magnitude of the effect on the surrogate may (or may not) help us estimate the size of possible impact on the target outcome.
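In the absence of a reported interval, a reader could approximate one from the figures given. The sketch below assumes that the two groups are independent, that the quoted standard errors apply to each group's mean percentage change, and that a normal approximation is adequate; none of these assumptions is confirmed by the trial report, and the function name is hypothetical. The result illustrates the calculation and is not the investigators' analysis.

```python
import math


def difference_with_ci(mean_a, se_a, mean_b, se_b, z=1.96):
    """Approximate 95% CI for the difference between two independent means.

    Assumes independent groups and a normal approximation (z = 1.96).
    """
    diff = mean_a - mean_b
    # Standard error of a difference between independent estimates.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, diff - z * se_diff, diff + z * se_diff


# Percentage change in lumbar spine bone mineral density over two years:
# +2.2% (SE 0.3) with highest-dose raloxifene, -0.8% (SE 0.3) with control.
diff, lower, upper = difference_with_ci(2.2, 0.3, -0.8, 0.3)
print(f"difference {diff:.1f}%, approximate 95% CI {lower:.1f}% to {upper:.1f}%")
```

Under these assumptions the 3.0% difference carries an approximate 95% confidence interval of about 2.2% to 3.8%.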
The questions clinicians should ask themselves in applying the results are the same ones we have suggested for any issue of therapy or prevention [56] and elaborated on in our Users' Guide regarding applicability [57]. These three questions have to do with whether the results can be applied to your patient's care, whether all important outcomes were considered, and whether the likely benefits are worth the downsides of treatment.
"Can the results be applied to my patient's care" refers to the extent to which the patient before you is similar to those who participated in the published studies under consideration, and the extent to which the therapy, and the associated technologies for monitoring and responding to complications, are available in your setting. "Were all important outcomes considered" relates to the focus of this Users' Guide, and all the issues we have raised thus far: was the primary outcome really the one in which patients will be interested? This second criterion also draws issues of adverse intervention effects to our attention. Applying the third criterion, judging whether the benefits are worth the downsides of treatment, presents particular challenges when investigators have focused on surrogate end points, and we will discuss this criterion in some detail.
To know whether to offer a treatment to their patients, clinicians must be able to estimate the magnitude of the likely benefit. When the data available are limited to the effect on a surrogate end point, estimating the extent to which treatment will reduce clinically important outcomes becomes a challenge.
One approach is to extrapolate from one or more randomized trials assessing a related intervention in a similar patient population that provides both surrogate end point and clinical outcome data. For example, until recently there was very little long-term data on the efficacy of lovastatin in reducing clinically important outcomes. However, one could extrapolate from short-term dose efficacy studies assessing the surrogate endpoint of cholesterol lowering. Thus, since 40 mg lovastatin produced a similar degree of LDL cholesterol lowering as 40 mg pravastatin (31% reduction vs. 34% reduction) in the CURVES Study [58], one could theorize that lovastatin would have similar long-term benefits to pravastatin. Subsequently, the AFCAPS/TexCAPS Trial (a five year trial assessing the efficacy of lovastatin in the primary prevention of ischemic heart disease) [59] did confirm that this agent had a benefit profile similar to pravastatin (as determined by the five year long, primary prevention WOSCOPS Trial) [60]: the relative risk reductions (and 95% confidence intervals) for myocardial infarction were 40% (17% to 57%) and 31% (17% to 43%) respectively. However, this approach is likely to be seriously flawed when one is extrapolating from trials of another class of drugs.
Returning to our scenario, to estimate the magnitude of the fracture reduction we might expect with raloxifene (where we only have surrogate end point data), we could (recognizing the limitations of this approach pointed out above) examine the results of randomized controlled trials of alendronate (a drug from a different class in which we have data on the same surrogate end point as well as clinical end points such as fracture reduction). While alendronate appears to improve vertebral bone density by 7.5% over two years (versus control) [45], raloxifene is associated with only a 3.0% improvement over the same time frame. A systematic overview of the alendronate trials [61] reported a 29% reduction in relative risk of nonvertebral fracture over a period of two years. Only one trial looked at symptomatic vertebral fractures in women with decreased bone density and an existing vertebral fracture [62]. This study demonstrated a relative risk reduction of 55% with alendronate and suggested that our patient's risk over three years of a nonvertebral fracture would be approximately 15%; symptomatic vertebral fracture would be about 5%. Given the relative risk reductions with alendronate, one would need to treat approximately 25 women to prevent a non-vertebral fracture and 40 to prevent a symptomatic vertebral fracture over a three year period.
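The numbers needed to treat quoted above follow from the baseline risks and relative risk reductions by the usual arithmetic: absolute risk reduction equals baseline risk multiplied by relative risk reduction, and the number needed to treat is its reciprocal. A minimal sketch of that calculation follows; the function name is hypothetical, and the inputs are the approximate figures given in the text.

```python
def number_needed_to_treat(baseline_risk, relative_risk_reduction):
    """Return the NNT implied by a baseline risk and a relative risk reduction.

    Both arguments are proportions (e.g. 0.15 for 15%).
    """
    absolute_risk_reduction = baseline_risk * relative_risk_reduction
    return 1.0 / absolute_risk_reduction


# Three-year baseline risks and alendronate relative risk reductions
# as quoted in the text.
nnt_nonvertebral = number_needed_to_treat(0.15, 0.29)
nnt_vertebral = number_needed_to_treat(0.05, 0.55)

print(f"NNT, nonvertebral fracture: {nnt_nonvertebral:.0f}")
print(f"NNT, symptomatic vertebral fracture: {nnt_vertebral:.0f}")
```

The unrounded results (about 23 and 36) are slightly lower than the approximate values of 25 and 40 given in the text, which is consistent with rounding of inputs that are themselves approximate.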
Since the improvement in bone mineral density with raloxifene is at best 50% of the effect of alendronate, we would anticipate a considerably lower reduction in fracture risk with raloxifene. However, interim analysis of an ongoing raloxifene trial [63] reported a 46% relative risk reduction with this therapy (despite less increase in bone mineral density than seen with the alendronate trials). This serves to emphasize the dangers of extrapolating results across classes when it is uncertain that the effects on clinically important outcomes are mediated in the same fashion by the two comparison drugs.
In deciding whether the likely magnitude of the treatment effect warrants offering our patients the intervention, clinicians must consider not only the uncertainty associated with that estimate, but the trade off with potential toxicity and costs of therapy. In addition, clinicians must ponder the consequences of not treating, and the available management alternatives. The deadly and usually relentless progression of HIV infection, and the paucity of alternative therapies, has contributed to the readiness of patients, clinicians, and regulatory agencies to accept evidence from surrogate end points in instituting novel therapies in HIV-infected patients. In osteoporosis, where the consequences of the condition are less immediately devastating, and a variety of agents are available, the case for relying on surrogate end points is far less compelling.
We have found a strong, consistent, independent and biologically plausible association between bone mineral density and vertebral and nonvertebral fractures. Randomized trials, however, have failed to show a consistent association between increased bone density and reduced fracture across all drug classes.
Because our patient is at substantial risk of fracture over the short term, the NNT to prevent both nonvertebral and vertebral fractures is moderate, as is the absolute benefit she might expect. Moreover, she is interested in longer-term fracture prevention, and her risk will grow over time. One might offer her alternative interventions, including hormone replacement therapy, calcium and vitamin D, bisphosphonates, or calcitonin. While there is evidence from randomized trials supporting the use of bisphosphonates or calcitonin to decrease osteoporotic fractures, randomized trial data showing fracture reduction with the other agents in populations similar to our patient are lacking. Our patient is very concerned about her long-term risk. Raloxifene was well tolerated during this two-year trial, but no information is available about long-term side effects. Our judgment is that a number of options (including a trial of etidronate, offering hormone replacement therapy, calcium and vitamin D, calcitonin, or suggesting only a balanced diet and exercise) would be reasonable. Data indicating a reduction in fracture rate with raloxifene, which a preliminary report suggests may soon be available [63], would greatly strengthen the case for including raloxifene as one of the options.
When we use surrogate end points to make inferences about expected benefit we are making assumptions regarding the link between the surrogate end point and the target outcome. We have outlined criteria clinicians can use to decide when these assumptions might be appropriate. Even if a surrogate end point meets all of these criteria, inferences about a treatment benefit may still prove misleading. Thus, treatment recommendations based on surrogate outcome effects can never be strong. Furthermore, difficulties in estimating the magnitude of effects on clinically important end points compromise economic analyses examining the cost-effectiveness of alternative management strategies.
These considerations emphasize that waiting for randomized trials investigating the effect of the intervention on outcomes of unequivocal importance to patients is the only ironclad solution to the surrogate outcome dilemma. However, when patients' risk of serious morbidity or mortality is high, this "wait and see" strategy may pose problems for many patients and their physicians.
We encourage clinicians to critically question therapeutic interventions in which the only proof of efficacy is from surrogate end point data. When the surrogate end point meets all our validity criteria, the effect of the intervention on the surrogate end point is large, the patient's risk of the target outcome is high, the patient places a high value on avoiding the target outcome, and there are no satisfactory alternative therapies, clinicians can recommend therapy on the basis of randomized trials evaluating only surrogate end points. In other situations, clinicians must carefully consider the known side effects and cost of therapy, and the possibility of unanticipated adverse effects, before recommending an intervention solely on the basis of surrogate end point data.
We are grateful to Cliff Rosen for helpful comments concerning the scenario and the associated discussion. Deborah Maddock provided invaluable co-ordination for the EBM Working Group in the development of this manuscript.
© 2001 Evidence-Based Medicine Informatics Project
© 2001 Centre for Health Evidence.