How to Use Articles About Clinical Prediction Rules
Thomas McGinn, Gordon Guyatt, Peter Wyer, David Naylor, Ian Stiell, Scott Richardson, for the Evidence Based Medicine Working Group
Based on the Users' Guides to Evidence-based Medicine and reproduced with permission from JAMA. (2000;284(1):79-84). Copyright 2000, American Medical Association.
- Clinical Scenario
- The Search
- Clinical Prediction Rules
- Resolution of the Scenario
- Conclusion
- References
Clinical Scenario
You are the medical director of a busy inner city emergency department. Faced with a limited budget and pressure to improve efficiency, you have conducted an audit of radiological procedures ordered for minor trauma, and have found a high rate of x-rays ordered for ankle and knee trauma. You are aware of the "Ottawa Ankle Rules" (OAR) (Figure 1) that identify patients in whom ankle radiographs can be omitted without adverse consequences. In addition, you are aware that a small number of faculty and residents currently rely on these rules to make quick frontline decisions in the emergency room.
Figure 1: Ottawa Ankle Rules (graphic not available here; refer to the JAMA version of the figure)
You are interested in knowing the accuracy of the rule, whether it is applicable to your patient population, and whether you should be implementing the rule in your own practice. Further, you wonder if implementing the rule can change clinical behavior and reduce costs without compromising quality care. You decide to consult the original medical literature and to assess the evidence for yourself.
The Search
Currently "prediction rules" or "decision rules" have no separate Medical Subject Heading (MeSH) in the National Library of Medicine (NLM) MEDLINE database. You therefore search PubMed under the MeSH heading {ankle fractures} and cross it with the text words {rules} and {decision rules}. This search yields 5 citations, of which 3 deal directly with the Ottawa clinical prediction rules for ankle fractures. [1] [2] [3]
In reviewing these articles, and deciding whether to implement changes in your emergency department, you require criteria for deciding on the strength of the inference you can make about the accuracy and impact of the Ottawa Ankle Rules. This article will provide you with the tools to answer those questions.
Clinical Prediction Rules
Establishing patients' diagnosis and prognosis are closely linked activities central to every physician's practice. The diagnoses we make, and our assessment of patients' prognosis, often determine the recommendations we make to our patients. Clinical experience provides us with an intuitive sense of which findings on history, physical examination, and investigation are critical in making an accurate diagnosis, or an accurate assessment of our patients' fate. While often extraordinarily accurate, this intuition may sometimes be misleading. Clinical prediction rules attempt to formally test, simplify, and increase the accuracy of clinicians' diagnostic and prognostic assessments.
A clinical prediction rule (CPR) can be defined as a clinical tool that quantifies the individual contributions that various components of the history, physical exam, and basic laboratory results make towards the diagnosis, prognosis or likely response to treatment in an individual patient. [4] CPRs are most likely to be useful in situations where decision making is complex, where the clinical stakes are high, or where there are opportunities to achieve cost savings without compromising patient care.
Developing and testing a CPR involves three steps: (1) the creation or derivation of the rule, (2) the testing or validation of the rule, and (3) the assessment of the impact of the rule on clinical behavior, the impact analysis. The validation process may require several studies in order to fully test the accuracy of the rule at different clinical sites. [Figure 2] Each step in the development of a CPR may be published separately by different authors, or all three steps may be included in one article. Table 1 presents a hierarchy of evidence that can guide clinicians in assessing the full range of evidence supporting use of a CPR in their practice.
Figure 2: Development of a Clinical Prediction Rule
Table 1: Hierarchy of Evidence for Clinical Prediction Rules
We note that our hierarchy applies only to CPRs intended for application in clinical practice. Investigators may use identical methodology to generate equations that stratify patients into different risk groups. These equations can then be used for statistical adjustment in studies involving large databases. These not-so-clinical prediction rules do not involve application by front-line practitioners, and would thus require a somewhat different hierarchy of strength of evidence.
We will now review the steps in the development and testing of a CPR. We will relate each stage of the process to the hierarchy presented in Table 1.
Developing a Clinical Prediction Rule
Our search revealed three articles related to the Ottawa Ankle Rules, the first of which described the CPR derivation. [1] CPR developers begin by constructing a list of potential predictors of the outcome of interest, in this case radiological ankle fractures. The list typically includes items from the history, physical exam, and basic laboratory tests. The investigators then examine a group of patients and determine whether the candidate clinical predictors are present, and the patients' status on the outcome of interest, the result of the ankle x-ray in this case. Statistical analysis reveals which predictors are most powerful, and which predictors can be omitted from the rule without loss of predictive power. Typically, the statistical techniques used in this process are based on logistic regression; readers can find a clinician-friendly description of these methods in another paper. [5] Other techniques that investigators sometimes use include discriminant analysis, which produces equations similar to regression analysis, [6] recursive partitioning analysis, which builds a tree in which the patient populations are split into smaller and smaller categories based upon risk factors, [7] and neural networks. [8]
Clinical prediction rules that have been derived but not validated should not be considered ready for clinical application. [Table 1] Investigators interested in performing the validation of a CPR, however, need criteria to judge whether the derivation process has been well done, and thus whether the rule is promising enough to justify validation. Before moving forward to the validation phase, they should address certain questions. Some examples of important questions are: "How were predictors chosen and defined?", "How was the selection of study subjects performed?", "Was the sample size adequate (including an adequate number of outcome events)?", "Were all important predictors present in the study population?", and "Does the rule make clinical sense?". Interested readers can find a complete discussion of the derivation process and these criteria in a paper by Laupacis et al. [4]
Validation
There are three reasons why even rigorously derived CPRs are not ready for application in clinical practice without further validation. First, the prediction rules derived in one set of patients may reflect associations between given predictors and outcomes that are due primarily to the play of chance. If that is so, a different set of predictors will emerge in a different group of patients, even if they come from the same setting. Second, predictors may be idiosyncratic to the population, to the clinicians using the rule, or to other aspects of the design of individual studies. If that is so, the rule may fail in a new setting. Third, and perhaps most important, clinicians may, due to problems in the feasibility of rule application in the clinical setting, fail to implement a rule comprehensively or accurately. The result would be that a rule succeeds in theory but fails in practice.
Statistical methods can deal with the first of these problems. For instance, investigators may split their population into two groups and use one to develop the rule, and the other to test it. Alternatively, they may use more sophisticated statistical methods built on the same logic. Conceptually, these approaches involve removing one patient from the sample, generating the rule using the remainder of the patients, and testing it on the patient who was removed from the sample. One repeats this procedure, sometimes referred to as a bootstrap technique (though removing one patient at a time is more precisely a jackknife, or leave-one-out, approach), in sequence for every patient under study.
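The remove-one-patient procedure described above can be sketched in a few lines. This is only a minimal illustration: the toy cohort is invented, and the "rule" is a deliberately simple cutoff on the number of predictors present, fitted by a hypothetical helper named fit_threshold. None of it is taken from the studies cited in this article.

```python
# Leave-one-out validation sketch. The data and the rule-fitting helper are
# hypothetical; they only illustrate the logic of the procedure.

def fit_threshold(patients):
    """Pick the score cutoff that classifies the most training patients correctly.

    Each patient is a (score, outcome) pair: score is the number of predictors
    present, outcome is True if the target outcome occurred. The rule predicts
    the outcome whenever score >= cutoff. Ties go to the lowest cutoff.
    """
    best_cutoff, best_correct = 0, -1
    for cutoff in range(0, 6):
        correct = sum((score >= cutoff) == outcome for score, outcome in patients)
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct
    return best_cutoff


def leave_one_out_accuracy(patients):
    """Derive the rule without one patient, test it on that patient, repeat for all."""
    hits = 0
    for i, (score, outcome) in enumerate(patients):
        training = patients[:i] + patients[i + 1:]
        cutoff = fit_threshold(training)
        hits += (score >= cutoff) == outcome
    return hits / len(patients)


# Hypothetical toy cohort: (predictors present, outcome occurred)
cohort = [(0, False), (1, False), (1, False), (2, False),
          (2, True), (3, True), (3, True), (4, True)]

apparent_cutoff = fit_threshold(cohort)
apparent_accuracy = sum((s >= apparent_cutoff) == o for s, o in cohort) / len(cohort)
loo_accuracy = leave_one_out_accuracy(cohort)
print(f"apparent accuracy: {apparent_accuracy:.3f}, leave-one-out: {loo_accuracy:.3f}")
```

In this toy cohort the cutoff fitted to all eight patients classifies seven of them correctly, while the leave-one-out estimate is six of eight. That gap is the optimism that internal validation is meant to expose.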
While statistical validations within the same setting or group of subjects reduce the chance that the rule reflects the play of chance rather than true associations, they fail to address the other two threats to validity. The success of the CPR may be peculiar to the particular populations of patients and clinicians involved in the derivation study. Even if this is not so, clinicians may have difficulties using the rule in practice, difficulties that compromise its predictive power. Thus, to graduate from Level IV, studies must involve clinicians actually using the rule in their clinical practice.
A CPR developed to predict serious outcomes (heart failure, ventricular arrhythmia, etc) in syncope patients highlights the importance of validation. [9] Investigators derived the rule using data from 252 patients who presented to the emergency department and then attempted to prospectively validate it in a sample of 374 patients. The prediction rule gave individuals a score from zero to four depending on the number of clinical predictors present. The probability of poor outcomes corresponding to almost every score in the derivation set was approximately twice that in the validation set. For example, in the derivation set the risk of a poor outcome in a patient with a score on the prediction rule of 3 was estimated to be 52%; a patient with the same score in the validation set had a probability of a poor outcome of only 27%. This variation in results may have been due to a difference in the severity of the syncope cases entered into the two studies, or to different criteria for generating a score of 3. Because of the risk that it will provide misleading information when applied in a real-world clinical setting, we situate a CPR that has undergone development without validation as Level IV on our hierarchy. [Table 1]
Despite this major limitation, clinicians can still extract clinically relevant messages from an article describing the development of a CPR. They may wish to note the most important predictors, and consider them more carefully in their own practice. They may also consider giving less importance to variables that failed to show predictive power. For instance, in developing a CPR to predict mortality from pneumonia, the investigators found the white blood cell count (WBC) had no bearing on subsequent mortality. [10] This being the case, clinicians may wish to put less weight on the WBC when making decisions about admitting pneumonia patients to the hospital.
We have argued that to move up the hierarchy, CPRs must provide additional evidence of validity. The second article in our search described the refinement and prospective validation of the Ottawa Ankle Rules. [2] Validation of a CPR involves demonstrating that its repeated application as part of the process of clinical care leads to the same results. Ideally, a validation entails the investigators applying the rule prospectively in a new population with a different prevalence and spectrum of disease from that of the patients in whom the rule was derived. One key issue is to be sure that the CPR performs similarly in a variety of populations, in the hands of a variety of clinicians, working in a variety of institutions. A second is to be sure that it works well when clinicians are actually consciously applying it as a rule, rather than as a statistical derivation from a large number of potential predictors.
If the setting in which the prediction rule was originally developed was limited, and its validation has been confined to this setting, application by clinicians working in other settings is less secure. Validation in a similar setting can take a number of forms. Most simply, after developing the prediction rule, the investigators return to their population, draw a new sample of patients, and then test the rule's performance. Thus, we classify rules that have been validated in the same population, or in limited or narrow populations very similar to the sample used in the development, as Level III on our hierarchy, and recommend clinicians use the result cautiously. [Table 1]
If investigators draw patients in the derivation phase from a sufficiently heterogeneous population across a variety of institutions, testing the rule in the same population provides strong validation. Validation in a new population provides the clinician with strong inferences about the usefulness of the rule, corresponding to Level II in our hierarchy. [Table 1]
The Ottawa Ankle Rule was first derived in two large university-based emergency departments in Ottawa [1] and was then prospectively validated in a large sample of patients from the same emergency departments. [2] At this stage the rule would be classified as Level II in our hierarchy because of the large number and diversity of patients and physicians involved in the study. Since that initial validation, the rule has been validated in several different clinical sites with relatively consistent results. [11] [12] [13] [14] This evidence further strengthens our inference about its predictive power.
Many clinical prediction rules are derived and then validated on a small, narrowly selected group of patients (Level III). One such rule was derived to predict preserved left ventricular function after a myocardial infarction. [15] The initial derivation and validation were performed on only 314 patients who had been admitted to one tertiary care center. The prediction rule was first derived using 162 patients and then validated on 152 patients in the same setting. The prediction rule demonstrated a positive predictive value (PPV) of 99%. At this stage in its development the rule would be considered Level III, to be used only in settings similar to that of the validation study, i.e. similar CCU settings. The rule was further validated in two larger trials, one using 213 patients from one site [16] and a larger trial using 1,891 patients from several different institutions. [17] The positive predictive value in both of these studies was 89%. This drop in predictive value is significant and changes the potential use and implications of the rule in clinical practice. At this point in development, the rule would be considered Level II, meaning the rule can be used in clinical settings with a high degree of confidence but with the adjusted values. The development of this rule highlights the importance of validating a rule on a diverse patient population before it can be broadly applied in clinical settings.
Whether investigators have conducted their validation study in a similar, narrow population (Level III) or in a broad, heterogeneous, or different population (Level II), their results allow stronger inferences if they have adhered to a number of methodological standards. [Table 2] First, were the patients chosen in an unbiased fashion and do they represent a wide spectrum of severity of disease? Second, was there a blinded assessment of the criterion standard for all patients? Third, was there an explicit and accurate interpretation of the predictor variables and the actual rule without knowledge of the outcome? Lastly, did the investigators achieve close to 100% follow-up of those they enrolled? Interested readers can find a complete discussion of the validation process and these criteria in a paper by Laupacis et al. [4]
Table 2: Methodological Standards for Validation of a Clinical Prediction Rule
If those evaluating predictor status of study subjects are aware of the outcome, or if those assessing the outcome are aware of patients' status with respect to the predictors, their assessments may be biased. For instance, in a CPR developed to predict the presence of pneumonia in patients presenting with cough, [18] the authors make no mention of blinding during either the derivation or the validation process. Knowledge of history or physical examination findings may have influenced the judgements of the unblinded radiologists.
The investigators testing the Ottawa Ankle Rules enrolled consecutive patients, obtained radiographs in all of them, and ensured that not only were the clinicians assessing the clinical predictors unaware of the x-ray results, but the radiologists had no knowledge of the clinical data.
Interpreting the Results
Whatever the Level of evidence associated with a CPR, its usefulness will depend on its predictive power. Investigators may report their results in a variety of ways. The ankle component of the Ottawa Ankle Rules states that an ankle x-ray series is only indicated for patients with pain near the malleoli and either inability to bear weight or localized bone tenderness at the posterior edge or tip of either malleolus. [Figure 1] The developers calculated the sensitivity and specificity of their rule as a diagnostic test using this criterion. In the development process all patients with fracture had a positive result (sensitivity of 100%), but only 40% of those without fractures had a negative result (specificity of 40%). These results suggest that if clinicians order radiographs only in those patients with a positive result they will not miss any fractures and will avoid the test in 40% of those without a fracture.
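The arithmetic behind these figures can be made explicit with a 2x2 table. The sketch below uses hypothetical counts, chosen only so that the rule reproduces the 100% sensitivity and 40% specificity quoted above; they are not the counts from the Ottawa studies.

```python
# Sensitivity and specificity from a 2x2 table. The counts are hypothetical,
# chosen only so that the rule reproduces the 100% / 40% figures in the text.

def sensitivity(true_positives, false_negatives):
    """Proportion of patients with a fracture whom the rule flags as positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Proportion of patients without a fracture whom the rule clears."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical cohort: 1000 injured ankles, 100 of them fractured.
fracture_rule_positive = 100      # true positives
fracture_rule_negative = 0        # false negatives (missed fractures)
no_fracture_rule_negative = 360   # true negatives (radiograph avoided)
no_fracture_rule_positive = 540   # false positives (radiograph still ordered)

sens = sensitivity(fracture_rule_positive, fracture_rule_negative)
spec = specificity(no_fracture_rule_negative, no_fracture_rule_positive)

total_patients = (fracture_rule_positive + fracture_rule_negative
                  + no_fracture_rule_negative + no_fracture_rule_positive)
# Radiographs are omitted for every rule-negative patient.
share_of_all_radiographs_avoided = (
    (no_fracture_rule_negative + fracture_rule_negative) / total_patients
)

print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
print(f"radiographs avoided overall = {share_of_all_radiographs_avoided:.0%}")
```

With these assumed counts the rule avoids radiography in 36% of all patients, somewhat less than the 40% specificity, because specificity is calculated only among patients without a fracture.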
The validation study confirmed these results; in particular, the test maintained a sensitivity of 100%. This is reassuring, and more so because the sample size was sufficiently large to result in a relatively narrow confidence interval (95% confidence interval, 93% to 100%). Thus, clinicians adopting the rule would miss very few, if any, fractures.
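Why sample size matters for an observed sensitivity of 100% can be shown with the exact (Clopper-Pearson) confidence limit, which has a simple closed form when every case is detected. The sketch below is illustrative only; the case counts are hypothetical, and the original investigators may have computed their interval by a different method.

```python
# Exact (Clopper-Pearson) lower confidence limit for a proportion when every
# case is detected, i.e. observed sensitivity = 100%. Case counts are hypothetical.

def lower_limit_when_all_detected(n_cases, confidence=0.95):
    """Two-sided exact lower limit for n successes out of n.

    For x = n the Clopper-Pearson lower limit has the closed form
    (alpha / 2) ** (1 / n); the upper limit is 1.
    """
    if n_cases <= 0:
        raise ValueError("n_cases must be positive")
    alpha = 1.0 - confidence
    return (alpha / 2.0) ** (1.0 / n_cases)


for n_cases in (10, 50, 200):
    lower = lower_limit_when_all_detected(n_cases)
    print(f"{n_cases:>3} fractures, all detected: 95% CI {lower:.1%} to 100%")
```

Under this method, detecting all of about 50 fractures yields a lower limit of roughly 93%, whereas detecting all of 10 fractures leaves a lower limit below 70%.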
Another way of reporting CPR results is in terms of the probability of the target condition being present given a particular CPR result. For example, a recent prediction rule for pulmonary embolus derived and validated by Wells and colleagues [19] accurately placed patients into low (3.4%, 95% CI 2.2%-5%), intermediate (28%, 95% CI 23.4%-32.2%), or high probability (78%, 95% CI 69.2%-19.6%) categories. When investigators report prediction rule results in this fashion, they are implicitly incorporating all clinical information. In doing so, they remove any need for clinicians to consider independent information in deciding on the likelihood of the diagnosis, or a patient's prognosis.
Finally, prediction rules may also report their results as likelihood ratios, or as absolute or relative risks. For example, the CAGE, a prediction rule for detecting alcoholism, has been reported as likelihood ratios (for CAGE scores of 0/4 the LR = 0.14, for 1/4 the LR = 1.5, for 2/4 the LR = 4.5, for 3/4 the LR = 13, and for 4/4 the LR = 100). In this example the probability of disease, alcoholism, depends on the combination of the prevalence of disease in the community and the score on the CAGE prediction rule. [20] When investigators report their results as likelihood ratios, they are implicitly suggesting that clinicians should use other, independent information to generate a pre-test (or pre-rule!) probability. They can then use the likelihood ratios generated by the rule to establish a post-test probability. Clinicians can find approaches to using likelihood ratios in clinical practice in a previous Users' Guide. [21]
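The step from pre-test to post-test probability runs through odds: convert the pre-test probability to odds, multiply by the likelihood ratio, and convert back. The sketch below applies this to the CAGE likelihood ratios quoted above. The 10% pre-test probability is an assumption chosen for illustration, not a prevalence taken from the cited study.

```python
# Post-test probability from a pre-test probability and a likelihood ratio.
# The CAGE likelihood ratios are those quoted in the text; the pre-test
# probability below is a hypothetical value chosen for illustration.

CAGE_LIKELIHOOD_RATIOS = {0: 0.14, 1: 1.5, 2: 4.5, 3: 13.0, 4: 100.0}


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, apply the LR, and convert back."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre_test_probability must be strictly between 0 and 1")
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


assumed_pre_test_probability = 0.10  # hypothetical prevalence
for score, lr in CAGE_LIKELIHOOD_RATIOS.items():
    probability = post_test_probability(assumed_pre_test_probability, lr)
    print(f"CAGE {score}/4 (LR {lr}): post-test probability {probability:.1%}")
```

With the assumed 10% pre-test probability, a CAGE score of 3/4 raises the probability to about 59%, while a score of 0/4 lowers it to under 2%. A different pre-test probability gives different post-test values from the same likelihood ratios.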
Impact Analysis
Use of CPRs involves remembering predictor variables, and often entails making calculations, to determine a patient's probability of having the CPR's target outcome. Pocket cards and computer algorithms can facilitate the task of using complex CPRs. Nonetheless, CPRs demand clinician time and energy, and their use is warranted only if they change physician behavior, and if that behavior change, furthermore, results in improved patient outcomes or reduced costs while maintaining quality. If these conditions are not met, whatever the accuracy of a CPR, attempts to use it systematically will be a waste of time.
There are a number of reasons why an accurate CPR may not produce a change in behavior or an improvement in outcomes. First, clinicians' intuitive estimation of probabilities may be as good as, if not better than, the CPR. If this is so, CPR information will not improve their practice. Second, the calculations involved may be cumbersome, and clinicians may, as a result, not utilize the rule. Third, there may be practical barriers to acting on the results of the CPR. For instance, in the case of the Ottawa ankle rule, clinicians may be sufficiently concerned about protecting themselves against litigation that they may order radiographs despite a CPR result suggesting a negligible probability of fracture.
These are the considerations that lead us to classify a CPR with evidence of accuracy in diverse populations as Level II, and insist on a positive result from a study of impact before a CPR graduates to Level I.
Ideally, an impact study would randomize patients, or larger administrative units, to either apply or not to apply the CPR, and follow patients for all relevant outcomes (including quality of life, morbidity, and resource utilization). Randomization of individual patients is unlikely to be appropriate because one would expect the participating clinicians to incorporate the rule into the care of all their patients. A suitable alternative is to randomize institutions or practice settings, and conduct analyses appropriate to these larger units of randomization. Another potential design is to look at a group before and after clinicians began to use the CPR.
Investigators examining the impact of the Ottawa Ankle Rule randomized 6 emergency departments to use or not use their CPR. [3] Just prior to initiating the study one center dropped out leaving a total of five emergency rooms, two in the intervention group and three in the usual care group. The intervention consisted of: (1) introducing the CPR at a general meeting, (2) distributing pocket cards summarizing the rule, (3) posting the rule throughout the ED, and (4) applying preprinted data collection forms to each chart. In the control group the only intervention was the introduction of preprinted data collection forms without the Ottawa rule attached to each chart. A total of 1911 eligible patients entered into the study, 1005 in the control group and 906 in the intervention group. There were 691 radiographs requested in the intervention group and 996 in the control group. In an analysis that focused on the ordering physician, the investigators found that the mean proportion of patients referred for radiography was 99.6% in the control group and 78.9% in the intervention group (p=0.03). The investigators noted 3 missed fractures in the intervention group, none of which led to adverse outcomes. Thus, the investigators demonstrated a positive resource utilization impact of the Ottawa ankle rule (decreased test ordering) without increase in adverse outcomes, moving the CPR to Level I in the hierarchy. [Table 1]
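As a check on the counts reported above, the crude patient-level radiography rates can be computed directly. These crude rates are not the same quantity as the physician-level mean proportions the investigators reported (99.6% and 78.9%), presumably because that analysis averaged proportions across ordering physicians rather than pooling patients. The sketch below makes no attempt to reproduce the physician-level analysis.

```python
# Crude (patient-level) radiography rates from the counts reported in the text.
# This pools all patients in each group; it does not reproduce the
# physician-level analysis used by the investigators.

def proportion(events, total):
    """Simple proportion of patients for whom a radiograph was requested."""
    return events / total


intervention_rate = proportion(691, 906)   # radiographs / patients, intervention
control_rate = proportion(996, 1005)       # radiographs / patients, control
absolute_difference = control_rate - intervention_rate

print(f"intervention: {intervention_rate:.1%}")
print(f"control:      {control_rate:.1%}")
print(f"absolute difference: {absolute_difference:.1%}")
```

The pooled rates come to roughly 76% and 99%, close to but not identical with the physician-level means, which illustrates why the unit of analysis should match the unit of randomization or decision making.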
Resolution of the Scenario
You have found Level I evidence supporting the use of the Ottawa decision rule in reducing unnecessary ankle radiographs in patients presenting to the ER with ankle injuries. You therefore feel confident that you can productively utilize the rule in your own practice. Another recent study makes you aware that changing the behavior of your colleagues to realize the possible reductions in cost may be a challenge. Cameron and Naylor reported on an initiative in which clinicians expert in the use of the Ottawa ankle rule trained 16 other individuals to teach the use of the rule. [22] These individuals returned to their emergency departments armed with slides, overheads, a 13-minute instructional video and a mandate to train their colleagues locally and regionally in the use of the rule.
Unfortunately, this program led to no change in the use of ankle radiography. The results demonstrate that even the availability of a Level I clinical prediction rule may require local implementation strategies with known effectiveness in changing provider behavior to ensure implementation. [23] [24] [25]
Conclusion
Clinical prediction rules inform our clinical judgment and have the potential to change clinical behavior and reduce unnecessary costs while maintaining quality of care and patient satisfaction. The challenge for clinicians is to evaluate the strength of the rule and its likely impact, and to find ways of efficiently incorporating Level I rules into their daily practice.
A summary of clinical prediction rules, evaluated in an evidence-based fashion (i.e. highlighting the Level of Evidence), is currently available on the internet for clinicians' use (http://med.mssm.edu/ebm).
Acknowledgement
The authors would like to thank Deborah Maddock for her superb co-ordination of the Users' Guide project. Dr McGinn would like to thank Dr Gerald Paccione and the internal medical residents at Montefiore Medical Center for their input in the area of CPRs over the years; and Dr. Roseanne Leipzig for her input to the manuscript.
References
1. Stiell IG, Greenberg GH, McKnight RD, Nair RC, McDowall I, Worthington JR. A study to develop clinical decision rules for the use of radiography in acute ankle injuries. Ann Emerg Med. 1992;21:384-390.
2. Stiell IG, Greenberg GH, McKnight RD, et al. Decision rules for the use of radiography in acute ankle injuries: refinement and prospective validation. JAMA. 1993;269:1127-1132.
3. Auleley G, Ravaud P, Giraudeau B, et al. Implementation of the Ottawa ankle rules in France: a multicenter randomized controlled trial. JAMA. 1997;277:1935-1939.
4. Laupacis A, Sekar N, Stiell I. Clinical prediction rules: a review and suggested modifications of methodological standards. JAMA. 1997;277:488-494.
5. Guyatt GH, Walter S, Shannon H, Cook D, Jaeschke R, Heddle H. Basic statistics for clinicians: 4. Correlation and regression. Can Med Assoc J. 1995;152:497-504.
6. Rudy TE, Kubinski JA, Boston JR. Multivariate analysis and repeated measurements: a primer. J Crit Care. 1992;7:30-41.
7. Cook EF, Goldman L. Empiric comparison of multivariate analytic techniques: advantages and disadvantages of recursive partitioning analysis. J Chronic Dis. 1984;39:721-731.
8. Baxt WG. Application of artificial neural networks to clinical medicine. Lancet. 1995;346:1135-1138.
9. Martin TP, Hanusa BH, Kapoor WN. Risk stratification of patients with syncope. Ann Emerg Med. 1997;29:459-466.
10. Fine MJ, Auble TE, Yealy DE, et al. A prediction rule to identify low-risk patients with community-acquired pneumonia. N Engl J Med. 1997;336:243-250.
11. Lucchesi GM, Jackson RE, Peacock WF, Cerasani C, Swor RA. Sensitivity of the Ottawa ankle rules. Ann Emerg Med. 1995;26:1-5.
12. Kerr L, Kelly A, Grant J, et al. Failed validation of a clinical decision rule for the use of radiography in acute ankle injury. NZ Med J. 1994;107:294-295.
13. Stiell I, Wells G, Laupacis A, et al. Multicenter trial to introduce the Ottawa ankle rules for use of radiography in acute ankle injuries. BMJ. 1995;311:594-597.
14. Auleley G, Kerboull L, Durieux P, Cosquer M, Courpied J, Ravaud P. Validation of the Ottawa ankle rules in France: a study in the surgical emergency department of a teaching hospital. Ann Emerg Med. 1998;32:14-18.
15. Silver MT, Rose GA, Paul SD, O'Donnell CJ, O'Gara PT, Eagle KA. A clinical rule to predict preserved left ventricular ejection fraction in patients after myocardial infarction. Ann Intern Med. 1994;12:750-756.
16. Tobin K, Stomel R, Harber D, Karavite D, Sievers J, Eagle K. Validation in a community hospital setting of a clinical rule to predict preserved left ventricular ejection fraction in patients after myocardial infarction. Arch Intern Med. 1999;159:353-357.
17. Krumholz HM, Howes CJ, Murillo JE, Vaccarino LV, Radford MJ, Ellerbeck EF. Validation of a clinical prediction rule for left ventricular ejection fraction after myocardial infarction in patients ≥65 years old. Am J Cardiol. 1997;80:11-15.
18. Heckerling PS, Tape TG, Wigton RS, et al. Clinical prediction rule for pulmonary infiltrates. Ann Intern Med. 1990;113:664-670.
19. Wells PS, Ginsberg JS, Anderson DR, et al. Use of a clinical model for safe management of patients with suspected pulmonary embolism. Ann Intern Med. 1998;129:997-1005.
20. Buchsbaum DG, Buchanan RG, Centor RM, Schnoll SH, Lawton MJ. Screening for alcohol abuse using CAGE scores and likelihood ratios. Ann Intern Med. 1991;115:774-777.
21. Jaeschke R, Guyatt GH, Sackett DL, for the Evidence-Based Medicine Working Group. Users' guides to the medical literature, III: how to use an article about a diagnostic test, B: what are the results and will they help me in caring for my patients? JAMA. 1994;271:703-707.
22. Cameron C, Naylor CD. No impact from active dissemination of the Ottawa Ankle Rules: further evidence of the need for local implementation of practice guidelines. CMAJ. 1999;160:1165-1168.
23. Davis DA, Thomson MA, Oxman AD, Haynes RB. Changing physician performance: a systematic review of the effect of continuing medical education strategies. JAMA. 1995;274:700-705.
24. Cabana MD, Rand CS, Powe NR, et al. Why don't physicians follow clinical practice guidelines? A framework for improvement. JAMA. 1999;282:1458-1465.
25. Davis D, O'Brien MA, Freemantle N, Wolf FM, Mazmanian P, Taylor-Vaisey A. Impact of formal continuing medical education: Do conferences, workshops, rounds, and other traditional continuing education activities change physician behavior or health care outcomes? JAMA. 1999;282:867-874.
© 2001 Evidence-Based Medicine Informatics Project

