Robert S.A. Hayward, Mark C. Wilson, Sean R. Tunis, Eric B. Bass, Gordon Guyatt for the Evidence Based Medicine Working Group
Based on the Users' Guides to Evidence-based Medicine and reproduced with permission from JAMA (1995;274(7):570-4 and 1995;274(20):1630-2). Copyright 1995, American Medical Association.
You are relieved to find that the last patient in your busy primary care clinic is a previously well 48-year-old woman with acute dysuria. There has been no polydipsia, fever or hematuria; physical examination reveals suprapubic tenderness and urinalysis shows pyuria but no casts. You arrange cultures and antibiotic treatment for a lower urinary tract infection. On her way out the door, your patient observes that her friend has just started taking "female hormones," and she wonders whether she should too. Her menstrual periods stopped 6 months ago and she has never had cervical, ovarian, uterine, breast or cardiovascular problems, but her mother had a mastectomy at age 57 for post-menopausal breast cancer. You give the same general advice you have offered similar patients in the past but suggest that the matter be discussed at greater length when she returns after completing the antibiotic treatment. Later, as you lament door-knob consults, you are irritated when a colleague asserts that your primary advice about prophylactic hormone replacement therapy was wrong and that you should have recommended exactly the opposite. You resolve to revisit this disagreement, armed with the best evidence.
You begin by using Grateful Med to look for a recent overview because many articles about prophylactic hormone replacement therapy (HRT) have appeared recently, your time is short, and your patient would want to know about all significant benefits and harms associated with HRT. On the first subject line of the Grateful Med search, you select "estrogen replacement therapy" by marking this as a major concept in the list of Medical Subject Headings (MeSH) that Grateful Med associates with the term "estrogen". After limiting your search to English language reviews (Publication Type = "review"), you still have 131 articles to consider. A quick scan of the first 25 titles reveals diverse topics, including the effect of HRT on lipid profiles, bone density, fracture rates and the incidence of endometrial, cervical and breast cancer. Knowing that "practice guideline" is among the publication types listed by Grateful Med, you reason that clinical practice guidelines might address multiple HRT-related outcomes at one time, and thus provide you with the most efficient access to the best summary or summaries of the available data. A repeat search with the new publication type yields 5 citations. Two of these are "technical bulletins" of the American College of Obstetrics and Gynecology,  one is written for surgeons,  one is a recent guideline from the American College of Physicians (ACP),  and the last is a commentary on the ACP guideline.  Observing that the ACP guideline is published together with a systematic overview of the evidence supporting its recommendations,  you begin your review of issues in HRT decision-making with the ACP guideline.
Clinicians serve patients by addressing each individual's health care needs. This includes recognizing important health problems, considering sensible options for managing each problem, interpreting evidence about the outcomes of each option, and ascertaining patient preferences for each outcome. Increasingly, clinicians must also consider the resource implications of their decisions. This involves detecting, treating, palliating and preventing health problems in a way that maximizes the public good achieved with available resources.
To meet patients' expectations, individually and in aggregate, clinicians face intimidating tasks of information management. Overviews can help by systematically gathering, selecting and combining evidence that links options to outcomes. Clinical decision analyses can help by refining questions and exploring the trade-offs between competing benefits and harms. Economic analyses can help by tallying the costs associated with different options. While useful, these approaches do not always synthesize information in a way that directly supports specific clinical recommendations.
Clinical practice guidelines, which have been defined as "systematically developed statements to assist practitioner and patient decisions about appropriate health care for specific clinical circumstances,"  represent an attempt to distill a large body of medical knowledge into a convenient, readily useable format.  Like overviews, they gather, appraise and combine evidence. Guidelines, however, go beyond most overviews in attempting to address all the issues relevant to a clinical decision and all the values that might sway a clinical recommendation. Like decision analyses, guidelines refine clinical questions and balance trade-offs. Guidelines differ from decision analyses in relying more on qualitative reasoning and in emphasizing a particular clinical context.
Guidelines make explicit recommendations, often on behalf of health organizations, with a definite intent to influence what clinicians do. These suggestions about what should be done go beyond a simple presentation of evidence, costs, or decision models. They reflect value judgments about the relative importance of various health and economic outcomes in specific clinical situations. As a result, they should be required to pass unique tests about how matters of opinion, in addition to matters of science, are handled.
When appraising a consultant's counsel, we are impressed if she states and explains her suggestions clearly, discusses alternatives, and acknowledges possible biases and extenuating circumstances. We can use this common sense approach to assess the validity, importance and applicability of clinical practice guidelines. In this article, we offer suggestions for deciding whether to use a clinical practice guideline in formulating one's own clinical policies (Table 1). Our focus is on evaluation of interventions, including prevention, diagnosis, and therapy, that are designed to improve important patient outcomes. For prevention and diagnosis, this involves looking beyond the accuracy of the test to the ultimate consequences of choosing a diagnostic strategy on patients' morbidity, mortality, and health-related quality of life.
Table 1: Users' Guides for a Practice Guideline
I. Are the results of the study valid?
II. What were the results?
III. Will the results help me in caring for my patients?
We use the same basic questions as the users' guides for original research articles, overviews, and decision analyses. Are the recommendations valid? If they are, what are the recommendations and will they be helpful in patient care? To answer these questions, we draw upon an emerging literature about practice guideline development and evaluation, while emphasizing the perspective of practitioners who must adopt, adapt, or reject recommendations. Busy clinicians might hope that criteria for appraising practice guidelines would obviate the need for reviewing how the guideline developers have brought together the evidence, and how they have chosen the values reflected in their recommendations. Unfortunately, any shortcuts that bypass at least a cursory look at evidence and values will leave the clinician open to being misled by guidelines that may be based on a biased selection of evidence, a skewed interpretation of that evidence, or an idiosyncratic set of values. Shortcuts that do not highlight health conditions and interventions, patients and practitioners, benefits and harms will leave the clinician open to misapplication of guidelines in clinical practice.
You need to determine whether guideline developers used appropriate methods and adduced evidence that support the recommendations made. If developers do not include, in their policy statement or in a supporting article, information about how they chose options and outcomes, selected evidence, and decided on values, you might suspect that these steps were not done systematically.  In any case, you cannot evaluate such guidelines and their recommendations probably should not influence your decision-making.
Guidelines pertain to decisions and decisions involve choices and consequences. To appreciate why a particular practice is recommended, you should check to see that guideline developers have considered all reasonable practice options, and all important potential outcomes.
Whether developers present guidelines for prevention, diagnosis, therapy, or rehabilitation, they should specify both the interventions of interest and sensible alternative practices. For example, in a guideline based on a careful systematic literature review,  the American College of Physicians offers recommendations about medical interventions for preventing strokes.  While carotid endarterectomy is mentioned as a possible surgical intervention in the preamble to the guideline, the procedure is not considered in the recommendations themselves. This guideline could have been strengthened if medical interventions for transient ischemic attacks had been placed in a management context that included the highly effective surgical procedure. 
In its HRT guideline, the ACP makes recommendations about counseling women who are postmenopausal and are considering HRT to prevent disease and to prolong life.  The interventions they considered were long term daily prophylaxis (10 to 20 years) with 0.625 mg of oral conjugated estrogen, daily estrogen and medroxyprogesterone acetate (2.5 mg orally per day or 5 to 10 mg on days 10 to 14 of the month), short term HRT therapy (1 to 5 years), or no prophylactic hormone use. The guideline did not consider calcium supplementation, newer estrogen delivery systems, or other approaches to the prevention of osteoporosis-related fractures.
Guideline developers must consider not only all the best management options, but all the important consequences of the options. As a clinician looking after individual patients, you look for information on morbidity, mortality, and quality of life and you must decide if the guideline ignores outcomes that your patients would care about. As a practitioner interested in using resources efficiently, you must also mind economic outcomes. Whether developers examine economic outcomes at all, and if they do whether they look at costs from the patients', insurers', or health care system perspective, or consider broader issues such as the consequences of time lost from work, can strongly influence final recommendations.  The majority of published guidelines do not include formal cost analyses, those that do use a variety of analytic techniques, and it will be difficult for you to determine whether actual cost estimates are valid or applicable for your practice setting. You can gain a better understanding of the potential importance of these issues by seeing if the economic projections are subjected to sensitivity analysis. If so, you can gauge the extent to which guideline recommendations might change if assumptions about costs change. You can also check to see if the guideline developers offer clinically relevant comparisons. For example, the average cost of preventing one cardiovascular-related death by means of HRT might be compared to the cost of doing the same by means of cholesterol reduction, blood pressure control or smoking cessation counseling.
In its HRT guideline, the ACP used lifetime probability of developing endometrial cancer, breast cancer, hip fracture, coronary heart disease, and stroke, and median life expectancy to estimate risks and benefits for subgroups of women. They acknowledged possible HRT effects on serum lipoproteins, uterine bleeding, sexual and urinary function, and the need for endometrial surveillance by biopsy, but did not include these considerations in the model used to synthesize evidence. The effects of HRT on costs and quality of life, which could have a major impact on patient choices, were not explicitly considered.
The users' guide on overviews includes criteria that can be used to judge whether guideline developers have done an adequate job in accumulating and synthesizing evidence.  Developers should specify a focused question, define appropriate evidence using explicit inclusion and exclusion criteria, conduct a comprehensive search, and examine the validity of the results in a reproducible fashion.
The best guidelines define admissible evidence, report how it was selected and combined, make key data available for your review, and report that they found randomized trials which link the interventions to the outcomes. Such randomized trials may, however, be unavailable, and guideline developers are in a different position from the authors of overviews who may abandon their project if there are not any high quality studies to summarize. Many important clinical problems are technically, economically or ethically difficult to address with randomized clinical trials. Because guideline developers must deal with inadequate evidence, they may have to consider a variety of studies as well as reports of expert and consumer experience. They must formulate recommendations but they should be candid about the type and quantity of evidence upon which those recommendations are based.
The nature and appropriate use of expertise is one of the most hotly debated areas in guideline development. Sometimes "experts" have pre-eminent knowledge of the basic science, pathophysiology and natural history of a health condition. They may also be distinguished by extensive direct clinical experience. Persons who have witnessed and understood the limitations of clinical trials in the clinical domain offer another dimension of expertise. For some guidelines, extra emphasis may be placed on the expertise of generalists, who can gauge the practical implications of interventions applied to large groups. Although the RAND Corporation and others have developed protocols for recording and quantifying expert assessments of the appropriateness of health interventions, guideline developers must decide what type of expert opinion to solicit and how to incorporate it into the evidential foundation for guideline development. You are unlikely to find systematic methods for selecting, capturing and grading relevant expertise in today's guidelines, but you should try to determine whether and how expert opinion was used to fill in gaps in the evidence from clinical trials.
A quality of evidence scale can be used to rate different categories of evidence (e.g., expert opinion or clinical investigation) and methods for producing it (e.g., blinded or non-blinded outcome assessment) according to the likelihood that the source or design will yield biased results.  Developers working on a different problem with a different supporting literature may devise an evidence filtering instrument that stratifies case-control studies into categories of differing quality.  The prospective development and application of a systematic approach to appraising and classifying evidence is important because this means that the strength of the evidence in support of the recommendations can be reported. Strategies for summarizing the strength of both evidence and recommendations will be addressed in the second of our articles about using practice guidelines, which deals with interpreting and applying the results.
The ACP HRT guideline developers searched MEDLINE (1970 to 1991), citations from articles, and expert consultants to identify studies published in English about the treatment options and outcomes. They conducted formal overviews, including meta-analysis, and derived summary estimates of relative risks and lifetime probabilities of the principal outcomes with and without HRT for subgroups of women. These subgroups included women without risk factors, women at increased risk for coronary disease, hip fracture, or breast cancer; and women who had a hysterectomy. Their overviews met the validity criteria we have suggested. In most cases, randomized trials had not been conducted, and the investigators relied on observational studies. Therefore, they appropriately conducted sensitivity analyses to determine the implications if the results of observational studies represented over- or under-estimates of the true effect of the interventions on the relevant outcomes.
Linking treatment options to outcomes is largely a question of fact and a matter of science. In contrast, assigning preferences to outcomes is largely a question of opinion and a matter of value. The extent to which HRT increases the incidence of breast cancer or decreases death rates from myocardial infarction can be ascertained from the evidence. The relative importance placed on avoiding breast cancer or cardiovascular disease depends upon what patients care about most. Consequently, it is important that guideline developers report the sources of their value judgments and the method by which consensus was sought.
You should look for information about who was explicitly involved in assigning values to outcomes, or who, by influencing recommendations, was implicitly involved in assigning values. Expert panels and consensus groups are often used to determine what a guideline will say. You need to know who the panel members are, bearing in mind that panels dominated by members of specialty groups may be subject to intellectual, territorial, and even financial biases (some organizations screen potential panel members for conflicts of interest, others do not). By identifying the agencies that have sponsored and funded guideline development, you can decide whether their interests or delegates are over-represented on the consensus committee. Panels which include a balance of research methodologists, practicing generalists and specialists, and public representatives are more likely to have considered diverse views in their deliberations.
Even with broad representation, the actual process of deliberation can influence recommendations. You should therefore look for a report of methods used to synthesize preferences from multiple sources. Informal and unstructured processes for arbitrating values may be vulnerable to undue influence by individual panel members, particularly the panel chair. Appropriate, structured, processes increase the likelihood that all important values are duly considered. 
It is particularly important to know how patient preferences were considered. Health interventions have beneficial and harmful effects, and they have associated costs, and recommendations may differ depending on our relative emphasis on specific benefits, harms and costs. What is the relative importance of an uncertain risk for increases in breast cancer versus a fairly clear expectation of decreased incidence of heart attacks and strokes? Many guideline reports, by their silence on the matter of patient preferences, assume that guideline developers adequately represent patients' interests. Methods for directly assessing patient and societal values exist but are rarely used by guideline developers. You may be limited to gauging whether the values implicit in the guideline appear to favor patient, third party (e.g., reimbursement agencies), or societal priorities. You can also consider which ethical principles -- such as patient autonomy (the patient's control over decisions about their health), nonmaleficence (avoiding harm), or distributive justice (the fair distribution of health care resources) -- prevailed in guiding decisions about the value of alternative interventions. For guidelines based on formal risk-benefit and cost-benefit analyses, declarations of acceptable levels of risks and cost per benefit achieved can help you make comparisons across guidelines.
Variation (disagreement) and uncertainty (ambivalence) in values could affect summary recommendations and so should be recorded and reported by guideline developers. The clinical problems for which practice guidelines are most needed often involve complex tradeoffs between competing benefits, harms and costs, usually under conditions of uncertainty. Even in the presence of strong evidence from randomized clinical trials, the effect size of an intervention may be marginal or the intervention may be associated with costs, discomforts, or impracticalities that lead to disagreement or ambivalence among guideline developers about what to recommend. Explicit strategies for documenting, describing, and dealing with dissent among judges, or frank reports of the degree of consensus attained, can help you decide whether to adopt or adapt recommendations. Unfortunately, until guideline development methods mature, you will rarely find this information.
An example of the implicit, and perhaps questionable, value judgments guideline developers make comes from the American College of Physicians recommendations for medical therapies to prevent stroke. This guideline recommended that aspirin be considered the drug of choice in patients with transient ischemic attacks, and suggested that ticlopidine be reserved for patients who do not tolerate aspirin. The best estimate of the effect of ticlopidine relative to aspirin in patients with transient ischemic attacks is a 15% reduction in relative risk, a benefit that would translate into preventing one stroke for every 70 patients treated in a group of patients with a 10% risk of stroke. The ACP presumably makes its recommendation that aspirin, not ticlopidine, be the drug of choice for patients with transient ischemic attack on the basis of the increased cost of ticlopidine, and the need for checking the white blood cell count in patients receiving ticlopidine. This implicit value judgment could be questioned, and the guideline would be strengthened if the authors had made the values that underlie their judgment explicit.
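The arithmetic behind the "one stroke for every 70 patients" figure can be checked with a short sketch. Only the 15% relative risk reduction and the 10% baseline risk come from the text; the function name and the alternative baseline risks are illustrative assumptions, included to show how strongly the number needed to treat depends on the patient's starting risk.

```python
from fractions import Fraction
import math


def number_needed_to_treat(baseline_risk, relative_risk_reduction):
    """Patients to treat to prevent one event: 1 / (baseline risk x RRR)."""
    absolute_risk_reduction = baseline_risk * relative_risk_reduction
    return 1 / absolute_risk_reduction


# Figures quoted in the text: 10% baseline stroke risk, 15% relative risk reduction.
nnt = number_needed_to_treat(Fraction(10, 100), Fraction(15, 100))
print(f"NNT at 10% baseline risk: {float(nnt):.1f} (about {math.ceil(nnt)})")

# Hypothetical baseline risks (not from the guideline), shown only to
# illustrate how the NNT changes with the patient's starting risk.
for baseline in (Fraction(5, 100), Fraction(20, 100)):
    value = number_needed_to_treat(baseline, Fraction(15, 100))
    print(f"NNT at {float(baseline):.0%} baseline risk: {float(value):.1f}")
```

The exact result at a 10% baseline risk is about 67 patients, which the text rounds to 70.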
In the case of the ACP HRT guideline, the developers gave priority to outcomes that are major contributors to morbidity and mortality in North America (e.g., the effect of long-term estrogen use on risk of death from myocardial infarction, osteoporosis-related fractures, and endometrial cancer), but acknowledged that other considerations may be as important as preventing disease and death for some women (e.g., resumption of menses, changes in mood and sexual function). The task of assigning relative value to different types of morbidity or causes of mortality is left to patients and their clinicians.
Guidelines often concern controversial health problems about which new knowledge is actively sought in ongoing studies. Because of the time required to assemble and review evidence and achieve consensus about recommendations, the guideline may be out of date by the time you see it. You should look for two important dates: the publication date of the most recent evidence considered and the date on which the final recommendations were made. Some authorities also identify important studies in progress and new information that could change the guideline. Ideally, these considerations may be used to qualify guidelines as "temporary" or "provisional," to specify dates for expiration or review, or to identify key research priorities. For most guidelines, however, you must scan the bibliography to get an impression of how current a particular guideline may be. The ACP HRT guideline gives dates for evidence considered (1970-1991) and final approval (March, 1992). The guideline acknowledged that its advice about use of estrogen in combination with a progestin was limited by uncertainty about whether the progestin neutralizes the beneficial effects of estrogen on risk factors for unwanted cardiovascular outcomes. The guideline did not alert readers to watch for results from the Postmenopausal Estrogen/Progestin Interventions (PEPI) trial, initiated in 1988, which would directly address that uncertainty. An early report from the PEPI group concludes that estrogen alone or in combination with a progestin improves lipoproteins and lowers fibrinogen levels without detectable effects on insulin or blood pressure. 
People may interpret evidence differently and their values may differ, and guidelines are subject to both sorts of differences. Your confidence in the validity of a guideline increases if external reviewers have judged the conclusions reasonable, and clinicians have found the guidelines applicable in practice. If the guidelines differ from those adduced by others, you should look for an explanation. On the other hand, if the guidelines meet the first four validity criteria and the underlying evidence is strong, rejection by clinicians or peer reviewers may have more to do with their biases than with any limitation in the validity of the guidelines.
If the underlying evidence is weak, no matter what the degree of consensus or peer review, the clinicians' confidence in the validity of the guideline will be limited. In the second part of our Users' Guide for practice guidelines, we will describe explicit frameworks for judging the strength of recommendations. The weaker the underlying evidence, the greater the argument for actually testing the guideline to determine whether its application improves patient outcomes.  The question for any such test would be: are patient outcomes better, or are outcomes equivalent at decreased cost, when clinicians operate on the basis of the practice guidelines?
Weingarten and colleagues conducted such an investigation examining the impact of implementation of a practice guideline suggesting that low-risk patients admitted to coronary care units should receive early discharge.  On alternate months over the period of a year clinicians either received or did not receive a reminder of the guideline recommendations. During the months in which the intervention was in effect, hospital stay for coronary care unit patients was approximately a day shorter and the average cost of the stay was over $1,000 less. Mortality and health status at one month were similar in the two groups. The investigators concluded that the guideline reminder reduced hospital stay and associated costs without adversely affecting measured patient outcomes. Although in this case the authors used alternate month allocation which makes the study weaker than a true randomized trial, a study of this type helps to validate the predicted consequences of guideline implementation for defined outcomes.
Once you are confident that the clinical practice guideline addresses your clinical question and is based on a rigorous up-to-date assessment of the relevant evidence, you can review the recommendations to determine how useful they will be in your practice. While not pristine, the ACP guidelines on HRT do a good job at meeting the primary criteria for using a practice guideline. We will describe how to interpret and apply the results in the next article of this series.
At the conclusion of our first article on practice guidelines in this series on how to use the medical literature, we left you examining the full text of a practice guideline that could help you marshal a convincing response to a colleague who disagrees with your approach to hormone replacement therapy (HRT) in post-menopausal women. Later that day, chatting with another colleague, you mention the disagreement. He shrugs, and avows, "it's entirely a matter of personal preference, the evidence doesn't support either of you." You return to the guideline, looking for how particular recommendations may be justified and adapted to your patient's circumstances.
To be useful, recommendations should give practical, unambiguous advice about a specific health problem. For guidelines about managing health conditions, you should determine if the intent is to prevent, screen for, diagnose, treat, or palliate the disorder. For guidelines about the appropriate uses of health interventions, the recommendations should include a definition of the intervention and its optimal role in patient management. In the ACP guideline on HRT, recommendations are divided into general observations that can help the clinician discuss with patients the potential effects of therapy, and specific management recommendations concerning what should be done in patient evaluation, risk assessment, hormone administration and follow-up in order to achieve the outcomes predicted by the available evidence.
To be clinically important, a practice guideline should convince you that the benefits of following the recommendations are worth the expected harms and costs. You should consider both the relative and absolute changes in outcomes. A 25% reduction in relative risk of death from a disease is much more compelling if it involves a reduction in the proportion of deaths from 40/100 to 30/100 (an absolute risk reduction of 10 in 100), than if it involves a reduction in the proportion of deaths from 4/100 to 3/100 (an absolute risk reduction of 1 in 100). 
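The contrast between relative and absolute changes can be made concrete with a small calculation. This is a minimal sketch using the two scenarios from the paragraph above; the function name is an illustrative assumption, and exact fractions are used so that the results are not blurred by floating-point rounding.

```python
from fractions import Fraction


def risk_reductions(control_deaths, treated_deaths, per=100):
    """Return (relative, absolute) risk reduction for deaths per `per` patients."""
    control = Fraction(control_deaths, per)
    treated = Fraction(treated_deaths, per)
    absolute = control - treated      # difference in the proportion dying
    relative = absolute / control     # that difference as a share of baseline risk
    return relative, absolute


# The two scenarios described in the text: 40/100 -> 30/100 and 4/100 -> 3/100.
for control_deaths, treated_deaths in [(40, 30), (4, 3)]:
    relative, absolute = risk_reductions(control_deaths, treated_deaths)
    print(
        f"{control_deaths}/100 -> {treated_deaths}/100: "
        f"relative reduction {float(relative):.0%}, "
        f"absolute reduction {float(absolute) * 100:.0f} in 100"
    )
```

Both scenarios give the same 25% relative reduction, but the absolute reduction is ten times larger when the baseline risk is ten times higher.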
The ACP guideline cites extensive and consistent observational data to show that unopposed estrogen therapy (ET) reduces the lifetime risk of developing CHD by about 35% (for 50 year old women with no extraordinary CHD risks, about 12 out of 100 would be spared CHD in their lifetime) and hip fractures by about 15% (2 to 3 out of 100 avoid hip fracture because of ET use). In women with a uterus who take unopposed ET, the risk of developing endometrial cancer increases up to 8 fold (approximately 17 women out of 100 who take ET and would not otherwise have developed endometrial cancer will develop the disease) and the risk for breast cancer may increase as much as 25% (absolute increase of about 3 out of 100 women). Clearly, the relative increases or decreases in outcomes can be misleading if baseline risks and absolute changes in outcomes are not reported. Addition of progestin maintains hip fracture risk reduction and removes the increased risk of endometrial cancer, but has uncertain effects on risks for breast cancer and cardiovascular disease. HRT can increase life expectancy by 10 months to 2 years, depending on the presence of risk factors, a gain similar to that achieved by treatment of hypertension. The guideline did not consider personal or societal costs associated with HRT.
The "strength," "grade," "confidence," or "force" of a recommendation should be informed by multiple considerations: the quality of the investigations which provide the evidence for the recommendations, the magnitude and consistency of positive outcomes relative to negative outcomes (adverse effects, burdens to the patient and the health care system, costs), and the relative value placed upon different outcomes. Even in the presence of strong evidence from randomized clinical trials, the effect size of an intervention may be marginal. The intervention may be associated with costs, discomforts, or impracticalities that downgrade the strength of a summary recommendation about what practicing clinicians should do. It is very important for you to consider this distinction and to scrutinize a guideline document for what, in addition to evidence, determines the wording of actual recommendations. These factors are key to understanding apparent conflicts among guidelines on similar topics from different organizations. 
In our first article about using practice guidelines, we pointed out that the best available evidence about the effects of health interventions may come from sources as diverse as, on the one hand, well-conducted randomized trials and, on the other, expert opinion. Thus, users of practice guidelines will find tremendous variability in strength of the evidence linking options and outcomes. Among guidelines developed by different groups about the same health condition or intervention, however, there should be little variability in estimates of the strength of evidence as long as the supporting overviews considered the same body of literature. Here, differences in recommendations probably reflect differences in the relative value placed on various health and economic outcomes. Unfortunately, these considerations are rarely exposed in guideline documents and there is no commonly accepted approach for grading evidence or recommendations. To answer the question "how strong are the recommendations?", it is enough to discern whether the clinical importance of a recommendation, determined by the size of expected positive and negative outcomes, costs and consequences, is sufficient to motivate a change in your practices.
Formal taxonomies of "levels of evidence" and "grades of recommendations" were first popularized by the Canadian Task Force on the Periodic Health Examination (CTF), and later revised in cooperation with the United States Preventive Services Task Force (USPSTF). Like previous articles in this series, these guideline developers emphasized that the strongest evidence comes from rigorous randomized trials, and weaker evidence from observational studies using cohort or case-control designs (Table 2). Inferring strength of evidence from study design alone, however, may overlook other determinants of the quality of evidence, such as sample size, recruitment bias, losses to follow-up, unmasked outcome assessment, atypical patient groups, irreproducible interventions, impractical clinical settings, and other threats to internal and external validity. Moreover, results from a single randomized clinical trial with a small sample size are not necessarily more convincing than consistent results with high precision from a large number of high-quality trials of non-randomized design conducted in a variety of places and times. Recent proposals for summarizing strength of evidence have emphasized the need for overviews to filter out studies with major design flaws, and for meta-analyses to consider the precision, magnitude and heterogeneity of study results. The USPSTF now supplements its "study design categories" with prose descriptions of flaws in the published evidence.
Table 2: Levels of Evidence*
* Canadian Task Force on the Periodic Health Examination: The periodic health examination: 2. 1987 update. Can Med Assoc J 1988;138:618-26.
Another approach to categorizing evidence from multiple studies offers a hierarchy from overviews of observational studies with inconsistent results to overviews of randomized trials with consistent results (Table 3). Since inferences about the health effects of interventions are weakened when there are unexplained major differences in effects in different studies, guidelines based on randomized trials are stronger when the results of individual studies are similar, and weaker when major differences between studies (heterogeneity) are present. If the evidence linking interventions and outcomes came from overviews of articles, you could apply the criteria for a valid overview and the schema in Table 3 to decide on the strength of evidence supporting recommendations.
Table 3: Grades of Recommendations for a specified level of baseline risk
*RCT - Randomized controlled trial
CI - Confidence interval
NNT - Number needed to treat to avoid one unwanted outcome
This approach is constrained by its focus on only one major outcome (for HRT we are interested in many outcomes), but it exemplifies how the strength of evidence and the strength of recommendations could be integrated on a common scale. It considers study design, heterogeneity, effect size, confidence intervals around the effect sizes, and threshold effect sizes over which negative outcomes outweigh the benefits. The threshold effect size presumes value judgments about the relative importance of various outcomes resulting from the health intervention have been applied. In principle, strong recommendations are warranted when the smallest effect compatible with the data (the lower boundary of the confidence interval) is still greater than the threshold below which the negative outcomes outweigh the benefits.
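The threshold rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any guideline developer: the function names, the example risk reductions, and the threshold value are all hypothetical. Effect sizes are expressed as absolute risk reductions (ARR), using the standard relation NNT = 1/ARR.

```python
def nnt(arr):
    """Number needed to treat, given an absolute risk reduction (ARR > 0)."""
    if arr <= 0:
        raise ValueError("NNT is defined here only for a positive risk reduction")
    return 1.0 / arr


def ci_versus_threshold(ci_lower, ci_upper, threshold):
    """Compare a confidence interval for the benefit with a threshold effect size.

    All three arguments are absolute risk reductions. The threshold is the
    smallest benefit that still outweighs the negative outcomes; choosing it
    requires value judgments that this sketch does not model.
    """
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    if ci_lower > threshold:
        return "benefit exceeds threshold across the whole interval"
    if ci_upper < threshold:
        return "benefit falls below threshold across the whole interval"
    return "interval overlaps threshold"


# Hypothetical numbers for illustration only (not taken from the HRT overview).
threshold_arr = 0.02  # assumed: at least 2 fewer events per 100 patients treated
print(ci_versus_threshold(0.03, 0.08, threshold_arr))  # precise estimate
print(ci_versus_threshold(0.01, 0.08, threshold_arr))  # imprecise estimate
print(round(nnt(threshold_arr)))  # the same threshold expressed as an NNT
```

Only the first case, where even the lower boundary of the interval clears the threshold, would support a strong recommendation under this rule. In the second case the data remain compatible with an effect too small to justify the intervention.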
If the guidelines are developed on the basis of observational studies or if the estimate of the magnitude of the treatment effect is imprecise, the user should not expect strong recommendations unless there are major harms and costs associated with the intervention or a catastrophic outcome (e.g., death) may be prevented by a low-risk, low-cost intervention of probable efficacy. Guideline developers could compensate for weak evidence by testing the effect of their guideline on patient outcomes in a real-world clinical situation. Such a study, if methodologically strong, could enhance the strength of the recommendations in the absence of strong evidence from original studies.
While the ACP HRT guideline does not grade its recommendations, the guideline does cross-reference recommendations to discussions about evidence and effect sizes in the accompanying overview. Because the guideline is based largely on observational studies, the recommendations are relatively weak, and would be categorized as C1 in the schema we present in Table 3.
Guideline developers should consider the possibility that the effect of a management option on an outcome, or the relative value of different outcomes, is much greater, or much less, than their best estimate. We have discussed how to examine this possibility, a process we call sensitivity analysis, in the users' guide for decision analysis.  The weaker the evidence linking intervention and outcome, and the greater the possible range of competing values, the greater the need for a sensitivity analysis. For example, the range of plausible estimates of the impact of HRT on breast cancer is very wide, and guideline developers should test how their recommendations would differ across the range of possible effects. When the evidence is of the weakest sort, arising from expert opinion, sensitivity analysis is essential.
The authors of the HRT guideline acknowledge that the observational design of the studies may introduce bias, and they alert us to areas where the evidence is particularly weak (such as the effect of combined estrogen and progestins on breast cancer). They don't, however, provide a formal sensitivity analysis. Such a sensitivity analysis might have been useful in highlighting the uncertainty of many of the estimates on which the recommendations are based, particularly those relating to life expectancy.
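A one-way sensitivity analysis of the kind discussed above can be sketched as follows. This is an illustrative toy model, not the analysis the guideline authors might have performed: the net-benefit formula, the function names, and every number are assumptions made for the example, and a real analysis would weigh several outcomes and patient values at once.

```python
def net_benefit(benefit, baseline_harm_risk, harm_relative_risk, harm_weight=1.0):
    """Events avoided per patient minus weighted excess harmful events.

    benefit: assumed absolute reduction in risk of the outcome being prevented.
    baseline_harm_risk: assumed risk of the harmful outcome without treatment.
    harm_relative_risk: relative risk of the harmful outcome on treatment.
    harm_weight: assumed value of one harmful event relative to one event prevented.
    """
    excess_harm = baseline_harm_risk * (harm_relative_risk - 1.0)
    return benefit - harm_weight * excess_harm


def one_way_sensitivity(benefit, baseline_harm_risk, relative_risks, harm_weight=1.0):
    """Return (relative_risk, net_benefit, recommend) for each plausible value."""
    rows = []
    for rr in relative_risks:
        nb = net_benefit(benefit, baseline_harm_risk, rr, harm_weight)
        rows.append((rr, nb, nb > 0))
    return rows


# Hypothetical inputs for illustration only (not estimates from the HRT overview).
rows = one_way_sensitivity(
    benefit=0.03,
    baseline_harm_risk=0.10,
    relative_risks=[1.0, 1.25, 1.5, 2.0],
)
for rr, nb, recommend in rows:
    print(f"RR={rr:.2f}  net benefit={nb:+.3f}  recommend={recommend}")

# The recommendation is robust only if it is the same at every plausible value.
robust = len({recommend for _, _, recommend in rows}) == 1
print("recommendation robust across range:", robust)
```

With these made-up inputs the recommendation changes partway through the range of relative risks, which is the situation in which a guideline should state its uncertainty explicitly.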
You should try to anticipate how a guideline will be used. Guidelines may be disseminated to assist physicians with clinical decision making (for example, clinical algorithms and reminders), to enable evaluation of physician practices (for example, utilization review, quality assurance), or to set limits on physician choices (for example, recertification, reimbursement). Guidelines may be directed at different practitioners. For example, some guidelines about detection and treatment of depression are aimed at primary care providers and others at psychiatrists. You should ensure that the purpose of the guideline matches the use you intend for it.
To be really useful, guidelines should describe interventions well enough for their exact duplication. You must determine whether your patients are the intended target of a particular guideline. If your patients have a different prevalence of disease or risk factors, for instance, the guidelines may not apply.
The flexibility of the guideline may be indicated by patient or practice characteristics that require individualizing recommendations or that justify departures from the recommendations. For example, the American College of Cardiology, the American Heart Association, and the American College of Physicians advise against using electrocardiograms to screen asymptomatic adults, but they acknowledge that this advice may not be valid for persons who smoke, are male and of "increased age," have a family history of coronary artery disease, have hypertension, diabetes or other cardiovascular risk factors, are sedentary, or whose occupation affects public safety. The caveats reflect reluctance to make recommendations in the absence of good evidence. They also exclude groups of patients who, in total, may account for a majority of an internist's patients!
You should look for information that must be obtained from and provided to patients and for patient preferences that should be considered. It is important to consider whether the values assigned (implicitly or explicitly) to alternative outcomes could differ enough from your patients' preferences to change a decision about whether to adopt a recommendation.
When you review the HRT guidelines, you may begin to understand why your colleague in the scenario with which this article began felt that recommendations regarding HRT must be different for every patient. In its HRT guideline, the ACP offers separate recommendations for women at increased risk for CHD, women at increased risk for hip fracture and for breast cancer, and women who have had a hysterectomy. These different recommendations reflect the fact that different women are at varying risk of adverse outcomes, and the impact of HRT on them will therefore differ. The most vivid example is women who have had a hysterectomy: since they are not at risk of endometrial cancer, unopposed estrogen is much more likely to be the right treatment choice.
The ACP recommends that all women consider taking preventive hormone therapy, while admitting that no evidence supports strong advice except for some women who are at increased risk for some outcomes. The guidelines suggest that women at increased risk for CHD are likely to achieve longevity gains from HRT, but that conclusion needs to be confirmed by randomized trials. HRT is likely to decrease the risk of hip, vertebral and wrist fractures, but, without a progestin, the risk of endometrial cancer increases up to eightfold. Women who have had a hysterectomy should take estrogen therapy alone; others should add a progestin or comply with careful endometrial monitoring. The effect of estrogen on breast cancer appears to be small, but the evidence is weak and many women may not be willing to "take a chance," particularly if they bear low or average risks for CHD. Clinicians should assess risks, estimate benefits and harms, educate patients, and facilitate individualized decision making for all postmenopausal patients.
There is certainly much more to making decisions about HRT than perhaps you or your colleague had at first appreciated. There are many options, multiple outcomes, and significant trade-offs in benefits and harms. A good guideline, based upon solid scientific evidence and an explicit process for judging the value of alternative practices, allows you to review, at one sitting, links between multiple options and outcomes. Unfortunately, well developed and usefully summarized guidelines are still rare in the clinical literature. We hope that more consistent reporting of guideline development methods will prevail, making the guidelines literature more accessible to and useful for prospective guideline users. 
© 2001 Evidence-Based Medicine Informatics Project
© 2001 Centre for Health Evidence.