Evidence-based Medicine Toolkit Carl Heneghan and Douglas Badenoch Centre for Evidence-based Medicine, Nuffield Department of Clinical Medicine, John Radcliffe Hospital, Headington, Oxford © BMJ Books 2002 BMJ Books is an imprint of the BMJ Publishing Group All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording and/or otherwise, without the prior written permission of the publishers.
First published in 2002 by BMJ Books, BMA House, Tavistock Square, London WC1H 9JR
www.bmjbooks.com
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
ISBN 0 7279 1601 7
Typeset by Newgen Imaging Systems Pvt. Ltd.
Printed and bound in Spain by GraphyCems, Navarra

Contents

Introduction
Asking answerable questions
Finding the evidence
Appraising therapy articles
Appraising diagnosis articles
Appraising systematic reviews
Appraising articles on harm/aetiology
Appraising prognosis studies
Applying the evidence
Evidence-based medicine: glossary of terms
Selected evidence-based healthcare resources on the web
Levels of evidence and grades of recommendations
Study designs
Critically appraised topics (CATs)
Index

This handbook was compiled by Carl Heneghan and Douglas Badenoch.
The materials have largely been adapted from previous work by those who know better than us, especially other members of the Centre for Evidence-based Medicine (Chris Ball, Martin Dawes, Jonathan Mant, Bob Phillips, David Sackett, Kate Seers, Sharon Straus) and CASPfew (Steve Ashwell, Anne Brice, Andre Tomlin). Introduction This “toolkit” is designed as a summary and reminder of the key elements of practising evidence-based medicine (EBM). It has largely been adapted from resources developed at the Centre for Evidence-based Medicine. For more detailed coverage, you should refer to the other EBM texts and web pages cited throughout.
The first page of each chapter presents a “minimalist” checklist of the key points. Further sections within each chapter address these points in more detail and give additional background information. Ideally, you should just need to refer to the first page to get the basics, and delve into the further sections as required. Occasionally, you will see the dustbin icon on the right. This means that the question being discussed is a “filter” question for critical appraisal: if the answer is not satisfactory, you should consider ditching the paper and looking elsewhere.
If you don’t ditch the paper, you should be aware that the effect it describes may not appear in your patient in the same way.

Definition of Evidence-based Medicine

Evidence-based Medicine is the “conscientious, explicit and judicious use of current best evidence in making decisions about individual patients”. This means “integrating individual clinical expertise with the best available external clinical evidence from systematic research”.1

We can summarise the EBM approach as a five-step model:

1. Asking answerable clinical questions.
2. Searching for the evidence.
3. Critically appraising the evidence for its validity and relevance.
4. Making a decision, by integrating the evidence with your clinical expertise and the patient’s values.
5. Evaluating your performance.

Reference
1. Sackett DL et al. Evidence based medicine: what it is and what it isn’t. BMJ 1996;312:71–2.

Asking answerable questions

The four main elements of a well-formed clinical question are:

1. Patient or Problem
2. Intervention
3. Comparison intervention (if appropriate)
4. Outcome(s)

Element | Tips | Specific example
Patient or Problem | Starting with your patient, ask “How would I describe a group of patients similar to mine?” | “In women over 40 with heart failure from dilated cardiomyopathy …”
Intervention | Ask “Which main intervention am I considering?” | “… would adding anticoagulation with warfarin to standard heart failure therapy …”
Comparison intervention | Ask “What is the main alternative to compare with the intervention?” | “… when compared with standard therapy alone …”
Outcome | Ask “What can I hope to accomplish?” or “What could this exposure really affect?” | “… lead to lower mortality or morbidity from thromboembolism.”

The terms you identify from this process will form the basis of your search for evidence, and the question will be your guide in assessing its relevance. Bear in mind that how specific you are will affect the outcome of your search: general terms (such as “heart failure”) will give you a broad search, while more specific terms (for example, “congestive heart failure”) will narrow the search. Also, you should think about alternative ways or aspects of describing your question (for example, New York Heart Association Classification).

Patient or problem
Firstly, think about the patient and/or setting you are dealing with. Try to identify all of the clinical characteristics which influence the problem, which are relevant to your practice and which would affect the relevance of research you might find. It will help your search if you can be as specific as possible at this stage, but you should bear in mind that if you are too narrow in searching you may miss important articles (see next section). Intervention Next, think about what you are considering doing. In therapy, this may be a drug or counselling; in diagnosis it could be a test or screening programme.
If your question is about harm or aetiology, it may be exposure to an environmental agent. Again, it pays to be specific when describing the intervention, as you will want to reflect what is possible in your practice. If considering drug treatment, for example, dosage and delivery should be included. Again, you can always broaden your search later if your question is too narrow. Comparison intervention What would you do if you didn’t perform the intervention? This might be nothing, or standard care, but you should think at this stage about the alternatives. There may be useful evidence which directly compares the two interventions.
Even if there isn’t, this will remind you that any evidence on the intervention should be interpreted in the context of what your normal practice would be. Outcome There is an important distinction to be made between the outcome that is relevant to your patient or problem and the outcome measures deployed in studies. You should spend some time working out exactly what outcome is important to you, your patient, and the time-frame which is appropriate. In serious diseases it is often easy to concentrate on the mortality and miss the important aspects of morbidity.
However, outcome measures, and the relevant time to their measurement, may be guided by the studies themselves and not by your original question. This is particularly true, for example, when looking at pain relief, where the patient’s objective may be “relief of pain” while the studies may define and assess this using a range of different measures.1

Type of question

Once you have created a question, it is helpful to think about what type of question you are asking, as this will affect where you look for the answer and what type of research you can expect to provide the answer.
Typology for question building

1. Clinical findings: how to interpret findings from the history and clinical examination.
2. Aetiology: the causes of disease and their modes of operation.
3. Differential diagnosis: when considering the possible causes of a patient’s clinical problem, how to rank them by likelihood, seriousness and treatability.
4. Prognosis: the probable course of disease over time and prediction of likely outcomes.
5. Therapy: selection of treatments based on efficacy, cost and your patient’s values.
6. Prevention: identifying primary and secondary risk factors, leading to therapy or behavioural change.
7. Cost-effectiveness: is one intervention more cost-effective than another?
8. Quality of life: what will be the quality of life of the patient following (or without) this intervention?

Consult the Levels of Evidence table on p50–4 to see what type of study would give you the best evidence for each type of question.

Deciding which question to ask

• Which question is most important to the patient’s wellbeing? (Have you taken into account the patient’s perspective?)
• Which question is most feasible to answer in the time you have available?
• Which question is most likely to benefit your clinical practice?
• Which question is most interesting to you?

Further reading

Educational Prescriptions: http://www.cebm.net
Gray J. Doing the Right Things Right. In: Evidence Based Health-Care. New York: Churchill Livingstone, 1997, chapter 2.
Richardson W, Wilson M, Nishikawa J, Hayward RS. The well-built clinical question: a key to evidence-based decisions [editorial]. ACP J Club 1995;123:A12–13.
See also http://cebm.jr2.ox.ac.uk/docs/focusquest.tml

Finding the evidence

Convert your question to a search strategy

Identify terms which you would want to include in your search.

Patient or Problem
Intervention
Comparison
Outcome

Identify sources of evidence

1. Levels of evidence (see p50–4): what type of study would give you the best quality evidence for your question?
2. Critically Appraised Topics (see p57–61): is there a CAT available on your clinical question?
3. Secondary sources: is there a quality and relevance-filtered summary of evidence on your question, such as in ACP Journal Club or Best Evidence?
4. Systematic reviews: is there a systematic review in the Cochrane Library?
5. Bibliographic databases: in which database would you find a relevant clinical trial?

Electronic sources of evidence

Source | Availability | Advantages | Disadvantages
CATs (see p57) | http://www.cebm.net; your collection | Pre-appraised summaries for a clinical question | Only one study per CAT; time-limited; quality control
Best Evidence | CD Rom | Pre-appraised summaries filtered for clinical relevance | Limited coverage
Cochrane Library | CD Rom, online from http://www.updatesoftware.com | High-quality systematic reviews which cover a complete topic | Limited coverage, time lag, can be difficult to use
Bibliographic databases (MEDLINE, CINAHL, etc) | CD Rom, online | Original research articles, up-to-date | Difficult to search effectively, no quality filtering, bibliographic text

Secondary sources

Of course, if someone has already searched for and appraised evidence around your question, it makes sense to share that information if possible. One way this can be done, either for your own private use or for sharing with others, is in the form of Critically Appraised Topics or CATs.
Many people make their CATs available on the web (see p57) and you might like to start searching here. You should be wary, however, of the provenance of these CATs: is there an explicit quality control process which has been applied to them and have they been updated recently?

Source | Contains
Bandolier (http://www.jr2.ox.ac.uk/Bandolier) | User-friendly, searchable collection of evidence-based summaries and commentaries
TRIP (www.tripdatabase.com) | Searchable database of links to evidence-based summaries and guidelines on the web
Secondary journals, such as ACP Journal Club and Evidence-Based Medicine, publish structured abstracts which summarise the best quality and most clinically useful recent research from the literature. This is an excellent way to use the limited time at your disposal for reading, and the Best Evidence CD Rom provides quick access to the back catalogue of both of these journals. The Cochrane Library, which contains the full text of over 1 000 systematic reviews, may be your next port of call.
A good systematic review will summarise all of the high-quality published (and unpublished) research around a specific question. However, bear in mind that there may not be a systematic review which tackles your specific question, interpreting reviews can be time-consuming, and there may be more recent research which has not yet been incorporated into the review. Choosing the right bibliographic database(s) A bibliographic database consists of bibliographic records (usually with abstract) of published literature from journals, monographs, and serials.
It is important to be aware that different bibliographic databases cover different subject areas, and to search the one(s) most relevant to your needs.

Database | Coverage
CINAHL | Nursing and allied health, health education, occupational and physiotherapy, social services
EMBASE | European equivalent of MEDLINE, with emphasis on drugs and pharmacology
MEDLINE | US database covering all aspects of clinical medicine, biological sciences, education, technology, and health-related social and information sciences
PsycLIT | Psychology, psychiatry and related disciplines, including sociology, linguistics and education

Search strategies for MEDLINE and other bibliographic databases

There are two main types of strategy for searching bibliographic databases: thesaurus searching (all articles are indexed under subject headings, so if you search for a specific heading you will pick up lots of potentially relevant materials) and textword searching (where you search for the occurrence of specific words or phrases in the article’s bibliographic record).

Unfortunately, the index may not correspond exactly to your needs (and the indexers may not have been consistent in the way they assigned articles to subject headings); similarly, using textword searching alone may miss important articles. For these reasons, you should use both thesaurus and textword searching where possible.

Most databases allow you to build up a query by typing multiple statements which you can combine using Boolean operators (see below). Here is an example:

Question: In postmenopausal women, what are the effects of HRT on osteoporosis?

Textword search
#1 hormone OR ?strogen
#2 #1 AND therap*
#3 #2 OR HRT
#4 bone AND density
#5 #4 OR osteoporosis
#6 #3 AND #5

Thesaurus search
#1 Estrogen-Replacement Therapy/all subheadings
#2 Bone-Density/all subheadings
#3 Osteoporosis/all subheadings
#4 #2 OR #3
#5 #1 AND #4

It is best to start your search by casting your net wide with both textword and thesaurus searching (a high-sensitivity search, to catch all the articles which may be relevant), and progressively narrowing it to exclude irrelevant items (increasing specificity).

To increase sensitivity:
1. Expand your search using (broader terms in) the thesaurus.
2. Use a textword search of the database.
3. Use truncation and wildcards to catch spelling variants.
4. Use Boolean OR to make sure you have included all alternatives for the terms you are after (for example (myocardial AND infarction) OR (heart AND attack)).

To increase specificity:
1. Use a thesaurus to identify more specific headings.
2. Use more specific terms in textword search.
3. Use Boolean AND to represent other aspects of the question.
4. Limit the search by publication type, year of publication, etc.

Depending on which databases you use, these features might have different keystrokes or commands associated with them; however, we have tried to summarise them as best we can in the table below.

Feature | Key | Explanation
Expand thesaurus (MeSH) | (none) | Use explosion and include all sub-headings to expand your search.
Truncation | * (or $) | analy* = analysis, analytic, analytical, analyse, etc.
Wildcards | ? | gyn?ecology = gynaecology, gynecology; randomi?* = randomisation, randomization, randomised.
Boolean | AND | Article must include both terms.
Boolean | OR | Article can include either term.
Boolean | NOT | Excludes articles containing the term (for example econom* NOT economy picks up economic and economical but not economy).
Proximity | NEAR | Terms must occur close to each other (for example within 6 words) (heart NEAR failure).
Limit | variable | As appropriate, restrict by publication type (clinicaltrial.pt), year, language, possibly by study characteristics, or by searching for terms in specific parts of the document (for example diabet* in ti will search for articles which have diabetes or diabetic in the title).
Related | variable | Once you’ve found a useful article, this feature (for example in PubMed by clicking the “Related” hyperlink) searches for similar items in the database.

If you want to target high-quality evidence, it is possible to construct search strategies that will only pick up the best evidence; see the CASPfew web site for examples (http://www.phru.nhs.uk/~casp/filters.htm). Some MEDLINE services provide such search “filters” online, so that you can click them or upload them automatically. However, you might also like to check out the Levels of Evidence on p50–4 to get an idea of what type of research would yield the best quality of information for each type of question (therapy, diagnosis, prognosis, etc.).

PubMed: MEDLINE on the internet

The US National Library of Medicine now offers its MEDLINE database free on the web at http://www.pubmed.gov. Here are some quick hints to help you to get the most out of this excellent service.

• Type search terms into the query box and click GO.
• Multiple terms are automatically ANDed unless you specifically include Boolean operators in UPPER CASE, for example (hormone replacement) OR hrt.
• Search terms are automatically truncated and mapped to the thesaurus.
• You can bypass truncation by enclosing your terms in double quotes.
• You can target a specific field of the record by following your query with the field code in square brackets: bloggs j [au] will search for bloggs j in the author field.
• Use the asterisk (*) for truncation.
• The Details button allows you to view your search as PubMed translated it and to save your search (as a Bookmark in your browser).
• Once you’ve found a good article, use Related Articles to search for similar ones.

Consult PubMed’s online help for more details.

Searching the internet

You might like to begin searching the internet using a specialised search engine which focuses on evidence-based sources.
Two such services are TRIP (see above) and SUMSearch (http://sumsearch.uthscsa.edu/), which searches other websites for you, optimising your search by question type and number of hits. Generic internet search engines offer two main types of search: by category (where the search engine has classified web pages into subject categories) or by free text search (where any occurrence of a term in a web page provides you with a “hit”). Obviously, the former strategy offers greater specificity, while the latter offers better sensitivity.
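As a rough illustration of how numbered search statements trade sensitivity against specificity, the sketch below runs the textword strategy from the HRT example earlier in this chapter against five invented bibliographic records. It is not how MEDLINE or any real search service works: the records, the `textword` helper and the pattern `?estrogen` (chosen so that both spellings match in this toy data) are all assumptions made for the example.

```python
import re

# Hypothetical mini-database (record id -> title), invented for illustration.
RECORDS = {
    1: "Hormone therapy and bone density in postmenopausal women",
    2: "Oestrogen therapy for osteoporosis prevention",
    3: "HRT and fracture risk in osteoporosis",
    4: "Estrogen therapy and cardiovascular outcomes",
    5: "Calcium supplements and bone density",
}


def textword(pattern):
    """Return the ids of records containing a word that matches `pattern`.

    `*` stands for any run of letters (truncation) and `?` for zero or one
    letter (wildcard). This is a simplification of real database syntax.
    """
    regex = re.escape(pattern.lower())
    regex = regex.replace(r"\*", "[a-z]*").replace(r"\?", "[a-z]?")
    word = re.compile(r"\b" + regex + r"\b")
    return {rid for rid, text in RECORDS.items() if word.search(text.lower())}


# Each statement is a set of record ids; OR is set union, AND is intersection.
s1 = textword("hormone") | textword("?estrogen")  # OR widens the net
s2 = s1 & textword("therap*")                     # AND narrows it
s3 = s2 | textword("HRT")
s4 = textword("bone") & textword("density")
s5 = s4 | textword("osteoporosis")
s6 = s3 & s5                                      # final, most specific set

print(sorted(s1))  # [1, 2, 4]
print(sorted(s6))  # [1, 2, 3]
```

Treating each statement as a set makes the trade-off visible: every OR can only add records, and every AND can only remove them.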
In searching for clinical information on the internet, you should be wary of the provenance of the material; ask yourself first: does this website have a clear quality control policy which has been applied to the material?

Using Yahoo! (www.yahoo.com)

Yahoo has a clear selection of categories, but there is considerable overlap between them, so it is worth doing a text search, which will list all the Yahoo categories as well as individual websites.

Feature | Key | Explanation
Truncation | * | analy* = analysis, analytic, analytical, analyse, etc.
Adjacency | “” | Words must be adjacent to each other: for example “heart attack”
AND | (none) | natural childbirth: documents must contain both words
Limits | t: u: | Words must occur in title of the document (t:natural childbirth) or words must occur in web address (u:uk)

Yahoo ranks the outcome of your search: documents that contain multiple matches with your search text are ranked highest; those that match your search in the document title are next highest. Other good search engines include Google (www.google.com), which has no advertising on its simple front-end and a very user-friendly search optimisation page.
Further reading

CASPfew: http://www.phru.nhs.uk/~casp/filters.htm: includes introductory exercises, toolkit and sources guide.
CEBM: http://www.cebm.net: includes tips on how to target high-quality trials on specific question types (therapy, diagnosis, etc.).
McKibbon A et al. PDQ Evidence-Based Principles and Practice. Hamilton, ON: BC Decker, 2000.
Snowball R. Finding the evidence: an information skills approach. In M Dawes (ed.), Evidence-based Practice: a primer for health care professionals. Edinburgh: Churchill Livingstone, 1999, pp15–46.
The SCHARR guide to EBP on the internet: http://www.nettingtheevidence.org.uk.

Appraising therapy articles

Is the study valid?
1. Was there a clearly defined research question?
2. Was the assignment of patients to treatments randomised and was the randomisation list concealed?
3. Were all patients accounted for at its conclusion? Was there an “intention-to-treat” analysis?
4. Were research participants “blinded”?
5. Were the groups treated equally throughout?
6. Did randomisation produce comparable groups at the start of the trial?

Are the results important?
Relative Risk Reduction: $\mathrm{RRR} = (\mathrm{CER} - \mathrm{EER}) / \mathrm{CER}$
Absolute Risk Reduction: $\mathrm{ARR} = \mathrm{CER} - \mathrm{EER}$
Number Needed to Treat: $\mathrm{NNT} = 1 / \mathrm{ARR}$

Is the study valid?

1. Was there a clearly defined research question?
What question has the research been designed to answer? Was the question focused in terms of the population group studied, the intervention received and the outcomes considered?

2. Were the groups randomised?
The most important type of research for answering therapy questions is the randomised controlled trial (RCT).
The major reason for randomisation is to create two (or more) comparison groups which are similar. To reduce bias as much as possible, the decision as to which treatment a patient receives should be determined by random allocation.

Concealed randomisation
As a supplementary point, clinicians who are entering patients into a trial may consciously or unconsciously distort the balance between groups if they know the treatments given to previous patients. For this reason, it is preferable that the randomisation list be concealed from the clinicians.

Why is this important?
Randomisation is important because it spreads all confounding variables evenly amongst the study groups, even the ones we don’t know about.

Stratified randomisation
True random allocation can result in some differences occurring between the two groups through chance, particularly if the sample size is small. This can lead to difficulty when analysing the results if, for instance, there was an important difference in severity of disease between the two groups. Using stratified randomisation, the researcher identifies the most important factors relevant to that research question; randomisation is then stratified such that these factors are equally distributed in the control and experimental groups.

3. Were all patients accounted for at its conclusion?
There are three major aspects to assessing the follow up of trials:
• Did so many patients drop out of the trial that its results are in doubt?
• Was the study long enough to allow outcomes to become manifest?
• Were patients analysed in the groups to which they were originally assigned (intention-to-treat)?
Drop-out rates
The undertaking of a clinical trial is usually time-consuming and difficult to complete properly. If fewer than 80% of patients are adequately followed up then the results may be invalid. The American College of Physicians has decided to use 80% as its threshold for inclusion of papers into ACP Journal Club and Evidence-Based Medicine.

Length of study
Studies must allow enough time for outcomes to become manifest. You should use your clinical judgement to decide whether this was true for the study you are appraising, and whether the length of follow up was appropriate to the outcomes you are interested in.
Intention-to-treat
Sometimes, patients may change treatment aims during the course of a study, for all sorts of reasons. If we analysed the patients on the basis of what treatment they got rather than what they were allocated (intention-to-treat), we would alter the even distribution of confounders produced by randomisation. So, all patients should be analysed in the groups to which they were originally randomised, even if this is not the treatment they actually got.

4. Were the research participants “blinded”?
Ideally, patients and clinicians should not know whether they are receiving the treatment. The assessors may unconsciously bias their assessment of outcomes if they are aware of the treatment. This is known as observer bias. So, the ideal trial would blind patients, carers, assessors and analysts alike. The terms single, double and triple blind are sometimes used to describe these permutations. However, there is some variation in their usage and you should check to see exactly who was blinded in a trial. Of course, it may have been impossible to blind certain groups of participants, depending on the type of intervention.
Note also that concealment of randomisation, which happens before patients are enrolled, is different from blinding, which happens afterwards.

Placebo control
Patients do better if they think they are receiving a treatment than if they do not; the placebo effect is a widely accepted potential bias in trials. So, the ideal trial would be “double-blind” (where neither the patient nor the clinician knows whether the patient is receiving active or placebo treatment), with the randomisation list concealed from the clinician allocating treatment (see above).
In some cases, it would not be possible to blind either or both of the participants (depending on the type of intervention and outcome), but researchers should endeavour to carry out blind allocation and assessment of outcomes wherever possible. 5. Equal treatment It should be clear from the article that, for example, there were no co-interventions which were applied to one group but not the other and that the groups were followed similarly with similar check-ups. 6. Did randomisation produce comparable groups at the start of the trial?
The purpose of randomisation is to generate two (or more) groups of patients who are similar in all important ways. The authors should allow you to check this by displaying important characteristics of the groups in tabular form.

Outcome measures
An outcome measure is any feature that is recorded to determine the progression of the disease or problem being studied. Outcomes should be objectively defined and measured wherever possible. Often, outcomes are expressed as mean values of measures rather than numbers of individuals having a particular outcome.
The use of means can hide important information about the characteristics of patients who have improved and, perhaps more importantly, those who have got worse.

Are the results important?

Two things you need to consider are how large is the treatment effect and how precise is the finding from the trial. In any clinical therapeutic study there are three explanations for the observed effect:

1. Bias.
2. Chance variation between the two groups.
3. The effect of the treatment.

Once bias has been excluded (by asking if the study is valid), we must consider the possibility that the results are a chance effect.

P values
Alongside the results, the paper should report a measure of the likelihood that this result could have occurred if the treatment was no better than the control. The p value is a commonly used measure of this probability. For example, a p value of < 0.01 means that there is a less than 1 in 100 (1%) probability of the result occurring by chance; p < 0.05 means this is less than 1 in 20 probability.

Quantifying the risk of benefit and harm
Once chance and bias have been ruled out, we must examine the difference in event rates between the control and experimental groups to see if there is a significant difference. These event rates can be calculated as shown below:

 | Control | Experimental
Event | a | b
No event | c | d

Control event rate: $\mathrm{CER} = a / (a + c)$
Experimental event rate: $\mathrm{EER} = b / (b + d)$

Relative risk reduction (RRR)
Relative risk reduction is the percentage reduction in events in the treated group event rate (EER) compared to the control group event rate (CER):

$\mathrm{RRR} = (\mathrm{CER} - \mathrm{EER}) / \mathrm{CER}$

Absolute risk reduction (ARR)
Absolute risk reduction is the absolute difference between the control and experimental group.
$\mathrm{ARR} = \mathrm{CER} - \mathrm{EER}$

ARR is a more clinically relevant measure to use than RRR. This is because RRR “factors out” the baseline risk, so that small differences in risk can seem significant when compared to a small baseline risk. Consider the two sets of sample figures below, where the same RRR is found even though the treatment shows ten times greater absolute benefit in sample 1:

Sample | CER | EER | ARR | RRR
1 | 0.36 (36%) | 0.34 (34%) | 0.36 − 0.34 = 0.02 (2%) | (0.36 − 0.34) / 0.36 = 5.6%
2 | 0.036 (3.6%) | 0.034 (3.4%) | 0.036 − 0.034 = 0.002 (0.2%) | (0.036 − 0.034) / 0.036 = 5.6%
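The arithmetic behind the two samples can be checked with a few lines of Python. This is only a sketch: the function name `risk_reductions` is our own, and the inputs are the sample event rates quoted above.

```python
def risk_reductions(cer, eer):
    """Return (ARR, RRR) for a control and an experimental event rate."""
    arr = cer - eer   # absolute risk reduction
    rrr = arr / cer   # relative risk reduction
    return arr, rrr


# Sample figures from the text: same RRR, tenfold difference in ARR.
arr_1, rrr_1 = risk_reductions(0.36, 0.34)
arr_2, rrr_2 = risk_reductions(0.036, 0.034)

for label, arr, rrr in [("sample 1", arr_1, rrr_1), ("sample 2", arr_2, rrr_2)]:
    print(f"{label}: ARR = {arr:.1%}, RRR = {rrr:.1%}")
```

Both samples report an RRR of about 5.6%, while the ARR falls from 2% to 0.2%, which is why the ARR (and the NNT derived from it) is the more informative figure at the bedside.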
Number needed to treat (NNT)
Number needed to treat is the most useful measure of benefit, as it tells you the absolute number of patients who need to be treated to prevent one bad outcome. It is the inverse of the ARR:

$\mathrm{NNT} = 1 / \mathrm{ARR}$

Mortality in patients surviving acute myocardial infarction for at least 3 days with left ventricular ejection fraction 40% (ISIS-4, Lancet 1995)

Measure | Calculation | Result
Control event rate (CER), placebo | 275 / 1116 | 0.2464 (24.64%)
Experimental event rate (EER), captopril | 228 / 1115 | 0.2045 (20.45%)
Relative risk reduction (RRR) | (CER − EER) / CER = (0.2464 − 0.2045) / 0.2464 | 17%
Absolute risk reduction (ARR) | CER − EER = 0.2464 − 0.2045 | 0.0419 (4.19%)
Number needed to treat (NNT) | 1 / ARR = 1 / 0.0419 | 24 (NNTs always round UP)

Confidence intervals (CIs)
Any study can only examine a sample of a population. Hence, we would expect the sample to be different from the population. This is known as sampling error. Confidence intervals (CIs) are used to represent sampling error. A 95% CI specifies that there is a 95% chance that the population’s “true” value lies between the two limits.

The 95% CI on an NNT = 1 / the 95% CI on its ARR:

$95\%\ \text{CI on the ARR} = \pm 1.96 \sqrt{\dfrac{\mathrm{CER}\,(1 - \mathrm{CER})}{\#\ \text{of control patients}} + \dfrac{\mathrm{EER}\,(1 - \mathrm{EER})}{\#\ \text{of exper. patients}}}$
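The calculations in this section can be put together in one short Python sketch, using the ISIS-4 figures quoted above. The function name and the dictionary keys are illustrative assumptions. The standard error follows the formula given for the 95% CI on the ARR, and the limits of the CI on the NNT are taken as the reciprocals of the limits on the ARR; that only makes sense when the ARR interval does not cross zero, so the sketch returns `None` in that case.

```python
import math


def therapy_summary(control_events, control_n, exper_events, exper_n):
    """Summarise a two-arm trial from its event counts (illustrative helper)."""
    cer = control_events / control_n   # control event rate
    eer = exper_events / exper_n       # experimental event rate
    arr = cer - eer                    # absolute risk reduction
    rrr = arr / cer                    # relative risk reduction
    nnt = math.ceil(1 / arr)           # NNTs always round up (assumes arr > 0)

    # Half-width of the 95% CI on the ARR.
    half = 1.96 * math.sqrt(cer * (1 - cer) / control_n
                            + eer * (1 - eer) / exper_n)
    arr_ci = (arr - half, arr + half)

    # Reciprocals of the ARR limits; the larger ARR gives the smaller NNT.
    nnt_ci = (1 / arr_ci[1], 1 / arr_ci[0]) if arr_ci[0] > 0 else None

    return {"cer": cer, "eer": eer, "arr": arr, "rrr": rrr,
            "nnt": nnt, "arr_ci": arr_ci, "nnt_ci": nnt_ci}


# Placebo: 275 deaths in 1116 patients; captopril: 228 deaths in 1115 patients.
result = therapy_summary(275, 1116, 228, 1115)
print(f"ARR = {result['arr']:.4f}, RRR = {result['rrr']:.0%}, NNT = {result['nnt']}")
print("95% CI on ARR:", tuple(round(x, 4) for x in result["arr_ci"]))
```

For these figures the lower limit of the ARR interval stays above zero, so the interval does not cross the line of no difference.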
If a confidence interval crosses the “line of no difference” (i.e. the point at which a benefit becomes a harm), then we can conclude that the results are not statistically significant.

Relative risk (RR)
Relative risk is also used to quantify the difference in risk between control and experimental groups. Relative risk is a ratio of the risk in the experimental group to the risk in the control group.

$\mathrm{RR} = \mathrm{EER} / \mathrm{CER}$

Thus, an RR below 1 shows that there is less risk of the event in the experimental group. As with the RRR, relative risk does not tell you anything about the baseline risk, or therefore the absolute benefit to be gained.

Summary
An evidence-based approach to deciding whether a treatment is effective for your patient involves the following steps:
1. Frame the clinical question.
2. Search for evidence concerning the efficacy of the therapy.
3. Assess the methods used to carry out the trial of the therapy.
4. Determine the NNT of the therapy.
5. Decide whether the NNT can apply to your patient, and estimate a particularised NNT.
6. Incorporate your patient’s values and preferences into deciding on a course of action.

Further reading

Bandolier Guide to Bias: http://www.jr2.ox.ac.uk/bandolier/band80/b80-2.html
Dawes M et al. Evidence-Based Practice: a primer for health care professionals. Edinburgh: Churchill Livingstone, 1999, pp. 49–58.
Greenhalgh P. How to Read a Paper, 2nd ed. London: BMJ Books, 2001.
Guyatt GH et al. Users’ Guides to the Medical Literature II: How to use an article about therapy or prevention A: Are the results of the study valid? JAMA 1993;270(21):2598–601.
Guyatt GH et al. Users’ Guides to the Medical Literature II: How to use an article about therapy or prevention B: What were the results and will they help me in caring for my patients? JAMA 1994;271(1):59–63.
ISIS-4 (Fourth International Study of Infarct Survival) Collaborative Group. Lancet 1995;345:669–85. See also the CAT at www.eboncall.org
Sackett DL et al. Evidence-Based Medicine: How to practice and teach EBM. New York: Churchill Livingstone, 2000.

Appraising diagnosis articles

Is the study valid?
1. Was there a clearly defined question?
2. Was the presence or absence of the target disorder confirmed with a validated test (“gold” or reference standard)?
• Was this comparison independent from and blind to the study test results?
3. Was the test evaluated on an appropriate spectrum of patients?
4. Was the reference standard applied to all patients?

Are the results important?

                            Target disorder
                            Present    Absent    Totals
Test result    Positive     a          b         a + b
               Negative     c          d         c + d
Totals                      a + c      b + d     a + b + c + d

Sensitivity = a/(a+c)
Specificity = d/(b+d)
LR+ = sens/(1 − spec) (likelihood ratio for a positive test result)
LR− = (1 − sens)/spec (likelihood ratio for a negative test result)

Is the study valid?
1. Was there a clearly defined question?
What question has the research been designed to answer? Was the question focused in terms of the population group studied, the target disorder and the test(s) considered?
2. Was the presence or absence of the target disorder confirmed with a validated test (“gold” or reference standard)?
How did the investigators know whether or not a patient in the study really had the disease? To do this, they will have needed some reference standard test (or series of tests) which they know “always” tells the truth. You need to consider whether the reference standard used is sufficiently accurate. Sometimes, there may not be a single test that is suitable as a reference standard. A range of tests may be needed, and/or an expert panel to decide whether the disease is present or absent.
Were the reference standard and the diagnostic test interpreted blind and independently of each other? If the study investigators know the result of the reference standard test, this might influence their interpretation of the diagnostic test and vice versa. 3. Was the test evaluated on an appropriate spectrum of patients? A test may perform differently depending upon the sort of patients on whom it is carried out. A test is going to perform better in terms of detecting people with disease if it is used on people in whom the disease is more severe or advanced.
Similarly, the test will produce more false positive results if it is carried out on patients with other diseases that might mimic the disease that is being tested for. The issue to consider when appraising a paper is whether the test was evaluated on the typical sort of patients on whom the test would be carried out in real life. 4. Was the reference standard applied to all patients? Ideally, both the test being evaluated and the reference standard should be carried out on all patients in the study.
However, this may not be possible for both practical and ethical reasons. For example, the reference test may be invasive and may expose the patient to some risk and/or discomfort. There may also be a temptation not to bother administering the reference standard test if the test under investigation proves positive. Therefore, when reading the paper you need to find out whether the reference standard was applied to all patients and, if it wasn’t, look at what steps the investigators took to find out what the “truth” was in patients who did not have the reference test.
Is it clear how the test was carried out?
To be able to apply the results of the study to your own clinical practice, you need to be confident that the test is performed in the same way in your setting as it was in the study.

Is the test result reproducible?
This is essentially asking whether you get the same result if different people carry out the test, or if the test is carried out at different times on the same person. Many studies will assess this by having different observers perform the test, and measuring the agreement between them by means of a kappa statistic. The kappa statistic takes into account the amount of agreement that you would expect by chance. For example, if two observers made a diagnosis by tossing a coin, you would expect them to agree 50% of the time. A kappa score of 0 indicates no more agreement than you would expect by chance; perfect agreement would yield a kappa score of 1. Generally, a kappa score of 0.6 indicates good agreement. If agreement between observers is poor, then this will undermine the usefulness of the test.
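As a rough illustration of how chance agreement is removed, the sketch below computes a kappa score for two observers who each classify the same patients as positive or negative. The text does not say which form of the statistic is meant; this assumes Cohen's kappa for two observers, and the function name and the counts are hypothetical.

```python
def cohen_kappa(both_pos, a_only_pos, b_only_pos, both_neg):
    """Cohen's kappa for two observers making a positive/negative call."""
    n = both_pos + a_only_pos + b_only_pos + both_neg
    observed = (both_pos + both_neg) / n          # agreement actually seen
    a_pos = (both_pos + a_only_pos) / n           # how often observer A says positive
    b_pos = (both_pos + b_only_pos) / n           # how often observer B says positive
    # Agreement expected by chance if the two observers were independent.
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    return (observed - expected) / (1 - expected)

# Coin-tossing observers: they agree half the time, and kappa is 0.
print(cohen_kappa(25, 25, 25, 25))
# Hypothetical study: the observers agree on 80 of 100 patients.
print(f"{cohen_kappa(40, 10, 10, 40):.2f}")
```

The calculation is undefined when chance agreement is already complete (for example, when both observers call every patient positive), so a real analysis would need to handle that case.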
The extent to which the test result is reproducible or not may to some extent depend upon how explicit the guidance is for how the test should be carried out. It may also depend upon the experience and expertise of the observer.

Are the results important?
What is meant by test accuracy?
(a) The test can correctly detect disease that is present (a true positive result).
(b) The test can detect disease when it is really absent (a false positive result).
(c) The test can identify someone as being free of a disease when it is really present (a false negative result).
(d) The test can correctly identify that someone does not have a disease (a true negative result).
Ideally, we would like a test which produces a high proportion of (a) and (d) and a low proportion of (b) and (c).
• Sensitivity: is the proportion of people with disease who have a positive test.
• Specificity: is the proportion of people free of a disease who have a negative test.

These measures are combined into an overall measure of the efficacy of a diagnostic test called the likelihood ratio: the likelihood that a given test result would be expected in a patient with the target disorder compared to the likelihood that the same result would be expected in a patient without the disorder (see p39).

These possible outcomes of a diagnostic test are illustrated below¹ (sample data from Anriole et al.).

                                              Target disorder (prostate cancer)
                                              Present        Absent         Totals
Diagnostic test result   Positive (65 mmol/l) a = 26         b = 69         a + b = 95
(prostate serum antigen) Negative             c = 46         d = 249        c + d = 295
Totals                                        a + c = 72     b + d = 318    a + b + c + d = 390

Sensitivity = a/(a + c) = 26/72 = 36%
Specificity = d/(b + d) = 249/318 = 78%
Positive predictive value = a/(a + b) = 26/95 = 27%
Negative predictive value = d/(c + d) = 249/295 = 84%
Pre-test probability (prevalence) = (a + c)/(a + b + c + d) = 72/390 = 18%
Likelihood ratio for a positive test result = sens/(1 − spec) = 0.36/0.22 = 1.66
Likelihood ratio for a negative test result = (1 − sens)/spec = 0.64/0.78 = 0.82
Pre-test odds = prevalence/(1 − prevalence) = 0.18/0.82 = 0.22

For a positive test result:
Post-test odds = pre-test odds × likelihood ratio = 0.22 × 1.66 = 0.37
Post-test probability = post-test odds/(post-test odds + 1) = 0.37/1.37 = 27%

Using sensitivity and specificity: SpPin and SnNout
Sensitivity reflects how good the test is at picking up people with disease, while the specificity reflects how good the test is at identifying people without the disease. Sometimes it can be helpful just knowing the sensitivity and specificity of a test, if they are very high. If a test has high specificity, i.e. if a high proportion of patients without the disorder actually test negative, it is unlikely to produce false positive results. Therefore, if the test is positive it makes the diagnosis very likely. This can be remembered by the mnemonic SpPin: for a test with high specificity (Sp), if the test is Positive, then it rules the diagnosis “in”. Similarly, with high sensitivity a test is unlikely to produce false negative results. Therefore, if the test is negative it makes the diagnosis very unlikely.
This can be remembered by the mnemonic SnNout: for a test with high sensitivity (Sn), if the test is Negative, then it rules “out” the diagnosis. Effect of prevalence on predictive value Positive predictive value is the percentage of patients who test positive who actually have the disease. Predictive values are affected by the prevalence of the disease: if a disease is rarer, the positive predictive value will be lower, while sensitivity and specificity are constant. Since we know that prevalence changes in different health care settings, predictive values are not generally very useful in characterising the accuracy of tests.
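To see the effect of prevalence in numbers, the following sketch holds sensitivity and specificity fixed and recomputes the predictive values at two prevalences. The function name is an assumption made for illustration; the sensitivity, specificity and 18% prevalence come from the sample data in this chapter, while the 5% prevalence is a hypothetical lower-prevalence setting.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

sens, spec = 26 / 72, 249 / 318           # from the sample 2 x 2 table

for prevalence in (72 / 390, 0.05):       # study prevalence, then a rarer setting
    ppv, npv = predictive_values(sens, spec, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}")
```

With the same test, the positive predictive value falls as the disease becomes rarer, which is why predictive values from one setting do not transfer directly to another.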
The measure of test accuracy that is most useful when it comes to interpreting test results for individual patients is the likelihood ratio (LR). The next section shows how the LR can be used to derive a probability that the patient has the disease given a particular test result.

Summary
1. Frame the clinical question.
2. Search for evidence concerning the accuracy of the test.
3. Assess the methods used to determine the accuracy of the test.
4. Find out the likelihood ratios for the test.
5. Estimate the pre-test probability of disease in your patient.
6. Apply the likelihood ratios to this pre-test probability using the nomogram to determine what the post-test probability would be for different possible test results.
7. Decide whether or not to perform the test on the basis of your assessment of whether it will influence the care of the patient, and the patient’s attitude to different possible outcomes.

References
1. Anriole GL et al. Treatment with finasteride preserves usefulness of prostate-specific antigen in the detection of prostate cancer: results of a randomised, double-blind, placebo-controlled clinical trial. Urology 1998;52(2):195–202.
2. Altman D. Practical Statistics for Medical Research. Edinburgh: Churchill Livingstone, 1991.
3. Fagan TJ. A nomogram for Bayes’ Theorem. N Engl J Med 1975;293:257.
4. Sackett DL et al. Evidence-Based Medicine: How to practice and teach EBM. New York: Churchill Livingstone, 2000.

Further reading
Fleming KA. Evidence-based pathology. Evidence-Based Medicine 1997;2:132.
Jaeschke R et al. Users’ Guides to the Medical Literature III: How to use an article about a diagnostic test A: Are the results of the study valid? JAMA 1994;271(5):389–91.
Jaeschke R et al. Users’ Guides to the Medical Literature III: How to use an article about a diagnostic test B: What are the results and will they help me in caring for my patients? JAMA 1994;271(9):703–7.
Mant J. Studies assessing diagnostic tests. In: M Dawes et al. Evidence-Based Practice: a primer for health care professionals. Edinburgh: Churchill Livingstone, 1999, pp. 59–67, 133–57.
Richardson WS, Wilson MC, Guyatt GH, Cook DJ, Nishikawa J. How to use an article about disease probability for differential diagnosis. JAMA 1999;281:1214–19.
Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical epidemiology: a basic science for clinical medicine, 2nd ed. Boston: Little, Brown, 1991.

Nomogram for likelihood ratios
[Figure: nomogram with three scales: pre-test probability (left), likelihood ratio (middle) and post-test probability (right).]

How to use the nomogram³,⁴
Position a ruler (or any straight edge) so that it connects the point on the left-hand scale which corresponds to your (estimate of your) patient’s pre-test probability with the point on the middle scale for the likelihood ratio for their test result.
Now read off the post-test probability on the right-hand scale.
http://www.cebm.net/likelihood_ratios.asp

Appraising systematic reviews

Is the systematic review valid?
1. Is it a systematic review of high-quality studies which are relevant to your question?
2. Does the methods section adequately describe:
• a comprehensive search for all the relevant studies?
• how the reviewers assessed the validity of each study?
3. Are the studies consistent, both clinically and statistically?

Are the results important?
If the review reports odds ratios (ORs), you can generate an NNT if you have an estimate of your patient’s expected event rate (PEER).

NNT = (1 − [PEER × (1 − OR)]) / ((1 − PEER) × PEER × (1 − OR))

A systematic review is “a review of a clearly formulated question that uses systematic and explicit methods to identify, select and critically appraise relevant research, and to collect and analyse data from studies that are included in the review. Statistical methods may or may not be used to analyse and summarise the results of the included studies” (Cochrane Library 1998, Glossary).
Three key features of such a review are: • a strenuous effort to locate all original reports on the topic of interest • critical evaluation of the reports • conclusions are drawn based on a synthesis of studies which meet pre-set quality criteria When synthesising results, a meta-analysis may be undertaken. This is “the use of statistical techniques in a systematic review to integrate the results of the included studies” (Cochrane Library 1998, Glossary), which means that the authors have attempted to synthesise the different results into one overall statistic.
The best source of systematic reviews is the Cochrane Library, available by subscription on CD or via the internet. Many of the systematic reviews so far completed are based on evidence of effectiveness of an intervention from randomised controlled trials (RCTs).

Is the systematic review valid?
1. Is it a systematic review of high-quality studies which are relevant to your question?
This question asks whether the research question in the review is clearly defined and the same as the one you are considering, and whether the studies covered by the review are high quality.
Reviews of poor-quality studies simply compound the problems of poor-quality individual studies. Sometimes, reviews combine the results of variable-quality trials (for example randomised and non-randomised trials in therapy); the authors should provide separate information on the subset of randomised trials. 2. Does the methods section describe how all the relevant trials were found and assessed? The paper should give a comprehensive account of the sources consulted in the search for relevant papers, the search strategy used to find them, and the quality and relevance criteria used to decide whether to include them in the review.
Search strategy Some questions you can ask about the search strategy: • The authors should include hand searching of journals and searching for unpublished literature. • Were any obvious databases missed? • Did the authors check the reference lists of articles and of textbooks (citation indexing)? • Did they contact experts (to get their list of references checked for completeness and to try and find out about ongoing or unpublished research)? • Did they use an appropriate search strategy: were important subject terms missed?
The reviewers’ search should aim to minimise publication bias: the tendency for negative results to be unequally reported in the literature.

Did the authors assess the trials’ individual validity?
You should look for a statement of how the trials’ validity was assessed. Ideally, two or more investigators should have applied these criteria independently and achieved good agreement in their results.
You need to know what criteria were used to select the research. These should include who the study participants were, what was done to them, and what outcomes were assessed. The importance of a clear statement of inclusion criteria is that studies should be selected on the basis of these criteria (that is, any study that matches these criteria is included) rather than selecting the study on the basis of the results.
A point to consider is that the narrower the inclusion criteria, the less generalisable are the results. However, this needs to be balanced with using very broad inclusion criteria, when heterogeneity (see below) becomes an issue.
3. Are the studies consistent, both clinically and statistically?
You have to use your clinical knowledge to decide whether the groups of patients, interventions, and outcome measures were similar enough to merit combining their results. If not, this clinical heterogeneity would invalidate the review. Similarly, you would question the review’s validity if the trials’ results contradicted each other. Unless this statistical heterogeneity can be explained satisfactorily (such as by differences in patients, dosage, or durations of treatment), this should lead you to be very cautious about believing any overall conclusion from the review.
Are the results important? Terms that you will probably come across when looking at systematic reviews include vote counting, odds ratios, and relative risks, amongst others. Vote counting If a systematic review does not contain a meta-analysis (a statistical method for combining the data from separate trials), the results may be presented as a simple count of the number of studies supporting an intervention and the number not supporting it. This assumes equal weight being given to each study, regardless of size.
Odds ratio (OR)
In measuring the efficacy of a therapy, odds can be used to describe risk. The odds of an event are the probability of it occurring compared to the probability of it not occurring. By dividing the odds of an event in the experimental group by the odds in the control group, we can measure the efficacy of the treatment. If the experimental group has lower odds, the OR will be less than 1; if the control group has lower odds, the OR will be above 1; and if there is no difference between the two groups, the OR will be exactly 1. ORs are useful because they can be used in a meta-analysis to combine the results of many different trials into one overall measure of efficacy.

To calculate the NNT for any OR and PEER:
NNT = (1 − [PEER × (1 − OR)]) / ((1 − PEER) × PEER × (1 − OR))

Logarithmic odds
Odds ratios are usually plotted on a log scale to give an equal line length on either side of the line of “no difference”. If odds ratios are plotted on a log scale, then a log odds ratio of 0 means no effect, and whether or not the 95% confidence interval crosses a vertical line through zero will lead to a decision about its significance.
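The NNT formula for an OR and a PEER can be checked with a short sketch. The function name and the example values (an OR of 0.5 and a PEER of 20%) are hypothetical and chosen only to make the arithmetic easy to follow; the sketch assumes an OR below 1, i.e. a beneficial treatment.

```python
import math

def nnt_from_odds_ratio(odds_ratio, peer):
    """NNT for a beneficial treatment (OR < 1) at a patient's expected event rate."""
    numerator = 1 - peer * (1 - odds_ratio)
    denominator = (1 - peer) * peer * (1 - odds_ratio)
    return numerator / denominator

# Hypothetical review result and patient: OR 0.5, PEER 20%.
nnt = nnt_from_odds_ratio(0.5, 0.20)
print(f"NNT = {nnt:.2f}, reported as {math.ceil(nnt)}")

# An OR of exactly 1 corresponds to a log odds ratio of 0 (no effect).
print(math.log(1.0))
```

As a cross-check, control odds of 0.25 halved to 0.125 give an experimental event rate of about 11%, an absolute risk reduction of about 0.089 and hence the same NNT.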
Binary or continuous data Binary data (an event rate: something that either happens or not, such as numbers of patients improved or not) is usually combined using odds ratios. Continuous data (such as numbers of days, peak expiratory flow rate) is combined using differences in mean values for treatment and control groups (weighted mean differences or WMD) when units of measurement are the same, or standardised mean differences when units of measurement differ. Here the difference in means is divided by the pooled standard deviation. How precise are the results?
The statistical significance of the results will depend on the extent of any confidence limits around the result (see p17). The review should include confidence intervals for all results, both of individual studies and any meta-analysis.

Further reading
Altman D. Practical Statistics for Medical Research. Edinburgh: Churchill Livingstone, 1991.
Antman EM, Lau J, Kupelnick B, Mosteller F, Chalmers TC. A comparison of results of meta-analyses of randomised control trials and recommendations of clinical experts. JAMA 1992;268:240–8.
Cochrane Library: http://www.update-software.com
NHS Centre for Reviews and Dissemination: http://www.york.ac.uk/inst/crd/
Oxman AD et al. Users’ Guides to the Medical Literature VI: How to use an overview. JAMA 1994;272(17):1367–71.
Sackett DL, Straus SE, Richardson WS, Rosenberg W, Haynes RB. Evidence-Based Medicine: How to practice and teach EBM. Churchill Livingstone, 2000.
Seers K. Systematic review. In M Dawes et al. (eds) Evidence-Based Practice: a primer for health care professionals. Edinburgh: Churchill Livingstone, 1999, pp. 85–100.

Appraising articles on harm/aetiology

Is the study valid?
1. Was there a clearly defined research question?
2. Were there clearly defined, similar groups of patients?
3. Were exposures and clinical outcomes measured the same ways in both groups?
4. Was the follow up complete and long enough?
5. Does the suggested causative link make sense?

Are the valid results from this study important?

                            Adverse outcome
                            Present (case)    Absent (control)    Totals
Exposure    Yes (Cohort)    a                 b                   a + b
            No (Cohort)     c                 d                   c + d
Totals                      a + c             b + d               a + b + c + d

In a randomised trial or cohort study: Relative risk RR = [a/(a+b)]/[c/(c+d)]
In a case–control study: Odds ratio OR = ad/bc

Is the study valid?
In assessing an intervention’s potential for harm, we are usually looking at prospective cohort studies or retrospective case–control studies. This is because RCTs may have to be very large indeed to pick up small adverse reactions to treatment.
1. Was there a clearly defined question?
What question has the research been designed to answer? Was the question focused in terms of the population group studied, the exposure received, and the outcomes considered?
2. Were there clearly defined, similar groups of patients?
Studies looking at harm must be able to demonstrate that the two groups of patients are clearly defined and sufficiently similar so as to be comparable. In a cohort study, for example, patients are either exposed to the treatment or not according to a decision; this might mean that sicker patients, perhaps more likely to have adverse outcomes, are more likely to be offered (or demand) potentially helpful treatment. There may be some statistical adjustment to the results to take these potential confounders into account. 3. Were treatment exposures and clinical outcomes measured the same ways in both groups?
You would not want one group to be studied more exhaustively than the other, because this might lead to reporting a greater occurrence of exposure or outcome in the more intensively studied group.
4. Was the follow up complete and long enough?
Follow up has to be long enough for the harmful effects to reveal themselves, and complete enough for the results to be trustworthy (the 80% rule from p13 applies: lost patients may have very different outcomes from those who remain in the study).
5. Does the suggested causative link make sense?
You can apply the following rationale to help decide if the results make sense.
• Is it clear the exposure preceded the onset of the outcome? It must be clear that the exposure wasn’t just a “marker” of another disease.
• Is there a dose–response gradient? If the exposure was causing the outcome, you might expect to see increased harmful effects as a result of increased exposure: a dose–response effect.
• Is there evidence from a “dechallenge–rechallenge” study? Does the adverse effect decrease when the treatment is withdrawn (“dechallenge”) and worsen or reappear when the treatment is restarted (“rechallenge”)?
• Is the association consistent from study to study? Try finding other studies, or, ideally, a systematic review of the question.
• Does the association make biological sense? If it does, a causal association is more likely.

Are the results important?
This means looking at the risk or odds of the adverse effect with (as opposed to without) exposure to the treatment; the higher the risk or odds, the stronger the association and the more we should be impressed by it. We can use the single table to determine if the valid results of the study are important.

                            Adverse outcome
                            Present (case)    Absent (control)    Totals
Exposure    Yes (Cohort)    a                 b                   a + b
            No (Cohort)     c                 d                   c + d
Totals                      a + c             b + d               a + b + c + d
In a cohort study: Relative risk RR = [a/(a+b)]/[c/(c+d)]
In a case–control study: Odds ratio OR = ad/bc
To calculate the NNH for any OR and PEER:
NNH = ([PEER × (OR − 1)] + 1) / (PEER × (OR − 1) × (1 − PEER))

A cohort study compares the risk of an adverse event amongst patients who received the exposure of interest with the risk in a similar group who did not receive it. Therefore, we are able to calculate a relative risk (or risk ratio). In case–control studies, we are presented with the outcomes, and work backwards looking at exposures. Here, we can only compare the two groups in terms of their relative odds (odds ratio).
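The three calculations above can be sketched as follows, using the cell labels a, b, c and d from the table. The function names and all of the counts are hypothetical, chosen only to illustrate the formulas; the NNH function assumes an OR above 1, i.e. a harmful exposure.

```python
def relative_risk(a, b, c, d):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Cross-product ratio, usable in case-control studies."""
    return (a * d) / (b * c)

def nnh_from_odds_ratio(odds_ratio_value, peer):
    """NNH for a harmful exposure (OR > 1) at a patient's expected event rate."""
    excess = peer * (odds_ratio_value - 1)
    return (excess + 1) / (excess * (1 - peer))

# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed patients
# had the adverse outcome.
print(f"RR = {relative_risk(30, 70, 10, 90):.2f}")
print(f"OR = {odds_ratio(30, 70, 10, 90):.2f}")
# Hypothetical patient with a 10% expected event rate and an OR of 2.
print(f"NNH = {nnh_from_odds_ratio(2.0, 0.10):.1f}")
```

Note how the OR is further from 1 than the RR in this example: the two measures are close only when the outcome is rare.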
Statistical significance
As with other measures of efficacy, we would be concerned if the 95% CI around the results, whether relative risk or odds ratio, crossed the value of 1, meaning that there may be no effect (or the opposite).

Further reading
Levine M et al. Users’ Guides to the Medical Literature IV: How to use an article about harm. JAMA 1994;272(20):1615–19.
Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical epidemiology: a basic science for clinical medicine, 2nd ed. Boston: Little, Brown, 1991.
Sackett DL et al. Evidence-Based Medicine: How to practice and teach EBM. New York: Churchill Livingstone, 1996.

Appraising prognosis studies

Is the sample representative?
Were they recruited at a common point in their illness?
Did the study account for other important factors?
Is the setting representative?
Was follow up long enough for the clinical outcome?
Was follow up complete?
Were outcomes measured “blind”?

Are the results important?
What is the risk of the outcome over time?
How precise are the estimates?
95% Confidence Intervals are ± 1.96 times the Standard Error (SE) of the measure.
SE of a proportion: SE = sqrt(p × (1 − p) / n)
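A minimal sketch of the confidence interval calculation in the box above, assuming the Normal approximation it describes. The function name and the example figures (a 50% survival estimate from 100 patients) are hypothetical.

```python
import math

def proportion_ci(p, n, z=1.96):
    """95% confidence interval for a proportion p observed in n patients."""
    se = math.sqrt(p * (1 - p) / n)      # standard error of the proportion
    return p - z * se, p + z * se

# Hypothetical prognosis study: 50% survival at one year among 100 patients.
low, high = proportion_ci(0.50, 100)
print(f"95% CI: {low:.1%} to {high:.1%}")

# The same estimate from 1000 patients gives a much narrower interval.
low, high = proportion_ci(0.50, 1000)
print(f"95% CI: {low:.1%} to {high:.1%}")
```

The approximation becomes unreliable for small samples or for proportions close to 0 or 1, where the limits can fall outside the 0 to 1 range.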
Is the study valid?
In asking questions about a patient’s likely prognosis over time, the best individual study type to look for would be a longitudinal cohort study.

Is the sample representative?
Does the study clearly define the group of patients, and is it similar to your patients? Were there clear inclusion and exclusion criteria?

Were they recruited at a common point in their illness?
The methodology should include a clear description of the stage and timing of the illness being studied. To avoid missing outcomes, study patients should ideally be recruited at an early stage in the disease.
In any case, they should all be recruited at a consistent stage in the disease; if not, this will bias the results. Did the study account for other important factors? The study groups will have different important variables such as sex, age, weight and co-morbidity which could affect their outcome. The investigators should adjust their analysis to take account of these known factors in different sub-groups of patients. You should use your clinical judgement to assess whether any important factors were left out of this analysis and whether the adjustments were appropriate.
This information will also help you in deciding how this evidence applies to your patient. Is the setting representative? Patients who are referred to specialist centres often have more illnesses and are higher risk than those cared for in the community. This is sometimes called “referral bias”. Was follow up long enough for the clinical outcome? You have to be sure that the study followed the patients for long enough for the outcomes to manifest themselves. Longer follow up may be necessary in chronic diseases. Was follow up complete?
Most studies will lose some patients to follow up; the question you have to answer is whether so many were lost that the information is of no use to you. You should look carefully in the paper for an account of why patients were lost and consider whether this introduces bias into the result.
• If follow up is less than 80% the study’s validity is seriously undermined.
You can ask “what if” all those patients who were lost to follow up had the outcome you were interested in, and compare this with the study to see if loss to follow up had a significant effect.
With low incidence conditions, loss to follow up is more problematic. Were outcomes measured “blind”? How did the study investigators tell whether or not the patients actually had the outcome? The investigators should have defined the outcome/s of interest in advance and have clear criteria which they used to determine whether the outcome had occurred. Ideally, these should be objective, but often some degree of interpretation and clinical judgement will be required. To eliminate potential bias in these situations, judgements should have been applied without knowing the patient’s clinical characteristics and prognostic factors.
Are the results important? What is the risk of the outcome over time? Three ways in which outcomes might be presented are: • as a percentage of survival at a particular point in time; • as a median survival (the length of time by which 50% of study patients have had the outcome); • as a survival curve that depicts, at each point in time, the proportion (expressed as a percentage) of the original study sample who have not yet had a specified outcome. Survival curves provide the advantage that you can see how the patient’s risk might develop over time.
How precise are the estimates?
Any study looks at a sample of the population, so we would expect some variation between the sample and “truth”. Prognostic estimates should be accompanied by Confidence Intervals to represent this. A 95% Confidence Interval is the range of values between which we can be 95% sure that the true value lies. You should take account of this range when extracting estimates for your patient. If it is very wide, you would question whether the study had enough patients to provide useful information.

SE of a proportion: SE = sqrt(p × (1 − p) / n)

Assuming a Normal distribution, the 95% Confidence Interval is 1.96 times this value on either side of the estimate.

Further Reading
Laupacis A, Wells G, Richardson WS, Tugwell P. Users’ guides to the medical literature. V. How to use an article about prognosis. JAMA 1994;272:234–7.
Sackett DL et al. Evidence-Based Medicine: How to practice and teach EBM. New York: Churchill Livingstone, 2000.

Applying the evidence

Are your patients similar to those of the study?
How much of the study effect can you expect for your patient or problem?
For Diagnostic tests
Start with your patient’s pre-test probability
Pre-test odds = (pre-test probability)/(1 − pre-test probability)
Post-test odds = pre-test odds × LR
Post-test probability = post-test odds/(post-test odds + 1)

For Therapy
Estimate your Patient’s Expected Event Rate (PEER)
NNT (for your patient) = 1/(PEER × RRR)

Is the intervention realistic in your setting?
Does the comparison intervention reflect your current practice?
What alternatives are available?
Are the outcomes appropriate to your patient?
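The two calculations in the box above can be sketched in Python. The function names are assumptions made for illustration. The pre-test probability (18%) and likelihood ratio (1.66) are taken from the diagnosis chapter's sample data, and the RRR (17%) from the therapy chapter's worked example; the PEER of 10% is a hypothetical patient.

```python
import math

def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (post_test_odds + 1)

def patient_nnt(peer, rrr):
    """NNT particularised to a patient's expected event rate (PEER)."""
    return 1 / (peer * rrr)

# Diagnosis: positive test result, pre-test probability 18%, LR 1.66.
print(f"Post-test probability: {post_test_probability(0.18, 1.66):.0%}")

# Therapy: hypothetical patient with a PEER of 10% and a trial RRR of 17%.
print(f"NNT for this patient: {math.ceil(patient_nnt(0.10, 0.17))}")
```

The first function does the same job as reading across the nomogram with a ruler; the second assumes that the relative risk reduction seen in the trial applies unchanged at your patient's baseline risk.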
Are your patients similar to those of the study? Of course, your patients weren’t in the trial, so you need to apply your clinical expertise to decide whether they are sufficiently similar for the results to be applicable to them. Factors which would affect this decision include: • The age range included in the trial (many trials exclude the older generations); your group of patients may have a different risk profile, as many drugs have increasing adverse effects in the ageing population which may not be taken into account in the study. Many of your patients will have co-morbidity which could affect drug interactions and adverse events as well as benefits. • Will your patients be able to comply with treatment dosages and duration? For example, compliance might decrease if your patient is taking other medications or if the treatment requires multiple doses daily rather than single ones. • If NNTs are similar for different treatments, then the NNHs for harmful side effects will become more important; lesser side effects may increase compliance (Bloom, 2001).
The inclusion and exclusion criteria for the study may help as a starting point for your clinical judgment here. It is unlikely, however, that your patient will present an exact match with the study; Sackett et al (2000) have recommended framing this question in reverse: How different would your patient have to be for the results of the study to be of no help? How much of the study effect can you expect for your patient or problem? To work out how much effect your patient can expect from the intervention, you first need an estimate of their risk of the outcome.
This information might be available from a number of external sources, such as cardiovascular risk tables in the British National Formulary, Evidence-b