"Comparative effectiveness" research is the best tool available today for making decisions about which new medication, medical device, or diagnostic test is most supported by the evidence.1 The purpose of a systematic review is to synthesize the results of multiple primary studies using explicit and reproducible methods.2,3 Meta-analysis is a form of systematic review that goes one step further. Rather than qualitatively synthesizing the results of multiple studies, the purpose of a meta-analysis is to develop a quantitative summary of the evidence with the help of special statistical techniques. Whereas a systematic review may conclude that a certain drug is effective in preventing a morbid event, a meta-analysis will tell us that the drug is, say, 3.7 times more effective than a placebo in preventing that morbid outcome. Conceptually, the distinction between the 2 types of reviews is straightforward. Despite the appeal inherent in developing a quantitative synthesis, the decision to conduct, or to make decisions based on, a meta-analysis must be made carefully, with an understanding of the benefits and limitations of the method. This paper provides an overview of the meta-analysis process, highlights the strengths and weaknesses of the method, and offers guidance on how to interpret and judge the value of meta-analytic results.
Key Elements of Meta-Analysis
Meta-analysis is a method used to critically evaluate evidence in an attempt to develop a single synthesis of the results and, where appropriate, to use statistical methods to combine findings from different studies into a single "pooled estimate." The pooled estimate is also called the "overall treatment effect," even when referring to pooling of diagnostic tests, screening studies, or other areas that are not treatments per se. The reasons to perform a meta-analysis are to:
- Increase statistical power (relative to individual studies) and determine if a treatment effect exists by combining multiple trials
- Improve the precision of the measurement of a treatment effect
- Combine data from conflicting studies and determine if a treatment effect exists
- Explore the impact of differences in the design, conduct, analysis, and results of individual studies on treatment effects.
A meta-analysis begins with a systematic review of the literature, followed by a statistical analysis. Generally, it proceeds according to these steps:
- Formulation of a study question
- Setting inclusion and exclusion criteria for studies to be reviewed
- Searching the literature
- Triaging articles
- Retrieving data from the selected studies
- Pooling the data by applying specific statistical techniques
- Investigating sources of differences between studies
- Summarizing and presenting results.
Selecting the Data
To conduct a meta-analysis, one must begin by formulating the study question. A well-focused study question has 4 components that are referred to as PICO:
Population—a description of the patient population to be addressed
Intervention—a detailed description of the intervention or exposure to be investigated
Comparison—a well-defined comparison group
Outcome—a specific outcome.
For example, the following research question includes all 4 PICO elements: In adult patients with type 2 diabetes, does monotherapy with rosiglitazone improve A1c levels compared with metformin at 6 months? To minimize bias, inclusion and exclusion criteria should be identified before performing the literature search. Many of the inclusion criteria are determined as a direct result of the research question (eg, definition of the patient population or the outcome to be studied); other criteria (eg, study design) are determined by the research team.
Ideally, meta-analyses are limited to one type of study design. Significant problems are associated with meta-analyses that combine results across different study designs (eg, randomized controlled trials and case-control studies). Other inclusion and exclusion criteria may include years of publication, language of publication, and minimum study size. Because the quality of a meta-analysis is highly dependent on the studies that comprise it, well-thought-out inclusion and exclusion criteria are necessary to ensure the validity of the conclusions.
Once selection criteria are established, an extensive literature search should be performed. A literature review typically begins with a systematic, well-documented, and repeatable search of electronic databases, such as PubMed, Embase, and/or the Cochrane Database of Systematic Reviews. The search should also include a "hand-search" of the reference lists of individual studies included in the meta-analysis. A well-designed search strategy will also utilize other sources, for example, "grey" literature (ie, not published in a peer-reviewed journal but in the form of reports, unpublished theses, books, and so on).4 A reliable literature search must be exhaustive and should be reproducible, with specific search terms reported in the methods section.5
The literature search will produce an extensive list of potentially relevant citations. Articles must then be triaged for possible inclusion. The Quality of Reporting of Meta-Analyses (QUOROM) committee has created a flowchart that outlines a recommended format for presenting the "flow" of included and excluded studies. This chart clearly shows how the investigators move from the entire pool of articles identified to the final set of articles included in the pooled analysis.6 This allows readers to evaluate the number of and the rationale for all exclusions. Each study included in the meta-analysis should be evaluated for quality. Ideally, this assessment should follow a generally accepted scale or checklist, such as the Jadad Score for randomized trials7 or STROBE (Strengthening the Reporting of Observational Studies in Epidemiology).8 Data such as measures of treatment effects, confidence intervals, study characteristics, and other variables hypothesized to affect the study results are then abstracted from the chosen studies, using a standardized form. The identification of studies and the data abstraction process should be carried out independently by at least 2 researchers (who resolve disagreements by consensus) to minimize bias in article selection and reduce errors in data collection.
The next step is to decide whether statistical pooling is justified. Although the goal is to obtain meaningful results from previous studies, there is also a possibility of reaching erroneous conclusions, especially if data are pooled from studies that are too diverse. If the differences between studies are significant—different patient populations or different comparison groups—combining results may not make clinical sense. In such cases, results should not be pooled but rather be presented in a systematic review. The extent of variability between studies is subjective, and meta-analyses are often criticized for inappropriately combining heterogeneous data (see below).9
The results of a meta-analysis are graphically expressed in a "forest plot" (Figure 1), which displays the independent results of each study and the overall pooled result on the same plot. This allows the reader to quickly visualize the treatment effect in individual studies relative to the combined treatment effect across all studies. The results of each study are represented by squares that vary in size according to the relative size of the study (ie, larger squares denote larger studies). Each square is bisected by a horizontal line, the length of which spans the 95% confidence interval for the treatment effect reported in that study. The overall pooled estimate is shown at the bottom of the figure, represented by a diamond, whose center and edges correspond to the mean overall result and confidence interval, respectively.10
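The arithmetic behind the pooled estimate at the bottom of a forest plot can be sketched with inverse-variance weighting, the most common pooling approach. The numbers below are hypothetical log odds ratios and standard errors, not drawn from any study cited here:

```python
import math

# Hypothetical per-study results: log odds ratios and their standard errors
# (illustrative values only, not taken from any real meta-analysis)
log_or = [0.41, 0.18, 0.55, 0.30]
se     = [0.20, 0.12, 0.35, 0.15]

# Inverse-variance weighting: each study is weighted by 1/SE^2, so larger,
# more precise studies (the bigger squares on a forest plot) dominate
weights = [1 / s**2 for s in se]
pooled_log_or = sum(w * y for w, y in zip(weights, log_or)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

# 95% confidence interval on the log scale, back-transformed to an odds ratio
lo = math.exp(pooled_log_or - 1.96 * pooled_se)
hi = math.exp(pooled_log_or + 1.96 * pooled_se)
print(f"pooled OR = {math.exp(pooled_log_or):.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Note how the pooled confidence interval is narrower than any single study's interval; this is the gain in precision that motivates pooling in the first place.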
How to Identify Potential Problems
Once the studies to be included in the pooled analysis have been identified, some differences between them will still be evident. The nature and magnitude of these differences play a critical role in determining the methods to be used in the statistical analysis. Researchers have developed summary measures that quantify the degree of interstudy "heterogeneity." Cochran's Q test (based on the chi-square distribution) is the most common test for heterogeneity.
If "significant" heterogeneity is identified (ie, P <.20 is a typical cut-off in meta-analysis), it is important to determine whether it is valid to combine heterogeneous studies into a single pooled measure of treatment effect. For example, in a meta-analysis of hormone replacement therapy (HRT) and breast cancer, the Q test identified heterogeneity (Figure 2). One study clearly stood out—the Million Women Study, not only because it was the largest and only study conducted in Europe but also because of other important underlying clinical differences. When this study was excluded, the Q test was no longer significant, and the point estimate better reflected the pooling of similar studies.11
The I2 statistic (also referred to as "inconsistency") is an updated version of the Q test. This updated measure is designed to allow comparison of heterogeneity between meta-analyses with different numbers of pooled studies, which cannot be done with the Q test alone.12 An I2 value of less than 25% is considered good, 25% to 50% is acceptable, and more than 50% is unacceptable.
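Both statistics can be computed directly from the study-level effects and weights: Q is the weighted sum of squared deviations from the pooled estimate, and I2 rescales Q against its degrees of freedom. The sketch below uses hypothetical, deliberately heterogeneous data:

```python
import math

# Illustrative study effects (log odds ratios) and standard errors;
# these values are hypothetical and chosen to show high heterogeneity
effects = [0.10, 0.65, -0.20, 0.45]
se      = [0.15, 0.12, 0.20, 0.18]

weights = [1 / s**2 for s in se]
pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

# Cochran's Q: weighted squared deviations of each study from the pooled mean;
# under homogeneity, Q follows a chi-square distribution with k-1 df
Q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
df = len(effects) - 1

# I^2 expresses the excess of Q over its expectation as a percentage,
# which makes meta-analyses with different study counts comparable
I2 = max(0.0, (Q - df) / Q) * 100
print(f"Q = {Q:.2f} on {df} df, I^2 = {I2:.1f}%")
```

With these values, I2 exceeds 50%, the level the text above describes as unacceptable, signaling that pooling these four studies as-is would be questionable.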
When heterogeneity is identified by the Q test or by I2, several approaches can be considered to address the differences. The first approach is to simply ignore the interstudy differences. Some investigators have argued that "one true effect" must underlie all studies on a given topic, and results should be pooled by use of a "fixed effects" model. A second method is to use a "random effects" model, which statistically accounts for heterogeneity and allows for several "true effects." (The difference between fixed and random effects models13 is beyond the scope of this review.) In general, a random effects model yields a similar pooled treatment estimate but with a wider confidence interval than the fixed effects model. The majority of meta-analyses now report random effects results, and those that report only fixed effects results should be viewed with skepticism, especially those that report pooled treatment effects with marginal statistical significance.13
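One common random effects approach, the DerSimonian-Laird method, estimates a between-study variance (tau squared) from Q and adds it to each study's own variance before re-weighting. The sketch below, using the same hypothetical data style as above, shows how this widens the confidence interval relative to the fixed effects result:

```python
import math

# Hypothetical study data (log odds ratios and SEs), chosen to be heterogeneous
effects = [0.10, 0.65, -0.20, 0.45]
se      = [0.15, 0.12, 0.20, 0.18]

w = [1 / s**2 for s in se]
fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
fixed_se = math.sqrt(1 / sum(w))

# DerSimonian-Laird estimate of the between-study variance tau^2
Q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
df = len(effects) - 1
C = sum(w) - sum(wi**2 for wi in w) / sum(w)
tau2 = max(0.0, (Q - df) / C)

# Random effects weights add tau^2 to each study's variance, which
# evens out the weights across studies and widens the confidence interval
w_re = [1 / (s**2 + tau2) for s in se]
random_est = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
random_se = math.sqrt(1 / sum(w_re))

print(f"fixed:  {fixed:.3f} +/- {1.96 * fixed_se:.3f}")
print(f"random: {random_est:.3f} +/- {1.96 * random_se:.3f}")
```

When the studies are homogeneous, tau squared collapses to zero and the two models coincide; it is precisely under heterogeneity that the random effects interval grows wider.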
Another common approach to addressing heterogeneity is using meta-regression, which involves applying traditional regression techniques—and their inherent explanatory power—to meta-analysis. In meta-regression, the dependent variable is the outcome (eg, the odds ratio or the relative risk), and the independent variables are study factors, such as publication year, country, study size, or type of drug, that may reasonably be expected to contribute to interstudy differences. Stratifying and conducting subgroup meta-analyses based on these factors may help identify important sources of heterogeneity (eg, the HRT example). Although meta-regression is useful to assess heterogeneity, the results should be considered "hypothesis generating."
The final step in meta-analysis is to look for evidence of publication bias—the exclusion of negative studies that might have influenced the conclusions. Studies published in peer-reviewed journals tend to report positive results more than negative results. In practice, researchers who find negative results often do not believe them, and editors may not as readily accept negative studies.14
Publication bias is typically evaluated by using a funnel plot. A funnel plot is a type of scatter plot in which each study's treatment effect is plotted on the x-axis, and a measure of study size is plotted on the y-axis (Figure 3). Small studies are more likely to have less precise effect estimates than large studies and are therefore more likely to be scattered along the bottom of the plot.15 In the absence of publication bias, the resulting plot should resemble an upside-down funnel, as seen in Figure 3. If publication bias is present, smaller studies that fail to show a significant effect will likely be missing, and the funnel plot will appear asymmetric (ie, the open circles in Figure 3 would be absent).15 Techniques for assessing publication bias continue to be refined,16 so readers should confirm that an assessment of publication bias was conducted in the meta-analysis rather than scrutinize the specific techniques used.17
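Funnel-plot asymmetry is often quantified with a regression approach such as Egger's test, which regresses each study's standardized effect on its precision; an intercept far from zero suggests small-study effects. The sketch below uses hypothetical data in which small studies report inflated effects, mimicking the pattern publication bias produces:

```python
# Hypothetical effects (log odds ratios) and SEs; the small studies
# (large SE) report larger effects, mimicking publication bias
effects = [0.90, 0.70, 0.55, 0.30, 0.25]
se      = [0.40, 0.35, 0.25, 0.10, 0.08]

# Egger's test: regress the standardized effect (y/SE) on precision (1/SE)
# via ordinary least squares; a non-zero intercept indicates asymmetry
x = [1 / s for s in se]                    # precision
y = [e / s for e, s in zip(effects, se)]   # standardized effect

n = len(x)
mx, my = sum(x) / n, sum(y) / n
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx
print(f"Egger intercept = {intercept:.2f}")  # far from 0 suggests asymmetry
```

In a full analysis the intercept would be accompanied by a significance test; the point here is only that asymmetry in the funnel can be expressed as a single regression coefficient rather than judged by eye.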
The Limits of Meta-Analysis
Meta-analysis can be a powerful technique for summarizing evidence. Each meta-analysis is, in and of itself, a scientific investigation, and its quality is dependent on the methods used in carrying out the "experiment."2 Different researchers may use different techniques, include different studies, and draw different conclusions. Like any experiment, meta-analyses are subject to bias and error, both of which may affect the validity of the conclusions and their utility for decision makers. As a result, not all meta-analyses are of equal quality. Thus "consumers" of meta-analyses—especially decision makers—must carefully assess the quality of each meta-analysis by considering the research questions asked, the methods used, the analysis and interpretation of the data, the investigation of heterogeneity, and the conclusions drawn.
Several instruments for assessing the quality of a systematic review have been developed.15 It is important to differentiate between the quality of the reporting of a meta-analysis and the quality of the meta-analysis itself. The report may, for a number of reasons (eg, space limits, author preferences), omit important information. The QUOROM statement offers a checklist of evidence-based standards for reporting the results of meta-analyses of randomized trials.6 The QUOROM checklist—which identifies 18 key items to be included in a report and explains how to describe them—is widely used in the reporting and the evaluation of published meta-analyses. The Meta-analyses of Observational Studies in Epidemiology (MOOSE) is a similar checklist that is used to guide the reporting of meta-analyses of observational studies.18
Over the past 20 years there has been tremendous growth in the number of systematic reviews and meta-analyses. A recent study investigated the average length of time until a published systematic review requires updating.19 The authors searched the relevant literature to determine whether new evidence had been published that would either invalidate the results of a previous systematic review or would affect clinical decision-making in an important way. In almost 60% of the systematic reviews, the evidence showed that the previously published review required updating. The average length of time from publication to the emergence of new evidence was about 5.5 years, with 25% of studies becoming outdated by 2 years. This suggests that meta-analyses may have a relatively short "shelf-life,"20 and that the longer it has been since publication, the more important it is to assess whether new evidence has emerged that may change that study's conclusions.
Conclusions
Meta-analysis is a sophisticated tool for decision makers. As with all medical evidence, however, systematic reviews and meta-analyses should be regarded with due skepticism and be read critically, focusing on the elements discussed here. As meta-analyses begin to address the development of pooled estimates of cost, adverse effects, and comparative effectiveness (in addition to efficacy and effectiveness), their relevance to our understanding of how to define and measure value in healthcare will continue to grow.
Dr Shah has received unrestricted research grants from GlaxoSmithKline, Novartis, AstraZeneca, Roche, Berlex, and Pfizer. Mr Jones and Dr Blecker have nothing to disclose.
References
1. Shah NR. What is the best evidence for making clinical decisions? JAMA. 2000 Dec 27;284(24):3127-3128.
2. Cook DJ, Mulrow CD, Haynes RB. Systematic reviews: synthesis of best evidence for clinical decisions. Ann Intern Med. 1997 Mar 1;126(5):376-380.
3. Greenhalgh T. Papers that summarise other papers (systematic reviews and meta-analyses). BMJ. 1997 Sep 13;315(7109):672-675.
4. Hopewell S, McDonald S, Clarke M, Egger M. Grey literature in meta-analyses of randomized trials of health care interventions. Cochrane Database of Systematic Reviews 2003, Issue 4. Art No: MR000010. DOI: 10.1002/14651858.MR000010.pub3.
5. Egger M, Smith GD, Phillips AN. Meta-analysis: principles and procedures. BMJ. 1997 Dec 6;315(7121):1533-1537.
6. Moher D, Cook DJ, Eastwood S, et al. Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement. Quality of Reporting of Meta-analyses. Lancet. 1999 Nov 27;354(9193):1896-1900.
7. Jadad AR, Moore RA, Carroll D, et al. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Control Clin Trials. 1996 Feb;17(1):1-12.
8. von Elm E, Altman DG, Egger M, et al, for the STROBE Initiative. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. J Clin Epidemiol. 2008 Apr;61(4):344-349.
9. Higgins JPT, Green S, eds. Cochrane Handbook for Systematic Reviews of Interventions 4.2.6 [updated September 2006]. In: The Cochrane Library, Issue 4, 2006. Chichester, UK: John Wiley & Sons, Ltd. http://www2.cochrane.org/resources/handbook/Handbook4.2.6Sep2006.pdf.
10. Lewis S, Clarke M. Forest plots: trying to see the wood and the trees. BMJ. 2001 Jun 16;322(7300):1479-1480.
11. Shah NR, Borenstein J, Dubois RW. Postmenopausal hormone therapy and breast cancer: a systematic review and meta-analysis. Menopause. 2005;12(6):668-678.
12. Higgins JP, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ. 2003 Sep 6;327(7414):557-560.
13. Shuster JJ, Jones LS, Salmon DA. Fixed vs random effects meta-analysis in rare event studies: the rosiglitazone link with myocardial infarction and cardiac death. Stat Med. 2007 Oct 30;26(24):4375-4385.
14. The "file drawer" phenomenon: suppressing clinical evidence [editorial]. CMAJ. 2004 Feb 17;170(4):437.
15. Egger M, Smith GD, Altman DG, eds. Systematic Reviews in Health Care: Meta-Analysis in Context. 2nd ed. London, UK: BMJ Books; 2001.
16. Peters JL, Sutton AJ, Jones DR, Abrams KR, Rushton L. Comparison of two methods to detect publication bias in meta-analysis. JAMA. 2006 Feb 8;295(6):676-680.
17. Ioannidis JP, Trikalinos TA. The appropriateness of asymmetry tests for publication bias in meta-analyses: a large survey. CMAJ. 2007 Apr 10;176(8):1091-1096.
18. Stroup DF, Berlin JA, Morton SC, et al. Meta-analysis of observational studies in epidemiology: a proposal for reporting. Meta-analysis Of Observational Studies in Epidemiology (MOOSE) group. JAMA. 2000 Apr 19;283(15):2008-2012.
19. Shojania KG, Sampson M, Ansari MT, et al. How quickly do systematic reviews go out of date? A survival analysis. Ann Intern Med. 2007 Aug 21;147(4):224-233.
20. Jones JB, Shah NR. Meta-analyses: caveat lector. J Clin Outcomes Manag. 2008 Feb;15(2):61-62.