NeuroImage, Volume 45, Issue 1, Supplement 1, March 2009, Pages S210-S221
Review
Evaluating the consistency and specificity of neuroimaging data using meta-analysis

https://doi.org/10.1016/j.neuroimage.2008.10.061

Abstract

Making sense of a neuroimaging literature that is growing in scope and complexity will require increasingly sophisticated tools for synthesizing findings across studies. Meta-analysis of neuroimaging studies fills a unique niche in this process: It can be used to evaluate the consistency of findings across different laboratories and task variants, and it can be used to evaluate the specificity of findings in brain regions or networks to particular task types. This review discusses examples, implementation, and considerations when choosing meta-analytic techniques. It focuses on the multilevel kernel density analysis (MKDA) framework, which has been used in recent studies to evaluate consistency and specificity of regional activation, identify distributed functional networks from patterns of co-activation, and test hypotheses about functional cortical-subcortical pathways in healthy individuals and patients with mental disorders. Several tests of consistency and specificity are described.

Introduction

Recent years have seen a rapid increase in the number and variety of investigations of the human brain using neuroimaging techniques. Functional magnetic resonance imaging (fMRI) and positron emission tomography (PET) have emerged as major methodologies for investigating function in the intact and disordered human brain. The psychological processes under investigation are as diverse as psychology itself, and nearly every major domain of psychology is represented in this growing body of work. Many popular domains–such as cognitive control, working memory, decision-making, language, emotion, and disorders such as attention deficit disorder, schizophrenia, and depression–have been the subject of a large number of neuroimaging studies, whose results can be synthesized and interpreted in the context of data from lesion studies, electrophysiology, behavioral studies, and related methodologies.

This burgeoning literature places increasing demands on scientists to understand, integrate, and evaluate the neuroimaging work that has been performed in each of these areas. One important set of questions concerns the consistency of activated regions, that is, their replicability across laboratories, scanners, and task variants. Which brain regions are consistently associated with domains such as working memory, decision-making, and emotional experience? And where are the boundaries of those functional regions, against which we can judge whether studies do or do not replicate one another?

Another important set of questions relates to synthesis across areas of study, and in particular the specificity of particular activation patterns for particular psychological processes. Is a region typically related to working memory load unique to this domain, or is it shared by a broader set of cognitive demands? For example, some brain regions, such as the left inferior frontal gyrus, are characterized variously as “working memory regions,” “language regions,” “emotion regulation regions,” or “decision making regions”, depending on the functional domain being investigated. Before positing hypotheses about common underlying functions, it is important to establish whether these different researchers are discussing the same region, or whether nearby activations in different task domains can be reliably discriminated.

Meta-analysis fills a unique role in the neuroimaging literature because many of the important, fundamental questions posed above are difficult or impossible to address within individual studies. Therefore, a major use for meta-analysis in imaging is to identify the consistently activated regions across a set of studies. In Fig. 1A, for example, a number of reported activation peaks from many studies (top panel) are summarized in a meta-analysis of consistently reported regions (bottom panel).

Evaluating consistency is important because false positive rates in neuroimaging studies are likely to be higher than in many fields (somewhere in the range of 10–40%; see below). Thus, some of the reported activated locations shown in Fig. 1A are likely to be false positives, and it is important to assess which findings have been replicated and have a higher probability of being real activations.

Inflated false positive rates are a byproduct of the common strategy of constructing statistical brain maps composed of many tests (“voxel-wise” mapping), combined with the small sample sizes (usually 8–25 participants) dictated by the considerable expense of neuroimaging. Although there is a trend towards larger samples and more rigorous multiple comparisons correction, until recently most studies did not correct for multiple comparisons because they were underpowered (Nichols and Hayasaka, 2003, Wager et al., 2007a). Many studies that did correct used methods whose assumptions are likely to be violated, or ad hoc methods that do not control the false positive rate at the nominally specified level (Wager et al., 2007b). Gene arrays in genetic research have the same problem, for the same reasons; in both fields, the benefits of making whole-brain or whole-genome maps make them preferred choices for many researchers.

Data that illustrate these issues are shown in Table 1, which summarizes the results of four meta-analyses on a total of 415 studies involving 4,856 participants. The meta-analyses were all performed using the same method, multilevel kernel density analysis (MKDA; Kober et al., 2008, Wager et al., 2008; data from Wager and Smith, 2003, were also reanalyzed using MKDA). The median sample sizes of the included studies range from N = 10 to N = 12 across the studies of working memory, long-term memory, and emotion. A basic power calculation (see Fig. 12 in Wager et al., in press) shows that with a standard effect size of d = 1 (Cohen's d, an effect divided by its standard deviation), approximately 45 participants are required to achieve 80% power using Bonferroni correction in a typical whole-brain, voxel-wise analysis. Correction methods such as those based on Gaussian random field theory are often just as conservative, but nonparametric correction improves power substantially (Nichols and Hayasaka, 2003). With nonparametric correction, only about 30 participants are needed for 80% power (Wager et al., in press), though this sample size is still larger than all but the largest studies in our samples. Thus, performing proper correction is impractical without relatively large sample sizes, but failing to make appropriate corrections leads to increased false positive rates.
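To make this power calculation concrete, here is a minimal sketch of the whole-brain case, assuming a two-sided one-sample t-test at each of roughly 100,000 in-brain voxels; the voxel count and test type are illustrative assumptions rather than specifications from the cited analyses.

```python
# Minimal power-calculation sketch: smallest N giving >= 80% power for
# Cohen's d = 1 with Bonferroni correction over an assumed 100,000 voxels.
import numpy as np
from scipy import stats

def power_one_sample_t(n, d, alpha):
    """Power of a two-sided one-sample t-test for effect size d at level alpha."""
    df = n - 1
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    nc = d * np.sqrt(n)                    # noncentrality parameter
    return 1 - stats.nct.cdf(t_crit, df, nc)

alpha_bonf = 0.05 / 100_000                # Bonferroni-corrected voxel-wise threshold
for n in range(5, 100):
    if power_one_sample_t(n, d=1.0, alpha=alpha_bonf) >= 0.80:
        print(f"approximately {n} participants needed for 80% power")
        break
```

Under these assumptions the loop stops in the mid-40s, in line with the figure of approximately 45 participants cited above; relaxing the correction (e.g., nonparametric thresholds) lowers the requirement accordingly.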

The MKDA results can also be used to provide a rough estimate of false positive rates. For each meta-analysis in Table 1, we calculated the number and proportion of peaks reported near (within 10 mm of) one of the regions identified as consistently activated in the meta-analysis. The proportion of peaks outside these consensus regions provides a rough estimate of the false positive rate across studies. Table 1 shows an estimated false positive rate of around 10% for the larger databases, and 20%–40% for the smaller meta-analyses, which may have been underpowered and therefore failed to identify additional truly activated regions. Of course, there are a number of reasons why this figure is imprecise: false positives could contribute to consistently activated regions, and heterogeneity among studies could produce true positives outside the regions found to be significant in meta-analyses. Even so, it provides a rough sense of how large the false-positive problem may be. Using another method based on examining the estimated search space, thresholds, and image smoothness, we previously estimated a false positive rate of roughly 17% (Wager et al., 2007a, Wager et al., 2007b). In sum, there is a clear need to integrate and validate results across studies.
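The peak-counting logic behind this estimate can be sketched in a few lines; the coordinates below are hypothetical placeholders, not data from Table 1.

```python
# Minimal sketch of the false-positive estimate: the fraction of reported
# peaks lying more than 10 mm from every consensus (consistently activated)
# region center. All coordinates are hypothetical.
import numpy as np

peaks = np.array([[-42, 24, 18], [40, 22, 20], [2, 50, -8]])   # reported peaks (mm)
consensus = np.array([[-44, 22, 16], [38, 20, 22]])            # consensus centers (mm)

# distance from every peak to every consensus center (n_peaks x n_centers)
dists = np.linalg.norm(peaks[:, None, :] - consensus[None, :, :], axis=2)
outside = dists.min(axis=1) > 10.0                             # > 10 mm from all centers
print(f"estimated false positive rate: {outside.mean():.0%}")  # 33% for these toy data
```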

The simplest goal of a meta-analysis is to summarize the consistency of regional brain activation across a set of studies of a particular task type, providing a consensus about which regions are likely to be truly activated by that task. Meta-analysis can also be extended beyond regional activation to identify groups of consistently co-activated regions that may form spatially distributed functional networks in the brain. We have used this approach to identify distributed groups of functionally related brain regions in emotion (Kober et al., 2008) and anxiety-related disorders (Etkin and Wager, 2007), and other groups have used similar approaches to identify large-scale functional networks organized around particular cognitive processes (Neumann et al., 2005) or functional co-activation with target brain structures across many tasks (Postuma and Dagher, 2006). Identifying co-activated networks can provide the basis for testing them as units of analysis in individual studies, and can lead to the development of testable hypotheses about functional connectivity in specific tasks.

In addition to establishing consistent activation in one task type, meta-analysis can be used to evaluate the specificity of activation (in regions or ‘networks’) to one type of task among a set of different tasks. For example, one might identify a set of regions consistently activated by self-referential processes (Northoff et al., 2006), and then ask whether activity in these regions is specific to self-referential processes—that is, that they are not activated by other tasks that do not involve self-reference. This information is critical to using measures of brain activity to predict psychological processes (i.e., making a “reverse inference” that activity in some region implies the involvement of a given psychological process; Poldrack, 2006, Sarter et al., 1996).

Specificity can only be examined across a range of tested alternative tasks: a region that is specific for faces compared with houses may not be specific for faces compared with tools. Likewise, showing that a region discriminates self-referential word judgments from non-self-referential ones does not imply that it discriminates self-referential processing from retrieval of semantic knowledge from long-term memory. Unfortunately, different psychological domains are usually studied in isolation, and it is virtually impossible to compare a wide range of tasks in a single study. However, meta-analysis provides tools for doing exactly that: activation patterns can be compared across the entire range of tasks studied using neuroimaging techniques, providing a unique way to evaluate activation specificity across functional domains.

The simplest kind of specificity analysis compares activation patterns among two or more task types, such as positive vs. negative emotion (Phan et al., 2002, Wager et al., 2003), high-conflict vs. low-conflict conditions in cognitive control tasks (Nee et al., 2007), or various types of executive demand in working memory tasks (Wager and Smith, 2003). Many more examples appear in the growing meta-analysis literature, some of which are referenced in Table 2.

However, it is also possible to compare meta-analytic results across a number of functional domains, such as the five task types shown in Fig. 1. In a recent chapter (Van Snellenberg and Wager, in press), we examined the overlap in meta-analytic results among studies that isolated cognitive control processes (e.g., task switching and speeded response selection) and studies that involved maintenance of information in working memory (WM), including WM storage, the subtraction [Executive WM − WM storage], and long-term memory encoding and retrieval. Our working hypothesis was that the more complex memory maintenance and manipulation tasks would involve task switching and response selection, and so would activate a super-set of the areas involved in more elementary cognitive control processes. The illustration in Fig. 1B supports this notion, showing that the inferior frontal junction and pre-supplementary motor area were consistently activated across studies within each task type, but that more rostral portions of the prefrontal cortex were consistently activated only when WM was involved. The most anterior prefrontal regions were activated only when manipulation of information in memory was required.

Whereas the results in Fig. 1 present a qualitative comparison across five task types, summarizing commonalities and differences among them, quantitative analyses of specificity can also be performed using several other methods discussed below. These methods include χ2 (chi-square) tests and approximations to multinomial exact tests, analysis of differences in the density of reported peaks, and pattern classifier systems. In each analysis, formal predictions can be made about task types given patterns of brain activity. For example, in a particularly interesting application using meta-analytic data, Gilbert et al. (2006) used classifier analyses to identify regions within the medial and orbitofrontal cortices that discriminated different cognitive functions of the anterior frontal cortex. This study is an example of how formal predictions about psychological states can be tested across diverse kinds of studies using meta-analysis.
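As an illustration of the classifier approach (a generic sketch, not Gilbert et al.'s actual procedure), the code below predicts study type from binary region-activation indicators and evaluates accuracy by cross-validation; the data, region count, and choice of a Bernoulli naive Bayes model are all assumptions made for the example.

```python
# Minimal sketch of meta-analytic classification: predict task type from
# binary region-activation indicators. All data here are simulated.
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_studies, n_regions = 60, 5
y = rng.integers(0, 2, n_studies)                    # two task types (0/1)
X = rng.random((n_studies, n_regions)) < 0.3         # background activation rate
X[:, 0] = rng.random(n_studies) < np.where(y == 1, 0.8, 0.2)  # region 0 is informative

scores = cross_val_score(BernoulliNB(), X, y, cv=5)  # 5-fold cross-validation
print(f"mean classification accuracy: {scores.mean():.2f}")
```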

There are now a number of quantitative meta-analyses of neuroimaging data in the literature, as evidenced by the partial list in Table 2. The vast majority use reported peak coordinates from published studies, which are readily available in published papers and are stored electronically in databases such as BrainMap (http://brainmap.org/). We refer to this as the “coordinate-based meta-analysis” approach. Alternatively, full statistic maps for each study could be used and effect sizes aggregated at each voxel (Lewis, 2006). Though we consider this the “gold standard” approach and advocate its development in future meta-analytic work, it is complicated by the lack of readily available statistic images.

Collectively, the coordinate-based meta-analysis literature to date covers a cornucopia of innovative techniques. Some meta-analyses evaluate consistency by combining effect size data (Van Snellenberg et al., 2006) or analyzing the frequencies of reported peaks (Phan et al., 2002) within anatomically defined regions of interest. Variants on this theme use multiple logistic regression (Kosslyn and Thompson, 2003, Nee et al., 2007) or summarize co-activations among regions (Etkin and Wager, 2007, Nielsen et al., 2004). A popular approach to examining specificity has been to analyze the locations of coordinates in stereotaxic space, testing for functional gradients or spatial distinctions (Gottfried and Zald, 2005, Joseph, 2001), and sometimes extending these analyses to perform space-based classification of study types using MANOVA (Joseph, 2001, Wager et al., 2004) or cluster analyses using χ2 tests (Nickel and Seitz, 2005, Northoff et al., 2006, Wager and Smith, 2003).

While the procedures above refer to analyses carried out on pre-defined anatomical areas, the most popular approaches for summarizing reported coordinates from neuroimaging studies involve so-called “voxel-wise” analyses: the construction of statistical maps summarizing peak coordinates in a neighborhood around each voxel in a standard brain (Chein et al., 2002, Fox et al., 1999). At their heart, these approaches are related to kernel-based methods for analyzing the multivariate distributions of sparse data, and they essentially summarize the evidence for activation in a local neighborhood around each voxel in a standard atlas brain. They are popular because they summarize activations across the brain without imposing rigid prior constraints based on anatomical boundaries, which are currently difficult to specify precisely.

Our goal in the remainder of this paper is to describe recent advances and applications of this kernel-based approach. We focus in particular on MKDA, a recently developed extension of voxel-wise meta-analysis approaches such as activation likelihood estimation (ALE; Laird et al., 2005, Turkeltaub et al., 2002) and kernel density analysis (KDA; Wager et al., 2007b). The essence of the approach is to reconstruct a map of significant regions for each study (or statistical contrast map within a study), and to analyze consistency and specificity across studies in the neighborhood of each voxel.
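The reconstruction step can be illustrated with a short sketch, loosely modeled on MKDA (the real method also weights study maps and applies thresholding, as discussed in Section 1): each study's peaks are convolved with a 10-mm spherical kernel to form a binary indicator map, and the maps are averaged to give the proportion of studies activating near each voxel. The grid size, radius, and coordinates are illustrative assumptions.

```python
# Minimal MKDA-style sketch: spherical-kernel indicator maps per study,
# averaged into a proportion-of-studies density map. Toy grid and peaks.
import numpy as np

def indicator_map(peaks_mm, shape=(40, 40, 40), voxel_size=2.0, radius_mm=10.0):
    """Binary map that is 1 within radius_mm of any reported peak."""
    grid = np.indices(shape).reshape(3, -1).T * voxel_size   # voxel centers in mm
    out = np.zeros(grid.shape[0], dtype=bool)
    for peak in peaks_mm:
        out |= np.linalg.norm(grid - peak, axis=1) <= radius_mm
    return out.reshape(shape)

# two hypothetical study contrast maps, each a list of peak coordinates (mm)
study_peaks = [
    [np.array([40.0, 40.0, 40.0])],
    [np.array([42.0, 36.0, 38.0]), np.array([10.0, 20.0, 60.0])],
]
maps = np.stack([indicator_map(p) for p in study_peaks]).astype(float)
density = maps.mean(axis=0)   # proportion of studies activating each voxel
print(density.max())          # 1.0 where the two studies' spheres overlap
```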

In Section 1, we describe how MKDA can be used to evaluate the consistency of activations. We consider issues of level of analysis (peak vs. study contrast map), weighting, thresholding, and multiple comparisons, and show the results of simulations comparing the ALE, KDA, and MKDA methods. We also show how this approach lends itself to the construction of analogues of plots from the traditional meta-analytic literature, in particular logistic funnel plots. In Section 2, we consider how MKDA can be used to analyze specificity, using (a) density difference maps to compare activations in two types of studies, and (b) a multinomial permutation test (an alternative to the χ2 test with several desirable properties) for comparing two or more study types. Finally, in Section 3, we describe extensions of the MKDA approach to the analysis of co-activations across regions, including clustering and mediation analysis on co-activation data to develop models of functional pathways.
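As a preview of the specificity tests described in Section 2, the sketch below implements a simple label-permutation test in the spirit of the density-difference analysis: study-type labels are shuffled to build a null distribution for the difference in activation proportions at a single voxel. The activation indicators are hypothetical.

```python
# Minimal permutation test sketch: is the proportion of studies activating
# a voxel's neighborhood different between two study types? Toy data.
import numpy as np

rng = np.random.default_rng(0)
activated = np.array([1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0])  # 1 = study activated voxel
task_type = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])  # study-type labels

def prop_diff(act, labels):
    """Difference in activation proportion between the two study types."""
    return act[labels == 0].mean() - act[labels == 1].mean()

observed = prop_diff(activated, task_type)
null = np.array([prop_diff(activated, rng.permutation(task_type))
                 for _ in range(10_000)])
p_value = (np.abs(null) >= abs(observed)).mean()   # two-sided permutation p
print(f"observed difference {observed:+.2f}, p = {p_value:.3f}")
```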

Section 1. The MKDA approach

The MKDA method analyzes the distribution of peak coordinates from published studies across the brain. The technique, used in several recent published analyses (Etkin and Wager, 2007, Kober et al., 2008, Wager et al., 2008, Wager et al., 2007b), is summarized in Fig. 2. Essentially, the reported x (left–right), y (posterior–anterior), and z (inferior–superior) coordinates in a standard stereotaxic space (i.e., Montreal Neurological Institute space) are treated as a sparse representation of the underlying pattern of activated regions in each study.

Conflict of interest statement

The authors declare that there are no conflicts of interest.

Acknowledgments

This research and the preparation of this manuscript were supported in part by National Science Foundation grant SES631637 and National Institute of Mental Health grant R01MH076136 to Tor D. Wager. We would like to thank Lisa Feldman Barrett for helpful discussions on multilevel aspects of meta-analysis, and Lisa Feldman Barrett, Eliza Bliss-Moreau, John Jonides, Kristen Lindquist, Derek Nee, and Edward Smith for their contributions to the meta-analysis datasets presented here.

References

  • Agresti, A. (2002). Categorical Data Analysis (2nd ed.).
  • An, X., et al. (1998). Prefrontal cortical projections to longitudinal columns in the midbrain periaqueductal gray in macaque monkeys. J. Comp. Neurol.
  • Brown, S., et al. (2005). Stuttered and fluent speech production: an ALE meta-analysis of functional neuroimaging studies. Hum. Brain Mapp.
  • Buchsbaum, B.R., et al. (2005). Meta-analysis of neuroimaging studies of the Wisconsin Card-Sorting task. Hum. Brain Mapp.
  • Costafreda, S.G., et al. (2006). A systematic review and quantitative appraisal of fMRI studies of verbal fluency: role of the left inferior frontal gyrus. Hum. Brain Mapp.
  • Dickson, J., et al. (2001). The surface management system (SuMS) database: a surface-based database to aid cortical surface reconstruction, visualization and analysis. Philos. Trans. R. Soc. Ser. B.
  • Dickstein, S.G., et al. (2006). The neural correlates of attention deficit hyperactivity disorder: an ALE meta-analysis. J. Child Psychol. Psychiatry.
  • Egger, M., et al. (1997). Bias in meta-analysis detected by a simple, graphical test. BMJ.
  • Etkin, A., et al. (2007). Functional neuroimaging of anxiety: a meta-analysis of emotional processing in PTSD, social anxiety disorder, and specific phobia. Am. J. Psychiatry.
  • Ferstl, E.C., et al. (2008). The extended language network: a meta-analysis of neuroimaging studies on text comprehension. Hum. Brain Mapp.
  • Hoekert, M., et al. (2007). Impaired recognition and expression of emotional prosody in schizophrenia: review and meta-analysis. Schizophr. Res.
  • Jobard, G., et al. (2003). Evaluation of the dual route theory of reading: a metanalysis of 35 neuroimaging studies. NeuroImage.
  • Kober, H., et al. (2008). Functional grouping and cortical-subcortical interactions in emotion: a meta-analysis of neuroimaging studies. NeuroImage.
  • Krain, A.L., et al. (2006). Distinct neural mechanisms of risk and ambiguity: a meta-analysis of decision making. NeuroImage.
  • Kringelbach, M.L., et al. (2004). The functional neuroanatomy of the human orbitofrontal cortex: evidence from neuroimaging and neuropsychology. Prog. Neurobiol.
  • McDonald, A.J., et al. (1996). Projections of the medial and lateral prefrontal cortices to the amygdala: a Phaseolus vulgaris leucoagglutinin study in the rat. Neuroscience.
  • Northoff, G., et al. (2006). Self-referential processing in our brain—a meta-analysis of imaging studies on the self. NeuroImage.
  • Phan, K.L., et al. (2002). Functional neuroanatomy of emotion: a meta-analysis of emotion activation studies in PET and fMRI. NeuroImage.
  • Poldrack, R.A. (2006). Can cognitive processes be inferred from neuroimaging data? Trends Cogn. Sci.
  • Steele, J.D., et al. (2007). Prefrontal cortical functional abnormality in major depressive disorder: a stereotactic meta-analysis. J. Affect. Disord.
  • Turkeltaub, P.E., et al. (2002). Meta-analysis of the functional neuroanatomy of single-word reading: method and validation. NeuroImage.
  • Valera, E.M., et al. (2007). Meta-analysis of structural imaging findings in attention-deficit/hyperactivity disorder. Biol. Psychiatry.
  • Van Essen, D.C. (2005). A population-average, landmark- and surface-based (PALS) atlas of human cerebral cortex. NeuroImage.
  • Vigneau, M., et al. (2006). Meta-analyzing left hemisphere language areas: phonology, semantics, and sentence processing. NeuroImage.
  • Wager, T.D., et al. (2003). Valence, gender, and lateralization of functional brain anatomy in emotion: a meta-analysis of findings from neuroimaging. NeuroImage.
  • Wager, T.D., et al. (2004). Neuroimaging studies of shifting attention: a meta-analysis. NeuroImage.
  • Whiteside, S., et al. (2004). A meta-analysis of functional neuroimaging in obsessive-compulsive disorder. Psychiatry Res.: Neuroimaging.