The Elephant in the Room: A Systematic Review of Stimulus Control in Neuro-Measurement Studies on Figurative Language Processing

Koller, Sina; Müller, Nadine; Kauschke, Christina

doi:10.3389/fnhum.2021.791374

SYSTEMATIC REVIEW article

Front. Hum. Neurosci., 21 January 2022
Sec. Speech and Language
Volume 15 - 2021 | https://doi.org/10.3389/fnhum.2021.791374

Sina Koller

Nadine Müller^*

Christina Kauschke

Department of German Studies and Arts, Institute of German Linguistics, Philipps University of Marburg, Marburg, Germany

The processing of metaphors and idioms has been the subject of neuroscientific research for several decades. However, results are often contradictory, which can be traced back to inconsistent terminology and stimulus control. In this systematic review of research methods, we analyse linguistic aspects of 116 research papers which used EEG, fMRI, PET, MEG, or NIRS to investigate the neural processing of the two figurative subtypes metaphor and idiom. We critically examine the theoretical foundations as well as stimulus control by performing a systematic literature synthesis according to the PRISMA guidelines. We explicitly do not analyse the findings of the studies but instead focus on four primary aspects: definitions of figurative language and its subtypes, linguistic theory behind the studies, control for factors influencing figurative language processing, and the relationship between theoretical and operational definitions. We found both a lack and a broad variety in existing definitions and operationalisation, especially in regard to familiarity and conventionality. We identify severe obstacles in the comparability and validation potential of the results of the papers in our review corpus. We propose the development of a consensus in fundamental terminology and more transparency in the reporting of stimulus design in the research on figurative language processing.

Introduction

Our everyday language is infused with figurative expressions: when our lives turn into a roller-coaster ride, we need to keep a clear head and find a steady path again. We might even form close relationships with people with warm personalities along the way, treasuring them for their big hearts. And should we come across any more obstacles, we can take them in stride and look at the bright side of life.

The large number of figurative expressions in language conveys information on a multitude of communicative levels, e.g., affective, intentional, or simple factual messages. Therefore, the comprehension and utilisation of figurative language play an essential role in interpersonal communication, and impairments in figurative language processing and production may lead to substantial problems with social competence (e.g., Kauschke, 2021) or mental health (Cohen et al., 2013), even though problems with figurative language can exist in the absence of any other verbal problems. Impaired figurative language processing is documented for several clinical populations that all present some kind of structural and/or functional brain deviations, including people with neurodegenerative, psychiatric, and neurodevelopmental disorders (Thoma and Daum, 2006) as well as patients with acquired brain trauma. Research on the structural and functional cerebral conditions of figurative language processing can therefore also serve to better understand higher order language impairments. However, results of neuro-measurement studies on figurative language processing are often contradictory due to discrepancies in definitions, terminology, and practical implementation.

A central issue in figurative language research is that there is no universally agreed-upon definition of the term “figurative language” or its subtypes, which makes it harder to pinpoint exactly what kind of language has been researched in numerous studies. In principle, figurative language represents the counterpart to literal language: the meaning intended by a speaker is not equivalent to the literal meaning of the expression. An addressee must therefore realise the inadequacy of a literal meaning to a given context, a situation or pre-existing world knowledge, making linguistic violations on a pragmatic and/or semantic level a core defining feature of figurative language (cf. Thoma and Daum, 2006).

The nature of the contrast between figurativeness and literality is a contentious point; the core issue is whether figurativeness and literality are inherently distinct, exclusive categories or whether they represent the opposing ends of a continuum (Kasparian, 2013). On the one hand, the assumption of distinct categories raises the question of how to clearly distinguish between the two: where exactly does the boundary between figurative and literal lie, and what exactly characterises it? On the other hand, a localisation on a continuum allows for a smooth transition between the extremes and grants a certain dynamic, potentially developmental character to any expression.

In recent decades, neuroscientific research has supplemented psychological and behavioural investigations, especially by employing neuro-imaging techniques. A key challenge of research on such a highly complex and responsive organ as the brain is stimulus control. However, the materials and methods in many studies are often not sufficiently described and are characterised by broad inconsistencies in definitions, terminology, and implementation, resulting in likewise inconsistent findings. The present paper aims to critically review the linguistic aspects of the research methods of functional neuro-measurement studies on figurative language, specifically on metaphor and idiom. We closely investigated the current state of research by taking a detailed look at the theoretical foundations the research papers are built upon, and by analysing stimulus design and control. As a systematic review of methods, our review explicitly does not pursue a comparison of results. We synthesised the literature to perform a quantitative and qualitative evaluation of the past and current methods with the goal of facilitating clearer, more consistent and less ambiguous research methods, enabling better comparability and validation, and advancing collective comprehension and research approaches.

Figurative Language

Figurative language serves as an umbrella term for two main categories: phrasemes and free non-literal word compositions. Phrasemes are characterised by three main criteria: polylexicality, rigidity, and idiomaticity (Burger, 2003). They are fixed, structurally non-dynamic n-grams whose meaning is not congruent with the sum of the literal meanings of their constituents. The rigid sequence of words is rarely modified; exceptions may for instance occur in cases of inflection. Some cognitive models assume that phrasemes are not stored as a combination of their single components but instead as whole lexical units (Burger, 2003). Phrasemes include idioms and proverbs.

In contrast, free non-literal word compositions are not subject to rigid structure. For the subtypes of metaphor and metonymy, they constitute expressions of subconscious conceptualisation which are of a generative nature and can therefore continuously generate new expressions. In the case of the subtypes irony and sarcasm, they are utterances whose figurativeness is only spontaneously created through context and the intended evaluation of a speaker (Klappenbach and Malige-Klappenbach, 1980). The distinctions between metaphor and other figurative subtypes are not universally agreed upon, leading to varying operational definitions in empirical studies. In the following, we aim to characterise core features in order to arrive at working definitions for our review, which includes studies on metaphor and idiom processing only. These two subtypes constitute the majority of figurative language examined in neuro-scientific research, therefore representing a solid basis for our review. For an overview of working definitions of figurative language subtypes other than metaphor and idiom, see Supplementary Table S1¹.

Subtypes of Figurative Language: Metaphor and Idiom

Metaphor

Cognitive linguistics considers metaphor a cognitive means of conceptualising abstract issues by means of concrete experiences. In 1980, Lakoff and Johnson (1980) introduced their Conceptual Metaphor Theory (CMT), which will provide the terminology for the present paper.

The CMT is based on the embodiment hypothesis: “The detailed nature of our bodies, our brains, and our everyday functioning in the world structures human concepts and human reason” (Lakoff and Núñez, 2009, p. 5). This embodied experience of our bodies in a three-dimensional environment determines human learning processes which continuously build on each other; thought is therefore not inherently abstract or independent from our bodies (Kövecses, 2010).

In order to understand internal and external sensations and to be able to interact adequately with ourselves and the environment, a certain cognitive structuring mechanism is required. The stimulations we experience naturally vary in their complexity—compare, for example, the warmth and tactility of a freshly made cookie in our hand with the complexity and abstractness of a political debate.

In order to facilitate comprehension of complex and not primarily bodily grounded concepts, conceptual metaphors link these concepts with simpler, less abstract concepts and therefore act as a subconscious mechanism of conceptualising our experiences through our environment and self. In principle, a conceptual metaphor links a concrete source domain (e.g., machine) with a more abstract target domain (e.g., mind) by mapping relevant elements of the source domain onto elements of the target domain. The resulting basic form of a conceptual metaphor can therefore be phrased as “A is B” (“the mind is a machine”), where A is the target domain and B is the source domain. The conceptual metaphor, a cognitive mechanism, now generates actual expressions on the linguistic surface (“that's what makes her tick,” “the holidays allowed them to refuel,” “you can watch the wheels turning in his head”). The abstract target domain (mind) is made accessible by the tangible source domain (machines) by using the pre-existing comprehension and retrievable experiences of the source domain for conceptualisation (Lakoff and Johnson, 1980; Kövecses, 2010).

Conceptual metaphors are of a creative, generative nature: the conceptualisation mechanism can steadily give rise to new linguistic expressions. These novel metaphors can then be conventionalised through continuous usage within a language community. Bowdle and Gentner (2005) describe this potential development in their model of the Career of Metaphor: the conventionalisation of a novel metaphor is a gradual process at whose end the metaphor can even become a “dead metaphor,” i.e., a metaphor that has entirely lost its figurative character and has become lexicalised (Schmidt et al., 2009). Words like “table leg” or “laptop” are lexicalised as complete units and do not require any mappings for comprehension. Not every metaphor completes the career of metaphor: some are only conventionalised to a certain degree, and many never reach this point or disappear from common language usage.

For the purposes of this review, we define metaphors as free non-literal expressions which are not subject to rigid structure and which follow the notion of conceptual metaphor after Lakoff and Johnson's CMT (Lakoff and Johnson, 1980), i.e., metaphors as a conceptualisation mechanism with a source and a target domain.

Idiom

Both metaphor and idiom are highly frequent in colloquial language usage (Gibbs and Beitel, 1995; Thoma and Daum, 2006) and represent a group of expressions whose figurative meaning is not composed of the literal meanings of their constituents. However, idioms are a subclass of phrasemes, i.e., they do not follow the usual linguistic-productive rules and are not a creative-generative class (Dobrovol'skij, 1995).

Idioms are not a uniform class but can instead differ in many aspects. One of those is non-compositionality, i.e., the non-additivity of the meanings of single constituents from the perspective of the total meaning of the idiom (Dobrovol'skij, 1995). The figurative meaning of semantically non-transparent/opaque idioms cannot be extracted from the literal meanings of its constituents (e.g., “kicking the bucket”); semantically transparent idioms however contain components in their literal meaning (e.g., “pouring money down the drain”; Canal et al., 2017). Some idioms can be understood through transferred metaphorical comprehension (“putting one's cards on the table,” “taking something in stride”), which reveals the possibility of overlap with highly conventionalised metaphors. Another dimension characterising idioms is the degree of their literal interpretability (literality). Idioms such as “being on thin ice” or “a piece of cake” do allow for a literal interpretation, although it will seem unsuitable in most contexts. Idioms such as “the elephant in the room” or “raining cats and dogs” however refer to unrealistic or entirely impossible scenarios, giving stronger indication for an intended non-literal meaning.

The two primary dimensions characterising idioms and distinguishing them from metaphors, however, are syntactic stability and conventionality. Idioms are generally considered conventionalised (Desai et al., 2013; Canal et al., 2017); some authors go as far as equating idioms with dead metaphors (Mashal et al., 2014) or describing metaphors as a subgroup of idioms (Rapp and Wild, 2011). However, idioms do not necessarily have to be of a metaphorical nature. Idioms also generally possess the rigid syntactic structure of phrasemes: they are highly collocating n-grams. It is often argued that their meaning is learned as a whole and stored as a unit in the mental lexicon (Gibbs and Beitel, 1995); other approaches however propose different models (cf. Mashal et al., 2014; Canal et al., 2017). Consequently, we use the following working definition: idioms are conventional multi-word expressions with a rigid syntactic structure whose meaning cannot be derived from the meanings of their single constituents.

Neural Processing of Figurative Language

The specifics of the neural processes underlying figurative language processing are the subject of considerable debate (cf. Thoma and Daum, 2006; Bohrn et al., 2012; Kasparian, 2013; Wang and He, 2013; Diaz and Eppes, 2018). A primary issue in the research on the cerebral localisation of figurative language processing is the specialisation of the hemispheres, and research on finer localisation has supplemented this focus. Functional neuro-measurement methods are able to visually display cerebral processes and allow for the neural investigation of online language processing, i.e., the processing of language at the point of measuring. Our review includes studies using functional magnetic resonance imaging (fMRI), electroencephalography (EEG), positron emission tomography (PET), magnetoencephalography (MEG), and near-infrared spectroscopy (NIRS).

Generally, in right-handed people the left hemisphere (LH) has been proven dominant for basic language processing but early studies reported a critical role of the right hemisphere (RH) for the understanding of metaphors (Winner and Gardner, 1977). The hypothesis of a special role of the RH was reinforced in the 1990s by Bottini et al. (1994) and by a divided visual field (DVF) study by Anaki et al. (1998). Both studies tested neurologically healthy participants and observed a dominance of the RH in the processing of metaphors. On the other hand, other studies could not find any special involvement of the RH (e.g., Rapp et al., 2004; Lee and Dapretto, 2006; Stringaris et al., 2007). A comparison of studies representing opposite hypotheses on the involvement of the RH reveals a fundamental problem: most of these studies are so different in their design, their material, and their execution that a general comparison is hardly possible (see below, also cf. Thoma and Daum, 2006; Bohrn et al., 2012; Kasparian, 2013).

Some models attempt to explain the potential lateralisation differences between literal and figurative language. The two most prominent among these models are the graded salience hypothesis (GSH, Giora, 1997) and the coarse semantic coding theory (CSCT, Beeman et al., 1994; Jung-Beeman, 2005). Both approaches share the assumption that it is not figurativeness itself but instead other characteristics that are the cause for hemisphere specialisation.

Giora's GSH considers salience the critical factor for hemispheric differences. Giora defines salience as a combination of familiarity, conventionality, frequency, and predictability of the meaning of an expression. Processing is therefore not determined by an objective contrast between literal and figurative but depends on the subjective context and previous contact with possible meanings. According to this hypothesis, the LH is responsible for the processing of salient meanings, while the RH is called upon for the processing of non-salient meanings. The figurative meaning of dead metaphors or already familiar idioms would be salient, the metaphorical meanings of unfamiliar metaphors would be non-salient.

The CSCT is based on the semantic-lexical network of an individual speaker: the theory attributes the responsibility of fine semantic coding to the LH, i.e., the activation of closely related word meanings and semantic features. The RH, on the other hand, activates weaker, more diffuse, big semantic fields and is therefore involved in the processing of ambiguities, synonyms, and more broadly related meanings. Since the meaning of figurative expressions, especially metaphors, is often semantically more distant than the literal meaning, these more broadly activated semantic fields are necessary—the semantic fields of single words of polylexical expressions overlap at critical points for relevant mappings, enabling the comprehension of the figurative meaning. Consequently, this increasingly recruits the RH for figurative language whose meaning is not part of the close semantic environment of its single constituents. Both approaches therefore agree that not all figurative expressions can be treated as a uniform collective, but instead have to be more finely distinguished. Furthermore, both models emphasise the subjectivity of language experience and the importance of controlling for possible influence factors.

Research on activation localisation is not only limited to the role of the hemispheres but also examines finer areas. In 2012, Bohrn et al. (2012) conducted a meta-analysis in which they collectively analysed the data of studies concerned with online figurative language processing. A predominant area proved to be the left inferior frontal gyrus (IFG) when contrasting figurative with literal language; the IFG appears to be more strongly involved in metaphor and idiom processing than in irony or sarcasm. Overall, a picture of a bilateral network with a dominance in the LH emerges: the bilateral IFG, temporal lobe, medial frontal gyrus and left amygdala show increased activation in the processing of figurative language (Bohrn et al., 2012). Bambini et al. (2011) also describe a bilateral network that includes the left angular gyrus and the anterior cingulum in addition to the bilateral IFG and superior temporal gyri. It is important to note that special activation for literal language but not for figurative language is reported in only about a third of the studies examined by Bohrn et al. (2012). This may point toward the processing of figurative language generally using the same network as the processing of literal language, but requiring additional cognitive resources. The cognitive load in language processing does not only depend on the distinction between figurative and literal, but is also influenced by a number of factors characterising the stimulus material.

Factors Influencing Figurative Language Processing

The successful processing of figurative language requires the integration of cognitive, affective, communicative, social, and linguistic information (Farnia, 2018). Our review will examine in detail how relevant studies control their stimuli for (psycho-)linguistic factors empirically. For this purpose, we will give our working definitions for the most prevalent influence factors that were shown to influence the neural response in figurative language research. In a first step, we collected all influence factors mentioned in several reviews on figurative language processing (Blasko and Connine, 1993; Thoma and Daum, 2006; Rapp and Wild, 2011; Bohrn et al., 2012; Rapp et al., 2012; Vartanian, 2012; Kasparian, 2013; Wang and He, 2013; Lundgren and Brownell, 2016; Diaz and Eppes, 2018). During the further literature analysis, the list was inductively extended by other factors frequently controlled for. All of the following influence factors were included as analysis factors in our review, serving as indicators for the depth, scope and implementation of stimulus control. We divided the influence factors into two categories: psycholinguistic factors, i.e., variables whose values are dependent on personal (linguistic) experience, and structural factors, e.g., syntactic complexity or length, which are intrinsic characteristics of linguistic stimuli and not dependent on individuals' perspectives.

Psycholinguistic Influence Factors

Valence

Emotional valence measures how pleasant (or positive) or unpleasant (or negative) a linguistic expression is perceived to be (Russell and Barrett, 1999). It therefore represents one part of affect, the conveyance of which is an important function of figurative language (Cardillo et al., 2012). Highly emotionally valenced words have been found to be processed with priority (especially positively valenced words, resulting in a “positivity superiority effect,” Lüdtke and Jacobs, 2015) and to elicit stronger event-related potential (ERP) components associated with emotional processing (cf. Citron et al., 2016a). Differently valenced expressions have also been shown to result in different activation patterns in both children and adults (Sylvester et al., 2021), and several studies found metaphors to be more emotional than literal expressions (Gibbs, 2002; Citron and Goldberg, 2014; Mohammad et al., 2016).

Arousal

Arousal joins the factor of valence as the second factor of affect. It measures the physiological activation caused by a stimulus, i.e., how “exciting” the stimulus is (Russell and Barrett, 1999). A verbal expression is therefore localised on two axes indicating its affectivity: valence encompasses negative and positive experience, while arousal indicates how stimulating, or intense, an expression is (Jacobs et al., 2015). Both factors have been found to behaviourally and neurally influence (figurative) language processing, specifically word processing (Kuperman et al., 2014; Kever et al., 2019; Pauligk et al., 2019).

Familiarity

Idioms, proverbs, and metaphors can be known or unknown to speakers—this subjective previous experience with figurative expressions is called familiarity (Schweigert, 1986; Titone and Connine, 1994). Familiarity is a crucial influence factor; the more experience a speaker has with a figurative expression, i.e., the more they hear it, read it, or use it themselves, the deeper it ingrains itself in their language usage and is integrated into the close semantic field of the single components. Given this close semantic relationship within one expression and the increased salience of familiar items according to the GSH, familiar figurative expressions are indicated to be processed more efficiently and directly than unfamiliar ones (cf. Schmidt and Seger, 2009). The term familiarity is often used synonymously with the term conventionality in the literature.

Conventionality

On the surface, conventionality may easily seem synonymous with familiarity as both terms refer to a certain degree of usualness. However, the two terms have to be distinguished clearly. Conventionality refers to the entrenchment of a figurative expression (proverb, idiom, metaphor) in the collective general language usage (Lai et al., 2009), which is enabled by frequent use by a significant number of speakers of a language community (Forgács et al., 2012; Goldstein et al., 2012). Consequently, conventionality does not carry an individual-subjective component but instead refers to the familiarity with an expression on the level of a speaker collective. To illustrate, consider non-native speakers: the German idiom jemanden auf den Arm nehmen (literally: “take somebody onto the arm,” meaning “to kid,” “to tease”) is conventional in German language usage but an English speaker learning German has not yet encountered the expression often enough (or at all) to become familiar with it. The learner has therefore now been inducted into a language community where the idiom is conventional, but it does not possess any individual familiarity for them. Citron et al. (2020b) indeed reported processing differences of conventional metaphors for L1 and L2 speakers, demonstrating the need for a careful distinction between individual familiarity and collective conventionality.

Frequency

The frequency of proverbs, idioms, and metaphors strongly correlates with familiarity; the more frequent an expression, the more familiar speakers tend to be with it (Rapp, 2005; Tanaka-Ishii and Terada, 2011). By definition, the frequency of metaphorical meanings cannot be measured objectively, which is why alternative means have to be found. Frequency has been operationalised as the frequency of occurrence in corpora, has been taken from normed databases, or has been rated subjectively; one must also distinguish between the frequency of entire polylexical compositions and the frequency of single words. Depending on the method of measurement, frequency has been used interchangeably with familiarity and conventionality (cf. Kasparian, 2013), leading to confounding of the respective factors. We use frequency of occurrence as our working definition.
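To make the distinction between the frequency of an entire polylexical composition and the frequency of a single word concrete, the following is a minimal sketch of counting frequency of occurrence in a corpus. The function names, the simple tokeniser, and the toy corpus are illustrative assumptions and are not taken from any study in the review corpus. Note that the count is based on surface form only, so literal and figurative uses of the same word sequence are not separated.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into word tokens (simple assumption)."""
    return re.findall(r"\w+(?:'\w+)?", text.lower())


def ngram_frequency(corpus_text, expression):
    """Return (raw count, occurrences per million tokens) of an expression."""
    tokens = tokenize(corpus_text)
    target = tokenize(expression)
    n = len(target)
    if n == 0 or len(tokens) < n:
        return 0, 0.0
    # Slide a window of length n over the corpus and count exact matches.
    hits = sum(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))
    return hits, hits / len(tokens) * 1_000_000


# Hypothetical toy corpus; a real study would use a large normed corpus.
toy_corpus = (
    "It was a piece of cake. "
    "She ate a piece of cake and said the exam was a piece of cake."
)

idiom_hits, idiom_per_million = ngram_frequency(toy_corpus, "a piece of cake")
word_hits, word_per_million = ngram_frequency(toy_corpus, "cake")
print(idiom_hits, word_hits)
```

In the toy corpus, one of the matches for the whole expression is a literal use, which illustrates why corpus counts of surface strings cannot stand in for the frequency of a figurative meaning.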

Concreteness/Abstractness

The definition of concreteness is also subject to dispute. Forgács et al. (2015) equate the term “concrete” to “physical”; “abstract” consequently means “not physical” here. Citron et al. (2016b) however describe concreteness as referring to “a state or event that one can experience in one or more sensory modalities”; abstract things are therefore not tactile, audible, visible, smellable, or tasteable (Paivio et al., 1968). This broadens the definition of concreteness and joins it with the theory of embodiment: the most direct experiences are those with one's own body, which then serve as reference points for abstraction. Figurative and literal expressions can markedly differ in their concreteness. For the purposes of this paper, we follow the definition by Citron et al. (2016b) and localise verbal expressions on a continuum between concrete and abstract.

Imageability

Imageability is linked to the factor of concreteness; the two factors are not always used in clear separation (e.g., Lachaud, 2013; Lai et al., 2015). Imageability refers to the ease with which an expression evokes a mental image. Concreteness and imageability have been shown to influence recall duration and comprehension difficulty (Barry and Gerhand, 2003; Sabsevitz et al., 2005).

Comprehensibility

Neuroscientific research papers use many terms to refer to the basic comprehensibility of a stimulus (e.g., Rapp et al., 2004; Mashal et al., 2005; Ahrens et al., 2007; Diaz et al., 2011; Cardillo et al., 2012; Lacey et al., 2017): understandability, comprehensibility, ease of understanding, and interpretability. These terms essentially describe how accessible and easy the comprehension of the meaning of the stimuli is. This factor naturally does not exist isolated from other characteristics of the stimuli—the activation of cognitive resources for instance depends on familiarity, syntactic complexity, and context (Schmidt and Seger, 2009).

Plausibility

The factor of plausibility is sometimes used in overlap with comprehensibility. However, it does not refer to the individually perceived difficulty of comprehension, but describes the degree of sensicality and therefore measures the meaningful content of linguistic stimuli (Weiland et al., 2014). For figurative expressions, one must distinguish between literal and figurative plausibility—for instance, some metaphors may be literally plausible (“an upstanding person”) but this meaning is not the intended one; other metaphors are literally implausible (“she is an angel”; Zempleni et al., 2007). The terms meaningfulness and sensicality have been used synonymously with plausibility (e.g., Stringaris et al., 2007; Weiland et al., 2014; Zane and Shafer, 2018; Jończyk et al., 2020).

Compositionality/Transparency

In regard to idioms, compositionality refers to the degree to which the components of an expression contribute to its total meaning (Laurent et al., 2006; Mashal et al., 2008). As detailed above, most definitions characterise idioms as non-compositional (Mashal et al., 2008; Zhang et al., 2013); however, some idioms are semantically transparent.

Context

The context of a linguistic expression, i.e., the linguistic (Diaz and Eppes, 2018) and situational environment, crucially determines the effort of semantic processing (Sela et al., 2015). The (in)adequacy of the literal meaning of an expression in relation to its context is an essential characteristic of figurative language: the more clearly the context indicates a certain meaning, the easier the (subconscious) choice between literal and figurative interpretation. Context fulfils a disambiguating role and consequently influences the predictability of a certain meaning of an expression (Cacciari and Tabossi, 1988). The meaning of proverbs and idioms can indeed be stored independently of context; however, context can play a crucial role in these cases as well (compare the statements, “My week will be hectic because I have a lot on my plate” vs. “I knew my kid wasn't going to finish their dinner because they had a lot on their plate”). Furthermore, ironic and sarcastic meaning cannot exist independently of context.

Cloze Probability

Cloze probability (CP) refers to the probability of a certain word completing a certain expression given the preceding context (Lai et al., 2019): it is therefore a kind of context-dependent expected value. The CP influences essential components in EEG (Weiland et al., 2014) and can vary between literal and metaphorical expressions (Coulson and Van Petten, 2007). Context does not necessarily mean extensive context consisting of several sentences; the beginning of a phrase or a sentence can suffice as a prior condition for CP.
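As an illustration, cloze probability is commonly estimated from a norming task in which participants complete a sentence frame, with the CP of a word taken as the proportion of participants who supplied it. The following is a minimal sketch under that assumption; the function name and the responses are hypothetical and do not come from the reviewed studies.

```python
from collections import Counter


def cloze_probabilities(completions):
    """Return the share of responses for each completion of one sentence frame."""
    # Normalise case and whitespace so "Plate" and "plate " count as one response.
    cleaned = [c.strip().lower() for c in completions if c.strip()]
    if not cleaned:
        return {}
    counts = Counter(cleaned)
    total = len(cleaned)
    return {word: n / total for word, n in counts.items()}


# Hypothetical responses to the frame "I have a lot on my ..."
responses = ["plate", "plate", "mind", "Plate", "plate", "hands", "mind", "plate"]
cp = cloze_probabilities(responses)
print(cp["plate"])  # 5 of the 8 responses
```

Because the estimate depends on the frame shown to participants, the same final word can receive different CP values in a literal and in a figurative context.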

Salience

This factor integrates several other factors and interacts dynamically with a given context (see above). For our review, we define salience according to Giora's GSH, i.e., a combination of familiarity, conventionality, frequency, and predictability.

Figurativeness

Although it may at first seem circular to mention figurativeness as an individual influence factor, one has to remember the behavioural and neural differences in the processing and production between figurative and literal language. Controlling stimuli for their actual figurativeness avoids the possibility of classifying subtly figurative stimuli as literal or vice versa.

Structural Influence Factors

Part of Speech

Linguistic stimulus material can consist of various parts of speech; it is especially important to which part of speech the critical (i.e., figurative) elements of the material belong. In nominal metaphors (“he is a treasure”) a noun carries the figurative meaning; this function can also be conveyed by verbs (“the praise made her soar”), adjectives (“he is a broken man”) and prepositions (“she is beside herself”). Since parts of speech refer to different concepts (things/emotions/states of being vs. actions vs. relations), they entail different levels of abstraction (cf. Lai et al., 2019).

Tense

If stimulus material contains verbs and if these verbs are not used as isolated infinitives but instead are embedded in a phrase or a sentence, the tense of the stimuli has to be considered. Tenses have been found to be processed differently on a cerebral level (cf. Desai et al., 2006; Gilead et al., 2013) and to be conceptualised by different means in a figurative sense (cf. Gilead et al., 2013; Parkinson et al., 2014), making tense a potential confounding factor.

Length

Stimuli of different lengths engage the working memory to a different degree (Pointe and Engle, 1990; Tehan et al., 2001), and longer stimuli naturally require longer reading or listening times (Bonin et al., 2013). The length of linguistic stimuli can be stated in letters, phonemes, syllables, words, or entire sentences; in the case of auditory stimuli, the temporal duration can be given as well. Depending on the nature of the stimuli, one unit of measurement might be more suitable than others; what matters, however, is not which unit of measurement is used but that length is controlled for at all.

Syntactic Complexity

Not only the length of stimuli but also the actual syntactic complexity has to be considered. In the case of phrasal or sentential stimuli, stimuli can contain a broad spectrum of syntactic structures. With increasing complexity more cognitive resources are activated (Citron et al., 2016b), which in turn influences the recruitment and functional connectivity of the hemispheres (Thoma and Daum, 2006).

The influence factors mentioned above all play a role in the processing of figurative language. Since the characteristics of these factors vary between literal and figurative meanings (for example, a bitter feeling has a more negative connotation than a bitter taste), most norms for figurative language cannot be extracted from databases based on literal language alone. To obtain reliable values, the stimuli have to be rated by a large number of native-speaking individuals, which requires considerable time and resources. To have all possible influence factors rated in advance of a study is therefore an unrealistic expectation. However, there are specific metaphor and idiom databases in many languages, such as English (e.g., Cardillo et al., 2010, 2017; Nordmann et al., 2014), German (e.g., Citron et al., 2016a, 2020a; Müller et al., 2021), Italian (Bambini et al., 2014), Spanish (Gavilán et al., 2021), Bulgarian (Nordmann and Jambazova, 2017), French (Bonin et al., 2013, 2018), Chinese (Li et al., 2016), and Dutch (Hubers, 2019). Figurative stimuli and scores of influence factors can be extracted from these databases and used in empirical research on figurative language.

Methods

The aim of the present review lies in systematically investigating the theoretical background and the research methods of neuro-measurement studies on figurative language, specifically on metaphor and idiom. Our leading questions are:

(a) Definitions: How are subtypes of figurative language defined and distinguished, and which criteria mark the distinctions?

(b) Influence factors: Which stimuli characteristics are controlled for, and how are the control factors defined and implemented?

Inclusion Criteria

Our review follows the PRISMA guidelines (Moher et al., 2009). Since the research on the comprehension of figurative language stretches across many scientific fields and makes use of a diverse range of methods, an extensive number of research papers have been published over the past decades. The present review applies the inclusion criteria stated in Table 1.

Table 1. Inclusion criteria for our review.

Process

Given the above criteria, not every literature database presented a suitable source for our review. For practical reasons, we worked with databases that had to be accessible to the public or via a university account, and had to offer advanced search functions (i.e., allow for logical operators) and export functions. We therefore selected four databases: PubMed (pubmed.ncbi.nlm.nih.gov), Cochrane (www.cochranelibrary.com), Google Scholar (scholar.google.com) and Web of Science (WoS, webofknowledge.com).

By screening already available literature summarising neuroscientific research on figurative language, we inductively collected keywords that were to serve as critical search items. Those keywords fell into two categories: linguistic (figurative language, non-literal, proverb, metaphor, idiom, metonymy, simile, sarcasm, irony) and neuroscientific (neuro*, imaging, brain, hemisphere, fMRI, EEG, PET, ERP, MEG). For the final search term, we combined the former with the latter, adding a further specification to single out papers where the linguistic keywords occurred in the context of applied (neuro-)linguistics². The keywords also had to occur in the title and/or abstract; full-text searches were avoided explicitly. The literature accumulation in all four databases began on August 5th, 2020 and ended on August 10th, 2020. The search was repeated on August 30th, 2021, with the publication time window set to 2020–2021, in order to update the review corpus. See Supplementary Material C for an example of the full search term and restrictions.
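The combination of the two keyword categories can be sketched as a generic Boolean query in which the terms within each category are joined by OR and the two categories are joined by AND. This is only an illustrative sketch: the helper names are hypothetical, the additional (neuro-)linguistic specification and the title/abstract restriction are omitted because their syntax differs between databases, and the search term actually used is the one given in Supplementary Material C.

```python
# Keyword lists as stated in the text above.
LINGUISTIC = [
    "figurative language", "non-literal", "proverb", "metaphor", "idiom",
    "metonymy", "simile", "sarcasm", "irony",
]
NEUROSCIENTIFIC = [
    "neuro*", "imaging", "brain", "hemisphere", "fMRI", "EEG", "PET", "ERP", "MEG",
]


def or_group(terms):
    """Join terms with OR; multi-word terms are quoted as phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(group_a, group_b):
    """Require at least one term from each group (generic Boolean syntax)."""
    return f"{or_group(group_a)} AND {or_group(group_b)}"


query = build_query(LINGUISTIC, NEUROSCIENTIFIC)
print(query)
```

Each database applies its own field tags and wildcard rules, so a string built this way would still need to be adapted before being entered into a particular search interface.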

In addition, the source material of ten already available reviews on related topics (Blasko and Connine, 1993; Thoma and Daum, 2006; Rapp and Wild, 2011; Bohrn et al., 2012; Rapp et al., 2012; Vartanian, 2012; Kasparian, 2013; Wang and He, 2013; Lundgren and Brownell, 2016; Diaz and Eppes, 2018) was systematically screened for relevant literature which was subsequently added to the database search results. All ten of these reviews had other foci than the present review. None examined the linguistic research methods in quantitative and qualitative detail, which was the purpose of our review.

The result was a raw literature corpus encompassing several hundred sources. In the next step, we manually sorted this corpus using the open source software JabRef (JabRef Development Team, 2020). After deleting all duplicates, we judged the remaining sources on their suitability based on titles and abstracts, using the inclusion criteria described above and following the PRISMA process (Moher et al., 2009).
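The duplicate removal described above was carried out manually in JabRef. Purely to illustrate the logic of such a step, the following sketch treats two records as duplicates if they share a DOI or a normalised title. It is not the authors' procedure, and all record fields, titles, and DOIs in it are invented for the example.

```python
import re


def normalise_title(title):
    """Lower-case a title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Keep the first occurrence of each record, matching on DOI or title."""
    seen = set()
    unique = []
    for rec in records:
        keys = {("title", normalise_title(rec["title"]))}
        doi = (rec.get("doi") or "").strip().lower()
        if doi:
            keys.add(("doi", doi))
        is_duplicate = bool(keys & seen)
        seen |= keys  # remember every key, also those of skipped records
        if not is_duplicate:
            unique.append(rec)
    return unique


# Hypothetical export from several databases (invented titles and DOIs).
records = [
    {"title": "Metaphor processing in the brain", "doi": "10.1000/example.1"},
    {"title": "Metaphor Processing in the Brain.", "doi": ""},
    {"title": "Idiom comprehension and ERPs", "doi": "10.1000/example.2"},
    {"title": "Idiom comprehension and ERPs (reprint)", "doi": "10.1000/EXAMPLE.2"},
]
unique = deduplicate(records)
```

Automatic matching of this kind can miss duplicates with differing titles and no DOI, which is one reason a manual check remains useful.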

For the in-depth analysis of the final corpus of research papers (“review corpus”), we entered all relevant data into a structured database (“analysis chart”) using LibreOffice Calc (The Document Foundation, 2021). Note that an in-depth analysis was undertaken for papers on metaphor and idiom only; all other literature is merely listed as a source along with measurement method and figurative subtype, and is available for further research.

Please refer to the analysis chart (Supplementary Table S2) for a detailed description of the purposes of each analysed aspect. The complete analysis chart is available in Supplementary Material A and at https://osf.io/hpzb8/. All data was analysed with LibreOffice Calc and R (R Core Team, 2020).

Results

Literature Identification

The literature identification resulted in 116 research papers (Supplementary Table S3) which we accepted as suitable material for our review. Of these, 98 papers claimed to have worked with metaphors, and 18 described their stimuli as idioms. For details on the selection process, see Figure 1.

Figure 1. The literature synthesis process. Graphics template by Moher et al. (2009).

The papers were published between 1994 and 2021, representing 28 years of research. With regard to measurement methods, fMRI was applied most frequently. For more details, see Supplementary Figures S1, S2.

Definitions of Figurative Language

Figurative language as a term in itself is defined in 13 of the 116 research papers. However, we found that definitions and differentiations for the subtypes were included more frequently: of the 98 papers on metaphor, 50 (=51%) define the term metaphor. Ten out of the 18 papers on idiom (=55.6%) define the term idiom. For these numbers, we deliberately only included definitions that mentioned formal criteria and/or cognitive modalities, e.g., mappings. If a paper merely listed an example instead of including a definition, this was not counted as a viable definition.

Where included, metaphor is primarily defined in its function as a cognitive conceptualisation mechanism, mostly by way of the roles of source and target domains (or “topic” and “vehicle”) and mappings. In total, metaphor is defined in 57 papers, i.e., almost 50% of all articles included in our review. In 15 of these cases, it is distinguished from other figurative subtypes (among them idiom: n = 10).

Conventionality may serve as the distinguishing factor between metaphor and idiom in most cases (e.g., Laurent et al., 2006; Zempleni et al., 2007; Lauro et al., 2008; Desai et al., 2013; Mashal et al., 2013, 2014; Pomp et al., 2018). Idiom is defined in 19 out of all research papers, and other subtypes are explained only in context with metaphor and idiom (irony: n = 6; simile, metonymy: each n = 2; proverb, sarcasm, hyperbole: each n = 1).

In terms of actual implementation, studies on metaphor processing clearly outweigh studies on idiom processing in our review corpus: 98 papers report studies on the former, and 18 are concerned with the latter. Eleven papers contrast metaphors with another figurative subtype in their paradigms, i.e., the stimuli included figurative subtypes besides metaphor: idiom (Desai et al., 2013; Romero Lauro et al., 2013; Lorusso et al., 2015), irony (Eviatar and Just, 2006; Prat et al., 2012; Deckert et al., 2021), metonymy (Weiland et al., 2014; Yurchenko et al., 2020), sarcasm (Uchiyama et al., 2012), and simile (Shibata et al., 2012; Lai and Curran, 2013).

The majority of introductions and theoretical background sections of the papers refer to cognitive models of figurative language processing (n = 81). For more detail, see Supplementary Table S4.

Stimulus Design

We found a wide variety of factors that the studies controlled for. Table 2 summarises the numbers of papers controlling for each psycholinguistic and structural factor, calculated from a binary analysis system (controlled for/did not control for).

Table 2. Number of studies controlling for each influence factor. Total studies: n = 116.
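The binary analysis system behind these totals (controlled for/did not control for) can be sketched as a simple tally over a coding sheet. The paper labels, the selection of factors, and the coded values below are hypothetical placeholders and do not reproduce the contents of Table 2 or of the analysis chart.

```python
from collections import Counter

# Hypothetical coding sheet: True = the paper controlled for the factor.
coding = {
    "Paper A": {"familiarity": True, "length": True, "cloze probability": False},
    "Paper B": {"familiarity": True, "length": False, "cloze probability": False},
    "Paper C": {"familiarity": False, "length": True, "cloze probability": True},
}


def tally_controlled(coding):
    """Count, per factor, how many papers were coded as controlling for it."""
    totals = Counter()
    for factors in coding.values():
        for factor, controlled in factors.items():
            totals[factor] += int(controlled)
    return dict(totals)


totals = tally_controlled(coding)
```

A tally of this kind records only whether a factor was controlled, not how it was defined or operationalised.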

These totals only allow for superficial insight, however. As detailed in the introduction, there are no generally accepted definitions for the psycholinguistic factors. For the purposes of the present review, we meticulously examined which definitions were mentioned and which were operationalised. Consequently, we did not indiscriminately trust the statements of the papers but instead compared the respective definitions with our working definitions as stated in the introduction. The classification in this review follows our working definitions. Thus, we occasionally classified some factors contrary to their respective papers' statements. This was the case, for example, with papers that defined conventionality as individual-subjective (e.g., Mashal et al., 2005, 2007; Lai et al., 2009; Subramaniam et al., 2012; Tang et al., 2017) and were therefore registered under “familiarity.” We proceeded similarly with papers such as Mashal et al. (2005) and Kircher et al. (2007), which claimed to have controlled for salience but only considered one aspect of salience (e.g., familiarity or frequency).

From a quantitative perspective, 64 out of the 116 (55%) research papers do not contain any definitions of their psycholinguistic influence factors, regardless of whether their definitions were congruent with our working definitions or not. Four studies (Iakimova et al., 2005; Vespignani et al., 2010; Lu and Zhang, 2012; Wang et al., 2021), i.e., 3.4%, define all factors for which their stimuli are controlled. The remaining 48 papers define at least one factor.
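The proportions reported in this paragraph can be recomputed from the stated counts. The short sketch below does so; the category labels are our own shorthand, and the share for the remaining 48 papers is derived here from the counts rather than stated in the text.

```python
TOTAL_PAPERS = 116

# Counts as stated in the paragraph above; labels are shorthand.
counts = {
    "no factor defined": 64,
    "all controlled factors defined": 4,
    "at least one factor defined": 48,
}

# The three groups should partition the review corpus.
assert sum(counts.values()) == TOTAL_PAPERS

# Percentage share of each group, rounded to one decimal place.
shares = {label: round(100 * n / TOTAL_PAPERS, 1) for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {counts[label]} papers ({share}%)")
```

The three groups add up to the full corpus of 116 papers, and the first two shares agree with the rounded percentages given in the text.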

In the following, we will focus on the most prominent and most frequently controlled factors, representing the current status quo in stimulus control.