Elsevier

Cognition

Volume 122, Issue 3, March 2012, Pages 267-279
Semantic similarity, predictability, and models of sentence processing

https://doi.org/10.1016/j.cognition.2011.11.011

Abstract

We investigate how language comprehension is affected by word predictability and by the semantic similarity between a target word and the other words that could have taken its place in a sentence, using data from a reading time study, a sentence completion study, and linear mixed-effects regression modeling. We find that processing is facilitated if the different possible words that could occur in a given context are semantically similar to each other, meaning that processing is affected not only by the nature of the words that do occur, but also by the relationships between the words that do occur and those that could have occurred. We discuss possible causes of the semantic similarity effect and point to possible limitations of using probability as a model of cognitive effort.

Highlights

► We model the effect of semantic similarity, predictability, frequency, and length on reading time.
► Words with greater semantic similarity to their semantic cohort are read faster.
► Semantic similarity, predictability, frequency, and length all affect processing difficulty.

Introduction

It is a (true) cliché of psycholinguistics that the accuracy of human sentence processing is something of a feat, as words must be processed and integrated very quickly, given the continuous nature of the input stream. A popular, partial explanation for this feat is that, when processing sentences, we use all kinds of information to predict what is coming up next and that preactivation of the upcoming material makes integrating it easier.

Because of the widespread belief in the importance of predictability in sentence comprehension, much work has been done to enumerate the factors that comprehenders use to make their predictions about upcoming linguistic material. Factors that have been proposed to influence comprehension include verb subcategorization biases (e.g., Trueswell, Tanenhaus, & Kello, 1993), thematic fit of noun phrases (e.g., McRae et al., 1997, Tanenhaus et al., 1989), the likelihood of different agents carrying out different actions (e.g., Kamide, Altmann, & Haywood, 2003), and various discourse factors (e.g., Binder et al., 2001, Hare et al., 2004).

Comprehenders appear to use contextual information to make predictions about upcoming material. DeLong, Urbach, and Kutas (2005) found an N400 response to indefinite determiners in English (a, an) that did not correspond to the noun that was most likely to occur next given the context. Similarly, Van Berkum, Brown, Zwitserlood, Kooijman, and Hagoort (2005) found an ERP response when Dutch determiners did not match the anticipated following noun in grammatical gender. Both of these results suggest that comprehenders have formed expectations for specific words to occur in advance of the point at which the words actually occur.

The linking assumption between predictability and cognitive effort is that the cognitive representations for expected words (or phonemes, syntactic structures, etc.) are presumed to be more highly activated than those for less expected ones. Consequently, they are presumed to be easier to retrieve from memory, and require less additional activation to incrementally update the set of representations created during the comprehension of the utterance. In a sentence like The poor student ate macaroni and cheese, the word cheese is highly predictable. As a consequence, processing the word cheese results in only minor changes to the overall set of cognitive representations involved in comprehending the sentence. If the word cheese were replaced with a less predictable word, such as in The poor student ate macaroni and caviar, the processing of the sentence at the word caviar would require a larger change to the overall set of activated cognitive representations, and thus more cognitive effort.

The expectations for a particular word also reflect the expectations for structures at levels besides the word level. In a sentence like The horse raced past the barn fell, the reduced relative structure is very unexpected, as is the word fell. Consequently, processing the word fell results in major changes to the set of cognitive representations involved in comprehending the sentence. We will discuss the issue of exactly how expectations at different levels of representation are related to expectations at the word level in the final discussion section of this paper, but informally we assume the inclusion of expectations at all levels when we refer to word predictability throughout the paper.

The relationship between word probability and cognitive effort has been formalized in theories such as the surprisal theory (Hale, 2001, Levy, 2008), which relies on Eq. (1) to make predictions of cognitive effort. This equation indicates that the degree of cognitive effort required to process a word is dependent on the negative log probability of that word, given the preceding context. This measure has been described in several ways that are mathematically equivalent, but which emphasize different aspects of possible cognitive interpretations of the measure. Levy (2008) characterizes the measure in terms of the degree of difference between the probability distributions of the possible interpretations of the message before seeing the word and after seeing the word. Jurafsky (2003) characterizes it in terms of the amount of information conveyed by the word. Hale (2001) characterizes it in terms of the probability mass of the interpretations that are disconfirmed upon hearing a word.

$$\text{difficulty} \propto -\log P(w_i \mid w_{1 \ldots i-1}, \text{CONTEXT}) \qquad (1)$$
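To make this linking assumption concrete, the sketch below computes surprisal, the negative log probability of a word given its context, for the macaroni and cheese example. The probabilities are invented for illustration and are not estimates from this study; the function name `surprisal` and the use of base-2 logarithms (bits) are likewise assumptions.

```python
import math


def surprisal(probability: float) -> float:
    """Return the surprisal, -log2 P(word | context), in bits."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    return -math.log2(probability)


# Hypothetical conditional probabilities for the word that follows
# "The poor student ate macaroni and ..." (illustrative values only).
hypothetical_probs = {"cheese": 0.5, "caviar": 0.001}

for word, p in hypothetical_probs.items():
    print(f"{word}: {surprisal(p):.2f} bits")
```

Under these assumed values, the less predictable continuation receives the larger surprisal, which is the quantity a surprisal-based account takes to be proportional to processing difficulty.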

One commonality across discussions of expectations in comprehension is that the degree of cognitive effort needed to process a particular message tends to be cast in terms of how likely a particular word, structure, or message is, relative to another word, structure, or message. Aside from their relative probabilities, little attention is paid to potential relationships between the various possible words or structures. The other words that could have occurred in that position are only relevant in that, if a particular word is very likely, other possible words must necessarily be unlikely. This indirect relationship arises because the probabilities of all possible words must sum to 1. Importantly, it is assumed that the nature of the other words that could have occurred has no other bearing than this indirect relationship on the level of difficulty faced in processing the target word itself.

We challenge this often implicit assumption that the degree of cognitive effort is determined solely by the properties of the material that actually occurs by providing evidence for our Semantic Similarity Hypothesis, which predicts that processing will be facilitated to the degree that the different possible choices that could occur in a given context are semantically similar to each other. One possible cause for the predicted processing facilitation is that activation may spread between the representations of the different possible choices that are being activated during processing (e.g., McRae, Ferretti, & Amyote, 1997). In this view, greater semantic similarity between the possible word choices would result in greater activation of this set of words, and thus greater facilitation in processing. Alternate possible causes of a semantic similarity effect will be addressed in the final discussion section.

To better understand our Semantic Similarity Hypothesis, consider the sets of possible instruments that could occur in the sentential contexts shown in (1) and (2). Based on the hypothetical distributions of possible instruments shown in Fig. 1 for these contexts, probabilistic theories of language comprehension would predict that instruments such as spear and sword would be easier to process than instruments such as machete and rock, due to their greater degrees of anticipatory activation. This prediction is consistent with a long history of experimental results showing that the degree to which material is predictable from the context affects comprehension processes, as reflected in measures such as reading times (e.g., Rayner & Well, 1996), electrophysiological response (e.g., Federmeier, Wlotko, De Ochoa-Dewald, & Kutas, 2007), and the ability to comprehend degraded input (e.g., Obleser & Kotz, 2010). Probability-based accounts such as the surprisal theory (Hale, 2001, Levy, 2008) and the SynSem Integration Model (Padó, Crocker, & Keller, 2009) have had good success at modeling such differences in relative cognitive loads during language comprehension across a wide variety of psycholinguistic phenomena based solely on knowing how likely a target word is, given its context.

  • (1)

    The aboriginal man jabbed the angry lion with a/an —.

  • (2)

    The aboriginal man attacked the angry lion with a/an —.

Models which base their predictions only on the probability of target words (e.g., Hale, 2001, Levy, 2008) necessarily also make the following predictions for the contexts shown in (1) and (2), given the distributions of possible instruments shown in Fig. 1. First, because the probability of spear is the same in both contexts, spear should have the same degree of difficulty in either context. Second, because machete has the same probability in context (1) as rock has in context (2), machete and rock should also have the same respective degree of difficulty, once other factors such as length and frequency are taken into account. However, in the examples shown in Fig. 1, there is a difference between the distributions of possible instruments for these two contexts. The set of likely instruments for the jab context are typically all sharp, pointy objects. Several of the possible instruments for attack also share these properties, but many of the less likely instruments for attack, including rock, do not. If the representations of the various possible instruments are initially activated based on their respective probabilities, activation may spread between representations based on their degree of shared semantic similarity. Fig. 2 shows the relative locations of these possible instruments in a hypothetical semantic space. Notice that the other possible instruments in the jab context are all semantically similar to spear, while in the attack context, some of the possible instruments such as rock and gun are less similar to spear.

Because more of the possible instruments in context (1) have properties in common with spear than those in context (2), our Semantic Similarity Hypothesis would predict that spear would be processed more quickly in context (1). Similarly, because machete has more in common with the other instruments in context (1) than rock does in context (2), we would predict a processing advantage for machete, even though there is no difference in probability.
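The contrast between contexts (1) and (2) can be sketched numerically. The example below places a few instruments in an invented two-dimensional semantic space and computes, for a target word, the probability-weighted mean cosine similarity to the other candidate instruments. The coordinates, the probabilities, the helper names, and the weighting scheme are all assumptions made for illustration; they are not the vectors, the completion data, or necessarily the exact similarity measure used in the study.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def cohort_similarity(target, probs, vectors):
    """Probability-weighted mean cosine similarity between the target
    and every other candidate word in the same context."""
    others = {w: p for w, p in probs.items() if w != target}
    total = sum(others.values())
    weighted = sum(p * cosine(vectors[target], vectors[w]) for w, p in others.items())
    return weighted / total


# Hypothetical 2-D semantic space: the first axis loosely stands for
# "sharp, pointed" and the second for "blunt or projectile".
vectors = {
    "spear": (0.9, 0.1), "sword": (0.8, 0.2), "knife": (0.85, 0.15),
    "machete": (0.75, 0.25), "rock": (0.1, 0.9), "gun": (0.2, 0.8),
}

# Hypothetical completion distributions. As in the text, spear is equally
# likely in both contexts, and machete in (1) is as likely as rock in (2).
jab = {"spear": 0.4, "sword": 0.3, "knife": 0.2, "machete": 0.1}
attack = {"spear": 0.4, "sword": 0.2, "knife": 0.1, "gun": 0.2, "rock": 0.1}

for label, target, context in [("jab", "spear", jab), ("attack", "spear", attack),
                               ("jab", "machete", jab), ("attack", "rock", attack)]:
    print(f"{target} in {label} context: {cohort_similarity(target, context, vectors):.3f}")
```

With these assumed values, spear and machete in the jab context are more similar to their competitors than spear and rock are in the attack context, even though the target probabilities are matched. A probability-only model cannot distinguish these cases, whereas the Semantic Similarity Hypothesis predicts a processing advantage in the jab context.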

The main focus of our investigation is whether the semantic similarity between words that could occur in a context has an additional influence on processing beyond the influence of the predictability of the word that does occur. To do this, we first conducted a completion/listing study to establish the distribution and likelihoods of possible words that could appear in a set of contexts. Then we conducted a reading time study to establish the degree of cognitive effort required to process specific possible words. We then used Latent Semantic Analysis (Deerwester, Dumais, Furnas, Landauer, & Harshman, 1990) to measure the degree of semantic similarity between the possible words that could have occurred in each of the sentences (as determined by the sentence completion/listing study) and the word that does occur in the sentence. Finally, we used a linear mixed-effects regression model to demonstrate that the degree of shared similarity between the actual target word and the other possible words plays a role in predicting processing difficulty above and beyond the effect of the probability of the word that actually occurred.
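The first of these steps, estimating how likely each word is in a context from completion responses, can be sketched as a relative-frequency count. The responses below are invented, and treating each response as a single equally weighted completion is an assumption; the study's actual scoring of its completion/listing data may differ.

```python
from collections import Counter


def completion_probabilities(responses):
    """Estimate P(word | context) as the share of responses giving that word."""
    # Normalize case and whitespace so "Spear" and "spear " count together.
    counts = Counter(r.strip().lower() for r in responses)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


# Hypothetical responses to "The aboriginal man jabbed the angry lion with a/an ..."
hypothetical_responses = [
    "spear", "spear", "sword", "Spear", "knife",
    "sword", "spear", "machete", "spear", "knife",
]

probs = completion_probabilities(hypothetical_responses)
for word, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{word}: {p:.2f}")
```

Estimates of this kind can then serve both as the predictability measure for the target word and as the list of alternative words whose similarity to the target is measured.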

Section snippets

Establishing the distribution of possible words given a context

In order to investigate the roles of predictability and semantic similarity in processing, we need a set of human performance observations based on target words appearing in contexts where the set of possible alternative words is easily characterizable, and yet has a range of differing probabilities and degrees of similarity with other possible fillers of the target word slot. Instrument phrases, such as with a spear, in the jab and attack examples above make an ideal target for investigation

Measuring the effects of predictability and semantic similarity

The goal of the modeling we conducted was to determine whether processing time was influenced by the semantic similarity between a target instrument word and other possible instrument words that could have occurred in the target sentence frame. Reading time data were submitted to a linear mixed-effects regression model for analysis. Analyses were conducted using the lme4 (version 0.999375-33, Bates & Maechler, 2010) and languageR libraries (version 1.0, Baayen, 2010) for the R statistics

General discussion and conclusions

Our goal in this paper was to determine whether there was any influence on the processing of a particular word from other possible words that could have occurred in the same context. We found that this was indeed the case. Processing of the target word was facilitated when the other possible words were semantically similar to the target word in comparison with when the other possible words were less semantically similar. We also found expected effects of predictability and word length on

Acknowledgements

We wish to acknowledge the help of the members of the Psycholinguistics Lab and the Computational Psycholinguistics Lab at the University at Buffalo. We also wish to acknowledge Stephani Foraker for helpful feedback.

References (51)

  • G.A. Miller et al. Length–frequency statistics for written English. Information and Control (1958)
  • A. Pollatsek et al. Tests of the E-Z reader model: Exploring the interface between cognition and eye-movement control. Cognitive Psychology (2006)
  • M. Spivey-Knowlton et al. Resolving attachment ambiguities with multiple constraints. Cognition (1995)
  • J.C. Trueswell et al. Semantic influences on parsing: Use of thematic role information in syntactic ambiguity resolution. Journal of Memory & Language (1994)
  • J. Ashby et al. Eye movements of highly skilled and average readers: Differential effects of frequency and predictability. The Quarterly Journal of Experimental Psychology A: Human Experimental Psychology (2005)
  • R.H. Baayen. Analyzing linguistic data: A practical introduction to statistics using R (2008)
  • Baayen, R. H. (2010). languageR: Data sets and functions with “Analyzing Linguistic Data: A practical introduction to...
  • Bates, D., & Maechler, M. (2010). lme4: Linear mixed-effects models using S4 classes. R package version 0.999375-33....
  • M.F. Boston et al. Parsing costs as predictors of reading difficulty: An evaluation using the Potsdam Sentence Corpus. Journal of Eye Movement Research (2008)
  • M.F. Boston et al. Parallel processing and sentence comprehension difficulty. Language and Cognitive Processes (2011)
  • L. Burnard. Users reference guide for the British National Corpus (1995)
  • S. Deerwester et al. Indexing by latent semantic analysis. Journal of the American Society for Information Science (1990)
  • K.A. DeLong et al. Probabilistic word pre-activation during language comprehension inferred from electrical brain activity. Nature Neuroscience (2005)
  • A. Gelman et al. Data analysis using regression and multilevel/hierarchical models (2007)
  • Hale, J. (2001). A probabilistic early parser as a psycholinguistic model. In Proceedings of the Second Meeting of the...