Semantic similarity, predictability, and models of sentence processing
Highlights
► We model the effect of semantic similarity, predictability, frequency, and length on reading time.
► Words with greater semantic similarity to their semantic cohort are read faster.
► Semantic similarity, predictability, frequency, and length all affect processing difficulty.
Introduction
It is a (true) cliché of psycholinguistics that the accuracy of human sentence processing is something of a feat, as words must be processed and integrated very quickly, given the continuous nature of the input stream. A popular, partial explanation for this feat is that, when processing sentences, we use all kinds of information to predict what is coming up next and that preactivation of the upcoming material makes integrating it easier.
Because of the widespread belief in the importance of predictability in sentence comprehension, much work has been done to enumerate the factors that comprehenders use to make their predictions about upcoming linguistic material. Factors that have been proposed to influence comprehension include verb subcategorization biases (e.g., Trueswell, Tanenhaus, & Kello, 1993), thematic fit of noun phrases (e.g., McRae et al., 1997, Tanenhaus et al., 1989), the likelihood of different agents carrying out different actions (e.g., Kamide, Altmann, & Haywood, 2003), and various discourse factors (e.g., Binder et al., 2001, Hare et al., 2004).
Comprehenders appear to use contextual information to make predictions about upcoming material. DeLong, Urbach, and Kutas (2005) found an N400 response to indefinite determiners in English (a, an) that did not correspond to the noun that was most likely to occur next given the context. Similarly, Van Berkum, Brown, Zwitserlood, Kooijman, and Hagoort (2005) found an ERP response when Dutch determiners did not match the anticipated following noun in grammatical gender. Both of these results suggest that comprehenders have formed expectations for specific words to occur in advance of the point at which the words actually occur.
The linking assumption between predictability and cognitive effort is that the cognitive representations for expected words (or phonemes, syntactic structures, etc.) are presumed to be more highly activated than those for less expected ones. Consequently, they are presumed to be easier to retrieve from memory, and require less additional activation to incrementally update the set of representations created during the comprehension of the utterance. In a sentence like The poor student ate macaroni and cheese, the word cheese is highly predictable. As a consequence, processing the word cheese results in only minor changes to the overall set of cognitive representations involved in comprehending the sentence. If the word cheese were replaced with a less predictable word, such as in The poor student ate macaroni and caviar, the processing of the sentence at the word caviar would require a larger change to the overall set of activated cognitive representations, and thus more cognitive effort.
The expectations for a particular word also reflect the expectations for structures at levels besides the word level. In a sentence like The horse raced past the barn fell, the reduced relative structure is very unexpected, as is the word fell. Consequently, processing the word fell results in major changes to the set of cognitive representations involved in comprehending the sentence. We will discuss the issue of exactly how expectations at different levels of representation are related to expectations at the word level in the final discussion section of this paper, but informally we assume the inclusion of expectations at all levels when we refer to word predictability throughout the paper.
The relationship between word probability and cognitive effort has been formalized in theories such as the surprisal theory (Hale, 2001, Levy, 2008), which relies on Eq. (1) to make predictions of cognitive effort. This equation indicates that the degree of cognitive effort required to process a word is dependent on the negative log probability of that word, given the preceding context. This measure has been described in several ways that are mathematically equivalent, but which emphasize different aspects of possible cognitive interpretations of the measure. Levy (2008) characterizes the measure in terms of the degree of difference between the probability distributions of the possible interpretations of the message before seeing the word and after seeing the word. Jurafsky (2003) characterizes it in terms of the amount of information conveyed by the word. Hale (2001) characterizes it in terms of the probability mass of the interpretations that are disconfirmed upon hearing a word.
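Eq. (1) itself is not reproduced in this text. A common way of writing the surprisal measure described above is sketched here; the notation (w_i for the current word, w_1 ... w_{i-1} for the preceding context) is an assumption for illustration and may differ from the original equation.

```latex
\begin{equation*}
\text{effort}(w_i) \;\propto\; -\log P(w_i \mid w_1, \ldots, w_{i-1})
\end{equation*}
```

With a base-2 logarithm, a word whose probability in context is 0.5 has a surprisal of 1 bit, while a word whose probability is 0.125 has a surprisal of 3 bits, so less predictable words are expected to require more effort.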
One commonality across discussions of expectations in comprehension is that the degree of cognitive effort needed to process a particular message tends to be cast in terms of how likely a particular word, structure, or message is, relative to another word, structure, or message. Aside from their relative probabilities, little attention is paid to potential relationships between the various possible words or structures. The other words that could have occurred in that position are only relevant in that, if a particular word is very likely, other possible words must necessarily be unlikely. This indirect relationship arises because the probabilities of all possible words must sum to 1. Importantly, it is assumed that the nature of the other words that could have occurred has no other bearing than this indirect relationship on the level of difficulty faced in processing the target word itself.
We challenge this often implicit assumption that the degree of cognitive effort is determined solely by the properties of the material that actually occurs by providing evidence for our Semantic Similarity Hypothesis, which predicts that processing will be facilitated to the degree that the different possible choices that could occur in a given context are semantically similar to each other. One possible cause for the predicted processing facilitation is that activation may spread between the representations of the different possible choices that are being activated during processing (e.g., McRae, Ferretti, & Amyote, 1997). In this view, greater semantic similarity between the possible word choices would result in greater activation of this set of words, and thus greater facilitation in processing. Alternate possible causes of a semantic similarity effect will be addressed in the final discussion section.
To better understand our Semantic Similarity Hypothesis, consider the sets of possible instruments that could occur in the sentential contexts shown in (1) and (2). Based on the hypothetical distributions of possible instruments shown in Fig. 1 for these contexts, probabilistic theories of language comprehension would predict that instruments such as spear and sword would be easier to process than instruments such as machete and rock, due to their greater degrees of anticipatory activation. This prediction is consistent with a long history of experimental results showing that the degree to which material is predictable from the context affects comprehension processes, as reflected in measures such as reading times (e.g., Rayner & Well, 1996), electrophysiological response (e.g., Federmeier, Wlotko, De Ochoa-Dewald, & Kutas, 2007), and the ability to comprehend degraded input (e.g., Obleser & Kotz, 2010). Probability-based accounts such as the surprisal theory (Hale, 2001, Levy, 2008) and the SynSem Integration Model (Padó, Crocker, & Keller, 2009) have had good success at modeling such differences in relative cognitive loads during language comprehension across a wide variety of psycholinguistic phenomena based solely on knowing how likely a target word is, given its context.
(1) The aboriginal man jabbed the angry lion with a/an ____.
(2) The aboriginal man attacked the angry lion with a/an ____.
Models that base their predictions only on the probability of target words (e.g., Hale, 2001, Levy, 2008) necessarily also make the following predictions for the contexts shown in (1) and (2), given the distributions of possible instruments shown in Fig. 1. First, because the probability of spear is the same in both contexts, spear should have the same degree of difficulty in either context. Second, because machete has the same probability in context (1) as rock has in context (2), machete and rock should also be equally difficult to process, once other factors such as length and frequency are taken into account. However, in the examples shown in Fig. 1, there is a difference between the distributions of possible instruments for these two contexts. The likely instruments for the jab context are typically all sharp, pointy objects. Several of the possible instruments for attack also share these properties, but many of the less likely instruments for attack, including rock, do not. If the representations of the various possible instruments are initially activated based on their respective probabilities, activation may then spread between representations based on their degree of shared semantic similarity. Fig. 2 shows the relative locations of these possible instruments in a hypothetical semantic space. Notice that the other possible instruments in the jab context are all semantically similar to spear, while in the attack context, some of the possible instruments, such as rock and gun, are less similar to spear.
Because more of the possible instruments in context (1) have properties in common with spear than those in context (2), our Semantic Similarity Hypothesis would predict that spear would be processed more quickly in context (1). Similarly, because machete has more in common with the other instruments in context (1) than rock does in context (2), we would predict a processing advantage for machete, even though there is no difference in probability.
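The contrast between the two accounts can be made concrete with a small sketch. Everything in it is hypothetical: the probabilities, the two-dimensional "semantic space", and the probability-weighted cohort similarity measure are illustrative assumptions, not the distributions in Fig. 1 or Fig. 2 and not necessarily the measure used in the study.

```python
import math

# Hypothetical completion probabilities for each context (illustrative only).
JAB = {"spear": 0.4, "sword": 0.3, "knife": 0.2, "machete": 0.1}
ATTACK = {"spear": 0.4, "sword": 0.2, "gun": 0.2, "knife": 0.1, "rock": 0.1}

# Hypothetical 2-D semantic space: (sharp/pointed, blunt/projectile).
SPACE = {
    "spear": (1.0, 0.1),
    "sword": (0.9, 0.2),
    "knife": (0.9, 0.1),
    "machete": (0.8, 0.3),
    "gun": (0.2, 0.9),
    "rock": (0.1, 1.0),
}


def surprisal(word, dist):
    """Negative log2 probability of the word in its context, in bits."""
    return -math.log2(dist[word])


def cosine(u, v):
    """Cosine of the angle between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def cohort_similarity(target, dist):
    """Probability-weighted mean cosine between the target and the
    other candidate words in the same context (an assumed measure)."""
    others = {w: p for w, p in dist.items() if w != target}
    total = sum(others.values())
    return sum(
        p * cosine(SPACE[target], SPACE[w]) for w, p in others.items()
    ) / total


for word, dist, label in [
    ("spear", JAB, "jab"),
    ("spear", ATTACK, "attack"),
    ("machete", JAB, "jab"),
    ("rock", ATTACK, "attack"),
]:
    print(
        f"{word:8s} in {label:6s}: "
        f"surprisal={surprisal(word, dist):.2f} bits, "
        f"cohort similarity={cohort_similarity(word, dist):.2f}"
    )
```

In this toy setup, a probability-only account assigns spear the same surprisal in both contexts, and machete and rock the same surprisal as each other, whereas the cohort similarity values differ between the jab and attack contexts in the direction the hypothesis predicts.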
The main focus of our investigation is whether the semantic similarity between words that could occur in a context has an additional influence on processing beyond the influence of the predictability of the word that does occur. To test this, we first conducted a completion/listing study to establish the distribution and likelihoods of possible words that could appear in a set of contexts. Then we conducted a reading time study to establish the degree of cognitive effort required to process specific possible words. We then used Latent Semantic Analysis (Deerwester, Dumais, Furnas, Landauer, & Harshman, 1990) to measure the degree of semantic similarity between the possible words that could have occurred in each of the sentences (as determined by the sentence completion/listing study) and the word that does occur in the sentence. Finally, we used a linear mixed-effects regression model to demonstrate that the degree of shared similarity between the actual target word and the other possible words plays a role in predicting processing difficulty above and beyond the effect of the probability of the word that actually occurred.
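As a rough illustration of how Latent Semantic Analysis yields word similarities, the sketch below applies a truncated singular value decomposition to a tiny term-by-document count matrix and compares words by cosine in the reduced space. The counts, the choice of two dimensions, and the helper names are hypothetical; the corpus, weighting, and dimensionality used in the study are not specified in this text.

```python
import numpy as np

# Hypothetical term-by-document counts (rows: terms, columns: documents).
terms = ["spear", "sword", "knife", "rock", "gun"]
counts = np.array(
    [
        [2, 1, 0, 0, 1],  # spear
        [1, 2, 0, 0, 1],  # sword
        [1, 1, 0, 0, 0],  # knife
        [0, 0, 2, 1, 1],  # rock
        [0, 0, 1, 2, 0],  # gun
    ],
    dtype=float,
)


def lsa_term_vectors(term_doc, k):
    """Project terms into a k-dimensional latent space via truncated SVD."""
    u, s, _vt = np.linalg.svd(term_doc, full_matrices=False)
    # Scale the first k left singular vectors by their singular values.
    return u[:, :k] * s[:k]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


vectors = lsa_term_vectors(counts, k=2)
index = {term: i for i, term in enumerate(terms)}


def similarity(a, b):
    """Cosine similarity between two terms in the reduced space."""
    return cosine(vectors[index[a]], vectors[index[b]])


print("spear ~ sword:", round(similarity("spear", "sword"), 3))
print("spear ~ rock: ", round(similarity("spear", "rock"), 3))
```

In this toy matrix, words that occur in the same documents end up close together in the reduced space, so spear is far more similar to sword than to rock.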
Section snippets
Establishing the distribution of possible words given a context
In order to investigate the roles of predictability and semantic similarity in processing, we need a set of human performance observations based on target words appearing in contexts where the set of possible alternative words is easily characterizable, and yet has a range of differing probabilities and degrees of similarity with other possible fillers of the target word slot. Instrument phrases, such as with a spear, in the jab and attack examples above make an ideal target for investigation
Measuring the effects of predictability and semantic similarity
The goal of the modeling we conducted was to determine whether processing time was influenced by the semantic similarity between a target instrument word and other possible instrument words that could have occurred in the target sentence frame. Reading time data were submitted to a linear mixed-effects regression model for analysis. Analyses were conducted using the lme4 (version 0.999375-33, Bates & Maechler, 2010) and languageR libraries (version 1.0, Baayen, 2010) for the R statistics
General discussion and conclusions
Our goal in this paper was to determine whether there was any influence on the processing of a particular word from other possible words that could have occurred in the same context. We found that this was indeed the case. Processing of the target word was facilitated when the other possible words were semantically similar to the target word in comparison with when the other possible words were less semantically similar. We also found expected effects of predictability and word length on
Acknowledgements
We wish to acknowledge the help of the members of the Psycholinguistics Lab and the Computational Psycholinguistics Lab at the University at Buffalo. We also wish to acknowledge Stephani Foraker for helpful feedback.
References (51)
- et al. Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language (2008)
- et al. The effects of thematic fit and discourse context on syntactic ambiguity resolution. Journal of Memory and Language (2001)
- et al. Evidence for the immediate use of verb control information in sentence processing. Journal of Memory and Language (1990)
- et al. Frequency and predictability effects on event-related potentials during reading. Brain Research (2006)
- et al. Data from eye-tracking corpora as evidence for theories of syntactic processing complexity. Cognition (2008)
- et al. A rose by any other name: Long-term memory structure and sentence processing. Journal of Memory and Language (1999)
- et al. Multiple effects of sentential constraint on word processing. Brain Research (2007)
- et al. The time-course of prediction in incremental sentence processing: Evidence from anticipatory eye movements. Journal of Memory and Language (2003)
- Expectation-based syntactic comprehension. Cognition (2008)
- et al. Implicit arguments in sentence processing. Journal of Memory and Language (1995)