Working-memory disruption by task-irrelevant talkers depends on degree of talker familiarity

Kreitewolf, Jens; Wöstmann, Malte; Tune, Sarah; Plöchl, Michael; Obleser, Jonas

doi:10.3758/s13414-019-01727-2

Perceptual/Cognitive Constraints on the Structure of Speech Communication: In Honor of Randy Diehl
Published: 16 April 2019

Attention, Perception, & Psychophysics, Volume 81, pages 1108–1118 (2019)

Abstract

When one is listening, familiarity with an attended talker’s voice improves speech comprehension. Here, we instead investigated the effect of familiarity with a distracting talker. In an irrelevant-speech task, we assessed listeners’ working memory for the serial order of spoken digits when a task-irrelevant, distracting sentence was produced by either a familiar or an unfamiliar talker (with rare omissions of the task-irrelevant sentence). We tested two groups of listeners using the same experimental procedure. The first group were undergraduate psychology students (N = 66) who had attended an introductory statistics course. Critically, each student had been taught by one of two course instructors, whose voices served as the familiar and unfamiliar task-irrelevant talkers. The second group of listeners were family members and friends (N = 20) who had known either one of the two talkers for more than 10 years. Students, but not family members and friends, made more errors when the task-irrelevant talker was familiar versus unfamiliar. Interestingly, the effect of talker familiarity was not modulated by the presence of task-irrelevant speech: Students experienced stronger working memory disruption by a familiar talker, irrespective of whether they heard a task-irrelevant sentence during memory retention or merely expected it. While previous work has shown that familiarity with an attended talker benefits speech comprehension, our findings indicate that familiarity with an ignored talker disrupts working memory for target speech. The absence of this effect in family members and friends suggests that the degree of familiarity modulates the memory disruption.

In natural situations, listeners are surrounded by a multitude of sounds that compete for attention. To comprehend a talker’s speech in the presence of competing distractors, both voice characteristics, such as the talker’s pitch, timbre, and articulatory style (reviewed by Diehl, Lotto, & Holt, 2004; Mathias & von Kriegstein, 2014), and listening goals, such as selective attention to and working memory of the talker’s speech (reviewed by Fritz, Elhilali, David, & Shamma, 2007; Shinn-Cunningham, 2008), guide the listener to focus on relevant sounds and to ignore irrelevant distractors.

One beneficial factor for speech comprehension under such conditions is familiarity with the talker’s voice. Several studies have shown that listeners are better at comprehending target speech when it is produced by a familiar rather than an unfamiliar talker (Holmes, Domingo, & Johnsrude, 2018; Johnsrude et al., 2013; Kreitewolf, Mathias, & von Kriegstein, 2017; Levi, Winters, & Pisoni, 2011; Newman & Evers, 2007; Nygaard & Pisoni, 1998; Souza, Gehani, Wright, & McCloy, 2013). In the following, we refer to this phenomenon as the familiarity benefit (e.g., Johnsrude et al., 2013; Kreitewolf et al., 2017).

The familiarity benefit likely relies on listeners’ previous experience with the talker’s vocal characteristics that can be exploited to direct selective attention to target sounds in the auditory scene (Bressler, Masud, Bharadwaj, & Shinn-Cunningham, 2014; Kreitewolf, Mathias, Trapeau, Obleser, & Schönwiesner, 2018). A conceivable implication of the familiarity benefit is that listeners benefit from talker familiarity not only when their goal is to attend to the familiar talker’s speech but also when they want to ignore it. In other words, if the familiarity benefit is based on previous experience with the talker’s voice, then this experience might also help listeners to filter out distracting speech produced by this talker.

Only a few studies have investigated whether talker familiarity helps listeners ignore distracting, task-irrelevant speech. Johnsrude et al. (2013) presented listeners with two concurrent spoken sentences and asked them to report key words from the target sentence. Critically, the authors manipulated talker familiarity in these sentences such that either the attended (target) sentence, the unattended (masker) sentence, or neither was spoken by a highly familiar talker (i.e., the listener’s spouse). Listeners correctly reported more key words from the target sentence when either the target or the masker sentence was spoken by a familiar as compared to an unfamiliar talker, suggesting that talker familiarity facilitates both attending to target speech and ignoring distracting speech. In an earlier study, Newman and Evers (2007) asked listeners to attend to and immediately repeat (i.e., to shadow) speech from a target talker while a distracting talker was presented in the background. Listeners differed in their familiarity with one of the two concurrent talkers and in whether or not they were told that they would hear a familiar voice (i.e., explicit vs. implicit knowledge). In this study, talker familiarity was ensured by presenting speech produced by the listeners’ university professor. The results showed that listeners with explicit knowledge about talker familiarity made fewer shadowing errors than listeners who only had implicit knowledge or listeners who were not familiar with the talker at all. Yet, this benefit was limited to familiarity with the target talker. Unlike the listeners in Johnsrude et al. (2013), these listeners did not benefit from familiarity with the distracting background talker. Therefore, these two studies produced somewhat incongruent results with regard to the question of whether talker familiarity helps listeners to ignore distracting, task-irrelevant speech.

Here, we investigated the effect of talker familiarity on the distraction induced by task-irrelevant speech using a different, yet well-established experimental paradigm: the irrelevant-speech task (e.g., Colle & Welsh, 1976; Salamé & Baddeley, 1982). The irrelevant-speech task requires listeners to keep the serial order of to-be-attended target stimuli in working memory while task-irrelevant, to-be-ignored speech is presented during memory retention. The number of incorrectly recalled targets is thought to increase proportionally to the distraction by the task-irrelevant speech, making the irrelevant-speech task an effective paradigm to study memory disruption by distracting speech. To modulate talker familiarity, we used an adaptation of the irrelevant-speech task in which a task-irrelevant, to-be-ignored sentence was spoken by either a familiar or an unfamiliar talker (Fig. 1).

The major objective of this study was to test whether familiarity with a task-irrelevant talker affects the serial recall of attended target speech. One possibility is that talker familiarity helps listeners filter out irrelevant speech. This should manifest in fewer recall errors when the task-irrelevant talker is familiar versus unfamiliar. Such a finding would be not only in line with Johnsrude et al. (2013), but also consistent with the idea of proactive filtering: When a distractor is known or anticipated, the filter can be applied even before the distractor appears (e.g., Noonan et al., 2016; Ruff & Driver, 2006). Interestingly, proactive filtering is beneficial when a distractor is present, but behaviorally costly when an expected distractor is omitted (Marini, Chelazzi, & Maravita, 2013).

To test the possibility that listeners proactively filter out irrelevant speech produced by the familiar talker, we blocked presentation of the familiar and unfamiliar talkers and omitted the task-irrelevant sentence on rare occasions. If listeners proactively filtered irrelevant speech from a familiar talker, we would observe fewer recall errors on trials in which the task-irrelevant sentence was spoken by a familiar than on trials with an unfamiliar talker (familiarity benefit), and more recall errors on trials in which a familiar rather than an unfamiliar distracting talker was anticipated but the irrelevant sentence was omitted (familiarity deficit).

An alternative possibility is that task-irrelevant speech produced by a familiar talker captures attention and draws it away from items in memory. Previous work on the irrelevant-speech effect has shown that distractors of high familiarity, such as the listener’s own name (Röer, Bell, & Buchner, 2013) and the listener’s native language (Ellermeier, Kattner, Ueda, Doumoto, & Nakajima, 2015), enhance memory disruption. Possibly, familiar distractors constitute salient stimuli that involuntarily draw attention resources away from the serial memory of target items. If this is also the case for familiar voices, we would observe more recall errors when the task-irrelevant sentence is spoken by a familiar versus unfamiliar talker; however, we would expect no effect of talker familiarity on trials in which the task-irrelevant sentence is omitted.

A third possibility is that the effect of talker familiarity is not modulated by the actual presentation of a task-irrelevant sentence. That is, talker familiarity affects working memory irrespective of whether the familiar talker’s distracting speech is heard or merely expected. Such a finding would be difficult to explain by (proactive) filtering or attentional capture of the familiar talker’s speech; instead, it would point to more general differences in how familiar and unfamiliar task-irrelevant talkers affect working memory of target speech.

Based on the collective results from previous studies, one might draw the conclusion that the degree of familiarity plays an important role for the distraction by irrelevant speech and that listeners only benefit from high (Johnsrude et al., 2013) but not moderate familiarity (Newman & Evers, 2007) with a distracting talker. Yet, these studies are difficult to compare since they differ markedly in their experimental procedures, including stimuli and task instructions.

Here, we tested the effect of the degree of familiarity by comparing two groups of listeners who performed the exact same irrelevant-speech task. Importantly, the two groups differed in their degree of familiarity with the task-irrelevant talker, at levels comparable to those in previous studies. Specifically, we compared a group of students who heard irrelevant speech produced by one of their course instructors (i.e., moderate familiarity, similar to Newman & Evers, 2007) with a group of the course instructors’ family members and friends (i.e., high familiarity, similar to Johnsrude et al., 2013).

Method

Participants

Two groups of listeners participated in this study. The first group were N = 66 undergraduate psychology students (59 females, seven males; mean age 23.11 years, age range 17–48 years; see Table 1 for details) who had received classroom instruction from one of two talkers. The classroom teaching comprised a total of fourteen 90-min sessions (of which all participants attended at least nine; see Table 1). The second group of listeners were N = 20 family members and close friends of either one of the two talkers (eight females, 12 males; mean age 39 years, age range 30–65 years; see Table 1 for details), who did not receive classroom instruction. All listeners were native German speakers. Students gained course credit for their participation; family members and friends were paid €10. The experimental procedures were approved by the ethics committee of the University of Lübeck.

Table 1 Details of the listener groups

Stimuli

The to-be-attended speech stimuli were recordings of the German digits 1 to 9, which we had used in previous studies (Obleser, Wöstmann, Hellbernd, Wilsch, & Maess, 2012; Wöstmann, Lim, & Obleser, 2017; Wöstmann & Obleser, 2016). All digits were spoken by a native German female talker (mean fundamental frequency, f₀, of 192 Hz). None of the listeners was familiar with the talker’s voice. Digits had an average duration of 0.6 s (ranging from 0.5 to 0.7 s) and were concatenated with an onset-to-onset delay of 0.75 s. The resulting digit streams had an average duration of 6.6 s.

For the task-irrelevant speech, we used a German version of the speech-in-noise sentences (Erb, Henry, Eisner, & Obleser, 2012) adapted from Kalikow, Stevens, and Elliott (1977). The same 50 sentences were recorded from two male talkers who were the authors J.K. and M.P. (both native German speakers). The mean f₀, averaged across all sentences, was 93 Hz for talker J.K. and 85 Hz for talker M.P. The sentences produced by J.K. had an average duration of 2.17 s (ranging from 1.83 to 2.58 s); the sentences produced by M.P. had an average duration of 2.19 s (ranging from 1.91 to 2.72 s). All sentences and digit streams were normalized to the same root mean square (RMS) decibel full-scale amplitude using MATLAB (version 8.6, MathWorks, United States). On a given trial, the onset of the task-irrelevant sentence was delayed by 1,409 ms (± 400 ms) so that, on average, the sentences were centered in the middle of a 5-s memory retention phase (Fig. 1B).
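The timing figures reported for the stimuli can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' stimulus code: the function names are hypothetical, and the sentence duration used is simply the mean of the two talkers' reported averages, so the resulting onset delay only approximates the reported 1,409 ms.

```python
def stream_duration(n_items, onset_to_onset, item_duration):
    """Duration of a concatenated stream: (n - 1) onset intervals plus the final item."""
    return (n_items - 1) * onset_to_onset + item_duration


def centered_onset(retention, sentence_duration):
    """Onset delay that centers a sentence within the retention phase."""
    return (retention - sentence_duration) / 2


# Nine digits, 0.75 s onset-to-onset, 0.6 s average digit duration (values from the text).
digit_stream = stream_duration(9, 0.75, 0.6)

# Mean of the two talkers' average sentence durations (2.17 s and 2.19 s).
mean_sentence = (2.17 + 2.19) / 2

# Delay that centers an average sentence in the 5-s retention phase.
onset = centered_onset(5.0, mean_sentence)

print(f"digit stream: {digit_stream:.2f} s, centered onset: {onset:.3f} s")
```

With the rounded values given in the text, the stream duration comes out at 6.6 s and the centered onset at about 1.41 s, in line with the reported figures.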

Procedure

The listeners performed an online experiment implemented in Labvanced (Scicovery GmbH, Osnabrück, Germany) that used an adaptation of the irrelevant-speech paradigm (e.g., Colle & Welsh, 1976; Jones & Morris, 1992). The online experiment was executed in the browser in full-screen mode. Online experiments allow for the rapid collection of large datasets (e.g., Buhrmester, Kwang, & Gosling, 2011) and have been shown to produce reliable data in several areas of behavioral research, including linguistics (e.g., Saunders, Bex, & Woods, 2013) and psychoacoustics (e.g., McPherson & McDermott, 2018). Here, the online experiment had the additional advantage of preventing direct contact between the listeners and one of the two task-irrelevant talkers immediately before the start of the experiment, which would have otherwise contaminated our manipulation of talker familiarity.

All listeners completed the experiment within 1 h. On each trial, listeners heard the nine spoken digits in random order followed by a task-irrelevant sentence (Fig. 1B), while a fixation cross was presented on the computer screen. In about 17% of trials (i.e., 20 out of 120 trials), silence was presented instead of task-irrelevant speech. A total of 5 s after the offset of the digit stream (i.e., at the end of the memory retention phase), a number pad consisting of the digits 1 to 9 was visually presented. Listeners were asked to select the digits in the order of their presentation. Each visually presented digit disappeared directly after it had been selected, which prevented the same digit from being selected more than once per trial. After the selection of the ninth digit, the next trial started with a delay of 500 ms. No feedback was given.

Listeners were asked to perform the online experiment in a quiet setting, to use a computer (no tablets, smart phones, etc.), and to listen to the sounds using headphones. Prior to the start of the experiment, listeners were instructed to silently rehearse the digit stream during the memory retention phase, to keep their eyes open and not to speak the digits aloud during a trial. Listeners could adjust the loudness of the sounds to a comfortable level. They were asked not to change the sound level during the experiment.

The experiment comprised four blocks (Fig. 1A, “Test”). Each block consisted of 30 trials: 25 trials with a distracting, task-irrelevant sentence (distractor trials), and five trials with silence in the memory retention phase (no-distractor trials). In each block, the no-distractor trials were pseudo-randomly interspersed with the distractor trials, with the restrictions that the first no-distractor trial within a block could not occur within the first five trials and that two no-distractor trials could not occur in succession. The task-irrelevant familiar and unfamiliar talkers were presented in alternating blocks of trials. The talker familiarity was not made explicit (i.e., listeners were not told that they would hear a familiar talker’s voice). Half of the listeners started with a familiar-talker block; the other half started with an unfamiliar-talker block (Fig. 1A, “Test”). Note that the no-distractor trials were acoustically identical for the task-irrelevant familiar and unfamiliar talkers. However, the blocked presentation of the familiar and unfamiliar talkers allowed us to test whether the infrequent presentation of silence in the memory retention phase would affect performance differently when a familiar versus an unfamiliar talker’s voice was expected.
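The ordering constraints for a single block can be illustrated with a short sketch. The text does not say how the pseudo-random orders were generated, so the rejection-sampling approach and the function name `make_block` are assumptions made purely for illustration.

```python
import random


def make_block(n_distractor=25, n_silent=5, min_first_silent=5, rng=None):
    """Return one block's trial types under the two constraints stated in the text.

    Assumed approach (not from the paper): draw candidate positions for the
    no-distractor trials and redraw until no two of them are adjacent.
    """
    if rng is None:
        rng = random.Random()
    n_trials = n_distractor + n_silent
    while True:
        # Positions 0..min_first_silent-1 are excluded, so the first
        # no-distractor trial cannot fall within the first five trials.
        positions = sorted(rng.sample(range(min_first_silent, n_trials), n_silent))
        # Reject orders in which two no-distractor trials occur in succession.
        if all(b - a > 1 for a, b in zip(positions, positions[1:])):
            break
    silent = set(positions)
    return ["no-distractor" if i in silent else "distractor" for i in range(n_trials)]


block = make_block(rng=random.Random(0))
print(block)
```

Four such blocks of 30 trials give the 120 trials of the experiment, 20 of which are no-distractor trials.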

In the first and second halves of the experiment (each consisting of one familiar- and one unfamiliar-talker block), the same combinations of digit streams and task-irrelevant sentences were presented (but each task-irrelevant sentence was spoken by one talker in the first half and by the other talker in the second half of the experiment). This was done to ensure that differences in performance between the blocks were due to familiarity with the task-irrelevant talker and not to differences in the memorability of the digit streams or the distractibility of the task-irrelevant sentences. To reduce item-specific learning, we ensured that the trial order was always different in familiar- and unfamiliar-talker blocks.

Statistical analyses

To assess listeners’ memory of the serial order of digits, we considered digits recalled at their respective position of presentation as correct and all other responses as incorrect. As a measure of distraction by the task-irrelevant speech, we counted the number of errors per trial (0–9). All statistical analyses were carried out in R (R Core Team, 2017) using RStudio (version 1.1.383).
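The scoring rule amounts to a position-by-position comparison of the presented and recalled sequences. A minimal sketch follows; the analyses themselves were run in R, and the function name and example sequences here are hypothetical.

```python
def count_errors(presented, recalled):
    """Number of serial positions at which the recalled digit differs from the presented one."""
    if len(presented) != len(recalled):
        raise ValueError("presented and recalled sequences must have the same length")
    return sum(p != r for p, r in zip(presented, recalled))


# Hypothetical trial: the digits at positions 3 and 4 were swapped during recall.
presented = [3, 1, 4, 7, 9, 2, 6, 8, 5]
recalled = [3, 1, 7, 4, 9, 2, 6, 8, 5]
errors = count_errors(presented, recalled)
print(errors)
```

Note that under this strict positional criterion a single transposition counts as two errors, because both digits end up at the wrong position.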

To overcome problems related to the unequal numbers of trials with and without a distractor in the memory retention phase, we fitted generalized linear mixed-effect models as implemented in the lme4 package (Bates, Mächler, Bolker, & Walker, 2015) to the number of errors per trial using Poisson regression (log link function; treating the number of errors as count data).

We followed an iterative model-fitting procedure: Starting with a minimal model that only included the random intercepts for subjects, we first added fixed- and then random-effect terms in a stepwise fashion. The fixed-effect terms were added in the order of their conceptual importance (i.e., talker familiarity, distractor, listener group, and interactions between these factors; see below). The random-effect terms included random intercepts for sentences and the subject- and sentence-specific random slopes for all significant main factors and interactions. After each step, we fitted the model using maximum-likelihood estimation and assessed the change in model fit using likelihood-ratio tests. Model terms that significantly improved the model fit were kept in the model, and nonsignificant terms were dropped (unless they were involved in higher-order interactions), resulting in a best-fitting model.
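The likelihood-ratio test used at each step compares twice the difference in log-likelihood between the nested models against a chi-square distribution whose degrees of freedom equal the number of added parameters. The sketch below is an illustration in plain Python rather than the R code used for the analyses. The helper names are hypothetical, and the closed-form tail probability is implemented only for 1 or an even number of degrees of freedom, which covers the tests reported in this article.

```python
import math


def lrt_statistic(loglik_simple, loglik_complex):
    """Likelihood-ratio statistic for two nested models fitted by maximum likelihood."""
    return 2.0 * (loglik_complex - loglik_simple)


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (df = 1 or even df only)."""
    if x < 0:
        raise ValueError("the test statistic must be non-negative")
    if df == 1:
        # For df = 1 the tail probability reduces to erfc(sqrt(x / 2)).
        return math.erfc(math.sqrt(x / 2.0))
    if df > 0 and df % 2 == 0:
        # For even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        half = x / 2.0
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    raise NotImplementedError("only df = 1 or even df are supported in this sketch")


# Test statistics as reported in the Results section.
for stat, df in [(9.75, 1), (11.76, 2), (5.69, 4)]:
    print(f"chi-square({df}) = {stat}: p = {chi2_sf(stat, df):.3f}")
```

Applied to the rounded statistics reported in the Results, this reproduces the published p values to within rounding.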

To investigate the potential effects of talker familiarity (familiar vs. unfamiliar task-irrelevant talker), distractor (distractor vs. no-distractor trial), and listener group (students vs. family and friends) on the number of errors, we modeled these predictors as fixed effects using deviation coding. To explore significant interaction terms, we performed post-hoc comparisons using Tukey’s range tests, as implemented in the lsmeans package (Lenth, 2016). We report unstandardized coefficients b in order to provide an estimate of effect size for fixed effects. Note that Poisson regression operates on a log transform of the dependent measure. The coefficients are therefore given in log-scale units. For significant random-effect terms, we report the likelihood-ratio test comparing the more complex model that includes the random-effect term with the simpler model excluding that term.
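Because the model uses a log link, a coefficient b translates into a multiplicative change in the expected number of errors. The sketch below shows the conversion for a purely illustrative value of b. The numeric codes used for deviation coding are not stated in the text, so the reading "one unit equals the difference between the two levels" holds only under the assumption of codes of -0.5 and +0.5, and the sign depends on which level received the positive code.

```python
import math


def rate_ratio(b):
    """Multiplicative change in expected error count per one-unit change in the predictor."""
    return math.exp(b)


def percent_change(b):
    """The same change expressed as a percentage."""
    return (math.exp(b) - 1.0) * 100.0


# Illustrative coefficient, not a value taken from the article.
b_example = -0.1
print(f"rate ratio: {rate_ratio(b_example):.3f}, change: {percent_change(b_example):.1f}%")
```

For small coefficients the percentage change is close to 100 × b, which gives a quick way to read the log-scale estimates.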

To enhance the interpretability of nonsignificant effects in particular, we calculated Bayes factors (BFs) using the brms package (Bürkner, 2016). When comparing two statistical models, the BF indicates how many times more likely the observed data are under the more complex model (including a particular model term of interest) than under the simpler model (excluding the model term of interest). In accordance with Jeffreys (1961), a BF of 0.33 or smaller is interpreted as providing evidence in favor of the null hypothesis, whereas a BF of 3 or larger is interpreted as evidence against it.
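The interpretation thresholds stated above can be written as a simple decision rule. This is a minimal sketch; the function name and the wording of the labels are hypothetical, and only the cut-offs of 0.33 and 3 come from the text.

```python
def interpret_bf(bf, lower=0.33, upper=3.0):
    """Classify a Bayes factor (complex model vs. simpler model) using the stated cut-offs."""
    if bf <= 0:
        raise ValueError("a Bayes factor must be positive")
    if bf <= lower:
        return "evidence for the null hypothesis"
    if bf >= upper:
        return "evidence against the null hypothesis"
    return "inconclusive"


# Examples using Bayes factors of the magnitudes reported in the Results.
for bf in (0.04, 1.11, 13.71):
    print(bf, "->", interpret_bf(bf))
```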

Results

Figure 2A shows the average proportions of errors as a function of digit position for distractor and no-distractor trials. Several basic observations can be made on the basis of these data. Descriptively, listeners made fewer errors for digits presented at initial and final positions, which is likely due to primacy and recency effects (e.g., Jones & Macken, 1993; Röer et al., 2013; Salamé & Baddeley, 1982; Schlittmeier, Weisz, & Bertrand, 2011; Wöstmann & Obleser, 2016). Interestingly, differences between distractor and no-distractor trials were most pronounced for digits presented in the second half of the digit stream (i.e., from the fifth to the eighth digit), and digits presented at these positions were generally more difficult to recall. Another observation is that, averaged across digit positions, listeners made more errors in distractor than in no-distractor trials (Fig. 2B).

Distraction by task-irrelevant speech

To test the effects of distractor and the other factors on the number of errors per trial, we fitted generalized linear mixed-effect models. In sum, the best-fitting model included the three main factors, distractor, talker familiarity, and listener group, as well as the interaction between the factors talker familiarity and listener group, as fixed-effect terms (see Table 2 for a summary of fixed-effect terms). The random-effect terms included the random intercepts for subjects and sentences, as well as the subject-specific random slopes for talker familiarity.

Table 2 Fixed-effect terms included in the best-fitting model

The inclusion of distractor in the best-fitting model (Z = – 2.81, p = .005, b = – 0.047, BF = 7.6) demonstrated the irrelevant-speech effect: Presentation of a task-irrelevant sentence within the memory retention phase was indeed more distracting than a silence period. The sentence-specific random intercepts were also included in the best-fitting model (χ²₁ = 9.75, p = .002, BF = 4.09), suggesting that the task-irrelevant sentences differed in distractibility.

Moderate but not high talker familiarity disrupts memory of target speech

The main aim of the present study was to investigate the effect of familiarity with a task-irrelevant talker on the serial recall of target digits. Figure 3 shows how the average proportion of errors evolved over digit positions in familiar- and unfamiliar-talker blocks (Fig. 3A) and the numbers of errors per trial averaged across digit positions for task-irrelevant familiar and unfamiliar talkers (Fig. 3B).

The best-fitting model included the factor talker familiarity (Z = – 1.98, p = .048, b = – 0.0472, BF = 1.58) and the interaction between the factors talker familiarity and listener group (Z = 2.01, p = .045, BF = 0.58). Post-hoc comparisons revealed a significant effect of talker familiarity in the student group (Z = – 4.07, p < .001, b = – 0.0947, BF = 13.71), but not in the group of family members and friends (Z = 0.01, p = .994, b = 0.0003, BF = 0.04). This means that students (Fig. 3C) but not family members and friends (Fig. 3D) made more errors when they were familiar with the task-irrelevant talker than when they were not.

Notably, the sample size was much smaller in the group of family and friends (N = 20) than in the student group (N = 66). However, the small BF of 0.04 provides strong evidence for the absence of a familiarity effect for family and friends. It is therefore unlikely that the comparatively small sample was responsible for the lack of a familiarity effect in this group of listeners.

The best-fitting model also included the subject-specific random slopes for talker familiarity (χ²₂ = 64.38, p < .001, BF = 17.72), suggesting that the effect of talker familiarity differed across listeners. Interestingly, however, the best-fitting model did not include the interaction between the factors talker familiarity and distractor: Compared to the simpler model, the inclusion of the fixed-effect term for the talker familiarity-by-distractor interaction did not significantly improve the model fit (χ²₁ = 0.61, p = .434, BF < 0.001). Thus, the effect of talker familiarity (higher distraction in the familiar- than in the unfamiliar-talker blocks) was not modulated by the presentation of a task-irrelevant sentence within the memory retention phase. This finding suggests that the mere expectation of a (moderately) familiar talker in the memory retention phase suffices to disrupt listeners’ working memory of target speech.

Familiarity effects are talker-specific

By design, it was possible that any familiarity effect would be driven by only one of our two task-irrelevant talkers. To investigate this possibility, we carried out a control analysis in which we added the factor identity of familiar talker (J.K. vs. M.P.) and, critically, the interaction between identity of familiar talker and talker familiarity as fixed-effect terms to the best-fitting model. The inclusion of identity of familiar talker did not significantly improve the model fit, relative to the best-fitting model (χ²₁ = 2.2, p = .138, BF = 0.32), but the inclusion of the interaction term between identity of familiar talker and talker familiarity did (χ²₂ = 11.76, p = .003, BF = 5.33). Post-hoc comparisons revealed that listeners who were familiar with talker M.P. showed a detrimental effect of talker familiarity on their serial recall of target digits (Z = – 3.67, p < .001, b = – 0.1057, BF = 13.61), but this was not the case for listeners who were familiar with talker J.K. (Z = 0.44, p = .663, b = 0.013, BF = 0.03).

Although these results suggest that memory disruption by a familiar talker might be talker-specific, they cannot explain our main finding of stronger memory disruption by moderate (but not high) familiarity with the task-irrelevant talker. First, the factor identity of familiar talker did not modulate the talker familiarity-by-listener group interaction: The inclusion of the three-way interaction between identity of familiar talker, talker familiarity, and listener group did not significantly improve the model fit, as compared to the simpler model (χ²₂ = 1.43, p = .49, BF = 0.04). Second, the interaction between talker familiarity and listener group remained a significant predictor for the number of errors per trial even after inclusion of the identity of familiar talker-by-talker familiarity interaction (Z = 2.07, p = .038), with higher memory disruption by the familiar talker in the student group (Z = – 4.20, p < .001, b = – 0.0926, BF = 27.61), but not in the group of family members and friends (Z = – 0.001, p = .999, b = – 4.49e–5, BF = 0.04). Third, and most importantly, the identity of the familiar talker was balanced across listeners; that is, similar numbers of listeners were familiar with either of the two talkers (see Table 1 for details). Our experimental design therefore inherently controlled for any talker-specific effects on the main effect of talker familiarity, as well as on the talker familiarity-by-listener group interaction.

No effect of listener group and block order

Overall, students did not differ from family members and friends in their serial recall of digits. As compared to the simpler model, the inclusion of the factor listener group did not significantly improve the model fit (χ²₁ = 0.02, p = .888, BF = 1.11). Note, however, that the BF shows no evidence for either the absence or the presence of an effect of listener group. It is thus possible that the nonsignificant difference between the two listener groups was due to the rather small sample of family members and friends.

To investigate potential effects of the presentation order of familiar and unfamiliar talkers (Fig. 1A, “Test”), we carried out a second control analysis in which we added the factor block order (familiar first vs. unfamiliar first) as a fixed-effect term to the best-fitting model. As compared to the respective simpler model, neither the main effect of block order (χ²₁ = 0.04, p = .843, BF = 0.06) nor the interaction term between the factors block order and talker familiarity (χ²₄ = 5.69, p = .223, BF = 0.05) significantly improved the model fit. Thus, neither the overall performance nor the effect of talker familiarity depended on whether listeners started the experiment with a familiar- or an unfamiliar-talker block.

Discussion

In the present study, we used a variant of the irrelevant-speech task to investigate the effect of familiarity with a distracting, task-irrelevant talker on the serial recall of target speech. The main finding was that listeners made more recall errors in blocks of trials with a familiar compared to an unfamiliar distracting talker. Critically, this effect depended on the degree to which listeners were familiar with the task-irrelevant voice: Listeners with moderate familiarity (i.e., students), but not those with high familiarity (i.e., family and friends), showed stronger working-memory disruption by the familiar talker. Interestingly, however, the effect of talker familiarity did not depend on the presence of task-irrelevant speech during memory retention: Students experienced stronger working-memory disruption irrespective of whether they heard a task-irrelevant sentence produced by the familiar talker (in most of the trials) or merely expected it (in a small subset of trials).

Familiarity with a distracting talker improves comprehension but disrupts working memory of target speech

Two previous studies had investigated the distraction by familiar talkers’ speech. One study (Newman & Evers, 2007) showed no benefit from moderate familiarity with a distracting background talker (i.e., the listeners’ university professor) when listeners had to shadow target speech. Another study (Johnsrude et al., 2013), however, showed that listeners’ comprehension of target speech does benefit from high familiarity with a distracting talker (i.e., the listener’s spouse)—a finding that has recently been extended to familiarity with accented speech (Senior & Babel, 2018). Together, these studies suggest that familiarity with a distracting talker can be beneficial, but that a high degree of familiarity with the talker is necessary for this benefit to occur. The results of the present study, by contrast, suggest that familiarity with a distracting talker is not beneficial but rather harmful, and that moderate instead of high talker familiarity is necessary for this familiarity effect to occur.

Notably, the present study used an irrelevant-speech task to investigate the distraction by familiar and unfamiliar task-irrelevant talkers. To succeed in the irrelevant-speech task, selective attention to items in working memory is needed (for a review of the interaction of attention and working memory, see Awh, Vogel, & Oh, 2006). The previous studies, by contrast, had used concurrent-speech tasks. For example, Johnsrude et al. (2013) presented listeners with two concurrent sentences (from the coordinate response measure corpus; Bolia, Nelson, Ericson, & Simpson, 2000) of the form “Ready [call sign], go to [color] [number] now” (e.g., “Ready Baron, go to green six now”) and asked them to report the color and number word from the target sentence. This task creates a much lower working-memory load than the irrelevant-speech task used in the present study, in which listeners had to keep the serial order of nine digits in memory while ignoring a task-irrelevant sentence. Our results therefore suggest that, although talker familiarity reduces the distraction from irrelevant concurrent speech, it increases the disruption of working memory for target speech.

Perceptual filtering versus attentional capture of familiar distractors

In the literature, two opposing effects of familiarity with a distracting stimulus have been described. One line of research suggests that advance knowledge about the distractor can enable listeners to form an efficient perceptual filter (e.g., Röer, Bell, & Buchner, 2015). To suppress distraction by familiar stimuli, the filter can be applied even before the distractor appears (Noonan et al., 2016; Ruff & Driver, 2006). However, such proactive suppression of the distractor has been found to produce behavioral costs when the distractor is expected but not presented (Marini et al., 2013). Our results clearly speak against proactive filtering in the case of familiar voices because we observed stronger working-memory disruption by a (moderately) familiar versus unfamiliar distracting talker and no modulation of this effect by whether or not a distracting sentence was presented.

Another line of research suggests that familiarity with a distracting stimulus is not beneficial, but rather costly for working memory (e.g., Ellermeier et al., 2015; Röer et al., 2013). This is because familiar distractors can automatically capture attentional resources and draw them away from items in working memory (Cowan, 1998). Although the attentional capture theory predicts stronger working-memory disruption by familiar than by unfamiliar talkers, it cannot explain why we observed a familiarity effect only in the student group. If anything, the familiar talker should have been more salient for family members and friends, who should therefore have been more susceptible to attentional capture by the familiar talker’s task-irrelevant speech (but see Gaspelin & Luck, 2018, for a recent review on the suppression of salient stimuli). Furthermore, the attentional capture theory is difficult to reconcile with our finding that the mere expectation of a moderately familiar distractor enhances memory disruption.

Uncertainty about vocal identity causes working-memory disruption

Our results, rather, speak to more general differences in how familiar and unfamiliar talkers are processed (for a recent review, see Maguinness, Roswandowitz, & von Kriegstein, 2018). We argue that the disparity of findings both within and across studies can be explained by a model that takes these differences into account, in particular with regard to how familiarity shapes the representation of vocal identity.

Figure 4 summarizes and illustrates these differences in the representation of familiar and unfamiliar voices and how these differences may relate to working-memory disruption and distraction by concurrent speech. Consistent with recent advances in voice-identity research (Lavan, Burton, Scott, & McGettigan, 2019), we argue that a high degree of talker familiarity is needed in order to arrive at a stable representation of vocal identity. Moderate talker familiarity, however, creates uncertainty about vocal identity that, in turn, causes disruption of working memory and distraction by concurrent speech.

Both the present and previous findings (Johnsrude et al., 2013; Newman & Evers, 2007) have shown that the effect of talker familiarity is not based on a simple dichotomy of familiar versus unfamiliar voices, but that it is rather the degree of familiarity that determines the effect of talker familiarity. We argue that students experienced stronger working-memory disruption by the familiar talker not only because of their limited amount of exposure to the talker’s voice but, critically, because that voice exposure was limited to a very specific context (i.e., classroom teaching). Family members and friends, by contrast, have heard the familiar talker’s voice in a much wider range of contexts. Recent work (Lavan, Burston, & Garrido, 2018; Lavan et al., 2019) suggests that it is this variance in previous voice encounters that enables listeners to form a stable percept of person identity.

Of note, talker familiarity was not made explicit in the present study. That is, listeners were not told that they would hear a familiar voice. It is reasonable to assume that family members and friends nevertheless recognized the familiar talker, whereas the students remained uncertain about the familiar talker’s identity. Resolving uncertainty about a distractor has been shown to be cognitively effortful (e.g., Geyer, Müller, & Krummenacher, 2006; Kerzel & Barras, 2016). In the case of unfamiliar distracting talkers, uncertainty was likely minimal, since listeners had little to no expectation about their vocal identity. The difference in vocal uncertainty can thus explain why students experienced stronger working-memory disruption by a moderately familiar than by an unfamiliar distracting talker. Critically, this explanation also holds for our finding that the mere expectation of a moderately familiar distractor can cause working-memory disruption.

In the case of concurrent speech, listeners are similarly distracted by an unfamiliar and by a moderately familiar talker (Newman & Evers, 2007). Possibly, this is because listeners need more, and more variable, experience with a talker’s voice to arrive at a representation of vocal identity that is sufficiently reliable to alleviate distraction by irrelevant concurrent speech. This explanation is consistent with a recent extension (Maguinness et al., 2018) of the prototype model of voice-identity processing (Lavner, Rosenhouse, & Gath, 2001). While listeners can recognize familiar talkers on the basis of stored reference patterns of their vocal identities, such reference patterns have yet to be established for unfamiliar talkers and may not suffice for robust identity recognition of moderately familiar talkers. Hence, for both unfamiliar and moderately familiar talkers, additional voice exposure is required.

Several studies have shown a link between voice-identity recognition and speech comprehension (Levi et al., 2011; Magnuson, Yamada, & Nusbaum, 1995; Nygaard & Pisoni, 1998; Nygaard, Sommers, & Pisoni, 1994; but see Holmes et al., 2018): Listeners are better at comprehending target speech when they have previously learned to identify the target talker by voice. It is likely that voice-identity information also helps listeners attenuate distraction by irrelevant concurrent speech. The findings by Newman and Evers (2007), however, suggest that it is not sufficient to make the identity of the distracting talker explicit. Rather, it seems that extensive experience with a talker’s voice is needed to overcome an immature representation of vocal identity and to benefit from familiarity with a distracting talker.

Conclusions

Here we demonstrated that moderate, but not high, familiarity with a distracting talker disrupts working memory for target speech. We propose a model that can explain both the present and previous findings on distraction by familiar talkers, by taking into account how voice familiarity shapes the representation of vocal identity.

References

Awh, E., Vogel, E. K., & Oh, S.-H. (2006). Interactions between attention and working memory. Neuroscience, 139, 201–208. https://doi.org/10.1016/j.neuroscience.2005.08.023
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67, 1–48. https://doi.org/10.18637/jss.v067.i01
Bolia, R. S., Nelson, W. T., Ericson, M. A., & Simpson, B. D. (2000). A speech corpus for multitalker communications research. Journal of the Acoustical Society of America, 107, 1065–1066.
Bressler, S., Masud, S., Bharadwaj, H., & Shinn-Cunningham, B. (2014). Bottom-up influences of voice continuity in focusing selective auditory attention. Psychological Research, 78, 349–360.
Buhrmester, M., Kwang, T., & Gosling, S. D. (2011). Amazon’s Mechanical Turk: A new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science, 6, 3–5. https://doi.org/10.1177/1745691610393980
Bürkner, P. C. (2016). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28. https://doi.org/10.18637/jss.v080.i01
Colle, H. A., & Welsh, A. (1976). Acoustic masking in primary memory. Journal of Verbal Learning and Verbal Behavior, 15, 17–31.
Cowan, N. (1998). Attention and memory: An integrated framework. New York, NY: Oxford University Press.
Diehl, R. L., Lotto, A. J., & Holt, L. L. (2004). Speech perception. Annual Review of Psychology, 55, 149–179. https://doi.org/10.1146/annurev.psych.55.090902.142028
Ellermeier, W., Kattner, F., Ueda, K., Doumoto, K., & Nakajima, Y. (2015). Memory disruption by irrelevant noise-vocoded speech: Effects of native language and the number of frequency bands. Journal of the Acoustical Society of America, 138, 1561–1569.
Erb, J., Henry, M. J., Eisner, F., & Obleser, J. (2012). Auditory skills and brain morphology predict individual differences in adaptation to degraded speech. Neuropsychologia, 50, 2154–2164.
Fritz, J. B., Elhilali, M., David, S. V., & Shamma, S. A. (2007). Auditory attention—Focusing the searchlight on sound. Current Opinion in Neurobiology, 17, 437–455.
Gaspelin, N., & Luck, S. J. (2018). Inhibition as a potential resolution to the attentional capture debate. Current Opinion in Psychology, 29, 12–18. https://doi.org/10.1016/j.copsyc.2018.10.013
Geyer, T., Müller, H. J., & Krummenacher, J. (2006). Cross-trial priming in visual search for singleton conjunction targets: Role of repeated target and distractor features. Perception & Psychophysics, 68, 736–749. https://doi.org/10.3758/BF03193697
Holmes, E., Domingo, Y., & Johnsrude, I. S. (2018). Familiar voices are more intelligible, even if they are not recognized as familiar. Psychological Science, 29, 1575–1583.
Jeffreys, H. (1961). The theory of probability (3rd ed.). Oxford, UK: Oxford University Press.
Johnsrude, I. S., Mackey, A., Hakyemez, H., Alexander, E., Trang, H. P., & Carlyon, R. P. (2013). Swinging at a cocktail party: Voice familiarity aids speech perception in the presence of a competing voice. Psychological Science, 24, 1995–2004.
Jones, D., & Morris, N. (1992). Irrelevant speech and serial recall: Implications for theories of attention and working memory. Scandinavian Journal of Psychology, 33, 212–229.
Jones, D. M., & Macken, W. J. (1993). Irrelevant tones produce an irrelevant speech effect: Implications for phonological coding in working memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 369–381. https://doi.org/10.1037/0278-7393.19.2.369
Kalikow, D. N., Stevens, K. N., & Elliott, L. L. (1977). Development of a test of speech intelligibility in noise using sentence materials with controlled word predictability. Journal of the Acoustical Society of America, 61, 1337–1351.
Kerzel, D., & Barras, C. (2016). Distractor rejection in visual search breaks down with more than a single distractor feature. Journal of Experimental Psychology: Human Perception and Performance, 42, 648–657. https://doi.org/10.1037/xhp0000180
Kreitewolf, J., Mathias, S. R., Trapeau, R., Obleser, J., & Schönwiesner, M. (2018). Perceptual grouping in the cocktail party: Contributions of voice-feature continuity. Journal of the Acoustical Society of America, 144, 2178–2188.
Kreitewolf, J., Mathias, S. R., & von Kriegstein, K. (2017). Implicit talker training improves comprehension of auditory speech in noise. Frontiers in Psychology, 8, 1584.
Lavan, N., Burston, L. F. K., & Garrido, L. (2018). How many voices did you hear? Natural variability disrupts identity perception from unfamiliar voices. British Journal of Psychology, 1–18. https://doi.org/10.1111/bjop.12348
Lavan, N., Burton, A. M., Scott, S. K., & McGettigan, C. (2019). Flexible voices: Identity perception from variable vocal signals. Psychonomic Bulletin & Review, 26, 90–102. https://doi.org/10.3758/s13423-018-1497-7
Lavner, Y., Rosenhouse, J., & Gath, I. (2001). The prototype model in speaker identification by human listeners. International Journal of Speech Technology, 4, 63–74.
Lenth, R. V. (2016). Least-squares means: The R package lsmeans. Journal of Statistical Software, 69, 1–33.
Levi, S. V., Winters, S. J., & Pisoni, D. B. (2011). Effects of cross-language voice training on speech perception: Whose familiar voices are more intelligible? Journal of the Acoustical Society of America, 130, 4053–4062.
Magnuson, J. S., Yamada, R. A., & Nusbaum, H. C. (1995). The effects of familiarity with a voice on speech perception. In Proceedings of the 1995 Spring Meeting of the Acoustical Society of Japan (pp. 391–392). Tokyo, Japan: Acoustical Society of Japan.
Maguinness, C., Roswandowitz, C., & von Kriegstein, K. (2018). Understanding the mechanisms of familiar voice-identity recognition in the human brain. Neuropsychologia, 116, 179–193.
Marini, F., Chelazzi, L., & Maravita, A. (2013). The costly filtering of potential distraction: Evidence for a supramodal mechanism. Journal of Experimental Psychology: General, 142, 906–922. https://doi.org/10.1037/a0029905
Mathias, S. R., & von Kriegstein, K. (2014). How do we recognise who is speaking? Frontiers in Bioscience (Scholar Edition), 6, 92–109.
McPherson, M. J., & McDermott, J. H. (2018). Diversity in pitch perception revealed by task dependence. Nature Human Behaviour, 2, 52–66. https://doi.org/10.1038/s41562-017-0261-8
Newman, R. S., & Evers, S. (2007). The effect of talker familiarity on stream segregation. Journal of Phonetics, 35, 85–103.
Noonan, M. P., Adamian, N., Pike, A., Printzlau, F., Crittenden, B. M., & Stokes, M. G. (2016). Distinct mechanisms for distractor suppression and target facilitation. Journal of Neuroscience, 36, 1797–1807.
Nygaard, L. C., & Pisoni, D. B. (1998). Talker-specific learning in speech perception. Perception & Psychophysics, 60, 355–376. https://doi.org/10.3758/BF03206860
Nygaard, L. C., Sommers, M. S., & Pisoni, D. B. (1994). Speech perception as a talker-contingent process. Psychological Science, 5, 42–46. https://doi.org/10.1111/j.1467-9280.1994.tb00612.x
Obleser, J., Wöstmann, M., Hellbernd, N., Wilsch, A., & Maess, B. (2012). Adverse listening conditions and memory load drive a common alpha oscillatory network. Journal of Neuroscience, 32, 12376–12383.
R Core Team. (2017). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
Röer, J. P., Bell, R., & Buchner, A. (2013). Self-relevance increases the irrelevant sound effect: Attentional disruption by one’s own name. Journal of Cognitive Psychology, 25, 925–931.
Röer, J. P., Bell, R., & Buchner, A. (2015). Specific foreknowledge reduces auditory distraction by irrelevant speech. Journal of Experimental Psychology: Human Perception and Performance, 41, 692–702. https://doi.org/10.1037/xhp0000028
Ruff, C. C., & Driver, J. (2006). Attentional preparation for a lateralized visual distractor: Behavioral and fMRI evidence. Journal of Cognitive Neuroscience, 18, 522–538.
Salamé, P., & Baddeley, A. (1982). Disruption of short-term memory by unattended speech: Implications for the structure of working memory. Journal of Verbal Learning and Verbal Behavior, 21, 150–164. https://doi.org/10.1016/S0022-5371(82)90521-7
Saunders, D. R., Bex, P. J., & Woods, R. L. (2013). Crowdsourcing a normative natural language dataset: A comparison of Amazon Mechanical Turk and in-lab data collection. Journal of Medical Internet Research, 15, e100. https://doi.org/10.2196/jmir.2620
Schlittmeier, S. J., Weisz, N., & Bertrand, O. (2011). What characterizes changing-state speech in affecting short-term memory? An EEG study on the irrelevant sound effect. Psychophysiology, 48, 1669–1680.
Senior, B., & Babel, M. (2018). The role of unfamiliar accents in competing speech. Journal of the Acoustical Society of America, 143, 931–942.
Shinn-Cunningham, B. G. (2008). Object-based auditory and visual attention. Trends in Cognitive Sciences, 12, 182–186.
Souza, P., Gehani, N., Wright, R., & McCloy, D. (2013). The advantage of knowing the talker. Journal of the American Academy of Audiology, 24, 689–700.
Wöstmann, M., Lim, S. J., & Obleser, J. (2017). The human neural alpha response to speech is a proxy of attentional control. Cerebral Cortex, 27, 3307–3317.
Wöstmann, M., & Obleser, J. (2016). Acoustic detail but not predictability of task-irrelevant speech disrupts working memory. Frontiers in Human Neuroscience, 10, 538. https://doi.org/10.3389/fnhum.2016.00538

Author note

Research was funded by the University of Lübeck. Photographs appear by courtesy of Leo Waschke. We thank all of the students, family members, and friends who participated in this experiment, as well as two anonymous reviewers for their valuable comments on an earlier version of the manuscript.

Author information

Authors and Affiliations

Department of Psychology, University of Lübeck, Lübeck, Germany
Jens Kreitewolf, Malte Wöstmann, Sarah Tune, Michael Plöchl & Jonas Obleser

Corresponding authors

Correspondence to Jens Kreitewolf or Jonas Obleser.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kreitewolf, J., Wöstmann, M., Tune, S. et al. Working-memory disruption by task-irrelevant talkers depends on degree of talker familiarity. Atten Percept Psychophys 81, 1108–1118 (2019). https://doi.org/10.3758/s13414-019-01727-2

Published: 16 April 2019
Issue Date: 31 May 2019
DOI: https://doi.org/10.3758/s13414-019-01727-2