Walker and Hickok (2015; henceforth, WH) have presented results from a simulation of speech production implementing aspects of Hickok’s (2012) hierarchical state feedback control theory. They contrasted this proposal with Foygel and Dell’s (2000) two-step interactive account (TSIA; see also Dell, Lawler, Harris, & Gordon, 2004; Schwartz, Dell, Martin, Gahl, & Sobel, 2006). These two accounts are depicted on the right and left sides of Fig. 1, respectively. Both accounts assume that speech production involves interaction between semantic, lexical, and phonological representations. WH’s proposal also includes a second set of phonological representations, corresponding to auditory information, that interact with both lexical and (motor) phonological representations (hence the moniker SLAM: the semantic–lexical–auditory–motor model). WH examined how well simulations of each model could account for the overall response patterns of a set of individuals with aphasia. Neurological impairment was modeled by reducing the amount of activation flowing between levels of representation; this increased the relative influence of random noise, leading to errors.
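The lesion logic just described can be illustrated with a deliberately minimal sketch. It is not WH’s or Foygel and Dell’s implementation: a target and a few competitors receive input scaled by a single hypothetical connection-strength parameter (`weight`), Gaussian noise of fixed standard deviation is added, and the most active node is selected. The input values (1.0 for the target, 0.5 for each competitor) are assumptions chosen for illustration. Lowering `weight` shrinks the target’s advantage relative to the noise, so errors become more likely.

```python
import random


def error_rate(weight, noise_sd=1.0, n_competitors=3, trials=5000, seed=0):
    """Toy lesion model: proportion of trials on which a competitor wins.

    The target gets input 1.0 and each competitor a hypothetical partial input
    of 0.5; both are multiplied by `weight` (connection strength) and then
    perturbed by Gaussian noise whose SD does not depend on `weight`.
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        target = weight * 1.0 + rng.gauss(0, noise_sd)
        competitors = [weight * 0.5 + rng.gauss(0, noise_sd)
                       for _ in range(n_competitors)]
        if max(competitors) > target:  # the most active node is "selected"
            errors += 1
    return errors / trials


# Weaker connections leave noise a larger share of the activation.
for w in (8.0, 4.0, 2.0, 1.0):
    print(w, error_rate(w))
```

In this toy, the noise is held constant while only the signal is scaled, which is one simple way to express "reducing activation flow increases the relative influence of noise."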

Fig. 1 Theoretical proposals contrasted in this commentary. Arrows indicate the directions of activation flow between representations connected by the lines. Dotted lines indicate weaker activation flow between representations than that marked by solid lines. Selection points are indicated by a double outline around a level of representation.

WH reported two major findings: Their simulation of SLAM fit overall response distributions about as well as a simulation of the TSIA did, and the SLAM simulation fit the data of individuals who had been assigned the clinical label of conduction aphasia relatively better than the TSIA simulations did. This commentary reexamines these claims in light of previous work that has established empirical problems for the TSIA and methodological problems with the approach of Foygel and Dell (2000). A comparison of SLAM with existing theoretical proposals arising out of this research reveals clear shortcomings of this new proposal.

Empirical challenges to TSIA’s account of sound structure processing

The lexical + postlexical account

An overall performance pattern that is difficult to account for under TSIA is the production of only phonologically related errors (i.e., form-related errors such as cat → hat, as well as neologisms such as cat → zat; Caramazza, Papagno, & Ruml, 2000; Schwartz et al., 2006). In TSIA, phonologically related errors (in particular, neologisms or nonword errors) are most likely to arise during phonological processing. However, because cascading activation allows semantically related words to become active at the phonological level, impairments to this level of processing are likely to result in the production of semantic as well as phonological errors (Rapp & Goldrick, 2000). TSIA thus predicts that individuals should never produce a pattern of only phonologically related errors.
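The cascade argument can be made concrete with a toy sketch under stated assumptions: a hypothetical three-word lexicon (cat, dog, hat) written as onset/rime pairs, illustrative lexical activation values, no feedback, and Gaussian noise at the phonological level standing in for a phonological-level impairment. This is not the TSIA simulation itself. It compares input in which the unselected semantic neighbour dog cascades activation down to its segments with input from the target alone.

```python
import random

# Hypothetical toy lexicon; each word is an (onset, rime) pair.
WORDS = {"cat": ("c", "at"), "dog": ("d", "og"), "hat": ("h", "at")}
ONSETS = ["c", "d", "h", "z"]
RIMES = ["at", "og"]


def semantic_error_rate(lexical, noise_sd=0.5, trials=5000, seed=2):
    """Proportion of 'dog' responses when the target is 'cat'.

    `lexical` maps words to the activation they pass down to the phonological
    level (illustrative values). Noise at that level stands in for damage.
    """
    rng = random.Random(seed)
    onset_act = {o: 0.0 for o in ONSETS}
    rime_act = {r: 0.0 for r in RIMES}
    # Cascade: every active word, selected or not, feeds its own segments.
    for word, act in lexical.items():
        onset, rime = WORDS[word]
        onset_act[onset] += act
        rime_act[rime] += act
    hits = 0
    for _ in range(trials):
        onset = max(ONSETS, key=lambda o: onset_act[o] + rng.gauss(0, noise_sd))
        rime = max(RIMES, key=lambda r: rime_act[r] + rng.gauss(0, noise_sd))
        if onset + rime == "dog":
            hits += 1
    return hits / trials


cascading = {"cat": 1.0, "dog": 0.5, "hat": 0.1}  # semantic neighbour active
target_only = {"cat": 1.0}                        # no cascade from neighbours
print(semantic_error_rate(cascading), semantic_error_rate(target_only))
```

Under these assumptions, the same phonological-level noise yields noticeably more semantic (dog) responses when the semantic neighbour cascades activation, which is the reason phonological damage in a cascading model is not expected to produce purely phonological errors.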

A number of studies have documented individuals who violate this prediction (Galluzzi, Bureca, Guariglia, & Romani, 2015; Goldrick & Rapp, 2007; Romani & Galluzzi, 2005; Romani, Galluzzi, Bureca, & Olson, 2011; Romani, Olson, Semenza, & Granà, 2002). Furthermore, the errors of individuals exhibiting this pattern are strongly influenced by the acoustic/articulatory complexity of phonological structures (e.g., more errors on less frequent consonant sequences) but relatively uninfluenced by lexical properties (e.g., word frequency). This contrasts with other individuals who produce phonological errors yet show a complementary pattern: sensitivity to lexical factors (e.g., lower accuracy on low-frequency words) and insensitivity to the complexity of phonological structures.

These results can be accounted for by a theory that distinguishes multiple levels of sound structure processing in production. As is shown in the middle panel of Fig. 1, this account parallels the TSIA, in that lexical selection is followed by a stage of processing during which relatively abstract specifications of phonological structure are retrieved (lexical phonological processing). A second stage of (postlexical) phonological processing then retrieves and selects more detailed aspects of sound structure (e.g., featural representations; this leads to the moniker lexical + postlexical [LPL] account). Note that this is a distinct stage of production processing, in that it follows the explicit selection of an abstract phonological representation. In general, such selection mechanisms serve to reduce interactions across processing levels, increasing the degree to which distinct subprocesses can exhibit distinct patterns of impairment (Rapp & Goldrick, 2000).

Whereas lexical phonological processing begins with the selection of a lexical representation (and the coactivation of semantically related words), postlexical processing is initiated by the selection of a phonological representation—resulting in the coactivation of multiple phonological structures (e.g., for the target cat, syllables corresponding to words such as hat, as well as nonword syllables such as zat). Disruption to postlexical processing therefore results in the production of phonologically related words as well as nonwords, accounting for the overall performance pattern discussed above. The presence of distinct representational types at each discretely separated stage of processing also accounts for more detailed aspects of their performance. Individuals with deficits arising in lexical phonological processing will be strongly influenced by lexical factors (reflecting the input to lexical processing), but not by phonological complexity (reflecting the abstract structure of lexical phonological representations). Individuals with deficits to a postlexical stage, governed by relationships among fully specified phonological structures, will not be influenced by lexical factors but will show strong effects of phonological complexity.
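The postlexical pattern can be illustrated with another hypothetical toy, not an implementation of the LPL account: only the already selected form of cat supplies input, each coactivated syllable’s activation is its segment-by-segment overlap with that form, and noise is added before a single syllable is selected. The candidate inventory, the overlap measure, and the noise level are all assumptions.

```python
import random
from collections import Counter

# Hypothetical syllables coactivated at the postlexical stage, written as
# (onset, vowel, coda); the inventory is an illustrative assumption.
CANDIDATES = {
    "cat": ("k", "a", "t"),  # target
    "hat": ("h", "a", "t"),  # form-related word
    "cap": ("k", "a", "p"),  # form-related word
    "zat": ("z", "a", "t"),  # form-related nonword
    "dog": ("d", "o", "g"),  # semantically related word sharing no segments
}
WORDS = {"cat", "hat", "cap", "dog"}


def overlap(a, b):
    """Count syllable positions that contain the same segment."""
    return sum(x == y for x, y in zip(a, b))


def postlexical_outcomes(selected="cat", noise_sd=0.8, trials=5000, seed=1):
    rng = random.Random(seed)
    target = CANDIDATES[selected]
    counts = Counter()
    for _ in range(trials):
        # Input comes only from the already selected form: each candidate's
        # activation is its segmental overlap with that form, plus noise.
        response = max(
            CANDIDATES,
            key=lambda c: overlap(CANDIDATES[c], target) + rng.gauss(0, noise_sd),
        )
        if response == selected:
            counts["correct"] += 1
        elif overlap(CANDIDATES[response], target) == 0:
            counts["unrelated"] += 1  # includes "dog"
        elif response in WORDS:
            counts["form-related word"] += 1
        else:
            counts["nonword"] += 1
    return counts


print(postlexical_outcomes())
```

Under these assumptions, errors are dominated by form-related words and nonwords. Because selection has already taken place, dog has no special status: it receives no input and surfaces only when noise alone carries it past the target.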

Finally, because postlexical processing occurs after the retrieval of abstract structures from long-term memory, it is assumed to be engaged by all spoken production tasks. Consistent with this assumption, individuals who produce only phonologically related word and nonword errors in picture naming produce similar patterns in repetition and reading aloud (Goldrick & Rapp, 2007; Romani et al., 2002).

Assessing SLAM relative to LPL

One of WH’s major findings is that SLAM simulations show a better fit than TSIA to the performance of individuals with conduction aphasia. This clinical label is applied to individuals who typically (but not always) produce phonological errors in both repetition and picture naming in the context of intact articulatory and auditory comprehension processes—similar to the postlexical pattern reviewed above. In fact, inspection of individual conduction aphasia cases reveals that this improvement in fit largely reflects SLAM’s relative success in accounting for individuals who produce primarily phonologically related errors.

This was assessed by using WH’s online fitting algorithm (http://cogsci.uci.edu/~alns/webfit.html) to fit SLAM and TSIA simulations (based on 2,321 map points) to the performance of 50 individuals with conduction aphasia from version 2.0 of the Moss Aphasia Psycholinguistic Project Database (Mirman et al., 2010; see Footnote 1). As is shown in Table 1, the ten individuals with the greatest improvement in fit show a performance pattern similar to the postlexical pattern identified above. The vast majority of these individuals’ errors are phonologically (formally) related words or nonwords (a response category likely to include phonologically related forms). In fact, across the set of 50 individuals with conduction aphasia, the proportion of errors falling into these two categories is significantly correlated with the size of SLAM’s improvement in RMSD (root-mean-square deviation) relative to TSIA [r(48) = .61, p < .0001]. This suggests that SLAM outperforms TSIA because it better matches deficits that result in predominantly form-related word and nonword errors.
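The fit comparison can be sketched as follows, assuming (hypothetically) that RMSD is computed over the six response categories commonly used in TSIA fitting (correct, semantic, formal, mixed, unrelated, nonword) and that improvement is defined as RMSD(TSIA) minus RMSD(SLAM), so positive values favour SLAM. All proportions below are invented for illustration and are not taken from Table 1 or the database. The correlation across cases is an ordinary Pearson r, with n − 2 degrees of freedom (48 for 50 individuals).

```python
import math


def rmsd(observed, predicted):
    """Root-mean-square deviation between two response-proportion vectors."""
    if len(observed) != len(predicted):
        raise ValueError("distributions must cover the same categories")
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical case; order: correct, semantic, formal, mixed, unrelated, nonword.
observed = [0.70, 0.02, 0.10, 0.01, 0.02, 0.15]
pred_slam = [0.69, 0.03, 0.09, 0.01, 0.02, 0.16]
pred_tsia = [0.68, 0.06, 0.06, 0.02, 0.03, 0.15]
improvement = rmsd(observed, pred_tsia) - rmsd(observed, pred_slam)  # > 0 favours SLAM

# Hypothetical case series: share of errors that are formal or nonword,
# paired with each case's improvement in fit.
form_share = [0.90, 0.75, 0.60, 0.40, 0.20]
improvements = [0.030, 0.020, 0.015, 0.005, -0.002]
r = pearson_r(form_share, improvements)
df = len(form_share) - 2
print(round(improvement, 4), round(r, 3), df)
```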

Table 1 Response proportions for each of the ten conduction aphasia cases showing the greatest improvements in fit for the semantic–lexical–auditory–motor theory relative to the two-step interactive account

The LPL account can also account for the overall response distribution of such individuals, by assuming deficits to both lexical processes (resulting in semantically related errors) and postlexical processes (which specifically increase the rate of phonologically related errors). Given that both of these accounts clearly outperform TSIA for this general pattern, which provides a more comprehensive account of the overall set of existing data? To address this question, SLAM was fit to a prototypical case of only phonological errors in production (BON; Goldrick & Rapp, 2007). As is shown in Table 2, SLAM has great difficulty fitting this error pattern; it predicts that semantic as well as form-related errors should be produced. Interestingly, SLAM attempts to fit this pattern by approximating the connectivity of LPL: the lexical–motor phonological connections are set to a negligible value (0.0051), whereas all other connections are set to a high value (0.035). However, merely approximating this connectivity pattern is insufficient; fully implementing the LPL account would also require adding an explicit selection process during the first stage of phonological processing (see Goldrick & Rapp, 2002, for an analysis of the consequences of weakening or eliminating selection within these spreading-activation theories).

Table 2 Observed versus predicted response distributions for BON (Goldrick & Rapp, 2007), an individual producing only phonologically related errors in production

In addition to the challenges in matching the overall error distributions of these cases, SLAM offers no account of the differential effects of phonological complexity versus lexical variables on different deficits, and offers no general account of how multiple stages of phonological processing might be incorporated in production (for additional discussion of these issues in the context of the hierarchical state feedback control theory more generally, see Rapp, Buchwald, & Goldrick, 2014; Roelofs, 2014). Thus, the LPL account provides a clearly superior account of the types of cases on which SLAM outperforms TSIA.

Methodological challenges to simulation studies

The other major result of WH is that SLAM exhibited a degree of fit to overall response distributions similar to that of a simulation of the TSIA. This follows previous studies of TSIA, which have assumed that the degree to which simulations fit the overall response distribution of each participant (e.g., proportions of correct responses, semantic errors, phonologically related errors, etc.) provides a general means of distinguishing between the theories corresponding to each simulation. Although this may be true of some theoretical accounts (e.g., global vs. local disruptions to the production system; Foygel & Dell, 2000), in many cases it fails.

Goldrick (2011) demonstrated this by examining the ability of TSIA simulations to fit simulated data sets. Artificial case series were generated using simulations of (a) Foygel and Dell’s (2000) TSIA, (b) a theory in which speech errors arise prior to the two steps of lexical access assumed in TSIA, and (c) Rapp and Goldrick’s (2000) restricted interaction account, which differs from TSIA in the strength and nature of feedback. When the parameter-fitting procedure of Dell et al. (2004) was then used to fit the TSIA to each of these artificial case series, the degrees of fit were equivalent for all three. Thus, with respect to overall response distributions, TSIA was able to fit data generated by a TSIA simulation just as well as data generated by simulations of distinct theoretical accounts. This suggests that overall response distributions frequently fail to discriminate which type of theory generated a given set of data. In light of these results, the fact that SLAM and TSIA exhibited equivalent fits to overall response distributions is unsurprising; in many cases, this measure will fail to discriminate alternative theories. Focusing on specific aspects of performance, motivated by theoretical contrasts, is a more effective means of distinguishing accounts than measures of overall response distributions (Goldrick, 2011; Rapp & Goldrick, 2000).
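The logic of this demonstration can be caricatured in a few lines. This is not Goldrick’s (2011) simulation: the fitting model below is a hypothetical two-parameter toy with only three response categories (correct, semantic, phonological), fit by grid search on RMSD. One data set is generated by the toy model itself; the other comes from a different hypothetical mechanism in which semantic errors arise before lexical access. Because these overall distributions have only two free proportions, the two-parameter model fits both essentially perfectly, so goodness of fit alone says nothing about which mechanism produced the data.

```python
import math
from itertools import product


def toy_fit_model(s, p):
    """Hypothetical two-parameter model (not the TSIA equations).

    s and p are toy 'connection strengths' in [0, 1]; weaker strengths raise
    the corresponding error rate. Returns (correct, semantic, phonological).
    """
    semantic = 0.5 * (1 - s)
    phonological = 0.5 * (1 - p)
    return (1 - semantic - phonological, semantic, phonological)


def rmsd(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def best_fit(observed, step=0.01):
    """Grid search over (s, p); returns (rmsd, s, p) of the best fit."""
    grid = [i * step for i in range(int(round(1 / step)) + 1)]
    return min(
        (rmsd(observed, toy_fit_model(s, p)), s, p) for s, p in product(grid, grid)
    )


# Data set A: generated by the fitting model itself.
data_a = toy_fit_model(0.6, 0.8)

# Data set B: a different hypothetical mechanism. A conceptual error (giving a
# semantic error) occurs before lexical access with probability 0.25;
# otherwise a phonological slip occurs with probability 0.12.
concept_err, slip = 0.25, 0.12
data_b = ((1 - concept_err) * (1 - slip), concept_err, (1 - concept_err) * slip)

fit_a, fit_b = best_fit(data_a), best_fit(data_b)
print(f"{fit_a[0]:.4f} {fit_b[0]:.4f}")  # both best-fit RMSDs are essentially zero
```

Realistic models have more categories and parameters, but the same concern arises whenever a model’s free parameters can absorb the variation in overall response proportions.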

Issues outside of sound structure processing for TSIA and SLAM

Schwartz et al. (2006) have noted another overall performance pattern that is difficult for TSIA to account for: modality-specific impairments to speech production that result only in the production of semantic errors (see also Cuetos, Aguado, & Caramazza, 2000, for a discussion). Several studies have documented this pattern of performance (Basso, Taborelli, & Vignolo, 1978; Caramazza & Hillis, 1990; Cuetos et al., 2000; Miceli, Benvegnú, Capasso, & Caramazza, 1997; Nickels, 1992; see also Rapp & Goldrick, 2000). Rapp and Goldrick (2000) presented simulation results showing that this pattern is difficult for TSIA to account for because it incorporates strong feedback from phonological to lexical representations. Such strong feedback is also inconsistent with chronometric and speech error data from unimpaired speakers (see Goldrick, 2006, for a review). Because SLAM adopts similar assumptions regarding feedback, it likely suffers from the same issues. Rapp and Goldrick’s restricted interaction account provides an alternative that successfully addresses these challenges.

Conclusions

WH, following Hickok (2012), motivated the SLAM model by attempting to integrate psycholinguistic and speech motor control approaches to speech production. Although such cross-disciplinary conceptual integration is a laudable goal, it requires a full integration with the rich set of data and theory from psycholinguistic approaches to speech production. SLAM fails to achieve this. To the extent that SLAM outperforms the TSIA, it does so by poorly approximating the LPL account; SLAM is less successful than this existing theory in accounting for the full range of behavioral data. SLAM also fails to address methodological issues from existing work with the TSIA model, and fails to address issues in semantic and lexical processing that are problematic for TSIA. These issues suggest that a true integration of psycholinguistic and speech motor control theories will require a different approach.