Grammatical Encoding for Speech Production

Linda Ruth Wheeldon; Agnieszka Konopka

doi:10.1017/9781009264518

1 Introduction

Psycholinguistic and linguistic theory agree that sentence production is a generative process involving a separate lexicon and grammar (e.g., Chomsky, 1965; Levelt, 1989). Speakers of a language can retrieve words from their mental lexicon and order them in accordance with their grammar to generate a theoretically infinite number of sentences. This potential for unbounded creativity is at variance with the evidence, to be reviewed in what follows, that spoken language tends toward repetition. Nevertheless, some degree of separation between lexical and syntactic representations and processes is a cornerstone of all current models of grammatical encoding (e.g., Chang, Dell & Bock, 2006; Dell, Oppenheim & Kittredge, 2008; Levelt, Roelofs & Meyer, 1999). Theoretical approaches to the processes of lexical retrieval and syntactic structure building in fluent sentence production are discussed in Section 1. The theoretical framing will focus on the key dichotomy in the field: whether grammatical encoding is driven by lexical (e.g., Bock & Levelt, 1994) or syntactic representations (e.g., Chang et al., 2006; Dell et al., 2008). We will begin with a theoretical overview, which will incorporate a brief discussion of theories of lexical representation and access (e.g., Wheeldon & Konopka, 2018), before turning to how retrieved lexical items are integrated into the unfolding syntax of an utterance.

We then evaluate the evidence for the independence of syntax from lexical representations and the nature of the structural representations generated during grammatical encoding (Section 2). The critical evidence in this area has been largely derived from studies of structural priming. In the early days of this research, the presence of lexically unsupported syntactic priming was taken as evidence of abstract structural processing in sentence production (e.g., Bock, 1986). Further research demonstrated limited involvement of the lexicon in the generation of syntactic structures. The rich body of existing evidence from within-language and between-language comparisons largely supports the view that syntax and the lexicon are independent in adult speakers (Branigan & Pickering, 2017; Chang et al., 2006; Mahowald, James, Futrell & Gibson, 2016; Pickering & Ferreira, 2008), although questions remain open in developmental psycholinguistics (e.g., Messenger, Branigan & McLean, 2011; Rowland, Chang, Ambridge, Pine & Lieven, 2012). Priming research has also helped to delimit the nature of the syntactic representations generated during sentence production (e.g., Bernolet, Hartsuiker & Pickering, 2007; Branigan, Pickering, McLean & Stewart, 2006; Ferreira, 2003; Fox Tree & Meijer, 1999; Hardy, Wheeldon & Segaert, 2020; Ziegler, Snedeker & Wittenburg, 2017).

In the next section we switch focus to the time-course of grammatical encoding (Section 3). Here, the theoretical debate turns on whether online sentence planning occurs in a lexically incremental fashion (Bock & Levelt, 1994; Griffin, 2001; Meyer, Sleiderink & Levelt, 1998; also see Meyer, Wheeldon, Van der Meulen & Konopka, 2012) or in a structurally driven, hierarchical fashion (Konopka & Meyer, 2014; Lee, Brown-Schmidt & Watson, 2013; Martin, Miller & Vu, 2004; Momma, 2021; Smith & Wheeldon, 1999; Wheeldon, 2013; Wheeldon, Smith & Apperly, 2011). The critical evidence for this debate comes from studies of planning scope that use picture description paradigms to determine how much planning occurs before articulation onset. These paradigms frequently make use of eye tracking, which allows the time-course of planning to be traced from the initial uptake of visual information to the onset of speech (e.g., Konopka, 2019). More recently, cross-linguistic studies have investigated the role of language-specific grammatical constraints on planning (Allum & Wheeldon, 2007, 2009; Hwang & Kaiser, 2014a; Momma, Slevc & Phillips, 2016; Norcliffe, Konopka, Brown & Levinson, 2015; Sauppe, Norcliffe, Konopka, van Valin & Levinson, 2013).

The Element will also include relevant data from studies of bilingual sentence planning (e.g., Konopka, Meyer & Forest, 2018). This research speaks both to the representation of syntactic structure and to the effects of cognitive load on planning scope. We will review evidence that grammatical planning scope can be modulated by non-linguistic factors and cognitive limitations, including speed requirements (e.g., Ferreira & Swets, 2002), working memory (e.g., Swets, Jacovina & Gerrig, 2014), and attention (e.g., Jongman, Meyer & Roelofs, 2015; Jongman, Roelofs & Meyer, 2015).

In the final section of the Element (Section 4), we will provide an evaluation of the strengths and weaknesses of the methodological approaches that have been used to date in the field. Finally, we will reassess the theoretical landscape, highlighting gaps and defining the resulting avenues for future research.

1.1 Grammatical Encoding in Speech Production

1.1.1 The Component Processes for Speaking

In this section, we review theories of grammatical encoding for speech production, focusing on the proposed relationship between words and syntax. We begin, however, by setting the process of grammatical encoding in context. All cognitive models of speech production are heavily influenced by Levelt’s classic blueprint for the speaker (Levelt, 1989), which in turn built on the seminal work of Garrett (1975). The proposal is that utterances are produced by a series of more or less successive processes, and there is broad agreement on the structure of these processes (see Figure 1). The starting point is message generation, which involves the construction of a conceptual representation that details the information that the speaker wants to convey. This representation is usually known as the message (Levelt, 1989). The current view is that messages are non-linear and must at least contain conceptual category information. Messages can be very short (e.g., mapping onto utterances like ‘Hi’ or ‘Look there!’) or much longer, including a thematic structure which assigns concepts to thematic roles such as agent or patient (e.g., mapping onto utterances like ‘The politician was amazed by the volume of fan mail’; see Konopka & Brown-Schmidt, 2014, for a review). In addition, messages should contain the information required to generate a grammatical sentence, including time, mood and focus, as well as any information a particular language requires for obligatory syntactic or morphological markers (see Levelt, 1989, chapter 3, for a detailed discussion).

Figure 1 A representation of the key processing stages of spoken sentence production.

The message triggers grammatical encoding processes, which include selecting the appropriate lexical items, assigning grammatical roles and generating a syntactic structure to fix their linear order. The phonological structure of the utterance is constructed in the subsequent phase, where an abstract prosodic representation is generated which forms the input to phonetic and articulatory processes. Grammatical encoding processes therefore form the link between the conceptual structure to be conveyed and the sound structure of the utterance that will convey it. The component processes are lexical retrieval and syntactic structure building.

1.1.2 Lexical Retrieval Processes

Lexical retrieval refers to the activation and retrieval of words from the mental lexicon. During production, activation at the conceptual level triggers a lexical search. Psycholinguistic models largely agree that lexical representations exist independently of semantics, at the lemma and lexeme levels (Kempen & Huijbers, 1983). Lemmas are abstract, modality-general and language-specific lexical entries that are activated by information at the conceptual level. Lemma selection in turn activates lexemes, that is, representations that include word-form information (see Caramazza & Miozzo, 1997, vs. Roelofs, Meyer & Levelt, 1998), which then feed phonological encoding processes. For example, a speaker wishing to convey information about one person (a woman) transferring something (a book) to another person (a man) will generate a message-level representation consisting of conceptual nodes that correspond to the nominal concepts woman, man and book, as well as the action of transferring X to Y, and this information may activate the lemma nodes for the nouns ‘woman’, ‘man’, ‘book’ and the verbs ‘give’ and ‘donate’. Lemmas include item-specific syntactic information, such as grammatical gender for nouns and restrictions on syntactic alternations for verbs (e.g., the verb ‘give’ can be used with both prepositional-object [PO] and double-object [DO] syntax, while the verb ‘donate’ can only be used with PO syntax).
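The two-level lexical architecture can be made concrete with a small data-structure sketch. It is a toy illustration only, assuming hypothetical `Lemma` and `Lexeme` records and a hypothetical `allows` check that are not taken from any published model: the permitted dative frames of a verb (PO and DO for ‘give’, PO only for ‘donate’) are stored with the form-free lemma, while the word form is reached only through a lemma that has already been selected.

```python
from dataclasses import dataclass

# Hypothetical encoding of the lemma/lexeme distinction; not a published model.


@dataclass(frozen=True)
class Lemma:
    """Abstract, form-free lexical entry carrying item-specific syntax."""
    name: str
    category: str                      # grammatical category: "N", "V", ...
    frames: frozenset = frozenset()    # syntactic alternations a verb licenses


@dataclass(frozen=True)
class Lexeme:
    """Word-form entry, reached only after its lemma has been selected."""
    lemma: str
    form: str


LEMMAS = {
    "give": Lemma("give", "V", frozenset({"PO", "DO"})),
    "donate": Lemma("donate", "V", frozenset({"PO"})),
    "book": Lemma("book", "N"),
}

# Spelling stands in for a phonological form in this toy example.
LEXEMES = {name: Lexeme(name, name) for name in LEMMAS}


def allows(verb: str, frame: str) -> bool:
    """True if the verb's lemma licenses the given dative frame ("PO" or "DO")."""
    return frame in LEMMAS[verb].frames


def word_form(lemma: Lemma) -> str:
    """Form retrieval is keyed on an already-selected lemma (lemma -> lexeme)."""
    return LEXEMES[lemma.name].form


print(allows("give", "DO"), allows("donate", "DO"))  # True False
print(word_form(LEMMAS["donate"]))                   # donate
```

Keeping the frames on the lemma rather than the lexeme mirrors the claim that syntactic restrictions are available before, and independently of, word-form retrieval.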

The majority of models describing lexical access focus on retrieval of individual words – most often nouns (e.g., ‘woman’, ‘man’, ‘book’) – or production of short sequences of words in simple or complex noun phrases (NPs) (e.g., ‘the woman’, ‘the woman and the man’). The likelihood of selecting a lemma and the speed of selecting one noun lemma over another vary as a function of (a) word-specific variables (e.g., lexical frequency, age of acquisition, name agreement), (b) properties of the words’ lexical neighbours (e.g., neighbourhood density, recent activation of neighbouring lexical nodes, the degree to which relationships between words are taxonomic or thematic), and (c) the proposed architecture of the production system (e.g., the direction of information flow between the conceptual, lexical and phonological levels). Two classes of models, Levelt and colleagues’ serial model (Levelt et al., 1999; also see Roelofs, 1992) and Dell and colleagues’ interactive models of lexical access (Dell, 1986; Dell, Schwartz, Martin, Saffran & Gagnon, 1997), have led the theorising in the field. In both models, the concepts or lexical nodes that are most strongly activated are selected for production, but the models differ in the degree to which they allow activation from lower levels to influence selection: serial models assume a feedforward flow of activation from concepts to lemmas and to phonological encoding, while interactive models allow for feedback from lower levels.
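The difference in information flow can be illustrated with a toy spreading-activation sketch. Everything here (the three words, the letter-sized segments, the spreading rate and the absence of decay or normalisation) is a hypothetical simplification, not a parameterisation of either model. With feedback switched off, ‘rat’ and ‘dog’ keep the equal activation they receive from the concept CAT; with feedback switched on, ‘rat’ gains extra activation because it shares segments with ‘cat’, the kind of combined semantic and phonological advantage that interactive accounts predict.

```python
# Toy lexicon: each lemma maps to its (simplified) phonological segments.
WORDS = {"cat": ("k", "a", "t"), "rat": ("r", "a", "t"), "dog": ("d", "o", "g")}


def spread(concept_input: dict, feedback: bool, steps: int = 2, rate: float = 0.5) -> dict:
    """Spread activation lemma -> segment, optionally feeding segments back to lemmas."""
    lemma = dict(concept_input)  # activation arriving from the conceptual level
    for _ in range(steps):
        # Feedforward: every lemma passes activation down to its segments.
        phon = {p: 0.0 for segs in WORDS.values() for p in segs}
        for word, segs in WORDS.items():
            for p in segs:
                phon[p] += rate * lemma[word]
        # Interactive models only: segments send activation back up to lemmas.
        if feedback:
            for word, segs in WORDS.items():
                lemma[word] += rate * sum(phon[p] for p in segs)
    return lemma


# The concept CAT activates its own lemma strongly and two semantic neighbours weakly.
cat_concept = {"cat": 1.0, "rat": 0.4, "dog": 0.4}
serial = spread(cat_concept, feedback=False)
interactive = spread(cat_concept, feedback=True)
print(serial["rat"] == serial["dog"])           # no form-based boost without feedback
print(interactive["rat"] > interactive["dog"])  # 'rat' shares /a/ and /t/ with 'cat'
```

In the serial case, lemma activations are left exactly as the conceptual level set them, which is the sense in which lower levels cannot influence lemma selection.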

Lexical retrieval models also differ in their assumptions about the selection process at the lemma level, specifically the degree to which lemmas do or do not compete for selection (Levelt et al., 1999, vs. Mahon, Costa, Peterson, Vargas & Caramazza, 2007; see Abdel-Rahman & Melinger, 2009, for a review). The predictions of these models are often tested with the picture–word interference paradigm, where speakers name individual pictured objects while ignoring superimposed printed words. Retrieval times normally increase in the presence of semantic competitors, such as when trying to name the picture of a cat while seeing the printed word ‘dog’, and decrease in the presence of phonological neighbours, such as when trying to name the picture of a cat while seeing the printed word ‘cap’. Debates concerning the size and direction of these effects often hinge on determining the joint effects of multiple individual processes: conceptual priming (semantically related words prime each other), lexical interference (taxonomically related words compete against each other for selection), lexical facilitation (thematically related words prime each other) and phonological facilitation (phonologically related words prime each other). Production of a sequence of words, either in phrases (e.g., ‘the cat and the dog’) or without a phrasal context (‘cat dog’), naturally multiplies the number of processes to be completed and adds a further parameter: retrieval of each word (word n) can be influenced by anticipatory activation of word n+1, and likewise, retrieval of word n+1 is influenced by production of word n. As in picture–word interference studies, retrieval of word n is slower when word n+1 is a semantic competitor, but retrieval of word n+1 is also slower when word n is a semantic competitor (an effect known as cumulative semantic interference).
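The contrast between competitive and non-competitive lemma selection can be sketched as two stopping rules applied to the same rising target activation. This is a hypothetical toy, not an implementation of either class of model: the competitive rule uses a Luce-style ratio of target activation to total lexical activation, while the non-competitive rule waits for the target alone to cross a fixed threshold. The competitor values standing in for an unrelated and a semantically related distractor are assumptions chosen for illustration.

```python
def steps_to_select(competitor: float, rule: str, rate: float = 0.1,
                    ratio_threshold: float = 0.6, abs_threshold: float = 1.0,
                    max_steps: int = 1000) -> int:
    """Number of time steps before the target lemma is selected (toy model)."""
    for step in range(1, max_steps + 1):
        target = step * rate  # target activation rises linearly over time
        if rule == "competitive":
            # Luce-style ratio: the target must dominate total lexical activation.
            done = target / (target + competitor) >= ratio_threshold
        elif rule == "non-competitive":
            # The target only needs to reach an absolute threshold.
            done = target >= abs_threshold
        else:
            raise ValueError(f"unknown rule: {rule}")
        if done:
            return step
    return max_steps


# Hypothetical competitor activation contributed by the distractor word.
UNRELATED, RELATED = 0.25, 0.75
for rule in ("competitive", "non-competitive"):
    print(rule, steps_to_select(UNRELATED, rule), steps_to_select(RELATED, rule))
```

Only the competitive rule is slowed by the more strongly activated, semantically related competitor; under the non-competitive rule, any interference would have to come from somewhere other than the selection mechanism itself.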

In a recent meta-analysis, Bürki, Elbuy, Madec and Vasishth (2020) concluded that existing research does not adjudicate between models assuming competitive and non-competitive lexical access. Oppenheim and Nozari (2021) also showed that behavioural indexes such as the presence of semantic interference or facilitation cannot be used to conclusively distinguish between competitive and non-competitive lexical access, as competitive and non-competitive selection rules can produce similar behavioural outcomes. A more promising approach is to track context-specific changes in retrieval speed in order to model experience-driven changes in activation levels and connections between the conceptual level and word level (see Dell & Jacobs, 2016; Dell, Nozari & Oppenheim, 2014; Oppenheim, Dell & Schwartz, 2010; and Oppenheim & Nozari, 2021, for more detail with supporting empirical evidence and simulations). For example, the degree to which both taxonomically and thematically related distractors interfere with production of a target word depends on the way these relationships are represented in the model, rather than depending on selection rules.
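The experience-driven idea can be sketched with a delta-rule update on feature-to-word weights, in the spirit of, but much simpler than, the learning models cited above. The features, words, initial weights and learning rate are all hypothetical, and the target’s activation margin over its strongest rival stands in for retrieval difficulty. Naming ‘cat’ strengthens its own links and weakens the links from shared features to ‘dog’, so ‘dog’ is harder to retrieve afterwards, whereas naming the unrelated ‘car’ leaves ‘dog’ unaffected: an interference pattern that arises from learning rather than from the selection rule.

```python
from collections import defaultdict

# Hypothetical semantic features for three words.
FEATURES = {
    "cat": {"animal", "pet", "meows"},
    "dog": {"animal", "pet", "barks"},
    "car": {"vehicle", "wheels", "engine"},
}


def init_weights(init: float = 0.25) -> defaultdict:
    """(feature, word) -> weight; pairs not listed here start at 0."""
    w = defaultdict(float)
    for word, feats in FEATURES.items():
        for f in feats:
            w[(f, word)] = init
    return w


def activations(w, concept: str) -> dict:
    """Each word sums the weights from the concept's active features."""
    feats = FEATURES[concept]
    return {word: sum(w[(f, word)] for f in feats) for word in FEATURES}


def name(w, concept: str, lr: float = 0.1) -> None:
    """Name a concept, then apply an error-driven (delta-rule) weight update."""
    acts = activations(w, concept)
    for word, act in acts.items():
        target = 1.0 if word == concept else 0.0
        for f in FEATURES[concept]:
            w[(f, word)] += lr * (target - act)


def margin(acts: dict, target: str) -> float:
    """Target activation minus that of its strongest rival (larger = easier)."""
    return acts[target] - max(a for word, a in acts.items() if word != target)


related = init_weights()
name(related, "cat")          # prior trial: semantically related
unrelated = init_weights()
name(unrelated, "car")        # prior trial: unrelated
m_related = margin(activations(related, "dog"), "dog")
m_unrelated = margin(activations(unrelated, "dog"), "dog")
print(m_related < m_unrelated)  # 'dog' is harder after naming 'cat'
```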

The models of lexical retrieval reviewed in the preceding text are concerned with the nature of lexical representations for content words (mostly nouns) and thus do not make explicit claims about processes responsible for integrating sequences of lexical items into longer utterances. In the rest of the Element, we focus primarily on a different long-standing debate in psycholinguistics – namely, the contribution of the lexicon to grammatical encoding (see Bock, 1982, 1987, for early reviews). This area of research focuses on production of longer, multi-word utterances with complex syntactic structures and, critically, utterances requiring retrieval of verbs.

1.1.3 The Need for Syntax

Producing grammatically correct multi-word utterances requires that words be produced in a specific order, that is, that they be sequenced according to language-specific word-order rules. This sequencing is referred to as linearisation. Interestingly, while it is clear that linguistic utterances are structured, the nature of the structural representations generated to output grammatically correct word sequences is debated. This puzzle concerns the degree to which the lexicon is involved in the generation of sentence structure.

Broadly speaking, the generation of sentence structure has been described, in different accounts, as a by-product of lexical retrieval processes or as the outcome of processes operating outside of the lexicon (e.g., see Bock, 1987, for a review). Lexicalist (or functional) accounts propose that there is no strict separation between the lexicon and grammar: speakers retrieve lexical items as required by the preverbal message they want to communicate, and it is the lexical retrieval process that initiates the building of a syntactic structure. In other words, the building of a linguistic structure is dependent on lexical activation. By implication then, syntax is largely epiphenomenal. However, the linearisation of a longer, complex message that requires activation of multiple content words poses a problem for this account, as lemma activation can be responsible for the activation of ‘local’ syntactic information but is less likely to be responsible for the building of larger syntactic frames (also see Section 3 for a discussion of planning scope in multi-word utterances). Abstract structural accounts are better able to account for linearisation in longer utterances, as they propose that larger structures (or frames) are built by abstract syntactic procedures independently of the lexical items that will be slotted into them. These procedures are sensitive to word-specific syntactic requirements, but they are not, crucially, triggered by activation of individual lemmas.

The viability of the lexical account, and thus the origins of the debate between lexical and abstract accounts, has historical roots. Language research has been largely skewed in favour of comprehension rather than production, and comprehension studies show strong reliance on the lexicon during parsing. In comprehension, listeners receive a linguistic signal that comes in word by word over time and they must integrate this information to decode the speaker’s message. Naturally, given that listeners process incoming information as soon as it becomes available, the processor may give more weight to new lexical information (which can be quickly integrated with those parts of the utterance that have already been heard) than to structural information (as the structural representation of a spoken utterance is built up or inferred from a string of words rather than from individual words). Listeners do generate predictions about upcoming words, but evidence of prediction based on the semantic or lexical content of a sentence (be it coarse-grained, i.e., involving entire words, or finer-grained, i.e., involving sublexical units) is currently more plentiful than evidence of prediction of structure based on grammatical markers or parts of speech (see Huettig, Rommers & Meyer, 2011, for a review). Thus, the demands of comprehension for structural processing may be less stringent than in production and may effectively ‘hide’ potential effects of abstract structural processes. Levels of engagement during comprehension can also vary, such that ‘good enough’ processing (i.e., the build-up of underspecified representations) may be sufficient for successful comprehension in many contexts (Karimi & Ferreira, 2016). Indeed, finding evidence of the involvement of abstract structural processes in comprehension requires developing more sensitive measurement tools or ensuring greater engagement on the listener’s part (see Tooley & Bock, 2014).

In contrast, the distinction between lexical sources of structure and abstract structural processes is more salient and thus more relevant in production. The processing demands of language production on the speaker are arguably higher than the demands of comprehension on the listener. To produce an utterance, speakers must first decide what they want to say (albeit not necessarily in large, sentence-sized chunks) and must then begin generating the linguistic material they will need to communicate their message from scratch. This involves both structural and lexical processing, so stronger reliance on lexical than structural information may not be as viable in production as it is in comprehension: producing a sequence of words cannot bypass structural processing and rely exclusively on lexically specific syntactic information. An empirical challenge in the field of language production is therefore the need to delineate the boundary between lexically driven and lexically free influences on word order, and to explain when and how these processes interact.

1.1.4 Models of Grammatical Encoding: The Relationship between Words and Syntax

Models of grammatical encoding differ in the relationship they propose between words and structure. They make different claims about which level of representation encodes links between lexical and structural information: some models encode explicit links between lexical concepts and thematic roles at the conceptual level (e.g., Chang, 2002; Chang et al., 2006), whereas others encode links between lemmas and syntactic information in grammatical representations, allowing lexical retrieval and structure building to interact during grammatical encoding (e.g., Bock & Levelt, 1994; Cleland & Pickering, 2003, 2006; Ferreira, 2000; Ferreira, Morgan & Slevc, 2018; Levelt, 1989; Levelt et al., 1999; Momma, 2021; Pickering & Branigan, 1998). Models also diverge in the degree to which lexical or structural information guides grammatical encoding.

The earliest models of grammatical encoding were lexically driven and accorded a central role to lemma representations, which comprised semantic and syntactic-lexical information (e.g., Bock & Levelt, 1994; for reviews, see Bock & Ferreira, 2014; Ferreira & Slevc, 2007; Ferreira et al., 2018). Later versions of this approach limited lemmas to encoding aspects of lexical syntax, including grammatical category (e.g., noun, verb, adjective) as well as syntactic features (e.g., tense, number, grammatical gender; e.g., Levelt et al., 1999; see also Roelofs & Ferreira, 2019). These models also assume a discrete flow of information, with lemma selection occurring during grammatical encoding prior to the activation of phonological form (see Section 1.1.2). Two distinct stages are proposed for structure building. In the initial stage, termed functional encoding, the lemmas which best match the conceptual representation in the message are retrieved and assigned to grammatical functions appropriate for the thematic structure (e.g., agent → subject, patient → object, for a transitive active sentence such as ‘Anne saw Bill’). Following function assignment, an appropriate phrase structure is generated to which the lemmas are attached. The process for generating phrase structure was elaborated in a model proposed by Pickering and Branigan (1998), which also incorporated links from lemma representations to nodes specifying the possible phrase structures in which they can occur. These ‘combinatorial nodes’ were initially linked only to verbs and encoded subcategorisation information. Later versions of the model extended the approach to nouns (Cleland & Pickering, 2003, 2006). Following function assignment, the selection of phrase structures in the model is driven by activation spreading from the lemmas, with the most highly activated combinatorial node being selected (constituent assembly). Because lemmas are linked directly to syntactic structures, this approach provides a clear mechanism through which lexical and syntactic representations can interact to determine the structure of the sentence produced.
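A minimal sketch of the residual-activation logic behind combinatorial nodes follows. It assumes hypothetical node labels (`NP_NP` for DO, `NP_PP` for PO), a hypothetical `Lexicon` class and a winner-take-all choice; decay and the spreading dynamics of the actual model are omitted. Producing a DO with ‘give’ leaves activation on the shared `NP_NP` node, which biases a later dative towards DO with any verb, and on the ‘give’–`NP_NP` link, which adds a further boost when the verb is repeated.

```python
from collections import defaultdict

# Hypothetical verb-to-combinatorial-node links (subcategorisation options).
LINKS = {
    "give": ("NP_PP", "NP_NP"),   # PO and DO
    "show": ("NP_PP", "NP_NP"),
    "donate": ("NP_PP",),         # PO only
}


class Lexicon:
    """Toy residual-activation account of structural priming (no decay)."""

    def __init__(self, links):
        self.links = links
        self.node_act = defaultdict(float)  # combinatorial nodes shared by all verbs
        self.link_act = defaultdict(float)  # verb-specific lemma-to-node links

    def produce(self, verb: str, node: str) -> None:
        """Using a structure leaves residual activation on the node and the link."""
        if node not in self.links[verb]:
            raise ValueError(f"{verb} cannot be used with {node}")
        self.node_act[node] += 1.0
        self.link_act[(verb, node)] += 1.0

    def scores(self, verb: str) -> dict:
        return {n: self.node_act[n] + self.link_act[(verb, n)] for n in self.links[verb]}

    def choose(self, verb: str) -> str:
        s = self.scores(verb)
        # Most active node wins; ties go to the first listed option (PO here).
        return max(self.links[verb], key=lambda n: s[n])


lex = Lexicon(LINKS)
lex.produce("give", "NP_NP")   # prime: a DO dative with 'give'
print(lex.choose("show"))      # NP_NP: priming without lexical overlap
print(lex.scores("give")["NP_NP"], lex.scores("show")["NP_NP"])  # 2.0 1.0: lexical boost
print(lex.choose("donate"))    # NP_PP: the only option 'donate' allows
```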

Another approach which encodes explicit links between lemmas and syntactic structures employs tree-adjoining grammar (TAG; Ferreira, 2000; Frank, 2002; Momma, 2021, 2022). Momma (2021, 2022) proposes a TAG-based grammatical encoding model in which the syntactic structure for an utterance is constructed from elementary trees. Elementary trees are complex structures headed by clause-taking verbs, comprising a hierarchical syntactic structure with open nodes for constituents. For example, the elementary tree for a transitive verb like ‘chase’ would have two determiner phrase nodes for the sentence subject and object. More complex structures are created in the model by combining elementary trees through either substitution or adjoining. In substitution, open nodes in an elementary tree are filled by appropriate tree structures; for example, an open determiner phrase node could be filled by a determiner phrase tree ‘the girl’. Adjoining inserts auxiliary trees containing recursive elements like adverbs and adjectives at internal nodes of a tree. In this model, lemmas are represented at a sub-tree level and are connected to the appropriate nodes of an elementary tree. The sub-tree level also contains nodes representing functional heads for structural options, such as DO and PO datives, which can be activated by thematic representations. Inhibitory links between sub-trees allow for a competitive lemma selection process. In contrast, elementary trees do not compete for selection. Elementary trees are stored in long-term memory and activated by the conceptual structure, either directly or via the conceptual activation of their sub-tree representations.
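The two TAG operations can be illustrated with a small tree-manipulation sketch. The `Node` class and the `substitute` and `adjoin` helpers are hypothetical and implement only textbook TAG substitution and adjoining over labelled trees, not Momma’s processing model (there is no activation, competition or timing here). An elementary tree for ‘chases’ has two open DP nodes; substitution fills them with ‘the dog’ and ‘the cat’, and adjoining an auxiliary VP tree inserts ‘often’ above the verb phrase.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None   # set only on lexical leaves
    subst: bool = False          # open substitution site (written DP↓ in TAG)
    foot: bool = False           # foot node of an auxiliary tree (written VP*)


def leaf(label: str, word: str) -> Node:
    return Node(label, word=word)


def substitute(tree: Node, initial: Node) -> bool:
    """Fill the leftmost open site whose label matches the initial tree's root."""
    for i, child in enumerate(tree.children):
        if child.subst and child.label == initial.label:
            tree.children[i] = copy.deepcopy(initial)
            return True
        if substitute(child, initial):
            return True
    return False


def _plug_foot(aux: Node, subtree: Node) -> bool:
    """Replace the foot node of an auxiliary tree with the excised subtree."""
    for i, child in enumerate(aux.children):
        if child.foot:
            aux.children[i] = subtree
            return True
        if _plug_foot(child, subtree):
            return True
    return False


def adjoin(tree: Node, aux: Node) -> bool:
    """Adjoin aux at the leftmost internal, non-root node labelled like aux's root."""
    for i, child in enumerate(tree.children):
        if child.label == aux.label and child.word is None and not child.subst:
            new = copy.deepcopy(aux)
            _plug_foot(new, child)  # the old subtree now hangs from the foot node
            tree.children[i] = new
            return True
        if adjoin(child, aux):
            return True
    return False


def words(node: Node) -> List[str]:
    """Read the terminal string off a tree, left to right."""
    if node.word is not None:
        return [node.word]
    return [w for child in node.children for w in words(child)]


# Elementary tree for 'chases' with open subject and object DP slots.
chases = Node("S", [Node("DP", subst=True),
                    Node("VP", [leaf("V", "chases"), Node("DP", subst=True)])])
the_dog = Node("DP", [leaf("D", "the"), leaf("N", "dog")])
the_cat = Node("DP", [leaf("D", "the"), leaf("N", "cat")])
often = Node("VP", [leaf("Adv", "often"), Node("VP", foot=True)])  # auxiliary tree

substitute(chases, the_dog)  # subject
substitute(chases, the_cat)  # object
adjoin(chases, often)
print(" ".join(words(chases)))  # the dog often chases the cat
```

Because the adverb arrives by adjoining, the elementary tree for ‘chases’ never needed a slot for it, which is the property that lets intervening material be added later in the TAG-based account.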

Momma (2021) proposed the model to explain the grammatical encoding of long-distance syntactic dependencies, such as the cross-clausal filler-gap dependency in the sentence ‘Who do you think that the girl likes?’. This sentence has a syntactic dependency between the words ‘who’ (the filler) and ‘likes’ (the gap – i.e., the missing object for the verb). According to the model, speakers plan the structural dependency between such elements prior to planning the intervening material. Critically, elementary trees must encode all syntactic dependencies, including long-distance dependencies, within a phrase. Therefore, a minimal elementary tree for the sentence above must represent the cross-clausal filler-gap dependency. This elementary tree is abstract in that it represents critical grammatical information about the syntactic nature of the gap and the clause structure in which it occurs. However, it does not represent the material intervening between the filler and the gap. This is represented in a separate tree, and the process of tree adjoining enables this material to be inserted into the elementary tree at a later point during grammatical encoding. This model therefore encodes explicit links between lexical and syntactic representations, allowing them to interact during grammatical encoding.

In contrast, interactions between lemmas and syntactic structures are not a feature of a series of computational learning models of grammatical encoding (Chang, 2002; Chang et al., 2006; Dell & Chang, 2014). The Dual Path approach adopted in these models hinges on a strict separation between lexical retrieval and structure building. Similar to the lexically driven models described earlier in this section, grammatical encoding is initiated following the construction of a conceptual message in which lexical concepts are bound to thematic roles and appropriate lemmas are activated by these lexical concepts. The models diverge, however, in that there is no process of function assignment in the Dual Path approach. Instead, the order of activation of lemmas is determined by the activation level of their lexical concepts, which is in turn determined by the weighting of their associated thematic roles. For example, during the production of an active sentence, the agent role would be most highly activated, while for passive sentences the patient role would have the highest activation level. Importantly, the activation of a lemma would be blind to the thematic role assignment of the associated lexical concept.
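The ordering principle can be sketched in two small functions. This is a hypothetical illustration, not the Dual Path model itself, which learns such behaviour in a connectionist network: role weights in the message determine how strongly each concept is activated, and the lexical side orders lemmas purely by those activations, without ever receiving the role labels. The weight values are assumptions chosen for illustration.

```python
def concept_activations(message: dict, role_weights: dict) -> dict:
    """Each concept inherits the weight of the thematic role it is bound to."""
    return {concept: role_weights[role] for role, concept in message.items()}


def lemma_order(concept_acts: dict) -> list:
    """The lexical side sees only concept activations, never the role labels."""
    return sorted(concept_acts, key=concept_acts.get, reverse=True)


message = {"agent": "cowboy", "patient": "bull"}
active = {"agent": 1.0, "patient": 0.5}   # hypothetical weighting for an active
passive = {"agent": 0.5, "patient": 1.0}  # hypothetical weighting for a passive
print(lemma_order(concept_activations(message, active)))   # ['cowboy', 'bull']
print(lemma_order(concept_activations(message, passive)))  # ['bull', 'cowboy']
```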

Syntactic structure is built by a sequencing system modelled as a simple recurrent network (SRN). This network has access only to the event semantics in the thematic structure and is blind to the lexical concepts. It learns syntactic categories and relationships between words through trial and error by predicting word order during training. The SRN stores the links between thematic structures and word orders via a layer of hidden units. During grammatical encoding, the most appropriate word order to convey the information encoded in a message is determined by the weighted thematic structure and the learned syntactic relationships in the SRN. The Dual Path model therefore explains grammatical encoding in terms of a predictive learning process operating as we comprehend speech, actively predicting the next word we will hear and learning from our mistakes (error-driven learning). It therefore provides an explicit mechanism for the acquisition of syntax (e.g., Fitz & Chang, 2017). Critically, this approach allows no interaction between lexical information and structure building during grammatical encoding.
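The core component of the sequencing pathway, a simple recurrent network, can be sketched as a forward pass in which the hidden state at each word is copied into a context layer and fed back in at the next word. The vocabulary, layer size and random weights are hypothetical, the network is untrained, and the meaning pathway of the Dual Path model is omitted. Training, which is not shown, would compare each predicted distribution with the word that actually comes next and adjust the weights by error-driven learning (backpropagation).

```python
import math
import random

# Hypothetical toy vocabulary.
VOCAB = ["the", "cowboy", "bull", "caught", "was", "by", "."]


class SRN:
    """Elman-style simple recurrent network: forward pass only, untrained."""

    def __init__(self, vocab, hidden=8, seed=0):
        rng = random.Random(seed)
        self.vocab = list(vocab)
        n = len(self.vocab)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_in = mat(hidden, n)        # current word -> hidden
        self.w_ctx = mat(hidden, hidden)  # context (previous hidden state) -> hidden
        self.w_out = mat(n, hidden)       # hidden -> scores for the next word
        self.context = [0.0] * hidden

    def reset(self):
        """Clear the context layer at the start of a new sentence."""
        self.context = [0.0] * len(self.context)

    def step(self, word):
        """Read one word and return a probability distribution over the next word."""
        x = [1.0 if w == word else 0.0 for w in self.vocab]  # one-hot input
        hidden = [
            math.tanh(sum(a * b for a, b in zip(r_in, x)) +
                      sum(a * b for a, b in zip(r_ctx, self.context)))
            for r_in, r_ctx in zip(self.w_in, self.w_ctx)
        ]
        self.context = hidden  # copy the hidden layer into the context layer
        scores = [sum(a * b for a, b in zip(row, hidden)) for row in self.w_out]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        return dict(zip(self.vocab, (e / total for e in exps)))


net = SRN(VOCAB)
p_start = net.step("the")       # 'the' at the start of a sentence
net.reset()
net.step("caught")
p_after_verb = net.step("the")  # the same word after a verb
print(p_start != p_after_verb)  # the context layer changes the prediction
```

Even untrained, the same input word yields different predictions in different contexts, which is what allows a trained SRN to track sequential syntactic position.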

A competing alternative is Reitter, Keller and Moore’s (2011) lexical model of priming. This model, based on the Adaptive Control of Thought-Rational (ACT-R) architecture, accounts for both short-term and long-term structural priming and views both as a consequence of lexical priming. The model implements Combinatory Categorial Grammar (Steedman, 2000), in which words are bound to subcategorisation information. It includes a declarative memory element with chunks of lexical information connected to chunks of syntactic information and a procedural memory element with if-then rules. Production occurs by sequential activation and retrieval of individual lexical and syntactic chunks from declarative memory; priming occurs because words are kept in a short-term buffer (the ‘working memory’ of the model) and are more likely to be reactivated, with their associated syntactic details, if they have been used recently. The spread of activation in the model explains lexically supported short-term priming as well as cumulative priming.
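The decay-based memory underlying this account can be illustrated with the standard ACT-R base-level learning equation, B = ln(Σ_j t_j^(−d)), where t_j is the time since the j-th use of a chunk and d is a decay parameter (conventionally 0.5). The sketch is a simplification and not Reitter and colleagues’ implementation: it adds a hypothetical constant spreading-activation term when the associated word is in the buffer. Recently used syntactic chunks are more active (short-term priming), repeatedly used ones stay more active (cumulative priming), and a repeated word adds a lexical boost.

```python
import math


def base_level(use_times, now, decay=0.5):
    """ACT-R base-level learning: B = ln(sum over past uses of (now - t) ** -decay)."""
    ages = [now - t for t in use_times if t < now]
    if not ages:
        return float("-inf")  # a chunk that has never been used
    return math.log(sum(age ** -decay for age in ages))


def activation(use_times, now, lexical_source=0.0):
    """Base level plus a hypothetical spreading-activation term from a buffered word."""
    return base_level(use_times, now) + lexical_source


now = 10.0
recent = activation([9.0], now)              # used one time unit ago
old = activation([1.0], now)                 # used nine time units ago
repeated = activation([1.0, 5.0, 9.0], now)  # used three times
boosted = activation([9.0], now, lexical_source=0.5)
print(recent > old, repeated > recent, boosted > recent)
```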

Finally, other approaches allow both lexical and structural representations to guide grammatical encoding (or, more specifically, linearisation). Theories of sentence planning are particularly well-suited to addressing questions about the degree of lexical and structural control of production because they make specific predictions about the type of information that starts or triggers production of a sentence. A well-known property of the language system is that it allows production to unfold incrementally: speakers plan utterances in small increments rather than in proposition-like units. So, when producing a novel utterance, what information do speakers tend to encode first? The precise nature of the incremental build-up of an utterance is described by two accounts – Linear Incrementality and Hierarchical Incrementality – which roughly follow from the assumptions of accounts proposing lexical and abstract syntactic control of production (see Figure 2). Linearly incremental planning assumes that utterances can be built up in word-like chunks. For example, the generation of a sentence can begin with the retrieval of a single word (corresponding to the concept that is mentioned first; e.g., either ‘cowboy’ or ‘bull’ in a sentence that will eventually be articulated as ‘The cowboy caught the bull’ or ‘The bull was caught by the cowboy’). Activation of either noun commits the speaker to selection of either active or passive syntax (at least in English), which will result in the ‘projection’ of a syntactic structure to subsequently guide encoding of the remaining content words. The various word-sized planning units are joined together by basic sequencing rules (which are underspecified in this account). The attachment or joining of each new word-like unit to the previous one results in the emergence of a specific syntactic structure. 
In contrast, Hierarchical Incrementality assumes that control of sentence planning and word sequencing lies in the hands of abstract structural processes responsible for generating a relational structure at the conceptual level and a corresponding syntactic structure at the linguistic level. Importantly, this structural framework is generated without lexical support: it precedes lexical retrieval and controls its timing, rather than being projected from a sequence of lexical retrieval operations (see, e.g., Bock & Ferreira, 2014, for a review). In other words, speakers activate lexical items in the order in which the sentence syntax calls for them.
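The contrast between the two accounts lies in what drives what: under Linear Incrementality, retrieving the first noun commits the speaker to a structure, whereas under Hierarchical Incrementality an abstract structure is chosen first and dictates the order of lexical retrieval. The toy Python sketch below makes that difference in control flow explicit for the cowboy/bull example. The message format, the verb-form table and the function names are hypothetical simplifications; the two planning functions can yield the same sentences and differ only in which input (a concept or a structure) determines the outcome.

```python
# Hypothetical toy message for the event 'a cowboy catching a bull'.
MESSAGE = {"agent": "cowboy", "action": "catch", "patient": "bull"}
VERB_FORMS = {"catch": {"active": "caught", "passive": "was caught by"}}
ROLE_ORDER = {"active": ["agent", "action", "patient"],
              "passive": ["patient", "action", "agent"]}

def realise(structure, message):
    """Retrieve words in the order the chosen frame calls for them."""
    retrieval_order = [message[role] for role in ROLE_ORDER[structure]]
    first, verb, last = retrieval_order
    sentence = f"The {first} {VERB_FORMS[verb][structure]} the {last}"
    return sentence, retrieval_order

def linear_plan(message, first_concept):
    """Linear Incrementality: the noun retrieved first determines the frame,
    which is then 'projected' to guide the remaining content words."""
    structure = "active" if first_concept == message["agent"] else "passive"
    return realise(structure, message)

def hierarchical_plan(message, structure):
    """Hierarchical Incrementality: a frame is chosen without lexical support,
    and it controls the order and timing of lexical retrieval."""
    return realise(structure, message)

print(linear_plan(MESSAGE, "bull"))          # starting with 'bull' forces a passive
print(hierarchical_plan(MESSAGE, "active"))  # an active frame calls for 'cowboy' first
```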

Figure 2 Schematic illustrating the relationship between message-level and sentence-level planning under the strong versions of (a) hierarchical incrementality and (b) lexical incrementality in the production of a transitive sentence with three content words, such as ‘The cowboy caught the bull’, in four time steps (see Figure 3 for a paradigm eliciting such sentences).

Hierarchical Incrementality is consistent with an elegant solution to the linearisation problem in production proposed by Dell and colleagues (2008; also see Momma, Buffinton, Slevc & Phillips, 2020). Dell and colleagues describe a model with a strict division of labour between semantics and syntax. Determining word order is the job of a syntactic ‘traffic cop’, or a mechanism consisting of a series of weights between syntactic sequential states and lexical items that enables speakers to ‘say the right word at the right time’. The traffic cop tracks syntactic sequential states to ensure that only words of specific classes (e.g., determiners vs. nouns vs. verbs) are activated at specific points in time in the order required by the syntax of the developing sentence. Lexical retrieval is thus managed in an efficient manner by sequentially activating and deactivating lexical items that are semantically relevant and syntactically appropriate during production. For example, when producing the sentence ‘The cowboy is catching the bull’, the words ‘cowboy’ and ‘catch’ will not compete for the subject slot (but a suitable alternative word for referring to the agent, like ‘man’ or ‘rancher’, can). The division of labour is instantiated in the weights between syntax and the lexicon: content words have stronger links to semantics than to syntax, while function words have stronger links to syntax than to semantics. Momma and colleagues (2020) provided confirmatory experimental evidence that prime verbs and nouns with the same lexical form (e.g., ‘is singing’ vs. ‘her singing’) have different effects on production of semantically related verbs (e.g., ‘whistling’) in target sentences: prime verbs (‘is singing’) delay production of target verbs but prime nouns (‘her singing’) do not. Likewise, prime nouns delay production of target nouns, but prime verbs do not.
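A minimal sketch of the traffic-cop idea is given below in Python. It is a toy illustration, not Dell and colleagues' model: their weights are learned, whereas here the sequence of syntactic states, the word classes and the semantic activation values are all hypothetical, and words are output as uninflected lemmas (the auxiliary is omitted). The property it reproduces is that the current syntactic state admits only words of one class, so ‘catch’ cannot compete for the subject noun slot even when it is semantically active, whereas ‘rancher’ can. Function words are selected on syntactic grounds alone.

```python
# Hypothetical word classes for a tiny lexicon.
LEXICON = {"the": "DET", "cowboy": "N", "rancher": "N", "bull": "N", "catch": "V"}

# Hypothetical semantic activation of content words for each message role;
# 'catch' is active while the agent is being encoded, to show class gating.
SEMANTICS = {
    "agent": {"cowboy": 0.9, "rancher": 0.7, "catch": 0.8},
    "action": {"catch": 1.0},
    "patient": {"bull": 0.9, "catch": 0.5},
}

# Syntactic sequential states for 'The cowboy is catching the bull'
# (lemmas only; auxiliary and inflection omitted).
STATES = [("DET", None), ("N", "agent"), ("V", "action"), ("DET", None), ("N", "patient")]

def traffic_cop(states, lexicon, semantics):
    """At each state, only words of the class the syntax calls for may compete;
    the most semantically active eligible word is selected."""
    output, competitors = [], []
    for word_class, role in states:
        if role is None:
            # Function words: selected via their links to syntax, not semantics.
            candidates = {w: 1.0 for w, c in lexicon.items() if c == word_class}
        else:
            candidates = {w: a for w, a in semantics[role].items()
                          if lexicon[w] == word_class}
        competitors.append(sorted(candidates))
        output.append(max(candidates, key=candidates.get))
    return " ".join(output), competitors

sentence, competitors = traffic_cop(STATES, LEXICON, SEMANTICS)
print(sentence)        # lemma sequence for the sentence
print(competitors[1])  # words competing for the subject noun slot
```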

The strong versions of Linear and Hierarchical Incrementality described previously provide a useful reframing of the lexicalist versus abstract syntax debate. This reframing is useful not only because it generates explicit testable predictions about the control of production (as do other frameworks) but primarily because it does so at a level that provides fine-grained insight into production: adjudicating between these accounts requires tracking the time-course of production from message encoding until articulation, which allows for analysis of the coordination of multiple production processes. Importantly, it also provides a way of reconciling lexicalist and abstract syntax accounts. Determining the degree to which the production system supports one type of incremental planning over another by default (that is, the degree to which lexicalist or abstract syntax accounts provide a better description of the data) can lead to something of a theoretical impasse. Instead, recent work in this area has suggested that there may be no default form of incrementality that is strictly determined by the architecture of the production system (e.g., Ferreira & Swets, 2002); rather, the production system may be flexible, supporting linear planning or hierarchical planning under different conditions (e.g., Konopka et al., 2018). Outlining the conditions under which planning is more likely to proceed in a linearly or hierarchically incremental fashion may ultimately lead to a more precise characterisation of grammatical encoding as being primarily lexically driven or primarily under the control of abstract syntax. Crucially, at present, this approach provides a new framework for explaining the complex ways in which language processes change and adapt to experience.

The following sections of this Element describe empirical investigations into lexical and abstract syntactic control of grammatical encoding, first from the perspective of structural priming paradigms, which allow us to test what factors influence speakers’ binary structure choices (Section 2), and then from the perspective of utterance planning paradigms, which allow us to test what factors influence grammatical encoding with finer-grained continuous temporal measures (Section 3).

2 The Independence of Syntactic and Lexical Representations: Evidence from Structural Priming

As described in Section 1.1.4, a key question in the field concerns the relationship between syntactic processes and other levels of representation during sentence production. Linguistic structure may emerge epiphenomenally from other levels of representation, such as conceptual structure and the activation of lexical items (as proposed by lexicalist or functional accounts of syntax), or it may be governed by an abstract structure-building syntactic process (as proposed by abstract syntactic accounts). Questions of representation such as these can be addressed by tracking the production of sentences with specific structures under controlled lab conditions as well as in natural speech (Branigan, 2007; Branigan & Pickering, 2017; Mahowald, James, Futrell & Gibson, 2016; Pickering & Ferreira, 2008). In the lab, the structural priming paradigm has been a particularly fruitful tool for addressing this question.

Priming is the well-established finding that cognitive processes become easier to execute or to deploy after recent exposure to a particular stimulus or recent experience deploying similar processes. Thus, in priming studies, speakers (a) are exposed to a linguistic stimulus with specific properties in a prime trial and (b) produce an utterance in a subsequent target trial where they must select one among two (or more) linguistic options. If target sentences repeat a property of the primes, this suggests similarity in the underlying representation of the repeated linguistic property. In structural priming studies, speakers hear or read a sentence with a specific structure in prime trials and then produce a new sentence in target trials. Repetition of structure in the target trial – in the absence of other similarities between the prime and target – is taken to indicate that a sufficiently abstract syntactic frame was generated during processing of the prime to transfer to a new (target) sentence. This logic is not without its critics: any experience-dependent change that occurs in a target trial (such as repetition of a structure) is subject to cognitive constraints and can shift production preferences in a highly malleable language system, in turn influencing future productions (e.g., target sentences can act as ‘primes’ for subsequent sentences). Nevertheless, the question of when structural priming occurs and what factors modulate the magnitude of priming can shed light on the nature of the underlying syntactic representations generated at the moment of speaking.
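The basic dependent measure in such studies can be illustrated with a short calculation: the priming effect for a structure is the difference between the proportion of targets with that structure after primes with that structure and after primes with the alternative structure. The Python sketch below computes this from a list of (prime, target) pairs. The trial counts are invented for illustration, and published studies typically model trial-level responses statistically (e.g., with mixed-effects logistic regression) rather than comparing raw proportions.

```python
def priming_effect(trials, structure="PO"):
    """Return P(target = structure | prime = structure) minus
    P(target = structure | prime != structure), from (prime, target) pairs."""
    after_same = [target for prime, target in trials if prime == structure]
    after_other = [target for prime, target in trials if prime != structure]
    p_same = sum(t == structure for t in after_same) / len(after_same)
    p_other = sum(t == structure for t in after_other) / len(after_other)
    return p_same - p_other

# Invented data: 60% PO targets after PO primes, 40% after DO primes.
trials = ([("PO", "PO")] * 6 + [("PO", "DO")] * 4 +
          [("DO", "PO")] * 4 + [("DO", "DO")] * 6)
print(round(priming_effect(trials), 2))  # a 20-percentage-point priming effect
```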

In a seminal paper, Bock (1986) reported repetition of dative and transitive structures in a spontaneous picture-description task. Hearing and repeating a sentence with a prepositional-object (PO) dative structure in a prime trial (e.g., ‘The wealthy widow gave the Mercedes to the church’) increased the likelihood of speakers producing a prepositional dative in the subsequent target trial (‘The man is reading a story to the boy’); likewise, hearing and repeating a sentence with a double-object (DO) dative structure in a prime trial (e.g., ‘The wealthy widow gave the church the Mercedes’) increased the likelihood of speakers producing a double-object dative structure in the subsequent target trial (‘The man is reading the boy a story’). Analogous effects were obtained with active and passive sentences. Repetition of structure across unrelated sentences provided some of the first empirical evidence in favour of abstract syntax: neither meaning overlap nor lexical overlap was necessary to obtain repetition of syntactic structure, suggesting a syntactic locus for the repetition of structure and thus the involvement of abstract structural processes in the generation of simple sentences. This and subsequent research focused on the question of whether repetition of structure is fundamentally syntactic in nature or whether similarity at other levels of representation (e.g., meaning and sound) changes the likelihood and magnitude of structural priming. Isolating effects due to abstract syntax is not a trivial problem, largely because of the difficulty of meeting the key experimental requirement, that is, unambiguously separating the contributions of syntactic and non-syntactic factors to linguistic structure.

2.1 Independence of Syntax from Meaning

In its earliest days, priming research addressed the question of independence of syntactic representations from meaning. There is an unavoidable parallelism between conceptual structure (or event structure) and syntactic structure: similar ideas are conveyed with similar word orders and underlying structures. The degree to which this parallelism is responsible for the development of a syntactic structure is a fundamental challenge for any account of abstract syntax. Specifically, when speakers reuse a particular structure, they may be doing this because of (a) similarities in conceptual structure across sentences that map onto similar syntactic structures, (b) facilitation of an independent abstract structural process, or (c) both (also see Prat-Sala & Branigan, 2000, for a discussion of the influence of discourse on structure choice). A crucial question then is whether some repetition of meaning is necessary for structural repetition to occur or whether meaning and syntax make independent contributions to the generation of sentence structure.

In support of the abstract syntax account, Bock and Loebell (1990) showed that repetition of structure across sentences can occur without any overlap in thematic roles (or event roles). Prepositional locatives (e.g., 1a) were found to be as effective as prepositional-object datives (1b) in priming production of target sentences with prepositional-dative syntax (1c, Experiment 1), despite differences in the thematic roles of the individual constituents in locative and dative sentences. Likewise, intransitive by-locative prime sentences (e.g., 2a) were as effective as passive prime sentences (2b) in priming production of target sentences with passive syntax (2c, Experiment 2), again despite differences in thematic roles of the individual constituents in intransitive and transitive sentences. Finally, sentences with superficially similar word orders and metric structures but different constituent structures did not show priming: prime sentences with prepositional-object dative syntax like 3a increased production of new prepositional-object datives (3c), but prime sentences with infinitive verbs like 3b did not (Experiment 3). These results were interpreted as strong evidence that the structures that generalised across sentences were syntactic in nature: what was required for priming to occur was similarity in constituent structure and not similarity in meaning or thematic arguments.

(1a) The wealthy widow drove the Mercedes to the church (prepositional locative)
(1b) The wealthy widow gave the Mercedes to the church (prepositional dative)
(1c) The boy is giving the apple to the teacher (prepositional dative)

(2a) The 747 was landing by the airport’s control tower (by-locative)
(2b) The 747 was alerted by the airport’s control tower (by-passive)
(2c) The man was stung by a bee (by-passive)

(3a) Susan brought a book to Stella (prepositional dative)
(3b) Susan brought a book to study (infinitive)
(3c) The boy is giving the apple to the teacher (prepositional dative)

Messenger, Branigan, McLean and Sorace (2012) tested a similar hypothesis for adults’ and children’s production of actives and passives, and showed that production of agent-patient passives increased after passive primes with three different thematic role mappings, determined by the main verb of the sentence: agent-patient verbs (e.g., 4a), theme-experiencer verbs (e.g., 4b) and experiencer-theme verbs (e.g., 4c). In other words, speakers showed generalisation of a syntactic structure irrespective of the degree of thematic overlap between primes and targets.

(4a) The girl_PATIENT was pushed by the boy_AGENT (agent-patient verb)
(4b) The girl_EXPERIENCER was scared by the boy_THEME (theme-experiencer verb)
(4c) The girl_THEME was seen by the boy_EXPERIENCER (experiencer-theme verb)

A number of other findings are broadly consistent with a meaning-free account of structural repetition. For example, priming is observed across sentences with verb phrases that have similar syntax but different compositional meanings (idiomatic phrasal verbs like ‘to pull off a robbery’ and non-idiomatic phrasal verbs like ‘to pull off a sweatshirt’; Konopka & Bock, 2009), suggesting that an analysis of word or phrase meanings is not part of the processes responsible for generating syntactic structures (but see Ziegler et al., 2018). Structural priming also occurs from prime sentences with novel verbs and anomalous verbs (Ivanova, Pickering, Branigan, McLean & Costa, 2012), as well as from sentences with missing verbs (Ivanova, Branigan, McLean, Costa & Pickering, 2017) and in artificial languages (Fehér, Wonnacott & Smith, 2016), which again suggests that repetition of structure is not sensitive to sentence or verb meaning. Finally, the complexity of individual constituent phrases (e.g., simple NPs vs. complex NPs) or the degree of match in phrasal complexity between primes and targets does not change the magnitude of priming, indicating that what matters for structural repetition is similarity in global rather than local syntactic structure across sentences (Hardy, Wheeldon & Segaert, 2020).

In fact, such repetition effects may not be unique to language. Repetition of abstract structure has also been observed across cognitive domains, from simple arithmetic to relative clause attachment in language. For example, solving arithmetic problems with internal structures analogous to high-attachment and low-attachment sentences increases production of high-attachment and low-attachment sentence fragment continuations, respectively (Scheepers et al., 2011; also see Scheepers & Sturt, 2014). Similar attachment priming effects have also been observed from music sequences and action descriptions to relative clause attachment in language (Van de Cavey & Hartsuiker, 2016). Finding that sequences with similar hierarchical structures in one domain can prime analogous structures in the linguistic domain argues for the existence of a highly abstract and domain-general structural processor (also see Whittlesea & Wright, 1997).

At the same time, there is evidence that similarity in event structure across linguistic primes and targets influences structure choice. Two sets of non-syntactic variables whose effects on structure choice have received considerable attention are referent animacy, on the one hand, and thematic roles, event structure and semantic similarity, on the other.

Animacy. The first test case for abstract syntax theories is the role of animacy in determining word order. Speakers display a strong cross-linguistic bias to assign agents to syntactically prominent and/or sentence-initial positions, suggesting a clear influence of conceptual representations and conceptual accessibility on sentence structure. The genesis of this effect may be a general bias to prioritise detecting agency and causality in linguistic and non-linguistic cognition (Wilson, Zuberbühler & Bickel, 2022). To determine the extent of animacy effects on structure choice, studies have pitted the effects of referent animacy (e.g., agent and patient animacy) and syntactic priming on structure choice against one another. The persistence of structure together with persistence of a mapping of an animate or inanimate referent to a particular grammatical function (concept-to-function mapping) or to a particular linear position (concept-to-linear order mapping) would indicate a strong link between conceptual representations and grammatical processes. Bock (1986) reported numerical trends suggesting that animacy can indeed restrict structural priming. However, Bock, Loebell and Morey (1992) showed that syntactic priming occurred over and above any effects of character animacy on structure choice (also see Ziegler & Snedeker, 2018, discussed later in this section), suggesting that syntactic structure can be manipulated independently of the bias to assign animate entities to specific linear sentence positions. Evidence of additivity (rather than interactivity) of animacy effects and structural priming effects on structure choice suggests that the two effects may have different loci.
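One common way to formalise the additivity question is a logistic regression on trial-level structure choices. The equation below is an illustrative sketch, not the analysis reported in the studies cited: Prime codes the prime structure, Animacy codes the relevant animacy manipulation, and additivity corresponds to an interaction coefficient near zero (β₃ ≈ 0), whereas interactivity corresponds to a reliable non-zero β₃.

```latex
\operatorname{logit} P(\text{passive target}) = \beta_0 + \beta_1\,\text{Prime} + \beta_2\,\text{Animacy} + \beta_3\,(\text{Prime} \times \text{Animacy})
```

In this formulation additivity is defined on the log-odds scale; effects that are additive in log-odds need not be additive when expressed as raw proportions.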

Finer-grained investigation of animacy effects on production has much to gain from cross-linguistic research, particularly with languages that permit more variation in word order than English (e.g., Mandarin Chinese in Cai, Pickering & Branigan, 2012; Odawa in Christianson & Ferreira, 2005; Japanese in Tanaka, Branigan, McLean & Pickering, 2011; see Branigan, 2007 and Norcliffe & Konopka, 2015 for reviews). At issue is whether highly accessible animate referents are assigned to privileged syntactic roles (e.g., the subject role; concept-to-function mapping) or whether they are simply encoded early and assigned to sentence-initial positions (concept-to-linear order mapping). In English, subjecthood is confounded with linear word order. In languages with fewer constraints on linear word order, the two can be dissociated (if, for example, the grammar allows subjects to appear in non-initial positions). Cai and colleagues (2012) pitted the concept-to-function and concept-to-linear order hypotheses against each other in a priming task with speakers of Mandarin Chinese, which has a more flexible word order for dative sentences (e.g., it allows the direct objects of dative sentences like ‘The cowboy gave the sailor the book’ to be topicalised). The results showed persistence in the assignment of concepts to the same grammatical functions (e.g., themes as direct objects) as well as in the assignment of concepts to the same linear positions (e.g., themes before verbs), which supports an account in which mappings from concepts to functions and to linear order can occur in parallel.

Thematic roles. Evidence that is more problematic for the abstract syntax account comes from studies pitting repetition of thematic role order against repetition of structure. Notably, structure choice is sensitive to thematic role order. Chang, Bock and Goldberg (2003) showed priming of theme-location/location-theme role orders across prime and target sentences in which the surface syntactic structure was held constant: sentences with theme-location order like 5a primed theme-location order in new sentences like 5c more than sentences with location-theme order like 5b did (also see Hare & Goldberg, 2000 and Ziegler & Snedeker, 2018). This finding suggests that the order of thematic roles (or thematic role mappings) is a relevant feature during the mapping of conceptual representations onto linguistic structures (also see Chang et al., 2006).

(5a) The maid rubbed polish onto the table (theme-location order)
(5b) The maid rubbed the table with polish (location-theme order)
(5c) The farmer heaped straw onto the wagon (theme-location order)

Ziegler and Snedeker (2018) tested how similarity in thematic role order influenced priming with a finer-grained manipulation. They compared priming between sentences with the same surface syntactic structures but differences in thematic roles: datives and locatives both have themes and goals, but more specifically, the goals are recipients in dative sentences (6a, 6b) and destinations in locative sentences (6c, 6d). They showed dative-to-dative priming as well as locative-to-locative priming (i.e., sentences with the same thematic roles), as expected, but no locative-to-dative and dative-to-locative priming (i.e., sentences with different thematic roles), unless locatives had animate goals (6e, 6f) so that animacy features in datives and locatives matched. Thus, across a series of experiments, the results suggested a gradient of syntactic priming effects, such that the magnitude of priming increased with increasing overlap in animacy and thematic role order in primes and targets.

(6a) The boy hands the suitcase to his mother (PO dative: theme + goal[recipient])
(6b) The boy hands his mother the suitcase (DO dative: goal[recipient] + theme)
(6c) The boy loads the bag on the cart (theme-first locative: theme + inanimate goal[destination])
(6d) The boy loads the cart with the bag (theme-second locative: inanimate goal[destination] + theme)
(6e) The boy sprayed cologne on the man (theme-first locative: theme + animate goal[destination])
(6f) The boy sprayed the man with cologne (theme-second locative: animate goal[destination] + theme)

Perhaps the most interesting counter-argument to abstract syntactic accounts is Ziegler, Bencini, Goldberg and Snedeker’s (2019) re-evaluation of Bock and Loebell’s (1990) by-locative priming. Ziegler and colleagues used a prime-target paradigm with written primes followed by target pictures that elicited active and passive descriptions. Their study replicated priming of by-passives from both by-passive and by-locative primes (7a and 7b primed 7d) but showed that this effect did not generalise to near-locative primes (7c did not prime 7d), suggesting a role for lexical repetition (of the preposition ‘by’) in eliciting repetition of structure and calling for a re-examination of the evidence in Bock and Loebell (1990) used to argue for abstract syntax.

(7a) The 747 was radioed by the airport control tower (passive)
(7b) The 747 was landing by the airport control tower (by-locative)
(7c) The 747 was landed near the airport control tower (near-locative)
(7d) The boy was hit by the ball (passive)

Semantic and event structure similarity. Finally, exposure to sentences with similar semantic information, expressed via individual content words rather than at the level of event structure, also influences structure selection. For example, Cleland and Pickering (2003) showed enhanced noun-phrase priming with semantically related referents: speakers were more likely to produce target descriptions like ‘the sheep that’s red’ after exposure to primes with semantically related nouns (like ‘the goat that’s red’) than after primes with unrelated nouns (like ‘the book that’s red’; a semantic boost). Likewise, Konopka and Kuchinsky (2015) found stronger priming of actives and passives when primes and targets had conceptually related verbs (e.g., ‘tripping’ and ‘pushing’) than when they had unrelated verbs (‘paying’ and ‘pushing’; also see Bernolet, Colleman & Hartsuiker, 2014, for evidence of a sense boost, and Gruberg, Ostrand, Momma & Ferreira, 2019, for persistence of structure in repeated events). More dramatically, Bunger, Papafragou and Trueswell (2013) found that, when describing motion events (e.g., an alien driving into a cave), participants were more likely to mention path information in target sentences when primes also mentioned path information, whether the same or different verbs were used. This priming of the content of preverbal messages suggests that priming of structure may extend to priming of event representations (also see Bernolet, Hartsuiker & Pickering, 2009 and Ziegler et al., 2018).

In sum, recent work in this area shows evidence of both event semantics and abstract syntax influencing speakers’ structure choices. Importantly, this evidence is not incompatible with accounts of abstract syntax, as repetition of meaning does not uniquely account for speakers’ structure choices. What is crucial for abstract syntax accounts is that syntactic structures are not purely ‘limnings of meaning’ (Bock & Loebell, 1990; Chang et al., 2006): with the exception of Ziegler and colleagues (2019), repetition of structure has been shown to occur across sentences in spite of differences in event semantics.

2.2 Independence of Syntax from the Lexicon

A second key question in assessing the independence of syntax from other levels of representation concerns lexical contributions to syntax. Language production requires rapid activation and integration of words and structures to construct grammatically correct utterances with context-appropriate lexical content. Interactions between lexical items and grammar are thus a natural component of this integration process. Within structural priming research, the question of lexical influences on structure has been one of the most active areas of the debate between lexicalist and abstract syntactic theories of structure over the decades. The initial evidence provided by Bock (1986, 1989) was that priming occurs not only without similarity in meaning across prime and target sentences but also without repetition of either content words or function words. Thirty years later, there is ample evidence of lexical involvement in structure building, as well as models reconciling this evidence with abstract accounts of syntax.

2.2.1 The Lexical Boost in Structural Priming

Repetition of meaning normally entails repetition of lexical items. Thus, as with repetition of event semantics, one might expect to see evidence for the involvement of the lexicon in speakers’ structural choices in the form of a lexical boost in structural priming. Indeed, using a sentence completion task, Pickering and Branigan (1998) showed that speakers were more likely to repeat the syntax of a PO (or DO) prime in a new target sentence when primes and targets used the same verb (8a, 8b) than when they did not (8c, 8d).

(8a) The racing driver showed the torn overall … (prime sentence biased towards PO completion)
(8b) The patient showed … (to-be-completed target sentence)
(8c) The racing driver gave the torn overall … (prime sentence biased towards PO completion)
(8d) The patient showed … (to-be-completed target sentence)

Since then, this lexical boost (or lexical enhancement of priming) has been replicated in numerous studies with a range of sentence elicitation paradigms and in corpora (Gries, 2005), and with repetition of both verbs and nouns (Cleland & Pickering, 2003). Pickering and Branigan’s (1998) account of the representational basis of lexical influences on structure choice focuses on activation at the lemma level. On their account, lemma nodes are linked to combinatorial nodes, which specify the structures in which a verb can be used (e.g., give can be used in both prepositional-object and double-object dative structures, while donate can only be used with prepositional-object syntax). This information is activated during processing of a prime sentence. Residual activation of combinatorial nodes can bias structure selection in a subsequent target trial in favour of the recently activated structure. Importantly, the link between a combinatorial node and the verb lemma also remains temporarily activated, so repetition of the same verb from prime to target increases the likelihood of selecting a recently used structure beyond the level supported by activation of combinatorial nodes alone. The magnitude of this lexical enhancement of priming is noteworthy: the odds of structural repetition double when primes and targets share lexical content compared with when they do not (Mahowald et al., 2016).
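The residual-activation account, and the reported size of the lexical boost, can be made concrete with a toy model. In the Python sketch below (an illustration under stated assumptions, not Pickering and Branigan's implementation), a prime leaves residual activation on its combinatorial node, which shifts the log-odds of producing a PO rather than a DO target; if the verb is repeated, residual activation on the verb-to-node link adds a further shift. The function names and parameter values are hypothetical: the link residual is set to ln 2 so that a repeated verb exactly doubles the odds of repeating the primed structure, mirroring the magnitude reported by Mahowald and colleagues.

```python
import math

def p_po_target(prime_structure, prime_verb, target_verb,
                node_residual=0.5, link_residual=math.log(2)):
    """Toy probability of a PO target. Residual activation of the primed
    combinatorial node shifts the log-odds towards the primed structure; a
    repeated verb adds residual activation of the verb-node link.
    Parameter values are hypothetical; baseline PO/DO preference is set to 0."""
    direction = 1 if prime_structure == "PO" else -1
    log_odds = direction * node_residual
    if prime_verb == target_verb:
        log_odds += direction * link_residual
    return 1 / (1 + math.exp(-log_odds))

def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

p_same_verb = p_po_target("PO", "give", "give")
p_diff_verb = p_po_target("PO", "hand", "give")
print(round(odds(p_same_verb) / odds(p_diff_verb), 6))  # odds ratio for the lexical boost
```

Because the link residual only adds to the node residual, structural priming still occurs without verb repetition in this sketch; the repeated verb enhances it rather than being a precondition for it.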

It is not clear from Pickering and Branigan’s (1998) account whether the lexical boost is a conflation of two effects: a semantic boost driven by repetition of meaning plus a lexical boost due only to repeated activation of the same lexical information. Santesteban, Pickering and McLean (2010) provided evidence that distinguished between these possibilities by testing whether lexical similarity without semantic overlap can modulate structural priming. Their experiments compared structural priming of NPs (e.g., ‘the red bat’ vs. ‘the bat that’s red’) with non-homophonous and homophonous nouns. Priming was equally strong from primes with nouns that repeated meaning and sound information (e.g., the animal bat in primes and targets) and from primes with homophonous nouns that repeated sound information alone (the animal bat and a cricket bat in primes and targets, respectively; but see Cleland & Pickering, 2003). This homophone boost suggests that overlap at the word-form level is sufficient to observe enhanced priming (also see Konopka & Bock, 2009, for evidence of a lexical boost in sentences with idiomatic verbs that repeat word forms but not meaning).

A number of findings regarding the involvement of the lexicon in structure building are important for keeping these effects in perspective. The first is that structural priming is boosted by, but not determined by, lexical repetition: the lexical boost is an enhancement of structural repetition rather than a precondition for structural repetition to occur. Indeed, observing lexically unsupported priming is treated as the gold standard for classifying repetition of structure as ‘syntactic’ in nature (e.g., Fehér et al., 2016). This criterion places key constraints on lexical accounts of syntax and requires clarification of how potential links between lexical items and syntax modulate structure choice.

For example, not all lexical repetition results in a lexical boost. The boost depends on the syntactic role of the repeated words: it has been observed with repetition of open-class words that are syntactic heads (Carminati, van Gompel & Wakeford, 2019; Ivanova et al., 2017; van Gompel, Wakeford & Kantola, 2022) but not with repetition of syntactic non-heads. Repetition of closed-class words such as to and for in dative sentences (Bock, 1989), or by in locatives and passives (Bock & Loebell, 1990), does not create a lexical boost (but see Ziegler et al., 2018). Further, lexical similarity alone is not sufficient for repetition. For example, Ferreira (2003) used a sentence-recall production task to demonstrate that the use of the optional complementiser ‘that’ in sentences such as (9d) could be primed only by sentences that included a complementiser ‘that’ (i.e., the same lexical item playing a similar syntactic role, 9a), but not by a determiner ‘that’ (i.e., the same lexical item playing a different syntactic role, 9b) or a noun-complement ‘that’ (i.e., the same lexical item in a different syntactic context, 9c). Importantly, both the inclusion and the omission of the complementiser ‘that’ could be primed, a finding consistent with a structural rather than lexical locus for the priming effect.

(9) a. The company ensured that the farm was covered for two million dollars
    b. The company insured that farm for two million dollars
    c. The theory that penguins built the igloo was completely false
    d. The mechanic mentioned (that) the antique car could use a tune-up

Momma (2022) built on these findings by demonstrating that priming of the complementiser ‘that’ can be lexically boosted by repetition of verbs biased towards its use (e.g., Bernolet & Hartsuiker, 2010). This boost was observed when prime and target both featured cross-clausal filler-gap dependencies (10a) or both lacked them (10b). However, when the prime sentence (but not the target) contained a cross-clausal filler-gap dependency (e.g., 10a), the lexical boost disappeared.

(10) a. Who did the manager imply (that) he would promote?
     b. The manager implied (that) he would promote the employee

This result is problematic for the explicit memory account of the lexical boost, which predicts a boost in all conditions. Momma (2022) proposed that lexically independent structural priming results from an enhanced link between a concept and a node of an elementary tree, whereas lexical boost effects are due to residual activation of elementary trees (similar to the priming mechanisms proposed by Pickering & Branigan, 1998). Critically, according to the Tree-Adjoining Grammar (TAG) model, elementary trees must contain all syntactic dependencies as well as complementiser features. Therefore, the sentences in (10) are represented by different elementary trees headed by the same verb, so when prime and target differ in filler-gap structure, residual activation of the prime’s tree does not carry over to the target and no lexical boost is expected.

Second, repetition of structure across sentences with and without overlap in content words follows different time-courses. Lexical and abstract syntactic accounts make different predictions about the duration of structural priming effects, and these predictions follow directly from their different mechanistic explanations of priming. Activation-based accounts, such as Pickering and Branigan’s (1998) lexical account, predict short-lived priming: because activation dissipates quickly, any activation of links between verb nodes and combinatorial nodes that produces a lexical boost in the short term may fail to produce one in the long term. Accounts that emphasise a syntactic locus of structural repetition predict persistence of lexically unsupported priming.
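To make the contrast between these two predictions concrete, the following toy Python sketch (not a model taken from the literature) treats the abstract component as a learned weight that does not decay, and the lexical component as residual activation that shrinks by a fixed factor on every intervening trial. The functions `sigmoid` and `priming_effect` and all parameter values (`baseline`, `learned_weight`, `boost`, `decay`) are hypothetical, chosen only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def priming_effect(lag: int, verb_repeated: bool,
                   baseline: float = -0.5,
                   learned_weight: float = 0.4,
                   boost: float = 1.0,
                   decay: float = 0.3) -> float:
    """Increase in P(primed structure) on a target trial relative to an unprimed baseline.

    All parameter values are hypothetical and chosen only for illustration.
    """
    # Abstract component: a weight change left by processing the prime's structure.
    # It is treated as learning, so it does not decay across intervening trials.
    abstract = learned_weight
    # Lexical component: residual activation of a verb-structure link, present only
    # when prime and target share the verb; it shrinks by `decay` per intervening trial.
    lexical = boost * decay ** lag if verb_repeated else 0.0
    return sigmoid(baseline + abstract + lexical) - sigmoid(baseline)


if __name__ == "__main__":
    for lag in (0, 2, 6):
        unsupported = priming_effect(lag, verb_repeated=False)
        lexical_boost = priming_effect(lag, verb_repeated=True) - unsupported
        print(f"lag {lag}: abstract priming {unsupported:.3f}, lexical boost {lexical_boost:.3f}")
```

With these invented values the lexical boost shrinks sharply by lag 2 and is negligible by lag 6, while lexically unsupported priming is identical at every lag; the sketch reproduces only the qualitative predictions, and its numbers carry no empirical weight.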

Consistent with abstract accounts, structural priming in sentences without lexical overlap has been observed across different lags within the same experiment (Bernolet, Collina & Hartsuiker, 2016; Bock & Griffin, 2000; Bock, Dell, Chang & Onishi, 2007; Hartsuiker et al., 2008; Kaschak & Borreggine, 2008; Kaschak, Loney & Borreggine, 2006), across different sessions of the same experiment (Kaschak, Kutta & Schatschneider, 2011), and in natural speech (Gries, 2005; see Figure 3 for an example of Lag 0 and Lag 2 priming). Priming effects are also obtained regardless of the cover task given to participants (i.e., regardless of whether participants’ attention is directed to the sentence form; Bock et al., 1992) and, remarkably, even in participants with compromised episodic memory (anterograde amnesics; Ferreira, Bock, Wilson & Cohen, 2008; Heyselaar, Segaert, Walvoort, Kessels & Hagoort, 2017). In contrast, the lexical boost in syntactic priming declines when primes and targets are separated by as few as two intervening sentences (Hartsuiker, Bernolet, Schoonbaert, Speybroeck & Vanderelst, 2008; Kaschak & Borreggine, 2008; Konopka & Bock, 2005). For example, Hartsuiker and colleagues (2008) compared the magnitude of dative PO/DO priming at lags 0, 2, and 6, and found a lexical boost only at lag 0. The sharp decline in lexically supported priming suggests that lexical contributions to structure repetition are short-lived.

Figure 3 Schematic illustrating a prime-target structural priming paradigm in (a) lag 0, with adjacent prime and target trials (white cells), and in (b) lag 2, with the prime and target separated by two intervening filler trials (grey cells). Recorded prime sentences with active or passive syntax: ‘The man is lifting the bench’ / ‘The bench is being lifted by the man’. Target sentences eliciting active or passive sentences: ‘The cowboy is catching the bull’ / ‘The bull is being caught by the cowboy’.

A complementary approach showed that the magnitude of priming effects can vary with the extent of repeated exposure to alternative structures but not with repeated exposure to specific verbs. Kaschak and colleagues (2006) tested whether participants’ sensitivity to a PO/DO priming manipulation was modulated by recent and repeated experience with the primed structures (i.e., a form of cumulative priming). In Experiment 1, after an exposure phase in which the ratio of PO and DO sentences was manipulated (50:50 in the same block vs. 50:50 in separate blocks vs. 100:0), priming was observed only in participants who had been exposed to both structures at the beginning of the study. Participants who had been exposed to only PO or only DO sentences (i.e., a 100:0 ratio of PO:DO or DO:PO structures) did not respond to the priming manipulation. In Experiment 2, sensitivity to the priming manipulation also varied in a graded fashion with the strength of the structural bias introduced in the exposure phase (50:50 vs. 75:25 vs. 100:0). This effect was not replicated by Kaschak (2007), who proposed that the exposure phase shifts base rates for individual structures (with stronger shifts for the dispreferred PO structure) but not the magnitude of priming effects. Subsequently, Kaschak and Borreggine (2008) tested whether these biases are affected by verb repetition in the exposure phase and again found structural repetition only in participants who had been exposed to both structural alternatives before the priming task. Crucially, these effects were not modulated by verb repetition or by presentation of the verb in one or both structures. In other words, what mattered for structure choice was how often each structure had been used, not how often it had been used with specific verbs.

Thus, comparing the longevity of abstract syntactic priming and lexically supported priming reveals a critical dissociation. On balance, the short-lived nature of the lexical boost – arguably the strongest evidence for involvement of the lexicon in structure building – and the persistence of abstract priming suggest that the two effects have different sources. These results favour a multi-factorial account of priming in which the binding of words to structures that is responsible for the lexical boost is dynamic and short-lived, while the persistence of structure choice long after the lexical boost has decayed arises from longer-term learning of structure-building procedures in an abstract syntactic system. Chang and colleagues (2006) account for this pattern by proposing an implicit learning mechanism that predicts both short-term and longer-term repetition of structure and is resistant to episodic forgetting, and they speculate that a separate mechanism, relying on explicit memory, may explain lexical influences on structure choice (see Section 1.1.4; Chang, Janciauskas & Fitz, 2012).

The proposal that explicit memory retrieval explains the lexical boost is important for clarifying the coordination of lexical and syntactic processes during grammatical encoding. If repetition of structure is driven by memory for the prime sentences, the priming paradigm becomes less suitable for addressing questions about the production architecture, such as the nature of the links between individual lexical items and structural information. For example, Scheepers, Raffray and Myachykov (2017) showed that repetition of any content word (agents, verbs, recipients, or themes) from PO/DO primes to PO/DO targets can produce a lexical boost. In fact, the magnitude of priming increased with increasing overlap in content words, producing a cumulative lexical boost effect. Scheepers and colleagues proposed that repeated lexical items serve as powerful retrieval cues: quite simply, the more lexical repetition, the stronger the memory cues and the higher the likelihood of structural repetition. Bernolet and colleagues (2016) also suggested that explicit memory may contribute to structural repetition even when primes and targets do not share lexical items, based on the finding that both structural priming and explicit memory for the prime sentences declined over lags (from Lag 0 to Lag 6) in their experiments. There was, however, evidence of cumulative lexically unsupported priming: speakers’ production of target structures increased as a function of the number of these structures produced within an experimental session.
In a more direct test of the memory hypothesis, Zhang, Bernolet and Hartsuiker (2020) showed that adding a cognitive load to the production task reduced structural priming effects in adjacent prime and target trials, both in sentences with and without lexical overlap (but see Yan, Martin & Slevc, 2018, for a different view). Further insight into the coordination of lexical and structural processes is provided by studies assessing verb biases and cumulative priming effects (Section 2.2.2).

2.2.2 Verb Bias and Structural Priming

Stronger evidence for the involvement of the lexicon in structure building comes from studies of verb bias. Among verbs that can appear in alternative syntactic structures (e.g., PO and DO datives), some show a strong bias for one structure over the other, while others show weaker biases or are considered to be equi-biased. These biases illustrate one of the main premises of lexicalist accounts of syntax: the claim that lexical activation is a key driver of structure-building procedures or, put differently, that syntax is ‘projected’ from the lexicon. For such biases to arise, the production system must keep track of individual verb-structure pairings over a speaker’s lifetime (a form of cumulative priming) and store verb-specific frequency information that reflects the statistics of the input. This information is then activated when a known verb is used on a new occasion and can bias selection of the most frequent structural alternative for that verb.
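The verb-specific bookkeeping described above can be sketched as a simple count table. This is a minimal illustration under stated assumptions, not a claim about how the production system actually stores such information: the class `VerbBiasStore`, its methods, the smoothing pseudo-count `prior`, and the example counts are all hypothetical.

```python
from collections import defaultdict


class VerbBiasStore:
    """Hypothetical store of verb-specific structure counts (e.g., PO vs. DO datives)."""

    def __init__(self, structures=("PO", "DO"), prior=1.0):
        self.structures = tuple(structures)
        # Pseudo-count added to every structure (add-one smoothing when prior=1.0),
        # so a verb with no recorded uses comes out equi-biased.
        self.prior = prior
        self.counts = defaultdict(lambda: dict.fromkeys(self.structures, 0))

    def record(self, verb, structure):
        """Log one encountered verb-structure pairing (one unit of cumulative experience)."""
        if structure not in self.structures:
            raise ValueError(f"unknown structure: {structure}")
        self.counts[verb][structure] += 1

    def bias(self, verb):
        """Smoothed proportion of each structure for this verb."""
        counts = self.counts.get(verb, dict.fromkeys(self.structures, 0))
        total = sum(counts.values()) + self.prior * len(self.structures)
        return {s: (counts[s] + self.prior) / total for s in self.structures}

    def preferred(self, verb):
        """Structure with the highest smoothed proportion for this verb."""
        b = self.bias(verb)
        return max(b, key=b.get)


if __name__ == "__main__":
    store = VerbBiasStore()
    # Invented counts: this verb was encountered 8 times as DO and 2 times as PO.
    for _ in range(8):
        store.record("give", "DO")
    for _ in range(2):
        store.record("give", "PO")
    print(store.bias("give"), store.preferred("give"))
```

On this view, a verb’s bias is simply read off its accumulated counts whenever the verb is retrieved; the sketch says nothing about how quickly such counts change, which is the issue taken up below.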

In support of this lexical view, Melinger and Dobel (2005) showed that speakers’ structure choices can be influenced by the presentation of a single verb. In prime trials in their study, participants read non-alternating dative verbs (i.e., verbs that were strongly biased towards either PO or DO dative structures), and then saw pictures meant to elicit dative descriptions in target trials. Target descriptions were consistent with the bias of prime verbs: speakers produced more PO descriptions after PO-biased verbs and more DO descriptions after DO-biased verbs. Thus, activation of a verb with strong structural biases outside a sentence context was sufficient to influence structure choice in line with these biases.

However, the mere existence of verb biases seems to contrast directly with the observation that the lexical boost in priming declines over time (discussed in the previous section). How do results such as Melinger and Dobel’s (2005) and Kaschak and Borreggine’s (2008) square with the observation of a short-lived lexical boost in priming? Conversely, if verb-structure pairings are dynamic, as assumed by implicit learning accounts of priming, how do verb biases come about in the first place? The answer arguably lies in the difference between short-term priming effects and long-term (or cumulative) priming.

Studies assessing structural priming with biased verbs in the short term provide an explanation that is more compatible with implicit learning accounts than with lexicalist accounts of syntax. Bernolet and Hartsuiker (2010) tested how existing verb biases modulate the magnitude of dative PO/DO structural priming in a full-sentence prime-target paradigm and showed an important difference between the baseline structural preferences of different verbs and their sensitivity to priming. Production of target sentences in a baseline condition showed strong effects of verb bias: participants generated more sentences with PO syntax than with DO syntax, consistent with the biases of the verbs used in these sentences. At the same time, structure choice was also modulated by the priming manipulation: more PO/DO sentences were produced after PO/DO primes. Importantly, participants’ productions showed an inverse preference effect: priming from DO primes (i.e., primes with dispreferred DO syntax) was stronger than priming from PO primes (i.e., primes with preferred PO syntax; see Segaert, Weber, Cladder-Micus & Hagoort, 2014, for a similar effect with active and passive structures). This effect was further modulated by the individual ‘bias scores’ of both prime and target verbs: DO primes had the strongest effect when the prime verbs were PO-biased, and target sentences with PO-biased verbs showed the strongest effects of DO primes. In other words, the verb-structure combinations that are encountered less frequently produced the strongest priming.

The results are consistent with a key prediction of the implicit learning account of structural priming (Bock & Griffin, 2000; Chang et al., 2006; Jaeger & Snider, 2013), namely, that encountering a surprising structure in a prime trial results in stronger error-based learning and thus increases the likelihood of producing a dispreferred structure in a target trial. This account also explains why dispreferred structures continue to exist in speakers’ linguistic repertoire: these structures receive a large boost with each use, which effectively ensures that they are not primed out of existence (Ferreira & Bock, 2006).
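The link between surprisal and priming strength can be illustrated with a generic error-driven learning rule. The Python sketch below is a minimal illustration under stated assumptions: a single weight `w` encodes the probability of choosing DO as `sigmoid(w)`, and each prime updates `w` with a logistic delta rule. This is a standard textbook rule swapped in for illustration, not the network of the Dual-path model (Chang et al., 2006); the function names, learning rate `lr`, and starting probabilities are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def error_based_update(w: float, prime_is_do: bool, lr: float = 0.5) -> float:
    """One delta-rule step on a weight w, where the model's P(DO) = sigmoid(w).

    The step is proportional to the prediction error: the structure actually
    encountered (1 for DO, 0 for PO) minus the predicted probability of DO.
    """
    error = (1.0 if prime_is_do else 0.0) - sigmoid(w)
    return w + lr * error


def priming_shift(p_do: float, prime_is_do: bool, lr: float = 0.5) -> float:
    """How much one prime moves P(primed structure), starting from P(DO) = p_do."""
    w = logit(p_do)
    return abs(sigmoid(error_based_update(w, prime_is_do, lr)) - sigmoid(w))


if __name__ == "__main__":
    p_do = 0.2  # hypothetical PO-biased verb: DO is the dispreferred structure
    print("DO prime (surprising):", round(priming_shift(p_do, True), 3))
    print("PO prime (expected):  ", round(priming_shift(p_do, False), 3))
```

Because the update is proportional to prediction error, the dispreferred DO prime shifts the model’s choice probability several times more than the preferred PO prime, and the asymmetry vanishes for an equi-biased verb (`p_do = 0.5`); this mirrors the inverse preference pattern only qualitatively.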

In contrast, studies assessing structural priming with biased verbs in the long term show that verb biases do persist. Coyle and Kaschak (2008) proposed that repeated exposure to verbs in the same structures (i.e., using a manipulation meant to simulate specific verb biases) can produce longer-lasting, lexically enhanced structural priming, and that these effects can be observed in production tasks with a minor modification. Namely, Coyle and Kaschak hypothesised that any longer-lasting effects induced in the exposure phase of their experiment may not be visible in the priming phase because prime trials exert a strong and immediate influence on structure choice on target trials, but should be observable in the same target trials when not preceded by primes. Indeed, examining structure choice in target trials after an exposure phase meant to induce specific verb biases showed the persistence of these biases. Thus, the rapid decay of the lexical boost in priming studies need not imply that lexical contributions to structure repetition are short-lived: verb biases may come about due to repeated exposure to specific verb-structure bindings on a time scale that exceeds that of most priming studies in the lab.

2.2.3 Structural Priming in Bilinguals

As a field, psycholinguistics initially favoured research on language processing in monolingual populations. Studies on bilingual language processing, however, are now plentiful. Much like cross-linguistic research, research on bilingualism provides new opportunities for establishing the nature of processing similarities across languages or constraints on processing (also see Blasi, Henrich, Adamou, Kemmerer & Majid, 2022, for a discussion of the need to broaden the field’s scope of research to languages other than English). In particular, by testing whether syntactic structures can generalise from one language to another, bilingual studies contribute valuable evidence to the debate about the balance of lexical and structural influences in grammatical encoding (Hartsuiker & Pickering, 2008).

Early research in this area showed lexically unsupported repetition of structure in bilinguals that closely mirrored findings from monolingual speakers (Loebell & Bock, 2003): priming occurred between English and German dative sentences (both languages allow PO and DO syntax) but not between English and German transitive sentences (English and German passive syntax differs). Later studies showed reliable between-language priming even across languages with different word orders (e.g., Bernolet et al., 2009; Hwang & Shin, 2019; also see Khoe, Tsoukala, Kootstra & Frank, 2021, for a model). These results strongly support abstract syntactic accounts by demonstrating that syntactic structure can persist in the absence of any lexical similarity between prime and target sentences, as long as the prime and target sentences are syntactically similar (Bernolet et al., 2007). Based on similar evidence with English and Spanish transitive sentences, Hartsuiker, Pickering and Veltkamp (2004) proposed a shared-syntax model that extended Pickering and Branigan’s (1998) lexical account of priming by linking combinatorial nodes to lemmas in two languages rather than only one. However, bilinguals and language learners do show sensitivity to structure frequencies (i.e., stronger priming for less frequent structures; Hwang & Shin, 2019; Kaan & Chun, 2018; Kootstra & Doedens, 2016), which is more consistent with implicit learning accounts.

As in research on monolingual production, further studies considered the extent of lexical involvement in structural processing and showed a high degree of similarity in within-language and between-language priming. Most strikingly, Salamoura and Williams (2006) found evidence of single-verb priming comparable to Melinger and Dobel (2005) in bilingual speakers: presenting PO-only and DO-only verb primes in one language (L1, Dutch) increased production of PO and DO target sentences, respectively, in a second language (L2, English). Schoonbaert, Hartsuiker and Pickering (2007) showed that priming of PO and DO syntax from L1 to L2 was enhanced when primes and targets shared the same verb (irrespective of the verb’s cognate status), although the cross-linguistic translation-equivalent boost was smaller than the within-language lexical boost. An analogous translation-equivalent boost was not observed from L2 to L1, which can be explained by differences in the ease with which L1 and L2 verbs activate each other (L1 targets may not reactivate L2 primes as strongly as L2 targets reactivate L1 primes). Cross-linguistic priming of NP structure (e.g., the girl’s apple vs. the apple of the girl; Bernolet, Hartsuiker & Pickering, 2012) did show a cognate effect, suggesting a possible role for feedback from phonology to the lemma level in line with interactive models of lexical access (but see Cai et al., 2012, for a different argument). However, the longevity of the translation-equivalent boost – that is, the key parameter supporting interpretations of structural repetition in terms of implicit learning of abstract structures rather than in terms of lexicalist accounts in monolingual speakers – remains to be determined (see van Gompel & Arai, 2018, for a review).

These results are broadly compatible with accounts assuming shared syntactic representations across languages, as well as some degree of dependence on lexical information. Importantly, the fact that proficiency levels vary across speakers as well as within speakers over time offers a unique means of determining how the balance between lexically unsupported and lexically supported syntactic processing might shift with linguistic experience (Hartsuiker & Bernolet, 2017; see Jackson, 2018, for a review). For example, the magnitude of cross-linguistic priming without lexical repetition was found to increase with speakers’ proficiency in the target (L2) language, but the magnitude of priming with lexical support was larger in speakers with lower L2 proficiency (Bernolet, Hartsuiker & Pickering, 2013; also see Jackson & Ruf, 2017). This suggests greater overlap in L1 and L2 structural processes in more proficient speakers but more reliance on the lexicon (i.e., less abstraction of syntax) in less proficient speakers. Such results are consistent with early descriptions of developmental changes in children (see e.g., Tomasello, 2000, for arguments about lexical dependence; Fisher, 2002, for arguments about abstract syntax; Rowland et al., 2012 and Peter, Chang, Pine, Blything & Rowland, 2015, for an updated account).

2.2.4 Structural Priming in Dialogue

Going beyond lexicalist and abstract syntactic accounts, psycholinguistic theories have also suggested that repetition of structure, together with repetition of a number of other linguistic properties, may serve a communicative function in dialogue (Ferreira & Bock, 2006; Pickering & Garrod, 2004). Interlocutors normally establish common ground and align representations over the course of a conversation. Alignment at the semantic and lexical levels in conversational settings is well established: for example, speakers repeat lexical items previously used in the discourse (Levelt & Kelter, 1982) and begin to refer to known referents with reduced lexical descriptions (‘the dancer’ instead of a longer description for a tangram resembling a cartoon person in motion; Clark & Wilkes-Gibbs, 1986). They also reliably produce target sentences that repeat structures recently used by conversational partners in prime sentences (datives and NPs in collaborative card-matching tasks; Branigan, Pickering & Cleland, 2000; Branigan, Pickering, McLean & Cleland, 2007; Cleland & Pickering, 2003), both within and across languages (Hartsuiker & Pickering, 2008). Such findings have a number of implications.

First, the occurrence of structural repetition in dialogue shows that priming can occur from comprehension to production as readily as from production to production. Early priming studies that used non-interactive paradigms involved a production task in both prime and target trials: participants heard a prime sentence which they had to repeat out loud and then generated a new sentence in target trials (e.g., Bock, 1986). Because participants repeated the prime sentences, any changes observed in target trials could in principle be attributed to the engagement of production processes in prime trials rather than to generalisation directly from comprehension to production. More recent research has ruled out this explanation: in comprehension-to-production priming studies, participants are exposed to sentences with a given structure in a prime trial without repeating them out loud, and production is then monitored in a subsequent target trial. Bock and colleagues (2007) reported structural priming effects of similar magnitude – and similar persistence across lags – to tasks requiring production of both primes and targets, suggesting that generalisation of structure is not compromised by changes in modality (repetition of prime sentences, however, may serve a more supportive function in second language production; see Jackson & Ruf, 2018).

By analogy to prime-target paradigms used in the lab, utterances produced by one conversational partner in a dialogue serve as ‘comprehension primes’ and utterances produced by the other conversational partner are ‘production targets’. But does structural repetition in dialogue occur automatically, or is dialogue ‘special’? Communicative success is often described as a joint effort, prompting questions about modulation of priming by the social nature of the production setting: one might expect repetition effects to be larger in dialogue than in single-speaker, non-interactive settings (or rather, given that language use in interactive settings is the norm rather than the exception, one might expect repetition effects in non-interactive settings to underestimate the magnitude of repetition effects in everyday language use). This is indeed often the case. For example, Branigan and colleagues (2000, 2007) and Cleland and Pickering (2003) reported priming effects that were larger than in most non-interactive studies. Schoot, Hagoort and Segaert (2019) confirmed these differences in a between-participant comparison of priming in an interlocutor-present and an interlocutor-absent condition, suggesting that repetition of structure may not simply occur automatically but may be additionally influenced by speakers’ communicative goals.
However, Ivanova, Horton, Swets, Kleinman and Ferreira (2020) obtained priming effects of similar magnitude in interlocutor-present and interlocutor-absent conditions, with both between-participant and within-participant manipulations, and argued that potential differences between interactive and non-interactive priming effects in earlier studies may be due to differences in participants’ attention and engagement in the two types of settings. Consistent with this hypothesis is the observation of stronger priming in corpus data from a goal-driven task than in spontaneous conversation (Reitter & Moore, 2014).

If dialogue is special, one might also expect the magnitude of repetition effects to vary between two-party and multi-party settings and with participants’ roles in the conversational exchanges. Speakers are indeed more likely to reuse recently heard structures when they are addressed directly by an interlocutor than when they are not (i.e., when the prime sentence is addressed to another speaker in the conversational setting; Branigan et al., 2007), consistent with the hypothesis that repetition of structure may be mediated by heightened attention and task engagement. The magnitude of structural repetition can also vary with the identity and social evaluations of one’s conversational partner (e.g., humans vs. computer-like avatars in Heyselaar, Hagoort & Segaert, 2017; likeable vs. less likeable confederates in Balcetis & Dale, 2005; teachers evaluated more vs. less positively in Hwang & Chun, 2018) as well as with evaluations of the likelihood of communicative success (e.g., humans vs. computers in Branigan, Pickering, Pearson & McLean, 2010), although ‘social’ effects can also be observed in non-interactive paradigms (Weatherholtz, Campbell-Kibler & Jaeger, 2014). Interestingly, the magnitude of repetition does not seem to vary with the degree to which conversational partners align with the participants’ productions (Schoot et al., 2019).

The larger question of whether structural alignment systematically supports communicative success, however, is still open. While communicative pressures can shape language structure (e.g., Christensen et al., 2016; Fehér et al., 2016), the evidence linking repetition of structure to communicative benefits is mixed. Structural alignment can reduce processing times in production and comprehension (e.g., Ferreira & Bock, 2006; Pickering & Garrod, 2004), but structural repetition effects need not be partner-specific (e.g., Ferreira, Kleinman, Kraljic & Siu, 2012) and are not directly correlated with task success (e.g., Branigan et al., 2007; Ivanova et al., 2020). The scarcity of supporting evidence may be due to a number of factors. It is possible that repetition of structure alone is less clearly linked to communicative success than repetition of lexical items (i.e., lexical alignment, which is an explicit and strategic choice made by the speaker; e.g., Suffill, Kutasi, Pickering & Branigan, 2021), or that alignment at multiple levels is needed to boost communicative success. It may also be the case that laboratory tasks elicit relatively unchallenging conversational exchanges and thus fail to uncover possible benefits of alignment. Interestingly, Reitter and Moore (2014) found evidence supporting the alignment hypothesis in rich and unsupervised task-driven dialogues between conversational partners completing the Map Task, that is, a task in which one participant gives instructions for drawing a route on a map to another participant and in which alignment of situation models is therefore critical for task success.
Participants who showed more long-term (but not short-term) structural alignment in this task had more similar paths. Drawing causal inferences about the role of abstract syntax from these data is complicated, but the relationship between long-term linguistic adaptation and task performance suggests that alignment needs to be tracked over longer time intervals than is typically done in the lab. Designing suitable tasks to detect such effects is a methodological challenge that we return to in Section 4.1.

2.3 Conclusions

In sum, the evidence on lexical or abstract syntactic control of grammatical encoding from structural priming paradigms is mixed. On the one hand, studies showing syntactic influences on structure choice support the psychological reality of abstract syntax; on the other hand, there is also evidence of non-syntactic influences on structure choice in similar sentences. Lexically supported structural repetition effects are often larger than lexically unsupported ones but are shorter-lived. On balance, while the generation of syntactic structures may not be fully lexically independent, there are clear limits to lexical effects. These limits are crucial for determining key architectural properties of production models for both monolingual and bilingual speakers, such as the separation of the content and sequencing systems in the Dual Path model (Chang et al., 2006), as well as key processing parameters, such as the weights assigned to thematic roles in these models. The next section addresses the question of lexical and syntactic influences on grammatical encoding by tracking the time-course of sentence planning.

3 The Time-Course of Grammatical Encoding: Planning Scope

The fluent production of spoken sentences requires speakers to plan ahead, both in terms of grammatical structure and lexical content. As reviewed above, theories differ in how they model the interdependence between these processes. What is undisputed is that utterances are not usually fully planned prior to articulation. According to the incrementality proposal (e.g., Kempen & Hoenkamp, 1987; Levelt, 1989, 1992; Levelt et al., 1999), utterances are generated in a piecemeal fashion, allowing speakers to output early parts of an utterance while planning later parts. Incrementality means that each processing stage can begin on the basis of only part of the output of the preceding stage, which allows different parts of an utterance to be processed in parallel at different levels of representation. For example, incremental processing would allow a speaker to articulate the initial portion of an utterance while conceptually and grammatically encoding upcoming parts.

A fully specified incremental model of speech production should state what determines the scope of advance planning, that is, how much of an utterance is generated at a particular level of representation before processing at the next level can begin. However, the degree to which grammatical planning is completed prior to utterance onset remains a matter of debate. Restricting planning before speech onset to small increments, for example single lexical items, would of course speed up output and reduce memory costs. However, incremental systems must also have processes that determine the order in which different utterance parts are encoded, to reduce linearisation errors. Such ordering processes should be influenced by the grammatical system of the target language, as languages differ in how, and how flexibly, syntactic units can be ordered (e.g., Allum & Wheeldon, 2007, 2009; Hwang & Kaiser, 2014a; Momma et al., 2016; Norcliffe et al., 2015; Myachykov, Scheepers, Garrod, Thompson & Fedorova, 2013; Sauppe, Norcliffe, Konopka, Van Valin & Levinson, 2013). For example, Myachykov and colleagues (2013) observed a broader planning scope in Russian, a more syntactically flexible language, than in English.

Planning scope will also be affected by backward dependencies, which differ across languages, such as obligatory morphological markers that link lexical representations within phrases. For example, determiners and adjectives in Norwegian NPs are marked for noun gender (e.g., ‘et rødt hus’, ‘a red house’, indefinite neuter; ‘en rød bil’, ‘a red car’, indefinite masculine); the correct forms of the determiner and adjective therefore depend on the gender of the upcoming noun. Moreover, as discussed in Section 1.1.4, not all grammatical dependencies are local to a phrase; some cross clause boundaries (e.g., Momma, 2021, 2022; Sarvasy, Morgan, Yu, Ferreira & Momma, 2022). There are also open questions about the representation and planning of backward dependencies arising from collocations (e.g., strong coffee/powerful computer) and idiomatic phrases (e.g., ‘kick the bucket’), which must be represented and processed as complete units at some level (e.g., Smith, 2000). However, idioms also differ in the degree to which they can be adapted syntactically (e.g., Fellbaum, 2019), and, as discussed above (Section 2.1), there is strong evidence from structural priming studies that grammatical encoding processes function in a similar way during the production of idiomatic and non-idiomatic utterances (Konopka & Bock, 2009).

Current models of grammatical encoding make very different predictions about planning units. Some lexically driven models (e.g., Bock & Levelt, 1994; Garrett, 1980a, 1980b; Levelt, 1989) require verb subcategorisation information to assign content words to grammatical functions and to initiate structure generation. These models therefore propose a clausal scope for grammatical encoding. Other approaches also give verbs a central role in planning, but with some restrictions (e.g., Momma et al., 2016; Momma, Slevc & Phillips, 2018, discussed in Section 3.1), and allow abstract syntactic structures to interact with lexical representations to guide planning (e.g., Momma, 2021). In contrast, the Dual Path model of Chang and colleagues (described in Section 1.1.4) proposes that lexical content is integrated into syntactic structures word by word, guided by the mapping of thematic onto syntactic structures. Of course, the planning scopes of lexical and syntactic processes need not coincide and may vary with a number of linguistic and non-linguistic factors, which calls for more flexible and adaptive models of grammatical encoding (e.g., Dell & Jacobs, 2016). In this section, we review the evidence on the scope of advance planning in grammatical encoding and on the linguistic and cognitive factors that can determine it.

3.1 Evidence for Grammatical Planning Scope: Effects of Linguistic Structure

The first evidence for planning units came from studies of pausing and speech errors (e.g., Butterworth, 1980; Bock & Cutting, 1992; Garrett, 1980a; Goldman-Eisler, 1968). To investigate the time-course of grammatical planning more directly, however, researchers have employed a variety of online methodologies. Many studies have focused on measures of lexical processing during the production of utterances consisting of NPs (e.g., ‘The hat and the tree …’). Early eye-tracking studies of object naming showed evidence for a radically incremental lexical planning scope. The objects to be described were fixated one by one in the order of mention, and fixation durations were affected by the conceptual, lexical and phonological properties of the name of the fixated picture and not by properties of the picture named next (Griffin, 2001; Levelt & Meyer, 2000; Meyer, Sleiderink & Levelt, 1998; Meyer, Wheeldon, van der Meulen & Konopka, 2012). The first fixation on the next picture occurred slightly before the articulation of the preceding picture name. These data were consistent with the first picture being processed to the level of phonological encoding before gaze shifted to the next picture. However, Meyer and colleagues (2012) also demonstrated that, with increased practice, the time between the shift of gaze away from an object and the articulation of its name becomes shorter, and there is also evidence of peripheral processing of upcoming pictures to be named (e.g., Morgan & Meyer, 2005; Schotter, Ferreira & Rayner, 2013), suggesting a greater degree of advance planning. Moreover, in these studies the same structure was used on all trials (e.g., the hat and the tree), minimising the influence of conceptual and grammatical structure on planning. When speakers produce more variable syntactic structures, object-by-object fixations are usually preceded by an initial scan of the visual scene, suggesting more extensive processing of the visual display (e.g., Griffin & Bock, 2000). Exactly what information is retrieved, and which representations are constructed based on this initial scan, remains a matter of debate, to which we return later in this section.

A number of experimental sentence production studies have provided evidence suggestive of grammatical constraints on planning scope. Levelt and Maassen (1981) asked participants to describe displays of moving shapes and found that latencies to initiate descriptions with coordinate NPs, such as ‘The circle and the square move up’, were longer than latencies for coordinate sentences, such as ‘The circle moves up and the square moves up’. The coordinate sentences are longer and more complex than the coordinate NPs but were initiated more quickly, a finding consistent with incremental planning of the first clause (‘the circle moves up’) or phrase (‘the circle’) during the production of the coordinate sentences. Smith and Wheeldon (1999) used an extended version of this methodology to determine whether the clause or the phrase defines the planning scope during fluent sentence production. Clearly not all speech is fluent, but the aim of these studies was to investigate the optimal time-course of incremental planning, that is, how the system operates when everything is going well. Their studies included a large number of simple line drawings of objects (ninety-two in total) and used filler trials to vary the sentence structures produced over the course of the experiment. Experimental trials involved a horizontal row of three pictured objects that moved up or down. The task was to describe the display from left to right as quickly and fluently as possible. As shown in Figure 4, correct descriptions were single clauses of equal complexity, each comprising a coordinate and a simple NP but differing in which phrase type was produced first.

Figure 4 Example stimuli from Smith and Wheeldon (1999, Experiment 1). Trials began with a warning frame (A) for 500 ms followed by a blank screen for 500 ms. A horizontal array of pictures then appeared, some of which immediately began to move (2.5 cm in 600 ms). Participants were instructed to describe the array from left to right. To-be-produced sentences thus comprised either a coordinate NP followed by a simple NP (B) or the reverse (C). Pictures from Cycowicz et al. (1997).

Fluent and correct utterances beginning with coordinate NPs were initiated significantly more slowly than utterances beginning with simple NPs (a difference of 77 ms). This finding is inconsistent with both lexical and clausal planning scopes and instead suggests that speakers planned the first phrase prior to speech onset. It was replicated by Martin, Miller and Vu (2004; see also Martin & Freedman, 2001), who also tested two aphasic patients (ML, EA) and showed that complex NPs caused a marked processing disadvantage for the patient with a short-term memory deficit in semantic retention (ML) but not for the patient with a deficit in phonological retention (EA). Martin and colleagues (2004) therefore argued that the effect arises during planning of lexical semantics (see also Martin, 2021; Martin & Schnur, 2019). More recently, the Smith and Wheeldon (1999) methodology has been used to demonstrate a phrasal planning scope in both the dominant and non-dominant languages of bilingual speakers (Li, Ferreira & Gollan, 2022). Moreover, when speakers were required to switch languages to name the second picture in the displays, switch costs (in speech-duration measures) emerged later in the simple–complex than in the complex–simple sentences. In language-switch trials, the first noun and determiner of initial complex NPs showed longer production latencies, whereas in sentences beginning with simple NPs similar switch costs occurred only on the second noun or just before it. These findings are consistent with phrasal planning occurring before language-switch planning in bilingual sentence production.
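The logic of the phrase-size comparison can be made explicit with a toy sketch. It assumes, purely for illustration, that onset latency grows with the number of content words retrieved before speech onset, and it uses hypothetical sentences in the spirit of Figure 4; the class and function names are our own and do not come from any of the studies cited. Under this assumption, only a phrasal scope predicts slower onsets for coordinate-initial sentences: a lexical scope plans one word in both conditions, and a clausal scope plans the same single clause in both.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    label: str
    # Each inner list is one phrase, holding the content words it contains
    # (the verb is treated as its own phrase for simplicity).
    phrases: list[list[str]]


def planned_before_onset(utt: Utterance, scope: str) -> list[str]:
    """Content words retrieved before speech onset under a given scope hypothesis."""
    if scope == "lexical":
        return utt.phrases[0][:1]  # first content word only
    if scope == "phrasal":
        return list(utt.phrases[0])  # the whole sentence-initial phrase
    if scope == "clausal":
        return [w for phrase in utt.phrases for w in phrase]  # the whole (single) clause
    raise ValueError(f"unknown scope: {scope}")


# Hypothetical stimuli: the same words in one clause, differing only in
# whether the coordinate NP comes first or second.
COMPLEX_FIRST = Utterance("complex-simple", [["dog", "hat"], ["move"], ["fork"]])
SIMPLE_FIRST = Utterance("simple-complex", [["dog"], ["moves"], ["hat", "fork"]])


def predicts_difference(scope: str) -> bool:
    """True if the scope predicts slower onsets for coordinate-initial sentences.

    Assumes (for illustration only) that onset latency increases with the
    number of content words planned before speech onset.
    """
    n_complex = len(planned_before_onset(COMPLEX_FIRST, scope))
    n_simple = len(planned_before_onset(SIMPLE_FIRST, scope))
    return n_complex > n_simple


for scope in ("lexical", "phrasal", "clausal"):
    print(scope, predicts_difference(scope))
```

The loop reports a predicted difference only for the phrasal scope. The sketch deliberately ignores other contributors to onset latency, such as visual grouping, verb predictability and phonological structure, which are precisely the alternatives that the follow-up studies set out to rule out.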

Importantly, the phrase-size effect observed with the Smith and Wheeldon (1999) methodology cannot be attributed to visual complexity, that is, to the pictures being grouped visually rather than syntactically, as it is not observed when speakers are asked to name the pictures from left to right rather than produce a sentence (Martin, Crowther, Knight, Tamborello & Yang, 2010; see also Wheeldon & Meyer, 2005). Of course, the phrase complexity effect might be driven by the predictability of the verb. If, for example, speakers prefer to retrieve at least two content words prior to speech onset, then a simple initial NP will be easier to plan because the second content word is always the same verb (‘moves’). This possibility was ruled out by Martin and colleagues (2010), who replicated the effect with varying verbs. The effect is also observed in a verb-final language such as Japanese. Allum and Wheeldon (2007) used coloured picture displays to elicit sentences such as 11a and 11b below. This experiment also ruled out an explanation of the phrase complexity effect in terms of phonological planning. The generation of the phonological structure of utterances is also an incremental process, and there is evidence for the phonological word as a planning unit (Wheeldon & Lahiri, 1997, 2002; Wynne, Wheeldon & Lahiri, 2018). A phonological word comprises a lexical word plus any following unstressed syllables. The coordinate phrases in the English sentences begin with a larger phonological word (e.g., ‘the cup and the’) than the simple sentences (e.g., ‘the cup’). However, this is not true of the Japanese sentences in 11, which are perfectly matched for initial phonological word structure.

(11) a. [INU to BOUSHI wa] FOOKU no ue ni arimasu
        [Dog and hat TOP] fork above are
        ‘The dog and the hat are above the fork’
     b. [INU wa] BOUSHI to FOOKU no ue ni arimasu
        [Dog TOP] hat and fork above is
        ‘The dog is above the hat and the fork’

The relative contributions of conceptual and syntactic structure to phrasal scope effects are more difficult to disentangle. The relationship between the thematic structure to be expressed and the syntactic structure constructed to express it is not simple. Nevertheless, these representations will, of course, share many structural features. In the simple sentence structures tested in the studies described above, the sentence-initial phrases represent key units at both the conceptual (agent or theme) and the grammatical (subject phrase, and head of the subject phrase) levels. Allum and Wheeldon (2007) tested the production of utterances with complex subject phrases that included a modifying prepositional phrase, such as ‘The cup above the hat is blue’. These sentences were initiated approximately 100 ms faster than coordinate NP sentences (e.g., ‘the cup and the hat are blue’), in which both simple NPs act as hierarchically equal heads. This pattern is inconsistent with the scope of planning being the entire subject phrase and suggests instead that speakers planned only the head of the subject phrase prior to speech onset. The generation of similar sentences in Japanese was tested in order to determine whether the initial unit is defined at a grammatical (head of the initial phrase) or a conceptual (theme) level of representation (Experiment 2). Unlike English, Japanese is a head-final language, in which a modifying prepositional phrase occurs before the head of a subject phrase (see Figure 5). In a head-final language, therefore, the first grammatical phrase is not necessarily the theme of the sentence, which makes it possible to determine whether conceptual salience or grammatical convention governs planning scope. The stimuli for this experiment varied the size of the sentence-initial prepositional phrase while keeping the size of the subject phrase as a whole constant. Latencies increased by approximately 50 ms with each lexical addition to the prepositional phrase, a finding consistent with Japanese speakers planning the sentence-initial phrase prior to speech onset rather than the whole subject phrase. Critically, this phrase does not encode a major, or the most salient, thematic unit. The sentence-initial phrase is determined by Japanese syntax, so the result points to a grammatical encoding locus for the phrasal scope effect.

Figure 5 Example stimuli from Allum and Wheeldon (2007, Experiment 3). The picture stimuli elicited Japanese sentences with an initial modifying prepositional phrase comprising one, two or three nouns. As the size of the prepositional phrase increased, the size of the head of the subject phrase decreased, maintaining a fixed size for the subject phrase across the three conditions. Naming latencies and percentage error rates for the three conditions are shown, as well as the latency increase associated with the increasing initial phrase size. Pictures from Cycowicz et al. (1997).

Grammatical encoding has two main component processes, lemma retrieval and structure building. How do these processes contribute to the scope effect? Does phrase structure determine the scope of lexical access prior to articulation? This is an important question, as it has consequences for modelling the mapping between conceptual and grammatical units. As reviewed in Section 1.1.4, theories of grammatical encoding differ in the degree to which they are driven by lexical or structural representations. In lexically driven models (Bock & Levelt, 1994; Levelt, 1989, 1992; Pickering & Branigan, 1998), the order of lexical activation is determined by the conceptual weighting of lexical concepts. A highly activated lexical concept will activate its associated lemma, and this in turn will influence function assignment and the syntactic structure to be generated. For example, if the patient concept ‘Bill’ in our example above were the most highly activated, the lemma for ‘Bill’ might be assigned the grammatical subject function, resulting in a passive sentence (‘Bill was seen by Anne’). Indeed, the salience of lexical concepts can affect structural choice (e.g., Bock, 1986; Gleitman, January, Nappa & Trueswell, 2007), although most strongly when event encoding is difficult (Kuchinsky & Bock, 2010). Moreover, the effect of salience has been shown to be subject to language-specific grammatical constraints (e.g., Hwang & Kaiser, 2015; Myachykov, Garrod & Scheepers, 2010; Myachykov, Garrod & Scheepers, 2018; Myachykov, Thompson, Scheepers & Garrod, 2011; Myachykov & Tomlin, 2008; but see Schlenter, Esaulova, Dolscheid & Penke, 2022, for a null effect of case marking). Many studies have now shown that the order of lexical activation is driven by the requirements of a given structure (e.g., active or passive sentences; Griffin & Bock, 2000) or by the structural requirements of a particular language (e.g., Hwang & Kaiser, 2014b; Momma et al., 2016; Norcliffe et al., 2015; Sauppe et al., 2013).

If the scope of grammatical encoding is the sentence-initial phrase, the question then arises of how a non-linear thematic structure controls the order of lemma activation for initial phrases with no conceptual weighting, such as the modifying phrase ‘above the crab’ in Japanese (Figure 5). If the sentence-initial phrase also determines the minimal scope of lexical retrieval, then, arguably, syntactic and thematic processes must be able to interact to determine the order of lexical access. In the Dual Path model (Chang, 2002; Chang et al., 2006), syntactic structure is generated on the basis of the thematic structure and learned syntactic rules, independently of the retrieval of lexical content. The TAG model (Momma, 2021) also has mechanisms by which some thematic and syntactic structures can interact directly and independently of lexical content. Allum and Wheeldon (2009; see also Wheeldon, 2012) investigated this issue using a picture-preview paradigm in which one of the pictures to be named on a picture-description trial was previewed for one second before the onset of the display to be described. Picture preview occurred on a third of all trials, and participants knew only that the previewed picture would occur somewhere in the upcoming display, not where it would occur in the display or in the sentence description. Both native English and Japanese speakers produced picture descriptions such as 12a–d below, and the previewed pictures were either the first or the second objects to be mentioned.

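(12) a. [The fork] above the dog is blue
     b. [Inu no ue no] fooku wa ao desu
        [Dog above] fork TOP blue is
        ‘The fork above the dog is blue’
     c. [Inu to fooku wa] ao desu
        [Dog and fork TOP] blue are
        ‘The dog and the fork are blue’
     d. [The dog and the fork] are blue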

The prediction was that picture preview should facilitate sentence production only if the picture name was required for utterance planning prior to speech onset. If the scope of lexical access is determined by phrase structure, then during the production of the subject phrases containing a modifying prepositional phrase in 12a and 12b, a preview benefit should be observed only for the first noun to be produced, in speakers of both English and Japanese, even though the initial noun plays different grammatical roles in the two languages. This is what was observed. In contrast, both the first and the second nouns in coordinate NP sentences such as 12c and 12d showed significant preview benefits, although these were significantly larger for the first than for the second nouns. The pattern of picture-preview benefits therefore matches the phrasal scope finding reviewed above and suggests that phrase structure determines the scope of lexical access prior to speech onset.
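The preview prediction can be sketched in the same toy fashion. Assuming, for illustration only, that the bracketed span in each description in 12 marks the unit planned before speech onset, a preview benefit is predicted for a noun exactly when it falls inside that span, whatever its grammatical role. The noun set and the function name are hypothetical, and the sketch captures only the presence or absence of a benefit, not the larger benefit observed for first than for second nouns in coordinate phrases.

```python
import re

# Hypothetical set of picture names appearing in the examples in 12.
PICTURE_NOUNS = {"fork", "dog", "inu", "fooku"}


def preview_predictions(sentence: str) -> dict[str, bool]:
    """Map each picture noun, in order of mention, to whether preview should help.

    Assumes the bracketed prefix of `sentence` is the unit planned before
    speech onset; nouns outside it are planned after onset and so should
    show no preview benefit.
    """
    match = re.match(r"\[(.*?)\]", sentence)
    initial_unit = set(match.group(1).lower().split()) if match else set()
    words = re.findall(r"[a-z]+", sentence.lower())
    return {w: w in initial_unit for w in words if w in PICTURE_NOUNS}


for example in (
    "[The fork] above the dog is blue",   # 12a: head noun only
    "[Inu no ue no] fooku wa ao desu",    # 12b: modifier noun only
    "[Inu to fooku wa] ao desu",          # 12c: both coordinates
    "[The dog and the fork] are blue",    # 12d: both coordinates
):
    print(example, preview_predictions(example))
```

The same rule predicts a benefit for ‘fork’ in 12a but for ‘inu’ (dog) in 12b, mirroring the point that the first-mentioned noun, not a noun with a fixed grammatical role, falls within the initial planning unit.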

The structural nature of the picture-preview effects was confirmed in an experiment that tested two forms of coordination in Japanese with different conceptual and syntactic characteristics (Allum & Wheeldon, 2009). The to-wa coordination used in the previous experiments binds the coordinates as a set and is usually contrastive. So, for example, the sentence in 12c above suggests that the dog and the fork are blue in contrast to the colour of other objects. An alternative form of coordination is mo-mo (e.g., ‘[Inu mo fooku mo] ao desu’), which binds items more loosely, both conceptually and syntactically. Mo-mo functions as a listing form of coordination and is not contrastive; that is, the dog and the fork are blue, as are other objects. The two forms of coordination also have consequences for the scope of adjectives. In a sentence such as ‘the blue cup and plate are broken’, with to-wa coordination both objects are blue, but with mo-mo coordination only the cup is blue. These structural differences predict differences in planning scope. With the same visual displays as in the to-wa experiments described above, mo-mo coordination showed no preview benefit for the second picture to be named.

Defining the syntactic unit that determines the phrasal scope effect is, however, not straightforward. The coordinate NPs need not correspond to a salient thematic unit or to a major grammatical phrase such as the subject phrase or even its head. Nor do they correspond to a minimal grammatical phrase: coordinate NPs are constructed from two simple phrases, yet they are planned as a unit. Allum and Wheeldon (2007; see also Zhao, Alario & Yang, 2014) suggested that some reference to thematic structure is required to define the phrasal unit, which must also function as a minimal thematic unit in the message.

The picture-preview experiments provide evidence that the phrasal scope determines the lower limit of lexical planning prior to articulation. As argued above, this finding suggests that higher-level conceptual-syntactic representations are constructed before (and thus guide) lexical access. This is consistent with the Dual Path model, which allows direct mapping from thematic to grammatical structure in order to ensure the correct order of word retrieval (Chang et al., 2006; see also Konopka & Bock, 2009). But what is the nature of the guiding syntactic representation? Although processing of the initial phrase is prioritised prior to speech onset, there is also clear evidence of more global processing beyond the initial phrase. Griffin and Bock (2000; see also Konopka, 2012, 2019; Van de Velde, Meyer & Konopka, 2014) argued that, when speakers describe event pictures spontaneously, initial gazes are not predictive of what they will say (e.g., of structure choice) but rather reflect a gist apprehension phase, in which a conceptual structure of the pictured event is encoded before linguistic encoding begins. Konopka (2019) investigated the time-course of planning relational information in simple sentences (e.g., ‘The tiger is scratching the photographer’), that is, planning of the main action of the event at the message level (i.e., scratching) as well as of the verb that expresses this action at the sentence level (i.e., ‘scratching’). In two eye-tracking experiments, speakers described pictured events in response to questions that were either neutral (13a) or focused the speaker on the agent (13b) or the patient (13c) of the event. Agent and patient characters were either more or less informative about the action being performed.

(13) a. Neutral: What is happening?
     b. Agent-focused: What is the tiger doing?
     c. Patient-focused: What is happening to the photographer?

Neutral questions elicited 77 per cent active sentences, whereas the agent- and patient-focused questions (13b and 13c) elicited almost exclusively active and passive sentences, respectively. Eye movements were analysed in two time windows based on previous studies, which have shown that fixations in the first 400 ms of a trial are related to message planning and that later fixations are related to linguistic planning (Gleitman et al., 2007; Griffin & Bock, 2000; Konopka & Meyer, 2014; Konopka et al., 2018). In all conditions, gaze in the 0–400 ms window moved between agents and patients, but with a preference for the character that was more informative about the event action, consistent with early relational processing. Effects of the question manipulation were observed in the second window. After neutral questions, speakers looked at the two characters in their order of mention, but when answering agent-focused questions, speakers looked at both characters until speech onset (1200 ms) and thereafter mostly fixated the patient. This pattern is consistent with conceptual encoding of relational information (necessary for verb retrieval) prior to speech onset, an interpretation supported by a preference for fixating action-informative characters in this time window. In contrast, no such encoding window was observed for passive sentences produced after patient-focused questions; instead, after 400 ms speakers fixated the patient and then the agent. Nevertheless, the gaze patterns demonstrate the early extraction of relational information required for hierarchical processing, both during message-level planning and during the mapping of the message onto the required sentence output.

Models of grammatical encoding therefore need to account for the planning of both global and local representations of utterances. As reviewed in Section 1.1.4, the classic lexically driven model of grammatical encoding requires verbs to be retrieved before phrase structure is built, because verbs are required for function assignment (e.g., Bock & Levelt, 1994; Levelt, 1989; Levelt et al., 1999; Pickering & Branigan, 1998). This claim was tested by Schriefers, Teruel and Meinshausen (1998), who used an extended version of the picture–word interference task (Meyer, 1996; Schriefers, Meyer & Levelt, 1990) in which native German speakers produced descriptions of pictured actions such as ‘The man empties the bucket’. The position of the verb in the picture description was manipulated using lead-in phrases such as those in 14 below.

(14) a. SVO: Der Mann leert den Eimer
        ‘The man empties the bucket’
     b. SOV: (Auf dem nächsten Bild sieht man, wie) der Mann den Eimer leert
        ‘(On the next picture one sees how) the man the bucket empties’
     c. VSO: (Und auf dem nächsten Bild) leert der Mann den Eimer
        ‘(And on the next picture) empties the man the bucket’

Picture onset was accompanied by a semantically related distractor verb (e.g., ‘empty’), which causes interference and slows naming (Schriefers et al., 1990). Compared with an unrelated distractor verb (e.g., ‘writes’), the semantically related distractor caused interference only in the production of VSO sentences (e.g., 14c), suggesting that only verbs in sentence-initial position are retrieved prior to speech onset. However, more recent research suggests that verbs may be retrieved prior to speech onset when their relationship to object arguments, rather than subject arguments, is critical. For example, Momma and colleagues (2016) used the same methodology to test Japanese speakers' production of object-initial and subject-initial sentences and observed semantic interference effects only for object-initial sentences. Other studies have shown that verbs are planned before the subject noun in English passive sentences but not in active sentences (Momma, Slevc & Phillips, 2015). Momma and colleagues (2018) showed a similar effect for two classes of intransitive verbs: unaccusative verbs, whose single argument is a patient or theme (e.g., 15a), and unergative verbs, whose single argument is an agent (e.g., 15b). Participants learned to produce sentences like those in 15a and 15b as picture descriptions.

(15) a. The doctor is floating. (unaccusative)
     b. The doctor is sleeping. (unergative)

Written distractor verbs were shown prior to picture onset and were either semantically related or unrelated to the target verb. A semantic interference effect on sentence onset latencies was observed for the unaccusative verbs but not for the unergative verbs. Spoken duration measures showed the reverse pattern of interference effects, with longer durations for the subject noun plus auxiliary in the unergative sentences but not in the unaccusative sentences. This pattern of results is consistent with unaccusative verbs being planned prior to sentence onset but unergative verbs being planned during articulation of the subject noun. Based on these data and on proposals from theoretical linguistics (see Momma & Ferreira, 2021, for a discussion), Momma and colleagues distinguish between a verb's external (subject) arguments and internal (object) arguments. They propose that verbs need to be retrieved only prior to planning their internal arguments.

There is also evidence that hierarchical planning can cross clause boundaries. Smith and Wheeldon (1999) compared the production of one-clause and two-clause sentences such as those in 16.

(16) a. The cup moves up. (one clause)
     b. The cup and the hat move up. (one clause)
     c. The cup moves up and the hat and the chair move down. (two clauses)
     d. The cup and the hat move up and the chair moves down. (two clauses)

The latency benefit for sentences with simple initial phrases was replicated, confirming that processing of the initial phrase is prioritised prior to speech onset. However, the two-clause sentences (16c and 16d) took significantly longer to initiate (by 142 ms) than the single-clause sentences (16a and 16b), and the effect of initial phrase size was smaller in the two-clause sentences (78 ms) than in the single-clause sentences (195 ms). This pattern of results is consistent with some structural processing of elements in the second clause occurring prior to speech onset, processing that is also affected by their complexity.

Momma (2021) investigated the planning of long-distance dependencies such as the cross-clause filler–gap dependency in ‘Who does the artist think is chasing the ballerina?’. The method combined picture descriptions to elicit the target sentences in 17 with a priming manipulation for the use of ‘that’. The prime sentences either did or did not contain ‘that’ (e.g., The flight attendant thinks (that) the captain will announce something). They were presented as part of a sentence memorisation phase in which participants saw two sentences sequentially and were subsequently cued to produce one of them. Participants then produced target sentences as descriptions of pictured scenes. The aim was to test whether ‘that’ priming affects the production of sentences with a long-distance dependency in which ‘that’ cannot legally occur (17a), compared with sentences in which it can (17b). The prime slowed onset latencies for sentences such as 17a but not 17b, suggesting that the grammatical structure of the dependency is planned prior to speech onset.

(17) a. Who does the artist think (*that) is chasing the ballerina?
     b. Who does the artist think (that) the chef is chasing?

The planning of long-distance dependencies and of unaccusative verbs prior to speech onset appears to conflict with the evidence for an incremental phrasal planning scope reviewed earlier in this section. However, according to the tree-adjoining grammar (TAG) model of grammatical encoding, long-distance dependencies can be planned without planning the intervening material, which can be adjoined into the tree later. This model therefore provides an explicit mechanism for global structural planning within an incremental grammatical encoding system. Evidence for such a model was provided by Momma and Ferreira (2021), who investigated the time-course of planning sentences with unaccusative verbs such as ‘The octopus below the spoon is boiling’. They used the extended picture–word interference paradigm to test for interference from semantic distractors related to the verb (e.g., melt) or to the noun in the modifying prepositional phrase (e.g., knife). Onset latencies for the subject noun (e.g., octopus) were significantly slowed by verb distractors but not by noun distractors, suggesting that the verb, but not the modifying prepositional phrase, was planned prior to subject onset. Unergative verb sentences showed a less consistent pattern of results, with evidence of verb retrieval prior to subject onset in some experiments (in contrast to Momma et al., 2018) and variation across participants suggestive of individual differences in planning scope, a topic discussed in the following section.
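The adjunction mechanism invoked by the TAG account can be sketched in Python. In this minimal illustration, the elementary tree for ‘Who is chasing the ballerina?’ contains both the wh-filler and its gap, so the dependency is fixed from the start; an auxiliary tree for ‘does the artist think’ is adjoined at the embedded S node afterwards, stretching the dependency across a clause boundary without the filler–gap relation being rebuilt. The node labels, the class and function names (`Node`, `adjoin`, `yield_of`) and the choice to adjoin at S are simplifying assumptions for illustration; this is not the specific TAG analysis or processing model used in the work cited.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str                         # category label (simplified, hypothetical)
    word: Optional[str] = None         # lexical material; "" marks an empty gap
    children: list[Node] = field(default_factory=list)
    foot: bool = False                 # True for the foot node of an auxiliary tree


def yield_of(node: Node) -> list[str]:
    """Left-to-right string of words, skipping empty gaps."""
    if node.word is not None:
        return [node.word] if node.word else []
    return [w for child in node.children for w in yield_of(child)]


def find_foot(node: Node) -> Optional[Node]:
    """Return the first foot node found depth-first, or None."""
    if node.foot:
        return node
    for child in node.children:
        hit = find_foot(child)
        if hit is not None:
            return hit
    return None


def adjoin(tree: Node, target_label: str, aux: Node) -> bool:
    """Adjoin a copy of auxiliary tree `aux` at the first non-root node labelled
    `target_label` (depth-first). The displaced subtree is re-attached at aux's foot."""
    for i, child in enumerate(tree.children):
        if child.label == target_label:
            new_aux = copy.deepcopy(aux)
            foot = find_foot(new_aux)
            foot.word, foot.children, foot.foot = child.word, child.children, False
            tree.children[i] = new_aux
            return True
        if adjoin(child, target_label, aux):
            return True
    return False


# Elementary tree: the filler 'who' and its gap are planned together.
alpha = Node("S'", children=[
    Node("NP", "who"),
    Node("S", children=[
        Node("NP", ""),  # gap co-indexed with 'who'
        Node("VP", children=[Node("V", "is chasing"), Node("NP", "the ballerina")]),
    ]),
])
# Auxiliary tree for the intervening clause; the foot node stands in for S*.
beta = Node("S", children=[
    Node("Aux", "does"), Node("NP", "the artist"), Node("V", "think"), Node("S", foot=True),
])

before = " ".join(yield_of(alpha))  # 'who is chasing the ballerina'
adjoin(alpha, "S", beta)
after = " ".join(yield_of(alpha))   # 'who does the artist think is chasing the ballerina'
```

The point of the sketch is that the filler and gap never change position relative to each other in the elementary tree; only the material inserted between them grows, which is how an incremental system could commit to a long-distance dependency before planning the intervening clause.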

In summary, the evidence reviewed in this section shows clear effects of syntactic structure on both the scope and the order of lexical retrieval operations. These effects have been observed for long-distance structural dependencies as well as for local phrase structures. They are consistent with structurally driven rather than lexically driven grammatical encoding processes, and with the generation of global hierarchical syntactic structures prior to the sequential construction of constituent phrases. In the next section, we turn to the evidence that non-linguistic factors might also affect the scope of grammatical encoding.

3.2 Evidence for Flexibility in Planning Scope: Effects of Non-linguistic Factors

Despite its apparent ease, speaking is a cognitively costly activity that can negatively affect, and be affected by, concurrent tasks (e.g., Jongman, Roelofs & Meyer, 2015; Roelofs & Piai, 2011). This raises the possibility that advance planning could also be constrained by non-linguistic factors, such as the context in which sentences must be produced, or by cognitive limitations arising from individual differences in attention or working memory. It is also possible that planning scope is to some extent under a speaker's control.

3.2.1 Cognitive Load in Linguistic Processing

Cognitive load can vary within the language planning system due to differences in the ease with which utterance increments can be planned or retrieved at different levels of representation. A number of studies suggest that varying cognitive load can affect the scope of grammatical planning processes. Message-level representations that are easier to construct result in larger planning scopes (e.g., Konopka & Meyer, 2014; Kuchinsky & Bock, 2010; van de Velde et al., 2014). Konopka and Meyer (2014) used a spontaneous production task in which speakers described pictures of transitive events, eliciting active and passive sentences. Analyses of eye movements during production of active sentences (e.g., ‘The dog is chasing the postman’) showed that speakers allocated more attention to both characters shortly after picture onset when the gist of the event was easy to encode and to express linguistically (suggesting planning of a larger message, consistent with Hierarchical Incrementality) but quickly directed their attention to the character they would mention first when the gist of the event was more difficult to encode and to express (suggesting planning of a small, one-character increment at the outset of the planning process, consistent with Linear Incrementality). Further, facilitating generation of an active sentence via structural priming also resulted in a shift towards planning of a larger message shortly after picture onset. These differences across items, and the shifts in planning strategy due to structural priming, suggest that speakers can change planning strategies flexibly and dynamically. Specifically, speakers appear to prioritise processes that can be completed quickly at the outset of planning, so planning may proceed in larger or smaller increments for different sentences.

Effects of cognitive load on lexical retrieval have also been observed. Word retrieval is a notoriously costly process, prone to retrieval failures (e.g., tip-of-the-tongue states; Brown & McNeill, 1966; Meyer & Bock, 1992), which increase with the learning of additional languages (e.g., Gollan & Acenas, 2004) and in older age (Segaert et al., 2018). As mentioned above, in picture-naming tasks there is evidence of parallel activation of upcoming picture names (e.g., Schotter et al., 2013), but also evidence that the pre-activation of upcoming pictures is influenced by the ease of name retrieval (e.g., Konopka, 2012; Malpass & Meyer, 2010; Morgan & Meyer, 2005; Wheeldon et al., 2011). It has further been shown that syntactic processing load can affect lexical planning scope. Lexical planning scope is smaller when the sentence structures to be produced vary from trial to trial (e.g., Wagner et al., 2010). Conversely, lexical planning scope increases when syntactic processing load is reduced through structural priming (Konopka, 2012; Konopka & Kuchinsky, 2015; Konopka & Meyer, 2014; see Wheeldon & Konopka, 2018, for a review). For example, Konopka (2012) used a structural priming methodology to facilitate the production of sentences beginning with complex NPs that included semantically related or unrelated nouns (e.g., ‘The axe and the saw are above the book’ vs. ‘The axe and the cup are above the book’).
Lexical planning scope was measured by comparing sentence onsets for the two types of NPs in structurally primed and unprimed conditions, that is, after producing prime sentences beginning with complex NPs or simple NPs (see Smith & Wheeldon, 2001; Wheeldon & Smith, 2003). The results showed earlier retrieval of the second noun when the complex NP structure was primed, as evidenced by semantic interference delaying sentence onsets, but not when the structure was unprimed. However, increases in lexical planning scope were not observed beyond the initial phrase.

A related question is the extent to which lexical availability can affect syntactic planning scope. Retrieving and buffering words in memory is cognitively demanding, so can speakers extend their syntactic processing scope to encompass available lexical material? Wheeldon, Ohlson, Ashby and Gator (2013) tested this by manipulating lexical availability and initial phrase structure simultaneously. Speakers saw a previewed picture followed by an array of four moving pictures, which again elicited sentences beginning with a coordinate or a simple NP. In the critical sentences, the previewed picture was always the second picture to be named. The position of the previewed picture was either unpredictable (i.e., filler trials were used to vary its position) or predictable (always in second position in both experimental and filler sentences). Unpredictable preview replicated the benefit for pictures falling within the initial phrase but not beyond it, as observed by Allum and Wheeldon (2009). When preview was predictable, a significant benefit was observed for pictures beyond the first phrase, as well as a significant effect of initial phrase length. This pattern of results is consistent with speakers extending their planning to include some processing of the second picture in a display when it is previewed, but not with the picture's name being retrieved and incorporated into the grammatical structure prior to speech onset. Interestingly, the effect of picture preview beyond the first phrase was inhibitory rather than facilitatory in older adults (Hardy et al., 2020), suggesting age-related differences in the ability to hold on to upcoming lexical information while planning sentence-initial phrases.
Finally, Wheeldon and colleagues (2013) observed no preview benefit for the predictable preview of pictures occurring in the final position of a three-noun coordinate phrase, although the effect of initial phrase length (between a two-noun and a three-noun coordinate) was significant. Along with similar findings reviewed above (e.g., Konopka, 2012), this demonstrates that initial phrase structure does not necessarily determine the upper limit of lexical access, confirming that phrasal and lexical planning scopes do not necessarily coincide (see Roeser, Torrance & Baguley, 2019, for similar findings in the planning of written phrases).

Language experience and proficiency also affect the cognitive demands of speaking, and studies of bilingual language planning have shown that planning scope can differ between a bilingual's dominant and non-dominant languages (e.g., Gilbert, Cousineau-Perusse & Titone, 2020; Konopka et al., 2018). For example, Konopka and colleagues (2018) compared the planning of active SVO sentences (e.g., ‘The dog is chasing the postman’) in Dutch and in English by Dutch speakers with high proficiency in English. Analyses of eye movements before speech onset showed that speakers began fixating, and thus linguistically encoding, the sentence-initial character (the agent: ‘The dog …’) earlier when generating a description in Dutch than in English. Early encoding of the sentence-initial noun implies that speakers allocated fewer resources to advance planning of the information to be produced next (‘… is chasing the postman’), that is, that they engaged in more linearly incremental planning rather than hierarchical planning. Thus, speakers were more likely to adopt an opportunistic (or risky, on-the-fly) planning strategy when using their native language, possibly because they would be able to plan subsequent conceptual and linguistic increments (the verb and the patient name) quickly, or to correct any errors if they ran into problems from their chosen starting point (‘The dog …’). By comparison, production is more effortful in a second language, so a highly incremental, risky planning strategy is not optimal: when preparing English sentences, speakers allocated more resources to encoding information about the whole event before beginning linguistic encoding of the sentence-initial character, consistent with a hierarchically incremental planning strategy.
Konopka and colleagues (2018) verified that this effect was indeed due to speakers' preference to encode relational (verb-related) information early in the planning process rather than only to a delay in retrieving the sentence-initial referent name.

3.2.2 Individual Differences in Cognitive Abilities

The evidence reviewed in the previous section demonstrates that cognitive load within the language system can influence grammatical planning scope. These findings raise the question of whether individual differences in cognitive abilities can have similar effects. There is some evidence that individual differences in abilities related to attention and memory affect language planning. Arguably, cognitive limitations might necessitate the adoption of smaller, less demanding planning increments (e.g., Christiansen & Chater, 2016). For example, there is evidence of a relationship between working memory (WM) capacity and planning scope (e.g., Martin et al., 2004; Martin & Slevc, 2014). A number of studies have demonstrated that higher performance in WM tasks is related to planning of larger increments (e.g., Petrone, Fuchs & Krivokapić, 2011; Swets, Desmet, Hambrick & Ferreira, 2007; Swets, Fuchs, Krivokapić & Petrone, 2021; Swets et al., 2014). For example, Petrone and colleagues (2011) demonstrated a relationship between WM capacity and phrase-initial fundamental frequency (F0, the acoustic correlate of pitch). Phrase-initial F0 was higher for speakers with high WM than for those with low WM. Because F0 declines across an utterance and longer phrases are initiated with a higher F0 than shorter phrases, phrase-initial F0 can serve as a measure of planning scope (see also Fuchs, Petrone, Krivokapić & Hoole, 2013).
In a different approach, Swets and colleagues (2014) related individual differences in WM to speakers' performance in an interactive speech production task in which participants directed a listener to move pictured objects in grids on a screen. Speakers with better WM performance showed evidence of a broader planning scope in this task. They were more likely to look at the third object in a scene (e.g., a wheel) prior to initiating a sentence such as ‘The cat moves below the train and the wheel moves above the train’. They were also more likely to produce disambiguating modifications to the first NP, such as ‘The four-legged cat moves below the train …’. Onset latencies for contrasting sentence structures did not differ between low- and high-WM speakers, which suggests that high-WM speakers were able to plan further ahead than low-WM speakers within the same time frame. In a follow-up study, Swets and colleagues (2021) investigated whether differences in language requirements interact with individual differences in cognitive factors such as WM and processing speed. Using a methodology similar to that of Swets and colleagues (2014), they tested speakers of English, French and German. English and German are Germanic languages that allow modifiers in NPs to occur before or after the noun (e.g., ‘the four-legged cat’, ‘the cat with four legs’). In contrast, modifiers in French are almost always post-nominal (e.g., ‘le chat à quatre pattes’). Previous research has shown faster onset latencies for the planning of post-nominal modification (Brown-Schmidt & Konopka, 2008; Myachykov et al., 2013), suggesting more incremental planning of such phrases.
Swets and colleagues (2021) also found evidence for more incremental planning in French speakers than in English and German speakers. French speakers also showed a relationship between speech latency and individual differences in processing speed, which was not observed in speakers of the Germanic languages. However, the data patterns were not robust, and the relationship between WM and planning scope observed by Swets and colleagues (2014) for English speakers did not replicate. Nevertheless, the study is the first to address the possibility that cognitive abilities may differ in the extent to which they predict planning scope in speakers of different languages.

Finally, attention is a multifaceted ability (Miyake et al., 2000), and individual differences in some components of attention predict picture-naming performance (Piai & Roelofs, 2013; Shao, Roelofs & Meyer, 2012). In phrase production, the ability to sustain attention has been shown to affect production latencies (e.g., Jongman, Meyer & Roelofs, 2015; Jongman, Roelofs & Meyer, 2015). For example, Jongman, Meyer and Roelofs (2015) measured individual differences in sustained attention using a continuous performance task (CPT) in which participants monitored a series of digits for the target digit 0. Single digits were presented for 100 ms, and participants made a button-press response to the target digit. Participants with lower CPT performance also produced more slow responses when producing conjoined NPs in L1 Dutch (e.g., ‘de wortel en de emmer’, i.e., the carrot and the bucket), consistent with the effect of lapses of attention. A similar correlation was found when single-picture naming was followed by a non-linguistic arrow-categorisation task. The findings suggest that speakers need to maintain attention when coordinating the production of an NP with either another NP or a non-linguistic task. While these results do not speak directly to effects of sustained attention on planning scope, they highlight the need to account for individual differences in cognition in language planning research.

3.2.3 Cognitive Load in Dialogue

Finally, most sentences are spoken in conversational contexts, which demand much more than the planning of one's own utterances. Interlocutors must comprehend the speech of other speakers, keep track of what they have said, and constantly update their representation of the unfolding discourse. These processes are demanding of both attention and memory (e.g., Barthel & Sauppe, 2019; Fairs, Bögels & Meyer, 2018; Fargier & Laganaro, 2016). Moreover, timing in conversational turn-taking is very tight, with a new speaker often taking the floor about 200 ms after the offset of the previous speaker's utterance (e.g., Levinson & Torreira, 2015). This raises the possibility that speakers may reduce their planning scope in order to speed up utterance onset under conversational time pressure. However, the literature on turn-taking has focused on the degree to which utterance planning occurs in parallel with listening to the current speaker, rather than on changes in planning scope (e.g., Barthel, Sauppe, Levinson & Meyer, 2016; Lindsay, Gambi & Rabagliati, 2019; Sjerps & Meyer, 2015), and there is evidence that the speed of turn-taking might be overestimated (e.g., Corps, Knudsen & Meyer, 2022). Some studies have shown that pause length in turn-taking is sensitive to the length of the utterance to be produced (e.g., Roberts, Torreira & Levinson, 2015; Torreira, Bögels & Levinson, 2015), but the effect of conversational constraints on the scope of grammatical encoding remains poorly understood.
Nevertheless, it remains true that, in order to keep the floor in a conversation, it is important to avoid long planning pauses between one's own utterances, and there is some experimental evidence that planning scope can be reduced under conditions of increased time pressure (e.g., Ferreira & Swets, 2002).

3.3 Conclusions

The studies reviewed above show that a complex pattern of linguistic and non-linguistic factors can influence the time-course of grammatical encoding. In general, the data show more consistent effects of structural planning and more variable effects of lexical planning. Many factors affecting the ease of structural planning influence the scope of lexical retrieval. By contrast, lexical availability does not affect the scope of grammatical encoding, even when speakers know the linear order of the available words in the utterance to be produced. Moreover, it is clear that structural and lexical planning scopes do not necessarily coincide. These findings are consistent with a model in which conceptual and syntactic structure can interact to determine the order of lexical activation, but in which the extent of lexical activation can vary with factors affecting the ease of planning and with individual differences in cognitive processes.

4 Summing up

4.1 Methodological Review

Research in any area can only be as good as its experimental paradigms and data-collection methods allow. A recurring dilemma in language production research is choosing paradigms and stimuli that elicit the desired linguistic output as spontaneously as possible while maintaining a high degree of experimental control. Because language processes unfold very quickly and competing theories make explicit predictions about what type of information is encoded when, good temporal resolution is necessary for drawing inferences about the coordination of lexical and structural processes. In addition, many research questions require detailed analysis of participants' speech output (e.g., to relate word onsets to eye movements). In the absence of automatic language analysis tools, this is an extremely time-consuming and labour-intensive process. This issue is magnified by a second recurring challenge: the sheer volume of data needed for appropriately powered comparisons. As in other fields, increasing power usually means recruiting larger samples of participants rather than increasing the size of item pools. Moreover, the most appropriate means of estimating power remains a matter of debate. One notable strength of the field is its habit of replication, which provides important information about effect reliability. However, the field is still prone to reporting biases due to the problems associated with publishing null effects. The growing use of Open Science Framework practices is of critical importance here.

4.1.1 Paradigms

Picture-naming paradigms that elicit simple sentences consisting of NPs (e.g., ‘The A and the B’) meet the criteria outlined above, as these sentences are relatively easy to elicit without training. Studies employing such paradigms have provided much of the initial evidence about vertical information flow in the production system (i.e., top-down information flow as well as feedback from lower to higher levels in the lexicon) and horizontal information flow (i.e., estimates of the amount of information that can be planned in parallel at the message level and sentence level). However, the production of more complex sentences is needed to address theoretical issues such as the planning of long-distance dependencies. To elicit complex structures, picture-naming studies present participants with hand-drawn pictures or photographs showing transitive or dative events (Griffin & Bock, 2000; Konopka et al., 2018; Segaert, Wheeldon & Hagoort, 2016). Often, picture-naming studies include training blocks in which participants are taught to use specific words prior to starting the main production task, or participants are shown printed words to be used in their picture descriptions on a trial-by-trial basis (e.g., Hartsuiker and colleagues; Ziegler & Snedeker, 2018). Studies eliciting sentences whose messages are hard to depict use sentence memory tasks: sentences are presented to participants either in full or word by word, and participants then repeat them from memory after a short interval (e.g., Ferreira, 2003; Konopka & Bock, 2009; Momma, 2022).
Despite evidence that speakers reproduce these sentences from a gist representation (rather than repeating them verbatim from information stored in working memory), prior processing of the full structure may still affect planning scope in ways that differ from more spontaneous sentence production methods.

These paradigms are far from exhaustive in terms of capturing the complexity of language produced outside of the lab. Nevertheless, they have provided data that, so far, can be only partially accounted for by existing computational models. A promising approach for future research is to adapt such paradigms for use in dialogue settings to be able to test effects of interactivity and conversational history.

4.1.2 Dependent Measures

Studies using priming paradigms, such as the structural priming paradigm, have typically relied on two dependent measures: binary outcomes in each trial (such as repetition of the primed structure or production of the unprimed structure), which are aggregated to compute the magnitude of the priming effect across conditions, and onset latencies in sentences with primed and unprimed structures, which are aggregated to compute facilitation effects across conditions. The former provides a measure of changes in the degree to which the production system is biased to select one structure over another (see Section 2: Bock, 1986; Chang et al., 2006; Pickering & Branigan, 1998) and is typically used to make claims about learning of a structural alternation. The latter provides a continuous measure of how quickly a particular structural procedure can be implemented (Konopka, 2012; Smith & Wheeldon, 1999), regardless of whether speakers select the primed or unprimed structure, and provides key insight into the planning process (Section 3).
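The two dependent measures can be illustrated with a minimal Python sketch over hypothetical trial records. Here the choice measure is the difference in the proportion of passive targets after passive versus active primes, and the latency measure is the mean onset for active targets after an unprimed (passive) prime minus the mean onset after a primed (active) prime. The data, record layout and function names are illustrative assumptions; published studies typically analyse trial-level data with (generalised) mixed-effects regression rather than by comparing raw means.

```python
from statistics import mean

# Hypothetical trials: (prime_structure, target_structure, onset_ms).
trials = [
    ("passive", "passive", 1450),
    ("passive", "active", 1300),
    ("passive", "active", 1320),
    ("passive", "passive", 1500),
    ("active", "active", 1200),
    ("active", "active", 1220),
    ("active", "active", 1180),
    ("active", "passive", 1600),
]


def choice_priming(trials, structure="passive", alternative="active"):
    """Proportion of `structure` targets after a `structure` prime minus the
    proportion after an `alternative` prime (binary outcomes, aggregated)."""
    def proportion(prime):
        outcomes = [target == structure for p, target, _ in trials if p == prime]
        return sum(outcomes) / len(outcomes)
    return proportion(structure) - proportion(alternative)


def latency_facilitation(trials, structure="active"):
    """Mean onset (ms) for `structure` targets after a different-structure prime
    minus after a same-structure prime; positive values indicate facilitation.
    Only trials on which `structure` was actually produced are included."""
    primed = [ms for p, t, ms in trials if t == structure and p == structure]
    unprimed = [ms for p, t, ms in trials if t == structure and p != structure]
    return mean(unprimed) - mean(primed)


print(choice_priming(trials))        # 0.25 (0.5 after passive primes vs 0.25 after active)
print(latency_facilitation(trials))  # 110 ms (1310 vs 1200)
```

Note that the latency measure is computed within the structure actually produced, which is why it can reveal priming for a structure, such as the active, whose selection rate is already close to ceiling.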

Comparisons of sentence latencies across priming conditions are also particularly useful for assessing the effects of structural primes on sentences with preferred structures, that is, sentences that are hard to prime. For example, speakers overwhelmingly prefer active to passive syntax. Given that selection of active syntax is often at ceiling, structural primes rarely increase it further (frequent structural alternatives also show less priming; Jaeger & Snider, 2013). However, priming in active sentences is observable with a different measure: sentences that repeat active syntax have shorter onset latencies (Segaert et al., 2014). This suggests that repetition of structure can facilitate production in the absence of changes in structure choice. At the same time, sentence onsets are highly sensitive to a large number of variables that influence the speed with which speakers produce linguistic output under any conditions (e.g., lexical constraints such as the ease of retrieving individual content words), as well as to changes specific to structural repetition itself, such as structure-driven changes in planning scope (Konopka, 2012; Konopka & Meyer, 2014).

A particularly informative approach to studying sentence planning is to use eye tracking to monitor the spontaneous production of sentences elicited with visual stimuli. Sentence onsets vary across studies, but speakers rarely begin speaking before 1000 ms. Tracking participants’ eye movements during this interval provides a rich implicit record of which information participants began encoding at different points in time and how easy this information was to encode before speech onset. This is especially relevant for making inferences about incremental planning (see Norcliffe & Konopka, 2015). The richness of the eye-movement record in spontaneous sentence production also presents a challenge: eye movements can capture effects of low-level non-linguistic variables (such as perceptual salience) as well as higher-level linguistic variables (such as message-level encoding difficulty). The time windows in which these variables are likely to influence production in theoretically interesting ways, and the way in which eye movements reflect the influence of these variables, are still debated (see, e.g., Griffin & Davison, 2011; Konopka, 2019).
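One common way to summarise the eye-movement record over time is to divide the interval after picture onset into bins and compute the proportion of each bin spent fixating each region of interest, such as the agent or the patient of a depicted event. The sketch below assumes that fixations for a single trial are given as hypothetical `(start_ms, end_ms, region)` tuples measured from picture onset; the region labels, the 200 ms bin width and the 1000 ms window are illustrative choices, not values taken from the studies cited.

```python
def fixation_proportions(fixations, region, bin_ms=200, window_ms=1000):
    """Proportion of each time bin spent fixating `region` on one trial.

    `fixations` is a list of (start_ms, end_ms, region) tuples, with times
    measured from picture onset. Bins run from 0 up to `window_ms`.
    """
    proportions = []
    for bin_start in range(0, window_ms, bin_ms):
        bin_end = bin_start + bin_ms
        # Sum the part of each matching fixation that overlaps this bin.
        overlap = sum(
            max(0, min(end, bin_end) - max(start, bin_start))
            for start, end, label in fixations
            if label == region
        )
        proportions.append(overlap / bin_ms)
    return proportions


# Hypothetical single trial: agent first, then patient, then back to agent.
fixations = [(0, 250, "agent"), (250, 600, "patient"), (600, 1000, "agent")]
print(fixation_proportions(fixations, "agent"))    # [1.0, 0.25, 0.0, 1.0, 1.0]
print(fixation_proportions(fixations, "patient"))  # [0.0, 0.75, 1.0, 0.0, 0.0]
```

On real data, such per-trial curves would be averaged over trials and participants and compared across conditions; which bins and windows to analyse is itself part of the debate about how eye movements index encoding.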

4.2 Summary and Future Directions

In the opening sections of this Element, we placed grammatical encoding for speech production in context and outlined its component processes: lexical retrieval and structure building. The end point of successful grammatical encoding is a representation of the appropriate linear order of the words that are to be phonologically and phonetically encoded for articulation. The theoretical issue that we focused on throughout this Element is the relationship between lexical and structural representations and processes during grammatical encoding. Current models make different claims about lexically driven and structurally driven influences on the linearisation process – that is, whether, how and when they interact. We characterised the problem in terms of two theoretical extremes that approximate the claims of lexically driven and structurally driven accounts of grammatical planning: Linear Incrementality and Hierarchical Incrementality. In Linear Incrementality, lexical access is driven by the activation of lexical concepts, and sequencing is based on the syntactic information represented in lemmas. In Hierarchical Incrementality, syntactic structure is constructed on the basis of thematic relationships in the conceptual structure, without reference to lexical content; this structure then drives the access and sequencing of lemmas.

In Section 2, we addressed the representation of lexical and syntactic structure by examining the evidence from structural repetition priming studies. The data we (and many others) have reviewed provide evidence for abstract syntactic representations that are separate from both conceptual and lexical representations. These data speak against a radically incremental model of grammatical encoding driven purely by lexical syntax. Nevertheless, both conceptual representations (animacy, thematic roles) and lexical representations (verb subcategorisation information and biases) have independent effects on priming and can influence the generation of syntactic structure.

Regarding conceptual representations, it is clear that they must drive grammatical encoding processes, as the aim of grammatical encoding is to convey this information in language. However, current models of grammatical encoding make limited claims about the nature of the conceptual representations involved. The production of grammatically acceptable sentences requires more information at the conceptual level than lexical concepts and their associated thematic roles (see Levelt, 1989). For example, the message should also encode the mood of the utterance in order to generate statements, questions or commands. Time, timing and place information should be represented for the appropriate grammatical encoding of tense, aspect and deixis. In addition, the grammatical encoder needs information about the utterance perspective, for example, the topic of the utterance and what information is new or to be focused (which is essential in dialogue). Moreover, the nature of this information may differ according to language-specific grammatical requirements. Languages differ in what information must be encoded, and how, in order to produce a grammatical sentence (e.g., Slobin, 1982). For example, number in English is marked for singular and plural, whereas Arabic also marks the dual. Such differences not only have consequences for grammatical encoding processes cross-linguistically but also raise interesting questions about the nature of messages in bilinguals (and multilinguals) and their effect on first- and second-language planning.

Current theories of grammatical encoding differ in how they model the links between conceptual representations and syntactic structure. As discussed in Section 3.1, the current evidence favours models that include direct links between conceptual and syntactic structures. However, more work is needed to build a detailed picture of this relationship. The lack of complexity at the message level in current theories is mirrored at the syntactic level. This reflects the limited range and complexity of the sentence structures that have been tested: the vast majority of structural priming studies have focused on a few syntactic alternations in a relatively small number of languages. This has, however, begun to change in recent years, with the advent of models incorporating sophisticated and detailed syntactic representations at both global and local levels of structure (e.g., Momma, 2021, 2022), based on studies testing the encoding of more complex structures and syntactic dependencies. In addition, the clear need for comparative cross-linguistic data is driving the investigation of a larger and more diverse set of languages (see Blasi et al., 2022, for a discussion).

The structural priming data in Section 2 also provide strong evidence for lexical contributions to syntactic structure generation. Some lexical effects are short-lived (the lexical boost), whereas others are longer-lasting (verb biases), and disagreement remains about how best to model them. No current model can account for all aspects of the short-term and cumulative effects observed. Moreover, many models restrict lexical–structural interaction to verbs (Momma, 2021, 2022), despite evidence for such interactions with other lexical heads (e.g., nouns; Cleland & Pickering, 2003).

What is also clear from research in this field is that there is no steady state in lexical and syntactic representations. An immense challenge for priming research is to link the effects of recent experience to learning effects more generally, in terms of both lifelong learning of a native language (e.g., Heyselaar, Wheeldon & Segaert, 2021) and the learning of second or third languages.

Lexical and syntactic contributions to the time-course of grammatical encoding were discussed in Section 3, where the focus was on the degree to which lexical or syntactic representations drive the incremental planning of sentences. The evidence has obvious parallels with the structural priming results reviewed in Section 2. The role of syntactic structure in planning scope is evident in the data, once again arguing against Linear Incrementality. Structural effects on planning have been shown for both local and distant (cross-clausal) syntactic dependencies. The data are thus more consistent with Hierarchical Incrementality, involving the generation of abstract global syntactic structures prior to the sequential construction of local constituent phrases, although the precise coordination of these levels of planning remains to be determined.

The relationship between words and structure in terms of processing scope is also unclear. Verb retrieval clearly plays an important role in the generation of syntactic structure, albeit with limitations that are beginning to be clarified. The evidence also suggests that the scope of lexical retrieval is influenced, but not fully determined, by syntactic structure. A complex set of interacting factors influences how many words we activate, retrieve and bind into a structure before speech onset. Some of these factors are non-linguistic and relate to cognitive load: the time-course of language production can be affected by processing demands within the grammatical encoding system but also by individual differences in working memory, attention and processing speed. It is also likely that different cognitive abilities have distinct effects on different components of the grammatical encoding process, and that these effects interact with population characteristics such as language profile (monolingual or multilingual) and age (younger or older adults).

Finally, the vast majority of experimental research on sentence production has (for very understandable reasons) elicited single sentences from speakers in non-interactive situations. However, the intention behind most of the language we produce is to convey information to others. The factors that have been shown to affect the time-course of grammatical encoding in monologue must also have consequences for dialogue. An important focus for future research is to determine how these consequences play out in interactive speaking situations, with all the additional linguistic complexity and cognitive demands that this entails.

Paul Warren
Victoria University of Wellington
Paul Warren is Professor of Linguistics at Victoria University of Wellington, where his teaching and research are in psycholinguistics, phonetics, and laboratory phonology. His publications include Introducing Psycholinguistics (2012) and Uptalk (2016), both published by CUP. He is a founding member of the Association for Laboratory Phonology, and a member of the Australasian Speech Science and Technology Association and the International Phonetic Association. Paul is a member of the editorial boards for Laboratory Phonology and the Journal of the International Phonetic Association, and for twenty years (2000–2019) served on the editorial board of Language and Speech.

Advisory Board

Mailce Borges Mota, University of Santa Catarina, Brazil
Yuki Hirose, University of Tokyo, Japan
Kathy Rastle, Royal Holloway, London, UK
Anna Piasecki, University of the West of England, UK
Shari Speer, The Ohio State University, USA
Andrea Weber, University of Tübingen, Germany

About the Series

This Elements series presents theoretical and empirical studies in the interdisciplinary field of psycholinguistics. Topics include issues in the mental representation and processing of language in production and comprehension, and the relationship of psycholinguistics to other fields of research. Each Element is a high-quality and up-to-date scholarly work in a compact, accessible format.

Element contents

Grammatical Encoding for Speech Production

Summary

Keywords

1 Introduction

1.1 Grammatical Encoding in Speech Production

1.1.1 The Component Processes for Speaking

1.1.2 Lexical Retrieval Processes

1.1.3 The Need for Syntax

1.1.4 Models of Grammatical Encoding: The Relationship between Words and Syntax

2 The Independence of Syntactic and Lexical Representations: Evidence from Structural Priming

2.1 Independence of Syntax from Meaning

2.2 Independence of Syntax from the Lexicon

2.2.1 The Lexical Boost in Structural Priming

2.2.2 Verb Bias and Structural Priming

2.2.3 Structural Priming in Bilinguals

2.2.4 Structural Priming in Dialogue

2.3 Conclusions

3 The Time-Course of Grammatical Encoding: Planning Scope

3.1 Evidence for Grammatical Planning Scope: Effects of Linguistic Structure

3.2 Evidence for Flexibility in Planning Scope: Effects of Non-linguistic Factors

3.2.1 Cognitive Load in Linguistic Processing

3.2.2 Individual Differences in Cognitive Abilities

3.2.3 Cognitive Load in Dialogue

3.3 Conclusions

4 Summing up

4.1 Methodological Review

4.1.1 Paradigms

4.1.2 Dependent Measures

4.2 Summary and Future Directions

Advisory Board

About the Series

References
