Wide-Coverage Probabilistic Sentence Processing

Crocker, Matthew W.; Brants, Thorsten

doi:10.1023/A:1026560822390

Published: November 2000

Volume 29, pages 647–669 (2000)

Journal of Psycholinguistic Research

Matthew W. Crocker¹ &
Thorsten Brants²

Abstract

This paper describes a fully implemented, broad-coverage model of human syntactic processing. The model uses probabilistic parsing techniques, which combine phrase structure, lexical category, and limited subcategory probabilities with an incremental, left-to-right “pruning” mechanism based on cascaded Markov models. The parameters of the system are established through a uniform training algorithm, which determines maximum-likelihood estimates from a parsed corpus. The probabilistic parsing mechanism enables the system to achieve good accuracy on typical, “garden-variety” language (i.e., when tested on corpora). Furthermore, the incremental probabilistic ranking of the preferred analyses during parsing also naturally explains observed human behavior for a range of garden-path structures. We do not make strong psychological claims about the specific probabilistic mechanism discussed here, which is limited by a number of practical considerations. Rather, we argue incremental probabilistic parsing models are, in general, extremely well suited to explaining this dual nature—generally good and occasionally pathological—of human linguistic performance.
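
As a rough illustration of what maximum-likelihood estimation from a parsed corpus involves, the sketch below computes relative-frequency probabilities for phrase-structure rules over a tiny invented treebank. It is not the cascaded Markov model described in the paper: it covers only plain rule probabilities, and the tree encoding, the toy sentences, and the names `TOY_TREEBANK`, `productions`, and `estimate_rule_probs` are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical toy treebank (not from the paper): each tree is a nested tuple
# (label, child, ...), and leaves are plain word strings.
TOY_TREEBANK = [
    ("S", ("NP", ("PN", "Kim")), ("VP", ("V", "slept"))),
    ("S",
     ("NP", ("DT", "the"), ("N", "dog")),
     ("VP", ("V", "saw"), ("NP", ("PN", "Kim")))),
]


def productions(tree):
    """Yield one (lhs, rhs) pair for every internal node of a tree."""
    label, children = tree[0], tree[1:]
    # The right-hand side is the sequence of child labels (or words at leaves).
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    yield label, rhs
    for child in children:
        if isinstance(child, tuple):
            yield from productions(child)


def estimate_rule_probs(treebank):
    """Maximum-likelihood estimate: count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter(p for tree in treebank for p in productions(tree))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


probs = estimate_rule_probs(TOY_TREEBANK)
```

On this toy data, `NP -> PN` occurs in two of the three NP expansions, so it receives probability 2/3, and the probabilities of all rules sharing a left-hand side sum to 1. An incremental parser of the kind the abstract describes would use such estimates to rank competing analyses as each word arrives.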

Author information

Authors and Affiliations

Department of Computational Linguistics, Universität des Saarlandes, Saarbrücken, Germany
Matthew W. Crocker
Department of Computational Linguistics, Universität des Saarlandes, Saarbrücken, Germany
Thorsten Brants

About this article

Cite this article

Crocker, M.W., Brants, T. Wide-Coverage Probabilistic Sentence Processing. J Psycholinguist Res 29, 647–669 (2000). https://doi.org/10.1023/A:1026560822390
