
A diagnostic tool for German syntax

Machine Translation

Abstract

In this paper we describe an ongoing effort to construct a catalogue of syntactic data exemplifying the major syntactic patterns of German. The purpose of the corpus is to support the diagnosis of errors in the syntactic components of natural language processing (NLP) systems. Secondary aims are the evaluation of NLP syntax components and support of theoretical and empirical work on German syntax.

The data consist of artificially and systematically constructed expressions, including negative (ungrammatical) examples. The data are organized in a relational database and annotated with basic information about the phenomena illustrated and the internal structure of the sample sentences. This organization supports systematic testing of selected areas of syntax, and the collection also serves as a linguistic database.
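To make the idea of a relational test suite concrete, the following is a minimal sketch, assuming a hypothetical three-table schema (`item`, `phenomenon`, `item_phenomenon`) in SQLite. The table and column names, the sample sentences and the `accepts` parser stub are all illustrative assumptions; they are not the paper's actual data model, database design or query language. The sketch shows how grammaticality flags on positive and negative examples let a diagnosis separate overgeneration (accepting an ungrammatical item) from undergeneration (rejecting a grammatical one) for a chosen phenomenon.

```python
import sqlite3
from typing import Callable, Dict, List, Tuple

# Hypothetical schema, for illustration only: test items linked
# to the syntactic phenomena they exemplify.
SCHEMA = """
CREATE TABLE item (
    id INTEGER PRIMARY KEY,
    sentence TEXT NOT NULL,
    wellformed INTEGER NOT NULL CHECK (wellformed IN (0, 1))
);
CREATE TABLE phenomenon (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE item_phenomenon (
    item_id INTEGER REFERENCES item(id),
    phenomenon_id INTEGER REFERENCES phenomenon(id),
    PRIMARY KEY (item_id, phenomenon_id)
);
"""

# Illustrative items; wellformed = 0 marks a negative (ungrammatical) example.
ITEMS = [
    (1, "Der Mann hilft dem Kind.", 1, "verbal government"),
    (2, "Der Mann hilft das Kind.", 0, "verbal government"),  # wrong case
    (3, "Der Mann sieht das Kind.", 1, "verbal government"),
    (4, "Peter schläft und Maria arbeitet.", 1, "coordination"),
    (5, "Peter schläft und Maria arbeiten.", 0, "coordination"),  # agreement
]


def build_db() -> sqlite3.Connection:
    """Create an in-memory database and load the sample items."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for item_id, sentence, ok, phen in ITEMS:
        conn.execute("INSERT INTO item VALUES (?, ?, ?)", (item_id, sentence, ok))
        # UNIQUE(name) makes repeated phenomena a no-op.
        conn.execute("INSERT OR IGNORE INTO phenomenon (name) VALUES (?)", (phen,))
        conn.execute(
            "INSERT INTO item_phenomenon SELECT ?, id FROM phenomenon WHERE name = ?",
            (item_id, phen),
        )
    return conn


def items_for(conn: sqlite3.Connection, phenomenon: str) -> List[Tuple[str, bool]]:
    """Select all items annotated with a phenomenon, with their grammaticality."""
    rows = conn.execute(
        "SELECT i.sentence, i.wellformed FROM item i "
        "JOIN item_phenomenon ip ON ip.item_id = i.id "
        "JOIN phenomenon p ON p.id = ip.phenomenon_id "
        "WHERE p.name = ? ORDER BY i.id",
        (phenomenon,),
    ).fetchall()
    return [(sentence, bool(ok)) for sentence, ok in rows]


def diagnose(
    conn: sqlite3.Connection, phenomenon: str, accepts: Callable[[str], bool]
) -> Dict[str, List[str]]:
    """Compare a parser's accept/reject decisions with the annotations."""
    report: Dict[str, List[str]] = {"overgeneration": [], "undergeneration": []}
    for sentence, ok in items_for(conn, phenomenon):
        accepted = accepts(sentence)
        if accepted and not ok:
            report["overgeneration"].append(sentence)
        elif ok and not accepted:
            report["undergeneration"].append(sentence)
    return report


if __name__ == "__main__":
    db = build_db()
    # A trivial stub standing in for a real parser: it accepts everything.
    print(diagnose(db, "verbal government", lambda s: True))
```

With the accept-everything stub, the report lists "Der Mann hilft das Kind." under overgeneration and nothing under undergeneration, which is exactly the kind of error that only negative examples can expose.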

The paper first motivates the need for syntactic precision in some areas of NLP and discusses the potential contribution of a syntactic database to component evaluation. The second part describes the setup and the control methods applied in constructing the sentence suite and in annotating the examples. We illustrate the approach with examples from verbal government and sentential coordination. This section also describes the abstract data model, the design of the database and the query language used to access the data. The final sections compare our work to existing approaches and sketch some future extensions.

We invite other research groups to participate in our effort, so that the diagnostic tool can eventually become public domain. Several groups have already accepted this invitation, and progress is being made.



Additional information

This work was supported by a research grant, ITW 9002 0, from the German Bundesministerium für Forschung und Technologie to the DFKI project DISCO, and by IBM Germany through the project LILOG-SB, conducted at the University of Saarbrücken. These projects are led by Hans Uszkoreit. We are grateful to Andrew White, Trinity College, Dublin, and to Francesco Infantino and Claudio Piorenza, both of the Università di Bari, for programming support in the context of student projects.


About this article

Cite this article

Nerbonne, J., Netter, K., Diagne, A.K. et al. A diagnostic tool for German syntax. Mach Translat 8, 85–107 (1993). https://doi.org/10.1007/BF00981246
