Data-oriented parsing with discontinuous constituents and function tags

Authors

  • Andreas van Cranenburgh, (1) Huygens ING, Royal Dutch Academy of Science; (2) Institute for Logic, Language and Computation, University of Amsterdam
  • Remko Scha, Institute for Logic, Language and Computation, University of Amsterdam
  • Rens Bod, Institute for Logic, Language and Computation, University of Amsterdam

Keywords:

discontinuous constituents, statistical parsing, tree-substitution grammar

Abstract

Statistical parsers are effective but are typically limited to producing projective dependencies or constituents. On the other hand, linguistically rich parsers recognize non-local relations and analyze both form and function phenomena but rely on extensive manual grammar development. We combine advantages of the two by building a statistical parser that produces richer analyses.

We investigate new techniques to implement treebank-based parsers that allow for discontinuous constituents. We present two systems. One system is based on a string-rewriting Linear Context-Free Rewriting System (LCFRS), while using a Probabilistic Discontinuous Tree Substitution Grammar (PDTSG) to improve disambiguation performance. Another system encodes the discontinuities in the labels of phrase structure trees, allowing for efficient context-free grammar parsing.

The two systems demonstrate that tree fragments as used in tree-substitution grammar improve disambiguation performance while capturing non-local relations on an as-needed basis. Additionally, we present results of models that produce function tags, resulting in a more linguistically adequate model of the data. We report substantial accuracy improvements in discontinuous parsing for German, English, and Dutch, including results on spoken Dutch.

DOI:

https://doi.org/10.15398/jlm.v4i1.100

Published

2016-04-13

How to Cite

van Cranenburgh, A., Scha, R., & Bod, R. (2016). Data-oriented parsing with discontinuous constituents and function tags. Journal of Language Modelling, 4(1), 57–111. https://doi.org/10.15398/jlm.v4i1.100