Multilingual Syntax-aware Language Modeling through Dependency Tree Conversion

Kando, Shunsuke; Noji, Hiroshi; Miyao, Yusuke

arXiv:2204.08644 (cs)

[Submitted on 19 Apr 2022]

Abstract: Incorporating stronger syntactic biases into neural language models (LMs) is a long-standing goal, but research in this area often focuses on modeling English text, where constituent treebanks are readily available. Extending constituent tree-based LMs to the multilingual setting, where dependency treebanks are more common, is possible via dependency-to-constituency conversion methods. However, this raises the question of which tree formats are best for learning the model, and for which languages. We investigate this question by training recurrent neural network grammars (RNNGs) using various conversion methods, and evaluating them empirically in a multilingual setting. We examine the effect on LM performance across nine conversion methods and five languages through seven types of syntactic tests. On average, our best model achieves a 19% increase in accuracy over the worst choice across all languages. Our best model also shows an advantage over sequential/overparameterized LMs, suggesting a positive effect of syntax injection in a multilingual setting. Our experiments highlight the importance of choosing the right tree formalism, and provide insights into making an informed decision.
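
To make the idea of dependency-to-constituency conversion concrete, the following is a minimal Python sketch of one generic "flat" conversion: every head that has dependents projects a single phrase covering itself and its dependents' subtrees in surface order. This is an illustrative assumption, not one of the nine conversion methods evaluated in the paper. The function name `dependency_to_constituency`, the 1-based `heads` encoding (0 for the root), and the phrase labels (head tag plus `P`, or `X` when no tags are given) are all hypothetical choices made for this example, and the sketch assumes a projective, single-rooted tree.

```python
from collections import defaultdict


def dependency_to_constituency(tokens, heads, tags=None):
    """Convert a dependency tree into a bracketed constituency string.

    tokens: list of words in surface order.
    heads:  1-based index of each token's head; 0 marks the root.
    tags:   optional list of head tags used to label phrases (assumption:
            a phrase headed by a token tagged T is labeled "TP").

    Assumes a projective tree with exactly one root.
    """
    if len(tokens) != len(heads):
        raise ValueError("tokens and heads must have the same length")

    children = defaultdict(list)
    roots = []
    for i, h in enumerate(heads, start=1):
        if h == 0:
            roots.append(i)
        else:
            children[h].append(i)
    if len(roots) != 1:
        raise ValueError("expected exactly one root")

    def build(i):
        word = tokens[i - 1]
        deps = children.get(i, [])
        if not deps:
            # A token without dependents stays a bare terminal.
            return word
        label = tags[i - 1] + "P" if tags else "X"
        # Place the head among its dependents in surface order.
        parts = [word if j == i else build(j) for j in sorted(deps + [i])]
        return "(" + label + " " + " ".join(parts) + ")"

    return build(roots[0])


if __name__ == "__main__":
    words = ["The", "dog", "chased", "a", "cat"]
    head_ids = [2, 3, 0, 5, 3]
    print(dependency_to_constituency(words, head_ids))
    # (X (X The dog) chased (X a cat))
```

Other conversions differ in how deep the resulting trees are and how nonterminals are labeled; which choice suits a given language is the question the paper studies empirically.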

Comments:	To appear in the 6th ACL Workshop on Structured Prediction for NLP (SPNLP)
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2204.08644 [cs.CL]
	(or arXiv:2204.08644v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2204.08644

Submission history

From: Shunsuke Kando
[v1] Tue, 19 Apr 2022 03:56:28 UTC (649 KB)
