Learning Domain-Specific Grammars from a Small Number of Examples

Lange, Herbert; Ljunglöf, Peter

doi:10.1007/978-3-030-63787-3_4

Herbert Lange & Peter Ljunglöf

Chapter
First Online: 26 March 2021

Part of the book series: Studies in Computational Intelligence (SCI, volume 939)

Abstract

In this chapter we investigate the problem of grammar learning from a perspective that diverges from previous approaches. Prevailing approaches to learning grammars usually attempt to infer a grammar directly from example corpora without any additional information. This either requires a large training set or suffers from low accuracy. We instead view learning grammars as a problem of grammar restriction or subgrammar extraction. We start from a large-scale grammar (called a resource grammar) and a small number of example sentences, and find a subgrammar that still covers all the examples. To accomplish this, we formulate the problem as a constraint satisfaction problem and use a constraint solver to find the optimal grammar. We ran experiments with English, Finnish, German, Swedish, and Spanish, which show that 10–20 examples are often sufficient to learn an interesting grammar for a specific application. We also present two extensions to this basic method: we include negative examples and allow rules to be merged. The resulting grammars can cover specific linguistic phenomena more precisely. Our method, together with the extensions, can be used to provide a grammar learning system for specific applications. This system is easy to use, human-centric, and can be used by non-syntacticians. Based on this grammar learning method, we can build applications for computer-assisted language learning and interlingual communication, which rely heavily on the knowledge of language and domain experts who often lack the competence to develop the required grammars themselves.
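
The subgrammar extraction idea in the abstract can be illustrated with a small sketch. It is not the chapter's implementation: the chapter formulates the task as a constraint satisfaction problem and hands it to a constraint solver, whereas the sketch below simply enumerates candidate rule sets by brute force, which is only feasible for toy inputs. It assumes that every example sentence has already been parsed with the resource grammar, that each parse tree is represented by the set of rules it uses (a "flattened" tree), and that the objective is to minimise the number of rules. The function name `extract_subgrammar` and the rule names are hypothetical.

```python
from itertools import combinations


def extract_subgrammar(examples):
    """Return a smallest rule set that covers every example sentence.

    examples: list of sentences; each sentence is a list of candidate
    parse trees; each tree is a frozenset of rule names (its flattened form).
    A sentence counts as covered if at least one of its trees uses only
    rules from the chosen set. Returns None if some sentence has no trees.
    """
    # Candidate rules: the union of all flattened trees.
    all_rules = sorted(set().union(*(tree for trees in examples for tree in trees)))

    # Brute force (assumption: stands in for the constraint solver).
    # Try rule sets in order of increasing size, so the first hit is minimal.
    for size in range(len(all_rules) + 1):
        for subset in combinations(all_rules, size):
            chosen = set(subset)
            if all(any(tree <= chosen for tree in trees) for trees in examples):
                return chosen
    return None


# Toy input with hypothetical rule names: sentence 1 is ambiguous
# (two trees), sentence 2 has a single tree.
examples = [
    [frozenset({"PredVP", "UseN", "UseV"}),
     frozenset({"PredVP", "UsePN", "UseV", "AdvVP"})],
    [frozenset({"PredVP", "UseN", "ComplV"})],
]

subgrammar = extract_subgrammar(examples)
print(sorted(subgrammar))
```

For the ambiguous first sentence, the search keeps the analysis that shares the most rules with the second sentence, so the resulting rule set is smaller than the union of all trees. A real solver would express the same coverage condition as constraints over one Boolean variable per rule and per tree.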

Notes

1. The size of a tree is the number of nodes in the tree.
2. http://www.cplex.com/.
3. https://www.gnu.org/software/glpk/.
4. Thus \(F_\mathbf{G}\) is a subset of the union of all flattened syntax trees, or \(F_\mathbf{G} \subseteq t_{11_\text{flat}} \cup t_{12_\text{flat}} \cup \dots \cup t_{nm_\text{flat}}\).
5. The same abstract grammar is used to describe multiple languages in parallel.
6. https://github.com/MUSTE-Project/subgrammar-extraction.
7. The examples are in English but the problems we approach are language independent.
8. Usually the alphabet is denoted with the letter \(\Sigma\); to avoid naming conflicts with signatures, we use the letter V instead.
9. For example, "I saw the building with the telescope" again has two syntactic analyses, and each one has a plausible semantic interpretation.
10. For some people that would even be as likely as pineapple as a topping.

References

Bod, R.: A computational model of language performance: data oriented parsing. In: COLING’92, 14th International Conference on Computational Linguistics. Nantes, France (1992). https://www.aclweb.org/anthology/papers/C/C92/C92-3126/
Bod, R., Scha, R.: Data-oriented language processing. In: Young, S., Bloothooft, G. (eds.) Corpus-based Methods in Language and Speech Processing, Text, Speech, and Language Technology 2, chap. 5, pp. 137–174. ELSNET: Kluwer, Dordrecht (1997). https://doi.org/10.1007/978-94-017-1183-8
Bresnan, J.: Lexical-Functional Syntax. Blackwell Textbooks in Linguistics. Blackwell, Malden, Mass (2001). https://doi.org/10.1002/9781119105664
Claessen, K.: SAT+ (2018). https://github.com/koengit/satplus. Accessed 25 June 2020
Clark, A.: Unsupervised induction of stochastic context free grammars using distributional clustering. In: CoNLL, the ACL 2001 Workshop on Computational Natural Language Learning (2001). https://www.aclweb.org/anthology/W01-0713
Clark, A., Lappin, S.: Unsupervised learning and grammar induction. In: Clark, A., Fox, C., Lappin, S. (eds.) The Handbook of Computational Linguistics and Natural Language Processing, chap. 8, pp. 197–220. Wiley-Blackwell, Oxford (2010). https://doi.org/10.1002/9781444324044.ch8
Clark, A., Yoshinaka, R.: Distributional learning of parallel multiple context-free grammars. Mach. Learn. 96(1–2), 5–31 (2014). https://doi.org/10.1007/s10994-013-5403-2
DELPH-IN: Deep linguistic processing with HPSG (DELPH-IN) (2020). http://moin.delph-in.net/GrammarCatalogue. Accessed 25 June 2020
D’Ulizia, A., Ferri, F., Grifoni, P.: A survey of grammatical inference methods for natural language learning. Artif. Intell. Rev. 36, 1–27 (2011). https://doi.org/10.1007/s10462-010-9199-1
Eén, N., Sörensson, N.: An extensible SAT-solver. In: Giunchiglia, E., Tacchella, A. (eds.) Theory and Applications of Satisfiability Testing, pp. 502–518. Springer, Berlin (2003). https://doi.org/10.1007/978-3-540-24605-3_37
Fuchs, N.E., Schwitter, R.: Specifying logic programs in controlled natural language. In: CLNLP’95, Workshop on Computational Logic for Natural Language Processing. University of Edinburgh, Edinburgh (1995)
Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co, New York, USA (1979). https://doi.org/10.5555/574848
Henschel, R.: Application-driven automatic subgrammar extraction. In: Computational Environments for Grammar Development and Linguistic Engineering (1997). https://www.aclweb.org/anthology/W97-1507
Imada, K., Nakamura, K.: Learning context free grammars by using SAT solvers. In: ICMLA 2009, International Conference on Machine Learning and Applications, pp. 267–272 (2009). https://doi.org/10.1109/ICMLA.2009.28
Joshi, A.K., Schabes, Y.: Tree-adjoining grammars. In: Rozenberg, G., Salomaa, A. (eds.) Handbook of Formal Languages, vol. 3, pp. 69–123. Springer, Berlin, Heidelberg (1997). https://doi.org/10.1007/978-3-642-59126-6_2
Kaplan, R.M., Bresnan, J.: Lexical-functional grammar: a formal system for grammatical representations. In: Bresnan, J. (ed.) The Mental Representation of Grammatical Relations, pp. 173–281. MIT Press, Cambridge, MA (1982)
Karp, R.M.: Reducibility among combinatorial problems. In: Miller, R.E., Thatcher, J.W., Bohlinger, J. (eds.) Complexity of Computer Computations, pp. 85–103. Plenum, New York, USA (1972). https://doi.org/10.1007/978-1-4684-2001-2_9
Kešelj, V., Cercone, N.: A formal approach to subgrammar extraction for NLP. Math. Comput. Modell. 45(3), 394–403 (2007). https://doi.org/10.1016/j.mcm.2006.06.001
Lange, H.: Computer-assisted language learning with grammars. A case study on Latin learning. Licentiate thesis, Department of Computer Science and Engineering, University of Gothenburg, Gothenburg, Sweden (2018). https://gup.ub.gu.se/publication/269655
Lange, H., Ljunglöf, P.: MULLE: A grammar-based Latin language learning tool to supplement the classroom setting. In: NLPTEA 2018, 5th Workshop on Natural Language Processing Techniques for Educational Applications, pp. 108–112. Melbourne, Australia (2018). http://aclweb.org/anthology/W18-3715
Lange, H., Ljunglöf, P.: Putting control into language learning. In: CNL 2018, 6th International Workshop on Controlled Natural Languages, Frontiers in Artificial Intelligence and Applications, vol. 304, pp. 61–70. IOS Press, Maynooth, Ireland (2018). https://doi.org/10.3233/978-1-61499-904-1-61
Lange, H., Ljunglöf, P.: Learning domain-specific grammars from a small number of examples. In: ICAART 2020, 12th International Conference on Agents and Artificial Intelligence, vol. 1, pp. 422–430. INSTICC, SciTePress, Valletta, Malta (2020). https://doi.org/10.5220/0009371304220430
Lari, K., Young, S.: The estimation of stochastic context-free grammars using the inside-outside algorithm. Comput. Speech Lang. 4(1), 35–56 (1990). https://doi.org/10.1016/0885-2308(90)90022-X
Ljunglöf, P.: Expressivity and Complexity of the Grammatical Framework. Ph.D. thesis, University of Gothenburg, Gothenburg, Sweden (2004). https://gup.ub.gu.se/publication/10794
Loukanova, R.: An approach to functional formal models of constraint-based lexicalized grammar. Fundam. Inform. 152(4), 341–372 (2017). https://doi.org/10.3233/FI-2017-1524
Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA (1999)
Pereira, F., Schabes, Y.: Inside-outside reestimation from partially bracketed corpora. In: ACL 1992, 30th Annual Meeting of the Association for Computational Linguistics, pp. 128–135. Newark, Delaware, USA (1992). https://www.aclweb.org/anthology/P92-1017
Pollard, C.J.: Head-driven Phrase Structure Grammar. Studies in Contemporary linguistics. University of Chicago Press, Chicago (1994)
Ranta, A.: GF: A multilingual grammar formalism. Lang. Linguist. Compass 3(5), 1242–1265 (2009). https://doi.org/10.1111/j.1749-818X.2009.00155.x
Ranta, A.: The GF resource grammar library. Linguist. Issues Lang. Technol. 2(2), 1–63 (2009). https://journals.linguisticsociety.org/elanguage/lilt/article/view/214.html
Ranta, A.: Grammatical Framework: Programming with Multilingual Grammars. CSLI Publications (2011). https://www.grammaticalframework.org/gf-book/
Ranta, A.: Implementing Programming Languages. An Introduction to Compilers and Interpreters. College Publications (2012). http://www.grammaticalframework.org/ipl-book/
Ranta, A., Angelov, K., Höglind, R., Axelsson, C., Sandsjö, L.: A mobile language interpreter app for prehospital/emergency care. In: Medicinteknikdagarna. Västerås, Sweden (2017). http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-13366
Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach, 3rd edn. Prentice Hall, Upper Saddle River (2009). http://aima.cs.berkeley.edu/
Sag, I.A., Wasow, T., Bender, E.M.: Syntactic theory: A formal introduction, 2nd edn. No. 152 in CSLI Lecture Notes. Center for the Study of Language and Information, Stanford, CA (2003)
Wirsing, M.: Algebraic specification. In: van Leeuwen, J. (ed.) Handbook of Theoretical Computer Science, vol. B, chap. 13, pp. 675–788. Elsevier, MIT Press, Cambridge (1990)
XMG: eXtensible MetaGrammar (2017). http://xmg.phil.hhu.de/. Accessed 25 June 2020

Acknowledgements

We want to thank Koen Claessen for inspiration and help with the CSP formulation, Krasimir Angelov and Thierry Coquand for pointing us in the direction of many-sorted algebras as a means of formalizing abstract grammars, and three anonymous reviewers for many constructive comments. This chapter is an extended version of [22] presented at the Special Session NLPinAI 2020 at the 12th International Conference on Agents and Artificial Intelligence (ICAART 2020). The work reported in this chapter was supported by the Swedish Research Council, project 2014-04788 (MUSTE: Multimodal semantic text editing).

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, Gothenburg, Sweden
Herbert Lange & Peter Ljunglöf

Authors

Herbert Lange
Peter Ljunglöf

Corresponding author

Correspondence to Herbert Lange.

Editor information

Editors and Affiliations

Department of Algebra and Logic, Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia, Bulgaria
Roussanka Loukanova

About this chapter

Cite this chapter

Lange, H., Ljunglöf, P. (2021). Learning Domain-Specific Grammars from a Small Number of Examples. In: Loukanova, R. (eds) Natural Language Processing in Artificial Intelligence—NLPinAI 2020. Studies in Computational Intelligence, vol 939. Springer, Cham. https://doi.org/10.1007/978-3-030-63787-3_4

DOI: https://doi.org/10.1007/978-3-030-63787-3_4
Published: 26 March 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63786-6
Online ISBN: 978-3-030-63787-3
eBook Packages: Intelligent Technologies and Robotics (R0)
