Abstract
Interlinear glossed text (IGT) is a notation used in various fields of linguistics to help readers understand the linguistic phenomena under discussion. We describe the representation of IGT data in RDF, their conversion from two popular tools (FLEx and Toolbox), and their automated linking with resources from the Linguistic Linked Open Data (LLOD) cloud. We argue that such an LLOD edition of IGT data facilitates their reusability, their infrastructural support and their integration with external data sources.
Our converters are available under an open source license, and two data sets will be published along with the final version of this paper. To the best of our knowledge, this is the first attempt to publish IGT data sets as Linguistic Linked Open Data.
Notes
- 6.
In fact, the FLEx importer also omits such data.
- 7.
The dotted lines are yet to be inferred from toolbox:has_morph.
- 8.
As defined in the lemon/ontolex vocabulary, this implicitly casts flex:morphs as ontolex:Form (expected object of ontolex:lexicalForm). To satisfy the ontolex:Form definition, we may add flex:gls rdfs:subPropertyOf ontolex:representation. Note that the rendering of a gloss as a lexical entry mirrors the way glosses are treated in FLEx and Toolbox: During annotation, a dictionary comprising all glossed forms is created. In many cases, this dictionary (and the accompanying grammar) represents the main outcome of IGT annotation.
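The subproperty declaration suggested in note 8 can be illustrated with a minimal sketch. It assumes a placeholder namespace for the flex: prefix (the actual URI is not given here), an invented morph node and gloss value, and a hand-written version of the RDFS subproperty rule instead of a real reasoner. It only shows that declaring flex:gls as a subproperty of ontolex:representation lets a gloss be read as an ontolex:representation of the same node.

```python
# Namespaces: ONTOLEX and RDFS are the published URIs;
# FLEX and DATA are hypothetical placeholders for illustration only.
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
FLEX = "http://example.org/flex#"   # assumption: real flex: namespace not stated
DATA = "http://example.org/data#"   # assumption: illustrative data namespace

SUB_PROPERTY_OF = RDFS + "subPropertyOf"


def infer_subproperties(triples):
    """Apply the RDFS rule (p subPropertyOf q), (s p o) => (s q o) until nothing new is added."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        # Collect all declared (subproperty, superproperty) pairs.
        pairs = {(s, o) for s, p, o in inferred if p == SUB_PROPERTY_OF}
        for s, p, o in list(inferred):
            for sub, sup in pairs:
                if p == sub and (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred


morph = DATA + "morph_1"  # invented morph node
triples = {
    # Schema statement proposed in note 8.
    (FLEX + "gls", SUB_PROPERTY_OF, ONTOLEX + "representation"),
    # Invented instance data: a morph with an English gloss.
    (morph, FLEX + "gls", '"house"@en'),
}

closure = infer_subproperties(triples)
for triple in sorted(closure):
    print(triple)
```

In practice an RDF library or triple store with RDFS entailment would perform this inference; the loop above only makes the rule explicit.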
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Chiarcos, C., Ionov, M., Rind-Pawlowski, M., Fäth, C., Schreur, J.W., Nevskaya, I. (2017). LLODifying Linguistic Glosses. In: Gracia, J., Bond, F., McCrae, J., Buitelaar, P., Chiarcos, C., Hellmann, S. (eds) Language, Data, and Knowledge. LDK 2017. Lecture Notes in Computer Science, vol 10318. Springer, Cham. https://doi.org/10.1007/978-3-319-59888-8_7
DOI: https://doi.org/10.1007/978-3-319-59888-8_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59887-1
Online ISBN: 978-3-319-59888-8
eBook Packages: Computer Science, Computer Science (R0)