LLODifying Linguistic Glosses

  • Conference paper
Language, Data, and Knowledge (LDK 2017)

Abstract

Interlinear glossed text (IGT) is a notation used in various fields of linguistics to help readers understand linguistic phenomena. We describe the representation of IGT data in RDF, the conversion from two popular tools, and their automated linking with resources from the Linguistic Linked Open Data (LLOD) cloud. We argue that such an LLOD edition of IGT data facilitates their reusability, their infrastructural support, and their integration with external data sources.

Our converters are available under an open source license; two data sets will be published along with the final version of this paper. To the best of our knowledge, this is the first attempt to publish IGT data sets as Linguistic Linked Open Data.

Notes

  1. http://fieldworks.sil.org/flex.

  2. http://www-01.sil.org/computing/toolbox.

  3. http://annotation.exmaralda.org/index.php?title=Advanced_Glossing.

  4. http://www-01.sil.org/computing/catalog/show_software.asp?id=79.

  5. http://fieldworks.sil.org/flex.

  6. In fact, the FLEx importer also omits such data.

  7. The dotted lines are yet to be inferred from toolbox:has_morph.

  8. As defined in the lemon/ontolex vocabulary, this implicitly casts flex:morphs as ontolex:Form (expected object of ontolex:lexicalForm). To satisfy the ontolex:Form definition, we may add flex:gls rdfs:subPropertyOf ontolex:representation. Note that the rendering of a gloss as a lexical entry mirrors the way glosses are treated in FLEx and Toolbox: During annotation, a dictionary comprising all glossed forms is created. In many cases, this dictionary (and the accompanying grammar) represents the main outcome of IGT annotation.

  9. http://dbserver.acoli.cs.uni-frankfurt.de:5000/search/?query=&originLang=&targetLang=trk.

  10. https://github.com/xigt/xigt/wiki.

  11. https://github.com/glottobank/cldf/issues/10.
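
The reasoning in note 8 can be illustrated with a minimal sketch in plain Python. It is not the authors' converter. It applies two standard RDFS entailment patterns (subproperty and range) to a handful of triples to show how a morph linked via ontolex:lexicalForm would be typed as ontolex:Form, and how a flex:gls value would also count as an ontolex:representation. The flex: and example namespace IRIs, the entry and morph identifiers, and the gloss value are all invented for illustration; only the ontolex, rdf and rdfs namespaces are the real ones.

```python
# Namespaces. FLEX and EX are hypothetical placeholders; the source only
# gives the prefix "flex:", not its IRI.
FLEX = "http://example.org/flex#"                 # assumption
EX = "http://example.org/data/"                   # assumption
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBPROPERTY = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"

ENTRY = EX + "entry/ev"    # invented lexical entry
MORPH = EX + "morph/ev"    # invented morph
GLOSS = '"house"@en'       # invented gloss literal, kept as a plain string


def rdfs_closure(triples):
    """Apply subPropertyOf and range entailment until nothing new appears."""
    inferred = set(triples)
    while True:
        sub_props = {(s, o) for s, p, o in inferred if p == RDFS_SUBPROPERTY}
        ranges = {(s, o) for s, p, o in inferred if p == RDFS_RANGE}
        new = set()
        for s, p, o in inferred:
            # (s p o), (p subPropertyOf q)  =>  (s q o)
            for sub, sup in sub_props:
                if p == sub:
                    new.add((s, sup, o))
            # (s p o), (p range C)  =>  (o rdf:type C)
            for prop, cls in ranges:
                if p == prop:
                    new.add((o, RDF_TYPE, cls))
        if new <= inferred:
            return inferred
        inferred |= new


triples = {
    # Range of ontolex:lexicalForm as stated in the ontolex vocabulary.
    (ONTOLEX + "lexicalForm", RDFS_RANGE, ONTOLEX + "Form"),
    # The axiom proposed in note 8.
    (FLEX + "gls", RDFS_SUBPROPERTY, ONTOLEX + "representation"),
    # Invented instance data.
    (ENTRY, ONTOLEX + "lexicalForm", MORPH),
    (MORPH, FLEX + "gls", GLOSS),
}

closure = rdfs_closure(triples)
for triple in sorted(closure - triples):
    print(triple)
```

In a real setting an RDF library and a reasoner would replace the hand-written loop; the sketch only makes the two inference steps explicit.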

Author information

Corresponding author

Correspondence to Christian Chiarcos.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Chiarcos, C., Ionov, M., Rind-Pawlowski, M., Fäth, C., Schreur, J.W., Nevskaya, I. (2017). LLODifying Linguistic Glosses. In: Gracia, J., Bond, F., McCrae, J., Buitelaar, P., Chiarcos, C., Hellmann, S. (eds) Language, Data, and Knowledge. LDK 2017. Lecture Notes in Computer Science, vol 10318. Springer, Cham. https://doi.org/10.1007/978-3-319-59888-8_7

  • DOI: https://doi.org/10.1007/978-3-319-59888-8_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59887-1

  • Online ISBN: 978-3-319-59888-8

  • eBook Packages: Computer Science, Computer Science (R0)
