Abstract
In this paper, we describe tools and resources for the study of African languages developed at the Collaborative Research Centre 632 “Information Structure”. These include deeply annotated data collections of 25 sub-Saharan languages that are described together with their annotation scheme, as well as the corpus tool ANNIS, which provides unified access to a broad variety of annotations created with a range of different tools. With the application of ANNIS to several African data collections, we illustrate its suitability for the purpose of language documentation, distributed access, and the creation of data archives.
Notes
Tense/Aspect/Modality, cf. the discussion of auxiliary focus in Hyman and Watters (1984).
We use the open source database management system PostgreSQL (http://www.postgresql.org).
In the Hausar Baka corpus, nominal chunks are currently not annotated, so CHUNK="NC" substitutes for a variety of templates matching nominal chunks.
References
Brants, T., & Plaehn, O. (2000). Interactive corpus annotation. In Proceedings of the second international conference on language resources and evaluation (LREC-2000) (pp. 453–459). Athens, Greece.
Busemann, A., & Busemann, K. (2008). Toolbox self-training. Tech. rep., Summer Institute of Linguistics (SIL). http://www.sil.org/ (Version 1.5.4 Oct 2008).
Chafe, W. L. (1976). Givenness, contrastiveness, definiteness, subjects, topics and point of view. In C. N. Li (Ed.), Subject and topic (pp. 27–55). New York: Academic Press.
Chiarcos, C., Dipper, S., Götze, M., Leser, U., Lüdeling, A., Ritz, J., & Stede, M. (2008). A flexible framework for integrating annotations from different tools and tag sets. Traitement Automatique des Langues, 49(2), 271–293.
Crysmann, B. (2009). Autosegmental representations in an HPSG of Hausa. In Proceedings of the ACL-IJCNLP workshop on grammar engineering across frameworks (GEAF 2009) (pp. 28–36). Singapore.
Dipper, S. (2005). XML-based stand-off representation and exploitation of multi-level linguistic annotation. In R. Eckstein & R. Tolksdorf (Eds.), Proceedings of Berliner XML Tage (pp. 39–50).
Dipper, S., & Götze, M. (2005). Accessing heterogeneous linguistic data—generic XML-based representation and flexible visualization. In Proceedings of the 2nd language and technology conference 2005 (pp. 23–30). Poznan, Poland.
Dipper, S., Götze, M., & Skopeteas, S. (Eds.) (2007). Information structure in cross-linguistic corpora: Annotation guidelines for phonology, morphology, syntax, semantics, and information structure. Interdisciplinary Studies on Information Structure 7. Potsdam: Universitätsverlag Potsdam.
Fiedler, I. (2009). Contrastive topic marking in Gbe. In Current issues in unity and diversity of languages. Collection of papers selected from the CIL 18 (pp. 295–308). Seoul: The Linguistic Society of Korea.
Fiedler, I., Hartmann, K., Reineke, B., Schwarz, A., & Zimmermann, M. (2010). Subject focus in West African languages. In M. Zimmermann & C. Féry (Eds.), Information structure: Theoretical, typological, and experimental perspectives (pp. 234–257). Oxford: Oxford University Press.
Green, M., & Jaggar, P. (2003). Ex-situ and in-situ focus in Hausa: syntax, semantics and discourse. In J. Lecarme (Ed.), Research in Afroasiatic grammar 2 (current issues in linguistic theory) (pp. 187–213). Amsterdam: John Benjamins.
Hartmann, K., & Zimmermann, M. (2007a). Focus strategies in Chadic: The case of Tangale revisited. Studia Linguistica, 61(2), 95–129.
Hartmann, K., & Zimmermann, M. (2007b). In place—Out of place? Focus in Hausa. In K. Schwabe & S. Winkler (Eds.), On information structure, meaning and form: Generalizing across languages (pp. 365–403). Amsterdam: John Benjamins.
Hartmann, K., & Zimmermann, M. (2009). Morphological focus marking in Gùrùntùm (West Chadic). Lingua, 119(9), 1340–1365.
Hellwig, B., Van Uytvanck, D., & Hulsbosch, M. (2008). ELAN Linguistic annotator. Tech. rep., Max Planck Institute. http://www.lat-mpi.eu/tools/elan/ (June 13, 2011).
Hyman, L., & Watters, J. (1984). Auxiliary focus. Studies in African Linguistics, 15, 233–273.
Krifka, M. (2008). Basic notions of information structure. Acta Linguistica Hungarica, 55, 243–276.
Müller, C., & Strube, M. (2006). Multi-level annotation of linguistic data with MMAX2. In S. Braun, K. Kohn, & J. Mukherjee (Eds.), Corpus technology and language pedagogy: New resources, new tools, new methods (pp. 197–214). Frankfurt: Peter Lang.
Newman, P. (2000). The Hausa language. An encyclopedic reference grammar. New Haven: Yale University Press.
O’Donnell, M. (2000). RSTTool 2.4—A markup tool for rhetorical structure theory. In Proceedings of the international natural language generation conference (INLG’2000) (pp. 253–256). Mitzpe Ramon, Israel.
Orasan, C. (2003). PALinkA: a highly customisable tool for discourse annotation. In Proceedings of the 4th SIGdial workshop on discourse and dialogue (pp. 39–43). Sapporo, Japan.
Randell, R., Bature, A., & Schuh, R. (1998). Hausar Baka. http://www.humnet.ucla.edu/humnet/aflang/hausarbaka/ (June 13, 2011).
Schmidt, T. (2004). Transcribing and annotating spoken language with EXMARaLDA. In Proceedings of the LREC-workshop on XML based richly annotated corpora, Lisbon 2004 (pp. 69–74). Paris: ELRA.
Schwarz, A. (2010). Verb-and-predication focus markers in Gur. In I. Fiedler & A. Schwarz (Eds.), The expression of information structure. A documentation of its diversity across Africa (Typological Studies in Language 91) (pp. 287–314). Amsterdam/Philadelphia: John Benjamins.
Schwarz, A., & Fiedler, I. (2007). Narrative focus strategies in Gur and Kwa. In E. Aboh, K. Hartmann, & M. Zimmermann (Eds.), Focus strategies in African languages. The interaction of focus and grammar in Niger-Congo and Afro-Asiatic (pp. 267–286). Berlin: Mouton de Gruyter.
Skopeteas, S., Fiedler, I., Hellmuth, S., Schwarz, A., Stoel, R., Fanselow, G., Féry, C., & Krifka, M. (2006). Questionnaire on information structure (QUIS). Interdisciplinary studies on information structure 4. Potsdam: Universitätsverlag Potsdam.
Witten, I. H., & Frank, E. (2005). Data mining: Practical machine learning tools and techniques (2nd edn). San Francisco: Morgan Kaufmann.
Zeldes, A., Ritz, J., Lüdeling, A., & Chiarcos, C. (2009). A search tool for multi-layer annotated corpora. In Proceedings of corpus linguistics 2009. Liverpool, UK.
Zimmermann, M. (2008). Contrastive focus and emphasis. Acta Linguistica Hungarica, 55, 347–360.
Zipser, F., & Romary, L. (2010). A model oriented approach to the mapping of annotation formats using standards. In Proceedings of the workshop on language resource and language technology standards, LREC 2010 (pp. 7–18). Malta.
Additional information
The Collaborative Research Centre 632 “Information Structure: the linguistic means for structuring utterances, sentences and texts” is funded by the German Research Foundation. The project associations are as follows: A5 (Focus from a cross-linguistic perspective, Mira Grubic, Malte Zimmermann), B1 (Gur and Kwa languages, Ines Fiedler, Katharina Hartmann, Anne Schwarz), B2 (Chadic languages, Katharina Hartmann), D1 (Linguistic database, Christian Chiarcos, Julia Ritz, Amir Zeldes).
About this article
Cite this article
Chiarcos, C., Fiedler, I., Grubic, M. et al. Information structure in African languages: corpora and tools. Lang Resources & Evaluation 45, 361–374 (2011). https://doi.org/10.1007/s10579-011-9153-0
DOI: https://doi.org/10.1007/s10579-011-9153-0