Informatics and Applications

2014, Volume 8, Issue 2, pp 98-110

INFORMATION TECHNOLOGIES FOR CORPUS STUDIES:
UNDERPINNINGS FOR CROSS-LINGUISTIC DATABASE CREATION

  • N. V. Buntman
  • Anna A. Zaliznyak
  • I. M. Zatsman
  • M. G. Kruzhkov
  • E. Yu. Loshchilova
  • D. V. Sitchinava

Abstract

An information technology for creating cross-linguistic databases of Russian texts with French translations (also known as parallel texts) is considered. The underlying principles of the developed database provide a unique combination of three types of bilingual search: lexical, grammatical, and lexico-grammatical. A distinctive feature of the technology is the simultaneous creation of a Russian-French parallel subcorpus within the National Russian Corpus and of a cross-linguistic database of Russian verbal lexico-grammatical forms and their French functional equivalents. The subcorpus and the database are aligned at different levels: the former at the level of sentences, and the latter at the level of constructions. The academic relevance of the database lies in its support for developing a bilingual contrastive grammar and in its role in creating a Russian grammar grounded in a modern empirical base and the information technologies of corpus linguistics. The main practical application of the database is improving the quality of machine translation.
