ABSTRACT
The algorithm takes as its only input a list of words, preferably but not necessarily in phonemic transcription, in any two putatively related languages, and sorts it into decreasing order of probable cognation. Processing a 250-item bilingual list takes about five seconds of CPU time on a DEC KL1091 and requires 56 pages of core memory. The algorithm is given no information whatsoever about the phonemic transcription used, and it identifies cognates solely by a context-free, one-for-one matching of individual characters. Even so, a trained linguist using more information betters its cognation decisions only for wordlists that share fewer than 40% cognates and involve complex, multiple sound correspondences.
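The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the author's procedure. It assumes the bilingual list is a sequence of same-meaning word pairs, aligns characters purely by position (so it ignores insertions and deletions), and scores each pair by how much more often its character correspondences recur across the whole list than chance would predict. The function name `rank_by_cognation` and the observed-minus-expected score are illustrative assumptions.

```python
from collections import Counter


def rank_by_cognation(pairs):
    """Sort (word_a, word_b) pairs from most to least probably cognate.

    Hypothetical sketch: no knowledge of the transcription is used, and
    characters are matched one-for-one by position, without context.
    """
    joint, left, right = Counter(), Counter(), Counter()
    for a, b in pairs:
        for x, y in zip(a, b):          # position-wise, one-for-one matching
            joint[(x, y)] += 1
            left[x] += 1
            right[y] += 1
    total = sum(joint.values())

    def score(pair):
        aligned = list(zip(*pair))
        if not aligned:
            return float("-inf")        # nothing to compare
        # Observed count of each correspondence minus the count expected
        # if characters in the two languages were paired at random.
        excess = sum(
            joint[(x, y)] - left[x] * right[y] / total for x, y in aligned
        )
        return excess / len(aligned)

    # sorted() is stable, so equally scored pairs keep their input order.
    return sorted(pairs, key=score, reverse=True)


if __name__ == "__main__":
    # Toy, made-up data: three pairs share a recurring p~f correspondence.
    toy = [("pa", "fa"), ("pi", "fi"), ("po", "fo"), ("ka", "xu")]
    for pair in rank_by_cognation(toy):
        print(pair)
```

A raw count difference is used here rather than a log-ratio because a log-ratio would reward correspondences seen only once, which are exactly the ones that carry no evidence of a regular sound correspondence.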
- An algorithm for identifying cognates between related languages