Stochastic Models for Automatic Diacritics Generation of Arabic Names

Al-anzi, Fawaz S.

doi:10.1007/s10579-004-2323-6

Notes and Discussion
Published: November 2004

Volume 38, pages 469–481 (2004)

Computers and the Humanities

Abstract

This paper proposes two new models for generating diacritics for Arabic names. The first, called the N-gram model, is a stochastic model built on a corpus database of N-grams extracted from a large database of names, each stored with its probability of occurring at a given position in a text (Bahl et al., 1983), i.e., the probability that an N-gram occurs at position x, where x can be the first, second, ..., or ith position in the text. The second, called the Envelope model, extends the first by replacing the N-grams with their patterns. These two models are unique in being the first attempt to solve the problem of Arabic text diacritics generation using stochastic approaches with linguistic constraints that are neither grammatical nor purely lexical (Merialdo, 1991; Ney and Essen, 1991; Schukat-Talamazzini et al., 1992; Witschel and Niedermair, 1992). This methodology reduces the size and complexity of the software implementation of the proposed models and makes them easier to update and to port across platforms.
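The position-dependent N-gram statistics described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function name `positional_ngram_probs`, the choice of bigrams, the relative-frequency estimate, and the romanized toy names are all assumptions made here for illustration, and the sketch covers only the estimation of P(N-gram | position), not the diacritics generation step or the Envelope model.

```python
from collections import Counter, defaultdict


def positional_ngram_probs(names, n=2):
    """Estimate P(ngram | position) by relative frequency.

    `names` is any iterable of strings; position 0 is the first
    n-gram of a name, position 1 the second, and so on.
    (Hypothetical helper, not taken from the paper.)
    """
    counts = defaultdict(Counter)  # position -> Counter of n-grams seen there
    for name in names:
        for i in range(len(name) - n + 1):
            counts[i][name[i:i + n]] += 1

    probs = {}
    for pos, counter in counts.items():
        total = sum(counter.values())
        probs[pos] = {gram: c / total for gram, c in counter.items()}
    return probs


# Toy romanized names, assumed purely for illustration.
names = ["salim", "salma", "samir", "karim"]
probs = positional_ngram_probs(names, n=2)

print(probs[0])  # bigram distribution at the first position
print(probs[1])  # bigram distribution at the second position
```

In this toy corpus, three of the four names start with "sa", so that bigram gets probability 0.75 at the first position. A real system would need a much larger name database and some treatment of N-grams unseen at a given position.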

References

Abdoh D. (1979) Pupils Weakness in Written Arabic Texts. Symposium of Arabic Language Problem at University Levels, Kuwait University, Kuwait, pp. 5–10.
Al-Anzi, Fawaz S., Algorithmically Producing Standard Romanization of Arabic Names Using Hits from Non-standards, accepted for publication in the International Journal of Oriental Language Processing.
Ali N. (1988) Arabic Language and Computing. Arabization, Kuwait.
Ali N. (1992) Parsing and Automatic Diacritization of Written Arabic: A Breakthru. Proceedings of 13th National Computer Conference, Riyadh.
Al-Nahas M. (1981) An Entrance to Arabic Morphology. Al-Falah Library, Kuwait.
Bahrain, Kuwait, Qatar, and United Arab Emirates Official Standard Names, United States Board on Geographic Names, Defense Mapping Agency Topographic Center, Washington, DC, March 1976.
Bahl L.R., Jelinek F., Mercer R.L. (1983) A Maximum Likelihood Approach to Continuous Speech Recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence, 5(2), pp. 179–190.
Merialdo B. (1991) Tagging Text with a Probabilistic Model. Proceedings of International Conference on Acoustics, Speech, and Signal Processing,
Ney H., Essen U. (1991) On Smoothing Technique for Bigram-Based Natural Language Modeling. Proceedings of International Conference on Acoustics, Speech, and Signal Processing, Toronto, pp. 825–828.
Report on the Current status of United Nation Romanization System for Geographical Names, Compiled by UNGEGN Working Group on Romanization Systems, Version 2.2, January 2003.
Roochnik P. (1993) Computer-Based Solutions to Certain Linguistic Problems Arising from the Romanization of Arabic Names, Ph.D. Dissertation, Georgetown University.
Sadany T., Hashish M. (1988) Semi-Automatic Vowelization of Arabic Verbs. 12th Computer Conference, Saudi Arabia.
Saliba B., Al-Danan A. (1989) An Approach to Automatic Vowelization of Arabic Texts. Second Conference on Arabic Computational Linguistics, Kuwait.
Schukat-Talamazzini E.G., Niemann H., Eckert W., Kuhn T., Rieck S. (1992) Acoustic Modeling of Subword Units in the ISADORA Speech Recognizer. Proceedings of International Conference on Acoustics, Speech, and Signal Processing, San Francisco, pp. 577–580.
Second United Nations Conference on the Standardization of Geographical Names. London, 10–31 May 1972. Vol. II. Technical papers, p. 170
Witschel P., Niedermair G. (1992) Experiments in Dialogue Context Dependent Language Modeling. In Gorz G. (ed.), KONVENS 92, Springer, Berlin, pp. 395–399.

Author information

Authors and Affiliations

Department of Computer Engineering, Kuwait University, P.O. Box 5969, Safat, 13060, Kuwait
Fawaz S. Al-anzi

Corresponding author

Correspondence to Fawaz S. Al-anzi.

About this article

Cite this article

Al-anzi, F.S. Stochastic Models for Automatic Diacritics Generation of Arabic Names. Comput Hum 38, 469–481 (2004). https://doi.org/10.1007/s10579-004-2323-6

Issue Date: November 2004
DOI: https://doi.org/10.1007/s10579-004-2323-6
