Stochastic Models for Automatic Diacritics Generation of Arabic Names

  • Notes and Discussion
Computers and the Humanities

Abstract

In this paper, two new models for generating diacritics for Arabic names are proposed. The first, called the N-gram model, is a stochastic model based on building a corpus database of N-grams extracted from a large database of names, each stored with its probability according to the N-gram's position in a text (Bahl et al., 1983), i.e., the probability that the N-gram occurs at position x, where x can be the first, second, ..., or ith position in the text. Replacing the N-grams with their patterns extends the first model into the second proposed stochastic model, called the Envelope model. These two models are unique in being the first attempt to solve the problem of Arabic text diacritics generation using linguistically constrained stochastic approaches that are neither grammar-based nor purely lexicon-based (Merialdo, 1991; Ney and Essen, 1991; Schukat-Talamazzini et al., 1992; Witschel and Niedermair, 1992). This methodology helps reduce the size and complexity of the software implementation of the proposed models and makes them easier to update and port across different platforms.
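The position-dependent N-gram probabilities described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `positional_ngram_probs`, the toy romanized names, and the choice to treat "position" as a zero-based character offset are all assumptions made only for illustration. The sketch covers the probability table alone, not diacritics generation or the Envelope model.

```python
from collections import Counter, defaultdict


def positional_ngram_probs(names, n=2):
    """Estimate P(ngram | position) from a list of names.

    Hypothetical helper: "position" is assumed to be the zero-based
    character offset at which the N-gram starts.
    """
    counts = defaultdict(Counter)  # position -> Counter of N-grams seen there
    for name in names:
        for pos in range(len(name) - n + 1):
            counts[pos][name[pos:pos + n]] += 1

    probs = {}
    for pos, counter in counts.items():
        total = sum(counter.values())
        # Relative frequency of each N-gram among all N-grams at this position.
        probs[pos] = {gram: c / total for gram, c in counter.items()}
    return probs


# Toy, made-up corpus (romanized for readability; not data from the paper).
toy_names = ["salim", "salem", "samir", "karim"]
probs = positional_ngram_probs(toy_names, n=2)
print(probs[0])  # distribution of bigrams that start a name
```

In this toy corpus, three of the four names start with the bigram "sa", so its estimated probability at the first position is 0.75. A real system would build the table from a large database of diacritized names and would likely need smoothing for unseen N-grams.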

References

  • Abdoh D. (1979) Pupils Weakness in Written Arabic Texts. Symposium of Arabic Language Problem at University Levels, Kuwait University, Kuwait, pp. 5–10.

  • Al-Anzi F.S. Algorithmically Producing Standard Romanization of Arabic Names Using Hits from Non-standards. Accepted for publication in the International Journal of Oriental Language Processing.

  • Ali N. (1988) Arabic Language and Computing. Arabization, Kuwait.

  • Ali N. (1992) Parsing and Automatic Diacritization of Written Arabic: A Breakthru. Proceedings of 13th National Computer Conference, Riyadh.

  • Al-Nahas M. (1981) An Entrance to Arabic Morphology. Al-Falah Library, Kuwait.

  • Bahrain, Kuwait, Qatar, and United Arab Emirates Official Standard Names, United States Board on Geographic Names, Defense Mapping Agency Topographic Center, Washington, DC, March 1976.

  • Bahl L.R., Jelinek F., Mercer R.L. (1983) A Maximum Likelihood Approach to Continuous Speech Recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence, 5(2), pp. 179–190.

  • Merialdo B. (1991) Tagging Text with a Probabilistic Model. Proceedings of International Conference on Acoustics, Speech, and Signal Processing,

  • Ney H., Essen U. (1991) On Smoothing Technique for Bigram-Based Natural Language Modeling. Proceedings of International Conference on Acoustics, Speech, and Signal Processing, Toronto, pp. 825–828.

  • Report on the Current Status of United Nations Romanization Systems for Geographical Names, Compiled by UNGEGN Working Group on Romanization Systems, Version 2.2, January 2003.

  • Roochnik P. (1993) Computer-Based Solutions to Certain Linguistic Problems Arising from the Romanization of Arabic Names, Ph.D. Dissertation, Georgetown University.

  • Sadany T., Hashish M. (1988) Semi-Automatic Vowelization of Arabic Verbs. 12th Computer Conference, Saudi Arabia.

  • Saliba B., Al-Danan A. (1989) An Approach to Automatic Vowelization of Arabic Texts. Second Conference on Arabic Computational Linguistics, Kuwait.

  • Schukat-Talamazzini E.G., Niemann H., Eckert W., Kuhn T., Rieck S. (1992) Acoustic Modeling of Subword Units in the ISADORA Speech Recognizer. Proceedings of International Conference on Acoustics, Speech, and Signal Processing, San Francisco, pp. 577–580.

  • Second United Nations Conference on the Standardization of Geographical Names. London, 10–31 May 1972. Vol. II. Technical papers, p. 170

  • Witschel P., Niedermair G. (1992) Experiments in Dialogue Context Dependent Language Modeling. In Görz G. (ed.), KONVENS 92. Springer, Berlin, pp. 395–399.

Author information

Correspondence to Fawaz S. Al-anzi.

About this article

Cite this article

Al-anzi, F.S. Stochastic Models for Automatic Diacritics Generation of Arabic Names. Comput Hum 38, 469–481 (2004). https://doi.org/10.1007/s10579-004-2323-6

  • DOI: https://doi.org/10.1007/s10579-004-2323-6
