
A speech synthesizer for Persian text using a neural network with a smooth ergodic HMM

Published: 1 March 2005

Abstract

The feasibility of converting text into speech using an inexpensive computer with minimal memory is of great interest. Speech synthesizers have been developed for many widely spoken languages (e.g., English, Chinese, Spanish, and French), but the design of a speech synthesizer depends largely on the structure of the target language. In this article, we develop a Persian synthesizer that includes an innovative text analyzer module. In the synthesizer, the text is segmented into words and, after preprocessing, a neural network is passed over each word. In addition to preprocessing, a new model, the smooth ergodic HMM (SEHMM), is used as a postprocessor to compensate for errors generated by the neural network. The performance of the proposed model is verified, and the intelligibility of the synthetic speech is assessed via listening tests.
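
The pipeline outlined in the abstract (segment into words, pass a network over each word, then correct its output with an HMM-style postprocessor) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the helper names `segment`, `windows`, and `viterbi`, the window width, the toy labels and probabilities, and the use of plain Viterbi decoding as a stand-in for the SEHMM, whose details the abstract does not give.

```python
import math

def segment(text):
    """Split raw text into words on whitespace (hypothetical preprocessing)."""
    return text.split()

def windows(word, width=3, pad="_"):
    """Return one fixed-width context window per letter, as a network
    that is passed over the word letter by letter would see it."""
    half = width // 2
    padded = pad * half + word + pad * half
    return [padded[i:i + width] for i in range(len(word))]

def viterbi(scores, trans, labels, floor=1e-12):
    """Most likely label sequence given per-position network scores and
    label-to-label transition probabilities (uniform start assumed).
    This is generic Viterbi decoding, NOT the article's SEHMM."""
    log = lambda p: math.log(max(p, floor))
    best = {s: log(scores[0].get(s, 0.0)) for s in labels}
    back = []
    for obs in scores[1:]:
        prev, best, ptr = best, {}, {}
        for s in labels:
            # Best predecessor for label s at this position.
            p = max(labels, key=lambda r: prev[r] + log(trans.get((r, s), 0.0)))
            best[s] = prev[p] + log(trans.get((p, s), 0.0)) + log(obs.get(s, 0.0))
            ptr[s] = p
        back.append(ptr)
    # Trace back from the best final label.
    path = [max(labels, key=lambda s: best[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Toy data: the network is unsure at the middle position.
labels = ["a", "e"]
scores = [{"a": 0.9, "e": 0.1}, {"a": 0.45, "e": 0.55}, {"a": 0.9, "e": 0.1}]
trans = {("a", "a"): 0.8, ("a", "e"): 0.2, ("e", "a"): 0.2, ("e", "e"): 0.8}
smoothed = viterbi(scores, trans, labels)
```

With these toy numbers, picking the highest score at each position independently would give `a, e, a`, whereas decoding with the transition model yields `a, a, a`. This is the kind of correction a sequence-model postprocessor can make to isolated network errors.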


Published in

ACM Transactions on Asian Language Information Processing, Volume 4, Issue 1
March 2005
52 pages
ISSN: 1530-0226
EISSN: 1558-3430
DOI: 10.1145/1066078

Copyright © 2005 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
