A speech synthesizer for Persian text using a neural network with a smooth ergodic HMM

Authors:
F. Hendessi, Isfahan University of Technology, Isfahan, Iran
A. Ghayoori, Isfahan University of Technology, Isfahan, Iran
T. A. Gulliver, University of Victoria, Victoria, B.C., Canada

ACM Transactions on Asian Language Information Processing, Volume 4, Issue 1, pp. 38–52
https://doi.org/10.1145/1066078.1066081

Published: 01 March 2005

Abstract

The feasibility of converting text into speech using an inexpensive computer with minimal memory is of great interest. Speech synthesizers have been developed for many popular languages (e.g., English, Chinese, Spanish, and French), but the design of a speech synthesizer depends largely on the structure of the target language. In this article, we develop a Persian synthesizer that includes an innovative text analyzer module. In the synthesizer, the text is segmented into words and, after preprocessing, a neural network is passed over each word. In addition to the preprocessing, a new model, the smooth ergodic HMM (SEHMM), is used as a postprocessor to compensate for errors generated by the neural network. The performance of the proposed model is verified, and the intelligibility of the synthetic speech is assessed via listening tests.
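
The abstract names the stages of the text analyzer (word segmentation, a neural network passed over each word, an HMM postprocessor) but gives no implementation detail. The Python sketch below is therefore only an illustration of how such a pipeline could be wired together, not the authors' method. Every name in it (`segment_words`, `letter_windows`, `smoothed_transitions`, `viterbi`) is hypothetical, the whitespace segmentation and fixed-width letter window are assumptions, and the postprocessor is a generic Viterbi decoder over an additively smoothed, fully connected transition table, which stands in for the SEHMM rather than reproducing it. The network itself is not modeled: its per-letter output is represented by a list of probability dictionaries.

```python
import math


def segment_words(text):
    """Split raw text on whitespace (a stand-in for the real segmentation step)."""
    return text.split()


def letter_windows(word, radius=1, pad="_"):
    """Return one fixed-width letter window centred on each letter of the word.
    A network 'passed over' a word could take one such window per step (assumption)."""
    padded = pad * radius + word + pad * radius
    return [padded[i:i + 2 * radius + 1] for i in range(len(word))]


def smoothed_transitions(counts, states, alpha=1.0):
    """Additive smoothing of transition counts. Every state-to-state transition
    gets a nonzero probability, so the chain is fully connected (ergodic)."""
    probs = {}
    for s in states:
        total = sum(counts.get((s, t), 0) for t in states) + alpha * len(states)
        probs[s] = {t: (counts.get((s, t), 0) + alpha) / total for t in states}
    return probs


def viterbi(emissions, transitions, states):
    """Most likely state sequence under a uniform initial distribution.
    emissions: list of dicts mapping state -> probability, one per position
    (here, hypothetical per-letter network outputs)."""
    if not emissions:
        return []
    floor = 1e-12  # avoids log(0) for states the network scored as impossible
    best = {s: math.log(max(emissions[0].get(s, 0.0), floor)) for s in states}
    back = []
    for em in emissions[1:]:
        new_best, pointers = {}, {}
        for t in states:
            prev = max(states, key=lambda s: best[s] + math.log(transitions[s][t]))
            new_best[t] = (best[prev] + math.log(transitions[prev][t])
                           + math.log(max(em.get(t, 0.0), floor)))
            pointers[t] = prev
        best = new_best
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy usage: the middle position weakly favours "b", but the transition
# table strongly favours staying in the same state, so decoding corrects it.
states = ["a", "b"]
trans = smoothed_transitions({("a", "a"): 8, ("b", "b"): 8}, states)
net_out = [{"a": 0.9, "b": 0.1}, {"a": 0.45, "b": 0.55}, {"a": 0.9, "b": 0.1}]
decoded = viterbi(net_out, trans, states)
```

In this toy run, picking the top-scoring label at each position independently would give a, b, a, whereas decoding with the transition table gives a, a, a. That is the general sense in which a sequence model placed after a classifier can compensate for isolated classifier errors.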

Index Terms

A speech synthesizer for Persian text using a neural network with a smooth ergodic HMM
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Speech recognition
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks


Published in

ACM Transactions on Asian Language Information Processing Volume 4, Issue 1
March 2005
52 pages
ISSN:1530-0226
EISSN:1558-3430
DOI:10.1145/1066078

Copyright © 2005 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Hidden Markov model
TD-PSOLA
Qualifiers
- article