Abstract
Converting text into speech on an inexpensive computer with minimal memory is of great practical interest. Speech synthesizers have been developed for many widely spoken languages (e.g., English, Chinese, Spanish, and French), but the design of a synthesizer depends largely on the structure of the target language. In this article, we develop a Persian synthesizer that includes an innovative text analyzer module. The synthesizer segments the text into words and, after preprocessing, passes a neural network over each word. In addition to the preprocessing, a new model, the smooth ergodic HMM (SEHMM), is used as a postprocessor to compensate for errors generated by the neural network. The performance of the proposed model is verified, and the intelligibility of the synthetic speech is assessed via listening tests.
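The flow described in the abstract (segment into words, preprocess, pass a network over each word, postprocess) can be pictured with the minimal sketch below. It is an illustration only, not the article's implementation: the function names, the whitespace segmentation, the padding symbol, the fixed-width letter window, and the callables `preprocess`, `predict`, and `postprocess` are all assumptions introduced here. In particular, `predict` merely stands in for the neural network and `postprocess` for the SEHMM stage.

```python
from typing import Callable, List

PAD = "_"  # hypothetical padding symbol for positions outside the word


def segment_words(text: str) -> List[str]:
    """Split raw text into words on whitespace (a simplifying assumption)."""
    return text.split()


def letter_windows(word: str, radius: int = 3) -> List[str]:
    """Return one fixed-width window per letter, centred on that letter."""
    padded = PAD * radius + word + PAD * radius
    return [padded[i:i + 2 * radius + 1] for i in range(len(word))]


def transcribe(
    text: str,
    predict: Callable[[str], str],
    postprocess: Callable[[List[str]], List[str]],
    preprocess: Callable[[str], str] = lambda word: word,
    radius: int = 3,
) -> List[List[str]]:
    """Run the assumed pipeline and return one symbol sequence per word."""
    result = []
    for word in segment_words(text):
        cleaned = preprocess(word)  # stand-in for the preprocessing step
        # Stand-in for the network: one output symbol per letter window.
        raw = [predict(window) for window in letter_windows(cleaned, radius)]
        # Stand-in for the postprocessor that corrects the raw sequence.
        result.append(postprocess(raw))
    return result
```

With a trivial `predict` that echoes the centre letter of each window and an identity `postprocess`, `transcribe` simply returns the letters of each word, which makes it easy to check the plumbing before substituting real models.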
Index Terms
- A speech synthesizer for Persian text using a neural network with a smooth ergodic HMM