Speech synthesis systems: disadvantages and limitations

  • Authors

    • Karolina Kuligowska
    • Paweł Kisielewicz
    • Aleksandra Włodarz
    2018-05-16
    https://doi.org/10.14419/ijet.v7i2.28.12933
  • Keywords

    Speech synthesis system, Speech synthesis limitations, Text-to-speech, TTS.
  • Abstract

    Present-day speech synthesis systems can be successfully used for a wide range of purposes. However, the use of the various synthesizers is subject to serious limitations, many of which can be identified and resolved. The aim of this paper is to present the current state of development of speech synthesis systems and to examine their drawbacks and limitations. The paper discusses the current classification, construction and functioning of speech synthesis systems, which gives an insight into the synthesizers implemented so far. The analysis of the disadvantages and limitations of speech synthesis systems focuses on identifying their weak points, namely: the impact of emotions and prosody, spontaneous speech in terms of naturalness and intelligibility, preprocessing and text analysis, the problem of ambiguity, natural-sounding output, adaptation to the situation, the variety of systems, sparsely spoken languages, speech synthesis for older people, and some other minor limitations. Solving these problems stimulates further development of the speech synthesis domain.

  • References

      [1] Andersson S., Georgila K., Traum D., Aylett M., Clark R.A.J., “Prediction and Realisation of Conversational Characteristics by Utilising Spontaneous Speech for Unit Selection”, 5th International Conference on Speech Prosody (Speech Prosody 2010), Chicago 2010, 1 – 2.

      [2] Aylett M.P., Yamagishi J., “Combining Statistical Parameteric Speech Synthesis and Unit-Selection for Automatic Voice Cloning”, Proceedings of LangTech, Rome 2008, 3.

      [3] Aylett M. P., Potard B., Pidcock Ch.J., “Expressive speech synthesis: synthesising ambiguity”, 8th ISCA Workshop on Speech Synthesis (SSW-8), ISCA, Barcelona 2013, 217.

      [4] Balyan A., Agrawal S.S., Dev A., “Speech Synthesis: A Review”, International Journal of Engineering Research and Technology IJERT, Vol. 2, Issue 6, 2013, 57 – 75.

      [5] Bellegarda J.R., “Toward Naturally Expressive Speech Synthesis: Data-Driven Emotion Detection Using Latent Affective Analysis”, 7th ISCA Workshop on Speech Synthesis (SSW-7), ISCA, Kyoto 2010, 200.

      [6] Chandra E., Akila A., “An Overview of Speech Recognition and Speech Synthesis Algorithms”, International Journal of Computer Technology and Applications, Vol. 3, Issue 4, 2012, 1427.

      [7] Chauhan A., Chauhan V., Singh G., Choudhary C., Arya P., “Design and Development of a Text-To-Speech Synthesizer System”, International Journal of Electronics and Communication Technology, Vol. 2, Issue 3, 2011, 42 – 44.

      [8] Demenko G., Wagner A. (eds.), Speech and Language Technology, vol. 14/15, Polskie Towarzystwo Fonetyczne, Poznań 2012, 32.

      [9] Gruhn R.E., Minker W., Nakamura S., Statistical Pronunciation Modeling for Non-Native Speech Processing, Signals and Communication Technology, Springer-Verlag Berlin Heidelberg, 2011, 15 – 17.

      [10] Hankins T.L., Silverman R.J., Instruments and the imagination, Princeton University Press, 1995, 186.

      [11] Hinterleitner F., Norrenbrock Ch. R., Moller S., “Is Intelligibility Still the Main Problem? A Review of Perceptual Quality Dimensions of Synthetic Speech”, 8th ISCA Workshop on Speech Synthesis (SSW-8), Barcelona 2013, 147.

      [12] Indumathi A., Chandra E., “Survey on Speech Synthesis”, Signal Processing: An International Journal - SPIJ, Vol. 6, Issue 5, 2012, 140 – 145.

      [13] Kacprzak S., „Inteligentne metody rozpoznawania dźwięku”, master’s thesis, Wydział Fizyki Technicznej i Matematyki Stosowanej Politechniki Łódzkiej, Łódź 2010, 13.

      [14] Kuczmarski T., “Overview of HMM-based Speech Synthesis Methods”, [in:] Demenko G., Wagner A. (eds.), Speech and Language Technology, vol. 14/15, Polskie Towarzystwo Fonetyczne, Poznań 2012, 32 – 35.

      [15] Nabożny A., „Przygotowanie korpusu do projektu korpusowego syntezatora mowy”, engineering thesis, Wydział Inżynierii Mechanicznej i Robotyki Akademii Górniczo-Hutniczej, Kraków 2014, 14 – 18.

      [16] Obin N., Lanchantin P., Avanzi M., Lacheret-Dujour A., Rodet X., “Toward improved HMM-based speech synthesis using high-level syntactical features”, Speech Prosody 2010 Conference Proceedings, Chicago 2010, 2000.

      [17] Ohala J.J., “Christian Gottlieb Kratzenstein: pioneer in speech synthesis”, 17th International Congress of Phonetic Sciences (ICPhS XVII), Hong Kong 2011, 156 – 159.

      [18] Padda S., Bhalla N., Kaur R., “A Step towards Making an Effective Text to speech Conversion System”, International Journal of Engineering Research and Applications, vol. 2, issue 2, 2012, 1242 – 1244.

      [19] Parssinen K., “Multilingual Text-to-Speech System for Mobile Devices: Development and Applications”, doctoral dissertation, Tampere University of Technology, Tampere 2007, 18 – 42.

      [20] Rabiner L.R., Schafer R.W., “Introduction to Digital Speech Processing”, Foundations and Trends in Signal Processing, vol. 1, Issue 1–2, Now Publishers, 2007, 12 – 158.

      [21] Raj A.A., Sarkar R., Pammi S.C., Yuvaraj S., Bansal M., Prahallad K., Black A.W., “Text Processing for Text-to-Speech Systems in Indian Languages”, 6th ISCA Workshop on Speech Synthesis (SSW-6), ISCA, Bonn 2007, 188.

      [22] Raptis S., Chalamandaris A., Tsiakoulis P., Karabetsos S., “The ILSP Text-to-Speech System for the Blizzard Challenge 2012”, Proceedings of Blizzard Challenge 2012, Portland, Oregon, USA 2012, 1 – 6.

      [23] Rehm G., Uszkoreit H. (eds.), The Polish Language in the Digital Age, Springer 2012, 25.

      [24] Saheer L., Potard B., “Understanding Factors in Emotion Perception”, 8th ISCA Workshop on Speech Synthesis (SSW-8), Barcelona 2013, 59.

      [25] San-Segundo R., Montero J.M., Giurgiu M., Muresan I., King S., “Multilingual Number Transcription for Text-to-Speech Conversion”, 8th ISCA Workshop on Speech Synthesis (SSW-8), ISCA, Barcelona 2013, 65.

      [26] Schroeder M., “Expressive Speech Synthesis: Past, Present, and Possible Futures”, [in:] Tao J., Tan T. (ed.), Affective Information Processing, Springer, London 2009, 111 – 116.

      [27] Schroeder M.R., “A Brief History of Synthetic Speech”, Speech Communication, vol. 13, issue 1-2, Elsevier, 1993, 231 – 237.

      [28] Schroeter J., “Text-to-Speech (TTS) Synthesis”, [in:] Dorf R. C. (ed.), Circuits, Signals, and Speech and Image Processing, CRC Press, 2006, 163.

      [29] Sitaram S., Anumanchipalli G. K., Chiu J., Parlikar A., Black A. W., “Text to Speech in New Languages without a Standardized Orthography”, 8th ISCA Workshop on Speech Synthesis (SSW-8), Barcelona 2013, 95.

      [30] Szklanny K., „System korpusowej syntezy mowy dla języka polskiego”, [in:] XI International PhD Workshop OWD 2009, Conference Archives PTETiS, Vol. 26, Wisła 2009, 235 – 240.

      [31] Tang H., Zhou X., Odisio M., Hasegawa-Johnson M., Huang T. S., “Two-Stage Prosody Prediction for Emotional Text-to-Speech Synthesis”, 9th Annual Conference of the International Speech Communication Association (Interspeech 2008), Brisbane 2008, 1 – 2.

      [32] Tatham M., Morton K., Developments in Speech Synthesis, Wiley, 2005, 143 – 144.

      [33] Taylor P., Text-to-Speech Synthesis, Cambridge University Press, 2009, 1 – 50.

      [34] Thakur B.K., Chettri B., Shah K.B., “Current Trends, Frameworks and Techniques Used in Speech Synthesis - A Survey”, International Journal of Soft Computing and Engineering IJSCE, Volume 2, Issue 2, 2012, 444 – 445.

      [35] Violante L., Rodriguez Zivic P., Gravano A., “Improving speech synthesis quality by reducing pitch peaks in the source recordings”, Proceedings of NAACL-HLT 2013, Association for Computational Linguistics, Atlanta, Georgia 2013, 502.

      [36] Wagner A., “A comprehensive model of intonation for application in speech synthesis”, doctoral dissertation, Wydział Neofilologii Uniwersytetu im. Adama Mickiewicza w Poznaniu, Poznań 2008, 129 – 137.

      [37] Wang D., King S., “Letter-to-sound Pronunciation Prediction Using Conditional Random Fields”, IEEE Signal Processing Letters, vol. 18, No. 2, 2011, 39.

      [38] Wolters M., Campbell P., DePlacido C., Liddell A., Owens D., “Making Speech Synthesis More Accessible to Older People”, 6th ISCA Workshop on Speech Synthesis (SSW-6), ISCA, Bonn 2007, 1 – 2.

      [39] Yang C.Y., Chen C.P., “A Hidden Markov Model-Based Approach for Emotional Speech Synthesis”, 7th ISCA Workshop on Speech Synthesis (SSW-7), ISCA, Kyoto 2010, 126 – 129.

      [40] Yang Wang W., Georgila K., “Automatic Detection of Unnatural Word-Level Segments in Unit-Selection Speech Synthesis”, [in:] Nahamoo D., Picheny M. (eds.), 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, 289.

      [41] Zen H., Senior A., Schuster M., “Statistical parametric speech synthesis using deep neural networks”, Proceedings of 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vancouver 2013, 7962.

  • How to Cite

    Kuligowska, K., Kisielewicz, P., & Włodarz, A. (2018). Speech synthesis systems: disadvantages and limitations. International Journal of Engineering & Technology, 7(2.28), 234-239. https://doi.org/10.14419/ijet.v7i2.28.12933