Speech synthesis systems: disadvantages and limitations

  • Authors

    • Karolina Kuligowska
    • Paweł Kisielewicz
    • Aleksandra Włodarz
    2018-05-16
    https://doi.org/10.14419/ijet.v7i2.28.12933
  • Keywords

    Speech synthesis system, Speech synthesis limitations, Text-to-speech, TTS.
  • Abstract

    Present-day speech synthesis systems can be successfully used for a wide range of purposes. However, the use of the various synthesizers is subject to serious limitations, many of which can be identified and resolved. The aim of this paper is to present the current state of development of speech synthesis systems and to examine their drawbacks and limitations. The paper discusses the current classification, construction and functioning of speech synthesis systems, which gives an insight into the synthesizers implemented so far. The analysis of the disadvantages and limitations of speech synthesis systems focuses on identifying their weak points, namely: the impact of emotions and prosody, spontaneous speech in terms of naturalness and intelligibility, preprocessing and text analysis, the problem of ambiguity, natural-sounding output, adaptation to the situation, the variety of systems, sparsely spoken languages, speech synthesis for older people, and some other minor limitations. Solving these problems stimulates further development of the speech synthesis domain.

  • References

      [1] Andersson S., Georgila K., Traum D., Aylett M., Clark R.A.J., “Prediction and Realisation of Conversational Characteristics by Utilising Spontaneous Speech for Unit Selection”, 5th International Conference on Speech Prosody (Speech Prosody 2010), Chicago 2010, 1 – 2.

      [2] Aylett M.P., Yamagishi J., “Combining Statistical Parameteric Speech Synthesis and Unit-Selection for Automatic Voice Cloning”, Proceedings of LangTech, Rome 2008, 3.

      [3] Aylett M. P., Potard B., Pidcock Ch.J., “Expressive speech synthesis: synthesising ambiguity”, 8th ISCA Workshop on Speech Synthesis (SSW-8), ISCA, Barcelona 2013, 217.

      [4] Balyan A., Agrawal S.S., Dev A., “Speech Synthesis: A Review”, International Journal of Engineering Research and Technology IJERT, Vol. 2, Issue 6, 2013, 57 – 75.

      [5] Bellegarda J.R., “Toward Naturally Expressive Speech Synthesis: Data-Driven Emotion Detection Using Latent Affective Analysis”, 7th ISCA Workshop on Speech Synthesis (SSW-7), ISCA, Kyoto 2010, 200.

      [6] Chandra E., Akila A., “An Overview of Speech Recognition and Speech Synthesis Algorithms”, International Journal of Computer Technology and Applications, Vol. 3, Issue 4, 2012, 1427.

      [7] Chauhan A., Chauhan V., Singh G., Choudhary C., Arya P., “Design and Development of a Text-To-Speech Synthesizer System”, International Journal of Electronics and Communication Technology, Vol. 2, Issue 3, 2011, 42 – 44.

      [8] Demenko G., Wagner A. (eds.), Speech and Language Technology, vol. 14/15, Polskie Towarzystwo Fonetyczne, Poznań 2012, 32.

      [9] Gruhn R.E., Minker W., Nakamura S., Statistical Pronunciation Modeling for Non-Native Speech Processing, Signals and Communication Technology, Springer-Verlag Berlin Heidelberg, 2011, 15 – 17.

      [10] Hankins T.L., Silverman R.J., Instruments and the imagination, Princeton University Press, 1995, 186.

      [11] Hinterleitner F., Norrenbrock Ch. R., Moller S., “Is Intelligibility Still the Main Problem? A Review of Perceptual Quality Dimensions of Synthetic Speech”, 8th ISCA Workshop on Speech Synthesis (SSW-8), Barcelona 2013, 147.

      [12] Indumathi A., Chandra E., “Survey on Speech Synthesis”, Signal Processing: An International Journal - SPIJ, Vol. 6, Issue 5, 2012, 140 – 145.

      [13] Kacprzak S., „Inteligentne metody rozpoznawania dźwięku”, master’s thesis, Wydział Fizyki Technicznej i Matematyki Stosowanej Politechniki Łódzkiej, Łódź 2010, 13.

      [14] Kuczmarski T., “Overview of HMM-based Speech Synthesis Methods”, [in:] Demenko G., Wagner A. (eds.), Speech and Language Technology, vol. 14/15, Polskie Towarzystwo Fonetyczne, Poznań 2012, 32 – 35.

      [15] Nabożny A., „Przygotowanie korpusu do projektu korpusowego syntezatora mowy”, engineering thesis, Wydział Inżynierii Mechanicznej i Robotyki Akademii Górniczo-Hutniczej, Kraków 2014, 14 – 18.

      [16] Obin N., Lanchantin P., Avanzi M., Lacheret-Dujour A., Rodet X., “Toward improved HMM-based speech synthesis using high-level syntactical features”, Speech Prosody 2010 Conference Proceedings, Chicago 2010, 2000.

      [17] Ohala J.J., “Christian Gottlieb Kratzenstein: pioneer in speech synthesis”, 17th International Congress of Phonetic Sciences (ICPhS XVII), Hong Kong 2011, 156 – 159.

      [18] Padda S., Bhalla N., Kaur R., “A Step towards Making an Effective Text to speech Conversion System”, International Journal of Engineering Research and Applications, vol. 2, issue 2, 2012, 1242 – 1244.

      [19] Parssinen K., “Multilingual Text-to-Speech System for Mobile Devices: Development and Applications”, doctoral dissertation, Tampere University of Technology, Tampere 2007, 18 – 42.

      [20] Rabiner L.R., Schafer R.W., “Introduction to Digital Speech Processing”, Foundations and Trends in Signal Processing, vol. 1, Issue 1–2, Now Publishers, 2007, 12 – 158.

      [21] Raj A.A., Sarkar R., Pammi S.C., Yuvaraj S., Bansal M., Prahallad K., Black A.W., “Text Processing for Text-to-Speech Systems in Indian Languages”, 6th ISCA Workshop on Speech Synthesis (SSW-6), ISCA, Bonn 2007, 188.

      [22] Raptis S., Chalamandaris A., Tsiakoulis P., Karabetsos S., “The ILSP Text-to-Speech System for the Blizzard Challenge 2012”, Proceedings of Blizzard Challenge 2012, Portland, Oregon, USA 2012, 1 – 6.

      [23] Rehm G., Uszkoreit H. (eds.), The Polish Language in the Digital Age, Springer 2012, 25.

      [24] Saheer L., Potard B., “Understanding Factors in Emotion Perception”, 8th ISCA Workshop on Speech Synthesis (SSW-8), Barcelona 2013, 59.

      [25] San-Segundo R., Montero J.M., Giurgiu M., Muresan I., King S., “Multilingual Number Transcription for Text-to-Speech Conversion”, 8th ISCA Workshop on Speech Synthesis (SSW-8), ISCA, Barcelona 2013, 65.

      [26] Schroeder M., “Expressive Speech Synthesis: Past, Present, and Possible Futures”, [in:] Tao J., Tan T. (ed.), Affective Information Processing, Springer, London 2009, 111 – 116.

      [27] Schroeder M.R., “A Brief History of Synthetic Speech”, Speech Communication, vol. 13, issue 1-2, Elsevier, 1993, 231 – 237.

      [28] Schroeter J., “Text-to-Speech (TTS) Synthesis”, [in:] Dorf R. C. (ed.), Circuits, Signals, and Speech and Image Processing, CRC Press, 2006, 163.

      [29] Sitaram S., Anumanchipalli G. K., Chiu J., Parlikar A., Black A. W., “Text to Speech in New Languages without a Standardized Orthography”, 8th ISCA Workshop on Speech Synthesis (SSW-8), Barcelona 2013, 95.

      [30] Szklanny K., „System korpusowej syntezy mowy dla języka polskiego”, [in:] XI International PhD Workshop OWD 2009, Conference Archives PTETiS, Vol. 26, Wisła 2009, 235 – 240.

      [31] Tang H., Zhou X., Odisio M., Hasegawa-Johnson M., Huang T. S., “Two-Stage Prosody Prediction for Emotional Text-to-Speech Synthesis”, 9th Annual Conference of the International Speech Communication Association (Interspeech 2008), Brisbane 2008, 1 – 2.

      [32] Tatham M., Morton K., Developments in Speech Synthesis, Wiley, 2005, 143 – 144.

      [33] Taylor P., Text-to-Speech Synthesis, Cambridge University Press, 2009, 1 – 50.

      [34] Thakur B.K., Chettri B., Shah K.B., “Current Trends, Frameworks and Techniques Used in Speech Synthesis - A Survey”, International Journal of Soft Computing and Engineering IJSCE, Volume 2, Issue 2, 2012, 444 – 445.

      [35] Violante L., Rodriguez Zivic P., Gravano A., “Improving speech synthesis quality by reducing pitch peaks in the source recordings”, Proceedings of NAACL-HLT 2013, Association for Computational Linguistics, Atlanta, Georgia 2013, 502.

      [36] Wagner A., “A comprehensive model of intonation for application in speech synthesis”, doctoral dissertation, Wydział Neofilologii Uniwersytetu im. Adama Mickiewicza w Poznaniu, Poznań 2008, 129 – 137.

      [37] Wang D., King S., “Letter-to-sound Pronunciation Prediction Using Conditional Random Fields”, IEEE Signal Processing Letters, vol. 18, No. 2, 2011, 39.

      [38] Wolters M., Campbell P., DePlacido C., Liddell A., Owens D., “Making Speech Synthesis More Accessible to Older People”, 6th ISCA Workshop on Speech Synthesis (SSW-6), ISCA, Bonn 2007, 1 – 2.

      [39] Yang C.Y., Chen C.P., “A Hidden Markov Model-Based Approach for Emotional Speech Synthesis”, 7th ISCA Workshop on Speech Synthesis (SSW-7), ISCA, Kyoto 2010, 126 – 129.

      [40] Yang Wang W., Georgila K., “Automatic Detection of Unnatural Word-Level Segments in Unit-Selection Speech Synthesis”, [in:] Nahamoo D., Picheny M. (eds.), 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, 289.

      [41] Zen H., Senior A., Schuster M., “Statistical parametric speech synthesis using deep neural networks”, Proceedings of 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vancouver 2013, 7962.

  • How to Cite

    Kuligowska, K., Kisielewicz, P., & Włodarz, A. (2018). Speech synthesis systems: disadvantages and limitations. International Journal of Engineering & Technology, 7(2.28), 234-239. https://doi.org/10.14419/ijet.v7i2.28.12933