Abstract
The objective of this work is to develop a rule-based emotion conversion method that yields better emotional perception. The performance of emotion conversion with the linear modification model is improved by using vowel-based non-uniform prosody modification. The approach integrates features such as position and identity to address the non-uniformity in prosody that arises from the emotional state of the speaker. We concentrate on three parameters of vowels at different parts of the sentence: strength, duration, and pitch contour. The influence of emotion on these parameters is exploited to convert speech from the neutral emotion to the target emotion. The non-uniform prosody modification factors depend on the position of the vowel in the word and the position of the word in the sentence. The study uses the Indian Institute of Technology-Simulated Emotion speech corpus. The proposed algorithm is evaluated with a subjective listening test, in which it is observed to perform better than the existing approaches.
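The idea of position-dependent modification factors can be illustrated with a minimal sketch. It is not the authors' implementation: the three-way position classes, the names `ProsodyFactors`, `position_class`, `HYPOTHETICAL_FACTORS`, and `factors_for_sentence`, and every numeric value are assumptions made only for illustration. The sketch looks up one set of pitch, duration, and strength factors per vowel from the position of the word in the sentence and the position of the vowel in the word.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ProsodyFactors:
    """Multiplicative factors for one vowel (1.0 leaves the parameter unchanged)."""
    pitch: float
    duration: float
    strength: float


def position_class(index: int, count: int) -> str:
    """Map a zero-based index within `count` items to a coarse position class."""
    if count <= 0 or not 0 <= index < count:
        raise ValueError("index must satisfy 0 <= index < count")
    if index == 0:
        return "initial"
    if index == count - 1:
        return "final"
    return "middle"


# HYPOTHETICAL values, keyed by (word position in sentence, vowel position in word).
# They are placeholders for one target emotion and are not taken from the paper.
HYPOTHETICAL_FACTORS = {
    ("initial", "initial"): ProsodyFactors(1.10, 0.95, 1.20),
    ("initial", "middle"): ProsodyFactors(1.15, 0.95, 1.20),
    ("initial", "final"): ProsodyFactors(1.20, 0.90, 1.25),
    ("middle", "initial"): ProsodyFactors(1.20, 0.90, 1.30),
    ("middle", "middle"): ProsodyFactors(1.25, 0.90, 1.30),
    ("middle", "final"): ProsodyFactors(1.30, 0.85, 1.35),
    ("final", "initial"): ProsodyFactors(1.30, 0.85, 1.40),
    ("final", "middle"): ProsodyFactors(1.35, 0.85, 1.40),
    ("final", "final"): ProsodyFactors(1.40, 0.80, 1.45),
}


def factors_for_sentence(vowels_per_word: list[int]) -> list[list[ProsodyFactors]]:
    """Return the factors for each vowel, given the vowel count of every word."""
    n_words = len(vowels_per_word)
    result = []
    for w, n_vowels in enumerate(vowels_per_word):
        word_pos = position_class(w, n_words)
        result.append([
            HYPOTHETICAL_FACTORS[(word_pos, position_class(v, n_vowels))]
            for v in range(n_vowels)
        ])
    return result


if __name__ == "__main__":
    # A three-word sentence whose words contain 2, 1, and 3 vowels.
    for word_factors in factors_for_sentence([2, 1, 3]):
        print(word_factors)
```

A uniform scheme would apply a single `ProsodyFactors` value to every vowel in the utterance; the table lookup is what makes the modification non-uniform. In a full system the factors would be passed to a prosody modification stage, which is outside the scope of this sketch.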
Cite this article
Vydana, H.K., Kadiri, S.R. & Vuppala, A.K. Vowel-Based Non-uniform Prosody Modification for Emotion Conversion. Circuits Syst Signal Process 35, 1643–1663 (2016). https://doi.org/10.1007/s00034-015-0134-1
DOI: https://doi.org/10.1007/s00034-015-0134-1