Rhythm Speech Lyrics Input for MIDI-Based Singing Voice Synthesis

Lee, Hong-Ru; Huang, Chih-Fang; Hsu, Chih-Hao; Wang, Wen-Nan

doi:10.1007/978-3-642-10467-1_40

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5879)

Abstract

This paper presents useful techniques and considerations for implementing a Mandarin singing voice synthesis system built on the RSLI (Rhythm Speech Lyrics Input) unit. The system receives the continuous speech of a song's lyrics and synthesizes the intended song using a MIDI-based music database. The system consists of three units. The first is the input unit, which allows the user to specify a musical score and phonetically spelled lyrics to the system. The second is the modified unit, which implements pitch shifting using the PSOLA method. The third is the mixed unit, which handles undesirable artificial-sounding, buzzy effects and applies effects including echo and vibrato. Energy, duration, and spectrum modifications are also implemented in the mixed unit. The synthesized singing voice sounds reasonably good: in a subjective listening test, MOS (mean opinion score) values of 3.3 and 3.2 were obtained for the synthesized singing voice and for similarity to the singer's voice, respectively.
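
To make the roles of the modified and mixed units more concrete, the following is a minimal Python sketch of three calculations such a system might need: the target frequency of a MIDI note, the scaling factor a pitch shifter such as PSOLA would have to apply, and simple echo and vibrato effects. It is not the authors' implementation. All function names, the A4 = 440 Hz tuning, and the default delay, gain, vibrato rate, and vibrato depth are assumptions chosen for illustration; the abstract does not state any of these values.

```python
import math


def midi_to_hz(note, a4=440.0):
    """Equal-tempered frequency of a MIDI note number (A4 is note 69)."""
    return a4 * 2.0 ** ((note - 69) / 12.0)


def pitch_shift_ratio(source_hz, target_note):
    """Factor by which a pitch shifter (e.g. PSOLA) must scale the source F0
    so that the spoken syllable lands on the MIDI target note."""
    return midi_to_hz(target_note) / source_hz


def add_echo(samples, sample_rate, delay_s=0.25, gain=0.4):
    """Mix one delayed, attenuated copy of the signal back into itself.
    delay_s and gain are illustrative defaults, not values from the paper."""
    delay = int(round(delay_s * sample_rate))
    out = list(samples) + [0.0] * delay
    for i, s in enumerate(samples):
        out[i + delay] += gain * s
    return out


def vibrato_f0(base_hz, duration_s, frame_rate=100, rate_hz=5.5, depth_cents=50.0):
    """Per-frame F0 contour with sinusoidal vibrato around base_hz.
    rate_hz and depth_cents are assumed, typical-looking values."""
    n_frames = int(duration_s * frame_rate)
    contour = []
    for t in range(n_frames):
        cents = depth_cents * math.sin(2.0 * math.pi * rate_hz * t / frame_rate)
        contour.append(base_hz * 2.0 ** (cents / 1200.0))
    return contour


if __name__ == "__main__":
    # A syllable spoken at about 220 Hz, to be sung on A4 (MIDI note 69).
    print(pitch_shift_ratio(220.0, 69))
    print(vibrato_f0(midi_to_hz(69), 0.05))
```

In a full system, the ratio from `pitch_shift_ratio` would drive the pitch-shifting stage frame by frame, and a contour like the one from `vibrato_f0` would replace a flat target pitch on sustained notes, which is one common way to make a synthesized voice sound less mechanical.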

This research was conducted under the Services Oriented Machine Open Platform Software Development Project, executed by the Institute for Information Industry and subsidized by the Ministry of Economic Affairs of Taiwan.

Author information

Authors and Affiliations

Department of Mechanical Engineering, National Chiao-Tung University
Hong-Ru Lee
Department of Information Communication, Yuan Ze University
Chih-Fang Huang
Innovative DigiTech-Enabled Applications & Services Institute, 8F., No.133, Sec. 4, Minsheng E. RD., Taipei City, 105, Taiwan
Chih-Hao Hsu & Wen-Nan Wang

Editor information

Editors and Affiliations

Naresuan University, 65000, Phitsanulok, Thailand
Paisarn Muneesawang
Microsoft Research Asia, 100109, Beijing, China
Feng Wu
Tokyo Institute of Technology, 226-8503, Yokohama, Japan
Itsuo Kumazawa
Mahanakorn University of Technology, 10530, Bangkok, Thailand
Athikom Roeksabutr
Institute of Information Science, Academia Sinica, Taipei, Taiwan
Mark Liao
Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Xiaoou Tang

About this paper

Cite this paper

Lee, HR., Huang, CF., Hsu, CH., Wang, WN. (2009). Rhythm Speech Lyrics Input for MIDI-Based Singing Voice Synthesis. In: Muneesawang, P., Wu, F., Kumazawa, I., Roeksabutr, A., Liao, M., Tang, X. (eds) Advances in Multimedia Information Processing - PCM 2009. PCM 2009. Lecture Notes in Computer Science, vol 5879. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10467-1_40

DOI: https://doi.org/10.1007/978-3-642-10467-1_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10466-4
Online ISBN: 978-3-642-10467-1
eBook Packages: Computer Science, Computer Science (R0)
