A Study on Tailor-Made Speech Synthesis Based on Deep Neural Networks

  • Conference paper
Advances in Intelligent Information Hiding and Multimedia Signal Processing

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 63)

Abstract

We propose “tailor-made speech synthesis,” a speech synthesis technique that enables users to control synthetic speech naturally and intuitively. As a first step toward realizing tailor-made speech synthesis, we introduce an F0 context into speaker model training for speech synthesis based on deep neural networks (DNNs). The F0 context represents the relative log F0 of the training data at the mora level or the accent-phrase level. It allows users to control the F0 of synthetic speech steplessly, in contrast to the conventional F0 context used in HMM-based techniques. Experiments showed that the F0 context was effective for controlling F0: the F0 of the synthetic voice followed the value of the F0 context.
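The abstract does not give the exact formula for the F0 context, so the following is only a minimal sketch under stated assumptions: "relative log F0" is taken to mean the mean log F0 of the voiced frames in a unit (a mora or an accent phrase) minus the mean log F0 of all voiced frames in the utterance. The function name `relative_log_f0`, the frame-index segment format, and the use of 0 Hz to mark unvoiced frames are hypothetical choices for illustration, not the authors' implementation.

```python
import math


def relative_log_f0(f0_hz, segments):
    """Return one relative log-F0 value per segment (assumed definition).

    f0_hz    : per-frame F0 values in Hz; 0 marks an unvoiced frame (assumption).
    segments : (start, end) frame-index pairs, end exclusive, one per mora
               or accent phrase (hypothetical input format).
    """
    # Reference level: mean log F0 over all voiced frames of the utterance.
    voiced_logs = [math.log(f) for f in f0_hz if f > 0]
    if not voiced_logs:
        raise ValueError("utterance has no voiced frames")
    utt_mean = sum(voiced_logs) / len(voiced_logs)

    contexts = []
    for start, end in segments:
        seg_logs = [math.log(f) for f in f0_hz[start:end] if f > 0]
        if seg_logs:
            # Positive values mean the unit is higher than the utterance average.
            contexts.append(sum(seg_logs) / len(seg_logs) - utt_mean)
        else:
            # A fully unvoiced unit has no defined F0 context in this sketch.
            contexts.append(None)
    return contexts


# Toy utterance: two voiced units one octave apart, separated by an unvoiced frame.
f0 = [100.0, 100.0, 0.0, 200.0, 200.0]
moras = [(0, 2), (2, 3), (3, 5)]
contexts = relative_log_f0(f0, moras)
print(contexts)
```

Because values of this kind are continuous rather than drawn from a fixed set of classes, a user could in principle shift them up or down by any amount at synthesis time, which is one way to read the "stepless" control described in the abstract.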


Author information

Corresponding author

Correspondence to Shuhei Yamada.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Yamada, S., Nose, T., Ito, A. (2017). A Study on Tailor-Made Speech Synthesis Based on Deep Neural Networks. In: Pan, JS., Tsai, PW., Huang, HC. (eds) Advances in Intelligent Information Hiding and Multimedia Signal Processing. Smart Innovation, Systems and Technologies, vol 63. Springer, Cham. https://doi.org/10.1007/978-3-319-50209-0_20

  • DOI: https://doi.org/10.1007/978-3-319-50209-0_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50208-3

  • Online ISBN: 978-3-319-50209-0

  • eBook Packages: Engineering, Engineering (R0)
