GlottDNN — A Full-Band Glottal Vocoder for Statistical Parametric Speech Synthesis

Airaksinen, Manu; Bollepalli, Bajibabu; Juvela, Lauri; Wu, Zhizheng; King, Simon; Alku, Paavo

doi:10.21437/Interspeech.2016-342

GlottHMM is a previously developed vocoder that has been successfully used in HMM-based synthesis by parameterizing speech into two parts (glottal flow, vocal tract) according to the functioning of the real human voice production mechanism. In this study, a new glottal vocoding method, GlottDNN, is proposed. The GlottDNN vocoder is built on the principles of its predecessor, GlottHMM, but the new vocoder introduces three main improvements: GlottDNN (1) takes advantage of a new, more accurate glottal inverse filtering method, (2) uses a new method of deep neural network (DNN)-based glottal excitation generation, and (3) proposes a new approach to band-wise processing of full-band speech.

The proposed GlottDNN vocoder was evaluated as part of a full-band state-of-the-art DNN-based text-to-speech (TTS) synthesis system, and compared against the release version of the original GlottHMM vocoder, and the well-known STRAIGHT vocoder. The results of the subjective listening test indicate that GlottDNN improves the TTS quality over the compared methods.
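The two-part parameterization described above follows the classical source-filter view of speech production: an excitation signal (the glottal flow) is shaped by a vocal tract filter. The sketch below illustrates only that generic model, not GlottDNN's actual method. It assumes a plain impulse train as a stand-in for a real glottal pulse and a single second-order all-pole resonance (one formant) as a stand-in for an estimated vocal tract filter; all function names are hypothetical.

```python
import math


def pulse_train(f0_hz, fs_hz, n_samples):
    """Hypothetical stand-in for a glottal excitation: one unit impulse per pitch period."""
    period = int(round(fs_hz / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def resonator_coeffs(freq_hz, bandwidth_hz, fs_hz):
    """Coefficients [a1, a2] of A(z) = 1 + a1 z^-1 + a2 z^-2, a single formant resonance.

    The poles sit at radius r and angle theta, i.e. inside the unit circle (stable).
    """
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)
    theta = 2.0 * math.pi * freq_hz / fs_hz
    return [-2.0 * r * math.cos(theta), r * r]


def all_pole_filter(excitation, a, gain=1.0):
    """Simplified vocal tract model: y[n] = gain * x[n] - sum_{k=1..p} a[k-1] * y[n-k]."""
    y = []
    for n, x in enumerate(excitation):
        acc = gain * x
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y.append(acc)
    return y


# Usage: a 100 Hz "voice" at 16 kHz passed through one formant at 500 Hz.
fs = 16000
excitation = pulse_train(f0_hz=100, fs_hz=fs, n_samples=800)
vocal_tract = resonator_coeffs(freq_hz=500, bandwidth_hz=100, fs_hz=fs)
speech = all_pole_filter(excitation, vocal_tract)
```

A real glottal vocoder estimates both parts from recorded speech (for example by glottal inverse filtering) and uses a higher-order vocal tract filter; the fixed impulse train and single resonance here are simplifications chosen to keep the example self-contained.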

Cite as: Airaksinen, M., Bollepalli, B., Juvela, L., Wu, Z., King, S., Alku, P. (2016) GlottDNN — A Full-Band Glottal Vocoder for Statistical Parametric Speech Synthesis. Proc. Interspeech 2016, 2473-2477, doi: 10.21437/Interspeech.2016-342

@inproceedings{airaksinen16b_interspeech,
  author={Manu Airaksinen and Bajibabu Bollepalli and Lauri Juvela and Zhizheng Wu and Simon King and Paavo Alku},
  title={{GlottDNN — A Full-Band Glottal Vocoder for Statistical Parametric Speech Synthesis}},
  year=2016,
  booktitle={Proc. Interspeech 2016},
  pages={2473--2477},
  doi={10.21437/Interspeech.2016-342}
}