Speech Bandwidth Extension Using Bottleneck Features and Deep Recurrent Neural Networks

Gu, Yu; Ling, Zhen-Hua; Dai, Li-Rong

doi:10.21437/Interspeech.2016-678

This paper presents a novel method for speech bandwidth extension (BWE) using deep structured neural networks. To exploit linguistic information when predicting high-frequency spectral components, bottleneck (BN) features derived from a deep neural network (DNN)-based state classifier for narrowband speech are employed as auxiliary input. Furthermore, recurrent neural networks (RNNs) with long short-term memory (LSTM) cells are adopted to model the complex mapping between the feature sequences describing the low-frequency and high-frequency spectra. Experimental results show that the proposed BWE method outperforms both the conventional method based on Gaussian mixture models (GMMs) and the state-of-the-art approach based on DNNs in objective and subjective tests.
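The data flow described in the abstract can be illustrated with a minimal sketch: per-frame narrowband spectral features are concatenated with BN features, passed through an LSTM layer, and mapped linearly to high-frequency features. Everything specific here is an assumption made for illustration and is not taken from the paper: the feature dimensions, the single recurrent layer (the paper uses deep RNNs), the random untrained weights, and the names `lstm_layer`, `predict_highband`, and `make_params`.

```python
import math
import random

# Hypothetical sizes: narrowband features, BN features, LSTM units, high-band features.
NB_DIM, BN_DIM, HIDDEN, HB_DIM = 4, 3, 5, 2


def rand_matrix(rows, cols, rng):
    """Small random weights; a real system would learn these from data."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_layer(frames, W, b, hidden):
    """Forward pass of one LSTM layer over a list of input frames.

    W has shape (4 * hidden) x (input_dim + hidden); its row blocks hold the
    input, forget, and output gates, then the candidate cell update.
    """
    h = [0.0] * hidden
    c = [0.0] * hidden
    outputs = []
    for x in frames:
        z = [a + bias for a, bias in zip(matvec(W, x + h), b)]
        i = [sigmoid(v) for v in z[:hidden]]
        f = [sigmoid(v) for v in z[hidden:2 * hidden]]
        o = [sigmoid(v) for v in z[2 * hidden:3 * hidden]]
        g = [math.tanh(v) for v in z[3 * hidden:]]
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        outputs.append(h)
    return outputs


def predict_highband(nb_frames, bn_frames, params):
    """Map narrowband + BN feature sequences to high-band feature sequences."""
    if len(nb_frames) != len(bn_frames):
        raise ValueError("feature sequences must have the same number of frames")
    # BN features act as auxiliary input: append them to each narrowband frame.
    joint = [nb + bn for nb, bn in zip(nb_frames, bn_frames)]
    hidden_seq = lstm_layer(joint, params["W_lstm"], params["b_lstm"], HIDDEN)
    return [
        [y + bias for y, bias in zip(matvec(params["W_out"], h), params["b_out"])]
        for h in hidden_seq
    ]


def make_params(seed=0):
    rng = random.Random(seed)
    return {
        "W_lstm": rand_matrix(4 * HIDDEN, NB_DIM + BN_DIM + HIDDEN, rng),
        "b_lstm": [0.0] * (4 * HIDDEN),
        "W_out": rand_matrix(HB_DIM, HIDDEN, rng),
        "b_out": [0.0] * HB_DIM,
    }


if __name__ == "__main__":
    rng = random.Random(1)
    num_frames = 6
    nb = [[rng.gauss(0, 1) for _ in range(NB_DIM)] for _ in range(num_frames)]
    bn = [[rng.gauss(0, 1) for _ in range(BN_DIM)] for _ in range(num_frames)]
    hb = predict_highband(nb, bn, make_params())
    print(len(hb), "frames of", len(hb[0]), "high-band features")
```

Because the hidden and cell states carry over from frame to frame, each predicted high-band frame depends on the preceding context, which is the property that distinguishes the recurrent mapping from a frame-wise DNN.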

Cite as: Gu, Y., Ling, Z.-H., Dai, L.-R. (2016) Speech Bandwidth Extension Using Bottleneck Features and Deep Recurrent Neural Networks. Proc. Interspeech 2016, 297-301, doi: 10.21437/Interspeech.2016-678

@inproceedings{gu16b_interspeech,
  author={Yu Gu and Zhen-Hua Ling and Li-Rong Dai},
  title={{Speech Bandwidth Extension Using Bottleneck Features and Deep Recurrent Neural Networks}},
  year=2016,
  booktitle={Proc. Interspeech 2016},
  pages={297--301},
  doi={10.21437/Interspeech.2016-678}
}