ISCA Archive Interspeech 2023

Perceptual Improvement of Deep Neural Network (DNN) Speech Coder Using Parametric and Non-parametric Density Models

Joon Byun, Seungmin Shin, Jongmo Sung, Seungkwon Beack, Youngcheol Park

This paper proposes a method to improve the perceptual quality of an end-to-end neural speech coder using density models for bottleneck samples. Two approaches, one parametric and one non-parametric, are explored for modeling the bottleneck sample density. The first approach utilizes a sub-network to generate mean-scale hyperpriors for bottleneck samples, while the second approach models the bottleneck samples using a separate sub-network without any side information. The whole network, including the sub-network, is trained using psychoacoustic model (PAM)-based perceptual losses at different timescales to shape quantization noise below the masking threshold. The proposed method achieves a frame-dependent entropy model that enhances arithmetic coding efficiency while emphasizing perceptually relevant audio cues. Experimental results show that the proposed density model combined with PAM-based losses improves perceptual quality compared to conventional speech coders in both objective and subjective tests.
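To make the idea of a frame-dependent parametric entropy model concrete, the following is a minimal sketch of how a mean-scale Gaussian density could be turned into a bit-cost estimate for integer-quantized bottleneck samples. It is not the authors' implementation: the function names, the Gaussian form with unit-width quantization bins, and the toy values are all assumptions chosen for illustration, and in the paper the means and scales would come from a trained sub-network rather than being set by hand.

```python
import math


def gaussian_cdf(x, mean, scale):
    """CDF of a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))


def symbol_probability(q, mean, scale, floor=1e-9):
    """Probability mass of integer symbol q: Gaussian mass on [q-0.5, q+0.5].

    The floor (an assumed value) keeps the bit cost finite for unlikely symbols.
    """
    p = gaussian_cdf(q + 0.5, mean, scale) - gaussian_cdf(q - 0.5, mean, scale)
    return max(p, floor)


def estimated_bits(symbols, means, scales):
    """Ideal code length in bits, i.e. what an arithmetic coder approaches."""
    return sum(
        -math.log2(symbol_probability(q, m, s))
        for q, m, s in zip(symbols, means, scales)
    )


# Hypothetical quantized bottleneck samples for one frame.
frame = [3, 4, 3, 5]

# Per-sample parameters that track the frame (as a hyperprior might supply).
matched_bits = estimated_bits(frame, [3.5] * 4, [1.0] * 4)

# A fixed, frame-independent model centered at zero.
mismatched_bits = estimated_bits(frame, [0.0] * 4, [1.0] * 4)

print(f"matched: {matched_bits:.2f} bits, mismatched: {mismatched_bits:.2f} bits")
```

In this toy setting the density that follows the frame assigns higher probability to the observed symbols, so the estimated code length is shorter, which is the sense in which a frame-dependent entropy model can improve arithmetic coding efficiency.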


doi: 10.21437/Interspeech.2023-2305

Cite as: Byun, J., Shin, S., Sung, J., Beack, S., Park, Y. (2023) Perceptual Improvement of Deep Neural Network (DNN) Speech Coder Using Parametric and Non-parametric Density Models. Proc. INTERSPEECH 2023, 859-863, doi: 10.21437/Interspeech.2023-2305

@inproceedings{byun23_interspeech,
  author={Joon Byun and Seungmin Shin and Jongmo Sung and Seungkwon Beack and Youngcheol Park},
  title={{Perceptual Improvement of Deep Neural Network (DNN) Speech Coder Using Parametric and Non-parametric Density Models}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
  pages={859--863},
  doi={10.21437/Interspeech.2023-2305}
}