Copyright © 2005 The Institute of Electronics, Information and Communication Engineers
Special Section on Multi-channel Acoustic Signal Processing: Papers: Speech Enhancement
Harmonicity Based Dereverberation for Improving Automatic Speech Recognition Performance and Speech Intelligibility
The authors are with NTT Communication Science Laboratories, NTT Corporation, Kyoto-fu, 619-0237 Japan. E-mail: kinoshita{at}cslab.kecl.ntt.co.jp, nak{at}cslab.kecl.ntt.co.jp, miyo{at}cslab.kecl.ntt.co.jp
A speech signal captured by a distant microphone is generally smeared by reverberation, which severely degrades both speech intelligibility and Automatic Speech Recognition (ASR) performance. We previously proposed a single-microphone dereverberation method named "Harmonicity based dEReverBeration (HERB)." HERB estimates the inverse filter of an unknown room transfer function by exploiting an essential feature of speech, namely its harmonic structure. In previous studies, improvements in speech intelligibility were shown only with spectrograms, and improvements in ASR performance were confirmed only with a matched-condition acoustic model. In this paper, we further investigate HERB's potential with regard to these two factors. First, we evaluate speech intelligibility using objective indices and find that HERB can improve speech intelligibility to approximately the level of clean speech. Second, since HERB alone could not sufficiently improve ASR performance, we analyze the HERB mechanism further with a view to achieving additional gains. Based on this analysis, we propose an appropriate ASR configuration and conduct experiments. The experimental results confirm that HERB, when combined with an ASR adaptation scheme such as MLLR and a multi-condition acoustic model, is very effective in improving ASR performance even in unknown, severely reverberant environments.
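To make the idea of "estimating an inverse filter from a reverberation-reduced estimate of speech" concrete, the toy sketch below averages, over many frames, the per-frequency-bin ratio of an enhanced (target) spectrum to the observed spectrum. It is an illustration under strong assumptions, not HERB's actual algorithm: the room is modeled as a fixed complex gain per bin (a real room transfer function is far longer than one analysis frame), and the "harmonic" estimate is simulated as the clean spectrum plus zero-mean relative error. The names `estimate_inverse_filter` and `simulate` and all parameter values are hypothetical.

```python
import cmath
import random


def estimate_inverse_filter(observed, enhanced):
    """Average the per-bin ratio enhanced/observed over all frames.

    observed, enhanced: lists of frames, each a list of complex bin values.
    Returns one complex gain per frequency bin (an inverse-filter estimate).
    """
    n_frames = len(observed)
    n_bins = len(observed[0])
    w = [0j] * n_bins
    for x_frame, s_frame in zip(observed, enhanced):
        for k in range(n_bins):
            w[k] += s_frame[k] / x_frame[k]
    return [wk / n_frames for wk in w]


def simulate(n_frames=2000, n_bins=8, err=0.3, seed=0):
    """Generate a toy room, observed spectra, and imperfect enhanced spectra."""
    rng = random.Random(seed)
    # Hypothetical "room": one fixed complex gain per bin (multiplicative model).
    h = [cmath.rect(rng.uniform(0.3, 1.5), rng.uniform(-cmath.pi, cmath.pi))
         for _ in range(n_bins)]
    observed, enhanced = [], []
    for _ in range(n_frames):
        # Random "clean speech" spectrum for this frame.
        s = [cmath.rect(rng.uniform(0.5, 2.0), rng.uniform(-cmath.pi, cmath.pi))
             for _ in range(n_bins)]
        observed.append([hk * sk for hk, sk in zip(h, s)])
        # Imperfect stand-in for a harmonic estimate: clean spectrum times
        # (1 + zero-mean complex error), so the error averages out over frames.
        enhanced.append([sk * (1 + complex(rng.gauss(0, err), rng.gauss(0, err)))
                         for sk in s])
    return h, observed, enhanced
```

Usage: `h, X, S_hat = simulate(); w = estimate_inverse_filter(X, S_hat)` yields `w[k] * h[k]` close to 1 for every bin, because each ratio equals `(1 + e) / h[k]` and the zero-mean error `e` shrinks as more frames are averaged.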
Key Words: dereverberation, speech harmonicity, automatic speech recognition, speech intelligibility
Manuscript received October 29, 2004. Manuscript revised January 20, 2005. Final manuscript received March 11, 2005.