Sparse Representation with Optimized Learned Dictionary for Robust Voice Activity Detection

You, Datao; Han, Jiqing; Zheng, Guibin; Zheng, Tieran; Li, Jie

doi:10.1007/s00034-014-9748-y

Published: 18 February 2014

Volume 33, pages 2267–2291, (2014)

Circuits, Systems, and Signal Processing

Abstract

Most voice activity detection (VAD) methods rely on speech features such as the spectrum, temporal energy, and periodicity, and the robustness of these features is critical to VAD performance. Because these features are generated directly from the observed signal, however, their robustness degrades significantly in non-stationary noise, especially at low signal-to-noise ratio (SNR). This paper proposes a robust VAD feature based on sparse representation with an optimized learned dictionary. A speech dictionary and a noise dictionary are first learned from a speech corpus and a noise corpus, respectively. An optimization algorithm is then designed to reduce the mutual coherence between the two learned dictionaries. The proposed feature is generated from the sparse representation over the optimized dictionary, and a VAD method is derived from this feature. The proposed method is evaluated on seven types of noise at four SNR levels. Experimental results show that the optimized dictionary is important for the robustness of the proposed method, and that the method performs well under non-stationary noise, especially at low SNR.
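The role of mutual coherence between the two dictionaries can be illustrated with a small sketch. The abstract does not specify the dictionary-learning procedure, the optimization algorithm, or the exact form of the feature, so everything below is an assumption made for illustration: the function names are hypothetical, the sparse coder is plain orthogonal matching pursuit, and the "speech energy ratio" is one plausible frame-level feature rather than the feature defined in the paper. The sketch only measures the coherence between a speech and a noise dictionary and shows how a frame coded over their concatenation splits its coefficient energy between the two.

```python
import numpy as np

def normalize_columns(D):
    """Scale every atom (column) of D to unit Euclidean norm."""
    D = np.asarray(D, dtype=float)
    norms = np.linalg.norm(D, axis=0, keepdims=True)
    return D / np.maximum(norms, 1e-12)

def cross_coherence(D_speech, D_noise):
    """Largest absolute inner product between a speech atom and a noise atom."""
    Ds = normalize_columns(D_speech)
    Dn = normalize_columns(D_noise)
    return float(np.max(np.abs(Ds.T @ Dn)))

def omp(D, x, n_nonzero):
    """Plain orthogonal matching pursuit: greedy sparse code of x over D."""
    x = np.asarray(x, dtype=float)
    residual = x.copy()
    support = []
    sol = np.zeros(0)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:
            break  # no new atom improves the fit
        support.append(idx)
        # Least-squares refit of x on all atoms selected so far.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coef = np.zeros(D.shape[1])
    if support:
        coef[support] = sol
    return coef

def speech_energy_ratio(D_speech, D_noise, x, n_nonzero=5):
    """Hypothetical feature: share of coefficient energy on speech atoms."""
    Ds = normalize_columns(D_speech)
    Dn = normalize_columns(D_noise)
    coef = omp(np.hstack([Ds, Dn]), x, n_nonzero)
    k = Ds.shape[1]
    e_speech = float(np.sum(coef[:k] ** 2))
    e_noise = float(np.sum(coef[k:] ** 2))
    return e_speech / (e_speech + e_noise + 1e-12)

if __name__ == "__main__":
    # Toy dictionaries (assumed, not learned): orthogonal speech and noise atoms.
    D_speech = np.eye(4)[:, :2]
    D_noise = np.eye(4)[:, 2:]
    frame = np.array([2.0, 0.0, 1.0, 0.0])
    print(cross_coherence(D_speech, D_noise))
    print(speech_energy_ratio(D_speech, D_noise, frame))
```

When the two dictionaries share similar atoms, the cross-coherence approaches 1 and the greedy coder can assign speech energy to noise atoms or the reverse, which blurs any feature built on this split. That is consistent with the abstract's claim that reducing the coherence matters for robustness, although the paper's own optimization step is not reproduced here.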

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant Nos. 91120303, 91220301, and 61170243, and by the Ph.D. Programs Foundation of the Ministry of Education of China (No. 20112302110042).

Author information

Authors and Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, 92 West Dazhi Street, Nan Gang District, Harbin, 150001, China
Datao You, Jiqing Han, Guibin Zheng & Tieran Zheng
Henan University, Minglun Street, Kaifeng, 475001, China
Datao You & Jie Li

Corresponding author

Correspondence to Datao You.

About this article

Cite this article

You, D., Han, J., Zheng, G. et al. Sparse Representation with Optimized Learned Dictionary for Robust Voice Activity Detection. Circuits Syst Signal Process 33, 2267–2291 (2014). https://doi.org/10.1007/s00034-014-9748-y

Received: 18 September 2012
Revised: 16 January 2014
Published: 18 February 2014
Issue Date: July 2014
DOI: https://doi.org/10.1007/s00034-014-9748-y
