Sparse Representation with Optimized Learned Dictionary for Robust Voice Activity Detection

Circuits, Systems, and Signal Processing

Abstract

Traditionally, most voice activity detection (VAD) methods are based on speech features such as spectrum, temporal energy, and periodicity. The robustness of these features plays a critical role in VAD performance. However, because these features are generated directly from the observed signal, their robustness degrades significantly in non-stationary noise environments, especially at low signal-to-noise ratio (SNR). This paper proposes a robust feature for VAD based on sparse representation with an optimized learned dictionary. A speech dictionary and a noise dictionary are first learned from a speech corpus and a noise corpus, respectively. An optimization algorithm is then designed to reduce the mutual coherence between the two learned dictionaries. The proposed feature is generated from the sparse representation over the optimized dictionary, and a VAD method is derived from this feature. The proposed method is evaluated over seven types of noise and four SNR levels. Experimental results show that the optimized dictionary is important for enhancing the robustness of the proposed method, and that the proposed method performs well under non-stationary noise, especially at low SNR.
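To make the notion of mutual coherence between the two dictionaries concrete, the following is a minimal sketch, not the paper's optimization algorithm. It assumes the standard definition of cross-coherence, the largest absolute inner product between a unit-normalized speech atom and a unit-normalized noise atom; the abstract does not state the exact measure used. The function names `unit` and `cross_coherence` and the toy atoms are hypothetical and chosen only for illustration.

```python
import math


def unit(atom):
    """Return the atom scaled to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in atom))
    if norm == 0.0:
        raise ValueError("a zero atom cannot be normalized")
    return [x / norm for x in atom]


def cross_coherence(speech_atoms, noise_atoms):
    """Largest |<d_s, d_n>| over all pairs of unit-normalized atoms.

    Assumed definition (standard mutual coherence restricted to
    speech/noise pairs); each dictionary is a list of atoms.
    """
    s_unit = [unit(a) for a in speech_atoms]
    n_unit = [unit(a) for a in noise_atoms]
    return max(
        abs(sum(x * y for x, y in zip(s, n)))
        for s in s_unit
        for n in n_unit
    )


# Toy dictionaries with hypothetical values, not learned from any corpus.
speech_atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
noise_atoms = [[0.0, 0.0, 2.0], [1.0, 1.0, 0.0]]
mu = cross_coherence(speech_atoms, noise_atoms)
```

Under this assumed definition, a value near 0 means speech and noise atoms are nearly orthogonal, so a noisy frame's energy tends to be attributed to the correct dictionary; a value near 1 means some speech atom is almost interchangeable with a noise atom. This is the kind of overlap a coherence-reduction step would aim to lower.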

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant Nos. 91120303, 91220301, and 61170243, and by the Ph.D. Programs Foundation of the Ministry of Education of China (No. 20112302110042).

Author information

Corresponding author

Correspondence to Datao You.

About this article

Cite this article

You, D., Han, J., Zheng, G. et al. Sparse Representation with Optimized Learned Dictionary for Robust Voice Activity Detection. Circuits Syst Signal Process 33, 2267–2291 (2014). https://doi.org/10.1007/s00034-014-9748-y

  • DOI: https://doi.org/10.1007/s00034-014-9748-y