Abstract
The Bag-of-Features principle has proved successful in many pattern recognition tasks, ranging from document analysis and image classification to gesture recognition and even forensic applications. Recently, these methods have emerged in the field of acoustic event detection and shown very promising results. The detection and classification of acoustic events is an important task for many practical applications such as video understanding, surveillance, and speech enhancement. This paper presents a novel approach to online acoustic event detection that builds on the Bag-of-Features principle. Features are calculated for all frames in a given window. By applying the concept of feature augmentation, additional temporal information is encoded in each feature vector. These feature vectors are then softly quantized to compute a Bag-of-Features representation, which is evaluated by a classifier in a sliding window approach. Experiments on a challenging indoor dataset of acoustic events show that the proposed method yields state-of-the-art results compared to other online event detection methods. Furthermore, they show that the temporal feature augmentation significantly improves the recognition rates.
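The pipeline described above (per-frame features, temporal augmentation, soft quantization, sliding window) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the frame features are random stand-ins, the temporal augmentation appends the relative frame position within the window, the soft quantization uses Gaussian weights against a random codebook, and the function names, window length, hop size, and `sigma` are hypothetical. The classifier that would evaluate each histogram is left out.

```python
import numpy as np

def augment_with_time(frames):
    """Append each frame's relative position in the window (0..1) as an extra dimension."""
    n = frames.shape[0]
    t = np.linspace(0.0, 1.0, n).reshape(-1, 1)
    return np.hstack([frames, t])

def soft_quantize(features, codebook, sigma=1.0):
    """Soft-assign every feature to all codewords with Gaussian weights; each row sums to 1."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # Subtract the per-row minimum distance for numerical stability.
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    return w / w.sum(axis=1, keepdims=True)

def bag_of_features(frames, codebook, sigma=1.0):
    """Average the soft assignments of all augmented frames in one window."""
    augmented = augment_with_time(frames)
    return soft_quantize(augmented, codebook, sigma).mean(axis=0)

def sliding_windows(all_frames, window, hop):
    """Yield (start index, frames) for every full window over the frame sequence."""
    for start in range(0, all_frames.shape[0] - window + 1, hop):
        yield start, all_frames[start:start + window]

# Hypothetical data: 50 frames of 13 coefficients, 8 codewords in the
# augmented (13 + 1)-dimensional space.
rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 13))
codebook = rng.normal(size=(8, 14))

histograms = [
    bag_of_features(window_frames, codebook)
    for _, window_frames in sliding_windows(frames, window=20, hop=10)
]
```

Each entry in `histograms` is a normalized soft histogram over the codewords for one window position. In an online setting, a trained classifier would score each histogram as its window becomes available.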
Notes
1. A video of the proposed method applied in our lab can be found at: https://vimeo.com/134489154
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Grzeszick, R., Plinge, A., Fink, G.A. (2015). Temporal Acoustic Words for Online Acoustic Event Detection. In: Gall, J., Gehler, P., Leibe, B. (eds) Pattern Recognition. DAGM 2015. Lecture Notes in Computer Science, vol 9358. Springer, Cham. https://doi.org/10.1007/978-3-319-24947-6_12
DOI: https://doi.org/10.1007/978-3-319-24947-6_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24946-9
Online ISBN: 978-3-319-24947-6
eBook Packages: Computer Science (R0)