Abstract
The computation required for sensing and processing perceptual information can impose significant burdens on personal computer systems. We explore several policies for selective perception in SEER, a multimodal system for recognizing office activity that relies on a cascade of Hidden Markov Models (HMMs) called Layered Hidden Markov Models (LHMMs). We use LHMMs to diagnose states of a user’s activity based on real-time streams of evidence from video, audio, and computer (keyboard and mouse) interactions. We review our efforts to employ expected value of information (EVI) to limit sensing and analysis in a context-sensitive manner. We discuss an implementation of a greedy EVI analysis and compare its results with those of a heuristic sensing policy that makes observations at different frequencies. Both policies are then compared to a random perception policy, in which sensors are selected at random. Finally, we discuss the sensitivity of ideal perceptual actions to preferences encoded in utility models about information value and the cost of sensing.
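To make the idea of a greedy EVI analysis concrete, the following is a minimal sketch under simplifying assumptions; it is not the implementation used in SEER. It assumes a single discrete hidden activity state rather than an LHMM, a discrete observation model per sensor, and a utility table over (declared activity, true activity) pairs. The sensor names, likelihoods, costs, and utilities are hypothetical values chosen only for illustration.

```python
from typing import Dict, List, Optional

Belief = List[float]        # belief[s] = p(state = s)
Matrix = List[List[float]]  # row-major table of floats


def expected_utility(weights: Belief, utility: Matrix) -> float:
    """Max over actions a of sum_s weights[s] * utility[a][s]."""
    return max(sum(w * u for w, u in zip(weights, row)) for row in utility)


def evi(belief: Belief, likelihood: Matrix, utility: Matrix) -> float:
    """Expected value of information for one sensor.

    likelihood[s][o] = P(observation = o | state = s).
    """
    n_obs = len(likelihood[0])
    with_obs = 0.0
    for o in range(n_obs):
        # Unnormalized posterior p(s) * P(o | s); its sum is P(o).
        joint = [p * likelihood[s][o] for s, p in enumerate(belief)]
        # Equals P(o) times the best expected utility under the posterior.
        with_obs += expected_utility(joint, utility)
    return with_obs - expected_utility(belief, utility)


def greedy_sensor(belief: Belief, sensors: Dict[str, Matrix],
                  costs: Dict[str, float], utility: Matrix) -> Optional[str]:
    """Return the sensor with the highest positive net value (EVI - cost)."""
    best_name, best_net = None, 0.0
    for name, likelihood in sensors.items():
        net = evi(belief, likelihood, utility) - costs[name]
        if net > best_net:
            best_name, best_net = name, net
    return best_name


# Hypothetical two-state example (e.g. "conversation" vs "no conversation").
UTILITY = [[1.0, 0.0], [0.0, 1.0]]  # reward 1 for declaring the true state
SENSORS = {
    "audio": [[0.9, 0.1], [0.1, 0.9]],  # informative (assumed numbers)
    "video": [[0.6, 0.4], [0.4, 0.6]],  # weakly informative (assumed numbers)
}
COSTS = {"audio": 0.10, "video": 0.05}  # assumed sensing costs in utility units

print(greedy_sensor([0.5, 0.5], SENSORS, COSTS, UTILITY))
print(greedy_sensor([0.99, 0.01], SENSORS, COSTS, UTILITY))
```

With these made-up numbers, an uncertain belief selects the audio sensor, while a confident belief selects no sensor at all, because no single observation could change the best decision and so cannot repay its cost. This is the context sensitivity the abstract refers to; the analysis is myopic in that it evaluates one sensor at a time rather than combinations of sensors.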
© ACM, 2003. This is a minor revision of the work published in the 5th International Conference on Multimodal Interfaces, 2003, pp. 36–43. http://doi.acm.org/10.1145/958432.958442.
© Elsevier, 2004. Parts of the paper have been reprinted from “Layered Representations for Learning and Inferring Office Activity from Multiple Sensory Channels”, Oliver et al., published in Computer Vision and Image Understanding, Volume 96, No. 2, 2004, pp. 163–180.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Oliver, N., Horvitz, E. (2005). S-SEER: Selective Perception in a Multimodal Office Activity Recognition System. In: Bengio, S., Bourlard, H. (eds) Machine Learning for Multimodal Interaction. MLMI 2004. Lecture Notes in Computer Science, vol 3361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30568-2_11
DOI: https://doi.org/10.1007/978-3-540-30568-2_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24509-4
Online ISBN: 978-3-540-30568-2
eBook Packages: Computer Science, Computer Science (R0)