Abstract
Detection, tracking, and understanding of moving objects of interest in dynamic scenes have been active research areas in computer vision over the past decades. Intelligent visual surveillance (IVS) refers to an automated visual monitoring process that involves analysis and interpretation of object behaviors, as well as object detection and tracking, to understand the visual events of a scene. The main tasks of IVS are scene interpretation and wide-area surveillance control. Scene interpretation aims at detecting and tracking moving objects in an image sequence and understanding their behaviors. In the wide-area surveillance control task, multiple cameras or agents are controlled cooperatively to monitor tagged objects in motion. This paper reviews recent advances in these tasks and outlines future research directions. It consists of two parts: the first surveys image enhancement, moving object detection and tracking, and motion behavior understanding; the second reviews wide-area surveillance techniques based on the fusion of multiple visual sensors, camera calibration, and cooperative camera systems.
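Moving object detection, the first step of scene interpretation described above, is commonly approached by background subtraction. As a minimal illustrative sketch (not the specific method of any system surveyed here), the following Python function maintains a running-average background model and flags pixels that deviate from it; the function name, parameters, and thresholds are illustrative assumptions.

```python
import numpy as np

def detect_moving_pixels(frames, alpha=0.05, threshold=30.0):
    """Classify pixels as foreground using a running-average background model.

    frames: iterable of 2-D grayscale arrays.
    alpha: background learning rate (fraction of the new frame blended in).
    threshold: absolute intensity difference separating foreground pixels.
    Returns a list of boolean masks, one per frame (True = moving pixel).
    """
    background = None
    masks = []
    for frame in frames:
        frame = frame.astype(np.float64)
        if background is None:
            # Bootstrap the model from the first frame; nothing moves yet.
            background = frame.copy()
            masks.append(np.zeros(frame.shape, dtype=bool))
            continue
        diff = np.abs(frame - background)
        mask = diff > threshold  # pixels deviating from the background model
        # Update the model only where the scene looks static, so a moving
        # object is not absorbed into the background too quickly.
        background[~mask] = ((1 - alpha) * background + alpha * frame)[~mask]
        masks.append(mask)
    return masks
```

Real surveillance systems surveyed in this paper replace this single-Gaussian-like model with more robust ones (e.g., mixtures of Gaussians) to handle lighting changes and dynamic backgrounds.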
Recommended by Guest Co-Editor Mongi Abidi.
In Su Kim received his B.S. in the School of Electrical Engineering and Computer Science from Hanyang University, Seoul, Korea, in 2003. He is currently a Ph.D. candidate in the School of Electrical Engineering and Computer Science at Seoul National University. His research interests include computer vision, pattern recognition, motion detection, and object classification for embedded visual surveillance systems.
Hong Seok Choi received his B.S. and M.S. degrees in the School of Electrical Engineering from Seoul National University, Seoul, Korea, in 2000 and 2002, respectively. He is working toward the Ph.D. degree in the School of Electrical Engineering and Computer Science at Seoul National University. His research interests are in the areas of computer vision, pattern recognition, and embedded surveillance systems.
Kwang Moo Yi received his B.S. degree in the Department of Electrical Engineering and Computer Science from Seoul National University, Seoul, Korea, in 2007. He is currently a Ph.D. candidate in the Department of Electrical Engineering and Computer Science at Seoul National University. His research interests include computer vision, visual tracking, motion segmentation, motion detection, and unsupervised Bayesian learning.
Jin Young Choi received his B.S., M.S., and Ph.D. degrees in Electrical Engineering and Computer Science from Seoul National University in 1982, 1984, and 1993, respectively. He was a researcher at the Electronics and Telecommunications Research Institute (ETRI) from 1984 to 1994 and a visiting professor at the University of California, Riverside, from 1998 to 1999. He is currently a professor at the School of Electrical Engineering and Computer Science at Seoul National University, Korea, and director of the Perception and Intelligence Research Center. His research interests include visual surveillance, intelligent systems, and adaptive control.
Seong G. Kong received his B.S. and M.S. degrees in Electrical Engineering from Seoul National University, Seoul, Korea, in 1982 and 1987, respectively, and his Ph.D. degree in Electrical Engineering from the University of Southern California, Los Angeles, in 1991. From 1992 to 2000, he was an Associate Professor of Electrical Engineering at Soongsil University, Seoul, Korea, where he chaired the department from 1998 to 2000. During 2000–2001, he was a Visiting Scholar with the School of Electrical and Computer Engineering at Purdue University, West Lafayette, IN. From 2002 to 2007, he was an Associate Professor at the Department of Electrical and Computer Engineering, University of Tennessee, Knoxville. Currently, he is an Associate Professor at the Department of Electrical and Computer Engineering, Temple University, Philadelphia, PA. He has published more than 70 refereed journal articles, conference papers, and book chapters in the areas of image processing, pattern recognition, and intelligent systems. Dr. Kong was the Editor-in-Chief of the Journal of Fuzzy Logic and Intelligent Systems from 1996 to 1999 and is an Associate Editor of the IEEE Transactions on Neural Networks. He received the Award for Academic Excellence from the Korea Fuzzy Logic and Intelligent Systems Society in 2000 and the Most Cited Paper Award from the journal Computer Vision and Image Understanding in 2007 and 2008, two years in a row. He is a Technical Committee member of the IEEE Computational Intelligence Society.
Cite this article
Kim, I.S., Choi, H.S., Yi, K.M. et al. Intelligent visual surveillance — A survey. Int. J. Control Autom. Syst. 8, 926–939 (2010). https://doi.org/10.1007/s12555-010-0501-4