HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion

Sigal, Leonid; Balan, Alexandru O.; Black, Michael J.

doi:10.1007/s11263-009-0273-6

Published: 05 August 2009

International Journal of Computer Vision, Volume 87, pages 4–27 (2010)



Abstract

While research on articulated human motion and pose estimation has progressed rapidly in the last few years, there has been no systematic quantitative evaluation of competing methods to establish the current state of the art. We present data obtained using a hardware system that is able to capture synchronized video and ground-truth 3D motion. The resulting HumanEva datasets contain multiple subjects performing a set of predefined actions with a number of repetitions. On the order of 40,000 frames of synchronized motion capture and multi-view video (resulting in over one quarter million image frames in total) were collected at 60 Hz with an additional 37,000 time instants of pure motion capture data. A standard set of error measures is defined for evaluating both 2D and 3D pose estimation and tracking algorithms. We also describe a baseline algorithm for 3D articulated tracking that uses a relatively standard Bayesian framework with optimization in the form of Sequential Importance Resampling and Annealed Particle Filtering. In the context of this baseline algorithm we explore a variety of likelihood functions, prior models of human motion and the effects of algorithm parameters. Our experiments suggest that image observation models and motion priors play important roles in performance, and that in a multi-view laboratory environment, where initialization is available, Bayesian filtering tends to perform well. The datasets and the software are made available to the research community. This infrastructure will support the development of new articulated motion and pose estimation algorithms, will provide a baseline for the evaluation and comparison of new methods, and will help establish the current state of the art in human pose estimation and tracking.
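The abstract names Sequential Importance Resampling (SIR) as one of the optimization schemes in the baseline tracker and refers to error measures for 3D pose, but it does not give their definitions. The sketch below is a generic, textbook-style illustration under stated assumptions, not the authors' implementation: the state is a single scalar rather than a full body pose, the motion prior is a Gaussian random walk, the likelihood is Gaussian, and the error function is assumed to be the average Euclidean distance between corresponding 3D points. The names `sir_step`, `weighted_mean`, `mean_marker_error`, `motion_std` and `obs_std` are hypothetical. Annealed Particle Filtering is not covered here.

```python
import math
import random


def sir_step(particles, weights, observation, rng,
             motion_std=0.5, obs_std=1.0):
    """One generic SIR step for a scalar state (illustrative assumption only)."""
    n = len(particles)
    # 1. Resample: draw particles in proportion to their current weights.
    resampled = rng.choices(particles, weights=weights, k=n)
    # 2. Predict: diffuse each particle with an assumed random-walk motion prior.
    predicted = [p + rng.gauss(0.0, motion_std) for p in resampled]
    # 3. Update: weight each particle by an assumed Gaussian likelihood.
    likelihoods = [math.exp(-0.5 * ((observation - p) / obs_std) ** 2)
                   for p in predicted]
    total = sum(likelihoods)
    if total == 0.0:
        # Every likelihood underflowed; fall back to uniform weights.
        new_weights = [1.0 / n] * n
    else:
        new_weights = [value / total for value in likelihoods]
    return predicted, new_weights


def weighted_mean(particles, weights):
    """Point estimate of the state: the weight-averaged particle value."""
    return sum(p * w for p, w in zip(particles, weights))


def mean_marker_error(estimated, ground_truth):
    """Average Euclidean distance between corresponding 3D points.

    Assumed form of a 3D pose error; the abstract does not spell it out.
    """
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("point lists must be non-empty and of equal length")
    distances = [math.dist(e, g) for e, g in zip(estimated, ground_truth)]
    return sum(distances) / len(distances)


if __name__ == "__main__":
    rng = random.Random(0)
    particles = [rng.uniform(-10.0, 10.0) for _ in range(500)]
    weights = [1.0 / len(particles)] * len(particles)
    for _ in range(20):
        particles, weights = sir_step(particles, weights, 3.0, rng)
    print("state estimate:", weighted_mean(particles, weights))
    print("pose error:", mean_marker_error([(0, 0, 0), (1, 0, 0)],
                                           [(0, 3, 4), (1, 0, 0)]))
```

In a real articulated tracker the scalar state would be replaced by a vector of joint angles and global position, and the likelihood by an image observation model; the abstract reports that the choice of observation model and motion prior plays an important role in performance.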



Author information

Authors and Affiliations

Dept. of Computer Science, University of Toronto, 6 King’s College Rd, Toronto, ON, M5S 3H5, Canada
Leonid Sigal
Dept. of Computer Science, Brown University, 115 Waterman St, Box 1910, Providence, RI, 02912, USA
Alexandru O. Balan & Michael J. Black


Corresponding author

Correspondence to Leonid Sigal.

Additional information

This project was supported in part by gifts from Honda Research Institute and Intel Corporation. Funding for portions of this work was also provided by NSF grants IIS-0534858 and IIS-0535075. We would like to thank Ming-Hsuan Yang, Rui Li, Payman Yadollahpour and Stefan Roth for help in data collection and post-processing. We also would like to thank Stan Sclaroff for making the color video capture equipment available for this effort.

The first two authors contributed equally to this work.

The work of L. Sigal was conducted at Brown University.


About this article

Cite this article

Sigal, L., Balan, A.O. & Black, M.J. HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion. Int J Comput Vis 87, 4–27 (2010). https://doi.org/10.1007/s11263-009-0273-6


Received: 05 May 2008
Accepted: 10 July 2009
Published: 05 August 2009
Issue Date: March 2010
DOI: https://doi.org/10.1007/s11263-009-0273-6
