Abstract
We present a method to synthesize plausible video sequences of humans according to user-defined body motions and viewpoints. We first capture a small database of multi-view video sequences of an actor performing various basic motions. This database needs to be captured only once and serves as the input to our synthesis algorithm. We then apply a marker-less model-based performance capture approach to the entire database to obtain pose and geometry of the actor in each database frame. To create novel video sequences of the actor from the database, a user animates a 3D human skeleton with novel motion and viewpoints. Our technique then synthesizes a realistic video sequence of the actor performing the specified motion based only on the initial database. The first key component of our approach is a new efficient retrieval strategy to find appropriate spatio-temporally coherent database frames from which to synthesize target video frames. The second key component is a warping-based texture synthesis approach that uses the retrieved most-similar database frames to synthesize spatio-temporally coherent target video frames. For instance, this enables us to easily create video sequences of actors performing dangerous stunts without them being placed in harm's way. We show through a variety of result videos and a user study that we can synthesize realistic videos of people, even if the target motions and camera views are different from the database content.
Supplemental Material
Available for Download
- Ballan, L., and Cortelazzo, G. M. 2008. Marker-less motion capture of skinned models in a four camera set-up using optical flow and silhouettes. In 3DPVT.Google Scholar
- Ballan, L., Brostow, G. J., Puwein, J., and Pollefeys, M. 2010. Unstructured video-based rendering: Interactive exploration of casually captured videos. ACM TOG (Proc. SIGGRAPH), 1--11. Google Scholar
- Baran, I., and Popovic, J. 2007. Automatic rigging and animation of 3d characters. ACM TOG (SIGGRAPH) 26, 3, 72. Google ScholarDigital Library
- Bradley, D., Popa, T., Sheffer, A., Heidrich, W., and Boubekeur, T. 2008. Markerless garment capture. ACM TOG (Proc. SIGGRAPH) 27, 3, 99. Google ScholarDigital Library
- Buehler, C., Bosse, M., McMillan, L., Gortler, S., and Cohen, M. 2001. Unstructured lumigraph rendering. In SIGGRAPH, 425--432. Google Scholar
- Cagniart, C., Boyer, E., and Ilic, S. 2010. Free-form mesh tracking: a patch-based approach. In Proc. IEEE CVPR, 1--8.Google Scholar
- Carranza, J., Theobalt, C., Magnor, M., and Seidel, H.-P. 2003. Free-viewpoint video of human actors. In ACM TOG (Proc. SIGGRAPH). Google Scholar
- Celly, B., and Zordan, V. 2004. Animated people textures. In Proc. of CASA, 331--338.Google Scholar
- Cheung, G. 2003. Visual Hull Construction, Alignment and Refinement for Human Kinematic Modeling, Motion Capture and Rendering. PhD thesis, Carnegie Mellon University. Google Scholar
- Cobzas, D., Yerex, K., and Jagersand, M. 2002. Dynamic textures for image-based rendering of fine-scale 3d structure and animation of non-rigid motion. In In Eurographics, 1067--7055.Google Scholar
- de Aguiar, E., Stoll, C., Theobalt, C., Ahmed, N., Seidel, H.-P., and Thrun, S. 2008. Performance capture from sparse multi-view video. ACM TOG (SIGGRAPH) 27, 1--10. Google ScholarDigital Library
- Debevec, P. E., Taylor, C. J., and Malik, J. 1996. Modeling and rendering architecture from photographs: A hybrid geometry- and image-based approach. In SIGGRAPH, 11--20. Google Scholar
- Einarsson, P., Chabert, C.-F., Jones, A., Ma, W.-C., Lamond, B., im Hawkins, Bolas, M., Sylwan, S., and Debevec, P. 2006. Relighting human locomotion with flowed reflectance fields. In Proc. EGSR, 183--194. Google Scholar
- Flagg, M., Nakazawa, A., Zhang, Q., Kang, S. B., Ryu, Y. K., Essa, I., and Rehg, J. M. 2009. Human video textures. In Proc. of I3D, 199--206. Google Scholar
- Gall, J., Stoll, C., Aguiar, E., Theobalt, C., Rosenhahn, B., and Seidel, H.-P. 2009. Motion capture using joint skeleton tracking and surface estimation. In Proc. IEEE CVPR, 1746--1753.Google Scholar
- Gleicher, M. 1998. Retargetting motion to new characters. In SIGGRAPH '98, 33--42. Google Scholar
- Hornung, A., and Kobbelt, L. 2009. Interactive pixel-accurate free viewpoint rendering from images with silhouette aware sampling. Comput. Graph. Forum 28, 8, 2090--2103.Google ScholarCross Ref
- Hornung, A., Dekkers, E., and Kobbelt, L. 2007. Character animation from 2d pictures and 3d motion data. ACM TOG 26, 1, 1:1--1:9. Google Scholar
- Huang, P., Hilton, A., and Starck, J. 2009. Human motion synthesis from 3d video. In Proc. CVPR, 1478--1485.Google Scholar
- Jain, A., Thormählen, T., Seidel, H.-P., and Theobalt, C. 2010. Moviereshape: tracking and reshaping of humans in videos. ACM TOG (Proc. SIGGRAPH Asia) 29, 148:1--148:10. Google Scholar
- Jimenez, J., Scully, T., Barbosa, N., Donner, C., Alvarez, X., Vieira, T., Matts, P., Orvalho, V., Gutierrez, D., and Weyrich, T. 2010. A practical appearance model for dynamic facial color. ACM TOG (Proc. SIGGRAPH Asia) 29, 141:1--141:10. Google Scholar
- Kemelmacher-Shlizerman, I., Sankar, A., Shechtman, E., and Seitz, S. M. 2010. Being john malkovich. In Proc. of ECCV, 341--353. Google ScholarDigital Library
- Leyvand, T., Cohen-Or, D., Dror, G., and Lischinski, D. 2008. Data-driven enhancement of facial attractiveness. ACM TOG (Proc. SIGGRAPH) 27, 3, 38:1--38:9. Google Scholar
- Matusik, W., Buehler, C., Raskar, R., Gortler, S. J., and McMillan, L. 2000. Image-based visual hulls. SIGGRAPH '00, 369--374. Google ScholarDigital Library
- Mori, G., Berg, A., Efros, A., Eden, A., and Malik, J. 2004. Video based motion synthesis by splicing and morphing. UC Berkeley Technical Reports, No. UCB/CSD-4-1337.Google Scholar
- Narayanan, P. J., Rander, P., and Kanade, T. 1998. Constructing virtual worlds using dense stereo. In Proc. of ICCV, 3--10. Google ScholarDigital Library
- Schaefer, S., McPhail, T., and Warren, J. D. 2006. Image deformation using moving least squares. ACM TOG (Proc. SIGGRAPH) 25, 3, 533--540. Google ScholarDigital Library
- Schödl, A., and Essa, I. 2002. Controlled animation of video sprites. In Proc. of SCA, 121--127. Google Scholar
- Schödl, A., Szeliski, R., Salesin, D. H., and Essa, I. 2000. Video textures. In SIGGRAPH, 489--498. Google Scholar
- Starck, J., Miller, G., and Hilton, A. 2005. Video-based character animation. In Proc. of SCA, 49--58. Google Scholar
- Stich, T., Linz, C., Albuquerque, G., and Magnor, M. 2008. View and Time Interpolation in Image Space. Computer Graphics Forum (Proc. Pacific Graphics) 27, 7.Google Scholar
- Stoll, C., Gall, J., de Aguiar, E., Thrun, S., and Theobalt, C. 2010. Video-based reconstruction of animatable human characters. ACM TOG (Proc. SIGGRAPH Asia) 29, 139:1--139:10. Google Scholar
- Theobalt, C., Wuermlin, S., de Aguiar, E., and Nieder-berger, C. 2007. New trends in 3d video. In Eurographics Courses.Google Scholar
- Tung, T., Nobuhara, S., and Matsuyama, T. 2009. Complete multi-view reconstruction of dynamic scenes from probabilistic fusion of narrow and wide baseline stereo. In Proc. IEEE ICCV, 1709--1716.Google Scholar
- Vlasic, D., Baran, I., Matusik, W., and Popović, J. 2008. Articulated mesh animation from multi-view silhouettes. ACM TOG (Proc. SIGGRAPH '08). Google Scholar
- Vlasic, D., Peers, P., Baran, I., Debevec, P., Popović, J., Rusinkiewicz, S., and Matusik, W. 2009. Dynamic shape capture using multi-view photometric stereo. In ACM TOG (Proc. SIGGRAPH Asia '09). Google Scholar
- Waschbüsch, M., Würmlin, S., and Gross, M. 2006. Interactive 3d video editing. Vis. Comput. 22, 631--641. Google ScholarDigital Library
- Weyrich, T., Pfister, H., and Gross, M. 2005. Rendering deformable surface reflectance fields. IEEE TVCG 11, 48--58. Google Scholar
- Wilburn, B., Joshi, N., Vaish, V., Talvala, E.-V., Antunez, E., Barth, A., Adams, A., Horowitz, M., and Levoy, M. 2005. High performance imaging using large camera arrays. ACM TOG (Proc. SIGGRAPH) 24, 765--776. Google ScholarDigital Library
- Zhou, S., Fu, H., Liu, L., Cohen-Or, D., and Han, X. 2010. Parametric reshaping of human bodies in images. ACM TOG (Proc. SIGGRAPH) 29, 4, 126:1--126:10. Google Scholar
- Zitnick, C. L., Kang, S. B., Uyttendaele, M., Winder, S. A. J., and Szeliski, R. 2004. High-quality video view interpolation using a layered representation. ACM TOG (Proc. SIGGRAPH) 23, 3, 600--608. Google ScholarDigital Library
Index Terms
- Video-based characters: creating new human performances from a multi-view video database
Recommendations
Video-based characters: creating new human performances from a multi-view video database
SIGGRAPH '11: ACM SIGGRAPH 2011 papersWe present a method to synthesize plausible video sequences of humans according to user-defined body motions and viewpoints. We first capture a small database of multi-view video sequences of an actor performing various basic motions. This database ...
Video-based reconstruction of animatable human characters
We present a new performance capture approach that incorporates a physically-based cloth model to reconstruct a rigged fully-animatable virtual double of a real person in loose apparel from multi-view video recordings. Our algorithm only requires a ...
Video-based reconstruction of animatable human characters
SIGGRAPH ASIA '10: ACM SIGGRAPH Asia 2010 papersWe present a new performance capture approach that incorporates a physically-based cloth model to reconstruct a rigged fully-animatable virtual double of a real person in loose apparel from multi-view video recordings. Our algorithm only requires a ...
Comments