Learning Visibility for Robust Dense Human Body Estimation

Yao, Chun-Han; Yang, Jimei; Ceylan, Duygu; Zhou, Yi; Zhou, Yang; Yang, Ming-Hsuan

doi:10.1007/978-3-031-19769-7_24

Learning Visibility for Robust Dense Human Body Estimation

Chun-Han Yao¹²,
Jimei Yang¹³,
Duygu Ceylan¹³,
Yi Zhou¹³,
Yang Zhou¹³ &
…
Ming-Hsuan Yang^12,14,15

Conference paper
First Online: 23 October 2022

3044 Accesses
3 Citations

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13661))

Abstract

Estimating 3D human pose and shape from 2D images is a crucial yet challenging task. While prior methods with model-based representations can perform reasonably well on whole-body images, they often fail when parts of the body are occluded or outside the frame. Moreover, these results usually do not faithfully capture the human silhouettes due to their limited representation power of deformable models (e.g., representing only the naked body). An alternative approach is to estimate dense vertices of a predefined template body in the image space. Such representations are effective in localizing vertices within an image but cannot handle out-of-frame body parts. In this work, we learn dense human body estimation that is robust to partial observations. We explicitly model the visibility of human joints and vertices in the x, y, and z axes separately. The visibility in x and y axes help distinguishing out-of-frame cases, and the visibility in depth axis corresponds to occlusions (either self-occlusions or occlusions by other objects). We obtain pseudo ground-truths of visibility labels from dense UV correspondences and train a neural network to predict visibility along with 3D coordinates. We show that visibility can serve as 1) an additional signal to resolve depth ordering ambiguities of self-occluded vertices and 2) a regularization term when fitting a human body model to the predictions. Extensive experiments on multiple 3D human datasets demonstrate that visibility modeling significantly improves the accuracy of human body estimation, especially for partial-body cases. Our project page with code is at: https://github.com/chhankyao/visdb.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 89.00; Price excludes VAT (USA)

Softcover Book: USD 119.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

Alldieck, T., Pons-Moll, G., Theobalt, C., Magnor, M.: Tex2Shape: detailed full human body geometry from a single image. In: ICCV (2019)
Google Scholar
Bogo, F., Kanazawa, A., Lassner, C., Gehler, P., Romero, J., Black, M.J.: Keep it SMPL: automatic estimation of 3D human pose and shape from a single image. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 561–578. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46454-1_34
Chapter Google Scholar
Choi, H., Moon, G., Chang, J.Y., Lee, K.M.: Beyond static features for temporally consistent 3D human pose and shape from a video. In: CVPR, pp. 1964–1973 (2021)
Google Scholar
Choi, H., Moon, G., Lee, K.M.: Pose2Mesh: graph convolutional network for 3D human pose and mesh recovery from a 2D human pose. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12352, pp. 769–787. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58571-6_45
Chapter Google Scholar
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR, pp. 248–255 (2009)
Google Scholar
Dwivedi, S.K., Athanasiou, N., Kocabas, M., Black, M.J.: Learning to regress bodies from images using differentiable semantic rendering. In: ICCV, pp. 11250–11259 (2021)
Google Scholar
Guler, R.A., Kokkinos, I.: Holopose: holistic 3D human reconstruction in-the-wild. In: CVPR, pp. 10884–10894 (2019)
Google Scholar
Güler, R.A., Neverova, N., Kokkinos, I.: Densepose: dense human pose estimation in the wild. In: CVPR, pp. 7297–7306 (2018)
Google Scholar
Hassan, M., Choutas, V., Tzionas, D., Black, M.J.: Resolving 3D human pose ambiguities with 3D scene constraints. In: ICCV, pp. 2282–2292 (2019)
Google Scholar
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV, pp. 2961–2969 (2017)
Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Google Scholar
Ionescu, C., Papava, D., Olaru, V., Sminchisescu, C.: Human3. 6m: large scale datasets and predictive methods for 3D human sensing in natural environments. PAMI 36(7), 1325–1339 (2013)
Google Scholar
Kanazawa, A., Black, M.J., Jacobs, D.W., Malik, J.: End-to-end recovery of human shape and pose. In: CVPR, pp. 7122–7131 (2018)
Google Scholar
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Kocabas, M., Athanasiou, N., Black, M.J.: Vibe: video inference for human body pose and shape estimation. In: CVPR, pp. 5253–5263 (2020)
Google Scholar
Kocabas, M., Huang, C.H.P., Hilliges, O., Black, M.J.: PARE: part attention regressor for 3D human body estimation. In: ICCV, pp. 11127–11137 (2021)
Google Scholar
Kolotouros, N., Pavlakos, G., Black, M.J., Daniilidis, K.: Learning to reconstruct 3D human pose and shape via model-fitting in the loop. In: ICCV, pp. 2252–2261 (2019)
Google Scholar
Kolotouros, N., Pavlakos, G., Daniilidis, K.: Convolutional mesh regression for single-image human shape reconstruction. In: CVPR, pp. 4501–4510 (2019)
Google Scholar
Kolotouros, N., Pavlakos, G., Jayaraman, D., Daniilidis, K.: Probabilistic modeling for human mesh recovery. In: ICCV (2021)
Google Scholar
Lassner, C., Romero, J., Kiefel, M., Bogo, F., Black, M.J., Gehler, P.V.: Unite the people: closing the loop between 3D and 2D human representations. In: CVPR, pp. 6050–6059 (2017)
Google Scholar
Lin, K., Wang, L., Liu, Z.: End-to-end human pose and mesh reconstruction with transformers. In: CVPR, pp. 1954–1963 (2021)
Google Scholar
Lin, K., Wang, L., Liu, Z.: Mesh graphormer. In: ICCV (2021)
Google Scholar
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Chapter Google Scholar
Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multi-person linear model. TOG 34(6), 1–16 (2015)
Article Google Scholar
von Marcard, T., Henschel, R., Black, M.J., Rosenhahn, B., Pons-Moll, G.: Recovering accurate 3D human pose in the wild using IMUs and a moving camera. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11214, pp. 614–631. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01249-6_37
Chapter Google Scholar
Mehta, D., et al.: Single-shot multi-person 3D pose estimation from monocular RGB. In: 3DV, pp. 120–130 (2018)
Google Scholar
Moon, G., Chang, J.Y., Lee, K.M.: V2V-posenet: voxel-to-voxel prediction network for accurate 3D hand and human pose estimation from a single depth map. In: CVPR, pp. 5079–5088 (2018)
Google Scholar
Moon, G., Chang, J.Y., Lee, K.M.: Camera distance-aware top-down approach for 3D multi-person pose estimation from a single RGB image. In: ICCV, pp. 10133–10142 (2019)
Google Scholar
Moon, G., Lee, K.M.: I2L-MeshNet: image-to-Lixel prediction network for accurate 3D human pose and mesh estimation from a single RGB image. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12352, pp. 752–768. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58571-6_44
Chapter Google Scholar
Muller, L., Osman, A.A., Tang, S., Huang, C.H.P., Black, M.J.: On self-contact and human pose. In: CVPR, pp. 9990–9999 (2021)
Google Scholar
Omran, M., Lassner, C., Pons-Moll, G., Gehler, P., Schiele, B.: Neural body fitting: unifying deep learning and model based human pose and shape estimation. In: 3DV, pp. 484–494 (2018)
Google Scholar
Paszke, A., et al.: Pytorch: an imperative style, high-performance deep learning library. NeurIPS 32, 8026–8037 (2019)
Google Scholar
Pavlakos, G., et al.: Expressive body capture: 3D hands, face, and body from a single image. In: CVPR, pp. 10975–10985 (2019)
Google Scholar
Pavlakos, G., Kolotouros, N., Daniilidis, K.: Texturepose: supervising human mesh estimation with texture consistency. In: ICCV, pp. 803–812 (2019)
Google Scholar
Pavlakos, G., Zhu, L., Zhou, X., Daniilidis, K.: Learning to estimate 3D human pose and shape from a single color image. In: CVPR, pp. 459–468 (2018)
Google Scholar
Rockwell, C., Fouhey, D.F.: Full-body awareness from partial observations. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12362, pp. 522–539. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58520-4_31
Chapter Google Scholar
Saito, S., Huang, Z., Natsume, R., Morishima, S., Kanazawa, A., Li, H.: Pifu: pixel-aligned implicit function for high-resolution clothed human digitization. In: ICCV, pp. 2304–2314 (2019)
Google Scholar
Saito, S., Simon, T., Saragih, J., Joo, H.: PifuHD: multi-level pixel-aligned implicit function for high-resolution 3d human digitization. In: CVPR, pp. 84–93 (2020)
Google Scholar
Sun, X., Xiao, B., Wei, F., Liang, S., Wei, Y.: Integral human pose regression. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11210, pp. 536–553. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01231-1_33
Chapter Google Scholar
Sun, Y., Bao, Q., Liu, W., Fu, Y., Black, M.J., Mei, T.: Monocular, one-stage, regression of multiple 3D people. In: ICCV, pp. 11179–11188 (2021)
Google Scholar
Varol, G., et al.: BodyNet: volumetric inference of 3D human body shapes. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 20–38. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_2
Chapter Google Scholar
Vaswani, A., et al.: Attention is all you need. In: NeurIPS, pp. 5998–6008 (2017)
Google Scholar
Xu, Y., Zhu, S.C., Tung, T.: DenseRAC: joint 3D pose and shape estimation by dense render-and-compare. In: ICCV, pp. 7760–7770 (2019)
Google Scholar
Zeng, W., Ouyang, W., Luo, P., Liu, W., Wang, X.: 3D human mesh regression with dense correspondence. In: CVPR, pp. 7054–7063 (2020)
Google Scholar
Zhang, T., Huang, B., Wang, Y.: Object-occluded human shape and pose estimation from a single color image. In: CVPR, pp. 7376–7385 (2020)
Google Scholar
Zhou, X., Zhu, M., Pavlakos, G., Leonardos, S., Derpanis, K.G., Daniilidis, K.: MonoCap: monocular human motion capture using a CNN coupled with a geometric prior. PAMI 41(4), 901–914 (2018)
Article Google Scholar

Download references

Author information

Authors and Affiliations

UC Merced, Merced, USA
Chun-Han Yao & Ming-Hsuan Yang
Adobe, San Jose, USA
Jimei Yang, Duygu Ceylan, Yi Zhou & Yang Zhou
Google, Mountain View, USA
Ming-Hsuan Yang
Yonsei University, Seoul, South Korea
Ming-Hsuan Yang

Authors

Chun-Han Yao
View author publications
You can also search for this author in PubMed Google Scholar
Jimei Yang
View author publications
You can also search for this author in PubMed Google Scholar
Duygu Ceylan
View author publications
You can also search for this author in PubMed Google Scholar
Yi Zhou
View author publications
You can also search for this author in PubMed Google Scholar
Yang Zhou
View author publications
You can also search for this author in PubMed Google Scholar
Ming-Hsuan Yang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Chun-Han Yao .

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 11472 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Yao, CH., Yang, J., Ceylan, D., Zhou, Y., Zhou, Y., Yang, MH. (2022). Learning Visibility for Robust Dense Human Body Estimation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13661. Springer, Cham. https://doi.org/10.1007/978-3-031-19769-7_24

Download citation

DOI: https://doi.org/10.1007/978-3-031-19769-7_24
Published: 23 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19768-0
Online ISBN: 978-3-031-19769-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics