Abstract
We propose a novel framework to reconstruct super-resolution human shape from a single low-resolution input image. The framework overcomes a limitation of existing methods for 3D human shape reconstruction from a single image, which require high-resolution images together with auxiliary data such as surface normals or a parametric model to reconstruct high-detail shape. The proposed framework represents the reconstructed shape with a high-detail implicit function. Analogous to the objective of 2D image super-resolution, the approach learns the mapping from a low-resolution shape to its high-resolution counterpart, and this mapping is applied to reconstruct 3D shape detail from low-resolution images. The approach is trained end-to-end with a novel loss function that estimates the information lost between low- and high-resolution representations of the same 3D surface shape. Evaluation on single-image reconstruction of clothed people demonstrates that our method achieves high-detail surface reconstruction from low-resolution images without auxiliary data. Extensive experiments show that the proposed approach can estimate super-resolution human geometries with a significantly higher level of detail than previous approaches when applied to low-resolution images. Project page: https://marcopesavento.github.io/SuRS/.
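To make the idea of comparing a low-detail and a high-detail implicit representation of the same surface concrete, the following is a minimal toy sketch. It is not the network or the loss function of the paper: the two analytic shapes, the function names (`occupancy_low`, `occupancy_high`, `occupancy_difference`) and all parameters are assumptions chosen purely for illustration. An implicit occupancy function labels a 3D point as inside (1) or outside (0) a surface, and the disagreement between two such functions at sampled points gives a rough measure of the detail that the smoother one lacks.

```python
import numpy as np

def occupancy_low(points, radius=0.5):
    """Occupancy of a smooth sphere: 1 inside, 0 outside.

    Hypothetical stand-in for a low-detail shape.
    """
    return (np.linalg.norm(points, axis=1) < radius).astype(np.float32)

def occupancy_high(points, radius=0.5, amplitude=0.05, frequency=8.0):
    """Occupancy of the same sphere with a small radial ripple.

    Hypothetical stand-in for a shape with fine surface detail.
    """
    r = np.linalg.norm(points, axis=1)
    # The azimuth angle around the z axis drives the ripple.
    theta = np.arctan2(points[:, 1], points[:, 0])
    return (r < radius + amplitude * np.sin(frequency * theta)).astype(np.float32)

def occupancy_difference(points):
    """Mean absolute disagreement between the two occupancy fields."""
    return float(np.mean(np.abs(occupancy_high(points) - occupancy_low(points))))

# Sample query points uniformly in the cube [-1, 1]^3 (an illustrative choice).
rng = np.random.default_rng(0)
samples = rng.uniform(-1.0, 1.0, size=(10000, 3))
diff = occupancy_difference(samples)
print(f"fraction of sampled points where the two shapes disagree: {diff:.4f}")
```

The two fields disagree only in a thin shell around the surface, which is where fine geometric detail lives. A learned method would replace the analytic functions with network predictions conditioned on image features.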
Acknowledgement
This research was supported by UKRI EPSRC Platform Grant EP/P022529/1.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Pesavento, M., Volino, M., Hilton, A. (2022). Super-Resolution 3D Human Shape from a Single Low-Resolution Image. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13662. Springer, Cham. https://doi.org/10.1007/978-3-031-20086-1_26
DOI: https://doi.org/10.1007/978-3-031-20086-1_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20085-4
Online ISBN: 978-3-031-20086-1
eBook Packages: Computer Science, Computer Science (R0)