DOI: 10.1145/3594806.3594841
research-article
Open Access

Compacting MocapNET-based 3D Human Pose Estimation via Dimensionality Reduction

Published: 10 August 2023

ABSTRACT

MocapNETs are state-of-the-art Neural Network (NN) ensembles that estimate 3D human pose from visual input in the form of an RGB image. They do so by deriving a 3D Bio Vision Hierarchy (BVH) skeleton from estimated 2D human body joint projections. BVH output makes MocapNETs directly compatible with a large variety of 3D graphics engines, where virtual avatars can be animated directly from RGB sources and off-the-shelf webcam input. MocapNETs have satisfactory accuracy and state-of-the-art computational performance which, prior to this work, was nevertheless not sufficient for their deployment on embedded devices. In this paper we explore dimensionality reduction via Principal Component Analysis (PCA) as a means to reduce their size and make them applicable to mobile and edge devices. PCA allows (a) reduction of input dimensionality, (b) fine-grained control over the variance covered by the retained dimensions, and (c) drastic reduction of the total number of model/network parameters without compromising regression accuracy. Extensive experiments on the CMU BVH dataset provide insight into the effective receptive fields of densely connected networks. Moreover, PCA-based dimensionality reduction results in an NN that is 35% smaller than the baseline (the original NN without any dimensionality reduction) and derives BVH skeletons without accuracy degradation. As such, the proposed compact NN solution becomes deployable on the Raspberry Pi 4 ARM CPU, running at 23 Hz.
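
The three roles the abstract attributes to PCA (fewer input dimensions, control over retained variance, fewer parameters) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the synthetic data stands in for 2D joint descriptors, and the names `fit_pca`, `project`, `dense_layer_params`, the 99% variance target, and the 64-unit first layer are all assumptions chosen for illustration.

```python
import numpy as np


def fit_pca(X, variance_to_keep):
    """Fit PCA on the rows of X and keep the fewest components
    whose cumulative explained variance reaches variance_to_keep."""
    mean = X.mean(axis=0)
    centered = X - mean
    # SVD of the centered data: the rows of vt are the principal directions.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values ** 2
    ratio = np.cumsum(explained) / explained.sum()
    # Smallest k whose cumulative explained-variance ratio reaches the target.
    k = int(np.searchsorted(ratio, variance_to_keep) + 1)
    k = min(k, len(ratio))
    return mean, vt[:k]


def project(X, mean, components):
    """Map inputs into the reduced PCA space."""
    return (X - mean) @ components.T


def dense_layer_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units


# Hypothetical stand-in data: 500 samples of a 40-dimensional input
# that is really driven by 5 latent factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 5))
mixing = rng.normal(size=(5, 40))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 40))

mean, components = fit_pca(X, variance_to_keep=0.99)
Z = project(X, mean, components)

# Parameter count of an assumed 64-unit first layer, before and after PCA.
params_before = dense_layer_params(X.shape[1], 64)
params_after = dense_layer_params(Z.shape[1], 64)
print(f"input dims: {X.shape[1]} -> {Z.shape[1]}")
print(f"first-layer parameters: {params_before} -> {params_after}")
```

Raising or lowering `variance_to_keep` trades retained variance against input size, which is the kind of fine-grained control item (b) refers to. Only the first layer shrinks in this sketch, so it says nothing about the 35% whole-network figure reported above.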

          • Published in

            PETRA '23: Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments
            July 2023
            797 pages
            ISBN: 9798400700699
            DOI: 10.1145/3594806

            Copyright © 2023 Owner/Author

            This work is licensed under a Creative Commons Attribution International 4.0 License.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Qualifiers

            • research-article
            • Research
            • Refereed limited