Compacting MocapNET-based 3D Human Pose Estimation via Dimensionality Reduction

Authors:
Ammar Qammaz (ORCID: 0000-0002-1292-5866)

Institute of Computer Science, Foundation for Research and Technology - Hellas, Greece and Computer Science Department, University of Crete, Greece

Antonis Argyros (ORCID: 0000-0001-8230-3192)

Institute of Computer Science, Foundation for Research and Technology - Hellas, Greece and Computer Science Department, University of Crete, Greece

PETRA '23: Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments, July 2023, Pages 306–312. https://doi.org/10.1145/3594806.3594841

Published: 10 August 2023


ABSTRACT

MocapNETs are state-of-the-art Neural Network (NN) ensembles that estimate 3D human pose from visual input in the form of an RGB image. They do so by deriving a 3D Bio Vision Hierarchy (BVH) skeleton from estimated 2D human body joint projections. BVH output makes MocapNETs directly compatible with a large variety of 3D graphics engines, where virtual avatars can be animated directly from RGB sources and off-the-shelf webcam input. MocapNETs offer satisfactory accuracy and state-of-the-art computational performance; prior to this work, however, that performance was not sufficient for deployment on embedded devices. In this paper we explore dimensionality reduction via Principal Component Analysis (PCA) as a means to reduce their size and make them applicable to mobile and edge devices. PCA allows (a) reduction of input dimensionality, (b) fine-grained control over the variance covered by the retained dimensions, and (c) drastic reduction of the total number of model/network parameters without compromising regression accuracy. Extensive experiments on the CMU BVH dataset provide insight into the effective receptive fields of densely connected networks. Moreover, PCA-based dimensionality reduction results in an NN that is 35% smaller than the baseline (the original NN without any dimensionality reduction) and derives BVH skeletons without accuracy degradation. As such, the proposed compact NN solution becomes deployable on the Raspberry Pi 4 ARM CPU at 23 Hz.
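To make points (a) to (c) concrete, the following is a minimal NumPy sketch of variance-controlled PCA applied to the input of a dense layer. It is an illustration only and not the authors' implementation: the synthetic data, the 99% variance target, the 128-unit layer width, and the helper names `fit_pca`, `project` and `dense_params` are all assumptions made for this example.

```python
import numpy as np


def fit_pca(X, variance_target=0.99):
    """Fit PCA on the rows of X and keep the fewest components
    whose cumulative explained variance reaches variance_target."""
    mean = X.mean(axis=0)
    # SVD of the centered data; the rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    # First index at which the cumulative ratio reaches the target.
    k = int(np.searchsorted(np.cumsum(explained), variance_target)) + 1
    k = min(k, len(s))  # guard against floating-point round-off
    return mean, Vt[:k], explained[:k]


def project(X, mean, components):
    """Map inputs to the reduced space spanned by the kept components."""
    return (X - mean) @ components.T


def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


# Synthetic stand-in for a 2D-joint input vector (assumption, not real data):
# 40 observed features generated from 5 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.standard_normal((1000, 5))
mixing = rng.standard_normal((5, 40))
X = latent @ mixing + 0.01 * rng.standard_normal((1000, 40))

mean, components, explained = fit_pca(X, variance_target=0.99)
Z = project(X, mean, components)
k = components.shape[0]

# The first dense layer shrinks in proportion to the input width.
params_full = dense_params(X.shape[1], 128)
params_reduced = dense_params(k, 128)
```

In this sketch the variance target plays the role of the control described in (b): raising it keeps more components, lowering it shrinks the first layer further. The savings shown here concern a single layer; the 35% figure reported in the abstract refers to the authors' full networks and is not reproduced by this example.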


Index Terms

1. Computing methodologies
2. Human-centered computing

Index terms have been assigned to the content through auto-classification.


Published in

PETRA '23: Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments
July 2023
797 pages
ISBN: 9798400700699
DOI: 10.1145/3594806

Copyright © 2023 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
3D Human Pose Estimation
Dimensionality Reduction
Mobile Devices
MocapNET
Neural Networks
VR
Qualifiers
- research-article
- Research
- Refereed limited