ABSTRACT
We present Pose-on-the-Go, a full-body pose estimation system that uses only sensors already found in today’s smartphones. This stands in contrast to prior systems, which require worn or external sensors. We achieve this result through extensive sensor fusion, leveraging a phone’s front and rear cameras, the user-facing depth camera, touchscreen, and IMU. Even so, some information about the user’s body (e.g., the angle of the elbow joint) is not observed directly, so we use inverse kinematics to estimate and animate probable body poses. We provide a detailed evaluation of our system, benchmarking it against a professional-grade Vicon tracking system. We conclude with a series of demonstration applications that underscore the unique potential of our approach, which could be enabled on many modern smartphones with a simple software update.
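The abstract notes that joints such as the elbow are never observed directly and must be inferred with inverse kinematics. As a hedged illustration only (not the authors' solver), the classic analytic two-bone IK below recovers a plausible elbow position and interior elbow angle from a shoulder position, a wrist position, and two bone lengths via the law of cosines. The function name `solve_two_bone_ik`, the idea of using the phone's tracked position as the wrist target, the shoulder estimate, the bone lengths, and the `pole` hint that chooses the bend direction are all assumptions made for this sketch.

```python
import math

Vec3 = tuple[float, float, float]


def _sub(p: Vec3, q: Vec3) -> Vec3:
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _add(p: Vec3, q: Vec3) -> Vec3:
    return (p[0] + q[0], p[1] + q[1], p[2] + q[2])


def _scale(p: Vec3, s: float) -> Vec3:
    return (p[0] * s, p[1] * s, p[2] * s)


def _dot(p: Vec3, q: Vec3) -> float:
    return p[0] * q[0] + p[1] * q[1] + p[2] * q[2]


def _norm(p: Vec3) -> float:
    return math.sqrt(_dot(p, p))


def solve_two_bone_ik(shoulder: Vec3, wrist: Vec3, upper_len: float,
                      fore_len: float, pole: Vec3) -> tuple[Vec3, float]:
    """Hypothetical analytic two-bone IK sketch.

    Returns (elbow_position, interior_elbow_angle_in_radians).
    Bone lengths must be positive; `pole` is a hint point the elbow bends toward.
    """
    to_wrist = _sub(wrist, shoulder)
    dist = _norm(to_wrist)
    if dist == 0.0:
        raise ValueError("shoulder and wrist coincide")
    axis = _scale(to_wrist, 1.0 / dist)  # unit vector shoulder -> wrist

    # Clamp the reach to what the chain can span: [|a - b|, a + b].
    d = min(max(dist, abs(upper_len - fore_len)), upper_len + fore_len)

    # Law of cosines: d^2 = a^2 + b^2 - 2ab cos(theta) gives the elbow angle.
    cos_elbow = (upper_len**2 + fore_len**2 - d**2) / (2.0 * upper_len * fore_len)
    elbow_angle = math.acos(max(-1.0, min(1.0, cos_elbow)))

    # Elbow = foot point along the axis plus a perpendicular offset of height h.
    x = (upper_len**2 - fore_len**2 + d**2) / (2.0 * d)
    h = math.sqrt(max(upper_len**2 - x**2, 0.0))

    # Bend direction: component of the pole hint orthogonal to the arm axis.
    to_pole = _sub(pole, shoulder)
    perp = _sub(to_pole, _scale(axis, _dot(to_pole, axis)))
    perp_len = _norm(perp)
    if perp_len == 0.0:
        raise ValueError("pole hint lies on the shoulder-wrist line")
    bend = _scale(perp, 1.0 / perp_len)

    elbow = _add(shoulder, _add(_scale(axis, x), _scale(bend, h)))
    return elbow, elbow_angle
```

For example, with a 0.3 m upper arm, a 0.4 m forearm, the wrist 0.5 m from the shoulder along x, and a pole hint below the arm, `solve_two_bone_ik((0, 0, 0), (0.5, 0, 0), 0.3, 0.4, (0, -1, 0))` returns an elbow angle of about π/2 and an elbow near (0.18, -0.24, 0). A full-body system would need further constraints (joint limits, temporal smoothing, a body model) that this sketch deliberately omits.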
Index Terms
- Pose-on-the-Go: Approximating User Pose with Smartphone Sensor Fusion and Inverse Kinematics