Pose-on-the-Go: Approximating User Pose with Smartphone Sensor Fusion

Full-body motion capture is most closely associated with computer-generated imagery in blockbuster films, which relies on expensive multi-camera rigs and special marker-covered suits. However, as the technology has improved, consumer-oriented uses have become possible. For instance, several companies now offer small body-worn sensors that digitize the wearer’s pose for more immersive VR experiences. Of course, the bar for consumer acceptance is high, and this highly instrumented approach seems unlikely to go mainstream in the near future. A decade ago, Microsoft took a different approach with its Xbox Kinect sensor, a $150 accessory depth camera that could capture users’ poses without any worn instrumentation. A variety of interactive, pose-enabled games followed, spanning genres including sports, dance, and role-playing. Regardless of whether the sensors are worn or external, the need for extra devices, plus the added cost of that hardware, dampens the likelihood of mass adoption. More importantly, both worn and external sensing preclude many interesting uses of body digitization when people are mobile and outside of controlled settings.

In response, we set out to develop a full-body pose estimation system that could run entirely self-contained on a smartphone held normally in one’s hand. Our system works on the go, offering new avenues of interactivity anywhere and without prior setup. For this reason, we call our system Pose-on-the-Go. Achieving this vision required leveraging almost every sensor available in a modern smartphone, including the front and rear cameras, user-facing depth camera, capacitive touchscreen, and IMU. We fuse data from these disparate sensors to rig a real-time, animated skeleton of the user as they operate their phone. As far as we are aware, Pose-on-the-Go is the first system to demonstrate full-body pose estimation using an unmodified smartphone held in the hand. This gives our approach wide applicability and superior practicality over other methods, almost all of which require special instrumentation. An additional contribution is our rigorous study, which benchmarks our system against a true gold standard: a professional-grade Vicon optical tracking system.
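To give a flavor of the inverse-kinematics step named in the paper title, the sketch below solves the simplest case: estimating the elbow angle from a shoulder position and a hand position using the law of cosines. This is a minimal illustration and not the Pose-on-the-Go implementation. The function name `elbow_angle`, the default segment lengths, and the assumption that both 3D positions are already known (for example, a hand position taken from the phone's tracked location) are all hypothetical.

```python
import math


def elbow_angle(shoulder, hand, upper_arm=0.30, forearm=0.27):
    """Return the interior elbow angle in radians (pi = arm fully straight).

    shoulder, hand: (x, y, z) positions in metres; assumed to be given.
    upper_arm, forearm: assumed segment lengths in metres (illustrative values).
    """
    # Straight-line distance the arm has to span.
    d = math.dist(shoulder, hand)

    # Clamp to the range a two-segment arm can physically reach.
    d = max(abs(upper_arm - forearm), min(upper_arm + forearm, d))

    # Law of cosines for the triangle shoulder-elbow-hand.
    cos_elbow = (upper_arm**2 + forearm**2 - d**2) / (2 * upper_arm * forearm)

    # Guard against rounding slightly outside [-1, 1] before acos.
    return math.acos(max(-1.0, min(1.0, cos_elbow)))


if __name__ == "__main__":
    # Hand held partway out in front of the shoulder (hypothetical values).
    angle = elbow_angle((0.0, 0.0, 0.0), (0.35, -0.10, 0.20))
    print(f"elbow angle: {math.degrees(angle):.1f} degrees")
```

A full-body solver would also have to resolve the elbow's swivel around the shoulder-hand axis and estimate the torso and legs, which this sketch does not attempt.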

We believe exposing live user pose (even a coarse approximation, as we demonstrate) as an API on mobile devices could enable creative and novel interactive experiences. For example, one of our demo applications is a 3/4-perspective space shooter in which the user’s on-screen character matches their live body pose, offering a level of embodiment not previously seen in smartphone gaming. Indeed, a significant benefit of being software-only is that many recent smartphone models could be enabled via an over-the-air update, and our software could run as a background service on top of which developers could build pose-enabled apps.

Research Team: Karan Ahuja, Nate Smith, Susan Teamster


Karan Ahuja, Sven Mayer, Mayank Goel, and Chris Harrison. 2021. Pose-on-the-Go: Approximating User Pose with Smartphone Sensor Fusion and Inverse Kinematics. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (CHI '21). Association for Computing Machinery, New York, NY, USA, Article 9, 1–12. DOI:https://doi.org/10.1145/3411764.3445582