3D Hand Pose Estimation on Conventional Capacitive Touchscreens

Today's touchscreen mobile computers are marvels of modern engineering, yet touch input remains fairly simplistic, generally only able to report the X/Y position of one or more finger contacts. While this has usefully lowered the barrier to entry for novice users – making interactions "natural" and "intuitive" – it has simultaneously contracted the ceiling of interactive possibilities. For this reason, many complex applications are cumbersome on touch-only devices.

In response, researchers have long looked at ways to move beyond multitouch, and towards more sophisticated and holistic treatment of the hands for user input when operating on a flat screen. Indeed, as far back as 1976, researchers were exploring touchscreens that could capture not only finger X/Y position, but also X/Y shear force, downwards pressure, and twisting torque for the purposes of a more "natural ... rich channel ... for man-machine interaction". Since then, a host of other sensing techniques and finger input dimensions have been explored, including the angle of attack, touch-type, digit differentiation, and hand pose estimation. These projects look beyond multi-touch and towards a "rich-touch" future.

In this research, we set out to see if conventional capacitive touchscreens could be used to estimate a full 3D hand pose, without any new or external sensors (in contrast to most prior research, which relies on added hardware, especially hand-sensing wearables). A live 3D hand model would not only provide the location of finger touches, but also the aforementioned input dimensions such as the angle of attack, touch type, and digit differentiation – all in one unified method. In some respects, it is the "holy grail" of touchscreen input, offering a true "digital twin" of the user's hand. It is this larger research vision to which we contribute a new method and proof-of-concept implementation.
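At a high level, the task can be framed as a learned mapping from a touchscreen's low-resolution capacitive image to a set of 3D hand joints. The sketch below is purely illustrative and is not the paper's implementation: the sensor grid size, joint count, and the stand-in linear regressor are all assumptions (a real system would use a trained model on real capacitive frames).

```python
import numpy as np

# Assumed sensor geometry: capacitive touchscreens expose a coarse grid
# of capacitance values (15 x 27 is a hypothetical, illustrative size).
ROWS, COLS = 15, 27
NUM_JOINTS = 21  # a common hand-skeleton parameterization (assumed)

rng = np.random.default_rng(0)

# Stand-in for a trained regressor: a random linear map from the
# flattened capacitive image to 21 joints x 3 coordinates.
W = rng.standard_normal((NUM_JOINTS * 3, ROWS * COLS)) * 0.01

def estimate_hand_pose(cap_image: np.ndarray) -> np.ndarray:
    """Map one capacitive frame to (x, y, z) positions per hand joint."""
    assert cap_image.shape == (ROWS, COLS)
    joints = W @ cap_image.ravel()
    return joints.reshape(NUM_JOINTS, 3)

# Example: a synthetic frame with one elevated blob under a fingertip.
frame = np.zeros((ROWS, COLS))
frame[7:9, 13:15] = 1.0
pose = estimate_hand_pose(frame)
print(pose.shape)  # (21, 3)
```

The point of the sketch is the input/output contract – one coarse capacitance image in, a full skeletal pose out – from which touch location, angle of attack, and digit identity can all be derived.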

While the utility of a live 3D hand model in traditional touchscreen interfaces (2D menus, buttons, sliders, etc.) is currently limited (i.e., existing touch pipelines are sufficient for traditional widgets), we see the growing importance of 3D modalities as an impetus for renewed attention in this field. Mobile touchscreen devices are increasingly becoming gateways to three-dimensional content – games, CAD, GIS, passthrough AR, and many other types of software incorporating 3D elements and manipulation. For example, rather than users simply clicking on 3D objects in smartphone-mediated passthrough AR experiences (analogous to tapping on aquarium glass), we envision virtual hands projecting out into 3D scenes, able to manipulate and interact with objects. Similarly, instead of merely rotating and translating virtual on-screen 2D tools (as seen in e.g., TouchTools), such tools could be 3D, and more richly grasped and manipulated. To illustrate some of our ideas, we created a series of small demos, which we describe at the end of the paper. We now move to key related work, followed by a discussion of our implementation and evaluation.

Research Team: Frederick Choi, Sven Mayer, and Chris Harrison


Frederick Choi, Sven Mayer, and Chris Harrison. 2021. 3D Hand Pose Estimation on Conventional Capacitive Touchscreens. In Proceedings of the 23rd International Conference on Mobile Human-Computer Interaction. Association for Computing Machinery, New York, NY, USA, Article 3, 1–13. DOI: https://doi.org/10.1145/3447526.3472045