(ACM UIST, ICMI and IMWUT papers to be public soon)
We present a new and practical method for capturing user body pose in virtual reality experiences: integrating cameras into handheld controllers, where batteries, computation and wireless communication already exist. By virtue of the hands operating in front of the user during many VR interactions, our controller-borne cameras can capture a superior view of the body for digitization. We developed a series of demo applications that illustrate the potential of our approach, including more leg-centric interactions such as balancing games and kicking soccer balls. Published at CHI 2022.
Today’s consumer virtual reality systems offer limited haptic feedback via vibration motors in handheld controllers. Rendering haptics to other parts of the body is an open challenge, especially in a practical and consumer-friendly manner. The mouth is of particular interest, as it is a close second in tactile sensitivity to the fingertips. In this research, we developed a thin, compact, beamforming array of ultrasonic transducers, which can render haptic effects onto the mouth. Importantly, all components are integrated into the VR headset, meaning the user does not need to wear an additional accessory or place any external infrastructure in their room. Our haptic sensations can be felt on the lips, teeth, and tongue, which can be incorporated into new and interesting VR experiences. Published at CHI 2022.
Touchscreen tracking latency, often 50 ms or more, creates a rubber-banding effect in everyday direct manipulation tasks such as dragging, scrolling, and drawing. In this research, we demonstrate how the addition of a thin, 2D micro-patterned surface with features spaced 5 microns apart can be used to reduce motor-visual touchscreen latency. When a finger, stylus, or tangible is translated across this textured surface, frictional forces induce acoustic vibrations which naturally encode sliding velocity. This high-speed 1D acoustic signal is fused with conventional low-speed, but high-spatial-accuracy, 2D touch position data to reduce touchscreen latency. Published at CHI 2022.
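As a rough illustration of this kind of fusion (not the published pipeline), one can treat it as dead reckoning: between slow touch reports, the fast speed samples are integrated to extrapolate the finger position. The function name, units, and the choice to take the direction of travel from the last two touch reports are all assumptions made for this sketch.

```python
import math

def fuse_position(touch_prev, touch_last, speeds, dt):
    """Extrapolate a 2D position beyond the most recent touch report.

    touch_prev, touch_last: (x, y) of the two latest touch reports, in mm.
    speeds: 1D sliding-speed samples (mm/s) received since touch_last
            (hypothetical output of an acoustic speed decoder).
    dt: time between speed samples, in seconds.
    """
    dx = touch_last[0] - touch_prev[0]
    dy = touch_last[1] - touch_prev[1]
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        # No direction estimate available; fall back to the touch report.
        return touch_last
    ux, uy = dx / norm, dy / norm          # unit direction of travel
    travelled = sum(s * dt for s in speeds)  # integrate speed over time
    return (touch_last[0] + ux * travelled, touch_last[1] + uy * travelled)
```

For example, with touch reports at (0, 0) and (1, 0) followed by two 100 mm/s samples spaced 1 ms apart, the estimate moves 0.2 mm further along the x axis.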
We describe how sheets of metalized mylar can be cut and then “inflated” into complex 3D forms with electrostatic charge for use in digitally-controlled, shape-changing displays. Our technique is compatible with industrial and hobbyist cutting processes, from die and laser cutting to handheld X-Acto knives and scissors. Given that mylar film costs <$1 per square meter, we can create self-actuating 3D objects for just a few cents, opening new uses in low-cost consumer goods. Published at CHI 2022.
LRAir is a new scalable, non-contact haptic actuation technique based on a speaker in a ported enclosure which can deliver air pulses to the skin. The technique is low cost, low voltage, and uses existing electronics. We detail a prototype device's design and construction, and validate a multi-domain impedance model with current, voltage, and pressure measurements. A non-linear phenomenon at the port creates pulsed zero-net-mass-flux flows, so-called "synthetic jets". Our prototype is capable of 10 mN time-averaged thrusts at an air velocity of 10.4 m/s (4.3 W input power). A perception study reveals that tactile effects can be detected 25 mm away with only 380 mVrms applied voltage and 19 mWrms input power. Published at Haptics Symposium 2022.
Today's smart cities use thousands of physical sensors distributed across the urban landscape to support decision making in areas such as infrastructure monitoring, public health, and resource management. These weather-hardened devices require power and connectivity, and often cost thousands just to install, let alone maintain. We show how long-range laser vibrometry can be used for low-cost, city-scale sensing. Although laser vibrometry is typically limited to a sensing range of just a few meters, retroreflective markers can boost this to 1 km or more. Fortuitously, cities already make extensive use of retroreflective materials for street signs, construction barriers, and many other markings. Our system can co-opt these existing markers at very long ranges and use them as unpowered accelerometers. Published at CHI 2021.
Pose-on-the-Go is a full-body pose estimation system that uses sensors already found in today’s smartphones. This stands in contrast to prior systems, which require worn or external sensors. We achieve this result via extensive sensor fusion, leveraging a phone's front and rear cameras, the user-facing depth camera, touchscreen, and IMU. Even so, we are missing data about parts of a user's body (e.g., the angle of the elbow joint), and so we use inverse kinematics to estimate and animate probable body poses. Published at ACM CHI 2021.
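To make the inverse kinematics step concrete, here is a minimal two-bone sketch: given a shoulder position, a hand position, and the two segment lengths, the law of cosines yields the interior elbow angle. This is a textbook simplification for illustration; the function name and inputs are assumptions and it is not the solver used in the system.

```python
import math

def elbow_angle(shoulder, hand, upper_len, fore_len):
    """Interior elbow angle (radians) for a two-bone arm.

    shoulder, hand: 3D points (x, y, z) in meters.
    upper_len, fore_len: upper-arm and forearm lengths in meters.
    """
    d = math.dist(shoulder, hand)
    # Clamp to the reachable range so unreachable targets still give an answer.
    d = max(abs(upper_len - fore_len), min(upper_len + fore_len, d))
    # Law of cosines: d^2 = a^2 + b^2 - 2ab*cos(elbow)
    cos_elbow = (upper_len**2 + fore_len**2 - d**2) / (2 * upper_len * fore_len)
    cos_elbow = max(-1.0, min(1.0, cos_elbow))  # guard against rounding error
    return math.acos(cos_elbow)
```

A fully extended arm gives an angle of π, and a hand placed so the segments form a right angle gives π/2. The direction in which the elbow bends is not determined by this calculation alone and would need an extra constraint.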
Capacitive touchscreens are near-ubiquitous in today's touch-driven devices, such as smartphones and tablets. By using rows and columns of electrodes, specialized touch controllers are able to capture a 2D image of capacitance at the surface of a screen. For over a decade, capacitive "pixels" have been around 4 mm in size – a surprisingly low resolution that precludes a wide range of interesting applications. In this research, we show how super-resolution techniques, long used in fields such as biology and astronomy, can be applied to capacitive touchscreen data. This opens the door to passive tangibles with higher-density fiducials and also recognition of everyday metal objects, such as keys and coins. Published at CHI 2021.
Millimeter wave (mmWave) Doppler radar is a new and promising sensing approach for human activity recognition, offering signal richness approaching that of microphones and cameras, but without many of the privacy-invading downsides. However, unlike audio and computer vision approaches that can draw from huge libraries of videos for training deep learning models, Doppler radar has no existing large datasets, holding back this otherwise promising sensing modality. In response, we set out to create a software pipeline that converts videos of human activities into realistic, synthetic Doppler radar data. Our approach is an important stepping stone towards reducing the burden of training human sensing systems, and could help bootstrap uses in human-computer interaction. Published at CHI 2021.
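The core physical relationship behind such a conversion is the Doppler equation, f_d = 2·v_r/λ, where v_r is the radial velocity of a body point toward the sensor and λ is the radar wavelength. The sketch below applies it to one tracked point across two video frames. The function name, the virtual sensor position, and the 60 GHz carrier in the example are assumptions for illustration, not details of the published pipeline.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def doppler_shift(p0, p1, dt, sensor, carrier_hz):
    """Doppler shift (Hz) of one body point seen by a virtual radar.

    p0, p1: 3D positions (m) of the point in two consecutive frames.
    dt: time between the frames (s).
    sensor: 3D position (m) of the virtual radar.
    carrier_hz: radar carrier frequency (Hz).
    """
    r0 = math.dist(p0, sensor)
    r1 = math.dist(p1, sensor)
    radial_velocity = (r0 - r1) / dt   # positive when approaching the sensor
    wavelength = C / carrier_hz
    return 2.0 * radial_velocity / wavelength
```

For instance, a point approaching at 1 m/s with a 60 GHz carrier yields a shift of roughly 400 Hz. Accumulating such shifts over many body points and frames would give a histogram resembling a Doppler spectrogram.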
Classroom sensing is an important and active area of research with great potential to improve instruction. Complementing professional observers, the current best practice, automated pedagogical professional development systems can attend every class and capture fine-grained details of all occupants. Unfortunately, prior classroom gaze-sensing systems have limited accuracy and often require specialized external or worn sensors. In this research, we developed a new computer-vision-driven system that powers a 3D “digital twin” of the classroom and enables whole-class, 6DOF head gaze vector estimation without instrumenting any of the occupants. Published at CHI 2021.
Today’s consumer virtual reality systems offer immersive graphics and audio, but haptic feedback is rudimentary – either delivered through controllers with vibration feedback or non-existent (i.e., the hands operating freely in the air). In this paper, we explore an alternative, highly mobile and controller-free approach to haptics, where VR applications utilize the user’s own body to provide physical feedback. To achieve this, we warp (retarget) the locations of a user’s hands such that one hand serves as a physical surface or prop for the other hand. For example, a hand holding a virtual nail can serve as a physical backstop for a hand that is virtually hammering, providing a sense of impact in an air-borne and uninstrumented experience. Published at ACM UIST 2021.
As smartphone screens have grown in size, single-handed use has become more cumbersome. Interactive targets that are easily seen can be hard to reach, particularly notifications and upper menu bar items. Users must either adjust their grip to reach distant targets, or use their other hand. In this research, we show how gaze estimation using a phone’s user-facing camera can be paired with IMU-tracked motion gestures to enable a new, intuitive, and rapid interaction technique on handheld phones. We describe our proof-of-concept implementation and gesture set, built on state-of-the-art techniques and capable of self-contained execution on a smartphone. Published at ICMI 2021.
The ability to co-opt everyday surfaces for touch interactivity has been an area of HCI research for several decades. In the past, advances in depth sensors and computer vision led to step-function improvements in ad hoc touch tracking. However, progress has slowed in recent years. We surveyed the literature and found that the very best ad hoc touch sensing systems are able to operate at ranges up to around 1.5 m. This limited range means that sensors must be carefully positioned in an environment to enable specific surfaces for interaction. Furthermore, the size of the interactive area is more table-scale than room-scale. In this research, we set ourselves the goal of doubling the sensing range of the current state-of-the-art system. Published at SUI 2021.
Contemporary touch-interface devices capture the X/Y position of fingertips on the screen, and pass these coordinates to applications as though the input were points in space. Of course, human hands are much more sophisticated, able to form rich 3D poses capable of far more complex interactions than poking at a screen. In this paper, we describe how conventional capacitive touchscreens can be used to estimate 3D hand pose, enabling rich interaction opportunities. Our approach requires no new sensors, and could be deployed to existing devices with a simple software update. After describing our software pipeline, we report findings from our user study, which shows our 3D joint tracking accuracy is competitive with even external sensing techniques. Published at MobileHCI 2021.
In this work, we take advantage of an emerging use case: co-located, multi-user AR/VR experiences. In such contexts, participants are often able to see each other’s bodies, hands, mouths, apparel, and other visual facets, even though they generally do not see their own bodies. Using the existing outwards-facing cameras on AR/VR headsets, these visual dimensions can be opportunistically captured and digitized, and then relayed back to their respective users in real time. The name of our system, BodySLAM, was inspired by SLAM (simultaneous localization and mapping) approaches to mapping unknown environments. In a similar vein, BodySLAM uses disparate camera views from many participants to reconstruct the geometric arrangement of users in an environment, as well as body pose and appearance. Published at SUI 2020.
In addition to receiving and processing spoken commands, we propose that computing devices also infer the Direction of Voice (DoV). Such DoV estimation innately enables voice commands with addressability, in a similar way to visual gaze, but without the need for cameras. This allows users to easily and naturally interact with diverse ecosystems of voice-enabled devices, whereas today’s voice interactions suffer from multi-device confusion. With DoV estimation providing a disambiguation mechanism, a user can speak to a particular device and have it respond; e.g., a user could ask their smartphone for the time, laptop to play music, smart speaker for the weather, and TV to play a show. Published at UIST 2020.
Inertial Measurement Units (IMUs) with gyroscopic sensors are standard in today’s mobile devices. We show that these sensors can be co-opted for vibroacoustic data reception. Our approach, called VibroComm, requires direct physical contact with a transmitting (i.e., vibrating) surface. This makes interactions targeted and explicit in nature, making it well suited for contexts with many targets or requiring intent. It also offers an orthogonal dimension of physical security to wireless technologies like Bluetooth and NFC. We achieve a transfer rate over 2000 bits/sec with less than 5% packet loss – an order of magnitude faster than prior IMU-based approaches at a quarter of the loss rate. Published at MobileHCI 2020.
Acoustic activity recognition has emerged as a foundational element for imbuing devices with context-driven capabilities, enabling richer, more assistive, and more accommodating computational experiences. Traditional approaches rely either on custom models trained in situ, or general models pre-trained on preexisting data, with each approach having accuracy and user burden implications. We present Listen Learner, a technique for activity recognition that gradually learns events specific to a deployed environment while minimizing user burden. More specifically, we built an end-to-end system for self-supervised learning of events labelled through one-shot voice interactions. Published at CHI 2020.
Today's virtual reality (VR) systems allow users to explore immersive new worlds and experiences through sight. Unfortunately, most VR systems lack haptic feedback, and even high-end consumer systems use only basic vibration motors. This clearly precludes realistic physical interactions with virtual objects. Larger obstacles, such as walls, railings, and furniture are not simulated at all. In response, we developed Wireality, a self-contained worn system that allows for individual joints on the hands to be accurately arrested in 3D space through the use of retractable wires that can be programmatically locked. This allows for convincing tangible interactions with complex geometries, such as wrapping fingers around a railing. Published at CHI 2020.
Smart speakers with voice agents have seen rapid adoption in recent years. These devices use traditional speaker coils, which means the agent’s voice always emanates from the device itself, even when that information might be more contextually and spatially relevant elsewhere. We describe our work on Digital Ventriloquism, which allows a single smart speaker to render sounds onto passive objects in the environment. Not only can these items speak, but also make other sounds, such as notification chimes. Importantly, objects need not be modified in any way: the only requirement is line of sight to our speaker. As smart speaker microphones are omnidirectional, it is possible to have interactive conversations with totally passive objects, such as doors and plants. Published at CHI 2020.
Contemporary voice assistants, such as Siri, require that objects of interest be specified in spoken commands. WorldGaze is a software-only method for smartphones that tracks the real-world gaze of a user, which voice agents can utilize for rapid, natural, and precise interactions. We achieve this by simultaneously opening the front and rear cameras of a smartphone. The front-facing camera is used to track the head in 3D, including estimating its direction vector. As the geometry of the front and back cameras is fixed and known, we can raycast the head vector into the 3D world scene as captured by the rear-facing camera. This allows the user to intuitively define an object or region of interest using their head gaze. Published at CHI 2020.
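The raycast can be pictured as a change of coordinate frame followed by a pinhole projection. The sketch below assumes, purely for illustration, that the rear camera frame is the front camera frame rotated 180° about the vertical axis plus a fixed offset, and that the rear camera has simple pinhole intrinsics (fx, fy, cx, cy). All names and parameter values here are hypothetical and real devices would need calibrated values.

```python
def front_to_rear(point, offset=(0.0, 0.0, 0.0)):
    """Map a 3D point from the front-camera frame to the rear-camera frame.

    Assumes the cameras face opposite directions (180 degree rotation about
    the y axis) and are separated by a fixed translation `offset` (meters).
    """
    x, y, z = point
    return (-x + offset[0], y + offset[1], -z + offset[2])

def project_head_ray(head_pos, head_dir, distance, fx, fy, cx, cy,
                     offset=(0.0, 0.0, 0.0)):
    """Pixel in the rear image hit by the head ray at a given distance.

    head_pos, head_dir: head position and unit gaze direction in the
    front-camera frame. Returns None if the point is behind the rear camera.
    """
    # Walk along the head ray in the front-camera frame.
    p_front = tuple(p + distance * d for p, d in zip(head_pos, head_dir))
    x, y, z = front_to_rear(p_front, offset)
    if z <= 0.0:
        return None
    # Standard pinhole projection into rear-camera pixel coordinates.
    return (fx * x / z + cx, fy * y / z + cy)
```

With the head 0.4 m in front of the screen and looking straight through the phone, a point 2.4 m along the ray lands at the principal point (cx, cy) of the rear image.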
LightAnchors is a new method to display spatially-anchored data in augmented reality applications. Unlike most prior tracking methods, which instrument objects with markers (often large and/or obtrusive), we take advantage of point lights already found in many objects and environments. For example, most electrical appliances now feature small (LED) status lights, and light bulbs are common in indoor and outdoor settings. In addition to leveraging these point lights for in-view anchoring (i.e., attaching information and interfaces to specific objects), we also co-opt these lights for data transmission, blinking them rapidly to encode binary data. Devices need only an inexpensive microcontroller with the ability to blink an LED to enable new experiences in AR. Published at UIST 2019.
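A toy version of blinking a light to carry binary data is shown below: a fixed sync pattern followed by the payload bits, one bit per frame, with a decoder that searches for the sync pattern. The preamble and framing are invented for this sketch and are not the LightAnchors protocol.

```python
PREAMBLE = [1, 1, 1, 0, 1, 0]  # hypothetical sync pattern, one value per frame

def encode(payload):
    """Turn bytes into a list of on/off light states (1 = on, 0 = off)."""
    bits = [(byte >> i) & 1 for byte in payload for i in range(7, -1, -1)]
    return PREAMBLE + bits

def decode(samples, n_bytes):
    """Find the preamble in observed light states and read n_bytes after it.

    Returns the decoded bytes, or None if no complete packet is found.
    """
    n = len(PREAMBLE)
    for start in range(len(samples) - n - 8 * n_bytes + 1):
        if samples[start:start + n] == PREAMBLE:
            bits = samples[start + n:start + n + 8 * n_bytes]
            out = bytearray()
            for i in range(0, len(bits), 8):
                byte = 0
                for b in bits[i:i + 8]:
                    byte = (byte << 1) | b  # most significant bit first
                out.append(byte)
            return bytes(out)
    return None
```

A practical scheme would also need to handle payloads that happen to contain the sync pattern, camera frame-rate mismatch, and error detection, none of which are covered here.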
Robust, wide-area sensing of human environments has been a long-standing research goal. We present Sozu, a new low-cost sensing system that can detect a wide range of events wirelessly, through walls and without line of sight, at whole-building scale. To achieve this in a battery-free manner, Sozu tags convert energy from activities that they sense into RF broadcasts, acting like miniature self-powered radio stations. We describe the results from a series of iterative studies, culminating in a deployment study with 30 instrumented objects. Results show that Sozu is very accurate, with true positive event detection exceeding 99% and almost no false positives. Published at UIST 2019.
Contemporary AR/VR systems use in-air gestures or handheld controllers for interactivity. This overlooks the skin as a convenient surface for tactile, touch-driven interactions, which are generally more accurate and comfortable than free space interactions. In response, we developed ActiTouch, a new electrical method that enables precise on-skin touch segmentation by using the body as an RF waveguide. We combine this method with computer vision, enabling a system with both high tracking precision and robust touch detection. We quantify the accuracy of our approach through a user study and demonstrate how it can enable touchscreen-like interactions on the skin. Published at UIST 2019.
Low-cost, smartphone-powered VR/AR headsets are becoming more popular. These basic devices, little more than plastic or cardboard shells, lack advanced features such as controllers for the hands, limiting their interactive capability. Moreover, even high-end consumer headsets lack the ability to track the body and face. We introduce MeCap, which enables commodity VR headsets to be augmented with powerful motion capture (“MoCap”) and user-sensing capabilities at very low cost (under $5). Using only a pair of hemi-spherical mirrors and the existing rear-facing camera of a smartphone, MeCap provides real-time estimates of a wearer’s 3D body pose, hand pose, facial expression, physical appearance and surrounding environment. Published at UIST 2019.
SurfaceSight is an approach that augments IoT experiences with rich touch and object sensing, offering a complementary input channel and increased contextual awareness for "smart" devices. For sensing, we incorporate LIDAR into the base of IoT devices, providing an expansive, ad hoc plane of sensing just above the surface on which devices rest. We can recognize and track a wide array of objects, including finger input and hand gestures. We can also track people and estimate which way they are facing. We evaluate the accuracy of these new capabilities and illustrate how they can be used to power novel and contextually-aware interactive experiences. Published at CHI 2019.
EduSense is a comprehensive sensing system that produces a plethora of theoretically-motivated visual and audio features correlated with effective instruction, which could feed professional development tools in much the same way as a Fitbit sensor reports step count to an end user app. Although previous systems have demonstrated some of our features in isolation, EduSense is the first to unify them into a cohesive, real-time, in-the-wild evaluated, and practically-deployable system. Our two studies quantify where contemporary machine learning techniques are robust, and where they fall short, illuminating where future work remains to bring the vision of automated classroom analytics to reality. Published at IMWUT/UbiComp 2019.
Capturing fine-grained hand activity could make computational experiences more powerful and contextually aware. Indeed, philosopher Immanuel Kant argued, "the hand is the visible part of the brain." However, most prior work has focused on detecting whole-body activities, such as walking, running and bicycling. In this work, we explore the feasibility of sensing hand activities from commodity smartwatches, which are the most practical vehicle for achieving this vision. Our investigations started with a 50-participant, in-the-wild study, which captured hand activity labels over nearly 1000 worn hours. We conclude with a second, in-lab study that evaluates our classification stack, demonstrating 95.2% accuracy across 25 hand activities. Published at CHI 2019.
BeamBand is a wrist-worn system that uses ultrasonic beamforming for hand gesture sensing. Using an array of small transducers, arranged on the wrist, we can ensemble acoustic wavefronts to project acoustic energy at specified angles and focal lengths. This allows us to interrogate the surface geometry of the hand with inaudible sound in a raster-scan-like manner, from multiple viewpoints. We use the resulting, characteristic reflections to recognize hand pose. In our user study, we found that BeamBand supports a six-class hand gesture set at 94.6% accuracy. We describe our software and hardware, and future avenues for integration into devices such as smartwatches and VR controllers. Published at CHI 2019.
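The basic idea behind focusing an array can be sketched with per-element firing delays: elements farther from the focal point fire first, so that all wavefronts arrive there at the same moment. The element layout and function name below are assumptions for illustration and do not describe the BeamBand hardware.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def focus_delays(elements, focus):
    """Firing delay (seconds) per transducer so wavefronts meet at `focus`.

    elements: list of transducer positions, e.g. (x, y) in meters.
    focus: focal point in the same coordinate system.
    The farthest element fires at t = 0; nearer elements wait longer.
    """
    dists = [math.dist(e, focus) for e in elements]
    farthest = max(dists)
    return [(farthest - d) / SPEED_OF_SOUND for d in dists]
```

For a three-element line array with 1 cm pitch focused 5 cm above its center, the outer elements fire immediately and the center element waits about 2.9 microseconds. Changing the focal point over time is what produces a raster-scan-like sweep.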
Interferi uses ultrasonic transducers resting on the skin to create acoustic interference patterns inside the wearer’s body, which interact with anatomical features in complex, yet characteristic ways. We focus on two areas of the body with great expressive power: the hands and face. For each, we built and tested a series of worn sensor configurations, which we used to identify useful transducer arrangements and machine learning features. We created final prototypes for the hand and face, which our study results show can support eleven- and nine-class gesture sets at 93.4% and 89.0% accuracy, respectively. We also evaluated our system in four continuous tracking tasks, including smile intensity and weight estimation, in which error never exceeded 9.5%. Published at CHI 2019.
Despite sound being a rich source of information, computing devices with microphones do not leverage audio to glean useful insights about their physical and social context. For example, a smart speaker sitting on a kitchen countertop cannot figure out if it is in a kitchen, let alone know what a user is doing in a kitchen – a missed opportunity. In this work, we describe a novel, real-time, sound-based activity recognition system. We start by taking an existing, state-of-the-art sound labeling model, which we then tune to classes of interest by drawing data from professional sound effect libraries traditionally used in the entertainment industry. These well-labeled and high-quality sounds are the perfect atomic unit for data augmentation, including amplitude, reverb, and mixing, allowing us to exponentially grow our tuning data in realistic ways. We quantify the performance of our approach across a range of environments and device categories and show that microphone-equipped computing devices already have the requisite capability to unlock real-time activity recognition comparable to human accuracy. Published at UIST 2018.
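The three augmentations named above (amplitude, reverb, and mixing) can each be expressed in a few lines. The following is a minimal, dependency-free sketch operating on lists of samples. The function names and the idea of mixing at a target signal-to-noise ratio are assumptions for illustration, not the exact augmentation recipe from the paper.

```python
import math

def scale(signal, gain):
    """Amplitude augmentation: multiply every sample by a gain factor."""
    return [gain * s for s in signal]

def rms(signal):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))

def mix(signal, background, snr_db):
    """Mixing augmentation: add background at a target SNR in decibels.

    The background must not be silent (non-zero RMS); output length is the
    shorter of the two inputs.
    """
    gain = rms(signal) / (rms(background) * 10 ** (snr_db / 20))
    return [s + gain * b for s, b in zip(signal, background)]

def reverb(signal, impulse_response):
    """Reverb augmentation: convolve the signal with a room impulse response."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out
```

Because each clean sound effect can be combined with many gains, backgrounds, and impulse responses, a small library of source clips expands into a much larger tuning set. In practice an array library would replace the nested loops for speed.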
Smart and responsive environments rely on the ability to detect physical events, such as appliance use and human activities. Currently, to sense these types of events, one must either upgrade to "smart" appliances, or attach aftermarket sensors to existing objects. These approaches can be expensive, intrusive and inflexible. In this work, we present Vibrosight, a new approach to sense activities across entire rooms using long-range laser vibrometry. Unlike a microphone, our approach can sense physical vibrations at one specific point, making it robust to interference from other activities and noisy environments. This property enables detection of simultaneous activities, which has proven challenging in prior work. Through a series of evaluations, we show that Vibrosight can offer high accuracies at long range, allowing our sensor to be placed in an inconspicuous location. We also explore a range of additional uses, including data transmission, sensing user input and modes of appliance operation, and detecting human movement and activities on work surfaces. Published at UIST 2018.
Compact, worn computers with projected, on-skin touch interfaces have been a long-standing yet elusive goal, largely written off as science fiction. Such devices offer the potential to mitigate the significant human input/output bottleneck inherent in worn devices with small screens. In this work, we present the first fully functional and self-contained projection smartwatch implementation, containing the requisite compute, power, projection, and touch-sensing capabilities. Our watch offers roughly 40 cm² of interactive surface area – more than five times that of a typical smartwatch display. We demonstrate continuous 2D finger tracking with interactive, rectified graphics, transforming the arm into a touchscreen. We discuss our hardware and software implementation, as well as evaluation results regarding touch accuracy and projection visibility. Published at CHI 2018.
In this work, we present a new technical approach for bringing the digital and paper worlds closer together, by enabling paper to track finger input and also drawn input with writing implements. Importantly, for paper to still be considered paper, our method had to be very low cost. This necessitated research into materials, fabrication methods and sensing techniques. We describe the outcome of our investigations and show that our method can be sufficiently low-cost and accurate to enable new interactive opportunities with this pervasive and venerable material. Published at CHI 2018.
Human environments are typified by walls – homes, offices, schools, museums, hospitals, and pretty much every indoor context one can imagine has walls. In many cases, they make up a majority of readily accessible indoor surface area, and yet they are static – their primary function is to be a wall, separating spaces and hiding infrastructure. We present Wall++, a low-cost sensing approach that allows walls to become a smart infrastructure. Instead of merely separating spaces, walls can now enhance rooms with sensing and interactivity. Our wall treatment and sensing hardware can track users' touch and gestures, as well as estimate body pose if they are close. By capturing airborne electromagnetic noise, we can also detect what appliances are active and where they are located. Published at CHI 2018.
Smart appliances with built-in cameras, such as the Nest Cam and Amazon Echo Look, are becoming pervasive. They hold the promise of bringing high fidelity, contextually rich sensing into our homes, workplaces and other environments. Despite recent advances, computer vision systems are still limited in the types of questions they can answer. In response, researchers have investigated hybrid crowd- and AI-powered methods that collect human labels to bootstrap automatic processes. We describe our iterative development of Zensors++, a full-stack crowd-AI camera-based sensing system that moves significantly beyond prior work in terms of scale, question diversity, accuracy, latency, and economic feasibility. Published at UbiComp 2018.
Low-cost virtual reality (VR) headsets powered by smartphones are becoming ubiquitous. Their unique position on the user's face opens interesting opportunities for interactive sensing. In this paper, we describe EyeSpyVR, a software-only eye sensing approach for smartphone-based VR, which uses a phone's front-facing camera as a sensor and its display as a passive illuminator. Our proof-of-concept system, using a commodity Apple iPhone, enables four sensing modalities: detecting when the VR headset is worn, detecting blinks, recognizing the wearer's identity, and coarse gaze tracking. These features are typically found only in high-end or specialty VR headsets. Published at UbiComp 2018.