1 Introduction
Our hands provide a window into our intentions, context and activities. As the body’s primary manipulator, the hand engages in a wide variety of tasks, such as grasping objects, gesturing to signal intention, and operating interactive controls. Wearable sensing can elucidate these interactions, providing context or input to enable richer and more powerful computational experiences for gaming, augmented and virtual reality (AR/VR), and ubiquitous computing. In this work, we introduce a sensing system that leverages electric field sensing in a unique antenna topology. With this single sensing modality, we can detect four separate hand-related activities — held-object recognition, gesture sensing, tangible UI interaction, and biometric identification — all with only a single point of instrumentation on the base of a user’s finger. To our knowledge, we present the first system to achieve such a wide diversity of hand-related activities with a single piece of hardware.
Our approach leverages the human body’s electrical conductivity by appropriating the hand as an antenna. As the hand assumes different poses, grasps objects, or touches conductive surfaces, the electromagnetic properties of this antenna system change. These changes can be quantified by measuring the hand’s bio-impedance, generally denoted by Z, which serves as the basis of our sensing system. Z-Ring, our custom-designed ring prototype, can detect subtle touches and finger movements, enabling micro-gesture interactions for input. Since electrical signals from the ring can travel through the hand to external objects or surfaces contacted by the hand, Z-Ring can detect variations in the hand’s impedance profile that are caused by external interactions, enabling recognition of objects the hand is holding without modifications to those objects.
Additionally, to utilize this capability, we created a set of interfaces, i.e., various buttons, a 1D slider, and a 2D trackpad, that function with Z-Ring. We designed the interface geometry to provide varying impedance profiles in response to different interactions, which Z-Ring detects to determine the point of contact using no additional interface electronics or batteries. Further, Z-Ring can identify the wearer due to anatomical variations of the human body that produce a distinct frequency signature response.
Unlike previous systems that instrument an object (e.g., doorknob) to become an antenna [58] or wrap the hand around a device with an embedded antenna [34], Z-Ring uses the hand itself as a duplex antenna. The system can thus adopt a wearable form factor that, in turn, opens interactive possibilities to a wider application space. Previous work on body-as-antenna [13] relied on ambient RF signals for operation, limiting its operation to a specific location, or relied on RF emission from devices for detection, limiting its use to active objects [38]. In contrast, Z-Ring’s active impedance sensing approach enables its use anywhere and with passive external objects. Additionally, Z-Ring uses a broad range of frequencies for sensing, providing a richer set of sensing capabilities compared to systems that use discrete frequency impedance sensing [31, 86].
This work makes the following contributions:
(1) We apply a novel duplex topology of multi-frequency impedance sensing to enable a single-point-of-instrumentation, wearable form factor for human-computer interaction.
(2) We demonstrate the potential of Z-Ring’s single sensing modality across diverse applications: (a) one- and two-handed gesture recognition, (b) held object recognition, (c) discrete and continuous touch input on uninstrumented, passive user interfaces, and (d) user identification and authentication.
(3) We evaluate Z-Ring’s performance for each application area in user studies and show that it performs robustly and consistently across users, applications, and time.
2 Related Work
Z-Ring research intersects five areas of related work. From the sensing modality perspective, it extends work in radio-frequency and electric field sensing but employs a unique topology, widening its interactive potential. From the application standpoint, Z-Ring builds upon work in four domains, i.e., hand gesture detection, object detection, surface UI interaction and user identification. We first examine how Z-Ring relates to previous field sensing and RF work, organizing our discussion by sensing topology. We then highlight non-electric sensing systems adjacent to Z-Ring in each application space.
2.1 Electric Field and RF Sensing of the Hand and Body
The human body is a lossy conductor of high frequency electric fields [19], allowing it to act as a transmission medium for AC signals or as a shunt to ground. This property has been widely employed for human-computer interaction for proximity, touch, communication, identification, medical imaging, and motion sensing applications. We examine these works in the context of their electrode (i.e., antenna) topology, extending the taxonomy from Zimmerman et al. [91] and Grosse-Puppendahl et al. [21], as shown in Fig. 2.
2.1.1 Mutual Capacitance.
In the mutual capacitance or shunt mode configuration, the proximity of the hand or body to a transmitting and receiving electrode pair modifies the mutual capacitance between the two electrodes. This technique is widely used in touchscreens and trackpads. AuraSense [88] uses different configurations of a single transmit and four receive electrodes around a watch body to enable in-air radial input and on-skin buttons and sliders. The authors note a limitation of mutual capacitance for this application: as the transmit and receive electrodes are placed closer together, the projected E-field is bound closely to the electrodes, necessitating proximate interaction. While this property is advantageous for touchscreen sensing, reducing power and noise, it poses a challenge for far-field sensing when attempting to fit the electrodes onto a compact wearable. Unlike mutual capacitance configurations, Z-Ring adopts a small electrode configuration by injecting signal within the body, using the body itself as a transmission medium.
2.1.2 Self Capacitance.
Self-capacitance or loading mode sensing utilizes the same electrode for both transmit and receive. As a ground-coupled body moves closer to the electrode, some of the field is directed through the body, modifying the electrode’s capacitance. eRing [77] and PeriSense [76] both propose a ring-based peripheral with electrodes on the outside of the device. By sensing adjacent finger positions, they can classify discrete hand postures. Touché [58] proposes a swept frequency capacitive sensing method for instrumenting conductive objects (e.g., door knob or water tank) to be touch sensitive. Like Touché, we implement a swept frequency approach, which enables touched object detection and user identification (see Capacitive Fingerprinting [25]); however, we use a broader frequency range. Unlike Touché, we galvanically inject AC signal into the body via a worn device, permitting hand gesture capture without the user touching an instrumented object. AtaTouch [34] presents a VR controller designed to robustly detect pinches. As the user closes their fingers around a central 6 cm linear antenna buried in the handle, the antenna’s far-field coupled return loss value changes. EtherPose [32] extends AtaTouch by designing two wrist-mounted cloverleaf antennae to sense continuous hand pose and microgestures. To obtain sufficient signal, the antenna is designed to be resonant at 1.4 GHz, yielding a 2 cm diameter and a large tuning ground plane. Like AtaTouch and EtherPose, Z-Ring also uses impedance sensing via a vector network analyzer (VNA), with the same electrode for transmit and receive. However, unlike these two works, Z-Ring injects a swept frequency, with the hand as an antenna, rather than detuning an external resonant antenna. This enables an electrode configuration that is much more compact and lets Z-Ring sweep a larger frequency range (1 MHz to 1 GHz) for richer interaction modalities.
2.1.3 Body-as-Transmitter.
As a lossy conductor, the body itself can be appropriated as a transmitting antenna, e.g., when holding a car fob to one’s head to extend its range in a crowded parking lot. Hantenna [62] explores the properties of the human body as a transmitting (or receiving) dipole antenna, with users directly contacting electrodes attached to a VNA. They find that the body can improve link levels by up to a substantial 15 to 20 dB. Vu et al. [71] and Biometric Touch Sensing [28] both utilize a body-injected signal via a ring and wristband, respectively, for user identification when interacting with a sensing tablet. Z-Ring also injects signal into the body, but it senses the reflected signal via the same electrode and therefore requires no external devices for sensing.
2.1.4 Body-as-Receiver.
Similarly, the body can also act as a receiver. Like Biometric Touch Sensing, DiamondTouch [16] also enables user identification through touch interactions; however, in this configuration, the interactive table is a transmitter, and the body and chair act as a receiver. Carpacio [73] uses this same principle to identify touches on a car’s touchscreen. For touched object interaction, Cohn et al. [13] use an electrode on the rear of the neck to measure changes in ambient RF signal (such as power lines and appliances) as users touched appliances, light switches, and walls. Humantenna [14] extends the technique to sense whole body poses. Finally, in the domain of held object detection, EM-Sense [39] and Maekawa et al. [49] use a radio and wire coil, respectively, to capture broadband electromagnetic noise generated by electrically active household objects. In contrast, Z-Ring uses an active sensing approach to excite the hand and held objects and can therefore recognize objects that are electrically passive.
2.1.5 Body-as-Waveguide.
The body-as-waveguide or intrabody coupling configuration combines both transmit and receive topologies, with the body in direct galvanic contact with both electrodes. This configuration has historically been investigated for intrabody and interbody communication networks [90] and in the medical context to non-invasively examine the body’s internal make up and tissue properties [7]. For example, Usman et al. propose a ring-based bioelectrical impedance analyzer to estimate body fat [70]. Zensei leverages wide spectrum bio-impedance sensing to identify users from a set of six electrodes via instrumented objects [63]. Closer to our work, Cornelius et al. [15] present an eight-electrode, bio-impedance sensing wristband. The system reports 98% accuracy among 8 individuals and an authentication accuracy of 86.9%. Z-Ring also senses body composition variation to enable biometric user identification and authentication; however, our system uses one electrode pair on the index finger. Intrabody coupling methods have also been leveraged for gesture applications. Tomo [85] "images" the internal composition of the forearm with a band of eight electrodes performing EIT, enabling the system to classify 11 discrete hand gestures. BodyRC [75], ActiTouch [86], SkinTrack [87] and ElectroRing [31] all couple high frequency alternating current into the body and measure impedance changes from skin contact. BodyRC uses two electrodes on the left and right arms to differentiate touches between each arm, while ActiTouch uses an electrode on a VR headset and wristband for the same purpose. SkinTrack couples an 80 MHz transmit frequency via a ring and senses the impedance at four locations on a watch body, enabling 2D interaction on the skin near the watch. ElectroRing proposes a single-point-of-instrumentation ring with transmit and receive pairs separated by an active shield that can robustly detect pinches, touchdowns and releases.
In contrast, Z-Ring’s impedance sensing technique uses a single active electrode as both transmit and receive without the need for an active shield. Additionally, Z-Ring uses a wideband frequency sweep instead of a single-frequency measurement, enabling a wider range of gestures and interactive applications beyond touch recognition.
2.1.6 Body-as-Reflector.
Radio-frequency EM waves reflect off sharp changes in impedance, such as when they encounter the boundary between air and a body. Doppler radar uses this phenomenon to measure spatial changes, including subtle ones such as those associated with thumb-to-finger microgestures. Several papers have evaluated radar methods using external benchtop antennas to capture microgesture interactions [22, 26]. Google’s Soli project miniaturized a 60 GHz radar system into a single integrated circuit (IC) capable of measuring dynamic gestures [46, 74]. ThuMouse extended this work to enable continuous movement [43]. Unlike approaches that use external antennae to radiate RF signals into the air, Z-Ring uses lower frequencies that can be coupled into the body via a wearable device, allowing changes in the signal when a pinch, touchdown, or touch up occurs.
2.1.7 Body-as-Transceiver.
Z-Ring employs a unique topology, which we call body-as-transceiver, that combines and extends elements of the preceding configurations. Similar to body-as-waveguide, this topology galvanically injects and receives current through the body as a transmission medium, although it can use a single active electrode (like self-capacitance) for both transmit and receive to enable a single-point-of-instrumentation. As an active transmit sensing method, our topology is less sensitive to EM noise than passive approaches, but it remains vulnerable to potential swings between the system’s ground and the earth ground (such as when a user takes off their shoes). To suppress potential swings, we place a bias resistor between an electrode on the body, which serves as the local body ground, and Z-Ring’s sensor ground (see Sec. 3.1). Additionally, our signal processing methods either directly employ or learn adaptive normalization over time. Z-Ring’s configuration sweeps a wideband frequency range from low MHz to low GHz to collect a rich impedance profile, a key advantage over previous works like EtherPose and ElectroRing. This feature makes possible four separate interaction applications on the same hardware: gesture sensing, held object detection, passive UI interaction, and user identification. To our knowledge, no previous work (E-field sensing or not) spans this breadth of tasks.
2.2 Non-Electric Field Sensing Approaches
We now highlight non-electric field (EF) sensing approaches in each of Z-Ring’s four application domains.
2.2.1 Hand Gesture Detection via Non-EF Wearables.
Commercial hand gesture detection systems [1, 2, 52] often rely on optical methods, using cameras mounted on external devices such as AR/VR headsets or necklaces. However, these systems require a clear line of sight to the hand, complicating efforts to detect gestures made outside the camera’s field of view. Therefore, many systems have explored sensors mounted on the wrist, hand, and/or fingers, spanning a wide range of modalities: optical ([11], [33], [50], [10], [48]), magnetic ([20], [84]), bio-acoustic ([4], [53], [82]), ultrasonic ([29], [81]), mechanical ([69], [47], [35]), and inertial ([23], [51], [18], [44]). The capability of these devices ranges from detecting pinches to recognizing discrete gesture sets to driving full kinematic models.
However, these approaches have several limitations: (1) passive IMUs and bio-acoustic techniques can detect when the fingers make contact but not when they release, which is important for actions like dragging and dropping; (2) IMUs used for gesture sensing require multiple points of instrumentation, making the system awkward to use; (3) optical and ultrasonic techniques require a clear line of sight to each gesturing appendage, limiting possible mounting positions and making it difficult to detect pinches; and (4) mechanical and magnetic systems require instrumentation of the whole hand, back of the hand, or fingertips, making the system uncomfortable to use. In contrast, Z-Ring instruments the body at only a single location, i.e., the base of the index finger, an ergonomic and socially acceptable area for worn systems. By using the body as a transmission medium, Z-Ring does not require line-of-sight for microgesture sensing. In addition, it can robustly sense both touchdown and touch up events for one- and two-handed gestures even if touch velocity is low.
2.2.2 Held Object Detection via Non-EF Wearables.
Recognizing an object that a user is holding can provide insight into the user’s activity or intention. InDexMo [45] uses a finger-worn RFID transceiver to recognize tagged objects. However, this approach is impractical because it requires tagging or modifying each object. Other methods identify objects by assessing the user’s grasp, such as using EMG [17], wrist topography [61], and inertial sensors [9]; however, these methods are highly sensitive to variations in the user’s grip.
Viband [37] and VibEye [57] both use vibrations to identify objects. Viband uses an oversampled IMU to detect the characteristic vibrations of active objects, such as a drill or blender. VibEye identifies passive objects by actively exciting the object mechanically and capturing the resulting vibration, reporting an accuracy of 92.5% across 16 objects. Similarly, Z-Ring also uses active excitation, but in the electrical domain, allowing it to recognize electrically passive objects, unlike EM-Sense [39] and Maekawa et al. [49] (see Sec. 2.1.4).
2.2.3 Passive User Interfaces.
Passive user interfaces provide an input surface that does not require a power source or batteries to operate. Multiple sensing modalities have been investigated to enable such input. Audio-based input, such as Scratchinput [24], provides passive input on textured surfaces by analyzing the sound produced by dragging a fingernail across the surface. Acoustruments [36] also uses audio to provide passive input by combining low-cost and powerless mechanisms with portable devices. OptoSense [83], which uses an array of photodiodes to detect motions performed over it, is powered by ambient light captured from the environment. Other optical approaches, such as Magic Finger [78] and LightRing [30], use a tiny camera on the fingertip with a gyroscope and proximity sensor, respectively, to convert any surface into an input medium. UbiquiTouch [72] operates a low-power touch sensor on energy harvested from ambient light. IDsense [42], PaperID [41], RIO [59], and RapID [66] give passive input via RFID tags whose backscatter signals are detectable by neighboring RFID readers, which then interpret the interaction based on this data. MARS [5] provides passive touch surfaces through an ultra-low power sensing and back-scatter communication system.
While the research discussed here provides passive input capabilities, it is either limited in interactivity, expensive to deploy, or requires specialized infrastructure, such as RFID readers, in the environment. Z-Ring addresses these limitations by providing rich input capabilities on low-cost interfaces that operate without electronic components or batteries.
2.2.4 User Identification via Non-EF Wearables.
Biometric recognition systems can regulate access to resources or enhance personalization for interactive applications. Wearable systems use widely varied sensing methods to collect distinguishing biometrics: impedance (Sec. 2.1.5), PPG, ECG, iris, gait sensing via accelerometer, heartbeat sounds, and skin conductance [8]. Since wearables often contain several of these sensors for fitness and activity recognition, these sensing modalities can be hybridized with each other or with more traditional techniques, such as passwords, to increase specificity or provide continuous authentication [67]. For example, Nymi [56] uses an onboard fingerprint reader for user authentication, ECG for continuous "liveness," and an NFC transmitter to communicate with the host system. Z-Ring similarly offers methods of continuous user recognition for more secure, context-aware, or personalized experiences. Beyond non-EF methods, Z-Ring is extensible, permitting authentication when the user is physically in contact with a resource by combining its capabilities of held object recognition, user identification, and transmission through the body.
3 Z-Ring
The Z-Ring prototype consists of a ring with an electrical setup that measures the impedance of a user’s hand. An impedance change can occur when the user moves their fingers, holds an object or touches an external surface. By analyzing the change in impedance over time, Z-Ring can detect a gesture the user performs, identify interactions with custom-designed passive user interfaces, recognize the object held in the user’s hand and identify the users themselves. For the explorations in this paper, the Z-Ring prototype is worn on the index finger.
3.1 Sensing Technique
When an electromagnetic wave travels from one transmission medium to another, part of the wave passes through to the new medium and the remainder is reflected back into the original medium due to the impedance mismatch between the two mediums. Measuring the magnitude and phase of the reflected wave at the transmission interface can assist in comprehending the new medium’s impedance characteristics. This technique is typically applied in electrical engineering to measure the impedance of an antenna: a vector network analyzer (VNA) applies a continuous wave signal with a frequency that varies with time to an antenna being tested and analyzes the reflected signals to determine the antenna’s impedance as a function of frequency. We employ this technique to analyze the hand as if it were an antenna, reading its impedance over frequency.
The human body absorbs RF waves and permits transmission at specific frequencies [19], allowing the hand to act as an RF antenna. Z-Ring leverages this phenomenon by injecting a small RF signal into the body through its contact with the finger and capturing the reflected signal to measure hand impedance. As the hand posture changes, the antenna geometry changes, in turn changing the associated impedance. Additional impedance change may occur if the hand touches exterior surfaces, such as external objects or other parts of the user’s body. The signal initially injected from the ring can then flow through the user’s hand to the exterior surfaces, causing the signal to reflect at the newly constructed boundaries between the hand and surface, resulting in additional impedance change. This change can provide information useful for identifying hand interactions with external surfaces.
3.2 Electrical Setup
Z-Ring measures impedance by measuring the reflection coefficient, also known as S11, a metric that specifies the amount of a wave that is reflected by an impedance discontinuity in the transmission medium. The magnitude component of this measurement is defined as the ratio of the reflected wave’s amplitude to the incident wave’s amplitude. The S11 port of a VNA is used to perform this measurement. The prototype Z-Ring has two electrodes for measuring impedance with a VNA: the signal electrode transmits the signal into the hand and reads the reflected signal, while the bias electrode biases the hand to a local ground through a 2 MΩ biasing resistor. Figure 3a shows this arrangement.
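As a minimal sketch of how a reflection measurement relates to impedance, the standard transmission-line relation Γ = (Z − Z0)/(Z + Z0) can be inverted to recover a load impedance from a complex S11 reading. The function names below are hypothetical, and the 50 Ω reference impedance is an assumption about the measurement port; this is an illustration, not Z-Ring's processing code.

```python
import math

Z0 = 50.0  # assumed reference impedance of the measurement port, in ohms


def s11_to_impedance(s11: complex, z0: float = Z0) -> complex:
    """Invert Gamma = (Z - Z0) / (Z + Z0) to obtain the load impedance Z."""
    if s11 == 1:
        raise ValueError("S11 = 1 corresponds to an open circuit (infinite impedance)")
    return z0 * (1 + s11) / (1 - s11)


def s11_magnitude_db(s11: complex) -> float:
    """Reflected-to-incident amplitude ratio expressed in dB."""
    return 20 * math.log10(abs(s11))


# Illustrative load: 100 ohms on a 50-ohm line reflects one third of the wave amplitude.
gamma_100 = (100 - Z0) / (100 + Z0)
print(s11_to_impedance(gamma_100))   # recovers the 100-ohm load
print(s11_magnitude_db(gamma_100))   # roughly -9.5 dB
```

A perfectly matched load gives S11 = 0, so any change in the load away from the reference impedance shows up as a change in the reflected magnitude and phase.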
Figure 3b shows the simplified circuit of Z-Ring’s electrical model. The hand is modeled as a lumped combination of a variable resistor Rb, capacitor Cb, and inductor Lb, whose values depend on hand posture and what the hand is touching externally. The hand is fed an AC signal through a 50 Ω transmission line. Due to the impedance mismatch at the ring-skin interface, part of this signal reflects, and the rest propagates into the hand. Re and Ce represent this impedance mismatch. Re depends on factors like skin moisture, and Ce depends on factors such as how tightly the electrodes are in contact with the skin.
The hand is also coupled to the sensor’s local ground through a 2 MΩ resistor. Cp represents parasitic capacitance as the body couples to the earth ground. Factors like the material and thickness of the user’s shoe soles and the number of feet in contact with the floor affect Cp. Cp is relatively small due to the weak coupling with the earth, so the hand’s impedance mainly defines the circuit’s impedance.
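The lumped model can be explored numerically with a short sketch. The wiring below is an assumption made only for illustration: the electrode interface is taken as Re in parallel with Ce, in series with a series Rb-Lb-Cb hand, and the bias resistor and Cp are ignored. All component values are invented, and `hand_s11` is a hypothetical helper.

```python
import math

Z0 = 50.0  # characteristic impedance of the feed line, in ohms


def parallel(z1: complex, z2: complex) -> complex:
    """Impedance of two elements connected in parallel."""
    return z1 * z2 / (z1 + z2)


def hand_s11(freq_hz, rb, lb, cb, re, ce, z0=Z0) -> complex:
    """S11 of the assumed network: (Re || Ce) in series with a series Rb-Lb-Cb hand."""
    w = 2 * math.pi * freq_hz
    z_electrode = parallel(complex(re, 0), 1 / (1j * w * ce))
    z_hand = rb + 1j * w * lb + 1 / (1j * w * cb)
    z_total = z_electrode + z_hand
    return (z_total - z0) / (z_total + z0)


# Invented values: a posture change is mimicked by doubling the hand capacitance Cb.
for freq in (10e6, 100e6, 500e6):
    relaxed = hand_s11(freq, rb=100, lb=100e-9, cb=20e-12, re=1e3, ce=100e-12)
    changed = hand_s11(freq, rb=100, lb=100e-9, cb=40e-12, re=1e3, ce=100e-12)
    print(f"{freq / 1e6:5.0f} MHz  |S11| {abs(relaxed):.3f} -> {abs(changed):.3f}")
```

Under these assumptions, a change in any hand parameter shifts S11 by a different amount at each frequency, which is the property a swept measurement relies on.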
4 Background Experiments
Our sensing approach uses the human hand as a broadband, full-duplex antenna. The hand is a complex structure comprised of multiple layers, such as skin, muscle, fat, and bone, which affects its behavior as an antenna. To better understand its antenna function, we conducted simulations to characterize its properties, investigating the frequency response of the hand and how it changes in different postures when holding objects or in contact with external surfaces.
We performed simulations using CST Microwave Studio [68], a commercial electromagnetic analysis suite. CST employs numerical techniques (the finite element method and finite integration technique) to simulate the behavior of electromagnetic fields in complex structures. We leverage Hugo [65], an electromagnetically accurate 3D model of the human body from the US National Library of Medicine. Using CST Studio’s PoserGUI, we created hand models in different postures and modeled their electrodes as 3D copper shapes. Figure 19 shows an example hand model with the ring electrodes. We provide simulation experiment specifics and results in Sec. A. Results show that the frequency range from 1 MHz to 1000 MHz is optimal for absorption by the human body. Hence, we use this range for gesture recognition; simulations for passive interfaces also indicate that this range is ideal. For object detection, we found that at frequencies exceeding 500 MHz (up to 1000 MHz), the ring starts to couple with the object through the air as the signal wavelength becomes comparable to the object dimension. Hence, we use 1 MHz to 500 MHz for object recognition experiments. For user identification and authentication, frequencies between 1 MHz and 400 MHz show the most distinct response, so we use these for our investigation.
5 Implementation
As noted, the Z-Ring prototype has two electrodes. The signal electrode transmits the signal into the hand and reads the reflected signal; the bias electrode biases the hand to a local ground through a 2 MΩ resistor. We designed each electrode as a 65 mm by 8 mm exposed copper region on a flexible printed circuit board (PCB) built on a polyimide sheet. Figure 4a shows the electrode setup. The flexible PCB lets the electrodes wrap conformally around the user’s finger within the ring. Both electrodes are placed adjacent to one another along their entire length, with a 4 mm gap between them. To prevent skin and environmental moisture from causing oxidization, we coated the electrodes with gold using the electroless nickel immersion gold (ENIG) technique.
On the opposite side of the electrodes, the flexible PCB features a U.FL connector to connect a shielded coaxial cable between the electrodes and the VNA. The same side contains the bias resistor for the ground electrode. The flexible PCB is affixed to a velcro strip with double-sided tape, allowing the electrodes to be wrapped around fingers of varying sizes. The flexible PCB mounted on the velcro constitutes the ring setup. We show this configuration in Figure 4b.
The S11 measurements are captured with a portable VNA, the LiteVNA [80]. The LiteVNA is powered by a rechargeable battery and supports a frequency range from 51 kHz to 6 GHz. The unit draws 2.4 W (of which 1.16 W is measurement circuitry and display driver and 1.22 W is LCD backlight). We secured the VNA to the user’s wrist using a Velcro strap to maintain a short connection between the ring and S11 port; the VNA’s small form factor measures around 91 mm by 58 mm, making it suitable for this purpose.
Each S11 measurement is made by transmitting a sweep of signal frequencies between a given start and end frequency and measuring the reflected signal for this sweep. The VNA is configured to record this response as a 51-point array and perform 30 sweeps per second, yielding a sample rate of 30 Hz. The user application determines the start and end frequencies, which can vary between 1 MHz and 1 GHz. The data is transmitted over a wired USB connection to a MacBook Pro laptop, where a custom Python application logs and analyzes it.
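To make the sweep configuration concrete, the following sketch builds the frequency axis for one sweep and the resulting data rate. It assumes linearly spaced points, which is not stated in the text, and `sweep_frequencies` is a hypothetical helper rather than part of any VNA interface.

```python
SWEEP_POINTS = 51        # data points recorded per sweep
SWEEPS_PER_SECOND = 30   # sweep rate, i.e. the 30 Hz sample rate


def sweep_frequencies(start_hz, stop_hz, n_points=SWEEP_POINTS):
    """Return n_points frequencies from start_hz to stop_hz (linear spacing assumed)."""
    step = (stop_hz - start_hz) / (n_points - 1)
    return [start_hz + i * step for i in range(n_points)]


freqs = sweep_frequencies(1e6, 1e9)
bin_spacing_mhz = (freqs[1] - freqs[0]) / 1e6
values_per_second = SWEEP_POINTS * SWEEPS_PER_SECOND

print(f"{len(freqs)} points, {bin_spacing_mhz:.2f} MHz apart")
print(f"{values_per_second} S11 values per second")
```

With the full 1 MHz to 1 GHz range, adjacent points are just under 20 MHz apart under this assumption; narrower application-specific ranges would give proportionally finer spacing.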
Notably, even when unplugged from the laptop and running on battery power, our prototype generates the same amplitude of S11 change during interactions. We set the VNA’s maximum output power to 5 dBm, which is considered safe for humans [3]. Section 6 describes the software processing pipeline for each interactive task.
6 Application Domains
We now describe the data processing pipeline, evaluation and results from our user studies of four application domains: gesture recognition, object recognition, tangible user interfaces, and user identification. The study included 21 participants (4 female; average age 27.7, min 18, max 33). For logistical reasons, not all participants took part in every application sub-study. Each sub-study varied in length from one to three hours, and participants completed sub-studies in a single session. No data was omitted for performance reasons. The user research took place over a span of three months.
6.1 Gesture Input
Z-Ring provides input via hand gestures, enabling always-available input for wearable or environmental computing control. It supports two different types of interactions: one-handed and two-handed interactions. The former provides subtle interactions between the thumb and index finger, while the latter facilitates interaction with one hand on the back of the other. We offer five distinct gestures for both scenarios: tap, double tap, long tap, and left and right swipe. Figure 5 demonstrates how these interactions are performed. For example, the various taps support different selection possibilities in a user application, and the bidirectional swipes enable navigation.
6.1.1 Recognition.
Gesture recognition is built upon the frequency domain and temporal pattern generated in the S11 measurements made while performing the gestures. Changes in the frequency domain occur due to new propagation paths for the transmit signal while performing the gesture. Figure 6 shows the new signal paths generated while performing one- and two-handed gestures. Temporal patterns result from finger motions needed to complete the gesture. For instance, the time-varying movement of a double tap differs from that of a single tap, and so on.
We use S11 magnitude as the feature of choice in our modeling approach. We also explored using phase shifts, but they displayed low stability because the signals pass through the hand, a lossy transmission medium that produces unpredictable phase shifts. The S11 measurements for gesture recognition are taken using a frequency sweep ranging from 1 MHz to 1 GHz. The gesture recognition pipeline begins by applying a moving median filter to the live S11 data stream with a sliding window of 200 milliseconds. This emphasizes impedance changes while attenuating the noise generated by motion artifacts. Then, 1.5-second windows of S11 data (about 45 S11 samples at 30 Hz) are processed one at a time to detect any gesture executed. All S11 samples in each window are vertically stacked to produce a spectrogram; Figure 7 shows spectrograms for the five gestures and the window in which no gestures were performed (null state).
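The filtering and windowing steps can be sketched as follows. This is a simplified illustration under stated assumptions: the median filter is causal and applied per frequency bin, the 1.5-second windows do not overlap, and the helper names are hypothetical.

```python
from statistics import median

SAMPLE_RATE_HZ = 30
FILTER_WINDOW = round(0.2 * SAMPLE_RATE_HZ)    # 200 ms -> 6 sweeps
GESTURE_WINDOW = round(1.5 * SAMPLE_RATE_HZ)   # 1.5 s -> 45 sweeps


def moving_median(sweeps, window=FILTER_WINDOW):
    """Causal moving median, applied independently to each frequency bin."""
    filtered = []
    for t in range(len(sweeps)):
        recent = sweeps[max(0, t - window + 1): t + 1]
        # zip(*recent) groups the recent values of each frequency bin together.
        filtered.append([median(column) for column in zip(*recent)])
    return filtered


def gesture_windows(sweeps, length=GESTURE_WINDOW):
    """Stack consecutive sweeps into non-overlapping (time x frequency) spectrograms."""
    return [sweeps[i:i + length] for i in range(0, len(sweeps) - length + 1, length)]


# Synthetic stream: 3 s of flat S11 magnitude with one single-sweep glitch.
stream = [[0.0] * 51 for _ in range(90)]
stream[10][5] = 9.0
filtered = moving_median(stream)
spectrograms = gesture_windows(filtered)
print(len(spectrograms), len(spectrograms[0]), len(spectrograms[0][0]))
```

In this synthetic stream the one-sweep glitch is removed by the median, while a sustained change lasting longer than half the filter window would pass through.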
The spectrogram is then resized to 25-by-25 and fed into a convolutional neural network (CNN) to identify the gesture. Figure 8 shows the architecture of our CNN. Since the gesture could occur anywhere within the 1.5-second window, we produce synthetic data by moving this window in time between -600 and 600 milliseconds in increments of 30 milliseconds and append it to the original data when training the CNN model. The time shifting is accomplished by rolling the spectrogram along the time axis while wrapping around the edges.
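The time-shift augmentation can be sketched as a circular roll of the spectrogram rows. The sketch assumes the roll is applied to the 45-row window before resizing and that millisecond shifts are rounded to whole sweeps at 30 Hz, so a few of the 30 ms steps collapse onto the same shift; both choices, and the function names, are illustrative assumptions.

```python
SAMPLE_RATE_HZ = 30


def roll_time(spectrogram, shift):
    """Circularly shift rows (time axis); rows pushed past one edge wrap to the other."""
    shift %= len(spectrogram)
    return spectrogram[-shift:] + spectrogram[:-shift] if shift else list(spectrogram)


def time_shift_augment(spectrogram, max_shift_ms=600, step_ms=30):
    """Return rolled copies for every distinct non-zero shift in the given range."""
    shifts = {round(ms * SAMPLE_RATE_HZ / 1000)
              for ms in range(-max_shift_ms, max_shift_ms + 1, step_ms)}
    shifts.discard(0)  # the unshifted spectrogram is already in the training set
    return [roll_time(spectrogram, s) for s in sorted(shifts)]


# Toy spectrogram: 45 time steps by 3 frequency bins, each row filled with its index.
spectrogram = [[float(row)] * 3 for row in range(45)]
augmented = time_shift_augment(spectrogram)
print(len(augmented), "shifted copies")
```

Because the roll wraps around, every augmented copy contains exactly the same rows as the original, only in a rotated order.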
For our evaluation, we built both user-independent and user-dependent models. For the former, we additionally augment the training set by generating data from rolling the spectrograms along the frequency axis: because each person’s unique hand anatomy results in impedance responses in different frequency bands, we roll the spectrograms in this way so that the model learns patterns across the whole frequency domain and is able to generalize. We shift the spectrogram by up to 500 MHz (in the direction of increasing frequency) in steps of 20 MHz, wrapping it around the edges. Therefore, for user-independent models, the training set is augmented both in the time and frequency domains.
6.1.2 Evaluation.
We evaluated the one- and two-handed gestures in user studies with 14 and 15 individuals, respectively. Participants sat in front of a computer screen that displayed visual cues to perform a specific gesture. Both one- and two-handed gestures followed the same study protocol; the study required participants to complete each of the five gestures and a null gesture. A motion other than making specific gestures was considered a null gesture; participants interacted with the desk and their personal belongings, like phones, wallets, keys, etc., with both hands during this time. The null gesture was designed to let us examine how the recognition system would perform when the user engages in other activities. In total, each participant made 180 gestures (= 6 sessions × 6 gestures × 5 repetitions per gesture), enabling us to evaluate a single set of interactions (one- or two-handed). Participants were instructed to remove and re-wear the ring after each session so we could evaluate cross-session performance more accurately.
6.1.3 Results.
We chose accuracy as our primary metric for measuring the recognition system’s performance. We analyze and present both user-dependent and user-independent recognition models. For the former, we use the first four of the six data sessions collected to train the CNN, which we then test on the final two sessions.
For the user-dependent model for one-handed gestures, Figure 9 shows recognition accuracy per participant. The average recognition accuracy is 93.14%, with the highest at 100% for P4 and the lowest at 89% for P2. Figure 10a shows the confusion matrix for this result. We see that the most challenging gesture to distinguish is the left swipe, which is confused with the right swipe and tap gesture. We can improve this result by replacing the tap gesture in the user application with a double tap or long tap. The null gesture is recognized with a high accuracy of 96.1%, suggesting that the recognition system provides tolerance to false positives while accurately recognizing gestures.
The average recognition accuracy for the user-dependent model for two-handed gestures is 92.67%, with the highest at 98% for P21 and the lowest at 86% for P1 and P10. Figure 9 displays the accuracy per participant. The confusion matrix in Figure 10c reveals that the left swipe, confused with the long tap, is the most challenging gesture to differentiate; long tap is also often confused with left swipe. Therefore, excluding the long tap from this gesture set can increase its accuracy.
For user-independent analysis, the CNN model was trained on all six data sessions from all but one user and tested on all data sessions from the left-out user. This process was repeated for each user.
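The leave-one-user-out procedure can be sketched as a small generator. This is an illustrative Python sketch with hypothetical names; `data_by_user` is assumed to map a participant ID to that participant's samples pooled across all sessions.

```python
def leave_one_user_out(data_by_user):
    """Yield (held_out_user, train_samples, test_samples) once per user.

    Training data pools every sample from all other users; test data is
    every sample from the held-out user.
    """
    for held_out in data_by_user:
        train = [
            sample
            for user, samples in data_by_user.items()
            if user != held_out
            for sample in samples
        ]
        test = list(data_by_user[held_out])
        yield held_out, train, test
```

Averaging the per-fold accuracy over all folds gives the user-independent figure reported per participant.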
The average recognition accuracy for the user-independent model for one-handed gestures is 88%, with the highest and lowest accuracy at 99% and 77%, respectively, for P4 and P1, as shown in Figure 9. The confusion matrix in Figure 10b shows that double tap is the most challenging motion to recognize. For the user-independent model for two-handed gestures, the average recognition accuracy is 83.67%, with the highest at 92% (P4), demonstrating the potential for a generalizable gesture set. Figure 10d shows the confusion matrix for this result. Overall, Z-Ring provides gesture input that functions across the population with high accuracy but can deliver additional robustness with further fine-tuning via user-specific models. Because the data was collected over a period of three months, these results also indicate that Z-Ring’s cross-user performance remains stable over time.
6.2 Tangible User Interfaces
Z-Ring provides a new method for measuring surface impedance by touch. We use this method to develop physical user interfaces and design them so each offers a unique characteristic impedance. Z-Ring can identify the touch and interaction with these interfaces based simply on their different impedance signatures, enabling the development of passive and battery-free interactive user interfaces. We propose three user interfaces: buttons, a continuous 1D slider, and a continuous 2D trackpad. Figure 11 illustrates these interfaces.
6.2.1 Design.
We construct these interfaces using a thin copper sheet, which, as an excellent electrical conductor, offers a significant impedance change when touched with the Z-Ring. Because impedance is dependent on shape and size [64], we vary these aspects as we construct the interfaces to create distinct impedance signatures across frequency.
We designed each button with a unique shape to ensure that each has a distinct impedance profile. The 1D slider is built asymmetrically along the direction of sliding to generate a continuously varying impedance change, which helps determine where the finger is on the slider. Similarly, the geometry of a 2D trackpad is asymmetric in two directions so that each trackpad location offers a distinct impedance profile.
6.2.2 Recognition.
To recognize a button with Z-Ring, we employ a support vector machine classifier (kernel=rbf). The classifier takes the 51-point S11 measurement as a feature vector, predicts whether any button is touched, and identifies which. The classifier is trained on data collected while each button is touched and while no button is touched (null).
To predict finger location on the 1D slider and 2D trackpad, we use a random forest regressor (number of trees = 300, maximum depth = 30) independently for each. The regressor receives S11 measurements (51-point feature vectors) from discrete locations on the interface as training data and predicts a continuous output (x for 1D and both x and y for 2D). Figure 12 shows the discrete locations on the 1D and 2D interfaces. On the actual interface prototypes, each discrete position was marked as a 1 cm × 1 cm square with a Sharpie marker; the user then placed their finger within this square so we could collect data. For the 1D slider, eight locations, each 2 cm apart, were specified. For the 2D trackpad, 12 (3 rows by 4 columns) different locations were marked, each 3 cm apart.
For all three user interfaces, the start and stop frequencies for the S11 measurement sweep are set to 1 MHz and 1 GHz, respectively.
6.2.3 Evaluation.
We conducted a user study to assess how well Z-Ring recognizes buttons. We created four buttons (Figure 11) and investigated how effectively Z-Ring could differentiate among them and between touching/not touching the buttons. We also evaluated the continuous tracking accuracy for both 1D and 2D interfaces. Table 1 shows study details.
Results for Buttons. We first determined if touch to any button could be reliably identified. We created a binary classifier (SVM, kernel=rbf), with one class using data from button touches and the other as null gesture data from the gesture testing user study; the null class included users engaging with their phones, desks, and actions not involving button touches. We created two models, user-dependent and user-independent. The former was trained on the first four sessions of the button and gesture studies and tested on the last two. The latter used all six sessions from one user for testing and all six sessions from remaining users for training, and we repeated this process for all users. This binary classifier yielded 100% accuracy for both user-dependent and user-independent models.
Next, we tested the system’s ability to differentiate buttons. We used the first four sessions to train an SVM classifier (kernel=rbf) and the last two for testing. Each user had an individual model. Average button identification accuracy across all participants was 91.8%, with highest accuracy at 99.3% (P4) and lowest at 82.79% (P19). Figure 13 shows these results. Additionally, we discovered that user-independent models for buttons cannot be easily generalized: as signal travels through the user’s hand, each user’s unique anatomy influences the frequency profile differently, producing different spectral signatures for the same button.
Results for 1D Slider. We tested how accurately Z-Ring predicted finger location on the 1D slider. We built a regression model, trained it on data from slider positions ’4’, ’8’, ’12’, and ’16’ (Figure 12a), and tested it on the rest. We built a user-dependent model using the first four sessions for training and the last two for testing. Our user-independent model used all six sessions for training and testing; its training set consists of data from all participants except the one being tested.
The user-dependent and user-independent models had mean absolute errors of 3 cm and 4.4 cm, respectively. The user-dependent model’s maximum and minimum errors were 4 cm and 1.52 cm, respectively. The user-independent model had a maximum and minimum error of 5.71 cm and 3.78 cm, respectively. Both models have the highest error at the ’2’ position. This is likely because the ’2’ point was located at the slider’s thinner end, where only a portion of the user’s finger could make solid contact with the copper; building a taller slider could overcome this problem. Figure 14 shows the results.
Results for 2D Trackpad. We tested how accurately Z-Ring predicted finger location on the 2D trackpad. We built a regression model, trained it on data from positions A, C, F, H, I, and K (Figure 12), and tested it on the rest. We built user-dependent and user-independent models following the same procedure used for the 1D slider.
We calculated the error as the mean absolute Euclidean error for each position. The mean absolute error for the user-dependent model was 3.2 cm and for the user-independent model was 4.14 cm. The maximum and minimum errors for the user-dependent model were 4.39 cm and 2.67 cm, respectively, and for the user-independent model were 4.62 cm and 3.85 cm, respectively. Positions D and L on the trackpad’s right edge showed higher errors than other points for both models, probably because the edge protrudes, offering a broader range of impedance variations than other locations. This problem can be addressed by bringing the corner closer together or gathering more data for spots in that vicinity. Figure 15 shows these results.
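The per-position Euclidean error can be computed as in the following Python sketch. The function name and the record layout (a position label, the true coordinates, and the predicted coordinates, all in centimeters) are hypothetical choices made for illustration.

```python
from math import hypot
from statistics import mean


def euclidean_mae_by_position(records):
    """Average the Euclidean distance between true and predicted points.

    `records` is an iterable of (label, (x_true, y_true), (x_pred, y_pred)),
    with coordinates in cm. Returns {label: mean error in cm}.
    """
    errors = {}
    for label, (x_true, y_true), (x_pred, y_pred) in records:
        distance = hypot(x_pred - x_true, y_pred - y_true)
        errors.setdefault(label, []).append(distance)
    return {label: mean(values) for label, values in errors.items()}
```

Taking the mean, maximum, and minimum over the returned per-position values gives the kind of summary reported above.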
For 1D sliders and 2D trackpads, the user-independent model out-performed the model for buttons. Although all are copper-based topologies, the model for 1D and 2D interfaces needs to identify changes only in relative impedance between different locations on the interface, while the one for buttons must detect absolute impedance changes, a more difficult task.
Study results show that the slider and trackpad, though less accurate than those on conventional electronic devices, open new opportunities for inexpensive, battery-free, and low-resolution input devices. Within the limits of the given resolution, these interfaces can support gestural input. For example, the sliders can support 1D gestures like left and right swipes, and the trackpad can support 2D gestures like cardinal-directional swipes or unistroke character inputs. The low-cost gestural input capability makes these interfaces useful for ubiquitous, situated UIs.
6.3 Object Detection
We leverage Z-Ring’s ability to detect the impedance of external surfaces the user’s hand touches to identify hand-held objects. Since objects have a variety of shapes, sizes, volumes, and materials, each has a distinctive impedance signature that can identify them. We examined six commonly used objects: a doorknob, a juice can, a water bottle, a small storage box, a wrench, and tweezers. These objects fall within categories such as small (tweezers) vs. large (wrench) and hollow (bottle) vs. solid (doorknob) and require different hand grips to use them. We focus on metallic objects since they produce the most significant impedance shift; however, non-metallic items can also offer impedance changes. Figure 16 shows the different objects and their unique impedance signatures. By detecting objects in hand, Z-Ring can provide a contextually aware input modality.
6.3.1 Recognition.
We classify objects with an SVM classifier (kernel=polynomial) using a 51-length S11 measurement as the feature vector. Start and end frequencies were set to 1 MHz and 500 MHz, respectively, since most dynamic changes are observed in this band.
6.3.2 Evaluation.
In a study with 14 participants, we assessed the Z-Ring’s object recognition accuracy. Participants were asked to hold objects in the air with the hand wearing the ring. They grasped the objects as they typically would when interacting with them. Each participant performed a total of 240 grabs, which equals 10 sessions × 6 objects × 4 repetitions for each object. Each grab lasted 2 seconds (about 60 S11 measurements). We increased the total number of sessions for this study compared to the previous ones due to the wide diversity of ways participants held the objects.
6.3.3 Results.
We used the first eight user study sessions to train the classifier model and the final two sessions to test it. We constructed a unique model for each participant. In addition to the data gleaned from holding the objects, null gesture data from the gesture recognition user study was also included; null gestures included participants interacting with their phones or desk or performing any action not in the gesture set. Including null gestures helped us discover whether the classifier could differentiate between holding and not holding objects and between holding objects from/not from the test set.
The average object recognition accuracy was 94.5% across all participants, with a maximum accuracy of 99% (P2, P16) and minimum of 87% (P20). The most frequently confused objects were tweezers and the metal box, followed by the bottle and can. The Null class had 100% accuracy for all participants. Figure 17 shows these results.
The user-independent model was challenging to generalize across users for objects. As was the case for buttons, the impedance profile of the user’s hand affects the signals traveling to/from the object, influencing its impedance response.
6.4 User Identification
Z-Ring provides a unique ability to identify and authenticate the user wearing it. It identifies users based on their unique hand impedance signatures due to their individual anatomical structure. Since Z-Ring is wearable, it can continuously identify and authenticate the user and provide a secure input modality.
For this analysis, we repurposed data from the object detection study because it reflects a more realistic environment where users interact with and touch objects and surfaces. Since touching external surfaces generates additional impedance changes, we wanted to determine if Z-Ring could still identify and authenticate users given this added noise.
6.4.1 Recognition.
Users are identified and authenticated with a random forest classifier (number of trees = 50, maximum depth = 30). S11 measurements were input to this model. Based on simulation results, we utilized the frequency range of 1 MHz to 400 MHz. We trimmed data from the object detection study (sweep from 1 MHz to 500 MHz) for the smaller frequency range and repurposed it for this investigation.
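Trimming a sweep to the narrower band amounts to dropping the frequency points above 400 MHz. The Python sketch below illustrates this under an assumption that is not stated in the text: that the 51 sweep points are linearly spaced between the start and stop frequencies. The function names are hypothetical.

```python
def sweep_frequencies(start_mhz, stop_mhz, points):
    """Frequencies of a sweep, ASSUMING linear spacing between the endpoints."""
    step = (stop_mhz - start_mhz) / (points - 1)
    return [start_mhz + i * step for i in range(points)]


def trim_to_band(s11, freqs_mhz, max_mhz):
    """Keep only the S11 values measured at or below `max_mhz`."""
    return [value for value, f in zip(s11, freqs_mhz) if f <= max_mhz]
```

Under the linear-spacing assumption, a 51-point sweep from 1 MHz to 500 MHz keeps 40 points after trimming to 400 MHz; a logarithmic sweep would keep a different number.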
6.4.2 Evaluation.
For each user, we merge all classes in the source data into a single class representing that user. We conducted two analyses. For user identification, we train a model on each user’s first eight sessions and test it on the remaining two sessions. For user authentication, we train a binary classifier with the test user’s data as one class and all other users’ data as the other. The amount of data is uneven between the two classes since it comes from one participant vs. remaining participants; therefore, we uniformly resample data from remaining individuals for comparable data amounts.
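The class balancing for authentication can be sketched as follows. This is an illustrative Python sketch with hypothetical names; it assumes that "uniformly resample" means drawing, without replacement, as many samples from the pooled remaining users as the target user has.

```python
import random


def balanced_auth_set(data_by_user, target_user, seed=0):
    """Build a balanced binary dataset for one user's authentication model.

    Label 1 marks the target user's samples; label 0 marks samples drawn
    uniformly at random (without replacement, an assumption) from all
    other users.
    """
    rng = random.Random(seed)
    positives = list(data_by_user[target_user])
    others = [
        sample
        for user, samples in data_by_user.items()
        if user != target_user
        for sample in samples
    ]
    negatives = rng.sample(others, k=min(len(positives), len(others)))
    samples = positives + negatives
    labels = [1] * len(positives) + [0] * len(negatives)
    return samples, labels
```

The fixed seed only makes the sketch reproducible; any uniform sampler would serve the same purpose.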
6.4.3 Results.
For closed-set user identification, the average accuracy is 99%. Of the 14 participants, 12 had a perfect accuracy of 100%. The remaining two had an accuracy of 89% and 97.4%. For user authentication, the average classification accuracy across participants is 98.3%, with 8 of 14 showing an accuracy of 100%. Of the remaining 6 participants, 5 had an accuracy of over 90%, and the last participant’s accuracy was 89%. We also re-ran this evaluation with data from the one-handed gesture study and obtained similar results.
7 Discussion and Limitations
Z-Ring offers multiple functions, like gesture input, object detection, and interaction with passive user interfaces.
Our user studies explored each application domain separately; the data for each application domain was collected and tested independently. However, in a real-world scenario, multiple applications may be required to operate simultaneously. For example, a user might use a passive button to enable a device and then one-hand gestures to control it. In this scenario, the system must be able to mode switch between functions.
To understand the feasibility of this scenario, we trained a classifier (Random Forest, number of trees=50, max depth=30) to distinguish between the various applications by combining the data collected from the user study. Figure 18 shows the confusion matrix from this analysis. Results show that differentiating different applications with high accuracy is possible. In this architecture, a gating classifier could determine the application, and a second classifier could identify the interaction; another approach to this problem could involve training a single classifier to operate across all application domains. We leave this for future work.
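The two-stage idea can be sketched in a few lines of Python. This is only a structural illustration of the proposed architecture, which is left for future work: the names are hypothetical, and both the gate and the per-application models are stand-in callables rather than trained classifiers.

```python
def make_two_stage(gate, experts):
    """Combine a gating model with per-application models.

    `gate` maps a feature vector to an application name, and `experts`
    maps each application name to a callable that returns an interaction
    label for the same feature vector.
    """
    def predict(features):
        application = gate(features)
        interaction = experts[application](features)
        return application, interaction

    return predict
```

A usage example with toy stand-ins: with `gate = lambda f: "button" if f[0] > 0 else "gesture"` and `experts = {"button": lambda f: "button_1", "gesture": lambda f: "tap"}`, calling `make_two_stage(gate, experts)([0.5])` returns `("button", "button_1")`.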
For our tangible button investigation, we focused on altering the button’s geometry to obtain a distinct impedance signature. In addition to shapes and sizes, additional methods can vary or control impedances, some of which are commonly used in RF circuit design. Cross-hatching, one such technique, carves out vast sections of copper on a PCB in a lattice pattern to control trace impedance. Attaching passive devices such as capacitors and inductors to button surfaces can also alter impedances. In certain chipless RFID designs [27, 60], resonant structures are constructed around the main feed line to introduce peaks at specific frequencies, contributing to generating a unique spectral signature. This could be yet another possibility for enhancing the buttons’ frequency signature. Some chipless RFID systems [54, 79] use structural features along the signal line at varying distances, from which the signal bounces back and generates a temporal signature; such an approach with a temporal and an impedance pattern might be helpful when developing user-independent models for button recognition.
We explored copper surfaces and metallic objects for passive user interfaces and object detection because their electrical conductivity produces significant impedance changes. Nevertheless, nonmetallic materials or dielectrically distinct objects, such as paper, cardboard, and glass, can induce impedance change, although to a lesser degree. Items with high water content, such as fruits and vegetables, can result in strong impedance changes. Certainly, our system could sense electrically active objects such as those explored in EM-Sense [38].
The current Z-Ring prototype employs a commercially available VNA device that is too large for use as a wearable and, due to its lack of wireless capabilities, is tethered to a laptop for data transmission. This restricts the current prototype’s utility in real-world scenarios or for extended periods. However, advancements in chip manufacturing, materials, and circuit design offer paths towards single-chip VNAs [12, 40, 55] that could substantially reduce the prototype’s size. Moreover, we do not require the VNA’s more sophisticated measurements in our processing pipeline, opening the door to simpler scalar network analyzer circuit designs, or even focusing on impedance measurements at only the set of discrete frequencies that demonstrate the greatest discrimination in our models [6]. We will explore this aspect in the future.
Currently, most wearable devices, such as those used for fitness tracking, feature a pair of electrodes that make contact with the skin to measure stress levels through skin conductance measurements. In the future, Z-Ring can repurpose these electrode pairs to enable new interactive functionality.
8 Conclusion
We propose Z-Ring, a custom-designed ring prototype that leverages a measurement of the finger’s bio-impedance to enable a wide variety of interactive applications, while requiring only a single point of instrumentation. Our evaluation shows that Z-Ring recognizes one-handed gestures at 93.1% and two-handed gestures at 92.7% accuracy and performs object detection at 94.5% accuracy with six objects. We demonstrate that Z-Ring detects touch-down on a copper button with 100% accuracy and tracks one’s finger position on a 1D slider and a 2D trackpad, with an MAE of at most 4.4 cm for the 1D slider and a Euclidean MAE of at most 4.14 cm for the 2D trackpad. It also recognizes different copper shapes with an accuracy of 91.8%. Z-Ring performs user identification with 99% accuracy. Due to the richness of data afforded by the sensing modality and coupling technique described in this paper, we believe the methods proposed in Z-Ring offer great potential for further HCI explorations.
Acknowledgments
We express our gratitude to all the participants in our research study. We especially thank Prof. Joshua Smith, Prof. Vikram Iyer, and all reviewers for their invaluable feedback and guidance on the paper. Additionally, we acknowledge Bo Liu and Qiuyue Xue for their assistance in recording parts of the project video and Hana Kim for video voice-over.
This research was approved by the University of Washington IRB (STUDY00015302) and supported by the University of Washington Endowment Fund. Ishan Chatterjee is supported by the National Defense Science and Engineering Fellowship.
A Appendix: Background Experiments
In this section, we describe the results from our background experiments performed in CST Microwave Studio [68].
A.1 RF Energy Absorption
To determine the most efficient excitation frequency for our prototype, we model specific absorption rate (SAR) of the hand at a range of frequencies. SAR measures the RF (radio frequency) energy absorption rate per unit mass by a human body. We tested frequencies between 1 MHz and 2000 MHz (Figure 19), which have been shown to have high absorption rates by the body [62]. From the results, we found that the body’s absorption rate decreases continuously from 1 MHz (maximum normalized SAR = 1) to 2000 MHz (maximum normalized SAR = 0.2). However, the absorption rate reduces significantly after 1000 MHz (maximum normalized SAR = 0.47). Based on these findings, we selected 1 MHz to 1000 MHz as our prototype’s range of excitation frequencies.
A.2 S11 Variations Based on Hand Pose
We next focused on variations of S11 for one-handed gestures with an emphasis on thumb-to-index finger interactions. Specifically, we wanted to understand how the thumb and index finger contact would affect S11 and how the location of the contact (e.g., at the tip or middle of the index finger) influences S11. We looked at three scenarios:
(1) When the index finger and thumb are not touching.
(2) When the thumb touches the tip of the index finger.
(3) When the thumb touches the middle of the index finger.
We built hand pose models for each scenario and simulated them for a frequency sweep from 1 MHz to 1000 MHz. The resulting S11 plots are shown in Figure 20. The simulation results showed that when the thumb and index finger touch, the S11 value decreases significantly. As the thumb touches the index finger, it loads the index finger antenna, coupling away the signal from the ring. Less power is reflected through the signal electrode, causing S11 to drop. This S11 change allows us to robustly differentiate between a pinch and no pinch. When the thumb contacts the index finger and swipes closer to the ring, the S11 value also decreases. As the physical distance between the thumb and the ring’s contact point decreases, additional signal is shunted, leading to less reflected power and a lower S11 value.
A.3 S11 Variations When Holding Objects
We explored how the S11 measurements change when a user holds different objects. For this exploration, we chose four objects: a sphere, a disk, a cube, and a cylinder. These objects span a variety of shapes and sizes, which helps with object diversity. Additionally, each object requires a different grip, further increasing variation. Although many materials affect impedance, metal produces the most remarkable impedance changes; hence, we modeled these items as solid aluminum shapes. We designed four hand models, one holding each object, as shown in Figure 21. While conducting the experiments, we found that at higher frequencies, the dimensions of the objects become similar to the quarter wavelength of the excitation signal (for example, at 500 MHz, the quarter wavelength is approximately 15 cm, and at 1000 MHz, it is approximately 7.5 cm). This similarity, along with the close proximity of the objects to the ring, caused the objects to be directly coupled to the ring over the air, affecting the impedance measurements [89]. To avoid this issue, we limited the frequency sweep to the range from 1 MHz to 500 MHz. Figure 21 shows the S11 results from the simulation, indicating subtle but distinguishable differences in the S11 curves for different objects.
A.4 S11 Variations When Touching External Surfaces
We conducted two sets of experiments to better understand how impedance is affected when touching passive surfaces. In the first, we varied shape and size of copper sheets. In the second, we explored the effect of touching the same surface at different locations. We used a thin sheet of copper (0.03 mm thick) as our surface material for these experiments.
We tested three different shapes: square (3 cm sides), triangle (3 cm sides), and circle (3 cm diameter). The results of this experiment are shown in Figure 22a. The S11 measurements for the copper-based shapes in our experiments showed more diversity at lower frequencies and gradually tapered off toward the higher end of the frequency sweep. We also conducted experiments with the same shape (a square) but at different sizes (1 cm, 3 cm, 5 cm, and 7 cm sides), finding that the S11 measurements were more affected by shape than by size. For example, when comparing the square with 3 cm sides and the triangle with 3 cm sides, the maximum difference in S11 values was 0.124 dB at 10 MHz. In contrast, when comparing the square with 1 cm sides and the square with 7 cm sides, the maximum difference in S11 values was 0.07 dB at 10 MHz. This suggests that the shape of the surface has a greater impact on the S11 measurements than the size of the surface, at least within the range of sizes that we tested. In the second experiment, we placed a finger on a 10-centimeter square piece of copper at three different locations. The results of this simulation are shown in Figure 22b. The results confirm that the S11 measurements change depending on touch point.
A.5 S11 Variations for Different Users
Since our sensing technique uses the hand as an antenna, each person’s unique anatomy influences the impedance measurement. To understand how the S11 measurements might look for different individuals, we built hand models with the same posture but differing anatomical characteristics. The differences were introduced by varying the characteristic values of biological components such as bone, blood, fat, muscle, and skin. Table 2 lists these values. Figure 23 shows the simulation results. The results reveal that differences in S11 readings between individuals are most significant between 1 MHz and about 400 MHz.