WiFi-Enabled User Authentication through Deep Learning in Daily Activities WiFi-Enabled User Authentication through Deep Learning in Daily Activities

CONG SHI,

Rutgers University, New Brunswick, New Jersey

JIAN LIU,

University of Tennessee, Knoxville, Tennessee

HONGBO LIU,

University of Electronic Science and Technology of China, Chengdu, Sichuan, China

YINGYING CHEN,

Rutgers University, New Brunswick, New Jersey

ACM Trans. Internet Things, Vol. 2, No. 2, Article 13, Publication date: May 2021.
DOI: https://doi.org/10.1145/3448738

User authentication is a critical process in both corporate and home environments due to the ever-growing security and privacy concerns. With the advancement of smart cities and home environments, the concept of user authentication is evolved with a broader implication by not only preventing unauthorized users from accessing confidential information but also providing the opportunities for customized services corresponding to a specific user. Traditional approaches of user authentication either require specialized device installation or inconvenient wearable sensor attachment. This article supports the extended concept of user authentication with a device-free approach by leveraging the prevalent WiFi signals made available by IoT devices, such as smart refrigerator, smart TV, and smart thermostat, and so on. The proposed system utilizes the WiFi signals to capture unique human physiological and behavioral characteristics inherited from their daily activities, including both walking and stationary ones. Particularly, we extract representative features from channel state information (CSI) measurements of WiFi signals, and develop a deep-learning-based user authentication scheme to accurately identify each individual user. To mitigate the signal distortion caused by surrounding people's movements, our deep learning model exploits a CNN-based architecture that constructively combines features from multiple receiving antennas and derives more reliable feature abstractions. Furthermore, a transfer-learning-based mechanism is developed to reduce the training cost for new users and environments. Extensive experiments in various indoor environments are conducted to demonstrate the effectiveness of the proposed authentication system. In particular, our system can achieve over $94\%$ authentication accuracy with 11 subjects through different activities.

CCS Concepts: • Security and privacy → Authentication;

Additional Key Words and Phrases: User authentication, WiFi signals, IoT

ACM Reference format:
Cong Shi, Jian Liu, Hongbo Liu, and Yingying Chen. 2021. WiFi-Enabled User Authentication through Deep Learning in Daily Activities. ACM Trans. Internet Things 2, 2, Article 13 (May 2021), 25 pages. https://doi.org/10.1145/3448738

1 INTRODUCTION

User authentication aims to verify the legitimacy of a user who is trying to access private resources (e.g., proprietary information and home appliances) and has drawn considerable attention due to the growing concerns of security and privacy leakage. For example, unauthorized users may operate on personal devices that always contain private information. They may also access confidential documents or get in the restricted areas that only allow designated personnel to enter. Furthermore, electronic appliances in smart environments (e.g., homes, offices) have a growing need to provide customized/personalized services such as prohibiting children and elderly people to operate risky appliances (e.g., stove and dryer), adjusting room temperature/lighting conditions and recommending TV content. These advancements in smart environments accelerate the adoption of user authentication in numerous daily activities beyond traditional applications.

Traditional solutions are mainly relying on passwords [18] to authenticate users. These approaches are based on the complexity of the secret and thus require the user to memorize long and tedious passwords to ensure high security. Some other authentication techniques rely on physiological biometrics such as fingerprints, iris patterns, and faces [14, 17]. However, they usually require the installation of dedicated devices (e.g., fingerprint scanner, camera) before deployment and might disclose users’ privacy. Other research studies reveal the behavioral features of users, such as key-press durations [23] during typing and mouse dynamics [37], could be applied to perform continuous user authentication. However, these approaches only work when the user operates the keyboard or mouse. Additionally, gait patterns [22] derived through mobile devices require users to carry additional devices when user authentication is performed. In this paper, we introduce a device-free user authentication approach that eliminates the need of remembering tedious passwords, installing specialized equipment, or carrying any additional devices. The basic idea is to exploit unique physical properties embedded in people's daily activities (e.g., entering an office with proprietary information, opening a refrigerator, or cooking on a stove) to capture each person's physiological and behavioral characteristics to facilitate user authentication.

In recent years, Internet of Things (IoT) devices, such as smart refrigerator, smart TV, smart thermostat, home security system, and wearable devices are interconnected wirelessly because of the prevalence of WiFi technology. The increasing complexity of WiFi links among such devices could provide a rich web of reflected rays that cover almost every corner of indoor environments. Although the wireless signal generated by IoT devices that are designed for many special applications, it has the potential to capture human's unique physiological/behavioral characteristics inherited from people's daily activities when operating such devices (e.g., opening the refrigerator), which provides an appealing direction to differentiate each individual.

Recent years have witnessed the emergence of technologies [30, 35, 36] that explore WiFi signals for user identification. For example, WiFiU [30] presents a user identification system that captures the unique gait patterns of different people with commodity WiFi devices. However, these approaches only apply to a small group of people (i.e., 2 to 7) and are limited to walking people. They either require the users to walk through well-designed paths (e.g., clear Line of Sight (LoS) path between the WiFi devices) or have the WiFi transceivers placed close to each other, which is not practical in many real-world scenarios. Unlike existing approaches, our device-free system could capture distinctive WiFi characteristics exhibited in both walking and stationary daily activities (e.g., using a dryer, watching TV, or fetching a document) by using IoT devices.

In order to exploit human daily activities for user authentication, our device-free system should be able to recognize different types of daily activities and also differentiate each individual user if the same type of activity is performed. Thus, it is essential to derive representative wireless measurements to well capture the physiological (e.g., body shape, height, and weight) and behavioral characteristics (e.g., walking patterns, preferences when operating appliances) of each individual. In addition, recognizing activities and identifying users require different granularity of abstractions from physiological and behavioral features. In general, activity recognition requires less feature granularity than human identification because coarse data representations are sufficient to recognize different types of activities with reasonable accuracy. Therefore, the designed system needs to have the capability to extract different levels of feature representations to perform activity recognition and further conduct human identification. Moreover, in many shared spaces (e.g., corporate offices, apartment living rooms), the wireless measurements to capture unique human characteristics may be distorted by the movements of surrounding people. Thus, the system should be able to suppress the impacts of such interferences and robustly authenticate the user.

Toward this end, we propose to extract the representative features based on both amplitude and relative phase of Channel State Information (CSI) measurements in WiFi signals, which have the potential to reveal unique characteristics of different users. In addition, a three-layer deep neural network (DNN) model is developed to learn high-level abstractions of human physiological and behavioral characteristics for both activity recognition and human identification, which meets the hierarchical nature of our user authentication system involving different granularity levels of activity/human identification. In particular, the DNN scheme detects the activity type (i.e., stationary or walking) in the first layer and obtains the activity details (e.g., walking paths, opening a refrigerator) in the second layer. In the third layer, the model can learn the highest level non-linear abstractions from the representative features obtained from human activities and authenticate the user accordingly. In our prior work [25], we introduce an AutoEncoder-based DNN model that could authenticate users with high accuracy under stable environments. Considering the interferences of surrounding people in many realistic scenarios (e.g., corporate office), this article also explores the spatial diversity brought by multiple WiFi antennas to capture wireless signals transmitted via different propagation paths. Based on the high-dimensional features extracted from multiple antennas, we develop an architecture relying on convolutional neural networks (CNNs) to derive more reliable feature abstractions. Furthermore, to enhance the system extensibility, we develop a transfer learning-based mechanism to adapt the DNN model to new enrollments (e.g., new staff members of a company) or new environments (e.g., new apartments or offices). Additionally, we also build one spoofing detection scheme based on support vector machine (SVM) and study its effectiveness under various spoofing attacks that could be harmful to the proposed system. Extensive experiments involving 11 subjects are conducted in both lab and apartment environments for testing accessing restricted areas and operating risky appliances. The results demonstrate that our device-free system can perform accurate user authentication through human daily activities, and is thus capable to facilitate many emerging applications (e.g., smart homes/offices and smart healthcare) in both corporation offices and residence areas. The main contributions of our work are summarized as follows:

Our study shows that the existing WiFi signals generated by indoor IoT devices can be utilized to capture unique human physiological and behavioral characteristics and thereby authenticate users from their daily activities (i.e., both walking and stationary activities).
Our proposed device-free system leverages a single pair of WiFi-enabled devices to extract both amplitude and relative phase from fine-grained channel state information (CSI) to facilitate accurate user authentication without the active participation of the users.
We develop a deep-learning-based model to capture distinct WiFi fingerprints of different users and identify each individual user. Our system is resilient to various spoofing attacks by integrating with the SVM technique.
To mitigate the interferences from surrounding people, we propose to extract features from multiple antennas and design a CNN-based architecture to derive more reliable feature abstractions for user authentication.
We design a transfer-learning-based mechanism to reduce training efforts when updating an existing DNN model for new user enrollments or new environments.
Extensive experiments are conducted in two environments over a eight-month period, and our system can achieve over $94\%$ and $91\%$ authentication accuracy through walking and stationary activities, respectively.

2 RELATED WORK

Traditional approaches mainly rely on the secure texts or graphical patterns [4, 18, 29] to authenticate users. These approaches require the user to remember long and tedious password, and thus they incur inconvenience for users, elder people with age-related memory loss. Furthermore, simply replying on the knowledge of text secrets makes these approaches vulnerable to various attacks such as password theft, shoulder surfing and smudge attacks [1]. Other research studies resort to physiological biometrics such as fingerprints, iris, and facial information [5, 14, 17] to perform user authentication. These approaches, however, require the installation of dedicated equipment (e.g., fingerprint scanner or iris camera) before deployment.

To overcome the aforementioned weaknesses, some studies explore human behavioral biometrics to perform continuous user authentication. For example, Revett [23] demonstrates the effectiveness of using keystroke dynamics (i.e., key-press duration) as biometrics for user identification. In another instance, Zheng et al. [37] present a mouse-movement-based authentication system by exploring angle preferences when a user operates a mouse. However, these approaches require the user's active participation and can only work when the user operates the keyboard or mouse. Furthermore, Ren et al. [22] utilize the accelerometer embedded in mobile devices to capture unique gait patterns for user identification. Ranjan and Whitehouse [21] propose to recognize the unique hallmarks through wearable sensors readings when the users are operating home appliances. These schemes require users to carry additional devices which may cause inconvenience for users.

Recent years have witnessed great efforts on exploring WiFi signals for various sensing tasks, such as activity recognition [32], walking direction estimation [33] and even vital sign monitoring [16]. Furthermore, researchers demonstrate the possibility of utilizing wireless signals to perform user authentication. Existing studies [11, 15, 30, 35, 36] propose to capture human walking gait pattern and identify users in a small group by examining the CSI measurements. Specifically, Zhang et al. [36] extract a set of 10 features from CSI variations caused by human walking and uniquely identify each individual. Wang et al. [30] correlate movement speed of different body parts with WiFi spectrogram and perform gait-pattern-based user authentication. WiAU [15] proposes an ResNet-based user authentication scheme that could derive unique and robust representations for the walking gaits of legitimate users. These approaches are limited to walking people and they require either the users to walk through well-designed paths (e.g., clear Line of Sight (LoS) path between the WiFi devices) or have the WiFi transceivers placed close to each other, which is not impractical in many scenarios. Moreover, FingerPass [11] proposes a continuous user authentication scheme which leverages CSI in WiFi signals to capture unique behavioral biometrics from finger gestures. Unlike previous works, our system examines the WiFi signals and extracts unique physiological and behavioral characteristics inherited from people's daily activities including both walking activities (e.g., walking between rooms) and stationary activities (e.g., operating appliances) to differentiate each individual person. We exploit the unique individual characteristics from both amplitude and relative phase of CSI during people's daily activities. A deep-learning-based model is developed to learn deep representations and perform both activity recognition and user authentication, which is capable to facilitate many applications in both corporation offices and residential areas. With the proposed CNN-based architecture and the transfer-learning-based mechanism, our system could be extended to new users/environments with reduced training efforts.

3 PRELIMINARIES

The prevalence of WiFi signals emitted from a multitude of smart devices and appliances (laptop, smart refrigerator, and smart microwave oven) can be exploited to capture wireless channel distortions introduced by users’ daily activities. Such distortions can reflect unique physiological (e.g., body shape, height) and behavioral characteristics (e.g., body moving) of different users, even when people are performing the same type of activity. We are thus motivated to utilize channel state information (CSI), which is readily available in WiFi-enabled IoT devices, to capture such channel distortions and perform device-free user authentication.

Channel State Information. The fine-grained CSI describes how an OFDM signal propagates over multiple subcarriers between a pair of transmitter and receiver. It presents the combined effect of scattering, fading, and multi-path, which result in distortions on the amplitude, phase, and angle of arrival of the signal. Compared to the distortions caused by nearby wireless devices (e.g., access points, smartphones), human body movements have more significant impacts on the CSI measurements and thus they could be effectively captured even under the presence of the WiFi interferences. Without loss of generality, the CSI of the ith subcarrier between antenna $m$ and antenna $n$ can be defined as: $H_{i}^{m\leftrightarrow n}=|H_{i}^{m\leftrightarrow n}|e^{j\angle {H_{i}^{m\leftrightarrow n}}}$, where $|H_{i}^{m\leftrightarrow n}|$ and $\angle {H_{i}^{m\leftrightarrow n}}$ denote the amplitude and phase of CSI, respectively. Previous studies have shown their success in utilizing CSI amplitude to identify users based on large-scale body movements such as walking [30--32, 35]. In this work, besides CSI amplitude, we propose to utilize the relative phase to capture more subtle channel variations caused by small-scale body movements along with the users’ unique physiological and behavioral characteristics. As in the example depicted in Figure 1, the difference on signal path lengths, $\Delta {d}$, between two antennas (i.e., $n_1$ and $n_2$) varies as the body moves and thereby results in relative phase shift (e.g., $m\leftrightarrow n_1$ and $m\leftrightarrow n_2$). The relative channel response at the ith subcarrier can be formulated as:

\begin{equation} \hat{H_{i}}=H_{i}^{m\leftrightarrow n_1}(H_{i}^{m\leftrightarrow n_2})^{*}=|\hat{H_{i}}|e^{j\angle {\hat{H_{i}}}}, \end{equation} (1)

where $*$ denotes the complex conjugate, $\angle {\hat{H_{i}}}=-\frac{2\pi }{\lambda }\Delta {d}$ [ 12] is the relative phase value and $\lambda$ is the signal wavelength. Given that $cm$-scale $\lambda$, the relative phase is capable of capturing subtle movements. The relative phase can also eliminate the impact of unpredictable offset on the absolute phase that is always hidden in the hardware control mechanism.

Fig. 1. Relative phase produced by one transmitting antenna and two receiving antennas.

Physiological and Behavioral Biometrics. Next, we present theoretical analysis on exploiting CSI to capture human physiological and behavioral characteristics embedded in daily activities. CSI describes radio channel characteristics of WiFi signals traveling through both the LOS and NLOS paths between a pair of transmitting and receiving antennas. Assuming there are $N$ signal propagation paths, the complex representation of CSI can be formulated as [27, 34]:

\begin{equation} \begin{aligned}H^{m\leftrightarrow n}=\sum _{k=1}^{N}a(k)e^{-j2\pi {f\tau _k}}, \end{aligned} \end{equation} (2)

where $a(k)$ denotes the real-value attenuation, and $\tau _k$ is the phase shift introduced by the propagation delay of the $k\text{th}_{}$ path. When a user presents or conducts activities in the area of interest, the WiFi signal propagation will be affected accordingly, leading to the variations in both amplitude and relative phase of CSI measurements. Please note that the movements of surrounding people may disturb the CSI measurements from an individual user. To mitigate such interferences, we propose a multiple WiFi antenna-based technique in Section 6. We assume that the surrounding people usually keep a proper distance with the target user, so the WiFi signals propagating along some of the paths could capture the user's activity while remaining unaffected by the surrounding people.

People's physiological characteristics, such as body shape, height, and weight, would have unique impacts on signal propagation in terms of absorption and diffraction. Each propagation path would be uniquely affected by user physiological characteristics (e.g., weight, tissues, fat, and muscle), and the summation of the attenuations (i.e., $\sum _{k=1}^{N}a(k)e^{-j2\pi {f\tau _k}}$) will create unique patterns, such as decreased amplitude due to a high signal absorption rate. In addition, the distortions in the relative phase could be affected by the unique density of the human body. For example, a human body with high density could result in a low penetration rate, which leads to a long propagation path and large relative phase shifts. Due to the cm-scale wavelength for WiFi signals (e.g., in $2.4 \text{GHz}$), the relative phase could measure subtle propagation path changes of the signals diffracted by human body, making it capable to capture unique physiological characteristics.

Human behaviors, such as walking gait and gesture preferences, consist of a series of unique movements that would produce a time-series of CSI amplitude/relative phase changes. In accordance with Equation (2), time-series changes of CSI amplitude between $t$ and $t+\Delta {t}$ can be represented as:

\begin{equation} \begin{aligned}|\Delta {H^{m\leftrightarrow n}}|=|H^{m\leftrightarrow n}(t)|-|H^{m\leftrightarrow n}(t+\Delta {t})|. \end{aligned} \end{equation} (3)

The variation in the signal attenuation, $|\Delta {H^{m\leftrightarrow n}}|$, could quantize the changes of WiFi signals reflected by the human body, capturing behavioral characteristics such as walking dynamics (e.g., speed, acceleration) and gesture preferences (e.g., using the left or right hand) of a user. Furthermore, as shown in Figure 2, the paths to the two receiving antennas could also be altered by the movements of different body parts (e.g., leg, torso). Such path length changes between $t$ and $t+\Delta {t}$ could be formulated as:

\begin{equation} \begin{aligned}\Delta \angle {H^{m\leftrightarrow n_1,n_2}}=\angle {H^{m\leftrightarrow n_1,n_2}}(t)-\angle {H^{m\leftrightarrow n_1,n_2}}(t+\Delta {t}). \end{aligned} \end{equation} (4)

The relative phase variation, $\Delta \angle {H^{m\leftrightarrow n_1,n_2}}$, could also capture unique movement patterns in terms of time-series changes in the signal propagation path.

Fig. 2. Unique radio signal attenuation alterations and path length changes caused by human activities.

Figure 3 shows the extracted CSI amplitude and relative phase of a subcarrier over a $802.11n$ WiFi link over a time when two users are walking along the same trajectory (3 rounds each) and opening a cabinet (3 rounds each), respectively, in an office. We observe that both the CSI amplitude and the relative phase exhibit different variation trends between these two users, which confirms that CSI is able to capture the unique physiological and behavioral characteristics of users. Additionally, for stationary activities (e.g., opening a cabinet), the difference on the relative phase is more significant than that on the amplitude, so it indicates the high sensitivity of the relative phase on capturing small-scale body movements.

Fig. 3. CSI amplitude and relative phase of two users when walking or opening a cabinet.

4 SYSTEM DESIGN

4.1 Challenges

Uniqueness of Individual Characteristics. The distortions of WiFi CSI could reflect a person's minute body movements. Additionally, as demonstrated in Section 3, the amplitude and relative phase in WiFi CSI could be affected by the users’ physiological (e.g., shape and height) and behavioral characteristics (e.g., walking gait, gesture preferences). The system needs to extract effective features from WiFi CSI of daily activities to quantize such unique characteristics of each individual user.

System Robustness and Generality. The collected CSI measurements from real-world environments are usually noisy due to the continuous environmental changes, radio interference, and the like. Besides, the movements of surrounding people could also distort the WiFi CSI, introducing variations in the derived users’ characteristics. Therefore, the system should be robust to capture distinguishable characteristics among users from noisy channel measurements while mitigating the impacts of surrounding people.

Recognizing Activity and Identity Simultaneously. Recognizing activity and user identity simultaneously is very important in many smart home/office enabled applications. For instance, the system can prohibit a specific user (e.g., child) to watch TV at a specific time period. However, recognizing activities and identifying users require a different granularity of features extracted from their activities.

4.2 Attack Model

We study the following possible attacks that might be harmful to the proposed authentication system.

Naive Attack. The attacker does not have prior knowledge about the activities the legitimate users performed. In order to pass authentication, the adversary attempts to conduct random activities to create similar impacts on the WiFi signals as the legitimate user.
Content-Aware Attack. The attacker knows about which types of the activities are used for user authentication, but has not observed how the legitimate user performs them. The adversary tries to pass the authentication through performing the same activities as the legitimate user.
Knowledgeable Observer Attack. The attacker is capable of observing the activities performed by the legitimate user for whom the proposed system is authenticating via shoulder surfing or videotaping. The adversary tries to perform the same activity and imitate the legitimate user's behaviors to pass the authentication.

4.3 System Overview

The basic idea of our system is to capture the unique physiological and behavioral characteristics inherited from human daily activities for user authentication leveraging WiFi signals. The users have habitual patterns on their behaviors, so the daily activities usually present high consistency for each individual [26]. As illustrated in Figure 4, our system takes as input CSI measurements from WiFi links between WiFi-enabled IoT devices (e.g., smart appliances), and then extracts both the CSI amplitude and the relative phase for each OFDM subcarrier for signal pre-processing. Unlike previous studies [30, 35, 36] which only utilize CSI amplitude, we explore relative phase along with CSI amplitude to capture representative characteristics through both large-scale walking activities and small-scale stationary activities. Given the amplitude and relative phase information, a band-pass filter is first deployed to eliminate the environmental interferences (e.g., reflected signals from furniture and walls) and ambient noises. We also propose a subcarrier selection algorithm to pick out the subcarriers with stable CSI measurements, which could represent reliable activity characteristics. Before performing features extraction, we examine the moving variance and related short time energy (STE) of the pre-processed data to determine the CSI segments, which capture the location changes for walking activities and body movement for stationary activities.

Next, we will present the core components of our system, Physiological and Behavioral Feature Extraction and Deep Learning-Based User Authentication. We perform activity recognition and user authentication based on the physiological and behavioral features extracted from CSI measurements, which characterize both human activity/identity uniqueness. The system extracts six time domain and three frequency domain features to capture both the physiological and behavioral characteristics of users such as height, shape, and behavioral preference. Specifically, the time domain features, including maximum, minimum, mean, skewness, kurtosis, and standard deviation, aiming to represent the extent of human movements and contour of human body, while the frequency domain features, including spectrogram energy, percentile frequency component, and spectrogram energy difference, are used to depict the fine-grained behavioral characteristics such as moving speed of torso and leg. All the above CSI-based features together provide a comprehensive and detailed representation for walking/stationary activities.

Finally, our system performs activity recognition and human authentication by building a three-layer DNN model based on both AutoEncoder [6] and CNN architectures. To mitigate the interferences from surrounding people, we propose to explore the spatial diversity benefit from MIMO technology and design a CNN-based model to achieve more reliable authentication. Unlike previous authentication schemes based on high-dimension feature sets and linear classification models (e.g., SVM), our DNN model learns non-linear biometric abstractions which are computation efficient and are robust to small-scale input variations (e.g., the variations of features caused by the wearing changes of users). Particularly, we obtain the biometric abstractions with respect to single activity and authenticate the user based on the corresponding CSI activity segment. Figure 4 illustrates the functionality of each layer in our deep learning architecture for people authentication. In particular, the first level coarsely distinguishes the activity types (i.e., walking or stationary activity); the second layer exploits deep representations of the first layer and obtains the activity details such as walking trajectories and detailed stationary activity types (e.g., turning on a light); and the third level obtains even deeper representation of the features and finally completes user authentication process. To reduce the profiling efforts, we design a transfer-learning-based mechanism to adapt the DNN model to new users and environments by retraining the model with a reduced profile size. Additionally, our system is resilient to user spoofing, who either does not exist in legitimate user profiles or tries to mimic a legitimate user's activity, by using a SVM-based model with the generated DNN abstractions.

5 ACTIVITY SEGMENTATION AND FEATURE EXTRACTION

In this section, we first present how to perform data segmentation on the CSI measurements that reflect people's daily activities, and then we proceed to extract effective features that capture unique physiological and behavioral characteristics of people from WiFi signals.

5.1 Activity Detection and Segmentation

To ensure the reliability of the features extracted from the CSI measurements, data calibration and subcarrier selection techniques are developed to mitigate the ambient noises and select the subcarriers with stable CSI measurements, respectively. The details are presented in Section 7.

Both walking and stationary activities lead to the variations in wireless channel, resulting in changing CSI measurements. So we apply the short time energy (STE) upon CSI amplitude's moving variance to detect human activities, and then perform corresponding data segmentation. Moreover, stationary activities (e.g., opening a cabinet) usually involve relative smaller-scale body movements than walking activities, which makes them even harder to be detected. We thus propose to examine STE which is more sensitive to subtle body movements because it is a summation of squared signals within a sliding window. We calculate STE within a sliding window as follows:

\begin{equation} \begin{aligned}STE(t)=\sum _{n=1}^{N}\left(\sum _{k=1}^{K}v_k(t+n)\right)^{2}, \end{aligned} \end{equation} (5)

where $N$ is the length of the sliding window, $v_k(t)$ is the moving variance of CSI amplitude at the $k_{th}$ subcarrier.

Figure 5(a) shows both CSI's moving variance and STE for two rounds of the same stationary activity (i.e., opening a cabinet). When the activity occurs, we can observe that STE exhibits a greater magnitude, showing its potential at activity detection. Furthermore, we also found the peaks of the fluctuating part in STE always locate around the center of the activity duration. We are thus inspired to utilize a dynamic threshold, which can be applied to all types of activities, to perform activity detection and corresponding data segmentation. Specifically, a weight $w=0.1$ derived from our empirical study is deployed for the dynamic threshold calculation: $\tau =w*E$, where $E$ is the maximum value of STE for an individual activity. We then search for the starting and ending points, $t_s$ and $t_e$, of this activity by solving the following objective problem:

\begin{equation} \begin{aligned}& \underset{t_s, t_e}{\operatorname{arg\,min}}\quad t_s+t_e-2t_m \\ & s.t., STE(t_s), STE(t_e) \lt \tau , STE(t_m) \gt \tau , \\ & \quad \quad t_s \lt t_m \lt t_e \end{aligned} \end{equation} (6)

where $t_m$ is an arbitrary time index in the middle of activity duration. Figures 5(b) and 5(c) show the segmented time series of CSI amplitude and relative phase of the $\text{first}$ subcarrier during two rounds of the stationary activity. The results demonstrate the efficiency of our activity detection/segmentation algorithm.

Fig. 5. Illustration of activity detection and segmentation using CSI moving variance and short time energy.

5.2 Physiological and Behavioral Feature Extraction

To capture the unique physiological and behavioral characteristics inherited from users’ daily activities, it is essential to extract effective and reliable features from the CSI measurements. In particular, both time and frequency domain features based on CSI amplitude and relative phase information are examined to discriminate different users.

Time Domain Feature Extraction. In our system, six time domain features with respect to CSI amplitude and relative phase, including maximum, minimum, mean, skewness, kurtosis and standard deviation, will be extracted to characterize both human activity and identity uniqueness. In order to preserve temporal patterns of the user activity and provide finer feature granularity, we first partition the CSI segment into $l$ chunks of equal length, and then extract the six time domain features from each chunk. We empirically set $l$ as 20, which provides 120 feature points from each subcarrier. Figure 6 and 7 present the extracted time domain features for the same stationary activity (i.e., opening a cabinet) performed by two users based on amplitude and relative phase, respectively. We can find that these features are significantly different between two users. It encourages us to leverage these time domain features to capture human unique characteristics inherited from their daily activities.

Fig. 6. Time domain features of CSI amplitude over 20 chunks of one activity at the $1\text{st}$ subcarrier.

Fig. 7. Time domain features of CSI relative phase over 20 chunks of one activity at the $1st$ subcarrier.

Frequency Domain Feature Extraction. As indicated in previous work [31], CSI measurements in the frequency domain are able to reveal the speeds of WiFi path length changes caused by human movements. Therefore, besides the time domain features, we also extract the representative features in frequency domain to capture the users’ behavioral characteristics (e.g., walking gait, gesture preference). Given a CSI segment, we first adopt the short-time Fourier transform (STFT) to obtain the two-dimensional spectrogram for the CSI amplitude and relative phase of each subcarrier. More specifically, we calculate 1,000 points FFT within a $100\text{-ms}$ sliding window, shifting 50 ms each time. We then use bicubic interpolation [10] to resize the spectrogram into a matrix $M_{(i,j)}$ (i.e., 10-by-10 matrix) of fixed size, which maintains spectrogram in a consistent feature space for different activities. Next, three frequency domain features on the top of $M_{(i,j)}$ are extracted: (1) Spectrogram Magnitude. Each element in the matrix $M_{(i,j)}$; (2) Percentile Frequency Components (PFC). $PFC(i,n)=\frac{\sum _{j=1}^{n}M_{(i,j)}}{\sum _{j=1}^{10}M_{(i,j)}}$, where $n=1\dots 10$, subjected to and , indicating the moving speed of torso and leg [30]. (3) Spectrogram Difference between Time Windows. The element-wise differences between two consecutive rows in $M_{(i,j)}$, which capture the acceleration or deceleration process of body movement. In total, we extract 210 frequency domain features from the CSI segment with respect to one specific activity.

6 DEEP-LEARNING-BASED HUMAN AUTHENTICATION

In this section, we present the proposed deep-learning-based approach for both activity recognition and user authentication. To perform activity recognition and further user authentication, we propose to develop a deep neural network (DNN) to extract high-level abstractions from the extracted CSI features. As illustrated in Figure 8, a three-layer deep neural network [28] model is proposed. Given a set of CSI features $C_m$ from a link $m$, we define $h_i$ ($i$ = 1,2,3) as the activation functions to encode the input, which can be either CSI features or modeled abstractions (i.e., $Z_1$ or $Z_2$) of the previous layer, into a set of compressed representations, which is then fed into classification functions (e.g., SVM [3] or softmax function [2]) in each layer. Specifically, the proposed DNN network extracts abstractions for activity type recognition (i.e., stationary or walking) in the first layer and obtains the detailed abstractions for a specific activity (i.e., the specific type of activity) in the second layer. We denote $Z_1$ and $Z_2$ as the outputs (i.e., high-level, complex abstractions as data representations) from the first two layers, respectively. The third layer learns the highest level abstractions to facilitate user authentication process. Particularly, we consider the scenarios where either single or multi-antenna pairs are presented and propose two deep-learning architectures based on stacked autoencoders and convolutional neural networks. The details of abstraction extraction grounded on stacked autoencoders are presented in our previous work [25].

Fig. 8. Deep learning architecture for user authentication.

6.1 Multi-Antenna Abstraction Extraction Grounded on a Convolutional Neural Network

In many shared spaces (e.g., corporate offices, apartment living rooms), the CSI measurements to capture a user's physiological/behavioral characteristics are easily distorted by the movements of surrounding people. We assume that the surrounding people keep a proper distance from the target user, so that propagation paths could capture the user's activity while remaining unaffected by the surrounding people. To effectively capture such propagation paths, we explore the spatial diversity benefit from MIMO technology and propose a CNN-based model to achieve more reliable user authentication.

Given the high-dimensional features extracted from multiple wireless antennas, the autoencoder architecture proposed in our previous work is not effective. This is because its fully connected network structure requires a huge number of neurons for processing a large set of features. Additionally, such architecture does not take into account the spatial diversity of the feature set from multiple antennas, treating the input features of different antenna pairs as those of single antenna pair. Therefore, we exploit CNN [13], a more effective deep-learning framework, for multi-antenna abstraction extraction.

The proposed CNN model could constructively integrate features from multiple antennas and extract feature abstractions that are more robust under the impacts of surrounding people. It first packs the feature vectors from all $M$ available antenna pairs into a matrix $C=\lbrace C_1;...;C_M\rbrace$. Then, the CNN model processes the data matrix leveraging three stacked hidden layers and extract different levels of abstractions for activity type recognition, activity recognition, and user authentication, respectively. Each neuron applies convolution operation (i.e., cross-correlation) to a subset of features/abstractions, instead of the whole feature/abstraction set. In this way, the CNN model is able to process a large feature set with fewer neurons while extracting effective multi-antenna feature abstractions.

Each hidden layer of the CNN consists of three components, a 2D convolutional layer, a rectified linear unit (ReLU), and a pooling layer. The 2D convolution layer contains a group of $K$ neurons, where each neuron acts as a filter that iteratively computes dot products between a learnable coefficient matrix $W^k$ with height $h$ and width $w$, and subsets of inputs. Specifically, the input subsets are obtained by applying a 2D sliding window on the layer input. The sliding window has a step size of 2 and same height and width as $W^k$. By setting $h$ as the input height (e.g., number of antennas, $M$), the sliding window could include a subset of features/abstractions covering all available antenna pairs. Thus, the derived abstractions are capable to represent all the physiological and behavioral characteristics of users captured by different antenna pairs. We empirically set the width of $w$ as 3. Then a ReLu layer is attached to the convolutional layer to introduce nonlinearity by replacing negative neuron outputs with 0. A max-pooling layer is added to reduce the size of abstraction vector in each hidden layer. The hidden layers are trained with stochastic gradient descent with momentum (SGDM) optimizer [24] with a learning rate of 0.01. The structure of CNN could facilitate abstraction extraction given high-dimensional inputs.

6.2 Activity Recognition and User Identification

Given three hidden layers with either autoencoder or CNN architecture, we stack the hidden layers and a softmax layer to construct a hierarchical model for activity recognition and user authentication. Previous work [28] found that higher level feature abstractions are more robust to small-scale input variations, which meets the hierarchical requirements of our system. Additionally, the three-layer DNN itself can only derive compressed representations of physiological and behavioral characteristics, so we still need a softmax function [2] in each layer to complete the activity recognition and user authentication process in a hierarchical order. Specifically, we define the softmax function as follows:

\begin{equation} \begin{aligned}P(L_k|Z)=\frac{P(Z|L_k)P(L_k)}{\sum _{j=1}^{K}P(Z|L_j)P(L_j)}, \end{aligned} \end{equation} (7)

where $P(L_k|Z)$ denotes the posterior probability of class label $L_k$ given an abstraction $Z$ and $P(L_k)$ represents the prior of the same class. We use $P(Z|L_k)$ to denote likelihood of the abstraction $Z$ given label $L_k$. In addition, the equation is constrained by $0\lt P(L_k|Z)\le {1}$ and $\sum _{k=1}^{K}P(L_k|Z)=1$. The outputs of each softmax function characterize the probability distribution over $K$ profiled classes (e.g., activity/identity), and the abstraction $Z$ will be classified as class $k$, which satisfies $k=argmax_{k\in {K}}P(L_k|Z)$.

Our system constructs a CNN model consisting of three softmax layers for activity separation, activity recognition, and user authentication, respectively. During the user enrollment phase, we collect activity data from each legitimate user as the training data (i.e., user profile) for model construction. The training data are segmented and processed as discussed in Section 5. To reduce the profiling efforts, we find a minimal profile size (i.e., the number of activity segments) for each individual user each activity. Our strategy is to examine the user identification accuracy during data collection. In particular, our system retrains the CNN model after collecting every activity segment from the user and classifies the identity. The data collection process would stop when the user identification accuracy meets a predefined threshold (e.g., $90\%$). The system will profile other activities of the user in the same manner. By using this strategy, our system can find a minimal profile size for each user. We will discuss the activity recognition/user authentication accuracies against the data size for determining the threshold in Section 8.5.

6.3 Transfer Learning

After model construction, the CNN model may still need to update due to many practical factors, such as new enrollments and environment changes. To reduce the training efforts when updating the CNN model, we introduce a supervised transfer learning-based approach, which could adapt the previously trained CNN model based on existing data for new users or new environments. Please note that we only apply transfer learning to the proposed CNN architecture. We exploit inductive transfer learning [19], where the settings of the CNN model (i.e., the number of layers and neurons) are the same as the original one, while the number of categories (i.e., users/activities) is increased with new enrollments. The parameters of the new CNN model is initialized with the CNN trained with existing data. Next, we attach a softmax layer with extended categories to the new CNN model for classifying a larger group of users, including new users. To accommodate new users/activities in our system, the model and the softmax layer parameters, $\theta _n$, will be fine-tuned as following:

\begin{equation} \begin{aligned}Y_n\equiv {CNN(X_n, \theta _s, \theta _n)}, \end{aligned} \end{equation} (8)

where $X_n$ and $Y_n$ denote the training data and ground truth of new enrollments. The CNN model is trained with SGDM optimizer at a learning rate of 0.001, which is 10 times smaller than the learning rate of original CNN model [ 7]. In this manner, the new CNN model would require less training samples from new users. Our evaluation in Section 8.6 show that the transfer learning technique could reduce around six training samples from each new individual user.

Besides adapting the deep learning model to new users and environments, our transfer learning technique can also accommodate the users’ biometric variations associated with age changes, which result in different weight, height, and behaviors. Our system can retrain the CNN model with only a few new activity segments of an enrolled user, which adapts the model to identify the user with changed biometrics. For adolescents with rapid growth in height and varying behaviors, we will keep updating their profiles by storing activity segments during authentication and retrain the model. Compared to new user enrollment and environment changes, updating the biometric profile of an enrolled user would be less challenging since the user's physiological and behavioral characteristics are partially shared in the trained model.

6.4 SVM based Spoofer Detection

Besides three-layer DNN, we also adopt a SVM model [3] to determine whether the activity the user performed matches one of the legitimate user profiles. Particularly, we propose to utilize one-class support vector machine with Gaussian kernel for detecting the user spoofing, who either does not exist in legitimate user profiles or tries to mimic a legitimate user's activity. We first construct an one-class SVM model for each of the legitimate users based on the high-level abstractions from the DNN model. We then derive a class score $S_u$, which compares the similarity between the feature abstractions of each testing sample and that of the user $u$:

\begin{equation} \begin{aligned}S_u(Z)=\sum _{i}^{N_u}k(Z_{u,i}, Z)+b_u, \end{aligned} \end{equation} (9)

where $Z$ is a sample abstraction, $Z_{u,i}$ is the $i$th support vector of the user $u$, $k()$ represents the Gaussian kernel function, and $b_u$ is the function bias. Greater value of the class score $S_u$ represents that the testing sample has the less distance to the support vectors of user $u$. An empirically set threshold $\eta$ thus is used to detect possible spoofers. The testing sample would be determined as spoofer/attacker if the derived class scores (i.e., $S_u$) are less than $\eta$ from all the legitimate user profiles. The steps to construct an SVM model for spoofer detection is similar to the process for training activity recognition and user identification models as discussed in Section 6.2.

7 DATA CALIBRATION & SUBCARRIER SELECTION

In this section, we introduce how to ensure the reliability of the extracted CSI amplitude and relative phase from the noisy wireless measurements.

7.1 Data Calibration

To ensure reliable feature extraction, we preprocess the raw CSI measurements with phase unwrapping and a band-pass filtering techniques, which are effective on eliminating the environmental interferences and ambient noises. Specifically, we first eliminate the relative phase error caused by the phase offset on each subcarrier. As shown in Figure 9(a), the raw relative phase at three subcarriers (i.e., subcarriers 4, 5 and 6) have obvious discontinuities between consecutive packets when the relative phase value is close to $\pm \pi$. To eliminate such discontinuities, a $\pm {2\pi }$ is added to the relative phase of the later packet if the absolute phase difference of two consecutive packets is greater than or equal to $\pi$. Figure 9(b) shows the corresponding relative phase streams after phase calibration.

Fig. 9. Correcting relative phase to eliminate phase offset via unwrapping.

Besides the phase offset, the amplitude and relative phase in CSI measurements are also easily affected by the low-frequency interference (i.e., caused by signals reflected from static objects) and high-frequency noise. In order to eliminate the above impacts while preserving the user physiological and behavioral characteristics in CSI measurements, a bandpass Butterworth filter [20] is adopted. Previous work [31] found that the frequency range of most human activities including running in a fast speed exhibit CSI frequency components less than $300 \text{Hz}$. We thus adopt a relative low frequency band-pass Butterworth filter (i.e., with passing band $5 \text{Hz}$–$100 \text{Hz}$) to effectively remove both low- and high-frequency components from the spectrum. Given the example scenario where a person walks in a room between 2 and 7 seconds, Figure 10(a) shows the spectrogram of the corresponding time series of CSI amplitude at a subcarrier (i.e., subcarrier 1). We can observe that the spectrogram exhibits extremely high energy level in the low-frequency band (i.e., $\lt 10 \text{Hz}$) even the person remains static. As the spectrogram after band pass filtering shown in Figure 10(b), we can observe that the CSI amplitude pattern caused by human walking is still preserved while irrelevant frequency components are removed.

Fig. 10. CSI amplitude spectrogram of at the 1st subcarrier before and after data calibration.

7.2 Subcarrier Selection

Our preliminary study finds that the CSI measurements of several subcarriers are more sensitive to ambient noise. To ensure the reliability of CSI measurements for later processing, we propose a new subcarrier selection method to determine the noise-resilient subcarriers from the CSI measurements. The CSI measurements at neighboring subcarriers are usually highly correlated, however such correlation could be destroyed by heavy noises on some of the subcarriers. To eliminate the negative effects caused by the unstable subcarriers, a covariance-based scoring function is defined to assess each subcarrier's correlation level with its neighboring subcarriers as follows:

\begin{equation} \begin{aligned}score(i)=\sum _{n=1}^{N}\sum _{j=i-\frac{k}{2}}^{i+\frac{k}{2}}\frac{cov_{i,j}(t)-|cov_{i,j}(t)|}{2}, \end{aligned} \end{equation} (10)

where $N$ is the number of non-overlapped 1-second length time windows being divided in the short phase, $k$ is the number of its close-by subcarriers being compared, and $cov_{i,j}$ denotes the covariance value between the CSI relative phase at the i th and j th subcarriers. Figure 11 presents an example showing the scores of four subcarriers (i.e., subcarriers 4, 5, 6, and 7) based on the CSI measurements collected in 30 seconds. We can observe that the CSI measurements of subcarrier 7 keep fluctuating in both empty-room and human walking cases, so it implies the instability of the subcarrier 7 is not caused by human movements. As a result, subcarrier 7 has the lowest score, indicating it has the lowest correlation with its adjacent subcarriers. Through our empirical study on choosing different numbers of subcarriers, we find that selecting the top 20 subcarriers could enable accurate user authentication by excluding noisy subcarriers.

Fig. 11. Detecting noisy subcarriers by using a covariance based scoring function: Subcarrier 7, which has the lowest score, is not a stable subcarrier that can be used in the system.

8 PERFORMANCE EVALUATION

In this section, we evaluate the performance of the proposed system on activity recognition and user authentication in both a university office and an apartment.

8.1 Experimental Methodology

Devices and Network. We emulate the WiFi network in IoT environments with two commercial laptops equipped with $802.11n$ WiFi NICs (i.e., Intel 5300 NICs). Specifically, we deploy a Dell $\text{E}6430$ laptop as transmitter and a Lenovo $\text{T}61$ laptop as receiver. Both the transmitter and receiver are equipped with three embedded antennas and run Ubuntu 14.04 operating system with the 4.2.0 kernel for measuring CSI over 30 subcarrier groups [8]. We extract the CSI amplitude on the link between the main antenna pair (i.e., 1st antenna in both transmitter and receiver), and compute the relative phase of CSI between the two links from the transmitter's main antenna to the first two antennas on the receiver. For multi-antenna abstraction extraction, we extract CSI amplitude/relative phase from all transmitter-receiver antenna pairs (i.e., 9 links). The packet transmission rate is fixed at $1,000$ pkts/s to enable high-resolution analysis in the frequency domain.

Evironments and Activities. The proposed system is evaluated in both a university office and an apartment with the size of $26 \text{ft}\times 14 \text{ft}$ and $36 \text{ft}\times 22 \text{ft}$, respectively. Figure 12 shows the experimental setups involving two laptops to emulate as IoT-enabled devices (e.g., smart refrigerator and smart TV) and generate WiFi traffics. A total of 8 walking activities and 8 stationary activities (30 rounds for each) are performed by 11 and 5 volunteers in these two indoor environments, respectively. Fourteen volunteers are males and the other two volunteers are females. The weights of the 16 volunteers are within 53 kg—78 kg, and their heights range from 1.51 m—1.87 m. Some volunteers have similar body shape and height. Due to the functionality differences of the two environments, we choose different yet still typical stationary activities in the two environments. The details of the activities are listed in Table 1, and the locations of stationary activities and walking trajectories are also shown in Figure 12. In total, we collect $3,336$ activity segments performed by 11 subjects in the office environment, and 834 activity segments performed by 5 subjects in the apartment. Unless mentioned otherwise, half of the collected dataset (i.e., 15 rounds of each activity per subject) is used for training the DNN model, and the rest of data is used for testing the system performance.

Fig. 12. Experimental setups and the illustration of activities in an office and an apartment.

Table 1. Detailed Daily Activities Performed

Code	Walking activity	Code	Stationary activity
A	Entrance$\rightarrow$Seat	a	Working (i.e., typing keyboard)
B	Seat$\rightarrow$Entrance	b	Turning on the light
C	Seat$\rightarrow$Light Switch	c	Opening the cabinet
D	Light Switch$\rightarrow$Seat	d	Fetching documents
E	Seat$\rightarrow$Cabinet	e	Eating at the table
F	Cabinet$\rightarrow$Seat	f	Opening the microwave oven
G	Entrance$\rightarrow$Kitchen	g	Opening the refrigerator
H	Kitchen$\rightarrow$Entrance	h	Opening the door

Classification Strategies. In the user enrollment phase (i.e., training phase), our system collects CSI measurements through WiFi scanning while people are performing daily activities. Then, we associate activity/identity labels with the corresponding CSI segments as the training data (i.e., user profile). We extract both time and frequency domain features from the data and feed them to the CNN model for training. We empirically determine the filter size of each CNN layer as 64 and kernel size as $(3, 3)$. During the testing phase, our system first segments and processes the CSI measurements, which is the same as that in the training phase, and then identifies the user through the trained CNN model. The inference time for an activity segment is around 0.78 seconds (i.e., including feature extraction and user identification).

Evaluation Metrics. To evaluate our system, we use four evaluation metrics: identification accuracy, confusion matrix, true-positive rate (TPR), and false-positive rate (FPR). Following existing work on using WiFi signals for user identification [35], we use identification accuracy as the main evaluation metric. It is defined as the percentage of the human identity/activity correctly recognized by our system. We use the confusion matrix to further illustrate user identification/activity recognition performance, with each entry represents the percentage of correctly classified identify/activity. Each column in the confusion matrix indicates the ground truth of an identity/activity and each row represents the classified identity/activity in our system. TPR denotes the percentage of users that are correctly verified as legitimate users, and FPR is the percentage of attackers that mistakenly pass our system.

8.2 User Authentication Performance

We first present the performance of the proposed system on user identification in both office and apartment environments. As shown in Figure 13(a), we observe that, our system achieves over $93.0\%$ user identification accuracy for all users and the average accuracy is $96.8\%$ with a standard deviation of $2.6\%$ in the office environment. Compared with WiWho [35], our system can identify a larger group of people (i.e., 11 vs. 6) with higher user identification accuracy (i.e., $96.8\%$ vs. $80.0\%$). Figure 13(b) gives the confusion matrix for the user identification in the apartment. Our system achieves over $93.0\%$ identification accuracy for four of the users. The average user identification accuracy is $95.4\%$ with a standard deviation of $1.8\%$. We have comparable high accuracies on user authentication in the two different environments, and thereby confirm the effectiveness and reliability of the proposed system on user identification.

Fig. 13. Performance of user authentication grounded on multi-link abstraction extraction.

8.3 Spoofer Detection Performance

Robustness to Spoofing Attacks. As indicated in Section 6.4, threshold selection is critical for accurate detection of spoofing attacks. To obtain an appropriate threshold, we simulate a spoofing attack scenario in an office environment, where 8 of the 11 users act as spoofers and the rest are legitimate users. A threshold that maximizes the TPR while minimizing the FPR is selected for evaluating our system under naive, content-aware, and knowledge observer attacks.

We take turns to set each participant as the legitimate user and the remaining users as the attackers in both office and apartment environments. We first evaluate the performance of our system on defending against naive attacks, where the attacker does not possess prior knowledge about the user's activities. As shown in Figure 14, our system can achieve over $96.3\%$ TPR with less than $3.2\%$ FPR for all users in the two environments. The results confirm that the random activities of attackers can hardly create similar biometrics as those of the profiled activities of the legitimate users, and thus the proposed system can reliably defend against random attacks. Figure 15 shows the performance of the proposed system under content-aware attacks, where the attackers try to pass the authentication system through performing the same activity as the legitimate user. The average TPRs are $88.5\%$ and $90.4\%$ for the office and the apartment environments, respectively, with FPR lower than $10\%$ in both environments. The results show that even when the adversary has a similar body shape and height to the legitimate user, our system can still identify the legitimate user based on unique behavioral characteristics. Figure 16 shows the robustness of our system against the knowledge observer attack, the most extreme attack, where the attacker is capable of observing the behaviors of the legitimate user. For both environments, we find our system can still achieve over $86.3\%$ and $87.6\%$ TPR, respectively. The FPRs are less than $8.3\%$ and $10.1\%$ in the two environments. The results show that the proposed system is still effective in defending against the knowledgable observer attacks.

Fig. 14. Performance of our system on defending against naive attacks.

Fig. 15. Performance of our system on defending against content-aware attacks.

Fig. 16. Performance of our system on defending against knowledgeable observer attacks.

We further analyze the system's robustness using entropy-based metric, which quantifies the system's security strength. Due to the limited size of our dataset, we use the Euclidean distance between every pair of feature abstractions from different users to calculate the entropy, instead of performing uncertainty analysis on a large group of users. The entropies for the spoofing detection in the two environments are 7.9 and 7.4, respectively. This means that our system at least has a similar security level to human chosen 4-digit PINs [9].

8.4 Activity Recognition Performance

We examine the activity recognition performance in the DNN model. Figure 17 depicts the confusion matrix for activity recognition (i.e., outputs of DNN layer 2) in both office and apartment environments. The average activity recognition accuracies for both stationary and walking activities are as high as $98.6\%$ and $99.1\%$ in the office and apartment, respectively. Further, it is encouraging to find that DNN model achieves similar accuracy for both stationary and walking activities. The slight difference on the recognition accuracy between the two types of activities is caused by the limited resolution of WiFi signals on capturing small scale body movements for stationary activities. Overall, the proposed DNN model is highly effective on recognizing different types of activities.

Fig. 17. Performance of activity recognition grounded on multi-link abstraction extraction.

8.5 Impact of Various Factors

Impact of Surrounding People. In the office environment, we ask one, two, or three participants to act as surrounding people and to walk around when a target subject is conducting activities, which is a typical scenario in small offices and homes. We use walking activities to examine the system's robustness since they produce more significant interference to CSI than stationary activities (e.g., turning on a light, sitting). In the experiment, the participants walk 6 feet away from the target subject, without cutting the LOS path between the transmitter and the receiver, and the distance between every two people is around 6 feet . We do not limit their walking trajectory/speed. We recruit another five subjects to act as target subjects and each of them conducts six walking and four stationary activities. The activity profiles are constructed by using the CSI measurements collected without the interferences. Figure 18 shows the performance of multi-antenna- and single-antenna-based abstraction extraction schemes. We can observe that the multi-antenna based-scheme could help to maintain high user recognition/authentication accuracies under interferences from different number of surrounding people. As shown in Figure 18(a), the proposed scheme achieves $93.1\%$, $90.2\%,$ and $87.1\%$ average accuracies under the interference of 1, 2, or 3 surrounding people, respectively. We can also find that the single-antenna-based scheme is susceptible to the impacts from the activity of surrounding people. Furthermore, as shown in Figure 18(b), the multi-antenna-based scheme could also maintain a high activity recognition accuracy under the impacts of people moving nearby. The results confirm the effectiveness on mitigating the interferences of surrounding people.

Fig. 18. User authentication and activity recognition performance under the impacts of surrounding people.

Feature Comparison. To further analyze the impact of different features on the system performance, we compare the authentication performance using different kinds of CSI features in both time and frequency domains: Amplitude, Relative Phase, and all of these features (i.e., Combined). We present the comparison results of user authentication accuracy in Figure 19 for both office and apartment environments. Figure 19(a) shows that the CSI relative phase features have relative higher user identification accuracies for stationary activities (e.g., $a,b,c,d$) comparing to the amplitude feature. This is primarily because relative phase exhibits higher sensitivity on capturing small-scale body movements. In addition, we also find in both Figures 19(a) and 19(b) that the combined features of both CSI amplitude and relative phase achieve the best performance, indicating the combining features can provide the finest features to distinguish individual subjects.

Fig. 19. Comparison of features used in deep learning based user authentication under different activities.

Impact of Training Size. The DNN model needs to build CSI profiles for each activity or each individual before performing activity recognition and human authentication. It is necessary to study the impact of training size on system performance. Here we define the training size as the number of training samples for each activity or for each individual. As shown in both Figures 20(a) and 20(b), our system can achieve consistently high accuracy on user identification and activity recognition with different training sizes. Especially, our deep learning model maintains over $90\%$ accuracy on user identification and activity recognition even with the training size as small as 4. The above results show that our system has minimum requirements on building the CSI profile while ensuring remarkable performance.

Fig. 20. System performance under different training sizes.

Impact of Sampling Rate. In order to validate that our user authentication scheme can work under various sampling frequencies of WiFi-enabled IoT devices, we evaluate our system under different frame rates. We show the average user authentication accuracy of office and apartment under different sampling rates in Figure 21. We can observe that our system can maintain high accuracy across different frame rates from $200 \text{Hz}$ to $1,000 \text{Hz}$. Particularly, the authentication accuracy is still over $86\%$ even for the low sampling rates such as $200 \text{Hz}$ and $400 \text{Hz}$. The above observations confirm that our system can be applied on IoT devices with different sampling capabilities.

8.6 Effectiveness of Transfer Learning

Finally, we conduct experiments with five new users on evaluating the performance of the proposed transfer learning technique. Specifically, we reuse the parameters of a pre-trained CNN for differentiating five existing users to initialize the weights of a new CNN model for recognizing new users. Additionally, we leverage another CNN model directly trained with data of the same 10 users as the baseline. Both the existing and new CNN models are trained/fine-tuned with all data from the same amount of samples (i.e., 2-14) from new users. As shown in Figure 22(a), transfer learning can greatly improve user identification accuracy under various training sizes for the new users. Especially, our transfer learning method could help to achieve $86.1\%$ accuracy even with the training size as low as 2. Such improvements could greatly reduce training samples required for new enrollments. Our other experiments with fewer new users (e.g., $1\sim {2}$ users) show more significant improvements when utilizing the transfer learning technique.

Fig. 22. Comparison of training sizes for new users/environments with/without the proposed transfer learning method.

We also exploit transfer learning to adapt an existing CNN model to new environments. Both the existing and the new CNN models are trained with data from both environments (i.e., apartment and office) but the existing CNN model does not reuse the parameters of the pre-trained CNN. As shown in Figure 22, transfer learning could help to increase the user identification accuracy by around $10\%$ for each of the training sizes. The results show that the proposed transfer learning based scheme could greatly improve system extensibility for new user enrollments/environments.

9 CONCLUSION

As the proliferation of the Internet of Things (IoT), the prevalence of wireless connections among IoT devices provides the opportunity to authenticate users through examining the wireless signal characteristics inherited from daily activities. In this article, we propose a device-free user authentication system by extracting unique physiological and behavioral characteristics embedded in human daily activities captured by the fine-grained CSI. Our system takes one step forward to support the extended user authentication concept in not only preventing unauthorized users to access restricted information but also identifying users for customized services (e.g., prohibiting a kid to operate a hot stove) in both corporate and home environments. We find that both amplitude and relative phase available in CSI readings are impacted by the environmental changes caused by human activities in different scales. To extract meaningful patterns from noisy CSI measurements, we design data calibration and subcarrier selection algorithms to filter out various noises while preserving human physiological and behavioral characteristics. A CNN-based user authentication mechanism is developed leveraging the extracted CSI features in both time and frequency domains to accurately identify each individual. By utilizing features from multiple WiFi antennas, the proposed CNN model could robustly authenticate a user even under interferences from surrounding people. Additionally, we design a transfer learning-based approach to reduce training efforts for adapting the CNN model to new users/environments. We show that the proposed system can authenticate users with high accuracy while being resilient to various spoofing attacks.

REFERENCES

Adam J. Aviv, Katherine L. Gibson, Evan Mossop, Matt Blaze, and Jonathan M. Smith. 2010. Smudge attacks on smartphone touch screens. Woot 10 (2010), 1–7.
Christopher M. Bishop. 2006. Pattern recognition. Machine Learning 128 (2006).
Corinna Cortes and Vladimir Vapnik. 1995. Support-vector networks. Machine Learning 20, 3 (1995), 273–297.
Rachna Dhamija and Adrian Perrig. 2000. Deja Vu - A user study: Using images for authentication. In Proceedings of the 9th Conference on USENIX Security Symposium (SSYM), Vol. 9. 4. https://dl.acm.org/doi/10.5555/1251306.1251310
Benoit Duc, Stefan Fischer, and Josef Bigün. 1999. Face authentication with Gabor information on deformable graphs. IEEE Transactions on Image Processing (IEEE TIP) 8, 4 (1999), 504–516.
Dumitru Erhan, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pascal Vincent, and Samy Bengio. 2010. Why does unsupervised pre-training help deep learning?Journal of Machine Learning Research 11 (2010), 625–660.
Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. 2016. Region-based convolutional networks for accurate object detection and segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 38, 1 (2016), 142–158.
Daniel Halperin, Wenjun Hu, Anmol Sheth, and David Wetherall. 2011. Tool release: Gathering 802.11 n traces with channel state information. ACM SIGCOMM Computer Communication Review (2011), 53–53.
Keerati Inthavisas and D. Lopresti. 2012. Secure speech biometric templates for user authentication. IET Biometrics 1, 1 (2012), 46–54.
Robert Keys. 1981. Cubic convolution interpolation for digital image processing. IEEE Transactions on Acoustics, Speech, and Signal Processing 29, 6 (1981), 1153–1160.
Hao Kong, Li Lu, Jiadi Yu, Yingying Chen, Linghe Kong, and Minglu Li. 2019. FingerPass: Finger gesture-based continuous user authentication for smart homes using commodity WiFi. In Proceedings of the 20th ACM International Symposium on Mobile Ad Hoc Networking and Computing. ACM, 201–210.
Manikanta Kotaru, Kiran Joshi, Dinesh Bharadia, and Sachin Katti. 2015. SpotFi: Decimeter level localization using WiFi. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication (ACM SIGCOMM). 269–282.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097–1105.
Ajay Kumar and Arun Passi. 2010. Comparison and combination of iris matchers for reliable personal authentication. Pattern Recognition 43, 3 (2010), 1016–1026.
Chi Lin, Jiaye Hu, Yu Sun, Fenglong Ma, Lei Wang, and Guowei Wu. 2018. WiAU: An accurate device-free authentication system with ResNet. In Proceedings of the 2018 15th Annual IEEE International Conference on Sensing, Communication, and Networking (IEEE SECON). 1–9.
Jian Liu, Yan Wang, Yingying Chen, Jie Yang, Xu Chen, and Jerry Cheng. 2015. Tracking vital signs during sleep leveraging off-the-shelf WiFi. In Proceedings of the 16th ACM International Symposium on Mobile Ad Hoc Networking and Computing (ACM MobiHoc). 267–276.
Cástor Mariño, Manuel G. Penedo, Marta Penas, María J. Carreira, and F. Gonzalez. 2006. Personal authentication using digital retinal images. Pattern Analysis and Applications 9, 1 (2006), 21.
Robert Morris and Ken Thompson. 1979. Password security: A case history. Commun. ACM 22, 11 (1979), 594–597.
Sinno Jialin Pan and Qiang Yang. 2010. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 22, 10 (2010), 1345–1359.
Lawrence R. Rabiner, Bernard Gold, and C. K. Yuen. 1978. Theory and application of digital signal processing. IEEE Transactions on Systems, Man, and Cybernetics 8, 2 (1978), 33–43.
Juhi Ranjan and Kamin Whitehouse. 2015. Object hallmarks: Identifying object users using wearable wrist sensors. In Proceedings of the ACM International Joint Conference on Pervasive and Ubiquitous Computing (ACM Ubicomp). 51–61.
Yanzhi Ren, Yingying Chen, Mooi Choo Chuah, and Jie Yang. 2013. Smartphone based user verification leveraging gait recognition for mobile healthcare systems. In Proceedings of the Annual IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks (SECON). 149–157.
Kenneth Revett. 2009. A bioinformatics based approach to user authentication via keystroke dynamics. International Journal of Control, Automation and Systems 7, 1 (2009), 7–15.
David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. 1986. Learning representations by back-propagating errors. Nature 323, 6088 (1986), 533.
Cong Shi, Jian Liu, Hongbo Liu, and Yingying Chen. 2017. Smart user authentication through actuation of daily activities leveraging WiFi-enabled IoT. In Proceedings of the 18th ACM International Symposium on Mobile Ad Hoc Networking and Computing. ACM, 5.
Burrhus Frederic Skinner. 1953. Science and Human Behavior. Simon and Schuster.
David Tse and Pramod Viswanath. 2005. Fundamentals of Wireless Communication. Cambridge University Press.
Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, and Pierre-Antoine Manzagol. 2010. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research 11 (2010), 3371–3408.
Ding Wang and Ping Wang. 2016. Two birds with one stone: Two-factor authentication with security beyond conventional bound. IEEE Transactions on Dependable and Secure Computing 15, 4 (2016), 708–722.
Wei Wang, Alex X. Liu, and Muhammad Shahzad. 2016. Gait recognition using WiFi signals. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (ACM Ubicomp). 363–373.
Wei Wang, Alex X. Liu, Muhammad Shahzad, Kang Ling, and Sanglu Lu. 2015. Understanding and modeling of WiFi signal based human activity recognition. In Proceedings of the 21st Annual International Conference on Mobile Computing and Networking (ACM MobiCom). 65–76.
Yan Wang, Jian Liu, Yingying Chen, Marco Gruteser, Jie Yang, and Hongbo Liu. 2014. E-eyes: Device-free location-oriented activity identification using fine-grained WiFi signatures. In Proceedings of the 20th Annual International Conference on Mobile Computing and Networking (ACM MobiCom). 617–628.
Dan Wu, Daqing Zhang, Chenren Xu, Yasha Wang, and Hao Wang. 2016. WiDir: Walking direction estimation using wireless signals. In Proceedings of ACM International Joint Conference on Pervasive and Ubiquitous Computing (ACM Ubicomp). 351–362.
Yaxiong Xie, Zhenjiang Li, and Mo Li. 2018. Precise power delay profiling with commodity Wi-Fi. IEEE Transactions on Mobile Computing (2018).
Yunze Zeng, Parth H. Pathak, and Prasant Mohapatra. 2016. WiWho: WiFi-based person identification in smart spaces. In Proceedings of 15th ACM/IEEE International Conference on Information Processing in Sensor Networks (IEEE IPSN). 1–12.
J. Zhang, B. Wei, W. Hu, and S. Kenhere. 2016. WiFi-ID: Human identification using WiFi signal. In Proceedings of International Conference on Distributed Computing in Sensor Systems (IEEE DCOSS). 75–82.
Nan Zheng, Aaron Paloski, and Haining Wang. 2011. An efficient user verification system via mouse movements. In Proceedings of the 18th ACM Conference on Computer and Communications Security (ACM CCS). 139–150.

Footnote

This work was partially supported by the National Science Foundation Grants CCF1909963, CCF2028876, and CNS1814590, and the Army Research Office Grant W911NF-18-1-0221. The preliminary results of this project have been published in MobiHoc 2017 [25].

Authors’ addresses: C. Shi and Y. Chen (corresponding author), Rutgers University, 96 Frelinghuysen Road, Core 501, New Brunswick, NJ, 08854, United States; emails: cs1421@scarletmail.rutgers.edu, yingche@scarletmail.rutgers.edu; J. Liu, Min H. Kao Building, Room 307, University of Tennessee, Knoxville, TN, 37996, United States; email: jliu@utk.edu; H. Liu, Qingshuihe Campus: No.2006, Xiyuan Ave, West Hi-Tech Zone, University of Electronic Science and Technology of China, Chengdu, Sichuan, 611731, China; email: hongbo.liu@uestc.edu.cn.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publication History: Received February 2020; revised October 2020; accepted February 2021