1. Introduction
Amidst the rapid evolution of the Internet of Things (IoT) and pervasive computing, Cyber–physical Systems (CPSs) have emerged. These systems integrate sensors, embedded technologies, and network capabilities to realize real-time sensing and control of the physical world. In this complex ecosystem, accurate user identification is crucial: it supports access control and data security and provides a solid foundation for various intelligent applications. Early user identification systems relied heavily on complex passwords combining letters, numbers, and special characters. Although passwords are widely used, they are easily forgotten, susceptible to theft, and prone to security breaches. These limitations have led to the widespread use of biometrics. In recent years, researchers have actively explored biometric-based user identification methods, including facial recognition [1], fingerprint recognition [2], voiceprint recognition [3], and gait recognition [4]. Among these, gait recognition shows unique potential as a biometric identification method. Gait patterns are stable over time, distinctive across individuals, and very difficult to fake. Additionally, gait recognition offers greater flexibility than other biometrics, such as fingerprinting or facial recognition, as it works at a distance and without contact. Going forward, we can expect gait recognition technology to continue to develop, playing an essential role in intelligent interactions, personalized services, and critical areas such as physical access control and financial transaction verification. At the same time, the resulting privacy and security challenges must be recognized, which requires appropriate measures to safeguard user data and comply with relevant regulations and legal requirements.
Research on gait recognition systems has explored various sensor modalities, including wearables [5], vision [6,7], WiFi [8,9,10], pressure sensors [11], radio frequency identification (RFID) [12], LoRa [13], and acoustic sensing [14]. However, these sensors pose several challenges in practical applications. First, they usually involve additional infrastructure costs, such as installing sensor devices at specific locations, which increases the complexity and cost of the system. Second, certain sensors, like wearable devices, may cause user discomfort, degrading the overall user experience. Additionally, some approaches require active cooperation from the user; in vision systems, for example, the user may have to maintain a specific posture or wear specific clothing. Finally, some sensors, such as WiFi and RFID, carry a risk of privacy leakage, potentially revealing the user’s location information.
In contrast to traditional sensor modalities, millimeter-wave radar is a low-power Frequency-Modulated Continuous Wave (FMCW) device with unique advantages [15,16,17]. First, radar is non-invasive: it monitors users without capturing images of individuals, which preserves privacy. Second, millimeter-wave radar is insensitive to changes in lighting conditions and therefore performs consistently in low or unstable lighting, whereas video-based gait sensing can become unreliable under such conditions. Third, radar signals can penetrate a number of non-metallic obstacles, which mitigates the occlusion problem that video-based sensing systems face. Fourth, millimeter-wave radar can monitor a wide area at long distances, making it well suited to large indoor areas, whereas video sensing systems may require more equipment to achieve the same coverage and to adapt to different environmental conditions, which increases system complexity and maintenance costs. Overall, radar-based gait sensing offers advantages ranging from privacy protection to system stability, making it an attractive technology in the field of gait recognition.
Considering its hardware characteristics and the benefits of being non-contact and privacy-preserving, millimeter-wave radar is expected to offer significant advantages for gait recognition. It typically requires minimal infrastructure support (e.g., only a single millimeter-wave radar device), making it a promising biometric technology for many applications. To address the aforementioned challenges, we propose the PGGait (PointNet GRU Gait) user gait recognition system. First, we designed a point cloud preprocessing pipeline to obtain high-quality point cloud data. Then, we modified the original PointNet++ network structure by adding a GRU module for multiscale spatio-temporal feature extraction. Additionally, because relatively few millimeter-wave radar gait datasets are publicly available, we collected gait data from nine volunteers in different scenarios to construct the original dataset. We then enlarged the dataset through data augmentation, applying spatial coordinate rotation, translation, and scaling to the base data. In this way, we obtained a larger dataset from limited raw data and successfully trained and evaluated the model, achieving a gait recognition accuracy of up to 96.75%.
In summary, this paper presents the following contributions:
In PGGait, we designed a point cloud preprocessing pipeline for denoising point cloud data captured by millimeter-wave radar. The pipeline combines a filtering method that sets thresholds from the SNR distribution with an improved distance formula for clustering, yielding high-quality point cloud data for individual segmentation and effectively reducing the negative impact of the environment.
To address the suboptimal performance of existing neural networks in processing millimeter-wave radar point clouds, we propose a novel network architecture, P-GRUNet. This network incorporates a new center-of-mass sampling algorithm and introduces a GRU module for multiscale spatiotemporal feature extraction on correlated point cloud data. This innovation overcomes the limitations of PointNet++ in temporal feature processing, thereby achieving high-precision classification and identification of millimeter-wave radar point clouds.
We comprehensively evaluated single and two-person gait recognition accuracy in several different environments. The experimental results show that our system achieves high accuracy rates of 96.75% and 94.30% for single and two-person gait recognition, respectively.
The subsequent sections of this paper are structured as follows:
Section 2 delves into pertinent research within the field.
Section 3 describes the theoretical basis of FMCW radar in gait recognition.
Section 4 describes the design of the PGGait system in detail.
Section 5 describes the experimental setup.
Section 6 evaluates the experiments.
Section 7 summarizes the full text.
3. FMCW Radar Basics
When walking, each person’s bone structure, stride period and length, joint range of motion, and body proportions affect how they walk, resulting in the dynamic uniqueness of each person’s gait characteristics. The millimeter-wave radar emits a fundamentally sinusoidal signal with a continuously varying frequency. This capability enables precise measurements of an object’s distance, speed, and angle of arrival by transmitting and receiving signals reflected by obstacles in its transmission path. These measurements reflect the object’s movement and position information. Through the accurate measurement and analysis of these data, it is possible to identify and differentiate the walking patterns of different individuals, and the capture and use of these unique walking characteristics can play an essential role in a variety of applications, such as identity verification, security monitoring, and human behavior analysis.
The millimeter-wave radar continuously transmits linear frequency-modulated (FM) signals, each of which sweeps a bandwidth $B$ over a duration $T_c$. The slope of the sweep is $S = B/T_c$. The radar system produces an intermediate frequency (IF) signal by combining, through a mixer, the transmitted signal with the received signal reflected from the target object. In the signal model, $S_T(t)$ and $S_R(t)$ represent the transmitted and received signals, respectively, $A_T$ and $A_R$ denote the amplitudes of the transmitted and received signals, $f_c$ is the carrier’s center frequency, $\tau$ is the flight delay from transmission to reception after reflection, $B$ is the signal bandwidth, $T_c$ is the sweep period of a linear FM signal, and $f_d$ denotes the Doppler shift. The IF signal can then be separated into frames for subsequent processing such as frequency-domain analysis.
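The quantities listed above fit the usual linear-chirp model of an FMCW radar. As a sketch under that assumption (the symbol names $S_T$, $S_R$, $A_T$, $A_R$, $f_c$, $\tau$, $f_d$, $T_c$ and the placement of the Doppler term are illustrative choices, not taken from the original expression), the transmitted signal, the received signal, and the low-pass-filtered mixer output can be written as:

```latex
\begin{aligned}
S_T(t) &= A_T \cos\!\left(2\pi f_c t + \pi \frac{B}{T_c} t^{2}\right), \\
S_R(t) &= A_R \cos\!\left(2\pi (f_c + f_d)(t-\tau) + \pi \frac{B}{T_c}(t-\tau)^{2}\right), \\
S_{IF}(t) &= \frac{A_T A_R}{2} \cos\!\left(2\pi\left(\frac{B}{T_c}\tau - f_d\right) t + \phi_0\right),
\qquad \phi_0 = 2\pi (f_c + f_d)\tau - \pi \frac{B}{T_c}\tau^{2}.
\end{aligned}
```

In this form the IF frequency is dominated by the term $(B/T_c)\,\tau$, which is proportional to the round-trip delay and therefore to the target distance.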
Distance Measurement: When a user enters the detection range of a millimeter-wave radar, their body reflects the transmitted signal and produces a characteristic intermediate frequency (IF) signal. Performing a Fast Fourier Transform (FFT) on this time-domain IF signal yields a distance profile of the user, from which the distance between the user and the radar can be calculated using the following formula:
$$d = \frac{c\, f_{IF}}{2S} \tag{2}$$
where $c$ is the speed of light, $f_{IF}$ is the frequency of the IF signal, and $S$ is the slope of the frequency sweep. Equation (2) gives a precise distance measurement of the user, from which information such as position and step length can be estimated.
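The distance calculation can be illustrated with a small simulation. The sketch below synthesizes an ideal IF tone for one reflector, finds its frequency with a plain DFT, and converts it to a distance with $d = c f_{IF} / (2S)$. The chirp parameters (4 GHz swept in 50 µs, 256 samples) and the function names are hypothetical values chosen for the example, not the configuration used in the paper.

```python
import cmath
import math

C = 3.0e8  # speed of light in m/s (rounded)


def dominant_frequency(samples, sample_rate):
    """Return the strongest positive frequency (Hz) of a real signal using a plain DFT."""
    n = len(samples)
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n


def range_from_if(f_if, slope):
    """Distance from the IF frequency: d = c * f_IF / (2 * S)."""
    return C * f_if / (2.0 * slope)


# Hypothetical chirp: 4 GHz swept in 50 microseconds, sampled 256 times.
bandwidth, chirp_time, n_samples = 4.0e9, 50e-6, 256
slope = bandwidth / chirp_time
sample_rate = n_samples / chirp_time

# Ideal IF tone produced by a single reflector at 3 m.
true_range = 3.0
f_if_true = 2.0 * true_range * slope / C
if_signal = [math.cos(2.0 * math.pi * f_if_true * i / sample_rate) for i in range(n_samples)]

estimated = range_from_if(dominant_frequency(if_signal, sample_rate), slope)
print(f"estimated range: {estimated:.3f} m")
```

The resolution of this estimate is one DFT bin, which corresponds to $c / (2B)$ in distance; a practical implementation would use an FFT library rather than the quadratic-time DFT shown here.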
Velocity Measurements: A single pulse signal is sufficient for determining the distance between the target and the radar. However, measuring velocity requires several scans separated by a fixed time interval ($T_c$). A distance FFT is applied to each reflected pulse to locate the target, which produces peaks at the same position but with different phases. The phase difference $\Delta\varphi$ between two scans corresponds to the displacement of a target moving with velocity $v$. Specifically, the following relationship exists between the phase difference $\Delta\varphi$ and the velocity $v$:
$$\Delta\varphi = \frac{4\pi v T_c}{\lambda}, \qquad v = \frac{\lambda\, \Delta\varphi}{4\pi T_c} \tag{3}$$
where $\lambda$ is the wavelength. A further FFT across scans (i.e., the Doppler-FFT) distinguishes multiple users in the velocity dimension.
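As a small numerical check of the phase-to-velocity relationship, the sketch below builds the phasors that two consecutive distance-FFT peaks would carry for a target moving at a constant speed and recovers the speed from their phase difference. The 77 GHz carrier, the 100 µs interval, and the helper names are assumptions made for the example.

```python
import cmath
import math


def velocity_from_phase(delta_phi, wavelength, chirp_interval):
    """v = lambda * delta_phi / (4 * pi * T_c)."""
    return wavelength * delta_phi / (4.0 * math.pi * chirp_interval)


def range_peak_phasor(distance, wavelength):
    """Unit phasor carrying the round-trip phase 4*pi*d/lambda of a distance-FFT peak."""
    return cmath.exp(1j * 4.0 * math.pi * distance / wavelength)


wavelength = 3.0e8 / 77e9   # hypothetical 77 GHz carrier
chirp_interval = 100e-6     # hypothetical interval between scans, in seconds
true_velocity = 1.2         # metres per second

# Peaks from two consecutive scans of a target that moved v * T_c in between.
p1 = range_peak_phasor(3.0, wavelength)
p2 = range_peak_phasor(3.0 + true_velocity * chirp_interval, wavelength)
delta_phi = cmath.phase(p2 * p1.conjugate())

estimated_velocity = velocity_from_phase(delta_phi, wavelength, chirp_interval)
# The phase difference is only unambiguous while |delta_phi| < pi.
max_velocity = wavelength / (4.0 * chirp_interval)
print(f"v = {estimated_velocity:.3f} m/s (unambiguous up to {max_velocity:.2f} m/s)")
```

The unambiguous limit shows why the interval between scans matters: a longer interval improves sensitivity to slow motion but lowers the highest speed that can be measured without phase wrapping.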
The Angle of Arrival Measurement: FMCW radar systems determine the angle of a target’s reflected signal in the horizontal and vertical planes, referred to as the azimuth and elevation angles, respectively. Multi-antenna FMCW radar systems estimate the angle of arrival by analyzing the phase difference $\Delta\varphi$ at the peak of the distance FFT caused by the spacing between antennas. The relevant equations are shown below.
$$\Delta\varphi = \frac{2\pi l \sin\theta}{\lambda}, \qquad \theta = \sin^{-1}\!\left(\frac{\lambda\, \Delta\varphi}{2\pi l}\right) \tag{4}$$
where $l$ is the distance between neighboring antennas and $\lambda$ is the wavelength. This angle-measurement technique allows precise determination of the target’s position in the horizontal and vertical directions.
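The angle relationship can be checked with a round trip: compute the phase difference that a given arrival angle produces across two neighboring antennas, then invert it. The half-wavelength spacing and the function names are assumptions for illustration.

```python
import math


def phase_from_angle(theta, wavelength, spacing):
    """Phase difference between neighboring antennas for arrival angle theta (radians)."""
    return 2.0 * math.pi * spacing * math.sin(theta) / wavelength


def angle_from_phase(delta_phi, wavelength, spacing):
    """Arrival angle (radians): theta = asin(lambda * delta_phi / (2 * pi * l))."""
    return math.asin(wavelength * delta_phi / (2.0 * math.pi * spacing))


wavelength = 3.0e8 / 77e9    # hypothetical 77 GHz carrier
spacing = wavelength / 2.0   # assumed half-wavelength antenna spacing

true_angle = math.radians(20.0)
delta_phi = phase_from_angle(true_angle, wavelength, spacing)
estimated_angle = angle_from_phase(delta_phi, wavelength, spacing)
print(f"angle of arrival: {math.degrees(estimated_angle):.2f} degrees")
```

With half-wavelength spacing the phase difference stays within $\pm\pi$ for all angles between $-90°$ and $90°$, which is why that spacing is a common design choice.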
4. PGGait System Design
This section provides an overview and the detailed design of the PGGait user gait recognition system, covering its three critical components: point cloud data collection, data preprocessing, and user identification and pattern matching.
4.1. System Overview
PGGait employs millimeter-wave radar to capture the gait characteristics of the human body during walking, facilitating the passive identification of multiple users. The overall workflow of PGGait is illustrated in Figure 1 and comprises three primary components: point cloud data collection, data preprocessing, and user identification and pattern matching.
In the point cloud data acquisition module, we apply the Fast Fourier Transform (FFT) to extract the distance ($d$), velocity ($v$), and angle of arrival ($\theta$) according to Equations (2)–(4). Concurrently, we apply a Constant False Alarm Rate (CFAR) algorithm and a static clutter removal algorithm to mitigate interference during acquisition. These operations generate high-resolution raw point cloud data. The point cloud data are then passed to the data preprocessing module for denoising, after which a modified DBSCAN clustering algorithm performs user separation and location tracking. Finally, the multidimensional point cloud data are fed into a specially designed neural network that extracts spatial and temporal features to classify and identify multiple users.
4.2. Point Cloud Data Collection
Millimeter-wave radar transmits a linear FM signal to a target and then receives and analyzes the signal reflected from the target. By measuring the signal’s time delay and phase change, we can determine the target’s location and record the reflected signal. Point cloud data are then calculated from the reflected signal. The transformation of raw radar data into point cloud data involves distance FFT, velocity FFT, and angle FFT processes. Moreover, we employ Capon’s algorithm for angle of arrival (AOA) estimation. During data acquisition, the environment is complex and diverse because of disturbing factors such as noise, windows, walls, tables, chairs, passing pedestrians, and multipath effects. Therefore, we apply static clutter filtering and Constant False Alarm Rate (CFAR) algorithms during acquisition to remove a portion of the interfering point cloud, i.e., noise, at an early stage. Combining the above methods, we can transform the radar-sampled data into accurate point cloud information in a Cartesian coordinate system, as shown in Figure 2.
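To make the CFAR step concrete, the following is a minimal one-dimensional sketch. The text does not state which CFAR variant is used, so this example assumes the cell-averaging form, in which each cell is compared with a scaled average of its neighbors; the window sizes, the scale factor, and the function name are hypothetical.

```python
def ca_cfar(power, num_train=8, num_guard=2, scale=5.0):
    """Cell-averaging CFAR over a 1-D power profile.

    A cell is declared a detection when its power exceeds `scale` times the
    mean of the training cells on both sides. Guard cells next to the cell
    under test are excluded so a strong target does not raise its own threshold.
    """
    detections = []
    n = len(power)
    for i in range(n):
        training = [
            power[j]
            for j in range(i - num_guard - num_train, i + num_guard + num_train + 1)
            if 0 <= j < n and abs(j - i) > num_guard
        ]
        if training and power[i] > scale * sum(training) / len(training):
            detections.append(i)
    return detections


# Flat noise floor with two reflectors of different strength.
profile = [1.0] * 64
profile[20] = 30.0
profile[45] = 12.0

print(ca_cfar(profile))
```

Because the threshold follows the local noise level, the weaker reflector at index 45 is still detected, while cells next to the strong reflector are not, since the reflector itself raises their local threshold.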
Converting the above triplet to the Cartesian coordinate system yields the coordinates $(x, y, z)$ of each point. In the triplet, the original point is described by $R$, the distance (Range) between the point and the radar antenna, $\varphi$, the elevation angle (Elevation) from the radar antenna to the point, and $\theta$, the azimuth angle (Azimuth) between the radar facing direction and the projection onto the xy-plane of the line connecting the point and the radar.
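The conversion can be sketched as follows. The axis convention is an assumption: the radar is taken to face along the +y axis, so that $x = R\cos\varphi\sin\theta$, $y = R\cos\varphi\cos\theta$, and $z = R\sin\varphi$. A different mounting orientation would permute or flip these axes. The function name is illustrative.

```python
import math


def to_cartesian(rng, azimuth, elevation):
    """Convert (range, azimuth, elevation) to (x, y, z).

    Assumed convention: angles in radians, radar facing the +y axis, azimuth
    measured in the xy-plane from that axis, elevation measured from the xy-plane.
    """
    horizontal = rng * math.cos(elevation)  # length of the xy-plane projection
    x = horizontal * math.sin(azimuth)
    y = horizontal * math.cos(azimuth)
    z = rng * math.sin(elevation)
    return x, y, z


# A point 2 m straight ahead, and one 3 m away, 30 degrees to the side and 10 degrees up.
ahead = to_cartesian(2.0, 0.0, 0.0)
oblique = to_cartesian(3.0, math.radians(30.0), math.radians(10.0))
print(ahead, oblique)
```

Whatever the axis convention, the conversion preserves the distance to the radar, which gives a quick sanity check on any implementation.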
4.3. Data Preprocessing
The CFAR and static clutter algorithms [29] employed in point cloud generation are insufficient to eliminate high-intensity points stemming from external objects. As a result, residual noise persists and makes the user’s data points ambiguous. Such unwanted data reduce the experimental accuracy, so obtaining high-quality point cloud data is a prerequisite for accurate results. We therefore devise a point cloud preprocessing pipeline that yields high-quality point cloud data and ensures the efficacy of the subsequent clustering and individual tracking processes.
Denoising Phase: In the preprocessing of collected point cloud data, we employ filtering techniques that use the Signal-to-Noise Ratio (SNR) distribution to determine appropriate thresholds, as illustrated in Figure 3. The motivation behind this strategy is that the SNR distribution helps to distinguish signal from noise more accurately. It provides information about the relative quality of each data point, allowing us to set thresholds that retain the majority of signal points while effectively filtering out noise. Additionally, we introduce a boundary threshold derived from the real-world scene to further enhance the quality of the point cloud data and reduce the negative impact of environmental factors. This comprehensive denoising strategy ensures the robust performance of our gait recognition system in complex environments.
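One way to picture this denoising step is sketched below. The rule used to derive the threshold from the SNR distribution is not specified in the text, so the sketch assumes a simple quantile rule (keep roughly the top 80% of points by SNR) followed by a rectangular boundary check; the `keep_fraction` value, the bounds, and the function names are hypothetical.

```python
def snr_threshold(points, keep_fraction=0.8):
    """Pick an SNR cut-off so that roughly `keep_fraction` of the points survive.

    Each point is a tuple (x, y, z, v, snr). The quantile rule is an assumed
    stand-in for a threshold derived from the SNR distribution.
    """
    snrs = sorted(p[4] for p in points)
    cut = int(len(snrs) * (1.0 - keep_fraction))
    return snrs[min(cut, len(snrs) - 1)]


def denoise(points, keep_fraction=0.8, bounds=((-2.0, 2.0), (0.0, 6.0), (-1.0, 2.0))):
    """Keep points whose SNR reaches the threshold and that lie inside the scene bounds."""
    threshold = snr_threshold(points, keep_fraction)
    return [
        p for p in points
        if p[4] >= threshold
        and all(lo <= c <= hi for c, (lo, hi) in zip(p[:3], bounds))
    ]


# Eight strong returns from a walker, two weak noise points, one strong point outside the scene.
walker = [(0.1 * i, 2.0, 1.0, 0.5, 20.0 + i) for i in range(8)]
weak_noise = [(0.5, 3.0, 0.5, 0.0, 3.0), (1.0, 1.0, 0.2, 0.0, 4.0)]
outside = [(5.0, 2.0, 1.0, 0.0, 30.0)]

clean = denoise(walker + weak_noise + outside)
print(len(clean))
```

The two filters remove different kinds of interference: the SNR threshold drops weak spurious returns, while the boundary check drops strong reflections from objects outside the monitored area.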
Point cloud clustering: In the clustering phase, PGGait uses the DBSCAN algorithm [30], a density-based spatial clustering algorithm that tolerates noise, is robust, and efficiently separates outliers from multiple target clusters. The primary strength of this algorithm is that the number of clusters does not need to be set in advance: it determines the number of point clusters autonomously, consistent with the scene. This reduces the influence of noise and improves the accuracy of clustering. Because the device produces point clouds that are widely scattered along the Z-axis, clustering real point cloud data with the traditional Euclidean distance performs poorly. In addition, since the point cloud is relatively dense in the X-Y direction, we have modified the distance formula used by DBSCAN accordingly to better fit the distribution of the point cloud.
where
and
are two different points in space, the results of several experiments on the values of
,
of 0.25, and 2
of 2.75 are relatively good. The results also show that reducing the weight of the Z-axis component improves the clustering effect. In our experiments, we set the following hyperparameters: eps (DBSCAN radius) = 0.3 and min_samples = 7. Additionally, the algorithm performs better when the experimental distance is kept as large as possible, preferably greater than 0.3 m. The effect of clustering in the case of multiple people is shown in Figure 4, demonstrating the optimized performance of the algorithm.
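The effect of down-weighting the Z-axis can be illustrated with a small, self-contained DBSCAN. This is a sketch rather than the authors' formula: it assumes the modification scales the squared z-difference by a weight of 0.25, and it uses eps = 0.3 and min_samples = 7 from the text. The function names and the synthetic data are hypothetical.

```python
import math


def weighted_distance(p, q, z_weight=0.25):
    """Euclidean distance with the squared z-difference scaled by `z_weight` (assumed form)."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + z_weight * (p[2] - q[2]) ** 2)


def dbscan(points, eps=0.3, min_samples=7, dist=weighted_distance):
    """Plain DBSCAN. Returns one label per point: -1 for noise, 0..k-1 for clusters."""
    n = len(points)
    # Each neighborhood includes the point itself.
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps] for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # tentatively noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point, keep expanding
    return labels


# Two people standing apart, each a sparse vertical column of points, plus one stray point.
person_a = [(0.0, 2.0, 0.18 * i) for i in range(10)]
person_b = [(1.5, 2.5, 0.18 * i) for i in range(10)]
cloud = person_a + person_b + [(5.0, 5.0, 0.0)]

labels_weighted = dbscan(cloud)
labels_euclidean = dbscan(cloud, dist=lambda p, q: weighted_distance(p, q, z_weight=1.0))
print(labels_weighted)
print(labels_euclidean)
```

With the unweighted distance the vertically sparse columns never reach min_samples neighbors and every point is labeled noise, whereas the weighted distance recovers one cluster per person. The same effect can be obtained with a library implementation by scaling the z coordinate by the square root of the weight before clustering.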
User Tracking: To implement user tracking, we use the Hungarian algorithm [31] to associate the clusters detected in successive time intervals and build their trajectories. We also maintain a Kalman filter for each trajectory. Specifically, we represent each target by the center of mass of its cluster of points, treating the target as a point scatterer. To make predictions with the Kalman filter, we design the state vector of each trajectory as follows:
$$\mathbf{s} = [x,\; y,\; \dot{x},\; \dot{y}]^{T}$$
Specifically, $x$ and $y$ represent the horizontal and vertical coordinates of the trajectory, while $\dot{x}$ and $\dot{y}$ represent the change of these coordinates within one time unit, i.e., velocity information. Therefore, the state transition process can be expressed as follows:
$$\mathbf{s}_k = \mathbf{F}\,\mathbf{s}_{k-1}$$
where the transfer matrix $\mathbf{F}$ is:
$$\mathbf{F} = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
In addition, the process model is related to the measurement model that describes the relationship between the state and the measurements at the current moment $k$ as follows:
$$\mathbf{z}_k = \mathbf{H}\,\mathbf{s}_k + \mathbf{v}_k$$
where $\mathbf{z}_k$ is the measurement vector, $\mathbf{H}$ is the measurement matrix, and $\mathbf{v}_k$ is a zero-mean Gaussian measurement noise vector assumed to have covariance $\mathbf{R}_k$, i.e., $\mathbf{v}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{R}_k)$.
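The predict and update cycle of such a per-trajectory filter can be sketched in plain Python. The sketch assumes a constant-velocity model with a unit time step and a measurement matrix that observes only the centroid position; the noise covariances, the initial state, and the helper names are hypothetical values chosen for the example.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def combine(a, b, sign=1.0):
    """Element-wise a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scaled_identity(n, value=1.0):
    return [[value if i == j else 0.0 for j in range(n)] for i in range(n)]


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Constant-velocity transition with unit time step; only position is measured.
F = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
H = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]


def predict(s, P, Q):
    """Propagate state and covariance one time step."""
    return matmul(F, s), combine(matmul(matmul(F, P), transpose(F)), Q)


def update(s, P, z, R):
    """Correct the prediction with a measured cluster centroid z."""
    innovation = combine(z, matmul(H, s), sign=-1.0)
    S = combine(matmul(matmul(H, P), transpose(H)), R)
    K = matmul(matmul(P, transpose(H)), inverse_2x2(S))
    s_new = combine(s, matmul(K, innovation))
    P_new = matmul(combine(scaled_identity(4), matmul(K, H), sign=-1.0), P)
    return s_new, P_new


# Track a centroid that moves by (0.1, 0.5) per frame, starting from an uninformed state.
s = [[0.0], [0.0], [0.0], [0.0]]
P = scaled_identity(4, 10.0)    # large initial uncertainty (assumed)
Q = scaled_identity(4, 0.01)    # process noise (assumed)
R = scaled_identity(2, 0.05)    # measurement noise (assumed)
for k in range(1, 21):
    s, P = predict(s, P, Q)
    s, P = update(s, P, [[0.1 * k], [0.5 * k]], R)

print([round(row[0], 3) for row in s])
```

Although only positions are measured, the filter recovers the velocity components because the transition matrix couples position and velocity across time steps.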
For clusters in adjacent time intervals, the predicted values are combined with the actual measurements to obtain the best-estimated trajectories. The creation, updating, and deletion of trajectories depend heavily on how well they match the clusters. If a trajectory has no new matching cluster for a significant period, the entity corresponding to the trajectory is assumed to have left the radar’s field of view, and the trajectory is deleted. In trajectory management, surplus clusters that match no existing trajectory create new empty trajectories, while surplus trajectories that match no cluster are paired with new blank clusters. This dynamic approach to trajectory management allows efficient tracking of the target by ensuring that the system adapts to real-time radar data and target movement patterns in different scenarios. The trajectory-to-cluster matching problem is solved with the Hungarian algorithm.
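The matching step can be illustrated with a small stand-in. Instead of the Hungarian algorithm itself, the sketch below finds the same minimum-cost assignment by exhaustive search over permutations, which is only practical for a handful of targets. The cost matrix is padded with dummy entries so that surplus clusters and surplus trajectories are handled; the gate distance and the function name are hypothetical.

```python
import math
from itertools import permutations


def match_tracks(tracks, clusters, gate=1.0):
    """Minimum-total-distance matching of predicted track positions to cluster centroids.

    Exhaustive search stands in for the Hungarian algorithm. Dummy rows and
    columns cost `gate`, and pairs farther apart than `gate` are rejected.
    Returns (pairs, lost_tracks, new_clusters).
    """
    n = max(len(tracks), len(clusters))

    def cost(i, j):
        if i < len(tracks) and j < len(clusters):
            return math.dist(tracks[i], clusters[j])
        return gate  # dummy track or dummy (blank) cluster

    best = min(permutations(range(n)), key=lambda perm: sum(cost(i, perm[i]) for i in range(n)))
    pairs = [(i, j) for i, j in enumerate(best)
             if i < len(tracks) and j < len(clusters) and cost(i, j) <= gate]
    matched_tracks = {i for i, _ in pairs}
    matched_clusters = {j for _, j in pairs}
    lost_tracks = [i for i in range(len(tracks)) if i not in matched_tracks]
    new_clusters = [j for j in range(len(clusters)) if j not in matched_clusters]
    return pairs, lost_tracks, new_clusters


# Two existing trajectories (predicted positions) and three clusters in the new frame.
predicted = [(0.0, 2.0), (1.5, 3.0)]
centroids = [(1.6, 3.1), (0.1, 2.2), (4.0, 4.0)]

pairs, lost, new = match_tracks(predicted, centroids)
print(pairs, lost, new)
```

Here both trajectories are continued and the third cluster starts a new trajectory. For larger numbers of targets, a polynomial-time solver such as SciPy's `scipy.optimize.linear_sum_assignment` would replace the exhaustive search.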
Figure 5 below shows the raw point cloud data collected by the radar and the high-quality point cloud clusters obtained by the point cloud preprocessing pipeline to prepare for the subsequent processing.
4.4. User Identification and Pattern-Matching
As the 3D point cloud data generated by millimeter-wave radar are typically sparse and dispersed, converting them into an image-like representation with methods such as voxelization may cause the computational cost of the network to grow geometrically. As a result, it is difficult to process such data and perform user recognition with conventional vision methods. With this in mind, we decided to train directly on point cloud data, although choosing a network structure suitable for point cloud data is not trivial. By analyzing the data collected by millimeter-wave radar, we found that they contain velocity component information, which reflects features such as human limb movement and stride length. Considering these attributes, we need a network structure that can simultaneously extract spatial features during walking and temporal features during motion. We therefore chose a PointNet-based network [32] for extracting spatial features and combined it with a lightweight GRU (Gated Recurrent Unit) network module for extracting temporal features. Together, these two form the architecture of our PGGait network, which is shown in Figure 6.
In our experiments, we conducted a comparative study of PointNet++, the successor to PointNet, against the original PointNet network. The results show that the version based on PointNet++ performs better. Building on PointNet++, we extended its input layer from the original three-dimensional coordinates (x, y, z) to five-dimensional inputs (x, y, z, v, snr).
Regarding the sampling process in PointNet++, we used Farthest Point Sampling (FPS), which is a widely used point cloud sampling technique. In the XYZ Coordinate Transformation and Feature Extraction Layer, we applied FPS to select key points, ensuring the preservation of crucial information within the point cloud for subsequent processing. By repeatedly selecting the farthest points, this process maintains the uniformity of the point cloud, so that the chosen key points effectively represent the entire point cloud. The XYZ Coordinate Transformation, Feature Extraction Layer, Set Abstraction Layer, and Aggregated Point Cloud collectively contribute to the sampling process. This methodology persists throughout the forward propagation of the entire network, gradually extracting rich feature information through the processing of the multilayer perceptron module. These components collectively ensure the effectiveness of PointNet++ in extracting spatial features.
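Farthest Point Sampling itself is a short greedy procedure, sketched below in plain Python. Each new key point is the one whose distance to the already selected set is largest. The starting index and the toy point cloud are arbitrary choices for the example, and the function name is illustrative.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Return the indices of k points chosen greedily to be mutually far apart.

    Requires k <= len(points). `min_dist[i]` holds the distance from point i
    to its nearest already selected point.
    """
    selected = [start]
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected


# Corners of a unit square, one near-duplicate of the first corner, and the center.
cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 0.0, 0.0),
         (0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (0.5, 0.5, 0.0)]

print(farthest_point_sampling(cloud, 3))
```

The near-duplicate at index 1 is never chosen among the first key points, which shows how FPS spreads the samples over the extent of the cloud instead of following local point density.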
For the spatial coordinate data of the point cloud, we first apply the Sampling Module feature transformation for subsequent feature extraction. The data dimensions are kept constant through this module, with an input dimension of 3 × m and an output dimension of 3 × m, where m denotes the total number of points in the point cloud sample. We then concatenate the 3 × m vector with the velocity (v) and signal-to-noise ratio (SNR) data of the point cloud to generate a 5 × m vector. Next, we input the 5 × m vector into the Multilayer Perceptron (MLP) module, i.e., MLP (64, 64), which raises the feature dimension and produces a 64 × m vector. We then feed this vector into the Sampling Module again for feature transformation and a further increase in dimension, finally obtaining a 1024 × m feature vector. This feature vector is processed by the pooling layer into a 1024 × 30 vector. In the multiscale module, we input this feature vector to the GRU module, which performs temporal modeling through the Feature Propagation layer and finally yields a 128 × 30 feature vector, where ‘30’ is the number of time steps used in the time-series modeling of the point cloud sequence.
However, the five gait attributes, taken independently, cannot comprehensively capture the intrinsic characteristics of gait, so they must be fused. A fusion network integrates these five attributes to obtain global features of the gait point cloud, including temporal and spatial features. Finally, we use a SoftMax classifier to output the probability that the current point cloud data belong to each user identity category. The category with the highest probability is the identification result determined by the network. The final recognition result is calculated as shown in Equation (12).
$$\mathrm{result}(i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}} \tag{12}$$
In Equation (12), result($i$) is the calculated probability of being recognized as the $i$-th individual. $z$ denotes the vector input to the SoftMax layer, which contains the feature values of the individuals calculated by the optimal network model, and $z_i$ is the element of this vector corresponding to the $i$-th individual’s feature value. The probability values output by the SoftMax layer indicate how well the input matches each individual, and the person number with the largest probability is the final output.
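The final decision step is small enough to show directly. The sketch below implements the SoftMax probabilities and the selection of the most probable identity; the score vector is made up for the example, and the function names are illustrative.

```python
import math


def softmax(z):
    """Convert raw scores into probabilities that sum to one."""
    peak = max(z)  # subtracting the maximum avoids overflow without changing the result
    exps = [math.exp(value - peak) for value in z]
    total = sum(exps)
    return [e / total for e in exps]


def identify(z):
    """Return (index of the most probable individual, its probability)."""
    probs = softmax(z)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


# Hypothetical scores for three enrolled users.
scores = [0.2, 3.1, -1.0]
user, confidence = identify(scores)
print(user, round(confidence, 3))
```

Because SoftMax preserves the ordering of the scores, the identified user is always the one with the largest input score; the probabilities are mainly useful as a confidence value, for example to reject low-confidence identifications.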
6. Evaluation
After collecting the appropriate datasets, we evaluate the overall performance of PGGait in different settings and compare it with other state-of-the-art work.
6.1. User Identification Performance
We first evaluate the recognition performance of PGGait in the single-user case. Figure 12a presents the confusion matrix for single-user recognition under radial routes. The average recognition accuracy is 96.75%, and the lowest recognition accuracy among all participants is 92.60%. In addition, as shown in Figure 12b, we examined the recognition accuracy in the two-person case under a radial route: the average accuracy reached 94.30%, and the lowest recognition rate was 91.45%. These results confirm that the PGGait system recognizes both single users and two people walking together with high accuracy.
6.2. Impact of Different Data Partition Ratios
By varying the ratio of training, test, and validation data, we evaluated the performance of PGGait under different data partitions. In a single-user scenario, we considered various data partition ratios and present the results in Figure 13. These experimental results show that PGGait exhibits strong recognition performance under different partition ratios, with an average recognition accuracy higher than 93%. The performance is even better when the amount of training data is greater than or equal to the amount of test data. Specifically, our model maintains a recognition accuracy of about 95% when the training-to-testing split ratio is above 5:4, demonstrating that PGGait achieves high performance without requiring a large amount of data.
6.3. Impact of Different Environments
A series of experiments was undertaken to assess the system’s robustness across diverse environments. This involved acquiring and validating gait point cloud data from nine volunteers in three distinct settings: a laboratory, a corridor, and a hall. As shown in Figure 14, the experimental results indicate that the performance of the PGGait system is virtually unaffected by environmental changes. This robustness is attributed to the signal processing methods employed by the millimeter-wave radar during data acquisition and to the specially designed point cloud preprocessing flow, including methods such as denoising and SNR threshold filtering. These methods effectively eliminate point cloud data from interfering objects in complex environments, such as walls, ceilings, glass, doors, and chairs, resulting in high-quality point cloud data that minimizes the impact of environmental factors on recognition performance. As a result, millimeter-wave radar-based gait recognition techniques show higher stability and applicability in processing data from different environments compared to WiFi-based gait recognition methods.
6.4. Effects of Different Walking Directions
To examine how the walking direction affects the system, we conducted a series of experiments. We first tested a single user walking along the radial path of the radar in two directions: toward the radar and away from the radar. The results show that the final recognition accuracy differs by only 3% between the two directions, as shown in Figure 15a. We also trained the model on data from the direction toward the radar only and then tested it on data from the direction away from the radar; in this case, no meaningful results could be obtained. Therefore, data from both directions must be combined in practical applications. Finally, comparing the test results of the radial and non-radial paths shows that the recognition rate under the non-radial path is slightly lower than that under the radial path, as shown in Figure 15b.
6.5. Comparison with Recent Research Work
We compare the recognition accuracy of the PGGait system with other recent research works, including mmGaitNet [25], Wu et al. [33], MTPGait [26], Xia et al. [34], and Pegoraro [35]; the results of the comparison are shown in Table 1. mmGaitNet uses a dual-radar technique, which shows good recognition of walks on radial paths. However, its recognition accuracy on non-radial paths drops significantly to 45%, which suggests that the algorithm has difficulty adapting to point cloud data from non-radial paths. Wu et al. address multipath interference and complex walking in the scene mainly through a specialized CNN but achieve a limited recognition accuracy of 75%. MTPGait achieves relatively high recognition accuracy in single and two-person scenarios, thanks to its multi-dimensional data input and the combination of CNN and LSTM networks. However, the method requires a large amount of training data to achieve these results. Pegoraro’s study uses micro-Doppler techniques but lacks adaptability and requires a large amount of training data, which leads to a significant drop in accuracy in other environments.
Considering the above results, compared to existing work, the PGGait system is still able to achieve high recognition accuracy with relatively small datasets by improving the quality of the point cloud data and analyzing the gait data in conjunction with our designed network for extracting the spatio-temporal information, despite the limited data acquisition equipment. In addition, the PGGait system can maintain high recognition accuracy in different environments, including radial and non-radial paths.
6.6. Ablation Experiment
To validate the contribution of the innovative elements proposed in our study to gait recognition performance, we conducted a series of ablation experiments, selecting Pointnet2 as the baseline network version according to Section 5.2. We specifically examined two critical factors in the model: the addition of GRU modules and the application of data preprocessing. In Table 2 below, we compare the accuracy under different configurations, with particular emphasis on the impact of adding GRU modules and applying data preprocessing on the final recognition accuracy.
Initially, we established a baseline network version based on PointNet2_cls_msg. Subsequently, GRU modules were separately integrated into the model, and data preprocessing was applied. To ensure experimental reproducibility, we ran the experiments multiple times to obtain reliable results. The experimental outcomes indicate that, among various configurations, those involving the addition of GRU modules and data preprocessing exhibited optimal performance in terms of both accuracy (Best) and accuracy (Avg), achieving an accuracy of approximately 94.74%.
The results of the ablation experiment strongly affirm the effectiveness of our proposed approach. In the experimental configurations, the highest gait recognition accuracy was achieved by incorporating data preprocessing operations and introducing GRU modules into the baseline network, providing robust support for our innovative contributions. We observed a significant positive impact on the model’s performance by adding GRU modules and applying data preprocessing. This suggests that through the introduction of temporal modeling and data preprocessing techniques, our model is better equipped to capture spatiotemporal information, thereby enhancing gait recognition accuracy.
6.7. Discussion
The PGGait user gait recognition system first collects raw point cloud data with millimeter-wave radar and passes them through a specially designed point cloud preprocessing pipeline, which yields high-quality point cloud data and removes as much as possible of the interference caused by complex factors such as the environment. Denoising and clustering optimization are then carried out in the clustering and tracking stage to obtain high-quality point cloud clusters for subsequent work. At the same time, a multidimensional point cloud input is formed by extending the original point cloud data with velocity and signal-to-noise ratio. In addition, the original PointNet++ network was modified by adding a GRU module for multiscale spatio-temporal feature extraction, which fuses spatial and temporal features to significantly improve the model’s accuracy and thus the overall performance. The experimental results in various scenarios show that the PGGait user identification system achieves higher recognition accuracy than other recent work. For future work, we plan to investigate the distinctions among gaits with similar body postures and to address variations in the gait patterns of users with mobility challenges. To further enhance the system’s recognition performance, we will also consider factors related to the experimental equipment: configurations with high bandwidth and multiple transmit-receive antennas can improve the accuracy of data collection. Moreover, refining the target filtering, clustering, and tracking algorithms can further improve data precision and recognition accuracy at the data processing stage. Factors at every stage must be considered together to improve the final recognition accuracy.