1 Introduction

Since the COVID-19 pandemic, the world’s attention has focused on keeping social distance between humans and surrounding objects to protect people from infection. Accordingly, the need for a reliable, accurate, and safe biometric identification algorithm has become urgent. One of the most robust, accurate, and secure biometric identification methods is gait, which can identify subjects from a distance based on their fundamental dynamic walking patterns and can track the same person across several fixed cameras [1]. Compared to other commonly used biometric modalities, such as fingerprint [2], iris [3], face [4], and DNA [5], gait offers unique advantages: it is noninvasive, hard to disguise, robust to low-resolution images, and does not require cooperation from the subject [1]. Owing to these advantages over other biometrics, gait identification is used in a wide range of security applications in hospitals, shopping malls, banks, military installations, airports, religious institutions, etc. Moreover, it can be used in crime prevention, forensic identification, and criminal investigation [6]. Despite the mentioned advantages, gait recognition has some drawbacks caused by changes in the subject’s clothing or carried objects [7]. Numerous studies have focused on two categories of identification methods to solve these issues: model-based approaches [8] and appearance-based approaches [9]. In this paper, after the silhouette frames have been extracted and enhanced, they pass through two main phases: (i) the first phase is built on the appearance-based model, which extracts gait features from the gait energy images (GEIs) using a proposed convolutional neural network (CNN); (ii) the second phase is designed on the model-based technique, which extracts further features from the landmarks data frame using a proposed fully connected architecture. Finally, we propose a deep neural network (DNN) to recognize the concatenated high-level features from both phases. Figure 1 shows the main components of the fused gait recognition algorithm.

Fig. 1 The proposed gait recognition model

The main contributions of this manuscript can be summarized as follows:

  1. Extracting silhouette images from human gait videos and enhancing them (phase 1).

  2. Extracting lower-body 2D pose joints from gait videos, converting them to 3D poses, computing knee angles and static and limb distances, and finally reshaping the results into a data frame (phase 2).

  3. Proposing two deep network structures (a CNN and a fully connected network) to extract the main features from the phase 1 silhouette images and the phase 2 data frames.

  4. Proposing a novel deep model that combines the two high-level features extracted from silhouette images (appearance-based) and human poses (model-based) to recognize human gait images.

  5. Comparing the performance of recent studies with the proposed algorithm.

  6. Introducing a fine-tuned CNN algorithm with the best performance metrics.

The remainder of this article is organized as follows: Sect. 2 provides an overview of the recent related works, Sect. 3 introduces the principal methodology of the proposed model, Sect. 4 presents the experimental results and discussions, and finally, Sect. 5 presents the main conclusions and future works.

2 Related works

Li et al. proposed a model called GaitSlice, which analyzes human gait images based on spatiotemporal slice features. GaitSlice combines a residual frame attention mechanism (RFAM) with inter-related slice features to form the spatiotemporal information. The experimental results show higher accuracy than six typical gait recognition algorithms [6]. Han et al. tuned the learning metrics of a gait recognition model on the CASIA-B and TUM-GAID gait datasets to improve its performance. They used angular SoftMax loss and triplet loss to make the features more separable and discriminative. Finally, they added a batch normalization layer to optimize the two mentioned losses. The experimental results reveal that the tuned model outperforms the other state-of-the-art approaches [7]. Tian et al. proposed a spatiotemporal attention mechanism to enhance structural gait features in their AGS-GCN model. They constructed a gait skeleton graph to extract multi-scale gait features from the skeleton data. Moreover, they improved the characteristics of the joint points through the spatiotemporal attention mechanism. Extensive experiments demonstrate that AGS-GCN achieves better performance metrics than other recent studies [10].

Alobaidi et al. introduced an advanced real-world smartphone-based gait recognition system to recognize human gait outside controlled environments. The gait data were captured from 44 uncontrolled subjects over 7–10 days; the subjects were simply asked to go about their normal daily activities. For each user, the experiment modeled four different forms of motion: normal walking, rapid walking, walking downstairs, and walking upstairs. The evaluation results show error rates ranging from 11.32 to 27.33% across the mentioned motion models [11]. Sanjay Kumar Gupta and Pratik Chattopadhyay proposed a model to enhance the performance of gait recognition under covariate conditions. The proposed model depends on determining a set of unique generic poses and computing gait features corresponding to these poses, called the dynamic gait energy image (DGEI). Furthermore, they employed a generative adversarial network (GAN) model to predict the corresponding DGEI images without the covariates. The experimental studies on the CASIA-B, TUM-GAID, and OU-ISIR datasets verify the effectiveness of the proposed approach compared with the other studies [9].

Martinez-Hernandez et al. presented a learning architecture for gait recognition and prediction. This model comprises a convolutional neural network (CNN), a predicted information gain (PIG) module, and an adaptive combination of information sources. The outputs of the CNN and PIG modules are blended using a proposed adaptive approach that relies on the more reliable source. The experiments on walking activity and gait period recognition scored 98.63% accuracy when the CNN model was applied alone and 99.9% accuracy when the PIG module and the adaptive combination were used [12].

Weijie Sheng and Xinde Li proposed a human gait recognition and motion prediction model called the attention-enhanced temporal graph convolutional network (AT-GCN). Thanks to spatial and temporal attention, the proposed model can represent discriminative features in both spatial dependency and temporal dynamics. A multi-task learning architecture was also presented, which can simultaneously learn representations for multiple tasks. Furthermore, they introduced a new dataset called EMOGAIT, which contains 1440 real gait sequences annotated with identity and emotion labels. Experimental results revealed the robustness of the proposed model when tested on two different datasets for identity recognition and emotion recognition [13]. Liao et al. proposed a model-based gait recognition algorithm called PoseGait. It estimates the 2D pose, converts it to a 3D pose, and then extracts spatiotemporal features to enhance the gait authentication rate. The experimental results reveal that the performance of the proposed system is more robust than that of the appearance-based studies [8].

Altilio et al. used different machine learning classification algorithms to automatically classify patient movements from their gait dataset. Their method achieved a maximum accuracy of 91% with a probabilistic neural network (PNN), targeting cost-effective, home-based rehabilitation programs [14]. In [15], Saleh et al. designed a gait recognition algorithm based on a deep convolutional neural network (DCNN) and compared its performance with and without image augmentation (IA) procedures. Their experimental results scored 82% accuracy without IA and 96.23% with IA, reflecting the main contribution of image augmentation. Wen and Wang presented a gait recognition model based on sparse linear subspace. First, they extracted gait features from frame-by-frame gait energy images (ffGEIs) and then used sparse linear subspace technology for dimension reduction. Second, a new support vector machine-based gait classification technique with Gaussian radial basis function (RBF) kernels was applied for cross-view gait detection. Finally, the proposed gait authentication approach was evaluated on the CASIA-B and OU-ISIR gait databases to reveal its performance [16].

Gao et al. proposed a skeleton-based gait recognition algorithm to solve the problem of covariate conditions. The spatial and temporal features of the gait images are extracted from the spatial and temporal relationships between body joints. The feature map is decomposed to eliminate redundant features and achieve a better recognition rate in the presence of covariate factors. Their experiments on the CASIA-B and OU-MVLP-Pose databases achieve higher recognition accuracy and remarkable robustness [17]. Hasan and Mustafa proposed a gait recognition model to learn discriminative view-invariant gait representations. The proposed model is based on a stacked autoencoder that can efficiently and gradually transform skeleton joint coordinates from any arbitrary view to a common canonical view without prior prediction of the view angle or covariate type and without losing temporal information. Finally, they fused the encoded features with two other spatiotemporal gait features to feed the main recurrent neural network. Experimental results on the CASIA-A and CASIA-B gait datasets demonstrate that the proposed approach outperforms other state-of-the-art methods on single-view gait recognition [18].

In [19], Xiao Jing et al. proposed a gait recognition model called GaitGP that learns the main details through fine-grained features and the relationship between neighboring regions through global features. The GaitGP model consists of the attention feature extractor (CAFE) and the Global and Partial Feature Combiner (GPFC), which extract the global features and learn the different fine-grained features. Experimental results on CASIA-B, the OU-ISIR gait database, and OU-MVLP show that the GaitGP model is superior to current cross-view gait recognition methods. Gul et al. presented a 3D convolutional deep neural network (3D CNN) that extracts the spatiotemporal features of a gait sequence. They used gait energy images (GEIs) as input to the 3D CNN, which captures the shape and motion characteristics of the human gait. Moreover, they applied several optimization methods on the CASIA-B and OULP datasets to tune the hyperparameters. The evaluation results scored the best values on the CASIA-B dataset [20].

Lee et al. proposed a human gender recognition model based on a support vector machine (SVM) and random forest (RF), using recursive feature elimination (RFE) to select the best feature subset. Gender classification scored 99.11% with the SVM classifier, while the RF-RFE combination achieved 98.89%, indicating a robust classifier [21].

Yusuf et al. collected gait images from 26 participants, 14 males and 12 females, and then recognized them using a convolutional recurrent neural network with long short-term memory (CRNN-LSTM) by analyzing the upper half of the gait and the whole body. The experimental results confirm that the recognition accuracy from the upper half of the gait is better than that from the full body, with a lower computational cost [22]. Zhang et al. proposed an integrated network model called SBLSTM, which combines three models, a sparse autoencoder (SAE), a bidirectional long short-term memory (BiLSTM), and a deep neural network (DNN), to recognize gait during human movement. First, the SAE model extracts the key features from the gait images. Then, the BiLSTM model learns the temporal and periodic variations in the gait images. Finally, the DNN identifies and classifies the gait phases. The experimental results prove that the proposed SBLSTM model recognizes gait more effectively than a DNN or LSTM alone [23].

Dong et al. proposed a framework based on multi-source fusion to recognize human gait phases and patterns and reduce computational costs. This framework combines four well-known models, namely support vector machines (SVM), backpropagation (BP) neural networks, AlexNet, and LeNet5, with low-cost commercial sensors to confirm the performance of the proposed methodology for gait recognition. The evaluation results scored 97.7% accuracy for gait phases and 99.2% for gait patterns using the fusion model, proving the effectiveness of the proposed framework [24].

In [25], the human gait images in CASIA-B and CASIA-C were recognized using deep learning and Bayesian optimization. The authors proposed a framework that includes both parallel and sequential steps. First, they extract the optical flow-based motion regions and then enhance the video frames, which are trained separately instead of selecting a static hyperparameter. Two models are thus obtained, the original-frames model and the motion-frames model, which are combined using a proposed parallel approach called Sq-Parallel Fusion (SqPF). The Tiger optimization algorithm is enhanced into an Entropy-controlled Tiger optimization (EVcTO). Finally, an extreme learning machine (ELM) classifier classifies the selected features. The experimental results score 92.04 and 94.97% recognition accuracy on the CASIA-A and CASIA-C datasets, respectively, outperforming the other deep learning-based networks.

Ismail et al. selected the optimal CNN architecture using a genetic algorithm (GA) for the human activity recognition (HAR) task. Furthermore, the proposed search space offers a respectable level of depth because it does not place a cap on the length of the architecture that may be created. Three datasets, namely UCI-HAR, Opportunity, and DAPHNET, were tested to confirm the effectiveness of the proposed methodology. The experimental results scored accuracy values of 98.3%, 98.5% (± 1.1), and 99.14% (± 0.8) for Opportunity, UCI-HAR, and DAPHNET, respectively [26]. Teepe et al. proposed a pose estimation model to obtain the optimal skeleton pose analysis from RGB images for model-based gait identification. Moreover, they combine the gait graph with the skeleton poses using a graph convolutional network (GCN) to obtain a strong human gait recognition (HGR) model; the gait features are extracted and combined by the GCN. When evaluated on the CASIA-B dataset, the model achieved promising results compared to the existing methods [27].

3 Frameworks for gait recognition

The typical procedure of gait recognition is demonstrated in Fig. 2 and includes four basic steps: data collection and preprocessing, feature representation, dimension reduction, and recognition or classification [28]. The following subsections fully illustrate the methodology of the proposed model and its techniques, which are trained on the CASIA gait dataset [29, 30].

Fig. 2 Gait recognition overview

3.1 Methodology of the proposed model

Gait recognition is a challenging task, especially when the videos of the dataset have been captured from people doing their normal daily activities, including regular walking, high-speed walking, ascending and descending stairs, and carrying conditions [1]. Therefore, the main aim of the proposed model is to tackle the mentioned problems, improve gait recognition using fused features, and broaden the scope of gait applications to COVID tracking [31] as an alternative to more typical biometrics such as faces, fingerprints, iris scans, and DNA. The proposed model depends on concatenation fusion between appearance-based and model-based algorithms to achieve these targets and enhance the gait recognition algorithm. It passes through six main steps: step 1: convert the videos into successive frames; step 2: preprocess the frames to create silhouette images and GEIs; step 3: extract the prominent landmarks from the gait images, including the hip, knee, ankle, knee angle, etc.; step 4: extract the first features directly from the GEIs by training the proposed CNN model; step 5: extract the second features from the landmark poses data frame by training the proposed fully connected network (FCN); and finally, step 6: recognize the resulting concatenated features by training and testing the well-designed deep learning module, and then compute the main performance metrics, including precision, recall, F1-score, specificity, and training time. Figure 3 shows the flowchart of the proposed model, and Algorithm 1 presents the proposed human authentication procedure.

Fig. 3 Flowchart of the proposed model

Algorithm 1 The proposed human authentication procedure

3.2 Dataset description

Numerous gait databases have been created for gait recognition research, including CASIA, OU-ISIR, and OU-MVLP. The dataset selected for the proposed model is CASIA because it is available as RGB images rather than silhouettes only, so the human poses can be estimated. CASIA is the only gait dataset that provides the original color images; the other datasets offer silhouettes only due to privacy issues. The Institute of Automation, Chinese Academy of Sciences, created the CASIA gait datasets, which contain four different datasets: A, B, C, and D.

The CASIA gait datasets contain roughly 20 K images, varying in walking speed, view angle, and clothing. Dataset CASIA-A [32], consisting of 20 people, was created on December 10, 2001. Each person has 12 image sequences, four for each of the three image plane directions (parallel, 45, and 90 degrees). The total size of this dataset is around 2.2 GB, with 19,139 frames in the database. In January 2005, the first large multi-view gait database, CASIA-B [29], was created. It was collected from 124 subjects and captured from 11 views, each differing in view angle. Moreover, each view angle was repeated under three variations: regular walking, clothing changes, and carrying conditions. A thermal infrared camera captured the CASIA-C [33] gait images from 153 subjects in July–August 2005. The main difference among its sequences is the walking condition: normal, fast, slow, and normal walking with a carried bag. The detailed description of the used gait datasets is tabulated in Table 1. Figure 4 shows a sample of the normal walking sequence of CASIA-B [30], and Fig. 5 shows samples of the CASIA gait datasets.

Table 1 Detailed description of CASIA gait datasets
Fig. 4 Sample of normal walking sequence of CASIA-B

Fig. 5 Sample of datasets a CASIA-A, b CASIA-B, and c CASIA-C

3.3 Preprocessing procedures

One of the overarching goals of the preprocessing steps is to enhance the image quality, suppress distortion, and remove noise so that images with clearly visible gait silhouettes can be identified for phase one. First, all gait images are scaled down to 150 × 150 and converted to grayscale to decrease the computational time. In addition, subtracting the background from the foreground gait content produces isolated images. Moreover, histogram equalization is performed to enhance the contrast according to Eq. (1), with the optimal weight given by Eq. (2) [34],

$$ H\left( l \right) = h\left( {l,i} \right) * W\left( I \right) $$
(1)
$$ W\left[ I \right] = \mathop \sum \limits_{x = 1}^{M} \mathop \sum \limits_{y = 1}^{N} \frac{{n^{k} }}{{\max \left( {n^{k} } \right)}} $$
(2)

where \({n}^{k}\) is the pixel count, \(h\left(l,i\right)\) is the input histogram, and \(H\left(l\right)\) is the modified histogram.
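
For illustration, the resizing, grayscale conversion, background subtraction, and histogram equalization described above can be sketched with OpenCV as follows; the function name and the use of cv2.equalizeHist as the equalization step are illustrative assumptions rather than the exact implementation used in this work.

```python
import cv2

def enhance_frame(frame_bgr, background_bgr, size=(150, 150)):
    """Resize, convert to grayscale, subtract the background, and equalize the histogram."""
    frame = cv2.cvtColor(cv2.resize(frame_bgr, size), cv2.COLOR_BGR2GRAY)
    background = cv2.cvtColor(cv2.resize(background_bgr, size), cv2.COLOR_BGR2GRAY)
    isolated = cv2.absdiff(frame, background)   # isolate the moving subject from its background
    return cv2.equalizeHist(isolated)           # contrast enhancement, in the spirit of Eqs. (1)-(2)
```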

The isolated (ROI) images are then binarized using the Otsu threshold, which maximizes the between-class variance of foreground and background [35], as calculated by Eqs. (3) and (4),

$$ \sigma_{B}^{2} = \omega_{0} \left( {\mu_{0} - \mu_{T} } \right)^{2} + \omega_{1} \left( {\mu_{1} - \mu_{T} } \right)^{2} $$
(3)
$$ t^{*} = \arg \max_{1 \le t \le L} \sigma_{B}^{2} $$
(4)

where \({\omega }_{0}\) and \({\omega }_{1}\) represent the weights of the foreground and background classes, \({\mu }_{0}\) and \({\mu }_{1}\) are the mean gray levels of the two classes, and \({\mu }_{T}\) is the mean gray level of the entire image.

Finally, some morphological operations are applied to the isolated binary images: dilation, which finds the local maximum values in the frames, followed by filling, which removes holes by convolving the isolated image with a disk structuring element (SE) of radius one [36]. Furthermore, all binarized frames are normalized to enhance the images by generating a new intensity range from the existing one. The dilation of a frame (set) A by a structuring element B is computed as Eq. (5) [36],

$$ A \oplus B = \left\{ {x{|}\left( {B_{x} } \right) \cap A \ne \emptyset } \right\} $$
(5)

The filling operation is applied by filling the holes inside the isolated binary image, starting from any point in the ROI and proceeding until the image boundary is reached; it is defined as Eq. (6) [37],

$$ X_{K} = (X_{K - 1} \oplus B) \cap A^{c} \quad K = 1,2,3, \ldots $$
(6)

where \({X}_{0}\) is the starting point inside the hole, B is the structuring element (SE), and \({A}^{c}\) is the complement of the frame set A.
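
A minimal sketch of the Otsu binarization (Eqs. (3) and (4)), the dilation of Eq. (5), and the hole filling of Eq. (6) is given below, assuming OpenCV and SciPy; the 3 × 3 elliptical kernel stands in for the disk structuring element of radius one, and the exact structuring elements used by the authors may differ.

```python
import cv2
import numpy as np
from scipy import ndimage

def binarize_and_clean(enhanced_gray):
    """Otsu thresholding followed by dilation and morphological hole filling."""
    # Otsu selects the threshold t* that maximizes the between-class variance (Eqs. (3)-(4)).
    _, binary = cv2.threshold(enhanced_gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Dilation with a small elliptical (disk-like) structuring element, Eq. (5).
    disk = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    dilated = cv2.dilate(binary, disk, iterations=1)
    # Hole filling in the spirit of the iterative reconstruction of Eq. (6).
    filled = ndimage.binary_fill_holes(dilated > 0)
    return filled.astype(np.uint8) * 255
```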

The gait dataset is normalized to a pre-defined boundary, as calculated by Eq. (7) [38],

$$ A^{\prime} = \left( {\frac{A - \min \,value\, of\, A}{{\max \,value\, of\, A - \min \,value\, of \,A}}} \right)*\left( {R - M} \right) + M $$
(7)

where A′ is the min–max normalized data, (R, M) are the pre-defined boundaries, and A is the original data.
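
Equation (7) rescales each frame from its own intensity range to the pre-defined boundary (R, M); a minimal NumPy sketch, with the boundary values here chosen only as an example:

```python
import numpy as np

def min_max_normalize(frame, lower=0.0, upper=1.0):
    """Min-max normalization of Eq. (7): rescale frame values into [lower, upper]."""
    a_min, a_max = float(frame.min()), float(frame.max())
    if a_max == a_min:                            # flat frame: avoid division by zero
        return np.full_like(frame, lower, dtype=np.float32)
    scaled = (frame.astype(np.float32) - a_min) / (a_max - a_min)
    return scaled * (upper - lower) + lower       # (R - M) stretch plus the M offset
```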

Algorithm 2 introduces the preprocessing steps for phase 1, Fig. 6 shows the gait dataset preprocessing procedure applied in this study, and Fig. 7 shows sample gait silhouettes after the phase 1 preprocessing.

Fig. 6 Gait dataset preprocessing procedure for phase 1

Fig. 7 Gait silhouettes dataset samples after preprocessing a CASIA-A, b CASIA-B, and c CASIA-C

Algorithm 2 Preprocessing steps for phase 1

3.4 Feature extraction methods

This section describes the existing feature extraction methods for gait recognition. There are different techniques for feature representation; the common types are the appearance-based and model-based feature representation models. The main procedures of each model, as well as the proposed one, are described in detail in the following subsections [39].

3.4.1 Appearance-based feature representation model

The main aim of model-free or appearance-based feature representation is to process the human silhouette to identify a person from their gait data at different angles. Using an appearance model for feature extraction has two significant advantages: it does not need high-quality video, which allows data to be captured far from the subject [28], and it is more cost-effective than a model-based algorithm, so the appearance-based model is more popular [28]. On the other hand, model-free representation has one drawback: it depends on view and scale. For example, the gait recognition rate is reduced when the viewing angle, clothing, or carrying conditions change. The main appearance-based feature extraction methods differ in the form of the raw input data, which includes silhouettes and gait energy images (GEIs).

3.4.1.1 Silhouettes inputs

This feature representation method was the first proposed model for the gait feature; it depends on extracting the human gait from the background to focus on the region of interest (ROI) as raw input data. To obtain the silhouette dataset from RGB gait videos, the following procedures are applied [39]. First, decompose the gait videos into RGB frames, each containing foreground and background parts. Second, generate the background from the frames by computing some statistics of the background pixels, namely the covariances and the mean. Third, determine the silhouette area by computing the Mahalanobis distance [40] between each pixel and the background; based on this distance, the pixel is classified as background or foreground. Finally, the silhouette of the gait image sequence is generated. In general, the procedures described above can produce high-quality binary isolated images. However, some issues lead to segmentation errors [39]: (1) shadows; (2) the background/foreground threshold; and (3) moving objects in the background. Before using these data as input, several preprocessing procedures must be applied to make them suitable for image recognition/classification purposes. For example, Wu et al. [41] proposed a model that extracts features in two domains, spatial and temporal, from silhouette images, called the Spatial–Temporal Graph Attention Network (STGAN). This model can observe the relationship between frames and detect variations in the temporal domain to decrease the errors that affect the gait recognition rate.

3.4.1.2 Gait energy images

After the gait silhouettes have been extracted from the RGB gait videos as described in the above subsection, the gait energy images (GEIs) [39] are created by aligning and averaging the human silhouettes; recognition then relies on comparing the similarity between two GEIs [8]. A GEI is computed by Eq. (8) [42],

$$ G\left( {x,y} \right) = \frac{1}{n}\sum\nolimits_{1}^{n} {B\left( {x,y} \right)} $$
(8)

where \(n\) represents the number of silhouette frames in the gait cycle, x and y are the 2D frame coordinates, and \(B\left(x,y\right)\) is the binary gait silhouette image.
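
Equation (8) reduces to averaging the aligned binary silhouettes of one gait cycle; a short sketch, assuming the silhouettes have already been cropped and aligned to a common size:

```python
import numpy as np

def gait_energy_image(silhouettes):
    """GEI of Eq. (8) from a list of aligned binary silhouettes (H x W arrays with values 0/1)."""
    stack = np.stack([s.astype(np.float32) for s in silhouettes], axis=0)
    return stack.mean(axis=0)   # G(x, y) = (1/n) * sum_t B_t(x, y)
```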

In a GEI, a high-intensity pixel indicates that the individual’s motion frequently occurs at that location [9], which reflects the direct effect of the GEI on image representation. Figure 8 shows a gait cycle sequence of silhouettes and the corresponding GEI. Recently, some researchers have begun using human silhouettes directly as raw input data rather than GEIs, since this has outperformed previous state-of-the-art studies [43].

Fig. 8 a A cycle of gait sequence images of silhouettes and b the corresponding GEI

3.4.2 Model-based feature representation model

Despite the pros of using the appearance-based model in gait recognition, it has significant cons resulting from variations in clothing or carrying conditions. Model-based feature representation can cure these problems by modeling the human body skeleton as joint points, limbs, and static distances between joints. Moreover, it can withstand any variation in the human body and model it accurately. However, it is a complex task with a high computational cost and needs high-quality video, so it is less popular than the model-free approach [8]. The model-based or dynamic gait feature extraction model is based on identifying human gaits from the rotation patterns of the lower-body joints (hip, knee, and ankle) of both legs. These annotated points are placed at several joints [44], and the nose point refers to the center point of both sides, as shown in Fig. 9. After extracting the annotated points of the whole human body, the model-based approach also uses static distances, limb distances, and joint angles for human gait recognition [39]. For example, Sivarathinabala and Abirami [44] modeled the human skeleton as 11 joint points and then extracted four static distances: stride length, degree of toe-out, left–right ankle distance, and left–right knee distance. Moreover, they computed the dynamic angles, namely the leg–hip, leg–knee, and leg–ankle angles, and finally fused the static and dynamic features. Liao et al. [8] proposed a model-based method called PoseGait, which divides the whole human body into 18 joint points to identify gaits. They also extracted temporal features from the poses to improve the gait recognition rate.

Fig. 9 The annotated points, limb length, and knee angle extraction

3.4.3 Proposed feature representation model

After describing the standard feature representation models in the previous subsections, we present the proposed model, named the Fused Gait Feature Representation (FGFR) model, which is designed to be robust to any variation in the human body and to improve the gait recognition rate. This model is based on concatenation fusion between the appearance-based and model-based feature representations to combine features from both models. The FGFR model passes through the following procedures. First, in phase one, after the silhouette images have been segmented from the RGB gait videos, their main features are extracted by the proposed convolutional neural network (CNN). Second, in phase two, the annotated joints of the lower part of the human body are detected by the MediaPipe algorithm [45], representing seven 2D joints: LHip, LKnee, LAnkle, RHip, RKnee, RAnkle, and Nose. Then, the 3D poses are estimated from the selected 2D annotated joints to tackle the problems of changes in clothing or carrying conditions [46]; the proposed 3D human pose is defined as Eq. (9),

$$ f_{pose} = \left\{ {j_{0} ,j_{1} , \ldots j_{N} } \right\} $$
(9)

where \({j}_{i}=\left\{{x}_{i},{y}_{i},{z}_{i}\right\}\), \(i\in \left\{1,2,\dots ,N\right\}\), and N = 7 annotated points.

However, the subject size in the gait images varies according to the distance between the participant and the fixed camera. Therefore, all 3D annotated point coordinates are normalized to a fixed scale by taking the distance between the nose and the center point of the right and left hips as the unit length. The nose is selected because it lies at the origin of the human body coordinates [8]. The annotated points are thus normalized by Eq. (10),

$$ J_{N} = \frac{{j_{i} - j_{n} }}{{D_{hn} }} $$
(10)

where \({j}_{i}\in {R}^{3}\) is the location of a body joint, \({j}_{n}\) is the nose position, and \({D}_{hn}\) is the distance between the hip center and the nose point.
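
The seven landmarks and the normalization of Eq. (10) can be sketched as follows; note that MediaPipe's own depth estimate is used here as a stand-in for the 2D-to-3D lifting step of [46], so this is an illustrative approximation rather than the exact pipeline.

```python
import cv2
import numpy as np
import mediapipe as mp

mp_pose = mp.solutions.pose
JOINTS = ["NOSE", "LEFT_HIP", "RIGHT_HIP", "LEFT_KNEE",
          "RIGHT_KNEE", "LEFT_ANKLE", "RIGHT_ANKLE"]

def extract_normalized_joints(frame_bgr):
    """Detect the seven landmarks and normalize them according to Eq. (10)."""
    with mp_pose.Pose(static_image_mode=True) as pose:
        result = pose.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if result.pose_landmarks is None:
        return None
    lm = result.pose_landmarks.landmark
    joints = np.array([[lm[getattr(mp_pose.PoseLandmark, name)].x,
                        lm[getattr(mp_pose.PoseLandmark, name)].y,
                        lm[getattr(mp_pose.PoseLandmark, name)].z] for name in JOINTS])
    nose = joints[0]
    hip_center = (joints[1] + joints[2]) / 2.0
    d_hn = np.linalg.norm(nose - hip_center)   # unit length D_hn of Eq. (10)
    return (joints - nose) / d_hn              # J_N = (j_i - j_n) / D_hn
```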

Moreover, the static distances between joints are estimated, including the right–left hip, right–left knee, and right–left ankle distances; then, the limb lengths are measured as the hip–knee and knee–ankle distances using the Euclidean distance, which is calculated as Eq. (11) [47],

$$ d_{i} = \sqrt {\left( {x_{1} - x_{2} } \right)^{2} + \left( {y_{1} - y_{2} } \right)^{2} + \left( {z_{1} - z_{2} } \right)^{2} } $$
(11)

where (\({x}_{1}\), \({y}_{1}\), \({z}_{1}\)) and (\({x}_{2}\), \({y}_{2}\), \({z}_{2}\)) refer to the 3D pose coordinates of the two corresponding points, e.g., from the hip joint to the knee joint or from the knee joint to the ankle joint.
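
Given the normalized 3D joints from the previous step, the static and limb distances reduce to Euclidean distances between joint pairs (Eq. (11)); a sketch, with the joint ordering assumed to match the array returned by the previous snippet:

```python
import numpy as np

# Joint index order assumed: 0 nose, 1 left hip, 2 right hip, 3 left knee,
# 4 right knee, 5 left ankle, 6 right ankle.
def gait_distances(joints):
    """Static joint distances and limb lengths computed with Eq. (11)."""
    d = lambda a, b: float(np.linalg.norm(joints[a] - joints[b]))
    return {
        "hip_width": d(1, 2),      # right-left hip (static)
        "knee_width": d(3, 4),     # right-left knee (static)
        "ankle_width": d(5, 6),    # right-left ankle (static)
        "l_thigh": d(1, 3), "r_thigh": d(2, 4),   # hip-knee limb lengths
        "l_shank": d(3, 5), "r_shank": d(4, 6),   # knee-ankle limb lengths
    }
```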

After computing the spatial features from the static and limb distances, the dynamic features are calculated from the knee angles using the joint trajectories of the lower limb (hip, knee, and ankle), since changes in the knee angle influence the performance of the gait recognition model [48]. The left and right knee angles are computed from Eqs. (12) to (14) [8],

$$ f_{{{\text{angle}}}} = \left\{ {\left( {\alpha_{ij} ,\beta_{ij} } \right)|\left( {i,j} \right) \in \emptyset } \right\} $$
(12)
$$ \alpha_{ij} = \left\{ {\begin{array}{*{20}l} {\arctan \frac{{y_{i} - y_{j} }}{{x_{i} - x_{j} }}} \hfill & { x_{i} \ne x_{j} } \hfill \\ {\frac{\pi }{2}} \hfill & {x_{i} = x_{j} } \hfill \\ \end{array} } \right. $$
(13)
$$ \beta_{ij} = \left\{ {\begin{array}{*{20}l} {\arctan \frac{{z_{i} - z_{j} }}{{\sqrt {\left( {x_{i} - x_{j} } \right)^{2} + \left( {y_{i} - y_{j} } \right)^{2} } }}} \hfill & { \left( {x_{i} - x_{j} } \right)^{2} + \left( {y_{i} - y_{j} } \right)^{2} \ne 0} \hfill \\ {\frac{\pi }{2}} \hfill & {\left( {x_{i} - x_{j} } \right)^{2} + \left( {y_{i} - y_{j} } \right)^{2} = 0} \hfill \\ \end{array} } \right. $$
(14)

where \({f}_{angle}\) denotes the right and left knee-angle features, \({J}_{i}=({x}_{i},{y}_{i},{z}_{i})\), \({J}_{j}=({x}_{j},{y}_{j},{z}_{j})\), \(\left(i,j\right)\) is a pair of joints from the set \(\varnothing \), and \(\varnothing \) contains the hip, knee, and ankle joints.
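
Equations (13) and (14) define two angles per joint pair from the differences of the 3D coordinates; a direct NumPy transcription is sketched below, where the choice of hip–knee and knee–ankle pairs for the set \(\varnothing \) is an assumption for illustration:

```python
import numpy as np

def joint_pair_angles(j_i, j_j):
    """Angles (alpha, beta) of Eqs. (13) and (14) for one pair of 3D joints."""
    dx, dy, dz = j_i - j_j
    alpha = np.arctan(dy / dx) if dx != 0 else np.pi / 2           # Eq. (13)
    planar = np.hypot(dx, dy)
    beta = np.arctan(dz / planar) if planar != 0 else np.pi / 2    # Eq. (14)
    return alpha, beta

def knee_angle_features(joints):
    """f_angle of Eq. (12) over the hip-knee and knee-ankle pairs of both legs."""
    pairs = [(1, 3), (3, 5),   # left hip-knee, left knee-ankle
             (2, 4), (4, 6)]   # right hip-knee, right knee-ankle
    return [joint_pair_angles(joints[i], joints[j]) for i, j in pairs]
```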

Finally, the normalized lower-limb pose coordinates, spatial features, and dynamic features are tabulated in a data frame to serve as input to the second proposed deep learning (DL) structure. The normalized joints and the spatial and dynamic features for the frontal and side views are shown in Fig. 10. Algorithm 3 represents the procedure for extracting the landmarks, static and limb distances, and knee angles.

Fig. 10 Poses landmarks, limb, static distances, and knee angles in frontal and side view gaits

Algorithm 3 Extraction of landmarks, static and limb distances, and knee angles

3.5 Proposed pre-trained models

This section highlights the key components of the proposed FGFR model; Sects. 3.5.1 and 3.5.2 describe the detailed layers of the proposed appearance-based and model-based feature extraction models for the CASIA-A and CASIA-B datasets, respectively. Figures 11 and 12 show the architectures of the proposed concatenation feature representation models for the two datasets.

Fig. 11 The proposed architecture of the concatenation model between appearance- and model-based for the CASIA-A dataset

Fig. 12 The proposed architecture of the concatenation model between appearance- and model-based for the CASIA-B dataset

3.5.1 Concatenation model for CASIA-A dataset

The proposed convolutional neural network has been created to extract the phase one features from the silhouette images. It contains four convolutional layers and two max-pooling layers, each stacked after two convolutional layers. The first two convolutional layers have 32 filters with a 3 \(\times \) 3 kernel, a stride of one, no padding, and a ReLU activation function. The other two convolutional layers contain 64 filters with the same kernel size and activation function, a stride of two, and zero padding. Moreover, a dropout layer has been added to prevent overfitting and reduce local response normalization. The convolution operation is computed from Eq. (15) [49],

$$ y_{j}^{r} = f\left( {b_{j}^{r} + \sum w_{i,j}^{r - 1} *x_{i}^{r} } \right) $$
(15)

where r refers to the layer number in the proposed network, \({w}_{i,j}\) is the convolution kernel between \({x}_{i}\) and \({y}_{j}\), f is the activation function, \({x}_{i}\) is the ith input feature map, \({y}_{j}\) is the jth output feature map, and * is the convolution operator. The ReLU activation function is computed by Eq. (16) [50],

$$ f\left( x \right)_{{{\text{ReLU}}}} = \max \left( {0,x} \right) $$
(16)

Max pooling is the most popular and widely used pooling technique; it shrinks the feature map to a smaller size. The output feature map size after the pooling operation is defined by Eqs. (17) and (18) [50],

$$ h^{\prime } = \left[ {\frac{h - f}{s}} \right] $$
(17)
$$ w^{\prime} = \left[ {\frac{w - f}{s}} \right] $$
(18)

where \(h^{\prime}\) and \(w^{\prime}\) refer to the height and width of the output feature map, h and w are the height and width of the input feature map, f is the pooling size, and s is the stride size of the pooling layer. The output size of the feature map after each convolution operation is computed by Eq. (19) [51],

$$ L_{{{\text{output}}}} = \left[ {\frac{N - F + 2P}{S}} \right] + 1 $$
(19)

where N refers to the input size, F is the kernel size of each layer, P is the padding size, and S is the stride size.

Finally, the output of the mentioned convolution and pooling layers is flattened and passed through three fully connected layers of sizes 1024, 512, and 256, which extract the features of the proposed appearance-based model and remove unnecessary data from the network. In the second phase, the data frame of normalized poses, limb and static joint distances, and knee angles is passed through four fully connected layers of sizes 512, 256, 8, and 4 to extract the main features of the proposed model-based architecture. After extracting the features of the two phases, concatenation fusion combines them through a layer of eight units, as formulated in Eq. (20) [49],

$$ {\text{RF}} = \max \left( {0,\sum\nolimits_{i}^{n} {w_{i} l_{i} } + \sum\nolimits_{i}^{m} {w_{j} h_{j} + b} } \right) $$
(20)

where Ph1_F = \(\left\{{l}_{1},{l}_{2},{l}_{3},{l}_{4},\dots ,{l}_{i},\dots ,{l}_{n}\right\}\) represents the phase one features, Ph2_F = \(\left\{{h}_{1},{h}_{2},{h}_{3},{h}_{4},\dots ,{h}_{j},\dots ,{h}_{m}\right\}\) represents the phase two features, and b is the bias.

The final fully connected layer contains the output neurons; for recognition purposes, a SoftMax layer is applied, which is formulated by Eq. (21) [52],

$$ {\text{sm}}(z)_{i} = \frac{{e^{{z_{i} }} }}{{\mathop \sum \nolimits_{j = 1}^{k} e^{{z_{j} }} }} \quad {\text{for}}\, i = 1, \ldots ,k \,{\text{and}}\, z = \left( {z_{1} , \ldots , z_{k} } \right) \in R^{k} $$
(21)

where \({z}_{i}\) is the ith element of the input vector z, k is the number of classes, and the resulting values are normalized by dividing by the sum of all the exponentials. The parameters of each layer used in the CASIA-A architecture are tabulated in Table 2.

Table 2 The architecture of the proposed concatenation model between two phases based on CASIA-A dataset
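
The two-branch architecture described above can be sketched in Keras as follows; the layer sizes follow the description in the text, while the dropout rate, the phase 2 input width, and the number of output classes (20 subjects for CASIA-A) are illustrative assumptions, with Table 2 remaining the authoritative specification.

```python
from tensorflow.keras import layers, models

def build_fgfr_casia_a(pose_feature_dim=36, num_classes=20):
    """Sketch of the two-branch (appearance + pose) concatenation model for CASIA-A."""
    # Phase 1: appearance branch operating on 150 x 150 grayscale GEIs.
    img_in = layers.Input(shape=(150, 150, 1), name="gei_input")
    x = layers.Conv2D(32, 3, strides=1, padding="valid", activation="relu")(img_in)
    x = layers.Conv2D(32, 3, strides=1, padding="valid", activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Flatten()(x)
    for units in (1024, 512, 256):
        x = layers.Dense(units, activation="relu")(x)

    # Phase 2: model-based branch operating on the pose/distance/angle data frame.
    pose_in = layers.Input(shape=(pose_feature_dim,), name="pose_input")
    y = pose_in
    for units in (512, 256, 8, 4):
        y = layers.Dense(units, activation="relu")(y)

    # Concatenation fusion (Eq. (20)) followed by the SoftMax recognizer (Eq. (21)).
    fused = layers.Concatenate()([x, y])
    fused = layers.Dense(8, activation="relu")(fused)
    out = layers.Dense(num_classes, activation="softmax")(fused)
    return models.Model(inputs=[img_in, pose_in], outputs=out)
```

The CASIA-B variant of Sect. 3.5.2 would differ only in the phase two dense sizes (1024, 512, 100, 8, 4) and in the width of the final output layer.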

3.5.2 Concatenation model for CASIA-B dataset

For the CASIA-B dataset, the structure of phase one is the same as that of phase one for CASIA-A, while phase two contains five fully connected layers of sizes 1024, 512, 100, 8, and 4, since it requires a deeper network than CASIA-A. Moreover, after the concatenation of the two phases, the last fully connected layer has eleven output neurons. Table 3 describes the parameters of each layer used in the proposed CASIA-B architecture.

Table 3 The architecture of the proposed concatenation model between two phases based on CASIA-B dataset

4 Results and discussion

Before discussing the results of the proposed models, the main performance evaluation metrics used to verify the models’ robustness in the training and testing phases are introduced. These metrics are computed from the confusion matrix, in which each row represents one class of the model and which relates the predicted values to the actual values of the recognition results. To calculate the performance metric values, some quantities must first be computed from the confusion matrix, including the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) values. The confusion matrix elements are computed from the following equations [53]:

$$ TP_{{{\text{Class}}\, X}} = C_{i,i} $$
(22)
$$ {\text{FN}}_{{{\text{Class}} \,X}} = \sum\nolimits_{l = 1}^{N} {C_{i,l} } - {\text{TP}}_{{{\text{Class}}\, X}} $$
(23)
$$ {\text{FP}}_{{{\text{Class}}\, X}} = \sum\nolimits_{l = 1}^{N} {C_{l,i} } - {\text{TP}}_{{{\text{Class}}\, X}} $$
(24)
$$ {\text{TN}}_{{{\text{Class }}\,X}} = \sum\nolimits_{l = 1}^{N} {\sum\nolimits_{k = 1}^{N} {C_{l,k} } } - \left( {{\text{FP}}_{{{\text{Class}} \,X}} + {\text{FN}}_{{{\text{Class}}\, X}} + {\text{TP}}_{{{\text{Class}} \,X}} } \right) $$
(25)

where \(C_{i,i}\) refers to the number of samples correctly recognized for class X, \(C_{i,l}\) counts the samples of class X mistakenly recognized as another class, \(C_{l,i}\) counts the samples of other classes mistakenly recognized as class X, and the double sum over \(C_{l,k}\) is the total number of samples.
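
Equations (22) to (25) translate directly into a few NumPy reductions over the confusion matrix; the sketch below assumes rows hold the actual classes and columns the predicted classes:

```python
import numpy as np

def per_class_elements(cm):
    """TP, FN, FP, and TN per class from an N x N confusion matrix (Eqs. (22)-(25))."""
    cm = np.asarray(cm, dtype=np.int64)
    tp = np.diag(cm)                   # Eq. (22): diagonal entries C_ii
    fn = cm.sum(axis=1) - tp           # Eq. (23): row sum minus TP
    fp = cm.sum(axis=0) - tp           # Eq. (24): column sum minus TP
    tn = cm.sum() - (tp + fn + fp)     # Eq. (25): all samples minus the rest
    return tp, fn, fp, tn
```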

These elements are then computed for the proposed concatenated models to evaluate their performance metrics, including accuracy, sensitivity, specificity, precision, false discovery rate, F1-score, recall, and training time [54]. The accuracy of the FGFR model is computed from Eq. (26),

$$ {\text{Accuracy}} = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}}} \times 100 $$
(26)

The sensitivity or true positive rate (TPR) measures the proportion of positive samples that are correctly recognized within a dataset, as given by Eq. (27) [55],

$$ {\text{TPR}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}} \times 100 $$
(27)

The specificity or true negative rate (TNR) measures the proportion of negative samples that are correctly recognized within a dataset, as calculated from Eq. (28) [55],

$$ {\text{TNR}} = \frac{{{\text{TN}}}}{{{\text{TN}} + {\text{FP}}}} \times 100 $$
(28)

Precision refers to the proportion of samples predicted as positive that are actually positive, computed from Eq. (29) [56],

$$ {\text{PPV}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FP}}}} \times 100 $$
(29)

Recall represents the proportion of actual positive samples that are correctly predicted by the model, calculated from Eq. (30) [56],

$$ R = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}} \times 100 $$
(30)

F1-score computes the harmonic mean of recall and precision, which is calculated from Eq. (31) [57],

$$ F1 - {\text{score}} = \frac{{2{\text{TP}}}}{{2{\text{TP}} + {\text{FP}} + {\text{FN}}}} \times 100 $$
(31)

The false discovery rate (FDR) measures the proportion of positive predictions that are false, computed from Eq. (32) [58],

$$ {\text{FDR}} = \frac{{{\text{FP}}}}{{{\text{FP}} + {\text{TP}}}} \times 100 $$
(32)

The false negative rate (FNR) value is obtained from Eq. (33) [59],

$$ {\text{FNR}} = \frac{{{\text{FN}}}}{{{\text{FN}} + {\text{TP}}}} \times 100 $$
(33)
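
The metric equations above can be grouped into a small helper that consumes the per-class TP, FN, FP, and TN arrays from the previous sketch and returns percentages; this is an illustrative utility, not the authors' evaluation code:

```python
import numpy as np

def gait_metrics(tp, fn, fp, tn):
    """Per-class metrics of Eqs. (26)-(33), returned as percentages."""
    tp, fn, fp, tn = (np.asarray(a, dtype=np.float64) for a in (tp, fn, fp, tn))
    return {
        "accuracy": 100 * (tp + tn) / (tp + tn + fp + fn),       # Eq. (26)
        "sensitivity": 100 * tp / (tp + fn),                     # Eq. (27); equals recall, Eq. (30)
        "specificity": 100 * tn / (tn + fp),                     # Eq. (28)
        "precision": 100 * tp / (tp + fp),                       # Eq. (29)
        "f1_score": 100 * 2 * tp / (2 * tp + fp + fn),           # Eq. (31)
        "fdr": 100 * fp / (fp + tp),                             # Eq. (32)
        "fnr": 100 * fn / (fn + tp),                             # Eq. (33)
    }
```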

To evaluate the proposed experiments, we used the following configuration. First, both phases of the dataset have been divided into three standard sets, the training, testing, and validation sets, with a 7:1.5:1.5 ratio. The proposed network is compiled and fitted according to the following tuning hyperparameters: batch normalization, which reduces the internal shift of the activation layers, with a batch size of 32; a learning rate of 0.001, which defines the parameter update step size; and a momentum factor of 0.5, which enhances training speed and accuracy. Furthermore, the Adam optimizer is selected to train the proposed models, as it consumes less memory and computational power than the other optimizers [50]. Because all the datasets used have multiple output classes, the chosen loss function is sparse categorical cross-entropy [60]. Table 4 describes the main hyperparameters utilized in the proposed models in this article. All the proposed models were run on a PC with the following specifications: Microsoft Windows 10 operating system, 7-core processor @ 4.0 GHz, 12 GB of RAM, and an NVIDIA Tesla GPU with 16 GB of memory.

Table 4 The tuning hyperparameters
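
An illustrative training configuration following Table 4 is sketched below; it reuses the model builder from the sketch in Sect. 3.5.1, maps the momentum factor of 0.5 onto Adam's beta_1 parameter, and uses placeholder array names and an assumed epoch count, so it should be read as an example rather than the authors' exact setup.

```python
import tensorflow as tf

# Illustrative compile/fit settings following Table 4 (batch size 32, learning rate 0.001,
# momentum 0.5, Adam, sparse categorical cross-entropy); names below are placeholders.
model = build_fgfr_casia_a()   # model builder from the sketch in Sect. 3.5.1
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.5)  # momentum mapped to beta_1
model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy",   # integer subject labels
              metrics=["accuracy"])

# 70% training / 15% validation / 15% testing split, batch size 32 (epoch count assumed).
# history = model.fit([gei_train, pose_train], y_train,
#                     validation_data=([gei_val, pose_val], y_val),
#                     epochs=50, batch_size=32)
```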

4.1 Evaluation of CASIA-A dataset

Using the optimal tuning parameters tabulated in Table 4, the proposed concatenated model has been evaluated on the CASIA-A dataset, which contains three different view angles: parallel, 45, and 90 degrees. The performance metrics on the CASIA-A dataset have been computed from the confusion matrix data shown in Fig. 13. The main elements extracted from the confusion matrix are a true positive value of 952.33, a true negative value of 1909.33, and false positive and false negative values of 4.66 each. Applying these values to the performance metric equations, Eqs. (26) to (33), yields the values listed in Table 5, and the detailed accuracy of each label is listed in Table 6. Figure 14 shows the loss and accuracy curves of the proposed fused model, which scored 99.6% accuracy in a total running time of nearly 4 min 25 s.

Fig. 13 The confusion matrix of the proposed model based on CASIA-A dataset

Table 5 Proposed model evaluation metrics results based on CASIA-A dataset
Table 6 The detailed accuracy of each label
Fig. 14 Accuracy and loss curves of the proposed model based on CASIA-A dataset

4.2 Evaluation of CASIA-B dataset

In addition, the proposed model has been tested on the CASIA-B dataset with the same optimal tuning parameters listed in Table 4. The experimental results of the proposed model have been extracted from the confusion matrix in Fig. 15; the model scored 99.8% accuracy in a total running time of nearly 5 min 57 s, and the other performance evaluation metrics are listed in Table 7. The results of the three hold-out experiments on the CASIA-B dataset are listed in Table 8; the view angles of 36°, 54°, and 144° scored higher accuracy values than the other view angles. Furthermore, the proposed concatenated model scores high gait authentication rates under the various conditions (nm, cl, bg), owing to the use of the fused features of human poses and silhouette images. The experimental results indicate that the FGFR model scores high accuracy for all subject view angles and walking conditions. Furthermore, Fig. 16 shows the loss and accuracy curves, with the number of epochs on the x-axis and the corresponding loss and accuracy values on the y-axis.

Fig. 15 The confusion matrix of the proposed model based on CASIA-B dataset

Table 7 Proposed model evaluation metrics results based on the CASIA-B dataset
Table 8 Detailed accuracy of the CASIA-B dataset
Fig. 16 Accuracy and loss curves of the proposed model based on CASIA-B dataset

Moreover, the proposed model has been run three times under different walking conditions, and Table 9 lists the corresponding results. Figure 17 shows the loss and accuracy curves for the three scene conditions, and Fig. 18 shows their precision–recall curves. Finally, a complete comparison between the proposed model on the CASIA-B dataset and the existing state-of-the-art studies, including [61,62,63] and [64], is listed in Table 10. As the table shows, the proposed algorithm outperforms the recent studies, scoring 99.8% recognition accuracy with a low training time.

Table 9 Proposed model evaluation metrics results based on three different conditions of CASIA-B dataset
Fig. 17 The loss and accuracy curves of the walking scene on CASIA-B dataset a normal walking, b wearing a coat, and c wearing a bag

Fig. 18 Classes precision–recall curves of the walking scene data a normal walking, b wearing a coat, and c wearing a bag

Table 10 Various studies on CASIA-B-based accuracy

In addition, we performed a complete comparison between the results of our proposed concatenated model and the commonly used pre-trained models, including LeNet, AlexNet, GoogLeNet, Xception, and ResNet, listed in Table 11, and then with the current state-of-the-art studies to show the robustness of the proposed model, listed in Table 12. The results show that the proposed system scored 99.8% on the CASIA-B dataset in 1.17 s and 99.6% on the CASIA-A dataset in 0.29 s. From these results, we note the ability of the proposed model to enhance and recognize the gait images of subjects regardless of any covariate factors (bags or clothes) with a low training time, improving the accuracy value by 0.7% over [64] and by 9.48% over the study in [68].

Table 11 Performance results on Casia (B) gait dataset
Table 12 Comparative studies between the proposed FGFR model and the recent gait articles

5 Conclusions

This paper proposes a novel model for improving the recognition rate of humans from their gait. The proposed system combines model-based and model-free features to limit the effect of covariate factors in the dataset, such as carried bags or coats. In the model-free phase, the first step is to split the RGB gait videos into frames. These frames are preprocessed, enhanced, and segmented from the background to obtain silhouette images, which are the input to the proposed CNN that forms the first feature vector. In the model-based phase, the annotated joints and the limb and static joint distances are estimated from the poses in the RGB gait frames; these are the input to the proposed fully connected deep structure that extracts the second feature vector. To build the deep recognizer, concatenation fusion is applied to the two feature vectors. A complete comparison study has been carried out between the proposed model and other recent studies. Experimental results show that the proposed model outperforms other current techniques, scoring 99.8% and 99.6% accuracy on the CASIA-B and CASIA-A datasets with a noticeably low training time. Finally, the proposed model can quickly identify and authenticate humans from their gait images, providing a faster and more accurate methodology than other recent state-of-the-art techniques.