CN107480600A - A kind of gesture identification method based on depth convolutional neural networks - Google Patents
A kind of gesture identification method based on depth convolutional neural networks
- Publication number
- CN107480600A (application CN201710597440.3A)
- Authority
- CN
- China
- Prior art keywords
- convolutional neural
- neural networks
- training
- hand
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/107—Static hand or arm
- G06V40/113—Recognition of static hand signs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
Description
Technical Field
The invention relates to the field of biometric recognition, and in particular to a gesture recognition method based on a deep convolutional neural network.
Background Art
Biometric recognition is one of the key technologies in fields such as video surveillance and security authentication. Biometric traits can be divided into physiological traits and behavioral traits. Physiological traits mainly include the face, fingerprints, and the iris, while behavioral traits include gait, gestures, and so on. Typical recognition methods based on physiological traits are fingerprint recognition, palm shape and contour recognition, face recognition, and iris recognition. Fingerprint recognition is currently one of the most widely used biometric identification methods; the technology is mature and inexpensive, but it is contact-based and therefore intrusive, it raises hygiene concerns, and fingerprints are prone to wear. Face recognition has been a very active research field in recent years; it is intuitive, convenient, friendly, and readily accepted, and, being non-contact and passive, it requires no active cooperation from the subject. Its drawback is that it is easily affected by illumination, viewing angle, occlusion, environment, and expression, which makes recognition difficult. Iris recognition offers very high security and accuracy, but acquiring the iris features is difficult. Among identification technologies based on behavioral traits, gait recognition and gesture recognition are the most common. The input to gait recognition is a sequence of walking video frames; the large data volume leads to high computational complexity and makes processing difficult. Gesture recognition, by contrast, is an important component of contactless human-computer interaction. Most researchers currently concentrate on the final recognition step: the background is usually simplified, the gesture is segmented from that single background by the algorithm under study, and the meaning expressed by the gesture is then determined by the system using common recognition methods.
Summary of the Invention
The object of the present invention is to provide a gesture recognition method based on a deep convolutional neural network that addresses the deficiencies of the prior art.
This object is achieved through the following technical solution: a gesture recognition method based on a deep convolutional neural network, comprising the following steps:
(1) Edge detection and dataset partitioning for the training samples: first perform hand detection and edge detection on the training gesture images and resize the extracted hand images to a uniform size; then divide the preprocessed data into a training sample set and a validation sample set.
(2) Construct the deep convolutional neural network: let I and O be the input and output layers of the network, and let H1, H2, …, Hn be the hidden layers between I and O. The input layer takes the hand images obtained in step (1), the output layer is a gesture feature vector of length N, and the hidden layers use multiple down-sampling in which the down-sampled blocks are allowed to overlap.
(3) Determine the activation function and loss function: select the nonlinear hyperbolic tangent function shown in formula (1) as the neuron activation function and the loss function shown in formula (2), where n is the number of samples in the training set, x is a hand image, y is the gesture feature vector corresponding to x, and θ is the parameter vector (a hedged reconstruction of both formulas is given after this list).
(4) Train the deep neural network: select m training samples from the training sample set and compute the gradient with steepest gradient descent; then validate on the validation sample set, and end training once the accuracy exceeds the preset threshold of 99.5%, yielding a deep neural network with fixed weights w and bias terms b.
(5) Perform gesture recognition with the trained deep convolutional neural network: a) extract the hand image from the gesture data to be recognized; b) apply edge detection and size normalization to the hand image; c) feed the hand image into the deep convolutional neural network and determine the class of the current gesture from the output-layer values.
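Formulas (1) and (2) are cited above but their mathematical markup did not survive extraction. As a hedged reconstruction consistent with the surrounding description — a hyperbolic-tangent activation and a squared-error loss over the n training pairs — the standard forms would be:

```latex
% (1) hyperbolic-tangent activation of a neuron input z
f(z) = \tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}

% (2) squared-error loss over the n training pairs (x^{(i)}, y^{(i)}),
%     where h_{\theta}(x) is the network output for parameter vector \theta
J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \big\lVert h_{\theta}\big(x^{(i)}\big) - y^{(i)} \big\rVert^{2}
```

The normalization constant (1/2n versus 1/n) is an assumption; the patent text does not preserve it.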
Beneficial effects: the gesture recognition method of the present invention constructs the deep convolutional neural network with multiple (overlapping) down-sampling and trains it with the hyperbolic tangent activation function, which improves both the efficiency and the accuracy of gesture recognition.
Brief Description of the Drawings
Figure 1 is the implementation flow of the method;
Figure 2 is a schematic diagram of hand image extraction;
Figure 3 is a schematic diagram of multiple (overlapping) down-sampling;
Figure 4 is the hyperbolic tangent function curve;
Figure 5 gives the error-rate data of the recognition-rate comparison.
Detailed Description
The present invention is described in detail below with reference to the accompanying drawings.
As shown in Figure 1, the gesture recognition method based on a deep convolutional neural network is divided into a training stage and a recognition stage.
In the training stage, a deep convolutional neural network is constructed and the values of its weight and bias parameters are determined from the training set. This comprises the following sub-steps:
1. Extract hand images from the training data: for a frame of the gesture interaction, as shown in Figure 2(a), a rough hand-region image is first extracted according to skin-color features, as shown in Figure 2(b); the hand-region image is then corrected by filtering followed by fast morphological dilation and erosion, giving a refined hand image, as shown in Figure 2(c).
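A minimal sketch of this sub-step, assuming OpenCV, a YCrCb skin-color threshold, and a 7×7 elliptical structuring element (the threshold values, filter size, and kernel size are illustrative assumptions, not values stated in the patent):

```python
import cv2
import numpy as np

def extract_hand_region(frame_bgr):
    """Roughly segment the hand by skin color, then clean the mask with
    median filtering and morphological dilation/erosion (closing, then opening)."""
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    # Commonly used skin-color range in the Cr/Cb channels (assumed values).
    lower = np.array([0, 133, 77], dtype=np.uint8)
    upper = np.array([255, 173, 127], dtype=np.uint8)
    mask = cv2.inRange(ycrcb, lower, upper)

    mask = cv2.medianBlur(mask, 5)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)   # dilation then erosion
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)    # erosion then dilation

    return cv2.bitwise_and(frame_bgr, frame_bgr, mask=mask), mask
```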
2. Edge detection and size normalization of the hand image: apply edge detection and binarization to the hand image obtained in sub-step 1 to obtain a preliminary hand-contour image, as shown in Figure 2(d); then trim and refine the hand-contour curve and resize it to the uniform size to obtain the corresponding hand image.
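A hedged sketch of this sub-step, assuming Canny edge detection and a 64×64 target size (the patent names neither the edge detector nor the uniform size):

```python
import cv2

def normalize_hand_contour(hand_bgr, size=(64, 64)):
    """Edge detection and binarization, then resizing to a uniform input size."""
    gray = cv2.cvtColor(hand_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)   # Canny already yields a 0/255 binary edge map
    # Nearest-neighbor resizing keeps the contour image binary.
    return cv2.resize(edges, size, interpolation=cv2.INTER_NEAREST)
```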
3. Construct the deep convolutional neural network: for the hand images obtained in sub-step 2, a deep convolutional neural network is built for gesture recognition. Let I and O be the input and output layers of the network, and let H1, H2, …, Hn be the hidden layers between I and O. The input layer takes the hand image obtained in sub-step 2, the output layer is a gesture feature vector of length N, and the hidden layers use multiple down-sampling in which the down-sampled blocks are allowed to overlap, as shown in Figure 3.
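The PyTorch sketch below illustrates one network of this kind: convolutional hidden layers with hyperbolic-tangent activations and overlapping max-pooling (pooling window larger than the stride, so adjacent pooled blocks overlap), ending in a length-N gesture feature vector. The channel counts, kernel sizes, and N = 10 are assumptions for illustration; the patent does not specify them.

```python
import torch.nn as nn

class GestureNet(nn.Module):
    """Illustrative deep CNN with overlapping down-sampling (window 3, stride 2)."""
    def __init__(self, num_classes: int = 10):   # N = 10 is an assumed class count
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2), nn.Tanh(),
            nn.MaxPool2d(kernel_size=3, stride=2),    # overlapping pooling: window 3 > stride 2
            nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.Tanh(),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.Tanh(),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(128), nn.Tanh(),        # lazy layer avoids hard-coding the flattened size
            nn.Linear(128, num_classes),          # length-N gesture feature vector
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```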
4. Determine the activation function and loss function: as shown in Figure 4, the nonlinear hyperbolic tangent function is chosen as the neuron activation function, and the squared-error function shown in formula (2) is chosen as the loss function.
5. Train the deep convolutional neural network:
Select m training samples from the training sample set, compute the gradient with steepest gradient descent, and iteratively optimize the parameters of each hidden layer according to the loss function; then validate on the validation sample set, and end training once the accuracy exceeds the preset threshold of 99.5%, yielding a deep neural network with fixed weights w and bias terms b.
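A hedged sketch of this training loop under the assumptions above: mini-batches of m samples, plain SGD as the steepest-gradient-descent step, a squared-error loss against one-hot gesture vectors, and early stopping once validation accuracy exceeds 99.5%. The learning rate, epoch limit, and one-hot target encoding are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train(model, train_loader, val_loader, num_classes=10,
          lr=0.01, max_epochs=100, target_acc=0.995):
    opt = torch.optim.SGD(model.parameters(), lr=lr)   # steepest gradient descent
    loss_fn = nn.MSELoss()                             # squared-error loss, formula (2)
    for _ in range(max_epochs):
        model.train()
        for x, y in train_loader:                      # x: hand images, y: class indices
            opt.zero_grad()
            target = F.one_hot(y, num_classes).float() # gesture feature vector of length N
            loss = loss_fn(model(x), target)
            loss.backward()
            opt.step()
        # Validation: stop once accuracy passes the preset 99.5% threshold.
        model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for x, y in val_loader:
                correct += (model(x).argmax(dim=1) == y).sum().item()
                total += y.numel()
        if correct / total > target_acc:
            break
    return model
```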
In the recognition stage, the hand image is first extracted from the data to be recognized and is then edge-detected and size-normalized; it is then fed into the trained deep convolutional neural network to determine the class of the current gesture. Finally, the method was compared with a data-glove method and with the sequence similarity detection (SSDA) method; Figure 5 gives the error-rate data from these experiments.
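Putting the stages together, a minimal recognition-stage sketch reusing the hypothetical helpers defined above (extract_hand_region, normalize_hand_contour, and a trained GestureNet):

```python
import torch

def recognize_gesture(frame_bgr, model):
    """Classify one frame: hand extraction, preprocessing, forward pass, argmax."""
    hand, _ = extract_hand_region(frame_bgr)
    contour = normalize_hand_contour(hand)                        # 64x64 binary image
    x = torch.from_numpy(contour).float().div(255).view(1, 1, 64, 64)
    model.eval()
    with torch.no_grad():
        scores = model(x)
    return scores.argmax(dim=1).item()                            # index of the gesture class
```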
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710597440.3A CN107480600A (en) | 2017-07-20 | 2017-07-20 | A kind of gesture identification method based on depth convolutional neural networks |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710597440.3A CN107480600A (en) | 2017-07-20 | 2017-07-20 | A kind of gesture identification method based on depth convolutional neural networks |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107480600A true CN107480600A (en) | 2017-12-15 |
Family
ID=60595146
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710597440.3A Pending CN107480600A (en) | 2017-07-20 | 2017-07-20 | A kind of gesture identification method based on depth convolutional neural networks |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107480600A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108334814A (en) * | 2018-01-11 | 2018-07-27 | 浙江工业大学 | A kind of AR system gesture identification methods based on convolutional neural networks combination user's habituation behavioural analysis |
CN109117742A (en) * | 2018-07-20 | 2019-01-01 | 百度在线网络技术(北京)有限公司 | Gestures detection model treatment method, apparatus, equipment and storage medium |
CN109890573A (en) * | 2019-01-04 | 2019-06-14 | 珊口(上海)智能科技有限公司 | Control method, device, mobile robot and the storage medium of mobile robot |
CN111338470A (en) * | 2020-02-10 | 2020-06-26 | 烟台持久钟表有限公司 | Method for controlling big clock through gestures |
CN112889075A (en) * | 2018-10-29 | 2021-06-01 | Sk电信有限公司 | Improving prediction performance using asymmetric hyperbolic tangent activation function |
CN113703581A (en) * | 2021-09-03 | 2021-11-26 | 广州朗国电子科技股份有限公司 | Window adjusting method based on gesture switching, electronic whiteboard and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106529470A (en) * | 2016-11-09 | 2017-03-22 | 济南大学 | Gesture recognition method based on multistage depth convolution neural network |
US20170161607A1 (en) * | 2015-12-04 | 2017-06-08 | Pilot Ai Labs, Inc. | System and method for improved gesture recognition using neural networks |
- 2017-07-20 CN CN201710597440.3A patent/CN107480600A/en active Pending
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170161607A1 (en) * | 2015-12-04 | 2017-06-08 | Pilot Ai Labs, Inc. | System and method for improved gesture recognition using neural networks |
CN106529470A (en) * | 2016-11-09 | 2017-03-22 | 济南大学 | Gesture recognition method based on multistage depth convolution neural network |
Non-Patent Citations (2)
Title |
---|
LEE的白板报: "卷积神经网络初探" (A First Look at Convolutional Neural Networks), Stanford deep learning notes, https://my.oschina.net/findbill/blog/550565 *
蔡娟 (Cai Juan): "基于卷积神经网络的手势识别" (Gesture Recognition Based on Convolutional Neural Networks), China Master's Theses Full-text Database, Information Science and Technology series *
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108334814A (en) * | 2018-01-11 | 2018-07-27 | 浙江工业大学 | A kind of AR system gesture identification methods based on convolutional neural networks combination user's habituation behavioural analysis |
CN108334814B (en) * | 2018-01-11 | 2020-10-30 | 浙江工业大学 | A gesture recognition method for AR system |
CN109117742A (en) * | 2018-07-20 | 2019-01-01 | 百度在线网络技术(北京)有限公司 | Gestures detection model treatment method, apparatus, equipment and storage medium |
CN109117742B (en) * | 2018-07-20 | 2022-12-27 | 百度在线网络技术(北京)有限公司 | Gesture detection model processing method, device, equipment and storage medium |
CN112889075A (en) * | 2018-10-29 | 2021-06-01 | Sk电信有限公司 | Improving prediction performance using asymmetric hyperbolic tangent activation function |
CN112889075B (en) * | 2018-10-29 | 2024-01-26 | Sk电信有限公司 | Improved predictive performance using asymmetric hyperbolic tangent activation function |
CN109890573A (en) * | 2019-01-04 | 2019-06-14 | 珊口(上海)智能科技有限公司 | Control method, device, mobile robot and the storage medium of mobile robot |
CN109890573B (en) * | 2019-01-04 | 2022-05-03 | 上海阿科伯特机器人有限公司 | Control method and device for mobile robot, mobile robot and storage medium |
CN111338470A (en) * | 2020-02-10 | 2020-06-26 | 烟台持久钟表有限公司 | Method for controlling big clock through gestures |
CN111338470B (en) * | 2020-02-10 | 2022-10-21 | 烟台持久钟表有限公司 | Method for controlling big clock through gestures |
CN113703581A (en) * | 2021-09-03 | 2021-11-26 | 广州朗国电子科技股份有限公司 | Window adjusting method based on gesture switching, electronic whiteboard and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107480600A (en) | A kind of gesture identification method based on depth convolutional neural networks | |
CN107609497B (en) | Real-time video face recognition method and system based on visual tracking technology | |
CN104463100B (en) | Intelligent wheel chair man-machine interactive system and method based on human facial expression recognition pattern | |
CN105825176B (en) | Identification method based on multimodal contactless identity features | |
CN101777131B (en) | Method and device for identifying human face through double models | |
CN100458832C (en) | Palm grain identification method based on direction character | |
CN102915436B (en) | Sparse representation face recognition method based on intra-class variation dictionary and training image | |
CN100395770C (en) | A Hand Feature Fusion Authentication Method Based on Feature Relationship Measurement | |
CN108427921A (en) | A kind of face identification method based on convolutional neural networks | |
CN100576230C (en) | Similar fingerprint recognition system and method for twins based on local structure | |
Fei et al. | Jointly heterogeneous palmprint discriminant feature learning | |
CN106355138A (en) | Face recognition method based on deep learning and key features extraction | |
CN102332084B (en) | Identity identification method based on palm print and human face feature extraction | |
CN111126307B (en) | Small sample face recognition method combining sparse representation neural network | |
CN102831390A (en) | Human ear authenticating system and method | |
CN103034847B (en) | A kind of face identification method based on hidden Markov model | |
CN110555380A (en) | Finger vein identification method based on Center Loss function | |
CN106446867A (en) | Double-factor palmprint identification method based on random projection encryption | |
CN103440480B (en) | Non-contact palmprint recognition method based on palmprint image registration | |
CN101571924A (en) | Gait recognition method and system with multi-region feature integration | |
CN110490107A (en) | A kind of fingerprint identification technology based on capsule neural network | |
CN105138974A (en) | Gabor coding based finger multimodal feature fusion method | |
CN103984922A (en) | Face identification method based on sparse representation and shape restriction | |
CN108596269A (en) | A kind of recognizer of the plantar pressure image based on SVM+CNN | |
CN107315995B (en) | Face recognition method based on Laplace logarithmic face and convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20171215 |
RJ01 | Rejection of invention patent application after publication | |