CN101377813A

CN101377813A - Method for real time tracking individual human face in complicated scene

Info

Publication number: CN101377813A
Application number: CNA2008102003102A
Authority: CN
Inventors: 寇超; 白琮; 陈泉林; 王华红; 王少波
Original assignee: SHANGHAI UNIVERSITY
Current assignee: Haian Soochow University Advanced Robot Research Institute
Priority date: 2008-09-24
Filing date: 2008-09-24
Publication date: 2009-03-04
Anticipated expiration: 2028-09-24
Also published as: CN101377813B

Abstract

The invention relates to a real-time tracking method for a single human face in a complex scene. The steps of the method are: (1) picture preprocessing: linear transformation and Gaussian template filtering are applied to each picture captured from the video; (2) face detection using the skin color feature: the face is detected by exploiting the clustering of skin color in the YCrCb color space, and morphological operations are then performed; (3) elimination of skin-like areas: the area and geometric features of the face are combined to eliminate non-face skin-like areas, and once the face area has been obtained its center of gravity is calculated and used as the center of the initial search area; (4) face tracking: the face detection result is used as a new type of back-projection map, on which the continuous adaptive mean shift method performs face tracking. The invention combines multiple features to extract and track human faces; its results are stable and its computational complexity is low, and it can be widely used in fields such as video surveillance and human-computer interaction interfaces.

Description

Real-time Tracking Method for Individual Faces in Complex Scenes

Technical field

The invention relates to the fields of human-computer interface interaction and video surveillance, and in particular to the real-time tracking of a single human face in complex scenes. Specifically, it is a continuous mean shift method that takes multiple features into account and uses a new type of back-projection map.

Background art

In applications such as human-computer interaction and video surveillance, the face is one of the salient features for identifying a person, so the quality of face detection and tracking directly affects the overall performance of the system. Over the past decades, many researchers have studied the face tracking problem extensively. The resulting methods fall mainly into four groups: those based on face models, on feature matching, on motion information, and on skin color. However, factors such as head rotation, illumination, shadows, and occlusion have always made face tracking difficult to achieve effectively in real, complex environments.

A review of the existing technical literature shows that, among the various tracking methods, the continuous mean shift algorithm is comparatively satisfactory. It was proposed by G. R. Bradski in the 1998 article "Computer Vision Face Tracking for Use in a Perceptual User Interface", published in the Intel Technology Journal, and was successfully applied to machine-vision capture of face motion in the human-computer interface of computer games. Building on the mean shift algorithm, the article uses histogram statistics of the H (hue) color channel inside the search area to compute how closely the pixels of each subsequent video frame match the skin color probability, and tracks the face using this feature. However, the histogram statistics of the H channel are easily disturbed by background pixels inside the search area, which reduces the accuracy of the computed skin-likeness probability and prevents good tracking. In addition, the initial search area in that article must be set manually.

Summary of the invention

The purpose of the present invention is to overcome the above deficiencies of the prior art and to provide a real-time tracking method for a single face in a complex scene. The binarized result of face detection is used to represent the probability that each pixel in the image is skin-like, and the face is tracked using this feature. At the same time, the initial search area is determined automatically from the face detection result of the first frame, so that the algorithm runs without manual setup.

To achieve the above purpose, the present invention adopts the following technical solution:

A real-time tracking method for a single human face in a complex scene, the steps of which include:

(1) Picture preprocessing: to reduce the interference of ambient lighting and image noise, the image is first subjected to a piecewise grayscale linear transformation and Gaussian filtering;

(2) Face detection using skin color features: the face is detected by exploiting the clustering of skin color in the YCrCb color space, and detection noise is removed by morphological operations;

(3) Elimination of skin-like areas: skin-like areas are further eliminated by combining an area threshold with a constraint on the height-to-width ratio of the face; the center point of the face area is determined by calculating the spatially weighted center (centroid) of the face area, and the initial search area is set with this point as its center;

(4) Face tracking: the binarized result of face detection is used as the tracked feature, and tracking is realized with the continuous adaptive mean shift method.

Experimental results show that this method can automatically track the face in the picture in real time without manual intervention, with an accuracy above 90%.

Each step is further explained below:

(1) Picture preprocessing: this includes a grayscale linear transformation and Gaussian template filtering. Complex lighting in the environment produces highlighted or dim areas on the face, and these areas affect the skin color decision. The brightness of the picture therefore needs to be adjusted linearly so that it becomes uniform overall. In addition, the hardware of the video capture device inevitably introduces noise into the image, which also affects the decision on skin-like pixel areas. Convolving the image with a Gaussian template removes noise well while largely preserving edges.

(2) Face detection using skin color features: among the various color spaces, skin color clusters well in the Cr and Cb channels of YCrCb. Suitable segmentation thresholds can be obtained from experimental observation, which in turn produce good skin color segmentation results. Morphological operations are then applied to the initial detection result to eliminate small holes and detection noise.

(3) Elimination of skin-like areas: detecting faces from skin color information has many advantages, but relying on this single cue always suffers interference from skin-like areas, and these areas cannot be eliminated by morphological filtering alone. Some skin-like areas are large and require several erosions to remove, and repeated erosion also shrinks the face area. This method therefore combines face area and geometric features to eliminate such areas further. First, a region smaller than a certain area is considered not to belong to the face. Second, the ratio of the long axis to the short axis of the face region is constrained, and regions outside the allowed range are eliminated, reducing the influence of skin-like areas as far as possible. At the same time, the center of gravity of the detected face area is calculated and used as the center of the initial search area, so that the algorithm can run automatically.

(4) Face tracking: the face detection result is used as a new type of back-projection map, and the spatial moments inside the search window are iterated so that the features in the current search window converge toward the distribution pattern of the target features. This guides the continuous mean shift method in automatically adjusting the position of the search window and the angle of its long axis, predicting the target's state in the next frame so that tracking continues. The method achieves good tracking while keeping the amount of computation small.

The above picture preprocessing is implemented as follows:

(1) Linear transformation

Let the brightness of an image be represented by f(x, y), where (x, y) is the spatial position of an image pixel, and let Min[f(x, y)] and Max[f(x, y)] be the minimum and maximum thresholds of the linear transformation. The image brightness G(x, y) after the linear transformation is then:

$G(x, y) = \begin{cases} 0, & f(x, y) \leq Min[f(x, y)] \\ \dfrac{f(x, y) - Min[f(x, y)]}{Max[f(x, y)] - Min[f(x, y)]} \times 256, & Min[f(x, y)] < f(x, y) < Max[f(x, y)] \\ 255, & f(x, y) \geq Max[f(x, y)] \end{cases}$

(2) Gaussian template filtering

The filtering process convolves the image with a discretized Gaussian template. The discretized Gaussian template is as follows:

The above face detection using skin color features is implemented as follows:

The conversion of a digital image from the RGB to the YCrCb color space is given below, where R, G, B, Y, C_r, and C_b are the values of a pixel in the corresponding color channels:

$\begin{cases} Y = 0.299 \times R + 0.587 \times G + 0.114 \times B \\ C_r = R - Y \\ C_b = B - Y \end{cases}$

The following condition is then applied: a pixel that satisfies it is set to 255, and any other pixel is set to 0, which gives the preliminary face detection result. Here C_r is the value of the pixel in the Cr channel, and C_rMax and C_rMin bound the range of C_r for pixels that belong to the face region; C_b, C_bMax, and C_bMin have the analogous meaning.

$(C_{rMin} < C_r < C_{rMax}) \cap (C_{bMin} < C_b < C_{bMax})$

The above elimination of skin-like areas is implemented as follows:

(1) Eliminating skin-like areas by jointly considering face area and geometric features

The area of each independent region is computed; in a digital image this is the number of pixels in the region. A region whose area is less than 100 is removed from the preliminary detection result. The ratio of the long axis to the short axis is then computed for each remaining independent region and, based on facial characteristics and observation, regions whose ratio lies outside the range 1.0 to 2.3 are also removed.

(2) Determination of the initial search area

The initial search area is determined by the following formulas:

$x_o = \frac{\sum_x \sum_y x I(x, y)}{\sum_x \sum_y I(x, y)}, \quad y_o = \frac{\sum_x \sum_y y I(x, y)}{\sum_x \sum_y I(x, y)}$

Here (x_o, y_o) is the center of the initial search area, (x, y) is the spatial position of a pixel in the binarized face detection image, and I(x, y) is the pixel value at (x, y). A 200×100 rectangular window centered on (x_o, y_o) is set as the initial search area.

Compared with the prior art, the present invention has the following evident substantive features and significant advantages:

The face tracking method of the present invention has a low computational cost and can still successfully track a face that rotates or tilts in a complex scene. When tracking fails it restarts automatically, and it is little affected by interference from skin-like areas such as hands. The invention also sets the initial search area automatically, so that the whole process runs without human intervention. The invention combines multiple features to extract and track human faces; its results are stable and its computational complexity is low, and it can be widely used in fields such as video surveillance and human-computer interaction interfaces.

Description of drawings

Fig. 1 is a flow chart of the present invention.

Fig. 2 compares the initial detection result before and after the morphological operations of the present invention.

Fig. 3 shows the clustering of skin color in the Cr and Cb color channels used by the present invention.

Fig. 4 shows the final tracking results of the present invention.

Detailed description

A preferred embodiment of the present invention is described below with reference to the drawings. Referring to Fig. 1, the steps of this real-time tracking method for a single human face in a complex scene include:

(1) Picture preprocessing: image linear transformation and Gaussian template filtering are applied to the picture captured from the video;

(2) Face detection using skin color features: the face is detected in the Cr and Cb channels by exploiting the clustering of skin color in the YCrCb color space, and morphological operations are used to preliminarily remove noise from the detection result;

(3) Elimination of skin-like areas: skin-like areas in the scene are further eliminated by combining the area and geometric features of the face region; the centroid of the face region is calculated, and the initial search area is set centered on it;

(4) Face tracking: the face detection result is used as a new type of back-projection map, and face tracking is realized with the continuous adaptive mean shift method.

The above picture preprocessing is implemented as follows:

Before the face is detected, the picture needs to be preprocessed. This consists of a grayscale linear transformation and convolution with a Gaussian template. The purpose of the grayscale linear transformation is to even out the overall brightness of the picture. The adjustment is computed according to formula (1):

$G(x, y) = \begin{cases} 0, & f(x, y) \leq Min[f(x, y)] \\ \dfrac{f(x, y) - Min[f(x, y)]}{Max[f(x, y)] - Min[f(x, y)]} \times 256, & Min[f(x, y)] < f(x, y) < Max[f(x, y)] \\ 255, & f(x, y) \geq Max[f(x, y)] \end{cases} \quad (1)$

Here f(x, y) is the gray value in the original image, (x, y) is the spatial position of an image pixel, and G(x, y) is the gray value after the linear transformation. Min[f(x, y)] and Max[f(x, y)] are the chosen minimum and maximum thresholds.
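As an illustration only, formula (1) can be sketched in plain Python. The function name `linear_stretch` and the parameters `lo` and `hi` (standing for Min[f(x, y)] and Max[f(x, y)]) are hypothetical names chosen for this sketch; the image is assumed to be a 2-D list of 8-bit gray values, and the middle branch is clamped to 255 because the factor 256 could otherwise round up past the 8-bit range.

```python
def linear_stretch(gray, lo, hi):
    """Piecewise-linear grayscale transform following formula (1).

    gray: 2-D list of gray values f(x, y)
    lo, hi: the Min and Max thresholds (assumed to be chosen by the caller, lo < hi)
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    out = []
    for row in gray:
        new_row = []
        for f in row:
            if f <= lo:
                g = 0                      # first branch of formula (1)
            elif f >= hi:
                g = 255                    # third branch of formula (1)
            else:
                # Middle branch; clamp so the result stays an 8-bit value.
                g = min(255, int((f - lo) / (hi - lo) * 256))
            new_row.append(g)
        out.append(new_row)
    return out


if __name__ == "__main__":
    print(linear_stretch([[10, 50, 100, 150, 200]], lo=50, hi=150))
```

Values at or below `lo` collapse to black and values at or above `hi` to white, so the thresholds decide how much of the highlight and shadow range is discarded.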

The original image is then convolved with the following Gaussian template to remove the noise introduced by the video capture device.
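The Gaussian template itself is not reproduced in this text. As a rough sketch of the filtering step, the code below assumes the common 3×3 template with weights 1-2-1 / 2-4-2 / 1-2-1 (normalized by 16); this kernel, the border handling by pixel replication, and the name `gaussian_filter` are all assumptions of the sketch rather than details taken from the patent.

```python
# Assumed 3x3 Gaussian template (weights sum to 16); the patent's own template is not shown here.
KERNEL = [[1, 2, 1],
          [2, 4, 2],
          [1, 2, 1]]


def gaussian_filter(gray, kernel=KERNEL):
    """Convolve a 2-D list of gray values with a small symmetric template."""
    h, w = len(gray), len(gray[0])
    k = len(kernel) // 2
    norm = sum(sum(r) for r in kernel)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in range(-k, k + 1):
                for dx in range(-k, k + 1):
                    # Replicate border pixels by clamping the coordinates.
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + k][dx + k] * gray[yy][xx]
            out[y][x] = acc // norm
    return out


if __name__ == "__main__":
    spike = [[0, 0, 0], [0, 160, 0], [0, 0, 0]]
    for row in gaussian_filter(spike):
        print(row)
```

Because the assumed template is symmetric, convolution and correlation coincide, so the kernel does not need to be flipped.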

The above face detection using skin color features is implemented as follows. Among the various color spaces, skin color clusters well in the Cr and Cb channels of YCrCb (Fig. 3 shows this clustering for one skin color patch). The image therefore needs to be converted from the RGB to the YCrCb color space according to formula (2), where R, G, B, Y, C_r, and C_b are the values of a pixel in the corresponding channels:

$\begin{cases} Y = 0.299 \times R + 0.587 \times G + 0.114 \times B \\ C_r = R - Y \\ C_b = B - Y \end{cases} \quad (2)$

As shown in Table 1, experimental observation indicates that, by choosing different thresholds for different times of day and lighting conditions, formula (3) can be used to detect the face in the Cr and Cb channels. A pixel that satisfies the condition is set to 255, and any other pixel is set to 0, which gives the preliminary face detection result:

$(C_{rMin} < C_r < C_{rMax}) \cap (C_{bMin} < C_b < C_{bMax}) \quad (3)$
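A minimal sketch of formulas (2) and (3) follows. The threshold values of Table 1 are not reproduced in this text, so the ranges are passed in as parameters and the numbers in the usage example are purely illustrative assumptions; `skin_mask` is a hypothetical name, and the image is assumed to be a 2-D list of (R, G, B) tuples.

```python
def skin_mask(rgb, cr_range, cb_range):
    """Binary skin map: 255 where both chroma values lie strictly inside their ranges, else 0.

    rgb: 2-D list of (R, G, B) tuples
    cr_range, cb_range: (min, max) pairs; real values would come from Table 1 (not shown here)
    """
    cr_min, cr_max = cr_range
    cb_min, cb_max = cb_range
    mask = []
    for row in rgb:
        out_row = []
        for r, g, b in row:
            y = 0.299 * r + 0.587 * g + 0.114 * b    # formula (2): luminance
            cr = r - y                                # formula (2): Cr = R - Y
            cb = b - y                                # formula (2): Cb = B - Y
            inside = cr_min < cr < cr_max and cb_min < cb < cb_max   # formula (3)
            out_row.append(255 if inside else 0)
        mask.append(out_row)
    return mask


if __name__ == "__main__":
    # Illustrative thresholds only; they are not the values of Table 1.
    image = [[(200, 150, 120), (0, 0, 255), (128, 128, 128)]]
    print(skin_mask(image, cr_range=(10, 60), cb_range=(-60, -10)))
```

Note that formula (2) uses unscaled color differences, so Cr and Cb can be negative; thresholds tuned for other YCrCb definitions (which scale and offset the chroma channels) would not transfer directly.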

Table 1

Morphological operations are then applied to the initial detection result according to Table 1, eliminating small holes and detection noise (Fig. 2 shows the result of the morphological operations).
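The exact morphological operations and their parameters are specified in Table 1, which is not reproduced in this text. The sketch below therefore only illustrates the general idea under stated assumptions: a 3×3 square structuring element, pixels outside the image treated as background, an opening (erosion then dilation) to remove isolated specks, and a closing (dilation then erosion) to fill small holes. All function names are hypothetical.

```python
def _morph(mask, need_all):
    """Shared 3x3 pass: erosion when need_all is True, dilation otherwise."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = []
            for yy in (y - 1, y, y + 1):
                for xx in (x - 1, x, x + 1):
                    inside = 0 <= yy < h and 0 <= xx < w
                    # Pixels outside the image count as background.
                    vals.append(inside and mask[yy][xx] == 255)
            hit = all(vals) if need_all else any(vals)
            out[y][x] = 255 if hit else 0
    return out


def erode(mask):
    return _morph(mask, True)


def dilate(mask):
    return _morph(mask, False)


def opening(mask):
    """Erosion followed by dilation: removes specks smaller than the element."""
    return dilate(erode(mask))


def closing(mask):
    """Dilation followed by erosion: fills holes smaller than the element."""
    return erode(dilate(mask))
```

With this border convention, closing erases foreground pixels that touch the image edge, so a real implementation would likely pad the mask first.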

The above elimination of skin-like areas is implemented as follows. After preliminary face detection using skin color features, several regions with pixel value 255 are produced, the face region among them, and the skin-like regions must be removed. First, the area of each independent region is computed (in a digital image, the number of pixels in the region), and a region whose area is less than 100 is removed from the preliminary detection result. Second, the ratio of the long axis to the short axis is computed for each remaining independent region. Based on the geometric characteristics of the face and on observation, regions whose ratio lies outside the range 1.0 to 2.3 are also removed. After this processing, the great majority of skin-like areas are eliminated.
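The two filtering rules can be sketched as follows. The patent does not say how independent regions are extracted or how the long and short axes are measured, so this sketch assumes 8-connected regions found by a breadth-first flood fill and approximates the axis ratio by the longer over the shorter side of each region's bounding box. The name `filter_regions` and its parameters are hypothetical; the defaults 100 and 1.0 to 2.3 are the values stated above.

```python
from collections import deque


def filter_regions(mask, min_area=100, ratio_range=(1.0, 2.3)):
    """Keep only regions of a 0/255 mask that pass the area and axis-ratio tests."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = [[0] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] != 255 or seen[sy][sx]:
                continue
            # Breadth-first flood fill collects one 8-connected region.
            seen[sy][sx] = True
            queue, pixels = deque([(sy, sx)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny in (y - 1, y, y + 1):
                    for nx in (x - 1, x, x + 1):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] == 255 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            height = max(ys) - min(ys) + 1
            width = max(xs) - min(xs) + 1
            # Assumption: bounding-box sides stand in for the long and short axes.
            ratio = max(height, width) / min(height, width)
            if len(pixels) >= min_area and ratio_range[0] <= ratio <= ratio_range[1]:
                for y, x in pixels:
                    out[y][x] = 255
    return out
```

A moment-based ellipse fit would give a rotation-invariant axis ratio; the bounding box is used here only to keep the sketch short.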

Setting the initial search area:

So that tracking runs automatically, formula (4) is used to calculate the centroid of the face region, and the initial search area is set automatically with the centroid as its center:

$x_o = \frac{\sum_x \sum_y x I(x, y)}{\sum_x \sum_y I(x, y)}, \quad y_o = \frac{\sum_x \sum_y y I(x, y)}{\sum_x \sum_y I(x, y)} \quad (4)$

Here (x_o, y_o) is the center of the face region, (x, y) is the spatial position of a pixel in the binarized face detection image, and I(x, y) is the brightness value at (x, y). A 200×100 rectangular window centered on (x_o, y_o) is set as the initial search area.
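Formula (4) and the window placement can be sketched as below. The text gives the window as 200×100 without saying which side is the width, so this sketch assumes a width of 200 and a height of 100 and exposes both as parameters; `initial_search_window` is a hypothetical name, and the window is returned as (left, top, width, height) without clipping to the image.

```python
def initial_search_window(mask, win_w=200, win_h=100):
    """Return ((x_o, y_o), (left, top, width, height)) or None if the mask is empty.

    mask: 2-D list holding the binarized detection result I(x, y) (0 or 255)
    win_w, win_h: window size; which side is 200 and which is 100 is an assumption
    """
    m00 = sum_x = sum_y = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            m00 += value            # denominator of formula (4)
            sum_x += x * value      # numerator for x_o
            sum_y += y * value      # numerator for y_o
    if m00 == 0:
        return None                 # no face pixels: nothing to center on
    xo, yo = sum_x / m00, sum_y / m00
    return (xo, yo), (xo - win_w / 2, yo - win_h / 2, win_w, win_h)


if __name__ == "__main__":
    demo = [[0] * 6 for _ in range(5)]
    for yy in range(1, 4):
        for xx in range(2, 5):
            demo[yy][xx] = 255
    print(initial_search_window(demo))
```

A caller would normally clip the returned rectangle to the frame before using it, since a face near the border places part of the window outside the image.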

The above face tracking is implemented as follows:

After the above processing, face tracking can run automatically. It is an iterative computation that proceeds as follows:

(a) Face detection is performed on the first frame of the video sequence. If detection succeeds, the initial search area is determined automatically; otherwise detection continues on the following frames until a face is detected.

(b) In the binarized face detection image, the mean shift algorithm computes the moments of order 0 to 2 of the region over one or more iterations, and these determine the position of the tracking window and the deflection angle of its long axis.

(c) Keeping the size of the tracking window unchanged, the center position and deflection angle of the search window are updated, moving it to the position determined in (b).

(d) Steps (b) and (c) are repeated for each subsequent frame until the window position converges, which realizes face tracking.

Fig. 4 shows the final tracking results of the present invention. The results show that the invention not only tracks frontal faces successfully but also tracks well when the face is in profile or the head is turned, and that it is robust to interference from skin-like areas such as hands.
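Steps (b) and (c) for a single frame might look like the sketch below, which is an illustration under assumptions rather than the patented implementation. It uses the binarized detection result directly as the back-projection, keeps the window size fixed as stated in (c), and derives the long-axis angle from the second-order central moments with the usual formula θ = ½·atan2(2μ₁₁, μ₂₀ − μ₀₂). The name `mean_shift_track`, the iteration limit, and the convergence tolerance are hypothetical choices.

```python
import math


def mean_shift_track(mask, window, max_iter=10, eps=1.0):
    """One frame of mean shift on a 0/255 mask.

    window: (left, top, width, height). Returns (new_window, angle_in_radians),
    or None when the window contains no face pixels (the target is lost).
    """
    x0, y0, w, h = window
    rows, cols = len(mask), len(mask[0])
    angle = 0.0
    for _ in range(max_iter):
        xa, xb = max(0, int(x0)), min(cols, int(x0 + w))
        ya, yb = max(0, int(y0)), min(rows, int(y0 + h))
        m00 = m10 = m01 = m20 = m02 = m11 = 0
        for y in range(ya, yb):
            for x in range(xa, xb):
                v = mask[y][x]
                if v:
                    m00 += v                       # zeroth-order moment
                    m10 += x * v                   # first-order moments
                    m01 += y * v
                    m20 += x * x * v               # second-order moments
                    m02 += y * y * v
                    m11 += x * y * v
        if m00 == 0:
            return None        # caller should fall back to detection, as in step (a)
        cx, cy = m10 / m00, m01 / m00
        mu20 = m20 / m00 - cx * cx
        mu02 = m02 / m00 - cy * cy
        mu11 = m11 / m00 - cx * cy
        angle = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
        # Step (c): same size, window re-centered on the centroid.
        nx, ny = cx - w / 2, cy - h / 2
        shift = math.hypot(nx - x0, ny - y0)
        x0, y0 = nx, ny
        if shift < eps:
            break              # window position has converged
    return (x0, y0, w, h), angle
```

For a video sequence, the window returned for one frame would be passed as the starting window for the next, and a `None` result would trigger a fresh detection pass.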

Claims

1. A method for real-time tracking of a single human face in a complex scene, the steps of which comprise:

A. Picture preprocessing: performing image linear transformation and Gaussian template filtering on the picture captured from the video;

B. Face detection using skin color features: detecting the face in the Cr and Cb channels by exploiting the clustering of skin color in the YCrCb color space, and using morphological operations to preliminarily remove noise from the detection result;

C. Elimination of skin-like areas: further eliminating the skin-like areas in the scene by combining the area and geometric features of the face region, calculating the centroid of the face region, and setting the initial search area centered on it;

D. Face tracking: using the face detection result as a new type of back-projection map, and realizing face tracking with the continuous adaptive mean shift method.

2. The method for real-time tracking of a single human face in a complex scene according to claim 1, characterized in that the picture preprocessing is implemented as follows:

(1) Linear transformation

Let the brightness of an image be represented by f(x, y), where (x, y) is the spatial position of an image pixel, and let Min[f(x, y)] and Max[f(x, y)] be the minimum and maximum thresholds of the linear transformation; the image brightness G(x, y) after the linear transformation is then:

$G(x, y) = \begin{cases} 0, & f(x, y) \leq Min[f(x, y)] \\ \dfrac{f(x, y) - Min[f(x, y)]}{Max[f(x, y)] - Min[f(x, y)]} \times 256, & Min[f(x, y)] < f(x, y) < Max[f(x, y)] \\ 255, & f(x, y) \geq Max[f(x, y)] \end{cases}$

(2) Gaussian template filtering

The filtering convolves the image with a discretized Gaussian template; the discretized Gaussian template is as follows:

3. The method for real-time tracking of a single human face in a complex scene according to claim 1, characterized in that the face detection using skin color features is implemented as follows:

The conversion of a digital image from the RGB to the YCrCb color space is given below, where R, G, B, Y, C_r, and C_b are the values of a pixel in the corresponding color channels:

$\begin{cases} Y = 0.299 \times R + 0.587 \times G + 0.114 \times B \\ C_r = R - Y \\ C_b = B - Y \end{cases}$

The following condition is then applied: a pixel that satisfies it is set to 255, and any other pixel is set to 0, which gives the preliminary face detection result. Here C_r is the value of the pixel in the Cr channel, and C_rMax and C_rMin bound the range of C_r for pixels that belong to the face region; C_b, C_bMax, and C_bMin have the analogous meaning.

$(C_{rMin} < C_r < C_{rMax}) \cap (C_{bMin} < C_b < C_{bMax})$

4. The method for real-time tracking of a single human face in a complex scene according to claim 1, characterized in that the elimination of skin-like areas is implemented as follows:

(1) Eliminating skin-like areas by jointly considering face area and geometric features

The area of each independent region is computed, which in a digital image is the number of pixels in the region, and a region whose area is less than 100 is removed from the preliminary detection result; the ratio of the long axis to the short axis is then computed for each remaining independent region and, based on facial characteristics and observation, regions whose ratio lies outside the range 1.0 to 2.3 are also removed;

(2) Determination of the initial search area

The initial search area is determined by the following formulas:

$x_o = \frac{\sum_x \sum_y x I(x, y)}{\sum_x \sum_y I(x, y)}, \quad y_o = \frac{\sum_x \sum_y y I(x, y)}{\sum_x \sum_y I(x, y)}$

Here (x_o, y_o) is the center of the initial search area, (x, y) is the spatial position of a pixel in the binarized face detection image, and I(x, y) is the pixel value at (x, y); a 200×100 rectangular window centered on (x_o, y_o) is set as the initial search area.