
CN109829436B - A Multi-Face Tracking Method Based on Deep Apparent Features and Adaptive Aggregation Networks - Google Patents

Info

Publication number
CN109829436B
CN109829436B
Authority
CN
China
Prior art keywords
face
frame
feature
target
tracking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910106309.1A
Other languages
Chinese (zh)
Other versions
CN109829436A (en)
Inventor
柯逍
郑毅腾
朱敏琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN201910106309.1A priority Critical patent/CN109829436B/en
Publication of CN109829436A publication Critical patent/CN109829436A/en
Priority to PCT/CN2019/124966 priority patent/WO2020155873A1/en
Application granted granted Critical
Publication of CN109829436B publication Critical patent/CN109829436B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06N 3/04: Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06T 7/11: Image analysis; segmentation; region-based segmentation
    • G06T 7/246: Image analysis; analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/50: Image analysis; depth or shape recovery

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a multi-face tracking method based on deep apparent features and an adaptive aggregation network. First, the adaptive aggregation network is trained on a face recognition data set. Next, a face detection method based on a convolutional neural network obtains the face positions in the initial frame, the face targets to be tracked are initialized, and face features are extracted. A Kalman filter then predicts the position of each tracked face in the next frame, the faces are located again in that frame, and features are extracted for the detected faces. Finally, the adaptive aggregation network aggregates the set of face features accumulated along each tracked target's trajectory, dynamically generating a single deep apparent face feature that fuses multi-frame information; the predicted positions and fused features are matched by similarity computation against the face positions and features detected in the current frame, and the tracking state is updated. The invention can improve the performance of face tracking.

Description

A Multi-Face Tracking Method Based on Deep Apparent Features and Adaptive Aggregation Networks

Technical Field

The invention relates to the field of pattern recognition and computer vision, and in particular to a multi-face tracking method based on deep apparent features and an adaptive aggregation network.

Background

In recent years, with social progress and the continuous development of technology, video face recognition has gradually become a popular research field, attracting the interest of many experts and scholars at home and abroad. As the entry point and foundation of video face recognition, face detection and tracking technology has developed rapidly and is widely used in intelligent surveillance, virtual-reality perception interfaces, video conferencing, and other fields. Because real video backgrounds are complex and changeable, and the face is a non-rigid target that may undergo large pose or expression changes within a video sequence, implementing a robust face tracking algorithm in real scenes remains a significant challenge.

To analyze a face we must first capture it, which is achieved by face detection and face tracking; only when a face target is accurately located and tracked in video can finer analysis, such as face recognition and pose estimation, be carried out. Target tracking is undoubtedly one of the most important technologies in intelligent security, and face tracking is a concrete application of it: a tracking algorithm processes the moving faces in a video sequence and keeps a lock on each face region to accomplish tracking. The technology has good application prospects in scenarios such as intelligent security and video surveillance.

Face tracking plays an important role in video surveillance, but at present, large changes in face pose and the overlap and occlusion between tracked targets still make practical application in real scenes difficult.

Summary of the Invention

In view of this, the purpose of the present invention is to propose a multi-face tracking method based on deep apparent features and an adaptive aggregation network that can improve the performance of face tracking.

The present invention is realized by the following scheme: a multi-face tracking method based on deep apparent features and an adaptive aggregation network, comprising the following steps:

Step S1: train the adaptive aggregation network on a face recognition data set;

Step S2: from the initial input video frame, obtain face positions with a convolutional neural network, initialize the face targets to be tracked, and extract and save face features;

Step S3: use a Kalman filter to predict the position of each face target in the next frame, locate the faces again in that frame, and extract features for the detected faces;

Step S4: use the adaptive aggregation network trained in step S1 to aggregate the set of face features along each tracked target's trajectory, dynamically generating a deep apparent face feature that fuses multi-frame information; compute similarities between the predicted positions with their fused features and the detected face positions with their features in the current frame, perform matching, and update the tracking state.

Further, step S1 specifically comprises the following steps:

Step S11: collect public face recognition data sets to obtain pictures and names of the persons concerned;

Step S12: integrate the pictures of persons shared across multiple data sets with a fusion strategy; use a pre-trained MTCNN model for face detection and facial landmark localization, and apply a similarity transformation for face alignment; subtract from every training image the per-channel mean computed over the training set to complete data preprocessing; then train the adaptive aggregation network (a minimal preprocessing sketch follows).
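To make the preprocessing in step S12 concrete, here is a minimal sketch of the per-channel mean subtraction; the (N, H, W, 3) array layout and the function names are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def channel_mean(train_images: np.ndarray) -> np.ndarray:
    """Per-channel mean over the whole training set; expects shape (N, H, W, 3)."""
    return train_images.mean(axis=(0, 1, 2))  # -> shape (3,)

def subtract_mean(images: np.ndarray, mean: np.ndarray) -> np.ndarray:
    """Subtract the training-set channel means from a batch of images."""
    return images.astype(np.float32) - mean.reshape(1, 1, 1, 3)
```

The same training-set means would also be subtracted from every face crop at tracking time, so that inference matches the training-time preprocessing.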

Further, the adaptive aggregation network consists of a deep feature extraction module followed by an adaptive feature aggregation module. It accepts one or more face images of the same person as input and outputs the aggregated feature. The deep feature extraction module uses a 34-layer ResNet as the backbone, and the adaptive feature aggregation module contains one feature aggregation layer. Let B denote the number of input samples and {z_t} the set of features output by the deep feature extraction module, where t = 1, 2, ..., B indexes the input samples. The feature aggregation layer is computed as:

v_t = σ(qᵀ z_t);

o_t = v_t / ∑_{t'} v_{t'};

a = ∑_t o_t z_t;

where q, a learnable weight vector over the components of the feature vectors z_t, is learned by back-propagation and gradient descent with the face recognition signal as the supervision signal; v_t, the output of the sigmoid function, is the score of feature vector z_t and lies between 0 and 1; o_t is the L1-normalized output, so that ∑_t o_t = 1; and a is the single feature vector obtained by aggregating the B feature vectors.
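Read this way, the aggregation layer admits a very small implementation. The following NumPy sketch follows the three formulas above; the function name and the (B, d) layout of the feature matrix are assumptions for illustration.

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def aggregate(Z: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Adaptive feature aggregation over B feature vectors.

    Z: (B, d) features z_t from the extraction module; q: learnable (d,) weights.
    Returns a = sum_t o_t z_t with v_t = sigmoid(q . z_t), o_t = v_t / sum(v).
    """
    v = sigmoid(Z @ q)   # score of each feature vector, in (0, 1)
    o = v / v.sum()      # L1 normalization, so the weights sum to 1
    return o @ Z         # single aggregated d-dimensional feature
```

Because the weights o_t sum to 1, the output stays on the same scale as the inputs no matter how many frames a track has accumulated.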

Further, step S2 specifically comprises the following steps:

Step S21: let i denote the index of the i-th frame of the input video, with i = 1 initially. Use the pre-trained MTCNN model to simultaneously detect the positions D_i of all faces and the positions C_i of their corresponding facial landmarks, where D_i = {D_i^j | j = 1, ..., J_i}, j indexes the j-th detected face and J_i is the number of faces detected in frame i; D_i^j = (x, y, w, h) is the position of the j-th face in frame i, with x, y the coordinates of the top-left corner of the face region and w, h its width and height; and C_i = {C_i^j | j = 1, ..., J_i}, where C_i^j = (c_1, c_2, c_3, c_4, c_5) gives the landmarks of the j-th face in frame i, c_1, c_2, c_3, c_4, c_5 being the coordinates of the left eye, right eye, nose, left mouth corner, and right mouth corner, respectively;

Step S22: for each face position D_i^j and its landmark coordinates C_i^j, assign a unique identity ID_k, k = 1, 2, ..., K_i, where k indexes the k-th tracked target and K_i is the number of tracked targets at frame i, and initialize the corresponding tracker T_k = {ID_k, P_k, L_k, E_k, A_k}, where ID_k is the unique identity of the k-th tracked target, P_k the face position coordinates assigned to the k-th target, L_k the facial landmark coordinates of the k-th target, E_k the face feature list of the k-th target, and A_k the life cycle of the k-th target; initialize K_i = J_i, P_k = D_i^j, L_k = C_i^j, and A_k = 1;

Step S23: for each face position P_k in T_k, crop the image to obtain the corresponding face image, and align the face by applying a similarity transformation based on the corresponding landmark positions L_k, yielding an aligned face image (a minimal alignment sketch follows step S24);

Step S24: feed the aligned face image into the adaptive aggregation network to obtain the corresponding deep apparent face feature, and add it to the feature list E_k of tracker T_k.
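The crop-and-align of steps S23 and S24 is typically done by estimating a similarity transformation from the five detected landmarks to a fixed reference template. The sketch below uses scikit-image; the 112x112 output size and the reference landmark coordinates are common face-recognition conventions assumed here for illustration, not values given by the patent.

```python
import numpy as np
from skimage.transform import SimilarityTransform, warp

# Assumed reference positions of (left eye, right eye, nose, left mouth
# corner, right mouth corner) in a 112x112 aligned crop; illustrative only.
REFERENCE = np.array([
    [38.29, 51.69], [73.53, 51.50], [56.02, 71.74],
    [41.55, 92.37], [70.73, 92.20],
])

def align_face(image: np.ndarray, landmarks: np.ndarray) -> np.ndarray:
    """Align a face so its five MTCNN landmarks map onto REFERENCE.

    landmarks: (5, 2) array of (x, y) points detected in `image`.
    """
    tform = SimilarityTransform()
    tform.estimate(landmarks, REFERENCE)   # least-squares similarity fit
    # warp() expects the output->input mapping, i.e. the inverse transform.
    return warp(image, tform.inverse, output_shape=(112, 112))
```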

Further, step S3 specifically comprises the following steps:

Step S31: represent the state of each tracked face target in the following form:

m = (u, v, s, r, u̇, v̇, ṡ, ṙ);

where m is the tracked face target state, u and v are the center coordinates of the tracked face region, s is the area of the face box, r is the aspect ratio of the face box, and u̇, v̇, ṡ, ṙ denote the respective velocities of (u, v, s, r) in image coordinate space;

Step S32: convert the face position P_k = (x, y, w, h) in each tracker T_k into the form (u, v, s, r), giving the converted face position of the k-th tracked target in frame i;

Step S33: take this converted position, which comes from face detection, as the direct observation of the k-th tracked target in frame i, and predict the state of the k-th tracked target in frame i+1 with a Kalman filter based on a linear constant-velocity motion model (a minimal Kalman sketch follows step S35);

Step S34: in frame i+1, perform face detection and facial landmark localization again with the MTCNN model, obtaining the face positions D_{i+1} and facial landmarks C_{i+1};

Step S35: for each face position D_{i+1}^j, complete face alignment by applying a similarity transformation based on its facial landmarks C_{i+1}^j, and feed the result into the adaptive aggregation network to extract features, obtaining the feature set F_{i+1}, where F_{i+1} is the set of features of all faces in frame i+1.
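A minimal sketch of the filter in step S33, under the state of step S31 (the four box quantities plus a velocity for each): the transition and observation matrices implement a linear constant-velocity model, while the noise covariances Q and R are illustrative assumptions the patent does not specify.

```python
import numpy as np

DIM = 8  # (u, v, s, r) plus a velocity for each, per step S31

# Constant-velocity transition: each quantity advances by its velocity.
F = np.eye(DIM)
F[:4, 4:] = np.eye(4)
H = np.eye(4, DIM)              # only (u, v, s, r) is observed
Q = np.eye(DIM) * 1e-2          # process noise covariance (assumed)
R = np.eye(4) * 1e-1            # observation noise covariance (assumed)

def to_uvsr(x, y, w, h):
    """Step S32: convert an (x, y, w, h) face box to (u, v, s, r)."""
    return np.array([x + w / 2.0, y + h / 2.0, w * h, w / h])

def predict(m, P):
    """Kalman predict: project the state into the next frame."""
    return F @ m, F @ P @ F.T + Q

def update(m, P, z):
    """Kalman update with a detection observation z = (u, v, s, r)."""
    y = z - H @ m                        # innovation
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    return m + K @ y, (np.eye(DIM) - K @ H) @ P
```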

Further, step S4 specifically comprises the following steps:

Step S41: for each face tracker T_k, feed the set E_k of all features along its historical trajectory into the adaptive aggregation network to obtain the aggregated feature f_k, where f_k is the single aggregated feature output after fusing all feature vectors in the historical trajectory of the k-th tracked target;

Step S42: convert the position state of the k-th target in the next frame, as predicted by the Kalman filter at frame i, back into the bounding-box form (x, y, w, h);

Step S43: combining the predicted position, the aggregated feature f_k of target k, and the face positions D_{i+1} with their feature set F_{i+1} obtained by face detection in frame i+1, compute the following association matrix:

G = [g_jk], j = 1, 2, ..., J_{i+1}, k = 1, 2, ..., K_i;

g_jk = λ · IoU_jk + (1 − λ) · cos_jk;

where J_{i+1} is the number of faces detected in frame i+1, K_i is the number of tracked targets in frame i, IoU_jk is the degree of overlap between the j-th face detection box in frame i+1 and the position of the k-th target in frame i+1 as predicted by the Kalman filter at frame i, cos_jk is the cosine similarity between the j-th face feature F_{i+1}^j in frame i+1 and the aggregated feature f_k of the k-th target at frame i, and λ is a hyperparameter balancing the weights of the two measures;

Step S44: using the association matrix G as the cost matrix, compute the matching result with the Hungarian algorithm, associating the face detection boxes D_{i+1}^j in frame i+1 with tracked targets (a minimal association sketch follows step S47);

Step S45: map the indices in the matching result to entries of the association matrix G, filter out all entries g_jk smaller than T_similarity, and remove them from the matching result, where T_similarity is a set hyperparameter giving the minimum similarity threshold for a successful match;

Step S46: in the matching result, if detection box D_{i+1}^j is successfully associated with the k-th tracked target, update in the corresponding tracker T_k the position state P_k = D_{i+1}^j, the facial landmark positions L_k = C_{i+1}^j, and the life cycle A_k = A_k + 1, and add the corresponding face feature F_{i+1}^j to the feature list E_k; if a detection box fails to be associated, create a new tracker for it;

Step S47: for each tracker T_k, if its life cycle A_k > T_age, delete the tracker, where T_age is a set hyperparameter giving the longest time a tracked target can survive.
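Steps S43 to S45 can be condensed into a short sketch. The convex combination g_jk = λ·IoU + (1 − λ)·cosine is our reading of the association matrix described above; SciPy's linear_sum_assignment implements the Hungarian algorithm and minimizes cost, so the similarity matrix is negated. Function names and array layouts are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    return inter / (a[2] * a[3] + b[2] * b[3] - inter + 1e-9)

def associate(det_boxes, det_feats, pred_boxes, track_feats,
              lam=0.5, t_similarity=0.3):
    """Match detections to tracks (steps S43-S45).

    det_boxes: (J, 4) detections in frame i+1; det_feats: (J, d), L2-normalized.
    pred_boxes: (K, 4) Kalman-predicted track boxes; track_feats: (K, d).
    Returns the (j, k) pairs whose similarity passes t_similarity.
    """
    J, K = len(det_boxes), len(pred_boxes)
    G = np.zeros((J, K))
    for j in range(J):
        for k in range(K):
            overlap = iou(det_boxes[j], pred_boxes[k])
            cosine = float(det_feats[j] @ track_feats[k])
            G[j, k] = lam * overlap + (1.0 - lam) * cosine
    rows, cols = linear_sum_assignment(-G)     # maximize total similarity
    return [(j, k) for j, k in zip(rows, cols) if G[j, k] >= t_similarity]
```

Detections left unmatched after thresholding would then spawn new trackers, as in step S46.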

Compared with the prior art, the present invention has the following beneficial effects:

1. The multi-face tracking method based on deep apparent features and an adaptive aggregation network constructed by the present invention can effectively track the faces in a video, improving the accuracy of face tracking and reducing the number of identity switches.

2. The present invention can track the faces in a video online while maintaining tracking quality.

3. During face tracking, the predicted face position carries considerable uncertainty, and faces may undergo large pose changes and occlusion. The present invention therefore exploits deep apparent face features, and by combining the information of spatial position and deep features it improves the performance of face tracking.

4. During face tracking, it is difficult to make effective use of all the features along one target's trajectory and to compare multiple feature sets against each other. The present invention therefore proposes the adaptive aggregation network, which adaptively learns the importance of each feature in a feature set through the feature aggregation module and fuses them effectively, improving the face tracking result.

Description of the Drawings

Fig. 1 is a schematic flowchart of an embodiment of the present invention.

Detailed Description

The present invention will be further described below with reference to the accompanying drawings and embodiments.

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present application. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present application belongs.

It should be noted that the terminology used herein is for the purpose of describing specific embodiments only and is not intended to limit the exemplary embodiments according to the present application. As used herein, unless the context clearly indicates otherwise, singular forms are intended to include plural forms as well; furthermore, it should be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.

As shown in Fig. 1, this embodiment provides a multi-face tracking method based on deep apparent features and an adaptive aggregation network, comprising the following steps:

Step S1: train the adaptive aggregation network on a face recognition data set;

Step S2: from the initial input video frame, obtain face positions with a face detection method based on a convolutional neural network, initialize the face targets to be tracked, and extract and save face features;

Step S3: use a Kalman filter to predict the position of each face target in the next frame, locate the faces again in that frame with the face detection method, and extract features for the detected faces;

Step S4: use the adaptive aggregation network trained in step S1 to aggregate the set of face features along each tracked target's trajectory, dynamically generating a deep apparent face feature that fuses multi-frame information; compute similarities between the predicted positions with their fused features and the detected face positions with their features in the current frame, perform matching, and update the tracking state.

In this embodiment, step S1 specifically comprises the following steps:

Step S11: collect public face recognition data sets to obtain pictures and names of the persons concerned;

Step S12: integrate the pictures of persons shared across multiple data sets with a fusion strategy; use a pre-trained MTCNN model for face detection and facial landmark localization, and apply a similarity transformation for face alignment; subtract from every training image the per-channel mean computed over the training set to complete data preprocessing; then train the adaptive aggregation network.

In this embodiment, the adaptive aggregation network consists of a deep feature extraction module followed by an adaptive feature aggregation module. It accepts one or more face images of the same person as input and outputs the aggregated feature. The deep feature extraction module uses a 34-layer ResNet as the backbone, and the adaptive feature aggregation module contains one feature aggregation layer. Let B denote the number of input samples and {z_t} the set of features output by the deep feature extraction module, where t = 1, 2, ..., B indexes the input samples. The feature aggregation layer is computed as:

v_t = σ(qᵀ z_t);

o_t = v_t / ∑_{t'} v_{t'};

a = ∑_t o_t z_t;

where q, a learnable weight vector over the components of the feature vectors z_t, is learned by back-propagation and gradient descent with the face recognition signal as the supervision signal; v_t, the output of the sigmoid function, is the score of feature vector z_t and lies between 0 and 1; o_t is the L1-normalized output, so that ∑_t o_t = 1; and a is the single feature vector obtained by aggregating the B feature vectors.

In this embodiment, step S2 specifically comprises the following steps:

Step S21: let i denote the index of the i-th frame of the input video, with i = 1 initially. Use the pre-trained MTCNN model to simultaneously detect the positions D_i of all faces and the positions C_i of their corresponding facial landmarks, where D_i = {D_i^j | j = 1, ..., J_i}, j indexes the j-th detected face and J_i is the number of faces detected in frame i; D_i^j = (x, y, w, h) is the position of the j-th face in frame i, with x, y the coordinates of the top-left corner of the face region and w, h its width and height; and C_i = {C_i^j | j = 1, ..., J_i}, where C_i^j = (c_1, c_2, c_3, c_4, c_5) gives the landmarks of the j-th face in frame i, c_1, c_2, c_3, c_4, c_5 being the coordinates of the left eye, right eye, nose, left mouth corner, and right mouth corner, respectively;

Step S22: for each face position D_i^j and its landmark coordinates C_i^j, assign a unique identity ID_k, k = 1, 2, ..., K_i, where k indexes the k-th tracked target and K_i is the number of tracked targets at frame i, and initialize the corresponding tracker T_k = {ID_k, P_k, L_k, E_k, A_k}, where ID_k is the unique identity of the k-th tracked target, P_k the face position coordinates assigned to the k-th target, L_k the facial landmark coordinates of the k-th target, E_k the face feature list of the k-th target, and A_k the life cycle of the k-th target; initialize K_i = J_i, P_k = D_i^j, L_k = C_i^j, and A_k = 1;

Step S23: for each face position P_k in T_k, crop the image to obtain the corresponding face image, and align the face by applying a similarity transformation based on the corresponding landmark positions L_k, yielding an aligned face image;

Step S24: feed the aligned face image into the adaptive aggregation network to obtain the corresponding deep apparent face feature, and add it to the feature list E_k of tracker T_k.

In this embodiment, step S3 specifically comprises the following steps:

Step S31: represent the state of each tracked face target in the following form:

m = (u, v, s, r, u̇, v̇, ṡ, ṙ);

where m is the tracked face target state, u and v are the center coordinates of the tracked face region, s is the area of the face box, r is the aspect ratio of the face box, and u̇, v̇, ṡ, ṙ denote the respective velocities of (u, v, s, r) in image coordinate space;

Step S32: convert the face position P_k = (x, y, w, h) in each tracker T_k into the form (u, v, s, r), giving the converted face position of the k-th tracked target in frame i;

Step S33: take this converted position, which comes from face detection, as the direct observation of the k-th tracked target in frame i, and predict the state of the k-th tracked target in frame i+1 with a Kalman filter based on a linear constant-velocity motion model;

Step S34: in frame i+1, perform face detection and facial landmark localization again with the MTCNN model, obtaining the face positions D_{i+1} and facial landmarks C_{i+1};

Step S35: for each face position D_{i+1}^j, complete face alignment by applying a similarity transformation based on its facial landmarks C_{i+1}^j, and feed the result into the adaptive aggregation network to extract features, obtaining the feature set F_{i+1}, where F_{i+1} is the set of features of all faces in frame i+1.

In this embodiment, step S4 specifically comprises the following steps:

Step S41: for each face tracker T_k, feed the set E_k of all features along its historical trajectory into the adaptive aggregation network to obtain the aggregated feature f_k, where f_k is the single aggregated feature output after fusing all feature vectors in the historical trajectory of the k-th tracked target;

Step S42: convert the position state of the k-th target in the next frame, as predicted by the Kalman filter at frame i, back into the bounding-box form (x, y, w, h);

Step S43: combining the predicted position, the aggregated feature f_k of target k, and the face positions D_{i+1} with their feature set F_{i+1} obtained by face detection in frame i+1, compute the following association matrix:

G = [g_jk], j = 1, 2, ..., J_{i+1}, k = 1, 2, ..., K_i;

g_jk = λ · IoU_jk + (1 − λ) · cos_jk;

where J_{i+1} is the number of faces detected in frame i+1, K_i is the number of tracked targets in frame i, IoU_jk is the degree of overlap between the j-th face detection box in frame i+1 and the position of the k-th target in frame i+1 as predicted by the Kalman filter at frame i, cos_jk is the cosine similarity between the j-th face feature F_{i+1}^j in frame i+1 and the aggregated feature f_k of the k-th target at frame i, and λ is a hyperparameter balancing the weights of the two measures;

Step S44: using the association matrix G as the cost matrix, compute the matching result with the Hungarian algorithm, associating the face detection boxes D_{i+1}^j in frame i+1 with tracked targets;

Step S45: map the indices in the matching result to entries of the association matrix G, filter out all entries g_jk smaller than T_similarity, and remove them from the matching result, where T_similarity is a set hyperparameter giving the minimum similarity threshold for a successful match;

Step S46: in the matching result, if detection box D_{i+1}^j is successfully associated with the k-th tracked target, update in the corresponding tracker T_k the position state P_k = D_{i+1}^j, the facial landmark positions L_k = C_{i+1}^j, and the life cycle A_k = A_k + 1, and add the corresponding face feature F_{i+1}^j to the feature list E_k; if a detection box fails to be associated, create a new tracker for it;

Step S47: for each tracker T_k, if its life cycle A_k > T_age, delete the tracker, where T_age is a set hyperparameter giving the longest time a tracked target can survive.

Those skilled in the art will appreciate that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.

The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means that implement the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

The above are only preferred embodiments of the present invention and do not limit the present invention in other forms. Any person skilled in the art may use the technical content disclosed above to make changes or modifications into equivalent embodiments. However, any simple modification, equivalent change, or adaptation of the above embodiments made according to the technical essence of the present invention, without departing from the content of the technical solution of the present invention, still falls within the protection scope of the technical solution of the present invention.

Claims (2)

1. A multi-face tracking method based on deep apparent features and an adaptive aggregation network, characterized in that it comprises the following steps:

Step S1: train the adaptive aggregation network on a face recognition data set;

Step S2: from the initial input video frame, obtain face positions with a convolutional neural network, initialize the face targets to be tracked, and extract and save face features;

Step S3: use a Kalman filter to predict the position of each face target in the next frame, locate the faces again in that frame, and extract features for the detected faces;

Step S4: use the adaptive aggregation network trained in step S1 to aggregate the set of face features along each tracked target's trajectory, dynamically generating a deep apparent face feature that fuses multi-frame information; compute similarities between the predicted positions with their fused features and the detected face positions with their features in the current frame, perform matching, and update the tracking state;

wherein step S1 specifically comprises the following steps:

Step S11: collect public face recognition data sets to obtain pictures and names of the persons concerned;

Step S12: integrate the pictures of persons shared across multiple data sets with a fusion strategy; use a pre-trained MTCNN model for face detection and facial landmark localization, and apply a similarity transformation for face alignment; subtract from every training image the per-channel mean computed over the training set to complete data preprocessing; then train the adaptive aggregation network;

wherein step S2 specifically comprises the following steps:

Step S21: let i denote the index of the i-th frame of the input video, with i = 1 initially; use the pre-trained MTCNN model to simultaneously detect the positions D_i of all faces and the positions C_i of their corresponding facial landmarks, where D_i = {D_i^j | j = 1, ..., J_i}, j indexes the j-th detected face and J_i is the number of faces detected in frame i; D_i^j = (x, y, w, h) is the position of the j-th face in frame i, with x, y the coordinates of the top-left corner of the face region and w, h its width and height; and C_i = {C_i^j | j = 1, ..., J_i}, where C_i^j = (c_1, c_2, c_3, c_4, c_5) gives the landmarks of the j-th face in frame i, c_1, c_2, c_3, c_4, c_5 being the coordinates of the left eye, right eye, nose, left mouth corner, and right mouth corner, respectively;

Step S22: for each face position D_i^j and its landmark coordinates C_i^j, assign a unique identity ID_k, k = 1, 2, ..., K_i, where k indexes the k-th tracked target and K_i is the number of tracked targets at frame i, and initialize the corresponding tracker T_k = {ID_k, P_k, L_k, E_k, A_k}, where ID_k is the unique identity of the k-th tracked target, P_k the face position coordinates assigned to the k-th target, L_k the facial landmark coordinates of the k-th target, E_k the face feature list of the k-th target, and A_k the life cycle of the k-th target; initialize K_i = J_i, P_k = D_i^j, L_k = C_i^j, and A_k = 1;

Step S23: for each face position P_k in T_k, crop the image to obtain the corresponding face image, and align the face by applying a similarity transformation based on the corresponding landmark positions L_k, yielding an aligned face image;

Step S24: feed the aligned face image into the adaptive aggregation network to obtain the corresponding deep apparent face feature, and add it to the feature list E_k of tracker T_k;

wherein step S3 specifically comprises the following steps:

Step S31: represent the state of each tracked face target in the following form:

m = (u, v, s, r, u̇, v̇, ṡ, ṙ);

where m is the tracked face target state, u and v are the center coordinates of the tracked face region, s is the area of the face box, r is the aspect ratio of the face box, and u̇, v̇, ṡ, ṙ denote the respective velocities of (u, v, s, r) in image coordinate space;

Step S32: convert the face position P_k = (x, y, w, h) in each tracker T_k into the form (u, v, s, r), giving the converted face position of the k-th tracked target in frame i;

Step S33: take this converted position, which comes from face detection, as the direct observation of the k-th tracked target in frame i, and predict the state of the k-th tracked target in frame i+1 with a Kalman filter based on a linear constant-velocity motion model;

Step S34: in frame i+1, perform face detection and facial landmark localization again with the MTCNN model, obtaining the face positions D_{i+1} and facial landmarks C_{i+1};

Step S35: for each face position D_{i+1}^j, complete face alignment by applying a similarity transformation based on its facial landmarks C_{i+1}^j, and feed the result into the adaptive aggregation network to extract features, obtaining the feature set F_{i+1}, where F_{i+1} is the set of features of all faces in frame i+1;

wherein step S4 specifically comprises the following steps:

Step S41: for each face tracker T_k, feed the set E_k of all features along its historical trajectory into the adaptive aggregation network to obtain the aggregated feature f_k, where f_k is the single aggregated feature output after fusing all feature vectors in the historical trajectory of the k-th tracked target;

Step S42: convert the position state of the k-th target in the next frame, as predicted by the Kalman filter at frame i, back into the bounding-box form (x, y, w, h);

Step S43: combining the predicted position, the aggregated feature f_k of target k, and the face positions D_{i+1} with their feature set F_{i+1} obtained by face detection in frame i+1, compute the following association matrix:

G = [g_jk], j = 1, 2, ..., J_{i+1}, k = 1, 2, ..., K_i;

g_jk = λ · IoU_jk + (1 − λ) · cos_jk;

where J_{i+1} is the number of faces detected in frame i+1, K_i is the number of tracked targets in frame i, IoU_jk is the degree of overlap between the j-th face detection box in frame i+1 and the position of the k-th target in frame i+1 as predicted by the Kalman filter at frame i, cos_jk is the cosine similarity between the j-th face feature F_{i+1}^j in frame i+1 and the aggregated feature f_k of the k-th target at frame i, and λ is a hyperparameter balancing the weights of the two measures;

Step S44: using the association matrix G as the cost matrix, compute the matching result with the Hungarian algorithm, associating the face detection boxes D_{i+1}^j in frame i+1 with tracked targets;

Step S45: map the indices in the matching result to entries of the association matrix G, filter out all entries g_jk smaller than T_similarity, and remove them from the matching result, where T_similarity is a set hyperparameter giving the minimum similarity threshold for a successful match;

Step S46: in the matching result, if detection box D_{i+1}^j is successfully associated with the k-th tracked target, update in the corresponding tracker T_k the position state P_k = D_{i+1}^j, the facial landmark positions L_k = C_{i+1}^j, and the life cycle A_k = A_k + 1, and add the corresponding face feature F_{i+1}^j to the feature list E_k; if a detection box fails to be associated, create a new tracker for it;

Step S47: for each tracker T_k, if its life cycle A_k > T_age, delete the tracker, where T_age is a set hyperparameter giving the longest time a tracked target can survive.
2. The multi-face tracking method based on deep apparent features and an adaptive aggregation network according to claim 1, characterized in that the adaptive aggregation network consists of a deep feature extraction module followed by an adaptive feature aggregation module; it accepts one or more face images of the same person as input and outputs the aggregated feature; the deep feature extraction module uses a 34-layer ResNet as the backbone, and the adaptive feature aggregation module contains one feature aggregation layer; let B denote the number of input samples and {z_t} the set of features output by the deep feature extraction module, where t = 1, 2, ..., B indexes the input samples; the feature aggregation layer is computed as:

v_t = σ(qᵀ z_t);

o_t = v_t / ∑_{t'} v_{t'};

a = ∑_t o_t z_t;

where q, a learnable weight vector over the components of the feature vectors z_t, is learned by back-propagation and gradient descent with the face recognition signal as the supervision signal; v_t, the output of the sigmoid function, is the score of feature vector z_t and lies between 0 and 1; o_t is the L1-normalized output, so that ∑_t o_t = 1; and a is the single feature vector obtained by aggregating the B feature vectors.
CN201910106309.1A 2019-02-02 2019-02-02 A Multi-Face Tracking Method Based on Deep Apparent Features and Adaptive Aggregation Networks Active CN109829436B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910106309.1A CN109829436B (en) 2019-02-02 2019-02-02 A Multi-Face Tracking Method Based on Deep Apparent Features and Adaptive Aggregation Networks
PCT/CN2019/124966 WO2020155873A1 (en) 2019-02-02 2019-12-13 Deep apparent features and adaptive aggregation network-based multi-face tracking method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910106309.1A CN109829436B (en) 2019-02-02 2019-02-02 A Multi-Face Tracking Method Based on Deep Apparent Features and Adaptive Aggregation Networks

Publications (2)

Publication Number Publication Date
CN109829436A (en) 2019-05-31
CN109829436B (en) 2022-05-13

Family

ID=66863393

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910106309.1A Active CN109829436B (en) 2019-02-02 2019-02-02 A Multi-Face Tracking Method Based on Deep Apparent Features and Adaptive Aggregation Networks

Country Status (2)

Country Link
CN (1) CN109829436B (en)
WO (1) WO2020155873A1 (en)

Families Citing this family (118)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109829436B (en) * 2019-02-02 2022-05-13 福州大学 A Multi-Face Tracking Method Based on Deep Apparent Features and Adaptive Aggregation Networks
TWI727337B (en) * 2019-06-06 2021-05-11 大陸商鴻富錦精密工業(武漢)有限公司 Electronic device and face recognition method
CN110490901A (en) * 2019-07-15 2019-11-22 武汉大学 The pedestrian detection tracking of anti-attitudes vibration
CN110414443A (en) * 2019-07-31 2019-11-05 苏州市科远软件技术开发有限公司 A kind of method for tracking target, device and rifle ball link tracking
CN110705478A (en) * 2019-09-30 2020-01-17 腾讯科技(深圳)有限公司 Face tracking method, device, equipment and storage medium
CN111078295B (en) * 2019-11-28 2021-11-12 核芯互联科技(青岛)有限公司 Mixed branch prediction device and method for out-of-order high-performance core
CN111160202B (en) * 2019-12-20 2023-09-05 万翼科技有限公司 Identity verification method, device, equipment and storage medium based on AR equipment
CN111079718A (en) * 2020-01-15 2020-04-28 中云智慧(北京)科技有限公司 Quick face comparison method
CN111275741B (en) * 2020-01-19 2023-09-08 北京迈格威科技有限公司 Target tracking method, device, computer equipment and storage medium
CN111325279B (en) * 2020-02-26 2022-06-10 福州大学 A Pedestrian and Carry-on Sensitive Item Tracking Method Based on Visual Relationship
CN111476826A (en) * 2020-04-10 2020-07-31 电子科技大学 A multi-target vehicle tracking method based on SSD target detection
CN111770299B (en) * 2020-04-20 2022-04-19 厦门亿联网络技术股份有限公司 Method and system for real-time face abstract service of intelligent video conference terminal
CN111553234B (en) * 2020-04-22 2023-06-06 上海锘科智能科技有限公司 Pedestrian tracking method and device integrating facial features and Re-ID feature ordering
CN111914613B (en) * 2020-05-21 2024-03-01 淮阴工学院 Multi-target tracking and facial feature information recognition method
CN112001225B (en) * 2020-07-06 2023-06-23 西安电子科技大学 An online multi-target tracking method, system and application
CN111932588B (en) * 2020-08-07 2024-01-30 浙江大学 A tracking method for airborne UAV multi-target tracking system based on deep learning
CN111784746B (en) * 2020-08-10 2024-05-03 青岛高重信息科技有限公司 Multi-target pedestrian tracking method and device under fish-eye lens and computer system
CN111899284B (en) * 2020-08-14 2024-04-09 北京交通大学 A planar target tracking method based on parameterized ESM network
CN112036271B (en) * 2020-08-18 2023-10-10 汇纳科技股份有限公司 Pedestrian re-identification method, system, medium and terminal based on Kalman filtering
CN111932661B (en) * 2020-08-19 2023-10-24 上海艾麒信息科技股份有限公司 Facial expression editing system and method and terminal
CN112016440B (en) * 2020-08-26 2024-02-20 杭州云栖智慧视通科技有限公司 Target pushing method based on multi-target tracking
CN112215873A (en) * 2020-08-27 2021-01-12 国网浙江省电力有限公司电力科学研究院 A method for tracking and locating multiple targets in a substation
CN112085767B (en) * 2020-08-28 2023-04-18 安徽清新互联信息科技有限公司 Passenger flow statistical method and system based on deep optical flow tracking
CN112053386B (en) * 2020-08-31 2023-04-18 西安电子科技大学 Target tracking method based on depth convolution characteristic self-adaptive integration
CN112257502A (en) * 2020-09-16 2021-01-22 深圳微步信息股份有限公司 A monitoring video pedestrian identification and tracking method, device and storage medium
CN112149557B (en) * 2020-09-22 2022-08-09 福州大学 Person identity tracking method and system based on face recognition
CN112215155B (en) * 2020-10-13 2022-10-14 北京中电兴发科技有限公司 Face tracking method and system based on multi-feature fusion
CN112288773A (en) * 2020-10-19 2021-01-29 慧视江山科技(北京)有限公司 Multi-scale human body tracking method and device based on Soft-NMS
CN112307234A (en) * 2020-11-03 2021-02-02 厦门兆慧网络科技有限公司 Face bottom library synthesis method, system, device and storage medium
CN112287877B (en) * 2020-11-18 2022-12-02 苏州爱可尔智能科技有限公司 Multi-role close-up shot tracking method
CN114639129B (en) * 2020-11-30 2024-05-03 北京君正集成电路股份有限公司 Paper medium living body detection method for access control system
CN113762013B (en) * 2020-12-02 2024-09-24 北京沃东天骏信息技术有限公司 Method and device for face recognition
CN112541418B (en) * 2020-12-04 2024-05-28 北京百度网讯科技有限公司 Method, apparatus, device, medium and program product for image processing
CN112560669B (en) * 2020-12-14 2024-07-26 杭州趣链科技有限公司 Face pose estimation method and device and electronic equipment
CN112651994A (en) * 2020-12-18 2021-04-13 零八一电子集团有限公司 Ground multi-target tracking method
CN112668432A (en) * 2020-12-22 2021-04-16 上海幻维数码创意科技股份有限公司 Human body detection tracking method in ground interactive projection system based on YoloV5 and Deepsort
CN112597901B (en) * 2020-12-23 2023-12-29 艾体威尔电子技术(北京)有限公司 Device and method for effectively recognizing human face in multiple human face scenes based on three-dimensional ranging
CN112560874B (en) * 2020-12-25 2024-04-16 北京百度网讯科技有限公司 Training method, device, equipment and medium for image recognition model
CN112653844A (en) * 2020-12-28 2021-04-13 珠海亿智电子科技有限公司 Camera holder steering self-adaptive tracking adjustment method
CN112597944B (en) * 2020-12-29 2024-06-11 北京市商汤科技开发有限公司 Key point detection method and device, electronic equipment and storage medium
CN112669345B (en) * 2020-12-30 2023-10-20 中山大学 Cloud deployment-oriented multi-target track tracking method and system
CN112733642A (en) * 2020-12-30 2021-04-30 广州市高科通信技术股份有限公司 Behavior analysis method and terminal based on prison
CN112581506A (en) * 2020-12-31 2021-03-30 北京澎思科技有限公司 Face tracking method, system and computer readable storage medium
CN112686175B (en) * 2020-12-31 2025-02-14 北京仡修技术有限公司 Face capture method, system and computer readable storage medium
CN112784725B (en) * 2021-01-15 2024-06-07 北京航天自动控制研究所 Pedestrian anti-collision early warning method, device, storage medium and stacker
CN114913386A (en) * 2021-01-29 2022-08-16 北京图森智途科技有限公司 A multi-target tracking model training method and multi-target tracking method
CN113076808B (en) * 2021-03-10 2023-05-26 海纳云物联科技有限公司 Method for accurately acquiring bidirectional traffic flow through image algorithm
CN113158788B (en) * 2021-03-12 2024-03-08 中国平安人寿保险股份有限公司 Facial expression recognition method and device, terminal equipment and storage medium
CN113033439B (en) * 2021-03-31 2023-10-20 北京百度网讯科技有限公司 Method and device for data processing and electronic equipment
CN113158853A (en) * 2021-04-08 2021-07-23 浙江工业大学 Pedestrian's identification system that makes a dash across red light that combines people's face and human gesture
CN113192105B (en) * 2021-04-16 2023-10-17 嘉联支付有限公司 Method and device for indoor multi-person tracking and attitude measurement
CN113096156B (en) * 2021-04-23 2024-05-24 中国科学技术大学 Automatic driving-oriented end-to-end real-time three-dimensional multi-target tracking method and device
CN113158909B (en) * 2021-04-25 2023-06-27 中国科学院自动化研究所 Behavior recognition light-weight method, system and equipment based on multi-target tracking
CN113408348B (en) * 2021-05-14 2022-08-19 桂林电子科技大学 Video-based face recognition method and device and storage medium
CN113377192B (en) * 2021-05-20 2023-06-20 广州紫为云科技有限公司 Somatosensory game tracking method and device based on deep learning
CN113379795B (en) * 2021-05-21 2024-03-22 浙江工业大学 Multi-target tracking and segmentation method based on conditional convolution and optical flow characteristics
CN113269098B (en) * 2021-05-27 2023-06-16 中国人民解放军军事科学院国防科技创新研究院 Multi-target tracking positioning and motion state estimation method based on unmanned aerial vehicle
CN113313201B (en) * 2021-06-21 2024-10-15 南京挥戈智能科技有限公司 Multi-target detection and ranging method based on Swin transducer and ZED camera
CN113487653B (en) * 2021-06-24 2024-03-26 之江实验室 Self-adaptive graph tracking method based on track prediction
CN113486771B (en) * 2021-06-30 2023-07-07 福州大学 Video action uniformity evaluation method and system based on key point detection
CN113724291B (en) * 2021-07-29 2024-04-02 西安交通大学 Multi-panda tracking method, system, terminal device and readable storage medium
CN113658223B (en) * 2021-08-11 2023-08-04 山东建筑大学 A method and system for multi-pedestrian detection and tracking based on deep learning
CN113807187B (en) * 2021-08-20 2024-04-02 北京工业大学 Unmanned aerial vehicle video multi-target tracking method based on attention feature fusion
CN113688740B (en) * 2021-08-26 2024-02-27 燕山大学 Indoor gesture detection method based on multi-sensor fusion vision
CN113723279B (en) * 2021-08-30 2022-11-01 东南大学 Multi-target tracking acceleration method based on time-space optimization in edge computing environment
CN113920457A (en) * 2021-09-16 2022-01-11 中国农业科学院农业资源与农业区划研究所 Fruit yield estimation method and system based on space and ground information acquisition cooperative processing
CN113723361A (en) * 2021-09-18 2021-11-30 西安邮电大学 Video monitoring method and device based on deep learning
CN113808170B (en) * 2021-09-24 2023-06-27 电子科技大学长三角研究院(湖州) Anti-unmanned aerial vehicle tracking method based on deep learning
CN114022509B (en) * 2021-09-24 2024-06-14 北京邮电大学 Target tracking method based on monitoring video of multiple animals and related equipment
CN113822211B (en) * 2021-09-27 2023-04-11 山东睿思奥图智能科技有限公司 Interactive person information acquisition method
CN113850843A (en) * 2021-09-27 2021-12-28 联想(北京)有限公司 Target tracking method and device, electronic equipment and storage medium
CN113888179B (en) * 2021-10-09 2025-04-01 支付宝(杭州)信息技术有限公司 Image processing method and device
CN113936312B (en) * 2021-10-12 2024-06-07 南京视察者智能科技有限公司 Face recognition base screening method based on deep learning graph convolution network
CN113887494A (en) * 2021-10-21 2022-01-04 上海大学 Real-time high-precision face detection and recognition system for embedded platform
CN114627339B (en) * 2021-11-09 2024-03-29 昆明物理研究所 Intelligent recognition tracking method and storage medium for cross border personnel in dense jungle area
CN114332909B (en) * 2021-11-16 2024-08-23 南京行者易智能交通科技有限公司 Binocular pedestrian recognition method and device under monitoring scene
CN114120188B (en) * 2021-11-19 2024-04-05 武汉大学 Multi-row person tracking method based on joint global and local features
CN115690545B (en) * 2021-12-03 2024-06-11 北京百度网讯科技有限公司 Method and device for training target tracking model and target tracking
CN114339398A (en) * 2021-12-24 2022-04-12 天翼视讯传媒有限公司 Method for real-time special effect processing in large-scale video live broadcast
CN114445766B (en) * 2021-12-29 2024-11-15 中原动力智能机器人有限公司 A method, device and robot for detecting and managing human traffic
CN114419669A (en) * 2021-12-30 2022-04-29 杭州电子科技大学 A real-time cross-camera pedestrian tracking method based on re-identification and orientation awareness
CN114419151B (en) * 2021-12-31 2024-07-26 福州大学 Multi-target tracking method based on contrast learning
CN114663796A (en) * 2022-01-04 2022-06-24 北京航空航天大学 Target person continuous tracking method, device and system
CN114529799A (en) * 2022-01-06 2022-05-24 浙江工业大学 Aircraft multi-target tracking method based on improved YOLOV5 algorithm
CN114359968A (en) * 2022-01-10 2022-04-15 杭州巨岩欣成科技有限公司 Swimming pool drowning prevention multi-camera target tracking method and device, computer equipment and storage medium
CN114529577B (en) * 2022-01-10 2024-09-06 燕山大学 Road side visual angle multi-target tracking method
CN114612823A (en) * 2022-03-06 2022-06-10 北京工业大学 A personnel behavior monitoring method for laboratory safety management
CN114821702A (en) * 2022-03-15 2022-07-29 电子科技大学 A thermal infrared face recognition method based on occlusion face
CN114663835B (en) * 2022-03-21 2024-10-22 合肥工业大学 Pedestrian tracking method, system, equipment and storage medium
CN115214430B (en) * 2022-03-23 2023-11-17 广州汽车集团股份有限公司 Vehicle seat adjusting method and vehicle
CN117178292A (en) * 2022-03-30 2023-12-05 京东方科技集团股份有限公司 Target tracking method, device, system and storage medium
CN114898458A (en) * 2022-04-15 2022-08-12 中国兵器装备集团自动化研究所有限公司 Factory floor number monitoring method, system, terminal and medium based on image processing
CN114782500B (en) * 2022-04-22 2025-02-07 西安理工大学 Karting racing behavior analysis method based on multi-target tracking
CN114821780B (en) * 2022-04-25 2024-12-10 江苏鑫合易家信息技术有限责任公司 System and method for multi-person simultaneous face recognition and motion recognition verification
CN114973070A (en) * 2022-05-09 2022-08-30 四川省人工智能研究院(宜宾) Mask wearing data processing method based on video analysis
CN114972426B (en) * 2022-05-18 2024-11-15 北京理工大学 A single target tracking method based on attention and convolution
CN114882417B (en) * 2022-05-23 2024-10-15 天津理工大学 A lightweight LightDimp single target tracking method based on dimp tracker
CN114897939B (en) * 2022-05-26 2024-12-10 东南大学 Multi-target tracking method and system based on deep path aggregation network
CN114863539B (en) * 2022-06-09 2024-09-24 福州大学 Portrait key point detection method and system based on feature fusion
CN114972445B (en) * 2022-06-10 2025-03-07 沈阳瞻言科技有限公司 A cross-lens person tracking and re-identification method and system
CN115272404B (en) * 2022-06-17 2023-07-18 江南大学 A Multiple Object Tracking Method Based on Kernel Space and Implicit Space Feature Alignment
CN114943924B (en) * 2022-06-21 2024-05-14 深圳大学 Pain assessment method, system, device and medium based on facial expression video
CN114783043B (en) * 2022-06-24 2022-09-20 杭州安果儿智能科技有限公司 Child behavior track positioning method and system
CN115223223A (en) * 2022-07-14 2022-10-21 南京慧安炬创信息科技有限公司 Complex crowd dynamic target identification method and device based on multi-feature fusion
CN115546829A (en) * 2022-09-28 2022-12-30 之江实验室 Pedestrian spatial information sensing method and device based on ZED (zero-energy-dimension) stereo camera
CN115994929A (en) * 2023-03-24 2023-04-21 中国兵器科学研究院 A Multi-Target Tracking Method Fused with Spatial Motion and Appearance Feature Learning
CN116596958B (en) * 2023-07-18 2023-10-10 四川迪晟新达类脑智能技术有限公司 Target tracking method and device based on online sample augmentation
CN117011335B (en) * 2023-07-26 2024-04-09 山东大学 Multi-target tracking method and system based on self-adaptive double decoders
CN117455955B (en) * 2023-12-14 2024-03-08 武汉纺织大学 Pedestrian multi-target tracking method based on unmanned aerial vehicle visual angle
CN117576166B (en) * 2024-01-15 2024-04-30 浙江华是科技股份有限公司 Target tracking method and system based on camera and low-frame-rate laser radar
CN117809054B (en) * 2024-02-29 2024-05-10 南京邮电大学 A multi-target tracking method based on feature decoupling fusion network
CN118366076A (en) * 2024-04-09 2024-07-19 浙江安得仕科技有限公司 Video vector fusion analysis method and system based on deep learning
CN118072000B (en) * 2024-04-17 2024-07-19 中国科学院合肥物质科学研究院 Fish detection method based on novel target recognition algorithm
CN118379608B (en) * 2024-06-26 2024-10-18 浙江大学 A highly robust deepfake detection method based on adaptive learning
CN118522058B (en) * 2024-07-22 2024-09-17 中电桑达电子设备(江苏)有限公司 Object tracking method, system and medium based on face recognition
CN118587758B (en) * 2024-08-06 2024-10-29 杭州登虹科技有限公司 Cross-domain personnel identification matching method and device and electronic equipment
CN118968601B (en) * 2024-10-12 2024-12-27 湘江实验室 Edge calculation-oriented face detection method and system
CN119131871B (en) * 2024-11-11 2025-03-14 霖久智慧(广东)科技有限公司 Face recognition method, device, equipment and storage medium based on dual feature comparison

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8295543B2 (en) * 2007-08-31 2012-10-23 Lockheed Martin Corporation Device and method for detecting targets in images based on user-defined classifiers
CN101216885A (en) * 2008-01-04 2008-07-09 中山大学 A Pedestrian Face Detection and Tracking Algorithm Based on Video
CN101777116B (en) * 2009-12-23 2012-07-25 中国科学院自动化研究所 Method for analyzing facial expressions on basis of motion tracking
US10902243B2 (en) * 2016-10-25 2021-01-26 Deep North, Inc. Vision based target tracking that distinguishes facial feature targets
CN106845385A (en) * 2017-01-17 2017-06-13 腾讯科技(上海)有限公司 The method and apparatus of video frequency object tracking
CN107292911B (en) * 2017-05-23 2021-03-30 南京邮电大学 Multi-target tracking method based on multi-model fusion and data association
CN107492116A (en) * 2017-09-01 2017-12-19 深圳市唯特视科技有限公司 A kind of method that face tracking is carried out based on more display models
CN107609512A (en) * 2017-09-12 2018-01-19 上海敏识网络科技有限公司 A kind of video human face method for catching based on neutral net
CN108509859B (en) * 2018-03-09 2022-08-26 南京邮电大学 Non-overlapping area pedestrian tracking method based on deep neural network
CN108363997A (en) * 2018-03-20 2018-08-03 南京云思创智信息科技有限公司 It is a kind of in video to the method for real time tracking of particular person
CN109101915B (en) * 2018-08-01 2021-04-27 中国计量大学 Network structure design method for face, pedestrian and attribute recognition based on deep learning
CN109086724B (en) * 2018-08-09 2019-12-24 北京华捷艾米科技有限公司 Accelerated human face detection method and storage medium
CN109829436B (en) * 2019-02-02 2022-05-13 Fuzhou University A Multi-Face Tracking Method Based on Deep Apparent Features and Adaptive Aggregation Networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A prediction-based real-time facial landmark localization and tracking algorithm; Weng Zhengkui et al.; Wanfang Data Knowledge Service Platform journal database; 2015-07-22; pp. 198-202 *

Also Published As

Publication number Publication date
WO2020155873A1 (en) 2020-08-06
CN109829436A (en) 2019-05-31

Similar Documents

Publication Publication Date Title
CN109829436B (en) A Multi-Face Tracking Method Based on Deep Apparent Features and Adaptive Aggregation Networks
Shah et al. Multi-view action recognition using contrastive learning
CN110378259A (en) A kind of multiple target Activity recognition method and system towards monitor video
CN102426645B (en) Multi-view and multi-state gait recognition method
Yu et al. Human action recognition using deep learning methods
CN109522853A (en) Face datection and searching method towards monitor video
CN112989889B (en) Gait recognition method based on gesture guidance
CN109472198A (en) A Pose Robust Approach for Video Smiley Face Recognition
Tong et al. Multi-view gait recognition based on a spatial-temporal deep neural network
CN108960078A (en) A method of based on monocular vision, from action recognition identity
CN112149557A (en) Person identity tracking method and system based on face recognition
CN108537181A (en) A kind of gait recognition method based on the study of big spacing depth measure
WO2019153175A1 (en) Machine learning-based occluded face recognition system and method, and storage medium
Raychaudhuri et al. Prior-guided source-free domain adaptation for human pose estimation
CN114639117B (en) Cross-border specific pedestrian tracking method and device
CN113111857A (en) Human body posture estimation method based on multi-mode information fusion
Batool et al. Telemonitoring of daily activities based on multi-sensors data fusion
Uddin et al. A thermal camera-based activity recognition using discriminant skeleton features and rnn
KR102763536B1 (en) Multi-object tracking apparatus and method based on self-supervised learning
CN114360058A (en) Cross-visual angle gait recognition method based on walking visual angle prediction
Li et al. Real-time human action recognition using depth motion maps and convolutional neural networks
CN118658177A (en) A method for re-identification of pedestrians with changed clothes based on deep learning
Wang et al. Thermal infrared object tracking based on adaptive feature fusion
Zahoor et al. Drone-based human surveillance using YOLOv5 and multi-features
THAMRIN et al. Intelligent security system based on biometric face recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant