CN110008844A - KCF long-term gesture tracking method fused with SLIC algorithm - Google Patents
KCF long-term gesture tracking method fused with SLIC algorithm
- Publication number
- CN110008844A (application CN201910184848.7A)
- Authority
- CN
- China
- Prior art keywords
- gesture
- target
- foreground
- kcf
- tracking
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F18/2411 — Classification techniques based on the proximity to a decision surface, e.g. support vector machines
- G06F18/28 — Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries
- G06V10/507 — Summing image-intensity values; Histogram projection analysis
- G06V10/56 — Extraction of image or video features relating to colour
- G06V40/28 — Recognition of hand or arm movements, e.g. recognition of deaf sign language
Abstract
The invention discloses a KCF long-term gesture tracking method fused with the SLIC algorithm, comprising the steps of: 1) constructing a gesture training data set, extracting superpixel blocks and training an SVM model of the superpixel blocks offline to obtain a coarse classification model for gesture detection; 2) constructing a foreground-background dictionary and designing the similarity function of a KNN algorithm by combining FHOG features and CN features, so as to complete the fine classification of gesture detection; 3) obtaining a gesture detection model from the coarse classification model and the fine classification of gesture detection, and detecting the target with this model to obtain the detection box of the target gesture; 4) estimating the best-fitting rectangular box of the target gesture with a designed target scale estimator; 5) designing a confidence function and deciding whether the current tracking result is credible by comparing the similarity between the tracking results of the current frame and the previous frame, thereby achieving gesture tracking. The algorithm of the invention has low complexity, high tracking accuracy and strong robustness, and is suitable for real-time applications.
Description
Technical Field
The invention relates to gesture recognition technology, and in particular to a KCF long-term gesture tracking method fused with the SLIC algorithm.
Background
Gesture recognition technology has long been a research focus, and gesture tracking is an important part of it. Gesture tracking methods are generally classified into two types: short-term tracking, which considers the motion of a target over a short period of time (e.g. the KCF, DSST and MOSSE algorithms), and long-term tracking, which aims to keep tracking the target well over a long period of time.
The KCF target tracking algorithm is a discriminative correlation filtering algorithm. In general, such methods train a target detector during tracking, use the detector to test whether the predicted position in the next frame contains the target, and then use the new detection result to update the training set and hence the detector. The KCF algorithm collects positive and negative samples from a circulant matrix of the region around the target and trains the detector by ridge regression; using the diagonalization property of circulant matrices in Fourier space, it converts the matrix operations into Hadamard products of vectors, i.e. element-wise multiplication, which greatly reduces the amount of computation and increases the speed. For the nonlinear case, the KCF algorithm maps the linear-space ridge regression into a nonlinear space through a kernel function, solves the dual problem and some common constraints in that space, and again simplifies the computation using the diagonalization property of circulant matrices in Fourier space.
To some extent the KCF algorithm is a good real-time algorithm, but it still has the following problems:
1. depending on its circulant matrix and the initialized matrix, the KCF algorithm cannot adapt its window, so it is not ideal for multi-scale target tracking;
2. the KCF algorithm tracks poorly for fast-moving targets and targets in low-frame-rate video, because the displacement of the target between adjacent frames becomes too large and exceeds the search range of the algorithm;
3. the KCF algorithm has difficulty continuing to track a target after it has been occluded for several frames.
Disclosure of Invention
In view of the above technical problems, the invention aims to provide a KCF long-term gesture tracking method fused with the SLIC algorithm.
To achieve this purpose, the technical scheme adopted by the invention comprises the following steps:
A KCF long-term gesture tracking method fused with the SLIC algorithm comprises the following steps:
1) constructing a gesture training data set, extracting superpixel blocks of the picture through a SLIC algorithm, and training an SVM model of the superpixel blocks in an off-line mode to obtain a coarse classification model of gesture detection;
2) extracting the foreground and the background of various gesture pictures from the gesture training data set, constructing a foreground-background dictionary, and designing a similarity function of a KNN algorithm by combining FHOG characteristics and CN characteristics so as to complete fine classification of gesture detection;
3) obtaining a gesture detection model through the coarse classification model of the gesture detection and the fine classification of the gesture detection, and detecting a target by using the gesture detection model to obtain a detection frame of the target gesture; initializing a KCF filter by using a detection box of the target gesture, and then estimating the target gesture of the next frame by using the KCF filter, wherein the KCF filter takes FHOG characteristics and CN characteristics as input;
4) estimating an optimal rectangular box of the target gesture by using a designed target scale estimator, wherein the target scale estimator adopts FHOG characteristics and CN characteristics as input;
5) determining whether the current tracking result is credible by comparing the similarity between the tracking results of the current frame and the previous frame, using a confidence function designed from a perceptual hash algorithm, the FHOG-feature cosine similarity and the color-statistics cosine similarity; if the confidence is greater than a threshold, adopting the current tracking result for the next frame and repeating steps 3) to 5); if the confidence is smaller than the threshold, abandoning the current tracking result, detecting the current frame with the gesture detection model, taking the detection result as the current tracking result, re-initializing the KCF tracker, repeating steps 3) to 5), and finally updating the foreground-background dictionary with the recognition result of the current frame.
Compared with the prior art, the invention has the following advantages:
1. combined with the SLIC algorithm, superpixel blocks are generated and features are extracted on the basis of the superpixel blocks; an SVM is used for coarse classification and a KNN over the foreground-background dictionary for fine classification, thereby realizing multi-scale detection;
2. a confidence function is designed by combining the perceptual hash algorithm, the FHOG-feature cosine similarity and the color-statistics cosine similarity, and whether the current result is credible is judged by comparing the similarity between the tracking results of the current frame and the previous frame, so that loss of the tracking target is avoided;
3. HOG features and color statistical features are extracted from the superpixel blocks; the HOG feature is invariant to illumination, scale and the like, while the color statistical feature is invariant to non-rigid deformation, rotation and rapid movement, so the two are complementary and together give the features better robustness;
4. the KCF position estimator and the scale estimator adopt FHOG + CN features, which are robust for gestures, and the multi-scale estimator adapts well to changes of the target scale.
Drawings
Fig. 1 shows a schematic flow diagram of an embodiment of the invention.
Fig. 2 shows a flow chart of the KNN-foreground-background dictionary algorithm according to an embodiment of the present invention.
FIG. 3 shows a flow chart of a foreground-background dictionary update algorithm of an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings:
As shown in Fig. 1, a KCF long-term gesture tracking method fused with the SLIC algorithm includes the following steps:
Step one: construct a gesture training data set, extract the superpixel blocks of each picture through the SLIC algorithm, and train an SVM model of the superpixel blocks offline to obtain the coarse classification model for gesture detection.
Specifically, the SLIC algorithm is a superpixel generation algorithm based on clustering; its specific steps are as follows:
1. Seed point (cluster center) initialization: distribute the seed points uniformly in the image according to the set number of superpixels. Assuming the picture has N pixels in total and is pre-divided into K superpixels of the same size, each superpixel covers N/K pixels and the distance (step length) between adjacent seed points is approximately S = sqrt(N/K), where sqrt(.) denotes the square root;
2. Reselect each seed point within an n x n neighborhood of its initial position (typically n = 3): calculate the gradient values of all pixels in the neighborhood and move the seed point to the position with the minimum gradient, which keeps seeds off edges and noisy pixels;
3. Assign a class label (i.e. which cluster center it belongs to) to each pixel in the neighborhood around each seed point. Unlike standard k-means, which searches the whole image, SLIC limits the search range to 2S x 2S, which accelerates convergence: the expected superpixel size is S x S, but the search range is 2S x 2S;
4. Distance measure, including the color distance and the spatial distance. For each searched pixel, its distance to the seed point is calculated as:

d_c = sqrt( (l_j - l_i)^2 + (a_j - a_i)^2 + (b_j - b_i)^2 )
d_s = sqrt( (x_j - x_i)^2 + (y_j - y_i)^2 )

where d_c is the color distance (in CIELAB space), d_s the spatial distance, and N_s the maximum spatial distance within a class, defined as N_s = S = sqrt(N/K) and the same for every cluster. The maximum color distance N_c differs from picture to picture and from cluster to cluster, so it is replaced by a fixed constant m (value range [1, 40], typically 10). The final distance measure D' is:

D' = sqrt( (d_c / m)^2 + (d_s / N_s)^2 )

Each pixel may be searched by several seed points, so it has a distance to each of them; the seed point giving the minimum distance is taken as that pixel's cluster center;
5. Iterative optimization. In theory the above steps are iterated until the error converges (i.e. the cluster center of every pixel no longer changes); in practice 10 iterations give a satisfactory result for most pictures, so the usual number of iterations is 10;
6. Enhance connectivity. Create a new label table whose elements are all -1; traverse in a Z-shaped order (from left to right, from top to bottom), reassigning discontinuous superpixels and superpixels of improper size to neighbouring superpixels, and assign each traversed pixel to the corresponding label until all points are traversed.
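As an illustration, the superpixel stage can be sketched with the SLIC implementation in scikit-image; the frame file name and the parameter values (n_segments playing the role of K, compactness the role of the constant m) are assumptions, not values fixed by the invention:

```python
import numpy as np
from skimage import io
from skimage.segmentation import slic

# Hedged sketch of the superpixel step: K = 200 pre-set superpixels,
# compactness m = 10, both assumed values.
frame = io.imread("frame_t.png")          # H x W x 3 RGB frame (hypothetical file)
labels = slic(frame, n_segments=200, compactness=10, start_label=0)

# labels[y, x] is the superpixel index r of pixel (x, y); the mask of
# superpixel s(r, t) is obtained by comparing against r:
mask_0 = labels == 0
print(labels.max() + 1, "superpixels;", mask_0.sum(), "pixels in superpixel 0")
```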
Specifically, the superpixel blocks of the picture to be detected are obtained through the SLIC algorithm. Assume the current picture to be detected is the t-th frame, s(r, t) is the r-th superpixel of the t-th frame, and T_t = {X_t, Y_t, W_t, H_t} is the gesture target box in the t-th frame image, where (X_t, Y_t) is the center coordinate of the gesture target and (W_t, H_t) its width and height. The superpixels that coincide with the target box are marked as foreground and the rest as background, so the label of the r-th superpixel can be expressed as:

l(r, t) = +1, if s(r, t) coincides with T_t (foreground)
l(r, t) = -1, otherwise (background)

After the superpixels are obtained, the label of each superpixel block is marked according to this formula, and the HOG feature and the color statistical feature of each superpixel block are extracted.

Since the number of pixels of different superpixel blocks is not always the same, assume the number of pixels of the r-th superpixel block s(r, t) of the t-th frame is num_s(r,t). Take the number of statistic bins of the HOG feature as 18, regard a superpixel block as one cell, calculate the gradient of each pixel in the cell, and count how many pixel gradients fall into each bin, so that one superpixel block yields an 18-dimensional HOG feature vector VecH_s(r,t), normalized as follows:

N_VecH_s(r,t) = VecH_s(r,t) / ||VecH_s(r,t)|| / num_s(r,t)
before the HOG features are extracted, the image is optically corrected using gamma algorithm and grayed.
The image gradient within a superpixel cell is calculated as follows:

G_x(x, y) = I(x + 1, y) - I(x - 1, y)
G_y(x, y) = I(x, y + 1) - I(x, y - 1)
G(x, y) = sqrt( G_x(x, y)^2 + G_y(x, y)^2 )
theta(x, y) = arctan( G_y(x, y) / G_x(x, y) )

where G_x is the gradient in the horizontal direction, G_y the gradient in the vertical direction, G(x, y) the gradient magnitude at pixel (x, y) of the cell, and theta(x, y) its phase angle;
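The per-superpixel 18-bin HOG statistic described above can be sketched as follows; this is an illustration under the stated formulas, with a central-difference gradient assumed, not the patented implementation:

```python
import numpy as np

def superpixel_hog(gray, mask, bins=18):
    """18-bin orientation histogram over one superpixel block s(r, t):
    gray is the gamma-corrected grayscale image, mask the boolean mask of
    the superpixel; the result is L2- and pixel-count-normalized (N_VecH)."""
    gx = np.zeros_like(gray, dtype=np.float64)
    gy = np.zeros_like(gray, dtype=np.float64)
    gx[:, 1:-1] = gray[:, 2:].astype(np.float64) - gray[:, :-2]   # G_x
    gy[1:-1, :] = gray[2:, :].astype(np.float64) - gray[:-2, :]   # G_y
    mag = np.hypot(gx, gy)                                        # G(x, y)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)                   # phase angle
    idx = np.minimum((ang / (2 * np.pi) * bins).astype(int), bins - 1)
    hist = np.bincount(idx[mask], weights=mag[mask], minlength=bins)
    return hist / (np.linalg.norm(hist) + 1e-12) / mask.sum()     # VecH -> N_VecH
```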
for the color statistical characteristics, the image is kept in an RGB mode, r, g, b components of the RGB image can be regularly divided into 64 parts, and values of r, g, b in the image are all (0, 255), so that:
wherein,to round down, and rdiv、gdivAnd bdivThe r, g and b components are respectively taken as block values;
establishing a statistical array count [64], carrying out statistics on 64 sections divided by r, g and b, wherein the corresponding index is as follows:
index=rdiv*4*4+gdiv*4+bdiv
=>count[index];
thus, by counting the number of colors, a 64-dimensional vector VecC can be obtaineds(r,t)It was normalized as follows:
N_VecCs(r,t)=VecCs(r,t)/||VecCs(r,t)||/nums(r,t)
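A corresponding sketch of the 64-bin color statistic (again an illustration of the formulas above, not the patented code):

```python
import numpy as np

def superpixel_color_hist(rgb, mask):
    """64-bin RGB statistic of one superpixel: each channel is quantized to
    4 levels (floor(c / 64)), index = rdiv*4*4 + gdiv*4 + bdiv, and the
    count vector gets the same double normalization as the HOG vector."""
    r = (rgb[..., 0][mask] // 64).astype(int)
    g = (rgb[..., 1][mask] // 64).astype(int)
    b = (rgb[..., 2][mask] // 64).astype(int)
    count = np.bincount(r * 16 + g * 4 + b, minlength=64).astype(np.float64)
    return count / (np.linalg.norm(count) + 1e-12) / mask.sum()   # VecC -> N_VecC
```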
then, the HOG features and the color statistical features are concatenated to obtain the final features:
Vecs(r,t)=[N_VecHs(r,t),N_VecCs(r,t)]
and finally, combining the final characteristics of the super pixels and the labels into a training sample set dataSet of the svm classifier { Vec ═ Vecs(r,t)And l (r, t) }, sending the sample set into an svm classifier, and training to obtain a parameter model of the svm classifier, wherein the svm classifier adopts a Gaussian kernel.
The SVM classifier is specifically as follows:

For a hyperplane:

f(x) = ω^T φ(x) + b

where ω is the weight, b the bias, φ(.) a nonlinear function and x the feature input, the following constrained problem is solved:

min_{ω,b} (1/2) ||ω||^2
s.t. y_i ( ω^T φ(x_i) + b ) ≥ 1, i = 1, ..., N

where y is the category label and N the number of samples.

Applying the Lagrange multiplier method gives:

L(ω, b, α) = (1/2) ||ω||^2 - Σ_{i=1..N} α_i [ y_i ( ω^T φ(x_i) + b ) - 1 ]

where α_i ≥ 0 is a Lagrange multiplier.

The hyperplane may then become:

f(x) = Σ_{i=1..N} α_i y_i K(x_i, x) + b

where K(x_i, x) = <φ(x_i) · φ(x)> is the kernel function, and the α_i are solved through the following dual problem:

max_α Σ_{i=1..N} α_i - (1/2) Σ_{i=1..N} Σ_{j=1..N} α_i α_j y_i y_j K(x_i, x_j)
s.t. α_i ≥ 0, i = 1, ..., N
Σ_{i=1..N} α_i y_i = 0

The above problem can be solved by the SMO algorithm.
Step two: extract the foreground and the background of the various gesture pictures from the gesture training data set, construct a foreground-background dictionary, and design the similarity function of the KNN algorithm by combining FHOG features and CN features, so as to complete the fine classification of gesture detection.
Specifically, after gamma correction and graying of the picture to be detected, the FHOG features are extracted in the following steps:
1. extract 9-dimensional HOG features with the cell as the unit: for example, define a cell as 4 x 4 pixels and accumulate those pixels into a histogram of 9 bins;
2. normalization and truncation: normalize and truncate the cell vectors obtained above. Let C(i, j) be the 9-dimensional feature vector of the (i, j)-th cell; its adjacent feature vectors are C(i + β, j), C(i, j + γ) and C(i + β, j + γ) with β, γ ∈ {-1, 1}. Define N_{β,γ} as:

N_{β,γ} = ( ||C(i,j)||^2 + ||C(i+β,j)||^2 + ||C(i+β,j+γ)||^2 + ||C(i,j+γ)||^2 )^(-1/2)

The 4 x 9-dimensional feature vector H(i, j) is then obtained by scaling C(i, j) with each of the four factors and truncating each component at a constant T (0.2 in Felzenszwalb's formulation):

H(i, j) = [ min( C(i, j) N_{β,γ}, T ) ]_{β,γ ∈ {-1,1}}

3. PCA-style dimensionality reduction: sum the obtained 4 x 9-dimensional feature matrix H(i, j) over its rows to obtain a 9-dimensional feature vector, sum over its columns to obtain a 4-dimensional feature vector, and splice them into a 13-dimensional feature vector;
4. extract 18-dimensional HOG features: obtain an 18-bin HOG feature with the cell as the unit, normalize and truncate it into a 4 x 18-dimensional feature matrix, and sum over its rows to obtain an 18-dimensional feature vector;
5. serially splice the 18-dimensional and 13-dimensional feature vectors to obtain the 31-dimensional FHOG feature vector.
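Steps 2 and 3 of this pipeline can be sketched as below; the truncation constant 0.2 is Felzenszwalb's value and is an assumption here, as is the exact pooling layout:

```python
import numpy as np

def fhog_normalize_truncate(C, trunc=0.2):
    """Sketch of the normalization-truncation and pooling above: C is an
    (H, W, 9) array of per-cell 9-bin histograms; for each interior cell
    the four neighbourhood factors N_{beta,gamma} are applied, components
    are truncated, and the 4 x 9 block is pooled into 9 row sums plus
    4 column sums (13 dims)."""
    energy = (C ** 2).sum(axis=2)                       # ||C(i,j)||^2 per cell
    H, W = energy.shape
    out = np.zeros((H - 2, W - 2, 13))
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            block = np.empty((4, 9))
            for k, (b, g) in enumerate([(-1, -1), (-1, 1), (1, -1), (1, 1)]):
                n = np.sqrt(energy[i, j] + energy[i + b, j]
                            + energy[i + b, j + g] + energy[i, j + g]) + 1e-12
                block[k] = np.minimum(C[i, j] / n, trunc)
            out[i - 1, j - 1, :9] = block.sum(axis=0)   # sum over the 4 rows
            out[i - 1, j - 1, 9:] = block.sum(axis=1)   # sum over the 9 columns
    return out
```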
Specifically, when the CN features are extracted from the picture to be detected, each color is mapped into a 10-dimensional feature vector space. The extraction steps are as follows:
1. let the size of the image to be detected be width x height x 3 and divide each of the r, g and b components of the RGB image into 32 parts, i.e.:

r_div = floor(r / 8), g_div = floor(g / 8), b_div = floor(b / 8)

2. in the designed (32 x 32 x 32) x 10-dimensional feature mapping table, map each rgb pixel of the image to a 10-dimensional feature vector through the following index, finally obtaining a width x height x 10 tensor:

index = r_div * 32 * 32 + g_div * 32 + b_div;

3. expand the width x height x 10 tensor into a feature vector of dimension (width x height x 10) x 1.
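A sketch of the CN lookup, assuming the mapping table is van de Weijer's (32 x 32 x 32) x 10 color-names table loaded from a file (the file name is hypothetical):

```python
import numpy as np

cn_table = np.load("cn_table.npy")                # assumed shape (32768, 10)

def cn_features(rgb):
    """Map each RGB pixel to its 10-dim color-name vector via the index
    rdiv*32*32 + gdiv*32 + bdiv, then flatten to (width*height*10) x 1."""
    q = rgb.astype(int) // 8                      # 32 quantization levels per channel
    index = q[..., 0] * 32 * 32 + q[..., 1] * 32 + q[..., 2]
    return cn_table[index].reshape(-1, 1)         # width x height x 10, flattened
```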
Further, the FHOG features and the CN features are concatenated in series, and the KNN algorithm is applied over the constructed foreground-background dictionary.
Specifically, the steps of the KNN algorithm are as follows (flowchart in Fig. 2):
1. the amounts of foreground and background data in the constructed foreground-background dictionary are equal; in this method there are only two categories, foreground and background. Calculate the distance between the sample to be tested and the data of both categories; the distance function of the KNN adopts the Euclidean distance;
2. sort the distances between the sample to be tested and the foreground and background entries in increasing order;
3. select the K points with the smallest distances;
4. count the frequency of each category among the first K points;
5. return the category with the highest frequency among the first K points as the predicted classification of the sample to be tested.
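The fine classification can be sketched as a two-class KNN over the dictionary (K = 5 is an assumed value, since the text does not fix K):

```python
import numpy as np

def knn_fine_classify(sample, fg_feats, bg_feats, k=5):
    """Sketch of the KNN fine classification over the foreground-background
    dictionary: Euclidean distances to both classes, take the K nearest,
    majority vote. Returns +1 for foreground, -1 for background."""
    dists = np.concatenate([
        np.linalg.norm(fg_feats - sample, axis=1),    # distances to foreground
        np.linalg.norm(bg_feats - sample, axis=1),    # distances to background
    ])
    labels = np.concatenate([np.ones(len(fg_feats)), -np.ones(len(bg_feats))])
    nearest = labels[np.argsort(dists)[:k]]           # K smallest distances
    return 1 if (nearest == 1).sum() >= (nearest == -1).sum() else -1
```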
Step three: obtain the gesture detection model from the coarse classification model and the fine classification of gesture detection, and detect the target with the gesture detection model to obtain the detection box of the target gesture. Initialize the KCF filter with the detection box of the target gesture and then estimate the target gesture of the next frame with the KCF filter, where the KCF filter adopts the FHOG and CN features.
Specifically, the KCF filter solves a ridge regression function:

min_α Σ_i ( f(x_i) - y_i )^2 + λ ||α||^2

where λ is the penalty factor, α the weight parameter and y the regression value.

1. The training process solves the Fourier transform fft(α) of the parameter α:

fft(α) = fft(y) ./ ( fft(K^{xx}) + λ );

2. the detection process solves the detection response:

response = ifft( fft(α) .* fft(K^{xz}) );

3. the kernel correlation K^{xx'} is solved as:

K^{xx'} = φ( ifft( fft(x) .* fft(x') ) )^T

where fft(.) is the Fourier transform, ifft(.) the inverse Fourier transform, φ(.) a nonlinear function and K the kernel function;
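In code, the two Fourier-domain formulas are one line each; the sketch below assumes the kernel correlations Kxx and Kxz have already been computed and transformed, and the value of λ is an assumption:

```python
import numpy as np

def kcf_train(kxx_hat, y_hat, lam=1e-4):
    """fft(alpha) = fft(y) ./ (fft(Kxx) + lambda); lam is an assumed value."""
    return y_hat / (kxx_hat + lam)

def kcf_detect(alpha_hat, kxz_hat):
    """response = ifft(fft(alpha) .* fft(Kxz)); the response peak gives the
    predicted displacement of the target gesture."""
    response = np.real(np.fft.ifft2(alpha_hat * kxz_hat))
    return np.unravel_index(np.argmax(response), response.shape)
```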
step four: the rectangular box of the most suitable target gesture is estimated using a designed target scale estimator that takes FHOG and CN features as input.
Specifically, the target scale estimator uses a one-dimensional KCF filter, which solves the following optimal filter:

ε = || Σ_{l=1..d} h^l ⋆ f^l - g ||^2 + λ Σ_{l=1..d} ||h^l||^2

where l ∈ {1, 2, ..., d} indexes the d image blocks extracted at different scales near the center of the gesture target of the previous frame, g is the Gaussian response function given according to the distance between each image block and the target center, h is the designed scale estimator, f the corresponding image feature and λ a penalty factor.

Writing the frequency responses of h, f and g as H, F and G, the above can be solved to give the scale estimator:

H^l = ( conj(G) F^l ) / ( Σ_{k=1..d} conj(F^k) F^k + λ )

where F is the frequency response of the image feature f and conj(F) its conjugate, H the frequency response of the scale estimator h and conj(H) its conjugate, λ the penalty factor, d the number of extracted image blocks and l ∈ {1, 2, ..., d}.
From the above equation, the following two processes are obtained:
1. Prediction process of the scale estimator: centered on the position estimate obtained in step three, extract 33 image blocks at different scales from the picture of the t-th frame, extract their FHOG and CN features, and use them as the input of the scale estimator:

y = ifft( ( Σ_{l=1..d} conj(A^l) Z^l ) / ( B + λ ) )

where Z_t holds the FHOG and CN features of the 33 image blocks extracted from the picture of the t-th frame, A and B are two parameters to be determined, obtained by the update below, and conj(.) denotes the conjugate; the scale with the largest response y is taken as the new target scale.
2. Update process: after the prediction target is obtained in the current frame, extract 33 image blocks at different scales near the center of the gesture target of the current t-th frame picture, extract their FHOG and CN features as the input of the scale estimator, and update the scale estimator parameters through the following process:

A_t^l = (1 - η) A_{t-1}^l + η conj(G_t) F_t^l
B_t = (1 - η) B_{t-1} + η Σ_{k=1..d} conj(F_t^k) F_t^k

where η is the parameter adjustment factor.
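A sketch of the scale estimator's predict/update cycle over the 33 scales (the η and λ values are assumptions):

```python
import numpy as np

def scale_predict(A, B, Z, lam=1e-2):
    """y = ifft( sum_l conj(A^l) * Z^l / (B + lambda) ); the argmax over
    the 33 per-scale responses selects the new target size."""
    num = sum(np.conj(a) * z for a, z in zip(A, Z))
    y = np.real(np.fft.ifft(num / (B + lam)))
    return int(np.argmax(y))

def scale_update(A_prev, B_prev, F, G_hat, eta=0.025):
    """A_t^l = (1-eta) A_{t-1}^l + eta * conj(G) * F^l
       B_t   = (1-eta) B_{t-1}   + eta * sum_k conj(F^k) * F^k"""
    A = [(1 - eta) * a + eta * np.conj(G_hat) * f for a, f in zip(A_prev, F)]
    B = (1 - eta) * B_prev + eta * sum(np.conj(f) * f for f in F)
    return A, B
```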
Step five: determine whether the current tracking result is credible by comparing the similarity between the tracking results of the current frame and the previous frame, using a confidence function designed from a perceptual hash algorithm, the FHOG-feature cosine similarity and the color-statistics cosine similarity. If the confidence is greater than a threshold, adopt the current tracking result for the next frame and repeat steps three to five; if the confidence is smaller than the threshold, abandon the current tracking result, detect the current frame with the gesture detector, take the detection result as the current tracking result, re-initialize the KCF tracker, repeat steps three to five, discard part of the data of the foreground-background dictionary according to a certain random function, and extract the foreground and background data of the current frame as a supplement.
Specifically, for the perceptual hash algorithm, the steps are as follows:
1. correcting two pictures to be compared by using a gamma correction algorithm;
2. resize the two compared pictures to 16 x 16 by interpolation or sampling;
3. carrying out graying treatment on the two pictures with the reset sizes;
4. expand the two 16 x 16 pictures row by row into 256-dimensional vectors vecHash_src and vecHash_dst, and calculate the average pixel values vecHash_src_avg and vecHash_dst_avg of each vector:

vecHash_src_avg = (1/256) Σ_{i=1..256} vecHash_src_i
vecHash_dst_avg = (1/256) Σ_{i=1..256} vecHash_dst_i
5. comparing the element value of the vector vecHash _ src with the magnitude of vecHash _ src _ avg and comparing the element value of the vector vecHash _ dst with the magnitude of vecHash _ dst _ avg, and encoding the image to obtain vecHash _ src _ code and vecHash _ dst _ code:
vecHash_src_codei=vecHash_srci≥vecHash_src_avg?1:0
vecHash_dst_codei=vecHash_dsti≥vecHash_dst_avg?1:0;
6. calculate the similarity of the codes: compare the elements of vecHash_src_code and vecHash_dst_code one by one and denote the number of equal elements by similarNum; the similarity of the perceptual hash algorithm is then given by:

similarPercent = similarNum / 256
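The six steps can be sketched with OpenCV; the gamma value 0.5 is an assumption (the text does not fix it):

```python
import cv2
import numpy as np

def phash_similarity(img_src, img_dst):
    """Sketch of the average-hash similarity above: gamma-correct, resize
    to 16 x 16, gray, threshold against the mean, compare 256-bit codes."""
    codes = []
    for img in (img_src, img_dst):
        img = np.power(img / 255.0, 0.5)                   # gamma correction (assumed gamma)
        small = cv2.resize((img * 255).astype(np.uint8), (16, 16))
        gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY).flatten()
        codes.append(gray >= gray.mean())                  # 256-dim binary code
    return (codes[0] == codes[1]).sum() / 256.0            # similarPercent
```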
specifically, given the FHOG feature and color statistics feature vectors featureVec1 and featureVec2 for two pictures, the cosine similarity of the two pictures is calculated as follows:
cosSimilar=featureVec1*featureVec2/(||featureVec1||*||featureVec2||)
the extraction of FHOG features and the extraction of color statistics are the same as the color statistics described in step one and the FHOG feature extraction process described in step two.
In particular, the confidence is calculated in combination with the perceptual hashing algorithm, the FHOG feature cosine similarity and the color statistics feature in the following way:
setting the similarity obtained by the perceptual hash algorithm as hashSimilar, the cosine similarity obtained from the FHOG features as fhogCosSimilar, and the similarity obtained from the color statistical features as colorCosSimilar;
calculating the similarity of the two pictures according to a certain weight:
similar = α1 × hashSimilar + α2 × fhogCosSimilar + α3 × colorCosSimilar.
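A sketch of the confidence computation; the weights α1-α3 are assumptions, since the text leaves them unspecified:

```python
import numpy as np

def cos_similar(v1, v2):
    """cosSimilar = v1·v2 / (||v1|| * ||v2||)."""
    return float(v1 @ v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12)

def confidence(hash_similar, fhog_cos, color_cos, a1=0.4, a2=0.3, a3=0.3):
    """similar = a1*hashSimilar + a2*fhogCosSimilar + a3*colorCosSimilar;
    the weight values are assumptions (the patent leaves them unspecified)."""
    return a1 * hash_similar + a2 * fhog_cos + a3 * color_cos
```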
specifically, for updating the foreground-background dictionary data, the steps are as follows, and the specific flow is as shown in fig. 3:
1. the foreground-background dictionary stores the FHOG and CN feature vectors of the gesture target and of background pictures, with equal amounts of the two types. Let the amount of data in the foreground-background dictionary be num_data and set a quantity threshold num_threshold. If num_data < num_threshold, cut out the target gesture picture in the current frame using the tracking or detection result, resize it to 256 x 256, extract its FHOG and CN features and store them in the foreground data set; likewise, intercept background pictures outside the target gesture with extraction boxes of the same size as the recognition result, resize them to 256 x 256, extract their FHOG and CN features and store them in the background data set. If num_data ≥ num_threshold, update through the following step 2;
2. the data stored in the foreground-background dictionary are arranged by sequence number; using a random function, each record of the foreground and the background is discarded randomly with probability 1/num_data, and the data are then supplemented in the same way as in the num_data < num_threshold case.
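The update logic can be sketched as below, assuming the dictionary is held as numpy arrays and num_threshold = 500 (an assumed value):

```python
import numpy as np

def update_dictionary(fg_set, bg_set, fg_new, bg_new, num_threshold=500):
    """Sketch of the foreground-background dictionary update: below the
    threshold, append the current frame's features; at or above it, first
    drop each stored record with probability 1/num_data."""
    num_data = len(fg_set)
    if num_data >= num_threshold:
        keep = np.random.rand(num_data) >= 1.0 / num_data   # random discard
        fg_set, bg_set = fg_set[keep], bg_set[keep]
    fg_set = np.vstack([fg_set, fg_new])                    # supplement with
    bg_set = np.vstack([bg_set, bg_new])                    # current-frame data
    return fg_set, bg_set
```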
The above examples of the invention are merely intended to illustrate the invention clearly and do not limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; they need not, and cannot, be listed exhaustively here. Any modification, equivalent replacement or improvement made within the spirit and principle of the invention shall fall within the protection scope of the claims of the invention.
Claims (9)
1. A KCF long-term gesture tracking method fused with an SLIC algorithm is characterized by comprising the following steps:
1) constructing a gesture training data set, extracting superpixel blocks of the picture through a SLIC algorithm, and training an SVM model of the superpixel blocks in an off-line mode to obtain a coarse classification model of gesture detection;
2) extracting the foreground and the background of various gesture pictures from the gesture training data set, constructing a foreground-background dictionary, and designing a similarity function of a KNN algorithm by combining FHOG characteristics and CN characteristics so as to complete fine classification of gesture detection;
3) obtaining a gesture detection model through the coarse classification model of the gesture detection and the fine classification of the gesture detection, and detecting a target by using the gesture detection model to obtain a detection frame of the target gesture; initializing a KCF filter by using a detection box of the target gesture, and then estimating the target gesture of the next frame by using the KCF filter, wherein the KCF filter takes FHOG characteristics and CN characteristics as input;
4) estimating an optimal rectangular box of the target gesture by using a designed target scale estimator, wherein the target scale estimator adopts FHOG characteristics and CN characteristics as input;
5) determining whether the current tracking result is credible by comparing the similarity between the tracking results of the current frame and the previous frame, using a confidence function designed from a perceptual hash algorithm, the FHOG-feature cosine similarity and the color-statistics cosine similarity; if the confidence is greater than a threshold, adopting the current tracking result for the next frame and repeating steps 3) to 5); if the confidence is smaller than the threshold, abandoning the current tracking result, detecting the current frame with the gesture detection model, taking the detection result as the current tracking result, re-initializing the KCF tracker, repeating steps 3) to 5), and finally updating the foreground-background dictionary with the recognition result of the current frame.
2. The KCF long-term gesture tracking method fused with the SLIC algorithm of claim 1, wherein in step 1), extracting the superpixel blocks of the picture through the SLIC algorithm and training the SVM model of the superpixel blocks offline specifically comprises:
step 2.1) obtaining the superpixel blocks of the picture to be detected through the SLIC algorithm; assuming the current picture to be detected is the t-th frame, s(r, t) is the r-th superpixel of the t-th frame, and T_t = {X_t, Y_t, W_t, H_t} is the gesture target box in the t-th frame image, where (X_t, Y_t) is the gesture target center and (W_t, H_t) the width and height of the gesture target; marking the superpixels that overlap the target box as foreground and the others as background, so that the label of the r-th superpixel can be expressed as:

l(r, t) = +1, if s(r, t) overlaps T_t (foreground)
l(r, t) = -1, otherwise (background)

step 2.2) after the superpixels are obtained, extracting, according to the superpixel labels, the HOG feature N_VecH_s(r,t) and the color statistical feature N_VecC_s(r,t) of each superpixel block;
step 2.3) concatenating the HOG feature and the color statistical feature in series to obtain the final feature:

Vec_s(r,t) = [N_VecH_s(r,t), N_VecC_s(r,t)];

step 2.4) combining the final features and labels of the superpixels into the training sample set dataSet = {Vec_s(r,t), l(r, t)} of the SVM classifier, feeding the sample set into the SVM classifier, and training it to obtain the parameter model of the SVM classifier.
3. The KCF long-term gesture tracking method fused with the SLIC algorithm of claim 2, wherein in step 2.2), the specific process of extracting the HOG feature of each superpixel block according to the superpixel labels after the superpixels are obtained comprises:
since the numbers of pixels of different superpixel blocks may differ, assuming the number of pixels of the r-th superpixel block s(r, t) of the t-th frame is num_s(r,t), taking the number of statistic bins of the HOG feature as 18, regarding a superpixel block as one cell, and calculating the gradient of each pixel in the cell:

G_x(x, y) = I(x + 1, y) - I(x - 1, y)
G_y(x, y) = I(x, y + 1) - I(x, y - 1)
G(x, y) = sqrt( G_x(x, y)^2 + G_y(x, y)^2 )
theta(x, y) = arctan( G_y(x, y) / G_x(x, y) )

where G_x is the gradient in the horizontal direction, G_y the gradient in the vertical direction, G(x, y) the gradient magnitude within the cell and theta(x, y) its phase angle;
counting the number of pixel gradients in the cell falling into each bin, so that one superpixel block yields an 18-dimensional HOG feature vector VecH_s(r,t), normalized as follows:

N_VecH_s(r,t) = VecH_s(r,t) / ||VecH_s(r,t)|| / num_s(r,t).
4. The KCF long-term gesture tracking method fused with the SLIC algorithm of claim 2, wherein in step 2.2), before the HOG features are extracted, the gamma algorithm is used to optically correct the image, and the image is grayed.
5. The KCF long-term gesture tracking method fused with the SLIC algorithm of claim 2, wherein in step 2.2), the specific process of extracting the color statistical feature of each superpixel block according to the superpixel labels after the superpixels are obtained comprises:
for the color statistical features, the image is kept in RGB mode and the r, g, b components of the RGB image are regularly quantized into 64 bins; since the values of r, g, b in the image all lie in (0, 255):

r_div = floor(r / 64), g_div = floor(g / 64), b_div = floor(b / 64)

where floor(.) rounds down, and r_div, g_div and b_div are the block values of the r, g and b components respectively;
a statistical array count[64] is established to accumulate the 64 sections spanned by r, g and b, with the corresponding index:

index = r_div * 4 * 4 + g_div * 4 + b_div
=> count[index];

the 64-dimensional vector VecC_s(r,t) is obtained by counting the colors and normalized as follows:

N_VecC_s(r,t) = VecC_s(r,t) / ||VecC_s(r,t)|| / num_s(r,t).
6. the method for tracking the KCF long-term gesture fused with the SLIC algorithm, as claimed in claim, wherein the specific process of the step 2) is as follows:
step 3.1) extracting the foreground and the background of various gesture pictures from the gesture training data set, constructing a foreground-background dictionary, wherein the foreground data amount and the background data amount in the constructed foreground-background dictionary are equal, only the categories are divided into a foreground category and a background category, the distance between a sample to be detected and the two categories of data is calculated, and the distance function of the KNN algorithm adopts the Euclidean distance:
step 3.2) sequencing the distance between the sample to be tested and the foreground and the background according to the increasing relationship;
step 3.3) selecting K points with the minimum distance;
step 3.4) determining the occurrence frequency of the category where the front K points are located;
and 3.5) returning the category with the highest occurrence frequency in the previous K points as the prediction classification of the sample to be detected.
7. The KCF long-term gesture tracking method fused with the SLIC algorithm of claim 1, wherein in step 4), the target scale estimator adopts a one-dimensional KCF filter, which solves the following optimal filter:

ε = || Σ_{l=1..d} h^l ⋆ f^l - g ||^2 + λ Σ_{l=1..d} ||h^l||^2

where l ∈ {1, 2, ..., d} indexes the d image blocks extracted at different scales near the center of the gesture target of the previous frame picture, g is the Gaussian response function given according to the distance between each image block and the target center, h is the designed scale estimator, f the corresponding image feature and λ a penalty factor; writing the frequency responses of h, f and g as H, F and G, the above can be solved to give the scale estimator:

H^l = ( conj(G) F^l ) / ( Σ_{k=1..d} conj(F^k) F^k + λ )

where F is the frequency response of the image feature f and conj(F) its conjugate, H the frequency response of the scale estimator h and conj(H) its conjugate, λ the penalty factor, d the number of extracted image blocks and l ∈ {1, 2, ..., d}.
8. The KCF long-term gesture tracking method fused with the SLIC algorithm of claim 1, wherein in step 5), the specific process of the confidence function designed by combining the perceptual hash algorithm, the FHOG-feature cosine similarity and the color-statistics cosine similarity is as follows:
step 4.1) inputting two pictures, obtaining the similarity hashSimilar through the perceptual hash algorithm, calculating the FHOG features to obtain the FHOG cosine similarity fhogCosSimilar, and calculating the color statistical features to obtain the color cosine similarity colorCosSimilar;
step 4.2) calculating the similarity of the two pictures according to certain weights:
similar = α1 × hashSimilar + α2 × fhogCosSimilar + α3 × colorCosSimilar.
9. The KCF long-term gesture tracking method fused with the SLIC algorithm of claim 8, wherein in step 5), the specific process of updating the foreground-background dictionary using the current frame recognition result is as follows:
step 5.1) the foreground-background dictionary stores the FHOG and CN feature vectors of the gesture target and of the background picture, with equal amounts of the two types; assuming the amount of data in the foreground-background dictionary is num_data, a quantity threshold num_threshold is set;
step 5.2) if num_data < num_threshold, the target gesture picture in the current frame is cut out using the tracking or detection result, resized to 256 x 256, and its FHOG and CN features are extracted and stored in the foreground data set; background pictures outside the target gesture are intercepted with an extraction box of the same size as the recognition result, resized to 256 x 256, and their FHOG and CN features are extracted and stored in the background data set;
step 5.3) if num_data ≥ num_threshold, the data stored in the foreground-background dictionary are arranged by sequence number; using a random function, each record of the foreground and the background is randomly discarded with probability 1/num_data, and the data are then supplemented in the manner of step 5.2) for num_data < num_threshold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910184848.7A CN110008844B (en) | 2019-03-12 | 2019-03-12 | KCF long-term gesture tracking method fused with SLIC algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910184848.7A CN110008844B (en) | 2019-03-12 | 2019-03-12 | KCF long-term gesture tracking method fused with SLIC algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110008844A true CN110008844A (en) | 2019-07-12 |
CN110008844B CN110008844B (en) | 2023-07-21 |
Family
ID=67166900
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910184848.7A Active CN110008844B (en) | 2019-03-12 | 2019-03-12 | KCF long-term gesture tracking method fused with SLIC algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110008844B (en) |
- 2019-03-12: Application CN201910184848.7A filed in China; granted as CN110008844B (status: Active)
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160342837A1 (en) * | 2015-05-19 | 2016-11-24 | Toyota Motor Engineering & Manufacturing North America, Inc. | Apparatus and method for object tracking |
CN105825502A (en) * | 2016-03-12 | 2016-08-03 | 浙江大学 | Saliency-guidance-based weak supervision image analysis method of dictionary learning |
WO2018045626A1 (en) * | 2016-09-07 | 2018-03-15 | 深圳大学 | Super-pixel level information fusion-based hyperspectral image classification method and system |
CN107123130A (en) * | 2017-03-06 | 2017-09-01 | 华南理工大学 | Kernel correlation filtering target tracking method based on superpixel and hybrid hash |
CN107527054A (en) * | 2017-09-19 | 2017-12-29 | 西安电子科技大学 | Prospect extraction method based on various visual angles fusion |
CN108876818A (en) * | 2018-06-05 | 2018-11-23 | 国网辽宁省电力有限公司信息通信分公司 | A kind of method for tracking target based on like physical property and correlation filtering |
CN109034193A (en) * | 2018-06-20 | 2018-12-18 | 上海理工大学 | Multiple features fusion and dimension self-adaption nuclear phase close filter tracking method |
Non-Patent Citations (4)
Title |
---|
LIU Yuqing et al.: "Online discriminative superpixel tracking algorithm", Journal of Xidian University (in Chinese) *
KE Junmin et al.: "Long-term kernelized correlation filter tracking algorithm fused with color features", Computer Systems & Applications (in Chinese) *
FAN Wenbing et al.: "Adaptive correlation filter tracking algorithm with multi-feature fusion", Computer Engineering and Applications (in Chinese) *
HAO Shaohua et al.: "Kernelized correlation target tracking algorithm based on candidate region detection", Video Engineering (in Chinese) *
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110807473A (en) * | 2019-10-12 | 2020-02-18 | 浙江大华技术股份有限公司 | Target detection method, device and computer storage medium |
CN110807473B (en) * | 2019-10-12 | 2023-01-03 | 浙江大华技术股份有限公司 | Target detection method, device and computer storage medium |
CN111292355A (en) * | 2020-02-12 | 2020-06-16 | 江南大学 | Nuclear correlation filtering multi-target tracking method fusing motion information |
CN112926693A (en) * | 2021-04-12 | 2021-06-08 | 辽宁工程技术大学 | Kernel correlation filtering algorithm for fast motion and motion blur |
CN112926693B (en) * | 2021-04-12 | 2024-05-24 | 辽宁工程技术大学 | Nuclear related filtering method for fast motion and motion blur |
CN112991394A (en) * | 2021-04-16 | 2021-06-18 | 北京京航计算通讯研究所 | KCF target tracking method based on cubic spline interpolation and Markov chain |
CN112991394B (en) * | 2021-04-16 | 2024-01-19 | 北京京航计算通讯研究所 | KCF target tracking method based on cubic spline interpolation and Markov chain |
CN113608618A (en) * | 2021-08-11 | 2021-11-05 | 兰州交通大学 | Hand region tracking method and system |
CN114821764A (en) * | 2022-01-25 | 2022-07-29 | 哈尔滨工程大学 | Gesture image recognition method and system based on KCF tracking detection |
Also Published As
Publication number | Publication date |
---|---|
CN110008844B (en) | 2023-07-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110008844B (en) | KCF long-term gesture tracking method fused with SLIC algorithm | |
CN108470354B (en) | Video target tracking method and device and implementation device | |
CN107633226B (en) | Human body motion tracking feature processing method | |
CN111898547A (en) | Training method, device and equipment of face recognition model and storage medium | |
CN109522908A (en) | Image significance detection method based on area label fusion | |
US9349194B2 (en) | Method for superpixel life cycle management | |
CN106228121B (en) | Gesture feature recognition method and device | |
Zhang et al. | Road recognition from remote sensing imagery using incremental learning | |
CN107688829A (en) | A kind of identifying system and recognition methods based on SVMs | |
CN110866896A (en) | Image saliency target detection method based on k-means and level set super-pixel segmentation | |
WO2019007253A1 (en) | Image recognition method, apparatus and device, and readable medium | |
CN108509925B (en) | Pedestrian re-identification method based on visual bag-of-words model | |
CN110751027B (en) | Pedestrian re-identification method based on deep multi-instance learning | |
CN111583279A (en) | Super-pixel image segmentation method based on PCBA | |
CN109241816B (en) | Image re-identification system based on label optimization and loss function determination method | |
CN105550641B (en) | Age estimation method and system based on multi-scale linear differential texture features | |
CN113888586B (en) | Target tracking method and device based on correlation filtering | |
Etezadifar et al. | A new sample consensus based on sparse coding for improved matching of SIFT features on remote sensing images | |
CN108428220A (en) | Satellite sequence remote sensing image sea island reef region automatic geometric correction method | |
CN112329784A (en) | Correlation filtering tracking method based on space-time perception and multimodal response | |
CN116704490B (en) | License plate recognition method, license plate recognition device and computer equipment | |
CN108921872B (en) | Robust visual target tracking method suitable for long-range tracking | |
WO2015146113A1 (en) | Identification dictionary learning system, identification dictionary learning method, and recording medium | |
CN114387592B (en) | Character positioning and identifying method under complex background | |
CN110827327B (en) | Fusion-based long-term target tracking method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |