
CN102013103A - Method for dynamically tracking lip in real time - Google Patents

Method for dynamically tracking lip in real time

Info

Publication number
CN102013103A
CN102013103A · CN 201010571128 · CN201010571128A
Authority
CN
China
Prior art keywords
lip
pixel points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 201010571128
Other languages
Chinese (zh)
Other versions
CN102013103B (en)
Inventor
Wang Shilin (王士林)
Li Jianhua (李建华)
Liu Gongshen (刘功申)
Li Xiang (李翔)
Li Shenghong (李生红)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN 201010571128 (granted as CN102013103B)
Publication of CN102013103A
Application granted
Publication of CN102013103B
Legal status: Expired - Fee Related
Anticipated expiration

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a method in the technical field of image processing and pattern recognition, in particular a method for dynamically tracking lips in real time. The method comprises the following steps: step one, capturing an image sequence containing the lip region with a digital video camera (DV); step two, classifying every pixel in each image as a lip pixel or a non-lip pixel using a continuous-image lip segmentation method based on fuzzy clustering and Kalman prediction, and outputting the probability that each pixel belongs to the lip class; and step three, obtaining the lip contour in each frame of the lip image sequence from the lip probability map produced in step two, using a 14-point dynamic shape model and Kalman prediction. The method automatically tracks lip movement in an image sequence and offers both high processing speed (ensuring real-time operation) and high recognition accuracy.

Description

Real-time dynamic lip tracking method
Technical Field
The invention relates to a method in the technical field of image processing and pattern recognition, in particular to a real-time dynamic lip tracking method.
Background
In recent years, automatic speech recognition (ASR) technology has advanced greatly and has produced a series of mature products that achieve good recognition results in environments with a high signal-to-noise ratio. However, the performance of these systems is often limited by the level of background noise, and their results are frequently unsatisfactory in heavily noisy environments such as cars, factories and airports. Accordingly, more and more researchers have looked for ways to improve speech recognition using sources other than audio. The McGurk effect reveals an inseparable intrinsic relationship between the audio and visual information produced while a person speaks. It is therefore natural to aid the understanding of speech by introducing visual information about lip movements; this type of speech recognition system is called an automatic lip reading system. In such a system, one of the first and most critical steps is to acquire the lip motion from the video accurately and rapidly, i.e. a real-time lip tracking method. Its accuracy and reliability often directly determine the performance of the lip reading system.
A search of the prior art found "Lip region detection and tracking" (lip detection and tracking), published at the 11th International Conference on Image Analysis and Processing, pages 8-13, which adopts the strength of luminance edges as the criterion for detecting the lip contour and converges the lip boundary to the strongest edge by an iterative method, while constraints from a reasonable lip model ensure that the final contour remains plausible. The technique has the following defects: first, it is a lip tracking technique for grayscale (luminance) images and, lacking chrominance information, is strongly affected by illumination conditions; second, it relies on the luminance edges of the lip image, yet edge information depends on image contrast, and unpainted lips tend to have low contrast, which makes the edge information unstable. For these two reasons, the accuracy and robustness of the technique leave room for improvement.
Disclosure of Invention
The invention aims to overcome the above defects of the prior art and provides a real-time dynamic lip tracking method that acquires and tracks the lip movement of a speaker, achieving high matching accuracy while maintaining real-time processing speed.
The invention is realized by the following technical scheme:
the invention comprises the following steps:
Step one: capture an image sequence containing the lip region with a digital camera. The color space produced by common digital cameras is RGB, which is not a perceptually uniform color space matched to human color-difference vision, so it needs to be converted to the CIE-LAB uniform color space as follows:
\[
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} =
\begin{bmatrix} 0.490 & 0.310 & 0.200 \\ 0.177 & 0.813 & 0.011 \\ 0.000 & 0.010 & 0.990 \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\]
\[
L^* = \begin{cases} 116\,(Y')^{1/3} - 16 & \text{if } Y' > 0.008856 \\ 903.3\,Y' & \text{otherwise} \end{cases}
\]
\[
a^* = 500\,(K_1^{1/3} - K_2^{1/3})
\]
\[
b^* = 200\,(K_2^{1/3} - K_3^{1/3})
\]
wherein
\[
K_i = \begin{cases} \Phi_i & \text{if } \Phi_i > 0.008856 \\ 7.787\,\Phi_i + 16/116 & \text{otherwise} \end{cases}
\]
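For reference, the conversion can be sketched in Python as below; the text does not define Y' or Φ_i explicitly, so the sketch assumes they are the X, Y, Z tristimulus values normalized by a reference white point (with Φ_1, Φ_2, Φ_3 corresponding to normalized X, Y, Z), which is the usual CIE-LAB convention.

```python
import numpy as np

# Sketch of the RGB -> CIE-LAB conversion described above (single pixel).
# Assumption: Y' and Phi_1..Phi_3 are the X, Y, Z tristimulus values
# normalized by a reference white point; the text itself does not spell this out.

M = np.array([[0.490, 0.310, 0.200],
              [0.177, 0.813, 0.011],
              [0.000, 0.010, 0.990]])

WHITE = M @ np.ones(3)  # assumed reference white: the XYZ of R = G = B = 1

def rgb_to_lab(rgb):
    """Convert one RGB pixel with components in [0, 1] to (L*, a*, b*)."""
    xyz = M @ np.asarray(rgb, dtype=float)
    phi = xyz / WHITE                    # assumed normalization Phi_i = XYZ_i / white_i
    y_prime = phi[1]

    # L* from Y'
    if y_prime > 0.008856:
        L = 116.0 * y_prime ** (1.0 / 3.0) - 16.0
    else:
        L = 903.3 * y_prime

    # K_i as defined in the text
    def k(p):
        return p if p > 0.008856 else 7.787 * p + 16.0 / 116.0

    a = 500.0 * (k(phi[0]) ** (1.0 / 3.0) - k(phi[1]) ** (1.0 / 3.0))
    b = 200.0 * (k(phi[1]) ** (1.0 / 3.0) - k(phi[2]) ** (1.0 / 3.0))
    return L, a, b
```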
Step two: using a continuous-image lip segmentation method based on fuzzy clustering and Kalman prediction, classify every pixel in the image as a lip pixel or a non-lip pixel, and output the probability that each pixel belongs to the lip class. The specific method is as follows:
For an N × M image I, let X = {x_{1,1}, …, x_{r,s}, …, x_{N,M}} denote the set of color information of all pixels in the image, where x_{r,s} ∈ R^q is the color feature of the pixel at coordinates (r, s). In addition, let d_{i,r,s} be the Euclidean distance between the color feature x_{r,s} and the i-th color center v_i (i = 0 denotes the lip class, i = 1 the non-lip class). The objective function of the fuzzy-clustering-based lip segmentation algorithm is then:
\[
J = \sum_{r=1}^{N}\sum_{s=1}^{M}\sum_{i=0}^{1} u_{i,r,s}^{m}\left(d_{i,r,s}^{2} + gs(i,r,s,p)\right)
\]
subject to
\[
\sum_{i=0}^{1} u_{i,r,s} = 1, \qquad \forall (r,s)\in I.
\]
Here U denotes the fuzzy membership matrix (i.e. the probability that a pixel belongs to a given class), and gs is a positional penalty function: it strengthens the lip membership of pixels inside the lip region and weakens the lip membership of pixels outside it.
Throughout the lip segmentation process, the optimal membership matrix that minimizes the objective function is obtained iteratively by gradient descent. The Kalman prediction of the color centers and of the lip spatial position serves to predict the lip/non-lip color centers and the lip spatial position of the current frame from those of the previous several frames. The final output is the probability that each pixel in the image belongs to the lip class, namely u_{0,r,s}.
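A minimal Python sketch of one way to carry out this segmentation is shown below. The patent only states that the minimum of the objective is found iteratively; the fuzzy-c-means-style alternating update used here, and the array layout and initialization, are assumptions made purely for illustration.

```python
import numpy as np

def fuzzy_lip_segmentation(features, gs_penalty, m=2.0, n_iter=20):
    """Illustrative segmentation of an N x M image into lip / non-lip classes.

    features   : (N, M, q) array of per-pixel color features (e.g. L*a*b*).
    gs_penalty : (2, N, M) array holding the positional penalty gs(i, r, s, p).
    Returns u  : (2, N, M) membership maps; u[0] is the lip-class probability.

    The patent only says the objective is minimized iteratively; the
    fuzzy-c-means-style alternating update below (closed-form memberships,
    membership-weighted color centers) is an assumed concrete realization.
    """
    N, M, q = features.shape
    X = features.reshape(-1, q)
    rng = np.random.default_rng(0)
    centers = X[rng.choice(len(X), size=2, replace=False)]  # lip / non-lip color centers

    for _ in range(n_iter):
        # squared color distances to each center plus the positional penalty
        d2 = ((X[None, :, :] - centers[:, None, :]) ** 2).sum(-1)     # (2, N*M)
        cost = np.maximum(d2 + gs_penalty.reshape(2, -1), 0.0) + 1e-12
        # closed-form membership update under the sum-to-one constraint
        inv = cost ** (-1.0 / (m - 1.0))
        u = inv / inv.sum(axis=0, keepdims=True)
        # update color centers as membership-weighted means
        w = u ** m
        centers = (w @ X) / w.sum(axis=1, keepdims=True)

    return u.reshape(2, N, M)
```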
The Kalman prediction is based on the state-space model:
\[
x_k = A x_{k-1} + w_{k-1}
\]
\[
z_k = H x_k + v_k
\]
where x_k is the current state and w_{k-1} is the noise of the state transition; A is the state transition matrix; z_k is the measurement at the current time (i.e. the color centers and the lip spatial position parameters); v_k is the measurement error, and H is the measurement matrix. The state transition noise and the measurement noise are generally assumed to follow normal distributions: p(w) ~ N(0, Q), p(v) ~ N(0, R). Kalman filter prediction is computed as an iterative, recursive process, as follows:
1) initialize the initial state and the initial estimation error covariance;
2) predict the current state from the state of the previous step, and obtain a predicted measurement from the predicted state through the measurement matrix H; after Kalman-filter correction this measurement is the required result;
3) correct the system model according to the currently observed measurement, feeding the final measurement output of the current frame into the correction step;
4) repeat steps 2) and 3) until the last frame of the lip sequence (a minimal sketch of this recursion is given after this list).
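A minimal sketch of this predict/correct recursion follows. The layout of the state vector and the concrete matrices A, H, Q and R are not specified in the text and are left as choices for the caller.

```python
import numpy as np

class KalmanPredictor:
    """Linear Kalman filter for x_k = A x_{k-1} + w, z_k = H x_k + v,
    with p(w) ~ N(0, Q) and p(v) ~ N(0, R).

    In this method the state would carry the lip/non-lip color centers and
    the lip spatial position parameters; the dimensions and the concrete
    A, H, Q, R are not fixed by the text and are supplied by the caller.
    """

    def __init__(self, A, H, Q, R, x0, P0):
        self.A, self.H, self.Q, self.R = A, H, Q, R
        self.x, self.P = x0, P0          # step 1): initial state and error covariance

    def predict(self):
        # step 2): propagate the state and form the predicted measurement
        self.x = self.A @ self.x
        self.P = self.A @ self.P @ self.A.T + self.Q
        return self.H @ self.x

    def correct(self, z):
        # step 3): fold the measurement observed in the current frame back in
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(len(self.x)) - K @ self.H) @ self.P
        return self.x

# Step 4) then amounts to calling predict() and correct() once per frame
# until the last frame of the lip sequence.
```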
Step three: using a 14-point dynamic shape model and Kalman prediction, obtain the lip contour in every frame of the lip image sequence from the lip probability map produced in step two. The details are as follows:
the objective function is defined as:
\[
\max\left\{\, C(\lambda_p) = \prod_{(x,y)\in R_l(\lambda_p)} prob_l(x,y) \prod_{(x,y)\in R_{nl}(\lambda_p)} prob_{nl}(x,y) \,\right\}
\]
where λ_p is the 14-point lip contour parameter, R_l(λ_p) is the lip region and R_nl(λ_p) is the non-lip region; prob_l is the lip-class probability and prob_nl is the non-lip-class probability. The final lip contour model λ_p is obtained through iterative search. The Kalman prediction serves to predict the initial lip model of the current frame from the lip contour points of the previous several frames; the procedure is the same as described in step two, except that the measurement is the 14-point lip contour coordinates.
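For illustration, the objective can be evaluated for a single candidate contour as sketched below. The text does not fix how the 14 points delimit R_l and R_nl; the sketch assumes R_l is the polygon enclosed by the points, R_nl is its complement within the image, prob_nl = 1 - prob_l (consistent with the membership constraint of step two), and the product over pixels is computed in the log domain for numerical stability.

```python
import numpy as np
from matplotlib.path import Path

def contour_log_likelihood(lip_prob, contour_pts):
    """Evaluate the contour objective C(lambda_p) for one candidate contour.

    lip_prob    : (H, W) lip-class probability map u_{0,r,s} from step two.
    contour_pts : (14, 2) array of (x, y) contour coordinates, lambda_p.

    Assumptions (not fixed by the text): R_l is the polygon enclosed by the
    14 points, R_nl is its complement within the image, prob_nl = 1 - prob_l,
    and the product over pixels is taken in the log domain for stability.
    """
    H, W = lip_prob.shape
    ys, xs = np.mgrid[0:H, 0:W]
    inside = Path(contour_pts).contains_points(
        np.column_stack([xs.ravel(), ys.ravel()])).reshape(H, W)

    eps = 1e-12
    log_lip = np.log(lip_prob + eps)
    log_nonlip = np.log(1.0 - lip_prob + eps)
    return log_lip[inside].sum() + log_nonlip[~inside].sum()
```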
Compared with the prior art, the invention has the following beneficial effects: exploiting the facts that lip images have low contrast, that lip pixels tend to cluster spatially, and that a lip sequence is continuous in the time domain, a new lip segmentation and lip contour extraction method is adopted, which outperforms traditional luminance-edge-based methods in both accuracy and robustness. Extensive experiments show that the method tracks and extracts the lip contour accurately while remaining real-time (processing more than 30 frames per second).
Drawings
FIG. 1 is a work flow diagram of the method of the present invention.
Detailed Description
The embodiments of the present invention will be described in detail below with reference to the accompanying drawings: the present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the protection scope of the present invention is not limited to the following embodiments.
As shown in fig. 1, the present embodiment includes the following steps:
First, a lip image sequence containing the lip region is acquired with a digital camera at a frame rate of 24 frames per second; each frame is an RGB image with a resolution of 220 × 180. The color space is then converted to the CIE-LAB uniform color space as follows:
\[
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} =
\begin{bmatrix} 0.490 & 0.310 & 0.200 \\ 0.177 & 0.813 & 0.011 \\ 0.000 & 0.010 & 0.990 \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\]
\[
L^* = \begin{cases} 116\,(Y')^{1/3} - 16 & \text{if } Y' > 0.008856 \\ 903.3\,Y' & \text{otherwise} \end{cases}
\]
\[
a^* = 500\,(K_1^{1/3} - K_2^{1/3})
\]
\[
b^* = 200\,(K_2^{1/3} - K_3^{1/3})
\]
wherein
\[
K_i = \begin{cases} \Phi_i & \text{if } \Phi_i > 0.008856 \\ 7.787\,\Phi_i + 16/116 & \text{otherwise} \end{cases}
\]
Second, using a continuous-image lip segmentation method based on fuzzy clustering and Kalman prediction, every pixel in the image is classified as a lip pixel or a non-lip pixel, and the probability that each pixel belongs to the lip class is output. The specific method is as follows:
For the 220 × 180 image I, let X = {x_{1,1}, …, x_{r,s}, …, x_{N,M}} denote the set of color information of all pixels in the image, where x_{r,s} ∈ R^q is the three-dimensional Lab color feature of the pixel at coordinates (r, s). In addition, let d_{i,r,s} be the Euclidean distance between the color feature x_{r,s} and the i-th color center v_i (i = 0 denotes the lip class, i = 1 the non-lip class). The objective function of the fuzzy-clustering-based lip segmentation algorithm is then:
\[
J = \sum_{r=1}^{N}\sum_{s=1}^{M}\sum_{i=0}^{1} u_{i,r,s}^{m}\left(d_{i,r,s}^{2} + gs(i,r,s,p)\right)
\]
subject to
\[
\sum_{i=0}^{1} u_{i,r,s} = 1, \qquad \forall (r,s)\in I.
\]
Here U denotes the fuzzy membership matrix (i.e. the probability that a pixel belongs to a given class), and gs is a positional penalty function: it strengthens the lip membership of pixels inside the lip region and weakens the lip membership of pixels outside it. Over the whole lip segmentation process, the optimal membership matrix that minimizes the objective function is obtained iteratively by gradient descent.
The role of the Kalman prediction is to predict the lip/non-lip color centers and the lip spatial position of the current frame from the color centers and lip spatial positions of the previous several frames. The final output is the probability that each pixel in the image belongs to the lip class, namely u_{0,r,s}.
The Kalman prediction of the color centers and of the lip spatial position is based on the state-space model:
\[
x_k = A x_{k-1} + w_{k-1}
\]
\[
z_k = H x_k + v_k
\]
where x_k is the current state and w_{k-1} is the noise of the state transition; A is the state transition matrix; z_k is the measurement at the current time (i.e. the color centers and the lip spatial position parameters); v_k is the measurement error, and H is the measurement matrix. The state transition noise and the measurement noise are generally assumed to follow normal distributions: p(w) ~ N(0, Q), p(v) ~ N(0, R). Kalman filter prediction is computed as an iterative, recursive process, as follows:
1) initialize the initial state and the initial estimation error covariance;
2) predict the current state from the state of the previous step, and obtain a predicted measurement from the predicted state through the measurement matrix H; after Kalman-filter correction this measurement is the required result;
3) correct the system model according to the currently observed measurement, feeding the final measurement output of the current frame into the correction step;
4) repeat steps 2) and 3) until the last frame of the lip sequence.
Third, using a 14-point dynamic shape model and Kalman prediction, the lip contour in every frame of the lip image sequence is obtained from the lip probability map produced in the second step, specifically as follows:
the objective function is defined as:
\[
\max\left\{\, C(\lambda_p) = \prod_{(x,y)\in R_l(\lambda_p)} prob_l(x,y) \prod_{(x,y)\in R_{nl}(\lambda_p)} prob_{nl}(x,y) \,\right\}
\]
where λ_p is the 14-point lip contour parameter, R_l(λ_p) is the lip region and R_nl(λ_p) is the non-lip region; prob_l is the lip-class probability and prob_nl is the non-lip-class probability. The final lip contour model λ_p is obtained through iterative search. The Kalman prediction serves to predict the initial lip model of the current frame from the lip contour points of the previous several frames; the procedure is the same as described in the second step, except that the measurement is the 14-point lip contour coordinates.
The iterative search proceeds as follows:
firstly, initializing a 14-point lip model lambda by using lip class probability distribution obtained by a lip image segmentation algorithmp
Second, according to the objective function, the displacement of each lip contour point is computed and the contour point positions are updated:
\[
\Delta\lambda_p = \{dx_i, dy_i\} = \left\{ -\frac{\partial C}{\partial x_i},\; -\frac{\partial C}{\partial y_i} \right\}, \qquad i = 0, 1, \ldots, 13
\]
\[
\lambda_{p,\mathrm{new}} = \lambda_{p,\mathrm{old}} + w\,\Delta\lambda_p
\]
where w is the step size of each offset, set to 0.05 in this embodiment.
Third, the second step is repeated until the objective function converges; a sketch of this update loop is given below.
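A minimal sketch of this refinement loop follows; the patent does not state how the partial derivatives ∂C/∂x_i and ∂C/∂y_i are evaluated, so central finite differences are assumed, and the objective C is passed in as a callable (for example, a log-domain evaluation of the step-three objective over the probability map).

```python
import numpy as np

def refine_contour(C, points, w=0.05, n_iter=100, delta=0.5, tol=1e-6):
    """Iterative refinement of the 14-point lip contour.

    C      : callable mapping a (14, 2) array of contour points to the scalar
             objective (for example, a log-domain evaluation of C(lambda_p)
             over the lip probability map).
    points : (14, 2) initial contour taken from the segmentation result.
    w      : step size per offset (0.05 in this embodiment).

    The partial derivatives dC/dx_i and dC/dy_i are approximated by central
    finite differences with spacing `delta`, which is an assumption; the sign
    of the displacement follows the formula given above.
    """
    points = np.array(points, dtype=float)
    prev = C(points)
    for _ in range(n_iter):
        grad = np.zeros_like(points)
        for i in range(points.shape[0]):        # the 14 contour points
            for j in range(2):                  # x and y coordinates
                shift = np.zeros_like(points)
                shift[i, j] = delta
                grad[i, j] = (C(points + shift) - C(points - shift)) / (2.0 * delta)
        points += w * (-grad)                   # displacement Delta(lambda_p)
        cur = C(points)
        if abs(cur - prev) < tol:               # stop when the objective converges
            break
        prev = cur
    return points
```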
In tests on more than 2000 speech sequences from 50 speakers, the method of this embodiment tracked the lip contour accurately while maintaining a processing speed above 30 frames per second.

Claims (7)

1. A real-time dynamic lip tracking method is characterized by comprising the following steps:
step one, shooting and acquiring an image sequence including a lip region through a digital camera;
step two, dividing all pixel points in the image into lip pixel points or non-lip pixel points by a continuous image lip segmentation method based on fuzzy clustering and Kalman prediction, and outputting the probability that each pixel point belongs to the lip pixel points;
and step three, acquiring the lip contour in each frame in the lip image sequence on the basis of the lip probability distribution map provided in the step two through a 14-point dynamic shape model and Kalman prediction.
2. The real-time dynamic lip tracking method according to claim 1, wherein when the color space collected by the digital camera is RGB color space, it is converted into CIE-LAB uniform color space, specifically as follows:
\[
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} =
\begin{bmatrix} 0.490 & 0.310 & 0.200 \\ 0.177 & 0.813 & 0.011 \\ 0.000 & 0.010 & 0.990 \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\]
\[
L^* = \begin{cases} 116\,(Y')^{1/3} - 16 & \text{if } Y' > 0.008856 \\ 903.3\,Y' & \text{otherwise} \end{cases}
\]
\[
a^* = 500\,(K_1^{1/3} - K_2^{1/3})
\]
\[
b^* = 200\,(K_2^{1/3} - K_3^{1/3})
\]
wherein
\[
K_i = \begin{cases} \Phi_i & \text{if } \Phi_i > 0.008856 \\ 7.787\,\Phi_i + 16/116 & \text{otherwise} \end{cases}.
\]
3. the method of claim 1, wherein the segmentation method comprises:
for an N × M image I, let X = {x_{1,1}, …, x_{r,s}, …, x_{N,M}} denote the color information set of all pixel points in the image, where x_{r,s} ∈ R^q represents the color feature of the pixel point located at coordinates (r, s);
in addition, let d_{i,r,s} be the Euclidean distance between the color feature x_{r,s} and the i-th color center v_i, wherein i = 0 denotes the lip class and i = 1 the non-lip class;
finally, the whole lip segmentation algorithm target function based on the fuzzy clustering technology is as follows:
\[
J = \sum_{r=1}^{N}\sum_{s=1}^{M}\sum_{i=0}^{1} u_{i,r,s}^{m}\left(d_{i,r,s}^{2} + gs(i,r,s,p)\right);
\]
subject to
\[
\sum_{i=0}^{1} u_{i,r,s} = 1, \qquad \forall (r,s)\in I;
\]
here U denotes the fuzzy membership matrix, namely the probability that a pixel belongs to a given class, and the gs function is a positional penalty function, namely the lip-class membership of pixels inside the lip region is strengthened and the lip membership of pixels outside the lip region is weakened.
4. The real-time dynamic lip tracking method according to claim 1, wherein the probability of the lip pixel points is obtained as follows: throughout the lip segmentation process, the optimal membership matrix that minimizes the objective function is obtained iteratively by gradient descent; the Kalman prediction of the color centers and of the lip spatial position predicts the lip/non-lip color centers and the lip spatial position of the current frame from those of the previous several frames; and the final output is the probability that each pixel point in the image belongs to the lip class, namely u_{0,r,s}.
5. The real-time dynamic lip tracking method of claim 4, wherein the Kalman prediction is:
\[
x_k = A x_{k-1} + w_{k-1}
\]
\[
z_k = H x_k + v_k
\]
wherein x_k denotes the current state, w_{k-1} denotes the noise of the state transition, and A is the state transition matrix; z_k denotes the measurement at the current time, namely the color centers and the lip spatial position parameters; v_k denotes the measurement error, and H is the measurement matrix; the state transition noise and the measurement noise follow normal distributions: p(w) ~ N(0, Q); p(v) ~ N(0, R).
6. The method of claim 4, wherein the Kalman prediction calculation is an iterative recursive process, as follows:
1) initializing an initial state and initial estimation error covariance;
2) predicting the current state from the state of the previous step, and obtaining a predicted measurement from the predicted state through the measurement matrix H, the measurement after Kalman-filter correction being the required result;
3) correcting the system model according to the currently observed measurement, the final measurement output of the current frame being fed into the correction step;
4) repeating steps 2) and 3) until the last frame of the lip sequence.
7. The real-time dynamic lip tracking method according to claim 1, wherein the lip contour in each frame of the acquired lip image sequence is obtained with the objective function defined as:
\[
\max\left\{\, C(\lambda_p) = \prod_{(x,y)\in R_l(\lambda_p)} prob_l(x,y) \prod_{(x,y)\in R_{nl}(\lambda_p)} prob_{nl}(x,y) \,\right\}
\]
wherein: λ_p is the 14-point lip contour parameter, R_l is the lip region and R_nl is the non-lip region;
prob_l is the lip-class probability and prob_nl is the non-lip-class probability;
the final lip contour model λ_p is obtained through iterative search;
the Kalman prediction is used to predict the initial lip model of the current frame from the lip contour points of the previous several frames, the method differing from that of step two only in that the measurement is the 14-point lip contour coordinate values.
CN 201010571128 2010-12-03 2010-12-03 Method for dynamically tracking lip in real time Expired - Fee Related CN102013103B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010571128 CN102013103B (en) 2010-12-03 2010-12-03 Method for dynamically tracking lip in real time

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201010571128 CN102013103B (en) 2010-12-03 2010-12-03 Method for dynamically tracking lip in real time

Publications (2)

Publication Number Publication Date
CN102013103A 2011-04-13
CN102013103B CN102013103B (en) 2013-04-03

Family

ID=43843267

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010571128 Expired - Fee Related CN102013103B (en) 2010-12-03 2010-12-03 Method for dynamically tracking lip in real time

Country Status (1)

Country Link
CN (1) CN102013103B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102303614A (en) * 2011-07-29 2012-01-04 秦皇岛港股份有限公司 Automatic detection and alarm device of train dead hook
CN103914699A (en) * 2014-04-17 2014-07-09 厦门美图网科技有限公司 Automatic lip gloss image enhancement method based on color space
CN104409075A (en) * 2014-11-28 2015-03-11 深圳创维-Rgb电子有限公司 Voice identification method and system
CN104766316A (en) * 2015-03-31 2015-07-08 复旦大学 Novel lip segmentation algorithm for traditional Chinese medical inspection diagnosis
CN106446800A (en) * 2016-08-31 2017-02-22 北京云图微动科技有限公司 Tooth identification method, device and system
CN106683065A (en) * 2012-09-20 2017-05-17 上海联影医疗科技有限公司 Lab space based image fusing method
CN106778770A (en) * 2016-11-23 2017-05-31 河池学院 A kind of image-recognizing method of Visual intelligent robot
CN107369449A (en) * 2017-07-14 2017-11-21 上海木爷机器人技术有限公司 A kind of efficient voice recognition methods and device
CN107811735A (en) * 2017-10-23 2018-03-20 广东工业大学 One kind auxiliary eating method, system, equipment and computer-readable storage medium
CN108596992A (en) * 2017-12-31 2018-09-28 广州二元科技有限公司 A kind of quickly real-time lip gloss cosmetic method
CN109816741A (en) * 2017-11-22 2019-05-28 北京展讯高科通信技术有限公司 A kind of generation method and system of adaptive virtual lip gloss

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6219639B1 (en) * 1998-04-28 2001-04-17 International Business Machines Corporation Method and apparatus for recognizing identity of individuals employing synchronized biometrics
CN101046959A (en) * 2007-04-26 2007-10-03 上海交通大学 Identity identification method based on lid speech characteristic
US20080273116A1 (en) * 2005-09-12 2008-11-06 Nxp B.V. Method of Receiving a Multimedia Signal Comprising Audio and Video Frames

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6219639B1 (en) * 1998-04-28 2001-04-17 International Business Machines Corporation Method and apparatus for recognizing identity of individuals employing synchronized biometrics
US20080273116A1 (en) * 2005-09-12 2008-11-06 Nxp B.V. Method of Receiving a Multimedia Signal Comprising Audio and Video Frames
CN101046959A (en) * 2007-04-26 2007-10-03 上海交通大学 Identity identification method based on lid speech characteristic

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Chan, M.T. et al., "Real-time lip tracking and bimodal continuous speech recognition," 1998 IEEE Second Workshop on Multimedia Signal Processing, 1998-12-09, pages 65-70 (relevant to claims 1-7) *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102303614A (en) * 2011-07-29 2012-01-04 秦皇岛港股份有限公司 Automatic detection and alarm device of train dead hook
CN106683065A (en) * 2012-09-20 2017-05-17 上海联影医疗科技有限公司 Lab space based image fusing method
CN103914699B (en) * 2014-04-17 2017-09-19 厦门美图网科技有限公司 A kind of method of the image enhaucament of the automatic lip gloss based on color space
CN103914699A (en) * 2014-04-17 2014-07-09 厦门美图网科技有限公司 Automatic lip gloss image enhancement method based on color space
CN104409075A (en) * 2014-11-28 2015-03-11 深圳创维-Rgb电子有限公司 Voice identification method and system
CN104766316A (en) * 2015-03-31 2015-07-08 复旦大学 Novel lip segmentation algorithm for traditional Chinese medical inspection diagnosis
CN104766316B (en) * 2015-03-31 2017-11-17 复旦大学 New lip partitioning algorithm in tcm inspection
CN106446800A (en) * 2016-08-31 2017-02-22 北京云图微动科技有限公司 Tooth identification method, device and system
CN106446800B (en) * 2016-08-31 2019-04-02 北京贝塔科技股份有限公司 Tooth recognition methods, apparatus and system
CN106778770A (en) * 2016-11-23 2017-05-31 河池学院 A kind of image-recognizing method of Visual intelligent robot
CN107369449A (en) * 2017-07-14 2017-11-21 上海木爷机器人技术有限公司 A kind of efficient voice recognition methods and device
CN107811735A (en) * 2017-10-23 2018-03-20 广东工业大学 One kind auxiliary eating method, system, equipment and computer-readable storage medium
CN107811735B (en) * 2017-10-23 2020-01-07 广东工业大学 Auxiliary eating method, system, equipment and computer storage medium
CN109816741A (en) * 2017-11-22 2019-05-28 北京展讯高科通信技术有限公司 A kind of generation method and system of adaptive virtual lip gloss
CN109816741B (en) * 2017-11-22 2023-04-28 北京紫光展锐通信技术有限公司 Method and system for generating self-adaptive virtual lip gloss
CN108596992A (en) * 2017-12-31 2018-09-28 广州二元科技有限公司 A kind of quickly real-time lip gloss cosmetic method
CN108596992B (en) * 2017-12-31 2021-01-01 广州二元科技有限公司 Rapid real-time lip gloss makeup method

Also Published As

Publication number Publication date
CN102013103B (en) 2013-04-03

Similar Documents

Publication Publication Date Title
CN102013103B (en) Method for dynamically tracking lip in real time
CN108154118B (en) A kind of target detection system and method based on adaptive combined filter and multistage detection
CN106570486B (en) Filtered target tracking is closed based on the nuclear phase of Fusion Features and Bayes&#39;s classification
CN109741356B (en) Sub-pixel edge detection method and system
CN103413120B (en) Tracking based on object globality and locality identification
CN112036254B (en) Moving vehicle foreground detection method based on video image
CN108876820B (en) Moving target tracking method under shielding condition based on mean shift
CN102722891A (en) Method for detecting image significance
CN108537751B (en) Thyroid ultrasound image automatic segmentation method based on radial basis function neural network
CN108804992B (en) Crowd counting method based on deep learning
CN107944354B (en) Vehicle detection method based on deep learning
CN106991686A (en) A kind of level set contour tracing method based on super-pixel optical flow field
CN104537688A (en) Moving object detecting method based on background subtraction and HOG features
CN104599288A (en) Skin color template based feature tracking method and device
CN110245600B (en) Unmanned aerial vehicle road detection method for self-adaptive initial quick stroke width
CN110717900A (en) Pantograph abrasion detection method based on improved Canny edge detection algorithm
CN112926552A (en) Remote sensing image vehicle target recognition model and method based on deep neural network
CN112164093A (en) Automatic person tracking method based on edge features and related filtering
CN109241932B (en) Thermal infrared human body action identification method based on motion variance map phase characteristics
CN110751671B (en) Target tracking method based on kernel correlation filtering and motion estimation
CN106951831B (en) Pedestrian detection tracking method based on depth camera
CN109102520A (en) The moving target detecting method combined based on fuzzy means clustering with Kalman filter tracking
CN117522862A (en) Image processing method and processing system based on CT image pneumonia recognition
CN115063679B (en) Pavement quality assessment method based on deep learning
CN109615617A (en) A kind of image partition method for protecting convex indirect canonical level set

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130403

Termination date: 20151203

EXPY Termination of patent right or utility model