CN103810480A

CN103810480A - Method for detecting gesture based on RGB-D image

Info

Publication number: CN103810480A
Application number: CN201410073064.4A
Authority: CN
Inventors: 张维忠; 丁洁玉; 赵志刚; 张峰; 李明; 王青林
Original assignee: QINGDAO ANIMATION; Qingdao Broadcasting And Tv Wireless Media Group Co ltd; Qingdao University
Current assignee: Shenzhen Micagent Technology Co ltd
Priority date: 2014-02-28
Filing date: 2014-02-28
Publication date: 2014-05-21
Anticipated expiration: 2034-02-28
Also published as: CN103810480B

Abstract

The invention provides a gesture detection method based on RGB-D images. The method comprises the following steps: step 1, acquiring an RGB-D image; step 2, segmenting the hand from the background; step 3, recognizing the gesture; step 4, finding the optimal segmentation of the gesture. The method segments the human hand region effectively and accurately, yields good gesture segmentation even when the hand is partially self-occluded or other people in the background cause interference, and is algorithmically robust.

Description

Gesture detection method based on RGB-D images

Technical field

The present invention relates to the field of digital image processing, and in particular to a gesture detection method based on RGB-D images.

Background art

Human-machine interfaces should be as intuitive and natural as possible. The user should be able to interact with the machine without cumbersome equipment (such as colored markers or gloves) or devices such as remote controls, mice and keyboards. Gestures offer a simple means of communication that can be combined with machine intelligence. Gesture systems have been applied successfully in many research and industrial fields, for example game control, virtual environments, smart homes and sign language recognition.

The quality of gesture segmentation directly affects the precision and accuracy of subsequent gesture feature extraction, tracking and recognition. In recent years, researchers at home and abroad have proposed several gesture segmentation methods, mainly template matching, image differencing, skin-color segmentation and constraint-based methods. Template matching relies on a hand-shape database: the gesture image is compared against the templates in the database. Because the hand is a non-rigid object, the comparison is computationally expensive and difficult, and real-time requirements are hard to meet. Constraint-based methods simplify the separation of the gesture region (foreground) from the background by having the user wear gloves of a distinct color or by heightening the contrast between hand and background. These constraints, however, cost the user convenience and freedom in gesture interaction. Image differencing segments the gesture by subtracting a static background image from the image of the moving gesture; its drawback is that it cannot cope with the displacement of corresponding image points between the two images. Skin-color segmentation segments the gesture according to the clustering properties of skin color, but the apparent skin color is strongly affected by changes in the angle of the hand relative to the light source. For vision-based gesture recognition that must be fast, simple and practical, each of these methods has limitations when used alone and cannot segment gestures accurately and effectively in real time, which seriously degrades the segmentation result. Patent CN103226708A also combines depth images with color images for gesture segmentation, but it presupposes that the hand is the foremost part of the human body. Others have proposed similar approaches, but these require the RGB camera and the depth camera to be calibrated first, which adds complexity and tedium to the algorithm.

Summary of the invention

The technical problem to be solved by the invention is to overcome the defects of the gesture detection methods described above by providing a gesture detection method based on RGB-D images. The method segments the human hand region effectively and accurately, obtains good gesture segmentation even when the hand is partially self-occluded or other people in the background cause interference, and is algorithmically robust.

To solve the above technical problem, the invention provides a gesture detection method based on RGB-D images, comprising:

Step 1: acquire an RGB-D image;

Step 2: segment the hand from the background;

Step 3: recognize the gesture;

Step 4: find the optimal segmentation of the gesture.

Specifically, step 1 uses a depth sensor to acquire a color image (RGB Image) stream and a depth image (Depth Image) stream, which together form the RGB-D image data stream, and converts the stream into individual frames for subsequent image processing.

Specifically, step 2 maps the hand position onto the depth image using the pixel ratio between the skeleton map and the depth image, and then uses the depth image information to segment the hand from the background.

Specifically, step 3 uses convex optimization to segment the gesture in the RGB-D image, so that the gesture is recognized quickly and accurately.

Specifically, step 4 uses a minimization functional and its constraints, solves the model with the fast Split Bregman algorithm, and finds the optimal segmentation of the RGB-D image.

Beneficial effects of the invention:

The RGB-D gesture detection method provided by the invention segments the human hand region effectively and accurately, obtains good gesture segmentation even when the hand is partially self-occluded or other people in the background cause interference, and is algorithmically robust.

Brief description of the drawings

Figs. 1a-1e show segmentation results based on the color image, the depth image and the RGB-D image: Fig. 1a, color image; Fig. 1b, depth image; Fig. 1c, color image segmentation result; Fig. 1d, depth image segmentation result; Fig. 1e, RGB-D image segmentation result.

Figs. 2a-2e show segmentation results based on the color image, the depth image and the RGB-D image in a second scene: Fig. 2a, color image; Fig. 2b, depth image; Fig. 2c, color image segmentation result; Fig. 2d, depth image segmentation result; Fig. 2e, RGB-D image segmentation result.

Detailed description

The invention provides a gesture detection method based on RGB-D images, comprising:

Step 1: acquire an RGB-D image;

Step 2: segment the hand from the background;

Step 3: recognize the gesture;

Step 4: find the optimal segmentation of the gesture.

A depth sensor acquires depth images and RGB color image data simultaneously, supports real-time full-body and skeleton tracking, and can recognize a range of postures and actions. This application uses it to obtain the gesture data.

The goal of gesture detection is to segment the hand region effectively from the original image, that is, to separate the hand region (foreground) from everything else (background); it is a fundamental task in gesture recognition. The depth sensor can analyze depth data and detect the outline of a human body or player. It provides the color and depth data streams, which are converted into individual frames for subsequent image processing. The input RGB image and depth image must be pixel-aligned and time-synchronized. Once an image pair meeting these conditions has been obtained, the input images are preprocessed, for example by filtering, to suppress noise.
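The text names filtering only as an example of preprocessing and does not say which filter is used. As one possibility, a 3x3 median filter is a common way to remove isolated dropouts from a depth frame. The sketch below is dependency-free Python on a nested list; the function name `median_filter3` and the sample depth values are illustrative assumptions, not part of the described method.

```python
from statistics import median

def median_filter3(img):
    """3x3 median filter on a 2D list; border pixels use only the neighbours that exist."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(rows, y + 2))
                      for i in range(max(0, x - 1), min(cols, x + 2))]
            out[y][x] = median(window)
    return out

# Hypothetical depth frame in millimetres; 0 marks a dropped reading.
depth = [[800, 800, 800],
         [800,   0, 800],
         [800, 800, 800]]
filtered = median_filter3(depth)
```

A median filter is used here rather than a mean filter because it removes a single outlier without blurring depth edges, which matter for the later segmentation.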

Both color images and depth images can be used for gesture segmentation. Color images are sharp, but they carry only two-dimensional information and are sensitive to interference. Depth images have lower resolution than color images, but they carry three-dimensional information and are robust to interference. Because the skeleton map tracks the coordinates of the human hand, the position of the hand in the skeleton map is easy to determine. The hand position is then mapped onto the depth image using the pixel ratio between the skeleton map and the depth image, and the depth image information is used to segment the hand from the background. Because the depth image has low resolution and is disturbed by other objects at the same depth, the result of this segmentation is unsatisfactory. This application therefore proposes a detection method that combines the depth image with the color image.
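The mapping and depth-only segmentation described above can be sketched as follows. This is a minimal illustration, assuming the skeleton map and the depth image differ only by a scale factor per axis; the function names, the `band_mm` tolerance and the sample values are hypothetical.

```python
def map_to_depth(hand_xy, skel_size, depth_size):
    """Scale a hand joint (x, y) from skeleton-map pixels to depth-image pixels."""
    sx = depth_size[0] / skel_size[0]
    sy = depth_size[1] / skel_size[1]
    return int(hand_xy[0] * sx), int(hand_xy[1] * sy)

def segment_by_depth(depth, hand_px, band_mm=100):
    """Mark pixels whose depth lies within band_mm of the depth at the hand pixel."""
    x, y = hand_px
    z = depth[y][x]
    return [[1 if v > 0 and abs(v - z) <= band_mm else 0 for v in row]
            for row in depth]

# Hypothetical 4x4 depth frame (mm) and a hand joint given in an 8x8 skeleton map.
depth = [[2000, 2000, 2000, 2000],
         [2000, 2000,  750,  800],
         [2000, 1500,  780, 2000],
         [2000, 1500, 1500, 2000]]
hand_px = map_to_depth((4, 2), skel_size=(8, 8), depth_size=(4, 4))
mask = segment_by_depth(depth, hand_px)
```

Any pixel inside the depth band is labeled as hand, including pixels that belong to another object at the same distance. This is the weakness of depth-only segmentation that the paragraph above points out.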

For the segmentation optimization, the image segmentation problem is defined as the minimization of a functional:

$$E(u) = \int_\Omega f(x)\,u(x)\,dx + \int_\Omega |Du(x)| \tag{1}$$

Here u ∈ BV(ℝ^d; {0,1}) is a binary indicator function of bounded variation, where u = 1 and u = 0 mark the inside and the outside of a surface in ℝ^d: a set of closed boundaries in the case of two-dimensional image segmentation, or a set of closed surfaces in the case of three-dimensional segmentation. The second term of formula (1) is the total variation, where Du denotes the distributional derivative, which for a differentiable function u reduces to

$$Du(x) = \nabla u(x)\,dx$$

By relaxing the binary constraint, the function u is allowed to take values between 0 and 1. The optimization problem then becomes the minimization of the convex functional (1) over the convex set BV(ℝ^d; [0,1]).

Because the functional is formulated in a spatially continuous setting, convex optimization followed by thresholding achieves global optimality. The thresholding theorem guarantees that the solution u* of the relaxed problem remains globally optimal for the original binary labeling problem. A global minimizer of formula (1) is computed as follows: compute a global minimizer u* of formula (1) over the convex set BV(ℝ^d; [0,1]), then threshold u* at any value θ ∈ (0,1).
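The thresholding step is simple enough to show directly. The sketch below assumes the relaxed solution u* is available as a nested list of values in [0, 1]; the function name `threshold` and the sample values are illustrative.

```python
def threshold(u_star, theta=0.5):
    """Binarize a relaxed labeling: 1 where u*(x) > theta, else 0."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie in the open interval (0, 1)")
    return [[1 if v > theta else 0 for v in row] for row in u_star]

# Hypothetical relaxed solution on a 2x3 grid.
u_star = [[0.05, 0.90, 0.97],
          [0.10, 0.20, 0.85]]
labels = threshold(u_star, theta=0.5)
```

By the theorem, any θ in (0, 1) gives a globally optimal binary labeling, so θ = 0.5 is a convention rather than a tuned parameter.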

Because the RGB-D image supplies additional depth information, the boundary length can be measured in absolute terms, d(x)|Du(x)|, rather than in the image domain, |Du(x)|. Functional (1) is therefore generalized to:

$$E(u) = \int_\Omega f(x)\,u(x)\,dx + \int_\Omega d(x)\,|Du(x)| \tag{2}$$

Here d: Ω → ℝ is the depth value. Formula (2) compensates for the distortion caused by perspective projection (the farther away an object is, the smaller it appears in the camera image).
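To make formula (2) concrete, the sketch below evaluates a discrete version of it on a pixel grid. The discretization (forward differences with a zero difference at the image border) is an assumption made here for illustration; the text does not specify one. The function name `energy` and the sample arrays are hypothetical.

```python
import math

def energy(u, f, d):
    """Discrete functional (2): sum of f*u plus depth-weighted total variation of u."""
    rows, cols = len(u), len(u[0])
    data = sum(f[y][x] * u[y][x] for y in range(rows) for x in range(cols))
    tv = 0.0
    for y in range(rows):
        for x in range(cols):
            ux = u[y][x + 1] - u[y][x] if x + 1 < cols else 0.0
            uy = u[y + 1][x] - u[y][x] if y + 1 < rows else 0.0
            tv += d[y][x] * math.hypot(ux, uy)
    return data + tv

u = [[0, 1], [0, 1]]            # right column labeled as foreground
f = [[1, -1], [1, -1]]          # negative f favors u = 1
near = [[1.0, 1.0], [1.0, 1.0]]
far = [[2.0, 2.0], [2.0, 2.0]]
e_near = energy(u, f, near)
e_far = energy(u, f, far)
```

The same boundary costs twice as much at twice the depth, which offsets the shrinking of distant objects in the image.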

As for constraints on the RGB-D functional, depth information can be used to constrain the moments of the segmentation, and each of these constraints corresponds to a convex set over which the convex functional is optimized. Let B = BV(Ω; [0,1]) denote the convex set of relaxed binary labeling functions of bounded variation defined on the whole image domain Ω. Area constraint: the 0th-order moment of the shape u corresponds to the area of the region, which is computed by formula (3):

$$\mathrm{Area}(u) := \int_\Omega d^2(x)\,u(x)\,dx \tag{3}$$

Here d(x) gives the depth of pixel x. Let d(x) = K D(x), where K is the focal length of the camera and D(x) is the measured depth of the pixel. Then d²(x) is the size of the projection of the corresponding pixel in 3D space, so the total is a surface area in space rather than the area of the projection in the image. Following the method of Grenander, U., Chow, Y., Keenan, D.M.: Hands: A Pattern Theoretic Study of Biological Shapes. Springer, New York (1991), all pixels are treated in the same way.
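A discrete reading of formula (3) is a weighted pixel count. The sketch below follows the text's convention d(x) = K D(x); the function name `area`, the value of `K` and the sample depths are assumptions chosen only to show the scaling.

```python
def area(u, depth, K):
    """Discrete formula (3): sum over pixels of (K * D(x))**2 * u(x)."""
    return sum((K * D) ** 2 * v
               for row_u, row_d in zip(u, depth)
               for v, D in zip(row_u, row_d))

u = [[1, 1], [0, 0]]                 # two foreground pixels
near = [[1.0, 1.0], [1.0, 1.0]]      # hypothetical depths
far = [[2.0, 2.0], [2.0, 2.0]]
a_near = area(u, near, K=0.5)
a_far = area(u, far, K=0.5)
```

The same two-pixel mask covers four times the absolute area when the depth doubles, which is why an area constraint in these units does not depend on how far the hand is from the camera.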

The absolute area of the shape u is restricted to lie between the constants c₁ ≤ c₂ by constraining u to the set in formula (4):

$$C_0 = \{\, u \in B \mid c_1 \le \mathrm{Area}(u) \le c_2 \,\} \tag{4}$$

Because Area(u) depends linearly on u, the set C₀ is convex for all constants c₂ ≥ c₁ ≥ 0.

In general, the area can be fixed exactly by setting c₁ = c₂, or bounded from above and below, or a soft area constraint can be imposed by extending functional (1) as in formula (5):

$$E_{\text{total}}(u) = E(u) + \lambda \left( \int d^2\, u \, dx - c \right)^2 \tag{5}$$

Formula (5) adds a soft constraint with weight λ > 0 that pulls the estimated area of the shape toward c ≥ 0. Formula (5) is also convex.
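The convexity claim can be checked in two lines. The following worked step uses only the quantities defined in formulas (3) and (5); the shorthand A(u) for the area and P(u) for the penalty term is introduced here for illustration.

```latex
% Soft area penalty from formula (5), with A(u) = \int_\Omega d^2(x)\, u(x)\, dx
P(u) = \lambda \bigl( A(u) - c \bigr)^2, \qquad \lambda > 0
% A(u) - c is affine in u and t \mapsto \lambda t^2 is convex,
% so P is a convex function composed with an affine map, hence convex.
% Its first variation with respect to u at pixel x is
\frac{\delta P}{\delta u}(x) = 2 \lambda \bigl( A(u) - c \bigr)\, d^2(x)
```

The variation shows how the penalty acts: when the current area exceeds c, every pixel is pushed toward u = 0 in proportion to d²(x), and toward u = 1 when the area is too small.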

The fast Split Bregman algorithm is as follows. Maximizing a likelihood function is equivalent to maximizing its natural logarithm. This application first applies the Split method to RGB-D image segmentation and sets up the following general model:

$$\min_{\omega,\; u \in \{0,1\}} \left\{ E(\omega, u) = \alpha_1 \int_\Omega Q_1(x, \omega_1)\, u \, dx\,dy + \alpha_2 \int_\Omega Q_2(x, \omega_2)\,(1 - u)\, dx\,dy + \gamma \int_\Omega |\nabla u| \, dx\,dy \right\} \tag{7}$$

Here Q_i = −ln P_i, i = 1, 2; ω = (μ, σ) = Max(P_i), i = 1, 2; and u is the binary labeling function that represents the evolution of the curve.
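The text does not state the form of P_i. Since the parameters are written ω = (μ, σ), one plausible reading is that each region is modeled by a Gaussian density whose parameters are the maximum-likelihood estimates. The sketch below works under that assumption; the function names and the sample depth values are hypothetical.

```python
import math

def fit_gaussian(values):
    """Maximum-likelihood mean and standard deviation of a sample."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    return mu, math.sqrt(var)

def neg_log_likelihood(x, mu, sigma):
    """Q = -ln P for a Gaussian density P with parameters (mu, sigma), sigma > 0."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (x - mu) ** 2 / (2 * sigma ** 2)

# Hypothetical depth samples (mm) from the hand region and from the background.
omega_1 = fit_gaussian([748, 750, 752])
omega_2 = fit_gaussian([1990, 2000, 2010])

# Data terms of formula (7) for one pixel with depth 751 mm.
q1 = neg_log_likelihood(751, *omega_1)
q2 = neg_log_likelihood(751, *omega_2)
```

In formula (7) Q₁ multiplies u and Q₂ multiplies (1 − u), so a pixel with Q₁ smaller than Q₂ is cheaper to label u = 1.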

This application introduces the Split Bregman idea into the general model for RGB-D image segmentation. Building on the Split method, it first introduces the splitting variable w = [w₁, w₂]^T and then the Bregman distance b = (b₁, b₂)^T, which converts the functional extremum problem of formula (7) into:

$$b^{k+1} = b^{k} + \nabla u^{k} - w^{k} \tag{8}$$

$$(u^{k+1}, w^{k+1}) = \arg\min_{w,\; u \in [0,1]} \left\{ E(u, w) = \gamma \int_\Omega |w| \, dx\,dy + \frac{\theta}{2} \int_\Omega \left( w - \nabla u - b^{k+1} \right)^2 dx\,dy + \int_\Omega r(u_1, u_2)\, u \, dx\,dy \right\} \tag{9}$$

Here r(u₁, u₂) = α₁Q₁(x, ω₁) − α₂Q₂(x, ω₂). Formula (9) is an extremum problem for an energy functional in two variables and is usually solved by alternating optimization. First, with w held fixed, the problem reduces to an extremum problem in u:

$$\min_{u} E(u) = \frac{\theta}{2} \int_\Omega \left( w - \nabla u - b^{k+1} \right)^2 dx\,dy + \int_\Omega r(u_1, u_2)\, u \, dx\,dy \tag{10}$$

Then, with u held fixed, the extremum problem in w is solved:

$$\min_{w} E(w) = \gamma \int_\Omega |w| \, dx\,dy + \frac{\theta}{2} \int_\Omega \left( w - \nabla u - b^{k+1} \right)^2 dx\,dy \tag{11}$$

The calculus of variations gives the Euler-Lagrange equation of energy functional (10):

$$\begin{cases} r(u_1, u_2) - \theta\, \nabla \cdot \left( \nabla u + b^{k+1} - w^{k} \right) = 0 & \text{in } \Omega \\ \left( \nabla u + b^{k+1} - w^{k} \right) \cdot \vec{n} = 0 & \text{on } \partial\Omega \end{cases} \tag{12}$$

Formula (12) can be solved with a fast Gauss-Seidel iteration scheme. Because u takes values in [0,1] after convex relaxation, the following projection is needed to keep u within this range:

$$u^{k+1} = \max\left( \min\left( u^{k+1},\, 1 \right),\, 0 \right) \tag{13}$$

After energy functional (10) has been solved, energy functional (11) is solved. The Euler-Lagrange equation of formula (11) is:

$$w = \nabla u^{k+1} + b^{k+1} - \frac{\gamma}{\theta} \frac{w}{|w|} \tag{14}$$

Its analytic solution is given by the generalized soft-thresholding formula:

$$w^{k+1} = \max\left( \left| \nabla u^{k+1} + b^{k+1} \right| - \frac{\gamma}{\theta},\; 0 \right) \frac{\nabla u^{k+1} + b^{k+1}}{\left| \nabla u^{k+1} + b^{k+1} \right|} \tag{15}$$
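Putting formulas (8), (12), (13) and (15) together gives one iteration of the alternating scheme. The sketch below is a minimal, dependency-free illustration, not the implementation used in the application. It assumes forward differences for the gradient, a single Gauss-Seidel sweep per outer iteration, an image of at least 2x2 pixels, and a precomputed data term `r` given as a nested list; the function names and the toy data are hypothetical.

```python
import math

def grad(u):
    """Forward differences; zero past the last column/row (Neumann boundary)."""
    rows, cols = len(u), len(u[0])
    gx = [[u[y][x + 1] - u[y][x] if x + 1 < cols else 0.0 for x in range(cols)]
          for y in range(rows)]
    gy = [[u[y + 1][x] - u[y][x] if y + 1 < rows else 0.0 for x in range(cols)]
          for y in range(rows)]
    return gx, gy

def split_bregman(r, gamma=1.0, theta=1.0, iters=20):
    """Alternate the updates (8), (12)-(13) and (15) for a data term r."""
    rows, cols = len(r), len(r[0])
    def zeros():
        return [[0.0] * cols for _ in range(rows)]
    u, wx, wy, bx, by = zeros(), zeros(), zeros(), zeros(), zeros()
    for _ in range(iters):
        # (8): b <- b + grad(u) - w
        gx, gy = grad(u)
        for y in range(rows):
            for x in range(cols):
                bx[y][x] += gx[y][x] - wx[y][x]
                by[y][x] += gy[y][x] - wy[y][x]
        # (12): one Gauss-Seidel sweep of laplace(u) = r/theta + div(w - b),
        # with the projection (13) onto [0, 1] applied per pixel
        for y in range(rows):
            for x in range(cols):
                div = (wx[y][x] - bx[y][x]) + (wy[y][x] - by[y][x])
                if x > 0:
                    div -= wx[y][x - 1] - bx[y][x - 1]
                if y > 0:
                    div -= wy[y - 1][x] - by[y - 1][x]
                nb = [u[j][i]
                      for j, i in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                      if 0 <= j < rows and 0 <= i < cols]
                val = (sum(nb) - r[y][x] / theta - div) / len(nb)
                u[y][x] = max(min(val, 1.0), 0.0)
        # (15): generalized soft thresholding of grad(u) + b
        gx, gy = grad(u)
        for y in range(rows):
            for x in range(cols):
                vx, vy = gx[y][x] + bx[y][x], gy[y][x] + by[y][x]
                mag = math.hypot(vx, vy)
                scale = max(mag - gamma / theta, 0.0) / mag if mag > 0 else 0.0
                wx[y][x], wy[y][x] = scale * vx, scale * vy
    return u

# Toy data term: strongly negative (foreground) inside a 3x3 block of a 7x7 grid.
r = [[-5.0 if 2 <= y <= 4 and 2 <= x <= 4 else 5.0 for x in range(7)]
     for y in range(7)]
u = split_bregman(r)
mask = [[1 if v > 0.5 else 0 for v in row] for row in u]
```

The updates are applied in the order in which the text presents them: the Bregman update first, then the u-subproblem, then the w-subproblem. In the toy case the data term dominates the length penalty, so the thresholded mask coincides with the region where r is negative; with weaker or noisier data terms the γ-weighted length term would smooth the boundary.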

Embodiments of the invention are described in detail below by way of example, so that the way the invention applies technical means to solve the technical problem and achieve the technical effect can be fully understood and put into practice.

The experiments compare this method with other methods. The segmentation methods are demonstrated on the two scenes of Fig. 1 and Fig. 2; the aim is to segment one person's gesture from a crowd. The figures show that RGB-D gesture segmentation outperforms segmentation based on the color image alone or on the depth image alone. As shown in Fig. 1c, when only RGB color information is used, the algorithm segments the hand together with the face and part of the wall, and fails to isolate the desired gesture. As shown in Fig. 1d, when only depth information is used, the hand is segmented together with the body parts at the same depth. In either single-modality case the segmentation result is unsatisfactory. As shown in Fig. 1e, when RGB and depth information are considered together, that is, when segmentation is based on the RGB-D image, the hand region is segmented on its own and the segmentation problem is resolved. The algorithm is also robust in complex scenes, as shown in Fig. 2: when a new person at a different depth is added to the scene, the target gesture is still segmented well.

The above is only a primary implementation of this intellectual property and does not restrict other forms of implementing this new product and/or new method. Those skilled in the art may use this information to modify the above content in order to achieve similar implementations. However, all modifications or alterations based on the new product of the invention fall within the reserved rights.

The above is only a preferred embodiment of the invention and does not limit the invention to other forms. Any person skilled in the art may use the technical content disclosed above to change or modify it into an equivalent embodiment. However, any simple modification, equivalent change or adaptation made to the above embodiment according to the technical essence of the invention, without departing from the content of its technical solution, still falls within the scope of protection of the technical solution of the invention.

Claims

1. A gesture detection method based on RGB-D images, characterized in that it comprises:

Step 1: acquire an RGB-D image;

Step 2: segment the hand from the background;

Step 3: recognize the gesture;

Step 4: find the optimal segmentation of the gesture.

2. The gesture detection method according to claim 1, characterized in that step 1 uses a depth sensor to acquire a color image (RGB Image) stream and a depth image (Depth Image) stream, which together form the RGB-D image data stream, and converts the stream into individual frames for subsequent image processing.

3. The gesture detection method according to claim 1 or 2, characterized in that step 2 maps the hand position onto the depth image using the pixel ratio between the skeleton map and the depth image, and uses the depth image information to segment the hand from the background.

4. The gesture detection method according to any one of claims 1 to 3, characterized in that step 3 uses convex optimization to segment the gesture in the RGB-D image, so that the gesture is recognized quickly and accurately.

5. The gesture detection method according to any one of claims 1 to 4, characterized in that step 4 uses a minimization functional and its constraints, solves the model with the fast Split Bregman algorithm, and finds the optimal segmentation of the RGB-D image.