Summary of the invention
The technical problem to be solved by the invention is to overcome the defects of the gesture detection methods described above by providing a gesture detection method based on RGB-D images. The method segments the human hand region effectively and accurately, still obtains the gesture segmentation when the hand is partially self-occluded or when other people in the background interfere, and the algorithm is robust.
To solve the above technical problem, the invention provides a gesture detection method based on RGB-D images, comprising:
The first step: acquire an RGB-D image;
The second step: segment the hand from the background;
The third step: recognize the gesture;
The fourth step: find the optimal segmentation of the gesture.
Specifically, the first step uses a depth sensor to acquire a color image (RGB image) stream and a depth image stream, that is, an RGB-D image data stream, and converts it into individual frames for subsequent image processing.
Specifically, the second step maps the hand position onto the depth image using the pixel ratio between the skeleton map and the depth image, and uses the depth image information to segment the hand from the background.
Specifically, the third step uses convex optimization to segment the RGB-D gesture image, so that the gesture is recognized quickly and accurately.
Specifically, the fourth step uses a minimization functional and its constraints, solves the model with the fast Split Bregman algorithm, and finds the optimal segmentation of the RGB-D image.
Beneficial effects of the invention:
The RGB-D image-based gesture detection method provided by the invention segments the human hand region effectively and accurately, still obtains the gesture segmentation when the hand is partially self-occluded or when other people in the background interfere, and the algorithm is robust.
Brief description of the drawings
Figs. 1a-1e show segmentation results based on the color image, the depth image, and the RGB-D image, where Fig. 1a is the color image; Fig. 1b is the depth image; Fig. 1c is the segmentation result of the color image; Fig. 1d is the segmentation result of the depth image; and Fig. 1e is the segmentation result of the RGB-D image;
Figs. 2a-2e show segmentation results based on the color image, the depth image, and the RGB-D image in another scene, where Fig. 2a is the color image; Fig. 2b is the depth image; Fig. 2c is the segmentation result of the color image; Fig. 2d is the segmentation result of the depth image; and Fig. 2e is the segmentation result of the RGB-D image.
Detailed description of the embodiments
The invention provides a gesture detection method based on RGB-D images, comprising:
The first step: acquire an RGB-D image;
The second step: segment the hand from the background;
The third step: recognize the gesture;
The fourth step: find the optimal segmentation of the gesture.
Specifically, the first step uses a depth sensor to acquire a color image (RGB image) stream and a depth image stream, that is, an RGB-D image data stream, and converts it into individual frames for subsequent image processing.
A depth sensor acquires depth images and RGB color image data simultaneously, supports real-time full-body and skeleton tracking, and can recognize a range of poses and actions; in this application it is used to acquire the gesture data.
The purpose of gesture detection is to segment the hand region effectively from the original image, that is, to separate the hand region (foreground) from the rest of the image (background); it is a fundamental and very important task in gesture recognition. The depth sensor can analyze depth data and detect the contour of a human body or player. It provides color and depth data streams, which are converted into individual frames for subsequent image processing. The input RGB image and depth image are required to be pixel-aligned and time-synchronized. After an image pair meeting these conditions is obtained, the input images are preprocessed, for example by filtering, to suppress noise.
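The preprocessing step can be illustrated with a small sketch. The text only says "filtering"; the 3x3 median filter below is one common choice and is an assumption, as are the function name `median_filter3` and the convention that a depth value of 0 marks a sensor dropout.

```python
from statistics import median

def median_filter3(img):
    """Apply a 3x3 median filter to a 2D list; border pixels use the neighbors that exist."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Collect the values in the (clipped) 3x3 window around (x, y).
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = median(window)
    return out

# Toy depth frame in millimetres; the 0 in the center is an assumed dropout pixel.
depth = [[800, 800, 800],
         [800,   0, 800],
         [800, 800, 800]]
filtered = median_filter3(depth)
```

In this toy frame the isolated dropout is replaced by the surrounding depth, which is the kind of noise suppression the paragraph above describes.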
Specifically, the second step maps the hand position onto the depth image using the pixel ratio between the skeleton map and the depth image, and uses the depth image information to segment the hand from the background.
Both color images and depth images can be used for gesture segmentation. A color image is sharp, but it contains only two-dimensional information and is easily disturbed. A depth image has lower resolution than a color image, but it contains three-dimensional information and is robust to interference. Because the skeleton map tracks the coordinates of the human hand, the position of the hand in the skeleton map is easy to determine. The hand position is then mapped onto the depth image using the pixel ratio between the skeleton map and the depth image, and the depth information is used to segment the hand from the background. Because the depth image has low resolution and is disturbed by objects at the same depth as the hand, this segmentation alone is unsatisfactory. The application therefore proposes a detection method that combines the depth image and the color image.
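The mapping and the depth-only segmentation described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names, the image sizes, and the 100 mm depth band are all assumptions made for the example.

```python
def map_to_depth(hand_xy, skeleton_size, depth_size):
    """Scale a hand joint from skeleton-map pixels to depth-image pixels by the pixel ratio."""
    sx = depth_size[0] / skeleton_size[0]
    sy = depth_size[1] / skeleton_size[1]
    x = min(int(hand_xy[0] * sx), depth_size[0] - 1)
    y = min(int(hand_xy[1] * sy), depth_size[1] - 1)
    return x, y

def segment_hand(depth, hand_px, band_mm=100):
    """Mark valid pixels whose depth lies within band_mm of the depth at the hand pixel."""
    x, y = hand_px
    d0 = depth[y][x]
    return [[1 if v > 0 and abs(v - d0) <= band_mm else 0 for v in row]
            for row in depth]

# Toy 4x4 depth image (mm): a hand near 800 mm in front of a wall at 2000 mm.
depth = [[2000, 2000, 2000, 2000],
         [2000,  820,  800, 2000],
         [2000,  810,  790, 2000],
         [2000, 2000,  805, 2000]]
hand_px = map_to_depth((5, 3), skeleton_size=(8, 8), depth_size=(4, 4))
mask = segment_hand(depth, hand_px)
```

Any other object within the same depth band would also be marked as foreground, which is the weakness of depth-only segmentation noted above.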
Specifically, the third step uses convex optimization to segment the RGB-D gesture image, so that the gesture is recognized quickly and accurately.
For the segmentation optimization, we formulate the image segmentation problem as the minimization of a functional:
$$E(u) = \int_\Omega f(x)\,u(x)\,dx + \int_\Omega |Du(x)| \tag{1}$$
where $u \in BV(\mathbb{R}^d; \{0,1\})$ is a binary indicator function of bounded variation, and $u = 1$ and $u = 0$ denote the interior and exterior of a surface in $\mathbb{R}^d$, that is, a set of closed boundaries in the case of two-dimensional image segmentation or a set of closed surfaces in the case of three-dimensional segmentation. The second term of formula (1) is the total variation, where $Du$ denotes the distributional derivative, which for a differentiable function $u$ reduces to $Du(x) = \nabla u(x)\,dx$. By relaxing the binary constraint, the function $u$ is allowed to take values between 0 and 1. The optimization problem then becomes that of minimizing the convex functional (1) over the convex set $BV(\mathbb{R}^d; [0,1])$.
With convex relaxation and thresholding, and with the functional formulated in a spatially continuous setting, global optimization can be achieved. The thresholding theorem guarantees that the solution u* of the relaxed problem remains globally optimal for the original binary labeling problem. The global minimum of formula (1) is computed as follows: compute a global minimizer u* of formula (1) over the convex set BV(IR^d; [0,1]), and then threshold u* at any value θ ∈ (0,1).
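A discrete sketch of functional (1) and of the thresholding step may help. It assumes an anisotropic (sum of absolute forward differences) discretization of the total variation, which the text does not specify, and the relaxed labeling `u_relaxed` is a made-up array rather than an actual minimizer.

```python
def energy(u, f):
    """Discrete functional (1): data term plus anisotropic total variation."""
    h, w = len(u), len(u[0])
    data = sum(f[y][x] * u[y][x] for y in range(h) for x in range(w))
    tv = 0.0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                tv += abs(u[y][x + 1] - u[y][x])   # horizontal jump
            if y + 1 < h:
                tv += abs(u[y + 1][x] - u[y][x])   # vertical jump
    return data + tv

def threshold(u, theta=0.5):
    """Binarize a relaxed labeling u with values in [0, 1] at level theta in (0, 1)."""
    return [[1 if v > theta else 0 for v in row] for row in u]

# f < 0 favors foreground, f > 0 favors background (illustrative values).
f = [[1, 1, 1],
     [1, -5, 1],
     [1, 1, 1]]
u_relaxed = [[0.1, 0.2, 0.1],
             [0.2, 0.9, 0.2],
             [0.1, 0.2, 0.1]]
labels = threshold(u_relaxed)
e_binary = energy(labels, f)
```

Here the binary labeling keeps only the center pixel, whose data term of -5 outweighs its boundary length of 4.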
Because the RGB-D image supplies additional depth information, the boundary length can be measured in absolute terms, as $d(x)\,|Du(x)|$, rather than in the image domain, as $|Du(x)|$. Functional (1) can therefore be generalized to:
$$E(u) = \int_\Omega f(x)\,u(x)\,dx + \int_\Omega d(x)\,|Du(x)| \tag{2}$$
where $d: \Omega \to \mathbb{R}$ gives the depth values. Formula (2) compensates for the adverse effect of perspective projection: the farther an object is from the camera, the smaller it appears in the image.
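The compensation can be seen in a toy calculation. The sketch below assumes an anisotropic discretization of the weighted boundary term and a constant depth weight per image; `weighted_tv` and `square_mask` are illustrative names, and the weights 1.0 and 2.0 are arbitrary units.

```python
def weighted_tv(u, d):
    """Discrete depth-weighted boundary term of formula (2)."""
    h, w = len(u), len(u[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                total += d[y][x] * abs(u[y][x + 1] - u[y][x])
            if y + 1 < h:
                total += d[y][x] * abs(u[y + 1][x] - u[y][x])
    return total

def square_mask(size, side):
    """Binary mask of a side x side square placed one pixel away from the border."""
    return [[1 if 1 <= y <= side and 1 <= x <= side else 0 for x in range(size)]
            for y in range(size)]

near = square_mask(6, 4)   # object close to the camera: 4x4 pixels
far = square_mask(6, 2)    # same object at twice the distance: 2x2 pixels
ones = [[1.0] * 6 for _ in range(6)]
twos = [[2.0] * 6 for _ in range(6)]

len_near = weighted_tv(near, ones)        # 16.0
len_far_image = weighted_tv(far, ones)    # 8.0, image-domain length shrinks
len_far_world = weighted_tv(far, twos)    # 16.0, depth weight restores it
```

With the depth weight, the same object yields the same boundary length at both distances.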
Specifically, the fourth step uses a minimization functional and its constraints, solves the model with the fast Split Bregman algorithm, and finds the optimal segmentation of the RGB-D image.
For the constraints on the RGB-D image, depth information is used to constrain the moments of the segmentation, and we explain how these constraints correspond to nested convex sets in the convex optimization. We denote by $B = BV(\Omega; [0,1])$ the convex relaxation of the set of binary labeling functions of bounded variation defined on the whole image domain $\Omega$. Area constraint: the area of the shape corresponding to $u$ is its zeroth-order moment, which is computed by formula (3):
$$\mathrm{Area}(u) := \int_\Omega d^2(x)\,u(x)\,dx \tag{3}$$
where $d(x)$ gives the depth of pixel $x$. Suppose $d(x) = K D(x)$, where $K$ is the focal length of the camera and $D(x)$ is the measured depth of the pixel. Then $d^2(x)$ is the size of the projection of the corresponding pixel in 3D space, and the overall area is that of the projected surface rather than that of the region in the image. Following the approach of (Grenander, U., Chow, Y., Keenan, D.M.: Hands: A Pattern Theoretic Study of Biological Shapes. Springer, New York (1991)), all pixels are treated in the same way.
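A discrete version of formula (3) is a weighted pixel count. In the sketch below the per-pixel scale `d` is passed in directly, so no assumption is made about how it is derived from the sensor; the value of 1.5 per pixel is purely illustrative.

```python
def area(u, d):
    """Discrete formula (3): sum of d(x)^2 * u(x) over all pixels."""
    return sum(d[y][x] ** 2 * u[y][x]
               for y in range(len(u)) for x in range(len(u[0])))

# Four foreground pixels, each covering 1.5 x 1.5 units in 3D space.
u = [[0, 1, 1],
     [0, 1, 1]]
d = [[1.5, 1.5, 1.5],
     [1.5, 1.5, 1.5]]
absolute_area = area(u, d)                 # 4 * 2.25
pixel_count = sum(sum(row) for row in u)   # area measured in the image domain
```

The absolute area (9.0) differs from the pixel count (4) by the factor d squared, which is what makes the constraint independent of the distance to the camera.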
The absolute area of the shape $u$ is bounded between constants $c_1 \le c_2$ by constraining $u$ to the set given in formula (4):
$$C_0 = \{\, u \in B \mid c_1 \le \mathrm{Area}(u) \le c_2 \,\} \tag{4}$$
The constraint that defines the set $C_0$ depends linearly on $u$, so $C_0$ is convex for any constants $c_2 \ge c_1 \ge 0$.
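The convexity claim follows directly from the linearity of the area in formula (3). The short derivation below assumes only that B is convex; the interpolation parameter t and the second shape v are introduced here for illustration and do not appear in the original text.

```latex
\begin{aligned}
&\text{For } u, v \in C_0 \text{ and } t \in [0,1]: \\
&\mathrm{Area}\bigl(tu + (1-t)v\bigr)
  = \int_\Omega d^2(x)\,\bigl(t\,u(x) + (1-t)\,v(x)\bigr)\,dx
  = t\,\mathrm{Area}(u) + (1-t)\,\mathrm{Area}(v), \\
&c_1 = t\,c_1 + (1-t)\,c_1
  \le t\,\mathrm{Area}(u) + (1-t)\,\mathrm{Area}(v)
  \le t\,c_2 + (1-t)\,c_2 = c_2 .
\end{aligned}
```

Since tu + (1-t)v also lies in B, it lies in C_0, so every convex combination of two admissible shapes is admissible.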
In practice, the area can be fixed exactly by setting $c_1 = c_2$, it can be bounded by imposing an upper and a lower bound, or a soft area constraint can be applied by extending functional (1) as in formula (5):
$$E_{\mathrm{total}}(u) = E(u) + \lambda \left( \int d^2\,u\,dx - c \right)^2 \tag{5}$$
Formula (5) adds a soft constraint with weight λ > 0, which drives the estimated area of the shape toward c ≥ 0. The functional in formula (5) is also convex.
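The added term of formula (5) and its derivative with respect to each u(x) can be sketched in discrete form. Only the penalty term is computed here, not E(u); the function names and the numbers are illustrative assumptions.

```python
def area(u, d):
    """Discrete formula (3): sum of d(x)^2 * u(x)."""
    return sum(d[y][x] ** 2 * u[y][x]
               for y in range(len(u)) for x in range(len(u[0])))

def soft_area_penalty(u, d, c, lam):
    """Second term of formula (5): lam * (Area(u) - c)^2."""
    return lam * (area(u, d) - c) ** 2

def penalty_gradient(u, d, c, lam):
    """Partial derivative of the penalty w.r.t. each u(x): 2*lam*(Area(u)-c)*d(x)^2."""
    r = 2 * lam * (area(u, d) - c)
    return [[r * d[y][x] ** 2 for x in range(len(u[0]))] for y in range(len(u))]

u = [[1, 1],
     [1, 0]]
d = [[1.0, 1.0],
     [1.0, 1.0]]
penalty = soft_area_penalty(u, d, c=2.0, lam=0.5)   # area 3.0 vs target 2.0
grad = penalty_gradient(u, d, c=2.0, lam=0.5)
```

The gradient is positive everywhere because the current area exceeds the target, so a descent step would shrink u; the penalty is a square of a linear function of u, which is why it stays convex.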
The fast Split Bregman algorithm proceeds as follows. Maximizing a likelihood function is equivalent to maximizing its natural logarithm. The application first applies the splitting method to RGB-D image segmentation and establishes the following general model:
where $Q_i = -\ln P_i$, $i = 1, 2$; $\omega = (\mu, \sigma) = \mathrm{Max}(p_i)$, $i = 1, 2$; and $u$ is the binary labeling function that represents the evolution of the curve.
The application incorporates the idea of the Split Bregman algorithm into the general model for RGB-D image segmentation. On the basis of the splitting method, it first introduces the splitting variable $w = [w_1, w_2]^T$ and then the Bregman distance $b = (b_1, b_2)^T$, so that the functional extremum problem of formula (7) is converted into:
where $r(u_1, u_2) = \alpha_1 Q_1(x, \omega_1) - \alpha_2 Q_2(x, \omega_2)$. Formula (9) is an extremum problem for an energy functional of two variables, which is usually solved by alternating optimization. First, with $w$ held fixed, the problem becomes an extremum problem in $u$:
Then, with u held fixed, the extremum problem in w is solved:
The Euler-Lagrange equation of energy functional (10) is obtained by the variational method:
Formula (12) can be solved with a fast Gauss-Seidel iteration scheme. Because u takes values in [0,1] after convex relaxation, the following projection is used to constrain u to this range:
$$u^{k+1} = \mathrm{Max}\bigl(\mathrm{Min}(u^{k+1}, 1),\, 0\bigr) \tag{13}$$
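The projection of formula (13) is a pointwise clamp applied after each update of u. A minimal sketch, with an illustrative function name and input values:

```python
def project_unit_interval(u):
    """Formula (13): clamp every value of u to the interval [0, 1]."""
    return [[max(min(v, 1.0), 0.0) for v in row] for row in u]

# Values that an unconstrained Gauss-Seidel update might produce (made up).
u_next = [[-0.2, 0.4],
          [1.3, 1.0]]
u_proj = project_unit_interval(u_next)
```

Values already inside [0, 1] are left unchanged, so the projection only acts where an update has overshot the relaxed range.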
After energy functional (10) has been solved, energy functional (11) is solved. The Euler-Lagrange equation of formula (11) is:
Its analytical solution is obtained with the generalized soft-thresholding formula, which has the form:
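As a hedged illustration, the shrinkage operator commonly used in Split Bregman methods is shrink(z, t) = max(|z| - t, 0) * z / |z|. The sketch below assumes this standard form; the argument z (for example the gradient of u plus the Bregman variable b) and the threshold t are stand-ins, since the exact quantities of formula (11) are not given in the text.

```python
import math

def shrink(z, t):
    """Vector soft-thresholding: shorten the 2-vector z by t, or return zero if |z| <= t."""
    norm = math.hypot(z[0], z[1])
    if norm <= t:
        return (0.0, 0.0)
    scale = (norm - t) / norm
    return (z[0] * scale, z[1] * scale)

w_large = shrink((3.0, 4.0), 1.0)   # |z| = 5, length reduced to 4
w_small = shrink((0.3, 0.4), 1.0)   # |z| = 0.5 <= t, mapped to zero
```

The operator keeps the direction of z and only reduces its length, which gives the w-subproblem a closed-form solution at every pixel.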
The embodiments of the invention are described in detail below with reference to an example, so that the way in which the invention applies technical means to solve the technical problem and achieve the technical effect can be fully understood and implemented.
The invention presents experimental results comparing this method with other methods. The segmentation methods are demonstrated on the two scenes of Fig. 1 and Fig. 2, where the goal is to segment one person's gesture from a group of people. The figures show that RGB-D gesture segmentation outperforms segmentation based on the color image or the depth image alone. As shown in Fig. 1c, the algorithm that uses only RGB color information segments the hand, the face, and part of the wall together, and fails to isolate the required gesture. As shown in Fig. 1d, when only depth information is used, the hand is segmented together with the body parts at the same depth as the hand. The segmentation is therefore unsatisfactory when only one of the two kinds of information is considered. As shown in Fig. 1e, when RGB and depth information are considered together, that is, when the RGB-D image information is used, the hand region is segmented on its own and the segmentation problem is resolved. The algorithm of the application also remains robust in complex scenes, as shown in Fig. 2: when a new person at a different depth is added to the scene, the target gesture is still segmented well.
All of the above is a primary implementation of this intellectual property and does not restrict other forms of implementing this new product and/or new method. Those skilled in the art may use this information to modify the above content in order to achieve similar implementations. However, all such modifications or adaptations based on the new product of the invention remain within the reserved rights.
The above is only a preferred embodiment of the invention and does not limit the invention to any other form. Any person skilled in the art may use the technical content disclosed above to make changes or modifications that result in equivalent embodiments. However, any simple modification, equivalent change, or adaptation made to the above embodiment according to the technical essence of the invention, without departing from the content of the technical solution of the invention, still falls within the protection scope of the technical solution of the invention.