CN110363788A

CN110363788A - A kind of video object track extraction method and device

Info

Publication number: CN110363788A
Application number: CN201910505008.6A
Authority: CN
Inventors: 蔡昭权; 蔡映雪; 陈伽; 胡松; 黄思博; 李慧; 胡辉; 陈明阳
Original assignee: Huizhou University
Current assignee: Huizhou University
Priority date: 2018-09-26
Filing date: 2019-06-11
Publication date: 2019-10-22
Also published as: CN110335288A; CN110659562A; CN110516534A; WO2020062899A1; WO2020063189A1; WO2020062898A1; WO2020063321A1; WO2020063436A1; CN110378867A

Abstract

The disclosure discloses a kind of video object track extraction method and device, transparency estimated value is recalculated to obtain the first transparency mask of the first image by measuring the confidence level of prospect background pixel pair first, then new picture is generated by superposition grayscale information and obtain the second transparency mask of the first image, and further correct the first transparency mask of the first image, finally the foreground target of frame image a certain in video is extracted using the modified first transparency mask, and specified target is retrieved in all foreground targets, then according to the time sequencing of all frames, the track of the specified target is generated according to all images for including the specified target.The disclosure can comprehensively utilize the confidence level and grayscale information of prospect background pixel pair in a certain frame image in video, provide a kind of scheme of new video object trajectory extraction.

Description

A kind of video object track extraction method and device

Technical field

The disclosure belongs to field of image processing, in particular to a kind of video object track extraction method and device.

Background technique

In safety-security area, there is the demand of many trajectory extractions to set the goal for video middle finger.

However, in the prior art, although there are the schemes of enough extraction video object tracks, on how to benefit With prospect background pixel to and grayscale information go forward side by side onestep extraction target trajectory to extract video foreground target, there has been no related new The implementation method of grain husk.

Summary of the invention

Present disclose provides a kind of video object track extraction methods, include the following steps:

S100 divides all foreground pixel set F, the had powerful connections picture in the image for the first image in video Element set B and all unknown pixel set Z；Wherein, the first image is a certain frame image extracted from the video；

S200 gives certain prospect background pixels to (F_i, B_j), each unknown pixel Z is measured according to the following formula_kIt is saturating Lightness

Wherein, I_kFor unknown pixel Z_kRGB color value, the foreground pixel F_iFor apart from unknown pixel Z_kNearest m Foreground pixel, the background pixel B_jAlso for apart from unknown pixel Z_kM nearest background pixel, the prospect background pixel pair (F_i, B_j) amount to m²Group；

S300, for the m²Each group of prospect background pixel in group is to (F_i, B_j) and its it is correspondingAccording to as follows Formula measures prospect background pixel to (F_i, B_j) confidence level n_ij:

Wherein, σ value 0.1, and choose the highest MAX (n of confidence level_ij) corresponding to that group prospect background pixel to for (F_iMAX, B_jMAX)；

S400 calculates each unknown pixel Z according to the following formula_kTransparency estimated value

S500, according to each unknown pixel Z_kTransparency estimated valuePrimarily determine the first of the first image Transparency mask；

S600, to the first image superposition grayscale information to generate the second image, and it is all to divide it to second image Foreground pixel set, all background pixel set and all unknown pixel set；

S700 executes step S200 to S500 for second image, to determine that the first transparency of the second image hides Cover, and using the first transparency mask of second image as the second transparency mask of the first image；

S800, using the second transparency mask of the first image, the first transparency for correcting the first image is hidden Cover；

S900 corrects the first transparency mask of resulting first image according to step S800, to the first of the video Foreground target in image extracts, and specified target is retrieved in all foreground targets, then according to the time of all frames Sequentially, the track of the specified target is generated according to all images for including the specified target.

In addition, the disclosure further discloses a kind of video object trajectory extraction device, comprising:

First division module, is used for: for the first image in video, dividing all foreground pixel set in the image F, all background pixel set B and all unknown pixel set Z；Wherein, the first image is extracted from the video A certain frame image；

First metric module, is used for: giving certain prospect background pixels to (F_i, B_j), measurement is each not according to the following formula Know pixel Z_kTransparency

Second metric module, is used for: for the m²Each group of prospect background pixel in group is to (F_i, B_j) and its it is corresponding 'sMeasurement prospect background pixel is to (F according to the following formula_i, B_j) confidence level n_ij:

Computing module is used for: calculating each unknown pixel Z according to the following formula_kTransparency estimated value

Determining module is used for: according to each unknown pixel Z_kTransparency estimated valuePrimarily determine described first First transparency mask of image；

Second division module, is used for: to the first image superposition grayscale information to generate the second image, and to second figure As dividing its all foreground pixel set, all background pixel set and all unknown pixel set；

Calling module again is used for: being directed to second image, is called first metric module, the second measurement again Module, computing module and determining module, to determine the first transparency mask of the second image, and by the first of second image Second transparency mask of the transparency mask as the first image；

Correction module is used for: using the second transparency mask of the first image, correcting the first of the first image Transparency mask；

Extraction module is used for: according to the first transparency mask of resulting first image of correction module, to the video Foreground target in first image extracts, and specified target is retrieved in all foreground targets, then according to all frames Time sequencing generates the track of the specified target according to all images for including the specified target.

By the method and device, the disclosure is capable of the confidence level and gray scale letter of Prospects of Comprehensive Utilization background pixel pair Breath provides a kind of scheme of new video object trajectory extraction.

Detailed description of the invention

Fig. 1 is the schematic diagram of one embodiment the method in the disclosure；

Fig. 2 is the schematic diagram of another embodiment described device in the disclosure.

Specific embodiment

In order to make those skilled in the art understand that technical solution disclosed by the disclosure, below in conjunction with embodiment and related The technical solution of each embodiment is described in attached drawing, and described embodiment is a part of this disclosure embodiment, without It is whole embodiments.Term " first " used by the disclosure, " second " etc. rather than are used for for distinguishing different objects Particular order is described.In addition, " comprising " and " having " and their any deformation, it is intended that covering and non-exclusive packet Contain.Such as contain the process of a series of steps or units or method or system or product or equipment are not limited to arrange Out the step of or unit, but optionally further include the steps that not listing or unit, or further includes optionally for these mistakes Other intrinsic step or units of journey, method, system, product or equipment.

Referenced herein " embodiment " is it is meant that a particular feature, structure, or characteristic described can wrap in conjunction with the embodiments It is contained at least one embodiment of the disclosure.Each position in the description occur the phrase might not each mean it is identical Embodiment, nor the independent or alternative embodiment with other embodiments mutual exclusion.It will be appreciated by those skilled in the art that , embodiment described herein can combine with other embodiments.

Show referring to the process that Fig. 1, Fig. 1 are a kind of video object track extraction methods that one embodiment provides in the disclosure It is intended to.As shown, described method includes following steps:

It is understood that there are many means for dividing foreground pixel, background pixel and unknown pixel to image, can be artificial Mark, can also by way of machine learning or data-driven, can also be according to corresponding prospect threshold value, background threshold come Mark off all foreground and background pixels and its corresponding set；Once foreground and background pixel is divided out, unknown pixel, It corresponds to set also with regard to being divided out naturally；

In addition, the first image may is that when video is played when extracting to video foreground target, ring Current video broadcasting should be suspended, and carry out the interception of present frame immediately to pause picture, in the operation of user to obtain Obtain the first image；The first image is also possible to: when video is there is no being played, in response to the operation of user, choosing at random The a certain frame or a few frames in video are selected, using wherein a certain frame image as the first image.In any case, it is to be understood that the party The foreground target that method can be used for each frame image in video extracts.Preferably, the first image is first frame in video Image.

To those skilled in the art, theoretically, the selection of m can make corresponding prospect background pixel to being Part sample, can also be with exhaustive whole image；For step S200, it is intended to the color and prospect background by unknown pixel The color relationship of pixel pair estimates the transparency of unknown pixel；In addition, the selection of m can also further combined with neighborhood territory pixel with Between unknown pixel color, texture, gray scale, brightness, in terms of feature；

It is understood that the value of σ is empirical value or statistical value or simulation value, step S300 is further screened using confidence level Prospect background pixel pair, and for subsequent step by the prospect background pixel further screened to estimating that unknown pixel is transparent Degree；

This is to say, after the transparency estimated value of each unknown pixel obtains, the present embodiment with regard to primarily determining naturally First transparency mask of the first image；Why say be naturally, be because transparency mask can be considered as by By selected those respective pixels composition of certain value (or value range)；

For the step, the present embodiment is contemplated that gray scale is believed in view of each pixel is in addition to the effect of RGB color Cease the influence to pixel；Therefore, after being superimposed grayscale information, transparency mask is modified using following steps.

So far, the confidence level and grayscale information of disclosure Prospects of Comprehensive Utilization background pixel pair provides a kind of new video The scheme that target trajectory extracts.It is understood that in this scenario, the extraction of video foreground target therein is one and infinitely forces Close process, due to the transition of color, gray scale in the image frame of video, it's hard to say some way transparency obtained Mask is unique correct.Theoretically, above-described embodiment fusion more information, consideration are more multifactor, are conducive to more comprehensive Image in video is investigated, to extract relatively satisfied video foreground target.It is understood that in above-described embodiment In, it, can also be with when being extracted according to the foreground target in the first image of the first transparency mask to the video It uses for reference, comprehensive related means in the prior art.That is, the key of above-described embodiment is to obtain how in new ways Transparency mask and the trajectory extraction for implementing final specified target, and how do not lie according to transparency mask extraction video foreground Target.

In another embodiment, the specified target derives from a photo, or derives from an image data base.Example Such as, specified target is suspect, and photo is the recent photograph of the suspect, and image data base is wanted circular personnel image data Library.

In another embodiment, further include following steps after the step S900:

S1000 extracts remaining each frame image from the video, and respectively as the first image, weight Abovementioned steps S100 to S900 is executed, again to extract all foreground targets of the video；Or

S1100 extracts remaining each frame image from the video, respectively as the first image, and root It is divided according to the first transparency mask of revised first image of previous frame all in the first image corresponding to the present frame Foreground pixel set F_c, all background pixel set B_CWith all unknown pixel set Z_C, repeat abovementioned steps S200 extremely S900, to extract all foreground targets of the video, wherein before dividing all in the first image corresponding to the present frame Scape pixel set F_c, all background pixel set B_CWith all unknown pixel set Z_CSpecifically includes the following steps:

First transparency mask of revised first image of previous frame is carried out binaryzation by S11001, and threshold value takes 0.5, Obtain the first bianry image of foreground target；

S11002, using the first bianry image as the second bianry image initial value；

S11003 carries out morphological erosion operation to the second bianry image using the circular configuration element that size is 3x3, and The second bianry image is updated with the result of acquisition:

S11004 is repeated step S1003 five times；

S11005, using the first bianry image as third bianry image initial value；

S11006 carries out morphological dilation to third bianry image using the circular configuration element that size is 3x3, and Third bianry image is updated with the result of acquisition:

S11007 is repeated step S1006 five times；

S11008 will be genuine respective pixel in the second bianry image as all foreground pixel set F_c, by the three or two It is false respective pixel in value image as all background pixel set B_C, rest of pixels is as all unknown pixel set Z_C。

It is understood that repeating above-mentioned steps S100 to S900 to each frame image in video, view will be extracted All foreground targets in frequency.But, it is contemplated that often each frame image and its a later frame image have in picture video pictures Continuity and similitude in appearance, therefore, in order to make full use of this continuity and similitude, above-described embodiment can also be with The institute in the first image corresponding to present frame is divided according to the first transparency mask of revised first image of previous frame There is foreground pixel set F_c, all background pixel set B_CWith all unknown pixel set Z_C, so as in the essence of image procossing Balance is obtained between degree and efficiency；That is, the embodiment has the characteristic inherited: inherit the transparent of former frame Mask is spent, and divides foreground pixel set, background pixel set and the unknown pixel collection of a later frame using the transparency mask Close, in view of the continuity and similitude on image content, therefore this division not only in accordance with the transparency mask of former frame and And the means of morphological erosion and morphological dilations are utilized, this belongs to an innovative point of the disclosure.

In another embodiment, in step S600, in the following way to the first image superposition grayscale information to generate Second image:

S601 carries out mean filter to the first image and obtains third image；

S602, the first image and third image generate the second image by following formula:

Wherein, IM₂Indicate the gray value of k-th of pixel on the second image after being superimposed, x_rIndicate k-th of picture on the first image Plain x_kNeighborhood territory pixel, N_kIt indicates with x_kCentered on neighborhood in number of pixels,Indicate to the first image into The pixel value of k-th of pixel, β take 0.5 on the resulting third image of row mean filter.

The mode of specific superposition grayscale information is given by empirical value and related formula for above-described embodiment.

In another embodiment, step S800 further include:

S801 is found respectively according to the first transparency mask of the second transparency mask of the first image and the first image The edge at the edge of its second transparency mask, the first transparency mask；

S802 obtains the position of all pixels at the edge of the second transparency mask and the edge of the first transparency mask All pixels position, and determine position and the first transparency mask of all pixels at the edge of the second transparency mask The region that the position of all pixels at edge is overlapped, and then determine the identical pixel Z in position_sp；

S803 searches pixel Z respectively_spThe transparency estimated value of the first transparency mask corresponding to the first image and right It should be in the transparency estimated value of the second transparency mask of the first image, and using the average value of the two as pixel Z_spIt is revised Transparency estimated value；

S804, with pixel Z_spRevised transparency estimated value corrects the first transparency mask of the first image.

For above-described embodiment, it is intended to find, compares the identical pixel in position in two kinds of transparency masks, and utilize Transparency estimated value of the identical pixel in the position in respective transparency mask is averaged to correct the of the first image One transparency mask.

In another embodiment, the step S802 further comprises:

S8021, according to the position of all pixels at the edge of the second transparency mask of judgement and the first transparency mask Edge all pixels position be overlapped region, further determine that the different pixel Z in position_dp, including two kinds of situations: it is located at The pixel Z at the edge of the second transparency mask_dp2With the pixel Z at the edge for being located at the first transparency mask_dp1；

Unlike previous embodiment, edge that two transparency masks of the present embodiment additional attention are determined The different pixel in middle position, and find out these pixels of position different from each other；

S8022 utilizes the different pixel Z in the position_dpPixel Z identical with position_sp, obtain the second transparency mask Edge and the first transparency mask edge determined by: the closed enclosed region of institute and described between edge and edge The position of all closing pixels of enclosed region；

For the step, the edge as corresponding to each mask can be considered as a connection or closure to a certain degree Curve, then no matter closed curve corresponding to two masks is what kind of overlapping or nonoverlapping relationship: two are hidden Those of on the corresponding edge of cover for the pixel of position not corresponding (i.e. position is different or position is not overlapped), jointly really All closing pixels of the closed enclosed region of institute and the enclosed region between the edge and edge of two masks are determined Position；

S8023 executes following sub-step:

(1) pixel Z is searched_dp1Position corresponding to pixel estimate in the transparency of the first transparency mask of the first image Evaluation, and the transparence value of the corresponding pixel in the second image is searched, and using the average value of the two as pixel Z_dp1Amendment Transparency estimated value afterwards；

(2) pixel Z is searched_dp2Position corresponding to pixel estimate in the transparency of the second transparency mask of the first image Evaluation, and the transparence value of the corresponding pixel in the first image is searched, and using the average value of the two as pixel Z_dp2Amendment Transparency estimated value afterwards；

For the step, it is transparent under two different systems to be intended to find each pixel in aforementioned enclosed region Estimated value or transparence value are spent, and using the average value of the two as the revised transparency estimated value of respective pixel, then under For correcting the first transparency mask of the first image in one step S8024.That is, the present embodiment is similar to previous reality The amendment thinking for applying example is such, and what only the present embodiment solved is the common closed region of the corresponding edge institute of two masks. Wherein, with pixel Z_dp1For, belong to the pixel of the first transparency mask of the first image, first in the first image is saturating There are a transparency estimated values for lightness mask, in addition, in the second image, pixel Z_dp1In second image corresponding to position Pixel have a transparence value in the second image, the present embodiment using the average value of the transparency estimated value and transparence value as Respective pixel Z_dp1Revised transparency estimated value.Pixel Z_dp1It is similar.

S8024, in conjunction with pixel Z_dp1Revised transparency estimated value and pixel Z_dp2Revised transparency estimated value, is repaired First transparency mask of positive the first image.For example, by pixel Z_dp1Revised transparency estimated value and pixel Z_dp2It repairs Transparency estimated value after just, the transparence value as the first transparency mask corresponding position pixel.

Step in embodiment of the disclosure method can be sequentially adjusted, merged and deleted according to actual needs.

It should be noted that for the various method embodiments described above, for simple description, therefore, it is stated as a series of Combination of actions, but those skilled in the art should understand that, the present invention is not limited by the sequence of acts described because According to the present invention, some steps may be performed in other sequences or simultaneously.

In addition, referring to fig. 2, the disclosure further discloses a kind of video object trajectory extraction device in another embodiment, Include:

For the embodiment, as shown in Figure 2, above-mentioned modules can be constituted with processor and memory system with Just implement；But Fig. 2 is not interfered: modules can also have processing unit with itself to realize data-handling capacity.

In another embodiment, described device further includes following module:

Successively calling module is used for: remaining each frame image is extracted from the video, and respectively as described First image, successively described in calling: the first division module, the second metric module, computing module, determines mould at the first metric module Block, the second division module, again calling module, correction module and extraction module, to extract all foreground targets of the video； Or include:

Calling module is inherited, is used for: extracting remaining each frame image from the video, respectively as described the One image, and it is input to third division module, wherein the third division module is used for according to revised first figure of previous frame First transparency mask of picture divides all foreground pixel set F in the first image corresponding to the present frame_c, had powerful connections Pixel set B_CWith all unknown pixel set Z_C；Then the succession calling module successively calls first metric module, Two metric modules, computing module, determining module, the second division module, again calling module, correction module and extraction module, with Extract all foreground targets of the video, wherein third division module includes:

First binary Images Processing unit, for carrying out the first transparency mask of revised first image of previous frame Binaryzation, threshold value take 0.5, obtain the first bianry image of foreground target；

Second bianry image initial cell, is used for: using the first bianry image as the second bianry image initial value；

Second binary Images Processing unit, is used for: using size be 3x3 circular configuration element to the second bianry image into The operation of row morphological erosion, and the second bianry image is updated with the result obtained；

First repeats call unit, calls the second two-value processing unit five times for repeating；

Third bianry image initial cell, is used for: using the first bianry image as third bianry image initial value；

Third binary Images Processing unit, is used for: using size be 3x3 circular configuration element to third bianry image into Row morphological dilation, and third bianry image is updated with the result obtained:

First repeats call unit, calls third two-value processing unit five times for repeating；

True and false division unit, is used for: will be true in the second bianry image of the second binary Images Processing unit final updating Respective pixel as all foreground pixel set F_c, by the third bianry image of third binary Images Processing unit final updating In be false respective pixel as all background pixel set B_C, rest of pixels is as all unknown pixel set Z_C。

In another embodiment, wherein the second division module further include:

Mean filter unit, is used for: carrying out mean filter to the first image and obtains third image；

Second image generation unit, is used for: the first image and third image pass through following formula the second image of generation:

In another embodiment, wherein correction module further include:

Edge cells are found, are used for: according to the second transparency mask of the first image and the first transparency of the first image Mask finds the edge of its second transparency mask, the edge of the first transparency mask respectively；

It determines position units, is used for: obtaining the position of all pixels at the edge of the second transparency mask and first transparent The position of all pixels at the edge of mask is spent, and determines the position and first of all pixels at the edge of the second transparency mask The region that the position of all pixels at the edge of transparency mask is overlapped, and then determine the identical pixel Z in position_sp；

First amending unit, is used for: searching pixel Z respectively_spThe first transparency mask corresponding to the first image it is transparent Spend estimated value, and the transparency estimated value of the second transparency mask corresponding to the first image, and using the average value of the two as Pixel Z_spRevised transparency estimated value；

Second amending unit, is used for: with pixel Z_spRevised transparency estimated value, corrects the first of the first image Transparency mask.

It is understood that described device can implement method described in one embodiment above.

In another embodiment, wherein the determining position units further comprise:

Different location subelement, is used for: according to the position of all pixels at the edge of the second transparency mask of judgement and The region that the position of all pixels at the edge of the first transparency mask is overlapped, further determines that the different pixel Z in position_dp, packet It includes: the pixel Z positioned at the edge of the second transparency mask_dp2With the pixel Z at the edge for being located at the first transparency mask_dp1；

It is closed subelement, is used for: utilizing the different pixel Z in the position_dpPixel Z identical with position_sp, obtain second Determined by the edge of the edge of transparency mask and the first transparency mask: the closed closed area of institute between edge and edge The position of all closing pixels of domain and the enclosed region；

Subelement is repeatedly searched, is used for:

(3) pixel Z is searched_dp1Position corresponding to pixel estimate in the transparency of the first transparency mask of the first image Evaluation, and the transparence value of the corresponding pixel in the second image is searched, and using the average value of the two as pixel Z_dp1Amendment Transparency estimated value afterwards；

(4) pixel Z is searched_dp2Position corresponding to pixel estimate in the transparency of the second transparency mask of the first image Evaluation, and the transparence value of the corresponding pixel first transparency mask in the first image is searched, and with the average value of the two As pixel Z_dp2Revised transparency estimated value；

Complicated revise subelemen, is used for: in conjunction with pixel Z_dp1Revised transparency estimated value and pixel Z_dp2It is revised Transparency estimated value corrects the first transparency mask of the first image.

Those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, it is involved And movement, module, unit be not necessarily essential to the invention.

In the above-described embodiments, it all emphasizes particularly on different fields to the description of each embodiment, there is no the portion being described in detail in some embodiment Point, reference can be made to the related descriptions of other embodiments.

In several embodiments provided by the disclosure, it should be understood that disclosed method is, it can be achieved that be corresponding function Energy unit, processor or even system may be distributed over multiple wherein each section of the system both can be located in one place In network unit.It can select some or all of unit therein according to the actual needs to realize the mesh of this embodiment scheme 's.In addition, each functional unit can integrate in one processing unit, it is also possible to each unit individualism, it can also two A or more than two units are integrated in one unit.Above-mentioned integrated unit both can take the form of hardware realization, can also To realize in the form of software functional units.If the integrated unit is realized in the form of SFU software functional unit and conduct Independent product when selling or using, can store in a computer readable storage medium.Based on this understanding, originally The disclosed technical solution substantially all or part of the part that contributes to existing technology or the technical solution in other words It can be embodied in the form of software products, which is stored in a storage medium, including several fingers It enables and using so that a computer equipment (can be smart phone, personal digital assistant, wearable device, laptop, plate Computer) execute the disclosure each embodiment the method all or part of the steps.And storage medium above-mentioned include: USB flash disk, Read-only memory (ROM, Read-Only Memory), is moved random access memory (RAM, Random Access Memory) The various media that can store program code such as dynamic hard disk, magnetic or disk.

The above, above embodiments are only to illustrate the technical solution of the disclosure, rather than its limitations；Although referring to before Embodiment is stated the disclosure is described in detail, it should be understood by those skilled in the art that: it still can be to aforementioned each reality Technical solution documented by example is applied to modify or equivalent replacement of some of the technical features；And these modification or Person's replacement, the range for the presently disclosed embodiments technical solution that it does not separate the essence of the corresponding technical solution.

Claims

1. a kind of video object track extraction method, includes the following steps:

S100 divides all foreground pixel set F in the image, all background pixel collections for the first image in video Close B and all unknown pixel set Z；Wherein, the first image is a certain frame image extracted from the video；

S200 gives certain prospect background pixels to (F_i, B_j), each unknown pixel Z is measured according to the following formula_kTransparency

Wherein, I_kFor unknown pixel Z_kRGB color value, the foreground pixel F_iFor apart from unknown pixel Z_kM nearest prospect Pixel, the background pixel B_jAlso for apart from unknown pixel Z_kM nearest background pixel, the prospect background pixel is to (F_i, B_j) amount to m²Group；

S300, for the m²Each group of prospect background pixel in group is to (F_i, B_j) and its it is correspondingAccording to the following formula Prospect background pixel is measured to (F_i, B_j) confidence level n_ij:

Wherein, σ value 0.1, and choose the highest MAX (n of confidence level_ij) corresponding to that group prospect background pixel to for (F_iMAX, B_jMAX)；；

S500, according to each unknown pixel Z_kTransparency estimated valuePrimarily determine the first transparent of the first image Spend mask；

S600 to the first image superposition grayscale information to generate the second image, and divides its all prospect to second image Pixel set, all background pixel set and all unknown pixel set；

S700 executes step S200 to S500 for second image, to determine the first transparency mask of the second image, And using the first transparency mask of second image as the second transparency mask of the first image；

S800 corrects the first transparency mask of the first image using the second transparency mask of the first image；

S900 corrects the first transparency mask of resulting first image according to step S800, to the first image of the video In foreground target extract, and specified target is retrieved in all foreground targets, then according to the time sequencing of all frames, The track of the specified target is generated according to all images for including the specified target.

2. according to the method described in claim 1, wherein, it is preferred that further include following steps after the step S900:

S1000 extracts remaining each frame image from the video, and respectively as the first image, repetition is held Row abovementioned steps S100 to S900, to extract all foreground targets of the video；Or

S1100 extracts remaining each frame image, respectively as the first image, and according to upper from the video First transparency mask of revised first image of one frame divides all prospects in the first image corresponding to the present frame Pixel set F_c, all background pixel set B_CWith all unknown pixel set Z_C, abovementioned steps S200 to S900 is repeated, To extract all foreground targets of the video, wherein divide all prospect pictures in the first image corresponding to the present frame Plain set F_c, all background pixel set B_CWith all unknown pixel set Z_CSpecifically includes the following steps:

First transparency mask of revised first image of previous frame is carried out binaryzation by S11001, and threshold value takes 0.5, obtains First bianry image of foreground target；

S11003 carries out morphological erosion operation to the second bianry image using the circular configuration element that size is 3x3, and with obtaining The result obtained updates the second bianry image:

S11004 is repeated step S1003 five times；

S11005, using the first bianry image as third bianry image initial value；

S11006 carries out morphological dilation to third bianry image using the circular configuration element that size is 3x3, and with obtaining The result obtained updates third bianry image:

S11007 is repeated step S1006 five times；

S11008 will be genuine respective pixel in the second bianry image as all foreground pixel set F_c, by third bianry image In be false respective pixel as all background pixel set B_C, rest of pixels is as all unknown pixel set Z_C。

3. according to the method described in claim 1, wherein, in step S600, in the following way to the first image superposition gray scale Information is to generate the second image:

S601 carries out mean filter to the first image and obtains third image；

Wherein, IM₂Indicate the gray value of k-th of pixel on the second image after being superimposed, x_rIndicate k-th of pixel x on the first image_k Neighborhood territory pixel, N_kIt indicates with x_kCentered on neighborhood in number of pixels,It indicates to carry out the first image equal The pixel value of k-th of pixel, β take 0.5 on the resulting third image of value filtering.

4. according to the method described in claim 1, wherein, step S800 further include:

S801, according to the first transparency mask of the second transparency mask of the first image and the first image, find respectively its The edge at the edge of two transparency masks, the first transparency mask；

S802 obtains the institute of the position of all pixels at the edge of the second transparency mask and the edge of the first transparency mask There is the position of pixel, and determines the position of all pixels at the edge of the second transparency mask and the edge of the first transparency mask All pixels the region that is overlapped of position, and then determine the identical pixel Z in position_sp；

S803 searches pixel Z respectively_spThe transparency estimated value of the first transparency mask corresponding to the first image, and correspond to The transparency estimated value of second transparency mask of the first image, and using the average value of the two as pixel Z_spIt is revised transparent Spend estimated value；

5. according to the method described in claim 4, wherein, the step S802 further comprises:

S8021, according to the side of the position of all pixels at the edge of the second transparency mask of judgement and the first transparency mask The region that the position of all pixels of edge is overlapped, further determines that the different pixel Z in position_dp, comprising: it is located at the second transparency and hides The pixel Z at the edge of cover_dp2With the pixel Z at the edge for being located at the first transparency mask_dp1；

S8022 utilizes the different pixel Z in the position_dpPixel Z identical with position_sp, obtain the edge of the second transparency mask Determined by edge with the first transparency mask: the closed enclosed region of institute and the closed area between edge and edge The position of all closing pixels in domain；

S8023 executes following sub-step:

(1) pixel Z is searched_dp1Position corresponding to pixel in the transparency estimated value of the first transparency mask of the first image, And the transparence value of the corresponding pixel in the second image is searched, and using the average value of the two as pixel Z_dp1It is revised Transparency estimated value；

(2) pixel Z is searched_dp2Position corresponding to pixel in the transparency estimated value of the second transparency mask of the first image, And the transparence value of the corresponding pixel in the first image is searched, and using the average value of the two as pixel Z_dp2It is revised Transparency estimated value；

S8024, in conjunction with pixel Z_dp1Revised transparency estimated value and pixel Z_dp2Revised transparency estimated value corrects institute State the first transparency mask of the first image.

6. a kind of video object trajectory extraction device, comprising:

First division module, is used for: for the first image in video, dividing all foreground pixel set F in the image, institute The pixel set that has powerful connections B and all unknown pixel set Z；Wherein, the first image be extracted from the video it is a certain Frame image；

First metric module, is used for: giving certain prospect background pixels to (F_i, B_j), each unknown picture is measured according to the following formula Plain Z_kTransparency

Second metric module, is used for: for the m²Each group of prospect background pixel in group is to (F_i, B_j) and its it is correspondingMeasurement prospect background pixel is to (F according to the following formula_i, B_j) confidence level n_ij:

Determining module is used for: according to each unknown pixel Z_kTransparency estimated valuePrimarily determine the first image The first transparency mask；

Second division module, is used for: to the first image superposition grayscale information to generate the second image, and drawing to second image Divide its all foreground pixel set, all background pixel set and all unknown pixel set；

Calling module again is used for: be directed to second image, call again first metric module, the second metric module, Computing module and determining module, to determine the first transparency mask of the second image, and it is transparent by the first of second image Spend second transparency mask of the mask as the first image；

Correction module is used for: using the second transparency mask of the first image, correct the first image first is transparent Spend mask；

Extraction module is used for: according to the first transparency mask of resulting first image of correction module, to the first of the video Foreground target in image extracts, and specified target is retrieved in all foreground targets, then according to the time of all frames Sequentially, the track of the specified target is generated according to all images for including the specified target.

7. device according to claim 6, described device further include:

Successively calling module is used for: remaining each frame image is extracted from the video, and respectively as described first Image, successively call described in: the first division module, the first metric module, the second metric module, computing module, determining module, the Two division modules, again calling module, correction module and extraction module, to extract all foreground targets of the video；Or Include:

Calling module is inherited, is used for: remaining each frame image is extracted from the video, respectively as first figure Picture, and it is input to third division module, wherein the third division module is used for according to revised first image of previous frame First transparency mask divides all foreground pixel set F in the first image corresponding to the present frame_c, all background pixels Set B_CWith all unknown pixel set Z_C；Then the succession calling module successively calls first metric module, second degree Module, computing module, determining module, the second division module, again calling module, correction module and extraction module are measured, to extract All foreground targets of the video, wherein third division module includes:

First binary Images Processing unit, for the first transparency mask of revised first image of previous frame to be carried out two-value Change, threshold value takes 0.5, obtains the first bianry image of foreground target；

Second binary Images Processing unit, is used for: carrying out shape to the second bianry image using the circular configuration element that size is 3x3 State etching operation, and the second bianry image is updated with the result obtained；

Third binary Images Processing unit, is used for: carrying out shape to third bianry image using the circular configuration element that size is 3x3 State expansive working, and third bianry image is updated with the result obtained:

True and false division unit, is used for: will be really right in the second bianry image of the second binary Images Processing unit final updating Answer pixel as all foreground pixel set F_c, will be in the third bianry image of third binary Images Processing unit final updating False respective pixel is as all background pixel set B_C, rest of pixels is as all unknown pixel set Z_C。

8. device according to claim 6, wherein the second division module further include: mean filter unit is used for: to One image carries out mean filter and obtains third image；

9. device according to claim 6, wherein correction module further include:

Edge cells are found, are used for: according to the first transparency mask of the second transparency mask of the first image and the first image, The edge of its second transparency mask, the edge of the first transparency mask are found respectively；

Determine position units, be used for: the position and the first transparency for obtaining all pixels at the edge of the second transparency mask hide The position of all pixels at the edge of cover, and determine the position of all pixels at the edge of the second transparency mask and first transparent The region that the position of all pixels at the edge of mask is overlapped is spent, and then determines the identical pixel Z in position_sp；

First amending unit, is used for: searching pixel Z respectively_spThe transparency of the first transparency mask corresponding to the first image is estimated Evaluation, and the transparency estimated value of the second transparency mask corresponding to the first image, and using the average value of the two as pixel Z_spRevised transparency estimated value；

Second amending unit, is used for: with pixel Z_spRevised transparency estimated value, correct the first image first are transparent Spend mask.

10. device according to claim 9, wherein the determining position units further comprise:

Different location subelement, is used for: according to the position and first of all pixels at the edge of the second transparency mask of judgement The region that the position of all pixels at the edge of transparency mask is overlapped, further determines that the different pixel Z in position_dp, comprising: position Pixel Z in the edge of the second transparency mask_dp2With the pixel Z at the edge for being located at the first transparency mask_dp1；

It is closed subelement, is used for: utilizing the different pixel Z in the position_dpPixel Z identical with position_sp, obtain the second transparency Determined by the edge of the edge of mask and the first transparency mask: the closed enclosed region of institute between edge and edge, and The position of all closing pixels of the enclosed region；

Subelement is repeatedly searched, is used for:

Complicated revise subelemen, is used for: in conjunction with pixel Z_dp1Revised transparency estimated value and pixel Z_dp2It is revised transparent Estimated value is spent, the first transparency mask of the first image is corrected.