
CN105374030B - Background model and moving object detection method and system - Google Patents


Info

Publication number
CN105374030B
CN105374030B
Authority
CN
China
Prior art keywords: frame, pixel point, moving object, value, pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201510659389.5A
Other languages
Chinese (zh)
Other versions
CN105374030A (en)
Inventor
张勇
李常春
张磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Deepview Technology Co Ltd
Original Assignee
Beijing Deepview Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Deepview Technology Co Ltd filed Critical Beijing Deepview Technology Co Ltd
Priority to CN201510659389.5A priority Critical patent/CN105374030B/en
Publication of CN105374030A publication Critical patent/CN105374030A/en
Application granted granted Critical
Publication of CN105374030B publication Critical patent/CN105374030B/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30221 - Sports video; Sports image

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a background model and moving object detection method, comprising: model initialization to obtain a background frame and a mask frame; detection of obvious motion parts; edge extraction of the moving object according to the detected obvious motion parts; limitation of the search region according to the moving object's edge; directional region growing from the obvious motion parts; and finally output of the complete moving object and update of the background frame. The invention also discloses a background model and moving object detection system, comprising a model initialization module, an obvious motion part detection module, a moving object edge extraction module, a limited search region module, a directional region growing module, a complete moving object output module, and a background frame update module. The background model and moving object detection method and system provided by the invention require no special training of the background model, are plug and play, can learn a background that changes over time, and require no specific initial conditions.

Description

Background model and moving object detection method and system
Technical Field
The invention relates to the technical field of background models, and in particular to a background model and moving object detection method and system.
Background
Background modeling is one of the most fundamental steps in computer vision, its aim being to find moving objects, such as moving human bodies. When processing continuously changing images, computer vision must determine which parts are moving and which belong to the background. There are generally two approaches to this. One method is to find the human body directly in a single static image by pattern recognition. However, this method has the limitation that all possible body poses must be included in the classifier used for pattern recognition, which is difficult to achieve in practice.
Another approach is to create a background model, i.e. a background frame, to represent the real background, and then to find moving objects by comparing the input frame with the background frame. This method is effective for general applications. However, if the moving object is present in the field of view from the beginning and only a part of it is moving, this method cannot detect the complete moving object. For example, a person may initially stand in front of the camera and then begin to move his or her hands or legs; the body parts that are not moving will then go undetected by comparison with the background.
Disclosure of Invention
In view of the above, the present invention is directed to a background model and a moving object detection method and system, which do not require special training of the background model, can learn the background changing with time, and do not require specific initial conditions.
The background model and the moving object detection method provided by the invention based on the above purpose comprise:
taking the first frame image as a background frame, starting from the second frame, taking the latest frame as an input frame, and comparing the background frame with the input frame to obtain a mask frame and a label frame;
extracting all obvious motion parts from the moving object mask frame and labeling them respectively;
according to the extracted obvious motion part, performing edge extraction on the input frame, finding all pixel points belonging to the edge of the moving object in the input frame, and obtaining an edge image of the moving object;
finding a rectangular frame including an object edge along an edge image of the moving object;
combining the edge image of the moving object and a rectangular frame comprising the edge of the object, and carrying out region growing based on depth continuity on the detected obvious moving part, thereby obtaining the moving object to be detected;
according to the moving object to be detected, obtaining a 0/1-distributed image of the complete moving object, where 0 represents a background pixel and 1 represents a moving object pixel, representing the complete moving object to which the obvious motion part in the input frame belongs;
all portions of the input frame that do not belong to the position of the complete moving object are determined as background pixels, which are used to update the background frame after the end of the current input frame processing.
In some embodiments, the step of comparing the background frame and the input frame comprises:
setting a negative threshold for distinguishing current motion pixels from current background pixels (since the object is typically closer to the depth camera than the background, its depth value is smaller than the background's, so this threshold is negative), and a positive threshold for distinguishing previous-moment motion pixels from previous-moment background pixels;
performing difference operation on the input frame and the background frame, wherein the difference value is obtained by subtracting the background frame from the input frame;
said step of obtaining a mask frame representing the location of the apparent motion portion in the input frame comprises:
judging pixel points with the depth difference smaller than a negative threshold value as current motion pixels, marking the current motion pixels as 1 in a mask frame, judging the rest pixel points as current background pixels, and marking the rest pixel points as 0 in the mask frame;
the step of obtaining the label frame for indicating the position of the obvious motion part in the background frame comprises:
judging pixel points with a depth difference greater than the positive threshold as previous-moment motion pixels, marked as 0 in the label frame, and judging the remaining pixel points as previous-moment background pixels, marked as 1 in the label frame;
the first frame of the mask frame at the initial time is separately saved as the initial moving object mask frame; at subsequent times, the mask frames all represent moving object mask frames.
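For illustration, a minimal numpy sketch of this comparison step is given below. It is a sketch under assumptions (floating-point depth maps in meters, and the 0.5 m threshold magnitude suggested later in the description), not the patent's reference implementation; all names are illustrative.

```python
import numpy as np

def compare_frames(background, input_frame, th1=0.5, th2=0.5):
    """Compare background frame B with input frame F (both depth maps).

    Returns the mask frame M (1 = current motion pixel: depth decreased
    by more than th1, i.e. something moved closer to the camera) and the
    label frame T (0 = previous-moment motion pixel: depth increased by
    more than th2, i.e. previously occluded background was revealed).
    """
    diff = input_frame.astype(np.float32) - background.astype(np.float32)
    mask = (diff < -th1).astype(np.uint8)    # M_ij = 1 where F_ij - B_ij < -Th1
    label = (diff <= th2).astype(np.uint8)   # T_ij = 0 where F_ij - B_ij > Th2
    return mask, label

# Usage sketch: the first frame is the background frame; every later
# frame is an input frame.
# B = frames[0]; M, T = compare_frames(B, frames[1])
```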
In some embodiments, the step of extracting and respectively labeling all the apparent motion portions from the motion object mask frame includes:
inputting 0/1 distributed moving object mask frame image and depth image input frame;
initializing a data structure representing an object: { rectangle frame equals to null, average equals to 0, pixel point set equals to null }, copying 0/1 distributed moving object mask frame to label frame, and setting label initial value as 2;
scanning the label frame line by line from left to right and from top to bottom;
if a pixel point value on the 0/1 distributed label frame is found to be 1, reassigning the pixel point to be the current label value, then obtaining the depth value corresponding to the pixel point from the input frame, updating the current depth mean value, and pressing the current depth mean value, the pixel point position and the related information into a stack;
if the stack is not empty, popping a pixel point on a label frame from the stack, searching neighborhood pixel points of the pixel point, if the value of a certain neighborhood pixel point is 1, assigning the value as a current label value, and pressing the pixel point and related information of a corresponding pixel point on a depth image input frame into the stack;
if the stack is empty, the label value is increased by 1, and the existing pixel point set and information are copied to an output sequence;
and outputting all the extracted objects, the depth mean values of the objects, the pixel point sets and relevant information of the objects until the whole graph is scanned.
In some embodiments, the step of finding a rectangular frame including an edge of the object along the edge image of the moving object includes:
inputting 0/1 distributed moving object edge images and detected obvious moving parts;
initializing a data structure representing an object: { rectangle frame is equal to null, mean value is equal to 0, pixel point set is equal to null };
respectively scanning the edge image of the moving object and the detected obvious moving part line by line from left to right and from top to bottom;
if the value of a point in the edge image of the moving object is found to be 1 and the corresponding pixel point in the detected obvious motion part is also 1, updating the rectangular frame and the mean value, and pushing the pixel point position and its related information onto a stack; resetting the processed pixel point to 0 so that it is not scanned again;
if the stack is not empty, popping a pixel point from the stack and searching its neighborhood; if the value of a neighborhood pixel point is 1 and its distance from the pixel point popped from the stack is less than a given threshold, updating and pushing the pixel point position and its related information onto the stack; resetting the processed pixel point to 0 so that it is not scanned again;
if the stack is empty, copying the existing pixel point set and information into an output sequence to be used as a rectangular frame of an object;
and until the whole graph is scanned, extracting a rectangular frame comprising the edge of the object, and outputting the rectangular frame of the moving object.
In some embodiments, in the step of updating the background frame after the current input frame processing is finished, the background frame may be updated with a selective progressive update scheme: according to the positions of the pixel points whose value in the label frame is 0, the pixel points at the same positions in the background frame and the input frame are weighted, and the background frame is updated with the weighted result.
The invention also provides a background model and a moving object detection system, which are characterized by comprising the following components:
the model initialization module is used for taking the first frame image as a background frame, starting from the second frame, taking the latest frame as an input frame, and comparing the background frame with the input frame to obtain a mask frame and a label frame;
the obvious motion part detection module is used for extracting all the obvious motion parts from the mask frame of the moving object and respectively labeling the extracted obvious motion parts;
the moving object edge extraction module is used for extracting edges of the input frame according to the extracted obvious moving part, finding all pixel points belonging to the edges of the moving object in the input frame and obtaining an edge image of the moving object;
a limited search area module for finding a rectangular frame including the edge of the object along the edge image of the moving object;
the directional region growing module is used for carrying out region growing based on depth continuity on the detected obvious moving part by combining the moving object edge image and the rectangular frame comprising the object edge so as to obtain a moving object to be detected;
the complete moving object output module is used for obtaining a 0/1-distributed complete moving object image according to the moving object to be detected, representing the complete moving object to which the obvious motion part in the input frame belongs;
and the background frame updating module is used for judging all parts, which do not belong to the position of the complete moving object, in the input frame as background pixels and updating the background frame after the current input frame is processed.
In some embodiments, the model initialization module comprises:
the threshold setting module is used for setting a negative threshold used for distinguishing a current motion pixel from a current background pixel and a positive threshold used for distinguishing a previous motion pixel from a previous background pixel;
the difference making module is used for performing difference operation on the input frame and the background frame;
the mask frame generation module is used for judging pixel points with the depth difference smaller than a negative threshold value as current motion pixels, marking the current motion pixels as 1 in a mask frame, judging the rest pixel points as current background pixels, and marking the rest pixel points as 0 in the mask frame;
a label frame generation module, configured to determine pixel points with a depth difference greater than the positive threshold as previous-moment motion pixels, which are marked as 0 in the label frame, and determine the remaining pixel points as previous-moment background pixels, which are marked as 1 in the label frame;
and the moving object mask frame generating module is used for separately storing the first frame of the mask frame at the initial moment as the initial moving object mask frame; at subsequent moments, the mask frames all represent moving object mask frames.
In some embodiments, the apparent motion site detection module comprises:
a detection input module for inputting 0/1 distributed moving object mask frame image and depth image input frame;
a detection initialization module to initialize a data structure representing an object: { rectangle frame equals to null, average equals to 0, pixel point set equals to null }, copying 0/1 distributed moving object mask frame to label frame, and setting label initial value as 2;
the detection scanning module is used for scanning the label frame line by line from left to right and from top to bottom;
the detection assignment module is used for reassigning the pixel point to be the current label value if the pixel point value on the label frame distributed by 0/1 is found to be 1, then obtaining the depth value corresponding to the pixel point from the input frame, updating the current depth mean value, and pressing the current depth mean value, the position of the pixel point and the related information into a stack;
the detection label module is used for popping a pixel point on a label frame from the stack if the stack is not empty, searching a neighborhood pixel point of the pixel point, assigning a current label value if the value of a certain neighborhood pixel point is 1, and pressing the pixel point and related information of a corresponding pixel point on a depth image input frame into the stack;
the detection updating module is used for increasing the label value by 1 if the stack is empty and copying the existing pixel point set and information into an output sequence;
and the detection output module is used for outputting all the extracted objects, the depth mean values of the plurality of objects, the pixel point sets and the related information until the whole graph is scanned.
In some embodiments, the defined search area module comprises:
the limited search input module is used for inputting 0/1 distributed moving object edge images and detected obvious moving parts;
a qualified search initialization module for initializing a data structure representing an object: { rectangle frame is equal to null, mean value is equal to 0, pixel point set is equal to null };
the limited search scanning module is used for respectively scanning the edge image of the moving object and the detected obvious moving part line by line from left to right and from top to bottom;
the limited search assignment module is used for updating the rectangular frame and the mean value and pushing the pixel point position and its related information onto a stack if the value of a pixel point in the edge image of the moving object is found to be 1 and the corresponding pixel point in the detected obvious motion part is also 1, then resetting the processed pixel point to 0 so that it is not scanned again;
the limited search label module is used for popping a pixel point from the stack and searching its neighborhood if the stack is not empty, and for updating and pushing the pixel point position and its related information onto the stack if the value of a neighborhood pixel point is 1 and its distance from the pixel point popped from the stack is less than a given threshold, then resetting the processed pixel point to 0 so that it is not scanned again;
a limited search updating module for copying the existing pixel point set and information to a rectangular frame of an object in an output sequence if the stack is empty;
and the limited search output module is used for extracting a rectangular frame comprising the edge of the object and outputting the rectangular frame of the moving object until the whole graph is scanned.
From the above, the background model and the moving object detection method and system provided by the invention have the following advantages:
1. According to the invention, the first input frame is directly taken as the background frame, the motion part detected in the second input frame is extracted, and the complete moving object is inferred from that motion part. Thus, even if the moving object is always within the camera's field of view, as long as part of it moves, the complete moving object can be captured and deduced by the method; the background model and the foreground object are thereby distinguished, and the background model can be conveniently learned and updated in real time during application.
2. The background frame of the invention is not required to be prepared before detection, and the first frame input frame is directly used as the background frame, so that the establishment of a background model does not need a learning process and can be plug and play.
3. By combining the extracted rectangular frame including the edge of the moving object and the improved directional region growing algorithm, the complete moving object can be presumed according to the partial motion of the object.
4. Because a large amount of early-stage background learning is eliminated, the method requires little computation and is suitable for embedded systems with limited resources.
Drawings
FIG. 1 is a schematic flow chart of an embodiment of a method for detecting a background model and a moving object according to the present invention;
FIG. 2 is a diagram of a background frame B of a first frame in an embodiment of a background model and a moving object detection method according to the present invention;
FIG. 3 is a diagram of a second frame of input frames F according to an embodiment of a background model and a moving object detection method provided by the present invention;
FIG. 4 is a schematic diagram of a mask frame M representing the position of the obvious motion part in an input frame F in an embodiment of a background model and a moving object detection method provided by the present invention;
FIG. 5 is a diagram of a labeled frame T representing the location of an apparent motion portion in a background frame B in an embodiment of a background model and a moving object detection method provided by the present invention;
FIG. 6 is a diagram illustrating a moving object to be detected in an embodiment of a background model and a moving object detection method provided by the present invention;
FIG. 7 is a schematic diagram of a significant motion region detected in an embodiment of a background model and a moving object detection method provided by the present invention;
FIG. 8 is a schematic diagram of edges of a moving object extracted in an embodiment of a background model and a moving object detection method provided by the present invention;
FIG. 9 is a schematic diagram of a rectangular frame including edges of a moving object in an embodiment of a background model and a moving object detection method provided by the present invention;
FIG. 10 is a schematic diagram of a full-area search in step S500 according to an embodiment of a method for detecting a background model and a moving object provided by the present invention;
FIG. 11 is a schematic diagram of directional search in step S500 according to an embodiment of a method for detecting a background model and a moving object provided by the present invention;
FIG. 12 is a diagram illustrating a finally detected complete moving object O in an embodiment of a background model and a moving object detection method provided by the present invention;
FIG. 13 is a schematic diagram illustrating a more detailed flow of step S100 in an embodiment of a method for detecting a background model and a moving object according to the present invention;
FIG. 14 is a schematic diagram illustrating a more detailed flow of step S200 in an embodiment of a method for detecting a background model and a moving object according to the present invention;
FIG. 15 is a schematic diagram illustrating a more detailed flow of step S400 in an embodiment of a method for detecting a background model and a moving object according to the present invention;
FIG. 16 is a schematic structural diagram of an embodiment of a background model and a moving object detection system provided by the present invention;
FIG. 17 is a block diagram illustrating further details of a module 810 in an embodiment of a background model and moving object detection system according to the present invention;
FIG. 18 is a diagram illustrating a more detailed structure of a module 820 in an embodiment of a background model and moving object detection system according to the invention;
FIG. 19 is a schematic diagram of a further detailed structure of the module 840 in an embodiment of a background model and moving object detection system provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two entities or parameters that share the same name. "First" and "second" are merely for convenience of description and should not be construed as limiting the embodiments of the present invention; the following embodiments do not repeat this note.
The invention provides a background model and moving object detection method, realizing the extraction of a complete moving object from a noisy depth data stream. The method accommodates a variety of initial conditions, including: the moving object is not in the field of view initially, then enters it and starts to move; the moving object is in the field of view from the start, shows no obvious motion at first, and then starts to move; the moving object is in the field of view from the start and is in motion the whole time. The background model can therefore be established without training, and the moving object can be accurately extracted from it.
Referring to fig. 1, a schematic flow chart of an embodiment of a method for detecting a background model and a moving object according to the present invention is shown.
The background model and moving object detection method comprises the following steps:
step S100: and initializing the model.
The first frame image is taken as the background frame B. Starting from the second frame, the latest frame is the input frame F, and only valid depth inputs are considered; by comparing the background frame B and the input frame F, the mask frame M and the label frame T are obtained. Wherein,
both the mask frame M (refer to fig. 4) and the label frame T (refer to fig. 5) are 0/1-distributed images in which each small square represents a pixel point, and the M_ij value or T_ij value of each pixel point is 0 or 1;
all pixel points in the mask frame M whose M_ij value is 1 represent the position of the obvious motion part in the input frame F, and all pixel points with value 0 represent the background;
all pixel points in the label frame T whose T_ij value is 0 represent the position of the obvious motion part in the background frame B, and all pixel points with value 1 represent the background;
both the background frame B (refer to fig. 2) and the input frame F (refer to fig. 3) are depth maps in which each small square represents a pixel point, and the B_ij value or F_ij value of each pixel point is that point's depth value.
In three-dimensional image processing, the three-dimensional feature information of each pixel point can be represented by a depth value, analogous to the gray value in a two-dimensional image. Since a depth image is not affected by the illumination direction of the light source or by the emission characteristics of the object surface, and contains no shadows, it can accurately express the three-dimensional depth information of the target surface.
Step S200: and detecting the obvious motion part.
In order to finally detect the moving object to be detected shown in fig. 6, this step first detects the obvious motion parts of the moving object: all pixel points belonging to obvious motion parts are extracted from the moving object mask frame M in combination with the input frame F and re-labeled respectively. This embodiment provides a fast multi-target labeling and extraction algorithm, which processes the moving object mask frame based on the region-growing characteristic of depth difference and extracts several sets of pixel points belonging to obvious motion parts in one pass. A threshold Th_3 for screening motion-point sets is then applied: sets of pixel points belonging to obvious motion parts whose pixel count is less than Th_3 are removed, and the remaining sets represent the detected obvious motion parts, i.e., obvious motion part 2 and obvious motion part 3, as shown in fig. 7.
Step S300: and (5) extracting the edge of the moving object.
According to the detected obvious motion part, edge extraction is performed on the input frame F with an edge extraction algorithm. Specifically, the input frame F is a depth map; all pixel points belonging to the edge of the moving object in the input frame F are found, the noise on the edge is then removed with an image transformation method, and only a single-pixel width is kept, forming the moving object edge image shown in fig. 8, where the value of each pixel point on the moving object edge is marked 1 and the values of the remaining pixel points are marked 0.
The Canny algorithm can be used for edge extraction; any other common edge extraction algorithm may also be used, as long as the final object edge is guaranteed to be a single pixel wide.
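As a sketch of this step, assuming OpenCV is available (the patent names Canny only as one admissible choice), edge extraction on a depth input frame could look as follows; the Canny threshold values are illustrative, not taken from the patent.

```python
import cv2
import numpy as np

def extract_object_edges(input_frame, low=30, high=90):
    """Single-pixel-wide edge image of the moving object from depth frame F.

    Canny expects an 8-bit image, so the depth map is normalized first;
    Canny's non-maximum suppression keeps edges one pixel wide, which is
    the only property the method requires of the edge extractor.
    """
    depth8 = cv2.normalize(input_frame, None, 0, 255,
                           cv2.NORM_MINMAX).astype(np.uint8)
    edges = cv2.Canny(depth8, low, high)
    return (edges > 0).astype(np.uint8)   # 1 = edge pixel point, 0 = the rest
```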
Step S400: a search area is defined.
Based on the fact that the pixel points on the edge image of the moving object and the detected obvious motion parts must be continuous in depth, that is, the depth difference between adjacent pixel points is smaller than a specified threshold, the embodiment of the present invention provides a rectangular frame extraction method based on depth continuity, to find the rectangular frame including the object edge shown in fig. 9 along the edge of the image.
Step S500: the directional area grows.
Combining the edge image of the moving object, region growing based on depth continuity is performed on the detected obvious motion part, thereby obtaining the moving object to be detected shown in fig. 6. The embodiment of the invention provides a directional region growing algorithm based on depth continuity: when the region growing algorithm executes above the legs, the full-area search shown in fig. 10 is used; when it executes at the legs and below, the directional search shown in fig. 11 is used, so that the growing direction is limited to downward only. The search range is simultaneously limited by the rectangular frame including the edges of the moving object.
For example, region growing based on depth continuity is performed on the detected obvious motion parts shown in fig. 7: growing proceeds from the left-hand part and the right-hand part in the eight directions 1 to 8 shown in fig. 10. "Growing" means finding pixel points in the input frame F that are depth-continuous with the left and right hands and judging that they belong to the moving object; combined with the edge image of the moving object, the growing range does not exceed the moving object's edge or the rectangular frame, and finally the complete person to whom the obvious motion parts belong is found in the depth image of the input frame F. Growth above the legs uses all eight directions because the orientation of the body connected to the obvious motion part cannot be determined in advance; all eight directions are possible. However, once the algorithm judges that the growing position has reached the legs, only the legs, which connect to the ground and support the human body, extend downward, so the growing direction is limited to the three downward directions 6 to 8.
This has the advantage of effectively solving the problem of the foot being connected to the ground: in a real depth map, because a person's feet stand on the ground, the foot and the ground have continuously varying depth in all directions. The conventional depth-continuity-based region growing algorithm uses a full-area search over the depth map (refer to fig. 10) and therefore cannot find the edge where the foot meets the ground; without special measures, the ground and the moving human body are easily connected together during detection, invalidating the detection.
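The following is a compact sketch of this directional growth, assuming a numpy depth frame, a 0/1 seed mask of the detected obvious motion part, the single-pixel edge image, and the rectangular frame from step S400. The leg_row parameter, standing in for "where the legs begin", and the depth-continuity threshold eps are hypothetical simplifications, since the patent does not fix either value.

```python
import numpy as np

ALL_DIRS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
            (0, 1), (1, -1), (1, 0), (1, 1)]   # eight directions 1-8 (fig. 10)
DOWN_DIRS = [(1, -1), (1, 0), (1, 1)]          # downward directions 6-8 (fig. 11)

def directional_region_grow(frame, seed_mask, edge_img, rect, leg_row, eps=0.05):
    """Grow the detected obvious motion part into the full moving object.

    frame: depth map F; seed_mask: 1 = detected obvious-motion pixel point;
    edge_img: 1 = moving-object edge (growth never crosses it);
    rect: (top, left, bottom, right) rectangular frame from step S400;
    leg_row: rows at or below this index grow downward only (hypothetical
    stand-in for "the leg part"); eps: depth-continuity threshold (meters).
    """
    top, left, bottom, right = rect
    obj = seed_mask.copy()
    stack = [(int(r), int(c)) for r, c in zip(*np.nonzero(seed_mask))]
    while stack:
        r, c = stack.pop()
        dirs = DOWN_DIRS if r >= leg_row else ALL_DIRS
        for dr, dc in dirs:
            nr, nc = r + dr, c + dc
            if not (top <= nr <= bottom and left <= nc <= right):
                continue                        # stay inside the rectangle
            if obj[nr, nc] or edge_img[nr, nc]:
                continue                        # already grown or blocked by edge
            if abs(float(frame[nr, nc]) - float(frame[r, c])) < eps:
                obj[nr, nc] = 1                 # depth-continuous: same object
                stack.append((nr, nc))
    return obj                                  # 0/1 image of the complete object
```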
Step S600: and outputting the complete moving object.
Referring to fig. 12, a schematic diagram of a complete moving object O finally detected in the embodiment of the background model and the moving object detection method provided by the present invention is shown.
According to the result of the growth in step S500, the moving object to which the obvious motion part in the input frame F belongs is obtained; the values of the pixel points belonging to the moving object are marked 1 and the values of the remaining pixel points are marked 0, yielding a 0/1-distributed image of the complete moving object O.
Step S700: the background frame is updated and the background frame is updated,
Using the finally detected complete moving object O, every pixel point whose value in the complete moving object O is 0 corresponds to a pixel point at the same position in the input frame F, and those pixel points are judged to be background pixels in the input frame F.
It can be seen from the above embodiments that the background model and moving object detection method provided by the present invention requires no special training of the background model, is plug and play, can learn a background that changes over time, and requires no specific initial conditions; the moving object may be in the field of view, or in motion, the whole time. The method requires little computation and is suitable for settings with limited computing resources, such as mobile or embedded devices.
Preferably, referring to fig. 13, a further detailed flowchart of step S100 in the embodiment of the background model and the moving object detection method provided by the present invention is shown.
The step S100 of initializing the model may further include the steps of:
step S101: setting a negative threshold-Th for distinguishing a current moving pixel from a current background pixel1And a positive threshold Th for distinguishing a previous moment moving pixel from a previous moment background pixel2
In this embodiment, the positive and negative thresholds should be chosen with reference to the environment of the moving object, for example the distance between the person and the background; the absolute values of the thresholds are generally chosen to be no less than 0.5 m.
Step S102: performing a difference operation on the input frame F and the background frame B, that is, taking the difference of the points at corresponding positions in the input frame F and the background frame B: F_ij - B_ij.
Step S103: for depth difference less than negative threshold-Th1Judging the pixel points to be current motion pixel points, and judging M of the current motion pixel pointsijMarking the value as 1 in a mask frame M, judging the rest pixel points as current background pixel points, and judging the M of the current background pixel pointsijThe value is marked 0 in the mask frame M as shown in equation (1).
Step S104: as shown in equation (2), pixel points whose depth difference is greater than the positive threshold Th_2 are judged to be previous-moment motion pixel points, and their T_ij value is marked 0 in the label frame T; the remaining pixel points are judged to be previous-moment background pixel points, and their T_ij value is marked 1 in the label frame T:

T_ij = 0 if F_ij - B_ij > Th_2, and T_ij = 1 otherwise    (2)
The sequence of step S103 and step S104 may be interchanged, or may be executed simultaneously.
Step S105: the first frame of the mask frame M at the initial moment is independently stored as an initial moving object mask frame MOAt the next time, the mask frames M each represent a moving object mask frame.
It can be seen from the above steps that using the mask frame M as the starting condition for the whole object detection can effectively remove the noise of the depth image itself and reduce the possibility of misjudgment.
Referring to fig. 14, a schematic diagram of a more detailed flow chart of step S200 in an embodiment of a method for detecting a background model and a moving object according to the present invention is shown;
preferably, in the step S200 of detecting the significant moving part, the fast multi-target labeling and extracting algorithm may further include the following steps:
step S201: the distributed moving object mask frame M image and depth image input frame F are input 0/1.
Step S202: initializing a data structure representing an object: { rectangle frame equals to null, average equals to 0, pixel point set equals to null }, copying 0/1 distributed moving object mask frame M to label frame T, and setting current label value initial value to 2, the label value is used to label T of pixel point satisfying condition in frame TijValue replacement corresponds toThe index value of (a).
Step S203: the index frame T is scanned line by line from left to right, top to bottom. Wherein, scanning from right to left, bottom to top, or other orders may also be employed.
Step S204: if T of a pixel point on the label frame T of 0/1 distribution is foundijIf the value is 1, the T of the pixel point is setijAnd the value is assigned to be the current label value again, then the depth value corresponding to the pixel point is obtained from the input frame F, the depth mean value of all the pixel points marked with the same label value is recalculated, the current depth mean value is updated, and the current depth mean value, the position of the pixel point and the related information are pressed into a stack together. T of pixel point to be processedijThe value is reset to 0 and then the spot is not scanned. In the process of scanning the label frame T, if the value of one pixel point on the label frame T is 0, the next pixel point is continuously scanned.
Step S205: if the stack is not empty, pop up a pixel point on a label frame T from the stack, search the neighborhood pixel point of the pixel point, if T of a certain neighborhood pixel pointijThe value is 1, and the T of the pixel point is setijAnd assigning the value as the current label value, and pressing the pixel point and the related information of the corresponding pixel point on the depth image input frame F into a stack.
Step S206: if the stack is empty, the index value is incremented by 1 and the existing set of pixels and information are copied into the output sequence.
Step S207: and outputting all the extracted objects, the depth mean values of the objects, the pixel point sets and relevant information of the objects until the whole graph is scanned.
The purpose of step S200 is to simultaneously extract the positions of the obvious motion parts, i.e., the pixel points marked 1 in the label frame T that belong to different obvious motion parts, and to extract and store the depth information of the corresponding pixel points in the depth image input frame F. For example, when the input of step S200 is the label frame T shown in fig. 4 and the input frame F shown in fig. 3, all pixel points belonging to obvious motion parts in the label frame T are marked 1. The algorithm marks pixel points belonging to different obvious motion parts with different label values; that is, the pixel points marked 1 in the left hand and the right hand in the label frame T shown in fig. 4 receive different label values, e.g., the values of the pixel points on the two hands in fig. 7 are 2 and 3 respectively. Since the label frame T originally has a 0/1 distribution, the label values in this embodiment start from 2. During the scanning of step S203, the first pixel point with value 1 in the label frame T triggers the action described in step S204, and the related information is calculated and pushed onto the stack; which pixel point is encountered first depends on the scanning order. Next, step S205 is executed: the pixel points adjacent to the value-1 pixel point are searched and processed until all value-1 pixel points in the right hand of the person in fig. 7 are marked 2. Once all pixel points in the right hand are marked, the stack is empty and step S206 is executed, incrementing the label value by 1 to 3; scanning then continues and the process from step S204 is repeated to mark the left hand of the person in fig. 7. Finally, all pixel points belonging to obvious motion parts in the label frame T carry different label values and are output.
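A sketch mirroring steps S201 to S207 is given below under the assumption of numpy inputs; the "related information" of the patent is reduced here to pixel coordinates, 8-connectivity is assumed for the neighborhood, the depth mean is computed at the end rather than incrementally, and the screening threshold th3 plays the role of Th_3 from step S200.

```python
import numpy as np

def extract_motion_parts(mask, frame, th3=20):
    """Fast multi-target labeling: give each obvious motion part in the
    0/1 mask frame its own label value (starting from 2), and collect its
    pixel point set and depth mean from the depth input frame."""
    label_frame = mask.astype(np.int32).copy()   # copy mask frame to label frame
    label, objects = 2, []
    h, w = label_frame.shape
    for r0 in range(h):                          # scan line by line, left to right
        for c0 in range(w):
            if label_frame[r0, c0] != 1:
                continue
            pixels, depth_sum = [], 0.0
            stack = [(r0, c0)]
            label_frame[r0, c0] = label          # reassign to the current label
            while stack:                         # grow over 8-connected neighbors
                r, c = stack.pop()
                pixels.append((r, c))
                depth_sum += float(frame[r, c])
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if 0 <= nr < h and 0 <= nc < w and label_frame[nr, nc] == 1:
                            label_frame[nr, nc] = label
                            stack.append((nr, nc))
            if len(pixels) >= th3:               # screen out sets smaller than Th_3
                objects.append({"label": label,
                                "depth_mean": depth_sum / len(pixels),
                                "pixels": pixels})
            label += 1                           # stack empty: next label value
    return objects
```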
Referring to fig. 15, a schematic diagram of a more detailed flow chart of step S400 in an embodiment of a method for detecting a background model and a moving object according to the present invention is shown;
preferably, in the step S400 of defining the search area, the method for extracting the rectangular frame based on depth continuity may further include the following steps:
step S401: the edge images of the moving object distributed as 0/1 and the detected apparent moving parts are input.
Step S402: initializing a data structure representing an object: { rectangle frame equals to null, mean equals to 0, pixel point set equals to null }.
Step S403: from left to right, from top to bottom, the edge image of the moving object and the detected obvious moving part are scanned line by line respectively. Wherein, scanning from right to left, bottom to top, or other orders may also be employed.
Step S404: if the value of a pixel point in the edge image of the moving object is found to be 1 and the value of a pixel point at a corresponding position in the detected obvious moving part is also 1, updating the rectangular frame to enable the rectangular frame to contain the pixel point, updating the average value of the pixel points of which the corresponding values in the detected obvious moving part are also 1, and simultaneously pressing the position of the pixel point and related information thereof into a stack. Resetting the processed pixel point to 0, and then not scanning the pixel point.
It can be seen from step S404 that, after scanning, if the value of a pixel point in the edge image of the moving object is 1, its depth value and related information are not processed directly; only when the value of the corresponding pixel point in the detected obvious motion part is also 1 are the depth value and related information of the pixel point processed, which makes the judgment of whether the pixel point belongs to the moving object more accurate.
Step S405: if the stack is not empty, a pixel point is popped from the stack, the neighborhood of the pixel point is searched, if the value of a certain neighborhood pixel point is 1 and the distance between the certain neighborhood pixel point and the pixel point popped from the current stack is less than a given threshold value, the position of the pixel point and the related information thereof are updated and pressed into the stack. Resetting the processed pixel point to 0, and then not scanning the pixel point.
Step S406: if the stack is empty, the existing set of pixel points and information are copied into the output sequence as a rectangular box of an object.
Step S407: and until the whole graph is scanned, extracting a rectangular frame comprising the edge of the object, and outputting the rectangular frame of the moving object.
The purpose of step S400 is to obtain a rectangular frame that just contains the moving object; the operation combines the edge image of the moving object with the detected obvious motion part so that the two corroborate and correct each other, making the result more accurate.
It should be noted that this embodiment only shows the extraction of a rectangular frame for one edge; in practical applications, if several moving objects are extracted at the same time, rectangular frames containing the different moving objects are obtained respectively.
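Under the same assumptions (numpy 0/1 images), a sketch of steps S401 to S407 follows; the neighborhood search radius dist_th stands in for the patent's "given threshold" on the distance between edge pixel points, and per the note above, one rectangle is emitted per depth-continuous edge chain.

```python
import numpy as np

def bounding_rectangles(edge_img, part_mask, dist_th=2):
    """Rectangles that just contain each edge chain connected to the
    detected obvious motion part."""
    edges = edge_img.copy()
    h, w = edges.shape
    boxes = []
    for r0 in range(h):
        for c0 in range(w):
            # seed: edge pixel whose corresponding motion-part pixel is also 1
            if edges[r0, c0] != 1 or part_mask[r0, c0] != 1:
                continue
            top, left, bottom, right = r0, c0, r0, c0
            stack = [(r0, c0)]
            edges[r0, c0] = 0                    # reset so it is not rescanned
            while stack:
                r, c = stack.pop()
                top, left = min(top, r), min(left, c)        # update rectangle
                bottom, right = max(bottom, r), max(right, c)
                for dr in range(-dist_th, dist_th + 1):      # neighbors within
                    for dc in range(-dist_th, dist_th + 1):  # the given distance
                        nr, nc = r + dr, c + dc
                        if 0 <= nr < h and 0 <= nc < w and edges[nr, nc] == 1:
                            edges[nr, nc] = 0                # mark processed
                            stack.append((nr, nc))
            boxes.append((top, left, bottom, right))  # stack empty: emit the box
    return boxes
```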
Preferably, the background frame B is updated with a selective progressive update scheme: according to the positions of the pixel points whose value in the label frame T is 0, the pixel points at the same positions in the background frame B and the input frame F are weighted, and the background frame B is updated with the weighted result, as shown in formula (3):

B_ij(new) = α·B_ij + (1 - α)·F_ij, at positions where T_ij = 0    (3)
where α represents the weight of the gradual update; it generally depends on how fast the background changes and is usually between 0.9 and 1.
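A direct realization of formula (3) is sketched below, assuming numpy arrays; α = 0.95 is an arbitrary value inside the 0.9 to 1 range mentioned above.

```python
import numpy as np

def update_background(background, input_frame, label_frame, alpha=0.95):
    """Selective progressive update: only pixel points whose label-frame
    value is 0 are blended toward the input frame, per formula (3)."""
    bg = background.astype(np.float32)
    blended = alpha * bg + (1.0 - alpha) * input_frame.astype(np.float32)
    return np.where(label_frame == 0, blended, bg)
```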
Another aspect of the present invention further provides a background model and a moving object detection system, which refer to fig. 16 and are schematic structural diagrams of embodiments of the background model and the moving object detection system provided by the present invention.
The background model and moving object detection system 800 includes:
a model initialization module 810, configured to take the first frame image as the background frame and, starting from the second frame, take the latest frame as the input frame, obtaining the mask frame M and the label frame T by comparing the background frame with the input frame;
an obvious motion part detection module 820, configured to extract all the obvious motion parts from the motion object mask frame and label the extracted obvious motion parts respectively;
the moving object edge extraction module 830 is configured to perform edge extraction on the input frame according to the detected significant moving part, find all pixel points belonging to the edge of the moving object in the input frame, and obtain an edge image of the moving object;
a limited search area module 840 for finding a rectangular frame including the object edge along the edge image of the moving object;
a directional region growing module 850, configured to perform depth continuity-based region growing on the detected significant moving portion in combination with the edge image of the moving object, so as to obtain the moving object to be detected;
an output complete moving object module 860, configured to obtain an 0/1-distributed image of a complete moving object according to the moving object to be detected, indicating a moving object to which an apparently moving portion in the input frame belongs;
and an update background frame module 870, configured to determine all portions of the input frame that do not belong to the position of the complete moving object as background pixels, which are used to update the background frame after the current input frame processing ends.
It can be seen from the above embodiments that the background model and moving object detection system provided by the present invention requires no special training of the background model, is plug and play, can learn a background that changes over time, and requires no specific initial conditions; the moving object may be in the field of view, or in motion, the whole time. The system requires little computation and is suitable for settings with limited computing resources, such as mobile or embedded devices.
Referring now to FIG. 17, a further detailed block diagram of the module 810 in an embodiment of the background model and moving object detection system provided by the present invention is shown.
The model initialization module 810 may further include:
a threshold setting module 811 for setting a negative threshold for distinguishing a current moving pixel from a current background pixel, and a positive threshold for distinguishing a previous moving pixel from a previous background pixel;
a difference-making module 812, configured to perform difference operation on the input frame and the background frame;
a mask frame generating module 813, configured to determine, as a current moving pixel point, a point whose depth difference is smaller than a negative threshold, where the point is marked as 1 in a mask frame, and the remaining points are determined as current background pixel points, and are marked as 0 in the mask frame;
a label frame generation module 814, configured to determine pixel points whose depth difference is greater than the positive threshold as previous-moment motion pixel points, which are marked as 0 in the label frame, and determine the remaining pixel points as previous-moment background pixel points, which are marked as 1 in the label frame; the label frame is initialized to all 1s, indicating that initially no point can be determined to be a background pixel;
and a moving object mask frame generating module 815, configured to separately store the first frame of the mask frame at the initial time as the initial moving object mask frame; at subsequent times, the mask frames all represent moving object mask frames.
It can be seen from these modules that using the mask frame as the starting condition for the whole object detection effectively removes the noise of the depth image itself and reduces the possibility of misjudgment.
Referring now to FIG. 18, a block diagram of a background model and a module 820 of an embodiment of a moving object detection system according to the present invention is shown.
The apparent motion portion detecting module 820 may further include:
a detection input module 821 for inputting 0/1 the distributed moving object mask frame image and depth image input frame;
a detection initialization module 822 for initializing a data structure representing an object: { rectangle frame equals to null, average equals to 0, pixel point set equals to null }, copying 0/1 distributed moving object mask frame to label frame, and setting label initial value as 2;
the detection scanning module 823 is configured to scan the label frame line by line from left to right and from top to bottom;
a detection assignment module 824, configured to reassign a pixel point value on the label frame distributed in 0/1 to a current label value if the pixel point value is found to be 1, then obtain a depth value corresponding to the pixel point from the input frame, update a current depth mean value, and push the current depth mean value, the pixel point position, and related information together into a stack;
a label detection module 825, configured to pop up a pixel point on a label frame from the stack if the stack is not empty, search for a neighborhood point of the pixel point, assign a current label value if a certain neighborhood pixel point value is 1, and push related information of the pixel point and a corresponding pixel point on a depth image input frame into the stack;
a detection update module 826, configured to increase the label value by 1 if the stack is empty, and copy the existing pixel point set and information to an output sequence;
the detection output module 827 is configured to output the depth mean values, the pixel point sets, and the related information of all extracted objects and multiple objects until the whole image is scanned.
Preferably, referring to fig. 19, a further detailed structural diagram of the module 840 in the embodiment of the background model and the moving object detection system provided by the present invention is shown.
The limited search area module 840 may further include:
a limited search input module 841 for inputting the edge images of the moving object distributed as 0/1 and the detected obvious moving parts;
a qualified search initialization module 842 for initializing a data structure representing an object: { rectangle frame is empty, mean value is 0, point set is empty };
a limited search scanning module 843, configured to scan the edge image of the moving object and the detected significant moving portion line by line from left to right and from top to bottom, respectively;
a limited search assignment module 844, configured to update the rectangular frame and the mean value and push the pixel point position and its related information onto a stack if the value of a pixel point in the edge image of the moving object is found to be 1 and the corresponding pixel point in the detected obvious motion part is also 1, then reset the processed pixel point to 0 so that it is not scanned again;
a limited search label module 845, configured to pop a pixel point from the stack and search its neighborhood if the stack is not empty, and to update and push the pixel point position and its related information onto the stack if the value of a neighborhood pixel point is 1 and its distance from the pixel point popped from the stack is less than a given threshold, then reset the processed pixel point to 0 so that it is not scanned again;
a limited search update module 846, configured to copy an existing set of pixel points and information to a rectangular frame in an output sequence as an object if the stack is empty;
and the limited search output module 847 is used for extracting a rectangular frame including the edge of the object and outputting the rectangular frame of the moving object until the whole graph is scanned.
The method for detecting a background model and a moving object by using the background model and moving object detection system 800 provided by the present invention will be briefly described with reference to fig. 1:
the background model and the moving object detection method comprise the following steps:
step S100: the model initialization module 810 performs model initialization, i.e., using the first frame image as the background frame B. Starting from the second frame, the latest frame is the input frame F and only valid depth input is considered, i.e. by comparing the background frame B and the input frame F, a mask frame M and a label frame T are obtained;
step S200: the apparent motion region detection module 820 performs apparent motion region detection, that is, all the pixels belonging to an apparent motion region are extracted from the motion object mask frame M in combination with the input frame F and are respectively re-labeled;
step S300: the moving object edge extraction module 830 performs moving object edge extraction, that is, according to the detected significant moving part, edge extraction is performed on the input frame F, and all pixel points belonging to the edge of the moving object in the input frame F are found to obtain a moving object edge image;
step S400: the limited search region module 840 performs limited search region, i.e., finds a rectangular frame including the edge of the object along the edge image of the object;
step S500: the directional region increasing module 850 performs directional region increasing, that is, region increasing based on depth continuity is performed on the detected obvious moving part in combination with the edge image of the moving object, so as to obtain the moving object to be detected;
step S600: the complete moving object output module 860 outputs a complete moving object, that is, according to the moving object to be detected, an 0/1 distributed complete moving object image O is obtained, which represents the moving object to which the significant moving part in the input frame F belongs;
step S700: the update background frame module 870 performs a background frame update, i.e. determines all the portions of the input frame F that do not belong to the position of the complete moving object O as background pixels, which is used to update the background frame B after the processing of the current input frame F is finished.
It should be particularly pointed out that the above apparatus embodiments use the method embodiments only to describe the working process of each module in detail, and those skilled in the art can readily conceive of applying these modules to other embodiments of the method. Of course, since the steps in the method embodiments can be interleaved, replaced, added, or deleted with respect to one another, such reasonable permutations and combinations should also fall within the scope of the present invention, and the scope of the present invention should not be limited to these embodiments.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is merely exemplary and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the idea of the invention, technical features in the above embodiments or in different embodiments may also be combined, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like made without departing from the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (9)

1. A background model and moving object detection method, characterized by comprising the following steps:
taking the first frame image as a background frame and, starting from the second frame, taking the latest frame as an input frame, and comparing the background frame with the input frame to obtain a mask frame and a label frame; wherein the background frame and the input frame are depth maps in which each small square element represents a pixel point, and the B_ij or F_ij value of each pixel point is the depth value of that point; the mask frame and the label frame are both 0/1 distributed images in which each small square represents a pixel point, and the M_ij or T_ij value of each pixel point is 0 or 1; in the mask frame, all pixel points with an M_ij value of 1 represent positions of obvious moving parts in the input frame, and all pixel points with a value of 0 represent the background; in the label frame, all pixel points with a T_ij value of 0 represent positions of obvious moving parts in the background frame, and all pixel points with a value of 1 represent the background;
extracting all obvious moving parts from the moving object mask frame and labeling each separately;
performing edge extraction on the input frame according to the extracted obvious moving parts, and finding all pixel points in the input frame that belong to the edge of a moving object, to obtain a moving object edge image;
finding, along the moving object edge image, a rectangular frame enclosing the object edge;
combining the moving object edge image with the rectangular frame enclosing the object edge, and performing region growing based on depth continuity on the detected obvious moving parts, thereby obtaining the moving object to be detected;
obtaining, from the moving object to be detected, a 0/1 distributed complete moving object image, which represents the complete moving object to which the obvious moving parts in the input frame belong;
determining all portions of the input frame that do not lie within the complete moving object as background pixels, which are used to update the background frame after processing of the current input frame ends.
2. The method of claim 1, wherein
the step of comparing the background frame with the input frame comprises:
setting a negative threshold for distinguishing current motion pixels from current background pixels, and a positive threshold for distinguishing previous-moment motion pixels from previous-moment background pixels;
performing a difference operation between the input frame and the background frame;
the step of obtaining the mask frame comprises:
judging pixel points whose depth difference is smaller than the negative threshold as current motion pixels, marked as 1 in the mask frame, and judging the remaining pixel points as current background pixels, marked as 0 in the mask frame;
the step of obtaining the label frame comprises:
judging pixel points whose depth difference is larger than the positive threshold as previous-moment motion pixels, marked as 0 in the label frame, and judging the remaining pixel points as previous-moment background pixels, marked as 1 in the label frame;
the first mask frame at the initial time is saved separately as the initial moving object mask frame; at all subsequent times, the mask frames serve as moving object mask frames.
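As an illustrative, non-limiting sketch of the comparison scheme in claim 2, assuming numpy depth maps and illustrative threshold magnitudes (the patent does not fix concrete values), the mask frame M and label frame T could be derived as follows; handling of invalid depth readings is omitted:

    import numpy as np

    def compare_frames(F, B, neg_thresh=-30.0, pos_thresh=30.0):
        """Threshold the depth difference F - B into a mask frame M and a label frame T."""
        diff = F.astype(np.float32) - B.astype(np.float32)
        # Depth difference below the negative threshold: current motion pixel -> M = 1.
        M = (diff < neg_thresh).astype(np.uint8)
        # Depth difference above the positive threshold: previous-moment motion pixel -> T = 0.
        T = np.ones(diff.shape, dtype=np.uint8)
        T[diff > pos_thresh] = 0
        return M, T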
3. The method of claim 1, wherein the step of extracting and labeling all the obvious moving parts from the moving object mask frame comprises:
inputting the 0/1 distributed moving object mask frame image and the depth image input frame;
initializing a data structure representing an object: {rectangular frame = null, mean = 0, pixel point set = null}, copying the 0/1 distributed moving object mask frame to the label frame, and setting the initial label value to 2;
scanning the label frame line by line, from left to right and from top to bottom;
if a pixel point with value 1 is found on the 0/1 distributed label frame, reassigning that pixel point to the current label value, then obtaining the depth value corresponding to that pixel point from the input frame, updating the current depth mean, and pushing the current depth mean, the pixel point position, and related information onto a stack;
if the stack is not empty, popping a pixel point of the label frame from the stack and searching its neighborhood pixel points; if a neighborhood pixel point has the value 1, assigning it the current label value and pushing that pixel point, together with the related information of the corresponding pixel point on the depth image input frame, onto the stack;
if the stack is empty, increasing the label value by 1 and copying the existing pixel point set and information into the output sequence;
once the whole image has been scanned, outputting all the extracted objects with their depth means, pixel point sets, and related information.
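A non-limiting sketch of the stack-based extraction in claim 3, assuming a numpy 0/1 mask frame M and depth input frame F; the object data structure is reduced to a dictionary, the rectangular frame is omitted, and an 8-connected neighborhood is assumed:

    import numpy as np

    def label_moving_parts(M, F, min_label=2):
        """Stack-based flood fill over the 0/1 mask frame; labels start at 2 so they
        cannot collide with the 0/1 values already present in the label frame."""
        T = M.astype(np.int32).copy()             # label frame initialized from the mask frame
        h, w = T.shape
        label = min_label
        objects = []
        for i in range(h):
            for j in range(w):
                if T[i, j] != 1:
                    continue
                T[i, j] = label                   # reassign the seed pixel to the current label
                stack = [(i, j)]
                pixels, depth_sum = [], 0.0
                while stack:
                    y, x = stack.pop()
                    pixels.append((y, x))
                    depth_sum += float(F[y, x])   # depth value taken from the input frame
                    for dy in (-1, 0, 1):         # 8-connected neighborhood search
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if 0 <= ny < h and 0 <= nx < w and T[ny, nx] == 1:
                                T[ny, nx] = label
                                stack.append((ny, nx))
                objects.append({"label": label,
                                "depth_mean": depth_sum / len(pixels),
                                "pixels": pixels})
                label += 1                        # stack empty: next object gets a new label
        return objects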
4. The method of claim 1, wherein the step of finding a rectangular frame enclosing the object edge along the moving object edge image comprises:
inputting the 0/1 distributed moving object edge image and the detected obvious moving parts;
initializing a data structure representing an object: {rectangular frame = null, mean = 0, pixel point set = null};
scanning the moving object edge image and the detected obvious moving parts line by line, from left to right and from top to bottom;
if a point with value 1 is found in the moving object edge image and the corresponding pixel point in the detected obvious moving part is also 1, updating the rectangular frame and the mean, and pushing the pixel point position and its related information onto a stack; the processed pixel point is reset to 0 and is not scanned again;
if the stack is not empty, popping a pixel point from the stack and searching its neighborhood; if a neighborhood pixel point has the value 1 and its distance to the pixel point just popped from the stack is less than a given threshold, updating and pushing the pixel point position and its related information onto the stack; the processed pixel point is reset to 0 and is not scanned again;
if the stack is empty, copying the existing pixel point set and information into the output sequence as the rectangular frame of one object;
once the whole image has been scanned, extracting the rectangular frames enclosing the object edges and outputting the rectangular frames of the moving objects.
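A non-limiting sketch of the limited search in claim 4, assuming numpy 0/1 images E (moving object edges) and S (detected obvious moving parts); the distance threshold is an illustrative assumption, and only the bounding rectangle is tracked rather than the full pixel point set:

    import numpy as np

    def edge_rectangles(E, S, dist_thresh=5):
        """Trace edge clusters in E, seeded where the obvious moving part S is also 1,
        and return one bounding rectangle (y0, x0, y1, x1) per cluster."""
        E = E.copy()
        h, w = E.shape
        r = int(dist_thresh)
        rects = []
        for i in range(h):
            for j in range(w):
                if E[i, j] == 1 and S[i, j] == 1:
                    E[i, j] = 0                   # processed pixels are reset to 0
                    stack = [(i, j)]
                    y0, x0, y1, x1 = i, j, i, j
                    while stack:
                        y, x = stack.pop()
                        y0, y1 = min(y0, y), max(y1, y)   # grow the rectangular frame
                        x0, x1 = min(x0, x), max(x1, x)
                        # Admit a neighbor only if its value is 1 and it lies within
                        # the given distance of the pixel just popped from the stack.
                        for ny in range(max(0, y - r), min(h, y + r + 1)):
                            for nx in range(max(0, x - r), min(w, x + r + 1)):
                                if E[ny, nx] == 1 and (ny - y) ** 2 + (nx - x) ** 2 < dist_thresh ** 2:
                                    E[ny, nx] = 0
                                    stack.append((ny, nx))
                    rects.append((y0, x0, y1, x1))        # stack empty: output one object
        return rects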
5. The method according to claim 1, wherein the step of updating the background frame after processing of the current input frame ends uses a selective progressive update scheme, i.e., according to the positions of pixels whose value in the label frame is 0, the pixels of the background frame at those same positions are weighted with the input frame, and the background frame is updated with the weighted result.
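Read literally, the selective progressive update of claim 5 blends the background toward the input frame at positions where the label frame value is 0 and leaves all other positions untouched; the following is a non-limiting sketch, with an assumed blending weight alpha (the patent fixes no concrete value):

    import numpy as np

    def update_background(B, F, T, alpha=0.9):
        """Blend B toward F wherever the label frame T is 0; keep B elsewhere."""
        out = B.astype(np.float32).copy()
        sel = T == 0                              # positions marked 0 in the label frame
        out[sel] = alpha * out[sel] + (1.0 - alpha) * F[sel].astype(np.float32)
        return out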
6. A background model and moving object detection system, characterized by comprising:
a model initialization module, configured to take the first frame image as a background frame and, starting from the second frame, take the latest frame as an input frame, and to compare the background frame with the input frame to obtain a mask frame and a label frame; wherein the background frame and the input frame are depth maps in which each small square element represents a pixel point, and the B_ij or F_ij value of each pixel point is the depth value of that point; the mask frame and the label frame are 0/1 distributed images in which each small square represents a pixel point, and the M_ij or T_ij value of each pixel point is 0 or 1; in the mask frame, all pixel points with an M_ij value of 1 represent positions of obvious moving parts in the input frame, and all pixel points with a value of 0 represent the background; in the label frame, all pixel points with a T_ij value of 0 represent positions of obvious moving parts in the background frame, and all pixel points with a value of 1 represent the background;
an obvious moving part detection module, configured to extract all obvious moving parts from the moving object mask frame and label each separately;
a moving object edge extraction module, configured to perform edge extraction on the input frame according to the extracted obvious moving parts and find all pixel points in the input frame that belong to the edge of a moving object, to obtain a moving object edge image;
a limited search region module, configured to find, along the moving object edge image, a rectangular frame enclosing the object edge;
a directional region growing module, configured to perform region growing based on depth continuity on the detected obvious moving parts by combining the moving object edge image with the rectangular frame enclosing the object edge, thereby obtaining the moving object to be detected;
a complete moving object output module, configured to obtain, from the moving object to be detected, a 0/1 distributed complete moving object image representing the complete moving object to which the obvious moving parts in the input frame belong;
and a background frame update module, configured to determine all portions of the input frame that do not lie within the complete moving object as background pixels and to update the background frame after processing of the current input frame ends.
7. The system of claim 6, wherein the model initialization module comprises:
a threshold setting module, configured to set a negative threshold for distinguishing current motion pixels from current background pixels and a positive threshold for distinguishing previous-moment motion pixels from previous-moment background pixels;
a differencing module, configured to perform a difference operation between the input frame and the background frame;
a mask frame generation module, configured to judge pixel points whose depth difference is smaller than the negative threshold as current motion pixels, marked as 1 in the mask frame, and to judge the remaining pixel points as current background pixels, marked as 0 in the mask frame;
a label frame generation module, configured to judge pixel points whose depth difference is larger than the positive threshold as previous-moment motion pixels, marked as 0 in the label frame, and to judge the remaining pixel points as previous-moment background pixels, marked as 1 in the label frame, the label frame being initialized to all 1s;
and a moving object mask frame generation module, configured to save the first mask frame at the initial time separately as the initial moving object mask frame, the mask frames at all subsequent times serving as moving object mask frames.
8. The system of claim 6, wherein the obvious moving part detection module comprises:
a detection input module, configured to input the 0/1 distributed moving object mask frame image and the depth image input frame;
a detection initialization module, configured to initialize a data structure representing an object: {rectangular frame = null, mean = 0, pixel point set = null}, copy the 0/1 distributed moving object mask frame to the label frame, and set the initial label value to 2;
a detection scanning module, configured to scan the label frame line by line, from left to right and from top to bottom;
a detection assignment module, configured to, if a pixel point with value 1 is found on the 0/1 distributed label frame, reassign that pixel point to the current label value, then obtain the depth value corresponding to that pixel point from the input frame, update the current depth mean, and push the current depth mean, the pixel point position, and related information onto a stack;
a detection label module, configured to, if the stack is not empty, pop a pixel point of the label frame from the stack and search its neighborhood pixel points, and, if a neighborhood pixel point has the value 1, assign it the current label value and push that pixel point, together with the related information of the corresponding pixel point on the depth image input frame, onto the stack;
a detection update module, configured to, if the stack is empty, increase the label value by 1 and copy the existing pixel point set and information into the output sequence;
and a detection output module, configured to, once the whole image has been scanned, output all the extracted objects with their depth means, pixel point sets, and related information.
9. The system of claim 6, wherein the limited search region module comprises:
a limited search input module, configured to input the 0/1 distributed moving object edge image and the detected obvious moving parts;
a limited search initialization module, configured to initialize a data structure representing an object: {rectangular frame = null, mean = 0, pixel point set = null};
a limited search scanning module, configured to scan the moving object edge image and the detected obvious moving parts line by line, from left to right and from top to bottom;
a limited search assignment module, configured to, if a pixel point with value 1 is found in the moving object edge image and the corresponding pixel point in the detected obvious moving part is also 1, update the rectangular frame and the mean and push the pixel point position and its related information onto a stack; the processed pixel point is reset to 0 and is not scanned again;
a limited search label module, configured to, if the stack is not empty, pop a pixel point from the stack and search its neighborhood, and, if a neighborhood pixel point has the value 1 and its distance to the pixel point just popped from the stack is less than a given threshold, update and push the pixel point position and its related information onto the stack; the processed pixel point is reset to 0 and is not scanned again;
a limited search update module, configured to, if the stack is empty, copy the existing pixel point set and information into the output sequence as the rectangular frame of one object;
and a limited search output module, configured to, once the whole image has been scanned, extract the rectangular frames enclosing the object edges and output the rectangular frames of the moving objects.