US20180260960A1 - Method and apparatus for assisted object selection in video sequences - Google Patents

Method and apparatus for assisted object selection in video sequences

Info

Publication number
US20180260960A1
Authority
US
United States
Prior art keywords
bounding box
pixels
new line
point
motion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/533,031
Inventor
Tomas Enrique Crivelli
Fabrice Urban
Lionel Oisel
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
InterDigital CE Patent Holdings SAS
Original Assignee
InterDigital CE Patent Holdings SAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by InterDigital CE Patent Holdings SAS
Publication of US20180260960A1
Assigned to THOMSON LICENSING (assignment of assignors' interest; see document for details). Assignors: Oisel, Lionel; Crivelli, Tomas Enrique; Urban, Fabrice
Assigned to INTERDIGITAL CE PATENT HOLDINGS (assignment of assignors' interest; see document for details). Assignors: THOMSON LICENSING
Legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/187 Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4023 Scaling of whole images or parts thereof, e.g. expanding or contracting based on decimating pixels or lines of pixels; based on inserting pixels or lines of pixels
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/215 Motion-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20092 Interactive image processing based on input by user
    • G06T2207/20101 Interactive definition of point of interest, landmark or seed
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30221 Sports video; Sports image

Definitions

  • the present disclosure generally relates to the field of image analysis and object/region selection.
  • a method for determining a bounding box for display on a device comprises selecting at least one point that belongs to the object; motion processing an area of points around the selected at least one point to determine an estimated bounding box; and color processing the points in the estimated bounding box to determine the bounding box.
  • the present principles also relate to a method for determining a bounding box for display on a device, the bounding box containing an object in a video sequence, the method comprising selecting at least one point that belongs to the object; and joint motion and color processing the at least one point to determine the bounding box comprising the object.
  • the selecting is performed by a user.
  • the motion processing uses motion flood-filling on a Delaunay triangulation.
  • the color processing further comprises adding a new line of pixels; for each new pixel, measuring its distance to a foreground model and a background model; computing a score for each new pixel, wherein the score is equal to a difference of the distance to the background model minus the distance to the foreground model; averaging the scores for the new line of pixels; wherein if the average score for the new line of pixels is greater than a threshold, the new line of pixels is added to the estimated bounding box; wherein the bounding box is formed when no new line of pixels is added.
  • the joint processing further comprises motion processing an area of points around the selected at least one point to determine an estimated bounding box; and color processing the points in the estimated bounding box to determine the bounding box.
  • the present principles also relate to an apparatus comprising means for displaying a video sequence and for allowing a selection of at least one point on an object of interest in the displayed video sequence; means for storing a motion processing program and a color processing program; and means for processing the selected at least one point with the stored motion processing program and the stored color processing program for determining a bounding box for display on the touch screen display.
  • said means for displaying correspond to a display; said means for allowing a selection correspond to an input device; said means for processing correspond to one or several processors.
  • the input device is at least one of a mouse or keyboard.
  • the stored motion processing program includes instructions for motion flood-filling on a Delaunay triangulation to determine an estimated bounding box.
  • the stored color processing program includes instructions for adding a new line of pixels to the estimated bounding box; wherein for each new pixel, distance to a foreground model and a background model is measured; and wherein a score is computed for each new pixel, wherein the score is equal to a difference of the distance to the background model minus the distance to the foreground model; and wherein the scores for the new line of pixels are averaged; wherein if the average score for the new line of pixels is greater than a threshold, the new line of pixels is added to the estimated bounding box; and wherein the bounding box is formed when no new line of pixels is added.
  • the device is a mobile device such as a mobile phone, tablet, digital still camera, etc.
  • FIG. 1 shows an illustrative flow chart for providing a bounding box in accordance with the principles of the invention
  • FIG. 2 illustrates selection of a point on an object of interest
  • FIG. 3 illustrates selection of a trace on an object of interest
  • FIG. 4 shows an illustrative flow chart for motion processing in accordance with the principles of the invention
  • FIG. 5 illustrates Delaunay triangulation on the object of interest
  • FIG. 6 illustrates the final list of points considered part of the object of interest based on motion similarity
  • FIG. 7 illustrates a bounding box based only on motion information
  • FIG. 8 shows an illustrative flow chart for color processing in accordance with the principles of the invention.
  • FIG. 9 illustrates the final bounding box containing the object of interest for display on the device
  • FIG. 10 illustrates the ability to have multiple bounding boxes for different objects of interest.
  • FIG. 11 shows an illustrative device for use in executing the flow chart of FIG. 1 .
  • processor-based devices are a mobile phone, tablet, digital still camera, laptop computer, desktop computer, digital television, etc.
  • familiarity with video object processing such as Delaunay triangulation processing and flood filling (region growing) is assumed and not described herein.
  • inventive concept may be implemented using conventional programming techniques, e.g., APIs (application programming interfaces) which, as such, will not be described herein.
  • like-numbers on the figures represent similar elements.
  • although color processing is referred to below, the figures are in black and white, i.e., the use of color in the figures (other than black and white) is not necessary to understanding the inventive concept.
  • the idea is to combine a simple selection such as a single point, or trace, on an object of interest along with joint motion and color processing about the single point, or trace, in order to determine a bounding box containing the object of interest for display on the device.
  • FIG. 1 shows an illustrative flow chart for providing a bounding box for display on a device in accordance with the principles of the invention.
  • in step 105, at least one point on an object of interest is selected, e.g., by a user of the device. This determines a list of at least one point that is now assumed to be associated with the object of interest. This selection is illustrated in FIGS. 2 and 3.
  • a device e.g., a mobile phone, displays a frame 131 of a video sequence on a display 130 of the mobile phone. For the purposes of this example it is assumed that display 130 is a touch screen display.
  • frame 131 shows a picture of a soccer game.
  • the user, using their finger or a stylus, touches the picture of the person (the object of interest) pointed to by arrow 140, where the touch selects at least one point in the picture of frame 131 as represented by white dot 136.
  • the user can trace a sequence of points as shown in FIG. 3 for identifying the object of interest.
  • the user, using their finger or a stylus, touches the picture of the person (the object of interest) pointed to by arrow 140 and traces a sequence of points in the picture of frame 131 as represented by white trace 137.
  • motion and color processing are then applied to the selected point(s), as represented by steps 110 and 115 , of FIG. 1 , to determine a bounding box containing the object of interest for display in step 120 .
  • the list of selected points (represented as two-dimensional (2D) positions) is provided to step 205.
  • this list of selected points comprises at least one point in the object of interest.
  • an interest area is first determined around the selected points. This interest area can be fixed or proportional to the image size. On this area, a set of interest points is obtained in step 210 by an interest point detector.
  • Interest point detectors are known in the art, e.g., Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), Good Features To Track (e.g., see Carlo Tomasi and Takeo Kanade; “Detection and Tracking of Point Features”; Carnegie Mellon University Technical Report CMU-CS-91-132, April 1991; and Jianbo Shi and Carlo Tomasi; “Good Features to Track”; IEEE Conference on Computer Vision and Pattern Recognition, pages 593-600, 1994) or by random sampling.
  • in step 220, a motion/displacement is then estimated for each interest point in the following image (frame) of the video sequence (e.g., see Bruce D. Lucas and Takeo Kanade; “An Iterative Image Registration Technique with an Application to Stereo Vision”; International Joint Conference on Artificial Intelligence, pages 674-679, 1981; and Carlo Tomasi and Takeo Kanade; “Detection and Tracking of Point Features”; Carnegie Mellon University Technical Report CMU-CS-91-132, April 1991).
  • in step 225, a final point list is determined.
  • in step 225, the interest point closest to each input position of the trace (there might be no interest point where the user touched the screen) is considered as a current point and added to a final point list. Its neighbors, according to the triangulation of step 215, are also added to the final point list if their motion-related distance from step 220 (e.g., the norm of the difference between motion vectors) to the current point is lower than a threshold and they are also close enough with respect to a spatial distance threshold. For each of those newly added points, the process is repeated by considering their neighbors in turn. The whole process works as a flood filling algorithm (or region growing algorithm) but on a sparse set of locations.
  • region growing algorithms among motion values are known in the art (e.g., see I. Grinias and G. Tziritas; “A semi-automatic seeded region growing algorithm for video object localization and tracking”; Image Communication, Volume 16, Issue 10, August 2001).
  • a final point list is shown in FIG. 6 , where arrow 155 illustrates the final list of interest points (black dots) considered part of the object of interest according to motion similarity.
  • the final point list determines an estimated bounding box in step 230 .
  • the estimated bounding box is a box that is big enough to contain all the points in the final point list. This is illustrated in FIG. 7 by estimated bounding box 138 .
  • the latter is determined using only motion information.
  • the main component of the motion processing is the use of motion flood-filling on a Delaunay triangulation.
  • the resulting estimated bounding box is provided to color processing step 115 of FIG. 1 .
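  • in color processing step 115, the list of points that results from motion processing step 110 (i.e., the estimated bounding box based on motion similarity) is introduced into a color-based bounding box estimation process for further refinement.
  • in step 305, the estimated bounding box is processed such that a color model (foreground color model) is learned, e.g., by K-means clustering of pixel color vectors, according to the color vector Euclidean distance.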
  • the number of clusters is normally fixed to an initial value of 10.
  • the resulting clusters are analyzed in order to discard small clusters. The surviving color clusters are considered as belonging to the foreground.
  • a color model for the background is also estimated (background color model) by taking an external window (i.e., a ring around the estimated bounding box). The color vectors in the external window are clustered following the same procedure as for the foreground model. Once the foreground and background models are obtained, a post-processing on the models is applied. For each foreground model cluster center, the minimum distance between itself and the background model clusters is computed.
  • the bounding box is determined by a process of window growing. Starting from the estimated bounding box, the size of the window is iteratively enlarged as long as the newly added points of the region are more likely to belong to the foreground model than to the background model. In more detail, for each side of the bounding box (top, left, right, bottom) a new line (row or column) of pixels is added. For each new pixel, its distance to each of the foreground and background models is computed as the minimum of the distances to that model's clusters.
  • a score is computed for each pixel that, in a realization of the invention, is the distance to the background minus the distance to the foreground, such that a high score implies that the pixel is far from the background model and close to the foreground model.
  • the average of the pixel scores for the newly added line is calculated and, if it is greater than a threshold, the line is kept as part of the bounding box.
  • the threshold is naturally set at 0, as the score can be negative (meaning “closer” to the background) or positive (“closer” to the foreground), and 0 means equal distance to both models.
  • the threshold is nevertheless a parameter that might be modified. In this way, taking each side in turn, the window is enlarged until no new line is added.
  • the bounding box is displayed containing the object of interest as illustrated in FIG. 9 by bounding box 139 .
  • assisted selection based on joint motion and color processing in accordance with the principles of the invention can be performed on multiple objects of interest as illustrated in FIG. 10 , where the user initially selects at least a single point on each player of interest.
  • in FIG. 11, an illustrative high-level block diagram of a device 500, e.g., a smart phone, for providing a bounding box in accordance with the principles of the invention, as illustrated by the flow charts of FIGS. 1, 4, and 8, is shown. Only those portions relevant to the inventive concept are shown. As such, device 500 can perform other functions.
  • Device 500 is a processor based system as represented by processor 505 .
  • the latter represents one, or more, stored-program controlled processors as known in the art.
  • processor 505 executes programs stored in memory 510 .
  • Memory 510 represents volatile and/or non-volatile memory (e.g., hard disk, CD-ROM, DVD, random access memory (RAM), etc.) for storing program instructions and data, e.g., for performing the illustrative flow charts shown in FIGS. 1, 4 and 8, for providing a bounding box containing an object of interest.
  • Device 500 also has communications block 130 , which supports communications of data over a data connection 541 as known in the art. Data communications can be wired, or wireless, utilizing 802.11, 3G LTE, 4G LTE, etc.
  • device 500 includes a display 530 for providing information to a user, e.g., displaying a video sequence showing the bounding box containing the object of interest.
  • display 530 is a touch screen and, as such, enables selection by the user of an object of interest as illustrated in FIGS. 2 and 3 .
  • the inventive concept is not so limited and other input devices can be used, e.g., a keyboard/mouse input device.
  • the system, or device, automatically determines the bounding box based on motion and color propagation.
  • a single touch or trace determines a few points that belong to the object of interest, and the bounding box is then determined by flood-filling following motion and color features.
  • the region filling is determined by color propagation, and uses motion similarity as another feature for determining the pixels that are likely to belong to the same object as the selected points. That is why it is important to use motion information in order to determine which are the object's parts not only from the appearance point of view, but also on how the object coherently moves.
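The window-growing refinement summarized in the entries above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the applicants' implementation: the function names (`model_distance`, `line_score`, `grow_box`), the image layout (a list of rows of RGB tuples), the inclusive `(top, left, bottom, right)` box convention, and the toy data are all hypothetical. Each color model is assumed to be a plain list of cluster centers, and the threshold defaults to 0 as stated in the text.

```python
from math import dist


def model_distance(color, centers):
    """Distance from a color to a model: minimum distance to any cluster center."""
    return min(dist(color, c) for c in centers)


def line_score(pixels, fg_centers, bg_centers):
    """Average over a line of (distance to background - distance to foreground)."""
    scores = [model_distance(p, bg_centers) - model_distance(p, fg_centers)
              for p in pixels]
    return sum(scores) / len(scores)


def grow_box(image, box, fg_centers, bg_centers, threshold=0.0):
    """Enlarge box = (top, left, bottom, right), inclusive, one line at a time.

    Each side is tried in turn (top, left, right, bottom); a new row or column
    is kept only if its average score exceeds the threshold. Growing stops
    when a full pass adds no line.
    """
    height, width = len(image), len(image[0])
    top, left, bottom, right = box
    grew = True
    while grew:
        grew = False
        if top > 0 and line_score(
                image[top - 1][left:right + 1], fg_centers, bg_centers) > threshold:
            top -= 1
            grew = True
        if left > 0 and line_score(
                [image[r][left - 1] for r in range(top, bottom + 1)],
                fg_centers, bg_centers) > threshold:
            left -= 1
            grew = True
        if right < width - 1 and line_score(
                [image[r][right + 1] for r in range(top, bottom + 1)],
                fg_centers, bg_centers) > threshold:
            right += 1
            grew = True
        if bottom < height - 1 and line_score(
                image[bottom + 1][left:right + 1], fg_centers, bg_centers) > threshold:
            bottom += 1
            grew = True
    return top, left, bottom, right


# Toy frame: a red object (rows 1-4, columns 2-3) on a green background.
RED, GREEN = (255, 0, 0), (0, 255, 0)
image = [[RED if 1 <= r <= 4 and 2 <= c <= 3 else GREEN for c in range(6)]
         for r in range(6)]

# The motion-based estimate (rows 2-3, columns 2-3) covers only part of the object.
final_box = grow_box(image, (2, 2, 3, 3), fg_centers=[RED], bg_centers=[GREEN])
```

In this toy case the box grows upward and downward until it meets green rows, and never grows sideways because the adjacent columns score negative.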

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

A device performs a method for tracking an object in a video sequence with a bounding box for display on the device by selecting at least one point that belongs to the object, then motion processing an area of points around the selected at least one point to determine an estimated bounding box, and then color processing the points in the estimated bounding box to determine the bounding box for display on the device. The color processing comprises computing, for each new line of pixels, the average of per-pixel scores, each score being the pixel's distance to the background model minus its distance to the foreground model, and adding the line to the bounding box as long as the average is above a threshold.

Description

    BACKGROUND
  • The present disclosure generally relates to the field of image analysis and object/region selection.
  • Many problems in computer vision and image processing require a preprocessing step where objects of interest are segmented or located. For example, consider the problem of tracking an object of interest (object tracking), which requires locating the position of the object at every instant. The initial position of the object (target) can be manually defined in the first frame or obtained as the output of an object detector in the case of dedicated trackers. Almost exclusively, it is determined by a bounding box containing the object of interest. However, there are meaningful cases where the type of content, the user and the device require a more principled solution, e.g., when the user is not an expert and thus cannot provide a good selection of the object of interest from the point of view of the tracking algorithm; or the content is general and thus it is nearly impossible to apply a dedicated object detector; or the device has a limited interface which requires a rapid, simple and intuitive input from the user.
  • SUMMARY
  • We propose a new approach for determining a bounding box containing an object of interest, which is then tracked along a video sequence. In particular, and in accordance with the principles of the present disclosure, at least a single point on an object of interest will initiate joint motion and color processing to determine the bounding box containing the object of interest.
  • According to the present principles, a method for determining a bounding box for display on a device, the bounding box containing an object in a video sequence, comprises selecting at least one point that belongs to the object; motion processing an area of points around the selected at least one point to determine an estimated bounding box; and color processing the points in the estimated bounding box to determine the bounding box.
  • The present principles also relate to a method for determining a bounding box for display on a device, the bounding box containing an object in a video sequence, the method comprising selecting at least one point that belongs to the object; and joint motion and color processing the at least one point to determine the bounding box comprising the object.
  • According to an embodiment, the selecting is performed by a user.
  • According to an embodiment, the motion processing uses motion flood-filling on a Delaunay triangulation.
  • According to an embodiment, for each side of the estimated bounding box, the color processing further comprises adding a new line of pixels; for each new pixel, measuring its distance to a foreground model and a background model; computing a score for each new pixel, wherein the score is equal to a difference of the distance to the background model minus the distance to the foreground model; averaging the scores for the new line of pixels; wherein if the average score for the new line of pixels is greater than a threshold, the new line of pixels is added to the estimated bounding box; wherein the bounding box is formed when no new line of pixels is added.
  • According to an embodiment, the joint processing further comprises motion processing an area of points around the selected at least one point to determine an estimated bounding box; and color processing the points in the estimated bounding box to determine the bounding box.
  • The present principles also relate to an apparatus comprising means for displaying a video sequence and for allowing a selection of at least one point on an object of interest in the displayed video sequence; means for storing a motion processing program and a color processing program; and means for processing the selected at least one point with the stored motion processing program and the stored color processing program for determining a bounding box for display on the touch screen display.
  • According to an embodiment, said means for displaying correspond to a display; said means for allowing a selection correspond to an input device; said means for processing correspond to one or several processors.
  • According to an embodiment, the input device is at least one of a mouse or keyboard.
  • According to an embodiment, the stored motion processing program includes instructions for motion flood-filling on a Delaunay triangulation to determine an estimated bounding box.
  • According to an embodiment, the stored color processing program includes instructions for adding a new line of pixels to the estimated bounding box; wherein for each new pixel, distance to a foreground model and a background model is measured; and wherein a score is computed for each new pixel, wherein the score is equal to a difference of the distance to the background model minus the distance to the foreground model; and wherein the scores for the new line of pixels are averaged; wherein if the average score for the new line of pixels is greater than a threshold, the new line of pixels is added to the estimated bounding box; and wherein the bounding box is formed when no new line of pixels is added.
  • In another illustrative embodiment the device is a mobile device such as a mobile phone, tablet, digital still camera, etc.
  • In view of the above, and as will be apparent from reading the detailed description, other embodiments and features are also possible and fall within the principles of the invention.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 shows an illustrative flow chart for providing a bounding box in accordance with the principles of the invention;
  • FIG. 2 illustrates selection of a point on an object of interest;
  • FIG. 3 illustrates selection of a trace on an object of interest;
  • FIG. 4 shows an illustrative flow chart for motion processing in accordance with the principles of the invention;
  • FIG. 5 illustrates Delaunay triangulation on the object of interest;
  • FIG. 6 illustrates the final list of points considered part of the object of interest based on motion similarity;
  • FIG. 7 illustrates a bounding box based only on motion information;
  • FIG. 8 shows an illustrative flow chart for color processing in accordance with the principles of the invention;
  • FIG. 9 illustrates the final bounding box containing the object of interest for display on the device;
  • FIG. 10 illustrates the ability to have multiple bounding boxes for different objects of interest; and
  • FIG. 11 shows an illustrative device for use in executing the flow chart of FIG. 1.
  • DETAILED DESCRIPTION
  • Other than the inventive concept, the elements shown in the figures are well known and will not be described in detail. For example, other than the inventive concept, a device that is processor-based is well known and not described in detail herein. Some examples of processor-based devices are a mobile phone, tablet, digital still camera, laptop computer, desktop computer, digital television, etc. Further, other than the inventive concept, familiarity with video object processing such as Delaunay triangulation processing and flood filling (region growing) is assumed and not described herein. It should also be noted that the inventive concept may be implemented using conventional programming techniques, e.g., APIs (application programming interfaces) which, as such, will not be described herein. Finally, like-numbers on the figures represent similar elements. It should also be noted that although color processing is referred to below, the figures are in black and white, i.e., the use of color in the figures (other than black and white) is not necessary to understanding the inventive concept.
  • We propose a new approach for selecting an object of interest which (among other possible applications) will then be tracked along a video sequence being displayed on a device. In particular, and in accordance with the inventive concept, the idea is to combine a simple selection such as a single point, or trace, on an object of interest along with joint motion and color processing about the single point, or trace, in order to determine a bounding box containing the object of interest for display on the device.
  • FIG. 1 shows an illustrative flow chart for providing a bounding box for display on a device in accordance with the principles of the invention. In step 105, at least one point on an object of interest is selected, e.g., by a user of the device. This determines a list of at least one point that is now assumed to be associated with the object of interest. This selection is illustrated in FIGS. 2 and 3. A device, e.g., a mobile phone, displays a frame 131 of a video sequence on a display 130 of the mobile phone. For the purposes of this example it is assumed that display 130 is a touch screen display. However, the invention is not so limited and other mechanisms for selecting at least one point on an object of interest can also be used, e.g., a mouse. As shown in FIG. 2, frame 131 shows a picture of a soccer game. The user, using their finger or a stylus, touches the picture of the person (the object of interest) pointed to by arrow 140 where the touch selects at least one point in the picture of frame 131 as represented by white dot 136. Alternatively, the user can trace a sequence of points as shown in FIG. 3 for identifying the object of interest. Again, the user, using their finger or a stylus, touches the picture of the person (the object of interest) pointed to by arrow 140 and traces a sequence of points in the picture of frame 131 as represented by white trace 137.
  • In accordance with the principles of the invention, motion and color processing are then applied to the selected point(s), as represented by steps 110 and 115, of FIG. 1, to determine a bounding box containing the object of interest for display in step 120.
  • Turning now to FIG. 4, motion processing step 110 will be explained in more detail. In particular, the list of selected points (represented as two-dimensional (2D) positions) is provided to step 205. As described above, this list of selected points comprises at least one point in the object of interest. In step 205, an interest area is first determined around the selected points. This interest area can be fixed or proportional to the image size. On this area, a set of interest points is obtained in step 210 by an interest point detector. Interest point detectors are known in the art, e.g., Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), Good Features To Track (e.g., see Carlo Tomasi and Takeo Kanade; “Detection and Tracking of Point Features”; Carnegie Mellon University Technical Report CMU-CS-91-132, April 1991; and Jianbo Shi and Carlo Tomasi; “Good Features to Track”; IEEE Conference on Computer Vision and Pattern Recognition, pages 593-600, 1994) or by random sampling. In step 215, a Delaunay triangulation is applied on the point lattice in order to determine neighboring points. This is shown in FIG. 5, where arrow 150 illustrates Delaunay triangulation (the white lines) among the interest points detected around the input position (list of selected points). Returning to FIG. 4, in step 220, a motion/displacement is then estimated for each interest point in the following image (frame) of the video sequence (e.g., see Bruce D. Lucas and Takeo Kanade; “An Iterative Image Registration Technique with an Application to Stereo Vision”; International Joint Conference on Artificial Intelligence, pages 674-679, 1981; and Carlo Tomasi and Takeo Kanade; “Detection and Tracking of Point Features”; Carnegie Mellon University Technical Report CMU-CS-91-132, April 1991). In step 225, a final point list is determined.
In particular, in step 225, for each input position of the trace, the closest interest point is taken as a current point and added to a final point list (there might be no interest point exactly where the user touched the screen). Its neighbors, according to the triangulation of step 215, are also added to the final point list if their motion-related distance to the current point, from step 220 (e.g., the norm of the difference between motion vectors), is lower than a threshold and they are also close enough with respect to a spatial distance threshold. For each of those newly added points, the process is repeated by considering their neighbors in turn. The whole process works as a flood filling algorithm (or region growing algorithm) but on a sparse set of locations. Other than the inventive concept, region growing algorithms operating on motion values are known in the art (e.g., see I. Grinias and G. Tziritas; “A semi-automatic seeded region growing algorithm for video object localization and tracking”; Image Communication, Volume 16, Issue 10, August 2001). A final point list is shown in FIG. 6, where arrow 155 illustrates the final list of interest points (black dots) considered part of the object of interest according to motion similarity. Returning to FIG. 4, the final point list determines an estimated bounding box in step 230. The estimated bounding box is a box that is big enough to contain all the points in the final point list. This is illustrated in FIG. 7 by estimated bounding box 138. The latter is determined using only motion information. Illustratively, the main component of the motion processing is the use of motion flood-filling on a Delaunay triangulation. The resulting estimated bounding box is provided to color processing step 115 of FIG. 1.
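By way of illustration only, the flood-filling of step 225 and the box computation of step 230 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the function names `grow_by_motion` and `bounding_box`, the threshold values, and the sample data are hypothetical, and the neighbor lists are assumed to have already been derived from the Delaunay triangulation of step 215 and the motion vectors from step 220.

```python
from collections import deque
from math import hypot


def grow_by_motion(points, motions, neighbors, seeds,
                   motion_thresh, spatial_thresh):
    """Flood-fill over a sparse set of interest points.

    points    -- list of (x, y) interest point positions
    motions   -- list of (dx, dy) motion vectors, one per point
    neighbors -- dict: point index -> indices of its triangulation neighbors
    seeds     -- indices of the interest points closest to the user input
    Returns the sorted indices of the final point list.
    """
    selected = set(seeds)
    queue = deque(seeds)
    while queue:
        cur = queue.popleft()
        for nb in neighbors[cur]:
            if nb in selected:
                continue
            # Motion-related distance: norm of the motion vector difference.
            d_motion = hypot(motions[nb][0] - motions[cur][0],
                             motions[nb][1] - motions[cur][1])
            # Spatial distance between the neighbor and the current point.
            d_space = hypot(points[nb][0] - points[cur][0],
                            points[nb][1] - points[cur][1])
            if d_motion < motion_thresh and d_space < spatial_thresh:
                selected.add(nb)
                queue.append(nb)  # its own neighbors are examined in turn
    return sorted(selected)


def bounding_box(points, indices):
    """Smallest axis-aligned box (x_min, y_min, x_max, y_max) holding the points."""
    xs = [points[i][0] for i in indices]
    ys = [points[i][1] for i in indices]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical data: points 0-2 move together, points 3-4 are static.
points = [(10, 10), (20, 12), (30, 15), (15, 40), (35, 42)]
motions = [(2, 0), (2, 0), (2, 0), (0, 0), (0, 0)]
neighbors = {0: [1, 3], 1: [0, 2, 3], 2: [1, 4], 3: [0, 1, 4], 4: [2, 3]}

final_points = grow_by_motion(points, motions, neighbors, [0], 1.0, 50.0)
estimated_box = bounding_box(points, final_points)
```

With this sample data, only the three points sharing the seed's motion are retained, and the estimated bounding box encloses them. Each neighbor is compared against the point from which it is reached, so the region can follow gradual motion changes across the object.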
  • In color processing step 115, the list of points that results from motion processing step 110 (i.e., the estimated bounding box based on motion similarity) is introduced into a color-based bounding box estimation process for further refinement. Turning now to FIG. 8, color processing step 115 will be explained in more detail. In step 305, the estimated bounding box is processed such that a color model (foreground color model) is learned, e.g., by K-means clustering of pixel color vectors, according to the color vector Euclidean distance. The number of clusters is normally fixed to an initial value of 10. Then, the resulting clusters are analyzed in order to discard small clusters. The surviving color clusters are considered as belonging to the foreground. In a different variation, other clustering techniques that automatically determine the best number of clusters can be used. The color model is then represented by the set of cluster centers. Each pixel is assigned the color of the closest learned cluster in the color space. In step 310, a color model for the background is also estimated (background color model) by taking an external window (i.e., a ring around the estimated bounding box). The color vectors in the external window are clustered following the same procedure as for the foreground model. Once the foreground and background models are obtained, a post-processing step is applied to the models. For each foreground model cluster center, the minimum distance between itself and the background model clusters is computed. If this distance is lower than a threshold, the cluster is considered not discriminative enough and is removed from the foreground model. Then, in step 315, the bounding box is determined by a process of window growing. Starting from the estimated bounding box, the size of the window is iteratively enlarged as long as the newly added points of the region are more likely to belong to the foreground model than to the background model.
In more detail, for each side of the bounding box (top, left, right, bottom) a new line (row or column) of pixels is added. For each new pixel, its distances to the foreground and background models are computed, each as the minimum among the distances to the cluster centers of that model. A score is computed for each pixel that, in a realization of the invention, is the distance to the background minus the distance to the foreground, such that a high score implies that the pixel is far from the background model and close to the foreground model. The average of the pixel scores for the newly added line is calculated and, if it is greater than a threshold, the line is kept as part of the bounding box. The threshold is naturally set at 0, as the score can be negative (meaning “closer” to the background) or positive (“closer” to the foreground), and 0 means equal distance to both models. In any case, it is a parameter that can be modified. In this way, taking each side in turn, the window is enlarged until no new line is added. Finally, and as noted earlier, the bounding box is displayed containing the object of interest as illustrated in FIG. 9 by bounding box 139.
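By way of illustration only, the model post-processing and the window growing of step 315 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the function names, the `(top, left, bottom, right)` box layout, the sample image, and the value of `min_sep` are hypothetical, and the cluster centers are assumed to have already been learned (e.g., by K-means) in steps 305 and 310.

```python
from math import dist  # Euclidean distance, Python 3.8+


def model_distance(color, centers):
    """Distance from a color to a model: distance to its nearest cluster center."""
    return min(dist(color, c) for c in centers)


def prune_foreground(fg, bg, min_sep):
    """Drop foreground centers lying closer than min_sep to the background model."""
    return [c for c in fg if model_distance(c, bg) >= min_sep]


def line_score(pixels, fg, bg):
    """Average over a line of (distance to background - distance to foreground)."""
    scores = [model_distance(p, bg) - model_distance(p, fg) for p in pixels]
    return sum(scores) / len(scores)


def grow_box(image, box, fg, bg, threshold=0.0):
    """Enlarge box = (top, left, bottom, right), inclusive, one line at a time.

    image is a list of rows, each row a list of color tuples.
    Each side is tried in turn until no new line is accepted.
    """
    top, left, bottom, right = box
    height, width = len(image), len(image[0])
    grew = True
    while grew:
        grew = False
        if top > 0:  # candidate row above the box
            if line_score(image[top - 1][left:right + 1], fg, bg) > threshold:
                top, grew = top - 1, True
        if bottom < height - 1:  # candidate row below the box
            if line_score(image[bottom + 1][left:right + 1], fg, bg) > threshold:
                bottom, grew = bottom + 1, True
        if left > 0:  # candidate column left of the box
            col = [image[r][left - 1] for r in range(top, bottom + 1)]
            if line_score(col, fg, bg) > threshold:
                left, grew = left - 1, True
        if right < width - 1:  # candidate column right of the box
            col = [image[r][right + 1] for r in range(top, bottom + 1)]
            if line_score(col, fg, bg) > threshold:
                right, grew = right + 1, True
    return top, left, bottom, right


# Hypothetical 6x6 image: a red object on rows/columns 1-4, blue elsewhere.
RED, BLUE = (200, 0, 0), (0, 0, 200)
image = [[RED if 1 <= r <= 4 and 1 <= c <= 4 else BLUE for c in range(6)]
         for r in range(6)]

# The second foreground center is too close to the background and is removed.
fg_model = prune_foreground([RED, (0, 0, 190)], [BLUE], min_sep=30.0)
final_box = grow_box(image, (2, 2, 3, 3), fg_model, [BLUE])
```

In this sample, the box grows from the 2x2 estimate to cover the whole red object and stops when every candidate line is closer to the background model. The sketch assumes that at least one foreground center survives the pruning; an empty foreground model would have to be handled separately.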
  • It should also be noted that assisted selection based on joint motion and color processing in accordance with the principles of the invention can be performed on multiple objects of interest as illustrated in FIG. 10, where the user initially selects at least a single point on each player of interest.
  • Turning briefly to FIG. 11, an illustrative high level block diagram of a device 500, e.g., a smart phone, for providing a bounding box in accordance with the principles of the invention, as illustrated by the flow charts of FIGS. 1, 4, and 8, is shown. Only those portions relevant to the inventive concept are shown. As such, device 500 can perform other functions. Device 500 is a processor based system as represented by processor 505. The latter represents one, or more, stored-program controlled processors as known in the art. In other words, processor 505 executes programs stored in memory 510. The latter represents volatile and/or non-volatile memory (e.g., hard disk, CD-ROM, DVD, random access memory (RAM), etc.) for storing program instructions and data, e.g., for performing the illustrative flow charts shown in FIGS. 1, 4 and 8, for providing a bounding box containing an object of interest. Device 500 also has communications block 130, which supports communications of data over a data connection 541 as known in the art. Data communications can be wired, or wireless, utilizing 802.11, 3G LTE, 4G LTE, etc. Finally, device 500 includes a display 530 for providing information to a user, e.g., displaying a video sequence showing the bounding box containing the object of interest. It is assumed that display 530 is a touch screen and, as such, enables selection by the user of an object of interest as illustrated in FIGS. 2 and 3. However, it should be noted that the inventive concept is not so limited and other input devices can be used, e.g., a keyboard/mouse input device.
  • As described above, we solve the problem of how to locate the bounding box on an object of interest on a display. Once a single point, or a trace, is selected on an object of interest, the system, or device, automatically determines the bounding box based on motion and color propagation. In other words, a single touch or trace determines a few points that belong to the object of interest, and the bounding box is then determined by flood-filling following motion and color features. In accordance with the principles of the invention, the region filling is determined by color propagation, and uses motion similarity as another feature for determining the pixels that are likely to belong to the same object as the selected points. Motion information is therefore important: it identifies the parts of the object not only from their appearance but also from how the object moves coherently.
  • In view of the above, the foregoing merely illustrates the principles of the invention and it will thus be appreciated that those skilled in the art will be able to devise numerous alternative arrangements which, although not explicitly described herein, embody the principles of the invention and are within its scope. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the scope of the present principles.

Claims (6)

1. A method for determining a bounding box for display on a device, the bounding box containing an object in a video sequence, the method comprising:
selecting at least one point that belongs to the object;
motion processing an area of points around the selected at least one point to determine an estimated bounding box; and
color processing the points in the estimated bounding box to determine the bounding box;
the color processing further comprising:
adding a new line of pixels;
computing a score for each new pixel, wherein the score is equal to a difference between a distance to a background model and a distance to a foreground model;
averaging the scores for the new line of pixels; wherein if the average score for the new line of pixels is greater than a threshold, the new line of pixels is added to the estimated bounding box;
wherein the bounding box is formed when no new line of pixels is added.
2. The method of claim 1, wherein the selecting is performed by a user.
3. The method of claim 1, wherein the motion processing uses motion flood-filling on a Delaunay triangulation.
4. An apparatus comprising a memory associated with at least one processor configured to:
display a video sequence and for allowing a selection of at least one point on an object of interest in the displayed video sequence;
store in the memory a motion processing program and a color processing program; and
process the selected at least one point with the stored motion processing program and the stored color processing program for determining a bounding box;
wherein the stored motion processing program comprises instructions for motion flood-filling on a Delaunay triangulation to determine an estimated bounding box; the stored color processing program further comprising instructions:
for adding a new line of pixels to the estimated bounding box;
for computing a score for each new pixel, wherein the score is equal to a difference between a distance to a background model and a distance to a foreground model; and
for averaging scores for the new line of pixels and adding the new line to the estimated bounding box if the average score for the new line of pixels is greater than a threshold
wherein the bounding box is formed when no new line of pixels is added.
5. The apparatus of claim 4, wherein said means for displaying correspond to a display and said means for allowing a selection to an input device.
6. The apparatus of claim 5, wherein the input device is at least one of a mouse or keyboard.
US15/533,031 2014-12-04 2015-12-04 Method and apparatus for assisted object selection in video sequences Abandoned US20180260960A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
EP14306952.4A EP3029631A1 (en) 2014-12-04 2014-12-04 A method and apparatus for assisted object selection in video sequences
EP14306952.4 2014-12-04
PCT/EP2015/078639 WO2016087633A1 (en) 2014-12-04 2015-12-04 A method and apparatus for assisted object selection in video sequences

Publications (1)

Publication Number Publication Date
US20180260960A1 (en) 2018-09-13

Family

ID=52302090

Family Applications (1)

Application Number Title Priority Date Filing Date
US15/533,031 Abandoned US20180260960A1 (en) 2014-12-04 2015-12-04 Method and apparatus for assisted object selection in video sequences

Country Status (4)

Country Link
US (1) US20180260960A1 (en)
EP (2) EP3029631A1 (en)
TW (1) TW201631553A (en)
WO (1) WO2016087633A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111798481A (en) * 2019-04-09 2020-10-20 杭州海康威视数字技术股份有限公司 Image sequence segmentation method and device
CN112990159A (en) * 2021-05-17 2021-06-18 清德智体(北京)科技有限公司 Video interesting segment intercepting method, electronic equipment and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108983233B (en) * 2018-06-13 2022-06-17 四川大学 PS point combination selection method in GB-InSAR data processing
US10839531B2 (en) 2018-11-15 2020-11-17 Sony Corporation Object tracking based on a user-specified initialization point

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030095709A1 (en) * 2001-11-09 2003-05-22 Lingxiang Zhou Multiple image area detection in a digital image
US6774908B2 (en) * 2000-10-03 2004-08-10 Creative Frontier Inc. System and method for tracking an object in a video and linking information thereto
US20040233233A1 (en) * 2003-05-21 2004-11-25 Salkind Carole T. System and method for embedding interactive items in video and playing same in an interactive environment
US20060257048A1 (en) * 2005-05-12 2006-11-16 Xiaofan Lin System and method for producing a page using frames of a video stream
US20070185946A1 (en) * 2004-02-17 2007-08-09 Ronen Basri Method and apparatus for matching portions of input images
US20070279494A1 (en) * 2004-04-16 2007-12-06 Aman James A Automatic Event Videoing, Tracking And Content Generation
US20090315978A1 (en) * 2006-06-02 2009-12-24 Eidgenossische Technische Hochschule Zurich Method and system for generating a 3d representation of a dynamically changing 3d scene
US20100111370A1 (en) * 2008-08-15 2010-05-06 Black Michael J Method and apparatus for estimating body shape
US20110280453A1 (en) * 2004-07-09 2011-11-17 Ching-Chien Chen System and Method for Fusing Geospatial Data
US20110311129A1 (en) * 2008-12-18 2011-12-22 Peyman Milanfar Training-free generic object detection in 2-d and 3-d using locally adaptive regression kernels
US20120093354A1 (en) * 2010-10-19 2012-04-19 Palo Alto Research Center Incorporated Finding similar content in a mixed collection of presentation and rich document content using two-dimensional visual fingerprints
US20130208124A1 (en) * 2010-07-19 2013-08-15 Ipsotek Ltd Video analytics configuration
US20130336583A1 (en) * 2011-02-25 2013-12-19 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Determining model parameters based on transforming a model of an object
US20150110355A1 (en) * 2010-04-15 2015-04-23 Vision-2-Vision, Llc Vision-2-vision control system
US20150178953A1 (en) * 2013-12-20 2015-06-25 Qualcomm Incorporated Systems, methods, and apparatus for digital composition and/or retrieval

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6785329B1 (en) * 1999-12-21 2004-08-31 Microsoft Corporation Automatic video object extraction
US9317908B2 (en) * 2012-06-29 2016-04-19 Behavioral Recognition System, Inc. Automatic gain control filter in a video analysis system

Also Published As

Publication number Publication date
EP3029631A1 (en) 2016-06-08
WO2016087633A1 (en) 2016-06-09
EP3227858A1 (en) 2017-10-11
EP3227858B1 (en) 2018-08-29
TW201631553A (en) 2016-09-01

Similar Documents

Publication Publication Date Title
US9141196B2 (en) Robust and efficient learning object tracker
US9165211B2 (en) Image processing apparatus and method
CN111328396A (en) Pose estimation and model retrieval for objects in images
US9721387B2 (en) Systems and methods for implementing augmented reality
US20200193638A1 (en) Hand tracking based on articulated distance field
Kim et al. Fisheye lens camera based surveillance system for wide field of view monitoring
Fang et al. A novel superpixel-based saliency detection model for 360-degree images
US9774793B2 (en) Image segmentation for a live camera feed
US20140192158A1 (en) Stereo Image Matching
US20230334235A1 (en) Detecting occlusion of digital ink
GB2506707A (en) Colour replacement in images regions
US20190066311A1 (en) Object tracking
EP3227858B1 (en) A method and apparatus for assisted object selection in video sequences
US10121251B2 (en) Method for controlling tracking using a color model, corresponding apparatus and non-transitory program storage device
Nallasivam et al. Moving human target detection and tracking in video frames
Hassan et al. An adaptive sample count particle filter
Minematsu et al. Adaptive background model registration for moving cameras
US20150030206A1 (en) Detecting and Tracking Point Features with Primary Colors
Liu et al. Automatic objects segmentation with RGB-D cameras
Baheti et al. An approach to automatic object tracking system by combination of SIFT and RANSAC with mean shift and KLT
US9552531B2 (en) Fast color-brightness-based methods for image segmentation
Akman et al. Multi-cue hand detection and tracking for a head-mounted augmented reality system
Truong et al. Single object tracking using particle filter framework and saliency-based weighted color histogram
JP6289027B2 (en) Person detection device and program
Morerio et al. Optimizing superpixel clustering for real-time egocentric-vision applications

Legal Events

Date Code Title Description
STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

AS Assignment

Owner name: THOMSON LICENSING, FRANCE

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CRIVELLI, TOMAS ENRIQUE;URBAN, FABRICE;OISEL, LIONEL;SIGNING DATES FROM 20170510 TO 20170609;REEL/FRAME:048983/0700

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

AS Assignment

Owner name: INTERDIGITAL CE PATENT HOLDINGS, FRANCE

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:THOMSON LICENSING;REEL/FRAME:049003/0632

Effective date: 20180730

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO PAY ISSUE FEE