
WO2018058595A1 - Target detection method and device, and computer system - Google Patents


Info

Publication number
WO2018058595A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
detection
candidate
detection target
frame
Prior art date
Application number
PCT/CN2016/101237
Other languages
French (fr)
Chinese (zh)
Inventor
刘晓青
伍健荣
白向晖
Original Assignee
FUJITSU LIMITED (富士通株式会社)
刘晓青
伍健荣
白向晖
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FUJITSU LIMITED (富士通株式会社), 刘晓青, 伍健荣, 白向晖
Priority to CN201680087590.3A priority Critical patent/CN109416728A/en
Priority to PCT/CN2016/101237 priority patent/WO2018058595A1/en
Publication of WO2018058595A1 publication Critical patent/WO2018058595A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition

Definitions

  • the present invention relates to the field of image processing, and in particular, to a target detection method, apparatus, and computer system.
  • target detection methods for video focus on dynamic information while ignoring static targets.
  • the deep convolutional neural network achieves higher precision; however, for video, this method is too time-consuming to be practical.
  • an embodiment of the present invention provides a target detection method, apparatus, and computer system, which use a deep neural network (DNN) as a detector to simultaneously detect dynamic and static targets, reducing time consumption.
  • a target detection method comprising:
  • for the first key frame of the video image, detecting the target on the first key frame based on the deep neural network, obtaining all detection targets on the first key frame, and assigning an identifier to each detection target;
  • for a normal frame after each key frame, determining the candidate region corresponding to each detection target on the current frame according to the target detection result of the previous frame, and repositioning each detection target according to the candidate region to obtain the tracking target on the current frame;
  • for the other key frames of the video image, detecting the targets on the other key frames based on the deep neural network, obtaining all detection targets on the other key frames, and integrating the detection targets on the other key frames according to the relocation result of the previous frame.
  • an object detecting apparatus wherein the apparatus comprises:
  • a first detecting unit, configured to, for the first key frame of the video image, detect the target on the first key frame based on the deep neural network, obtain all detection targets on the first key frame, and assign an identifier to each detection target;
  • a relocation unit, configured to, for the normal frames after each key frame, determine the candidate region corresponding to each detection target on the current frame according to the target detection result of the previous frame, and reposition each detection target according to the candidate region to obtain the tracking target on the current frame;
  • a second detecting unit, configured to, for the other key frames of the video image, detect the targets on the other key frames based on the deep neural network, obtain all detection targets on the other key frames, and integrate the detection targets on the other key frames according to the relocation result of the previous frame.
  • a computer system wherein the computer system comprises the apparatus of the aforementioned second aspect.
  • the beneficial effects of the embodiments of the present invention are that, by using the embodiments of the present invention, the accuracy of target detection can be improved and time consumption can be reduced.
  • FIG. 1 is a schematic diagram of a target detecting method of Embodiment 1;
  • FIG. 3 is a schematic diagram of target detection of key frames of a video image;
  • FIG. 4 is a schematic diagram of repositioning a detection target;
  • Figure 5 is a flow chart for relocating a detection target;
  • Figure 6 is an overall schematic view of one embodiment of repositioning a detection target;
  • Figure 7 is a schematic diagram of integration of detection targets on other key frames;
  • FIG. 8 is an overall schematic diagram of one embodiment of integrating detection targets on other key frames;
  • Figure 9 is a schematic diagram of a target detecting device of Embodiment 2.
  • Figure 10 is a schematic diagram of a relocating unit of the object detecting device of Embodiment 2;
  • Figure 11 is a schematic diagram of a second detecting unit of the object detecting device of Embodiment 2;
  • Figure 12 is a schematic illustration of a computer system of the third embodiment.
  • FIG. 1 is a schematic diagram of the method. As shown in FIG. 1, the method includes:
  • Step 101 For the first key frame of the video image, detect the target on the first key frame based on the deep neural network, obtain all detection targets on the first key frame, and assign an identifier to each detection target;
  • Step 102 For a normal frame after each key frame, determine the candidate region corresponding to each detection target on the current frame according to the target detection result of the previous frame, reposition each detection target according to the candidate region, and obtain the tracking target on the current frame;
  • Step 103 For other key frames of the video image, detect the targets on the other key frames based on the deep neural network, obtain all detection targets on the other key frames, and integrate the detection targets on the other key frames according to the relocation result of the previous frame.
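Steps 101–103 above amount to a per-frame loop over the video. A minimal sketch, assuming hypothetical helpers detect_dnn (the DNN detector of step 101), relocate (step 102), integrate (step 103), and a fixed key-frame interval n:

```python
def detect_video(frames, n, detect_dnn, relocate, integrate):
    """Key-frame/normal-frame pipeline sketch. Every n-th frame is a key
    frame processed by the DNN detector; frames in between only reposition
    the previous frame's targets. All three helpers are assumptions."""
    results, prev = [], None
    for i, frame in enumerate(frames):
        if i % n == 0:                                    # key frame
            detections = detect_dnn(frame)                # step 101
            if prev is not None:                          # other key frames:
                detections = integrate(detections, prev)  # step 103
            prev = detections
        else:                                             # normal frame
            prev = relocate(frame, prev)                  # step 102
        results.append(prev)
    return results
```

A small n runs the detector more often (higher accuracy, more computation); a large n runs it rarely (less time consumption), matching the trade-off described for n in this embodiment.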
  • FIG. 2 illustrates the overall architecture of the target detection method of the present embodiment.
  • the present embodiment performs target detection based on a deep neural network, and obtains all targets on each key frame, which is called a detection target.
  • for normal frames, target detection is no longer performed; instead, the target is repositioned based on the detection result/target repositioning result of the previous frame, and the result is called the tracking target.
  • the targets on the key frame are integrated based on the relocation result of the previous frame (a normal frame), so as to avoid losing targets, repeatedly identifying targets, or misidentifying targets.
  • target detection is performed only on key frames, and high detection accuracy can be achieved with less time consumption.
  • the target detection described above may be implemented by a DNN detector, that is, based on a deep neural network for target detection.
  • each detection target is assigned an identifier (ID) for indicating that detection target.
  • This embodiment does not limit the type of the identifier, and may be a specified number, or may indicate the attribute of the detection target and the like.
  • the value of n is not limited in this embodiment. The value of n trades off the accuracy of target detection against the amount of calculation (which correlates with time consumption: a large amount of calculation costs more time, a small amount less). To improve the accuracy of target detection, n can be set to a smaller value; to reduce the amount of calculation, that is, to reduce time consumption, n can be set to a larger value, preferably less than 10.
  • in step 102, for the normal frame immediately after a key frame, this embodiment repositions the detection targets according to the target detection result of that key frame (which yields the detection targets). For other normal frames, this embodiment repositions the tracking targets based on the target repositioning result of the previous normal frame (which yields the tracking targets). Since a tracking target obtained from the repositioning result corresponds to a detection target on the key frame, for convenience of explanation the target repositioning result is also referred to as the target detection result, and the tracking target as the detection target. That is, the target detection result mentioned in step 102 includes both the result of performing target detection on a key frame and the result of performing target relocation on a normal frame; similarly, the detection target mentioned in step 102 includes both the detection target obtained on a key frame and the tracking target obtained on a normal frame.
  • a candidate region of the detection target may be obtained by expanding a bounding box of the detection target.
  • Figure 4 illustrates a schematic diagram of target relocation for a certain detection target.
  • the embodiment may search a candidate region on the current frame for the target, so as to reposition the target within that region subsequently, as shown by the bolded rectangle in the right image of Figure 4, and assign the corresponding identifier to it.
  • an expansion parameter s can be used: the bounding box of the original target (the target on the previous frame) is enlarged to obtain the candidate region. If the size of the original target is B_w × B_h, the size of the expanded candidate region is s·B_w × s·B_h. The value of s can be set as needed, preferably greater than 1.5.
  • the above method of expanding the bounding box to obtain the candidate region is only an example; the embodiment is not limited thereto, and other practicable expansion methods are also applicable.
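As a concrete illustration of the expansion, the sketch below enlarges a bounding box about its center by a factor s (the text only requires the factor to be settable, preferably greater than 1.5; centering the enlarged box and clipping it to the image are assumptions):

```python
def expand_bbox(x, y, b_w, b_h, s=2.0, img_w=None, img_h=None):
    """Expand the (x, y, b_w, b_h) box of the previous-frame target by
    factor s to obtain the candidate region, optionally clipped to the
    image. s=2.0 is a hypothetical choice satisfying s > 1.5."""
    c_x, c_y = x + b_w / 2.0, y + b_h / 2.0   # box center
    s_w, s_h = s * b_w, s * b_h               # expanded size
    nx, ny = c_x - s_w / 2.0, c_y - s_h / 2.0
    if img_w is not None:                     # clip to image bounds
        nx = max(0.0, nx)
        s_w = min(s_w, img_w - nx)
    if img_h is not None:
        ny = max(0.0, ny)
        s_h = min(s_h, img_h - ny)
    return nx, ny, s_w, s_h
```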
  • step 102 a candidate region for the detection target is obtained, and the detection target can be relocated according to the candidate region.
  • Figure 5 provides a relocation method. As shown in Figure 5, the method includes:
  • Step 501 traverse the candidate area by using a predetermined step size to obtain a plurality of candidate targets corresponding to the detection target;
  • Step 502 Calculate the similarity between each candidate target and the detection target.
  • Step 503 Determine a candidate target that matches the detection target according to the similarity degree, and use the matched candidate target as the tracking target of the detection target on the current frame.
  • if the similarity between exactly one candidate target and the detection target is greater than the first threshold, that candidate target is used as the tracking target, and the tracking number corresponding to the detection target is reset.
  • if the similarities between a plurality of candidate targets and the detection target are greater than the first threshold, the candidate target having the greatest similarity is selected from the plurality of candidate targets as the tracking target, and the tracking number corresponding to the detection target is reset.
  • if the similarity between every candidate target and the detection target is not greater than the first threshold, it is determined whether the tracking number corresponding to the detection target is 0; if the tracking number is not 0, it is decreased by 1 and the detection target is retained on the current frame; if the tracking number is 0, the detection target is not retained on the current frame.
  • the step size d may be set to less than 4 pixels to ensure that each candidate target contains more than 50 pixels, but this embodiment is not limited thereto.
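The traversal of step 501 can be sketched as a sliding window over the candidate region; taking the window size equal to the detection target's size, and the grid layout, are assumptions consistent with Figure 4:

```python
def candidate_windows(region, t_w, t_h, d):
    """Traverse the (x, y, w, h) candidate region with step size d and
    return every candidate window of the target's size t_w x t_h."""
    rx, ry, rw, rh = region
    wins = []
    y = ry
    while y + t_h <= ry + rh:
        x = rx
        while x + t_w <= rx + rw:
            wins.append((x, y, t_w, t_h))
            x += d                 # d < 4 px keeps the sampling dense
        y += d
    return wins
```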
  • whether the candidate target matches the detection target can be determined by calculating the similarity between the image features.
  • a gradient difference can be used, but the embodiment is not limited thereto; other methods of calculating the similarity are also applicable. Since calculating the similarity is prior art, it is not described in detail in this embodiment.
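One possible realization of the gradient-difference similarity (the text leaves the formula to prior art, so the exact form below is an assumption) compares finite-difference gradients of two equal-sized grayscale patches:

```python
def gradient_similarity(a, b):
    """Similarity of two equal-sized 2-D grayscale patches (nested lists)
    via the sum of absolute gradient differences, mapped to (0, 1]; 1.0
    means identical gradients. The mapping 1/(1+diff) is hypothetical."""
    def grads(p):
        h, w = len(p), len(p[0])
        gx = [[p[i][j + 1] - p[i][j] for j in range(w - 1)] for i in range(h)]
        gy = [[p[i + 1][j] - p[i][j] for j in range(w)] for i in range(h - 1)]
        return gx, gy
    ax, ay = grads(a)
    bx, by = grads(b)
    diff = sum(abs(u - v) for ra, rb in zip(ax, bx) for u, v in zip(ra, rb))
    diff += sum(abs(u - v) for ra, rb in zip(ay, by) for u, v in zip(ra, rb))
    return 1.0 / (1.0 + diff)
```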
  • the candidate target matching the detection target is found by means of the first threshold. However, if the similarity of every candidate target to the detection target is not greater than the first threshold, that is, the detection target is not found on the current frame, this embodiment does not directly remove the detection target from the current frame; instead, the detection target is retained for several frames via the keep_track parameter, to avoid judgment errors.
  • a parameter called keep_track is set for each detection target on the key frame. If there is no candidate target on the current frame matching the original detection target, this embodiment does not immediately delete the identifier of the detection target; instead, the value of keep_track is decremented by one and the detection target is retained without updating its position, until the value of keep_track equals zero. If the detection target matched a candidate target on a previous frame before keep_track reached 0, keep_track is reset.
  • the value of keep_track can be set in the range [3, 5], that is, to 3, 4, or 5. However, the embodiment is not limited thereto, and it can also be set to other values.
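The matching decision of steps 502–503 together with the keep_track bookkeeping can be sketched per detection target as follows; keep_track_init=4 is one value inside the suggested [3, 5] range, and the similarity threshold 0.5 is hypothetical:

```python
def update_track(similarities, keep_track, keep_track_init=4, threshold=0.5):
    """Decide the fate of one detection target on the current frame.
    similarities: scores of all candidate targets against the detection
    target. Returns (matched_candidate_index_or_None, keep_alive,
    new_keep_track)."""
    above = [(s, i) for i, s in enumerate(similarities) if s > threshold]
    if above:                        # best-matching candidate becomes the
        best_i = max(above)[1]       # tracking target; keep_track is reset
        return best_i, True, keep_track_init
    if keep_track > 0:               # no match: retain the target a few
        return None, True, keep_track - 1   # more frames, position frozen
    return None, False, 0            # keep_track exhausted: drop the target
```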
  • each frame of the video image (key frames and normal frames) may be downsampled, for example by a fixed factor, before the method of this embodiment is applied.
  • the target area or the candidate target area of each frame of the image is down-sampled.
  • the specific implementation manner of the downsampling is not limited, and the existing means may be used.
  • FIG. 6 is an overall flowchart of an embodiment of a relocation method according to the embodiment. Referring to FIG. 6, the method includes:
  • Step 601 Downsampling a target area of the previous frame image and performing feature extraction
  • Step 602 Downsampling a candidate target area of the current frame image and performing feature extraction.
  • Step 603 Feature matching, and obtaining a matching score
  • Step 604 Determine whether the matching score is greater than the first threshold, if the determination is yes, proceed to step 605; otherwise, perform step 606;
  • Step 605 Select an optimal candidate target
  • Step 606 Decrease keep_track by one
  • Step 607 Determine whether the keep_track is 0. If the determination is no, the position of the detection target is reserved in the current frame; otherwise, the position of the detection target is not retained in the current frame.
  • in steps 601 and 602, feature extraction refers to extracting the features of the detection targets on the previous frame image and of the candidate targets on the current frame image, for feature matching such as similarity calculation.
  • the execution order of step 601 and step 602 is not limited in this embodiment, and may be performed simultaneously or separately.
  • after the last normal frame before the next key frame has been processed by step 102, this embodiment proceeds to the processing of the next key frame (that is, a key frame other than the first key frame) in step 103.
  • in step 103, for the next key frame, in addition to the same target detection process as in step 101, the detection targets on this key frame are also integrated based on the relocation result of the previous frame (the last normal frame described above); that is, target identifiers are assigned to the detection results of the current key frame according to the tracking targets in the previous normal frame, thereby determining whether a target has left, whether a new target has entered, and so on.
  • Figure 7 provides an integration method. As shown in Figure 7, the method includes:
  • Step 701 Match each detection target on the other key frames with each candidate target on the previous frame;
  • Step 702 If the overlapping area of the detection target and the candidate target is greater than a second threshold, and the matching score of the detection target and the candidate target is greater than a first threshold, assign the identifier of the matched candidate target to the detection target;
  • Step 703 If the overlapping area of the detection target and the candidate target is not greater than the second threshold, or the matching score of the detection target and the candidate target is not greater than the first threshold, assign a new identifier to the detection target.
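The overlapping area used in steps 702–703 corresponds to the intersection-over-union (IOU) measure described for Figure 8; a standard computation for axis-aligned (x, y, w, h) boxes:

```python
def iou(a, b):
    """Intersection over union of two axis-aligned (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))  # overlap width
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))  # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```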
  • FIG. 8 is an overall flowchart of an embodiment of the integration method of the embodiment. Referring to FIG. 8, the method includes:
  • Step 801 Compute the IOU (Intersection over Union) overlap matrix;
  • Step 802 Determine whether IOU_ij is greater than the second threshold. If the determination is yes, perform step 803; otherwise assign a new identifier to target i;
  • Step 803 Image feature matching
  • Step 804 Determine whether the matching score is greater than the first threshold. If the determination is yes, assign the identifier of target j to target i, at which point the IOU matrix becomes an (N-1)×(M-1) matrix, and perform step 805; otherwise, assign a new identifier to target i, the IOU matrix becoming an (N-1)×M matrix;
  • Step 805 Determine whether the number of IOU rows is greater than 0. If the determination is yes, go back to step 801, otherwise go to step 806;
  • Step 806 Determine whether the number of IOU columns is greater than 0. If the determination is yes, the unmatched candidate targets are not retained on the current key frame; otherwise, the process ends.
  • the overlap between the detection targets from the new key frame and the tracking targets from the previous normal frame is first calculated, and an N×M IOU (Intersection over Union) matrix is obtained.
  • the process of image feature matching is the same as the foregoing, for example, by calculating a similarity (matching score), and details are not described herein again.
  • if the tracking target j matches the detection target i, the j-th column is deleted from the IOU matrix, that is, target j is no longer matched. Therefore, for robustness, the row maxima of the IOU matrix are first sorted in descending order and the rows of the IOU matrix are rearranged accordingly, and the matching is then performed starting from the first row of the matrix. If all detection targets in the key frame have been assigned an identifier and there remain unmatched tracking targets from the normal frame, these targets are removed. Through such processing, it is possible to determine whether a target has left the visual range, whether a new target has entered, and so on, and to derive the number of targets within the visual range.
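The row-sorted greedy matching described above can be sketched as follows. iou_fn and match_fn stand for the IOU and image-feature matching already discussed; the thresholds and the numbering of new identifiers are assumptions:

```python
def integrate_key_frame(detections, tracked, iou_fn, match_fn,
                        t_iou=0.3, t_match=0.5, next_id=1000):
    """Integrate new-key-frame detections with tracked targets from the
    previous normal frame (steps 801-806). Rows (detections) are processed
    in order of decreasing row maximum of the IOU matrix; a matched
    detection inherits the tracked target's id, an unmatched one gets a
    new id, and leftover tracked targets are simply not carried over."""
    m = [[iou_fn(d["box"], t["box"]) for t in tracked] for d in detections]
    order = sorted(range(len(detections)),
                   key=lambda i: max(m[i], default=0.0), reverse=True)
    free = set(range(len(tracked)))         # columns not yet matched
    ids = {}
    for i in order:
        best, best_j = 0.0, None
        for j in free:
            if m[i][j] > best:
                best, best_j = m[i][j], j
        if (best_j is not None and best > t_iou
                and match_fn(detections[i], tracked[best_j]) > t_match):
            ids[i] = tracked[best_j]["id"]  # inherit the identifier
            free.discard(best_j)            # delete the matched column
        else:
            ids[i] = next_id                # a new target has entered
            next_id += 1
    return [dict(detections[i], id=ids[i]) for i in range(len(detections))]
```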
  • performing target detection only on the key frames of the video image and target relocation on the normal frames can improve the accuracy of target detection and reduce time consumption.
  • the embodiment of the present invention provides a target detection device.
  • since the principle of the device is similar to that of Embodiment 1, its specific implementation may refer to the description of the method in Embodiment 1.
  • FIG. 9 is a schematic diagram of the object detecting apparatus of the present embodiment. As shown in FIG. 9, the apparatus 900 includes a first detecting unit 901, a relocating unit 902, and a second detecting unit 903.
  • the first detecting unit 901 is configured to, for the first key frame of the video image, detect the target on the first key frame based on the deep neural network, obtain all detection targets on the first key frame, and assign an identifier to each detection target.
  • the relocation unit 902 is configured to, for the normal frames after each key frame, determine the candidate region corresponding to each detection target on the current frame according to the target detection result of the previous frame, and reposition each detection target according to the candidate region to obtain the tracking target on the current frame;
  • the second detecting unit 903 is configured to, for the other key frames of the video image, detect the targets on the other key frames based on the deep neural network, obtain all detection targets on the other key frames, and integrate the detection targets on the other key frames according to the relocation result of the previous frame.
  • FIG. 10 is a schematic diagram of one embodiment of the relocating unit 902 of the present embodiment.
  • the relocation unit 902 may include an extension unit 1001 that expands a bounding box of the detection target to obtain a candidate region of the detection target.
  • the relocating unit 902 may further include a traversing unit 1002, a calculating unit 1003, and a determining unit 1004, where the traversing unit 1002 may traverse the candidate area with a predetermined step size to obtain a plurality of candidate targets corresponding to the detection target;
  • the calculating unit 1003 can calculate a similarity between each candidate target and the detection target;
  • the determining unit 1004 can determine the candidate target that matches the detection target according to the similarity, and take the matched candidate target as the tracking target of the detection target on the current frame.
  • the determining unit 1004 may use the candidate target as the tracking target when the similarity between the candidate target and the detection target is greater than the first threshold, and reset the tracking number corresponding to the detection target.
  • the determining unit 1004 may select, as the tracking target, the candidate target having the largest similarity from the plurality of candidate targets when the similarity between the plurality of candidate targets and the detection target is greater than the first threshold. And reset the number of traces corresponding to the detection target.
  • the determining unit 1004 may further determine, when the similarity between every candidate target and the detection target is not greater than the first threshold, whether the tracking number is 0; if the tracking number is not 0, it is decreased by 1 and the detection target is retained on the current frame; if the tracking number is 0, the detection target is not retained on the current frame.
  • the apparatus 900 may further include a downsampling unit 904, which may downsample the target area/candidate target area on the key frames and/or normal frames of the video image for image feature matching, as described above; details are not repeated here.
  • FIG. 11 is a schematic diagram of an embodiment of the second detecting unit 903 of the present embodiment.
  • the second detecting unit 903 may include a matching unit 1101 and a processing unit 1102 .
  • the matching unit 1101 is configured to match each detection target on the other key frames with each candidate target on the previous frame; the processing unit 1102 is configured to assign the identifier of the matched candidate target to the detection target when the overlapping area of the detection target and the candidate target is greater than the second threshold and their matching score is greater than the first threshold, and to assign a new identifier to the detection target when the overlapping area is not greater than the second threshold or the matching score is not greater than the first threshold.
  • the processing unit 1102 may also, after assigning an identifier to all detection targets on the other key frames, not retain on the other key frames the candidate targets from the previous frame that remain unmatched.
  • target detection is performed only on key frames of the video image, and target relocation of the normal frame of the video image is performed, which can improve the accuracy of target detection and reduce time consumption.
  • This embodiment also provides a computer system configured with the target detection device 900 as described above.
  • FIG. 12 is a schematic block diagram showing the system configuration of a computer system 1200 according to an embodiment of the present invention.
  • the computer system 1200 can include a central processor 1201 and a memory 1202; the memory 1202 is coupled to the central processor 1201.
  • the figure is exemplary; other types of structures may be used in addition to or in place of the structure to implement telecommunications functions or other functions.
  • the functionality of the target detection device 900 can be integrated into the central processor 1201.
  • the central processing unit 1201 may be configured to implement the target detection method described in Embodiment 1.
  • the central processing unit 1201 can be configured to perform control as follows: for the first key frame of the video image, detecting the target on the first key frame based on the deep neural network, obtaining all detection targets on the first key frame, and assigning an identifier to each detection target; for a normal frame after each key frame, determining the candidate region corresponding to each detection target on the current frame according to the target detection result of the previous frame, repositioning each detection target according to the candidate region, and obtaining the tracking target on the current frame; for other key frames of the video image, detecting the targets on the other key frames based on the deep neural network, obtaining all detection targets on the other key frames, and integrating the detection targets on the other key frames according to the relocation result of the previous frame.
  • the target detecting device 900 can be configured separately from the central processing unit 1201.
  • for example, the target detecting device 900 can be configured as a chip connected to the central processor 1201, with the functions of the target detecting device 900 realized under the control of the central processor 1201.
  • the computer system 1200 can further include an input unit 1203, an audio processing unit 1204, a display 1205, and a power supply 1206. It should be noted that the computer system 1200 does not necessarily include all of the components shown in FIG. 12; in addition, the computer system 1200 may also include components not shown in FIG. 12, and reference may be made to the prior art.
  • central processor 1201, also sometimes referred to as a controller or operational control, may include a microprocessor or other processor device and/or logic device that receives input and controls the operation of various portions of the computer system 1200.
  • the memory 1202 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or other suitable device.
  • the memory 1202 can store the above video images, feature-matching data, and the like, and can store a program for processing the related information.
  • the central processing unit 1201 can execute the program stored by the memory 1202 to implement information storage or processing and the like.
  • the functions of other components are similar to those of the existing ones and will not be described here.
  • the various components of computer system 1200 may be implemented by special purpose hardware, firmware, software or a combination thereof without departing from the scope of the invention.
  • the computer system may be a video surveillance system, but is not limited thereto.
  • target detection is performed only on key frames of the video image, and target relocation of the normal frame of the video image is performed, which can improve the accuracy of target detection and reduce time consumption.
  • Embodiments of the present invention also provide a computer readable program which, when executed in a target detecting device or a computer system, causes the target detecting device or computer system to perform the target detection method described in Embodiment 1.
  • the embodiment of the present invention further provides a storage medium storing a computer readable program, wherein the computer readable program causes the target detecting device or the computer system to execute the target detecting method described in Embodiment 1.
  • the above apparatus and method of the present invention may be implemented by hardware or by hardware in combination with software.
  • the present invention relates to a computer readable program that, when executed by a logic component, enables the logic component to implement the apparatus or components described above, or to implement the various methods or steps described above.
  • the present invention also relates to a storage medium for storing the above program, such as a hard disk, a magnetic disk, an optical disk, a DVD, a flash memory, or the like.
  • the object detection method in the object detection apparatus described in connection with the embodiment of the present invention may be directly embodied as hardware, a software module executed by the processor, or a combination of both.
  • one or more of the functional blocks shown in Figures 9-11 and/or one or more combinations of functional blocks may correspond to individual software modules of a computer program flow, or to individual hardware modules.
  • These software modules may correspond to the respective steps shown in FIGS. 1, 5, and 7, respectively.
  • These hardware modules can be implemented, for example, by curing these software modules using a Field Programmable Gate Array (FPGA).
  • the software module can reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art.
  • a storage medium can be coupled to the processor to enable the processor to read information from, and write information to, the storage medium; or the storage medium can be an integral part of the processor.
  • the processor and the storage medium can be located in an ASIC.
  • the software module can be stored in the memory of the mobile terminal or in a memory card that can be inserted into the mobile terminal.
  • the software module can be stored in the MEGA-SIM card or a large-capacity flash memory device.
  • One or more of the functional blocks described with respect to Figures 9-11 and/or one or more combinations of functional blocks may be implemented as a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device designed to perform the functions described herein.
  • One or more of the functional blocks described with respect to Figures 9-11 and/or one or more combinations of functional blocks may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors in conjunction with a DSP, or any other such configuration.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

A target detection method and device, and computer system. The method comprises: performing, with respect to a first key frame of video images and on the basis of a deep neural network, detection of a target in the first key frame to obtain all detection targets in the first key frame, and allocating an identifier to each of the detection targets (101); determining, with respect to each ordinary frame following the key frame and according to the target detection result of a previous frame, a candidate region corresponding to each detection target in a current frame, and re-positioning, according to the candidate region, said detection target to obtain a tracking target in the current frame (102); and performing, with respect to another key frame of the video images and on the basis of the deep neural network, detection of a target in the other key frame to obtain all detection targets in the other key frame, and performing, according to the re-positioning result of a previous frame, integration of the detection targets in the other key frame (103). The method of the present invention improves target detection precision and efficiency.

Description

目标检测方法、装置以及计算机系统 Target detection method, device and computer system

技术领域 Technical Field
本发明涉及图像处理领域,特别涉及一种目标检测方法、装置以及计算机系统。The present invention relates to the field of image processing, and in particular, to a target detection method, apparatus, and computer system.
背景技术 Background Art
目前，针对视频的目标检测方法注重于动态信息而忽略了静态目标。在针对图像的目标检测中，深度卷积神经网络实现了较高的精度，然而，针对视频，这种方法非常耗时并且不合格。Currently, target detection methods for video focus on dynamic information while ignoring static targets. In target detection for images, deep convolutional neural networks achieve high accuracy; however, for video, such methods are very time-consuming and therefore unsuitable.
应该注意,上面对技术背景的介绍只是为了方便对本发明的技术方案进行清楚、完整的说明,并方便本领域技术人员的理解而阐述的。不能仅仅因为这些方案在本发明的背景技术部分进行了阐述而认为上述技术方案为本领域技术人员所公知。It should be noted that the above description of the technical background is only for the purpose of facilitating a clear and complete description of the technical solutions of the present invention, and is convenient for understanding by those skilled in the art. The above technical solutions are not considered to be well known to those skilled in the art simply because these aspects are set forth in the background section of the present invention.
发明内容Summary of the invention
为了解决背景技术指出的问题，本发明实施例提供一种目标检测方法、装置以及计算机系统，其以深度神经网络（DNN，Deep Neural Network）作为检测器，同时检测动态和静态目标，减少了时间。In order to solve the problems pointed out in the background art, embodiments of the present invention provide a target detection method, apparatus, and computer system, which use a deep neural network (DNN) as a detector to detect dynamic and static targets simultaneously, reducing time consumption.
根据本实施例的第一方面,提供了一种目标检测方法,其中,所述方法包括:According to a first aspect of the present invention, a target detection method is provided, wherein the method comprises:
对于视频图像的第一个关键帧，基于深度神经网络对所述第一个关键帧上的目标进行检测，得到所述第一个关键帧上的所有检测目标，并为每个检测目标分配标识；For the first key frame of the video image, detecting the targets on the first key frame based on the deep neural network to obtain all detection targets on the first key frame, and assigning an identifier to each detection target;
对于每个关键帧之后的普通帧，根据上一帧的目标检测结果，确定当前帧上对应每个检测目标的候选区域，根据所述候选区域对每个检测目标进行重新定位，得到当前帧上的追踪目标；For each normal frame after a key frame, determining, according to the target detection result of the previous frame, a candidate region corresponding to each detection target on the current frame, and repositioning each detection target according to the candidate region to obtain the tracking targets on the current frame;
对于视频图像的其他关键帧，基于深度神经网络对所述其他关键帧上的目标进行检测，得到所述其他关键帧上的所有检测目标，并根据上一帧的重新定位结果，对所述其他关键帧上的检测目标进行整合。For the other key frames of the video image, detecting the targets on the other key frames based on the deep neural network to obtain all detection targets on the other key frames, and integrating the detection targets on the other key frames according to the repositioning result of the previous frame.
根据本实施例的第二方面,提供了一种目标检测装置,其中,所述装置包括:According to a second aspect of the present invention, there is provided an object detecting apparatus, wherein the apparatus comprises:
第一检测单元，其对于视频图像的第一个关键帧，基于深度神经网络对所述第一个关键帧上的目标进行检测，得到所述第一个关键帧上的所有检测目标，并为每个检测目标分配标识；a first detecting unit which, for the first key frame of the video image, detects the targets on the first key frame based on the deep neural network to obtain all detection targets on the first key frame, and assigns an identifier to each detection target;
重定位单元，其对于每个关键帧之后的普通帧，根据上一帧的目标检测结果，确定当前帧上对应每个检测目标的候选区域，根据所述候选区域对每个检测目标进行重新定位，得到当前帧上的追踪目标；a relocation unit which, for each normal frame after a key frame, determines, according to the target detection result of the previous frame, a candidate region corresponding to each detection target on the current frame, and repositions each detection target according to the candidate region to obtain the tracking targets on the current frame;
第二检测单元，其对于视频图像的其他关键帧，基于深度神经网络对所述其他关键帧上的目标进行检测，得到所述其他关键帧上的所有检测目标，并根据上一帧的重新定位结果，对所述其他关键帧上的检测目标进行整合。a second detecting unit which, for the other key frames of the video image, detects the targets on the other key frames based on the deep neural network to obtain all detection targets on the other key frames, and integrates the detection targets on the other key frames according to the repositioning result of the previous frame.
根据本实施例的第三方面,提供了一种计算机系统,其中,所述计算机系统包括前述第二方面所述的装置。According to a third aspect of the present invention, there is provided a computer system, wherein the computer system comprises the apparatus of the aforementioned second aspect.
本发明实施例的有益效果在于:通过本发明实施例,能够提高目标检测的精度并减少时间消耗。The beneficial effects of the embodiments of the present invention are that, by using the embodiments of the present invention, the accuracy of target detection can be improved and time consumption can be reduced.
参照后文的说明和附图，详细公开了本发明的特定实施方式，指明了本发明的原理可以被采用的方式。应该理解，本发明的实施方式在范围上并不因而受到限制。在所附权利要求的条款的范围内，本发明的实施方式包括许多改变、修改和等同。Specific embodiments of the present invention are disclosed in detail with reference to the following description and drawings, indicating the manner in which the principles of the invention may be employed. It should be understood that the embodiments of the invention are not thereby limited in scope. Within the scope of the terms of the appended claims, the embodiments of the invention include many changes, modifications and equivalents.
针对一种实施方式描述和/或示出的特征可以以相同或类似的方式在一个或更多个其它实施方式中使用，与其它实施方式中的特征相组合，或替代其它实施方式中的特征。Features described and/or illustrated with respect to one embodiment may be used in the same or similar manner in one or more other embodiments, in combination with, or in place of, features in other embodiments.
应该强调，术语"包括/包含"在本文使用时指特征、整件、步骤或组件的存在，但并不排除一个或更多个其它特征、整件、步骤或组件的存在或附加。It should be emphasized that the term "comprises/comprising", when used herein, refers to the presence of a feature, integer, step or component, but does not exclude the presence or addition of one or more other features, integers, steps or components.
附图说明DRAWINGS
在本发明实施例的一个附图或一种实施方式中描述的元素和特征可以与一个或更多个其它附图或实施方式中示出的元素和特征相结合。此外，在附图中，类似的标号表示几个附图中对应的部件，并可用于指示多于一种实施方式中使用的对应部件。The elements and features described in one drawing or embodiment of the embodiments of the present invention may be combined with the elements and features shown in one or more other drawings or embodiments. Furthermore, in the drawings, like reference numerals designate corresponding parts throughout the several figures, and may be used to designate corresponding parts used in more than one embodiment.
所包括的附图用来提供对本发明实施例的进一步的理解，其构成了说明书的一部分，用于例示本发明的实施方式，并与文字描述一起来阐释本发明的原理。显而易见地，下面描述中的附图仅仅是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动性的前提下，还可以根据这些附图获得其他的附图。在附图中：The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention, constitute a part of the specification, illustrate embodiments of the invention, and together with the description serve to explain the principles of the invention. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from these drawings without inventive effort. In the drawings:
图1是实施例1的目标检测方法的示意图；FIG. 1 is a schematic diagram of the target detection method of Embodiment 1;
图2是实施例1的目标检测方法的架构示意图；FIG. 2 is a schematic diagram of the architecture of the target detection method of Embodiment 1;
图3是对视频图像的关键帧进行目标检测的示意图；FIG. 3 is a schematic diagram of target detection on key frames of a video image;
图4是对检测目标进行重定位的示意图；FIG. 4 is a schematic diagram of repositioning a detection target;
图5是对检测目标进行重定位的流程图；FIG. 5 is a flow chart of repositioning a detection target;
图6是对检测目标进行重定位的一个实施方式的整体示意图；FIG. 6 is an overall schematic diagram of an embodiment of repositioning a detection target;
图7是对其他关键帧上的检测目标进行整合的示意图；FIG. 7 is a schematic diagram of integrating detection targets on other key frames;
图8是对其他关键帧上的检测目标进行整合的一个实施方式的整体示意图；FIG. 8 is an overall schematic diagram of an embodiment of integrating detection targets on other key frames;
图9是实施例2的目标检测装置的示意图；FIG. 9 is a schematic diagram of the target detection apparatus of Embodiment 2;
图10是实施例2的目标检测装置的重定位单元的示意图；FIG. 10 is a schematic diagram of the relocation unit of the target detection apparatus of Embodiment 2;
图11是实施例2的目标检测装置的第二检测单元的示意图；FIG. 11 is a schematic diagram of the second detecting unit of the target detection apparatus of Embodiment 2;
图12是实施例3的计算机系统的示意图。FIG. 12 is a schematic diagram of the computer system of Embodiment 3.
具体实施方式 Detailed Description
参照附图，通过下面的说明书，本发明的前述以及其它特征将变得明显。在说明书和附图中，具体公开了本发明的特定实施方式，其表明了其中可以采用本发明的原则的部分实施方式，应了解的是，本发明不限于所描述的实施方式，相反，本发明包括落入所附权利要求的范围内的全部修改、变型以及等同物。下面结合附图对本发明的各种实施方式进行说明。这些实施方式只是示例性的，不是对本发明的限制。The foregoing and other features of the present invention will become apparent from the following description with reference to the accompanying drawings. In the specification and the drawings, specific embodiments of the invention are disclosed, which illustrate some of the embodiments in which the principles of the invention may be employed. It should be understood that the invention is not limited to the described embodiments; on the contrary, the invention includes all modifications, variations and equivalents falling within the scope of the appended claims. Various embodiments of the invention are described below with reference to the accompanying drawings. These embodiments are merely exemplary and are not limiting of the invention.
下面结合附图对本发明实施例进行说明。The embodiments of the present invention will be described below with reference to the accompanying drawings.
实施例1Example 1
本实施例提供了一种目标检测方法,图1是该方法的示意图,如图1所示,该方法包括:This embodiment provides a target detection method, and FIG. 1 is a schematic diagram of the method. As shown in FIG. 1, the method includes:
步骤101：对于视频图像的第一个关键帧，基于深度神经网络对所述第一个关键帧上的目标进行检测，得到所述第一个关键帧上的所有检测目标，并为每个检测目标分配标识；Step 101: For the first key frame of the video image, detect the targets on the first key frame based on the deep neural network to obtain all detection targets on the first key frame, and assign an identifier to each detection target;
步骤102：对于每个关键帧之后的普通帧，根据上一帧的目标检测结果，确定当前帧上对应每个检测目标的候选区域，根据所述候选区域对每个检测目标进行重新定位，得到当前帧上的追踪目标；Step 102: For each normal frame after a key frame, determine, according to the target detection result of the previous frame, a candidate region corresponding to each detection target on the current frame, and reposition each detection target according to the candidate region to obtain the tracking targets on the current frame;
步骤103：对于视频图像的其他关键帧，基于深度神经网络对所述其他关键帧上的目标进行检测，得到所述其他关键帧上的所有检测目标，并根据上一帧的重新定位结果，对所述其他关键帧上的检测目标进行整合。Step 103: For the other key frames of the video image, detect the targets on the other key frames based on the deep neural network to obtain all detection targets on the other key frames, and integrate the detection targets on the other key frames according to the repositioning result of the previous frame.
图2示意了本实施例的目标检测方法的整体架构，如图2所示，对于关键帧，本实施例基于深度神经网络进行目标检测，得到每个关键帧上的所有目标，称为检测目标；对于普通帧，不再进行目标检测，而是基于上一帧的检测结果/目标重新定位结果，重新定位目标，称为追踪目标。此外，对于除了第一个关键帧以外的关键帧，不仅检测该关键帧上的所有目标，还有基于前一帧（普通帧）的重新定位结果，对该关键帧上的目标进行整合，以免丢失目标或者重复识别目标或者错误识别目标。FIG. 2 illustrates the overall architecture of the target detection method of this embodiment. As shown in FIG. 2, for key frames, this embodiment performs target detection based on the deep neural network to obtain all targets on each key frame, called detection targets; for normal frames, target detection is no longer performed; instead, the targets are repositioned based on the detection result / target repositioning result of the previous frame, and are called tracking targets. In addition, for key frames other than the first key frame, not only are all targets on the key frame detected, but the targets on the key frame are also integrated based on the repositioning result of the previous frame (a normal frame), so as to avoid losing targets, repeatedly identifying targets, or misidentifying targets.
通过本实施例的方法,只对关键帧进行目标检测,能够以较少的时间消耗实现较高的检测精度。With the method of the embodiment, target detection is performed only on key frames, and high detection accuracy can be achieved with less time consumption.
在步骤101中,上述目标检测可以通过DNN检测器来实现,也即,基于深度神经网络进行目标检测。对于深度神经网络的工作原理,可以参考现有技术,本实施例不再详细说明。通过步骤101对关键帧上的目标的检测,可以得到关键帧上的所有目标(称为检测目标),如图3所示,每个检测目标会被分配一个标识(ID)用于指示该检测目标。本实施例对该标识的类型不作限制,其可以是一个指定的数字编号,也可以指示该检测目标的属性等。In step 101, the target detection described above may be implemented by a DNN detector, that is, based on a deep neural network for target detection. For the working principle of the deep neural network, reference may be made to the prior art, which is not described in detail in this embodiment. By detecting the target on the key frame in step 101, all the targets on the key frame (referred to as detection targets) can be obtained. As shown in FIG. 3, each detection target is assigned an identifier (ID) for indicating the detection. aims. This embodiment does not limit the type of the identifier, and may be a specified number, or may indicate the attribute of the detection target and the like.
在本实施例中，如图2所示，在关键帧之后跟着n个普通帧，本实施例对n的取值不作限制，n的取值可以考虑目标检测的精度和计算量（与时间消耗相关，计算量大，时间消耗多，计算量小，时间消耗少），如果希望提高目标检测的精度，可以将n设置为较小的值，如果希望降低计算量，也即减少时间消耗，n可以设置为较大的值，优选可以小于10。In this embodiment, as shown in FIG. 2, each key frame is followed by n normal frames. The value of n is not limited in this embodiment; it can be chosen by weighing detection accuracy against the amount of computation (which relates to time consumption: more computation means more time, less computation means less time). If higher detection accuracy is desired, n can be set to a smaller value; if less computation, i.e., less time consumption, is desired, n can be set to a larger value, preferably less than 10.
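As an illustrative sketch only (not part of the original disclosure), the key-frame / normal-frame scheduling described above might be expressed as follows; `detect_dnn`, `relocate` and `integrate` are hypothetical stand-ins for the DNN detector, the repositioning step, and the key-frame integration step:

```python
def process_video(frames, n, detect_dnn, relocate, integrate):
    """Dispatch frames: every (n+1)-th frame is treated as a key frame.

    detect_dnn(frame) -> list of detections on a key frame;
    relocate(prev, frame) -> tracked targets on a normal frame;
    integrate(detections, prev) -> detections with consistent identifiers.
    All three callables are hypothetical interfaces supplied by the caller.
    """
    results = []
    prev = None
    for i, frame in enumerate(frames):
        if i % (n + 1) == 0:           # key frame: run the DNN detector
            dets = detect_dnn(frame)
            if prev is not None:       # not the first key frame: merge IDs
                dets = integrate(dets, prev)
            prev = dets
        else:                          # normal frame: reposition only
            prev = relocate(prev, frame)
        results.append(prev)
    return results
```

With n = 2, the DNN is invoked only on frames 0, 3, 6, and so on, while intermediate frames reuse the previous result, which is the source of the time saving claimed above.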
在步骤102中,对于紧邻关键帧的一个普通帧,本实施例根据该关键帧的目标检测结果(得到了检测目标)重新定位检测目标,对于其他普通帧,本实施例根据上一个普通帧的目标重新定位结果(得到了追踪目标)重新定位追踪目标。由于重新定位结果得到的追踪目标也是关键帧上的检测目标,为了方便说明,将目标重新定位结果称为目标检测结果,将追踪目标称为检测目标。也即,步骤102提到的目标检测结果包含了对关键帧进行目标检测得到的目标检测结果,也包含了对普通帧进行目标重定位得到的目标重新定位结果。同理,步骤102提到的检测目标包含了对关键帧进行目标检测得到的检测目标,也包含了对普通帧进行目标重定位得到的追踪目标。 In step 102, for a normal frame in the immediate vicinity of the key frame, the embodiment re-locates the detection target according to the target detection result of the key frame (the detection target is obtained). For other normal frames, the embodiment is based on the previous normal frame. Target retargeting results (getting tracked targets) retargeting tracking targets. Since the tracking target obtained by the repositioning result is also the detection target on the key frame, for convenience of explanation, the target repositioning result is referred to as the target detection result, and the tracking target is referred to as the detection target. That is, the target detection result mentioned in step 102 includes the target detection result obtained by performing target detection on the key frame, and also includes the target relocation result obtained by performing target relocation on the normal frame. Similarly, the detection target mentioned in step 102 includes the detection target obtained by performing target detection on the key frame, and also includes the tracking target obtained by performing target relocation on the normal frame.
在步骤102中,可以通过对上述检测目标的边界框进行扩展,得到该检测目标的候选区域。In step 102, a candidate region of the detection target may be obtained by expanding a bounding box of the detection target.
图4示意了针对某个检测目标进行目标重定位的示意图。Figure 4 illustrates a schematic diagram of target relocation for a certain detection target.
如图4所示，对于上一帧上的某一个目标，如图4左边的图像中椭圆形中的目标，本实施例可以搜索针对该目标的当前帧的候选区域，以便在该候选区域中重新定位该目标，如图4右边的图像中加粗线条的矩形内的目标，并为其分配相应的标识。As shown in FIG. 4, for a target on the previous frame, such as the target in the ellipse in the left image of FIG. 4, this embodiment can search a candidate region of the current frame for the target, so as to relocate the target in this candidate region, as shown by the target inside the bold-lined rectangle in the right image of FIG. 4, and assign a corresponding identifier to it.
在本实施例中，可以使用参数λ对原始目标（上一帧上的目标）的边界框进行扩展得到该候选区域。如果原始目标的尺寸为B_w和B_h，那么扩展之后的候选区域的尺寸为：In this embodiment, a parameter λ can be used to expand the bounding box of the original target (the target on the previous frame) to obtain the candidate region. If the size of the original target is B_w and B_h, then the size of the expanded candidate region is:

S_w = λ × B_w

S_h = λ × B_h

在本实施例中，λ的取值可以根据需要设置，优选可以大于1.5。In this embodiment, the value of λ can be set as needed, and is preferably greater than 1.5.
上述对边界框进行扩展以得到该后续区域的方法只是举例说明,本实施例并不以此作为限制,其他可实施的扩展方法也可以适用。The method for extending the bounding box to obtain the subsequent area is only an example, and the embodiment is not limited thereto, and other implementable extension methods are also applicable.
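Purely for illustration (the centre-preserving layout below is an assumption, and λ is a stand-in symbol for the scale factor), the expansion could look like this:

```python
def expand_box(cx, cy, b_w, b_h, lam=1.8):
    """Expand a detection's bounding box of size (b_w, b_h), centred at
    (cx, cy), by factor lam (preferably > 1.5, per the text) to obtain the
    candidate search region.  Centre preservation is an assumption here.
    Returns (x0, y0, width, height)."""
    s_w, s_h = lam * b_w, lam * b_h
    return cx - s_w / 2.0, cy - s_h / 2.0, s_w, s_h
```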
在步骤102中,得到了针对检测目标的候选区域,即可根据该候选区域对该检测目标进行重定位。图5提供了一种重定位的方法,如图5所示,该方法包括:In step 102, a candidate region for the detection target is obtained, and the detection target can be relocated according to the candidate region. Figure 5 provides a method of relocation, as shown in Figure 5, the method includes:
步骤501:使用预定步长遍历上述候选区域,得到对应该检测目标的多个候选目标;Step 501: traverse the candidate area by using a predetermined step size to obtain a plurality of candidate targets corresponding to the detection target;
步骤502:计算每个候选目标与上述检测目标的相似度;Step 502: Calculate the similarity between each candidate target and the detection target.
步骤503:根据上述相似度确定与该检测目标匹配的候选目标,将匹配的候选目标作为该检测目标在当前帧上的追踪目标。Step 503: Determine a candidate target that matches the detection target according to the similarity degree, and use the matched candidate target as the tracking target of the detection target on the current frame.
在本实施例中,如果只有一个候选目标与上述检测目标的相似度大于第一阈值,则将该候选目标作为上述追踪目标,并重置该检测目标对应的追踪数。In this embodiment, if only one candidate target has a similarity with the detection target greater than the first threshold, the candidate target is used as the tracking target, and the tracking number corresponding to the detection target is reset.
在本实施例中，如果有多个候选目标与上述检测目标的相似度大于第一阈值，则从上述多个候选目标中选择具有最大相似度的候选目标作为上述追踪目标，并重置该检测目标对应的追踪数。In this embodiment, if a plurality of candidate targets have a similarity with the detection target greater than the first threshold, the candidate target with the greatest similarity is selected from the plurality of candidate targets as the tracking target, and the tracking count corresponding to the detection target is reset.
在本实施例中，如果所有候选目标与上述检测目标的相似度都不大于第一阈值，则判断该检测目标对应的追踪数是否为0；如果追踪数不为0，则将追踪数减1，并在当前帧上保留该检测目标；如果追踪数为0，则在当前帧上不保留该检测目标。In this embodiment, if the similarity of all candidate targets to the detection target is not greater than the first threshold, it is determined whether the tracking count corresponding to the detection target is 0; if the tracking count is not 0, the tracking count is decremented by 1 and the detection target is retained on the current frame; if the tracking count is 0, the detection target is not retained on the current frame.
在本实施例中,通过使用步长d来遍历上述后续区域,得到对应该检测目标的多个候选目标,如图4右侧的所有矩形所示的范围。通过设置该步长d可以减小后续目标的数量,从而减少计算量。在一个实施方式中,d可以设置为小于4个像素,以保证每个候选目标的像素高于50为宜,但本实施例并不以此作为限制。In the present embodiment, by using the step size d to traverse the subsequent regions, a plurality of candidate targets corresponding to the detection target are obtained, as shown by all the rectangles on the right side of FIG. By setting the step size d, the number of subsequent targets can be reduced, thereby reducing the amount of calculation. In one embodiment, d may be set to be less than 4 pixels to ensure that the pixels of each candidate target are higher than 50, but this embodiment is not limited thereto.
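The stride-d traversal of the candidate region can be sketched as follows (illustrative only; the (x, y, w, h) box convention is an assumption):

```python
def candidate_windows(region, target_w, target_h, d=3):
    """Slide a window of the target's size over the candidate region with
    stride d (here d < 4 pixels, following the text), yielding candidate
    target boxes (x, y, w, h)."""
    rx, ry, rw, rh = region
    boxes = []
    y = ry
    while y + target_h <= ry + rh:
        x = rx
        while x + target_w <= rx + rw:
            boxes.append((x, y, target_w, target_h))
            x += d
        y += d
    return boxes
```

A smaller d produces more candidates (higher cost, finer localization); a larger d produces fewer, which is the trade-off the paragraph above describes.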
在本实施例中,可以通过计算图像特征之间的相似度来确定候选目标是否与检测目标相匹配。在一个实施方式中,可以使用梯度差,但本实施例并不以此作为限制,其他计算相似度的方法也适用。由于计算相似度的方法是现有技术,本实施例对此不再详细说明。In the present embodiment, whether the candidate target matches the detection target can be determined by calculating the similarity between the image features. In one embodiment, a gradient difference can be used, but the embodiment is not limited thereto, and other methods of calculating the similarity are also applicable. Since the method of calculating the similarity is the prior art, this embodiment will not be described in detail.
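As one possible reading of the "gradient difference" similarity mentioned above (the disclosure names the feature but does not fix a formula, so the mapping to a score is an assumption), a sketch:

```python
def gradients(patch):
    """Horizontal/vertical finite-difference gradients of a 2-D grayscale
    patch given as a list of equally sized rows."""
    h, w = len(patch), len(patch[0])
    gx = [[patch[y][min(x + 1, w - 1)] - patch[y][x] for x in range(w)]
          for y in range(h)]
    gy = [[patch[min(y + 1, h - 1)][x] - patch[y][x] for x in range(w)]
          for y in range(h)]
    return gx, gy

def gradient_similarity(a, b):
    """Similarity score in (0, 1]: 1 for identical gradient fields,
    decreasing as the mean absolute gradient difference grows."""
    (axg, ayg), (bxg, byg) = gradients(a), gradients(b)
    n, diff = 0, 0.0
    for ga, gb in ((axg, bxg), (ayg, byg)):
        for ra, rb in zip(ga, gb):
            for va, vb in zip(ra, rb):
                diff += abs(va - vb)
                n += 1
    return 1.0 / (1.0 + diff / n)
```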
在本实施例中，通过设置第一阈值来找到与检测目标匹配的候选目标，然而，如果所有的候选目标与检测目标的相似度都不大于该第一阈值，也就是说，在当前帧没有找到该检测目标，那么本实施例不是直接在当前帧去除该检测目标，而是通过追踪数（keep_track）这个参数保留几帧的该检测目标，以避免判断错误。In this embodiment, the candidate target matching the detection target is found by setting the first threshold. However, if the similarity of all candidate targets to the detection target is not greater than the first threshold, that is, the detection target is not found in the current frame, this embodiment does not directly remove the detection target from the current frame, but retains the detection target for several frames via the tracking-count (keep_track) parameter, so as to avoid an erroneous judgment.
在本实施例中，为关键帧上的每一个检测目标设置了一个参数，称为keep_track，如果当前帧上没有与原始检测目标匹配的候选目标，本实施例并不马上删除该检测目标的标识，而是使keep_track的值减1，并保留该检测目标而不更新其位置，直到keep_track的值等于0。如果检测目标在keep_track为0之前与某个候选目标匹配，该keep_track将被重置。在本实施例中，keep_track的值可被设置为[3, 5]，也即其可以为3或4或5，然而，本实施例并不以此作为限制，其也可以设置为其他值。In this embodiment, a parameter called keep_track is set for each detection target on the key frame. If there is no candidate target on the current frame matching the original detection target, this embodiment does not immediately delete the identifier of the detection target; instead, the value of keep_track is decremented by one and the detection target is retained without updating its position, until the value of keep_track equals 0. If the detection target is matched with a candidate target before keep_track reaches 0, keep_track is reset. In this embodiment, the value of keep_track can be set within [3, 5], that is, it can be 3, 4 or 5; however, this embodiment is not limited thereto, and other values may also be used.
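The keep_track bookkeeping described above can be sketched as follows (illustrative; the dict layout and the reset value of 4, chosen from the [3, 5] range, are assumptions):

```python
KEEP_TRACK_INIT = 4  # within the [3, 5] range suggested in the text

def update_track(target, candidates, sims, th1):
    """Apply the keep_track rule for one detection target.

    target: dict with 'box' and 'keep_track'; candidates: list of boxes;
    sims: similarity of each candidate to the target.  Returns the updated
    target dict, or None when the target should no longer be kept."""
    matched = [(s, box) for s, box in zip(sims, candidates) if s > th1]
    if matched:                      # one or more matches: take the best
        best = max(matched, key=lambda t: t[0])
        return {'box': best[1], 'keep_track': KEEP_TRACK_INIT}
    if target['keep_track'] > 0:     # no match: keep old position a while
        return {'box': target['box'], 'keep_track': target['keep_track'] - 1}
    return None                      # keep_track exhausted: drop the target
```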
在本实施例中，如果该视频图像是高分辨率的，本实施例的方法实施之前还可以对该视频图像的每一帧图像（关键帧和普通帧）进行下采样，例如使用因子δ对每一帧图像的目标区域或候选目标区域进行下采样，本实施例对下采样的具体实施方式不做限制，可以采用现有手段。In this embodiment, if the video image is of high resolution, each frame of the video image (key frames and normal frames) may also be downsampled before the method of this embodiment is applied, for example, downsampling the target region or candidate target region of each frame by a factor δ. The specific implementation of downsampling is not limited in this embodiment, and existing means may be used.
图6是本实施例的重定位方法的一个实施方式的整体流程图,请参照图6,该方法包括:FIG. 6 is an overall flowchart of an embodiment of a relocation method according to the embodiment. Referring to FIG. 6, the method includes:
步骤601:对上一帧图像的目标区域进行下采样并进行特征提取;Step 601: Downsampling a target area of the previous frame image and performing feature extraction;
步骤602:对当前帧图像的候选目标区域进行下采样并进行特征提取;Step 602: Downsampling a candidate target area of the current frame image and performing feature extraction.
步骤603:特征匹配,得到匹配分数;Step 603: Feature matching, and obtaining a matching score;
步骤604:判断匹配分数是否大于第一阈值,如果判断为是,则执行步骤605;否则执行步骤606; Step 604: Determine whether the matching score is greater than the first threshold, if the determination is yes, proceed to step 605; otherwise, perform step 606;
步骤605:选择最佳的候选目标;Step 605: Select an optimal candidate target;
步骤606:将keep_track减1;Step 606: Decrease keep_track by one;
步骤607:判断keep_track是否为0,如果判断为否,则在当前帧保留该检测目标的位置;否则在当前帧不保留该检测目标的位置。Step 607: Determine whether the keep_track is 0. If the determination is no, the position of the detection target is reserved in the current frame; otherwise, the position of the detection target is not retained in the current frame.
在步骤601和步骤602中，所说的特征提取是指提取上一帧图像上的检测目标/当前帧图像上的候选目标的特征，以便进行特征匹配，例如计算相似度等。并且，本实施例对步骤601和步骤602的执行顺序不作限制，可以同时执行，也可以分开执行。In step 601 and step 602, the feature extraction refers to extracting features of the detection target on the previous frame image / the candidate targets on the current frame image for feature matching, for example calculating similarity. Moreover, this embodiment does not limit the execution order of step 601 and step 602; they may be performed simultaneously or separately.
在本实施例中，当通过步骤102对下一个关键帧之前的最后一个普通帧进行处理完毕之后，本实施例进入对下一个关键帧（也即除第一个关键帧以外的其他关键帧）的处理（步骤103）。In this embodiment, after the last normal frame before the next key frame has been processed in step 102, this embodiment proceeds to the processing of the next key frame (that is, a key frame other than the first key frame) (step 103).
在步骤103中，对于该下一个关键帧，除了进行与步骤101相同的目标检测处理以外，还要根据上一帧（上述最后一个普通帧）的重新定位结果，对该关键帧上的检测目标进行整合，也即，根据上一个普通帧中的追踪目标，将目标标识赋予当前关键帧的检测结果，由此可以确定是否有目标已经离开了，是否有新的目标进入等。In step 103, for the next key frame, in addition to the same target detection processing as in step 101, the detection targets on this key frame are also integrated according to the repositioning result of the previous frame (the last normal frame described above); that is, target identifiers are assigned to the detection results of the current key frame according to the tracking targets in the previous normal frame, whereby it can be determined whether a target has left, whether a new target has entered, and so on.
图7提供了一种整合方法,如图7所示,该方法包括:Figure 7 provides an integration method, as shown in Figure 7, which includes:
步骤701:将所述其他关键帧上的每个检测目标与前一帧上的每个候选目标进行匹配;Step 701: Match each detection target on the other key frames with each candidate target on the previous frame;
步骤702：如果所述检测目标与所述候选目标的重叠区域大于第二阈值，并且所述检测目标与所述候选目标的匹配分数大于第一阈值，则为所述检测目标分配匹配上的候选目标的标识；Step 702: If the overlap region of the detection target and the candidate target is greater than the second threshold, and the matching score of the detection target and the candidate target is greater than the first threshold, assign the identifier of the matched candidate target to the detection target;
步骤703：如果所述检测目标与所述候选目标的重叠区域不大于第二阈值，或者所述检测目标与所述候选目标的匹配分数不大于第一阈值，则为所述检测目标分配新的标识。Step 703: If the overlap region of the detection target and the candidate target is not greater than the second threshold, or the matching score of the detection target and the candidate target is not greater than the first threshold, assign a new identifier to the detection target.
在本实施例中，如果该其他关键帧上的所有检测目标被分配了标识，而上述前一帧上仍然有没有匹配上的候选目标，则不在该其他关键帧上保留没有匹配上的候选目标。In this embodiment, if all detection targets on the other key frame have been assigned identifiers while there are still unmatched candidate targets on the previous frame, the unmatched candidate targets are not retained on the other key frame.
图8是本实施例的该整合方法的一个实施方式的整体流程图,请参照图8,该方法包括:FIG. 8 is an overall flowchart of an embodiment of the integration method of the embodiment. Referring to FIG. 8, the method includes:
步骤801：计算重叠矩阵IOU；Step 801: Compute the overlap matrix IOU;
步骤802：判断IOU_ij是否大于第二阈值，如果判断为是，则执行步骤803，否则为目标i分配新的标识；Step 802: Determine whether IOU_ij is greater than the second threshold; if yes, go to step 803; otherwise, assign a new identifier to target i;
步骤803:图像特征匹配;Step 803: Image feature matching;
步骤804:判断匹配分数是否大于第一阈值,如果判断为是,则为目标i分配目标j的标识,此时IOU矩阵为(N-1)×(M-1)的矩阵,并执行步骤805,否则为目标i分配新的标识,此时IOU矩阵为(N-1)×M的矩阵;Step 804: Determine whether the matching score is greater than the first threshold. If the determination is yes, assign the identifier of the target j to the target i. At this time, the IOU matrix is a matrix of (N-1)×(M-1), and step 805 is performed. Otherwise, a new identifier is assigned to the target i, and the IOU matrix is a matrix of (N-1)×M;
步骤805:判断IOU行数是否大于0,如果判断为是,则回到步骤801,否则执行步骤806;Step 805: Determine whether the number of IOU rows is greater than 0. If the determination is yes, go back to step 801, otherwise go to step 806;
步骤806:判断IOU列数是否大于0,如果判断为是,则在当前关键帧上不保留没匹配上的候选目标,否则结束。Step 806: Determine whether the number of IOU columns is greater than 0. If the determination is yes, the candidate targets that are not matched are not retained on the current key frame, otherwise the process ends.
在本实施例中，如图8所示，首先计算来自新的关键帧的检测目标与来自上一个普通帧的追踪目标之间的重叠，得到N×M的IOU（Intersection Over Union，交除并）矩阵，对于该关键帧的检测目标i，如果普通帧中的追踪目标j和该关键帧的检测目标i有重叠，重叠区域IOU_ij大于第二阈值th2，并且，这两个目标的匹配分数大于第一阈值，那么对该检测目标i赋予与追踪目标j相同的标识，否则，对该检测目标i赋予一个新的标识。In this embodiment, as shown in FIG. 8, the overlap between the detection targets from the new key frame and the tracking targets from the previous normal frame is first calculated to obtain an N×M IOU (Intersection Over Union) matrix. For detection target i of the key frame, if tracking target j in the normal frame overlaps detection target i, the overlap region IOU_ij is greater than the second threshold th2, and the matching score of the two targets is greater than the first threshold, then detection target i is given the same identifier as tracking target j; otherwise, a new identifier is assigned to detection target i.
在本实施例中,图像特征匹配的过程与前述相同,例如通过计算相似度(匹配分数)的方法来实现,此处不再赘述。In this embodiment, the process of image feature matching is the same as the foregoing, for example, by calculating a similarity (matching score), and details are not described herein again.
在本实施例中,如果追踪目标j与检测目标i匹配,那么将从IOU矩阵中删除第j列,即目标j不再被匹配。因此,为了鲁棒性,首先对IOU矩阵各行的最大值进行由大到小的排序,并据此安排IOU矩阵的各行位置,进而从矩阵第一行开始进行匹配。如果关键帧中所有的检测目标被分配了标识,而普通帧中仍然有没有匹配上的追踪目标,那么这些目标将被去除。通过这样的处理,可以确定目标是否离开了视觉范围,是否有新的目标进入等,并得出视觉范围内的目标数量。In the present embodiment, if the tracking target j matches the detection target i, the jth column will be deleted from the IOU matrix, that is, the target j is no longer matched. Therefore, for robustness, the maximum values of the rows of the IOU matrix are first sorted from large to small, and the row positions of the IOU matrix are arranged accordingly, and then the matching is performed from the first row of the matrix. If all the detection targets in the key frame are assigned an identifier, and there are still matching tracking targets in the normal frame, then these targets will be removed. Through such processing, it is possible to determine whether the target has left the visual range, whether there is a new target entering, etc., and to derive the number of targets within the visual range.
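A minimal sketch of this integration step (IoU matrix, row ordering by maximum overlap, greedy column deletion), under the assumptions that boxes are (x, y, w, h) tuples and `match_score` is a caller-supplied appearance matcher:

```python
def iou(a, b):
    """Intersection-over-union of boxes (x, y, w, h)."""
    ax0, ay0, aw, ah = a
    bx0, by0, bw, bh = b
    ix = max(0.0, min(ax0 + aw, bx0 + bw) - max(ax0, bx0))
    iy = max(0.0, min(ay0 + ah, by0 + bh) - max(ay0, by0))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def integrate_key_frame(dets, tracks, match_score, th1, th2, next_id):
    """Assign IDs to key-frame detections from previous-frame tracks.

    dets: list of boxes; tracks: list of (id, box); match_score(i, j) gives
    the appearance-matching score of detection i vs track j (hypothetical).
    Rows are processed in order of decreasing best IoU, as in the text."""
    remaining = list(range(len(tracks)))
    ids = [None] * len(dets)
    order = sorted(range(len(dets)), key=lambda i: -max(
        [iou(dets[i], tracks[j][1]) for j in remaining] or [0.0]))
    for i in order:
        best_j, best_iou = None, th2
        for j in remaining:
            v = iou(dets[i], tracks[j][1])
            if v > best_iou and match_score(i, j) > th1:
                best_j, best_iou = j, v
        if best_j is None:
            ids[i] = next_id           # new target enters the scene
            next_id += 1
        else:
            ids[i] = tracks[best_j][0]  # inherit the track's identifier
            remaining.remove(best_j)    # column j matched: delete it
    return ids  # tracks left unmatched are simply dropped (targets that left)
```

Detections whose best remaining overlap fails th2 (or whose appearance score fails th1) receive fresh identifiers, corresponding to targets entering the scene, while unmatched tracks are discarded as targets that have left the visual range.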
通过本实施例的方法,只对视频图像的关键帧进行目标检测,而对视频图像的普通帧进行目标重定位,能够提高目标检测的精度并减少时间消耗。With the method of the embodiment, only the target detection of the key frame of the video image is performed, and the target relocation of the normal frame of the video image can improve the accuracy of the target detection and reduce the time consumption.
实施例2Example 2
本实施例提供了一种目标检测装置,由于该装置解决问题的原理与实施例1的方法类似,因此其具体的实施可以参考实施例1的方法的实施,内容相同之处不再重复 说明。The embodiment of the present invention provides a target detection device. The principle of the device is similar to that of the first embodiment. Therefore, the specific implementation may refer to the implementation of the method in the first embodiment. Description.
FIG. 9 is a schematic diagram of the target detection apparatus of this embodiment. As shown in FIG. 9, the apparatus 900 includes a first detecting unit 901, a relocating unit 902 and a second detecting unit 903.
The first detecting unit 901 is configured to, for the first key frame of a video image, detect targets on the first key frame based on a deep neural network, obtain all detection targets on the first key frame, and assign an identifier to each detection target.
The relocating unit 902 is configured to, for each normal frame following a key frame, determine a candidate region on the current frame corresponding to each detection target according to the target detection result of the previous frame, and relocate each detection target according to the candidate region to obtain the tracking targets on the current frame.
The second detecting unit 903 is configured to, for each other key frame of the video image, detect targets on that key frame based on the deep neural network, obtain all detection targets on that key frame, and integrate the detection targets on that key frame according to the relocation result of the previous frame.
FIG. 10 is a schematic diagram of an implementation of the relocating unit 902 of this embodiment.
As shown in FIG. 10, in this implementation the relocating unit 902 may include an expanding unit 1001, which expands the bounding box of a detection target to obtain the candidate region of that detection target. For a specific expansion method, reference may be made to the description of FIG. 4.
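The exact expansion rule is deferred to the description of FIG. 4; a common and minimal sketch is to scale the bounding box about its center and clip to the frame. The scale factor and function name below are assumptions, not the patent's specified method:

```python
def expand_box(box, scale, frame_w, frame_h):
    """Expand (x1, y1, x2, y2) about its center by `scale`, clipped to the frame.
    The scale factor is illustrative; the patent defers the exact rule to FIG. 4."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    hw, hh = (x2 - x1) * scale / 2.0, (y2 - y1) * scale / 2.0
    return (max(0.0, cx - hw), max(0.0, cy - hh),
            min(float(frame_w), cx + hw), min(float(frame_h), cy + hh))
```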
As shown in FIG. 10, in this implementation the relocating unit 902 may further include a traversing unit 1002, a calculating unit 1003 and a determining unit 1004. The traversing unit 1002 may traverse the candidate region with a predetermined step size to obtain a plurality of candidate targets corresponding to the detection target; the calculating unit 1003 may calculate the similarity between each candidate target and the detection target; and the determining unit 1004 may determine, according to the similarities, the candidate target that matches the detection target, taking the matched candidate target as the tracking target of the detection target on the current frame.
In this implementation, when one candidate target has a similarity to the detection target greater than a first threshold, the determining unit 1004 may take that candidate target as the tracking target and reset the tracking count of the detection target.
In this implementation, when a plurality of candidate targets have similarities to the detection target greater than the first threshold, the determining unit 1004 may select the candidate target with the greatest similarity from among them as the tracking target and reset the tracking count of the detection target.
In this implementation, when no candidate target has a similarity to the detection target greater than the first threshold, the determining unit 1004 may further judge whether the tracking count is 0; if the tracking count is not 0, it decrements the tracking count by 1 and retains the detection target on the current frame; if the tracking count is 0, the detection target is not retained on the current frame.
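The three decision branches of the determining unit 1004 can be sketched together as one function. The `track` dictionary layout and the reset value `max_track` are assumptions; the patent does not specify what the tracking count is reset to:

```python
def relocate(similarities, candidates, track, th1, max_track=5):
    """Pick the tracking target for one detection from its candidate windows.
    `track` holds the remaining tracking count and the last known box;
    `max_track` (the reset value) is an assumed parameter, not given in the text."""
    best = max(range(len(similarities)), key=similarities.__getitem__, default=-1)
    if best >= 0 and similarities[best] > th1:
        track['count'] = max_track           # reset the tracking count
        return candidates[best], True        # matched: keep on current frame
    if track['count'] > 0:
        track['count'] -= 1                  # no match: decay and keep the last box
        return track['box'], True
    return None, False                       # count exhausted: drop the target
```

Selecting the maximum similarity also covers the single-candidate case, so both threshold branches of the text collapse into the first `if`.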
In this embodiment, as shown in FIG. 9, the apparatus 900 may further include a downsampling unit 904, which may downsample the target regions or candidate target regions on the key frames and/or the normal frames of the video image for image-feature matching. This is as described above and is not repeated here.
FIG. 11 is a schematic diagram of an implementation of the second detecting unit 903 of this embodiment.
As shown in FIG. 11, in this implementation the second detecting unit 903 may include a matching unit 1101 and a processing unit 1102. The matching unit 1101 is configured to match each detection target on the other key frame with each candidate target on the previous frame. The processing unit 1102 is configured to, when the overlap between the detection target and the candidate target is greater than a second threshold and the matching score between them is greater than a first threshold, assign the identifier of the matched candidate target to the detection target; and, when the overlap is not greater than the second threshold or the matching score is not greater than the first threshold, assign a new identifier to the detection target.
In this implementation, when all detection targets on the other key frame have been assigned identifiers while unmatched candidate targets remain on the previous frame, the processing unit 1102 may further refrain from retaining the unmatched candidate targets on the other key frame.
With the apparatus of this embodiment, target detection is performed only on the key frames of the video image, while targets are merely relocated on the normal frames, which improves the accuracy of target detection and reduces time consumption.
Embodiment 3
This embodiment further provides a computer system configured with the target detection apparatus 900 described above.
FIG. 12 is a schematic block diagram of the system configuration of a computer system 1200 according to an embodiment of the present invention. As shown in FIG. 12, the computer system 1200 may include a central processing unit 1201 and a memory 1202, the memory 1202 being coupled to the central processing unit 1201. It should be noted that the figure is exemplary; other types of structures may be used in addition to or in place of this structure to implement telecommunications or other functions.
In one implementation, the functions of the target detection apparatus 900 may be integrated into the central processing unit 1201, which may be configured to implement the target detection method described in Embodiment 1.
For example, the central processing unit 1201 may be configured to perform the following control: for the first key frame of a video image, detect targets on the first key frame based on a deep neural network, obtain all detection targets on the first key frame, and assign an identifier to each detection target; for each normal frame following a key frame, determine a candidate region on the current frame corresponding to each detection target according to the target detection result of the previous frame, relocate each detection target according to the candidate region, and obtain the tracking targets on the current frame; and for each other key frame of the video image, detect targets on that key frame based on the deep neural network, obtain all detection targets on that key frame, and integrate the detection targets on that key frame according to the relocation result of the previous frame.
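The overall key-frame/normal-frame dispatch performed under this control can be sketched as a small driver loop. The fixed `keyframe_interval` and the three callables are placeholders for the DNN detector, the relocation of Embodiment 1 and the identifier integration; none of these names come from the patent:

```python
def process_video(frames, keyframe_interval, detect, relocate_all, integrate):
    """Run full DNN detection on key frames only; relocate targets on normal frames.
    `detect`, `relocate_all` and `integrate` stand in for the detector, the
    candidate-region relocation and the ID integration steps described above."""
    results = []
    prev = None
    for n, frame in enumerate(frames):
        if n % keyframe_interval == 0:
            dets = detect(frame)                      # full detection on key frame
            prev = dets if prev is None else integrate(dets, prev)
        else:
            prev = relocate_all(frame, prev)          # cheap relocation on normal frame
        results.append(prev)
    return results
```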
In another implementation, the target detection apparatus 900 may be configured separately from the central processing unit 1201; for example, the target detection apparatus 900 may be configured as a chip connected to the central processing unit 1201, with the functions of the target detection apparatus 900 realized under the control of the central processing unit 1201.
As shown in FIG. 12, the computer system 1200 may further include an input unit 1203, an audio processing unit 1204, a display 1205 and a power supply 1206. It should be noted that the computer system 1200 does not necessarily include all of the components shown in FIG. 12; moreover, the computer system 1200 may also include components not shown in FIG. 12, for which reference may be made to the prior art.
As shown in FIG. 12, the central processing unit 1201, sometimes also referred to as a controller or operation control, may include a microprocessor or other processor device and/or logic device; the central processing unit 1201 receives input and controls the operation of the components of the computer system 1200.
The memory 1202 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory or other suitable devices. It may store the above video images, feature-matching information and the like, and may further store programs for executing the related processing. The central processing unit 1201 may execute the programs stored in the memory 1202 to realize information storage or processing. The functions of the other components are similar to existing ones and are not described here. The components of the computer system 1200 may be implemented by dedicated hardware, firmware, software or a combination thereof without departing from the scope of the invention.
In this embodiment, the computer system may be a video surveillance system, but this is not a limitation.
With the computer system of this embodiment, target detection is performed only on the key frames of the video image, while targets are merely relocated on the normal frames, which improves the accuracy of target detection and reduces time consumption.
An embodiment of the present invention further provides a computer-readable program, wherein, when the program is executed in a target detection apparatus or a computer system, it causes the target detection apparatus or computer system to execute the target detection method described in Embodiment 1.
An embodiment of the present invention further provides a storage medium storing a computer-readable program, wherein the computer-readable program causes a target detection apparatus or a computer system to execute the target detection method described in Embodiment 1.
The above apparatus and method of the present invention may be implemented by hardware, or by hardware in combination with software. The present invention relates to a computer-readable program which, when executed by a logic component, enables the logic component to realize the apparatus or constituent components described above, or to realize the various methods or steps described above. The present invention also relates to storage media for storing the above program, such as a hard disk, a magnetic disk, an optical disc, a DVD, a flash memory and the like.
The target detection method in the target detection apparatus described in connection with the embodiments of the present invention may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. For example, one or more of the functional blocks shown in FIGS. 9-11, and/or one or more combinations of those functional blocks, may correspond to software modules of a computer program flow, or to hardware modules. These software modules may correspond to the steps shown in FIGS. 1, 5 and 7 respectively. The hardware modules may be realized, for example, by solidifying these software modules in a field-programmable gate array (FPGA).
A software module may reside in RAM, flash memory, ROM, EPROM, EEPROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. A storage medium may be coupled to the processor so that the processor can read information from, and write information to, the storage medium; or the storage medium may be an integral part of the processor. The processor and the storage medium may reside in an ASIC. The software module may be stored in the memory of a mobile terminal, or in a memory card insertable into the mobile terminal. For example, if a device (such as a mobile terminal) uses a large-capacity MEGA-SIM card or a large-capacity flash memory device, the software module may be stored in the MEGA-SIM card or the large-capacity flash memory device.
One or more of the functional blocks described with respect to FIGS. 9-11, and/or one or more combinations of those functional blocks, may be implemented as a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof designed to perform the functions described in this application. They may also be implemented as a combination of computing devices, for example a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in communication with a DSP, or any other such configuration.
The present invention has been described above with reference to specific embodiments, but it should be clear to those skilled in the art that these descriptions are exemplary and do not limit the protection scope of the present invention. Those skilled in the art may make various variations and modifications to the present invention based on its principles, and such variations and modifications also fall within the scope of the present invention.

Claims (19)

  1. A target detection method, wherein the method comprises:
    for the first key frame of a video image, detecting targets on the first key frame based on a deep neural network, obtaining all detection targets on the first key frame, and assigning an identifier to each detection target;
    for each normal frame following a key frame, determining a candidate region on the current frame corresponding to each detection target according to the target detection result of the previous frame, and relocating each detection target according to the candidate region to obtain tracking targets on the current frame; and
    for each other key frame of the video image, detecting targets on the other key frame based on the deep neural network, obtaining all detection targets on the other key frame, and integrating the detection targets on the other key frame according to the relocation result of the previous frame.
  2. The method according to claim 1, wherein determining a candidate region on the current frame corresponding to each detection target comprises:
    expanding the bounding box of the detection target to obtain the candidate region of the detection target.
  3. The method according to claim 1, wherein relocating each detection target according to the candidate region comprises:
    traversing the candidate region with a predetermined step size to obtain a plurality of candidate targets corresponding to the detection target;
    calculating the similarity between each candidate target and the detection target; and
    determining, according to the similarities, a candidate target that matches the detection target, and taking the matched candidate target as the tracking target of the detection target on the current frame.
  4. The method according to claim 3, wherein determining a candidate target that matches the detection target according to the similarities comprises:
    if the similarity between one candidate target and the detection target is greater than a first threshold, taking that candidate target as the tracking target, and resetting the tracking count of the detection target.
  5. The method according to claim 3, wherein determining a candidate target that matches the detection target according to the similarities comprises:
    if the similarities between a plurality of candidate targets and the detection target are greater than a first threshold, selecting the candidate target with the greatest similarity from the plurality of candidate targets as the tracking target, and resetting the tracking count of the detection target.
  6. The method according to claim 3, wherein determining a candidate target that matches the detection target according to the similarities comprises:
    if no candidate target has a similarity to the detection target greater than the first threshold, judging whether the tracking count is 0;
    if the tracking count is not 0, decrementing the tracking count by 1 and retaining the detection target on the current frame; and
    if the tracking count is 0, not retaining the detection target on the current frame.
  7. The method according to claim 1, wherein the method further comprises:
    downsampling the target regions or candidate target regions on the key frames and/or the normal frames.
  8. The method according to claim 1, wherein integrating the detection targets on the other key frame comprises:
    matching each detection target on the other key frame with each candidate target on the previous frame;
    if the overlap between the detection target and the candidate target is greater than a second threshold and the matching score between the detection target and the candidate target is greater than a first threshold, assigning the identifier of the matched candidate target to the detection target; and
    if the overlap between the detection target and the candidate target is not greater than the second threshold, or the matching score between the detection target and the candidate target is not greater than the first threshold, assigning a new identifier to the detection target.
  9. The method according to claim 8, wherein,
    if all detection targets on the other key frame have been assigned identifiers while unmatched candidate targets remain on the previous frame, the unmatched candidate targets are not retained on the other key frame.
  10. A target detection apparatus, wherein the apparatus comprises:
    a first detecting unit configured to, for the first key frame of a video image, detect targets on the first key frame based on a deep neural network, obtain all detection targets on the first key frame, and assign an identifier to each detection target;
    a relocating unit configured to, for each normal frame following a key frame, determine a candidate region on the current frame corresponding to each detection target according to the target detection result of the previous frame, and relocate each detection target according to the candidate region to obtain tracking targets on the current frame; and
    a second detecting unit configured to, for each other key frame of the video image, detect targets on the other key frame based on the deep neural network, obtain all detection targets on the other key frame, and integrate the detection targets on the other key frame according to the relocation result of the previous frame.
  11. The apparatus according to claim 10, wherein the relocating unit comprises:
    an expanding unit that expands the bounding box of the detection target to obtain the candidate region of the detection target.
  12. The apparatus according to claim 10, wherein the relocating unit comprises:
    a traversing unit that traverses the candidate region with a predetermined step size to obtain a plurality of candidate targets corresponding to the detection target;
    a calculating unit that calculates the similarity between each candidate target and the detection target; and
    a determining unit that determines, according to the similarities, a candidate target matching the detection target, and takes the matched candidate target as the tracking target of the detection target on the current frame.
  13. The apparatus according to claim 12, wherein,
    when the similarity between one candidate target and the detection target is greater than a first threshold, the determining unit takes that candidate target as the tracking target and resets the tracking count of the detection target.
  14. The apparatus according to claim 12, wherein,
    when the similarities between a plurality of candidate targets and the detection target are greater than the first threshold, the determining unit selects the candidate target with the greatest similarity from the plurality of candidate targets as the tracking target and resets the tracking count of the detection target.
  15. The apparatus according to claim 12, wherein,
    when no candidate target has a similarity to the detection target greater than the first threshold, the determining unit judges whether the tracking count is 0; if the tracking count is not 0, it decrements the tracking count by 1 and retains the detection target on the current frame; if the tracking count is 0, the detection target is not retained on the current frame.
  16. The apparatus according to claim 10, wherein the apparatus further comprises:
    a downsampling unit that downsamples the target regions or candidate target regions on the key frames and/or the normal frames.
  17. The apparatus according to claim 10, wherein the second detecting unit comprises:
    a matching unit that matches each detection target on the other key frame with each candidate target on the previous frame; and
    a processing unit that, when the overlap between the detection target and the candidate target is greater than a second threshold and the matching score between the detection target and the candidate target is greater than a first threshold, assigns the identifier of the matched candidate target to the detection target; and, when the overlap between the detection target and the candidate target is not greater than the second threshold or the matching score between the detection target and the candidate target is not greater than the first threshold, assigns a new identifier to the detection target.
  18. The apparatus according to claim 17, wherein,
    when all detection targets on the other key frame have been assigned identifiers while unmatched candidate targets remain on the previous frame, the processing unit does not retain the unmatched candidate targets on the other key frame.
  19. A computer system, wherein the computer system comprises the apparatus according to any one of claims 10-18.
PCT/CN2016/101237 2016-09-30 2016-09-30 Target detection method and device, and computer system WO2018058595A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201680087590.3A CN109416728A (en) 2016-09-30 2016-09-30 Object detection method, device and computer system
PCT/CN2016/101237 WO2018058595A1 (en) 2016-09-30 2016-09-30 Target detection method and device, and computer system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/101237 WO2018058595A1 (en) 2016-09-30 2016-09-30 Target detection method and device, and computer system

Publications (1)

Publication Number Publication Date
WO2018058595A1 true WO2018058595A1 (en) 2018-04-05

Family

ID=61763242

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/101237 WO2018058595A1 (en) 2016-09-30 2016-09-30 Target detection method and device, and computer system

Country Status (2)

Country Link
CN (1) CN109416728A (en)
WO (1) WO2018058595A1 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109544603A (en) * 2018-11-28 2019-03-29 上饶师范学院 Method for tracking target based on depth migration study
CN109902627A (en) * 2019-02-28 2019-06-18 中科创达软件股份有限公司 A kind of object detection method and device
CN109903281A (en) * 2019-02-28 2019-06-18 中科创达软件股份有限公司 It is a kind of based on multiple dimensioned object detection method and device
CN110147702A (en) * 2018-07-13 2019-08-20 腾讯科技(深圳)有限公司 A kind of object detection and recognition method and system of real-time video
CN110322475A (en) * 2019-05-23 2019-10-11 北京中科晶上科技股份有限公司 A kind of sparse detection method of video
CN110363790A (en) * 2018-04-11 2019-10-22 北京京东尚科信息技术有限公司 Target tracking method, device and computer readable storage medium
CN110428442A (en) * 2019-08-07 2019-11-08 北京百度网讯科技有限公司 Target determines method, targeting system and monitoring security system
CN111179304A (en) * 2018-11-09 2020-05-19 北京京东尚科信息技术有限公司 Object association method, device and computer-readable storage medium
CN111461010A (en) * 2020-04-01 2020-07-28 贵州电网有限责任公司 Power equipment identification efficiency optimization method based on template tracking
CN112634327A (en) * 2020-12-21 2021-04-09 合肥讯图信息科技有限公司 Tracking method based on YOLOv4 model
CN113312949A (en) * 2020-04-13 2021-08-27 阿里巴巴集团控股有限公司 Video data processing method, video data processing device and electronic equipment
CN115063454A (en) * 2022-08-16 2022-09-16 浙江所托瑞安科技集团有限公司 Multi-target tracking matching method, device, terminal and storage medium

Families Citing this family (7)

Publication number Priority date Publication date Assignee Title
CN109977797B (en) * 2019-03-06 2023-06-20 上海交通大学 Optimization method of first-order target detector based on sorting loss function
KR102340988B1 (en) * 2019-10-04 2021-12-17 에스케이텔레콤 주식회사 Method and Apparatus for Detecting Objects from High Resolution Image
CN112686925A (en) * 2019-10-18 2021-04-20 西安光启未来技术研究院 Target tracking method and device
CN110956219B (en) * 2019-12-09 2023-11-14 爱芯元智半导体(宁波)有限公司 Video data processing method, device and electronic system
CN113079342A (en) * 2020-01-03 2021-07-06 深圳市春盛海科技有限公司 Target tracking method and system based on high-resolution image device
CN113065523B (en) * 2021-04-26 2023-06-16 上海哔哩哔哩科技有限公司 Target tracking method and device, electronic equipment and storage medium
CN114581781B (en) 2022-05-05 2022-08-09 之江实验室 Target detection method and device for high-resolution remote sensing image

Citations (4)

Publication number Priority date Publication date Assignee Title
US20080232643A1 (en) * 2007-03-23 2008-09-25 Technion Research & Development Foundation Ltd. Bitmap tracker for visual tracking under very general conditions
US20120207356A1 (en) * 2011-02-10 2012-08-16 Murphy William A Targeted content acquisition using image analysis
CN104166861A (en) * 2014-08-11 2014-11-26 叶茂 Pedestrian detection method
CN105447458A (en) * 2015-11-17 2016-03-30 深圳市商汤科技有限公司 Large scale crowd video analysis system and method thereof

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101329766B (en) * 2007-06-18 2012-05-30 索尼(中国)有限公司 Apparatus, method and system for analyzing moving images
CN102722714B (en) * 2012-05-18 2014-07-23 西安电子科技大学 Extended learning method for artificial neural networks based on target tracking
CN105224856A (en) * 2014-07-02 2016-01-06 腾讯科技(深圳)有限公司 Computer system detection method and device
CN105718890A (en) * 2016-01-22 2016-06-29 北京大学 Method for detecting specific videos based on a convolutional neural network
CN105760846B (en) * 2016-03-01 2019-02-15 北京正安维视科技股份有限公司 Target detection and localization method and system based on depth data

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110363790A (en) * 2018-04-11 2019-10-22 北京京东尚科信息技术有限公司 Target tracking method, device and computer readable storage medium
CN110147702A (en) * 2018-07-13 2019-08-20 腾讯科技(深圳)有限公司 Method and system for detecting and recognizing targets in real-time video
CN110147702B (en) * 2018-07-13 2023-05-23 腾讯科技(深圳)有限公司 Method and system for detecting and identifying target of real-time video
CN111179304B (en) * 2018-11-09 2024-04-05 北京京东尚科信息技术有限公司 Target association method, apparatus and computer readable storage medium
CN111179304A (en) * 2018-11-09 2020-05-19 北京京东尚科信息技术有限公司 Object association method, device and computer-readable storage medium
CN109544603A (en) * 2018-11-28 2019-03-29 上饶师范学院 Target tracking method based on deep transfer learning
CN109902627A (en) * 2019-02-28 2019-06-18 中科创达软件股份有限公司 Object detection method and device
CN109903281A (en) * 2019-02-28 2019-06-18 中科创达软件股份有限公司 Multi-scale-based object detection method and device
CN109903281B (en) * 2019-02-28 2021-07-27 中科创达软件股份有限公司 Multi-scale-based target detection method and device
CN110322475A (en) * 2019-05-23 2019-10-11 北京中科晶上科技股份有限公司 Video sparse detection method
CN110322475B (en) * 2019-05-23 2022-11-11 北京中科晶上科技股份有限公司 Video sparse detection method
CN110428442B (en) * 2019-08-07 2022-04-12 北京百度网讯科技有限公司 Target determination method, target determination system and monitoring security system
CN110428442A (en) * 2019-08-07 2019-11-08 北京百度网讯科技有限公司 Target determination method, target determination system and monitoring security system
CN111461010A (en) * 2020-04-01 2020-07-28 贵州电网有限责任公司 Power equipment identification efficiency optimization method based on template tracking
CN113312949A (en) * 2020-04-13 2021-08-27 阿里巴巴集团控股有限公司 Video data processing method, video data processing device and electronic equipment
CN113312949B (en) * 2020-04-13 2023-11-24 阿里巴巴集团控股有限公司 Video data processing method, video data processing device and electronic equipment
CN112634327A (en) * 2020-12-21 2021-04-09 合肥讯图信息科技有限公司 Tracking method based on YOLOv4 model
CN115063454A (en) * 2022-08-16 2022-09-16 浙江所托瑞安科技集团有限公司 Multi-target tracking matching method, device, terminal and storage medium
CN115063454B (en) * 2022-08-16 2022-11-29 浙江所托瑞安科技集团有限公司 Multi-target tracking matching method, device, terminal and storage medium

Also Published As

Publication number Publication date
CN109416728A (en) 2019-03-01

Similar Documents

Publication Publication Date Title
WO2018058595A1 (en) Target detection method and device, and computer system
JP6511149B2 (en) Method of calculating the area of a fingerprint overlap region, electronic device for performing the same, computer program, and recording medium
US9373054B2 (en) Method for selecting frames from video sequences based on incremental improvement
CN110634153A (en) Target tracking template updating method and device, computer equipment and storage medium
US10552686B2 (en) Object recognition device that determines overlapping states for a plurality of objects
JP6570370B2 (en) Image processing method, image processing apparatus, program, and recording medium
US9747507B2 (en) Ground plane detection
WO2018058530A1 (en) Target detection method and device, and image processing apparatus
US8660302B2 (en) Apparatus and method for tracking target
CN112036232B (en) Image table structure identification method, system, terminal and storage medium
CN111882520A (en) Screen defect detection method and device and head-mounted display equipment
TWI514327B (en) Method and system for object detection and tracking
US9256792B2 (en) Image processing apparatus, image processing method, and program
CN112906483A (en) Target re-identification method and device and computer readable storage medium
US10643338B2 (en) Object detection device and object detection method
CN109447022B (en) Lens type identification method and device
WO2018058573A1 (en) Object detection method, object detection apparatus and electronic device
US10803295B2 (en) Method and device for face selection, recognition and comparison
WO2019148362A1 (en) Object detection method and apparatus
JP2016053763A (en) Image processor, image processing method and program
JP2015026117A (en) Image processing method, image processing apparatus, program, and recording medium
US10410044B2 (en) Image processing apparatus, image processing method, and storage medium for detecting object from image
US10713808B2 (en) Stereo matching method and system using rectangular window
KR101660596B1 (en) Method for modifying gradient of facial shape, and system for the same
JP6175904B2 (en) Verification target extraction system, verification target extraction method, verification target extraction program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16917345

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16917345

Country of ref document: EP

Kind code of ref document: A1