CN111340848A - Object tracking method, system, device and medium for target area - Google Patents
- Publication number: CN111340848A (application CN202010119633.XA)
- Authority: CN (China)
- Prior art keywords: frame, image, features, images, human
- Legal status: Pending (status is an assumption by Google Patents, not a legal conclusion)
Classifications
- G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
- G06V10/462: Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V20/40: Scenes; scene-specific elements in video content
- G06V40/166: Human faces; detection, localisation, normalisation using acquisition arrangements
- G06V40/20: Movements or behaviour, e.g. gesture recognition
- G06T2207/10016: Video; image sequence
- G06T2207/20081: Training; learning
- G06T2207/20084: Artificial neural networks [ANN]
- G06T2207/30196: Human being; person
- G06T2207/30201: Face
Abstract
The invention provides a method, system, device and medium for tracking objects in target areas. The method comprises the following steps: acquiring single-frame or multi-frame images from one or more target areas associated with one or more objects; determining whether the single-frame or multi-frame images contain the one or more objects; displaying the single-frame or multi-frame images containing the one or more objects; and determining motion information of the one or more objects in the one or more target areas according to the displayed single-frame or multi-frame images. If the objects are people, the invention can use the face or body features of different pedestrians in surveillance video to construct the motion tracks of those pedestrians across different target areas, thereby achieving cross-area, cross-camera tracking of different pedestrians.
Description
Technical Field
The present invention relates to image recognition technologies, and in particular, to a method, a system, a device, and a medium for tracking an object in a target area.
Background
In recent years, object (e.g. human, animal) recognition technology has been widely used in the construction of "smart cities", "safe cities" and the like. However, more than 80% of existing cameras cannot capture a clear human face or body under all circumstances. In addition, as criminals' counter-surveillance awareness improves, they may deliberately avoid cameras, so face or body information cannot be captured in time and timely alerting and handling become difficult. Moreover, in an actual scene a single camera usually cannot cover all areas, and the fields of view of multiple cameras generally do not overlap. Therefore, the present invention provides a method, system, device and medium for tracking objects in target areas, so as to track pedestrians across different target areas and camera boundaries.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, it is an object of the present invention to provide a method, system, device and medium for tracking an object in a target area, which are used to solve the technical problems in the prior art.
To achieve the above and other related objects, the present invention provides an object tracking method for a target area, comprising the steps of:
acquiring single-frame or multi-frame images in one or more target areas associated with one or more objects;
determining whether the single or multiple frames of images contain the one or more objects;
displaying a single-frame or multi-frame image containing the one or more objects;
and determining the motion information of the one or more objects in the one or more target areas according to the displayed single-frame or multi-frame images.
Optionally, the target region comprises at least one of: residential areas, stations, airports, markets and hospitals.
Optionally, the multi-frame images include one or more sequences of continuous frames and/or a plurality of single-frame images.
Optionally, obtaining features of an object in the single-frame or multi-frame image, where the features include global features and/or local features;
and determining whether the one or more objects are contained in the one or more single-frame or multi-frame images according to the global features and/or the local features.
Optionally, the object comprises a human or an animal.
Optionally, if the object is a human:
the global features include at least one of: human face features, human body features; and/or
the local features include at least one of: human face features, human body features.
Optionally, acquiring a single-frame or multi-frame image containing one or more global features and local features;
inputting a certain frame of image containing the one or more global features and the local features into a hierarchical vectorization model, and acquiring a global feature vector and a local feature vector of the certain frame of image;
and determining whether the one or more objects are contained in the certain frame image according to the global feature vector and the local feature vector of the certain frame image.
Optionally, inputting the certain frame of image containing one or more human faces or human bodies into the layered vectorization model;
dividing the certain frame of image containing one or more human faces or human bodies into one or more image blocks;
extracting local features of each image block, and acquiring a local feature descriptor of each image block according to the local features;
quantizing the local feature descriptors of each image block to generate an image block feature dictionary;
according to the mapping between the image block feature dictionary and the certain frame image, encoding to form a human face or human body feature vector of the certain frame image;
and acquiring the face or human body feature vector of the certain frame of image.
Optionally, acquiring an image to be compared containing the one or more objects; wherein, the images to be compared are at least two single-frame or multi-frame images;
processing the images to be compared, and mapping the images to be compared to the same comparison space through at least two deep neural networks;
and comparing the images to be compared through the same comparison space, and determining whether one or more identical objects exist in the images to be compared.
Optionally, selecting one single-frame or multi-frame image in the images to be compared as a reference frame image, and using the rest single-frame or multi-frame images in the images to be compared as comparison frame images;
inputting the reference frame image and the comparison frame image into at least two deep neural networks respectively, and mapping the comparison frame image and the reference frame image to the same comparison space at the same time;
comparing one or more human faces or human body features in the comparison frame image with one or more human faces or human body features in the reference frame image in the same comparison space;
and if, in the comparison result, the one or more human face or human body features in the comparison frame image are the same as the one or more human face or human body features in the reference frame image, or their similarity is greater than a preset value, one or more identical objects exist in the reference frame image and the comparison frame image.
Optionally, the sources of the images to be compared include: identification-photo images and images collected by cameras.
Optionally, the motion information comprises at least one of: time of movement, geographical location of movement.
Optionally, the facial features include at least one of: eye shape, nose shape, mouth shape, eye separation distance, position of five sense organs, face contour.
Optionally, the human body characteristic comprises at least one of: dress, body type, hairstyle, and posture.
The invention also provides an object tracking system for the target area, which comprises:
the device comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring single-frame or multi-frame images in one or more target areas associated with one or more objects;
an object module for determining whether the single or multiple frames of images contain the one or more objects;
the display module is used for displaying a single-frame or multi-frame image containing the one or more objects;
and the tracking module is used for determining the motion information of the one or more objects in the one or more target areas according to the displayed single-frame or multi-frame images.
Optionally, the target region comprises at least one of: residential areas, stations, airports, markets and hospitals.
Optionally, the multi-frame images include one or more sequences of continuous frames and/or a plurality of single-frame images.
Optionally, obtaining features of an object in the single-frame or multi-frame image, where the features include global features and/or local features;
and determining whether the one or more objects are contained in the one or more single-frame or multi-frame images according to the global features and/or the local features.
Optionally, the object comprises a human or an animal.
Optionally, if the object is a human:
the global features include at least one of: human face features, human body features; and/or
the local features include at least one of: human face features, human body features.
Optionally, acquiring a single-frame or multi-frame image containing one or more global features and local features;
inputting a certain frame of image containing the one or more global features and the local features into a hierarchical vectorization model, and acquiring a global feature vector and a local feature vector of the certain frame of image;
and determining whether the one or more objects are contained in the certain frame image according to the global feature vector and the local feature vector of the certain frame image.
Optionally, inputting the certain frame of image containing one or more human faces or human bodies into the layered vectorization model;
dividing the certain frame of image containing one or more human faces or human bodies into one or more image blocks;
extracting local features of each image block, and acquiring a local feature descriptor of each image block according to the local features;
quantizing the local feature descriptors of each image block to generate an image block feature dictionary;
according to the mapping between the image block feature dictionary and the certain frame image, encoding to form a human face or human body feature vector of the certain frame image;
and acquiring the face or human body feature vector of the certain frame of image.
Optionally, acquiring an image to be compared containing the one or more objects; wherein, the images to be compared are at least two single-frame or multi-frame images;
processing the images to be compared, and mapping the images to be compared to the same comparison space through at least two deep neural networks;
and comparing the images to be compared through the same comparison space, and determining whether one or more identical objects exist in the images to be compared.
Optionally, selecting one single-frame or multi-frame image in the images to be compared as a reference frame image, and using the rest single-frame or multi-frame images in the images to be compared as comparison frame images;
inputting the reference frame image and the comparison frame image into at least two deep neural networks respectively, and mapping the comparison frame image and the reference frame image to the same comparison space at the same time;
comparing one or more human faces or human body features in the comparison frame image with one or more human faces or human body features in the reference frame image in the same comparison space;
and if, in the comparison result, the one or more human face or human body features in the comparison frame image are the same as the one or more human face or human body features in the reference frame image, or their similarity is greater than a preset value, one or more identical objects exist in the reference frame image and the comparison frame image.
Optionally, the sources of the images to be compared include: identification-photo images and images collected by cameras.
Optionally, the motion information comprises at least one of: time of movement, geographical location of movement.
Optionally, the facial features include at least one of: eye shape, nose shape, mouth shape, eye separation distance, position of five sense organs, face contour.
Optionally, the human body characteristic comprises at least one of: dress, body type, hairstyle, and posture.
The present invention also provides an object tracking device for a target area, comprising:
acquiring single-frame or multi-frame images in one or more target areas associated with one or more objects;
determining whether the single or multiple frames of images contain one or more objects;
displaying a single or multiple frame image containing one or more objects;
and determining the motion information of the one or more objects in the one or more target areas according to the displayed single-frame or multi-frame images.
The present invention also provides an apparatus comprising:
one or more processors; and
one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform a method as described in one or more of the above.
The present invention also provides one or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause an apparatus to perform the methods as described in one or more of the above.
As described above, the object tracking method, system, device and medium for the target area provided by the present invention have the following beneficial effects: acquiring single-frame or multi-frame images from one or more target areas associated with one or more objects; determining whether the single-frame or multi-frame images contain the one or more objects; displaying the single-frame or multi-frame images containing the one or more objects; and determining motion information of the one or more objects in the one or more target areas according to the displayed single-frame or multi-frame images. If the objects are people, the invention can use the face or body features of different pedestrians in surveillance video to construct the motion tracks of those pedestrians across different target areas, thereby achieving cross-area, cross-camera tracking of different pedestrians.
Drawings
Fig. 1 is a schematic flowchart of an object tracking method for a target area according to an embodiment.
Fig. 2 is a schematic flowchart of an object tracking method for a target area according to another embodiment.
Fig. 3 is a schematic flowchart of an object tracking method for a target area according to yet another embodiment.
Fig. 4 is a schematic hardware structure diagram of an object tracking system for a target area according to an embodiment.
Fig. 5 is a schematic diagram of a hardware structure of a terminal device according to an embodiment.
Fig. 6 is a schematic diagram of a hardware structure of a terminal device according to another embodiment.
Description of the element reference numerals
M10 acquisition module
M20 object module
M30 display module
M40 tracking module
1100 input device
1101 first processor
1102 output device
1103 first memory
1104 communication bus
1200 processing assembly
1201 second processor
1202 second memory
1203 communication assembly
1204 Power supply Assembly
1205 multimedia assembly
1206 voice assembly
1207 input/output interface
1208 sensor assembly
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
Referring to fig. 1, the present invention provides an object tracking method for a target area, including the following steps:
S10, acquiring single-frame or multi-frame images in one or more target areas associated with one or more objects. As an example, the multi-frame images in the embodiments of the present application include one or more sequences of continuous frames (i.e. videos) and a plurality of single-frame images. The single-frame or multi-frame images are collected by image acquisition devices, for example by reusing previously installed network cameras. Collecting one or more videos by reusing existing cameras, compared with installing new cameras, avoids low-voltage wiring modification and fire-safety approval, is simple and convenient to implement, and has no technical threshold.
S20, determining whether the single or multiple frame image contains one or more objects; specifically, the characteristics of an object in the single-frame or multi-frame image are obtained, wherein the characteristics comprise global characteristics and/or local characteristics; and determining whether one or more objects are contained in the one or more single-frame or multi-frame images according to the global features and/or the local features. Wherein the object comprises a human or an animal.
S30, displaying a single or multiple frame image containing one or more objects; wherein, real-time display can be carried out, and pop-up display can also be carried out.
And S40, determining the motion information of the one or more objects in the one or more target areas according to the displayed single-frame or multi-frame images. Wherein the motion information comprises at least one of: time of movement, geographical location of movement.
By the above technical scheme, the method can construct the motion tracks of different objects in different target areas by using the image features of the different objects in different surveillance videos, thereby realizing cross-area, cross-camera tracking of one or more objects.
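To make the flow concrete, the sketch below shows one way steps S10 to S40 could be organized in code. It is a minimal illustration, not the claimed implementation: the Track structure, the match_object callback and the camera-location mapping are assumptions introduced here.

```python
# A minimal sketch of steps S10-S40; Track, match_object and camera_locations
# are illustrative assumptions, not part of the disclosure.
from dataclasses import dataclass, field

@dataclass
class Track:
    object_id: str
    points: list = field(default_factory=list)  # (movement time, movement geographic location)

def track_objects(frames, camera_locations, match_object):
    """frames: iterable of (camera_id, timestamp, image) acquired from the target areas (S10).
    camera_locations: camera_id -> installation position of that camera.
    match_object: image -> iterable of object ids found via global/local features (S20)."""
    tracks = {}
    for camera_id, timestamp, image in frames:
        for object_id in match_object(image):              # S20: the frame contains an object
            # S30 (real-time or pop-up display) is omitted in this sketch.
            track = tracks.setdefault(object_id, Track(object_id))
            # S40: motion information = movement time + movement geographic location
            track.points.append((timestamp, camera_locations[camera_id]))
    return tracks
```

Accumulating one track per object across all cameras in this way is what yields the cross-area, cross-camera motion track described above.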
As shown in fig. 2, in an embodiment, a pedestrian is taken as an object, and the following is explained in detail:
S100, acquiring one or more monitoring videos in one or more target areas. In the present application, one or more videos are collected by reusing previously installed network cameras; compared with installing new cameras, this avoids low-voltage wiring modification and fire-safety approval, is simple and convenient to implement, and has no technical threshold. Pedestrian traffic is usually heavy in residential areas, schools, stations, airports, markets, hospitals and similar places, so a large number of pedestrians can be covered; the one or more target areas in the embodiments of the present application include at least one of: residential areas, schools, stations, airports, markets and hospitals. By collecting the monitoring videos of these target areas, monitoring resources can be saved, and cross-area, cross-border tracking can be achieved with fewer monitoring resources.
S200, determining whether one or more monitoring videos contain one or more pedestrians; specifically, acquiring characteristics of pedestrians in one or more monitoring videos, wherein the characteristics comprise global characteristics and/or local characteristics; and determining whether one or more pedestrians are contained in one or more monitoring videos according to the global features and/or the local features. Wherein the global features include at least one of: human face features, human body features; and/or, the local features include at least one of: human face features, human body features.
Specifically, one or more monitoring videos containing one or more global features and local features are obtained;
inputting a certain frame image containing one or more global features and local features into a layered vectorization model, and acquiring a global feature vector and a local feature vector of the frame image;
and determining whether one or more pedestrians are contained in the certain frame image according to the global feature vector and the local feature vector of the certain frame image.
The layered vectorization model is in effect a multi-layer feature encoding process. A single layer of feature encoding consists of the following steps: first, all images containing human faces or bodies in the picture library are partitioned into blocks; second, local features (such as LBP and SIFT) are extracted from each block to form local feature descriptors; then, all local feature descriptors are quantized to form a dictionary; finally, according to the mapping between the dictionary and the face or body image, a face or human body feature vector of the image is formed by encoding, and this feature vector is defined as the face or body DNA.
As an example, the method for determining whether one or more pedestrians are included in the certain frame of image by using human body or human face features as global features and/or local features specifically includes:
inputting the certain frame of image containing one or more human faces or human bodies into the layered vectorization model;
dividing the frame image containing one or more human faces or human bodies into one or more image blocks;
extracting local features of each image block, and acquiring a local feature descriptor of each image block according to the local features;
quantizing the local feature descriptors of each image block to generate an image block feature dictionary;
according to the mapping between the image block feature dictionary and the frame image, encoding to form a human face or human body feature vector of the frame image;
acquiring the face or human body feature vector of the frame image; and determining whether one or more pedestrians are contained in the certain frame of image according to that face or human body feature vector. As an example, the present application defines the face or human body feature vector of the frame image as face or body DNA. The face or human body feature vector is not affected by interference factors, where the interference factors include at least one of the following: illumination, occlusion, angle, age, race. Specifically, the face features include at least one of: eye shape, nose shape, mouth shape, eye separation distance, positions of the five sense organs, face contour. The human body features include at least one of: dress, body type, hairstyle, and posture. As an example, the global features in the embodiments of the present application may be easily overlooked minor details, some of which occur infrequently, such as a clothing logo or a mole on the face. The local features may be human skeleton key points, human postures, and the like.
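The single-layer encoding described above (block partitioning, local descriptors such as LBP or SIFT, quantization into a dictionary, encoding against that dictionary) reads like a bag-of-visual-words pipeline. The sketch below shows one possible realization under that reading; the use of OpenCV SIFT and scikit-learn k-means, the 4x4 block grid and the 256-word dictionary are assumptions made for illustration and are not prescribed by the disclosure.

```python
# One possible single-layer feature encoding, in the spirit of the steps above.
import numpy as np
import cv2
from sklearn.cluster import KMeans

def split_into_blocks(image, rows=4, cols=4):
    # Divide the face/body image into image blocks (grid size is an assumption).
    h, w = image.shape[:2]
    return [image[r * h // rows:(r + 1) * h // rows, c * w // cols:(c + 1) * w // cols]
            for r in range(rows) for c in range(cols)]

def local_descriptors(image):
    # Extract local features (here SIFT) from each block to form local feature descriptors.
    sift = cv2.SIFT_create()
    descriptors = []
    for block in split_into_blocks(image):
        gray = cv2.cvtColor(block, cv2.COLOR_BGR2GRAY)
        _, desc = sift.detectAndCompute(gray, None)
        if desc is not None:
            descriptors.append(desc)
    return np.vstack(descriptors) if descriptors else np.empty((0, 128), dtype=np.float32)

def build_dictionary(training_images, n_words=256):
    # Quantize all local feature descriptors into an image-block feature dictionary.
    all_desc = np.vstack([local_descriptors(img) for img in training_images])
    return KMeans(n_clusters=n_words, n_init=10).fit(all_desc)

def encode(image, dictionary):
    # Encode the mapping between the dictionary and the image, yielding the
    # face/body feature vector ("face or body DNA" in the text above).
    desc = local_descriptors(image)
    if desc.shape[0] == 0:
        return np.zeros(dictionary.n_clusters)
    words = dictionary.predict(desc.astype(np.float32))
    hist = np.bincount(words, minlength=dictionary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```

A full layered model would repeat this encoding over several layers and may replace the hand-crafted descriptors with learned ones; the single layer above is only meant to make the dictionary-and-encoding idea tangible.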
S300, displaying a single-frame or multi-frame image containing one or more pedestrians; in the embodiment of the application, real-time display and pop-up display can be performed. As an example, when one or more target pedestrians are included in one or more monitoring videos, a corresponding monitoring picture is popped up, displayed and played in real time, and the one or more target pedestrians are tracked until the one or more target pedestrians disappear in the monitoring picture.
S400, determining the motion information of the one or more pedestrians in the one or more target areas according to the displayed single-frame or multi-frame image. Wherein the motion information comprises at least one of: time of movement, geographical location of movement.
In summary, the method acquires one or more monitoring videos containing one or more human faces or bodies in one or more target areas; inputs a certain frame of image containing one or more faces or bodies from the one or more monitoring videos into the layered vectorization model to obtain the face or human body feature vector of that frame; and identifies, according to that feature vector, whether the frame contains the face or body of one or more target pedestrians. The method can thus identify whether a single-frame or multi-frame image contains the face or body of one or more target pedestrians, then determine which image acquisition device the image came from, and generate the motion information of the one or more target pedestrians according to the geographic position of that device, so that the one or more target pedestrians can be tracked across regions and across cameras.
As an example, suppose video shot by 5 cameras in a certain residential area is obtained, with each camera shooting one video segment. Whether a human face or body appears in the 5 video segments is checked manually, the segments in which a face or body appears are cut out and split into individual frames containing face or body images, and each frame image containing a face or body is then input into the layered vectorization model to obtain the face or human body feature vector of each frame image. Whether a certain frame image contains the face or body of one or more target pedestrians is then identified according to the face or human body feature vector of each frame image. Each layer in the layered vectorization model comprises one or more trained deep neural networks, and the deep neural networks are trained on images containing the faces or bodies of target pedestrians. If the face or body of one or more target pedestrians appears in some video segments, the movement time of the one or more target pedestrians is obtained directly from those video segments, the cameras from which the video segments originate are determined, and the movement geographic position of the one or more target pedestrians can be obtained approximately from the installation positions of those cameras, so that cross-camera tracking can be achieved for the one or more target pedestrians. The object in the embodiments of the present application is, for example, a missing child, a criminal suspect, or the like.
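As a companion to this example, the sketch below shows one way per-frame feature vectors could be matched against a target pedestrian's vector and turned into motion information; the cosine similarity measure and the 0.8 threshold are assumptions for illustration, not values taken from the disclosure.

```python
# Hedged sketch: collect (movement time, movement geographic location) records for
# frames whose face/body feature vector matches the target's vector.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def locate_target(frame_vectors, target_vector, camera_locations, threshold=0.8):
    """frame_vectors: iterable of (camera_id, timestamp, feature_vector)."""
    motion_info = []
    for camera_id, timestamp, vector in frame_vectors:
        if cosine_similarity(vector, target_vector) >= threshold:
            # Movement time comes from the video segment; the geographic location
            # comes from the installation position of the camera that produced it.
            motion_info.append((timestamp, camera_locations[camera_id]))
    return sorted(motion_info)
```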
In the embodiment of the application, if the object is an animal body, the tracking method is consistent with that of a pedestrian; specific functions and technical effects can be obtained by referring to the above embodiments, which are not described herein again.
As shown in fig. 3, in another embodiment, a pedestrian is taken as an object, and the following is explained in detail:
S500, acquiring images to be compared containing one or more pedestrians; the images to be compared are at least two single-frame or multi-frame images, and their sources include identification-photo images and images collected by cameras.
S600, processing the images to be compared, and mapping the images to be compared to the same comparison space through at least two deep neural networks;
s700, comparing the images to be compared through the same comparison space, and determining whether one or more same pedestrians exist in the images to be compared.
Specifically, one single-frame or multi-frame image in the images to be compared is selected as a reference frame image, and the rest single-frame or multi-frame images in the images to be compared are used as comparison frame images;
inputting the reference frame image and the comparison frame image into at least two deep neural networks respectively, and mapping the comparison frame image and the reference frame image to the same comparison space at the same time;
comparing one or more human faces or human body features in the comparison frame image with one or more human faces or human body features in the reference frame image in the same comparison space;
and if the one or more human face or human body features in the comparison frame image are the same as the one or more human face or human body features in the reference frame image or the similarity of the one or more human face or human body features in the comparison frame image is greater than a preset value, one or more same pedestrians exist in the reference frame image and the comparison frame image.
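A minimal sketch of the two-network comparison described in these steps, assuming PyTorch: one branch embeds the reference frame image (for example an identification photo) and the other embeds a comparison frame image, both into the same comparison space, and a similarity threshold stands in for the preset value. The tiny CNN branches and the 0.7 threshold are placeholders for the trained deep neural networks, not the disclosed architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_branch(embed_dim=128):
    # Stand-in for one trained deep neural network; a real system would use a
    # trained face-recognition or person re-identification backbone.
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(32, embed_dim),
    )

reference_net = make_branch()   # maps reference frame images (e.g. identification photos)
comparison_net = make_branch()  # maps comparison frame images (camera captures)

def same_pedestrian(reference_img, comparison_img, threshold=0.7):
    """Both inputs: float tensors of shape (1, 3, H, W). Returns True when the
    embeddings in the shared comparison space exceed the preset similarity."""
    with torch.no_grad():
        ref = F.normalize(reference_net(reference_img), dim=1)
        cand = F.normalize(comparison_net(comparison_img), dim=1)
    return float((ref * cand).sum()) >= threshold
```

Because the two branches are separate networks mapped into one space, the same comparison works when the reference is an identification photo and the comparison frame comes from a surveillance camera.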
In the embodiment of the present application, if one or more identical pedestrians exist in the reference frame image and the comparison frame images, and one or more target pedestrians exist in the reference frame image, it can be determined that the one or more target pedestrians also exist in the remaining single-frame or multi-frame images. By finding the corresponding monitoring videos or video clips, the movement time of the one or more target pedestrians is obtained directly from those videos or clips; the cameras from which the monitoring videos or clips originate are then determined, and the movement geographic positions of the one or more target pedestrians can be obtained approximately from the installation positions of those cameras, so that the one or more target pedestrians can be tracked across areas and across cameras.
As an example, suppose video captured by 15 cameras in a hospital is obtained, with each camera capturing one video segment. Whether a human face or body appears in the 15 segments is checked manually, the segments in which a face or body appears are cut out and split into individual frames containing face or body images, and each such frame image is then input into at least two deep neural networks, so that one frame image containing the one or more faces or bodies and another frame image containing the one or more faces or bodies are mapped into the same comparison space at the same time. The one or more faces or bodies in the two frame images are compared in the comparison space, and whether one or more identical faces or bodies exist in the two frame images is determined according to the comparison result. If one or more identical faces or bodies exist across the video segments, and those identical faces or bodies include the face or body of one or more target pedestrians, each frame image containing the face or body of the one or more target pedestrians is acquired from the one or more videos, and the motion information of the one or more target pedestrians is determined according to those frame images. The motion information comprises at least one of: time of movement, geographical location of movement. The deep neural networks are trained networks, trained on images containing the faces or bodies of target pedestrians. If the face or body of one or more target pedestrians appears in some video segments, the movement time of the one or more target pedestrians is obtained directly from those segments, the cameras from which the segments originate are determined, and the movement geographic position of the one or more target pedestrians can be obtained approximately from the installation positions of those cameras, so that cross-camera tracking can be achieved for the one or more target pedestrians. The target pedestrian in the embodiments of the present application is, for example, a doctor, a patient, a ticket vendor, or the like.
In the embodiment of the application, if the object is an animal body, the tracking method is consistent with that of a pedestrian; specific functions and technical effects can be obtained by referring to the above embodiments, which are not described herein again.
The invention provides an object tracking method for target areas, which comprises: acquiring single-frame or multi-frame images from one or more target areas associated with one or more objects; determining whether the single-frame or multi-frame images contain the one or more objects; displaying the single-frame or multi-frame images containing the one or more objects; and determining motion information of the one or more objects in the one or more target areas according to the displayed single-frame or multi-frame images. If the objects are people, the invention can use the face or body features of different pedestrians in surveillance video to construct the motion tracks of those pedestrians across different target areas, thereby achieving cross-area, cross-camera tracking of different pedestrians. The method can reuse the network cameras of an existing video monitoring system, avoiding low-voltage wiring modification and fire-safety approval; it is simple and convenient to implement and has no technical threshold. Meanwhile, by reusing existing network cameras, 5 channels of 1080P@30FPS video streams can be analyzed in real time, the H.264/H.265 video coding formats are supported, the maximum video bit rate is 200 Mbps, and the maximum resolution is 3840 × 2160.
As shown in fig. 4, the present invention further provides an object tracking system for a target area, including:
An obtaining module M10, configured to obtain single-frame or multi-frame images in one or more target areas associated with one or more objects. As an example, the multi-frame images in the embodiments of the present application include one or more sequences of continuous frames (i.e. videos) and a plurality of single-frame images. The single-frame or multi-frame images are collected by image acquisition devices, for example by reusing previously installed network cameras; compared with installing new cameras, this avoids low-voltage wiring modification and fire-safety approval, is simple and convenient to implement, and has no technical threshold.
An object module M20 for determining whether the single or multiple frame image contains one or more objects; specifically, the characteristics of an object in the single-frame or multi-frame image are obtained, wherein the characteristics comprise global characteristics and/or local characteristics; and determining whether one or more objects are contained in the one or more single-frame or multi-frame images according to the global features and/or the local features. Wherein the object comprises a human or an animal.
A display module M30 for displaying a single or multiple frame image containing one or more objects; wherein, real-time display can be carried out, and pop-up display can also be carried out.
A tracking module M40, configured to determine motion information of the one or more objects in the one or more target areas according to the displayed single-frame or multi-frame image. Wherein the motion information comprises at least one of: time of movement, geographical location of movement.
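For orientation, a skeleton of how the four modules M10 to M40 might be composed is given below; the class and method names are assumptions made for this sketch rather than interfaces defined by the disclosure.

```python
# Illustrative skeleton of the system modules M10-M40; all interfaces are assumed.
class AcquisitionModule:               # M10
    def acquire(self, target_areas):
        """Return single-frame or multi-frame images from the target areas."""
        raise NotImplementedError

class ObjectModule:                    # M20
    def find_objects(self, image):
        """Return the objects found via global and/or local features."""
        raise NotImplementedError

class DisplayModule:                   # M30
    def show(self, image):
        """Display the image in real time or as a pop-up."""
        raise NotImplementedError

class TrackingModule:                  # M40
    def update(self, object_ids, timestamp, location):
        """Accumulate movement time and geographic location per object."""
        raise NotImplementedError

class TargetAreaTrackingSystem:
    def __init__(self, acquisition, objects, display, tracking):
        self.acquisition, self.objects = acquisition, objects
        self.display, self.tracking = display, tracking
```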
In one embodiment, the detailed description is given with reference to a pedestrian:
An obtaining module M10, configured to obtain one or more surveillance videos in one or more target areas. In the present application, one or more videos are collected by reusing previously installed network cameras; compared with installing new cameras, this avoids low-voltage wiring modification and fire-safety approval, is simple and convenient to implement, and has no technical threshold. Pedestrian traffic is usually heavy in residential areas, schools, stations, airports, markets, hospitals and similar places, so a large number of pedestrians can be covered; the one or more target areas in the embodiments of the present application include at least one of: residential areas, schools, stations, airports, markets and hospitals. By collecting the monitoring videos of these target areas, monitoring resources can be saved, and cross-area, cross-border tracking can be achieved with fewer monitoring resources.
An object module M20 for determining whether one or more surveillance videos contain one or more pedestrians; specifically, acquiring characteristics of pedestrians in one or more monitoring videos, wherein the characteristics comprise global characteristics and/or local characteristics; and determining whether one or more pedestrians are contained in one or more monitoring videos according to the global features and/or the local features. Wherein the global features include at least one of: human face features, human body features; and/or, the local features include at least one of: human face features, human body features.
Specifically, one or more monitoring videos containing one or more global features and local features are obtained;
inputting a certain frame image containing one or more global features and local features into a layered vectorization model, and acquiring a global feature vector and a local feature vector of the frame image;
and determining whether one or more pedestrians are contained in the certain frame image according to the global feature vector and the local feature vector of the certain frame image.
The layered vectorization model is in effect a multi-layer feature encoding process. A single layer of feature encoding consists of the following steps: first, all images containing human faces or bodies in the picture library are partitioned into blocks; second, local features (such as LBP and SIFT) are extracted from each block to form local feature descriptors; then, all local feature descriptors are quantized to form a dictionary; finally, according to the mapping between the dictionary and the face or body image, a face or human body feature vector of the image is formed by encoding, and this feature vector is defined as the face or body DNA.
As an example, the method for determining whether one or more pedestrians are included in the certain frame of image by using human body or human face features as global features and/or local features specifically includes:
inputting the certain frame of image containing one or more human faces or human bodies into the layered vectorization model;
dividing the frame image containing one or more human faces or human bodies into one or more image blocks;
extracting local features of each image block, and acquiring a local feature descriptor of each image block according to the local features;
quantizing the local feature descriptors of each image block to generate an image block feature dictionary;
according to the mapping between the image block feature dictionary and the frame image, encoding to form a human face or human body feature vector of the frame image;
acquiring the face or human body feature vector of the frame image; and determining whether one or more pedestrians are contained in the certain frame of image according to that face or human body feature vector. As an example, the present application defines the face or human body feature vector of the frame image as face or body DNA. The face or human body feature vector is not affected by interference factors, where the interference factors include at least one of the following: illumination, occlusion, angle, age, race. Specifically, the face features include at least one of: eye shape, nose shape, mouth shape, eye separation distance, positions of the five sense organs, face contour. The human body features include at least one of: dress, body type, hairstyle, and posture. As an example, the global features in the embodiments of the present application may be easily overlooked minor details, some of which occur infrequently, such as a clothing logo or a mole on the face. The local features may be human skeleton key points, human postures, and the like.
A display module M30 for displaying a single or multiple frame image containing one or more pedestrians; in the embodiment of the application, real-time display and pop-up display can be performed. As an example, when one or more target pedestrians are included in one or more monitoring videos, a corresponding monitoring picture is popped up, displayed and played in real time, and the one or more target pedestrians are tracked until the one or more target pedestrians disappear in the monitoring picture.
A tracking module M40, configured to determine motion information of the one or more pedestrians in the one or more target areas according to the displayed single-frame or multi-frame image. Wherein the motion information comprises at least one of: time of movement, geographical location of movement.
In summary, the system acquires single-frame or multi-frame images containing one or more human faces or bodies; inputs a certain frame of image containing one or more faces or bodies into the layered vectorization model to obtain the face or human body feature vector of that frame; and identifies, according to that feature vector, whether the frame contains the face or body of one or more target pedestrians. The system can thus identify whether a single-frame or multi-frame image contains the face or body of one or more target pedestrians, then determine which image acquisition device the image came from, and generate the motion information of the one or more target pedestrians according to the geographic position of that device, so that the one or more target pedestrians can be tracked across regions and across cameras.
As an example, suppose videos shot by 8 cameras in a certain residential area are obtained, with each camera shooting three video segments. Whether a human face or body appears in the 24 video segments is checked manually, the segments in which a face or body appears are cut out and split into individual frames containing face or body images, and each frame image containing a face or body is then input into the layered vectorization model to obtain the face or human body feature vector of each frame image. Whether a certain frame image contains the face or body of one or more target pedestrians is then identified according to the face or human body feature vector of each frame image. Each layer in the layered vectorization model comprises one or more trained deep neural networks, and the deep neural networks are trained on images containing the faces or bodies of target pedestrians. If the face or body of one or more target pedestrians appears in some video segments, the movement time of the one or more target pedestrians is obtained directly from those segments, the cameras from which the segments originate are determined, and the movement geographic position of the one or more target pedestrians can be obtained approximately from the installation positions of those cameras, so that cross-camera tracking can be achieved for the one or more target pedestrians. The object in the embodiments of the present application is, for example, a missing child, a criminal suspect, or the like.
In the embodiment of the application, if the object is an animal body, the tracking method is consistent with that of a pedestrian; specific functions and technical effects can be obtained by referring to the above embodiments, which are not described herein again.
In another embodiment, a pedestrian is taken as an object, and the following is explained in detail:
Acquiring images to be compared containing one or more pedestrians; the images to be compared are at least two single-frame or multi-frame images, and their sources include identification-photo images and images collected by cameras.
Processing the images to be compared, and mapping the images to be compared to the same comparison space through at least two deep neural networks;
and comparing the images to be compared through the same comparison space, and determining whether one or more same pedestrians exist in the images to be compared.
Selecting one single-frame or multi-frame image in the images to be compared as a reference frame image, and using the rest single-frame or multi-frame images in the images to be compared as comparison frame images;
inputting the reference frame image and the comparison frame image into at least two deep neural networks respectively, and mapping the comparison frame image and the reference frame image to the same comparison space at the same time;
comparing one or more human faces or human body features in the comparison frame image with one or more human faces or human body features in the reference frame image in the same comparison space;
and if the one or more human face or human body features in the comparison frame image are the same as the one or more human face or human body features in the reference frame image or the similarity of the one or more human face or human body features in the comparison frame image is greater than a preset value, one or more same pedestrians exist in the reference frame image and the comparison frame image.
In the embodiment of the present application, if one or more identical pedestrians exist in the reference frame image and the comparison frame images, and one or more target pedestrians exist in the reference frame image, it can be determined that the one or more target pedestrians also exist in the remaining single-frame or multi-frame images. By finding the corresponding monitoring videos or video clips, the movement time of the one or more target pedestrians is obtained directly from those videos or clips; the cameras from which the monitoring videos or clips originate are then determined, and the movement geographic positions of the one or more target pedestrians can be obtained approximately from the installation positions of those cameras, so that the one or more target pedestrians can be tracked across areas and across cameras.
As an example, suppose video captured by 10 cameras in a certain hospital is obtained, with each camera capturing two video segments. Whether a human face or body appears in the 20 segments is checked manually, the segments in which a face or body appears are cut out and split into individual frames containing face or body images, and each such frame image is then input into at least two deep neural networks, so that one frame image containing the one or more faces or bodies and another frame image containing the one or more faces or bodies are mapped into the same comparison space at the same time. The one or more faces or bodies in the two frame images are compared in the comparison space, and whether one or more identical faces or bodies exist in the two frame images is determined according to the comparison result. If one or more identical faces or bodies exist across the video segments, and those identical faces or bodies include the face or body of one or more target pedestrians, each frame image containing the face or body of the one or more target pedestrians is acquired from the one or more videos, and the motion information of the one or more target pedestrians is determined according to those frame images. The motion information comprises at least one of: time of movement, geographical location of movement. The deep neural networks are trained networks, trained on images containing the faces or bodies of target pedestrians. If the face or body of one or more target pedestrians appears in some video segments, the movement time of the one or more target pedestrians is obtained directly from those segments, the cameras from which the segments originate are determined, and the movement geographic position of the one or more target pedestrians can be obtained approximately from the installation positions of those cameras, so that cross-camera tracking can be achieved for the one or more target pedestrians. The target pedestrian in the embodiments of the present application is, for example, a doctor, a patient, a ticket vendor, or the like.
In the embodiment of the application, if the object is an animal body, the tracking method is the same as that for a pedestrian; for specific functions and technical effects, reference may be made to the above embodiments, which are not repeated here.
The invention provides an object tracking system for a target area: an acquisition module acquires single-frame or multi-frame images in one or more target areas associated with one or more objects; an object module determines whether the single-frame or multi-frame images contain the one or more objects; a display module displays the single-frame or multi-frame images containing the one or more objects; and a tracking module determines motion information of the one or more objects in the one or more target areas according to the displayed single-frame or multi-frame images. If the object is a person, the invention can use the face or body features of different pedestrians in surveillance video to construct the motion tracks of different pedestrians in different target areas, thereby achieving cross-region and cross-lens tracking of different pedestrians. The system can reuse the network cameras of an already-built video surveillance system, avoiding weak-current line reconstruction and fire-control approval; it is simple to deploy and presents no technical threshold. Meanwhile, by reusing the existing network cameras, five 1080P@30FPS video streams can be analyzed in real time; the H.264/H.265 video coding formats are supported, with a maximum video bit rate of 200 Mbps and a maximum resolution of 3840 × 2160.
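As a further non-limiting sketch (not part of the original disclosure), the four modules of the system might be organized as below; all class and field names are hypothetical stand-ins, and the detection and tracking bodies are stubs:

from typing import Iterable, List

class AcquisitionModule:
    def acquire(self, camera_ids: Iterable[str]) -> List[dict]:
        # Acquire single-frame or multi-frame images from the target areas (stub).
        return [{"camera_id": cid, "image": None, "timestamp": 0.0} for cid in camera_ids]

class ObjectModule:
    def contains_objects(self, frame: dict) -> bool:
        # Decide whether the frame contains one or more objects, e.g. via
        # global/local face or body features (stub: reads a precomputed flag).
        return bool(frame.get("has_object"))

class DisplayModule:
    def show(self, frame: dict) -> None:
        print("displaying frame from", frame["camera_id"])

class TrackingModule:
    def motion_info(self, frame: dict) -> dict:
        # Movement time from the frame, movement location from the camera position (stub).
        return {"time": frame.get("timestamp"), "location": frame.get("camera_position")}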
An embodiment of the present application further provides an object tracking apparatus for a target area, including:
acquiring single-frame or multi-frame images in one or more target areas associated with one or more objects;
determining whether the single or multiple frames of images contain one or more objects;
displaying a single or multiple frame image containing one or more objects;
and determining the motion information of the one or more objects in the one or more target areas according to the displayed single-frame or multi-frame images.
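As with the sketches above, the following self-contained fragment (not part of the original disclosure) merely illustrates one possible ordering of the four steps of the apparatus; the stand-in callables and the toy frames are hypothetical:

from typing import Callable, Dict, List

def track_objects(frames: List[Dict],
                  contains_object: Callable[[Dict], bool],
                  display: Callable[[Dict], None],
                  motion_info: Callable[[Dict], Dict]) -> List[Dict]:
    # Step 2: determine which acquired frames contain one or more objects.
    with_objects = [f for f in frames if contains_object(f)]
    # Step 3: display the qualifying frames.
    for f in with_objects:
        display(f)
    # Step 4: determine motion information from the displayed frames.
    return [motion_info(f) for f in with_objects]

# Toy usage (all values hypothetical); step 1 (acquisition) is represented by `frames`.
frames = [{"id": 1, "has_person": True, "t": 0.0}, {"id": 2, "has_person": False, "t": 1.0}]
result = track_objects(
    frames,
    contains_object=lambda f: f["has_person"],
    display=lambda f: print("showing frame", f["id"]),
    motion_info=lambda f: {"time": f["t"], "location": None},
)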
In this embodiment, the data processing apparatus executes the above system or method; for specific functions and technical effects, reference may be made to the above embodiments, which are not repeated here.
An embodiment of the present application further provides an apparatus, which may include: one or more processors; and one or more machine readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the method of fig. 1. In practical applications, the device may serve as a terminal device or as a server; examples of the terminal device may include: a smart phone, a tablet computer, an electronic book reader, an MP3 (MPEG-1 Audio Layer III) player, an MP4 player, a laptop, a vehicle-mounted computer, a desktop computer, a set-top box, a smart television, a wearable device, and the like.
Embodiments of the present application also provide a non-transitory readable storage medium, where one or more modules (programs) are stored in the storage medium; when the one or more modules are applied to a device, the device may be caused to execute the instructions of the steps of the method in fig. 1 according to the embodiments of the present application.
Fig. 5 is a schematic diagram of a hardware structure of a terminal device according to an embodiment of the present application. As shown, the terminal device may include: an input device 1100, a first processor 1101, an output device 1102, a first memory 1103, and at least one communication bus 1104. The communication bus 1104 is used to implement communication connections between the elements. The first memory 1103 may include a high-speed RAM memory, and may also include a non-volatile storage NVM, such as at least one disk memory, and the first memory 1103 may store various programs for performing various processing functions and implementing the method steps of the present embodiment.
Alternatively, the first processor 1101 may be, for example, a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components, and the first processor 1101 is coupled to the input device 1100 and the output device 1102 through a wired or wireless connection.
Optionally, the input device 1100 may include a variety of input devices, for example at least one of a user-oriented user interface, a device-oriented device interface, a software-programmable interface, a camera, and a sensor. Optionally, the device-oriented device interface may be a wired interface or a hardware plug-in interface (e.g., a USB interface, a serial port, etc.) for data transmission between devices; optionally, the user-oriented user interface may be, for example, user-oriented control keys, a voice input device for receiving voice input, and a touch sensing device (e.g., a touch screen or touch pad with a touch sensing function) for receiving user touch input; optionally, the software-programmable interface may be, for example, an entry for a user to edit or modify a program, such as an input pin interface or an input interface of a chip. The output device 1102 may include output devices such as a display and audio.
In this embodiment, the processor of the terminal device includes functions for executing the modules of the object tracking apparatus described above; specific functions and technical effects may refer to the above embodiments and are not repeated here.
Fig. 6 is a schematic hardware structure diagram of a terminal device according to an embodiment of the present application. FIG. 6 is a specific embodiment of the implementation of FIG. 5. As shown, the terminal device of the present embodiment may include a second processor 1201 and a second memory 1202.
The second processor 1201 executes the computer program code stored in the second memory 1202 to implement the method described in fig. 1 in the above embodiment.
The second memory 1202 is configured to store various types of data to support operations at the terminal device. Examples of such data include instructions for any application or method operating on the terminal device, such as messages, pictures, videos, and so forth. The second memory 1202 may include a Random Access Memory (RAM) and may also include a non-volatile memory (non-volatile memory), such as at least one disk memory.
Optionally, a second processor 1201 is provided in the processing component 1200. The terminal device may further include: a communication component 1203, a power component 1204, a multimedia component 1205, a voice component 1206, an input/output interface 1207, and/or a sensor component 1208. The specific components included in the terminal device are set according to actual requirements, which is not limited in this embodiment.
The processing component 1200 generally controls the overall operation of the terminal device. The processing component 1200 may include one or more second processors 1201 to execute instructions so as to perform all or part of the steps of the data processing method described above. Further, the processing component 1200 may include one or more modules that facilitate interaction between the processing component 1200 and other components; for example, a multimedia module to facilitate interaction between the multimedia component 1205 and the processing component 1200.
The power supply component 1204 provides power to the various components of the terminal device. The power components 1204 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the terminal device.
The multimedia components 1205 include a display screen that provides an output interface between the terminal device and the user. In some embodiments, the display screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the display screen includes a touch panel, the display screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
The voice component 1206 is configured to output and/or input voice signals. For example, the voice component 1206 includes a Microphone (MIC) configured to receive external voice signals when the terminal device is in an operational mode, such as a voice recognition mode. The received speech signal may further be stored in the second memory 1202 or transmitted via the communication component 1203. In some embodiments, the speech component 1206 further comprises a speaker for outputting speech signals.
The input/output interface 1207 provides an interface between the processing component 1200 and peripheral interface modules, which may be click wheels, buttons, etc. These buttons may include, but are not limited to: a volume button, a start button, and a lock button.
The sensor component 1208 includes one or more sensors for providing various aspects of status assessment for the terminal device. For example, the sensor component 1208 may detect an open/closed state of the terminal device, relative positioning of the components, presence or absence of user contact with the terminal device. The sensor assembly 1208 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, including detecting the distance between the user and the terminal device. In some embodiments, the sensor assembly 1208 may also include a camera or the like.
The communication component 1203 is configured to facilitate communications between the terminal device and other devices in a wired or wireless manner. The terminal device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one embodiment, the terminal device may include a SIM card slot therein for inserting a SIM card therein, so that the terminal device may log onto a GPRS network to establish communication with the server via the internet.
As can be seen from the above, the communication component 1203, the voice component 1206, the input/output interface 1207 and the sensor component 1208 referred to in the embodiment of fig. 6 can be implemented as the input device in the embodiment of fig. 5.
The foregoing embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical idea disclosed by the present invention shall still be covered by the claims of the present invention.
Claims (31)
1. An object tracking method for a target area, comprising the steps of:
acquiring single-frame or multi-frame images in one or more target areas associated with one or more objects;
determining whether the single or multiple frames of images contain the one or more objects;
displaying a single-frame or multi-frame image containing the one or more objects;
and determining the motion information of the one or more objects in the one or more target areas according to the displayed single-frame or multi-frame images.
2. The object tracking method for a target region according to claim 1, wherein the target region comprises at least one of: residential areas, stations, airports, markets and hospitals.
3. The object tracking method for the target area according to claim 1, wherein the multi-frame images comprise one or more continuous frame images and/or a plurality of single-frame images.
4. The method for tracking the object in the target area according to claim 1, wherein the features of the object in the single-frame or multi-frame image are obtained, and the features comprise global features and/or local features;
and determining whether the one or more objects are contained in the one or more single-frame or multi-frame images according to the global features and/or the local features.
5. The method of object tracking of a target region of claim 4, wherein the object comprises a human being or an animal.
6. The method of claim 5, wherein, if the object is a person:
the global features include at least one of: human face features, human body features; and/or
the local features include at least one of: human face features, human body features.
7. The object tracking method for a target region according to any one of claims 4 to 6,
acquiring a single-frame or multi-frame image containing one or more global features and local features;
inputting a certain frame of image containing the one or more global features and the local features into a hierarchical vectorization model, and acquiring a global feature vector and a local feature vector of the certain frame of image;
and determining whether the one or more objects are contained in the certain frame image according to the global feature vector and the local feature vector of the certain frame image.
8. The method for tracking the object in the target region according to claim 7, wherein the certain frame of image containing one or more human faces or human bodies is input into the hierarchical vectorization model;
dividing the certain frame of image containing one or more human faces or human bodies into one or more image blocks;
extracting local features of each image block, and acquiring a local feature descriptor of each image block according to the local features;
quantizing the local feature descriptors of each image block to generate an image block feature dictionary;
according to the mapping between the image block feature dictionary and the certain frame image, encoding to form a human face or human body feature vector of the certain frame image;
and acquiring the face or human body feature vector of the certain frame of image.
9. The object tracking method for a target region according to claim 1,
acquiring an image to be compared containing the one or more objects; wherein, the images to be compared are at least two single-frame or multi-frame images;
processing the images to be compared, and mapping the images to be compared to the same comparison space through at least two deep neural networks;
and comparing the images to be compared through the same comparison space, and determining whether one or more identical objects exist in the images to be compared.
10. The object tracking method for a target region according to claim 9,
selecting one single-frame or multi-frame image in the images to be compared as a reference frame image, and using the rest single-frame or multi-frame images in the images to be compared as comparison frame images;
inputting the reference frame image and the comparison frame image into at least two deep neural networks respectively, and mapping the comparison frame image and the reference frame image to the same comparison space at the same time;
comparing one or more human faces or human body features in the comparison frame image with one or more human faces or human body features in the reference frame image in the same comparison space;
and if, according to the comparison result, the one or more human face or human body features in the comparison frame image are the same as those in the reference frame image, or their similarity is greater than a preset value, one or more identical objects exist in the reference frame image and the comparison frame image.
11. The method of object tracking of a target region according to claim 9 or 10, wherein the source of the images to be compared comprises: an identification photograph image and an image captured by a camera.
12. The object tracking method of a target region according to claim 1, wherein the motion information includes at least one of: time of movement, geographical location of movement.
13. The method of claim 6, wherein the facial features include at least one of: eye shape, nose shape, mouth shape, eye separation distance, position of five sense organs, face contour.
14. The object tracking method for the target area according to claim 6, wherein the human body features include at least one of: dress, body type, hairstyle, and posture.
15. An object tracking system for a target area, comprising:
the device comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring single-frame or multi-frame images in one or more target areas associated with one or more objects;
an object module for determining whether the single or multiple frames of images contain the one or more objects;
the display module is used for displaying a single-frame or multi-frame image containing the one or more objects;
and the tracking module is used for determining the motion information of the one or more objects in the one or more target areas according to the displayed single-frame or multi-frame images.
16. The object tracking system of claim 15, wherein the target region comprises at least one of: residential areas, stations, airports, markets and hospitals.
17. The object tracking system of a target region of claim 15, wherein the multi-frame images comprise one or more continuous frame images and/or a plurality of single-frame images.
18. The object tracking system for the target region according to claim 15, wherein the features of the object in the single or multiple frames of images are obtained, the features including global features and/or local features;
and determining whether the one or more objects are contained in the one or more single-frame or multi-frame images according to the global features and/or the local features.
19. The object tracking system of claim 18, wherein the object comprises a human or an animal.
20. The object tracking system for a target area of claim 19, wherein, if the object is a person:
the global features include at least one of: human face features, human body features; and/or
the local features include at least one of: human face features, human body features.
21. The object tracking system for a target region of any of claims 18 to 20,
acquiring a single-frame or multi-frame image containing one or more global features and local features;
inputting a certain frame of image containing the one or more global features and the local features into a hierarchical vectorization model, and acquiring a global feature vector and a local feature vector of the certain frame of image;
and determining whether the one or more objects are contained in the certain frame image according to the global feature vector and the local feature vector of the certain frame image.
22. The object tracking system for the target region according to claim 21, wherein the certain frame of image containing one or more human faces or human bodies is input into the hierarchical vectorization model;
dividing the certain frame of image containing one or more human faces or human bodies into one or more image blocks;
extracting local features of each image block, and acquiring a local feature descriptor of each image block according to the local features;
quantizing the local feature descriptors of each image block to generate an image block feature dictionary;
according to the mapping between the image block feature dictionary and the certain frame image, encoding to form a human face or human body feature vector of the certain frame image;
and acquiring the face or human body feature vector of the certain frame of image.
23. The object tracking system for a target region of claim 15,
acquiring an image to be compared containing the one or more objects; wherein, the images to be compared are at least two single-frame or multi-frame images;
processing the images to be compared, and mapping the images to be compared to the same comparison space through at least two deep neural networks;
and comparing the images to be compared through the same comparison space, and determining whether one or more identical objects exist in the images to be compared.
24. The object tracking system for a target region of claim 23,
selecting one single-frame or multi-frame image in the images to be compared as a reference frame image, and using the rest single-frame or multi-frame images in the images to be compared as comparison frame images;
inputting the reference frame image and the comparison frame image into at least two deep neural networks respectively, and mapping the comparison frame image and the reference frame image to the same comparison space at the same time;
comparing one or more human faces or human body features in the comparison frame image with one or more human faces or human body features in the reference frame image in the same comparison space;
and if, according to the comparison result, the one or more human face or human body features in the comparison frame image are the same as those in the reference frame image, or their similarity is greater than a preset value, one or more identical objects exist in the reference frame image and the comparison frame image.
25. The object tracking system of claim 23 or 24, wherein the source of the images to be compared comprises: an identification photograph image and an image captured by a camera.
26. The object tracking system of a target region of claim 15, wherein the motion information includes at least one of: time of movement, geographical location of movement.
27. The object tracking system for a target region of claim 20, wherein the facial features include at least one of: eye shape, nose shape, mouth shape, eye separation distance, position of five sense organs, face contour.
28. The object tracking system of a target region of claim 20, wherein the human body features include at least one of: dress, body type, hairstyle, and posture.
29. An object tracking apparatus for a target area, comprising:
acquiring single-frame or multi-frame images in one or more target areas associated with one or more objects;
determining whether the single or multiple frames of images contain the one or more objects;
displaying a single-frame or multi-frame image containing the one or more objects;
and determining the motion information of the one or more objects in the one or more target areas according to the displayed single-frame or multi-frame images.
30. An apparatus, comprising:
one or more processors; and
one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the method recited by one or more of claims 1-14.
31. One or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause an apparatus to perform the method recited by one or more of claims 1-14.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010119633.XA CN111340848A (en) | 2020-02-26 | 2020-02-26 | Object tracking method, system, device and medium for target area |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010119633.XA CN111340848A (en) | 2020-02-26 | 2020-02-26 | Object tracking method, system, device and medium for target area |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111340848A (en) | 2020-06-26 |
Family
ID=71187059
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010119633.XA Pending CN111340848A (en) | 2020-02-26 | 2020-02-26 | Object tracking method, system, device and medium for target area |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111340848A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112215204A (en) * | 2020-11-05 | 2021-01-12 | 成都体育学院 | A method and system for analyzing human motion state information |
CN112580543A (en) * | 2020-12-24 | 2021-03-30 | 四川云从天府人工智能科技有限公司 | Behavior recognition method, system and device |
CN112906452A (en) * | 2020-12-10 | 2021-06-04 | 叶平 | Automatic identification, tracking and statistics method and system for antelope buffalo deer |
CN113794861A (en) * | 2021-09-10 | 2021-12-14 | 王平 | Monitoring system and monitoring method based on big data network |
CN114202717A (en) * | 2021-10-18 | 2022-03-18 | 北京贝思科技术有限公司 | Single-camera target tracking method and device based on non-identity information and electronic equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102469303A (en) * | 2010-11-12 | 2012-05-23 | 索尼公司 | Video surveillance |
US20170148174A1 (en) * | 2015-11-20 | 2017-05-25 | Electronics And Telecommunications Research Institute | Object tracking method and object tracking apparatus for performing the method |
CN108875588A (en) * | 2018-05-25 | 2018-11-23 | 武汉大学 | Across camera pedestrian detection tracking based on deep learning |
CN110059673A (en) * | 2019-05-05 | 2019-07-26 | 重庆中科云从科技有限公司 | A kind of recognition of face premises automation test macro and method |
2020-02-26: CN application CN202010119633.XA filed; publication CN111340848A (en); status: Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102469303A (en) * | 2010-11-12 | 2012-05-23 | 索尼公司 | Video surveillance |
US20170148174A1 (en) * | 2015-11-20 | 2017-05-25 | Electronics And Telecommunications Research Institute | Object tracking method and object tracking apparatus for performing the method |
CN108875588A (en) * | 2018-05-25 | 2018-11-23 | 武汉大学 | Across camera pedestrian detection tracking based on deep learning |
CN110059673A (en) * | 2019-05-05 | 2019-07-26 | 重庆中科云从科技有限公司 | A kind of recognition of face premises automation test macro and method |
Non-Patent Citations (1)
Title |
---|
LI JIN et al.: "Research on key technologies of face recognition based on a double-layer heterogeneous deep neural network model" (基于双层异构深度神经网络模型的人脸识别关键技术研究), Telecom Engineering Technics and Standardization (电信工程技术与标准化) *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112215204A (en) * | 2020-11-05 | 2021-01-12 | 成都体育学院 | A method and system for analyzing human motion state information |
CN112906452A (en) * | 2020-12-10 | 2021-06-04 | 叶平 | Automatic identification, tracking and statistics method and system for antelope buffalo deer |
CN112580543A (en) * | 2020-12-24 | 2021-03-30 | 四川云从天府人工智能科技有限公司 | Behavior recognition method, system and device |
CN112580543B (en) * | 2020-12-24 | 2024-04-16 | 四川云从天府人工智能科技有限公司 | Behavior recognition method, system and device |
CN113794861A (en) * | 2021-09-10 | 2021-12-14 | 王平 | Monitoring system and monitoring method based on big data network |
CN114202717A (en) * | 2021-10-18 | 2022-03-18 | 北京贝思科技术有限公司 | Single-camera target tracking method and device based on non-identity information and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110163048B (en) | Hand key point recognition model training method, hand key point recognition method and hand key point recognition equipment | |
CN112991656B (en) | Human body abnormal behavior recognition alarm system and method under panoramic monitoring based on attitude estimation | |
CN111340848A (en) | Object tracking method, system, device and medium for target area | |
CN111541907B (en) | Article display method, apparatus, device and storage medium | |
KR20190032084A (en) | Apparatus and method for providing mixed reality content | |
CN111339943A (en) | Object management method, system, platform, equipment and medium | |
CN110929770A (en) | Intelligent tracking method, system and equipment based on image processing and readable medium | |
CN110807361A (en) | Human body recognition method and device, computer equipment and storage medium | |
CN108388882A (en) | Based on the gesture identification method that the overall situation-part is multi-modal RGB-D | |
CN112200187A (en) | Target detection method, device, machine readable medium and equipment | |
CN112085795B (en) | Article positioning method, device, equipment and storage medium | |
CN111047621A (en) | Target object tracking method, system, equipment and readable medium | |
CN112818807A (en) | Tumble detection method, tumble detection device, tumble detection apparatus, and storage medium | |
CN110929619A (en) | Target object tracking method, system and device based on image processing and readable medium | |
CN114299615A (en) | Action recognition method, device, medium and equipment based on multi-feature fusion based on key points | |
CN111291638A (en) | Object comparison method, system, equipment and medium | |
CN110069996A (en) | Headwork recognition methods, device and electronic equipment | |
CN111199169A (en) | Image processing method and device | |
CN112529939A (en) | Target track matching method and device, machine readable medium and equipment | |
CN111260697A (en) | Target object identification method, system, device and medium | |
CN109040588A (en) | Face image photographing method and device, storage medium and terminal | |
JP2013004001A (en) | Display control device, display control method, and program | |
CN113936240A (en) | Method, device and equipment for determining sample image and storage medium | |
CN111310595B (en) | Method and device for generating information | |
CN113269730A (en) | Image processing method, image processing device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200626 |