CN109903339A - Video group-person localization detection method based on multi-dimensional fusion features - Google Patents
Video group-person localization detection method based on multi-dimensional fusion features

- Publication number: CN109903339A (application CN201910235608.5A; granted as CN109903339B)
- Authority: CN (China)
- Prior art keywords: video, feature, detection, person, fusion
- Legal status: Granted
Abstract
The present invention discloses a video group-person localization detection method based on multi-dimensional fusion features. The method first extracts multi-level video feature maps and establishes top-down and bottom-up dual feature-processing channels to fully mine the semantic information of the video; it then fuses the multi-level feature maps into a multi-dimensional fusion feature and extracts video candidate targets; finally, candidate-target position regression and category classification are processed in parallel to complete group-person localization detection in video. By fusing multi-level features the invention obtains rich video semantic information, and by performing multi-task prediction in parallel it effectively increases the speed of group-person localization detection, with good accuracy and practicability.
Description
Technical field
The present invention relates to the intersecting fields of computer vision and pattern recognition, and in particular to a video group-person localization detection method based on multi-dimensional fusion features.
Background art
With the development of video acquisition and image-processing technology, group-person localization detection in video has become a popular research direction in computer vision. It has wide application prospects and is also the basis of higher-level computer-vision problems, such as dense-crowd monitoring and social semantic analysis.
For the human eye, the task of group-person localization detection in video is not difficult: target persons are located and sorted out mainly through the perception of differently colored blocks. A computer, however, processes an RGB matrix, and it is far from trivial to segment the regions occupied by a group of persons in a scene while reducing the influence of the background on localization detection.
The development of video group-person localization detection algorithms has gone through leap-forward advances such as bounding-box regression, the rise of deep neural networks, the development of multi-reference windows, hard-sample mining and focusing, and multi-scale multi-port detection. By their core, these algorithms fall into two types: localization detection algorithms based on traditional hand-crafted features and those based on deep learning. Before 2013, localization detection of persons in videos or images relied mainly on traditional hand-crafted features and was limited by the feature description and by computing capability; computer-vision researchers did their best to design diversified detection algorithms to make up for the deficiency of hand-designed features in image-feature representation ability, and used elaborate calculation methods to accelerate detection models and reduce space-time consumption. Among these, several representative hand-crafted feature detection algorithms appeared: the Viola-Jones detector, the HOG detector, and the deformable part model detector.
With the rise of deep neural networks, detection models based on deep learning have overcome the limited feature description of traditional hand-crafted algorithms: they learn feature representations automatically from big data, with many thousands of parameters, and for a new application scenario an effective new feature representation can be obtained quickly through training. Detection models based on deep learning are broadly divided into two directions: region-proposal-based and end-to-end. A region-proposal-based detection model first selects a large number of candidate regions in the image to be detected, which may contain the targets of interest; it then extracts the features of each candidate box to obtain a feature vector, classifies the feature vector to obtain category information, and finally performs position regression to obtain the corresponding coordinate information. End-to-end detection discards the candidate-box extraction step and places feature extraction, candidate-box regression, and classification directly in a single convolutional network.
Group-person behavior has the characteristics of commonality and diversity: it is a set of interactive person-to-person behaviors and interactions between persons and the environment. Mutual occlusion between persons, or between persons and objects, therefore easily occurs while group behavior unfolds. Together with interference factors such as illumination variation during video capture, these disturbances prevent existing deep-learning detection models from accurately locating person positions during detection, and may even cause persons to be missed.
Summary of the invention
Goal of the invention: in a group-person scene multiple persons exist simultaneously, so efficient localization detection of the group requires an accurate feature description of each person. Existing deep-learning detection models usually use only the single top-level video feature as the detection basis; although the top-level feature contains rich video semantics, the regressed person positions are relatively rough. In recent years some detection models have fused multi-level video features; although such fusion incorporates bottom-level features to improve detection accuracy, these models use only a unidirectional fusion structure, so each level's feature map contains only the information of the current and higher levels and cannot embody the mapping result of all levels, which prevents the detection result from reaching the optimum. To overcome the shortcomings of the prior art, the present invention proposes a video group-person localization detection method based on multi-dimensional fusion features. The method extracts multi-level video features and fuses them through a bidirectional processing channel into a multi-dimensional fusion feature, which effectively utilizes the feature information of all levels and obtains rich video semantic information, thereby describing the person features in the video more comprehensively; meanwhile, multi-task prediction is performed in parallel, effectively increasing the speed of group-person localization detection with good accuracy and practicability.
Technical solution: to achieve the above object, the technical solution proposed by the present invention is as follows:
A video group-person localization detection method based on multi-dimensional fusion features, comprising steps (1) to (8) executed in sequence:
(1) Input a video as a training sample, in which the object categories and positions are known. Normalize the video size frame by frame, uniformly scaling each video frame to size H × W, where H denotes the video frame height and W denotes the video frame width;
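As an illustration of step (1), the frame-wise normalization is a plain resize; the use of OpenCV and of bilinear interpolation below is an assumption for illustration, not a requirement of the method.

```python
# Minimal sketch of step (1): frame-by-frame size normalization to H x W.
import cv2

H, W = 720, 1280   # frame height and width of the preferred embodiment

def normalize_video(frames):
    """Uniformly scale every frame of the input video to W x H pixels."""
    # cv2.resize takes (width, height); the interpolation choice is an assumption.
    return [cv2.resize(frame, (W, H), interpolation=cv2.INTER_LINEAR) for frame in frames]
```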
(2) Use the InceptionV3 model to extract features frame by frame from the video processed by step (1), obtaining the image features at each level of the video and forming the multi-level video feature map F′, F′ = {F′_i | i = 1, 2, …, numF}, where F′_i denotes the i-th layer image feature, numF denotes the total number of extracted feature layers, F′_1 denotes the bottom-level image feature, and F′_numF denotes the top-level image feature;
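As an illustration of step (2), the sketch below taps intermediate outputs of a standard InceptionV3 backbone to form a four-level feature map (numF = 4, per the preferred embodiment). The torchvision backbone and the specific tap points are assumptions for illustration; the patent does not name which InceptionV3 layers supply F′_1 to F′_numF.

```python
# Minimal sketch of step (2): frame-by-frame multi-level feature extraction
# with InceptionV3. The tap points below are hypothetical choices.
import torch
from torchvision.models import inception_v3
from torchvision.models.feature_extraction import create_feature_extractor

return_nodes = {             # hypothetical tap points, bottom (F'_1) to top (F'_4)
    "Conv2d_4a_3x3": "F1",   # bottom level: fine spatial detail
    "Mixed_5d": "F2",
    "Mixed_6e": "F3",
    "Mixed_7c": "F4",        # top level: rich semantics
}

backbone = inception_v3(weights=None, aux_logits=False)
extractor = create_feature_extractor(backbone, return_nodes=return_nodes)

frame = torch.randn(1, 3, 720, 1280)   # one normalized H x W video frame
features = extractor(frame)
for name, fmap in features.items():
    print(name, tuple(fmap.shape))     # channel widths 192, 288, 768, 2048
```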
(3) Perform the feature-fusion operation on the extracted multi-level video feature map F′, comprising steps (3-1) to (3-3) executed in sequence:
(3-1) Add a fusion channel from F′_numF to F′_1 and perform downward feature fusion on F′ starting from the top-level feature, obtaining the top-down video feature map F^top-down. The fusion method is: beginning with the top-level image feature F′_numF and traversing each layer image feature F′_i downward, apply to F′_i a convolution with kernel size conv1 and stride stride1 and fuse it with the ×upSample1-upsampled result of the layer above, obtaining F_i^top-down; finally obtain F^top-down = {F_i^top-down | i = 1, 2, …, numF};
(3-2) Add a fusion channel from F_1^top-down to F_numF^top-down and perform upward feature fusion on F^top-down starting from the bottom-level feature, obtaining the bottom-up video feature map F^bottom-up, where F_i^bottom-up denotes the i-th layer image feature of F^bottom-up. The fusion method is:
A. initialize i = 1;
B. compute F_i^bottom-up, where F_1^bottom-up = F_1^top-down; apply to F_i^bottom-up a convolution with kernel size conv2 and stride stride2, and fuse the result with F_{i+1}^top-down to obtain F_{i+1}^bottom-up;
C. update i = i + 1;
D. execute steps B to C in a loop until i > numF; after the loop terminates, obtain F^bottom-up = {F_i^bottom-up | i = 1, 2, …, numF};
(3-3) Apply to each layer image feature F_i^bottom-up of the bottom-up video feature map F^bottom-up a convolution with kernel size conv3 and stride stride3, and denote the result by F_i; all obtained F_i constitute the multi-dimensional fusion feature map F, F = {F_i | i = 1, 2, …, numF};
(4) Input the multi-dimensional fusion feature map F into the region proposal network and output K detection targets, obtaining the target-position set Box = {Box_j | j = 1, 2, …, K} and the corresponding person-probability set Person = {Person_j | j = 1, 2, …, K}, where Box_j denotes the position of the j-th detection target and Person_j denotes the probability that the j-th detection target is a person, Person_j ∈ [0, 1]; the larger the value of Person_j, the more likely the detection target is a person;
(5) Classify the detection targets according to Person. Let the ground-truth classes of the K detection targets be PPerson = {PPerson_j | j = 1, 2, …, K} and calculate the group-person classification loss function Loss_cls, where PPerson_j denotes the true class of the j-th detection target and takes the value 0 or 1: PPerson_j = 0 indicates that the detection target is not a person, and PPerson_j = 1 indicates that the detection target is a person;
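One plausible form of such a two-class classification loss over the K candidates, consistent with the quantities defined in step (5), is the binary cross-entropy below. This is shown as an illustrative assumption, not as the patent's exact formula:

```latex
\mathrm{Loss}_{\mathrm{cls}}
  = -\frac{1}{K}\sum_{j=1}^{K}\Big[\mathrm{PPerson}_j \log \mathrm{Person}_j
    + \big(1-\mathrm{PPerson}_j\big)\log\big(1-\mathrm{Person}_j\big)\Big]
```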
(6) Regress the target positions according to Box and Person. Let the ground-truth positions of the K detection targets be BBox = {BBox_j | j = 1, 2, …, K} and calculate the group-person position loss function Loss_loc, where BBox_j denotes the ground-truth position of the j-th detection target;
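Similarly, the position loss over the ground-truth person candidates can plausibly take the Faster R-CNN-style smooth-L1 form below (again an illustrative assumption, not the patent's exact formula):

```latex
\mathrm{Loss}_{\mathrm{loc}}
  = \frac{1}{K}\sum_{j=1}^{K} \mathrm{PPerson}_j\,
    \mathrm{smooth}_{L_1}\!\big(\mathrm{Box}_j - \mathrm{BBox}_j\big),
\qquad
\mathrm{smooth}_{L_1}(x) =
\begin{cases}
0.5\,x^{2}, & |x| < 1 \\
|x| - 0.5, & \text{otherwise}
\end{cases}
```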
(7) Calculate the group-person localization detection loss value Loss by the formula Loss = Loss_cls + λ·Loss_loc. If Loss ≤ Loss_max, the region proposal network is trained: output the region-proposal-network parameters and execute step (8). If Loss > Loss_max, update each layer parameter W of the region proposal network by W ← W − α·∂Loss/∂W, then return to step (4) and perform person detection again. Loss_max is the preset maximum loss value for crowd localization detection, λ is the balance factor between the position-regression and person-classification tasks, α is the learning rate of the stochastic gradient descent method, and ∂Loss/∂W denotes the partial derivative of the group-person localization detection loss function;
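Steps (4) to (7) amount to iterating proposal generation and stochastic gradient descent until the joint loss falls below Loss_max; a toy sketch follows. The stand-in proposal head and the choice of binary cross-entropy and smooth-L1 losses are illustrative assumptions; Loss_max = 0.5, λ = 1, α = 0.0001 and K = 12 follow the preferred embodiment.

```python
# Toy sketch of the training criterion in steps (4)-(7); the one-layer "rpn"
# head is a hypothetical placeholder, not the patent's network.
import torch
import torch.nn as nn

LOSS_MAX, LAMBDA, ALPHA, K = 0.5, 1.0, 1e-4, 12

rpn = nn.Conv2d(256, K * 5, 1)                 # 4 coords + 1 person logit per target
optimizer = torch.optim.SGD(rpn.parameters(), lr=ALPHA)
bce, sl1 = nn.BCEWithLogitsLoss(), nn.SmoothL1Loss()

F_fused = torch.randn(1, 256, 22, 40)          # one level of the fusion map F
gt_boxes = torch.rand(K, 4)                    # BBox_j: ground-truth positions
gt_labels = torch.randint(0, 2, (K,)).float()  # PPerson_j: 1 = person, 0 = not

for step in range(10_000):
    out = rpn(F_fused).mean(dim=(2, 3)).view(K, 5)   # step (4): K detection targets
    boxes, person_logits = out[:, :4], out[:, 4]
    loss = bce(person_logits, gt_labels) + LAMBDA * sl1(boxes, gt_boxes)
    if loss.item() <= LOSS_MAX:                # step (7): stop once Loss <= Loss_max
        break
    optimizer.zero_grad()
    loss.backward()                            # dLoss/dW for every layer
    optimizer.step()                           # W <- W - ALPHA * dLoss/dW
```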
(8) Reacquire a video to be detected and successively apply normalization, feature extraction, and feature fusion to it, obtaining its multi-dimensional fusion feature map F_new; input F_new into the region proposal network trained in step (7) to obtain the group-person localization detection result in the new video.
Further, in step (1), H = 720, W = 1280.
Further, in step (2), numF = 4.
Further, in step (3), conv1 = 1, stride1 = 1, upSample1 = 2, conv2 = 3, stride2 = 2, conv3 = 1, stride3 = 1.
Further, in step (4), K = 12; in step (7), Loss_max = 0.5, λ = 1, α = 0.0001.
Beneficial effects: by adopting the above technical solution, the present invention has the following technical effects compared with the prior art:
The invention extracts multi-level video representations, performs dual-channel feature processing, fuses the multi-level video feature maps into a multi-dimensional fusion feature, extracts video candidate targets, and processes candidate-target position regression and category classification in parallel to complete group-person localization detection in video. By fusing multi-level features the invention obtains rich video semantic information, and by performing multi-task prediction in parallel it effectively increases the speed of group-person localization detection, with good accuracy and practicability. Specifically:
(1) The invention establishes top-down and bottom-up dual feature-processing channels, fully mines the semantic information of the video, and improves the utilization of hierarchical features.
(2) The invention fuses multi-dimensional video features, organically combining position-accurate bottom-level features with semantically rich top-level features, which can effectively improve detection accuracy.
(3) The invention processes multiple prediction tasks in parallel and sets a task balance factor, which facilitates building an optimal detection model according to scene characteristics.
Detailed description of the invention
Fig. 1 shows the flow of the video group-person localization detection method based on multi-dimensional fusion features;
Fig. 2 is the structure diagram of the region proposal network in the present invention;
Fig. 3 is a comparison chart of the detection accuracy of different methods.
Specific embodiment
The technical solution of the present invention is described in further detail below with reference to the drawings and specific embodiments.
Embodiment 1: Fig. 1 is the flow chart of the video group-person localization detection method based on multi-dimensional fusion features proposed by this embodiment, which specifically includes the following steps:
Step 1, preprocessing: input a video as a training sample, in which the object categories and positions are known; normalize the video size frame by frame, uniformly scaling each video frame to size H × W, where H denotes the video frame height and W denotes the video frame width. This step amounts to preprocessing and facilitates the subsequent detection; in this embodiment, H = 720, W = 1280.
Step 2, feature extraction: use the InceptionV3 model to extract features frame by frame from the video processed by step 1, obtaining the image features at each level of the video and forming the multi-level video feature map F′, F′ = {F′_i | i = 1, 2, …, numF}, where F′_i denotes the i-th layer image feature, numF denotes the total number of extracted feature layers, F′_1 denotes the bottom-level image feature, and F′_numF denotes the top-level image feature; in this embodiment, numF = 4.
Bottom-level features carry accurate target-position information and can regress detailed location data of the target, but the semantic information they can characterize is limited, and their data volume is large, so processing them consumes considerable space and time. Top-level features contain rich semantics, but after multiple levels of processing the target positions are relatively coarse and the regressed target semantics are not fine-grained, which easily causes misjudgment in group-person scenes. The features of each level thus have their own advantages and disadvantages. In order to extract accurate group-person location information in a group-person scene, the InceptionV3 model is used to extract the multi-level image features of the video and form the multi-level feature map. The InceptionV3 model is used in this step because this feature-extraction model is not only effective but also has strong computational performance, which facilitates later processing.
Step 3, feature fusion: perform the feature-fusion operation on the extracted multi-level video feature map F′, comprising steps (3-1) to (3-3) executed in sequence:
(3-1) Add a fusion channel from F′_numF to F′_1 and perform downward feature fusion on F′ starting from the top-level feature, obtaining the top-down video feature map F^top-down. The fusion method is: beginning with the top-level image feature F′_numF and traversing each layer image feature F′_i downward, apply to F′_i a convolution with kernel size conv1 and stride stride1 and fuse it with the ×upSample1-upsampled result of the layer above, obtaining F_i^top-down; finally obtain F^top-down = {F_i^top-down | i = 1, 2, …, numF};
(3-2) Add a fusion channel from F_1^top-down to F_numF^top-down and perform upward feature fusion on F^top-down starting from the bottom-level feature, obtaining the bottom-up video feature map F^bottom-up, where F_i^bottom-up denotes the i-th layer image feature of F^bottom-up. The fusion method is:
A. initialize i = 1;
B. compute F_i^bottom-up, where F_1^bottom-up = F_1^top-down; apply to F_i^bottom-up a convolution with kernel size conv2 and stride stride2, and fuse the result with F_{i+1}^top-down to obtain F_{i+1}^bottom-up;
C. update i = i + 1;
D. execute steps B to C in a loop until i > numF; after the loop terminates, obtain F^bottom-up = {F_i^bottom-up | i = 1, 2, …, numF};
(3-3) Apply to each layer image feature F_i^bottom-up of the bottom-up video feature map F^bottom-up a convolution with kernel size conv3 and stride stride3, and denote the result by F_i; all obtained F_i constitute the multi-dimensional fusion feature map F, F = {F_i | i = 1, 2, …, numF}.
In step 3, conv1 = 1, stride1 = 1, upSample1 = 2, conv2 = 3, stride2 = 2, conv3 = 1, stride3 = 1.
Fusing multi-level features is not a simple addition: one must first consider whether the sizes of the hierarchical features are consistent, and then consider whether the fusion is reasonable, that is, whether it might instead reduce detection performance. The present invention improves and redesigns the existing feature-fusion method. In the top-down structure every layer contains the feature information of the current and higher layers, so each layer can be used directly for detection at its optimal size. In order to embody the mapping result of all hierarchical features and reach the optimal detection performance, a bottom-up channel is added in particular, which connects back to the top-down processing result in the opposite direction, so that bottom-level position information is used more efficiently. Finally, a convolution is applied to each fusion result to eliminate the aliasing effect of upsampling.
Step 4, region proposal network training:
The region proposal network is a commonly used target-detection network; its main functional modules are shown in Fig. 2. It first generates k rectangular windows for each sliding-window position to adapt to targets of different sizes, then inputs the location information of each rectangular window and the corresponding image features into the network, and applies a classification layer and a regression layer to each rectangular window. The classification layer mainly discriminates the probability that a person exists in the current rectangular window; its parameters include the person weight parameter W_P and the background interference parameter W_E. The regression layer mainly obtains the coordinates of the current rectangular window in the full-scale image; its parameters include the offset weight parameters W_x, W_y, W_h, and W_w for the window coordinates, height, and width. The settings and adjustments of all parameters are shared throughout the training of the region proposal network.
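The structure just described matches the standard RPN head design; a minimal sketch is given below. The person/background split follows the embodiment, while the 3 × 3 shared convolution, the 256-channel width, and k = 9 windows per position (the text leaves k unspecified; 9 is the common Faster R-CNN default) are assumptions.

```python
# Minimal sketch of the region-proposal-network head of Fig. 2: k rectangular
# windows per sliding-window position, with a classification branch
# (person/background; cf. parameters W_P, W_E) and a regression branch
# (coordinate/width/height offsets; cf. W_x, W_y, W_h, W_w) run in parallel.
import torch
import torch.nn as nn


class RPNHead(nn.Module):
    def __init__(self, in_channels=256, k=9):
        super().__init__()
        self.shared = nn.Conv2d(in_channels, 256, 3, padding=1)  # sliding window
        self.cls = nn.Conv2d(256, 2 * k, 1)   # person vs. background per window
        self.reg = nn.Conv2d(256, 4 * k, 1)   # (x, y, w, h) offsets per window

    def forward(self, fmap):
        h = torch.relu(self.shared(fmap))
        return self.cls(h), self.reg(h)       # the two branches run in parallel


scores, offsets = RPNHead()(torch.randn(1, 256, 22, 40))
print(tuple(scores.shape), tuple(offsets.shape))   # (1, 18, 22, 40) (1, 36, 22, 40)
```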
The training process of the region proposal network is as follows:
(4-1) Input the multi-dimensional fusion feature map F into the region proposal network and output K detection targets (here K = 12), obtaining the target-position set Box = {Box_j | j = 1, 2, …, 12} and the corresponding person-probability set Person = {Person_j | j = 1, 2, …, 12}, where Box_j denotes the position of the j-th detection target and Person_j denotes the probability that the j-th detection target is a person, Person_j ∈ [0, 1]; the larger the value of Person_j, the more likely the detection target is a person;
(4-2) Classify the detection targets according to Person. Let the ground-truth classes of the 12 detection targets be PPerson = {PPerson_j | j = 1, 2, …, 12} and calculate the group-person classification loss function Loss_cls, where PPerson_j denotes the true class of the j-th detection target and takes the value 0 or 1: PPerson_j = 0 indicates that the detection target is not a person, and PPerson_j = 1 indicates that the detection target is a person;
(4-3) Regress the target positions according to Box and Person. Let the ground-truth positions of the 12 detection targets be BBox = {BBox_j | j = 1, 2, …, 12} and calculate the group-person position loss function Loss_loc, where BBox_j denotes the ground-truth position of the j-th detection target;
(4-4) Calculate the group-person localization detection loss value Loss by the formula Loss = Loss_cls + λ·Loss_loc. If Loss ≤ Loss_max, the region proposal network is trained: output the network parameters and proceed to step 5. If Loss > Loss_max, update each layer parameter W of the region proposal network by W ← W − α·∂Loss/∂W, then return to step (4-1) and perform person detection again. Loss_max is the preset maximum loss value for crowd localization detection, λ is the balance factor between the position-regression and person-classification tasks, and α is the learning rate of the stochastic gradient descent method; in this embodiment Loss_max = 0.5, λ = 1, α = 0.0001.
Step 5, detect the video to be detected with the trained region proposal network:
Reacquire a video to be detected and successively apply normalization, feature extraction, and feature fusion to obtain its multi-dimensional fusion feature map F_new; input F_new into the trained region proposal network to obtain the group-person localization detection result in the new video. The region proposal network is used for target detection because group scenes contain many persons and the task is complex: position regression and category classification are carried out in parallel, improving detection efficiency. In the category-classification process, since the detection target is clearly a person, the classification is binary (person versus non-person), which reduces the time wasted on detecting other categories; incorporating the true classification results improves classification accuracy. In the position-regression process, only person-class target positions are regressed, which simplifies computation and refines the regression task. In the overall training process a task balance factor is introduced, and the optimal task ratio is adjusted according to the scene type to complete video group-person localization detection.
Step 6, experimental simulation:
To test the performance of the method, the currently common target-detection methods Faster-RCNN, FPN, and Mask-RCNN were selected for comparison; the evaluation criterion is detection accuracy under different IoU thresholds and different target sizes. IoU is the ratio of the intersection to the union of the detection result and the ground truth, IoU ∈ [0, 1]; the higher the IoU, the closer the detection result is to the ground truth. In the tests, AP_50 denotes IoU ≥ 0.5 and AP_75 denotes IoU ≥ 0.75. In the evaluation, targets are divided by size into three categories (small, medium, and large), denoted AP_S, AP_M, and AP_L respectively. Fig. 3 gives the detection-accuracy comparison of the present invention with the comparison methods Faster-RCNN, FPN, and Mask-RCNN. The experimental results show that, compared with Faster-RCNN, which uses only the single top-level feature, the three methods using multi-level fusion features all obtain higher detection accuracy, illustrating that multi-level fusion features have stronger feature-representation ability than the single top-level feature. FPN and Mask-RCNN use only a unidirectional structure for fusion during feature processing, whereas the present invention uses a bidirectional processing channel and obtains more accurate detection; the experimental results also show that the method of this patent obtains better detection accuracy for different IoU thresholds and target sizes.
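For reference, the IoU criterion used in the evaluation can be computed as below; boxes are assumed to be in (x1, y1, x2, y2) corner format, a common convention the text does not specify.

```python
# Minimal IoU computation for axis-aligned boxes in (x1, y1, x2, y2) format.
def iou(box_a, box_b):
    # Intersection rectangle (empty if the boxes do not overlap).
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A detection counts toward AP_50 when iou(pred, gt) >= 0.5, AP_75 when >= 0.75.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))   # 0.333...
```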
The above is only a preferred embodiment of the present invention. It should be pointed out that, for those of ordinary skill in the art, various improvements and modifications may be made without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.
Claims (5)
1. A video group-person localization detection method based on multi-dimensional fusion features, characterized by comprising steps (1) to (8) executed in sequence:
(1) inputting a video as a training sample, in which the object categories and positions are known; normalizing the video size frame by frame, uniformly scaling each video frame to size H × W, where H denotes the video frame height and W denotes the video frame width;
(2) using the InceptionV3 model to extract features frame by frame from the video processed by step (1), obtaining the image features of each level of the video and forming the multi-level video feature map F′, F′ = {F′_i | i = 1, 2, …, numF}, where F′_i denotes the i-th layer image feature, numF denotes the total number of extracted feature layers, F′_1 denotes the bottom-level image feature, and F′_numF denotes the top-level image feature;
(3) performing the feature-fusion operation on the extracted multi-level video feature map F′, comprising steps (3-1) to (3-3) executed in sequence:
(3-1) adding a fusion channel from F′_numF to F′_1 and performing downward feature fusion on F′ starting from the top-level feature to obtain the top-down video feature map F^top-down; the fusion method being: beginning with the top-level image feature F′_numF and traversing each layer image feature F′_i downward, applying to F′_i a convolution with kernel size conv1 and stride stride1 and fusing it with the ×upSample1-upsampled result of the layer above to obtain F_i^top-down; finally obtaining F^top-down = {F_i^top-down | i = 1, 2, …, numF};
(3-2) adding a fusion channel from F_1^top-down to F_numF^top-down and performing upward feature fusion on F^top-down starting from the bottom-level feature to obtain the bottom-up video feature map F^bottom-up, F^bottom-up = {F_i^bottom-up | i = 1, 2, …, numF}, where F_i^bottom-up denotes the i-th layer image feature of F^bottom-up; the fusion method being:
A. initializing i = 1;
B. computing F_i^bottom-up, where F_1^bottom-up = F_1^top-down; applying to F_i^bottom-up a convolution with kernel size conv2 and stride stride2, and fusing the result with F_{i+1}^top-down to obtain F_{i+1}^bottom-up;
C. updating i = i + 1;
D. executing steps B to C in a loop until i > numF; after the loop terminates, obtaining F^bottom-up = {F_i^bottom-up | i = 1, 2, …, numF};
(3-3) applying to each layer image feature F_i^bottom-up of the bottom-up video feature map F^bottom-up a convolution with kernel size conv3 and stride stride3 and denoting the result by F_i; all obtained F_i constituting the multi-dimensional fusion feature map F, F = {F_i | i = 1, 2, …, numF};
(4) inputting the multi-dimensional fusion feature map F into the region proposal network and outputting K detection targets, obtaining the target-position set Box = {Box_j | j = 1, 2, …, K} and the corresponding person-probability set Person = {Person_j | j = 1, 2, …, K}, where Box_j denotes the position of the j-th detection target and Person_j denotes the probability that the j-th detection target is a person, Person_j ∈ [0, 1]; the larger the value of Person_j, the more likely the detection target is a person;
(5) classifying the detection targets according to Person; letting the ground-truth classes of the K detection targets be PPerson = {PPerson_j | j = 1, 2, …, K} and calculating the group-person classification loss function Loss_cls, where PPerson_j denotes the true class of the j-th detection target and takes the value 0 or 1: PPerson_j = 0 indicates that the detection target is not a person, and PPerson_j = 1 indicates that the detection target is a person;
(6) regressing the target positions according to Box and Person; letting the ground-truth positions of the K detection targets be BBox = {BBox_j | j = 1, 2, …, K} and calculating the group-person position loss function Loss_loc, where BBox_j denotes the ground-truth position of the j-th detection target;
(7) calculating the group-person localization detection loss value Loss by the formula Loss = Loss_cls + λ·Loss_loc; if Loss ≤ Loss_max, the region proposal network is trained: outputting the region-proposal-network parameters and executing step (8); if Loss > Loss_max, updating each layer parameter W of the region proposal network by W ← W − α·∂Loss/∂W, then returning to step (4) and performing person detection again; Loss_max being the preset maximum loss value for crowd localization detection, λ being the balance factor between the position-regression and person-classification tasks, α being the learning rate of the stochastic gradient descent method, and ∂Loss/∂W denoting the partial derivative of the group-person localization detection loss function;
(8) reacquiring a video to be detected, successively applying normalization, feature extraction, and feature fusion to it to obtain its multi-dimensional fusion feature map F_new, and inputting F_new into the region proposal network trained in step (7) to obtain the group-person localization detection result in the new video.
2. The video group-person localization detection method based on multi-dimensional fusion features according to claim 1, characterized in that, in step (1), H = 720, W = 1280.
3. The video group-person localization detection method based on multi-dimensional fusion features according to claim 1, characterized in that, in step (2), numF = 4.
4. The video group-person localization detection method based on multi-dimensional fusion features according to claim 1, characterized in that, in step (3), conv1 = 1, stride1 = 1, upSample1 = 2, conv2 = 3, stride2 = 2, conv3 = 1, stride3 = 1.
5. The video group-person localization detection method based on multi-dimensional fusion features according to claim 1, characterized in that, in step (4), K = 12; in step (7), Loss_max = 0.5, λ = 1, α = 0.0001.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201910235608.5A CN109903339B (en) | 2019-03-26 | 2019-03-26 | Video group figure positioning detection method based on multi-dimensional fusion features
Publications (2)
Publication Number | Publication Date |
---|---|
CN109903339A (en) | 2019-06-18
CN109903339B (en) | 2021-03-05
Family
ID=66953909
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910235608.5A Active CN109903339B (en) | 2019-03-26 | 2019-03-26 | Video group figure positioning detection method based on multi-dimensional fusion features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109903339B (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140307917A1 (en) * | 2013-04-12 | 2014-10-16 | Toyota Motor Engineering & Manufacturing North America, Inc. | Robust feature fusion for multi-view object tracking |
CN107341471A (en) * | 2017-07-04 | 2017-11-10 | 南京邮电大学 | A kind of Human bodys' response method based on Bilayer condition random field |
CN108229319A (en) * | 2017-11-29 | 2018-06-29 | 南京大学 | The ship video detecting method merged based on frame difference with convolutional neural networks |
CN108038867A (en) * | 2017-12-22 | 2018-05-15 | 湖南源信光电科技股份有限公司 | Fire defector and localization method based on multiple features fusion and stereoscopic vision |
CN108399435A (en) * | 2018-03-21 | 2018-08-14 | 南京邮电大学 | A kind of video classification methods based on sound feature |
CN108898078A (en) * | 2018-06-15 | 2018-11-27 | 上海理工大学 | A kind of traffic sign real-time detection recognition methods of multiple dimensioned deconvolution neural network |
CN108846446A (en) * | 2018-07-04 | 2018-11-20 | 国家新闻出版广电总局广播科学研究院 | The object detection method of full convolutional network is merged based on multipath dense feature |
CN109472298A (en) * | 2018-10-19 | 2019-03-15 | 天津大学 | Depth binary feature pyramid for the detection of small scaled target enhances network |
CN109508686A (en) * | 2018-11-26 | 2019-03-22 | 南京邮电大学 | A kind of Human bodys' response method based on the study of stratification proper subspace |
Non-Patent Citations (2)

Title |
---|
TAN FEIGANG et al.: "Person Re-Identification Based on Multi-Level and Multi-Feature Fusion", 2017 International Conference on Smart City and Systems Engineering (ICSCSE) * |
LI HE: "Research on Tracking Algorithm Based on Convolutional Neural Network Feature Sharing and Object Detection", China Master's Theses Full-text Database * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110675391A (en) * | 2019-09-27 | 2020-01-10 | 联想(北京)有限公司 | Image processing method, apparatus, computing device, and medium |
CN110675391B (en) * | 2019-09-27 | 2022-11-18 | 联想(北京)有限公司 | Image processing method, apparatus, computing device, and medium |
CN111488834A (en) * | 2020-04-13 | 2020-08-04 | 河南理工大学 | Crowd counting method based on multi-level feature fusion |
CN111488834B (en) * | 2020-04-13 | 2023-07-04 | 河南理工大学 | Crowd counting method based on multi-level feature fusion |
CN111491180A (en) * | 2020-06-24 | 2020-08-04 | 腾讯科技(深圳)有限公司 | Method and device for determining key frame |
CN113610056A (en) * | 2021-08-31 | 2021-11-05 | 的卢技术有限公司 | Obstacle detection method, obstacle detection device, electronic device, and storage medium |
CN113610056B (en) * | 2021-08-31 | 2024-06-07 | 的卢技术有限公司 | Obstacle detection method, obstacle detection device, electronic equipment and storage medium |
CN114255384A (en) * | 2021-12-14 | 2022-03-29 | 广东博智林机器人有限公司 | Method and device for detecting number of people, electronic equipment and storage medium |
CN114494999A (en) * | 2022-01-18 | 2022-05-13 | 西南交通大学 | Double-branch combined target intensive prediction method and system |
CN114494999B (en) * | 2022-01-18 | 2022-11-15 | 西南交通大学 | Double-branch combined target intensive prediction method and system |
Also Published As
Publication number | Publication date |
---|---|
CN109903339B (en) | 2021-03-05 |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |