
CN109903339A - Video group person localization and detection method based on multi-dimensional fusion features - Google Patents

Video group person localization and detection method based on multi-dimensional fusion features Download PDF

Info

Publication number
CN109903339A
CN109903339A (application CN201910235608.5A)
Authority
CN
China
Prior art keywords
video
feature
detection
personage
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910235608.5A
Other languages
Chinese (zh)
Other versions
CN109903339B (en)
Inventor
陈志�
掌静
岳文静
周传
陈璐
刘玲
任杰
周松颖
江婧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN201910235608.5A priority Critical patent/CN109903339B/en
Publication of CN109903339A publication Critical patent/CN109903339A/en
Application granted granted Critical
Publication of CN109903339B publication Critical patent/CN109903339B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The present invention discloses a video group person localization and detection method based on multi-dimensional fusion features. The method first extracts multi-level video feature maps and establishes top-down and bottom-up bidirectional feature-processing channels to fully mine the semantic information of the video. It then fuses the multi-level feature maps to obtain multi-dimensional fusion features, extracts candidate targets from the video, and finally performs candidate-position regression and category classification in parallel to complete video group person localization and detection. By fusing multi-level features the invention obtains rich video semantic information, and by performing multi-task prediction in parallel it effectively improves the speed of group person localization and detection, achieving good accuracy and practicability.

Description

Video group person localization and detection method based on multi-dimensional fusion features
Technical field
The present invention relates to the intersection of computer vision, pattern recognition, and related fields, and in particular to a video group person localization and detection method based on multi-dimensional fusion features.
Background technique
With the development of video acquisition and image processing technology, video group person localization and detection has become a popular research direction in computer vision. It has a wide range of applications and is also the basis of higher-level computer vision problems, such as dense crowd monitoring and social semantic analysis.
The task of video group person localization and detection is not difficult for the human eye, which perceives and localizes target persons mainly through blocks of different colors. What a computer processes, however, is an RGB matrix, and segmenting the regions occupied by a group of people in a scene while reducing the influence of the background on localization is far from trivial.
The development of video group person localization algorithms has gone through several leaps: bounding-box regression, the rise of deep neural networks, multi-reference-window designs, hard-example mining, and multi-scale multi-port detection. According to their core, these algorithms can be divided into two types: localization algorithms based on traditional hand-crafted features and localization algorithms based on deep learning. Before 2013, person detection in video or images was mainly based on traditional hand-crafted features and was limited by feature description and computing capability. Computer vision researchers designed diverse detection algorithms to compensate for the limited representational power of hand-designed features, and used elaborate calculation methods to accelerate detection models and reduce space-time consumption. Several representative hand-crafted feature detectors emerged in this period: the Viola-Jones detector, the HOG detector, and the deformable part model detector.
With the rise of deep neural networks, detection models based on deep learning overcame the limited feature-description ability of traditional hand-crafted methods by automatically learning feature representations, containing thousands of parameters, from big data; for a new application scenario, new effective feature representations can be obtained quickly through training. Deep-learning-based detection models fall broadly into two directions: region-proposal-based and end-to-end. A region-proposal-based model first selects a large number of candidate boxes from the image to be detected, which may contain the targets to be detected; it then extracts features from each candidate box to obtain feature vectors, classifies the feature vectors to obtain category information, and finally performs position regression to obtain the corresponding coordinates. An end-to-end model discards candidate-box extraction and completes feature extraction, box regression, and classification directly within one convolutional network.
Group person behavior has the characteristics of commonality and diversity; it is a set of interactive interpersonal behaviors and interactions between people and the environment. During group behavior, people easily occlude each other or are occluded by objects, and factors such as illumination variation interfere with video imaging. Because of these disturbing factors, existing deep-learning-based detection models may fail to locate person positions accurately during detection, and may even miss persons entirely.
Summary of the invention
Purpose of the invention: in a group scene, multiple persons exist simultaneously, so efficient localization and detection of the group requires an accurate feature description of each person. Existing deep-learning detection models usually use only single top-level video features as the detection basis; although top-level features contain rich video semantics, the regressed person positions are relatively coarse. In recent years, some detection models have fused multi-level video features to improve detection accuracy, but these models use only a unidirectional fusion structure when merging bottom-level features. As a result, each level's feature map contains only the information of the current and higher levels and cannot reflect the mapping of all levels, preventing the detection results from being optimal. To overcome these shortcomings of the prior art, the present invention proposes a video group person localization and detection method based on multi-dimensional fusion features. The method extracts multi-level video features and fuses them through a bidirectional processing channel to form multi-dimensional fusion features, effectively exploiting the feature information of all levels and obtaining rich video semantics, so that person features in the video are described more comprehensively. Multi-task prediction is performed in parallel, effectively improving the speed of group person localization and detection with good accuracy and practicability.
Technical solution: to achieve the above object, the present invention proposes the following technical solution:
A video group person localization and detection method based on multi-dimensional fusion features comprises steps (1) to (8), executed in sequence:
(1) Input a video as a training sample, with the object categories and positions in the video known; normalize the size of the video frame by frame, uniformly scaling each frame to H × W, where H denotes the video frame height and W the video frame width;
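Step (1) can be sketched as follows. This is a minimal numpy illustration of the frame-size normalization, using nearest-neighbour resampling as a stand-in for whatever scaling an implementation actually uses; `resize_frame` is a hypothetical helper, not part of the patent.

```python
import numpy as np

def resize_frame(frame: np.ndarray, h: int = 720, w: int = 1280) -> np.ndarray:
    """Nearest-neighbour resize of an RGB frame to H x W (illustration only)."""
    src_h, src_w = frame.shape[:2]
    rows = np.arange(h) * src_h // h   # source row index for each output row
    cols = np.arange(w) * src_w // w   # source column index for each output column
    return frame[rows[:, None], cols[None, :]]

# Normalize a toy "video" of three 480x640 frames to the uniform H x W size.
frames = [np.zeros((480, 640, 3), dtype=np.uint8) for _ in range(3)]
normalized = [resize_frame(f) for f in frames]
assert all(f.shape == (720, 1280, 3) for f in normalized)
```

In practice a library resizer (with interpolation) would replace this helper; the point is only that every frame leaves this step with the same H × W shape.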
(2) Use the InceptionV3 model to extract features frame by frame from the video processed by step (1), obtaining the image features of each level of the video and forming the multi-level video feature map F' = {F'_i | i = 1, 2, …, numF}, where F'_i denotes the i-th layer image feature, numF the total number of extracted feature layers, F'_1 the bottom-level image feature, and F'_numF the top-level image feature;
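The multi-level extraction of step (2) can be mimicked without the actual InceptionV3 weights. The sketch below builds a four-level pyramid (numF = 4) by repeated 2×2 mean pooling as a stand-in for the network's intermediate feature maps; `feature_pyramid` is a hypothetical helper, and the halving-per-level layout is an assumption, not the real InceptionV3 geometry.

```python
import numpy as np

def feature_pyramid(frame: np.ndarray, num_levels: int = 4):
    """Stand-in for InceptionV3 feature extraction: each level halves the
    spatial size via 2x2 mean pooling. levels[0] plays the role of F'_1
    (bottom) and levels[-1] the role of F'_numF (top)."""
    levels = []
    x = frame.astype(np.float32)
    for _ in range(num_levels):
        h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2   # crop to even size
        x = x[:h, :w].reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))
        levels.append(x)
    return levels

F_prime = feature_pyramid(np.random.rand(720, 1280, 3))
assert [f.shape[:2] for f in F_prime] == [(360, 640), (180, 320), (90, 160), (45, 80)]
```

A real implementation would tap intermediate activations of a pretrained InceptionV3; only the multi-resolution structure matters for the fusion that follows.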
(3) Perform a feature fusion operation on the extracted multi-level video feature map F', comprising steps (3-1) to (3-3) executed in order:
(3-1) Add a fusion channel from F'_numF to F'_1 and perform top-down feature fusion on the multi-level feature map F', obtaining the top-down video feature map F^{top-down}. The fusion method is: starting from the top-level image feature F'_numF, traverse each layer image feature F'_i downward, applying to F'_i a convolution with kernel size conv1 and stride stride1 followed by an upSample1-times upsampling to obtain F_i^{top-down}; finally obtain F^{top-down} = {F_i^{top-down} | i = 1, 2, …, numF};
(3-2) Add a fusion channel from F_1^{top-down} to F_numF^{top-down} and perform bottom-up feature fusion on F^{top-down}, obtaining the bottom-up video feature map F^{bottom-up} = {F_i^{bottom-up} | i = 1, 2, …, numF}, where F_i^{bottom-up} denotes the i-th layer image feature of F^{bottom-up}. The fusion method is:
a. Initialize i = 1 and set F_1^{bottom-up} = F_1^{top-down};
b. Apply to F_i^{bottom-up} a convolution with kernel size conv2 and stride stride2, obtaining a result R_i, and compute F_{i+1}^{bottom-up} = R_i + F_{i+1}^{top-down};
c. Update i = i + 1;
d. Repeat steps b to c until i > numF; after the loop ends, obtain:
F^{bottom-up} = {F_i^{bottom-up} | i = 1, 2, …, numF}
(3-3) Apply to each layer image feature F_i^{bottom-up} of the bottom-up video feature map F^{bottom-up} a convolution with kernel size conv3 and stride stride3, denoting the result F_i; all obtained F_i constitute the multi-dimensional fusion feature map F = {F_i | i = 1, 2, …, numF};
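Steps (3-1) to (3-3) together form one bidirectional fusion pass. The numpy sketch below models the 1×1 convolutions (conv1, conv3) as identity and the 3×3 stride-2 convolution (conv2) as 2×2 mean pooling, so it illustrates only the data flow, top-down upsample-and-add followed by bottom-up downsample-and-add, not the learned operations; all helper names are hypothetical.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling (stand-in for upSample1 = 2)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def downsample2x(x):
    """2x2 mean pooling (stand-in for the conv2 = 3, stride2 = 2 convolution)."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))

def fuse(F_prime):
    """Bidirectional fusion: top-down pass (3-1), then bottom-up pass (3-2)."""
    n = len(F_prime)
    # (3-1): top-down, from F'_numF toward F'_1
    td = [None] * n
    td[-1] = F_prime[-1]
    for i in range(n - 2, -1, -1):
        td[i] = F_prime[i] + upsample2x(td[i + 1])
    # (3-2): bottom-up, from the finest top-down map upward
    bu = [td[0]]
    for i in range(n - 1):
        bu.append(downsample2x(bu[i]) + td[i + 1])
    return bu  # (3-3) would apply a final 1x1 convolution per level

# Toy pyramid with numF = 4 levels, halving per level.
F_prime = [np.ones((64 // 2 ** i, 64 // 2 ** i, 4)) for i in range(4)]
F = fuse(F_prime)
assert [f.shape[0] for f in F] == [64, 32, 16, 8]
```

Each output level thus mixes information from every level of the pyramid, which is the point of adding the second (bottom-up) channel.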
(4) Input the multi-dimensional fusion feature map F into the region proposal network and output K detected targets, obtaining the target position set Box = {Box_j | j = 1, 2, …, K} and the corresponding person probability set Person = {Person_j | j = 1, 2, …, K}, where Box_j denotes the position of the j-th detected target and Person_j the probability that the j-th detected target is a person, Person_j ∈ [0, 1]; the larger the value of Person_j, the more likely the detected target is a person;
(5) Classify the detected targets according to Person. Let the true class labels of the K detected targets be PPerson = {PPerson_j | j = 1, 2, …, K}, and compute the group person classification loss function Loss_cls = −(1/K) Σ_{j=1}^{K} [PPerson_j·log(Person_j) + (1 − PPerson_j)·log(1 − Person_j)], where PPerson_j denotes the true class of the j-th detected target and takes the value 0 or 1: PPerson_j = 0 indicates the detected target is not a person, and PPerson_j = 1 indicates it is a person;
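With PPerson_j ∈ {0, 1} and Person_j ∈ [0, 1], a natural form of Loss_cls is the binary cross-entropy averaged over the K targets; the sketch below assumes that form (the original formula is an image in the patent and may differ in detail).

```python
import numpy as np

def cls_loss(person_prob, person_label, eps=1e-7):
    """Binary cross-entropy over the K detected targets: an assumed form of
    Loss_cls, consistent with 0/1 labels PPerson_j and probabilities Person_j."""
    p = np.clip(np.asarray(person_prob, dtype=float), eps, 1 - eps)
    y = np.asarray(person_label, dtype=float)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Confident, mostly-correct predictions incur a smaller loss than guessing 0.5.
loss = cls_loss([0.9, 0.1, 0.8], [1, 0, 1])
assert loss < cls_loss([0.5, 0.5, 0.5], [1, 0, 1])
```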
(6) Regress the target positions according to Box and Person. Let the true positions of the K detected targets be:
BBox = {BBox_j | j = 1, 2, …, K}
Compute the group person position loss function Loss_loc from the predicted positions Box and the true positions BBox, where BBox_j denotes the true position of the j-th detected target;
(7) Compute the group person localization loss value Loss = Loss_cls + λ·Loss_loc. If Loss ≤ Loss_max, the region proposal network is trained: output the region proposal network parameters and execute step (8). If Loss > Loss_max, update each layer parameter θ of the region proposal network as θ ← θ − α·∂Loss/∂θ, then return to step (4) and perform person detection again. Loss_max is the preset maximum loss value for crowd localization, λ is the balance factor between the position-regression and person-classification tasks, α is the learning rate of stochastic gradient descent, and ∂Loss/∂θ denotes the partial derivative of the group person localization loss function;
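The loop of step (7) combines the two losses and takes one gradient step. A minimal sketch of the update rule, assuming plain stochastic gradient descent with the balance factor λ and learning rate α defined above (the gradient itself would come from backpropagation, which is out of scope here):

```python
import numpy as np

def total_loss(loss_cls, loss_loc, lam=1.0):
    """Loss = Loss_cls + lambda * Loss_loc, with the task-balance factor lambda."""
    return loss_cls + lam * loss_loc

def sgd_step(theta, grad, alpha=1e-4):
    """One SGD update: theta <- theta - alpha * dLoss/dtheta."""
    return theta - alpha * np.asarray(grad)

theta = np.array([0.5, -0.2])
theta2 = sgd_step(theta, grad=np.array([2.0, -1.0]), alpha=0.1)
assert np.allclose(theta2, [0.3, -0.1])
assert total_loss(0.25, 0.25, lam=1.0) == 0.5
```

Training then simply repeats: forward pass, compute Loss, stop if Loss ≤ Loss_max, otherwise apply `sgd_step` to every parameter and retry.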
(8) Acquire a new video to be detected; apply size normalization, feature extraction, and feature fusion to it in turn, obtaining its multi-dimensional fusion feature map F_new; input F_new into the region proposal network trained in step (7) to obtain the group person localization and detection results in the new video.
Further, in step (1), H = 720 and W = 1280.
Further, in step (2), numF = 4.
Further, in step (3), conv1 = 1, stride1 = 1, upSample1 = 2, conv2 = 3, stride2 = 2, conv3 = 1, stride3 = 1.
Further, in step (4), K = 12; in step (7), Loss_max = 0.5, λ = 1, α = 0.0001.
Beneficial effects: compared with the prior art, the above technical solution has the following technical effects:
The present invention extracts multi-level video representations, performs bidirectional feature processing, and fuses the multi-level feature maps to obtain multi-dimensional fusion features; it extracts video candidate targets and processes candidate-position regression and category classification in parallel, completing video group person localization and detection. By fusing multi-level features the invention obtains rich video semantic information, and by performing multi-task prediction in parallel it effectively improves the speed of group person localization and detection, with good accuracy and practicability. Specifically:
(1) The present invention establishes top-down and bottom-up dual feature-processing channels, fully mining the semantic information of the video and improving the utilization of hierarchical features.
(2) The present invention fuses multi-dimensional video features, organically combining position-accurate bottom-level features with semantically rich top-level features, which improves detection accuracy.
(3) The present invention processes multiple prediction tasks in parallel and sets a task balance factor, which helps build an optimal detection model according to scene characteristics.
Description of the drawings
Fig. 1 is the flow of the video group person localization and detection method based on multi-dimensional fusion features;
Fig. 2 is the structure of the region proposal network used in the present invention;
Fig. 3 compares the detection accuracy of different methods.
Specific embodiment
The technical solution of the present invention is described in further detail below with reference to the drawings and specific embodiments:
Embodiment 1: Fig. 1 is the flow chart of the video group person localization and detection method based on multi-dimensional fusion features proposed in this embodiment, which specifically includes the following steps:
One, preprocessing: input a video as a training sample, with the object categories and positions in the video known; normalize the size frame by frame, uniformly scaling each frame to H × W, where H denotes the video frame height and W the video frame width. This step amounts to preprocessing and benefits subsequent detection; in this embodiment, H = 720, W = 1280.
Two, feature extraction: use the InceptionV3 model to extract features frame by frame from the video processed in step (1), obtaining the image features of each level of the video and forming the multi-level video feature map F' = {F'_i | i = 1, 2, …, numF}, where F'_i denotes the i-th layer image feature, numF the total number of extracted feature layers, F'_1 the bottom-level image feature, and F'_numF the top-level image feature; in this embodiment, numF = 4.
Bottom-level features carry accurate target position information and can regress detailed location data, but they characterize little semantic information, have a large data volume, and consume considerable space and time to process. Top-level features contain rich semantics, but after multiple levels of processing the target positions are relatively coarse and the regressed target semantics are not fine-grained, which easily causes misjudgment in group scenes. The features of each level thus have their own advantages and disadvantages. To extract accurate group person position information in a group scene, the InceptionV3 model is used to extract multi-level image features from the video and form the multi-level feature map. InceptionV3 is chosen in this step because this feature extraction model is not only functionally strong but also has powerful computing performance, which facilitates later processing.
Three, feature fusion: perform a feature fusion operation on the extracted multi-level video feature map F', comprising steps (3-1) to (3-3) executed in order:
(3-1) Add a fusion channel from F'_numF to F'_1 and perform top-down feature fusion on the multi-level feature map F', obtaining the top-down video feature map F^{top-down}. The fusion method is: starting from the top-level image feature F'_numF, traverse each layer image feature F'_i downward, applying to F'_i a convolution with kernel size conv1 and stride stride1 followed by an upSample1-times upsampling to obtain F_i^{top-down}; finally obtain F^{top-down} = {F_i^{top-down} | i = 1, 2, …, numF};
(3-2) Add a fusion channel from F_1^{top-down} to F_numF^{top-down} and perform bottom-up feature fusion on F^{top-down}, obtaining the bottom-up video feature map F^{bottom-up} = {F_i^{bottom-up} | i = 1, 2, …, numF}, where F_i^{bottom-up} denotes the i-th layer image feature of F^{bottom-up}. The fusion method is:
a. Initialize i = 1 and set F_1^{bottom-up} = F_1^{top-down};
b. Apply to F_i^{bottom-up} a convolution with kernel size conv2 and stride stride2, obtaining a result R_i, and compute F_{i+1}^{bottom-up} = R_i + F_{i+1}^{top-down};
c. Update i = i + 1;
d. Repeat steps b to c until i > numF; after the loop ends, obtain F^{bottom-up} = {F_i^{bottom-up} | i = 1, 2, …, numF};
(3-3) Apply to each layer image feature F_i^{bottom-up} of the bottom-up video feature map F^{bottom-up} a convolution with kernel size conv3 and stride stride3, denoting the result F_i; all obtained F_i constitute the multi-dimensional fusion feature map F = {F_i | i = 1, 2, …, numF}.
In step three, conv1 = 1, stride1 = 1, upSample1 = 2, conv2 = 3, stride2 = 2, conv3 = 1, stride3 = 1.
Multi-layer feature fusion is not a simple addition: one must first consider whether the sizes of the hierarchical features are consistent, and then whether the fusion is reasonable, since fusion can instead reduce detection performance. The present invention improves on existing feature fusion methods: in the top-down structure, each layer contains the feature information of the current and higher layers, so each layer's optimal scale can be used directly for detection. To reflect the mapping of all hierarchical features and achieve optimal detection, a bottom-up channel is specially added, which reversely connects the top-down processing results and makes more efficient use of bottom-level position information; finally a convolution is applied to each fusion result to eliminate the aliasing effect of upsampling.
Four, region proposal network training:
The region proposal network is a common target detection network whose main functional modules are shown in Fig. 2. It first generates k rectangular windows for each pixel of the sliding window to adapt to targets of different sizes, then feeds the position of each rectangular window and the corresponding image features into the network, and performs classification-layer and regression-layer operations for each rectangular window. The classification layer mainly estimates the probability that a person exists in the current rectangular window; its parameters include the person weight parameter W_P and the background interference parameter W_E. The regression layer mainly obtains the coordinates of the current rectangular window in the full-scale image; its parameters include the window coordinate and width-height offset weight parameters W_x, W_y, W_h, and W_w. Throughout the training of the region proposal network, the settings and adjustment of all parameters are shared.
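The k rectangular windows per sliding position correspond to the anchor boxes of a standard region proposal network. A sketch of anchor generation for one position, assuming the usual scale/aspect-ratio parametrization (the concrete scales and ratios below are illustrative, not taken from the patent):

```python
import numpy as np

def make_anchors(cx, cy, scales=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    """Generate k = len(scales) * len(ratios) anchors (cx, cy, w, h) for one
    sliding-window position. Each anchor keeps area scale**2 while varying
    its aspect ratio w/h = ratio."""
    boxes = []
    for s in scales:
        for r in ratios:
            w = s * np.sqrt(r)
            h = s / np.sqrt(r)
            boxes.append((cx, cy, w, h))
    return boxes

anchors = make_anchors(100, 100)
assert len(anchors) == 9  # k = 3 scales x 3 ratios
```

Each anchor is then scored by the classification layer and adjusted by the regression layer's offset parameters.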
The training process of the region proposal network is as follows:
(4-1) Input the multi-dimensional fusion feature map F into the region proposal network and output K detected targets, here K = 12, thereby obtaining the target position set Box = {Box_j | j = 1, 2, …, 12} and the corresponding person probability set Person = {Person_j | j = 1, 2, …, 12}, where Box_j denotes the position of the j-th detected target and Person_j the probability that the j-th detected target is a person, Person_j ∈ [0, 1]; the larger the value of Person_j, the more likely the detected target is a person;
(4-2) Classify the detected targets according to Person. Let the true class labels of the 12 detected targets be PPerson = {PPerson_j | j = 1, 2, …, 12}, and compute the group person classification loss function Loss_cls = −(1/12) Σ_{j=1}^{12} [PPerson_j·log(Person_j) + (1 − PPerson_j)·log(1 − Person_j)], where PPerson_j denotes the true class of the j-th detected target and takes the value 0 or 1: PPerson_j = 0 indicates the detected target is not a person, and PPerson_j = 1 indicates it is a person;
(4-3) Regress the target positions according to Box and Person. Let the true positions of the 12 detected targets be BBox = {BBox_j | j = 1, 2, …, 12}, and compute the group person position loss function Loss_loc from the predicted positions Box and the true positions BBox, where BBox_j denotes the true position of the j-th detected target;
(4-4) Compute the group person localization loss value Loss = Loss_cls + λ·Loss_loc. If Loss ≤ Loss_max, the region proposal network is trained: output the region proposal network parameters and execute step (8). If Loss > Loss_max, update each layer parameter θ of the region proposal network as θ ← θ − α·∂Loss/∂θ, then return to step (4) and perform person detection again. Loss_max is the preset maximum loss value for crowd localization, λ is the balance factor between the position-regression and person-classification tasks, α is the learning rate of stochastic gradient descent, and ∂Loss/∂θ denotes the partial derivative of the group person localization loss function; in this embodiment, Loss_max = 0.5, λ = 1, α = 0.0001.
Five, detecting the video to be detected with the trained region proposal network:
Acquire a new video to be detected; apply size normalization, feature extraction, and feature fusion to it in turn, obtaining its multi-dimensional fusion feature map F_new; input F_new into the region proposal network trained in step (7) to obtain the group person localization and detection results in the new video. Using the region proposal network for target detection takes into account that group scenes contain many persons and complex tasks: position regression and category classification are performed in parallel, improving detection efficiency. During category classification, because the detection target is clearly a person, only two classes are used, person and non-person, reducing the time wasted on detecting other categories, and the true classification results are incorporated to improve classification accuracy. During position regression, to simplify the computation, only person-class target positions are regressed, refining the regression task. During overall training, a task balance factor is added; according to the scene type, the optimal task ratio is adjusted to complete video group person localization and detection.
Six, experimental simulation
To test the performance of the method, the currently common object detection methods Faster-RCNN, FPN, and Mask-RCNN are selected for comparison; the evaluation criterion is detection accuracy under different IoU thresholds and different target sizes. IoU is the intersection-over-union ratio of the detection result and the ground truth, IoU ∈ [0, 1]; the higher the IoU value, the closer the detection result is to the ground truth. In testing, IoU ≥ 0.5 is denoted AP_50 and IoU ≥ 0.75 is denoted AP_75. For evaluation, targets are divided into three size categories, small, medium, and large, denoted AP_S, AP_M, and AP_L respectively. Fig. 3 gives the detection accuracy comparison of the present invention with the comparison methods Faster-RCNN, FPN, and Mask-RCNN. The experimental results show that, compared with Faster-RCNN, which uses only single top-level features, the three methods using multi-level fusion features achieve higher detection accuracy, illustrating that multi-level fusion features have stronger representational power than single top-level features. FPN and Mask-RCNN use only a one-way structure for fusion during feature processing, whereas the present invention uses a bidirectional processing channel and obtains more accurate detection; the experimental results also show that the present method achieves better detection accuracy under different IoU thresholds and target sizes.
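IoU as defined above is straightforward to compute for axis-aligned boxes; a minimal sketch, with boxes given by their corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)          # intersection corners
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

assert iou((0, 0, 2, 2), (0, 0, 2, 2)) == 1.0        # identical boxes
assert abs(iou((0, 0, 2, 2), (1, 1, 3, 3)) - 1 / 7) < 1e-12
```

A detection counts toward AP_50 when its IoU with a ground-truth box is at least 0.5, and toward AP_75 at 0.75.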
The above is only a preferred embodiment of the present invention. It should be pointed out that those of ordinary skill in the art can make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (5)

1. a kind of video group personage's position finding and detection method based on multidimensional fusion feature, which is characterized in that executed including sequence The step of (1) to (8):
(1) video as training sample is inputted, the kind of object and position in video are returned it is known that carrying out size frame by frame to video The size of each frame video frame is uniformly scaled H × W size by one processing, and H indicates video frame height, and W indicates that video frame is wide Degree;
(2) video is obtained frame by frame to video carries out feature extraction by step (1) treated using InceptionV3 model The characteristics of image of each level forms multi-layer video features figure F ', F '={ Fi' | i=1,2 ..., numF }, Fi' indicate i-th Tomographic image feature, numF indicate the total number of plies of video image characteristic extracted, F1' indicate underlying image feature, F 'numFIndicate top Tomographic image feature;
(3) the multi-layer video features figure F ' carry out Fusion Features operation to being drawn into, includes the steps that successively executing (3-1) extremely (3-4):
(3-1) increases by one from F 'numFTo F1' fusion channel, it is downward from top-level feature to multi-layer video features figure F ' progress Fusion Features, obtain top-down video features figure Ftop-down;The method of Fusion Features are as follows: since top layer images feature F′numFStart, traverses each tomographic image feature F downwardsi', to Fi' carry out convolution kernel successively as conv1, step-length stride1's Convolution operation and upSample1Up-sampling operation again, obtainsFinally obtain Ftop-down={ Fi top-down| i=1, 2 ..., numF };
(3-2) increases by one from F1 top-downIt arrivesFusion channel, to Ftop-downCarry out the spy upward from low-level image feature Sign fusion, obtains bottom-up video features figure Fbottom-up, Fbottom-up={ Fi bottom-up| i=1,2 ..., numF }, Fi bottom-upIndicate bottom-up video features figure Fbottom-upThe i-th tomographic image feature;The method of Fusion Features are as follows:
A. i=1 is initialized;
B. F is calculatedi bottom-up=Fi top-down, to Fi bottom-upProgress convolution kernel is conv2, step-length stride2Convolution behaviour Make, obtains resultIt calculates
C. i=i+1 is updated;
D. circulation executes step b to c, until i > numF is obtained after circulation terminates:
Fbottom-up={ Fi bottom-up| i=1,2 ..., numF }
(3-3) is to bottom-up video features figure Fbottom-upIn each tomographic image feature Fi bottom-upCarrying out convolution kernel is conv3, step-length stride3Convolution operation, obtained result is denoted as Fi, obtained all FiConstitute multidimensional fusion feature figure F, F={ Fi| i=1,2 ..., numF };
(4) by multidimensional fusion feature figure F input area candidate network, K detection target is exported, obtains target position set Box ={ Boxj| j=1,2 ..., K } and corresponding personage's Making by Probability Sets Person={ Personj| j=1,2 ..., K }, the Boxj Indicate the position of j-th of detection target, PersonjIndicate that j-th of detection target is the probability of personage, Personj∈ [0,1], PersonjThe bigger expression detection target of value be personage a possibility that it is bigger;
(5) classified according to Person to detection target, the real border frame position that K detection target is arranged is PPerson ={ PPersonj| j=1,2 ..., K }, calculate group personage classification loss function Losscls, calculation formula isWherein, PPersonjIndicate the true classification of j-th of detection target, PPersonj Value is 0 or 1, PPersonj=0 indicates that the detection target is not personage, PPersonj=1 indicates that the detection target is personage;
(6) according to Box and Person regressive object position, the actual position of K detection target is set are as follows:
BBox={ BBoxj| j=1,2 ..., K }
Calculate group's character positions loss function are as follows:
Wherein, BBoxjIndicate the actual position of j-th of detection target;
(7) group personage detection and localization penalty values Loss, calculation formula Loss=Loss are calculatedcls+λLosslocIf Loss≤ Lossmax, then region candidate network is trained finishes, output area candidate network parameter, executes step (8);If Loss > Lossmax, then each layer of update area candidate network of parameterThen return step (4), re-start Person detecting;LossmaxIt is preset crowd's detection and localization maximum loss value, λ is the balance of position recurrence and human classification task The factor, α are the learning rates of stochastic gradient descent method,Indicate the partial derivative of group personage detection and localization loss function;
(8) Reacquire a video to be detected, and apply normalization, feature extraction, and feature fusion to it in turn to obtain its multi-dimensional fusion feature map F_new; input F_new into the region proposal network trained in step (7) to obtain the group person localization detection result for the new video.
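The step-(8) inference chain can be expressed as a simple composition; every function name below is a hypothetical placeholder for the corresponding claim step, not an identifier from the patent:

```python
def detect_group_persons(video, rpn, normalize, extract_features, fuse_features):
    """Step-(8) inference chain. The four callables are hypothetical
    stand-ins for steps (1)-(4); none of their names come from the patent."""
    frames = normalize(video)            # step (1): scale frames to H x W
    levels = extract_features(frames)    # step (2): numF pyramid levels
    f_new = fuse_features(levels)        # step (3): fusion feature map F_new
    return rpn(f_new)                    # step (4): Box and Person sets

# Smoke test with trivial stand-ins: one detection at probability 0.9.
boxes, probs = detect_group_persons(
    video="frames",
    rpn=lambda f: ([[0, 0, 10, 10]], [0.9]),
    normalize=lambda v: v,
    extract_features=lambda f: [f],
    fuse_features=lambda ls: ls[0],
)
```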
2. The video group person localization detection method based on multi-dimensional fusion features according to claim 1, characterized in that in step (1), H = 720 and W = 1280.
3. The video group person localization detection method based on multi-dimensional fusion features according to claim 1, characterized in that in step (2), numF = 4.
4. The video group person localization detection method based on multi-dimensional fusion features according to claim 1, characterized in that in step (3), conv1 = 1, stride1 = 1, upSample1 = 2, conv2 = 3, stride2 = 2, conv3 = 1, stride3 = 1.
5. The video group person localization detection method based on multi-dimensional fusion features according to claim 1, characterized in that in step (4), K = 12, and in step (7), Loss_max = 0.5, λ = 1, α = 0.0001.
CN201910235608.5A 2019-03-26 2019-03-26 Video group figure positioning detection method based on multi-dimensional fusion features Active CN109903339B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910235608.5A CN109903339B (en) 2019-03-26 2019-03-26 Video group figure positioning detection method based on multi-dimensional fusion features

Publications (2)

Publication Number Publication Date
CN109903339A true CN109903339A (en) 2019-06-18
CN109903339B CN109903339B (en) 2021-03-05

Family

ID=66953909

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910235608.5A Active CN109903339B (en) 2019-03-26 2019-03-26 Video group figure positioning detection method based on multi-dimensional fusion features

Country Status (1)

Country Link
CN (1) CN109903339B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140307917A1 (en) * 2013-04-12 2014-10-16 Toyota Motor Engineering & Manufacturing North America, Inc. Robust feature fusion for multi-view object tracking
CN107341471A * 2017-07-04 2017-11-10 Nanjing University of Posts and Telecommunications Human behavior recognition method based on double-layer conditional random fields
CN108038867A * 2017-12-22 2018-05-15 Hunan Yuanxin Optoelectronic Technology Co., Ltd. Fire detection and localization method based on multi-feature fusion and stereo vision
CN108229319A * 2017-11-29 2018-06-29 Nanjing University Ship video detection method based on fusion of frame differencing and convolutional neural networks
CN108399435A * 2018-03-21 2018-08-14 Nanjing University of Posts and Telecommunications Video classification method based on audio features
CN108846446A * 2018-07-04 2018-11-20 Academy of Broadcasting Science, SAPPRFT Object detection method based on multi-path dense feature fusion with fully convolutional networks
CN108898078A * 2018-06-15 2018-11-27 University of Shanghai for Science and Technology Multi-scale deconvolution neural network method for real-time traffic sign detection and recognition
CN109472298A * 2018-10-19 2019-03-15 Tianjin University Deep binary feature pyramid enhancement network for small-scale object detection
CN109508686A * 2018-11-26 2019-03-22 Nanjing University of Posts and Telecommunications Human behavior recognition method based on hierarchical feature subspace learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TAN FEIGANG等: "Person Re-Identification Based on Multi-Level and Multi-Feature Fusion", 《2017 INTERNATIONAL CONFERENCE ON SMART CITY AND SYSTEMS ENGINEERING (ICSCSE)》 *
LI, HE: "Research on Tracking Algorithm Based on Convolutional Neural Network Feature Sharing and Object Detection", China Master's Theses Full-text Database *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110675391A * 2019-09-27 2020-01-10 Lenovo (Beijing) Co., Ltd. Image processing method, apparatus, computing device, and medium
CN110675391B * 2019-09-27 2022-11-18 Lenovo (Beijing) Co., Ltd. Image processing method, apparatus, computing device, and medium
CN111488834A * 2020-04-13 2020-08-04 Henan Polytechnic University Crowd counting method based on multi-level feature fusion
CN111488834B * 2020-04-13 2023-07-04 Henan Polytechnic University Crowd counting method based on multi-level feature fusion
CN111491180A * 2020-06-24 2020-08-04 Tencent Technology (Shenzhen) Co., Ltd. Method and device for determining key frames
CN113610056A * 2021-08-31 2021-11-05 Dilu Technology Co., Ltd. Obstacle detection method and device, electronic device, and storage medium
CN113610056B * 2021-08-31 2024-06-07 Dilu Technology Co., Ltd. Obstacle detection method and device, electronic device, and storage medium
CN114255384A * 2021-12-14 2022-03-29 Guangdong Bright Dream Robotics Co., Ltd. Method and device for detecting the number of people, electronic device, and storage medium
CN114494999A * 2022-01-18 2022-05-13 Southwest Jiaotong University Dual-branch joint dense object prediction method and system
CN114494999B * 2022-01-18 2022-11-15 Southwest Jiaotong University Dual-branch joint dense object prediction method and system

Also Published As

Publication number Publication date
CN109903339B (en) 2021-03-05

Similar Documents

Publication Publication Date Title
CN109903339A (en) A kind of video group personage's position finding and detection method based on multidimensional fusion feature
Jia et al. Detection and segmentation of overlapped fruits based on optimized mask R-CNN application in apple harvesting robot
CN110598029B (en) Fine-grained image classification method based on attention transfer mechanism
CN111898406B (en) Face detection method based on focus loss and multitask cascade
CN110472627A (en) One kind SAR image recognition methods end to end, device and storage medium
CN108805070A (en) A kind of deep learning pedestrian detection method based on built-in terminal
CN110458844A (en) A kind of semantic segmentation method of low illumination scene
CN110084173A (en) Number of people detection method and device
CN110147743A (en) Real-time online pedestrian analysis and number system and method under a kind of complex scene
CN109961034A (en) Video object detection method based on convolution gating cycle neural unit
CN109583425A (en) A kind of integrated recognition methods of the remote sensing images ship based on deep learning
CN107818302A (en) Non-rigid multi-scale object detection method based on convolutional neural network
CN108010049A (en) Split the method in human hand region in stop-motion animation using full convolutional neural networks
CN106951867A (en) Face identification method, device, system and equipment based on convolutional neural networks
CN109559300A (en) Image processing method, electronic equipment and computer readable storage medium
CN108961675A (en) Fall detection method based on convolutional neural networks
CN106649487A (en) Image retrieval method based on interest target
CN110490177A (en) A kind of human-face detector training method and device
CN109508360A (en) A kind of polynary flow data space-time autocorrelation analysis method of geography based on cellular automata
CN107945153A (en) A kind of road surface crack detection method based on deep learning
CN108447080A (en) Method for tracking target, system and storage medium based on individual-layer data association and convolutional neural networks
CN106599800A (en) Face micro-expression recognition method based on deep learning
CN109635812B (en) The example dividing method and device of image
CN109902558A (en) A kind of human health deep learning prediction technique based on CNN-LSTM
CN107730515A (en) Panoramic picture conspicuousness detection method with eye movement model is increased based on region

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant