
CN110428447B - Target tracking method and system based on strategy gradient - Google Patents

Target tracking method and system based on strategy gradient

Info

Publication number
CN110428447B
CN110428447B
Authority
CN
China
Prior art keywords
target
response
tracking
template
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910638477.5A
Other languages
Chinese (zh)
Other versions
CN110428447A (en)
Inventor
殷海兵
王康豪
黄晓峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN201910638477.5A priority Critical patent/CN110428447B/en
Publication of CN110428447A publication Critical patent/CN110428447A/en
Application granted granted Critical
Publication of CN110428447B publication Critical patent/CN110428447B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/269 Analysis of motion using gradient-based methods
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target tracking method and system based on a strategy gradient, and belongs to the field of computer vision. The method comprises the following steps: (1) inputting the target image into a convolutional neural network to obtain a target appearance template Z; (2) inputting the search image into a convolutional neural network to obtain a search area feature map; (3) computing a response map ht from the template image Z and the search area feature map through a similarity measurement function f; (4) inputting the response map ht obtained in step (3) together with a historical response map hi (i = 1, ..., N) into a policy network, and adding the action with the highest score to the set Ct; (5) repeating (4) until every historical response map in the response map template pool has been traversed, and finally executing the action that occurs most frequently in the set Ct. The system includes a tracker and a decision maker. Erroneous template updates are avoided, and a lost target can be re-detected in time.

Description

Target tracking method and system based on strategy gradient
Technical Field
The invention relates to the field of computer vision, and in particular to a target tracking method and system based on a strategy gradient (policy gradient).
Background
Visual Object Tracking (VOT) is one of the most challenging problems in the field of computer vision. It has wide application in video surveillance, human-computer interaction and autonomous driving. Despite significant advances in VOT technology over the last decades, it still faces serious challenges: severe occlusion, drastic illumination changes, deformation and the like may cause tracking failures.
Visual target tracking algorithms can be broadly classified into two categories: generative methods and discriminative methods. A generative method constructs a model from the target region of the current frame and then searches the next frame for the region most similar to that model; well-known examples include Kalman filtering, particle filtering and mean shift. A discriminative method, also known as tracking-by-detection, learns a discriminative model to distinguish the target region from the surrounding background. The difference between the two is that tracking-by-detection trains a classifier with machine learning and uses background information during training. The classifier can therefore focus on separating foreground from background, so discriminative methods generally outperform generative methods.
Among discriminative methods, those based on the Discriminative Correlation Filter (DCF) are known for their high efficiency and accuracy. By exploiting the discrete Fourier transform and cyclic shifts of the training samples, the DCF-based KCF tracker can run at 292 fps on a single CPU, far exceeding real-time requirements. In recent years, DCF research has progressed through multi-channel features, scale estimation and the mitigation of boundary effects. However, as the accuracy of DCF-style trackers increases, their speed drops sharply.
In recent years, tracking algorithms based on Convolutional Neural Networks (CNN) have attracted attention for their excellent performance. Unlike conventional tracking algorithms, CNN-based algorithms use deep convolutional features rather than hand-crafted features, which lets them show superior results on multiple tracking benchmarks. Although these CNN-based trackers perform well, they either use a simple online update strategy or never update the initial appearance template, relying only on the strong representational power of the trained CNN. This may be effective for interference-free short-term tracking. However, once severe occlusion or a significant appearance change occurs, the tracker drifts into the background and loses the target. These methods also lack an effective means to re-detect the target after it is lost.
Therefore, the invention provides a strategy gradient-based target tracking algorithm, which learns an effective policy through the policy gradient algorithm in reinforcement learning to recognize unreliable tracking results, and takes measures to prevent erroneous template updates and to re-detect lost targets.
The closest prior art:
[1]Tracking-Learning-Detection
[2]Long-term correlation tracking
[3]Large Margin Object Tracking with Circulant Feature Maps
[4]Reliable Re-detection for Long-term Tracking
[5]Tracking as Online Decision-Making:Learning a Policy from Streaming Videos with Reinforcement Learning
In the above target tracking technologies, methods [1-4] solve template updating and re-detection with manually designed strategies: such methods generally compute a tracking confidence with a fixed mathematical formula and update the tracking model only when the confidence is high. However, because of the fixed-parameter formula, this kind of method has limitations and cannot adapt well to different tracking sequences. Method [5] learns a policy through the Q-learning algorithm to decide when to update the target appearance template and whether to search the entire image globally. However, it uses a single response map to represent the state and does not consider the response diversity of different tracking sequences, so it cannot accurately assess the tracking result to make reliable decisions. In addition, the global search severely impacts the speed of the algorithm.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a strategy gradient-based target tracking method which can accurately identify unreliable tracking results and decide when to update the appearance template and whether to re-detect, so as to avoid erroneous template updates and to re-detect the target in time when it is lost.
A strategy gradient-based target tracking method comprises the following steps:
(1) inputting the target image into a convolutional neural network to obtain a target appearance template Z;
(2) inputting the search image into a convolutional neural network to obtain a search area feature map;
(3) comparing the template image Z with candidate areas of the same size in the search area feature map, and computing a response map ht through a similarity measurement function f; the similarity measurement function f is a Siamese network:
f(Z, X) = φ(Z) ⋆ φ(X) + b    (1)
where φ is a convolutional embedding function, ⋆ denotes the cross-correlation of the two feature maps, and b is an offset. After the target appearance template Z and the search area feature map are input, the function f generates the response map ht;
(4) inputting the response map ht obtained in step (3) together with a historical response map hi into the policy network to obtain a regularized score for each decision action; the historical response maps hi (i = 1, ..., N) come from a response map template pool in which N historical response maps are stored, each corresponding to a recent good tracking result; the action with the highest score is then added to the set Ct;
(5) repeating step (4) until every historical response map in the response map template pool has been traversed; finally, executing the action that occurs most frequently in the set Ct.
Further, the policy network involves a state st, an action a, a learning policy π and a reward Rt. The state st is represented as a tuple (hi, ht); the action a includes updating, tracking and re-detection; the reward Rt is given according to the overlap ratio between the current bounding box and the target; the learning policy π optimizes the deep policy network by gradient descent:
Δθ = α ∇θ log πθ(at | st) Rτ    (2)
where Rτ represents the return of the whole episode. During training, action samples are drawn from the policy network, and the reward Rt is then given by evaluating the selected action; the policy is optimized by updating the parameters with this reward information so as to maximize the expected reward, resulting in a trained policy network.
Further, during training the tracking of one frame is regarded as one episode, and the back-propagation algorithm is executed using formula (2); the reward function is defined as follows:
[Reward function equations, defined in terms of the IOU between the predicted box b and the ground-truth box g.]
where the Intersection-over-Union (IOU) represents the overlap ratio between the predicted box b and the ground-truth box g;
the strategy network consists of two 516-dimensional full-connection layers and an output layer, wherein the output layer outputs three actions of updating, tracking and re-detecting, and each full-connection layer is initialized randomly and is subjected to ReLU and batch regularization processing; the whole algorithm trains 200 cycles on an object tracking reference (OTB) data set, and each cycle is finished after an agent interacts with all training images; for each cycle, after 8192 samples are collected, the policy network starts learning.
Furthermore, after each learning step the updated policy network continues sampling for the next learning step; over the whole training process the learning rate decays from 10^-6 to 10^-8, and a batch size of 64 is used.
Further, if the most frequent action in the set Ct is updating, the target position Pt is updated to Ptp, the predicted position of the current target; the response map ht is added to the response map template pool while one old response map is discarded from the pool, and the target appearance template Z is updated with the current target position information.
Further, if the most frequent action in the set Ct is tracking, the target position Pt is updated to Ptp, the predicted position of the current target.
Further, if the most frequent action in the set Ct is re-detection, a search area where the target is most likely to appear is obtained through a particle filter, the response map htc of that search area is computed and the predicted target position Ptc of the re-detection area is obtained; ht = htc and Ptp = Ptc are updated, and the result is then input into the policy network again for another decision.
Further, during re-detection the particle filter draws M candidate search regions where the target is most likely to appear; for each candidate search region the tracking network is reused to compute a response map, and the best candidate search region is then selected by the confidence score:
Ci=max(fi)·cos(γ||Pi-Pt||)
where fi is the response map of the i-th candidate search region, Pi and Pt are the center positions of the i-th candidate search region and of the target in the previous frame, and γ is a predefined distance penalty parameter.
Further, when the re-detection is performed twice or more in one frame, the re-detection result is discarded and the initial tracking result is used.
A strategy gradient-based target tracking system comprises a tracker and a decision maker. The tracker computes a response map ht from the target appearance template Z and the search area feature map through a similarity measurement function f; the similarity measurement function f is a Siamese network:
f(Z, X) = φ(Z) ⋆ φ(X) + b    (1)
where φ is a convolutional embedding function, ⋆ denotes the cross-correlation of the two feature maps, and b is an offset. After the target appearance template Z and the search area feature map are input, the function f generates the response map ht. The response map ht and a historical response map hi form the state st of the tracker, and the tracker selects an action a according to the policy π given by the decision maker.
The decision maker is a trained policy network; the state st of the tracker is input into the policy network to obtain a regularized score for each decision action. The historical response maps hi (i = 1, ..., N) come from a response map template pool in which N historical response maps are stored, each representing a recent good tracking result. The action with the highest score is then added to the set Ct; every historical response map in the template pool is traversed; finally, the action that occurs most frequently in the set Ct is executed as the decision result of the decision maker.
The strategy gradient-based target tracking technology provided by the invention learns a policy network through the policy gradient algorithm in reinforcement learning. The policy network can accurately identify unreliable tracking results; by executing the corresponding decision actions, erroneous template updates are avoided and a lost target can be re-detected in time. This effectively addresses difficulties in target tracking such as occlusion and deformation, greatly improves tracking accuracy and robustness, and maintains a high speed. Experiments show that the proposed method improves performance by 5-6% over the original tracking framework.
Drawings
FIG. 1 is a general framework diagram of a strategy gradient-based object tracking technique;
FIG. 2 is a block flow diagram of a policy gradient-based target tracking technique;
FIG. 3 shows the distance precision results on the OTB-50 benchmark dataset;
FIG. 4 shows the overlap success rate results on the OTB-50 benchmark dataset;
FIG. 5 shows the distance precision results on the OTB-100 benchmark dataset;
FIG. 6 shows the overlap success rate results on the OTB-100 benchmark dataset.
Detailed Description
The technical scheme of the invention is further explained below with reference to the accompanying drawings.
As shown in fig. 1 and 2:
(1) Based on the SiamFC tracking algorithm framework, the area where the target is located in the first frame of the video sequence is cropped and scaled to a fixed size to obtain the target image; the target image is input into a convolutional neural network to obtain the target appearance template Z, where the template image size is 127 × 127.
(2) Based on the SiamFC tracking framework, in frame t the corresponding area around the center position of the target in frame t-1 is taken as the search area, and a search image X of size 255 × 255 is obtained by cropping; the search image X is likewise scaled to a fixed size and input into the convolutional neural network to obtain the search area feature map.
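The cropping and scaling in steps (1) and (2) can be sketched as follows. This is an illustrative Python/OpenCV sketch under the sizes stated above (127 × 127 template, 255 × 255 search image); the edge-padding behaviour and the exact crop window around the previous target position are assumptions, not the patented implementation.

```python
import cv2
import numpy as np

def crop_and_resize(frame, center_xy, crop_size, out_size):
    """Crop a square region of side crop_size centered at center_xy and
    resize it to out_size x out_size. Borders are padded by replicating
    edge pixels (an assumption; SiamFC-style trackers often pad with the
    image mean instead)."""
    cx, cy = center_xy
    half = crop_size / 2.0
    x1, y1 = int(round(cx - half)), int(round(cy - half))
    x2, y2 = x1 + crop_size, y1 + crop_size

    h, w = frame.shape[:2]
    pad_l, pad_t = max(0, -x1), max(0, -y1)
    pad_r, pad_b = max(0, x2 - w), max(0, y2 - h)
    padded = cv2.copyMakeBorder(frame, pad_t, pad_b, pad_l, pad_r,
                                cv2.BORDER_REPLICATE)
    patch = padded[y1 + pad_t:y2 + pad_t, x1 + pad_l:x2 + pad_l]
    return cv2.resize(patch, (out_size, out_size))

# Target image from frame 1 and search image around the previous center in frame t:
# target_img = crop_and_resize(frame1, first_center, crop_size=127, out_size=127)
# search_img = crop_and_resize(frame_t, prev_center, crop_size=255, out_size=255)
```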
(3) The template image Z is compared with candidate regions of the same size in the search area feature map. If two image patches describe the same target, the similarity measurement function f returns a high score; in fact, the similarity measurement function f is a deep Siamese network:
f(Z, X) = φ(Z) ⋆ φ(X) + b    (1)
where φ is a convolutional embedding function, ⋆ denotes the cross-correlation of the two feature maps, and b is an offset. After the target appearance template Z and the search area feature map are input, the function f generates a 33 × 33 response map ht.
The structure is fully convolutional with respect to the search image X: the target appearance template Z from step (1) is used as a convolution kernel and convolved with the search area feature map from step (2) to obtain the response map ht, and the position of the maximum value in the response map indicates the center position of the target to be tracked. The position of the target can thus be preliminarily predicted from the response map ht, and the similarity function for all translated sub-windows of the search image is computed in a single evaluation.
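As a minimal sketch of step (3), the cross-correlation of the template embedding with the search-area embedding can be computed with a 2-D convolution, using the template features as the kernel; the backbone producing the embeddings is not shown, and the example shapes below are illustrative (the description above yields a 33 × 33 response map ht).

```python
import torch
import torch.nn.functional as F

def response_map(template_feat, search_feat, bias=0.0):
    """Cross-correlation of formula (1): f(Z, X) = phi(Z) * phi(X) + b.
    template_feat: (1, C, Hz, Wz) embedding of the target template Z
    search_feat:   (1, C, Hx, Wx) embedding of the search image X"""
    # Treating the template embedding as a convolution kernel over the
    # search embedding implements the cross-correlation.
    return F.conv2d(search_feat, template_feat) + bias

# Illustrative shapes: a 6x6 template embedding over a 22x22 search embedding
# gives a 17x17 map; the patent's backbone produces a 33x33 response map.
z = torch.randn(1, 256, 6, 6)
x = torch.randn(1, 256, 22, 22)
ht = response_map(z, x)   # shape (1, 1, 17, 17)
```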
(4) The response map ht obtained in step (3) and a historical response map hi are input into the policy network together to obtain a regularized score for each decision action. The historical response maps hi (i = 1, ..., N) come from a response map template pool in which N historical response maps are stored, each corresponding to a recent good tracking result. The action with the highest score is then added to the set Ct for the subsequent reliable decision.
The reinforcement learning problem of the policy network can be viewed as a Markov Decision Process (MDP) in which an agent interacts with the environment through states, actions and rewards. In the tracking problem, the tracker is treated as the agent. Given a state st, the agent selects an action a according to the policy π. After performing this action, the agent receives a positive or negative reward Rt based on the IOU overlap ratio between the current bounding box and the target. By maximizing the expected reward, the agent learns an optimal policy for taking actions.
The policy gradient algorithm learns the policy π and optimizes the deep policy network by gradient descent:
Δθ = α ∇θ log πθ(at | st) Rτ    (2)
where Rτ represents the return of the whole episode. During training, action samples are drawn from the policy network and a reward is then given by evaluating the selected action. With this reward information, the policy can be optimized by updating the parameters to maximize the expected reward.
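A minimal REINFORCE-style sketch of the update in formula (2), assuming a PyTorch policy network that outputs action probabilities; the optimizer choice and the way the episode return Rτ is obtained are assumptions for illustration.

```python
import torch

def policy_gradient_step(policy_net, optimizer, state, action, episode_return):
    """One step of formula (2): delta_theta = alpha * grad_theta log pi_theta(a_t|s_t) * R_tau.
    Minimising -log pi(a|s) * R with a gradient-descent optimizer is
    equivalent to ascending the policy-gradient objective."""
    probs = policy_net(state)                      # (1, num_actions) softmax output
    log_prob = torch.log(probs[0, action] + 1e-8)  # log pi_theta(a_t | s_t)
    loss = -log_prob * episode_return              # weighted by the return R_tau
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```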
The selectable actions are tracking, updating and re-detection. The updating and tracking actions determine whether the tracker uses the predicted position information to update the appearance template of the target. When the re-detection action is executed, the re-detection module uses a particle filter to draw, around the previous target position, the M candidate search regions where the target is most likely to appear. For each candidate search region, the tracking network is reused to compute a response map, and the best candidate search region is then selected by the confidence score:
Ci = max(fi) · cos(γ||Pi - Pt||)    (3)
where fi is the response map of the i-th candidate search region, Pi and Pt are the center positions of the i-th candidate search region and of the target in the previous frame, and γ is a predefined distance penalty parameter. Similarly, the position of the re-detected target is determined by the maximum value of the response map of the best candidate search region. Finally, the response map of the best candidate search region is input into the policy network again to check the reliability of the re-detection result. When re-detection is performed twice or more within one frame, the re-detection result is discarded and the initial tracking result is used.
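The candidate scoring of formula (3) can be sketched as follows; the particle-filter sampling itself is not shown, and the value of the distance penalty γ is a tunable assumption.

```python
import numpy as np

def select_best_candidate(response_maps, candidate_centers, prev_center, gamma=0.01):
    """Score each candidate region with C_i = max(f_i) * cos(gamma * ||P_i - P_t||)
    and return the index of the best one.
    response_maps:     list of 2-D arrays f_i, one per candidate region
    candidate_centers: list of (x, y) centers P_i
    prev_center:       (x, y) target center P_t in the previous frame"""
    prev = np.asarray(prev_center, dtype=float)
    scores = []
    for f_i, p_i in zip(response_maps, candidate_centers):
        dist = np.linalg.norm(np.asarray(p_i, dtype=float) - prev)
        scores.append(np.max(f_i) * np.cos(gamma * dist))
    return int(np.argmax(scores))
```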
The state st can be represented as a tuple (hi, ht), where hi is the historical response map of a good tracking result and ht is the response map of the current frame. Previous approaches typically use only a single ht to describe the state st. However, because of the uncertainty of the tracking problem, the confidence of a response map may fluctuate across different sequences: a response map may indicate a failed tracking result in video A while the same map indicates a successful tracking result in another video B that contains more challenging factors. The invention therefore combines the current response map ht with the historical response maps hi to evaluate the reliability of the tracking result. In a sense, the policy network can be viewed as a similarity metric function that produces the similarity between hi and ht, so as to judge whether the current tracking result is good or bad and to perform further actions.
During training, the tracking of one frame is regarded as one episode and the back-propagation algorithm is executed using formula (2); the reward function is defined as follows:
[Reward function equations, defined in terms of the IOU between the predicted box b and the ground-truth box g.]
where the Intersection-over-Union (IOU) represents the overlap ratio between the predicted box b and the ground-truth box g and reflects the credibility of the tracking result of the given frame.
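The reward is defined through the IOU of the predicted box b and the ground-truth box g. The exact reward equations are not reproduced above, so the thresholded reward below is purely an illustrative assumption; the IOU computation itself is standard.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x, y, w, h)."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    iw = max(0.0, min(ax1 + aw, bx1 + bw) - max(ax1, bx1))
    ih = max(0.0, min(ay1 + ah, by1 + bh) - max(ay1, by1))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def reward(pred_box, gt_box, threshold=0.5):
    """Illustrative thresholded reward (assumption): positive when the
    predicted box overlaps the ground truth sufficiently, negative otherwise."""
    return 1.0 if iou(pred_box, gt_box) >= threshold else -1.0
```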
The policy network consists of two 516-dimensional fully connected layers and one output layer that outputs the 3 actions. Each fully connected layer is randomly initialized and followed by ReLU and batch normalization. The entire algorithm is trained on the Object Tracking Benchmark (OTB) dataset for 200 epochs, each epoch ending after the agent has interacted with all training images. For each epoch, the policy network starts learning after 8192 samples have been collected. After each learning step, the updated policy network continues sampling for the next learning step. Over the whole training process the learning rate decays from 10^-6 to 10^-8, and a batch size of 64 is used.
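A sketch of the policy network described above: two 516-dimensional fully connected layers with ReLU and batch normalization, and a 3-way output (update, track, re-detect). The input dimensionality, two flattened 33 × 33 response maps for the state (hi, ht), is an assumption.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Two 516-d fully connected layers followed by a 3-action output layer."""
    def __init__(self, response_size=33, num_actions=3):
        super().__init__()
        in_dim = 2 * response_size * response_size   # flattened (h_i, h_t) pair
        self.fc = nn.Sequential(
            nn.Linear(in_dim, 516), nn.ReLU(), nn.BatchNorm1d(516),
            nn.Linear(516, 516), nn.ReLU(), nn.BatchNorm1d(516),
            nn.Linear(516, num_actions),
        )

    def forward(self, h_i, h_t):
        # h_i, h_t: (B, 33, 33) historical and current response maps
        state = torch.cat([h_i.flatten(1), h_t.flatten(1)], dim=1)
        return torch.softmax(self.fc(state), dim=1)  # regularized action scores

# optimizer = torch.optim.Adam(PolicyNet().parameters(), lr=1e-6)  # decayed to 1e-8
```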
(5) Step (4) is repeated until every historical response map in the response map template pool has been traversed, and finally the action that occurs most frequently in the set Ct is executed, as sketched below. If the decision is the updating action, the target position and the appearance template are updated according to the prediction result; if the decision is the tracking action, the position is updated according to the prediction result but the appearance template is not; if the decision is the re-detection action, neither the target position nor the appearance template is updated according to the prediction result, and the re-detection module is used to search for the lost target.
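The decision procedure of steps (4) and (5), i.e. scoring the current response map against every historical response map in the template pool and acting on the majority vote, can be sketched as follows; the PolicyNet above is assumed, and the action bookkeeping in the trailing comments is only a summary of the behaviour described in the text.

```python
from collections import Counter
import torch

ACTIONS = ["update", "track", "redetect"]

def decide(policy_net, template_pool, h_t):
    """Traverse the response-map template pool, pick the highest-scoring
    action for each (h_i, h_t) pair, and return the majority action."""
    policy_net.eval()
    votes = []
    with torch.no_grad():
        for h_i in template_pool:                       # N historical response maps
            scores = policy_net(h_i.unsqueeze(0), h_t.unsqueeze(0))
            votes.append(ACTIONS[int(scores.argmax(dim=1))])
    return Counter(votes).most_common(1)[0][0]

# action = decide(policy_net, pool, h_t)
# if action == "update":   update position, refresh template Z, replace an old map in the pool
# elif action == "track":  update position only
# else:                    keep position and template, run the particle-filter re-detection
```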
The OTB-50 and OTB-100 benchmark datasets were processed with the above tracking method; the results, shown in FIGS. 3 to 6, demonstrate that the tracking method and system improve performance by 5-6%.
A strategy gradient-based target tracking system comprises a tracker and a decision maker. The tracker computes a response map ht from the target appearance template Z and the search area feature map through a similarity measurement function f; the similarity measurement function f is a Siamese network:
f(Z, X) = φ(Z) ⋆ φ(X) + b    (1)
where φ is a convolutional embedding function, ⋆ denotes the cross-correlation of the two feature maps, and b is an offset. After the target appearance template Z and the search area feature map are input, the function f generates the response map ht. The response map ht and a historical response map hi form the state st of the tracker, and the tracker selects an action a according to the policy π given by the decision maker.
The decision maker is a trained policy network; the state st of the tracker is input into the policy network to obtain a regularized score for each decision action. The historical response maps hi (i = 1, ..., N) come from a response map template pool in which N historical response maps are stored, each corresponding to a recent good tracking result. The action with the highest score is then added to the set Ct; every historical response map in the template pool is traversed; finally, the action that occurs most frequently in the set Ct is executed as the decision result of the decision maker.

Claims (10)

1. A strategy gradient-based target tracking method is characterized by comprising the following steps:
(1) inputting the target image into a convolutional neural network to obtain a target appearance template Z;
(2) inputting the search image into a convolutional neural network to obtain a search area feature map;
(3) comparing the template image Z with candidate areas of the same size in the search area feature map, and computing a response map ht through a similarity measurement function f; the similarity measurement function f is a Siamese network:
f(Z, X) = φ(Z) ⋆ φ(X) + b    (1)
where φ is a convolutional embedding function, ⋆ denotes the cross-correlation of the two feature maps, and b is an offset; after the target appearance template Z and the search area feature map are input, the function f generates the response map ht;
(4) inputting the response map ht obtained in step (3) together with a historical response map hi into the policy network to obtain a regularized score for each decision action; the historical response maps hi (i = 1, ..., N) come from a response map template pool in which N historical response maps are stored, each corresponding to a recent good tracking result; the action with the highest score is then added to the set Ct;
(5) repeating step (4) until every historical response map in the response map template pool has been traversed; finally, executing the action that occurs most frequently in the set Ct.
2. The method of claim 1, wherein the policy network involves a state st, an action a, a learning policy π and a reward Rt; the state st is represented as a tuple (hi, ht); the action a includes updating, tracking and re-detection; the reward Rt is given according to the overlap ratio between the current bounding box and the target; the policy network is optimized by a gradient descent algorithm:
Δθ = α ∇θ log πθ(at | st) Rτ    (2)
where Rτ represents the return of the whole episode; during training, action samples are drawn from the policy network, the reward Rt is then given by evaluating the selected action, and the policy is optimized by updating the parameters with this reward information so as to maximize the expected reward, resulting in a trained policy network.
3. The method according to claim 2, wherein during training the tracking of one frame is regarded as one episode, and the back-propagation algorithm is executed using formula (2), the reward function being defined as follows:
[Reward function equations, defined in terms of the IOU between the prediction box b and the ground-truth box g.]
where the Intersection-over-Union (IOU) represents the overlap ratio between the prediction box b and the ground-truth box g;
the policy network consists of two 516-dimensional fully connected layers and an output layer; the output layer outputs the three actions of updating, tracking and re-detection, and each fully connected layer is randomly initialized and followed by ReLU and batch normalization; the whole algorithm is trained for 200 epochs on the Object Tracking Benchmark (OTB) dataset, each epoch ending after the agent has interacted with all training images; for each epoch, the policy network starts learning after 8192 samples have been collected.
4. The method as claimed in claim 3, wherein after each learning step the updated policy network continues sampling for the next learning step, the learning rate decays from 10^-6 to 10^-8 over the whole training process, and a batch size of 64 is used.
5. The method of claim 1, wherein if the most frequent action in the set Ct is updating, the target position Pt is updated to Ptp, the predicted position of the current target; the response map ht is added to the response map template pool while one old response map is discarded from the pool, and the target appearance template Z is updated with the current target position information.
6. The method according to claim 1, wherein if the most frequent action in the set Ct is tracking, the target position Pt is updated to Ptp, the predicted position of the current target.
7. The method of claim 1, wherein if the most frequent action in the set Ct is re-detection, a search area where the target is most likely to appear is obtained through a particle filter, the response map htc of that search area is computed, the predicted target position Ptc of the re-detection area is obtained, ht = htc and Ptp = Ptc are updated, and the result is then input into the policy network again for another decision.
8. The method of claim 7, wherein during re-detection the particle filter draws the M candidate search regions where the target is most likely to appear, and for each candidate search region the tracking network is reused to compute a response map, and a best candidate search region is then selected by the confidence score:
Ci = max(fi) · cos(γ||Pi - Pt||)    (3)
where fi is the response map of the i-th candidate search region, Pi and Pt are the center positions of the i-th candidate search region and of the target in the previous frame, and γ is a predefined distance penalty parameter.
9. The method of claim 7, wherein when the re-detection is performed twice or more in a frame, the re-detection result is discarded and the initial tracking result is adopted.
10. A strategy gradient-based target tracking system, characterized by comprising a tracker and a decision maker; a target image is input into a convolutional neural network to obtain a target appearance template Z, a search image is input into the convolutional neural network to obtain a search area feature map, and the tracker computes a response map ht from the target appearance template Z and the search area feature map through a similarity measurement function f; the similarity measurement function f is a Siamese network:
f(Z, X) = φ(Z) ⋆ φ(X) + b    (1)
where φ is a convolutional embedding function, ⋆ denotes the cross-correlation of the two feature maps, and b is an offset; after the target appearance template Z and the search area feature map are input, the function f generates the response map ht; the response map ht and a historical response map hi form the state st of the tracker, and the tracker selects an action a according to the policy π given by the decision maker;
the decision maker is a trained strategy network and converts the state s of the trackertInputting into a policy network to obtain each decision actionMaking a regularization score; the historical response graphs hi (i is 1-N) come from a response graph template pool, N historical response graphs are stored in the response graph template pool, and each historical response graph corresponds to the latest good tracking result; then, the action with the highest score is selected and added to the set Ct (i ═ 1 to N); traversing each historical response map in the response map template pool; and finally, executing the action with the largest occurrence number in the set Ct (i is 1 to N) as a decision result of the decision maker.
CN201910638477.5A 2019-07-15 2019-07-15 Target tracking method and system based on strategy gradient Active CN110428447B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910638477.5A CN110428447B (en) 2019-07-15 2019-07-15 Target tracking method and system based on strategy gradient

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910638477.5A CN110428447B (en) 2019-07-15 2019-07-15 Target tracking method and system based on strategy gradient

Publications (2)

Publication Number Publication Date
CN110428447A CN110428447A (en) 2019-11-08
CN110428447B true CN110428447B (en) 2022-04-08

Family

ID=68409608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910638477.5A Active CN110428447B (en) 2019-07-15 2019-07-15 Target tracking method and system based on strategy gradient

Country Status (1)

Country Link
CN (1) CN110428447B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021142571A1 (en) * 2020-01-13 2021-07-22 深圳大学 Twin dual-path target tracking method
CN117765031B (en) * 2024-02-21 2024-05-03 四川盎芯科技有限公司 Image multi-target pre-tracking method and system for edge intelligent equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846358A (en) * 2018-06-13 2018-11-20 浙江工业大学 Target tracking method for feature fusion based on twin network
CN109543559A (en) * 2018-10-31 2019-03-29 东南大学 Method for tracking target and system based on twin network and movement selection mechanism
CN109636829A (en) * 2018-11-24 2019-04-16 华中科技大学 A kind of multi-object tracking method based on semantic information and scene information
CN109784155A (en) * 2018-12-10 2019-05-21 西安电子科技大学 Visual target tracking method, intelligent robot based on verifying and mechanism for correcting errors
CN109859241A (en) * 2019-01-09 2019-06-07 厦门大学 Adaptive features select and time consistency robust correlation filtering visual tracking method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9965717B2 (en) * 2015-11-13 2018-05-08 Adobe Systems Incorporated Learning image representation by distilling from multi-task networks

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846358A (en) * 2018-06-13 2018-11-20 浙江工业大学 Target tracking method for feature fusion based on twin network
CN109543559A (en) * 2018-10-31 2019-03-29 东南大学 Method for tracking target and system based on twin network and movement selection mechanism
CN109636829A (en) * 2018-11-24 2019-04-16 华中科技大学 A kind of multi-object tracking method based on semantic information and scene information
CN109784155A (en) * 2018-12-10 2019-05-21 西安电子科技大学 Visual target tracking method, intelligent robot based on verifying and mechanism for correcting errors
CN109859241A (en) * 2019-01-09 2019-06-07 厦门大学 Adaptive features select and time consistency robust correlation filtering visual tracking method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Asynchronous Methods for Deep Reinforcement Learning; Volodymyr Mnih et al.; arXiv; 2016-06-16; 1-19 *
Deep Reinforcement Learning Based Optimal Trajectory Tracking Control of Autonomous Underwater Vehicle; Runsheng Yu et al.; Proceedings of the 36th Chinese Control Conference (D); 2017-07-28; 4958-4965 *
Tracking as Online Decision-Making: Learning a Policy from Streaming Videos with Reinforcement Learning; James Supančič et al.; arXiv; 2017-07-17; 1-11 *
An adaptive duty-cycle target tracking strategy; 沈伟华 et al.; Journal of Nanchang University (Natural Science); 2015-02-25; vol. 39, no. 1; 39-49 *
A survey of deep reinforcement learning based on value functions and policy gradients; 刘建伟 et al.; Chinese Journal of Computers; 2018-10-22; vol. 42, no. 6; 1406-1438 *

Also Published As

Publication number Publication date
CN110428447A (en) 2019-11-08

Similar Documents

Publication Publication Date Title
CN111401201B (en) Aerial image multi-scale target detection method based on spatial pyramid attention drive
CN109146921B (en) Pedestrian target tracking method based on deep learning
CN108470355B (en) Target tracking method fusing convolution network characteristics and discriminant correlation filter
CN110120064B (en) Depth-related target tracking algorithm based on mutual reinforcement and multi-attention mechanism learning
CN111462191B (en) Non-local filter unsupervised optical flow estimation method based on deep learning
CN113052873B (en) Single-target tracking method for on-line self-supervision learning scene adaptation
CN107424177A (en) Positioning amendment long-range track algorithm based on serial correlation wave filter
CN111612817A (en) Target tracking method based on depth feature adaptive fusion and context information
CN113129336A (en) End-to-end multi-vehicle tracking method, system and computer readable medium
CN113902991A (en) Twin network target tracking method based on cascade characteristic fusion
CN114332157B (en) Long-time tracking method for double-threshold control
CN110428447B (en) Target tracking method and system based on strategy gradient
CN112233145A (en) Multi-target shielding tracking method based on RGB-D space-time context model
CN115761393B (en) Anchor-free target tracking method based on template online learning
CN112686326A (en) Target tracking method and system for intelligent sorting candidate frame
CN106485283B (en) A kind of particle filter pedestrian target tracking based on Online Boosting
Li et al. Fish trajectory extraction based on object detection
CN114627156A (en) Consumption-level unmanned aerial vehicle video moving target accurate tracking method
CN115953570A (en) Twin network target tracking method combining template updating and trajectory prediction
CN113192110A (en) Multi-target tracking method, device, equipment and storage medium
CN116958057A (en) Strategy-guided visual loop detection method
CN116385915A (en) Water surface floater target detection and tracking method based on space-time information fusion
CN111915648B (en) Long-term target motion tracking method based on common sense and memory network
CN116168060A (en) Deep twin network target tracking algorithm combining element learning
CN116245913A (en) Multi-target tracking method based on hierarchical context guidance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant