CN111724414B - Basketball motion analysis method based on 3D pose estimation - Google Patents
- Publication number
- Grant: CN111724414B · Application: CN202010579582.9A (CN202010579582A)
- Authority
- CN
- China
- Prior art keywords
- player
- action
- network
- gesture
- human body
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/23—Recognition of whole body movements, e.g. for sport training
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20016—Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30241—Trajectory
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
Video analysis technology is increasingly used in basketball and can effectively help athletes and teams improve their competitive results. The invention provides a basketball training auxiliary method based on pose estimation and action recognition. The system analyzes game and training video; from the analysis results, a coach can discover weaknesses in play, raise the competitive level, prepare for games in a targeted way, arrange lineups and game strategies according to the opponents' characteristics, and feed the results back to the athletes to help reduce the risk of injury. The system adopts a multi-view three-dimensional pose detection method: it acquires the 3D pose and position of each player on the court from several cameras that capture depth information, recognizes player actions and tracks player trajectories, and finally predicts and analyzes player actions, builds a model of player movement pose and trajectory, and recognizes movement poses on the game court.
Description
Technical Field
The invention relates to the technical field of motion analysis, in particular to a basketball motion analysis method based on 3D pose estimation from video.
Background
In the big-data age, sports teams collect extensive statistics about players in order to make decisions that improve team performance; for example, signing a new athlete or designing an optimal tactic for a particular game. In the basketball field in particular, a strong need has arisen for player athletic data analysis, ranging from early manual annotation of video to video-analysis companies such as Second Spectrum and STATS, which reduce manual effort by extracting information from a set of ceiling cameras installed at the venue. However, automated analysis of video sequences and assessment of player and team ability remains an open problem, owing to the many challenges present in sports video, such as multi-person occlusion, similar appearance, and fast, erratic movement. In addition, strategic planning of team play based on player ability assessment is an important topic in sports-competition research.
Vision-based assessment of athletes' performance involves human pose estimation, action recognition, action-based event feature analysis, and related problems. Many researchers have studied basketball video event detection and annotation and proposed numerous ideas and methods: from feature analysis of single modalities such as audio, vision, and text to multi-modal video analysis, and from detection methods built on domain-specific features to detection methods using general models, basketball video event detection has developed greatly. Human pose estimation is the basis of behavioral ability assessment. Most research focuses on pose estimation in ordinary 2D video, covering both skeleton estimation and body-surface estimation. Such methods lack an effective solution for occlusion between players in a game and for the similar appearance of players' uniforms, and they lack sufficient information about a player's spatial position on the court. The invention adopts multi-view video acquisition and, based on deep learning, proposes a skeleton-based method for estimating player pose in 3D space by combining information from color and depth maps. This effectively handles occlusion among multiple people and yields accurate player positions.
Traditional human pose estimation algorithms have difficulty accurately locating a player's position on the court during competition, especially in team sports such as basketball, where multiple players cross paths quickly. The player detection and tracking method of the invention estimates, in each frame, the similarity between the detected human keypoints and the pose to be tracked, so as to strengthen tracking of a specific person.
The invention saves the action analysis results and motion trajectory of each of a player's games in a database; an analysis module analyzes and predicts the player's action frequency, spatial position, and variation over time to obtain the player's playing characteristics and current competitive state.
The invention provides an auxiliary decision-making system for basketball coaches. It adopts three-dimensional human pose estimation based on deep neural networks, combines multi-view 3D pose estimation with spatial trajectory tracking, and locates multiple people in complex environments. A spatio-temporal two-stream deep neural network performs player motion prediction, yielding information such as each player's running speed, trajectory, and actions; combining this with player scoring and physical condition, a player sport profile is established and a court-evaluation algorithm for basketball games is studied, finally producing the coach's auxiliary decision-making system.
The invention is an end-to-end sports video analysis and decision-support system comprising modules for video acquisition, human pose detection, action recognition, motion trajectory tracking, player motion feature analysis, game reinforcement-learning decision making, and data visualization and analysis.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a basketball motion analysis method based on 3D pose estimation, which predicts the full 3D human pose from RGBD input using computer vision.
According to the basketball motion analysis method based on 3D pose estimation, the human pose is estimated from RGBD input, human actions are recognized, and people are tracked. 3D human keypoint coordinates are predicted from collected depth and color maps; an action sequence generated from the human keypoints and skeleton map serves as input for recognizing a person's actions; a human detection box generated from the keypoint coordinates feeds a tracker that produces a track sequence for analyzing the person's motion trajectory. The method specifically comprises the following steps:
S1, RGBD multi-view pose estimation based on a voxel network:
estimating the human keypoints w = (w_1, ..., w_J), i.e. the J keypoint coordinates in the real world, from the depth map and the color image; the coordinate system used is the same as the color sensor frame;
converting the depth map into the color camera frame using the camera calibration, and operating on the warped depth map D ∈ R^(N×M);
color keypoint detector: a keypoint detector applied to the color image I generates a score map s_2D ∈ R^(N×M×J) giving the position likelihood of each keypoint; the maximum of s_2D corresponds to the predicted keypoint locations p_2D; the detector weights are fixed, using the OpenPose library;
voxel pose network: voxel grid V e R for a given deformed depth map D K×K×K Calculation using k=64; for this purpose, the depth map D is converted into 3D world coordinates of the point cloud and the computing grid centerFrom D->Calculation in the neighborhood +.>As a predictive 2D "neck" key point +.>
where K denotes the camera's intrinsic calibration matrix and p̃_neck is the homogeneous coordinate of the neck keypoint; for verification, d̄ is estimated from at least three valid depth values selected from the neighborhood around the keypoint in the depth map; V is calculated by setting an element to 1 when at least one point of the point cloud lies in the corresponding cell, and 0 at all other positions; the resolution of the selected voxel grid is about 3 cm;
the voxel pose network takes V and the series of score maps s_2D as input and processes them with a series of 3D convolutions; s_2D is tiled along the Z axis; the voxel pose network estimates a score map s_3D ∈ R^(K×K×K×J), interpreted in the same way as its 2D counterpart;
one set of keypoint coordinates, w_VPN, is predicted by the voxel pose network; using the z component of w_VPN and the predicted 2D keypoints p_2D, another set of world coordinates w_projected is calculated; the precision of these coordinates in the x and y directions is not limited by the choice of K; based on the confidence of the 2D network prediction (the s_2D score at p_2D), the final prediction w is selected from w_projected and w_VPN;
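The occupancy-grid voxelization described in this step can be sketched as follows. This is a minimal illustration under stated assumptions (K = 64, ~3 cm cell size, grid centered on the estimated neck position); the function name and parameters are illustrative, not the patented implementation.

```python
import numpy as np

def voxelize(points, center, k=64, res=0.03):
    """Build a K x K x K occupancy grid V around `center`:
    a cell is set to 1 if at least one point of the point cloud
    falls inside it, and 0 otherwise."""
    V = np.zeros((k, k, k), dtype=np.uint8)
    # shift points so the grid is centered on the estimated grid center
    idx = np.floor((points - center) / res).astype(int) + k // 2
    inside = np.all((idx >= 0) & (idx < k), axis=1)  # keep in-grid points
    ii = idx[inside]
    V[ii[:, 0], ii[:, 1], ii[:, 2]] = 1
    return V
```

With a 3 cm resolution, two points less than 3 cm apart land in the same cell, and points far from the grid center are simply discarded.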
S2, an action recognition algorithm based on spatio-temporal two-stream networks:
two view-adaptive neural networks are designed based on an RNN and a CNN, named VA-RNN and VA-CNN; in VA-RNN, an RNN-based view-adaptation subnetwork transforms the skeleton to a representation under a suitable viewpoint, and a main LSTM network recognizes actions from the transformed skeleton; VA-CNN consists of a CNN-based view-adaptation subnetwork and a main convolutional network; each network is trained end-to-end by optimizing classification performance;
S3, multi-person tracking:
a traditional single-hypothesis tracking method with recursive Kalman filtering and frame-to-frame data association is adopted;
tracking and Kalman-filtering framework: the camera is not calibrated and no ego-motion information is available;
a feature pyramid network (FPN) is adopted: the FPN predicts at multiple scales; the input video is first downsampled at three scales, 1/32, 1/16 and 1/8; the smallest (semantically strongest) feature map is upsampled and merged, via skip connections, with the feature maps of the other scales; prediction heads are applied to the fused features at the three scales; a prediction head comprises several stacked convolution layers and outputs a dense (6A+D)×H×W prediction map, which is divided into three parts: 1) box classification results of size 2A×H×W; 2) box regression coefficients of size 4A×H×W; 3) a dense embedding map of size D×H×W;
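The division of the dense (6A+D)×H×W prediction map into its three task heads amounts to a channel split; a minimal NumPy sketch, where the values of A and D in the example are illustrative:

```python
import numpy as np

def split_prediction(pred, A, D):
    """Split a (6A+D) x H x W dense prediction map into the three tasks:
    box classification (2A), box regression (4A), and embeddings (D)."""
    assert pred.shape[0] == 6 * A + D
    cls = pred[:2 * A]          # 2A x H x W: foreground/background per anchor
    reg = pred[2 * A:6 * A]     # 4A x H x W: four box coefficients per anchor
    emb = pred[6 * A:]          # D x H x W: one D-dim embedding per location
    return cls, reg, emb
```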
S4, trajectory detection:
analyzing the actions, motion trajectories, and offense-defense confrontations of the players;
S5, court correction: calibrating the court video shot by the wide-angle camera to form a regular court;
S6, player trajectory analysis:
estimating the player's pose with a body pose estimation algorithm, judging the player's action from the pose information and its time sequence, and automatically marking the position where the player performs the corresponding action once the action is recognized; for player trajectories, real-time tracking is achieved by a multi-person tracking algorithm combined with the player's pose information; the player's acceleration is calculated from the running trajectory.
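Computing a player's speed and acceleration from the tracked running trajectory, as described in S6, can be done by finite differences; a minimal sketch assuming an (N, 2) trajectory in court coordinates (meters) sampled at a fixed, illustrative frame rate:

```python
import numpy as np

def speed_and_acceleration(track, fps=25.0):
    """Estimate per-step speed (m/s) and acceleration (m/s^2) from an
    (N, 2) court-coordinate trajectory via finite differences."""
    dt = 1.0 / fps
    vel = np.diff(track, axis=0) / dt      # (N-1, 2) velocity vectors
    speed = np.linalg.norm(vel, axis=1)    # scalar speed per step
    accel = np.diff(speed) / dt            # change of speed per step
    return speed, accel
```

In practice the raw trajectory would first be smoothed (e.g. by the Kalman filter already used for tracking), since finite differences amplify detection noise.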
S7, generation of competition decisions:
Basketball tactical decision making can be abstracted as a multi-step decision problem in reinforcement learning, with a huge search space. Given the limited amount of in-game command-decision training data, a strategy learned by direct imitation learning does not generalize well: the sampled decision trajectories cannot cover the whole state space, which limits the generalization of a strategy function learned by supervised learning. Increasing training time and computing power can compensate to some extent, but cannot fundamentally remedy the weak predictive and generalization capability. Deep reinforcement learning, which realizes end-to-end self-learning from perception to decision control on a deep neural network, offers stronger prediction and generalization than supervised learning.
In practical multi-step reinforcement learning it is quite difficult to design the reward function, but deriving the reward function backwards from example data provided by experts helps solve this problem; this is the idea of inverse reinforcement learning (IRL). Inverse reinforcement learning is regarded as an important means of accelerating reinforcement learning. In the tactical decision process based on inverse reinforcement learning, the state space S and action space A are known, and a relatively rich data set of real command-decision examples has been accumulated; a deep neural network with multiple hidden layers can therefore be trained to fit a reward function by combining deep learning with inverse reinforcement learning, such that the decision examples are optimally distributed under that reward function. The reward function can then be used to solve for the optimal course of action (COA) under a given index; finally, for different tasks, it is combined with tactical-decision agility evaluation indices to form a reasonable tactical scheme. The nonlinear reward function based on a deep neural network can be regarded as the coach's empirical judgment of the real-time situation.
The scene is abstracted as an MDP. The player to be trained is the agent and the game the player faces is the environment; the agent obtains the current state and reward from the game and takes corresponding actions. The state represents the feature information of the current role: it can be the raw information of the current game, with features extracted by an encoder, or semantic features extracted from the player's game history and training state. After the game is abstracted as an MDP, the decision-evaluation network for game play is applied to the players in the game.
Using reinforcement learning to solve the tactical decision problem can be regarded as multi-step reinforcement learning over a continuous state space and a discrete action space, generally described by a Markov decision process (MDP). A Markov decision process is a stochastic dynamic system based on the Markov assumption, expressed as a four-tuple (S, A, P, R), where S is the set of states; A is the set of actions; P(s'|s, a) is the probability of transitioning to state s' after taking action a in state s; and R(s, a) is the immediate reward obtained by taking action a in state s.
The goal of the MDP learning task is to find an optimal policy that maximizes the cumulative return. In deducing a tactical action scheme, the acting entity (Agent) interacts with the game environment: at each time step, the Agent obtains state s_t by observing the environment and executes an action a_t, and the environment produces the next state s_{t+1} and reward r_t according to a_t. The task of reinforcement learning is to find, in the given MDP-based tactical decision process, the optimal policy π*(a|s) (a mapping from states to actions), π*(a|s) = P(a_t = a | s_t = s). Optimality here means the Agent obtains the greatest cumulative return along a tactical decision trajectory.
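The cumulative discounted return that the optimal policy maximizes can be computed backwards over one decision trajectory; a minimal sketch, with the discount factor γ as an illustrative parameter:

```python
def discounted_return(rewards, gamma=0.99):
    """Return G_t = sum_k gamma^k * r_{t+k} for every step t of one
    trajectory, computed by a single backward pass over the rewards."""
    G = 0.0
    returns = []
    for r in reversed(rewards):
        G = r + gamma * G  # Bellman backup: G_t = r_t + gamma * G_{t+1}
        returns.append(G)
    return returns[::-1]
```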
The invention relates to a basketball motion analysis method based on 3D gesture estimation, which further comprises the following steps:
for S2, the scores of the RNN and CNN streams may also be fused to provide the final prediction, denoted the VA-fusion scheme.
The invention relates to a basketball motion analysis method based on 3D gesture estimation, which further comprises the following steps:
in S6, real-time tracking of a player is realized in combination with the player's pose information; the player's motion trajectory is drawn on the corrected court model according to the tracking sequence, intuitively displaying the player's running trajectory and shot-concentration areas; combined with scoring data, the player's shooting percentage and optimal shooting positions are analyzed;
The beneficial effects of the invention are as follows: the full three-dimensional human pose is predicted from multi-view RGBD input, outperforming existing reference methods. The method first predicts the person's 2D pose in the color image; a deep network then takes the 2D pose and depth map as input and estimates the full three-dimensional pose from this information. Based on the spatial position of the three-dimensional pose and the action recognition results, combined with historical data such as player and team scores, a reinforcement learning method generates auxiliary decisions for training and game tactics.
Drawings
FIG. 1 is a schematic diagram of an architecture of the present invention;
FIG. 2 is a schematic illustration of the in-game pose estimation results of the player Bitummy Blender;
FIG. 3 is a three-dimensional human body posture result diagram of the algorithm in actual calculation;
FIG. 4 is a schematic representation of the full-field motion profile of the player Tim Duncan;
FIG. 5 is a schematic illustration of the full-field motion profile of the player Thabo Sefolosha;
FIG. 6 is a schematic top view of player trajectory tracking;
FIG. 7 is a schematic illustration of the distance travelled by some of the players in the 2019 CBA Finals;
FIG. 8 is a schematic view of the scatter points of a basketball shot;
FIG. 9 is a schematic illustration of a heat map of a basketball shot;
FIG. 10 is a sectional view of a basketball shot;
FIG. 11 is a schematic diagram of the multi-agent reinforcement learning policy framework;
FIG. 12 is a schematic diagram of tactical decision generation.
Detailed Description
The following describes embodiments of the present invention in further detail with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit its scope.
Example 1
The object of the invention is to estimate three-dimensional human poses from the RGBD input of multiple cameras, recognize human actions, and track people. 3D human keypoint coordinates are predicted from collected depth and color maps; an action sequence generated from the human keypoints and skeleton map serves as input for recognizing the person's actions; a human detection box generated from the keypoint coordinates is fed to a tracker, producing a track sequence from which the person's motion trajectory is analyzed; see FIG. 1.
1. Human body pose estimation
The object is to estimate the human keypoints w = (w_1, ..., w_J), i.e. the J keypoint coordinates in the real world, from the depth map and the color image. Without loss of generality, in the coordinate system defined by the invention, the predictions are in the same frame as the color sensor.
For Kinect, the color and depth sensors are closely located, but they are still two different cameras, and the method requires the two frames of information to be registered. Thus the invention uses the camera calibration to convert the depth map into the color frame; as a result, the method operates on a warped depth map D ∈ R^(N×M). Owing to occlusion and to differences in resolution and noise, the warped depth map D is sparse.
1) Color keypoint detector: a keypoint detector applied to the color image I generates a score map s_2D ∈ R^(N×M×J) giving the location likelihood of each keypoint. The maximum of s_2D corresponds to the predicted keypoint locations p_2D. The invention uses fixed weights from the OpenPose library; other 2D pose estimation methods, such as those in the AlphaPose library, may also be employed here.
2) Voxel pose network: for a given warped depth map D, a voxel grid V ∈ R^(K×K×K) is calculated, using K = 64. For this purpose, the depth map D is converted into a point cloud, and the 3D world coordinates of the grid center c are calculated by back-projecting the predicted 2D "neck" keypoint p_neck with a depth value d̄ estimated from D in the neighborhood of p_neck: c = d̄ · K^(-1) · p̃_neck,
where K denotes the camera's intrinsic calibration matrix and p̃_neck is a homogeneous coordinate. For verification, d̄ is estimated from at least three valid depth values selected from the neighborhood around the keypoint in the depth map. V is calculated by setting an element to 1 if at least one point of the point cloud lies in the corresponding cell, and 0 at all other positions. The resolution of the voxel grid selected by the invention is about 3 cm.
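The back-projection used here, mapping a pixel with a depth value to 3D coordinates via the intrinsic calibration matrix K and homogeneous coordinates, can be sketched as follows; the intrinsic values in the example are illustrative, and the camera frame is assumed to coincide with the prediction frame (the color sensor frame, as stated above):

```python
import numpy as np

def backproject(p2d, depth, K):
    """Back-project a pixel p2d = (u, v) with depth z (meters) into 3D
    coordinates in the camera frame: w = z * K^{-1} * [u, v, 1]^T."""
    uv1 = np.array([p2d[0], p2d[1], 1.0])  # homogeneous pixel coordinate
    return depth * np.linalg.solve(K, uv1)
```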
The voxel pose network takes V and the series of score maps s_2D as input and processes them with a series of 3D convolutions. The invention tiles s_2D along the Z axis, which corresponds to an orthographic-projection approximation. The voxel pose network estimates a score map s_3D ∈ R^(K×K×K×J), interpreted as a keypoint likelihood in the same way as its 2D counterpart.
The final prediction is completed as follows: on the one hand, w_VPN is predicted by the voxel pose network; on the other hand, the z component of w_VPN and the predicted 2D keypoints p_2D are used to calculate another set of world coordinates w_projected. The accuracy of these coordinates in the x and y directions is not limited by the choice of K. Based on the confidence of the 2D network prediction (the s_2D score at p_2D), the final prediction w is selected from w_projected and w_VPN.
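One plausible reading of this selection rule is to prefer w_projected when the 2D detector is confident, since its x/y precision is not limited by the voxel resolution, and otherwise fall back to the voxel-network output; the threshold value and the exact decision rule are assumptions, as the text does not specify them:

```python
def select_keypoint(w_vpn, w_projected, conf_2d, thresh=0.5):
    """Choose the final world coordinate for one keypoint.

    conf_2d is the s_2D score at the predicted 2D location p_2D; a
    confident 2D detection favors the projected coordinate, otherwise
    the voxel-network prediction is kept. thresh is an assumed value."""
    return w_projected if conf_2d >= thresh else w_vpn
```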
FIGS. 2 and 3 show results of the VoxelPoseNet architecture, whose encoder and decoder are inspired by U-Net, using dense-block encoders. While decoding the full-resolution score map, the network includes several intermediate losses computed on s_3D.
2. Action recognition
The invention designs two view-adaptive neural networks based on an RNN and a CNN, named VA-RNN and VA-CNN. As shown in FIG. 3, in VA-RNN (top), an RNN-based view-adaptation subnetwork transforms the skeleton to a representation under a suitable viewpoint, and a main LSTM network recognizes actions from the transformed skeleton. VA-CNN (bottom) consists of a CNN-based view-adaptation subnetwork and a main convolutional network (ConvNet). Each network is trained end-to-end by optimizing classification performance. Optionally, the scores of the two networks may be fused to provide the final prediction, denoted the VA-fusion scheme.
3 Multi-person tracking
The present invention employs a conventional single-hypothesis tracking method with recursive Kalman filtering and frame-to-frame data association. The core of the system is described in more detail in the following sections.
Tracking processing and the Kalman filtering framework are general techniques. The present invention assumes a very common tracking scenario in which the camera is uncalibrated and no ego-motion information is available.
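A minimal sketch of the per-track Kalman step used in such single-hypothesis trackers; the constant-velocity state layout and the noise magnitudes are illustrative assumptions, not the patent's parameters:

```python
import numpy as np

class ConstantVelocityKF:
    """Track state [cx, cy, vx, vy]: detection-box center plus frame-to-frame velocity."""

    def __init__(self, cx, cy, q=1e-2, r=1e-1):
        self.x = np.array([cx, cy, 0.0, 0.0])
        self.P = np.eye(4)
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = 1.0   # position += velocity each frame
        self.H = np.eye(2, 4)               # only the center position is observed
        self.Q = q * np.eye(4)              # process noise
        self.R = r * np.eye(2)              # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                   # predicted center for data association

    def update(self, z):
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)          # Kalman gain
        self.x = self.x + K @ (np.asarray(z) - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

In use, `predict()` supplies each track's expected position for frame-to-frame association (e.g. by distance or IoU), and `update()` folds in the matched detection.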
The present invention employs a Feature Pyramid Network (FPN). FPN makes predictions at multiple scales, thus improving performance in multi-target pedestrian detection. The neural network used in JDE is briefly described in fig. 2. The input frame is first downsampled at three scales: 1/32, 1/16, and 1/8. The feature map of the smallest size (semantically strongest) is then upsampled and merged, via skip connections, with the feature map of the second-smallest size, and likewise for the other scales. Finally, prediction heads are applied to the fused feature maps at all three scales. One prediction head consists of several stacked convolutional layers and outputs a dense prediction map of size (6A+D)xHxW, where A is the number of anchor templates allocated to this scale and D is the embedding size. The dense prediction map is divided into three parts (tasks):
1) Box classification results, of size 2AxHxW
2) Box regression coefficients, of size 4AxHxW
3) Dense embedding map, of size DxHxW
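The split of the (6A+D)xHxW map into its three task heads can be sketched directly from the channel layout above; the channel ordering (classification, regression, embedding) is an assumption consistent with the listed sizes:

```python
import numpy as np

def split_prediction_map(pred, A, D):
    """Split a JDE-style dense prediction map of shape (6A+D, H, W) into its
    three task heads: box classification (2A), box regression (4A), embedding (D)."""
    assert pred.shape[0] == 6 * A + D, "channel count must equal 6A + D"
    cls = pred[: 2 * A]             # 2A x H x W: foreground/background per anchor
    reg = pred[2 * A : 6 * A]       # 4A x H x W: 4 box regression coefficients per anchor
    emb = pred[6 * A :]             # D  x H x W: dense appearance embeddings
    return cls, reg, emb
```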
4 Trajectory detection
Through player action recognition and trajectory tracking, the invention constructs a monitoring system for player tracking and analysis. In this system, the invention analyzes the players' actions, trajectories, and attack-and-defense confrontations.
4.1 Court calibration
Because the camera does not shoot the basketball game directly from above the court, the captured court is not a regular shape. Meanwhile, in order to intuitively show the players' movement trajectories, a regular court model is required. Therefore, the court video shot by the wide-angle camera must be calibrated to form a regular court.
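This calibration is the classic planar-homography problem: map at least four known court landmarks seen in the image to their positions on a regular court model. A self-contained sketch using the direct linear transform (the specific landmark coordinates and court dimensions below are illustrative assumptions, not values from the patent):

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate the 3x3 homography mapping src -> dst from >= 4 point pairs (DLT)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of A, taken from the SVD
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def warp_point(H, p):
    """Apply homography H to a 2D point (homogeneous normalization)."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]
```

With H estimated once per camera setup, every tracked player position can be warped from image pixels onto the regular court model before trajectory drawing.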
4.2 player trajectory analysis
At present, trajectory analysis of players is done by manually reviewing video collected by cameras. For example, SportVU is a software system that uses multi-angle, multi-camera dynamic tracking to capture player movement on the court, capturing 25 frames per second, with timing recorded and analyzed automatically by computer. After processing, the data collected by the cameras provides a very rich statistical database for analyzing elements such as movement speed, distance covered, distance between players, and ball possession. However, screening and analyzing the data is still largely carried out by data experts. Although most experts initially seek the most objective treatment of the data, experts are, after all, human, and bias is probably unavoidable. Different experts also have their own ways of understanding basketball; exaggerating the importance of certain statistics or misusing certain irrelevant ones can both lead to erroneous conclusions. Meanwhile, the five NBA positions are not subdivided in detail, so the most objective conclusions cannot be drawn for certain specific events.
5 tactical decisions
The scene is abstracted into an MDP. As shown in FIG. 12, the player to be trained is modeled as an agent and the game the player faces as the environment; the agent obtains the current state and reward from the game and takes corresponding actions. The state represents the feature information of the current agent: it can be the raw information of the current game, features extracted by encoding, or semantic features extracted from the player's game history and training states. After the game is abstracted into an MDP, a decision-evaluation network handles the player's in-game confrontation.
The reinforcement learning method used to solve the tactical decision problem can be regarded as a multi-step reinforcement learning process over a continuous state space and a discrete action space, and is generally described by a Markov Decision Process (MDP). A Markov decision process is a stochastic dynamic system based on the Markov assumption and can be expressed as a four-tuple (S, A, P, R), where S represents the set of states (States); A represents the set of actions (Actions); P(s'|s, a) represents the probability of transitioning to state s' after taking action a in state s; and R(s, a) represents the immediate reward obtained by taking action a in state s.
The goal of the MDP learning task is to find an optimal policy that maximizes the cumulative return. In the deduction of a tactical action scheme, the acting entity (Agent) interacts with the game-field environment: at each time step, the Agent obtains state st by observing the environment and then executes some action at, and the environment generates the next state st+1 and reward rt according to at. The task goal of reinforcement learning is to find the optimal policy π*(a|s) (a mapping from states to actions), π*(a|s) = P(at = a | st = s), in the given MDP-based tactical decision process. Optimality here means that the Agent attains the greatest cumulative return along a tactical decision trajectory.
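The (S, A, P, R) tuple and the cumulative-return objective can be made concrete with a tabular sketch. The patent's system uses deep reinforcement learning on continuous states; the tabular Q-learning below is only a toy illustration of the same objective, and the toy MDP and hyperparameters are assumptions:

```python
import numpy as np

def q_learning(P, R, gamma=0.9, alpha=0.1, episodes=500, eps=0.1, seed=0):
    """Tabular Q-learning on an MDP given as transition probabilities
    P[s, a] -> distribution over s' and immediate rewards R[s, a].
    Returns the greedy policy pi(s) = argmax_a Q(s, a) and the Q-table."""
    rng = np.random.default_rng(seed)
    n_s, n_a = R.shape
    Q = np.zeros((n_s, n_a))
    for _ in range(episodes):
        s = 0
        for _ in range(20):  # bounded episode length
            # epsilon-greedy action selection
            a = rng.integers(n_a) if rng.random() < eps else int(Q[s].argmax())
            s2 = rng.choice(n_s, p=P[s, a])
            # TD update toward r + gamma * max_a' Q(s', a')
            Q[s, a] += alpha * (R[s, a] + gamma * Q[s2].max() - Q[s, a])
            s = s2
    return Q.argmax(axis=1), Q
```

The update rule is exactly the "maximize cumulative return" objective: each Q(s, a) estimates the discounted sum of future rewards from taking a in s and then acting greedily.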
The system adopts an advanced human pose estimation algorithm to estimate the players' poses and judges player actions, such as shooting and passing, from the pose information and its time sequence; after action recognition, the positions of players performing the corresponding actions are marked automatically. For player trajectories, real-time tracking of players is achieved by combining a multi-person tracking algorithm with the players' pose information; the players' motion trajectories are drawn on the corrected court model according to the tracking sequence, the players' running routes and shooting-concentration areas are displayed intuitively, and the shooting percentage and optimal shooting positions of players are analyzed in combination with scoring. The movement acceleration of a player is calculated from the player's running trajectory.
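Computing speed and acceleration from a court-plane trajectory reduces to finite differences over the tracked positions. A sketch assuming the SportVU-style 25 fps sampling mentioned earlier (the function name and smoothing-free differencing are assumptions):

```python
import numpy as np

def speed_and_acceleration(track, fps=25.0):
    """Estimate per-frame speed (m/s) and acceleration magnitude (m/s^2) from a
    tracked sequence of court-plane positions (N, 2) sampled at `fps` frames/s."""
    track = np.asarray(track, dtype=float)
    dt = 1.0 / fps
    velocity = np.gradient(track, dt, axis=0)               # central differences
    speed = np.linalg.norm(velocity, axis=1)
    accel = np.linalg.norm(np.gradient(velocity, dt, axis=0), axis=1)
    return speed, accel
```

In practice the tracked positions are noisy, so a smoothing step (e.g. the Kalman-filtered estimates from the tracker itself) would normally precede differentiation.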
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that it will be apparent to those skilled in the art that modifications and variations can be made without departing from the technical principles of the present invention, and these modifications and variations should also be regarded as the scope of the invention.
Claims (4)
1. A basketball motion analysis method based on 3D pose estimation, comprising: estimating human body poses from one or more RGBD inputs, recognizing human actions and tracking persons, analyzing player action patterns, and providing a training basis and confrontation analysis; predicting 3D human keypoint coordinates by collecting a depth map and a color map, generating an action sequence from the human keypoints and the skeleton map as input, and recognizing a person's actions; generating a human detection frame from the human keypoint coordinates, taking the human detection frame as input to a tracker, generating a person trajectory sequence, analyzing the person's motion trajectory, formulating and optimizing an action scheme, and generating and evaluating a plurality of action sequences to form a game command decision, specifically comprising the following steps:
s1, RGBD multi-view attitude estimation based on voxel network:
estimating human keypoints w = (w1, ..., wJ) from the depth map and the color map, i.e., J keypoint coordinates in the real world, in a coordinate system identical to that of the color sensor frame;
converting the depth map into the color frame using the camera calibration, and operating on the warped depth map D ∈ R^(N×M);
color keypoint detector: the keypoint detector is applied to the color image I and generates score maps s2D ∈ R^(N×M×J) encoding the position likelihood of the keypoints; the maximum of the score map s2D corresponds to the predicted keypoint location p2D; the weights are fixed using the OpenPose library;
voxel pose network: voxel grid V e R for a given deformed depth map D K×K×K Calculation using k=64; for this purpose, the depth map D is converted into 3D world coordinates of the point cloud and the computing grid centerFrom D->Calculation in the neighborhood +.>As a predictive 2D "neck" key point +.>
wherein K represents the camera's intrinsic calibration matrix and the coordinates are homogeneous; each depth sample selected from the depth map is verified by checking that at least three valid depth values exist in its surrounding neighborhood; V is computed by setting an element to 1 if at least one point of the point cloud falls inside it, and to 0 otherwise; the resolution of the selected voxel grid is 3 cm;
the voxel pose network takes V and the set of score maps s2D as input and processes them using a series of 3D convolutions; s2D is tiled along the Z-axis; the voxel pose network estimates a score map s3D ∈ R^(K×K×K×J), analogous to its corresponding 2D score map;
wVPN is predicted by the voxel pose network; the z-component of wVPN and the predicted 2D keypoints p2D are used to calculate another set of world coordinates wprojected, whose accuracy in the x and y directions is not limited by the choice of K; based on the confidence of the 2D network prediction, i.e., the score of s2D at p2D, the final prediction w is selected from wprojected and wVPN;
s2, an action recognition algorithm based on a spatio-temporal dual stream:
based on RNNs and CNNs, two view-adaptive neural networks are designed, named VA-RNN and VA-CNN; in VA-RNN, an RNN-based view-adaptation subnetwork transforms the skeleton to a representation under a suitable viewpoint, and a main LSTM network recognizes actions from the transformed skeleton; VA-CNN is equipped with a CNN-based view-adaptation subnetwork and a main convolutional network; each network is trained end-to-end by optimizing classification performance;
s3, tracking multiple people:
a conventional single-hypothesis tracking method with recursive Kalman filtering and frame-to-frame data association is adopted;
tracking processing and Kalman filtering framework: the camera is uncalibrated and no ego-motion information is available;
a Feature Pyramid Network is adopted: the feature pyramid network makes predictions at multiple scales; the input frame is first downsampled at three scales, 1/32, 1/16, and 1/8; the feature map of the smallest size is upsampled and merged, via skip connections, with the feature map of the second-smallest size, and likewise for the other scales; prediction heads are applied to the fused features at the three scales; one prediction head comprises a plurality of stacked convolutional layers and outputs a dense (6A+D)xHxW prediction map; the dense prediction map is divided into three parts: 1) box classification results of size 2AxHxW; 2) box regression coefficients of size 4AxHxW; 3) a dense embedding map of size DxHxW;
s4, track detection:
analyzing the actions, movement trajectories, and attack-and-defense confrontations of the players;
s5, calibrating the court video shot by the wide-angle camera to form a regular court;
s6, player trajectory analysis:
estimating the player's pose by adopting a human pose estimation algorithm, judging the player's actions from the pose information and the time sequence, and automatically marking the positions of players performing the corresponding actions after action recognition; for player trajectories, real-time tracking of players is achieved by a multi-person tracking algorithm combined with the players' pose information; the movement acceleration of a player is calculated from the player's running trajectory;
s7, generation of competition command decisions
In the deduction of a game confrontation scheme, an agent based on a deep reinforcement learning algorithm continuously updates its deep neural network through continuous interaction with the game-field environment, accumulates learning experience to guide its subsequent behavior selection, and finally generates an optimal action sequence meeting the coach's requirements.
2. The basketball motion analysis method based on 3D pose estimation of claim 1, further comprising:
for S2, the scores of the RNN and the CNN may also be fused to provide a fused prediction, denoted as the VA-fusion scheme.
3. The basketball motion analysis method based on 3D pose estimation according to claim 1, wherein the agent based on the deep reinforcement learning algorithm continuously updates its deep neural network through continuous interaction with the game-field environment, accumulates learning experience to guide its subsequent behavior selection, and finally generates an optimal action sequence meeting the coach's requirements, further comprising:
in S7, generating human detection frames from the human keypoint coordinates of the game histories of both competing parties, taking the human detection frames as input to the tracker, generating person trajectory sequences, analyzing the persons' motion trajectories, formulating and optimizing an action scheme, and generating and evaluating a plurality of action sequences to form a game command decision.
4. The basketball motion analysis method based on 3D pose estimation according to claim 2, further comprising:
in S6, combining the players' pose information to realize real-time tracking of players, drawing the players' motion trajectories on the corrected court model according to the tracking sequence, intuitively displaying the players' running routes and shooting-concentration areas, and analyzing the shooting percentage and optimal shooting positions of players in combination with scoring.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010579582.9A CN111724414B (en) | 2020-06-23 | 2020-06-23 | Basketball motion analysis method based on 3D gesture estimation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111724414A CN111724414A (en) | 2020-09-29 |
CN111724414B true CN111724414B (en) | 2024-01-26 |
Family
ID=72568389
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010579582.9A Active CN111724414B (en) | 2020-06-23 | 2020-06-23 | Basketball motion analysis method based on 3D gesture estimation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111724414B (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12100104B2 (en) * | 2020-11-17 | 2024-09-24 | International Institute Of Information Technology, Hyderabad | System and method for automatically reconstructing 3D model of an object using machine learning model |
CN112668549B (en) * | 2021-01-15 | 2023-04-07 | 北京格灵深瞳信息技术股份有限公司 | Pedestrian attitude analysis method, system, terminal and storage medium |
CN113011310B (en) * | 2021-03-15 | 2022-03-11 | 中国地质大学(武汉) | Method and device for collecting shooting exercise amount, computer equipment and storage medium |
CN113312840B (en) * | 2021-05-25 | 2023-02-17 | 广州深灵科技有限公司 | Badminton playing method and system based on reinforcement learning |
CN113343843A (en) * | 2021-06-04 | 2021-09-03 | 北京格灵深瞳信息技术股份有限公司 | Target tactical recognition method and device, electronic equipment and storage medium |
CN113506210A (en) * | 2021-08-10 | 2021-10-15 | 深圳市前海动竞体育科技有限公司 | Method for automatically generating point maps of athletes in basketball game and video shooting device |
CN113705445B (en) * | 2021-08-27 | 2023-08-04 | 深圳龙岗智能视听研究院 | Method and equipment for recognizing human body posture based on event camera |
CN113837063B (en) * | 2021-10-15 | 2024-05-10 | 中国石油大学(华东) | Reinforcement learning-based curling motion field analysis and auxiliary decision-making method |
CN114155256B (en) * | 2021-10-21 | 2024-05-24 | 北京航空航天大学 | Method and system for tracking deformation of flexible object by using RGBD camera |
CN114998991A (en) * | 2022-06-01 | 2022-09-02 | 浙江蓝鸽科技有限公司 | Campus intelligent playground system and motion detection method based on same |
CN114783039B (en) * | 2022-06-22 | 2022-09-16 | 南京信息工程大学 | Motion migration method driven by 3D human body model |
CN115937895B (en) * | 2022-11-11 | 2023-09-19 | 南通大学 | Speed and strength feedback system based on depth camera |
CN117612242B (en) * | 2023-05-30 | 2024-06-28 | 黑龙江大学 | Monitoring display system for player motion state tracking foul |
CN116645726B (en) * | 2023-05-30 | 2024-02-02 | 首都师范大学 | Behavior recognition method and system for space-time double-branch fusion by utilizing three-dimensional human body recovery |
CN117789094B (en) * | 2023-12-29 | 2024-09-03 | 内蒙古大学 | Group behavior detection and recognition method and system based on deep learning |
CN117807818B (en) * | 2024-03-01 | 2024-05-10 | 西安慧金科技有限公司 | Industrial furnace life prediction method combined with dynamic basket ring optimization algorithm |
CN118506248A (en) * | 2024-06-19 | 2024-08-16 | 长春职业技术学院 | Sports match scoring system and method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108108722A (en) * | 2018-01-17 | 2018-06-01 | 深圳市唯特视科技有限公司 | A kind of accurate three-dimensional hand and estimation method of human posture based on single depth image |
CN109165253A (en) * | 2018-08-15 | 2019-01-08 | 宁夏大学 | A kind of method and apparatus of Basketball Tactical auxiliary |
CN110674785A (en) * | 2019-10-08 | 2020-01-10 | 中兴飞流信息科技有限公司 | Multi-person posture analysis method based on human body key point tracking |
CN110929596A (en) * | 2019-11-07 | 2020-03-27 | 河海大学 | Shooting training system and method based on smart phone and artificial intelligence |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9117147B2 (en) * | 2011-04-29 | 2015-08-25 | Siemens Aktiengesellschaft | Marginal space learning for multi-person tracking over mega pixel imagery |
Non-Patent Citations (1)
Title |
---|
A survey of progress in pose estimation for sports video analysis; Zong Libo; Song Yifan; Wang Yiming; Ma Bo; Wang Dongyang; Li Yingjie; Zhang Peng; Journal of Chinese Computer Systems (小型微型计算机系统); Vol. 41, No. 8; 1751-1757 *
Also Published As
Publication number | Publication date |
---|---|
CN111724414A (en) | 2020-09-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111724414B (en) | Basketball motion analysis method based on 3D gesture estimation | |
Monti et al. | Dag-net: Double attentive graph neural network for trajectory forecasting | |
Felsen et al. | What will happen next? forecasting player moves in sports videos | |
CN110462684B (en) | System, computer readable medium and method for implicit prediction of object movement | |
US11967086B2 (en) | Player trajectory generation via multiple camera player tracking | |
CN111444890A (en) | Sports data analysis system and method based on machine learning | |
Meng et al. | A video information driven football recommendation system | |
CN110210383B (en) | Basketball video semantic event recognition method integrating motion mode and key visual information | |
EP3945463B1 (en) | A computing system and a computer-implemented method for sensing gameplay events and augmentation of video feed with overlay | |
Pu et al. | Orientation and decision-making for soccer based on sports analytics and AI: A systematic review | |
Ding et al. | Machine learning model for feature recognition of sports competition based on improved TLD algorithm | |
Ait-Bennacer et al. | Applying Deep Learning and Computer Vision Techniques for an e-Sport and Smart Coaching System Using a Multiview Dataset: Case of Shotokan Karate. | |
Jiang et al. | Golfpose: Golf swing analyses with a monocular camera based human pose estimation | |
Pervaiz et al. | Artificial neural network for human object interaction system over Aerial images | |
CN115100744A (en) | Badminton game human body posture estimation and ball path tracking method | |
Skublewska-Paszkowska et al. | Attention Temporal Graph Convolutional Network for Tennis Groundstrokes Phases Classification | |
Wang et al. | [Retracted] Simulation of Tennis Match Scene Classification Algorithm Based on Adaptive Gaussian Mixture Model Parameter Estimation | |
Li et al. | Tracking and detection of basketball movements using multi-feature data fusion and hybrid YOLO-T2LSTM network | |
Almasi | Human movement analysis from the egocentric camera view | |
CN116958872A (en) | Intelligent auxiliary training method and system for badminton | |
CN114898275A (en) | Student activity track analysis method | |
Liwei | Research on classification and recognition of badminton batting action based on machine learning | |
US11640713B2 (en) | Computing system and a computer-implemented method for sensing gameplay events and augmentation of video feed with overlay | |
Felsen | Learning to predict human behavior from video | |
WO2023081456A1 (en) | Machine learning based video analysis, detection and prediction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||