
CN111724414B - Basketball motion analysis method based on 3D gesture estimation - Google Patents


Info

Publication number
CN111724414B
CN111724414B (application CN202010579582.9A)
Authority
CN
China
Prior art keywords
player
action
network
gesture
human body
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010579582.9A
Other languages
Chinese (zh)
Other versions
CN111724414A
Inventor
张鹏
王红艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningxia University
Original Assignee
Ningxia University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningxia University filed Critical Ningxia University
Priority to CN202010579582.9A priority Critical patent/CN111724414B/en
Publication of CN111724414A publication Critical patent/CN111724414A/en
Application granted granted Critical
Publication of CN111724414B publication Critical patent/CN111724414B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06N 3/045 Combinations of networks
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06V 40/23 Recognition of whole body movements, e.g. for sport training
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/10024 Color image
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G06T 2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30196 Human being; Person
    • G06T 2207/30241 Trajectory
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

Video analysis technology is increasingly used in basketball and can effectively help athletes and teams improve their results. The invention provides a basketball training auxiliary method based on gesture estimation and action recognition. The system analyzes game and training video; from its analysis results, a coach can discover weaknesses in play, raise the competitive level, prepare for a game in a targeted way, and arrange the lineup and game strategy according to the opponent's characteristics, and feeding the results back to the athletes helps them reduce the risk of injury. The system adopts a multi-view three-dimensional gesture detection method: it acquires the 3D gesture and position of each player on the court through several video cameras that capture depth information, recognizes player actions, tracks player trajectories, and finally predicts and analyzes player actions, builds a model of player movement gestures and trajectories, and recognizes movement gestures on the game court.

Description

Basketball motion analysis method based on 3D gesture estimation
Technical Field
The invention relates to the technical field of motion analysis, in particular to a basketball motion analysis method based on 3D motion video analysis.
Background
In the big-data age, sports teams collect extensive statistics about players in order to make decisions that improve team performance; for example, a new athlete may be signed, or an optimal tactic designed for a particular game. In the basketball field in particular, a strong need has arisen for analysis of player athletic data, which has progressed from initial manual annotation of video to video-analysis companies such as Second Spectrum and STATS that reduce the amount of manual effort by extracting information from a set of ceiling cameras placed in the venue. However, automated analysis of video sequences and assessment of player and team ability remain open problems due to the many challenges present in sports video, such as multi-person occlusion, similar appearance, and fast, unstable movements. In addition, the strategic arrangement of team play based on player ability assessment results is an important topic in sports-competition research.
Vision-based assessment of athletes' performance capabilities involves human body posture estimation, action recognition, action-based event feature analysis, and related problems. Many experts and scholars have studied basketball video event detection and annotation and have proposed numerous ideas and methods. From early feature-analysis methods using a single modality such as audio, vision or text, to video-analysis methods combining multi-modal features, and from detection methods built on domain-specific features to detection methods using universal models, basketball video event detection has developed greatly. Human posture estimation is the basis for behavioral ability assessment. Most research focuses on human body pose estimation in ordinary 2D video, including both skeleton estimation and human body surface estimation. Such methods lack an effective solution for occlusion between players in a game and for the similar appearance of player clothing, and they lack sufficient information about a player's spatial location on the court. The invention adopts multi-view video acquisition and, based on deep learning, provides a method for estimating a player's skeleton-based gesture in 3D space by combining information from the color and depth maps. The occlusion problem among multiple people can thus be effectively solved, and accurate player spatial positions can be obtained.
Traditional human body posture estimation algorithms have difficulty accurately locating a player's position on the field during a game, particularly in team sports, and especially when multiple players quickly cross paths in a basketball game. The player detection and tracking method provided by the invention estimates, in each frame, the similarity between the detected gesture keypoints and the gesture being tracked, so as to strengthen the tracking of a specific person.
The invention saves the action analysis results and motion trajectory of each of a player's games into a database; an analysis module analyzes and predicts the player's action frequency, spatial position and variation over time to obtain the player's playing characteristics and current competitive state.
The invention provides an auxiliary decision-making system for basketball coaches, which adopts three-dimensional human body motion estimation based on a deep neural network, combines multi-view 3D human motion estimation with spatial trajectory tracking, and locates multiple people in a complex environment. A spatio-temporal two-stream deep neural network performs player motion prediction over space and time to obtain the player's running speed, trajectory, actions and other information, and player sport profiles are established by combining each player's score and physical condition to study a basketball game field evaluation algorithm. The result is an auxiliary decision-making system for basketball coaches.
The invention relates to an end-to-end sports video analysis and auxiliary decision-making system comprising modules for video acquisition, human body gesture detection, action recognition, motion trajectory tracking, player motion characteristic analysis, reinforcement-learning game decisions, and data visualization analysis.
Disclosure of Invention
In order to solve the technical problems, the invention provides a basketball motion analysis method based on 3D gesture estimation, which predicts the full 3D human gesture from RGBD input based on a computer vision method.
According to the basketball motion analysis method based on 3D gesture estimation, the human gesture is estimated from RGBD input, human actions are recognized, and persons are tracked. The coordinates of 3D human body keypoints are predicted from a collected depth map and color map; an action sequence generated from the human keypoints and skeleton map is used as input to recognize a person's actions; and a human body detection frame generated from the human keypoint coordinates yields a person track sequence as input to a tracker for analyzing the person's motion trajectory. The method specifically comprises the following steps:
s1, RGBD multi-view attitude estimation based on voxel network:
estimating human keypoints w = (w_1, ..., w_J), i.e. the coordinates of J keypoints in real-world units, from the depth map and color image; the coordinate system is chosen to be the same as the color sensor frame;
converting the depth map to the color frame using the camera calibration, and operating on the warped depth map D ∈ R^(N×M);
color keypoint detector: the keypoint detector is applied to the color image I and generates score maps s_2D ∈ R^(N×M×J) encoding the position likelihood of each keypoint; the maximum of each score map s_2D corresponds to the predicted keypoint location p_2D; the weights are fixed, using the OpenPose library;
voxel pose network: given the warped depth map D, a voxel grid V ∈ R^(K×K×K) is computed with K = 64; for this purpose, the depth map D is converted into the 3D world coordinates of a point cloud, and the grid center c is computed from the depth values of D in the neighborhood of the predicted 2D "neck" keypoint p_2D^neck;
wherein K denotes the camera's intrinsic calibration matrix and p̃ the homogeneous coordinates; for verification, three valid depth values are selected from the depth map in the neighborhood around the neck keypoint; V is computed by setting an element to 1 when at least one point of the point cloud lies inside it, and to 0 elsewhere; the resolution of the selected voxel grid is about 3 cm;
the voxel pose network takes V and the series of score maps s_2D as input and processes them using a series of 3D convolutions; s_2D is tiled along the Z axis; the voxel pose network estimates score maps s_3D ∈ R^(K×K×K×J), interpreted analogously to their 2D counterparts;
w_VPN is predicted by the voxel pose network; using the z component of w_VPN and the predicted 2D keypoints p_2D, another set of world coordinates w_projected is calculated; the precision of these coordinates in the x and y directions is not limited by the choice of K; based on the confidence of the 2D network prediction (the s_2D score at p_2D), the final prediction w is selected from w_projected and w_VPN;
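The occupancy-grid construction in step S1 can be sketched as follows. This is an illustrative reconstruction, not the patent's code: the function name `voxelize` and the exact discretization are my own, while the grid size K = 64, the 1/0 occupancy rule, and the roughly 3-cm cell size follow the text.

```python
import numpy as np

def voxelize(points, center, K=64, voxel_size=0.03):
    """Build a K x K x K occupancy grid V around `center`.

    points: (N, 3) array of 3D world coordinates from the depth map.
    A cell is set to 1 if at least one point falls inside it, else 0,
    as described for the voxel grid V.
    """
    half = K * voxel_size / 2.0
    # Shift points into the grid's local frame and discretize.
    idx = np.floor((points - center + half) / voxel_size).astype(int)
    # Keep only points that land inside the grid.
    mask = np.all((idx >= 0) & (idx < K), axis=1)
    V = np.zeros((K, K, K), dtype=np.float32)
    V[tuple(idx[mask].T)] = 1.0
    return V

# Example: a single point near the grid center occupies exactly one voxel.
pts = np.array([[0.0, 0.0, 0.0]])
V = voxelize(pts, center=np.zeros(3))
```

In practice the center would come from the depth values around the predicted "neck" keypoint, and the resulting V would be fed to the voxel pose network together with the tiled s_2D maps.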
s2, an action recognition algorithm based on space-time double flow:
two view-adaptive neural networks, named VA-RNN and VA-CNN, are designed based on RNN and CNN; in VA-RNN, an RNN-based view-adaptation subnetwork transforms the skeleton to a representation under a suitable viewpoint, and a main LSTM network recognizes actions from the transformed skeleton; VA-CNN consists of a CNN-based view-adaptation subnetwork and a main convolutional network; each network is trained end-to-end by optimizing classification performance;
s3, tracking multiple people:
a traditional single-hypothesis tracking method with recursive Kalman filtering and frame-to-frame data association is adopted;
tracking processing and Kalman filtering framework: the camera is assumed to be uncalibrated, and no ego-motion information is available;
a feature pyramid network is adopted: the feature pyramid network makes predictions at multiple scales; the input video is first downsampled at three scales, 1/32, 1/16 and 1/8; the smallest (semantically strongest) feature map is upsampled and merged by skip connection with the feature map from the second-smallest scale, and likewise for the other scales; prediction heads are applied to the fused features at all three scales; one prediction head comprises several stacked convolution layers and outputs a dense (6A+D)×H×W prediction map; the dense prediction map is divided into three parts: 1) box classification results of size 2A×H×W; 2) box regression coefficients of size 4A×H×W; 3) a dense embedding map of size D×H×W;
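The (6A+D)×H×W split of the dense prediction map described in step S3 can be illustrated with a toy tensor. The sizes A, D, H, W below are made up for illustration; only the 2A / 4A / D channel partition comes from the text.

```python
import numpy as np

# Toy sizes: A anchor templates per scale, D-dim embeddings, H x W feature map.
A, D, H, W = 4, 8, 19, 34

# A prediction head emits a dense (6A + D) x H x W map per scale.
pred = np.zeros((6 * A + D, H, W))

# Split it into the three tasks described in the text.
box_cls = pred[: 2 * A]            # 2A x H x W classification logits
box_reg = pred[2 * A : 6 * A]      # 4A x H x W regression coefficients
embed   = pred[6 * A :]            # D  x H x W dense embeddings
```

The three slices account for all 6A + D channels, which is why the head can serve detection and embedding (re-identification) jointly from one output map.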
s4, track detection:
analyzing the action, the movement track and the attack and defense countermeasures of the player;
s5, calibrating the court video shot by the wide-angle camera to form a regular court;
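Court calibration as in step S5 amounts to estimating a homography that maps landmarks in the wide-angle frame onto a regular court model. A minimal numpy sketch (illustrative only; the four corner correspondences and the 28 m × 15 m court dimensions are assumptions, and the direct linear transform shown is one standard way to solve it, not necessarily the patent's):

```python
import numpy as np

def find_homography(src, dst):
    """Estimate the 3x3 homography H with dst ~ H @ src from 4 point pairs (DLT)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of the 8x9 system.
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return Vt[-1].reshape(3, 3)

def warp_point(H, p):
    """Apply H to a 2D point and normalize the homogeneous result."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

# Hypothetical pixel corners of the court in the wide-angle frame ...
src = [(100, 400), (1180, 390), (900, 80), (350, 85)]
# ... mapped to a regular 28 m x 15 m court model.
dst = [(0, 0), (28, 0), (28, 15), (0, 15)]
H = find_homography(src, dst)
```

Once H is known, every tracked player position can be warped into court coordinates, which is what makes the trajectory drawings on the corrected court model possible.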
s6, player trajectory analysis:
estimating the player's gesture with a body gesture estimation algorithm, judging the player's action from the gesture information and time sequence, and automatically marking the position where the player performs the corresponding action once it is recognized; for player trajectories, real-time tracking of players is realized by a multi-person tracking algorithm combined with the players' gesture information; and the player's movement acceleration is calculated from the running trajectory.
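The speed and acceleration mentioned in step S6 can be recovered from a tracked trajectory by finite differences. A sketch assuming court coordinates in metres and a fixed frame rate (the 25 fps default echoes the capture rate cited later for SportVU; the function name is my own):

```python
import numpy as np

def speed_and_acceleration(track, fps=25.0):
    """track: (T, 2) array of per-frame (x, y) court positions in metres.

    Returns per-frame speed (m/s) and acceleration magnitude (m/s^2)
    using simple forward differences.
    """
    dt = 1.0 / fps
    vel = np.diff(track, axis=0) / dt          # (T-1, 2) velocity vectors
    speed = np.linalg.norm(vel, axis=1)        # (T-1,) scalar speed
    acc = np.diff(vel, axis=0) / dt            # (T-2, 2) acceleration vectors
    return speed, np.linalg.norm(acc, axis=1)

# A player moving 0.2 m per frame at 25 fps runs at 5 m/s.
track = np.column_stack([np.arange(10) * 0.2, np.zeros(10)])
speed, acc = speed_and_acceleration(track)
```

In a real system one would smooth the trajectory first, since per-frame detection jitter is amplified twice by differentiation.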
S7, generation of game decisions:
Basketball tactical decision-making can be abstracted as a multi-step decision problem in reinforcement learning, with a huge search space. Given the limited on-court command decision training data currently available, a strategy learned by direct imitation learning does not generalize well. The reason is that the sampled decision training trajectories cannot cover the whole state space, which limits the generalization ability of a strategy function learned by supervised learning. Although more training time and computing power can compensate for this deficiency to some extent, the weak prediction and generalization capability cannot be fundamentally improved. A deep reinforcement learning algorithm, which realizes end-to-end self-learning from perception to decision control on a deep neural network, has stronger prediction and generalization capability than a supervised learning method.
In practical multi-step reinforcement learning, designing the return function is quite difficult, but deriving the return function backwards from example data provided by experts helps solve this problem; this is the idea of inverse reinforcement learning (Inverse Reinforcement Learning, IRL). Inverse reinforcement learning is regarded as an important means of accelerating reinforcement learning. The tactical intelligent decision process based on inverse reinforcement learning assumes a known state space S and action space A; with a relatively rich accumulated data set of real command decision examples, a deep neural network with multiple hidden layers can be trained to "fit" the return function by combining deep learning and inverse reinforcement learning, so that the decision example data are optimally distributed under that return function; the return function can then be used to solve for the optimal COA under a specific index; and finally, for different tasks, a reasonable tactical scheme is formed in combination with a tactical decision agility evaluation index. The nonlinear "return function" based on the deep neural network can be regarded as the coach's empirical judgment of the real-time situation.
The scene is abstracted into an MDP. The information of the player to be trained is taken as an agent, the game the player faces is taken as the environment, and the agent obtains the current state and reward from the game and takes corresponding actions. The state represents the feature information of the current role; it may be the raw information of the current game directly, features extracted by encoding, or semantic features extracted from the player's game history and training state. After the game is abstracted into an MDP, a decision-evaluation network is built for the player facing the game.
The reinforcement learning method used to solve the tactical decision problem can be regarded as a multi-step reinforcement learning process over a continuous state space and a discrete action space, generally described by a Markov decision process (Markov Decision Process, MDP). A Markov decision process is a stochastic dynamic system based on the Markov assumption and can be expressed by the four-tuple (S, A, P, R), where S is the state set (States); A is the action set (Actions); P(s'|s, a) is the probability of transitioning to state s' after taking action a in state s; and R(s, a) is the immediate return obtained by taking action a in state s.
The goal of the MDP learning task is to find an optimal strategy that maximizes the cumulative return. In the deduction of a tactical action scheme, the acting entity (Agent) interacts with the game environment: at each time step, the Agent obtains state s_t by observing the environment and then executes an action a_t, and the environment generates the next state s_(t+1) and reward r_t according to a_t. The task goal of reinforcement learning is to find the optimal policy π*(a|s) = P(a_t = a | s_t = s), the state-to-action mapping, in the given MDP-based tactical decision process. Optimality here means that the Agent obtains the greatest cumulative return along a tactical decision trajectory.
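The "cumulative return" being maximized can be made concrete with a tiny example. This is a generic illustration, not from the patent; the discount factor γ is the standard device implied by a discounted cumulative return.

```python
def discounted_return(rewards, gamma=0.9):
    """Cumulative return G = sum_t gamma^t * r_t along one decision trajectory."""
    G = 0.0
    for r in reversed(rewards):
        G = r + gamma * G
    return G

# A three-step tactical trajectory with immediate rewards r_0, r_1, r_2:
# G = 1.0 + 0.5 * 0.0 + 0.25 * 2.0 = 1.5
G = discounted_return([1.0, 0.0, 2.0], gamma=0.5)
```

The optimal policy π* is precisely the state-to-action mapping under which the expected value of G is largest.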
The invention relates to a basketball motion analysis method based on 3D gesture estimation, which further comprises the following steps:
for S2, the scores of the RNN and CNN streams may also be fused to provide a stronger prediction, denoted the VA-fusion scheme.
The invention relates to a basketball motion analysis method based on 3D gesture estimation, which further comprises the following steps:
S6, combined with the players' gesture information, real-time tracking of players is realized; the player's motion trajectory is drawn on the corrected court model according to the tracking sequence, intuitively displaying the player's running track and shot-concentration areas; and the player's shooting percentage and optimal shooting positions are analyzed in combination with the scoring record;
The beneficial effects of the invention are as follows: the full three-dimensional human body pose is predicted from multi-view RGBD input, outperforming existing reference methods. The method first predicts the person's pose in the 2D color image. The deep network takes the 2D pose and depth map as inputs and estimates the full three-dimensional pose from this information. Based on the spatial position of the three-dimensional human gesture and the action recognition results, combined with historical data such as player and team scores, a reinforcement learning method is adopted to generate auxiliary tactical decision results for training and games.
Drawings
FIG. 1 is a schematic diagram of an architecture of the present invention;
FIG. 2 is a schematic illustration of the game pose estimation results of a player's Bitummy Blender;
FIG. 3 is a three-dimensional human body posture result diagram of the algorithm in actual calculation;
FIG. 4 is a schematic representation of the full-field motion profile of a player Tim Duncan (Tim Duncan);
FIG. 5 is a schematic illustration of the full-field motion profile of player Thabo Sefolosha;
FIG. 6 is a schematic top view of player trajectory tracking;
FIG. 7 is a schematic illustration of the distance travelled by some players in the 2019 CBA Finals;
FIG. 8 is a schematic view of the scatter points of a basketball shot;
FIG. 9 is a schematic illustration of a heat map of a basketball shot;
FIG. 10 is a sectional view of a basketball shot;
FIG. 11 is a schematic diagram of a multi-agent reinforcement learning strategy framework;
FIG. 12 is a schematic diagram of the generation of tactical decisions.
Detailed Description
The following describes in further detail the embodiments of the present invention with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.
Example 1
The object of the invention is to estimate three-dimensional human body gestures from the RGBD inputs of a plurality of cameras, recognize human actions and track persons. The coordinates of 3D human body keypoints are predicted from the collected depth map and color map; an action sequence generated from the human keypoints and skeleton map is used as input to recognize a person's actions; a human body detection frame is generated from the human keypoint coordinates and taken as the input of a tracker, producing a person track sequence for analyzing the person's motion trajectory; as in FIG. 1.
1 human body pose estimation
The object of the present invention is to estimate human keypoints w = (w_1, ..., w_J), i.e. the coordinates of J keypoints in real-world units, from the depth map and color image. Without loss of generality, in the coordinate system defined by the present invention, the predictions are in the same frame as the color sensor.
For the Kinect, the color and depth sensors are closely located, but they are still two different cameras. The method of the invention requires the two frames of information to be co-located. Thus, the present invention uses the camera calibration to convert the depth map to the color frame. As a result, the method of the present invention operates on a warped depth map D ∈ R^(N×M). Owing to occlusion effects and differences in resolution and noise, the depth map D is sparse.
1) Color keypoint detector: the keypoint detector is applied to the color image I and generates score maps s_2D ∈ R^(N×M×J) encoding the position likelihood of each keypoint. The maximum of each score map s_2D corresponds to the predicted keypoint location p_2D. The present invention uses fixed weights from the OpenPose library; other 2D pose estimation methods, such as those used in the AlphaPose library, may also be employed here.
2) Voxel pose network: given the warped depth map D, a voxel grid V ∈ R^(K×K×K) is computed with K = 64. For this purpose, the depth map D is converted into a point cloud, and the 3D world coordinates of the grid center c are calculated from the depth values in the neighborhood of the predicted 2D "neck" keypoint p_2D^neck.
Wherein K denotes the camera's intrinsic calibration matrix and p̃ the homogeneous coordinates. For verification, three valid depth values are selected from the depth map in the neighborhood around the neck keypoint. According to the invention, V is computed by setting an element to 1 when at least one point of the point cloud lies inside it, and to 0 elsewhere. The resolution of the voxel grid selected by the present invention is about 3 cm.
The voxel pose network takes V and the series of score maps s_2D as input and processes them using a series of 3D convolutions. Tiling s_2D along the Z axis corresponds to an orthographic projection approximation. The voxel pose network estimates score maps s_3D ∈ R^(K×K×K×J), interpreted as keypoint likelihoods analogously to their 2D counterparts.
The invention completes its final prediction as follows: on the one hand, w_VPN is predicted by the voxel pose network. On the other hand, the invention uses the z component of w_VPN and the predicted 2D keypoints p_2D to calculate another set of world coordinates w_projected. The accuracy of these coordinates in the x and y directions is not limited by the choice of K. Based on the confidence of the 2D network prediction (the s_2D score at p_2D), the final prediction w is selected from w_projected and w_VPN.
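The confidence-based choice between the two keypoint candidates can be sketched as follows. This is my own illustrative reading of the rule; the threshold value and function name are assumptions, as the text only says the selection is based on the 2D detector's score.

```python
import numpy as np

def select_final_keypoint(w_projected, w_vpn, score_2d, threshold=0.5):
    """Pick w_projected when the 2D detector is confident at this keypoint,
    otherwise fall back to the voxel-network prediction w_VPN."""
    w_projected = np.asarray(w_projected, dtype=float)
    w_vpn = np.asarray(w_vpn, dtype=float)
    return w_projected if score_2d >= threshold else w_vpn

# High 2D confidence -> trust the coordinates projected from the 2D keypoint,
# whose x/y precision is not limited by the voxel resolution K.
w = select_final_keypoint([0.10, 0.20, 1.5], [0.12, 0.18, 1.5], score_2d=0.9)
```

The rationale in the text is that w_projected inherits the pixel-level x/y precision of the 2D detector, while w_VPN is more robust when the 2D detector is unsure.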
As shown in FIG. 2 and FIG. 3, the network architecture of VoxelPoseNet has encoder and decoder structures motivated by U-Net, using dense-block encoders. While decoding the full-resolution score map, the present invention includes multiple intermediate losses computed on s_3D.
2 motion recognition
The invention designs two view-adaptive neural networks based on RNN and CNN, named VA-RNN and VA-CNN. As shown in FIG. 3, in VA-RNN (top), an RNN-based view-adaptation subnetwork transforms the skeleton to a representation under a suitable viewpoint, and the main LSTM network recognizes actions from the transformed skeleton. VA-CNN (bottom) consists of a CNN-based view-adaptation subnetwork and a main convolutional network (ConvNet). Each network is trained end-to-end by optimizing classification performance. Optionally, the present invention may fuse the scores from both networks to provide a stronger prediction, denoted the VA-fusion scheme.
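The VA-fusion step can be sketched as a simple average of the two streams' per-class probabilities. This is an assumption for illustration: the text says only that the scores are fused, without specifying the rule, so averaging softmax outputs stands in for whatever combination the patent intends.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))   # shift for numerical stability
    return e / e.sum()

def va_fusion(rnn_logits, cnn_logits):
    """Average the two streams' class probabilities and pick the action."""
    p = 0.5 * (softmax(np.asarray(rnn_logits, dtype=float)) +
               softmax(np.asarray(cnn_logits, dtype=float)))
    return int(np.argmax(p)), p

# Both streams favour class 2 (say, a shooting action), so the fusion does too.
action, p = va_fusion([0.1, 0.3, 2.0], [0.2, 0.1, 1.5])
```

Fusing after the per-stream softmax keeps the two networks independently trainable, matching the end-to-end training described for each network.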
3 Multi-person tracking
The present invention employs a conventional single-hypothesis tracking method with recursive Kalman filtering and frame-to-frame data association. The core of the system is described in more detail in the following sections.
Tracking processing and the Kalman filtering framework are general techniques. The present invention assumes the very common tracking scenario in which the camera is uncalibrated and no ego-motion information is available.
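Recursive Kalman filtering for such tracks is typically built on a constant-velocity state model. A minimal sketch (illustrative; the state layout, matrices and noise levels are common defaults I am assuming, not values from the patent):

```python
import numpy as np

class ConstantVelocityKF:
    """Track state [x, y, vx, vy] with constant-velocity dynamics."""

    def __init__(self, x0, y0, dt=1.0, q=1e-2, r=1e-1):
        self.x = np.array([x0, y0, 0.0, 0.0])
        self.P = np.eye(4)                                    # state covariance
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = dt  # motion model
        self.H = np.eye(2, 4)                                 # observe (x, y)
        self.Q = q * np.eye(4)                                # process noise
        self.R = r * np.eye(2)                                # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (np.asarray(z, dtype=float) - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]

# Feed detections of a player moving +1 unit per frame along x.
kf = ConstantVelocityKF(0.0, 0.0)
for t in range(1, 20):
    kf.predict()
    est = kf.update([float(t), 0.0])
```

In the full tracker, frame-to-frame data association decides which detection is fed to which filter's `update`, and unmatched tracks coast on `predict` alone.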
The present invention employs a feature pyramid network (FPN). The FPN makes predictions at multiple scales, thus improving performance in multi-target pedestrian detection. The neural network used in JDE is briefly described as follows. The input video is first downsampled at three scales, 1/32, 1/16 and 1/8. Then, the smallest (semantically strongest) feature map is upsampled and merged by skip connection with the feature map from the second-smallest scale, and likewise for the other scales. Finally, prediction heads are applied to the fused features at all three scales. One prediction head consists of several stacked convolution layers and outputs a dense (6A+D)×H×W prediction map, where A is the number of anchor templates allocated to this scale and D is the embedding dimension. The dense prediction map is divided into three parts (tasks):
1) box classification results of size 2A×H×W;
2) box regression coefficients of size 4A×H×W;
3) a dense embedding map of size D×H×W.
4 track detection
Through player action recognition and trajectory tracking, the invention constructs a monitoring system for player tracking and analysis. In this system, the invention analyzes players' actions, trajectories, and attack-and-defense confrontations.
4.1 court correction
Because the camera does not film the basketball game directly from above, the court appears distorted by perspective in the footage and does not have a regular shape. Meanwhile, in order to display the players' movement trajectories intuitively, a regular court model is required. It is therefore necessary to calibrate the court video shot by the wide-angle camera to obtain a regular, rectified court.
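Such a rectification is typically a planar homography estimated from known court landmarks; the following sketch uses the direct linear transform (DLT) with four assumed point correspondences (the patent does not prescribe this particular algorithm):

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate the 3x3 homography mapping four landmarks seen in the
    wide-angle frame (src) to their positions on a regular court model
    (dst) via the direct linear transform (DLT)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # the homography is the null vector of A, found via SVD
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def warp_point(H, x, y):
    """Map an image point onto the rectified court model."""
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]
```

With the four visible court corners as src and the corners of the regular court model as dst, every tracked image position can be warped onto the model via warp_point.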
4.2 player trajectory analysis
At present, trajectory analysis of players is mostly performed on manually selected video collected by cameras. For example, the SportVU software uses a multi-angle, multi-camera dynamic tracking technique to capture player movement on the court, recording 25 frames per second that are timestamped and analyzed automatically by computer. After processing, the data collected by the cameras provide a very rich statistical database for analyzing elements such as movement speed, distance covered, distance between players, and ball possession. However, screening and interpreting the data still relies largely on data experts. Even though most experts initially seek the most objective methodology, experts are, after all, human, and bias is probably unavoidable. Different experts also have their own ways of understanding basketball; exaggerating the importance of certain statistics, or misusing irrelevant ones, may all lead to erroneous conclusions. Moreover, the five NBA positions are not finely differentiated, so the most objective conclusions cannot be obtained for some specific events.
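Quantities such as distance covered, mean speed and player separation follow directly from a tracked position sequence; a minimal sketch, assuming court coordinates in metres sampled at a SportVU-like 25 frames per second:

```python
import numpy as np

def trajectory_stats(track, fps=25.0):
    """Distance covered (m) and mean speed (m/s) from a tracked
    position sequence, one sample per frame."""
    track = np.asarray(track, dtype=float)
    steps = np.linalg.norm(np.diff(track, axis=0), axis=1)  # per-frame displacement
    distance = steps.sum()
    duration = (len(track) - 1) / fps
    mean_speed = distance / duration if duration > 0 else 0.0
    return distance, mean_speed

def player_separation(track_a, track_b):
    """Per-frame distance between two players' tracked positions."""
    a = np.asarray(track_a, dtype=float)
    b = np.asarray(track_b, dtype=float)
    return np.linalg.norm(a - b, axis=1)
```

Computing such statistics directly from tracks removes part of the manual screening step the passage above criticizes, though the interpretation of the numbers still rests with the analyst.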
5 tactical decisions
The scene is abstracted as an MDP. As shown in FIG. 12, the player to be trained is modeled as an agent and the game the player faces as the environment; the agent obtains the current state and reward from the game and takes corresponding actions. The state represents the feature information of the current role: it may be the raw information of the current game, features extracted by encoding, or semantic features extracted from the player's game history and training states. After the game is abstracted as an MDP, the player faces the game through a confrontation decision-evaluation network.
A reinforcement learning method is used to solve the tactical decision problem, which can be regarded as a multi-step reinforcement learning process over a continuous state space and a discrete action space, and is generally described by a Markov decision process (Markov Decision Process, MDP). A Markov decision process is a stochastic dynamic system based on the Markov assumption and can be expressed by the four-tuple (S, A, P, R), where S denotes the set of states (States); A denotes the set of actions (Actions); P(s'|s, a) denotes the probability of transitioning to state s' after taking action a in state s; and R(s, a) denotes the immediate reward obtained by taking action a in state s.
The goal of the MDP learning task is to find an optimal policy that maximizes the cumulative return. In the deduction of tactical action schemes, the acting entity (Agent) interacts with the arena environment: at each time step, the Agent obtains state s_t by observing the environment and then executes some action a_t, and the environment produces the next state s_{t+1} and reward r_t according to a_t. The task goal of reinforcement learning is to find, in the given MDP-based tactical decision process, the optimal policy π*(a|s) (a mapping from states to actions), π*(a|s) = P(a_t = a | s_t = s). Optimality here means that the Agent obtains the greatest cumulative return along a tactical decision trajectory.
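A minimal tabular Q-learning sketch of this loop (observe s_t, act a_t, receive r_t and s_{t+1}, improve the policy toward the maximal cumulative return); the hyperparameters, the episode step cap and the epsilon-greedy exploration are illustrative assumptions, not details given in the text:

```python
import random

def q_learning(n_states, n_actions, step, episodes=2000, max_steps=200,
               alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning on an MDP (S, A, P, R): at each time step the
    agent observes state s_t, picks action a_t (epsilon-greedy), receives
    reward r_t and next state s_{t+1} from the environment `step`, and
    nudges Q(s_t, a_t) toward r_t + gamma * max_a Q(s_{t+1}, a).
    The learned policy is pi*(s) = argmax_a Q(s, a)."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            if rng.random() < eps:                          # explore
                a = rng.randrange(n_actions)
            else:                                           # exploit
                a = max(range(n_actions), key=lambda i: Q[s][i])
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
            if done:
                break
    return Q
```

On a toy chain MDP where only repeated "right" actions reach a rewarding terminal state, the learned table comes to prefer the right action in every non-terminal state, i.e. the policy with the greatest cumulative return.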
The system adopts an advanced human pose estimation algorithm to estimate the players' poses and judges the players' actions, such as shooting and passing, from the pose information and its temporal sequence; after action recognition, the position of the player performing the corresponding action is marked automatically. For player trajectories, real-time tracking is achieved by combining a multi-person tracking algorithm with the players' pose information; the players' movement trajectories are drawn on the rectified court model according to the tracking sequence, intuitively displaying the players' running paths and the areas where shots concentrate, and the shooting percentage and optimal shooting positions of the players are analyzed in combination with the scoring record. The players' movement acceleration is calculated from their running trajectories.
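The movement acceleration mentioned above can be estimated from the tracked running trajectory by second-order finite differences; a sketch, assuming a fixed frame rate and positions in metres:

```python
import numpy as np

def acceleration(track, fps=25.0):
    """Per-frame acceleration magnitude (m/s^2) from a player trajectory
    via second-order finite differences of position."""
    p = np.asarray(track, dtype=float)
    dt = 1.0 / fps
    v = np.diff(p, axis=0) / dt     # velocity between consecutive frames
    a = np.diff(v, axis=0) / dt     # acceleration between velocity samples
    return np.linalg.norm(a, axis=1)
```

In practice the raw positions would first be smoothed (e.g. by the Kalman filter of the tracking stage), since finite differences amplify measurement noise.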
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and variations without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as falling within the scope of the invention.

Claims (4)

1. A basketball motion analysis method based on 3D pose estimation, comprising: estimating human poses from one or more RGBD inputs, recognizing human actions and tracking persons, analyzing player action patterns, and providing a training basis and confrontation analysis; predicting 3D human keypoint coordinates from the collected depth maps and color maps, generating an action sequence from the human keypoints and the skeleton map as input, and recognizing a person's actions; generating a human detection frame from the human keypoint coordinates, taking the human detection frame as the input of a tracker to generate a person track sequence, analyzing the person's motion trajectory, formulating and optimizing action schemes, and generating and evaluating multiple action sequences to form a competition command decision, specifically comprising the following steps:
s1, RGBD multi-view attitude estimation based on voxel network:
estimating human keypoints W = (w_1, ..., w_J) from the depth map and the color map, where the J keypoint coordinates are expressed in real-world coordinates, in a coordinate system identical to that of the color sensor frame;
converting the depth map into the color frame using the camera calibration, yielding the warped depth map D ∈ R^(N×M);
color keypoint detector: the keypoint detector is applied to the color image I, generating score maps s_2D ∈ R^(N×M×J) that encode the position likelihood of the keypoints; the maxima of the score maps s_2D correspond to the predicted keypoint locations p_2D; the detector weights are fixed, using the OpenPose library;
voxel pose network: voxel grid V e R for a given deformed depth map D K×K×K Calculation using k=64; for this purpose, the depth map D is converted into 3D world coordinates of the point cloud and the computing grid centerFrom D->Calculation in the neighborhood +.>As a predictive 2D "neck" key point +.>
Wherein K represents an intrinsic calibration matrix camera andis a homogeneous coordinate; selecting dr verification from depth map>Surrounding neighbors three effective depth values; calculating V by setting the element as 1, wherein at least one point exists in the point cloud, and the other positions are replaced by 0; the resolution of the voxel grid selected is 3 cm;
the voxel pose network takes V and the set of score maps s_2D as input and processes them with a series of 3D convolutions; s_2D is tiled along the Z axis; the voxel pose network estimates score maps s_3D ∈ R^(K×K×K×J), analogous to their 2D counterparts;
w_VPN denotes the prediction of the voxel pose network; using the z-components of w_VPN and the predicted 2D keypoints p_2D, another set of world coordinates w_projected is calculated; the accuracy of these coordinates in the x and y directions is not limited by the choice of K; based on the confidence of the 2D network prediction, i.e., the score of s_2D at p_2D, the final prediction w is selected from w_projected and w_VPN;
s2, an action recognition algorithm based on space-time double flow:
designing view-adaptive neural networks based on RNN and CNN, named VA-RNN and VA-CNN; in the VA-RNN, the skeleton is transformed to a representation under a suitable viewpoint by the RNN-based view-adaptation subnetwork, and actions are recognized from the transformed skeleton by the main LSTM network; the VA-CNN is likewise equipped with a CNN-based view-adaptation subnetwork and a main convolutional network; each network is trained end-to-end by optimizing classification performance;
s3, tracking multiple people:
adopting a conventional single-hypothesis tracking method based on recursive Kalman filtering and frame-to-frame data association;
tracking pipeline and Kalman filtering framework: the camera is not calibrated and no ego-motion information is available;
adopting a feature pyramid network: the feature pyramid network makes predictions at multiple scales; the input video is first downsampled at three scales, 1/32, 1/16 and 1/8; the feature map of the smallest size is upsampled and fused, through a skip connection, with the feature map of the second smallest size, and likewise for the other scales; prediction heads are attached to the fused features at the three scales; a prediction head comprises several stacked convolutional layers and outputs a dense prediction map of size (6A+D)×H×W; the dense prediction map is divided into three parts: 1) the box classification results, of size 2A×H×W; 2) the box regression coefficients, of size 4A×H×W; 3) the dense embedding map, of size D×H×W;
s4, track detection:
analyzing the players' actions, movement trajectories, and offensive and defensive confrontations;
s5, calibrating the court video shot by the wide-angle camera to form a regular court;
s6, player trajectory analysis:
estimating the players' poses by a human pose estimation algorithm, judging the players' actions from the pose information and its temporal sequence, and automatically marking the position of the player performing the corresponding action after action recognition; for the players' trajectories, realizing real-time tracking of the players by combining a multi-person tracking algorithm with the players' pose information; and calculating the players' movement acceleration from their running trajectories;
s7, generation of competition command decisions
In the deduction of competition confrontation schemes, an agent based on a deep reinforcement learning algorithm continuously updates its deep neural network through continuous interaction with the arena environment, accumulates learning experience to guide its subsequent behavior selection, and finally generates an optimal action sequence meeting the coach's requirements.
2. The basketball motion analysis method based on 3D pose estimation of claim 1, further comprising:
for S2, the scores of the RNN and the CNN may also be fused to provide a fused prediction, denoted as the VA-fusion scheme.
3. The basketball motion analysis method based on 3D pose estimation according to claim 1, wherein the agent based on the deep reinforcement learning algorithm continuously updates its deep neural network in continuous interaction with the arena environment, accumulates learning experience to guide its subsequent behavior selection, and finally generates an optimal action sequence meeting the coach's requirements, further comprising:
in S7, generating human detection frames from the human keypoint coordinates of both competing sides' game histories, taking the human detection frames as the input of the tracker to generate person track sequences, analyzing the persons' motion trajectories, formulating and optimizing action schemes, and generating and evaluating multiple action sequences to form a competition command decision.
4. The basketball motion analysis method based on 3D pose estimation according to claim 2, further comprising:
in S6, combining the players' pose information to realize real-time tracking of the players, drawing the players' movement trajectories on the rectified court model according to the tracking sequence, intuitively displaying the players' running paths and the areas where their shots concentrate, and analyzing the players' shooting percentage and optimal shooting positions in combination with the scoring record.
CN202010579582.9A 2020-06-23 2020-06-23 Basketball motion analysis method based on 3D gesture estimation Active CN111724414B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010579582.9A CN111724414B (en) 2020-06-23 2020-06-23 Basketball motion analysis method based on 3D gesture estimation

Publications (2)

Publication Number Publication Date
CN111724414A CN111724414A (en) 2020-09-29
CN111724414B true CN111724414B (en) 2024-01-26

Family

ID=72568389

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010579582.9A Active CN111724414B (en) 2020-06-23 2020-06-23 Basketball motion analysis method based on 3D gesture estimation

Country Status (1)

Country Link
CN (1) CN111724414B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12100104B2 (en) * 2020-11-17 2024-09-24 International Institute Of Information Technology, Hyderabad System and method for automatically reconstructing 3D model of an object using machine learning model
CN112668549B (en) * 2021-01-15 2023-04-07 北京格灵深瞳信息技术股份有限公司 Pedestrian attitude analysis method, system, terminal and storage medium
CN113011310B (en) * 2021-03-15 2022-03-11 中国地质大学(武汉) Method and device for collecting shooting exercise amount, computer equipment and storage medium
CN113312840B (en) * 2021-05-25 2023-02-17 广州深灵科技有限公司 Badminton playing method and system based on reinforcement learning
CN113343843A (en) * 2021-06-04 2021-09-03 北京格灵深瞳信息技术股份有限公司 Target tactical recognition method and device, electronic equipment and storage medium
CN113506210A (en) * 2021-08-10 2021-10-15 深圳市前海动竞体育科技有限公司 Method for automatically generating point maps of athletes in basketball game and video shooting device
CN113705445B (en) * 2021-08-27 2023-08-04 深圳龙岗智能视听研究院 Method and equipment for recognizing human body posture based on event camera
CN113837063B (en) * 2021-10-15 2024-05-10 中国石油大学(华东) Reinforcement learning-based curling motion field analysis and auxiliary decision-making method
CN114155256B (en) * 2021-10-21 2024-05-24 北京航空航天大学 Method and system for tracking deformation of flexible object by using RGBD camera
CN114998991A (en) * 2022-06-01 2022-09-02 浙江蓝鸽科技有限公司 Campus intelligent playground system and motion detection method based on same
CN114783039B (en) * 2022-06-22 2022-09-16 南京信息工程大学 Motion migration method driven by 3D human body model
CN115937895B (en) * 2022-11-11 2023-09-19 南通大学 Speed and strength feedback system based on depth camera
CN117612242B (en) * 2023-05-30 2024-06-28 黑龙江大学 Monitoring display system for player motion state tracking foul
CN116645726B (en) * 2023-05-30 2024-02-02 首都师范大学 Behavior recognition method and system for space-time double-branch fusion by utilizing three-dimensional human body recovery
CN117789094B (en) * 2023-12-29 2024-09-03 内蒙古大学 Group behavior detection and recognition method and system based on deep learning
CN117807818B (en) * 2024-03-01 2024-05-10 西安慧金科技有限公司 Industrial furnace life prediction method combined with dynamic basket ring optimization algorithm
CN118506248A (en) * 2024-06-19 2024-08-16 长春职业技术学院 Sports match scoring system and method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108722A (en) * 2018-01-17 2018-06-01 深圳市唯特视科技有限公司 A kind of accurate three-dimensional hand and estimation method of human posture based on single depth image
CN109165253A (en) * 2018-08-15 2019-01-08 宁夏大学 A kind of method and apparatus of Basketball Tactical auxiliary
CN110674785A (en) * 2019-10-08 2020-01-10 中兴飞流信息科技有限公司 Multi-person posture analysis method based on human body key point tracking
CN110929596A (en) * 2019-11-07 2020-03-27 河海大学 Shooting training system and method based on smart phone and artificial intelligence

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9117147B2 (en) * 2011-04-29 2015-08-25 Siemens Aktiengesellschaft Marginal space learning for multi-person tracking over mega pixel imagery

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Survey of Progress in Pose Estimation in Sports Video Analysis (体育视频分析中姿态估计进展的综述); Zong Libo; Song Yifan; Wang Yiming; Ma Bo; Wang Dongyang; Li Yingjie; Zhang Peng; Journal of Chinese Computer Systems (小型微型计算机系统); vol. 41, no. 8, pp. 1751-1757 *


Similar Documents

Publication Publication Date Title
CN111724414B (en) Basketball motion analysis method based on 3D gesture estimation
Monti et al. Dag-net: Double attentive graph neural network for trajectory forecasting
Felsen et al. What will happen next? forecasting player moves in sports videos
CN110462684B (en) System, computer readable medium and method for implicit prediction of object movement
US11967086B2 (en) Player trajectory generation via multiple camera player tracking
CN111444890A (en) Sports data analysis system and method based on machine learning
Meng et al. A video information driven football recommendation system
CN110210383B (en) Basketball video semantic event recognition method integrating motion mode and key visual information
EP3945463B1 (en) A computing system and a computer-implemented method for sensing gameplay events and augmentation of video feed with overlay
Pu et al. Orientation and decision-making for soccer based on sports analytics and AI: A systematic review
Ding et al. Machine learning model for feature recognition of sports competition based on improved TLD algorithm
Ait-Bennacer et al. Applying Deep Learning and Computer Vision Techniques for an e-Sport and Smart Coaching System Using a Multiview Dataset: Case of Shotokan Karate.
Jiang et al. Golfpose: Golf swing analyses with a monocular camera based human pose estimation
Pervaiz et al. Artificial neural network for human object interaction system over Aerial images
CN115100744A (en) Badminton game human body posture estimation and ball path tracking method
Skublewska-Paszkowska et al. Attention Temporal Graph Convolutional Network for Tennis Groundstrokes Phases Classification
Wang et al. [Retracted] Simulation of Tennis Match Scene Classification Algorithm Based on Adaptive Gaussian Mixture Model Parameter Estimation
Li et al. Tracking and detection of basketball movements using multi-feature data fusion and hybrid YOLO-T2LSTM network
Almasi Human movement analysis from the egocentric camera view
CN116958872A (en) Intelligent auxiliary training method and system for badminton
CN114898275A (en) Student activity track analysis method
Liwei Research on classification and recognition of badminton batting action based on machine learning
US11640713B2 (en) Computing system and a computer-implemented method for sensing gameplay events and augmentation of video feed with overlay
Felsen Learning to predict human behavior from video
WO2023081456A1 (en) Machine learning based video analysis, detection and prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant