CN111724414B - Basketball motion analysis method based on 3D pose estimation - Google Patents
- Publication number
- Grant: CN111724414B · Application: CN202010579582.9A (CN202010579582A)
- Authority
- CN
- China
- Prior art keywords
- player
- action
- network
- gesture
- human body
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/23—Recognition of whole body movements, e.g. for sport training
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20016—Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30241—Trajectory
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
Video analysis technology is increasingly used in basketball and can effectively help athletes and teams improve their competitive results. The invention provides a basketball training auxiliary method based on pose estimation and action recognition. The system analyzes game and training video; from the analysis results, a coach can discover weaknesses in play, raise the competitive level, prepare for games in a targeted way, arrange lineups and game strategies according to the opponents' characteristics, and feed the results back to the athletes to help reduce the risk of injury. The system adopts a multi-view three-dimensional pose detection method: it acquires the 3D pose and position of each player on the court from several cameras that capture depth information, recognizes player actions and tracks player trajectories, and finally predicts and analyzes player actions, builds a model of player movement pose and trajectory, and recognizes movement poses on the game court.
Description
Technical Field
The invention relates to the technical field of motion analysis, in particular to a basketball motion analysis method based on 3D pose estimation from video.
Background
In the big-data age, sports teams collect extensive statistics about players in order to make decisions that improve team performance; for example, signing a new athlete or designing an optimal tactic for a particular game. In the basketball field in particular, a strong need has arisen for player athletic data analysis, ranging from early manual annotation of video to video-analysis companies such as Second Spectrum and STATS, which reduce manual effort by extracting information from a set of ceiling cameras installed at the venue. However, automated analysis of video sequences and assessment of player and team ability remains an open problem, owing to the many challenges present in sports video, such as multi-person occlusion, similar appearance, and fast, erratic movement. In addition, strategic planning of team play based on player ability assessment is an important topic in sports-competition research.
Vision-based assessment of athletes' performance involves human pose estimation, action recognition, action-based event feature analysis, and related problems. Many researchers have studied basketball video event detection and annotation and proposed numerous ideas and methods: from feature analysis of single modalities such as audio, vision, and text to multi-modal video analysis, and from detection methods built on domain-specific features to detection methods using general models, basketball video event detection has developed greatly. Human pose estimation is the basis of behavioral ability assessment. Most research focuses on pose estimation in ordinary 2D video, covering both skeleton estimation and body-surface estimation. Such methods lack an effective solution for occlusion between players in a game and for the similar appearance of players' uniforms, and they lack sufficient information about a player's spatial position on the court. The invention adopts multi-view video acquisition and, based on deep learning, proposes a skeleton-based method for estimating player pose in 3D space by combining information from color and depth maps. This effectively handles occlusion among multiple people and yields accurate player positions.
Traditional human pose estimation algorithms have difficulty accurately locating a player's position on the court during competition, especially in team sports such as basketball, where multiple players cross paths quickly. The player detection and tracking method of the invention estimates, in each frame, the similarity between the detected human keypoints and the pose to be tracked, so as to strengthen tracking of a specific person.
The invention saves the action analysis results and motion trajectory of each of a player's games in a database; an analysis module analyzes and predicts the player's action frequency, spatial position, and variation over time to obtain the player's playing characteristics and current competitive state.
The invention provides an auxiliary decision-making system for basketball coaches. It adopts three-dimensional human pose estimation based on deep neural networks, combines multi-view 3D pose estimation with spatial trajectory tracking, and locates multiple people in complex environments. A spatio-temporal two-stream deep neural network performs player motion prediction, yielding information such as each player's running speed, trajectory, and actions; combining this with player scoring and physical condition, a player sport profile is established and a court-evaluation algorithm for basketball games is studied, finally producing the coach's auxiliary decision-making system.
The invention is an end-to-end sports video analysis and decision-support system comprising modules for video acquisition, human pose detection, action recognition, motion trajectory tracking, player motion feature analysis, game reinforcement-learning decision making, and data visualization and analysis.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a basketball motion analysis method based on 3D pose estimation, which predicts the full 3D human pose from RGBD input using computer vision.
According to the basketball motion analysis method based on 3D pose estimation, the human pose is estimated from RGBD input, human actions are recognized, and people are tracked. 3D human keypoint coordinates are predicted from collected depth and color maps; an action sequence generated from the human keypoints and skeleton map serves as input for recognizing a person's actions; a human detection box generated from the keypoint coordinates feeds a tracker that produces a track sequence for analyzing the person's motion trajectory. The method specifically comprises the following steps:
S1, RGBD multi-view pose estimation based on a voxel network:
estimating the human keypoints w = (w_1, ..., w_J), i.e. the J keypoint coordinates in the real world, from the depth map and the color image; the coordinate system used is the same as the color sensor frame;
converting the depth map into the color camera frame using the camera calibration, and operating on the warped depth map D ∈ R^(N×M);
color keypoint detector: a keypoint detector applied to the color image I generates a score map s_2D ∈ R^(N×M×J) giving the position likelihood of each keypoint; the maximum of s_2D corresponds to the predicted keypoint locations p_2D; the detector weights are fixed, using the OpenPose library;
voxel pose network: voxel grid V e R for a given deformed depth map D K×K×K Calculation using k=64; for this purpose, the depth map D is converted into 3D world coordinates of the point cloud and the computing grid centerFrom D->Calculation in the neighborhood +.>As a predictive 2D "neck" key point +.>
where K denotes the camera's intrinsic calibration matrix and p̃_neck is the homogeneous coordinate of the neck keypoint; for verification, d̄ is estimated from at least three valid depth values selected from the neighborhood around the keypoint in the depth map; V is calculated by setting an element to 1 when at least one point of the point cloud lies in the corresponding cell, and 0 at all other positions; the resolution of the selected voxel grid is about 3 cm;
the voxel pose network takes V and the series of score maps s_2D as input and processes them with a series of 3D convolutions; s_2D is tiled along the Z axis; the voxel pose network estimates a score map s_3D ∈ R^(K×K×K×J), interpreted in the same way as its 2D counterpart;
one set of keypoint coordinates, w_VPN, is predicted by the voxel pose network; using the z component of w_VPN and the predicted 2D keypoints p_2D, another set of world coordinates w_projected is calculated; the precision of these coordinates in the x and y directions is not limited by the choice of K; based on the confidence of the 2D network prediction (the s_2D score at p_2D), the final prediction w is selected from w_projected and w_VPN;
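The occupancy-grid voxelization described in this step can be sketched as follows. This is a minimal illustration under stated assumptions (K = 64, ~3 cm cell size, grid centered on the estimated neck position); the function name and parameters are illustrative, not the patented implementation.

```python
import numpy as np

def voxelize(points, center, k=64, res=0.03):
    """Build a K x K x K occupancy grid V around `center`:
    a cell is set to 1 if at least one point of the point cloud
    falls inside it, and 0 otherwise."""
    V = np.zeros((k, k, k), dtype=np.uint8)
    # shift points so the grid is centered on the estimated grid center
    idx = np.floor((points - center) / res).astype(int) + k // 2
    inside = np.all((idx >= 0) & (idx < k), axis=1)  # keep in-grid points
    ii = idx[inside]
    V[ii[:, 0], ii[:, 1], ii[:, 2]] = 1
    return V
```

With a 3 cm resolution, two points less than 3 cm apart land in the same cell, and points far from the grid center are simply discarded.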
S2, an action recognition algorithm based on spatio-temporal two-stream networks:
two view-adaptive neural networks are designed based on an RNN and a CNN, named VA-RNN and VA-CNN; in VA-RNN, an RNN-based view-adaptation subnetwork transforms the skeleton to a representation under a suitable viewpoint, and a main LSTM network recognizes actions from the transformed skeleton; VA-CNN consists of a CNN-based view-adaptation subnetwork and a main convolutional network; each network is trained end-to-end by optimizing classification performance;
S3, multi-person tracking:
a traditional single-hypothesis tracking method with recursive Kalman filtering and frame-to-frame data association is adopted;
tracking and Kalman-filtering framework: the camera is not calibrated and no ego-motion information is available;
a feature pyramid network (FPN) is adopted: the FPN predicts at multiple scales; the input video is first downsampled at three scales, 1/32, 1/16 and 1/8; the smallest (semantically strongest) feature map is upsampled and merged, via skip connections, with the feature maps of the other scales; prediction heads are applied to the fused features at the three scales; a prediction head comprises several stacked convolution layers and outputs a dense (6A+D)×H×W prediction map, which is divided into three parts: 1) box classification results of size 2A×H×W; 2) box regression coefficients of size 4A×H×W; 3) a dense embedding map of size D×H×W;
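The division of the dense (6A+D)×H×W prediction map into its three task heads amounts to a channel split; a minimal NumPy sketch, where the values of A and D in the example are illustrative:

```python
import numpy as np

def split_prediction(pred, A, D):
    """Split a (6A+D) x H x W dense prediction map into the three tasks:
    box classification (2A), box regression (4A), and embeddings (D)."""
    assert pred.shape[0] == 6 * A + D
    cls = pred[:2 * A]          # 2A x H x W: foreground/background per anchor
    reg = pred[2 * A:6 * A]     # 4A x H x W: four box coefficients per anchor
    emb = pred[6 * A:]          # D x H x W: one D-dim embedding per location
    return cls, reg, emb
```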
S4, trajectory detection:
analyzing the actions, motion trajectories, and offense-defense confrontations of the players;
S5, court correction: calibrating the court video shot by the wide-angle camera to form a regular court;
S6, player trajectory analysis:
estimating the player's pose with a body pose estimation algorithm, judging the player's action from the pose information and its time sequence, and automatically marking the position where the player performs the corresponding action once the action is recognized; for player trajectories, real-time tracking is achieved by a multi-person tracking algorithm combined with the player's pose information; the player's acceleration is calculated from the running trajectory.
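Computing a player's speed and acceleration from the tracked running trajectory, as described in S6, can be done by finite differences; a minimal sketch assuming an (N, 2) trajectory in court coordinates (meters) sampled at a fixed, illustrative frame rate:

```python
import numpy as np

def speed_and_acceleration(track, fps=25.0):
    """Estimate per-step speed (m/s) and acceleration (m/s^2) from an
    (N, 2) court-coordinate trajectory via finite differences."""
    dt = 1.0 / fps
    vel = np.diff(track, axis=0) / dt      # (N-1, 2) velocity vectors
    speed = np.linalg.norm(vel, axis=1)    # scalar speed per step
    accel = np.diff(speed) / dt            # change of speed per step
    return speed, accel
```

In practice the raw trajectory would first be smoothed (e.g. by the Kalman filter already used for tracking), since finite differences amplify detection noise.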
S7, generation of competition decisions:
Basketball tactical decision making can be abstracted as a multi-step decision problem in reinforcement learning, with a huge search space. Given the limited amount of in-game command-decision training data, a strategy learned by direct imitation learning does not generalize well: the sampled decision trajectories cannot cover the whole state space, which limits the generalization of a strategy function learned by supervised learning. Increasing training time and computing power can compensate to some extent, but cannot fundamentally remedy the weak predictive and generalization capability. Deep reinforcement learning, which realizes end-to-end self-learning from perception to decision control on a deep neural network, offers stronger prediction and generalization than supervised learning.
In practical multi-step reinforcement learning it is quite difficult to design the reward function, but deriving the reward function backwards from example data provided by experts helps solve this problem; this is the idea of inverse reinforcement learning (IRL). Inverse reinforcement learning is regarded as an important means of accelerating reinforcement learning. In the tactical decision process based on inverse reinforcement learning, the state space S and action space A are known, and a relatively rich data set of real command-decision examples has been accumulated; a deep neural network with multiple hidden layers can therefore be trained to fit a reward function by combining deep learning with inverse reinforcement learning, such that the decision examples are optimally distributed under that reward function. The reward function can then be used to solve for the optimal course of action (COA) under a given index; finally, for different tasks, it is combined with tactical-decision agility evaluation indices to form a reasonable tactical scheme. The nonlinear reward function based on a deep neural network can be regarded as the coach's empirical judgment of the real-time situation.
The scene is abstracted as an MDP. The player to be trained is the agent and the game the player faces is the environment; the agent obtains the current state and reward from the game and takes corresponding actions. The state represents the feature information of the current role: it can be the raw information of the current game, with features extracted by an encoder, or semantic features extracted from the player's game history and training state. After the game is abstracted as an MDP, the decision-evaluation network for game play is applied to the players in the game.
Using reinforcement learning to solve the tactical decision problem can be regarded as multi-step reinforcement learning over a continuous state space and a discrete action space, generally described by a Markov decision process (MDP). A Markov decision process is a stochastic dynamic system based on the Markov assumption, expressed as a four-tuple (S, A, P, R), where S is the set of states; A is the set of actions; P(s'|s, a) is the probability of transitioning to state s' after taking action a in state s; and R(s, a) is the immediate reward obtained by taking action a in state s.
The goal of the MDP learning task is to find an optimal policy that maximizes the cumulative return. In deducing a tactical action scheme, the acting entity (Agent) interacts with the game environment: at each time step, the Agent obtains state s_t by observing the environment and executes an action a_t, and the environment produces the next state s_{t+1} and reward r_t according to a_t. The task of reinforcement learning is to find, in the given MDP-based tactical decision process, the optimal policy π*(a|s) (a mapping from states to actions), π*(a|s) = P(a_t = a | s_t = s). Optimality here means the Agent obtains the greatest cumulative return along a tactical decision trajectory.
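The cumulative discounted return that the optimal policy maximizes can be computed backwards over one decision trajectory; a minimal sketch, with the discount factor γ as an illustrative parameter:

```python
def discounted_return(rewards, gamma=0.99):
    """Return G_t = sum_k gamma^k * r_{t+k} for every step t of one
    trajectory, computed by a single backward pass over the rewards."""
    G = 0.0
    returns = []
    for r in reversed(rewards):
        G = r + gamma * G  # Bellman backup: G_t = r_t + gamma * G_{t+1}
        returns.append(G)
    return returns[::-1]
```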
The invention relates to a basketball motion analysis method based on 3D gesture estimation, which further comprises the following steps:
for S2, the scores of the RNN and CNN streams may also be fused to provide the final prediction, denoted the VA-fusion scheme.
The invention relates to a basketball motion analysis method based on 3D gesture estimation, which further comprises the following steps:
in S6, real-time tracking of a player is realized in combination with the player's pose information; the player's motion trajectory is drawn on the corrected court model according to the tracking sequence, intuitively displaying the player's running trajectory and shot-concentration areas; combined with scoring data, the player's shooting percentage and optimal shooting positions are analyzed;
The beneficial effects of the invention are as follows: the full three-dimensional human pose is predicted from multi-view RGBD input, outperforming existing reference methods. The method first predicts the person's 2D pose in the color image; a deep network then takes the 2D pose and depth map as input and estimates the full three-dimensional pose from this information. Based on the spatial position of the three-dimensional pose and the action recognition results, combined with historical data such as player and team scores, a reinforcement learning method generates auxiliary decisions for training and game tactics.
Drawings
FIG. 1 is a schematic diagram of an architecture of the present invention;
FIG. 2 is a schematic illustration of the in-game pose estimation results of the player Bitummy Blender;
FIG. 3 is a three-dimensional human body posture result diagram of the algorithm in actual calculation;
FIG. 4 is a schematic representation of the full-field motion profile of the player Tim Duncan;
FIG. 5 is a schematic illustration of the full-field motion profile of the player Thabo Sefolosha;
FIG. 6 is a schematic top view of player trajectory tracking;
FIG. 7 is a schematic illustration of the distance travelled by some of the players in the 2019 CBA Finals;
FIG. 8 is a schematic view of the scatter points of a basketball shot;
FIG. 9 is a schematic illustration of a heat map of a basketball shot;
FIG. 10 is a sectional view of a basketball shot;
FIG. 11 is a schematic diagram of the multi-agent reinforcement learning policy framework;
FIG. 12 is a schematic diagram of tactical decision generation.
Detailed Description
The following describes embodiments of the present invention in further detail with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit its scope.
Example 1
The object of the invention is to estimate three-dimensional human poses from the RGBD input of multiple cameras, recognize human actions, and track people. 3D human keypoint coordinates are predicted from collected depth and color maps; an action sequence generated from the human keypoints and skeleton map serves as input for recognizing the person's actions; a human detection box generated from the keypoint coordinates is fed to a tracker, producing a track sequence from which the person's motion trajectory is analyzed; see FIG. 1.
1. Human body pose estimation
The object is to estimate the human keypoints w = (w_1, ..., w_J), i.e. the J keypoint coordinates in the real world, from the depth map and the color image. Without loss of generality, in the coordinate system defined by the invention, the predictions are in the same frame as the color sensor.
For Kinect, the color and depth sensors are closely located, but they are still two different cameras, and the method requires the two frames of information to be registered. Thus the invention uses the camera calibration to convert the depth map into the color frame; as a result, the method operates on a warped depth map D ∈ R^(N×M). Owing to occlusion and to differences in resolution and noise, the warped depth map D is sparse.
1) Color keypoint detector: a keypoint detector applied to the color image I generates a score map s_2D ∈ R^(N×M×J) giving the location likelihood of each keypoint. The maximum of s_2D corresponds to the predicted keypoint locations p_2D. The invention uses fixed weights from the OpenPose library; other 2D pose estimation methods, such as those in the AlphaPose library, may also be employed here.
2) Voxel pose network: for a given warped depth map D, a voxel grid V ∈ R^(K×K×K) is calculated, using K = 64. For this purpose, the depth map D is converted into a point cloud, and the 3D world coordinates of the grid center c are calculated by back-projecting the predicted 2D "neck" keypoint p_neck with a depth value d̄ estimated from D in the neighborhood of p_neck: c = d̄ · K^(-1) · p̃_neck,
where K denotes the camera's intrinsic calibration matrix and p̃_neck is a homogeneous coordinate. For verification, d̄ is estimated from at least three valid depth values selected from the neighborhood around the keypoint in the depth map. V is calculated by setting an element to 1 if at least one point of the point cloud lies in the corresponding cell, and 0 at all other positions. The resolution of the voxel grid selected by the invention is about 3 cm.
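The back-projection used here, mapping a pixel with a depth value to 3D coordinates via the intrinsic calibration matrix K and homogeneous coordinates, can be sketched as follows; the intrinsic values in the example are illustrative, and the camera frame is assumed to coincide with the prediction frame (the color sensor frame, as stated above):

```python
import numpy as np

def backproject(p2d, depth, K):
    """Back-project a pixel p2d = (u, v) with depth z (meters) into 3D
    coordinates in the camera frame: w = z * K^{-1} * [u, v, 1]^T."""
    uv1 = np.array([p2d[0], p2d[1], 1.0])  # homogeneous pixel coordinate
    return depth * np.linalg.solve(K, uv1)
```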
The voxel pose network takes V and the series of score maps s_2D as input and processes them with a series of 3D convolutions. The invention tiles s_2D along the Z axis, which corresponds to an orthographic-projection approximation. The voxel pose network estimates a score map s_3D ∈ R^(K×K×K×J), interpreted as a keypoint likelihood in the same way as its 2D counterpart.
The final prediction is completed as follows: on the one hand, w_VPN is predicted by the voxel pose network; on the other hand, the z component of w_VPN and the predicted 2D keypoints p_2D are used to calculate another set of world coordinates w_projected. The accuracy of these coordinates in the x and y directions is not limited by the choice of K. Based on the confidence of the 2D network prediction (the s_2D score at p_2D), the final prediction w is selected from w_projected and w_VPN.
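One plausible reading of this selection rule is to prefer w_projected when the 2D detector is confident, since its x/y precision is not limited by the voxel resolution, and otherwise fall back to the voxel-network output; the threshold value and the exact decision rule are assumptions, as the text does not specify them:

```python
def select_keypoint(w_vpn, w_projected, conf_2d, thresh=0.5):
    """Choose the final world coordinate for one keypoint.

    conf_2d is the s_2D score at the predicted 2D location p_2D; a
    confident 2D detection favors the projected coordinate, otherwise
    the voxel-network prediction is kept. thresh is an assumed value."""
    return w_projected if conf_2d >= thresh else w_vpn
```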
FIGS. 2 and 3 show results of the VoxelPoseNet architecture, whose encoder and decoder are inspired by U-Net, using dense-block encoders. While decoding the full-resolution score map, the network includes several intermediate losses computed on s_3D.
2. Action recognition
The invention designs two view-adaptive neural networks based on an RNN and a CNN, named VA-RNN and VA-CNN. As shown in FIG. 3, in VA-RNN (top), an RNN-based view-adaptation subnetwork transforms the skeleton to a representation under a suitable viewpoint, and a main LSTM network recognizes actions from the transformed skeleton. VA-CNN (bottom) consists of a CNN-based view-adaptation subnetwork and a main convolutional network (ConvNet). Each network is trained end-to-end by optimizing classification performance. Optionally, the scores of the two networks may be fused to provide the final prediction, denoted the VA-fusion scheme.
3 Multi-person tracking
The present invention employs a conventional single-hypothesis tracking method with recursive Kalman filtering and frame-to-frame data association. The core of the system is described in more detail in the following sections.
Tracking processing and the Kalman filtering framework are general techniques. The present invention assumes a very common tracking scenario in which the camera is uncalibrated and no ego-motion information is available.
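A minimal sketch of the per-track Kalman step used in such single-hypothesis trackers; the constant-velocity state layout and the noise magnitudes are illustrative assumptions, not the patent's parameters:

```python
import numpy as np

class ConstantVelocityKF:
    """Track state [cx, cy, vx, vy]: detection-box center plus frame-to-frame velocity."""

    def __init__(self, cx, cy, q=1e-2, r=1e-1):
        self.x = np.array([cx, cy, 0.0, 0.0])
        self.P = np.eye(4)
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = 1.0   # position += velocity each frame
        self.H = np.eye(2, 4)               # only the center position is observed
        self.Q = q * np.eye(4)              # process noise
        self.R = r * np.eye(2)              # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                   # predicted center for data association

    def update(self, z):
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)          # Kalman gain
        self.x = self.x + K @ (np.asarray(z) - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

In use, `predict()` supplies each track's expected position for frame-to-frame association (e.g. by distance or IoU), and `update()` folds in the matched detection.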
The present invention employs a Feature Pyramid Network (FPN). FPN makes predictions at multiple scales, thus improving performance in multi-target pedestrian detection. The neural network used in JDE is briefly described in fig. 2. The input frame is first downsampled at three scales: 1/32, 1/16, and 1/8. The feature map of the smallest size (semantically strongest) is then upsampled and merged, via skip connections, with the feature map of the second-smallest size, and likewise for the other scales. Finally, prediction heads are applied to the fused feature maps at all three scales. One prediction head consists of several stacked convolutional layers and outputs a dense prediction map of size (6A+D)xHxW, where A is the number of anchor templates allocated to this scale and D is the embedding size. The dense prediction map is divided into three parts (tasks):
1) Box classification results, of size 2AxHxW
2) Box regression coefficients, of size 4AxHxW
3) Dense embedding map, of size DxHxW
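The split of the (6A+D)xHxW map into its three task heads can be sketched directly from the channel layout above; the channel ordering (classification, regression, embedding) is an assumption consistent with the listed sizes:

```python
import numpy as np

def split_prediction_map(pred, A, D):
    """Split a JDE-style dense prediction map of shape (6A+D, H, W) into its
    three task heads: box classification (2A), box regression (4A), embedding (D)."""
    assert pred.shape[0] == 6 * A + D, "channel count must equal 6A + D"
    cls = pred[: 2 * A]             # 2A x H x W: foreground/background per anchor
    reg = pred[2 * A : 6 * A]       # 4A x H x W: 4 box regression coefficients per anchor
    emb = pred[6 * A :]             # D  x H x W: dense appearance embeddings
    return cls, reg, emb
```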
4 Trajectory detection
Through player action recognition and trajectory tracking, the invention constructs a monitoring system for player tracking and analysis. In this system, the invention analyzes the players' actions, trajectories, and attack-and-defense confrontations.
4.1 Court calibration
Because the camera does not shoot the basketball game directly from above the court, the captured court is not a regular shape. Meanwhile, in order to intuitively show the players' movement trajectories, a regular court model is required. Therefore, the court video shot by the wide-angle camera must be calibrated to form a regular court.
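This calibration is the classic planar-homography problem: map at least four known court landmarks seen in the image to their positions on a regular court model. A self-contained sketch using the direct linear transform (the specific landmark coordinates and court dimensions below are illustrative assumptions, not values from the patent):

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate the 3x3 homography mapping src -> dst from >= 4 point pairs (DLT)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of A, taken from the SVD
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def warp_point(H, p):
    """Apply homography H to a 2D point (homogeneous normalization)."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]
```

With H estimated once per camera setup, every tracked player position can be warped from image pixels onto the regular court model before trajectory drawing.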
4.2 player trajectory analysis
At present, trajectory analysis of players is done by manually reviewing video collected by cameras. For example, SportVU is a software system that uses multi-angle, multi-camera dynamic tracking to capture player movement on the court, capturing 25 frames per second, with timing recorded and analyzed automatically by computer. After processing, the data collected by the cameras provides a very rich statistical database for analyzing elements such as movement speed, distance covered, distance between players, and ball possession. However, screening and analyzing the data is still largely carried out by data experts. Although most experts initially seek the most objective treatment of the data, experts are, after all, human, and bias is probably unavoidable. Different experts also have their own ways of understanding basketball; exaggerating the importance of certain statistics or misusing certain irrelevant ones can both lead to erroneous conclusions. Meanwhile, the five NBA positions are not subdivided in detail, so the most objective conclusions cannot be drawn for certain specific events.
5 tactical decisions
The scene is abstracted into an MDP. As shown in FIG. 12, the player to be trained is modeled as an agent and the game the player faces as the environment; the agent obtains the current state and reward from the game and takes corresponding actions. The state represents the feature information of the current agent: it can be the raw information of the current game, features extracted by encoding, or semantic features extracted from the player's game history and training states. After the game is abstracted into an MDP, a decision-evaluation network handles the player's in-game confrontation.
The reinforcement learning method used to solve the tactical decision problem can be regarded as a multi-step reinforcement learning process over a continuous state space and a discrete action space, and is generally described by a Markov Decision Process (MDP). A Markov decision process is a stochastic dynamic system based on the Markov assumption and can be expressed as a four-tuple (S, A, P, R), where S represents the set of states (States); A represents the set of actions (Actions); P(s'|s, a) represents the probability of transitioning to state s' after taking action a in state s; and R(s, a) represents the immediate reward obtained by taking action a in state s.
The goal of the MDP learning task is to find an optimal policy that maximizes the cumulative return. In the deduction of a tactical action scheme, the acting entity (Agent) interacts with the game-field environment: at each time step, the Agent obtains state st by observing the environment and then executes some action at, and the environment generates the next state st+1 and reward rt according to at. The task goal of reinforcement learning is to find the optimal policy π*(a|s) (a mapping from states to actions), π*(a|s) = P(at = a | st = s), in the given MDP-based tactical decision process. Optimality here means that the Agent attains the greatest cumulative return along a tactical decision trajectory.
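The (S, A, P, R) tuple and the cumulative-return objective can be made concrete with a tabular sketch. The patent's system uses deep reinforcement learning on continuous states; the tabular Q-learning below is only a toy illustration of the same objective, and the toy MDP and hyperparameters are assumptions:

```python
import numpy as np

def q_learning(P, R, gamma=0.9, alpha=0.1, episodes=500, eps=0.1, seed=0):
    """Tabular Q-learning on an MDP given as transition probabilities
    P[s, a] -> distribution over s' and immediate rewards R[s, a].
    Returns the greedy policy pi(s) = argmax_a Q(s, a) and the Q-table."""
    rng = np.random.default_rng(seed)
    n_s, n_a = R.shape
    Q = np.zeros((n_s, n_a))
    for _ in range(episodes):
        s = 0
        for _ in range(20):  # bounded episode length
            # epsilon-greedy action selection
            a = rng.integers(n_a) if rng.random() < eps else int(Q[s].argmax())
            s2 = rng.choice(n_s, p=P[s, a])
            # TD update toward r + gamma * max_a' Q(s', a')
            Q[s, a] += alpha * (R[s, a] + gamma * Q[s2].max() - Q[s, a])
            s = s2
    return Q.argmax(axis=1), Q
```

The update rule is exactly the "maximize cumulative return" objective: each Q(s, a) estimates the discounted sum of future rewards from taking a in s and then acting greedily.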
The system adopts an advanced human pose estimation algorithm to estimate the players' poses and judges player actions, such as shooting and passing, from the pose information and its time sequence; after action recognition, the positions of players performing the corresponding actions are marked automatically. For player trajectories, real-time tracking of players is achieved by combining a multi-person tracking algorithm with the players' pose information; the players' motion trajectories are drawn on the corrected court model according to the tracking sequence, the players' running routes and shooting-concentration areas are displayed intuitively, and the shooting percentage and optimal shooting positions of players are analyzed in combination with scoring. The movement acceleration of a player is calculated from the player's running trajectory.
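Computing speed and acceleration from a court-plane trajectory reduces to finite differences over the tracked positions. A sketch assuming the SportVU-style 25 fps sampling mentioned earlier (the function name and smoothing-free differencing are assumptions):

```python
import numpy as np

def speed_and_acceleration(track, fps=25.0):
    """Estimate per-frame speed (m/s) and acceleration magnitude (m/s^2) from a
    tracked sequence of court-plane positions (N, 2) sampled at `fps` frames/s."""
    track = np.asarray(track, dtype=float)
    dt = 1.0 / fps
    velocity = np.gradient(track, dt, axis=0)               # central differences
    speed = np.linalg.norm(velocity, axis=1)
    accel = np.linalg.norm(np.gradient(velocity, dt, axis=0), axis=1)
    return speed, accel
```

In practice the tracked positions are noisy, so a smoothing step (e.g. the Kalman-filtered estimates from the tracker itself) would normally precede differentiation.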
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that it will be apparent to those skilled in the art that modifications and variations can be made without departing from the technical principles of the present invention, and these modifications and variations should also be regarded as the scope of the invention.
Claims (4)
1. A basketball motion analysis method based on 3D pose estimation, comprising: estimating human body poses from one or more RGBD inputs, recognizing human actions and tracking persons, analyzing player action patterns, and providing a training basis and confrontation analysis; predicting 3D human keypoint coordinates by collecting a depth map and a color map, generating an action sequence from the human keypoints and the skeleton map as input, and recognizing a person's actions; generating a human detection frame from the human keypoint coordinates, taking the human detection frame as input to a tracker, generating a person trajectory sequence, analyzing the person's motion trajectory, formulating and optimizing an action scheme, and generating and evaluating a plurality of action sequences to form a game command decision, specifically comprising the following steps:
s1, RGBD multi-view attitude estimation based on voxel network:
estimating human keypoints w = (w1, ..., wJ) from the depth map and the color map, i.e., J keypoint coordinates in the real world, in a coordinate system identical to that of the color sensor frame;
converting the depth map into the color frame using the camera calibration, and operating on the warped depth map D ∈ R^(N×M);
color keypoint detector: the keypoint detector is applied to the color image I and generates score maps s2D ∈ R^(N×M×J) encoding the position likelihood of the keypoints; the maximum of the score map s2D corresponds to the predicted keypoint location p2D; the weights are fixed using the OpenPose library;
voxel pose network: voxel grid V e R for a given deformed depth map D K×K×K Calculation using k=64; for this purpose, the depth map D is converted into 3D world coordinates of the point cloud and the computing grid centerFrom D->Calculation in the neighborhood +.>As a predictive 2D "neck" key point +.>
wherein K represents the camera's intrinsic calibration matrix and the coordinates are homogeneous; each depth sample selected from the depth map is verified by checking that at least three valid depth values exist in its surrounding neighborhood; V is computed by setting an element to 1 if at least one point of the point cloud falls inside it, and to 0 otherwise; the resolution of the selected voxel grid is 3 cm;
the voxel pose network takes V and the set of score maps s2D as input and processes them using a series of 3D convolutions; s2D is tiled along the Z-axis; the voxel pose network estimates a score map s3D ∈ R^(K×K×K×J), analogous to its corresponding 2D score map;
wVPN is predicted by the voxel pose network; the z-component of wVPN and the predicted 2D keypoints p2D are used to calculate another set of world coordinates wprojected, whose accuracy in the x and y directions is not limited by the choice of K; based on the confidence of the 2D network prediction, i.e., the score of s2D at p2D, the final prediction w is selected from wprojected and wVPN;
s2, an action recognition algorithm based on a spatio-temporal dual stream:
based on RNNs and CNNs, two view-adaptive neural networks are designed, named VA-RNN and VA-CNN; in VA-RNN, an RNN-based view-adaptation subnetwork transforms the skeleton to a representation under a suitable viewpoint, and a main LSTM network recognizes actions from the transformed skeleton; VA-CNN is equipped with a CNN-based view-adaptation subnetwork and a main convolutional network; each network is trained end-to-end by optimizing classification performance;
s3, tracking multiple people:
a conventional single-hypothesis tracking method with recursive Kalman filtering and frame-to-frame data association is adopted;
tracking processing and Kalman filtering framework: the camera is uncalibrated and no ego-motion information is available;
a Feature Pyramid Network is adopted: the feature pyramid network makes predictions at multiple scales; the input frame is first downsampled at three scales, 1/32, 1/16, and 1/8; the feature map of the smallest size is upsampled and merged, via skip connections, with the feature map of the second-smallest size, and likewise for the other scales; prediction heads are applied to the fused features at the three scales; one prediction head comprises a plurality of stacked convolutional layers and outputs a dense (6A+D)xHxW prediction map; the dense prediction map is divided into three parts: 1) box classification results of size 2AxHxW; 2) box regression coefficients of size 4AxHxW; 3) a dense embedding map of size DxHxW;
s4, track detection:
analyzing the actions, movement trajectories, and attack-and-defense confrontations of the players;
s5, calibrating the court video shot by the wide-angle camera to form a regular court;
s6, player trajectory analysis:
estimating the player's pose by adopting a human pose estimation algorithm, judging the player's actions from the pose information and the time sequence, and automatically marking the positions of players performing the corresponding actions after action recognition; for player trajectories, real-time tracking of players is achieved by a multi-person tracking algorithm combined with the players' pose information; the movement acceleration of a player is calculated from the player's running trajectory;
s7, generation of competition command decisions
In the deduction of a game confrontation scheme, an agent based on a deep reinforcement learning algorithm continuously updates its deep neural network through continuous interaction with the game-field environment, accumulates learning experience to guide its subsequent behavior selection, and finally generates an optimal action sequence meeting the coach's requirements.
2. The basketball motion analysis method based on 3D pose estimation of claim 1, further comprising:
for S2, the scores of the RNN and the CNN may also be fused to provide a fused prediction, denoted as the VA-fusion scheme.
3. The basketball motion analysis method based on 3D pose estimation according to claim 1, wherein the agent based on the deep reinforcement learning algorithm continuously updates its deep neural network through continuous interaction with the game-field environment, accumulates learning experience to guide its subsequent behavior selection, and finally generates an optimal action sequence meeting the coach's requirements, further comprising:
in S7, generating human detection frames from the human keypoint coordinates of the game histories of both competing parties, taking the human detection frames as input to the tracker, generating person trajectory sequences, analyzing the persons' motion trajectories, formulating and optimizing an action scheme, and generating and evaluating a plurality of action sequences to form a game command decision.
4. The basketball motion analysis method based on 3D pose estimation according to claim 2, further comprising:
in S6, combining the players' pose information to realize real-time tracking of players, drawing the players' motion trajectories on the corrected court model according to the tracking sequence, intuitively displaying the players' running routes and shooting-concentration areas, and analyzing the shooting percentage and optimal shooting positions of players in combination with scoring.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010579582.9A CN111724414B (en) | 2020-06-23 | 2020-06-23 | Basketball motion analysis method based on 3D gesture estimation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111724414A CN111724414A (en) | 2020-09-29 |
CN111724414B true CN111724414B (en) | 2024-01-26 |
Family
ID=72568389
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010579582.9A Active CN111724414B (en) | 2020-06-23 | 2020-06-23 | Basketball motion analysis method based on 3D gesture estimation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111724414B (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12100104B2 (en) * | 2020-11-17 | 2024-09-24 | International Institute Of Information Technology, Hyderabad | System and method for automatically reconstructing 3D model of an object using machine learning model |
CN112668549B (en) * | 2021-01-15 | 2023-04-07 | 北京格灵深瞳信息技术股份有限公司 | Pedestrian attitude analysis method, system, terminal and storage medium |
CN113011310B (en) * | 2021-03-15 | 2022-03-11 | 中国地质大学(武汉) | Method and device for collecting shooting exercise amount, computer equipment and storage medium |
CN113312840B (en) * | 2021-05-25 | 2023-02-17 | 广州深灵科技有限公司 | Badminton playing method and system based on reinforcement learning |
CN113343843A (en) * | 2021-06-04 | 2021-09-03 | 北京格灵深瞳信息技术股份有限公司 | Target tactical recognition method and device, electronic equipment and storage medium |
CN113506210A (en) * | 2021-08-10 | 2021-10-15 | 深圳市前海动竞体育科技有限公司 | Method for automatically generating point maps of athletes in basketball game and video shooting device |
CN113705445B (en) * | 2021-08-27 | 2023-08-04 | 深圳龙岗智能视听研究院 | Method and equipment for recognizing human body posture based on event camera |
CN113837063B (en) * | 2021-10-15 | 2024-05-10 | 中国石油大学(华东) | Reinforcement learning-based curling motion field analysis and auxiliary decision-making method |
CN114155256B (en) * | 2021-10-21 | 2024-05-24 | 北京航空航天大学 | Method and system for tracking deformation of flexible object by using RGBD camera |
CN114998991A (en) * | 2022-06-01 | 2022-09-02 | 浙江蓝鸽科技有限公司 | Campus intelligent playground system and motion detection method based on same |
CN114783039B (en) * | 2022-06-22 | 2022-09-16 | 南京信息工程大学 | Motion migration method driven by 3D human body model |
CN115937895B (en) * | 2022-11-11 | 2023-09-19 | 南通大学 | Speed and strength feedback system based on depth camera |
CN117612242B (en) * | 2023-05-30 | 2024-06-28 | 黑龙江大学 | Monitoring display system for player motion state tracking foul |
CN116645726B (en) * | 2023-05-30 | 2024-02-02 | 首都师范大学 | Behavior recognition method and system for space-time double-branch fusion by utilizing three-dimensional human body recovery |
CN117789094B (en) * | 2023-12-29 | 2024-09-03 | 内蒙古大学 | Group behavior detection and recognition method and system based on deep learning |
CN117807818B (en) * | 2024-03-01 | 2024-05-10 | 西安慧金科技有限公司 | Industrial furnace life prediction method combined with dynamic basket ring optimization algorithm |
CN118506248A (en) * | 2024-06-19 | 2024-08-16 | 长春职业技术学院 | Sports match scoring system and method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108108722A (en) * | 2018-01-17 | 2018-06-01 | 深圳市唯特视科技有限公司 | A kind of accurate three-dimensional hand and estimation method of human posture based on single depth image |
CN109165253A (en) * | 2018-08-15 | 2019-01-08 | 宁夏大学 | A kind of method and apparatus of Basketball Tactical auxiliary |
CN110674785A (en) * | 2019-10-08 | 2020-01-10 | 中兴飞流信息科技有限公司 | Multi-person posture analysis method based on human body key point tracking |
CN110929596A (en) * | 2019-11-07 | 2020-03-27 | 河海大学 | Shooting training system and method based on smart phone and artificial intelligence |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9117147B2 (en) * | 2011-04-29 | 2015-08-25 | Siemens Aktiengesellschaft | Marginal space learning for multi-person tracking over mega pixel imagery |
Non-Patent Citations (1)
Title |
---|
A survey of progress in pose estimation for sports video analysis; Zong Libo; Song Yifan; Wang Yiming; Ma Bo; Wang Dongyang; Li Yingjie; Zhang Peng; Journal of Chinese Computer Systems (小型微型计算机系统); Vol. 41, No. 8; 1751-1757 *
Also Published As
Publication number | Publication date |
---|---|
CN111724414A (en) | 2020-09-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111724414B (en) | Basketball motion analysis method based on 3D gesture estimation | |
Monti et al. | Dag-net: Double attentive graph neural network for trajectory forecasting | |
Felsen et al. | What will happen next? forecasting player moves in sports videos | |
CN110462684B (en) | System, computer readable medium and method for implicit prediction of object movement | |
US11967086B2 (en) | Player trajectory generation via multiple camera player tracking | |
CN111444890A (en) | Sports data analysis system and method based on machine learning | |
Meng et al. | A video information driven football recommendation system | |
CN110210383B (en) | Basketball video semantic event recognition method integrating motion mode and key visual information | |
EP3945463B1 (en) | A computing system and a computer-implemented method for sensing gameplay events and augmentation of video feed with overlay | |
Pu et al. | Orientation and decision-making for soccer based on sports analytics and AI: A systematic review | |
Ding et al. | Machine learning model for feature recognition of sports competition based on improved TLD algorithm | |
Ait-Bennacer et al. | Applying Deep Learning and Computer Vision Techniques for an e-Sport and Smart Coaching System Using a Multiview Dataset: Case of Shotokan Karate. | |
Jiang et al. | Golfpose: Golf swing analyses with a monocular camera based human pose estimation | |
Pervaiz et al. | Artificial neural network for human object interaction system over Aerial images | |
CN115100744A (en) | Badminton game human body posture estimation and ball path tracking method | |
Skublewska-Paszkowska et al. | Attention Temporal Graph Convolutional Network for Tennis Groundstrokes Phases Classification | |
Wang et al. | [Retracted] Simulation of Tennis Match Scene Classification Algorithm Based on Adaptive Gaussian Mixture Model Parameter Estimation | |
Li et al. | Tracking and detection of basketball movements using multi-feature data fusion and hybrid YOLO-T2LSTM network | |
Almasi | Human movement analysis from the egocentric camera view | |
CN116958872A (en) | Intelligent auxiliary training method and system for badminton | |
CN114898275A (en) | Student activity track analysis method | |
Liwei | Research on classification and recognition of badminton batting action based on machine learning | |
US11640713B2 (en) | Computing system and a computer-implemented method for sensing gameplay events and augmentation of video feed with overlay | |
Felsen | Learning to predict human behavior from video | |
WO2023081456A1 (en) | Machine learning based video analysis, detection and prediction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||