Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings. The exemplary embodiments of the present invention and their descriptions herein serve to explain the present invention and are not to be construed as limiting it.
Unless defined otherwise, all technical and scientific terms used in the embodiments of the application have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present application.
In describing embodiments of the present application, unless otherwise expressly specified and limited, the term "connected" should be construed broadly: it may be an electrical connection, a communication between two elements, a direct connection, or an indirect connection via an intermediate medium. Those skilled in the art will understand the specific meaning of the term according to the circumstances.
It should be noted that the terms "first", "second", and "third" in the embodiments of the present application merely distinguish similar objects and do not denote a particular order. Where permitted, objects so distinguished may be interchanged, so that the embodiments of the application described herein can be practiced in sequences other than those illustrated or described herein.
Accurate action assessment refers to the objective, comprehensive, and accurate evaluation of the posture, skill, and manner of execution of a trainer performing various exercises and actions. Through action evaluation, a trainer and a coach can understand the quality, strengths, and weaknesses of the trainer's actions and formulate a corresponding training plan and guidance method accordingly. Action assessment can be applied to various sports and training fields, including athletics, fitness training, rehabilitation therapy, and the like.
Action evaluation helps the trainer and the coach understand how well an action meets the correct technical requirements. By evaluating the action's posture, stability, strength output, and other aspects, it can be determined whether the action is executed correctly, and erroneous execution can be found and corrected in time.
Action evaluation also helps reveal a trainer's strengths and weaknesses across different actions. By evaluating aspects of the actions such as flexibility, strength, and coordination, it can be found that a trainer performs excellently in some respects and falls short in others, which helps formulate a targeted training plan that further develops the strengths and improves the weaknesses.
Accurate motion assessment can help trainers and coaches find potential athletic risks and bad movement habits. By evaluating stability of motion, joint angles, body posture, and so on, problems that may lead to injury can be identified, and adjustments and improvements can be made in time to reduce the risk of injury.
Action evaluation also provides a basis for the coach to develop a personalized training plan. Knowing each trainer's action capability and characteristics, the coach can formulate corresponding training plans and guidance methods according to the requirements and levels of different trainers so as to achieve the best training effect.
Conventional action assessment methods generally require manual participation and rely mainly on observation and judgment by trainers and coaches, which brings certain limitations: they are time- and effort-consuming, highly subjective, and influenced by the experience and ability of the assessors. Observational assessment is one of the most common methods, whereby a coach or evaluator assesses the quality and skill of the actions by directly observing the trainer performing them. The observer must attend to the trainer's posture, fluency of movement, strength output, and so on, and evaluate the accuracy and effect of the movement according to experience and expertise. However, this method is susceptible to subjective factors: different evaluators may apply different observation and judgment criteria, leading to inconsistent evaluation results.
In video analysis, a camera device records the trainer's performance of an action, and the quality of the action is then evaluated by playing back and analyzing the video. Video analysis provides more detail and more viewing angles, enabling an evaluator to observe and analyze the performance more carefully. However, this approach still requires subjective judgment by an evaluator and may take considerable time and effort to analyze and compare the video data.
Some conventional motion assessment methods use sensor devices to measure and record a trainer's motion data, such as joint angles and force outputs; such devices may be inertial measurement units (IMUs), pressure sensors, resistance sensors, and the like. By collecting and analyzing sensor data, an evaluator can obtain more objective evaluation results. However, this approach requires specialized equipment and technical support, and in practice there may be limitations such as the accuracy and wearability of the sensors.
Computer vision is a discipline that studies how machines can understand and interpret the content of images or videos. It combines techniques and methods from many areas, such as image processing, pattern recognition, machine learning, and artificial intelligence, to enable machines to extract meaningful high-level semantic information from visual data. The development and application of computer vision technology plays an important role in many areas, including but not limited to the following. Computer vision can help machines automatically detect and identify specific target objects in images or videos, such as faces, vehicles, and other objects, with wide application in face recognition, intelligent monitoring, automatic driving, and other fields. Computer vision can also divide an image into different regions and analyze them semantically to understand the different parts of the image and their meanings, which is very important for tasks such as medical image analysis, image understanding, and scene understanding.
Computer vision can help machines recognize and analyze human postures and motion, thereby enabling motion evaluation and motion recognition, with potential application value in physical training, fitness guidance, sports medicine, and other fields. Computer vision can also use generative models and enhancement techniques to generate realistic images or to enhance and repair existing ones, which matters for applications such as image synthesis, virtual reality, and image enhancement.
The development of computer vision technology provides a new approach to constructing training, assessment, and evaluation methods. Using computer vision, image or video data can be compared with a previously established model, and the key characteristics of an action can be automatically extracted and analyzed, enabling accurate assessment of action quality.
In one embodiment of the present invention, Fig. 1 is a flowchart of a computer vision-based intelligent training assessment method provided in the embodiment of the present invention, and Fig. 2 is a schematic diagram of the system architecture of that method. As shown in Figs. 1 and 2, the computer vision-based intelligent training assessment method according to the embodiment of the invention includes: 110, acquiring a training action state monitoring image of a first training action of an object to be assessed, captured by a camera; 120, acquiring a reference action image of the first training action; 130, extracting action semantic features of the training action state monitoring image and the reference action image to obtain a training action semantic feature map and a reference action semantic feature map; 140, constructing semantic comparison features between the training action semantic feature map and the reference action semantic feature map to obtain a global semantic comparison feature vector; and 150, determining whether the first training action of the object to be assessed is standard based on the global semantic comparison feature vector.
In step 110, a training action state monitoring image of the first training action of the object to be assessed, captured by the camera, is acquired. The camera can accurately capture image data of the object to be assessed performing the first training action while maintaining image clarity and stability. The monitoring image captures the visual performance of the object when executing the action and provides the image data for subsequent action evaluation.
In step 120, a reference action image of the first training action is acquired. A professional action demonstrator or a pre-recorded standard action video can serve as the reference. The reference action image provides a normative demonstration of the action and serves as the standard against which the actions of the object to be assessed are compared and evaluated.
In step 130, the action semantic features of the training action state monitoring image and the reference action image are extracted to obtain a training action semantic feature map and a reference action semantic feature map. Feature extraction may use computer vision techniques, such as deep learning models or feature extraction algorithms, to capture and represent the key features of an action from an image. Extracting the action semantic features of both the training action and the reference action converts the image data into more expressive and comparable feature representations, providing the basis for subsequent action evaluation.
In step 140, semantic comparison features between the training action semantic feature map and the reference action semantic feature map are constructed to obtain a global semantic comparison feature vector. The comparison features may be constructed using feature matching, similarity calculation, or other methods that measure the similarity or difference between the training action and the reference action. The global semantic comparison feature vector quantifies the degree of difference between the two actions and provides the basis for the subsequent evaluation of action normativity.
In step 150, whether the first training action of the object to be assessed is standard is determined based on the global semantic comparison feature vector. According to specific evaluation criteria and thresholds, the global semantic comparison feature vector is used to judge whether the first training action meets the specification. Based on this evaluation, it can be determined rapidly and objectively whether the action is standard, providing feedback and guidance for subsequent training and adjustment.
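For illustration only, steps 110 to 150 can be summarized as the following minimal Python (PyTorch-style) sketch. All names here (`assess_training_action`, the module arguments, and the assumed class index for "standard") are hypothetical and not part of the embodiment; concrete sketches of the extractor, attention layer, metric, and classifier follow in later sections.

```python
import torch

@torch.no_grad()
def assess_training_action(monitor_img: torch.Tensor,
                           reference_img: torch.Tensor,
                           extractor: torch.nn.Module,
                           attention: torch.nn.Module,
                           metric_fn,
                           classifier: torch.nn.Module) -> bool:
    """Steps 130-150; the two (B, 3, H, W) image batches are assumed to
    have already been acquired in steps 110 and 120."""
    # Step 130: twin extraction of the action semantic feature maps
    train_feat = extractor(monitor_img)
    ref_feat = extractor(reference_img)
    # Step 140: spatial attention, then per-channel comparison coefficients
    train_feat = attention(train_feat)
    ref_feat = attention(ref_feat)
    contrast_vec = metric_fn(train_feat, ref_feat)  # global semantic comparison vector
    # Step 150: the classifier decides whether the action is standard
    probs = classifier(contrast_vec)                # classification result (probabilities)
    return bool(probs.argmax(dim=-1).item() == 1)   # class 1 := "standard" (assumed)
```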
In view of the above technical problems, the technical concept of the present application is to use computer vision technology and intelligent algorithms to analyze and compare the training action state monitoring image of the object to be assessed with the reference action image of the corresponding training action, thereby identifying the action type and state of the trainer and intelligently judging whether the current training action of the object to be assessed is standard.
Based on this, in the technical scheme of the present application, a training action state monitoring image of a first training action of the object to be assessed, captured by a camera, is first acquired, together with a reference action image of the first training action. The monitoring image reflects the action process and posture information actually executed by the object to be assessed. The reference action image is a standard, normative reference used to help the model learn the posture and execution of the standard action.
Then, the training action state monitoring image and the reference action image are passed through a dual-coupled twin detection model comprising a first-stage action feature extractor and a second-stage action feature extractor to obtain a training action semantic feature map and a reference action semantic feature map, where the two extractors have the same network structure. That is, the dual-coupled twin detection model extracts the action semantic features from the training action state monitoring image and the reference action image. These features may include posture information, key point locations, motion trajectories, and other feature information important for describing and representing actions.
In a specific embodiment of the present application, extracting the action semantic features of the training action state monitoring image and the reference action image to obtain the training action semantic feature map and the reference action semantic feature map includes: passing the training action state monitoring image and the reference action image through a dual-coupled twin detection model comprising a first-stage action feature extractor and a second-stage action feature extractor to obtain the training action semantic feature map and the reference action semantic feature map.
In particular, in the dual-coupled twin detection model, the first-stage and second-stage action feature extractors have the same network structure and process the training action state monitoring image and the reference action image, respectively. This design ensures that the training action and the reference action undergo the same processing in the feature extraction stage, so that the extracted features are comparable and consistent, avoiding action semantic interference caused by differences between the models.
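A minimal sketch of such a twin extractor, assuming a small convolutional backbone, might look as follows in PyTorch. The embodiment only requires that the two extractors share the same network structure; sharing a single weight instance, as done here, is one common way to also guarantee identical processing, and all layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ActionFeatureExtractor(nn.Module):
    """One stage of the dual-coupled twin model: a small convolutional
    backbone (channel widths and depth are illustrative assumptions)."""
    def __init__(self, in_ch: int = 3, out_ch: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_ch, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.backbone(x)


class DualCoupledTwinDetector(nn.Module):
    """Twin branches with the same network structure; a single shared
    instance processes both inputs, so the two feature maps stay strictly
    comparable and no extractor-difference interference is introduced."""
    def __init__(self):
        super().__init__()
        self.extractor = ActionFeatureExtractor()

    def forward(self, monitor_img: torch.Tensor, reference_img: torch.Tensor):
        train_feat = self.extractor(monitor_img)   # training action semantic feature map
        ref_feat = self.extractor(reference_img)   # reference action semantic feature map
        return train_feat, ref_feat
```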
In one embodiment of the present application, constructing the semantic comparison features between the training action semantic feature map and the reference action semantic feature map to obtain the global semantic comparison feature vector includes: performing feature distribution optimization on the training action semantic feature map and the reference action semantic feature map to obtain an optimized training action semantic feature map and an optimized reference action semantic feature map; passing the optimized training action semantic feature map and the optimized reference action semantic feature map through a spatial attention layer to obtain a spatially visualized training action semantic feature map and a spatially visualized reference action semantic feature map; and calculating local feature metric coefficients between each pair of corresponding feature matrices along the channel dimension of the two spatially visualized maps to obtain a global semantic comparison feature vector composed of a plurality of local feature metric coefficients.
Then, the optimized training action semantic feature map and the optimized reference action semantic feature map are each passed through a spatial attention layer to obtain the spatially visualized training action semantic feature map and the spatially visualized reference action semantic feature map. The spatial attention layer enhances the feature representation of different regions in the image. For both training and reference actions, some regions, such as the hands and legs, may be particularly important to the performance and effect of the action. The spatial attention layer improves the representation of these areas so that the training action and the reference action are better differentiated over the critical regions.
In a specific embodiment of the present application, passing the optimized training action semantic feature map and the optimized reference action semantic feature map through the spatial attention layer to obtain the spatially visualized training action semantic feature map and the spatially visualized reference action semantic feature map includes: performing depthwise convolutional encoding on the optimized training action semantic feature map and the optimized reference action semantic feature map using the convolutional encoding part of the spatial attention layer to obtain an optimized training action convolution feature map and an optimized reference action convolution feature map; inputting the two convolution feature maps into the spatial attention part of the spatial attention layer to obtain an optimized training action spatial attention map and an optimized reference action spatial attention map; activating the two spatial attention maps with a Softmax activation function to obtain an optimized training action spatial attention feature map and an optimized reference action spatial attention feature map; calculating the position-wise point multiplication of the optimized training action spatial attention feature map and the optimized training action convolution feature map to obtain the spatially visualized training action semantic feature map; and calculating the position-wise point multiplication of the optimized reference action spatial attention feature map and the optimized reference action convolution feature map to obtain the spatially visualized reference action semantic feature map.
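Under the assumptions that the depthwise convolutional encoding is a grouped (per-channel) convolution and that the Softmax is taken over all spatial positions, one hedged PyTorch sketch of this spatial attention layer is:

```python
import torch
import torch.nn as nn

class SpatialAttentionLayer(nn.Module):
    """Convolutional encoding followed by a spatial attention map, per the
    steps in the text; kernel sizes and channel widths are assumptions."""
    def __init__(self, channels: int = 64):
        super().__init__()
        # convolutional encoding part (depthwise, i.e. per-channel, convolution)
        self.conv_encode = nn.Conv2d(channels, channels, kernel_size=3,
                                     padding=1, groups=channels)
        # spatial attention part: collapse channels to a single-channel map
        self.attn = nn.Conv2d(channels, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.conv_encode(x)                 # convolution feature map
        attn = self.attn(feat)                     # spatial attention map (B, 1, H, W)
        b, _, h, w = attn.shape
        # Softmax activation over all spatial positions
        attn = torch.softmax(attn.view(b, -1), dim=-1).view(b, 1, h, w)
        return feat * attn                         # position-wise point multiplication
```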
Further, the local feature metric coefficients between each pair of corresponding feature matrices along the channel dimension of the spatially visualized training action semantic feature map and the spatially visualized reference action semantic feature map are calculated to obtain the global semantic comparison feature vector composed of a plurality of local feature metric coefficients. By measuring the difference between each pair of corresponding feature matrices, the subtle differences between the training action of the object to be assessed and the reference action can be captured.
In a specific example of the present application, the local feature metric coefficients between each pair of corresponding feature matrices along the channel dimension of the spatially visualized training action semantic feature map and the spatially visualized reference action semantic feature map are calculated with the following local feature metric formula to obtain the global semantic comparison feature vector composed of a plurality of local feature metric coefficients:

$$c_i = \frac{1}{H \times W} \sum_{h=1}^{H} \sum_{w=1}^{W} \left[ f^{(i)}_{h,w} \log_2 \frac{f^{(i)}_{h,w}}{g^{(i)}_{h,w}} + g^{(i)}_{h,w} \log_2 \frac{g^{(i)}_{h,w}}{f^{(i)}_{h,w}} \right]$$

wherein $f^{(i)}_{h,w}$ is the feature value at the $(h,w)$ position of the $i$-th feature matrix of the spatially visualized training action semantic feature map, $g^{(i)}_{h,w}$ is the feature value at the $(h,w)$ position of the $i$-th feature matrix of the spatially visualized reference action semantic feature map, $H$ and $W$ are the height and width of the feature matrix, $c_i$ is the $i$-th local feature metric coefficient, and $\log_2$ denotes a logarithmic operation with base 2.
Here, the local feature metric coefficient expresses the cross difference between each pair of corresponding feature matrices along the channel dimension, namely, the difference of the feature matrix of the spatially visualized training action semantic feature map relative to that of the spatially visualized reference action semantic feature map, and vice versa. This cross difference extracts, in feature space, the difference between the training action state monitoring image of the first training action captured by the camera and the reference action image of the first training action, thereby representing the high-dimensional implicit difference between the training action semantic feature distribution and the reference action semantic feature distribution.
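A per-channel sketch of this metric, implementing the formula reconstructed above in PyTorch, is given below; the small `eps` clamp is an added numerical-stability assumption, since the log ratio requires positive feature values.

```python
import torch

def local_metric_coefficients(train_feat: torch.Tensor,
                              ref_feat: torch.Tensor,
                              eps: float = 1e-6) -> torch.Tensor:
    """Symmetric cross-difference (log base 2) between corresponding
    feature matrices along the channel dimension.
    Inputs: (B, C, H, W); output: (B, C), the global semantic
    comparison feature vector of C local feature metric coefficients."""
    f = train_feat.clamp_min(eps)   # spatially visualized training feature map
    g = ref_feat.clamp_min(eps)     # spatially visualized reference feature map
    cross = f * torch.log2(f / g) + g * torch.log2(g / f)
    return cross.mean(dim=(2, 3))   # average over the H x W feature matrix
```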
Then, the global semantic comparison feature vector is passed through a classifier to obtain a classification result, where the classification result indicates whether the first training action of the object to be assessed is standard.
In a specific embodiment of the present application, determining whether the first training action of the object to be assessed is standard based on the global semantic comparison feature vector includes: passing the global semantic comparison feature vector through a classifier to obtain a classification result, where the classification result indicates whether the first training action of the object to be assessed is standard.
Specifically, passing the global semantic comparison feature vector through the classifier to obtain the classification result includes: performing full-connection encoding on the global semantic comparison feature vector using a plurality of fully connected layers of the classifier to obtain an encoded classification feature vector; and passing the encoded classification feature vector through a Softmax classification function of the classifier to obtain the classification result.
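A minimal PyTorch sketch of such a classifier follows; the number of fully connected layers, the hidden width, and the input dimension are illustrative assumptions, with two classes assumed (non-standard and standard).

```python
import torch
import torch.nn as nn

class ComparisonClassifier(nn.Module):
    """Fully connected encoding followed by Softmax, as described in the
    text; layer count and widths are illustrative assumptions."""
    def __init__(self, in_dim: int = 64, hidden: int = 128, n_classes: int = 2):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, contrast_vec: torch.Tensor) -> torch.Tensor:
        logits = self.fc(contrast_vec)           # full-connection encoding
        return torch.softmax(logits, dim=-1)     # classification result (probabilities)
```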
In the above technical solution, the training action semantic feature map and the reference action semantic feature map respectively express the image semantic features of the training action state monitoring image and the reference action image. That is, they follow the spatial distribution of the image semantic features in the feature matrix dimension, and follow the channel distribution of the convolutional neural network of the action feature extractor in the channel dimension. Therefore, when the two maps are each passed through the spatial attention layer to strengthen the local spatial distribution of the image semantic features, the spatial information attribute corresponding to that spatial distribution becomes more prominent in the overall distribution dimension. Consequently, improving the spatial information expression of the two maps as high-dimensional features improves their overall expression, which in turn improves the expression of the global semantic comparison feature vector.
Based on this, the applicant of the present application optimizes the training action semantic feature map and the reference action semantic feature map by performing feature distribution optimization with the following optimization formula to obtain the optimized training action semantic feature map and the optimized reference action semantic feature map:

$$f'_{i,j} = f_{i,j} + \frac{\varepsilon}{S^2} \sum_{(a,b) \in \mathcal{N}_S(i,j)} \left( f_{a,b} - f_{i,j} \right)$$

wherein $f_{i,j}$ is the feature value at the $(i,j)$ position of the training action semantic feature map or the reference action semantic feature map, $\varepsilon$ is the local spatial partition coefficient, $S$ is the scale of the local neighborhood $\mathcal{N}_S(i,j)$, $f_{a,b}$ is the feature value at the $(a,b)$ position within that neighborhood, and $f'_{i,j}$ is the feature value at the $(i,j)$ position of the optimized training action semantic feature map or the optimized reference action semantic feature map.
Specifically, taking each feature map as the reference, a local integration over the feature manifold in the high-dimensional feature space is performed within the locally partitioned space expanded in Hilbert space. This integration-function-based local processing corrects the phase-transition discontinuity points of the feature manifold expressed by each feature map's non-stationary data sequence after local spatial expansion, thereby recovering a finer structure and the geometric features of the feature manifold. This improves the spatial information expression of each feature map in the high-dimensional feature space, which improves the expression of the global semantic comparison feature vector and, in turn, the accuracy of the classification result obtained by the classifier.
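Reading the formula reconstructed above as a local-neighborhood smoothing step, a hedged PyTorch sketch is given below; the values of the partition coefficient `eps` and the neighborhood scale are illustrative assumptions, and the boundary handling is an implementation choice.

```python
import torch
import torch.nn.functional as F

def optimize_feature_map(feat: torch.Tensor, eps: float = 0.5,
                         scale: int = 3) -> torch.Tensor:
    """Feature distribution optimization: each value moves toward the mean
    of its S x S local neighborhood, damping discontinuity points of the
    feature manifold. Input/output shape: (B, C, H, W)."""
    local_mean = F.avg_pool2d(feat, kernel_size=scale, stride=1,
                              padding=scale // 2, count_include_pad=False)
    # f' = f + eps * (neighborhood mean - f), applied position-wise
    return feat + eps * (local_mean - feat)
```

Applied to both the training action semantic feature map and the reference action semantic feature map before the spatial attention layer, this step smooths the feature distributions that the subsequent comparison operates on.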
In summary, the computer vision-based intelligent training assessment method has been explained. It uses computer vision technology and intelligent algorithms to analyze and compare the training action state monitoring image of the object to be assessed with the reference action image of the corresponding training action, thereby identifying the action type and state of the trainer and intelligently judging whether the current training action of the object to be assessed is standard.
Fig. 3 is a block diagram of a computer vision-based intelligent training assessment system according to an embodiment of the present invention. As shown in Fig. 3, the computer vision-based intelligent training assessment system 200 includes: a training action state monitoring image acquisition module 210, configured to acquire a training action state monitoring image of a first training action of an object to be assessed, captured by a camera; a reference action image acquisition module 220, configured to acquire a reference action image of the first training action; an action semantic feature extraction module 230, configured to extract action semantic features of the training action state monitoring image and the reference action image to obtain a training action semantic feature map and a reference action semantic feature map; a semantic comparison feature construction module 240, configured to construct semantic comparison features between the training action semantic feature map and the reference action semantic feature map to obtain a global semantic comparison feature vector; and a first training action judging module 250, configured to determine whether the first training action of the object to be assessed is standard based on the global semantic comparison feature vector.
In the computer vision-based intelligent training assessment system, the action semantic feature extraction module is configured to pass the training action state monitoring image and the reference action image through a dual-coupled twin detection model comprising a first-stage action feature extractor and a second-stage action feature extractor to obtain the training action semantic feature map and the reference action semantic feature map.
It will be appreciated by those skilled in the art that the specific operation of each step in the above computer vision-based intelligent training assessment system has been described in detail in the description of the computer vision-based intelligent training assessment method with reference to Figs. 1 and 2, and repetitive descriptions are therefore omitted.
As described above, the computer vision-based intelligent training assessment system 200 according to the embodiment of the present invention may be implemented in various terminal devices, for example, a server or the like for computer vision-based intelligent training assessment. In one example, the computer vision-based intelligent training assessment system 200 according to an embodiment of the present invention may be integrated into a terminal device as a software module and/or hardware module. For example, the computer vision-based intelligent training assessment system 200 may be a software module in the operating system of the terminal device, or may be an application developed for the terminal device; of course, the computer vision-based intelligent training assessment system 200 can also be one of a number of hardware modules of the terminal device.
Alternatively, in another example, the computer vision-based intelligent training assessment system 200 and the terminal device may be separate devices, with the system 200 connected to the terminal device via a wired and/or wireless network and transmitting interaction information in an agreed data format.
Fig. 4 is an application scenario diagram of the computer vision-based intelligent training assessment method provided in an embodiment of the present invention. As shown in Fig. 4, in this application scenario, a training action state monitoring image of a first training action of the object to be assessed is first acquired by a camera (e.g., C1 in Fig. 4), and a reference action image of the first training action is acquired (e.g., C2 in Fig. 4). The acquired training action state monitoring image and reference action image are then input into a server (e.g., S in Fig. 4) deployed with a computer vision-based intelligent training assessment algorithm, where the server processes the two images based on that algorithm to determine whether the first training action of the object to be assessed is standard.
The foregoing description of the embodiments illustrates the general principles of the invention and is not intended to limit the invention to the particular embodiments disclosed or to restrict its scope; any modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of the present invention are intended to be included within the scope of the present invention.