
WO2024114500A1 - Human body pose recognition method and device

Info

Publication number
WO2024114500A1
Authority
WO
WIPO (PCT)
Prior art keywords
human body
frame
network
target
preset
Prior art date
Application number
PCT/CN2023/133598
Other languages
French (fr)
Chinese (zh)
Inventor
殷政
张力文
栾元杰
金子杰
Original Assignee
天翼数字生活科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 天翼数字生活科技有限公司

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the field of machine vision technology, and in particular to a method and device for human body posture recognition.
  • the human posture recognition task is mainly to obtain the coordinates of the joints of various parts of the human body from the video or picture captured by the camera.
  • Human posture recognition currently has many application fields, including virtual reality, human-computer interaction, sports training and analysis, abnormal behavior detection, etc.
  • the present application provides a method and device for human posture recognition, which are used to solve the technical problem that the prior art cannot take into account both high recognition accuracy and fast recognition speed at the same time.
  • the first aspect of the present application provides a human body posture recognition method, comprising:
  • a preset key point prediction network is used to predict key points of a human body in the target human body frame, wherein the preset key point prediction network includes a reference posture estimation network and an interference part classification network;
  • the human body posture in the target person image frame is recognized according to the human body key points to obtain a human body posture recognition result.
  • before the preset tracker and the human body detection network are combined to perform human body detection on the target person image frame to obtain a target human body frame, wherein the preset tracker is a network for tracking the human body based on IOU and threshold, the method also includes:
  • a preprocessing operation is performed on the person image frame to obtain a target person image frame, wherein the preprocessing includes clipping, mean subtraction and normalization.
  • combining the preset tracker and the human body detection network to perform human body detection on the target person image frame to obtain the target human body frame, wherein the preset tracker is a network that tracks the human body based on IOU and threshold, includes:
  • the preset tracker is used to determine whether the historical IOU of the previous frame is less than the threshold. If so, the human body detection network is used to perform human body detection on the target person image frame, obtain the target human body frame, and update the current IOU;
  • if not, the expanded frame obtained by expanding the minimum bounding rectangle of the human body key points in the previous frame by a preset number of pixels is used as the target human body frame, and the current IOU is updated.
  • using the preset key point prediction network to predict the key points of the human body in the target human body frame, wherein the preset key point prediction network includes a reference posture estimation network and an interference part classification network, includes:
  • a weighted calculation is performed based on the reference key points and the interfering vectors to obtain the key points of the human body.
  • the step of recognizing the human body posture in the target person image frame according to the human body key points to obtain the human body posture recognition result includes:
  • the human body posture in the target person image frame is recognized according to the human body skeleton diagram to obtain a human body posture recognition result.
  • a second aspect of the present application provides a human posture recognition device, comprising:
  • a human frame detection unit used to perform human body detection on a target person image frame in combination with a preset tracker and a human body detection network to obtain a target human body frame, wherein the preset tracker is a network that tracks a human body based on IOU and a threshold;
  • a key point prediction unit used to predict the key points of the human body in the target human body frame by using a preset key point prediction network, wherein the preset key point prediction network includes a reference posture estimation network and an interference part classification network;
  • the human body posture recognition unit is used to recognize the human body posture in the target person image frame according to the human body key points to obtain a human body posture recognition result.
  • it also includes:
  • An image frame acquisition unit used to acquire multiple person image frames in the surveillance video
  • the image preprocessing unit is used to perform a preprocessing operation on the person image frame to obtain a target person image frame, wherein the preprocessing includes clipping, mean subtraction and normalization.
  • the human body frame detection unit comprises:
  • the first judgment subunit is used to judge whether the historical IOU of the previous frame is less than the threshold through the preset tracker. If so, the human body detection network is used to perform human body detection on the target person image frame to obtain the target person frame and update the current IOU;
  • the second judgment subunit is used to, if not, expand the minimum circumscribed rectangle of the human body key points of the previous frame by a preset number of pixels to obtain an expanded frame as the target human body frame, and update the current IOU.
  • the key point prediction unit comprises:
  • a reference prediction subunit used for obtaining reference key points in the target human body frame through a reference pose estimation network
  • An interference analysis subunit configured to perform interference analysis on the human body trunk in the target human body frame according to an interference part classification network to obtain an interference vector
  • the weighted calculation subunit is used to perform weighted calculation based on the reference key points and the interfering vectors to obtain the key points of the human body.
  • the human body posture recognition unit comprises:
  • a connecting subunit used for connecting the human body key points into a human body skeleton graph based on a key point matching algorithm
  • the recognition subunit is used to recognize the human body posture in the target person image frame according to the human body skeleton diagram to obtain a human body posture recognition result.
  • a method for human posture recognition including: performing human body detection on a target person image frame in combination with a preset tracker and a human body detection network to obtain a target person frame, wherein the preset tracker is a network for tracking a human body based on IOU and a threshold; using a preset key point prediction network to predict human body key points in the target person frame, wherein the preset key point prediction network includes a reference posture estimation network and an interference part classification network; recognizing the human body posture in the target person image frame according to the human body key points to obtain a human body posture recognition result.
  • the human body posture recognition method provided in this application uses a tracker in cooperation with a human body detection network to perform human body frame detection. Tracking the human body based on IOU and threshold can reduce the number of human body detections, reduce redundant calculations, and speed up processing; in the key point prediction process, an interference part classification network is added to perform interference analysis, and key point prediction based on the interference analysis results can improve prediction accuracy; by making targeted improvements to the human posture recognition process, high recognition accuracy and fast recognition speed can be taken into account at the same time. Therefore, this application can solve the technical problem that the existing technology cannot take into account high recognition accuracy and fast recognition speed at the same time.
  • FIG1 is a schematic diagram of a flow chart of a method for human body posture recognition provided by an embodiment of the present application
  • FIG2 is another schematic flow chart of a method for human body posture recognition provided by an embodiment of the present application.
  • FIG3 is a schematic diagram of the structure of a human posture recognition device provided in an embodiment of the present application.
  • FIG4 is a schematic diagram of a human frame detection process of the MMpose network model provided in an embodiment of the present application.
  • FIG5 is a schematic diagram of a human frame detection process combining a preset tracker and a human body detection network provided in an embodiment of the present application;
  • FIG6 is a schematic diagram of a key point prediction process of a preset key point prediction network provided in an embodiment of the present application.
  • FIG. 1 shows a first embodiment of a human body posture recognition method provided by the present application, including:
  • Step 101 Perform human body detection on a target person image frame by combining a preset tracker and a human body detection network to obtain a target human body frame.
  • the preset tracker is a network that tracks the human body in the image frame based on IOU (Intersection over Union) and the corresponding threshold.
  • the threshold can be set according to actual experience or actual conditions, and is not limited here.
  • IOU is a standard for measuring the accuracy of detecting corresponding objects in a specific data set. As long as a prediction range (bounding boxes, hereinafter referred to as bbox) is obtained in the output, the task can be measured by IoU.
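As a concrete illustration of the IOU measure described above, here is a minimal sketch. The `(x1, y1, x2, y2)` corner format and the function name `iou` are assumptions made for this example; the source does not specify a box representation.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2).

    The corner-based box format is an assumption for illustration.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    # Union = sum of both areas minus the overlap counted twice.
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

Two identical boxes give 1.0 and two disjoint boxes give 0.0, so a threshold between these extremes separates "still overlapping" from "lost".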
  • the preset tracker avoids the large amount of calculation caused by repeatedly running the human body detection network, which speeds up the algorithm.
  • the specific detection principle is that while the preset tracker can track a specific human body, the human body detection network only needs to detect the human body frame once; when the tracked human body is lost, that is, it exceeds the preset tracking range, the human body detection network is restarted to perform human body frame detection. Such a mechanism can greatly reduce the amount of calculation and improve the running speed.
  • the improved algorithm of the embodiment of the present application is based on the MMpose network model, which is a top-down type algorithm. Its human frame detection mechanism is to obtain the human body bounding box in the picture through the detection network, and then directly use the human body area map as the input map of the human skeleton key point prediction network for key point prediction calculation. This method constantly repeats human body frame detection, resulting in a large amount of calculation that slows down the running process.
  • Step 102 predicting the key points of the human body in the target human body frame using a preset key point prediction network, wherein the preset key point prediction network includes a reference posture estimation network and an interference part classification network.
  • the preset key point prediction network includes two important network layers, namely the reference pose estimation network and the interference part classification network.
  • the reference pose estimation network can directly predict the key points of the human body in the target human body frame.
  • an interference part classification network is added to extract the vector features of each part classification, thereby optimizing the prediction of key points.
  • a lightweight network can be selected to construct an interference part classification network, which can both improve the prediction accuracy and reduce the amount of calculation.
  • Step 103 Recognize the human body posture in the target person image frame according to the human body key points to obtain a human body posture recognition result.
  • the predicted key points of the human body are marked, and the points can be connected according to the marks to form the human skeleton lines. According to the direction of the lines or the geometric shapes formed by the lines, the human body posture in the image frame of the target person can be identified to obtain the human body posture recognition result.
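One way to read "the direction of the lines" is to look at the angle of the torso line. The sketch below is only an illustration under stated assumptions: the joint choice (neck and hip), the labels `upright`, `leaning` and `lying`, and the angle thresholds are all hypothetical and do not come from the source.

```python
import math


def torso_angle_deg(neck, hip):
    """Angle in degrees between the neck-to-hip line and the vertical image axis."""
    dx = abs(hip[0] - neck[0])
    dy = abs(hip[1] - neck[1])
    return math.degrees(math.atan2(dx, dy))


def classify_posture(neck, hip, upright_max=30.0, lying_min=60.0):
    """Map the torso direction to a coarse posture label (hypothetical thresholds)."""
    angle = torso_angle_deg(neck, hip)
    if angle <= upright_max:
        return "upright"
    if angle >= lying_min:
        return "lying"
    return "leaning"
```

A real system would likely combine several skeleton lines or the shapes they form, as the text suggests; a single torso angle is just the simplest instance.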
  • the human posture recognition method provided in the embodiment of the present application performs human frame detection through a tracker in cooperation with a human body detection network. Tracking the human body according to IOU and threshold can reduce the number of human body detections, reduce redundant calculations, and speed up processing.
  • an interference part classification network is added to perform interference analysis, and key point prediction based on the interference analysis results can improve prediction accuracy. By making targeted improvements to the human posture recognition process, both high recognition accuracy and fast recognition speed can be taken into account. Therefore, the embodiment of the present application can solve the technical problem that the prior art cannot take both high recognition accuracy and fast recognition speed into account.
  • the present application provides a second embodiment of a human posture recognition method, including:
  • Step 201 Acquire multiple character image frames in a surveillance video.
  • Step 202 Preprocess the person image frame to obtain a target person image frame, wherein the preprocessing includes clipping, mean subtraction and normalization.
  • the monitoring device can continuously obtain the video stream, and the monitoring video is selected from the video stream.
  • Using FFmpeg video decoding, multiple image frames can be extracted from the monitoring video.
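Frame extraction with FFmpeg can be sketched by calling the `ffmpeg` command-line tool from Python. The file names, the sampling rate and the helper names `build_ffmpeg_command` and `extract_frames` are assumptions for illustration; the source does not say how FFmpeg is invoked.

```python
import subprocess


def build_ffmpeg_command(video_path, out_pattern, fps=5):
    """Build an ffmpeg command that samples `fps` frames per second into image files."""
    return ["ffmpeg", "-i", video_path, "-vf", f"fps={fps}", out_pattern]


def extract_frames(video_path, out_pattern, fps=5):
    """Run ffmpeg; requires the ffmpeg binary to be installed and on PATH."""
    subprocess.run(build_ffmpeg_command(video_path, out_pattern, fps), check=True)
```

For example, `extract_frames("monitor.mp4", "frame_%05d.png", fps=2)` would write two numbered images per second of video. Selecting only the frames that contain a person is a separate step not shown here.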
  • This embodiment mainly analyzes human body movements, so it is sufficient to select the image frame with the person.
  • the preprocessing operation is to improve the quality of the image frame and facilitate the subsequent human frame detection and key point prediction.
  • in addition to the clipping, mean subtraction and normalization processes proposed in this embodiment, other preprocessing processes can be added according to actual conditions, which are not limited here.
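The three preprocessing operations can be sketched as follows. This is a simplified illustration, assuming a single-channel image stored as a list of rows, "clipping" understood as cropping to a region, and normalization understood as division by the standard deviation; the source fixes none of these details, and a real pipeline would normally use an array library.

```python
from statistics import fmean, pstdev


def preprocess(image, crop_box):
    """Crop, subtract the mean, and normalize a grayscale image.

    image: list of rows of pixel intensities.
    crop_box: (x1, y1, x2, y2) region to keep (assumed format).
    """
    x1, y1, x2, y2 = crop_box
    cropped = [row[x1:x2] for row in image[y1:y2]]
    pixels = [p for row in cropped for p in row]
    mean = fmean(pixels)
    # Fall back to 1.0 for a constant image to avoid division by zero.
    std = pstdev(pixels) or 1.0
    return [[(p - mean) / std for p in row] for row in cropped]
```

After this step the cropped pixels have zero mean and unit variance, which is a common input convention for detection and key point networks.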
  • Step 203 determine whether the historical IOU of the previous frame is less than the threshold value through the preset tracker. If so, perform human body detection on the target person image frame through the human body detection network to obtain the target person frame and update the current IOU.
  • Step 204 If not, the expanded frame obtained by expanding the minimum bounding rectangle of the human body key points in the previous frame by a preset number of pixels is used as the target human body frame, and the current IOU is updated.
  • If the historical IOU of the previous frame is less than the threshold, the current frame no longer uses the minimum bounding rectangle of the previous frame to extract the target human body frame; instead, human body detection is performed on the target person image frame through the human body detection network to obtain the target human body frame.
  • If the historical IOU of the previous frame is not less than the threshold, there is a large overlap between the minimum bounding rectangle of the human body key points identified in the previous frame and that identified in the frame before it, so the historical IOU calculated from the two minimum bounding rectangles is greater than or equal to the threshold. The detection target is still in the image frame, and the current frame can obtain its target human body frame by expanding the minimum bounding rectangle of the human body key points identified at the previous moment by the preset number of pixels, without redundant detection through the human body detection network.
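The decision in Steps 203 and 204 can be sketched as a small gating function. The names `select_target_box`, `prev_expanded_box` and `detect` are hypothetical; `detect` stands in for the human body detection network and is only called when the tracker has lost the target.

```python
def select_target_box(historical_iou, threshold, prev_expanded_box, detect):
    """Choose the target human body frame for the current image frame.

    Returns (box, used_detector). `detect` is a zero-argument callable that
    runs the (expensive) human body detection network on the current frame.
    """
    if prev_expanded_box is None or historical_iou < threshold:
        # Tracking lost (or no history yet): fall back to the detector.
        return detect(), True
    # Tracking still valid: reuse the expanded key point box, no detection.
    return prev_expanded_box, False
```

The saving comes from the second branch: as long as consecutive boxes keep overlapping, the detector is never invoked.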
  • the first frame image needs to directly use the human body frame detected by the human body detection network, and then use the preset key point prediction network to identify the human body key points.
  • the expanded frame obtained by expanding the minimum circumscribed rectangle of the human body key points by a certain number of pixels can be regarded as the initial bbox.
  • the human body frame at this time does not have the corresponding IOU calculated by the human body frame at the previous moment, so the IOU of the first frame image processing is directly defined as 1.
  • bbox is the extended box obtained by expanding the minimum circumscribed rectangle of the human key points by a preset number of pixels, not the target human box extracted from the image frame. Therefore, each bbox is obtained after the human key point prediction is completed and is used to calculate the current IOU corresponding to the current frame. The current IOU is used to determine whether the next frame needs to use the human detection network for target detection.
  • the expansion is in pixels, and the specific number of pixels to be expanded, that is, the preset number can be set according to actual conditions and is not limited here. Moreover, the expansion refers to expanding outward from the four sides of the minimum circumscribed rectangle at the same time, not just one side.
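The bbox construction and IOU update described above can be sketched like this. The function names and the `(x1, y1, x2, y2)` box format are assumptions for illustration; the first-frame IOU of 1 follows the text.

```python
def expand_keypoint_box(keypoints, pad):
    """Minimum bounding rectangle of (x, y) key points, grown by `pad` pixels on all four sides."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    return (min(xs) - pad, min(ys) - pad, max(xs) + pad, max(ys) + pad)


def update_iou(current_bbox, previous_bbox):
    """Current IOU between this frame's bbox and the previous frame's bbox.

    For the first frame there is no previous bbox, so the IOU is defined as 1.
    """
    if previous_bbox is None:
        return 1.0
    ax1, ay1, ax2, ay2 = current_bbox
    bx1, by1, bx2, by2 = previous_bbox
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

The value returned by `update_iou` for the current frame is what the next frame compares against the threshold. Clamping the expanded box to the image borders is not mentioned in the source and is left out here.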
  • Referring to Figure 4, the traditional human body frame detection process directly detects the human body frame through the human body detection network (Person detector). For the detection mechanism of this application, refer to Figure 5, where Tracker is the preset tracker that can track the human body, Person detector is the human body detection network, and Pose landmarks is the subsequent preset key point prediction network.
  • Step 205 Obtain benchmark key points in the target human frame through a benchmark pose estimation network.
  • Step 206 Perform interference analysis on the human body torso in the target human body frame according to the interference part classification network to obtain an interference vector.
  • Step 207 Perform weighted calculation based on the reference key points and the interfering vectors to obtain the human body key points.
  • the preset key point prediction network includes a reference posture estimation network and an interference part classification network.
  • the present embodiment adds an interference part classification network in the key point prediction process.
  • Image is the target human frame diagram obtained by the above detection.
  • the target human frame also needs to input the interference part classification network for interference analysis; then the results of the two are calculated through channel weighting to obtain a more accurate human key point prediction result.
  • the interference part classification network mainly predicts whether each joint point in the frame belongs to the main target, or in other words, the process of classifying the joint points belonging to the main target and the joint points not belonging to the main target.
  • pi is each joint point of the human body
  • vi is a binary variable, indicating whether the i-th joint point is an interference joint point.
  • the value 0 means a non-interference joint point
  • the value 1 means an interference joint point
  • k is the number of joint points.
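The weighted calculation of Step 207 is not spelled out in this text, so the following is only one plausible sketch. It assumes each reference key point carries a confidence score and that the interference vector holds, per joint, a value in [0, 1] where 1 means "interference joint"; the confidence is then scaled by `1 - v_i`. The function name and data layout are hypothetical.

```python
def weight_keypoints(reference_keypoints, interference):
    """Down-weight reference key points that are classified as interference.

    reference_keypoints: list of (x, y, score), one entry per joint.
    interference: list of v_i values in [0, 1], one per joint (1 = interference).
    """
    if len(reference_keypoints) != len(interference):
        raise ValueError("one interference value is required per joint")
    return [
        (x, y, score * (1.0 - v))
        for (x, y, score), v in zip(reference_keypoints, interference)
    ]
```

With binary values, joints flagged as interference end up with a score of zero and can be discarded downstream, while joints of the main target keep their original confidence.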
  • This embodiment makes appropriate modifications to MobileNetV2 to adapt to the interference part classification task, and replaces the 1000-dimensional fully connected classifier used for image classification in the network MobileNetV2 with a 1×1 convolution with k output channels.
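To make the role of the 1×1 convolution concrete, here is a plain-Python sketch of the operation itself, not of MobileNetV2. A 1×1 convolution applies the same linear map to the channel vector at every spatial position, so with k output channels it produces one response map per joint. The nested-list tensor layout is an assumption chosen to keep the example dependency-free.

```python
def conv1x1(feature_map, weights, bias):
    """Apply a 1x1 convolution.

    feature_map: [C][H][W] nested lists.
    weights: [K][C], bias: [K]. Returns [K][H][W].
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [
        [
            [
                bias[k] + sum(weights[k][c] * feature_map[c][y][x] for c in range(channels))
                for x in range(width)
            ]
            for y in range(height)
        ]
        for k in range(len(weights))
    ]
```

Unlike the fully connected classifier it replaces, this head keeps the spatial layout, which suits a per-joint output of k channels.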
  • L_lm is the loss function of the reference posture estimation network
  • L_ic is the loss function of the interference part classification network
  • is the balance factor. Therefore, this embodiment can effectively improve the problem of poor key point prediction effect caused by multi-person interference at a relatively low computational cost.
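The loss formula itself is not reproduced in this text, so the sketch below rests on two assumptions: that the total loss adds the interference classification loss to the reference posture loss scaled by the balance factor, and that the classification loss is a binary cross-entropy averaged over the k joints. Both forms, and the names `interference_bce`, `total_loss` and `balance`, are hypothetical.

```python
import math


def interference_bce(predicted, labels, eps=1e-7):
    """Mean binary cross-entropy over k joints (assumed form of L_ic).

    predicted: probabilities that each joint is an interference joint.
    labels: binary v_i values (0 = non-interference, 1 = interference).
    """
    total = 0.0
    for p, v in zip(predicted, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -(v * math.log(p) + (1 - v) * math.log(1.0 - p))
    return total / len(labels)


def total_loss(l_lm, l_ic, balance):
    """Assumed combination: reference pose loss plus balance-weighted classification loss."""
    return l_lm + balance * l_ic
```

A larger balance factor makes training pay more attention to separating main-target joints from interfering ones, at the expense of raw key point localization.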
  • Step 208 Connect the key points of the human body into a human skeleton graph based on a key point matching algorithm.
  • Step 209 Recognize the human body posture in the target person image frame according to the human body skeleton diagram to obtain a human body posture recognition result.
  • the key point matching algorithm can select a suitable algorithm according to the matching accuracy, and is not limited here.
  • the human skeleton image includes lines connected by key points and the original target person image frame, which are superimposed and displayed to facilitate the correspondence of human body parts.
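Connecting key points into skeleton lines can be sketched with a fixed edge list. This is a simple stand-in, not the key point matching algorithm of the source (which is left open); the joint names and the edges in `SKELETON_EDGES` are hypothetical.

```python
# Hypothetical joint pairs that should be connected by a skeleton line.
SKELETON_EDGES = [
    ("head", "neck"),
    ("neck", "left_shoulder"),
    ("neck", "right_shoulder"),
    ("neck", "left_hip"),
    ("neck", "right_hip"),
]


def build_skeleton(keypoints, edges=SKELETON_EDGES):
    """Return line segments for every edge whose two joints were both detected.

    keypoints: dict mapping joint name to (x, y).
    """
    return [
        (keypoints[a], keypoints[b])
        for a, b in edges
        if a in keypoints and b in keypoints
    ]
```

The resulting segments can be drawn over the original target person image frame to obtain the superimposed display described above.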
  • the method provided in this embodiment not only retains the high precision of the Top-down type, but also runs faster than the Top-down type algorithm and reduces the amount of calculation and power consumption, giving the algorithm greater practical engineering significance.
  • the human posture recognition method provided in the embodiment of the present application performs human frame detection through a tracker in cooperation with a human body detection network. Tracking the human body according to IOU and threshold can reduce the number of human body detections, reduce redundant calculations, and speed up processing.
  • an interference part classification network is added to perform interference analysis, and key point prediction based on the interference analysis results can improve prediction accuracy. By making targeted improvements to the human posture recognition process, both high recognition accuracy and fast recognition speed can be taken into account. Therefore, the embodiment of the present application can solve the technical problem that the prior art cannot take both high recognition accuracy and fast recognition speed into account.
  • the present application also provides an embodiment of a human posture recognition device, including:
  • a human frame detection unit 301 is used to perform human body detection on a target person image frame in combination with a preset tracker and a human body detection network to obtain a target human body frame, wherein the preset tracker is a network for tracking a human body based on IOU and a threshold;
  • a key point prediction unit 302 is used to predict the key points of the human body in the target human body frame using a preset key point prediction network, wherein the preset key point prediction network includes a reference posture estimation network and an interference part classification network;
  • the human body posture recognition unit 303 is used to recognize the human body posture in the target person image frame according to the human body key points to obtain the human body posture recognition result.
  • An image frame acquisition unit 304 is used to acquire multiple person image frames in the surveillance video
  • the image preprocessing unit 305 is used to perform preprocessing operations on the person image frame to obtain a target person image frame, and the preprocessing includes clipping, mean subtraction and normalization.
  • the human frame detection unit 301 includes:
  • the first judgment subunit 3011 is used to judge whether the historical IOU of the previous frame is less than the threshold value through the preset tracker. If so, the human body detection is performed on the target person image frame through the human body detection network to obtain the target person frame and update the current IOU;
  • the second judgment subunit 3012 is used to, if not, expand the minimum bounding rectangle of the human body key points in the previous frame by a preset number of pixels to obtain an expanded frame as the target human body frame, and update the current IOU.
  • the key point prediction unit 302 includes:
  • a reference prediction subunit 3021 is used to obtain reference key points in the target human frame through a reference pose estimation network
  • the interference analysis subunit 3022 is used to perform interference analysis on the human body torso in the target human body frame according to the interference part classification network to obtain an interference vector;
  • the weighted calculation subunit 3023 is used to perform weighted calculation based on the reference key points and the interfering vectors to obtain the human body key points.
  • the human body posture recognition unit 303 includes:
  • a connecting subunit 3031 is used to connect the key points of the human body into a human skeleton graph based on a key point matching algorithm
  • the recognition subunit 3032 is used to recognize the human body posture in the target person image frame according to the human body skeleton diagram to obtain a human body posture recognition result.
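The unit structure of the device can be mirrored in code as a small composition of callables. This is only a structural sketch: the class name, field names and call signatures are assumptions, and each callable stands in for the corresponding unit (301, 302, 303) without implementing it.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class PoseRecognitionDevice:
    detect_frame: Callable[[Any], Any]            # human frame detection unit 301
    predict_keypoints: Callable[[Any, Any], Any]  # key point prediction unit 302
    recognize_pose: Callable[[Any, Any], Any]     # human body posture recognition unit 303

    def process(self, image_frame):
        """Run the three units in order on one target person image frame."""
        box = self.detect_frame(image_frame)
        keypoints = self.predict_keypoints(image_frame, box)
        return self.recognize_pose(image_frame, keypoints)
```

Because the units are passed in as plain callables, they can be swapped or merged freely, which matches the statement that the division into units is only logical.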
  • the disclosed devices and methods can be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the units is only a logical function division. There may be other division methods in actual implementation.
  • multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed.
  • In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware or in the form of software functional units.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium, including a number of instructions for executing all or part of the steps of the method described in each embodiment of the present application through a computer device (which can be a personal computer, server, or network device, etc.).
  • the aforementioned storage medium includes: U disk, mobile hard disk, read-only memory (full name in English: Read-Only Memory, English abbreviation: ROM), random access memory (full name in English: Random Access Memory, English abbreviation: RAM), disk or optical disk and other media that can store program codes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

The present application discloses a human body pose recognition method and device. The method comprises: combining a preset tracker and a human body detection network to perform human body detection on a target person image frame so as to obtain a target human body box, wherein the preset tracker is a network that tracks a human body on the basis of an IOU and a threshold value; predicting human body key points in the target human body box by using a preset key point prediction network, wherein the preset key point prediction network comprises a reference pose estimation network and an interference part classification network; and recognizing a human body pose in the target person image frame according to the human body key points to obtain a human body pose recognition result. The present application can solve the technical problem in the prior art that high recognition precision and high recognition speed cannot be achieved at the same time.

Description

Human body pose recognition method and device
Technical Field
The present application relates to the field of machine vision, and in particular to a human body pose recognition method and device.
Background
The human body pose recognition task mainly consists of obtaining the joint coordinates of each part of the human body from video or pictures captured by a camera. Human body pose recognition currently has many application fields, including virtual reality, human-computer interaction, sports training and analysis, and abnormal behavior detection.
Current human body pose recognition algorithms fall into two main categories. The first is top-down: each human bounding box is detected first, and the skeleton key points of each person are then predicted by a key point regression network. The second is bottom-up: the skeleton key points of all human bodies are predicted directly and then grouped into different human bodies. The former consumes a large amount of computation in the human body detection stage, so the overall algorithm is slow; the latter has poor resistance to interference, so its recognition accuracy is lower.
Summary of the Invention
The present application provides a human body pose recognition method and device to solve the technical problem that the prior art cannot achieve high recognition accuracy and high recognition speed at the same time.
In view of this, a first aspect of the present application provides a human body pose recognition method, comprising:
combining a preset tracker and a human body detection network to perform human body detection on a target person image frame to obtain a target human body box, wherein the preset tracker is a network that tracks a human body based on an IOU and a threshold;
predicting human body key points in the target human body box by using a preset key point prediction network, wherein the preset key point prediction network comprises a reference pose estimation network and an interference part classification network;
recognizing the human body pose in the target person image frame according to the human body key points to obtain a human body pose recognition result.
Preferably, before the combining a preset tracker and a human body detection network to perform human body detection on a target person image frame to obtain a target human body box, wherein the preset tracker is a network that tracks a human body based on an IOU and a threshold, the method further comprises:
acquiring a plurality of person image frames from a surveillance video;
performing a preprocessing operation on the person image frames to obtain the target person image frame, wherein the preprocessing comprises cropping, mean subtraction, and normalization.
Preferably, the combining a preset tracker and a human body detection network to perform human body detection on a target person image frame to obtain a target human body box, wherein the preset tracker is a network that tracks a human body based on an IOU and a threshold, comprises:
determining, by the preset tracker, whether the historical IOU of the previous frame is less than the threshold, and if so, performing human body detection on the target person image frame through the human body detection network to obtain the target human body box, and updating the current IOU;
if not, taking as the target human body box the expanded box obtained by expanding the minimum bounding rectangle of the human body key points of the previous frame by a preset number of pixels, and updating the current IOU.
Preferably, the predicting human body key points in the target human body box by using a preset key point prediction network, wherein the preset key point prediction network comprises a reference pose estimation network and an interference part classification network, comprises:
obtaining reference key points in the target human body box through the reference pose estimation network;
performing interference analysis on the human torso in the target human body box according to the interference part classification network to obtain an interference vector;
performing a weighted calculation based on the reference key points and the interference vector to obtain the human body key points.
Preferably, the recognizing the human body pose in the target person image frame according to the human body key points to obtain a human body pose recognition result comprises:
connecting the human body key points into a human skeleton diagram based on a key point matching algorithm;
recognizing the human body pose in the target person image frame according to the human skeleton diagram to obtain the human body pose recognition result.
A second aspect of the present application provides a human body pose recognition device, comprising:
a human body box detection unit, configured to combine a preset tracker and a human body detection network to perform human body detection on a target person image frame to obtain a target human body box, wherein the preset tracker is a network that tracks a human body based on an IOU and a threshold;
a key point prediction unit, configured to predict human body key points in the target human body box by using a preset key point prediction network, wherein the preset key point prediction network comprises a reference pose estimation network and an interference part classification network;
a human body pose recognition unit, configured to recognize the human body pose in the target person image frame according to the human body key points to obtain a human body pose recognition result.
Preferably, the device further comprises:
an image frame acquisition unit, configured to acquire a plurality of person image frames from a surveillance video;
an image preprocessing unit, configured to perform a preprocessing operation on the person image frames to obtain the target person image frame, wherein the preprocessing comprises cropping, mean subtraction, and normalization.
Preferably, the human body box detection unit comprises:
a first judgment subunit, configured to determine, by the preset tracker, whether the historical IOU of the previous frame is less than the threshold, and if so, to perform human body detection on the target person image frame through the human body detection network to obtain the target human body box, and to update the current IOU;
a second judgment subunit, configured to, if not, take as the target human body box the expanded box obtained by expanding the minimum bounding rectangle of the human body key points of the previous frame by a preset number of pixels, and to update the current IOU.
Preferably, the key point prediction unit comprises:
a reference prediction subunit, configured to obtain reference key points in the target human body box through the reference pose estimation network;
an interference analysis subunit, configured to perform interference analysis on the human torso in the target human body box according to the interference part classification network to obtain an interference vector;
a weighted calculation subunit, configured to perform a weighted calculation based on the reference key points and the interference vector to obtain the human body key points.
Preferably, the human body pose recognition unit comprises:
a connection subunit, configured to connect the human body key points into a human skeleton diagram based on a key point matching algorithm;
a recognition subunit, configured to recognize the human body pose in the target person image frame according to the human skeleton diagram to obtain the human body pose recognition result.
It can be seen from the above technical solutions that the embodiments of the present application have the following advantages:
The present application provides a human body pose recognition method, comprising: combining a preset tracker and a human body detection network to perform human body detection on a target person image frame to obtain a target human body box, wherein the preset tracker is a network that tracks a human body based on an IOU and a threshold; predicting human body key points in the target human body box by using a preset key point prediction network, wherein the preset key point prediction network comprises a reference pose estimation network and an interference part classification network; and recognizing the human body pose in the target person image frame according to the human body key points to obtain a human body pose recognition result.
In the human body pose recognition method provided by the present application, the tracker works together with the human body detection network to perform human body box detection. Tracking the human body according to the IOU and the threshold reduces the number of human body detections, cuts redundant computation, and speeds up processing. An interference part classification network is added to the key point prediction process to perform interference analysis, and predicting key points based on the interference analysis result improves prediction accuracy. Through these targeted improvements to the pose recognition process, high recognition accuracy and high recognition speed can be achieved at the same time. Therefore, the present application can solve the technical problem that the prior art cannot achieve high recognition accuracy and high recognition speed at the same time.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a human body pose recognition method provided by an embodiment of the present application;
FIG. 2 is another schematic flowchart of a human body pose recognition method provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a human body pose recognition device provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of the human body box detection process of the MMpose network model provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of the human body box detection process combining the preset tracker and the human body detection network provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of the key point prediction process of the preset key point prediction network provided by an embodiment of the present application.
Detailed Description
To enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
For ease of understanding, referring to FIG. 1, Embodiment 1 of a human body pose recognition method provided by the present application comprises:
Step 101: combining a preset tracker and a human body detection network to perform human body detection on a target person image frame to obtain a target human body box, wherein the preset tracker is a network that tracks a human body based on an IOU and a threshold.
The preset tracker is a network that tracks the human body in image frames based on the IOU (Intersection over Union) and a corresponding threshold. The threshold can be set according to practical experience or actual conditions and is not limited here. The IOU is a standard measure of how accurately the corresponding object is detected in a given data set; any task whose output includes predicted bounding boxes (hereinafter bbox) can be measured with the IOU. By tracking the human body, the preset tracker avoids the heavy computation caused by the human body detection network repeating detection many times, which speeds up the algorithm.
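The IOU of two boxes is the area of their overlap divided by the area of their union. The following is a minimal Python sketch of that calculation; the `(x1, y1, x2, y2)` corner format and the function name `iou` are assumptions made for illustration and are not specified in the application.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero when the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    # Degenerate boxes with zero union are treated as non-overlapping.
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes that overlap on half their width: 50 / 150.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

Identical boxes give 1.0 and disjoint boxes give 0.0, so a threshold between these extremes separates "still tracked" from "lost".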
The detection principle is as follows: while the preset tracker can follow a specific human body, the human body detection network only needs to detect the human body box once. When the tracked human body is lost, that is, when it moves beyond the preset tracking range, the human body detection network is restarted to detect the human body box. This mechanism greatly reduces the amount of computation and increases running speed.
The improved algorithm of the embodiments of the present application takes the MMpose network model as its baseline. That model is a top-down algorithm: its human body box detection mechanism obtains the human bounding box in the picture through a detection network and then feeds the human body region directly into the human skeleton key point prediction network for key point prediction. This approach repeats human body box detection continually, which produces a large amount of computation and slows down execution.
Step 102: predicting human body key points in the target human body box by using a preset key point prediction network, wherein the preset key point prediction network comprises a reference pose estimation network and an interference part classification network.
The preset key point prediction network comprises two important networks: the reference pose estimation network and the interference part classification network. The reference pose estimation network can directly predict the human body key points in the target human body box. In multi-person scenes, however, body parts of different people may overlap; these overlapping body parts interfere with recognition of the subject and make key point prediction inaccurate. This embodiment therefore adds an interference part classification network that extracts classification features for each body part and uses them to refine key point prediction. Specifically, to avoid adding to the computational burden of the overall algorithm, a lightweight network can be chosen to build the interference part classification network, which improves prediction accuracy while keeping the added computation small.
Step 103: recognizing the human body pose in the target person image frame according to the human body key points to obtain a human body pose recognition result.
The predicted human body key points carry labels, and the points can be connected according to these labels to form the lines of the human skeleton. The human body pose in the target person image frame can then be recognized from the directions of the lines, or from the geometric shapes the lines form, to obtain the human body pose recognition result.
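As an illustration of connecting labeled key points into skeleton lines and reading off a line direction, the following Python sketch may help. The joint names, the edge list, and the helper names `build_skeleton` and `limb_angle` are hypothetical; the application does not fix a particular joint set or matching algorithm.

```python
import math


def build_skeleton(keypoints, edges):
    """Connect labeled key points into line segments.

    keypoints: dict mapping a joint label to its (x, y) position.
    edges: list of (label_a, label_b) pairs naming the joints to connect.
    """
    segments = []
    for a, b in edges:
        # Skip a limb when either of its joints was not predicted.
        if a in keypoints and b in keypoints:
            segments.append((keypoints[a], keypoints[b]))
    return segments


def limb_angle(segment):
    """Direction of a segment in degrees, measured from the image x-axis."""
    (x1, y1), (x2, y2) = segment
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


# Hypothetical labels and coordinates, for illustration only.
keypoints = {"neck": (100, 50), "hip": (100, 120), "knee": (130, 160)}
edges = [("neck", "hip"), ("hip", "knee"), ("knee", "ankle")]

skeleton = build_skeleton(keypoints, edges)
print(len(skeleton))            # the knee-ankle limb is skipped: ankle is missing
print(limb_angle(skeleton[0]))  # the neck-hip line points straight down the image
```

A downstream classifier could then use such angles, or the shapes formed by several segments, as pose features.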
In the human body pose recognition method provided by this embodiment of the present application, the tracker works together with the human body detection network to perform human body box detection. Tracking the human body according to the IOU and the threshold reduces the number of human body detections, cuts redundant computation, and speeds up processing. An interference part classification network is added to the key point prediction process to perform interference analysis, and predicting key points based on the interference analysis result improves prediction accuracy. Through these targeted improvements to the pose recognition process, high recognition accuracy and high recognition speed can be achieved at the same time. Therefore, this embodiment can solve the technical problem that the prior art cannot achieve high recognition accuracy and high recognition speed at the same time.
For ease of understanding, referring to FIG. 2, the present application provides Embodiment 2 of a human body pose recognition method, comprising:
Step 201: acquiring a plurality of person image frames from a surveillance video.
Step 202: performing a preprocessing operation on the person image frames to obtain a target person image frame, wherein the preprocessing comprises cropping, mean subtraction, and normalization.
A monitoring device continuously acquires a video stream, and the surveillance video is selected from that stream. After the video is decoded with FFmpeg, multiple image frames can be extracted from the surveillance video. Adjacent frames are related in time, so detection and key point prediction can be performed on consecutive frames to determine the human body pose, and even the person's behavior. This embodiment mainly analyzes human motion, so only image frames that contain a person need to be selected.
The purpose of the preprocessing operation is to improve the quality of the image frames and to facilitate subsequent human body box detection and key point prediction. In addition to the cropping, mean subtraction, and normalization proposed in this embodiment, other preprocessing steps can be added according to actual conditions; this is not limited here.
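The three named preprocessing steps can be sketched as follows. This is a dependency-free illustration on a tiny grayscale image stored as a list of rows. Computing the mean and standard deviation from the cropped patch itself is an assumption; the application does not say whether per-image or dataset-wide statistics are used, and a real pipeline would normally operate on multi-channel arrays.

```python
from statistics import fmean, pstdev


def preprocess(image, crop_box):
    """Crop a 2-D grayscale image, subtract the mean, and divide by the standard deviation.

    image: list of rows, each a list of pixel values.
    crop_box: (x1, y1, x2, y2) with exclusive right and bottom edges.
    """
    x1, y1, x2, y2 = crop_box
    cropped = [row[x1:x2] for row in image[y1:y2]]       # cropping
    pixels = [p for row in cropped for p in row]
    mean = fmean(pixels)                                  # value used for mean subtraction
    std = pstdev(pixels) or 1.0                           # avoid dividing by zero on flat patches
    return [[(p - mean) / std for p in row] for row in cropped]


image = [
    [0, 10, 20, 30],
    [40, 50, 60, 70],
    [80, 90, 100, 110],
]
patch = preprocess(image, (1, 0, 3, 2))  # keeps columns 1-2 of rows 0-1
print(patch)
```

After this step the patch has zero mean and unit standard deviation, which is the usual input convention for convolutional networks.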
Step 203: determining, by the preset tracker, whether the historical IOU of the previous frame is less than the threshold, and if so, performing human body detection on the target person image frame through the human body detection network to obtain the target human body box, and updating the current IOU.
Step 204: if not, taking as the target human body box the expanded box obtained by expanding the minimum bounding rectangle of the human body key points of the previous frame by a preset number of pixels, and updating the current IOU.
If the human body detection network were run once on every image frame, a single surveillance video would require many calls to the network, resulting in a large amount of computation. Whether the human body detection network is used to obtain the current target human body box is therefore decided from the historical IOU: the preset tracker makes the judgment and thereby determines whether the human body detection network needs to process the frame.
If the historical IOU of the previous frame is less than the threshold, the minimum bounding rectangle of the human body key points recognized in the previous frame overlaps only slightly with that of the frame before it, so the IOU calculated from the two rectangles is below the threshold. The detection target may have moved out of its earlier position, that is, the human body has left the range of movement allowed by the tracker. The current frame therefore no longer reuses the minimum bounding rectangle of the previous frame to extract the target human body box; instead, human body detection is performed on the target person image frame through the human body detection network to obtain the target human body box.
If the historical IOU of the previous frame is not less than the threshold, the minimum bounding rectangle of the human body key points recognized in the previous frame overlaps substantially with that of the frame before it, so the historical IOU calculated from the two rectangles is greater than or equal to the threshold and the detection target is still within the tracked region. For the current frame, the expanded box obtained by expanding the minimum bounding rectangle of the previously recognized human body key points by the preset number of pixels can be used to extract the target human body box, without redundant detection by the human body detection network.
It should be noted that, for the first frame, the human body box must be detected directly by the human body detection network, and the human body key points are then recognized by the preset key point prediction network. The expanded box obtained by expanding the minimum bounding rectangle of these key points by a certain number of pixels can be regarded as the initial bbox. Because there is no human body box from an earlier moment against which to calculate the IOU, the IOU after processing the first frame is directly defined as IOU = 1.
It should be explained that the bbox is the expanded box obtained by expanding the minimum bounding rectangle of the human body key points by the preset number of pixels; it is not the target human body box extracted from the image frame. Each bbox is therefore obtained after key point prediction is complete and is used to calculate the current IOU for the current frame. The current IOU in turn determines whether the next frame needs the human body detection network for target detection.
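The gating rule described above can be sketched as a small function that, given the IOU stored after each processed frame, reports which frames invoke the human body detection network. The function name `detector_schedule` and the threshold value 0.5 are assumptions for illustration; the application leaves the threshold open.

```python
def detector_schedule(ious, threshold):
    """Return, per frame, whether the human body detection network runs.

    ious[t] is the IOU stored after frame t has been processed; ious[0] is 1
    by the convention stated for the first frame.
    """
    schedule = [True]  # the first frame always uses the detection network
    for prev_iou in ious[:-1]:
        # A frame runs the detector only when the previous frame's IOU fell below the threshold.
        schedule.append(prev_iou < threshold)
    return schedule


# IOU recorded after frames 0..3; the drop to 0.4 after frame 2 triggers detection on frame 3.
ious = [1.0, 0.9, 0.4, 0.8]
print(detector_schedule(ious, threshold=0.5))  # [True, False, False, True]
```

In this example the detection network runs on two of four frames; on a steadily tracked subject it would run only on the first.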
It should be noted that the expansion is measured in pixels, and the number of pixels to expand by, that is, the preset number, can be set according to actual conditions and is not limited here. The expansion is applied outward from all four sides of the minimum bounding rectangle simultaneously, not from a single side.
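A minimal Python sketch of this expansion is shown below. The function name `expanded_box`, the `(x1, y1, x2, y2)` box format, and the clipping of the result to the image bounds are assumptions added for illustration; the application only states that all four sides are expanded by a preset number of pixels.

```python
def expanded_box(keypoints, margin, image_size):
    """Minimum bounding rectangle of (x, y) key points, grown by `margin` pixels on all four sides."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    width, height = image_size
    # Clipping to the image is an added assumption so the box stays valid for cropping.
    return (
        max(0, min(xs) - margin),
        max(0, min(ys) - margin),
        min(width, max(xs) + margin),
        min(height, max(ys) + margin),
    )


keypoints = [(120, 80), (150, 200), (95, 310)]
print(expanded_box(keypoints, margin=20, image_size=(640, 480)))  # (75, 60, 170, 330)
```

The margin gives the subject room to move between frames, so the box from the previous frame still contains the body in the current frame.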
Referring to FIG. 4, the human body box is detected in the conventional way, directly through the human body detection network (Person detector). The detection mechanism of the present application is shown in FIG. 5, in which Tracker is the preset tracker that tracks the human body and Pose landmarks is the subsequent preset key point prediction network.
Step 205: obtaining reference key points in the target human body box through the reference pose estimation network.
Step 206: performing interference analysis on the human torso in the target human body box according to the interference part classification network to obtain an interference vector.
Step 207: performing a weighted calculation based on the reference key points and the interference vector to obtain the human body key points.
It should be noted that the preset key point prediction network comprises the reference pose estimation network and the interference part classification network. To improve the prediction accuracy of the subject's key points when the bodies of several people overlap, this embodiment adds an interference part classification network to the key point prediction process. Referring to FIG. 6, Image is the target human body box image obtained by the detection described above. The target human body box is input to the reference pose estimation network for key point prediction and also to the interference part classification network for interference analysis; the two results are then combined by channel-wise weighting to obtain a more accurate human body key point prediction. In addition, after the human body key points are predicted for each current frame, the minimum bounding rectangle of those key points is extracted and expanded by the preset number of pixels to obtain the expanded box, and the current IOU is calculated against the expanded box of the previous frame.
The interference part classification network mainly predicts whether each joint point in the box belongs to the subject target; in other words, it performs binary classification between joint points that belong to the subject and joint points that do not. In this embodiment, the lightweight network MobileNetV2 is selected as the backbone of the interference part classification network to extract features suited to the interference part classification task and to obtain the interference vector over the joint points:
$$V = [v_0 p_0, v_1 p_1, \ldots, v_k p_k], \quad v_i \in \{0, 1\}, \; i \in [0, k]$$
where $p_i$ is each joint point of the human body, $v_i$ is a binary variable indicating whether the i-th joint point is an interfering joint point (0 means non-interfering, 1 means interfering), and k is the number of joint points.
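The application states that the reference key points and the interference vector are combined by a weighted calculation but does not give the formula. One possible reading, shown purely as an assumption, is to scale each joint's confidence score by (1 - v_i) so that joints flagged as interfering are suppressed. The function name `weight_keypoints` and the `(x, y, score)` tuple layout are hypothetical.

```python
def weight_keypoints(keypoints, interference):
    """Suppress joints flagged as interfering.

    keypoints: list of (x, y, score) tuples from the reference pose estimation network.
    interference: list of v_i values in {0, 1}, one per joint (1 = interfering).
    Assumed rule: new_score = score * (1 - v_i). This is not specified in the application.
    """
    if len(keypoints) != len(interference):
        raise ValueError("one interference flag per joint is required")
    weighted = []
    for (x, y, score), v in zip(keypoints, interference):
        weighted.append((x, y, score * (1 - v)))  # v = 1 zeroes the joint's score
    return weighted


reference = [(10, 20, 0.9), (30, 40, 0.8)]
flags = [0, 1]  # the second joint is classified as belonging to another person
print(weight_keypoints(reference, flags))  # [(10, 20, 0.9), (30, 40, 0.0)]
```

A soft variant could pass the classifier's probabilities instead of hard 0/1 flags, which would down-weight uncertain joints rather than discard them.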
This embodiment makes appropriate modifications to MobileNetV2 to adapt it to the interference part classification task: the 1000-dimensional fully connected classifier that MobileNetV2 uses for image classification is replaced with a 1×1 convolution with k output channels. The loss function of the preset key point prediction network constructed in this way is expressed as:
$$L = L_{lm} + \lambda L_{ic}$$
where $L_{lm}$ is the loss function of the reference pose estimation network, $L_{ic}$ is the loss function of the interference part classification network, and $\lambda$ is the balance factor. This embodiment can therefore effectively mitigate the poor key point prediction caused by multi-person interference while adding only a small computational cost.
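The combined loss can be illustrated with a short Python sketch. The application does not name the individual loss functions, so mean squared error for the pose term and binary cross-entropy for the classification term are assumptions chosen because they are common for regression and binary classification; the function names are hypothetical.

```python
import math


def mse(pred, target):
    """Mean squared error, used here as an assumed form of L_lm."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def bce(prob, label, eps=1e-7):
    """Binary cross-entropy, used here as an assumed form of L_ic."""
    total = 0.0
    for p, y in zip(prob, label):
        p = min(max(p, eps), 1 - eps)  # clamp to keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(prob)


def total_loss(kp_pred, kp_true, v_prob, v_true, lam):
    """L = L_lm + lambda * L_ic, following the formula in the text."""
    return mse(kp_pred, kp_true) + lam * bce(v_prob, v_true)


# Perfect key point regression, one maximally uncertain interference prediction.
loss = total_loss([0.5, 0.5], [0.5, 0.5], [0.5], [1], lam=2.0)
print(loss)  # 2 * ln(2), since the pose term is zero
```

A larger balance factor makes the optimizer pay more attention to the interference classification relative to key point localization.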
Step 208: Connect the human body key points into a human skeleton graph based on a key point matching algorithm.
Step 209: Recognize the human body posture in the target person image frame according to the human skeleton graph to obtain a human body posture recognition result.
The key point matching algorithm can be chosen according to the required matching accuracy and is not limited here. The human skeleton graph comprises the lines that connect the key points and the original target person image frame; the two are displayed superimposed so that the lines can be matched to the corresponding body parts.
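Because the matching algorithm is left open, the following is only a sketch of the simplest case: a fixed list of limb pairs for a single person, with each pair turned into a line segment when both of its joint points were predicted. The joint names and the limb list are hypothetical and do not come from this application.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]

# Hypothetical joint names and limb pairs, for illustration only.
LIMBS: List[Tuple[str, str]] = [
    ("head", "neck"),
    ("neck", "left_shoulder"),
    ("neck", "right_shoulder"),
    ("neck", "hip"),
]


def build_skeleton(keypoints: Dict[str, Optional[Point]]) -> List[Segment]:
    """Return one line segment per limb whose two end joints are both present."""
    segments: List[Segment] = []
    for a, b in LIMBS:
        pa, pb = keypoints.get(a), keypoints.get(b)
        if pa is not None and pb is not None:
            segments.append((pa, pb))
    return segments


if __name__ == "__main__":
    predicted = {
        "head": (50.0, 10.0),
        "neck": (50.0, 30.0),
        "left_shoulder": None,  # joint not predicted in this frame
        "hip": (50.0, 90.0),
    }
    for start, end in build_skeleton(predicted):
        print(start, "->", end)
```

The resulting segments could then be drawn over the original frame to produce the superimposed display described above.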
The method provided in this embodiment retains the high accuracy of Top-down algorithms while running faster than they do, and it reduces computation and power consumption, which makes the algorithm more practical for engineering use.
The human body posture recognition method provided in the embodiments of the present application detects human body frames with a tracker working alongside a human body detection network. Tracking the human body according to the IOU and a threshold reduces the number of human body detections, cuts redundant computation, and speeds up processing. In the key point prediction process, an interference part classification network is added to perform interference analysis, and predicting key points based on the interference analysis results improves prediction accuracy. These targeted improvements to the human body posture recognition process allow high recognition accuracy and fast recognition speed to be achieved together. The embodiments of the present application can therefore solve the technical problem that the prior art cannot achieve both high recognition accuracy and fast recognition speed.
For ease of understanding, refer to FIG. 3. The present application also provides an embodiment of a human body posture recognition device, including:
A human body frame detection unit 301, used to perform human body detection on a target person image frame by combining a preset tracker and a human body detection network to obtain a target human body frame, wherein the preset tracker is a network that tracks a human body based on the IOU and a threshold;
A key point prediction unit 302, used to predict human body key points in the target human body frame using a preset key point prediction network, wherein the preset key point prediction network includes a reference posture estimation network and an interference part classification network;
A human body posture recognition unit 303, used to recognize the human body posture in the target person image frame according to the human body key points to obtain a human body posture recognition result.
Further, the device also includes:
An image frame acquisition unit 304, used to acquire multiple person image frames from a surveillance video;
An image preprocessing unit 305, used to preprocess the person image frames to obtain the target person image frame, wherein the preprocessing includes cropping, mean subtraction and normalization.
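The three preprocessing operations can be sketched on a single-channel image held as nested lists. This example assumes that normalization means dividing by the standard deviation after mean subtraction and that the statistics are computed per image; the application does not specify either point, and a real pipeline might instead use fixed dataset-wide constants.

```python
import math
from typing import List

Image = List[List[float]]


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Cut out a height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def standardize(img: Image, eps: float = 1e-8) -> Image:
    """Subtract the image mean, then divide by the image standard deviation."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    std = math.sqrt(var) + eps  # eps guards against a constant image
    return [[(p - mean) / std for p in row] for row in img]


if __name__ == "__main__":
    frame = [
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 3.0, 0.0],
        [0.0, 5.0, 7.0, 0.0],
    ]
    patch = crop(frame, top=1, left=1, height=2, width=2)
    print(standardize(patch))
```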
Further, the human body frame detection unit 301 includes:
A first judgment subunit 3011, used to determine, through the preset tracker, whether the historical IOU of the previous frame is less than the threshold and, if so, to perform human body detection on the target person image frame through the human body detection network to obtain the target human body frame and update the current IOU;
A second judgment subunit 3012, used, if not, to take as the target human body frame the expanded frame obtained by expanding the minimum bounding rectangle of the previous frame's human body key points by a preset number of pixels, and to update the current IOU.
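The two branches above can be sketched as a single decision function. This example assumes axis-aligned boxes in (x1, y1, x2, y2) form and assumes that the updated IOU is measured between the newly chosen frame and the previous frame's box; this passage does not say which two boxes the IOU compares, so that choice, the `detect` callback, and all names are hypothetical.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Point = Tuple[float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def expanded_rect(points: List[Point], margin: float) -> Box:
    """Minimum bounding rectangle of the key points, grown by `margin` pixels."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs) - margin, min(ys) - margin, max(xs) + margin, max(ys) + margin)


def next_box(prev_iou: float, threshold: float, prev_box: Box,
             prev_keypoints: List[Point], margin: float,
             detect: Callable[[], Box]) -> Tuple[Box, float]:
    """Run the detector only when the previous IOU fell below the threshold."""
    if prev_iou < threshold:
        box = detect()  # tracking judged unreliable: re-detect
    else:
        box = expanded_rect(prev_keypoints, margin)  # reuse previous key points
    return box, iou(box, prev_box)  # assumed definition of the updated IOU


if __name__ == "__main__":
    box, new_iou = next_box(0.9, 0.5, (0.0, 0.0, 4.0, 4.0),
                            [(1.0, 1.0), (3.0, 3.0)], 1.0,
                            detect=lambda: (0.0, 0.0, 1.0, 1.0))
    print(box, new_iou)
```

The saving comes from the second branch: as long as tracking stays reliable, the detection network is not invoked at all for that frame.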
Further, the key point prediction unit 302 includes:
A reference prediction subunit 3021, used to obtain reference key points in the target human body frame through the reference posture estimation network;
An interference analysis subunit 3022, used to perform interference analysis on the human body torso in the target human body frame according to the interference part classification network to obtain an interference vector;
A weighted calculation subunit 3023, used to perform a weighted calculation based on the reference key points and the interference vector to obtain the human body key points.
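This passage does not give the weighting formula, so the following is only one plausible reading, offered as a sketch: each reference key point carries a confidence score, and that score is multiplied by (1 - v_i) so that joint points flagged as interference receive zero weight. The (x, y, confidence) layout and the weighting rule are both assumptions.

```python
from typing import List, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, confidence), a hypothetical layout


def weight_keypoints(reference: List[Keypoint], v: List[int]) -> List[Keypoint]:
    """Scale each confidence by (1 - v_i); v_i = 1 marks an interference joint.

    This weighting rule is an illustrative assumption, not the rule
    specified by the application.
    """
    if len(reference) != len(v):
        raise ValueError("one interference flag is required per key point")
    return [(x, y, c * (1 - vi)) for (x, y, c), vi in zip(reference, v)]


if __name__ == "__main__":
    reference = [(10.0, 20.0, 0.9), (80.0, 25.0, 0.8)]
    print(weight_keypoints(reference, [0, 1]))
```

Under this reading, downstream steps could discard or down-weight any key point whose confidence has been driven to zero.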
Further, the human body posture recognition unit 303 includes:
A connecting subunit 3031, used to connect the human body key points into a human skeleton graph based on a key point matching algorithm;
A recognition subunit 3032, used to recognize the human body posture in the target person image frame according to the human skeleton graph to obtain a human body posture recognition result.
In the several embodiments provided in this application, it should be understood that the disclosed devices and methods can be implemented in other ways. For example, the device embodiments described above are only schematic: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. Multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
As stated above, the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the aforementioned embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the aforementioned embodiments or make equivalent replacements for some of their technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

  1. A human body posture recognition method, characterized by comprising:
    performing human body detection on a target person image frame by combining a preset tracker and a human body detection network to obtain a target human body frame, wherein the preset tracker is a network that tracks a human body based on an IOU and a threshold;
    predicting human body key points in the target human body frame using a preset key point prediction network, wherein the preset key point prediction network includes a reference posture estimation network and an interference part classification network;
    recognizing the human body posture in the target person image frame according to the human body key points to obtain a human body posture recognition result.
  2. The human body posture recognition method according to claim 1, characterized in that, before the performing of human body detection on the target person image frame by combining the preset tracker and the human body detection network to obtain the target human body frame, the preset tracker being a network that tracks a human body based on an IOU and a threshold, the method further comprises:
    acquiring multiple person image frames from a surveillance video;
    preprocessing the person image frames to obtain the target person image frame, wherein the preprocessing includes cropping, mean subtraction and normalization.
  3. The human body posture recognition method according to claim 1, characterized in that the performing of human body detection on the target person image frame by combining the preset tracker and the human body detection network to obtain the target human body frame, the preset tracker being a network that tracks a human body based on an IOU and a threshold, comprises:
    determining, through the preset tracker, whether the historical IOU of the previous frame is less than the threshold and, if so, performing human body detection on the target person image frame through the human body detection network to obtain the target human body frame, and updating the current IOU;
    if not, taking as the target human body frame the expanded frame obtained by expanding the minimum bounding rectangle of the previous frame's human body key points by a preset number of pixels, and updating the current IOU.
  4. The human body posture recognition method according to claim 1, characterized in that the predicting of human body key points in the target human body frame using the preset key point prediction network, the preset key point prediction network including a reference posture estimation network and an interference part classification network, comprises:
    obtaining reference key points in the target human body frame through the reference posture estimation network;
    performing interference analysis on the human body torso in the target human body frame according to the interference part classification network to obtain an interference vector;
    performing a weighted calculation based on the reference key points and the interference vector to obtain the human body key points.
  5. The human body posture recognition method according to claim 1, characterized in that the recognizing of the human body posture in the target person image frame according to the human body key points to obtain the human body posture recognition result comprises:
    connecting the human body key points into a human skeleton graph based on a key point matching algorithm;
    recognizing the human body posture in the target person image frame according to the human skeleton graph to obtain the human body posture recognition result.
  6. A human body posture recognition device, characterized by comprising:
    a human body frame detection unit, used to perform human body detection on a target person image frame by combining a preset tracker and a human body detection network to obtain a target human body frame, wherein the preset tracker is a network that tracks a human body based on an IOU and a threshold;
    a key point prediction unit, used to predict human body key points in the target human body frame using a preset key point prediction network, wherein the preset key point prediction network includes a reference posture estimation network and an interference part classification network;
    a human body posture recognition unit, used to recognize the human body posture in the target person image frame according to the human body key points to obtain a human body posture recognition result.
  7. The human body posture recognition device according to claim 6, characterized by further comprising:
    an image frame acquisition unit, used to acquire multiple person image frames from a surveillance video;
    an image preprocessing unit, used to preprocess the person image frames to obtain the target person image frame, wherein the preprocessing includes cropping, mean subtraction and normalization.
  8. The human body posture recognition device according to claim 6, characterized in that the human body frame detection unit comprises:
    a first judgment subunit, used to determine, through the preset tracker, whether the historical IOU of the previous frame is less than the threshold and, if so, to perform human body detection on the target person image frame through the human body detection network to obtain the target human body frame and update the current IOU;
    a second judgment subunit, used, if not, to take as the target human body frame the expanded frame obtained by expanding the minimum bounding rectangle of the previous frame's human body key points by a preset number of pixels, and to update the current IOU.
  9. The human body posture recognition device according to claim 6, characterized in that the key point prediction unit comprises:
    a reference prediction subunit, used to obtain reference key points in the target human body frame through the reference posture estimation network;
    an interference analysis subunit, used to perform interference analysis on the human body torso in the target human body frame according to the interference part classification network to obtain an interference vector;
    a weighted calculation subunit, used to perform a weighted calculation based on the reference key points and the interference vector to obtain the human body key points.
  10. The human body posture recognition device according to claim 6, characterized in that the human body posture recognition unit comprises:
    a connecting subunit, used to connect the human body key points into a human skeleton graph based on a key point matching algorithm;
    a recognition subunit, used to recognize the human body posture in the target person image frame according to the human skeleton graph to obtain a human body posture recognition result.
PCT/CN2023/133598 2022-11-30 2023-11-23 Human body pose recognition method and device WO2024114500A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211520227.XA CN115909497A (en) 2022-11-30 2022-11-30 Human body posture recognition method and device
CN202211520227.X 2022-11-30

Publications (1)

Publication Number Publication Date
WO2024114500A1 (en) 2024-06-06

Family

ID=86487728

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/133598 WO2024114500A1 (en) 2022-11-30 2023-11-23 Human body pose recognition method and device

Country Status (2)

Country Link
CN (1) CN115909497A (en)
WO (1) WO2024114500A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115909497A (en) * 2022-11-30 2023-04-04 天翼数字生活科技有限公司 Human body posture recognition method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190205623A1 (en) * 2016-12-08 2019-07-04 Tencent Technology (Shenzhen) Company Limited Facial tracking method and apparatus, and storage medium
CN111723687A (en) * 2020-06-02 2020-09-29 北京的卢深视科技有限公司 Human body action recognition method and device based on neural network
CN113850221A (en) * 2021-09-30 2021-12-28 北京航空航天大学 Attitude tracking method based on key point screening
CN115359516A (en) * 2022-08-29 2022-11-18 功夫链(上海)体育文化发展有限公司 Method for estimating key point jitter by 2D human body posture
CN115909497A (en) * 2022-11-30 2023-04-04 天翼数字生活科技有限公司 Human body posture recognition method and device

Also Published As

Publication number Publication date
CN115909497A (en) 2023-04-04

Similar Documents

Publication Publication Date Title
Mahmood et al. Robust spatio-temporal features for human interaction recognition via artificial neural network
Kim et al. Simultaneous gesture segmentation and recognition based on forward spotting accumulative HMMs
CN102831439B (en) Gesture tracking method and system
CN111898504B (en) Target tracking method and system based on twin circulating neural network
WO2019023921A1 (en) Gesture recognition method, apparatus, and device
JP2018538631A (en) Method and system for detecting an action of an object in a scene
CN111598066A (en) Helmet wearing identification method based on cascade prediction
CN107832716B (en) Anomaly detection method based on active and passive Gaussian online learning
WO2024114500A1 (en) Human body pose recognition method and device
CN108830170A (en) A kind of end-to-end method for tracking target indicated based on layered characteristic
CN111105443A (en) Video group figure motion trajectory tracking method based on feature association
CN110807391A (en) Human body posture instruction identification method for human-unmanned aerial vehicle interaction based on vision
CN117541994A (en) Abnormal behavior detection model and detection method in dense multi-person scene
CN111639570B (en) Online multi-target tracking method based on motion model and single-target clue
CN112949569A (en) Effective extraction method of human body posture points for tumble analysis
Romaissa et al. Vision-based multi-modal framework for action recognition
CN110309729A (en) Tracking and re-detection method based on anomaly peak detection and twin network
Al-Obaidi et al. Temporal salience based human action recognition
CN117894065A (en) Multi-person scene behavior recognition method based on skeleton key points
CN112734800A (en) Multi-target tracking system and method based on joint detection and characterization extraction
CN117593794A (en) Improved YOLOv7-tiny model and human face detection method and system based on model
JP2007510994A (en) Object tracking in video images
Hu et al. Gesture detection from RGB hand image using modified convolutional neural network
CN111127355A (en) Method for finely complementing defective light flow graph and application thereof
CN117315767A (en) Dynamic gesture behavior recognition method and device based on AI recognition

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 23896651; Country of ref document: EP; Kind code of ref document: A1