CN113034580A

CN113034580A - Image information detection method and device and electronic equipment

Info

Publication number: CN113034580A
Application number: CN202110248659.9A
Authority: CN
Inventors: 罗宇轩; 朱泳明; 唐堂
Original assignee: Beijing Zitiao Network Technology Co Ltd
Current assignee: Beijing Zitiao Network Technology Co Ltd
Priority date: 2021-03-05
Filing date: 2021-03-05
Publication date: 2021-06-25
Anticipated expiration: 2041-03-05
Also published as: CN113034580B

Abstract

The embodiment of the disclosure discloses an image information detection method and device and electronic equipment. One specific implementation of the method is as follows: acquiring a target image to be processed and a historical image processed at the last moment; the historical image comprises position information of at least one historical key point of a target human body part; and determining the position information of at least one key point of the target human body part in the target image according to the position information of the at least one historical key point. The position information of the key points of the target human body part in the target image can be determined through the position information of the historical key points, so that the posture of the target human body part can be determined, the operation efficiency is improved, and the user experience is improved.

Description

Image information detection method and device and electronic equipment

Technical Field

The present disclosure relates to the field of internet technologies, and in particular, to an image information detection method and apparatus, and an electronic device.

Background

Human target detection has long been a focus of machine vision research. In the human target detection process, the current posture of the human body can be determined by detecting the key point information of the target human body part, which in turn provides basic information for applications such as intelligent monitoring and behavior analysis.

Disclosure of Invention

This disclosure is provided to introduce concepts in a simplified form that are further described below in the detailed description. This disclosure is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

The embodiment of the disclosure provides an image information detection method and device and electronic equipment.

In a first aspect, an embodiment of the present disclosure provides an image information detection method, where the method includes: acquiring a target image to be processed and a historical image processed at the last moment; the historical image comprises position information of at least one historical key point of a target human body part; and determining the position information of at least one key point of the target human body part in the target image according to the position information of the at least one historical key point.

In a second aspect, an embodiment of the present disclosure provides an image information detecting apparatus, including: the acquisition module is used for acquiring a target image to be processed and a historical image processed at the last moment; the historical image comprises position information of at least one historical key point of a target human body part; and the determining module is used for determining the position information of at least one key point of the target human body part in the target image according to the position information of the at least one historical key point.

In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; a storage device having one or more programs stored thereon, which when executed by the one or more processors, cause the one or more processors to implement the image information detection method of the first aspect.

In a fourth aspect, the disclosed embodiments provide a computer-readable medium, on which a computer program is stored, which when executed by a processor, implements the steps of the image information detection method described in the first aspect above.

According to the image information detection method, the image information detection device and the electronic equipment, the target image to be processed and the historical image processed at the last moment are obtained; the historical image comprises position information of at least one historical key point of a target human body part; and determining the position information of at least one key point of the target human body part in the target image according to the position information of the at least one historical key point. The position information of the key points of the target human body part in the target image can be determined through the position information of the historical key points, so that the posture of the target human body part can be determined, the operation efficiency is improved, and the user experience is improved.

Drawings

The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.

FIG. 1 is a flow diagram of one embodiment of an image information detection method according to the present disclosure;

FIG. 2 is a schematic flow chart diagram of another embodiment of an image information detection method according to the present disclosure;

FIG. 3 is a flow chart illustrating the training steps of the test model according to the present disclosure;

FIG. 4 is a schematic diagram of an embodiment of an image information detection apparatus according to the present disclosure;

FIG. 5 is an exemplary system architecture to which the image information detection method of one embodiment of the present disclosure may be applied;

FIG. 6 is a schematic diagram of a basic structure of an electronic device provided according to an embodiment of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" means "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.

It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.

It should be noted that the modifiers "a", "an", and "the" in this disclosure are illustrative rather than limiting, and those skilled in the art will understand that they mean "one or more" unless the context clearly dictates otherwise.

The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.

It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict.

Referring to fig. 1, which shows a flowchart of an embodiment of an image information detection method according to the present disclosure, as shown in fig. 1, the image information detection method includes the following steps 101 to 102.

Step 101, acquiring a target image to be processed and a historical image processed at the last moment; the historical image comprises position information of at least one historical key point of a target human body part;

The target human body part may include, for example, the head, a hand, a foot, and the like.

In some application scenarios, the target human body part included in the first frame image and the position information of at least one key point of the target human body part may be determined first. Thereafter, the first frame image may be determined as a history image of the second frame image, and the second frame image may be determined as a history image of the third frame image. By analogy, the previous frame image of each frame image can be determined as the history image corresponding to the frame image. The keypoints detected in the history image may be regarded as the above-described history keypoints. When the target human body part in the history image is a foot, the history key point may be, for example, a toe, an ankle, or the like.

In these application scenarios, the target human body part included in the image may be detected by, for example, a Single Shot MultiBox Detector (SSD), and the position information of at least one key point of the target human body part may be determined by a Convolutional Pose Machine (CPM).

The target image can be regarded as the image to be processed at the current moment. After the target image is acquired, the history image processed at the last moment may be acquired. Here, the history image processed at the last moment may be, for example, the already-processed frame immediately preceding the target image.

Step 102, determining the position information of at least one key point of the target human body part in the target image according to the position information of the at least one historical key point.

After the position information of at least one historical key point in the historical image is detected, the position information of a plurality of key points of the target human body part in the target image can be determined based on the position information of the historical key points.

After the position information of the key points is determined, the current posture of the target human body part can be determined. For example, after the position information of at least one historical key point of the foot in the historical image is obtained, the position information of at least one key point of the foot in the target image can be determined. If the key point positions indicate that the foot is currently in a normal foot-tilting posture, a shoe try-on special effect can be added to the foot, so that the user can see how the shoes would look when worn. This is convenient for a user who is deciding whether to purchase new shoes or who is creating special-effect images.
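As one illustration of how a posture could be derived from key points, the sketch below computes the angle of the ankle-to-toe line. The disclosure does not specify how the foot-tilting state is determined; the function name `foot_tilt_degrees`, the choice of ankle and toe as the two points, and the angle convention are all assumptions made here.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates, y pointing down


def foot_tilt_degrees(ankle: Point, toe: Point) -> float:
    """Return the angle of the ankle-to-toe line relative to the image x-axis.

    Hypothetical helper: 0 degrees means the toe lies directly to the right
    of the ankle; positive angles mean the toe is lower in the image.
    """
    dx = toe[0] - ankle[0]
    dy = toe[1] - ankle[1]
    return math.degrees(math.atan2(dy, dx))
```

A caller could compare the returned angle with an application-specific range to decide whether a special effect should be rendered; any such range would have to be chosen empirically.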

In the related art, when tracking a target human body part, it is generally necessary to determine a confidence that a local image of the target image, presumed to show the target human body part, actually shows it. At least one key point of the target human body part must then also be located. Two separate processes (determining the confidence and locating the key points) are thus required to decide whether the target human body part has been tracked, which makes the tracking process computationally inefficient.

In the embodiment, a target image to be processed and a history image processed at the last moment are obtained firstly; the historical image comprises position information of at least one historical key point of a target human body part; and then determining the position information of at least one key point of the target human body part in the target image according to the position information of the at least one historical key point. The position information of the key points of the target human body part in the target image can be determined through the position information of the historical key points, so that the posture of the target human body part can be determined, the operation efficiency is improved, and the user experience is improved.
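The frame-to-frame flow of steps 101 and 102 can be sketched as a simple loop. This is a minimal sketch rather than the disclosed implementation: `detect_first_frame` and `predict_from_history` are hypothetical callables standing in for the initial detection (for example SSD plus CPM) and for the history-based key point determination.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) position of one key point
Frame = object               # placeholder type for an image frame


def track_key_points(
    frames: Iterable[Frame],
    detect_first_frame: Callable[[Frame], List[Point]],
    predict_from_history: Callable[[Frame, List[Point]], List[Point]],
) -> Iterator[List[Point]]:
    """Yield key points per frame, using the previous frame as the history image."""
    history_points: Optional[List[Point]] = None
    for frame in frames:
        if history_points is None:
            # First frame: no history image exists yet, so detect from scratch.
            points = detect_first_frame(frame)
        else:
            # Later frames: use the historical key points (step 102).
            points = predict_from_history(frame, history_points)
        yield points
        # The frame just processed becomes the history image of the next frame.
        history_points = points
```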

Referring to fig. 2, which shows a flowchart of another embodiment of the image information detection method according to the present disclosure, as shown in fig. 2, the image information detection method includes the following steps 201 to 203.

Step 201, acquiring a target image to be processed and a historical image processed at the last moment; the historical image comprises position information of at least one historical key point of the target human body part.

The implementation process of step 201 and the technical effects thereof may be the same as or similar to step 101 in the embodiment shown in fig. 1, and are not described herein again.

Step 202, determining a target area image in the historical image according to the position information of the at least one historical key point;

After the position information of each historical key point in the historical image is determined, the target area image in the historical image may be determined based on that position information. The target area image here can be used to represent the target human body part.

In some alternative implementations, the step 202 may include the following steps:

step 2021, determining a minimum region image containing the at least one historical keypoint;

In some application scenarios, the smallest local image that contains the plurality of historical key points may be determined as the minimum area image. The minimum area image here may be, for example, the smallest rectangular area image that contains these historical key points.

Step 2022, determining the target area image in the history image based on the minimum area image.

After the minimum area image is determined, the target area image may be determined. In some application scenarios, the moving range of the target human body part between adjacent time points may be considered to be small, so that the minimum area image may be enlarged to obtain a target area image in which the target human body part may exist. For example, the minimum rectangular region image may be enlarged by 2 times to obtain a target region image in which a target human body part may exist.

Steps 2021 and 2022 provide one way of determining the target area image that is both simple and comparatively accurate.
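Steps 2021 and 2022 can be sketched as follows. This is a minimal sketch under stated assumptions: the rectangle is axis-aligned, "enlarged by 2 times" is read as scaling width and height about the rectangle center, and the clamping to image bounds in `clamp_box` is an added convenience not described in the disclosure. All function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def minimum_box(points: List[Point]) -> Box:
    """Step 2021: smallest axis-aligned rectangle containing all key points."""
    if not points:
        raise ValueError("at least one historical key point is required")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def enlarge_box(box: Box, scale: float = 2.0) -> Box:
    """Step 2022: scale the rectangle's width and height about its center."""
    x_min, y_min, x_max, y_max = box
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    half_w = (x_max - x_min) * scale / 2
    half_h = (y_max - y_min) * scale / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def clamp_box(box: Box, width: float, height: float) -> Box:
    """Assumed extra step: keep the enlarged rectangle inside the image."""
    x_min, y_min, x_max, y_max = box
    return (max(0, x_min), max(0, y_min), min(width, x_max), min(height, y_max))
```

For example, key points at (2, 3), (4, 7) and (3, 5) give the minimum rectangle (2, 3, 4, 7), which a scale of 2 enlarges to (1, 1, 5, 9).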

Step 203, determining the position information of at least one key point of the target human body part in the target image based on the target area image.

After the target area image is determined, the position information of a plurality of key points of the target human body part in the target image at the next moment can be predicted.

In the related art, determining the confidence and locating the key points are generally separate, self-contained processes, so an operation of determining a local image has to be performed both when determining the confidence and when locating the key points, which results in low operation efficiency.

In this embodiment, a target area image (local image) is determined first, and the position information of the key points in the target image is obtained from that target area image. The two processes are thereby merged into a single process, which improves operation efficiency.

In some optional implementations, the step 203 includes: and inputting the target area image into a pre-trained detection model, and determining the position information of at least one key point of the target human body part in the target image according to a detection result output by the detection model.

The detection model can predict the position information of at least one key point of the target human body part at the next moment based on the target area image. In some application scenarios, the detection model may be similar to the CPM described above, for example.

In some optional implementations, the detection result includes a target part confidence, position information of at least one key point, and a position confidence corresponding to the at least one key point.

The target part confidence may be regarded as the probability that the target human body part is detected in the target area image. The probability may be, for example, 0.7, 0.8, etc.

The position information of the at least one key point may represent the position of each key point of the target human body part in the target area image.

The position confidence corresponding to the at least one keypoint may be regarded as the probability that each point really represents the corresponding keypoint of the target human body part in the target area image. The probability may include, for example, 0.8, 0.9, 0.7, 0.9, etc.
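Because the detected positions are expressed relative to the target area image, they generally need to be shifted back into the coordinate system of the full target image. The sketch below shows that shift. It assumes the target area image was cropped without resizing and that `region_origin` is the top-left corner of the crop; both the function name and these conventions are assumptions, not details given in the disclosure.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def to_image_coordinates(
    local_points: List[Point], region_origin: Point
) -> List[Point]:
    """Shift key points from target-area coordinates to full-image coordinates."""
    origin_x, origin_y = region_origin
    return [(x + origin_x, y + origin_y) for x, y in local_points]
```

If the crop were resized before being passed to the detection model, the local coordinates would additionally have to be divided by the resize factor before the shift.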

Thus, the above-mentioned determining the position information of at least one key point of the target human body part in the target image according to the detection result output by the detection model includes: firstly, judging whether the confidence of the target part is greater than a first preset confidence and whether the position confidence corresponding to the at least one key point is greater than a second preset confidence;

After the detection model outputs the target part confidence, the position information of the at least one key point, and the position confidence corresponding to the at least one key point, it may be determined whether the target part confidence is greater than the first preset confidence and whether the position confidence corresponding to the at least one key point is greater than the second preset confidence.

The first preset confidence may be, for example, 0.95, 0.9, or the like, that is, a value at which the local image can substantially be regarded as containing the target human body part.

The second preset confidence may be, for example, 0.92, 0.9, etc., that is, a value at which the detected key point positions can be regarded as sufficiently reliable.

In some application scenarios, when determining whether the position confidence corresponding to the at least one key point is greater than the second preset confidence, the average of the position confidences corresponding to the plurality of key points may be calculated, or their minimum may be determined. The average or the minimum is then compared with the second preset confidence.

Then, if both conditions are satisfied, the position information of the at least one key point is determined as the position information of the at least one key point of the target human body part in the target image.

When it is determined that the target part confidence is greater than the first preset confidence and the position confidence corresponding to the at least one key point is greater than the second preset confidence, the position information of the at least one key point output alongside them can be determined as the position information of the corresponding key points in the target image.

Through the judgment process of the detection result, the output position information of the key points can be closer to the real position information of each key point in the target image.
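The judgment described above can be sketched as a single function. This is a minimal sketch: the default thresholds are taken from the example values in the text, the `aggregate` parameter selects between the average and the minimum, and returning `None` when the check fails is an assumption, since the disclosure does not state what happens in that case. The function name is hypothetical.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def accept_detection(
    part_confidence: float,
    points: Sequence[Point],
    position_confidences: Sequence[float],
    first_threshold: float = 0.9,
    second_threshold: float = 0.9,
    aggregate: str = "mean",
) -> Optional[List[Point]]:
    """Return the key points if both confidence checks pass, else None."""
    if not position_confidences:
        return None
    if aggregate == "mean":
        summary = mean(position_confidences)
    elif aggregate == "min":
        summary = min(position_confidences)
    else:
        raise ValueError("aggregate must be 'mean' or 'min'")
    # Both comparisons are strict, matching "greater than" in the text.
    if part_confidence > first_threshold and summary > second_threshold:
        return list(points)
    return None
```

Using the minimum is the stricter choice: a single poorly localized key point is enough to reject the whole detection, whereas the average tolerates one weak point if the others are strong.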

In some optional implementations, the initial detection model corresponding to the detection model includes an initial first task network, an initial second task network, and an initial third task network;

In some application scenarios, the detection model may be obtained by training the initial detection model to convergence. In these application scenarios, the initial detection model may be trained by a multi-task deep learning approach. That is, a first task of determining the target part confidence and a second task of determining the position information and position confidence of the key points (i.e., a task of locating the key points) may be set. In this way, the initial second task network and the initial third task network described above can share the output of the initial first task network, and the two tasks can reinforce each other through the correlation between them.

Referring to fig. 3, a flowchart illustrating a training step of the detection model according to the present disclosure is shown, and as shown in fig. 3, the training step includes the following steps 301 to 304.

Step 301, obtaining a sample image for training;

In some application scenarios, the sample images may include images in which the target human body part is presented and images in which it is not. Increasing the diversity of the sample images improves the ability of the detection model to judge whether the target human body part is present in an image.

Step 302, taking the sample image as an input of an initial first task network, and respectively inputting a local image output by the initial first task network into an initial second task network and an initial third task network;

The sample image is input into the initial first task network, which outputs the local image. When the target human body part is present in the sample image, the local image may be an image of the area in which the target human body part is present.

After the initial first task network outputs the partial image, the partial image may be used as an input for the initial second task network and the initial third task network.

Step 303, calculating a target part confidence coefficient of the target human body part detected in the sample image by using the initial second task network;

After the initial second task network receives the local image, it can calculate the probability that the target human body part is present in the local image (i.e., the target part confidence).

Step 304, detecting the position information of at least one key point in the sample image by using the initial third task network, and detecting the position confidence corresponding to the at least one key point, so as to train the initial detection model into the detection model.

After the initial third task network receives the local image, the position information of at least one key point of the target human body part presented in the local image can be detected, and the position confidence corresponding to each key point is calculated. Here, the initial third task network may detect the position information of at least one key point and the position confidence corresponding to each key point based on a heat map (heatmap) corresponding to the local image, for example.
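One common way to read key points out of heat maps, sketched below, is to take the location of each heat map's peak as the key point position and the peak value as its position confidence. The disclosure only states that a heat map may be used, so this decoding rule, the one-heat-map-per-key-point layout, and the function name `decode_heatmaps` are assumptions.

```python
from typing import List, Sequence, Tuple

Heatmap = Sequence[Sequence[float]]  # heatmap[row][col], one map per key point


def decode_heatmaps(
    heatmaps: Sequence[Heatmap],
) -> List[Tuple[Tuple[int, int], float]]:
    """Return ((x, y), confidence) for each heat map, using its peak value."""
    results = []
    for heatmap in heatmaps:
        best_score = float("-inf")
        best_xy = (0, 0)
        for row_index, row in enumerate(heatmap):
            for col_index, score in enumerate(row):
                # Strict comparison keeps the first peak when values tie.
                if score > best_score:
                    best_score = score
                    best_xy = (col_index, row_index)  # x is column, y is row
        results.append((best_xy, best_score))
    return results
```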

In this way, the initial second task network and the initial third task network both use the local image output by the initial first task network. Neither network needs to determine a local image of its own during calculation, which saves the time that would otherwise be spent determining local images twice. In addition, because the initial second task network and the initial third task network are merged into the same model, the detection model obtained by training the initial detection model to convergence can output the target part confidence, the position information of at least one key point, and the position confidence corresponding to the at least one key point.
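The sharing of the first task network's output can be sketched structurally as follows. This is a schematic sketch only: the three networks are represented by plain callables rather than real neural networks, and the class name `DetectionModel` and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]


@dataclass
class DetectionModel:
    """Three task networks, with the first network's output shared by the other two."""

    first_network: Callable[[object], object]   # image -> local image
    second_network: Callable[[object], float]   # local image -> target part confidence
    third_network: Callable[[object], Tuple[List[Point], List[float]]]
    # third_network: local image -> (key point positions, position confidences)

    def __call__(self, image: object) -> Tuple[float, List[Point], List[float]]:
        # The local image is computed once and reused by both downstream networks.
        local_image = self.first_network(image)
        part_confidence = self.second_network(local_image)
        points, position_confidences = self.third_network(local_image)
        return part_confidence, points, position_confidences
```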

In some optional implementations, the training step of the detection model may further include:

Step 305, taking the expected target part confidence of the target human body part detected in the sample image as a first expected output of the initial detection model; taking the real position information of at least one key point of the target human body part presented in the sample image as a second expected output of the initial detection model; and taking the expected position confidence corresponding to the at least one key point as a third expected output of the initial detection model;

The expected target part confidence may be 1, for example. The expected position confidence may be 1, for example. By supervising the initial detection model with these three expected outputs, its outputs can be driven as close as possible to the expected outputs, which improves the prediction accuracy of the detection model.

Step 306, training the initial detection model to converge based on the confidence of the target portion, the position information of the at least one key point, the position confidence corresponding to the at least one key point, the first expected output, the second expected output, and the third expected output.

By running detection on a plurality of sample images, the initial detection model can be trained over many iterations, so that its output detection results approach the first expected output, the second expected output and the third expected output as closely as possible, until the initial detection model converges. The converged detection model can output more accurate predictions, so that when a target area image is input, the target part confidence, the position information of at least one key point, and the position confidence corresponding to the at least one key point in the target image can be predicted more accurately.
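A training objective that pulls the three outputs toward the three expected outputs could look like the following. The disclosure does not name a loss function, so the choice of binary cross-entropy for the two confidences, mean squared error for the positions, and the equal default weights are all assumptions, as are the function names.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def binary_cross_entropy(predicted: float, expected: float, eps: float = 1e-7) -> float:
    """Cross-entropy between a predicted probability and an expected value in [0, 1]."""
    p = min(max(predicted, eps), 1.0 - eps)  # avoid log(0)
    return -(expected * math.log(p) + (1.0 - expected) * math.log(1.0 - p))


def mean_squared_error(predicted: Sequence[Point], expected: Sequence[Point]) -> float:
    """Mean squared Euclidean distance between predicted and real key points."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("key point lists must be non-empty and of equal length")
    total = sum((px - ex) ** 2 + (py - ey) ** 2
                for (px, py), (ex, ey) in zip(predicted, expected))
    return total / len(expected)


def multitask_loss(
    part_confidence: float,
    points: Sequence[Point],
    position_confidences: Sequence[float],
    expected_part_confidence: float,
    expected_points: Sequence[Point],
    expected_position_confidences: Sequence[float],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """Weighted sum of the three per-task losses (assumed formulation)."""
    if len(position_confidences) != len(expected_position_confidences):
        raise ValueError("confidence lists must be of equal length")
    part_term = binary_cross_entropy(part_confidence, expected_part_confidence)
    position_term = mean_squared_error(points, expected_points)
    confidence_term = sum(
        binary_cross_entropy(p, e)
        for p, e in zip(position_confidences, expected_position_confidences)
    ) / len(expected_position_confidences)
    w1, w2, w3 = weights
    return w1 * part_term + w2 * position_term + w3 * confidence_term
```

In an actual training loop this scalar would be minimized by gradient descent in a deep learning framework; the pure-Python form here only illustrates how the three expected outputs enter the objective.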

Referring to fig. 4, which shows a schematic structural diagram of an embodiment of the image information detection apparatus according to the present disclosure, as shown in fig. 4, the image information detection apparatus includes an acquisition module 401 and a determining module 402. The acquisition module 401 is configured to acquire a target image to be processed and a history image processed at the last moment; the historical image comprises position information of at least one historical key point of a target human body part. The determining module 402 is configured to determine, according to the position information of the at least one historical key point, the position information of at least one key point of the target human body part in the target image.

It should be noted that, for the specific processing of the acquisition module 401 and the determining module 402 of the image information detection apparatus and the technical effects thereof, reference may be made to the related descriptions of step 101 to step 102 in the embodiment corresponding to fig. 1, which are not repeated herein.

In some optional implementations of this embodiment, the determining module 402 is further configured to: determining a target area image in the historical image according to the position information of the at least one historical key point; and determining the position information of at least one key point of the target human body part in the target image based on the target area image.

In some optional implementations of this embodiment, the determining module 402 is further configured to: and inputting the target area image into a pre-trained detection model, and determining the position information of at least one key point of the target human body part in the target image according to a detection result output by the detection model.

In some optional implementations of this embodiment, the detection result includes a confidence of the target portion, location information of at least one keypoint, and a location confidence corresponding to the at least one keypoint; and the determining module 402 is further configured to: judging whether the confidence of the target part is greater than a first preset confidence and whether the position confidence corresponding to the at least one key point is greater than a second preset confidence; and if so, determining the position information of the at least one key point as the position information of the at least one key point of the target human body part in the target image.

In some optional implementation manners of this embodiment, the initial detection model corresponding to the detection model includes an initial first task network, an initial second task network, and an initial third task network; and the detection model is trained based on the following steps: acquiring a sample image for training; taking the sample image as the input of an initial first task network, and respectively inputting the local image output by the initial first task network into an initial second task network and an initial third task network; calculating a target part confidence coefficient of the target human body part detected in the sample image by using the initial second task network; and detecting the position information of at least one key point in the sample image by using the initial third task network, and detecting the position confidence corresponding to the at least one key point so as to train the initial detection model into the detection model.

In some optional implementations of this embodiment, the training step of the detection model further includes: taking the expected target part confidence of the target human body part detected in the sample image as a first expected output of the initial detection model; taking the real position information of at least one key point of the target human body part presented in the sample image as a second expected output of the initial detection model; taking the expected position confidence corresponding to the at least one key point as a third expected output of the initial detection model; and training the initial detection model to converge based on the target part confidence, the position information of the at least one key point, the position confidence corresponding to the at least one key point, the first expected output, the second expected output, and the third expected output.

In some optional implementations of this embodiment, the determining module 402 is further configured to: determining a minimum region image containing the at least one historical keypoint; and determining a target area image in the historical image based on the minimum area image.

Referring to fig. 5, an exemplary system architecture to which the image information detection method of one embodiment of the present disclosure may be applied is shown.

As shown in fig. 5, the system architecture may include terminal devices 501, 502, 503, a network 504, and a server 505. The network 504 serves to provide a medium for communication links between the terminal devices 501, 502, 503 and the server 505. The network 504 may include various connection types, such as wired or wireless communication links, or fiber optic cables. The terminal devices and servers described above may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.

The terminal devices 501, 502, 503 may interact with the server 505 over the network 504 to receive or send messages or the like. The terminal devices 501, 502, 503 may have various client applications installed thereon, such as a video distribution application, a search application, and a news application.

The terminal devices 501, 502, 503 may be hardware or software. When the terminal devices 501, 502, 503 are hardware, they may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (MPEG-1 Audio Layer III), MP4 players (MPEG-4 Part 14), laptop computers, desktop computers, and the like. When the terminal devices 501, 502, 503 are software, they can be installed in the electronic devices listed above. They may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services) or as a single piece of software or software module. This is not particularly limited herein.

The server 505 may be a server that provides various services. For example, it may receive an image acquisition request transmitted by the terminal devices 501, 502, 503, analyze and process the image acquisition request, and transmit the processing result (for example, image data corresponding to the acquisition request) to the terminal devices 501, 502, 503.

It should be noted that the image information detection method provided by the embodiment of the present disclosure may be executed by a server or a terminal device, and accordingly, the image information detection apparatus may be disposed in the server or the terminal device.

It should be understood that the number of terminal devices, networks, and servers in fig. 5 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring now to fig. 6, a schematic diagram of an electronic device (e.g., the server or terminal device of fig. 5) suitable for implementing embodiments of the present disclosure is shown. The electronic device shown in fig. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.

As shown in fig. 6, the electronic device may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage device 608 into a Random Access Memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 608 including, for example, magnetic tape, hard disk, etc.; and a communication device 609. The communication device 609 may allow the electronic device to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device having various devices, it is to be understood that not all of the illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 609, or may be installed from the storage device 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.

It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.

The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a target image to be processed and a historical image processed at the last moment, wherein the historical image comprises position information of at least one historical key point of a target human body part; and determine the position information of at least one key point of the target human body part in the target image according to the position information of the at least one historical key point.
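The program steps above, together with the confidence check recited for the detection result, can be sketched as below. The `detect` callable stands in for the pre-trained detection model applied to the region derived from the historical key points. That callable, the `Detection` type, the threshold values of 0.5, the requirement that every key point pass the check, and returning `None` on rejection are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Detection:
    part_confidence: float             # target part confidence
    keypoints: List[Point]             # detected key point positions
    keypoint_confidences: List[float]  # position confidence per key point


def is_reliable(
    det: Detection,
    part_threshold: float = 0.5,       # assumed first preset confidence
    keypoint_threshold: float = 0.5,   # assumed second preset confidence
) -> bool:
    """True when both confidence checks pass."""
    return det.part_confidence > part_threshold and all(
        c > keypoint_threshold for c in det.keypoint_confidences
    )


def keypoints_for_target(
    history_keypoints: List[Point],
    detect: Callable[[List[Point]], Detection],  # hypothetical model wrapper
) -> Optional[List[Point]]:
    """Use the historical key points to obtain key points for the target image."""
    det = detect(history_keypoints)
    # Accept the detected positions only when the detection is reliable.
    return det.keypoints if is_reliable(det) else None
```

A caller that receives `None` would need some fallback, for example running detection on the full target image; the passage above does not say what that fallback is.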

Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present disclosure may be implemented by software or hardware. In some cases, the name of a module does not limit the module itself; for example, the obtaining module 401 may also be described as "a module for obtaining a target image to be processed and a historical image processed at the last moment, wherein the historical image comprises position information of at least one historical key point of a target human body part".

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The foregoing description covers only the preferred embodiments of the disclosure and illustrates the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure. For example, a technical solution may be formed by replacing the above features with (but not limited to) features having similar functions disclosed in this disclosure.

Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims

1. An image information detection method, comprising:

acquiring a target image to be processed and a historical image processed at the last moment; the historical image comprises position information of at least one historical key point of a target human body part;

and determining the position information of at least one key point of the target human body part in the target image according to the position information of the at least one historical key point.

2. The method according to claim 1, wherein the determining the position information of the at least one keypoint of the target human body part in the target image according to the position information of the at least one historical keypoint comprises:

determining a target area image in the historical image according to the position information of the at least one historical key point;

and determining the position information of at least one key point of the target human body part in the target image based on the target area image.

3. The method according to claim 2, wherein the determining the position information of at least one key point of the target human body part in the target image based on the target area image comprises:

and inputting the target area image into a pre-trained detection model, and determining the position information of at least one key point of the target human body part in the target image according to a detection result output by the detection model.

4. The method of claim 3, wherein the detection result comprises a target part confidence, position information of at least one key point, and a position confidence corresponding to the at least one key point; and

the determining the position information of at least one key point of the target human body part in the target image according to the detection result output by the detection model comprises the following steps:

judging whether the target part confidence is greater than a first preset confidence and whether the position confidence corresponding to the at least one key point is greater than a second preset confidence;

and if so, determining the position information of the at least one key point as the position information of the at least one key point of the target human body part in the target image.

5. The method of claim 3, wherein the initial detection model corresponding to the detection model comprises an initial first task network, an initial second task network, and an initial third task network; and

the detection model is trained based on the following steps:

acquiring a sample image for training;

taking the sample image as the input of an initial first task network, and respectively inputting the local image output by the initial first task network into an initial second task network and an initial third task network;

calculating a target part confidence coefficient of the target human body part detected in the sample image by using the initial second task network;

and detecting the position information of at least one key point in the sample image by using the initial third task network, and detecting the position confidence corresponding to the at least one key point so as to train the initial detection model into the detection model.

6. The method of claim 5, wherein the training step of the detection model further comprises:

determining a first expected output of the initial detection model based on the target part confidence of the detected target human body part; and

taking the real position information of at least one key point of the target human body part presented in the sample image as a second expected output of the initial detection model; and

taking the expected position confidence corresponding to the at least one key point as a third expected output of the initial detection model;

training the initial detection model to convergence based on the target part confidence, the position information of the at least one key point, the position confidence corresponding to the at least one key point, the first expected output, the second expected output, and the third expected output.

7. The method according to claim 2, wherein the determining the target area image in the history image according to the position information of the at least one history key point comprises:

determining a minimum region image containing the at least one historical keypoint;

and determining a target area image in the historical image based on the minimum area image.

8. An image information detection apparatus, characterized by comprising:

the acquisition module is used for acquiring a target image to be processed and a historical image processed at the last moment; the historical image comprises position information of at least one historical key point of a target human body part;

and the determining module is used for determining the position information of at least one key point of the target human body part in the target image according to the position information of the at least one historical key point.

9. An electronic device, comprising:

one or more processors;

storage means having one or more programs stored thereon which, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.

10. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-7.