
WO2022009419A1 - Learning device, utilization device, program, learning method, and utilization method - Google Patents


Info

Publication number
WO2022009419A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
subject
target
thermal image
visible
Prior art date
Application number
PCT/JP2020/027027
Other languages
French (fr)
Japanese (ja)
Inventor
康平 栗原 (Kohei Kurihara)
大祐 鈴木 (Daisuke Suzuki)
Original Assignee
三菱電機株式会社 (Mitsubishi Electric Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 三菱電機株式会社 (Mitsubishi Electric Corporation)
Priority to PCT/JP2020/027027 (WO2022009419A1)
Priority to JP2020552066A (JP6797344B1)
Publication of WO2022009419A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis

Definitions

  • This disclosure relates to learning devices, utilization devices, programs, learning methods, and utilization methods.
  • A general thermal infrared solid-state image sensor (hereinafter referred to as a thermal image sensor) visualizes incident infrared rays emitted by a subject; the difference in temperature rise caused by absorbing the infrared rays becomes the shading of the image. The infrared rays emitted by the subject are focused by a lens and imaged on the image sensor.
  • A thermal image sensor, which can acquire thermal information, can acquire information that cannot be acquired by a visible camera. However, with an inexpensive small sensor, image quality such as resolution, contrast, contour sharpness, or SN ratio is low, and a thermal image sensor formed with a large sensor is expensive.
  • There is a technique for generating a trained model by inputting a visible image or a distance image together with posture information (the correct answer) of a subject, and estimating posture information from a visible image or a distance image using the generated trained model (see, for example, Patent Document 1).
  • Patent Document 1 describes a posture estimation device that estimates a posture from a visible image or a distance image.
  • However, a thermal image has a problem in that image quality such as resolution or SN ratio is lower than that of a visible image or a distance image, so posture estimation is not easy.
  • Accordingly, one or more aspects of the present disclosure are intended to enable highly accurate estimation of the posture of a subject in a thermal image.
  • A learning device according to one aspect of the present disclosure is characterized by including a data acquisition unit that acquires learning data including a thermal image obtained by imaging the temperature distribution of a subject using infrared rays emitted from the subject and a visible image obtained by imaging the subject using visible light reflected from the subject, and a model generation unit that generates, by learning inference from the thermal image to the visible image using the learning data, a trained model for inferring the visible image from the thermal image.
  • A learning device according to another aspect of the present disclosure is characterized by including a data acquisition unit that acquires learning data including a thermal image obtained by imaging the temperature distribution of a subject using infrared rays emitted from the subject, a visible image obtained by imaging the subject using visible light reflected from the subject, and posture information indicating the posture of the subject, and a model generation unit that generates, by learning inference from the thermal image to the visible image and inference from the visible image to the posture using the learning data, a trained model for inferring the posture from the thermal image.
  • A learning device according to yet another aspect of the present disclosure is characterized by including a data acquisition unit that acquires learning data including a thermal image obtained by imaging the temperature distribution of a subject using infrared rays emitted from the subject and a visible image obtained by imaging the subject using visible light reflected from the subject, a contour extraction unit that extracts a contour image showing the contour of the subject from the thermal image, and a model generation unit that generates, by learning inference from the combination of the thermal image and the contour image to the visible image, a trained model for inferring the visible image from the combination of the thermal image and the contour image.
  • A utilization device according to one aspect of the present disclosure is characterized by including a storage unit that stores a trained model for inferring a visible image from a thermal image, the trained model being generated by learning inference from the thermal image to the visible image using learning data including a thermal image obtained by imaging the temperature distribution of a subject using infrared rays emitted from the subject and a visible image obtained by imaging the subject using visible light reflected from the subject; a data acquisition unit that acquires target thermal image data indicating a target thermal image, which is a thermal image of a target subject; an inference unit that infers, using the trained model, a target visible image, which is a visible image of the target subject, from the target thermal image; and a posture estimation unit that estimates the posture of the target subject from the target visible image.
  • A utilization device according to another aspect of the present disclosure is characterized by including a storage unit that stores a trained model for inferring a posture from a thermal image, the trained model being generated by learning inference from the thermal image to the visible image and inference from the visible image to the posture using learning data including a thermal image obtained by imaging the temperature distribution of a subject using infrared rays emitted from the subject, a visible image obtained by imaging the subject using visible light reflected from the subject, and posture information indicating the posture of the subject; a data acquisition unit that acquires target thermal image data indicating a target thermal image, which is a thermal image of a target subject; and an inference unit that infers, using the trained model, the posture of the target subject from the target thermal image.
  • A utilization device according to yet another aspect of the present disclosure is characterized by including a storage unit that stores a trained model for inferring a visible image from a combination of a thermal image and a contour image, the trained model being generated by learning inference from the combination of the thermal image and the contour image to the visible image using learning data including a thermal image obtained by imaging the temperature distribution of a subject using infrared rays emitted from the subject, a visible image obtained by imaging the subject using visible light reflected from the subject, and a contour image showing the contour of the subject extracted from the thermal image; a data acquisition unit that acquires target thermal image data indicating a target thermal image, which is a thermal image of a target subject; a contour extraction unit that extracts, from the target thermal image, a target contour image, which is a contour image showing the contour of the target subject; an inference unit that infers, using the trained model, a target visible image, which is a visible image of the target subject, from the combination of the target thermal image and the target contour image; and a posture estimation unit that estimates the posture of the target subject from the target visible image.
  • A program according to one aspect of the present disclosure is characterized by causing a computer to function as a data acquisition unit that acquires learning data including a thermal image obtained by imaging the temperature distribution of a subject using infrared rays emitted from the subject and a visible image obtained by imaging the subject using visible light reflected from the subject, and as a model generation unit that generates, by learning inference from the thermal image to the visible image using the learning data, a trained model for inferring the visible image from the thermal image.
  • A program according to another aspect of the present disclosure is characterized by causing a computer to function as a data acquisition unit that acquires learning data including a thermal image obtained by imaging the temperature distribution of a subject using infrared rays emitted from the subject, a visible image obtained by imaging the subject using visible light reflected from the subject, and posture information indicating the posture of the subject, and as a model generation unit that generates, by learning inference from the thermal image to the visible image and inference from the visible image to the posture using the learning data, a trained model for inferring the posture from the thermal image.
  • A program according to yet another aspect of the present disclosure is characterized by causing a computer to function as a data acquisition unit that acquires learning data including a thermal image obtained by imaging the temperature distribution of a subject using infrared rays emitted from the subject and a visible image obtained by imaging the subject using visible light reflected from the subject, as a contour extraction unit that extracts a contour image showing the contour of the subject from the thermal image, and as a model generation unit that generates, by learning inference from the combination of the thermal image and the contour image to the visible image, a trained model for inferring the visible image from the combination of the thermal image and the contour image.
  • A program according to yet another aspect of the present disclosure is characterized by causing a computer to function as a storage unit that stores a trained model for inferring a visible image from a thermal image, the trained model being generated by learning inference from the thermal image to the visible image using learning data including a thermal image obtained by imaging the temperature distribution of a subject using infrared rays emitted from the subject and a visible image obtained by imaging the subject using visible light reflected from the subject; as a data acquisition unit that acquires target thermal image data indicating a target thermal image, which is a thermal image of a target subject; as an inference unit that infers, using the trained model, a target visible image, which is a visible image of the target subject, from the target thermal image; and as a posture estimation unit that estimates the posture of the target subject from the target visible image.
  • A program according to yet another aspect of the present disclosure is characterized by causing a computer to function as a storage unit that stores a trained model for inferring a posture from a thermal image, the trained model being generated by learning inference from the thermal image to the visible image and inference from the visible image to the posture using learning data including a thermal image obtained by imaging the temperature distribution of a subject using infrared rays emitted from the subject, a visible image obtained by imaging the subject using visible light reflected from the subject, and posture information indicating the posture of the subject; as a data acquisition unit that acquires target thermal image data indicating a target thermal image, which is a thermal image of a target subject; and as an inference unit that infers, using the trained model, the posture of the target subject from the target thermal image.
  • A program according to yet another aspect of the present disclosure is characterized by causing a computer to function as a storage unit that stores a trained model for inferring a visible image from a combination of a thermal image and a contour image, the trained model being generated by learning inference from the combination of the thermal image and the contour image to the visible image using learning data including a thermal image obtained by imaging the temperature distribution of a subject using infrared rays emitted from the subject, a visible image obtained by imaging the subject using visible light reflected from the subject, and contour image data showing a contour image showing the contour of the subject extracted from the thermal image; as a data acquisition unit that acquires target thermal image data indicating a target thermal image, which is a thermal image of a target subject; as a contour extraction unit that extracts, from the target thermal image, a target contour image, which is a contour image showing the contour of the target subject; as an inference unit that infers, using the trained model, a target visible image, which is a visible image of the target subject, from the combination of the target thermal image and the target contour image; and as a posture estimation unit that estimates the posture of the target subject from the target visible image.
  • A learning method according to one aspect of the present disclosure is characterized by acquiring learning data including a thermal image obtained by imaging the temperature distribution of a subject using infrared rays emitted from the subject and a visible image obtained by imaging the subject using visible light reflected from the subject, and generating, by learning inference from the thermal image to the visible image using the learning data, a trained model for inferring the visible image from the thermal image.
  • A learning method according to another aspect of the present disclosure is characterized by acquiring learning data including a thermal image obtained by imaging the temperature distribution of a subject using infrared rays emitted from the subject, a visible image obtained by imaging the subject using visible light reflected from the subject, and posture information indicating the posture of the subject, and generating, by learning inference from the thermal image to the visible image and inference from the visible image to the posture using the learning data, a trained model for inferring the posture from the thermal image.
  • A learning method according to yet another aspect of the present disclosure is characterized by acquiring learning data including a thermal image obtained by imaging the temperature distribution of a subject using infrared rays emitted from the subject and a visible image obtained by imaging the subject using visible light reflected from the subject, extracting a contour image showing the contour of the subject from the thermal image, and generating, by learning inference from the combination of the thermal image and the contour image to the visible image, a trained model for inferring the visible image from the combination of the thermal image and the contour image.
  • A utilization method according to one aspect of the present disclosure is characterized by acquiring target thermal image data indicating a target thermal image, which is a thermal image of a target subject; inferring, from the target thermal image, a target visible image, which is a visible image of the target subject, using a trained model for inferring a visible image from a thermal image, the trained model being generated by learning inference from the thermal image to the visible image using learning data including a thermal image obtained by imaging the temperature distribution of a subject using infrared rays emitted from the subject and a visible image obtained by imaging the subject using visible light reflected from the subject; and estimating the posture of the target subject from the target visible image.
  • A utilization method according to another aspect of the present disclosure is characterized by acquiring target thermal image data indicating a target thermal image, which is a thermal image of a target subject, and inferring the posture of the target subject from the target thermal image using a trained model for inferring a posture from a thermal image, the trained model being generated by learning inference from the thermal image to the visible image and inference from the visible image to the posture using learning data including a thermal image obtained by imaging the temperature distribution of a subject using infrared rays emitted from the subject, a visible image obtained by imaging the subject using visible light reflected from the subject, and posture information indicating the posture of the subject.
  • A utilization method according to yet another aspect of the present disclosure is characterized by acquiring target thermal image data indicating a target thermal image, which is a thermal image of a target subject; extracting, from the target thermal image, a target contour image, which is a contour image showing the contour of the target subject; inferring, from the combination of the target thermal image and the target contour image, a target visible image, which is a visible image of the target subject, using a trained model for inferring a visible image from a combination of a thermal image and a contour image, the trained model being generated by learning inference from the combination to the visible image using learning data including a thermal image obtained by imaging the temperature distribution of a subject using infrared rays emitted from the subject and a visible image obtained by imaging the subject using visible light reflected from the subject; and estimating the posture of the target subject from the target visible image.
  • According to the present disclosure, the posture of a subject in a thermal image can be estimated with high accuracy.
  • FIG. 1 is a block diagram schematically showing the configuration of the posture estimation system according to Embodiments 1 and 2. FIG. 2 is a block diagram schematically showing the configuration of the learning device in Embodiments 1 and 2. FIG. 3 is a schematic diagram showing an example of a three-layer neural network. FIG. 4 is a schematic diagram showing an example of the structure of the trained model of the image conversion process that converts a thermal image into a visible image in Embodiment 1. FIG. 5 is a block diagram schematically showing the configuration of a computer. FIG. 6 is a flowchart showing the process by which the learning device learns. FIG. 7 is a block diagram schematically showing the configuration of the posture estimation device in Embodiment 1. Further figures show: the configuration of the posture estimation device according to Embodiment 2; the configuration of the learning device in Embodiment 3; an example of the structure of the trained model of the image conversion process that converts a thermal image and a contour image into a visible image in Embodiment 3; and the configuration of the posture estimation device according to Embodiment 3.
  • FIG. 1 is a block diagram schematically showing the configuration of the posture estimation system 100 according to the first embodiment.
  • the posture estimation system 100 includes a learning device 110 that functions as a model generation device, and a posture estimation device 130 that functions as a utilization device.
  • the processing method performed by the posture estimation device 130 is a utilization method.
  • the posture estimation device 130 estimates the posture using the trained model learned by the learning device 110.
  • FIG. 2 is a block diagram schematically showing the configuration of the learning device 110.
  • the learning device 110 includes a learning side input unit 111, a learning side data acquisition unit 112, a model generation unit 113, a learning side learned model storage unit 114, and a learning side communication unit 115.
  • the learning side input unit 111 is an input unit that accepts input of learning data.
  • the input learning data is given to the learning side data acquisition unit 112.
  • The learning data is teacher data showing a combination of a thermal image and a visible image serving as the correct answer to be inferred from that thermal image.
  • the thermal image is acquired by imaging the temperature distribution of the subject by using the infrared rays radiated from the subject. Further, the visible image is acquired by imaging the subject by using the visible light reflected from the subject. In the visible image, the appearance of the subject is imaged.
  • the learning side data acquisition unit 112 is a data acquisition unit that acquires learning data via the learning side input unit 111.
  • the acquired learning data is given to the model generation unit 113.
  • The model generation unit 113 learns the visible image corresponding to the thermal image based on the learning data given from the learning side data acquisition unit 112. In other words, the model generation unit 113 generates a trained model for inferring the optimum visible image corresponding to a thermal image by learning the combinations of thermal images and visible images shown in the learning data. Specifically, the model generation unit 113 generates a trained model for inferring a visible image from a thermal image by learning inference from the thermal image to the visible image using the learning data. The model generation unit 113 then stores the generated trained model in the learning side trained model storage unit 114 as the learning side trained model.
  • As the learning algorithm used by the model generation unit 113, known algorithms such as supervised learning, unsupervised learning, and reinforcement learning can be used. As an example, the case where a neural network is applied is described here.
  • When supervised learning is used, the thermal image and the visible image shown in the learning data need to be paired data containing the same subject; with other learning methods, the thermal image and the visible image do not have to contain the same subject.
  • the model generation unit 113 learns a visible image corresponding to a thermal image by so-called supervised learning according to, for example, a neural network model.
  • Supervised learning refers to a method in which sets of input data and result (label) data are given to a learning device as learning data, so that the device learns the features of the learning data and infers the result from the input.
  • a neural network is composed of an input layer consisting of a plurality of neurons, an intermediate layer (hidden layer) consisting of a plurality of neurons, and an output layer consisting of a plurality of neurons.
  • the intermediate layer may be one layer or two or more layers.
  • FIG. 3 is a schematic diagram showing an example of a three-layer neural network.
  • In the three-layer neural network shown in FIG. 3, when input values are input to the input layers X1 to X3, they are multiplied by the first weights w11 to w16 (hereinafter also referred to as the first weight W1), and the calculated values, which are the values obtained by multiplying the input values by the first weights w11 to w16, are input to the intermediate layers Y1 and Y2. The calculated values are then multiplied by the second weights w21 to w26 (hereinafter also referred to as the second weight W2), and the output values, which are the values obtained by multiplying the calculated values by the second weights w21 to w26, are output from the output layers Z1 to Z3. These output values vary depending on the value of the first weight W1 and the value of the second weight W2.
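The forward pass just described can be sketched as follows. The layer sizes (three inputs X1 to X3, two intermediate neurons Y1 and Y2, three outputs Z1 to Z3) follow the description of FIG. 3, but the weight values are arbitrary placeholders and the linear (identity) activation is an illustrative assumption.

```python
# Illustrative forward pass for the three-layer network of FIG. 3.
# Weight values are placeholders, not taken from the patent.

def forward(x, w1, w2):
    """x: 3 input values; w1: 3x2 first weights (w11..w16); w2: 2x3 second weights (w21..w26)."""
    # Intermediate layer: each Yj sums the inputs multiplied by the first weights.
    y = [sum(x[i] * w1[i][j] for i in range(3)) for j in range(2)]
    # Output layer: each Zk sums the intermediate values multiplied by the second weights.
    z = [sum(y[j] * w2[j][k] for j in range(2)) for k in range(3)]
    return z

w1 = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # first weight W1 (w11..w16)
w2 = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]     # second weight W2 (w21..w26)
print(forward([1.0, 0.5, -0.5], w1, w2))
```

Changing any entry of `w1` or `w2` changes the output values, which is exactly the property the training step exploits.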
  • In the present embodiment, the neural network learns a trained model for inferring the optimum visible image for a thermal image by so-called supervised learning, according to learning data created based on the combinations of thermal images and visible images represented by the learning data acquired by the learning side data acquisition unit 112. That is, the neural network learns the trained model by inputting the thermal image to the input layer and adjusting the first weight W1 and the second weight W2 so that the result output from the output layer approaches the visible image serving as the correct answer.
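A minimal sketch of this weight adjustment is the gradient-descent loop below, which nudges a single weight so that the output approaches a target value. The scalar setup, learning rate, and squared-error loss are illustrative stand-ins for the image-to-image training described above, not the patent's actual procedure.

```python
# Toy weight adjustment: gradient descent on a squared error drives the
# output toward the correct answer, analogous to adjusting W1 and W2.

def train(x, target, w=0.0, lr=0.1, steps=100):
    for _ in range(steps):
        out = w * x                     # forward pass
        grad = 2 * (out - target) * x   # derivative of (out - target)^2 w.r.t. w
        w -= lr * grad                  # move the weight toward the correct answer
    return w

w = train(x=2.0, target=3.0)
print(w, w * 2.0)  # the output w*x approaches the target 3.0
```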
  • FIG. 4 is a schematic diagram showing an example of the structure of a trained model of an image conversion process for converting a thermal image into a visible image.
  • As shown in FIG. 4, the layers of the decoder portion and the layers of the encoder portion have a symmetrical structure and are connected by skip connections, forming a U-Net structure.
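The U-Net data flow, a symmetric encoder/decoder whose matching stages are joined by skip connections, can be illustrated roughly as follows. Real U-Nets use learned convolutions; this sketch substitutes average pooling and nearest-neighbour upsampling so that only the skip-connection data flow is shown, and all shapes and values are invented for illustration.

```python
import numpy as np

# Toy sketch of the U-Net idea in FIG. 4: the encoder output is upsampled
# in the decoder, and the matching encoder features are concatenated in
# via a skip connection.

def encode(x):
    # 2x2 average pooling halves the spatial resolution (a stand-in encoder stage).
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def decode(z, skip):
    # Nearest-neighbour upsampling restores the resolution, then the skip
    # connection concatenates the matching encoder features (stacked here).
    up = z.repeat(2, axis=0).repeat(2, axis=1)
    return np.concatenate([up, skip], axis=0)

x = np.arange(16, dtype=float).reshape(4, 4)   # stand-in "thermal image"
z = encode(x)                                  # bottleneck features, 2x2
y = decode(z, skip=x)                          # decoder input with skip, 8x4
print(z.shape, y.shape)
```

The skip connection is what lets the decoder recover fine spatial detail that the pooling in the encoder would otherwise discard.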
  • the learning side trained model storage unit 114 stores the learning side trained model which is the trained model given by the model generation unit 113.
  • The learning side communication unit 115 transmits the learning side trained model stored in the learning side trained model storage unit 114 to the posture estimation device 130.
  • the learning device 110 described above can be realized by a computer 160 as shown in FIG.
  • FIG. 5 is a block diagram schematically showing the configuration of the computer 160.
  • the computer 160 includes a communication device 161, an auxiliary storage device 162, a memory 163, and a processor 164.
  • the communication device 161 communicates data via a network, for example.
  • the auxiliary storage device 162 stores data and programs necessary for processing in the computer 160.
  • the memory 163 temporarily stores programs and data and provides a work area for the processor 164.
  • the processor 164 reads the program stored in the auxiliary storage device 162 into the memory 163, and executes the program to execute the processing in the computer 160.
  • the learning side input unit 111 and the learning side communication unit 115 described above can be realized by the communication device 161.
  • The learning side trained model storage unit 114 can be realized by the auxiliary storage device 162.
  • the learning side data acquisition unit 112 and the model generation unit 113 can be realized by the processor 164 executing the program read into the memory 163.
  • a program may be provided through a network, or may be recorded and provided on a recording medium. That is, such a program may be provided, for example, as a program product.
  • FIG. 6 is a flowchart showing a process of learning by the learning device 110.
  • the learning side data acquisition unit 112 acquires learning data via the learning side input unit 111 (S10).
  • The thermal image data, which is the image data of the thermal image, and the visible image data, which is the image data of the visible image, are acquired as the learning data. As long as each piece of thermal image data can be associated with the visible image data used as its correct answer, the two may be acquired at different timings.
  • the acquired learning data is given to the model generation unit 113.
  • Next, the model generation unit 113 generates a trained model by learning, through so-called supervised learning, the visible image that is the output corresponding to the thermal image, based on the combinations of thermal images and visible images shown in the learning data (S11). The learning side trained model storage unit 114 then stores the generated trained model (S12), and the learning side communication unit 115 transmits the trained model to the posture estimation device 130.
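The learning flow of S10 to S12 might be sketched as follows. The class and method names are hypothetical, and a trivial lookup table stands in for the neural network training described above.

```python
# Hypothetical sketch of the learning device's flow: acquire learning data
# (S10), generate a trained model from it (S11), and store the result (S12).
# Names and the stand-in "model" are illustrative, not from the patent.

class LearningDevice:
    def __init__(self):
        self.model_store = {}

    def acquire(self, pairs):
        # S10: learning data as (thermal, visible) pairs.
        self.learning_data = pairs
        return self.learning_data

    def generate_model(self):
        # S11: a lookup from thermal to visible samples stands in for
        # the supervised neural-network training.
        return {thermal: visible for thermal, visible in self.learning_data}

    def store(self, model):
        # S12: store the generated trained model.
        self.model_store["trained"] = model

device = LearningDevice()
device.acquire([("t1", "v1"), ("t2", "v2")])
device.store(device.generate_model())
print(device.model_store["trained"]["t1"])
```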
  • FIG. 7 is a block diagram schematically showing the configuration of the posture estimation device 130.
  • the posture estimation device 130 includes an inference device 140 and a posture estimation execution device 150 that functions as a posture estimation unit.
  • The inference device 140 infers a visible image from a thermal image by using the trained model given by the learning device 110 as the inference side trained model.
  • the inference device 140 includes an inference side communication unit 141, an inference side learned model storage unit 142, an inference side input unit 143, an inference side data acquisition unit 144, and an inference unit 145.
  • the inference side communication unit 141 receives the trained model from the learning device 110, and stores the trained model as the inference side trained model in the inference side trained model storage unit 142.
  • the inference side trained model storage unit 142 is a storage unit that stores the inference side trained model.
  • the inference side input unit 143 is an input unit that accepts input of thermal image data indicating a thermal image of a subject.
  • the thermal image data input here is also referred to as target thermal image data.
  • the thermal image shown by the target thermal image data is also referred to as a target thermal image, and the subject included in the target thermal image, which is the target for estimating the posture, is also referred to as the target subject.
  • the inference side data acquisition unit 144 is a data acquisition unit that acquires target thermal image data via the inference side input unit 143. The acquired target thermal image data is given to the inference unit 145.
  • the inference unit 145 infers a visible image of the target subject from the thermal image shown by the target thermal image data by using the inference side learned model stored in the inference side learned model storage unit 142.
  • The inference unit 145 can input the thermal image indicated by the target thermal image data into the inference side trained model and acquire the visible image inferred from that thermal image.
  • the inference unit 145 generates visible image data indicating the inferred visible image, and gives the visible image data to the posture estimation execution device 150.
  • the visible image data generated here is also referred to as target visible image data.
  • The visible image shown by the target visible image data, in other words, the inferred visible image, is also referred to as the target visible image.
  • the posture estimation execution device 150 estimates the posture of the subject existing in the visible image from the visible image indicated by the target visible image data.
  • As a method of estimating the posture, there is a method in which a large number of correspondences between visible images and human postures (for example, the positional relationships of body parts) are learned in advance, and when a visible image is input, the posture of the person corresponding to that visible image is determined based on the learning result.
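This pre-learned-correspondence approach can be illustrated with a nearest-neighbour lookup over learned (feature, posture) pairs. The feature vectors and posture labels below are invented for illustration and do not come from the patent.

```python
# Sketch of posture estimation from pre-learned correspondences: an input
# image's features are matched to the closest learned example, and that
# example's posture is returned. All data here is illustrative.

def estimate_posture(features, learned):
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    # Nearest neighbour over the pre-learned (feature, posture) pairs.
    return min(learned, key=lambda pair: dist(features, pair[0]))[1]

learned = [
    ((0.0, 0.0), "standing"),
    ((1.0, 0.0), "sitting"),
    ((0.0, 1.0), "lying"),
]
print(estimate_posture((0.1, 0.9), learned))
```

Practical estimators learn richer correspondences (for example, keypoint positions of body parts) rather than a raw nearest-neighbour table, but the determine-from-learned-correspondences structure is the same.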
  • the posture estimation device 130 described above can also be realized by a computer 160 as shown in FIG.
  • the inference side communication unit 141 and the inference side input unit 143 can be realized by the communication device 161.
  • the inference side learned model storage unit 142 can be realized by the auxiliary storage device 162.
  • the inference side data acquisition unit 144 and the inference unit 145 can be realized by the processor 164 executing the program read into the memory 163.
  • a program may be provided through a network, or may be recorded and provided on a recording medium. That is, such a program may be provided, for example, as a program product.
  • FIG. 8 is a flowchart showing a process in which the posture estimation device 130 infers a visible image corresponding to a thermal image and estimates a posture from the visible image.
  • the inference side data acquisition unit 144 acquires the target thermal image data showing the thermal image via the inference side input unit 143 (S20).
  • the acquired target thermal image data is given to the inference unit 145.
  • The inference unit 145 inputs the thermal image shown by the target thermal image data into the inference side trained model stored in the inference side trained model storage unit 142, and obtains a visible image corresponding to the thermal image (S21).
  • The inference unit 145 generates target visible image data indicating the visible image corresponding to the thermal image obtained by the inference side trained model, and gives the target visible image data to the posture estimation execution device 150 (S22).
  • The posture estimation execution device 150 estimates the posture of the subject in the visible image indicated by the target visible image data (S23). Based on the posture estimated in this way, it is possible, for example, to detect abnormal behavior of the subject reflected in the thermal image.
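The inference flow of S20 to S23 might be sketched as follows; the function names and the stand-in model and estimator are hypothetical placeholders for the inference side trained model and the posture estimation execution device.

```python
# Hypothetical sketch of the posture estimation device's flow: a target
# thermal image is converted to a visible image with the trained model
# (S21), and the posture is estimated from that visible image (S23).

def infer_and_estimate(thermal, trained_model, posture_estimator):
    visible = trained_model(thermal)    # S21: thermal -> visible inference
    return posture_estimator(visible)   # S23: posture from the visible image

# Stand-in model and estimator for demonstration only.
trained_model = lambda t: f"visible({t})"
posture_estimator = lambda v: "standing" if "person" in v else "unknown"
print(infer_and_estimate("person-thermal", trained_model, posture_estimator))
```

The point of this pipeline is that `posture_estimator` only ever sees visible images, which is why an existing visible-image posture estimator can be reused unchanged.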
  • As described above, according to Embodiment 1, the thermal image output from a thermal image sensor or the like is converted into a visible image, and the posture of the subject in the thermal image can be estimated using the posture estimation execution device 150, which is a trained posture estimator for visible images. It is therefore possible to estimate the posture using an existing trained posture estimator for visible images.
  • In the first embodiment, the case where supervised learning is applied to the learning algorithm used by the model generation unit 113 has been described, but the first embodiment is not limited to such an example. As the learning algorithm, reinforcement learning, unsupervised learning, semi-supervised learning, or the like can be used in addition to supervised learning.
  • The model generation unit 113 may learn the visible image corresponding to the thermal image according to learning data created for a plurality of posture estimation devices including the posture estimation device 130.
  • The model generation unit 113 may acquire learning data from a plurality of posture estimation devices used in the same area, or may collect learning data from a plurality of posture estimation devices operating independently in different areas and learn the visible image corresponding to the thermal image by using that data.
  • The model generation unit 113 can also add or remove, partway through, a posture estimation device from the targets from which learning data is collected. Further, the model generation unit 113 may apply a trained model that has learned the visible image corresponding to the thermal image for one posture estimation device to another posture estimation device, and retrain it on the visible image corresponding to the thermal image for that other posture estimation device, thereby updating the trained model.
  • The model generation unit 113 may also execute machine learning according to other known methods such as genetic programming, functional logic programming, or a support vector machine.
  • The learning device 110 and the inference device 140, which are used to learn the visible image corresponding to the thermal image of the posture estimation system 100, may be connected to the posture estimation execution device 150 via a network, for example. Further, the learning device 110, the inference device 140, or the posture estimation execution device 150 may exist on a cloud server.
  • In the above description, the learning device 110 and the posture estimation device 130 are separate devices; however, the learning device 110 may be provided in the posture estimation device 130. In that case, the learning side communication unit 115 and the inference side communication unit 141 become unnecessary, and the learning side learned model storage unit 114 and the inference side learned model storage unit 142 can be integrated as a single learned model storage unit.
  • In the above description, the posture estimation device 130 infers a visible image corresponding to a thermal image by using the trained model generated by the learning device 110; however, the posture estimation device 130 may acquire a trained model from the outside, such as from another system, and infer a visible image corresponding to a thermal image based on that trained model.
  • In the second embodiment, the posture estimation system 200 includes a learning device 210 and a posture estimation device 230.
  • The learning device 210 in the second embodiment includes a learning side input unit 111, a learning side data acquisition unit 212, a model generation unit 213, a learning side learned model storage unit 114, and a learning side communication unit 115.
  • The learning side input unit 111, the learning side learned model storage unit 114, and the learning side communication unit 115 of the learning device 210 according to the second embodiment are the same as the learning side input unit 111, the learning side learned model storage unit 114, and the learning side communication unit 115 of the learning device 110 according to the first embodiment.
  • The learning side data acquisition unit 212 acquires learning data via the learning side input unit 111.
  • The learning data acquired in the second embodiment includes thermal image data showing a thermal image, visible image data showing the visible image that is the correct answer corresponding to the thermal image, and posture information indicating the posture of the subject as the correct answer corresponding to the visible image.
  • The acquired learning data is given to the model generation unit 213.
  • The model generation unit 213 learns the visible image corresponding to the thermal image and the posture corresponding to the visible image based on the learning data given from the learning side data acquisition unit 212. In other words, the model generation unit 213 generates a trained model for inferring the optimum posture corresponding to the thermal image by learning the combination of the thermal image and the visible image shown in the training data, and the combination of the visible image and the posture. Specifically, the model generation unit 213 generates a trained model for inferring the posture from the thermal image by learning, using the learning data, the inference from the thermal image to the visible image and the inference from the visible image to the posture. The model generation unit 213 then stores the generated trained model in the learning side learned model storage unit 114 as the learning side trained model.
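The second embodiment's trained model can be thought of as a composition of two learned stages, thermal-to-visible and visible-to-posture, exposed to the caller as a single thermal-to-posture mapping. The toy sketch below illustrates only this composition idea; both stages are hypothetical stand-ins, not the patent's network.

```python
# Toy sketch of the second embodiment's idea: the trained model maps a
# thermal image to a posture by internally chaining thermal->visible and
# visible->posture inference, so only the posture label is output in the
# utilization phase. Both stages are illustrative stand-ins.

def thermal_to_visible(thermal_image):
    """First learned stage (stub): brighten the thermal image."""
    return [[px + 100 for px in row] for row in thermal_image]

def visible_to_posture(visible_image):
    """Second learned stage (stub): classify by mean brightness."""
    flat = [px for row in visible_image for px in row]
    mean = sum(flat) / len(flat)
    return "standing" if mean > 120 else "lying"

def trained_model(thermal_image):
    """End-to-end model: thermal image in, posture label out; the
    intermediate visible image is never exposed to the caller."""
    return visible_to_posture(thermal_to_visible(thermal_image))

posture = trained_model([[30, 31], [32, 33]])
```

Because the intermediate visible image stays inside `trained_model`, the utilization phase can skip generating and outputting it, which is the basis of the network-size and computation savings mentioned below.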
  • FIG. 9 is a block diagram schematically showing the configuration of the posture estimation device 230 according to the second embodiment.
  • The posture estimation device 230 includes an inference side communication unit 141, an inference side learned model storage unit 142, an inference side input unit 143, an inference side data acquisition unit 144, and an inference unit 245.
  • The inference side communication unit 141, the inference side learned model storage unit 142, the inference side input unit 143, and the inference side data acquisition unit 144 of the posture estimation device 230 according to the second embodiment are the same as the inference side communication unit 141, the inference side learned model storage unit 142, the inference side input unit 143, and the inference side data acquisition unit 144 of the posture estimation device 130 in the first embodiment.
  • The inference unit 245 infers a visible image from the thermal image indicated by the target thermal image data by using the inference side learned model stored in the inference side trained model storage unit 142, and infers the posture from the visible image. In other words, the inference unit 245 estimates the posture of the subject in the thermal image, inferred from the thermal image, by inputting the thermal image indicated by the target thermal image data into the inference side trained model.
  • The posture estimation device 230 described above can also be realized by a computer 160 as shown in FIG.
  • The inference side communication unit 141 and the inference side input unit 143 can be realized by the communication device 161.
  • The inference side learned model storage unit 142 can be realized by the auxiliary storage device 162.
  • The inference side data acquisition unit 144 and the inference unit 245 can be realized by the processor 164 executing a program read into the memory 163.
  • Such a program may be provided through a network, or may be recorded and provided on a recording medium; that is, it may be provided as a program product.
  • According to the posture estimation system 200, the posture of the subject can be estimated directly from the thermal image output from a thermal image sensor or the like. By inputting the visible image and the posture as teacher data at the time of learning, the work of annotating posture information on thermal images can be avoided.
  • Further, since no visible image is generated and output in the utilization phase, the scale of the network can be suppressed and the amount of calculation can be reduced.
  • In the third embodiment, the posture estimation system 300 includes a learning device 310 and a posture estimation device 330.
  • FIG. 10 is a block diagram schematically showing the configuration of the learning device 310.
  • The learning device 310 includes a learning side input unit 111, a learning side data acquisition unit 312, a model generation unit 313, a learning side learned model storage unit 114, a learning side communication unit 115, and a learning side contour extraction unit 316.
  • The learning side input unit 111, the learning side learned model storage unit 114, and the learning side communication unit 115 of the learning device 310 according to the third embodiment are the same as the learning side input unit 111, the learning side learned model storage unit 114, and the learning side communication unit 115 of the learning device 110 according to the first embodiment.
  • The learning side data acquisition unit 312 acquires learning data via the learning side input unit 111. The acquired learning data is given to the model generation unit 313. Further, the learning side data acquisition unit 312 gives the thermal image data indicating the thermal image included in the acquired learning data to the learning side contour extraction unit 316 as learning side thermal image data.
  • The learning side contour extraction unit 316 is a contour extraction unit that extracts a contour image showing the contour of the subject from the thermal image indicated by the learning side thermal image data.
  • As the extraction method, there are, for example, a method using an edge detection process such as the Canny method or the Sobel method, and a method combining a binarization process with edge detection.
  • In the edge detection process, the edges of the subject are detected. In the combination of the binarization process and edge detection, the edge detection process may be performed after the binarization process is applied to the thermal image. The learning side contour extraction unit 316 then gives contour image data indicating the extracted contour image to the model generation unit 313 as learning side contour image data.
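The "binarization followed by edge detection" variant described above can be sketched with elementary operations. The threshold value and the tiny thermal image below are made-up illustration data, and the 4-neighbour edge test is a deliberately simple stand-in for a Canny or Sobel detector.

```python
# Minimal sketch of the "binarization + edge detection" contour
# extraction. The threshold and the 5x5 "thermal image" are illustrative
# only; a real implementation would use a proper edge detector
# (e.g. Canny or Sobel) on the binarized image.

def binarize(image, threshold):
    """Mark pixels at or above the threshold as foreground (1)."""
    return [[1 if px >= threshold else 0 for px in row] for row in image]

def extract_contour(image, threshold):
    """Binarize, then keep foreground pixels that touch the background
    (a simple 4-neighbour edge test) as the contour."""
    mask = binarize(image, threshold)
    h, w = len(mask), len(mask[0])
    contour = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            # A foreground pixel is on the contour if any 4-neighbour
            # is background or lies outside the image.
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    contour[y][x] = 1
                    break
    return contour

# A warm blob (the "subject") on a cooler background.
thermal = [
    [20, 20, 20, 20, 20],
    [20, 36, 36, 36, 20],
    [20, 36, 36, 36, 20],
    [20, 36, 36, 36, 20],
    [20, 20, 20, 20, 20],
]
contour = extract_contour(thermal, threshold=30)
```

Only the boundary pixels of the warm blob survive in `contour`; the blob's interior pixel and the background are zero, which is exactly the outline information the model generation unit 313 receives as learning side contour image data.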
  • The model generation unit 313 learns the visible image corresponding to the thermal image based on the learning data given by the learning side data acquisition unit 312 and the learning side contour image data given by the learning side contour extraction unit 316. In other words, the model generation unit 313 generates a trained model for inferring the optimum visible image corresponding to the combination of the thermal image and the contour image by learning the combination of the thermal image shown by the training data, the contour image shown by the learning side contour image data, and the visible image shown by the training data. Specifically, the model generation unit 313 generates a trained model for inferring a visible image from a combination of a thermal image and a contour image by learning the inference from the combination of the thermal image and the contour image to the visible image. The model generation unit 313 then stores the generated trained model in the learning side learned model storage unit 114 as the learning side trained model.
  • FIG. 11 is a schematic diagram showing an example of the structure of a trained model for an image conversion process that converts a thermal image and a contour image into a visible image in the third embodiment.
  • The trained model shown in FIG. 11 has a U-Net structure in which the layers of the decoder portion and the layers of the encoder portion are symmetrical and are connected by skip connections.
  • The decoder portion comprises two parallel paths: one path decodes the thermal image and the other decodes the contour image.
  • The two vectors obtained at the center layer of the model are concatenated, and the concatenated information is input to the encoder portion.
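The dual-path structure of FIG. 11 can be sketched numerically: two separate paths each reduce their input image to a feature vector, the vectors are concatenated at the center layer, and the remaining half of the model maps the combined vector back to an image-sized output. The sizes and "weights" (plain averaging and copying) below are illustrative only, not the patent's network; a real implementation would use a deep-learning framework with convolutional layers and skip connections.

```python
# Schematic sketch of the two-path bottleneck in FIG. 11: the thermal
# image and the contour image are reduced to feature vectors by separate
# paths, concatenated at the center layer, and the combined vector is
# mapped back to an image-sized output. All operations are illustrative
# stand-ins for the model's learned layers.

def encode_path(image):
    """Reduce an image to a per-row feature vector (stand-in for one
    downsampling path of the model)."""
    return [sum(row) / len(row) for row in image]

def bottleneck_concat(thermal_feat, contour_feat):
    """Concatenate the two feature vectors at the center layer."""
    return thermal_feat + contour_feat

def decode(features, height, width):
    """Expand the concatenated features back to an image-sized grid
    (stand-in for the upsampling half of the model)."""
    mean = sum(features) / len(features)
    return [[mean] * width for _ in range(height)]

thermal = [[1.0, 2.0], [3.0, 4.0]]
contour = [[0.0, 1.0], [1.0, 0.0]]

z = bottleneck_concat(encode_path(thermal), encode_path(contour))
output = decode(z, height=2, width=2)
```

The point illustrated is only the data flow: both inputs contribute to the single bottleneck vector `z`, so the generated output is conditioned on the contour information as well as on the thermal information.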
  • The learning device 310 described above can also be realized by a computer 160 as shown in FIG.
  • The learning side data acquisition unit 312, the model generation unit 313, and the learning side contour extraction unit 316 can be realized by the processor 164 executing a program read into the memory 163.
  • Such a program may be provided through a network, or may be recorded and provided on a recording medium; that is, it may be provided as a program product.
  • FIG. 12 is a block diagram schematically showing the configuration of the posture estimation device 330.
  • The posture estimation device 330 includes an inference device 340 and the posture estimation execution device 150.
  • The posture estimation execution device 150 of the posture estimation device 330 in the third embodiment is the same as the posture estimation execution device 150 in the first embodiment.
  • The inference device 340 includes an inference side communication unit 141, an inference side learned model storage unit 142, an inference side input unit 143, an inference side data acquisition unit 344, an inference unit 345, and an inference side contour extraction unit 346.
  • The inference side communication unit 141, the inference side learned model storage unit 142, and the inference side input unit 143 of the inference device 340 in the third embodiment are the same as the inference side communication unit 141, the inference side learned model storage unit 142, and the inference side input unit 143 of the inference device 140 in the first embodiment.
  • The inference side data acquisition unit 344 acquires the target thermal image data via the inference side input unit 143, and gives the acquired target thermal image data to the inference unit 345 and the inference side contour extraction unit 346.
  • The inference side contour extraction unit 346 is a contour extraction unit that extracts a contour image from the thermal image indicated by the target thermal image data. The extraction method is the same as that of the learning side contour extraction unit 316.
  • The inference side contour extraction unit 346 gives contour image data indicating the extracted contour image to the inference unit 345 as inference side contour image data.
  • The contour image extracted here is also referred to as a target contour image, and the inference side contour image data is also referred to as target contour image data.
  • The inference unit 345 infers a visible image from the combination of the thermal image indicated by the target thermal image data and the contour image indicated by the inference side contour image data, using the inference side trained model stored in the inference side trained model storage unit 142.
  • In other words, by inputting the thermal image indicated by the target thermal image data and the contour image indicated by the inference side contour image data into the inference side trained model, the inference unit 345 can acquire the visible image corresponding to the thermal image, inferred from the thermal image.
  • The inference unit 345 generates visible image data indicating the inferred visible image, and gives the visible image data to the posture estimation execution device 150.
  • The visible image data generated here is also referred to as target visible image data, and the visible image indicated by the target visible image data is also referred to as a target visible image.
  • The posture estimation device 330 described above can also be realized by a computer 160 as shown in FIG.
  • The inference side data acquisition unit 344, the inference unit 345, and the inference side contour extraction unit 346 can be realized by the processor 164 executing a program read into the memory 163.
  • Such a program may be provided through a network, or may be recorded and provided on a recording medium; that is, it may be provided as a program product.
  • In general, a thermal image has ambiguous contour information, so a visible image generated from it using a trained model also has ambiguous contours. Since contour information is important for posture estimation, the accuracy of posture estimation is reduced for images with ambiguous contours.
  • In the posture estimation system 300 according to the third embodiment, by inputting the thermal image and the contour image into the trained model simultaneously, a visible image whose contours are not ambiguous can be generated. As a result, the posture estimation accuracy from the generated visible image can be improved compared with inputting the thermal image alone into the trained model.
  • 100, 200, 300 posture estimation system; 110, 210, 310 learning device; 111 learning side input unit; 112, 212, 312 learning side data acquisition unit; 113, 213, 313 model generation unit; 114 learning side learned model storage unit; 115 learning side communication unit; 316 learning side contour extraction unit; 130, 230, 330 posture estimation device; 140, 340 inference device; 141 inference side communication unit; 142 inference side learned model storage unit; 143 inference side input unit; 144, 344 inference side data acquisition unit; 145, 245, 345 inference unit; 346 inference side contour extraction unit; 150 posture estimation execution device.


Abstract

This learning device is characterized by comprising: a learning-side data acquisition unit (112) that acquires learning data including a thermal image, which images the temperature distribution of a subject using infrared rays emitted from the subject, and a visible image, which images the subject using visible light reflected from the subject; and a model generation unit (113) that generates a trained model for inferring the visible image from the thermal image by learning the inference from the thermal image to the visible image using the learning data.

Description

Learning device, utilization device, program, learning method, and utilization method

The present disclosure relates to learning devices, utilization devices, programs, learning methods, and utilization methods.

A general thermal infrared solid-state image sensor (hereinafter referred to as a thermal image sensor) visualizes the incident infrared rays emitted by a subject: differences in the temperature rise caused by absorbing the infrared rays become the shades of the image. The infrared rays emitted by the subject are focused by a lens and imaged on the image sensor.

While a thermal image sensor, which can acquire thermal information, can obtain information that a visible camera cannot, an inexpensive small sensor, for example, yields low image resolution, contrast, contour sharpness, and SN ratio. Further, a thermal image sensor formed with a large sensor is expensive.

On the other hand, in fields such as in-home monitoring, smart buildings, and crime prevention, there are services that identify human behavior or posture and detect abnormal behavior. Human postures include standing (standing position), sitting (sitting position), and lying down (lying position). From the viewpoint of privacy protection, thermal image sensors have a lower barrier to introduction than visible cameras, which is advantageous.

Here, there is a technique for generating a trained model by using a visible image or a distance image and the posture information (correct answer) of a subject as inputs, and estimating posture information from a visible image or a distance image by using the generated trained model (see, for example, Patent Document 1).

Japanese Unexamined Patent Publication No. 2017-97577 (page 5)

Patent Document 1 describes a posture estimation device that estimates a posture from a visible image or a distance image. With this posture estimation device, there is the problem that a thermal image has lower image quality, such as resolution or SN ratio, than a visible image or a distance image, so posture estimation is not easy.

Therefore, one or more aspects of the present disclosure aim to enable the posture of a subject in a thermal image to be estimated with high accuracy.
A learning device according to one aspect of the present disclosure is characterized by comprising: a data acquisition unit that acquires learning data including a thermal image, which images the temperature distribution of a subject by using infrared rays emitted from the subject, and a visible image, which images the subject by using visible light reflected from the subject; and a model generation unit that generates a trained model for inferring the visible image from the thermal image by learning, using the learning data, the inference from the thermal image to the visible image.

A learning device according to one aspect of the present disclosure is characterized by comprising: a data acquisition unit that acquires learning data including a thermal image, which images the temperature distribution of a subject by using infrared rays emitted from the subject, a visible image, which images the subject by using visible light reflected from the subject, and posture information indicating the posture of the subject; and a model generation unit that generates a trained model for inferring the posture from the thermal image by learning, using the learning data, the inference from the thermal image to the visible image and the inference from the visible image to the posture.

A learning device according to one aspect of the present disclosure is characterized by comprising: a data acquisition unit that acquires learning data including a thermal image, which images the temperature distribution of a subject by using infrared rays emitted from the subject, and a visible image, which images the subject by using visible light reflected from the subject; a contour extraction unit that extracts a contour image showing the contour of the subject from the thermal image; and a model generation unit that generates a trained model for inferring the visible image from the combination of the thermal image and the contour image by learning the inference from the combination of the thermal image and the contour image to the visible image.
A utilization device according to one aspect of the present disclosure is characterized by comprising: a storage unit that stores a trained model for inferring a visible image from a thermal image, the trained model having been generated by learning the inference from the thermal image to the visible image by using learning data including a thermal image, which images the temperature distribution of a subject by using infrared rays emitted from the subject, and a visible image, which images the subject by using visible light reflected from the subject; a data acquisition unit that acquires target thermal image data indicating a target thermal image, which is a thermal image of a target subject; an inference unit that infers, by using the trained model, a target visible image, which is a visible image of the target subject, from the target thermal image; and a posture estimation unit that estimates the posture of the target subject from the target visible image.

A utilization device according to one aspect of the present disclosure is characterized by comprising: a storage unit that stores a trained model for inferring a posture from a thermal image, the trained model having been generated by learning the inference from the thermal image to a visible image and the inference from the visible image to the posture by using learning data including a thermal image, which images the temperature distribution of a subject by using infrared rays emitted from the subject, a visible image, which images the subject by using visible light reflected from the subject, and posture information indicating the posture of the subject; a data acquisition unit that acquires target thermal image data indicating a target thermal image, which is a thermal image of a target subject; and an inference unit that infers, by using the trained model, the posture of the target subject from the target thermal image.

A utilization device according to one aspect of the present disclosure is characterized by comprising: a storage unit that stores a trained model for inferring a visible image from the combination of a thermal image and a contour image, the trained model having been generated by learning the inference from the combination of the thermal image and the contour image to the visible image by using learning data including a thermal image, which images the temperature distribution of a subject by using infrared rays emitted from the subject, and a visible image, which images the subject by using visible light reflected from the subject, together with contour image data indicating a contour image, extracted from the thermal image, that shows the contour of the subject; a data acquisition unit that acquires target thermal image data indicating a target thermal image, which is a thermal image of a target subject; a contour extraction unit that extracts, from the target thermal image, a target contour image, which is a contour image showing the contour of the target subject; an inference unit that infers, by using the trained model, a target visible image, which is a visible image of the target subject, from the combination of the target thermal image and the target contour image; and a posture estimation unit that estimates the posture of the target subject from the target visible image.
 本開示の一態様に係るプログラムは、コンピュータを、被写体から放射される赤外線を利用することで、前記被写体の温度分布を画像化した熱画像と、前記被写体から反射される可視光を利用することで、前記被写体を画像化した可視画像とを含む学習用データを取得するデータ取得部、及び、前記学習用データを用いて前記熱画像から前記可視画像への推論を学習することで、前記熱画像から前記可視画像を推論するための学習済モデルを生成するモデル生成部、として機能させることを特徴とする。 The program according to one aspect of the present disclosure uses a computer to use an infrared image emitted from a subject to image a thermal image of the temperature distribution of the subject and visible light reflected from the subject. The heat is obtained by learning the inference from the thermal image to the visible image using the data acquisition unit that acquires the learning data including the visible image of the subject and the learning data. It is characterized in that it functions as a model generation unit that generates a trained model for inferring the visible image from an image.
 本開示の一態様に係るプログラムは、コンピュータを、被写体から放射される赤外線を利用することで、前記被写体の温度分布を画像化した熱画像と、前記被写体から反射される可視光を利用することで、前記被写体を画像化した可視画像と、前記被写体の姿勢を示す姿勢情報とを含む学習用データを取得するデータ取得部、及び、前記学習用データを用いて、前記熱画像から前記可視画像への推論及び前記可視画像から前記姿勢への推論を学習することで、前記熱画像から前記姿勢を推論するための学習済モデルを生成するモデル生成部、として機能させることを特徴とする。 The program according to one aspect of the present disclosure uses a computer to use an infrared image emitted from a subject to image a thermal image of the temperature distribution of the subject and visible light reflected from the subject. A data acquisition unit that acquires learning data including a visible image of the subject and posture information indicating the posture of the subject, and the visible image from the thermal image using the learning data. It is characterized in that it functions as a model generation unit that generates a trained model for inferring the attitude from the thermal image by learning the inference to the image and the inference from the visible image to the attitude.
 本開示の一態様に係るプログラムは、コンピュータを、被写体から放射される赤外線を利用することで、前記被写体の温度分布を画像化した熱画像と、前記被写体から反射される可視光を利用することで、前記被写体を画像化した可視画像とを含む学習用データを取得するデータ取得部、前記熱画像から前記被写体の輪郭を示す輪郭画像を抽出する輪郭抽出部、及び、前記熱画像及び前記輪郭画像の組み合わせから前記可視画像への推論を学習することで、前記熱画像及び前記輪郭画像の組み合わせから前記可視画像を推論するための学習済モデルを生成するモデル生成部、として機能させることを特徴とする。 The program according to one aspect of the present disclosure uses a computer to use an infrared image emitted from a subject to use a thermal image that images the temperature distribution of the subject and visible light reflected from the subject. A data acquisition unit that acquires learning data including a visible image of the subject, a contour extraction unit that extracts a contour image showing the contour of the subject from the thermal image, and the thermal image and the contour. By learning the inference from the combination of images to the visible image, it is characterized by functioning as a model generation unit that generates a trained model for inferring the visible image from the combination of the thermal image and the contour image. And.
 本開示の一態様に係るプログラムは、コンピュータを、被写体から放射される赤外線を利用することで、前記被写体の温度分布を画像化した熱画像と、前記被写体から反射される可視光を利用することで、前記被写体を画像化した可視画像とを含む学習用データを用いて、前記熱画像から前記可視画像への推論を学習することで生成された、前記熱画像から前記可視画像を推論するための学習済モデルを記憶する記憶部、対象となる被写体である対象被写体の熱画像である対象熱画像を示す対象熱画像データを取得するデータ取得部、前記学習済モデルを用いて、前記対象熱画像から、前記対象被写体の可視画像である対象可視画像を推論する推論部、及び、前記対象可視画像から、前記対象被写体の姿勢を推定する姿勢推定部、として機能させることを特徴とする。 The program according to one aspect of the present disclosure uses a computer to use a thermal image that images the temperature distribution of the subject and visible light reflected from the subject by using infrared rays emitted from the subject. In order to infer the visible image from the thermal image generated by learning the inference from the thermal image to the visible image using the learning data including the visible image in which the subject is imaged. A storage unit that stores the trained model, a data acquisition unit that acquires target thermal image data indicating a target thermal image that is a thermal image of the target subject that is the target subject, and the target heat using the trained model. It is characterized by functioning as an inference unit that infers a target visible image that is a visible image of the target subject from an image, and a posture estimation unit that estimates the posture of the target subject from the target visible image.
 A program according to one aspect of the present disclosure causes a computer to function as: a storage unit that stores a trained model for inferring a posture from a thermal image, the trained model being generated by learning the inference from the thermal image to a visible image and the inference from the visible image to the posture, using learning data including a thermal image, which images the temperature distribution of a subject by using infrared rays emitted from the subject, a visible image, which images the subject by using visible light reflected from the subject, and posture information indicating the posture of the subject; a data acquisition unit that acquires target thermal image data indicating a target thermal image, which is a thermal image of a target subject; and an inference unit that uses the trained model to infer the posture of the target subject from the target thermal image.
 A program according to one aspect of the present disclosure causes a computer to function as: a storage unit that stores a trained model for inferring a visible image from a combination of a thermal image and a contour image, the trained model being generated by learning the inference from the combination of the thermal image and the contour image to the visible image, using learning data including a thermal image, which images the temperature distribution of a subject by using infrared rays emitted from the subject, and a visible image, which images the subject by using visible light reflected from the subject, as well as contour image data indicating a contour image showing the contour of the subject extracted from the thermal image; a data acquisition unit that acquires target thermal image data indicating a target thermal image, which is a thermal image of a target subject; a contour extraction unit that extracts, from the target thermal image, a target contour image, which is a contour image showing the contour of the target subject; an inference unit that uses the trained model to infer, from the combination of the target thermal image and the target contour image, a target visible image, which is a visible image of the target subject; and a posture estimation unit that estimates the posture of the target subject from the target visible image.
 A learning method according to one aspect of the present disclosure includes: acquiring learning data including a thermal image, which images the temperature distribution of a subject by using infrared rays emitted from the subject, and a visible image, which images the subject by using visible light reflected from the subject; and generating a trained model for inferring the visible image from the thermal image by learning the inference from the thermal image to the visible image using the learning data.
 A learning method according to one aspect of the present disclosure includes: acquiring learning data including a thermal image, which images the temperature distribution of a subject by using infrared rays emitted from the subject, a visible image, which images the subject by using visible light reflected from the subject, and posture information indicating the posture of the subject; and generating a trained model for inferring the posture from the thermal image by learning, using the learning data, the inference from the thermal image to the visible image and the inference from the visible image to the posture.
 A learning method according to one aspect of the present disclosure includes: acquiring learning data including a thermal image, which images the temperature distribution of a subject by using infrared rays emitted from the subject, and a visible image, which images the subject by using visible light reflected from the subject; extracting, from the thermal image, a contour image showing the contour of the subject; and generating a trained model for inferring the visible image from the combination of the thermal image and the contour image by learning the inference from the combination of the thermal image and the contour image to the visible image.
 A utilization method according to one aspect of the present disclosure includes: acquiring target thermal image data indicating a target thermal image, which is a thermal image of a target subject; inferring, from the target thermal image, a target visible image, which is a visible image of the target subject, by using a trained model for inferring a visible image from a thermal image, the trained model being generated by learning the inference from the thermal image to the visible image using learning data including a thermal image, which images the temperature distribution of a subject by using infrared rays emitted from the subject, and a visible image, which images the subject by using visible light reflected from the subject; and estimating the posture of the target subject from the target visible image.
 A utilization method according to one aspect of the present disclosure includes: acquiring target thermal image data indicating a target thermal image, which is a thermal image of a target subject; and inferring the posture of the target subject from the target thermal image by using a trained model for inferring a posture from a thermal image, the trained model being generated by learning the inference from the thermal image to a visible image and the inference from the visible image to the posture, using learning data including a thermal image, which images the temperature distribution of a subject by using infrared rays emitted from the subject, a visible image, which images the subject by using visible light reflected from the subject, and posture information indicating the posture of the subject.
 A utilization method according to one aspect of the present disclosure includes: acquiring target thermal image data indicating a target thermal image, which is a thermal image of a target subject; extracting, from the target thermal image, a target contour image, which is a contour image showing the contour of the target subject; inferring, from the combination of the target thermal image and the target contour image, a target visible image, which is a visible image of the target subject, by using a trained model for inferring a visible image from a combination of a thermal image and a contour image, the trained model being generated by learning the inference from the combination of the thermal image and the contour image to the visible image, using learning data including a thermal image, which images the temperature distribution of a subject by using infrared rays emitted from the subject, and a visible image, which images the subject by using visible light reflected from the subject, as well as contour image data indicating a contour image showing the contour of the subject extracted from the thermal image; and estimating the posture of the target subject from the target visible image.
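 Several of the aspects above extract a contour image showing the contour of the subject from the thermal image. The disclosure does not specify a particular extraction method; as one hedged sketch (the gradient-magnitude operator, the toy image, and the use of NumPy are assumptions for illustration only), a contour image can be derived from a thermal image as follows:

```python
import numpy as np

def extract_contour(thermal_image):
    """Return a contour image: the per-pixel gradient magnitude of the
    thermal image, which is large at the edges of the subject's
    temperature distribution and small elsewhere."""
    gy, gx = np.gradient(thermal_image.astype(float))
    return np.hypot(gx, gy)

# A toy 4x4 thermal image: a warm 2x2 "subject" on a cold background.
thermal = np.zeros((4, 4))
thermal[1:3, 1:3] = 1.0
contour = extract_contour(thermal)

# The contour response is nonzero only around the warm region's boundary.
print(contour.round(2))
```

The same function could be applied both when preparing the contour images used during learning and when extracting the target contour image at inference time.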
 According to one or more aspects of the present disclosure, the posture of a subject in a thermal image can be estimated with high accuracy.
FIG. 1 is a block diagram schematically showing the configuration of a posture estimation system according to the first and second embodiments.
FIG. 2 is a block diagram schematically showing the configuration of a learning device according to the first and second embodiments.
FIG. 3 is a schematic diagram showing an example of a three-layer neural network.
FIG. 4 is a schematic diagram showing an example of the structure of a trained model for image conversion processing that converts a thermal image into a visible image in the first embodiment.
FIG. 5 is a block diagram schematically showing the configuration of a computer.
FIG. 6 is a flowchart showing a process in which the learning device learns.
FIG. 7 is a block diagram schematically showing the configuration of a posture estimation device according to the first embodiment.
FIG. 8 is a flowchart showing a process in which the posture estimation device infers a visible image corresponding to a thermal image and estimates a posture from the visible image.
FIG. 9 is a block diagram schematically showing the configuration of a posture estimation device according to the second embodiment.
FIG. 10 is a block diagram schematically showing the configuration of a learning device according to the third embodiment.
FIG. 11 is a schematic diagram showing an example of the structure of a trained model for image conversion processing that converts a thermal image and a contour image into a visible image in the third embodiment.
FIG. 12 is a block diagram schematically showing the configuration of a posture estimation device according to the third embodiment.
Embodiment 1.
 FIG. 1 is a block diagram schematically showing the configuration of a posture estimation system 100 according to the first embodiment.
 The posture estimation system 100 includes a learning device 110 that functions as a model generation device, and a posture estimation device 130 that functions as a utilization device. The processing method performed by the posture estimation device 130 is a utilization method.
 In the posture estimation system 100, the posture estimation device 130 estimates a posture by using a trained model trained by the learning device 110.
 FIG. 2 is a block diagram schematically showing the configuration of the learning device 110.
 The learning device 110 includes a learning-side input unit 111, a learning-side data acquisition unit 112, a model generation unit 113, a learning-side trained model storage unit 114, and a learning-side communication unit 115.
 The learning-side input unit 111 is an input unit that accepts input of learning data. The input learning data is given to the learning-side data acquisition unit 112.
 Here, the learning data is teacher data indicating combinations of a thermal image and, as the correct answer to be inferred from that thermal image, a visible image.
 The thermal image is acquired by imaging the temperature distribution of the subject by using infrared rays emitted from the subject.
 The visible image is acquired by imaging the subject by using visible light reflected from the subject. In the visible image, the appearance of the subject is imaged.
 The learning-side data acquisition unit 112 is a data acquisition unit that acquires the learning data via the learning-side input unit 111. The acquired learning data is given to the model generation unit 113.
 The model generation unit 113 learns the visible image corresponding to the thermal image on the basis of the learning data given from the learning-side data acquisition unit 112. In other words, the model generation unit 113 generates a trained model for inferring the optimum visible image corresponding to the thermal image by learning the combinations of thermal images and visible images indicated by the learning data. Specifically, the model generation unit 113 generates a trained model for inferring a visible image from a thermal image by learning the inference from the thermal image to the visible image using the learning data.
 The model generation unit 113 then stores the generated trained model in the learning-side trained model storage unit 114 as a learning-side trained model.
 As the learning algorithm used by the model generation unit 113, a known algorithm such as supervised learning, unsupervised learning, or reinforcement learning can be used. As an example, a case where a neural network is applied will be described here.
 In the case of supervised learning, the thermal image and the visible image indicated by the learning data need to be a paired set of data capturing the same subject. In the case of unsupervised learning, the thermal image and the visible image do not need to capture the same subject.
 The model generation unit 113 learns the visible image corresponding to the thermal image by so-called supervised learning, for example, in accordance with a neural network model.
 Here, supervised learning is a technique in which sets of input and result (label) data are given to a learning device as learning data, the features present in the learning data are learned, and the result is inferred from the input.
 A neural network is composed of an input layer consisting of a plurality of neurons, an intermediate layer (hidden layer) consisting of a plurality of neurons, and an output layer consisting of a plurality of neurons. There may be one intermediate layer, or two or more.
 FIG. 3 is a schematic diagram showing an example of a three-layer neural network.
 As shown in FIG. 3, in a three-layer neural network, when a plurality of input values are input to the input layer X1 to X3, the input values are multiplied by first weights w11 to w16 (hereinafter also referred to as the first weights W1). The calculated values, which are the values obtained by multiplying the input values by the first weights w11 to w16, are input to the intermediate layer Y1, Y2. The calculated values are multiplied by second weights w21 to w26 (hereinafter also referred to as the second weights W2), and the output values, which are the values obtained by multiplying the calculated values by the second weights w21 to w26, are output from the output layer Z1 to Z3. These output values vary depending on the values of the first weights W1 and the second weights W2.
 In the present embodiment, the neural network learns a trained model for inferring the optimum visible image corresponding to a thermal image by so-called supervised learning, in accordance with learning data created on the basis of the combinations of thermal images and visible images indicated by the learning data acquired by the learning-side data acquisition unit 112.
 That is, the neural network learns the trained model by inputting the thermal image to the input layer and adjusting the first weights W1 and the second weights W2 so that the result output from the output layer approaches the visible image serving as the correct answer.
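 The two-stage weighting described above can be sketched as a pair of matrix multiplications. This is an illustration only; the concrete weight values, the array shapes, and the use of NumPy are assumptions and not part of the disclosure:

```python
import numpy as np

# First weights W1 (w11..w16): map the 3 input neurons X1-X3
# to the 2 intermediate neurons Y1, Y2.
W1 = np.array([[0.1, 0.2, 0.3],
               [0.4, 0.5, 0.6]])          # shape (2, 3)
# Second weights W2 (w21..w26): map the 2 intermediate neurons
# to the 3 output neurons Z1-Z3.
W2 = np.array([[0.1, 0.2],
               [0.3, 0.4],
               [0.5, 0.6]])               # shape (3, 2)

x = np.array([1.0, 2.0, 3.0])             # input values at X1-X3
y = W1 @ x                                # calculated values at Y1, Y2
z = W2 @ y                                # output values at Z1-Z3
print(y.shape, z.shape)
```

Training then amounts to adjusting W1 and W2 so that z approaches the correct answer for each input.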
 FIG. 4 is a schematic diagram showing an example of the structure of a trained model for the image conversion processing that converts a thermal image into a visible image.
 In the trained model shown in FIG. 4, the layers of the encoder portion and the layers of the decoder portion form a symmetric structure, and the model has a U-Net structure in which they are connected by skip connections.
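 The skip-connection idea of the U-Net structure can be illustrated in a deliberately reduced one-dimensional form: an encoder that downsamples, a decoder that upsamples, and a skip connection that concatenates an encoder feature into the decoder path. The pooling, upsampling, and fusion choices below are assumptions for illustration and omit the convolution layers of a real U-Net:

```python
import numpy as np

def downsample(x):
    """Encoder step: halve the resolution by average pooling."""
    return x.reshape(-1, 2).mean(axis=1)

def upsample(x):
    """Decoder step: double the resolution by nearest-neighbour repetition."""
    return np.repeat(x, 2)

def unet_like(x):
    # Encoder path.
    e1 = x                            # full-resolution feature
    e2 = downsample(e1)               # bottleneck at half resolution
    # Decoder path with a skip connection: the encoder feature e1 is
    # concatenated with the upsampled bottleneck, mirroring the
    # symmetric encoder/decoder layers connected in FIG. 4.
    d1 = upsample(e2)
    skip = np.concatenate([d1, e1])   # skip connection, encoder -> decoder
    # Placeholder fusion of the two concatenated halves.
    n = len(e1)
    return (skip[:n] + skip[n:]) / 2.0

thermal = np.array([0.0, 1.0, 2.0, 3.0])
visible_estimate = unet_like(thermal)
print(visible_estimate)
```

The skip connection lets fine detail from the encoder bypass the bottleneck, which is why the output still tracks the input sample by sample.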
 Returning to FIG. 2, the learning-side trained model storage unit 114 stores the learning-side trained model, which is the trained model given from the model generation unit 113.
 The learning-side communication unit 115 sends the learning-side trained model stored in the learning-side trained model storage unit 114 to the posture estimation device 130.
 The learning device 110 described above can be realized by a computer 160 as shown in FIG. 5.
 FIG. 5 is a block diagram schematically showing the configuration of the computer 160.
 The computer 160 includes a communication device 161, an auxiliary storage device 162, a memory 163, and a processor 164.
 The communication device 161 communicates data, for example, via a network.
 The auxiliary storage device 162 stores the data and programs necessary for processing in the computer 160.
 The memory 163 temporarily stores programs and data, and provides a work area for the processor 164.
 The processor 164 reads a program stored in the auxiliary storage device 162 into the memory 163 and executes the program, thereby executing the processing in the computer 160.
 The learning-side input unit 111 and the learning-side communication unit 115 described above can be realized by the communication device 161.
 The learning-side trained model storage unit 114 can be realized by the auxiliary storage device 162.
 The learning-side data acquisition unit 112 and the model generation unit 113 can be realized by the processor 164 executing a program read into the memory 163. Such a program may be provided through a network, or may be recorded on a recording medium and provided. That is, such a program may be provided, for example, as a program product.
 FIG. 6 is a flowchart showing a process in which the learning device 110 learns.
 First, the learning-side data acquisition unit 112 acquires the learning data via the learning-side input unit 111 (S10). Here, it is assumed that the thermal image data, which is the image data of the thermal image, and the visible image data, which is the image data of the visible image, both used as the learning data, are acquired at the same time; however, the first embodiment is not limited to such an example. As long as the thermal image data can be associated with the visible image data used as its correct answer, the two may be acquired at different times. The acquired learning data is given to the model generation unit 113.
 Next, the model generation unit 113 learns the visible image, which is the output corresponding to the thermal image, by so-called supervised learning on the basis of the combinations of thermal images and visible images indicated by the learning data, and generates a trained model (S11).
 Next, the learning-side trained model storage unit 114 stores the generated trained model (S12). The learning-side communication unit 115 then transmits the trained model to the posture estimation device 130.
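 The flow of steps S10 to S12 can be sketched with a deliberately simplified stand-in for the neural network: a single linear layer trained by gradient descent. The data shapes, the learning rate, and the use of NumPy are illustrative assumptions and not part of the disclosure:

```python
import numpy as np

rng = np.random.default_rng(0)

# S10: acquire learning data - paired (thermal, visible) feature vectors
# capturing the same subjects, as supervised learning requires.
true_map = np.array([[2.0, 0.0], [0.0, -1.0]])   # unknown mapping to recover
thermal_data = rng.normal(size=(100, 2))         # stand-in thermal features
visible_data = thermal_data @ true_map.T         # stand-in correct answers

# S11: learn the inference from thermal image to visible image by
# adjusting the weights so the output approaches the correct answer.
W = np.zeros((2, 2))
lr = 0.1
for _ in range(200):
    pred = thermal_data @ W.T
    grad = (pred - visible_data).T @ thermal_data / len(thermal_data)
    W -= lr * grad

# S12: store the generated trained model (here, just the weight matrix).
trained_model = {"weights": W}
print(np.round(W, 3))
```

After training, W approximates the underlying thermal-to-visible mapping, which is the role the U-Net weights play in the actual system.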
 FIG. 7 is a block diagram schematically showing the configuration of the posture estimation device 130.
 The posture estimation device 130 includes an inference device 140 and a posture estimation execution device 150 that functions as a posture estimation unit.
 The inference device 140 infers a visible image from a thermal image by using the trained model given from the learning device 110 as an inference-side trained model.
 The inference device 140 includes an inference-side communication unit 141, an inference-side trained model storage unit 142, an inference-side input unit 143, an inference-side data acquisition unit 144, and an inference unit 145.
 The inference-side communication unit 141 receives the trained model from the learning device 110, and stores the trained model in the inference-side trained model storage unit 142 as the inference-side trained model.
 The inference-side trained model storage unit 142 is a storage unit that stores the inference-side trained model.
 The inference-side input unit 143 is an input unit that accepts input of thermal image data indicating a thermal image of a subject. The thermal image data input here is also referred to as target thermal image data. The thermal image indicated by the target thermal image data is also referred to as a target thermal image, and the subject included in the target thermal image, whose posture is to be estimated, is also referred to as a target subject.
 The inference-side data acquisition unit 144 is a data acquisition unit that acquires the target thermal image data via the inference-side input unit 143. The acquired target thermal image data is given to the inference unit 145.
 The inference unit 145 infers a visible image of the target subject from the thermal image indicated by the target thermal image data, by using the inference-side trained model stored in the inference-side trained model storage unit 142. In other words, by inputting the thermal image indicated by the target thermal image data into the inference-side trained model, the inference unit 145 can obtain the visible image corresponding to that thermal image, inferred from it. The inference unit 145 then generates visible image data indicating the inferred visible image, and gives the visible image data to the posture estimation execution device 150. The visible image data generated here is also referred to as target visible image data. The visible image indicated by the target visible image data, in other words, the inferred visible image, is also referred to as a target visible image.
 The posture estimation execution device 150 estimates, from the visible image indicated by the target visible image data, the posture of the subject present in that visible image. One method of estimating the posture is to learn in advance a large number of correspondences between visible images and human postures (for example, the positional relationships of body parts), and, when a visible image is input, to determine the human posture corresponding to that visible image on the basis of the learning result.
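 The lookup-style approach just described can be sketched as a nearest-neighbour search over previously learned (visible-image feature, posture) correspondences. The feature vectors, the posture labels, and the use of NumPy are illustrative assumptions, not the disclosed implementation:

```python
import numpy as np

# Correspondences learned in advance: visible-image feature vectors
# paired with posture labels (e.g. derived from body-part positions).
known_features = np.array([[0.0, 0.0],
                           [1.0, 0.0],
                           [0.0, 1.0]])
known_postures = ["standing", "sitting", "lying"]

def estimate_posture(visible_feature):
    """Return the posture whose learned visible-image feature is closest
    to the input feature."""
    dists = np.linalg.norm(known_features - visible_feature, axis=1)
    return known_postures[int(np.argmin(dists))]

print(estimate_posture(np.array([0.9, 0.1])))
```

A production pose estimator for visible images would of course use a far richer model, but the input/output contract (visible image in, posture out) is the same.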
 The posture estimation device 130 described above can also be realized by a computer 160 as shown in FIG. 5.
 For example, the inference-side communication unit 141 and the inference-side input unit 143 can be realized by the communication device 161.
 The inference-side trained model storage unit 142 can be realized by the auxiliary storage device 162.
 The inference-side data acquisition unit 144 and the inference unit 145 can be realized by the processor 164 executing a program read into the memory 163. Such a program may be provided through a network, or may be recorded on a recording medium and provided. That is, such a program may be provided, for example, as a program product.
 FIG. 8 is a flowchart showing a process in which the posture estimation device 130 infers a visible image corresponding to a thermal image and estimates a posture from the visible image.
 First, the inference-side data acquisition unit 144 acquires target thermal image data indicating a thermal image via the inference-side input unit 143 (S20). The acquired target thermal image data is given to the inference unit 145.
 Next, the inference unit 145 inputs the thermal image indicated by the target thermal image data into the inference-side trained model stored in the inference-side trained model storage unit 142, and obtains the visible image corresponding to that thermal image (S21).
 Next, the inference unit 145 generates target visible image data indicating the visible image corresponding to the thermal image obtained by the inference-side trained model, and gives the target visible image data to the posture estimation execution device 150 (S22).
 Next, the posture estimation execution device 150 estimates the posture of the subject in the visible image indicated by the target visible image data (S23). On the basis of the posture estimated in this way, it is possible, for example, to detect abnormal behavior of the subject appearing in the thermal image.
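 The flow of steps S20 to S23 can be sketched end to end as follows. The linear stand-in for the inference-side trained model and the toy posture rule are assumptions for illustration only:

```python
import numpy as np

# Inference-side trained model (stand-in): a weight matrix assumed to
# have been learned to map thermal features to visible features.
inference_side_model = np.array([[2.0, 0.0], [0.0, -1.0]])

def infer_visible(thermal_feature):
    """S21: infer the visible image (feature) from the thermal image."""
    return inference_side_model @ thermal_feature

def estimate_posture(visible_feature):
    """S23: placeholder posture estimator for visible images."""
    return "upright" if visible_feature[0] >= 0 else "fallen"

# S20: acquire the target thermal image data.
target_thermal = np.array([0.5, 1.0])
# S21-S22: infer the target visible image and hand it to the
# posture estimation execution stage.
target_visible = infer_visible(target_thermal)
# S23: estimate the posture of the target subject.
posture = estimate_posture(target_visible)
print(target_visible, posture)   # prints the inferred feature and posture
```

The key point of the embodiment survives the simplification: the posture estimator only ever sees visible-image data, so an existing estimator trained on visible images can be reused unchanged.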
 As described above, according to the posture estimation system 100 of the first embodiment, a thermal image output from a thermal image sensor or the like is converted into a visible image, and the posture of the subject in the thermal image can be estimated by using the posture estimation execution device 150, which is a trained posture estimator for visible images. Therefore, posture estimation becomes possible using an existing trained posture estimator for visible images.
 In addition, when a posture estimator for thermal images is used, the relationship between thermal images and postures must be learned, which requires annotating thermal images with postures. Manual annotation of thermal images cannot be performed with sufficient accuracy because of the insufficient resolution of thermal images. In the first embodiment, since there is no need to use a posture estimator for thermal images, these problems can be avoided.
Although the first embodiment has been described for the case where supervised learning is applied as the learning algorithm used by the model generation unit 113, the first embodiment is not limited to this example. For instance, reinforcement learning, unsupervised learning, or semi-supervised learning may be used as the learning algorithm instead of supervised learning.
The model generation unit 113 may also learn the visible images corresponding to thermal images according to training data created for a plurality of posture estimation devices including the posture estimation device 130. The model generation unit 113 may acquire training data from a plurality of posture estimation devices used in the same area, or it may learn the visible images corresponding to thermal images using training data collected from a plurality of posture estimation devices operating independently in different areas.
The model generation unit 113 can also, partway through, add a posture estimation device from which training data is collected to the targets, or remove one from the targets.
Furthermore, the model generation unit 113 may apply a trained model that has learned the visible images corresponding to thermal images for one posture estimation device to a different posture estimation device, relearn the visible images corresponding to thermal images for that other posture estimation device, and thereby update the trained model.
Deep learning, which learns to extract the features themselves, can also be used as the learning algorithm of the model generation unit 113. The model generation unit 113 may also perform machine learning according to other known methods, such as genetic programming, functional logic programming, or support vector machines.
The learning device 110 and the inference device 140 are used to learn the visible images corresponding to the thermal images of the posture estimation system 100, but they may, for example, be connected to the posture estimation execution device 150 via a network.
The learning device 110, the inference device 140, or the posture estimation execution device 150 may also reside on a cloud server.
In the posture estimation system 100 of the first embodiment described above, the learning device 110 and the posture estimation device 130 are separate devices, but the learning device 110 may, for example, be provided inside the posture estimation device 130. In that case, the learning-side communication unit 115 and the inference-side communication unit 141 become unnecessary, and the learning-side trained model storage unit 114 and the inference-side trained model storage unit 142 can be integrated into a single trained model storage unit.
In the posture estimation system 100 of the first embodiment, the posture estimation device 130 infers the visible image corresponding to a thermal image using the trained model generated by the learning device 110, but the embodiment is not limited to this example. For instance, the posture estimation device 130 may acquire a trained model from outside, such as from another system, and infer the visible image corresponding to a thermal image based on that trained model.
Embodiment 2.
As shown in FIG. 1, the posture estimation system 200 according to the second embodiment includes a learning device 210 and a posture estimation device 230.
As shown in FIG. 2, the learning device 210 in the second embodiment includes a learning-side input unit 111, a learning-side data acquisition unit 212, a model generation unit 213, a learning-side trained model storage unit 114, and a learning-side communication unit 115.
The learning-side input unit 111, learning-side trained model storage unit 114, and learning-side communication unit 115 of the learning device 210 in the second embodiment are the same as the learning-side input unit 111, learning-side trained model storage unit 114, and learning-side communication unit 115 of the learning device 110 in the first embodiment.
The learning-side data acquisition unit 212 acquires training data via the learning-side input unit 111. The training data acquired in the second embodiment includes thermal image data representing a thermal image, visible image data representing the visible image that is the correct answer corresponding to that thermal image, and posture information indicating the posture of the subject, which is the correct answer corresponding to that visible image. The acquired training data is supplied to the model generation unit 213.
The model generation unit 213 learns the visible image corresponding to a thermal image and the posture corresponding to that visible image, based on the training data supplied from the learning-side data acquisition unit 212. In other words, the model generation unit 213 learns the combinations of thermal image and visible image, and of visible image and posture, indicated by the training data, thereby generating a trained model for inferring the optimal posture corresponding to a thermal image. Specifically, the model generation unit 213 uses the training data to learn the inference from thermal image to visible image and the inference from visible image to posture, thereby generating a trained model for inferring the posture from a thermal image.
The model generation unit 213 then stores the generated trained model in the learning-side trained model storage unit 114 as the learning-side trained model.
FIG. 9 is a block diagram schematically showing the configuration of the posture estimation device 230 according to the second embodiment.
The posture estimation device 230 includes an inference-side communication unit 141, an inference-side trained model storage unit 142, an inference-side input unit 143, an inference-side data acquisition unit 144, and an inference unit 245.
The inference-side communication unit 141, inference-side trained model storage unit 142, inference-side input unit 143, and inference-side data acquisition unit 144 of the posture estimation device 230 in the second embodiment are the same as the inference-side communication unit 141, inference-side trained model storage unit 142, inference-side input unit 143, and inference-side data acquisition unit 144 of the posture estimation device 130 in the first embodiment.
The inference unit 245 uses the inference-side trained model stored in the inference-side trained model storage unit 142 to infer a visible image from the thermal image represented by the target thermal image data, and then to infer the posture from that visible image. In other words, the inference unit 245 inputs the thermal image represented by the target thermal image data into the inference-side trained model, and thereby estimates the posture, inferred from that thermal image, of the subject present in it.
The posture estimation device 230 described above can also be realized by a computer 160 such as the one shown in FIG. 5.
For example, the inference-side communication unit 141 and the inference-side input unit 143 can be realized by the communication device 161.
The inference-side trained model storage unit 142 can be realized by the auxiliary storage device 162.
The inference-side data acquisition unit 144 and the inference unit 245 can be realized by the processor 164 executing a program read into the memory 163. Such a program may be provided through a network, or may be recorded on a recording medium and provided. That is, such a program may be provided, for example, as a program product.
As described above, according to the posture estimation system 200 of the second embodiment, the posture of a subject can be estimated from a thermal image output from a thermal image sensor or the like. By inputting visible images and postures as teacher data during learning, the work of annotating thermal images with posture information can be avoided.
Furthermore, unlike the first embodiment, no visible image is generated or output in the utilization phase, which keeps the network small and reduces the amount of computation.
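The contrast with embodiment 1 can be sketched as follows. Again this is only an illustrative stub, with a hypothetical function name: in embodiment 2 the trained model maps the thermal image directly to a posture, so the intermediate visible image never exists in the utilization phase.

```python
# Illustrative contrast with embodiment 1: in embodiment 2 a single trained
# model maps the thermal image directly to a posture. The stub below only
# shows the data flow -- one forward pass, no visible image generated or
# output -- and is not the patent's actual learned network.

def thermal_to_posture(thermal_image):
    """Stub for the embodiment-2 inference-side trained model."""
    # Directly locate the warmest pixel as a single "keypoint"; no visible
    # image is materialized along the way.
    best = max((v, (y, x))
               for y, row in enumerate(thermal_image)
               for x, v in enumerate(row))
    return {"keypoints": [best[1]]}

thermal_image = [[0.1, 0.2, 0.1],
                 [0.2, 0.9, 0.2],
                 [0.1, 0.2, 0.1]]

posture = thermal_to_posture(thermal_image)  # one pass, no visible image
```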
Embodiment 3.
As shown in FIG. 1, the posture estimation system 300 according to the third embodiment includes a learning device 310 and a posture estimation device 330.
FIG. 10 is a block diagram schematically showing the configuration of the learning device 310.
The learning device 310 includes a learning-side input unit 111, a learning-side data acquisition unit 312, a model generation unit 313, a learning-side trained model storage unit 114, a learning-side communication unit 115, and a learning-side contour extraction unit 316.
The learning-side input unit 111, learning-side trained model storage unit 114, and learning-side communication unit 115 of the learning device 310 in the third embodiment are the same as the learning-side input unit 111, learning-side trained model storage unit 114, and learning-side communication unit 115 of the learning device 110 in the first embodiment.
The learning-side data acquisition unit 312 acquires training data via the learning-side input unit 111. The acquired training data is supplied to the model generation unit 313.
The learning-side data acquisition unit 312 also supplies the thermal image data representing the thermal image included in the acquired training data to the learning-side contour extraction unit 316 as learning-side thermal image data.
The learning-side contour extraction unit 316 is a contour extraction unit that extracts, from the thermal image represented by the learning-side thermal image data, a contour image showing the contour of the subject. Possible extraction methods include edge detection processing such as the Canny method or the Sobel method, and a combination of binarization processing and edge detection. Edge detection processing detects the edges of the subject. In the combination of binarization and edge detection, the edge detection processing is performed after binarization processing has been applied to the thermal image. The learning-side contour extraction unit 316 then supplies contour image data representing the extracted contour image to the model generation unit 313 as learning-side contour image data.
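The two extraction options named above can be sketched in a few lines of pure Python. The 3x3 Sobel kernels are the standard ones; the gradient threshold, the binarization threshold, and the tiny 5x5 "thermal image" are illustrative assumptions, not values from this publication, and a real implementation would typically use a library routine (e.g., an OpenCV Canny or Sobel call) instead.

```python
# Hedged sketch of the two contour-extraction options described above:
# (a) Sobel edge detection applied directly to the thermal image, and
# (b) binarization of the thermal image followed by edge detection.

def sobel_edges(img, thresh=2.0):
    """Return a binary edge map from the Sobel gradient magnitude."""
    h, w = len(img), len(img[0])
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # horizontal gradient kernel
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # vertical gradient kernel
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(kx[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(ky[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            edges[y][x] = 1 if (gx * gx + gy * gy) ** 0.5 >= thresh else 0
    return edges

def binarize(img, thresh):
    """Binarization step: separate warm subject pixels from the background."""
    return [[1.0 if v >= thresh else 0.0 for v in row] for row in img]

# A 5x5 toy thermal image: a warm 3x3 subject on a cool background.
thermal = [
    [0.1, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.9, 0.9, 0.9, 0.1],
    [0.1, 0.9, 0.9, 0.9, 0.1],
    [0.1, 0.9, 0.9, 0.9, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.1],
]

contour_direct = sobel_edges(thermal)                    # (a) edges only
contour_binarized = sobel_edges(binarize(thermal, 0.5))  # (b) binarize first
```

Either way, the result is a binary image that responds at the subject's boundary and stays zero in its uniformly warm interior, which is exactly the contour information the model generation unit 313 consumes.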
The model generation unit 313 learns the visible image corresponding to a thermal image, based on the training data supplied from the learning-side data acquisition unit 312 and the learning-side contour image data supplied from the learning-side contour extraction unit 316. In other words, the model generation unit 313 learns the combination of the thermal image indicated by the training data and the contour image indicated by the learning-side contour image data with the visible image indicated by the training data, thereby generating a trained model for inferring the optimal visible image corresponding to the thermal image and its contour image. Specifically, the model generation unit 313 learns the inference from the combination of thermal image and contour image to the visible image, thereby generating a trained model for inferring a visible image from the combination of a thermal image and a contour image.
The model generation unit 313 then stores the generated trained model in the learning-side trained model storage unit 114 as the learning-side trained model.
FIG. 11 is a schematic diagram showing an example of the structure of a trained model for the image conversion process that converts a thermal image and a contour image into a visible image in the third embodiment.
The trained model shown in FIG. 11 has a U-Net structure in which the layers of the decoder portion and the layers of the encoder portion form a symmetrical structure and are connected by skip connections. The decoder portion has two parallel paths: one for decoding the thermal image and one for decoding the contour image.
Thus, in the trained model shown in FIG. 11, the decoder portion consists of two parallel paths, one decoding the thermal image and the other decoding the contour image. The two vectors of information decoded at the central layer of the model are concatenated, and the concatenated information is input to the encoder portion.
With this structure, in the third embodiment, the visible image converted from the thermal image contains more edge components, and the accuracy of posture estimation can be improved.
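The two-path data flow described above can be sketched as follows. This is a shape-level sketch only: the "layers" are placeholder downsampling and upsampling functions invented here for illustration, not the learned convolutions of the FIG. 11 model, and the terminology (two paths feeding a concatenation at the central layer) follows the description above.

```python
# Minimal data-flow sketch of the two-path structure: the thermal image and
# the contour image are processed by two parallel paths, their feature
# vectors are concatenated at the central layer, and the concatenated vector
# is carried through to produce an image-sized output.

def downsample(vec):
    """Placeholder path layer: halve the resolution by pairwise averaging."""
    return [(vec[i] + vec[i + 1]) / 2 for i in range(0, len(vec) - 1, 2)]

def upsample(vec, length):
    """Placeholder output layer: stretch the vector back to `length`."""
    return [vec[int(i * len(vec) / length)] for i in range(length)]

def two_path_forward(thermal_vec, contour_vec):
    # Each path independently reduces its input to a feature vector.
    t_feat = downsample(downsample(thermal_vec))
    c_feat = downsample(downsample(contour_vec))
    # Central layer: concatenate the two decoded feature vectors.
    merged = t_feat + c_feat
    # The remaining layers reconstruct an output at the input resolution.
    return upsample(merged, len(thermal_vec))

thermal_vec = [0.1, 0.2, 0.9, 0.9, 0.8, 0.2, 0.1, 0.1]
contour_vec = [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
out = two_path_forward(thermal_vec, contour_vec)
```

The design point the sketch preserves is that the contour information enters as a separate input and is only merged at the central concatenation, so edge cues are not diluted by the blurry thermal signal before that point.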
The learning device 310 described above can also be realized by a computer 160 such as the one shown in FIG. 5.
For example, the learning-side data acquisition unit 312, the model generation unit 313, and the learning-side contour extraction unit 316 can also be realized by the processor 164 executing a program read into the memory 163. Such a program may be provided through a network, or may be recorded on a recording medium and provided. That is, such a program may be provided, for example, as a program product.
FIG. 12 is a block diagram schematically showing the configuration of the posture estimation device 330.
The posture estimation device 330 includes an inference device 340 and a posture estimation execution device 150.
The posture estimation execution device 150 of the posture estimation device 330 in the third embodiment is the same as the posture estimation execution device 150 in the first embodiment.
The inference device 340 includes an inference-side communication unit 141, an inference-side trained model storage unit 142, an inference-side input unit 143, an inference-side data acquisition unit 344, an inference unit 345, and an inference-side contour extraction unit 346.
The inference-side communication unit 141, inference-side trained model storage unit 142, and inference-side input unit 143 of the inference device 340 in the third embodiment are the same as the inference-side communication unit 141, inference-side trained model storage unit 142, and inference-side input unit 143 of the inference device 140 in the first embodiment.
The inference-side data acquisition unit 344 acquires the target thermal image data via the inference-side input unit 143. The inference-side data acquisition unit 344 then supplies the acquired target thermal image data to the inference unit 345 and the inference-side contour extraction unit 346.
The inference-side contour extraction unit 346 is a contour extraction unit that extracts a contour image from the thermal image represented by the target thermal image data. The extraction method is the same as that of the learning-side contour extraction unit 316. The inference-side contour extraction unit 346 then supplies contour image data representing the extracted contour image to the inference unit 345 as inference-side contour image data. The contour image extracted here is also called the target contour image, and the inference-side contour image data is also called the target contour image data.
The inference unit 345 uses the inference-side trained model stored in the inference-side trained model storage unit 142 to infer a visible image from the combination of the thermal image represented by the target thermal image data and the contour image represented by the inference-side contour image data. In other words, by inputting the thermal image represented by the target thermal image data and the contour image represented by the inference-side contour image data into the inference-side trained model, the inference unit 345 can obtain the visible image corresponding to that thermal image, as inferred from it. The inference unit 345 then generates visible image data representing the inferred visible image and supplies that visible image data to the posture estimation execution device 150. The visible image data generated here is also called the target visible image data, and the visible image represented by the target visible image data is also called the target visible image.
The posture estimation device 330 described above can also be realized by a computer 160 such as the one shown in FIG. 5.
For example, the inference-side data acquisition unit 344, the inference unit 345, and the inference-side contour extraction unit 346 can be realized by the processor 164 executing a program read into the memory 163. Such a program may be provided through a network, or may be recorded on a recording medium and provided. That is, such a program may be provided, for example, as a program product.
In general, a thermal image carries ambiguous contour information, so a visible image generated with a trained model also has ambiguous contours. Since contour information is important for posture estimation, the accuracy of posture estimation drops for images with ambiguous contours.
In contrast, according to the posture estimation system 300 of the third embodiment, inputting the thermal image and the contour image into the trained model simultaneously makes it possible to generate a visible image whose contours are not ambiguous. This improves the accuracy of posture estimation from the generated visible image, compared with inputting the thermal image alone into the trained model.
100, 200, 300 posture estimation system; 110, 210, 310 learning device; 111 learning-side input unit; 112, 212, 312 learning-side data acquisition unit; 113, 213, 313 model generation unit; 114 learning-side trained model storage unit; 115 learning-side communication unit; 316 learning-side contour extraction unit; 130, 230, 330 posture estimation device; 140, 340 inference device; 141 inference-side communication unit; 142 inference-side trained model storage unit; 143 inference-side input unit; 144, 344 inference-side data acquisition unit; 145, 245, 345 inference unit; 346 inference-side contour extraction unit; 150 posture estimation execution device.

Claims (26)

1.  A learning device comprising:
    a data acquisition unit configured to acquire training data including a thermal image obtained by imaging the temperature distribution of a subject using infrared rays emitted from the subject, and a visible image obtained by imaging the subject using visible light reflected from the subject; and
    a model generation unit configured to generate a trained model for inferring the visible image from the thermal image, by learning the inference from the thermal image to the visible image using the training data.
2.  The learning device according to claim 1, wherein the trained model has a U-Net structure in which the layers of a decoder portion and the layers of an encoder portion form a symmetrical structure and are connected by skip connections.
3.  A learning device comprising:
    a data acquisition unit configured to acquire training data including a thermal image obtained by imaging the temperature distribution of a subject using infrared rays emitted from the subject, a visible image obtained by imaging the subject using visible light reflected from the subject, and posture information indicating the posture of the subject; and
    a model generation unit configured to generate a trained model for inferring the posture from the thermal image, by learning the inference from the thermal image to the visible image and the inference from the visible image to the posture using the training data.
4.  A learning device comprising:
    a data acquisition unit configured to acquire training data including a thermal image obtained by imaging the temperature distribution of a subject using infrared rays emitted from the subject, and a visible image obtained by imaging the subject using visible light reflected from the subject;
    a contour extraction unit configured to extract, from the thermal image, a contour image showing the contour of the subject; and
    a model generation unit configured to generate a trained model for inferring the visible image from the combination of the thermal image and the contour image, by learning the inference from the combination of the thermal image and the contour image to the visible image.
5.  The learning device according to claim 4, wherein
    the trained model has a U-Net structure in which the layers of a decoder portion and the layers of an encoder portion form a symmetrical structure and are connected by skip connections,
    the decoder portion has two parallel paths, and
    the two paths are a path for decoding the thermal image and a path for decoding the contour image.
6.  The learning device according to claim 4 or 5, wherein the contour extraction unit extracts the contour image from the thermal image by edge detection processing that detects edges of the subject.
7.  The learning device according to claim 4 or 5, wherein the contour extraction unit extracts the contour image from the thermal image by performing binarization processing on the thermal image and then performing edge detection processing.
8.  A utilization device comprising:
    a storage unit that stores a trained model for inferring a visible image from a thermal image, the trained model having been generated by learning the inference from the thermal image to the visible image using training data including a thermal image obtained by imaging the temperature distribution of a subject using infrared rays emitted from the subject, and a visible image obtained by imaging the subject using visible light reflected from the subject;
    a data acquisition unit configured to acquire target thermal image data representing a target thermal image, which is a thermal image of a target subject;
    an inference unit configured to infer, from the target thermal image, a target visible image, which is a visible image of the target subject, using the trained model; and
    a posture estimation unit configured to estimate the posture of the target subject from the target visible image.
9.  The utilization device according to claim 8, wherein the trained model has a U-Net structure in which the layers of a decoder portion and the layers of an encoder portion form a symmetrical structure and are connected by skip connections.
10.  A utilization device comprising:
    a storage unit that stores a trained model for inferring a posture from a thermal image, the trained model having been generated by learning the inference from the thermal image to the visible image and the inference from the visible image to the posture, using training data including a thermal image obtained by imaging the temperature distribution of a subject using infrared rays emitted from the subject, a visible image obtained by imaging the subject using visible light reflected from the subject, and posture information indicating the posture of the subject;
    a data acquisition unit configured to acquire target thermal image data representing a target thermal image, which is a thermal image of a target subject; and
    an inference unit configured to infer the posture of the target subject from the target thermal image using the trained model.
11.  A utilization device comprising:
    a storage unit that stores a trained model for inferring a visible image from the combination of a thermal image and a contour image, the trained model having been generated by learning the inference from the combination of the thermal image and the contour image to the visible image, using training data including a thermal image obtained by imaging the temperature distribution of a subject using infrared rays emitted from the subject and a visible image obtained by imaging the subject using visible light reflected from the subject, together with contour image data representing a contour image, extracted from the thermal image, showing the contour of the subject;
    a data acquisition unit configured to acquire target thermal image data representing a target thermal image, which is a thermal image of a target subject;
    a contour extraction unit configured to extract, from the target thermal image, a target contour image, which is a contour image showing the contour of the target subject;
    an inference unit configured to infer a target visible image, which is a visible image of the target subject, from the combination of the target thermal image and the target contour image, using the trained model; and
    a posture estimation unit configured to estimate the posture of the target subject from the target visible image.
  12.  The utilization device according to claim 11, wherein the trained model has a U-Net structure in which the layers of the decoder portion and the layers of the encoder portion form a symmetrical structure and are connected by skip connections,
      the decoder portion includes two parallel paths, and
      the two paths are a path for decoding the thermal image and a path for decoding the contour image.
  13.  The utilization device according to claim 11 or 12, wherein the contour extraction unit extracts the contour image from the thermal image by edge detection processing.
  14.  The utilization device according to claim 11 or 12, wherein the contour extraction unit extracts the contour image from the thermal image by performing binarization processing on the thermal image and then performing edge detection processing.
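Claims 13 and 14 describe contour extraction only at the level of "binarization, then edge detection". A minimal NumPy sketch of that two-step pipeline follows; the threshold value and the finite-difference edge detector are illustrative assumptions, since the patent does not fix a particular algorithm.

```python
import numpy as np

def extract_contour(thermal, threshold=30.0):
    """Binarize a thermal image, then detect edges on the binary mask
    (the two-step contour extraction of claim 14)."""
    binary = (thermal > threshold).astype(float)   # 1 where warmer than threshold
    # Edge detection via finite differences: a pixel is an edge wherever the
    # mask changes value relative to its upper or left neighbour.
    dy = np.abs(np.diff(binary, axis=0, prepend=binary[:1]))
    dx = np.abs(np.diff(binary, axis=1, prepend=binary[:, :1]))
    return ((dx + dy) > 0).astype(np.uint8)

# A warm 2x2 block on a cool background yields a contour around the block.
img = np.zeros((6, 6))
img[2:4, 2:4] = 36.5
contour = extract_contour(img)
print(int(contour.sum()))
```

Binarizing first, as claim 14 specifies, makes the subsequent edge detection insensitive to gradual temperature gradients inside the warm region: only the boundary between "above threshold" and "below threshold" survives.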
  15.  A program that causes a computer to function as:
      a data acquisition unit that acquires learning data including a thermal image, obtained by imaging the temperature distribution of a subject with infrared rays radiated from the subject, and a visible image, obtained by imaging the subject with visible light reflected from the subject; and
      a model generation unit that generates a trained model for inferring the visible image from the thermal image by learning inference from the thermal image to the visible image using the learning data.
  16.  A program that causes a computer to function as:
      a data acquisition unit that acquires learning data including a thermal image, obtained by imaging the temperature distribution of a subject with infrared rays radiated from the subject, a visible image, obtained by imaging the subject with visible light reflected from the subject, and posture information indicating the posture of the subject; and
      a model generation unit that generates a trained model for inferring the posture from the thermal image by learning, using the learning data, inference from the thermal image to the visible image and inference from the visible image to the posture.
  17.  A program that causes a computer to function as:
      a data acquisition unit that acquires learning data including a thermal image, obtained by imaging the temperature distribution of a subject with infrared rays radiated from the subject, and a visible image, obtained by imaging the subject with visible light reflected from the subject;
      a contour extraction unit that extracts, from the thermal image, a contour image showing the contour of the subject; and
      a model generation unit that generates a trained model for inferring the visible image from the combination of the thermal image and the contour image by learning inference from the combination of the thermal image and the contour image to the visible image.
  18.  A program that causes a computer to function as:
      a storage unit that stores a trained model for inferring a visible image from a thermal image, the trained model having been generated by learning inference from the thermal image to the visible image using learning data including the thermal image, obtained by imaging the temperature distribution of a subject with infrared rays radiated from the subject, and the visible image, obtained by imaging the subject with visible light reflected from the subject;
      a data acquisition unit that acquires target thermal image data indicating a target thermal image, which is a thermal image of a target subject;
      an inference unit that infers, from the target thermal image, a target visible image, which is a visible image of the target subject, by using the trained model; and
      a posture estimation unit that estimates the posture of the target subject from the target visible image.
  19.  A program that causes a computer to function as:
      a storage unit that stores a trained model for inferring a posture from a thermal image, the trained model having been generated by learning inference from the thermal image to a visible image and inference from the visible image to the posture, using learning data that includes the thermal image, obtained by imaging the temperature distribution of a subject with infrared rays radiated from the subject, the visible image, obtained by imaging the subject with visible light reflected from the subject, and posture information indicating the posture of the subject;
      a data acquisition unit that acquires target thermal image data indicating a target thermal image, which is a thermal image of a target subject; and
      an inference unit that infers the posture of the target subject from the target thermal image by using the trained model.
  20.  A program that causes a computer to function as:
      a storage unit that stores a trained model for inferring a visible image from a combination of a thermal image and a contour image, the trained model having been generated by learning inference from the combination of the thermal image and the contour image to the visible image, using learning data that includes the thermal image, obtained by imaging the temperature distribution of a subject with infrared rays radiated from the subject, and the visible image, obtained by imaging the subject with visible light reflected from the subject, together with contour image data indicating the contour image, which shows the contour of the subject and is extracted from the thermal image;
      a data acquisition unit that acquires target thermal image data indicating a target thermal image, which is a thermal image of a target subject;
      a contour extraction unit that extracts, from the target thermal image, a target contour image, which is a contour image showing the contour of the target subject;
      an inference unit that infers, from the combination of the target thermal image and the target contour image, a target visible image, which is a visible image of the target subject, by using the trained model; and
      a posture estimation unit that estimates the posture of the target subject from the target visible image.
  21.  A learning method comprising:
      acquiring learning data including a thermal image, obtained by imaging the temperature distribution of a subject with infrared rays radiated from the subject, and a visible image, obtained by imaging the subject with visible light reflected from the subject; and
      generating a trained model for inferring the visible image from the thermal image by learning inference from the thermal image to the visible image using the learning data.
  22.  A learning method comprising:
      acquiring learning data including a thermal image, obtained by imaging the temperature distribution of a subject with infrared rays radiated from the subject, a visible image, obtained by imaging the subject with visible light reflected from the subject, and posture information indicating the posture of the subject; and
      generating a trained model for inferring the posture from the thermal image by learning, using the learning data, inference from the thermal image to the visible image and inference from the visible image to the posture.
  23.  A learning method comprising:
      acquiring learning data including a thermal image, obtained by imaging the temperature distribution of a subject with infrared rays radiated from the subject, and a visible image, obtained by imaging the subject with visible light reflected from the subject;
      extracting, from the thermal image, a contour image showing the contour of the subject; and
      generating a trained model for inferring the visible image from the combination of the thermal image and the contour image by learning inference from the combination of the thermal image and the contour image to the visible image.
  24.  A utilization method comprising:
      acquiring target thermal image data indicating a target thermal image, which is a thermal image of a target subject;
      inferring, from the target thermal image, a target visible image, which is a visible image of the target subject, by using a trained model for inferring a visible image from a thermal image, the trained model having been generated by learning inference from the thermal image to the visible image using learning data including the thermal image, obtained by imaging the temperature distribution of a subject with infrared rays radiated from the subject, and the visible image, obtained by imaging the subject with visible light reflected from the subject; and
      estimating the posture of the target subject from the target visible image.
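The three steps of this utilization method can be sketched as a plain function pipeline. Every name here is a hypothetical stand-in: `infer_visible` for the trained thermal-to-visible model and `estimate_pose` for a visible-light posture estimator; neither identifier comes from the patent.

```python
# Hypothetical stand-ins for the patent's trained model and pose estimator.
def infer_visible(thermal_image, model):
    """Step 2: infer the target visible image from the target thermal image."""
    return model(thermal_image)

def estimate_pose(visible_image):
    """Step 3: estimate posture from the inferred visible image.
    A real system would run a visible-light pose estimator here;
    this placeholder returns fixed joint coordinates."""
    return {"head": (0, 0), "torso": (0, 1)}

def utilization_method(thermal_image, model):
    """Claim 24's sequence: thermal image in, posture out, with the
    visible image as the intermediate representation."""
    visible = infer_visible(thermal_image, model)
    return estimate_pose(visible)

identity_model = lambda img: img          # dummy model for illustration only
pose = utilization_method([[0.0]], identity_model)
print(sorted(pose))
```

The point of the intermediate visible image is that posture estimation can reuse estimators trained on ordinary camera data, instead of requiring a pose model trained directly on thermal imagery.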
  25.  A utilization method comprising:
      acquiring target thermal image data indicating a target thermal image, which is a thermal image of a target subject; and
      inferring the posture of the target subject from the target thermal image by using a trained model for inferring a posture from a thermal image, the trained model having been generated by learning inference from the thermal image to a visible image and inference from the visible image to the posture, using learning data that includes the thermal image, obtained by imaging the temperature distribution of a subject with infrared rays radiated from the subject, the visible image, obtained by imaging the subject with visible light reflected from the subject, and posture information indicating the posture of the subject.
  26.  A utilization method comprising:
      acquiring target thermal image data indicating a target thermal image, which is a thermal image of a target subject;
      extracting, from the target thermal image, a target contour image, which is a contour image showing the contour of the target subject;
      inferring, from the combination of the target thermal image and the target contour image, a target visible image, which is a visible image of the target subject, by using a trained model for inferring a visible image from a combination of a thermal image and a contour image, the trained model having been generated by learning inference from the combination of the thermal image and the contour image to the visible image, using learning data that includes the thermal image, obtained by imaging the temperature distribution of a subject with infrared rays radiated from the subject, and the visible image, obtained by imaging the subject with visible light reflected from the subject, together with contour image data indicating the contour image, which shows the contour of the subject and is extracted from the thermal image; and
      estimating the posture of the target subject from the target visible image.
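The "combination of the target thermal image and the target contour image" that this claim feeds to the trained model is commonly realized as channel-wise stacking of the two images. A NumPy sketch under that assumption follows; the patent itself does not prescribe how the two images are combined.

```python
import numpy as np

def combine_inputs(thermal, contour):
    """Stack the thermal image and its contour image along a new channel
    axis, forming a single two-channel model input."""
    return np.stack([thermal, contour], axis=-1)

thermal = np.random.rand(4, 4)
contour = (thermal > 0.5).astype(float)   # stand-in for the extracted contour
x = combine_inputs(thermal, contour)
print(x.shape)  # (4, 4, 2): two input channels, one per modality
```

Stacking keeps the two modalities spatially aligned pixel for pixel, so the first convolution of the model sees temperature and contour evidence for the same location together.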
PCT/JP2020/027027 2020-07-10 2020-07-10 Learning device, utilization device, program, learning method, and utilization method WO2022009419A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2020/027027 WO2022009419A1 (en) 2020-07-10 2020-07-10 Learning device, utilization device, program, learning method, and utilization method
JP2020552066A JP6797344B1 (en) 2020-07-10 2020-07-10 Learning device, utilization device, program, learning method and utilization method


Publications (1)

Publication Number Publication Date
WO2022009419A1 (en)

Family

ID=73646788

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/027027 WO2022009419A1 (en) 2020-07-10 2020-07-10 Learning device, utilization device, program, learning method, and utilization method

Country Status (2)

Country Link
JP (1) JP6797344B1 (en)
WO (1) WO2022009419A1 (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008286725A (en) * 2007-05-21 2008-11-27 Mitsubishi Electric Corp Person detector and detection method
JP2011091523A (en) * 2009-10-21 2011-05-06 Victor Co Of Japan Ltd Shape recognition method and shape recognition device
JP2017220779A (en) * 2016-06-07 2017-12-14 オムロン株式会社 Display control device, display control system, display control method, display control program, and recording medium
JP2019003554A (en) * 2017-06-19 2019-01-10 コニカミノルタ株式会社 Image recognition device, image recognition method, and image recognition device-purpose program
JP2019530116A (en) * 2016-09-05 2019-10-17 ケイロン メディカル テクノロジーズ リミテッド Multimodal medical image processing
JP2020030458A (en) * 2018-08-20 2020-02-27 株式会社デンソーアイティーラボラトリ Inference device, learning method, program and learned model


Non-Patent Citations (1)

Title
YAMASHITA, TAKAYOSHI, "Passage; Illustrated Guide to Deep Learning", in "An Illustrated Guide to Deep Learning, revised second edition", Kodansha Ltd., JP, 19 November 2018, ISBN 978-4-06-513331-6, p. 108, XP009534337 *

Cited By (2)

Publication number Priority date Publication date Assignee Title
US20230377192A1 (en) * 2022-05-23 2023-11-23 Dell Products, L.P. System and method for detecting postures of a user of an information handling system (ihs) during extreme lighting conditions
US11836825B1 (en) * 2022-05-23 2023-12-05 Dell Products L.P. System and method for detecting postures of a user of an information handling system (IHS) during extreme lighting conditions

Also Published As

Publication number Publication date
JPWO2022009419A1 (en) 2022-01-13
JP6797344B1 (en) 2020-12-09

Similar Documents

Publication Publication Date Title
WO2020088588A1 (en) Deep learning-based static three-dimensional method for detecting whether face belongs to living body
JP6946831B2 (en) Information processing device and estimation method for estimating the line-of-sight direction of a person, and learning device and learning method
US8406470B2 (en) Object detection in depth images
JP5493108B2 (en) Human body identification method and human body identification device using range image camera
WO2018163555A1 (en) Image processing device, image processing method, and image processing program
KR20180057096A (en) Device and method to perform recognizing and training face expression
JP5886616B2 (en) Object detection apparatus, method for controlling object detection apparatus, and program
JP6603548B2 (en) Improved data comparison method
CN110909561A (en) Eye state detection system and operation method thereof
JP6797344B1 (en) Learning device, utilization device, program, learning method and utilization method
JP6773825B2 (en) Learning device, learning method, learning program, and object recognition device
JP2007304721A (en) Image processing device and image processing method
JP5300795B2 (en) Facial expression amplification device, facial expression recognition device, facial expression amplification method, facial expression recognition method, and program
JP4011426B2 (en) Face detection device, face detection method, and face detection program
JP2021149687A (en) Device, method and program for object recognition
WO2022244536A1 (en) Work recognition device and work recognition method
JP2009009206A (en) Extraction method of outline inside image and image processor therefor
JP7349290B2 (en) Object recognition device, object recognition method, and object recognition program
Khare et al. Machine vision theory and applications for cyber-physical systems
WO2023119968A1 (en) Method for calculating three-dimensional coordinates and device for calculating three-dimensional coordinates
JP5773935B2 (en) How to classify objects in a scene
JP2009003644A (en) Eye opening degree decision device
JP7124746B2 (en) Partial Object Position Estimation Program, Neural Network Structure for Partial Object Position Estimation, Partial Object Position Estimation Method, and Partial Object Position Estimation Apparatus
KR20210079137A (en) System and method for automatic recognition of user motion
KR20210091033A (en) Electronic device for estimating object information and generating virtual object and method for operating the same

Legal Events

Date Code Title Description
ENP Entry into the national phase (Ref document number: 2020552066; Country of ref document: JP; Kind code of ref document: A)
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20943920; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20943920; Country of ref document: EP; Kind code of ref document: A1)