WO2022162844A1 - Work estimation device, work estimation method, and work estimation program - Google Patents
Work estimation device, work estimation method, and work estimation program
- Publication number
- WO2022162844A1 (PCT/JP2021/003099)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- work
- user
- contact
- degree
- target candidate
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
Definitions
- the present disclosure relates to a work estimation device, a work estimation method, and a work estimation program.
- Patent Document 1 discloses a technique that, in a first-person viewpoint video of a user, detects an object of interest to which the user pays attention by using an object detection result and an attention map, and improves the recognition accuracy of the user's actions by performing action recognition based on a combination of information on the object of interest and objects not being attended to.
- An object of the present disclosure is to improve the accuracy of estimating a user's work by detecting contact between a work object and a target candidate object that is a work target candidate.
- The work estimation device according to the present disclosure includes: a gaze region estimation unit that estimates a gaze region, which is a region where the user gazes, using information indicating the user's line of sight; an object detection unit that detects, from a video showing a work object used by the user and at least one target candidate object that is a candidate for the user's work target, the work object and the at least one target candidate object; a degree-of-contact calculation unit that calculates, based on the gaze region, a degree of contact indicating the degree to which the work object is in contact with each target candidate object included in the at least one target candidate object; and a work estimation unit that estimates the user's work based on the work object and the degree of contact.
- The degree-of-contact calculation unit obtains the degree of contact between the work object and each target candidate object based on the region where the user gazes, and the work estimation unit estimates the user's work based on the degree of contact. According to the present disclosure, it is therefore possible to improve the accuracy of estimating the user's work by detecting contact between the work object and a target candidate object, which is a candidate for the work target.
- FIG. 1 shows a configuration example of the work estimation system 90 and a software configuration example of the work estimation device 200 according to Embodiment 1.
- FIG. 2 shows a hardware configuration example of the work estimation device 200 according to Embodiment 1.
- FIG. 4 is a diagram for explaining the processing of the contact degree calculation unit 230 according to Embodiment 1.
- FIG. 9 shows a specific example of the learning data D1 according to Embodiment 1.
- FIG. 10 shows a configuration example of the learning device 400 according to Embodiment 1.
- FIG. 11 is a flowchart showing the operation of the learning device 400 according to Embodiment 1.
- FIG. 12 shows a software configuration example of the work estimation device 200 according to a modification of Embodiment 1.
- FIG. 13 is a flowchart showing the operation of the work estimation device 200 according to the modification of Embodiment 1.
- FIG. 14 shows a hardware configuration example of the work estimation device 200 according to the modification of Embodiment 1.
- FIG. 1 shows an example configuration of the work estimation system 90 and an example software configuration of the work estimation device 200 .
- a work estimation system 90 includes a work estimation device 200 , an imaging device 300 , and a line-of-sight measurement device 350 .
- the black circles in the figure indicate that the lines touching the black circles are connected to each other. If no black circle is shown where the lines intersect, the lines are not touching each other.
- the imaging device 300 is a device that captures the state of the user's work, and is a camera as a specific example.
- a user is a target whose work is estimated by the work estimation system 90 .
- the user may check the details of the work or take a break, etc., without performing the work.
- The user does not have to be a person; it may be, for example, a robot.
- the imaging device 300 transmits the image captured by the imaging device 300 to the work estimation device 200 as a captured image.
- the captured image may be a moving image or one or more still images.
- the captured image may be an RGB (Red-Green-Blue) image, a depth image, or both.
- the imaging device 300 may consist of multiple devices.
- the line-of-sight measurement device 350 is a device that measures the user's line of sight, and as a specific example, it is a device that includes a camera and is worn by the user on the head.
- the line-of-sight measurement device 350 transmits line-of-sight measurement information indicating the result of measuring the user's line of sight to the work estimation device 200 .
- the line-of-sight measurement device 350 may consist of a plurality of devices.
- the work estimation device 200 does not have to be directly connected to at least one of the imaging device 300 and the line-of-sight measurement device 350 .
- The work estimation device 200 may be connected to an external recording device, such as a recorder, that records the data transmitted by at least one of the imaging device 300 and the line-of-sight measurement device 350, and may receive information reproduced from the pre-recorded data.
- the work estimation device 200 includes an object detection unit 210 , a gaze area estimation unit 220 , a contact degree calculation unit 230 , a work estimation unit 250 , and an estimation result storage unit 260 .
- the work estimation device 200 estimates the work performed by the user based on information from the imaging device 300 and the line-of-sight measurement device 350 .
- the object detection unit 210 detects objects and includes a work object detection unit 211 and a candidate object detection unit 215 .
- An object is a general term for target candidate objects and working objects.
- a target candidate object is a candidate for an object that is a user's work target.
- a work object is an object that a user uses in his or her work, such as the user's hand, the tool the user is using, or both.
- a work object may consist of multiple objects, such as a user's two hands or a user's hand and a tool.
- The object detection unit 210 detects the work object and the at least one target candidate object from a video showing the work object used by the user and at least one target candidate object that is a candidate for the user's work target. In this context, images and videos are sometimes used synonymously.
- the work object detection unit 211 detects work objects.
- the working object detection unit 211 includes a tool detection unit 212 and a hand detection unit 213 .
- the tool detection unit 212 detects tools used by the user based on the captured image.
- the hand detection unit 213 detects the user's hand based on the captured image.
- the candidate object detection unit 215 detects target candidate objects.
- the candidate object detector 215 is also called a target object detector.
- The gaze region estimation unit 220 estimates the gaze region using the information indicating the user's line of sight measured by the line-of-sight measurement device 350.
- a gaze area is an area where a user gazes.
- The gaze region may be a two-dimensional distribution with an arbitrary shape, a distribution having its maximum value at the viewpoint position, or a heat map calculated using a preset distribution.
- the viewpoint position is the position indicated by the line-of-sight measurement information, and is the position of the user's viewpoint.
- the gaze area estimation unit 220 may estimate the gaze area using time-series data indicating the position of the user's viewpoint.
- The gaze region may be an area determined according to the distance between the position of each target candidate object and the viewpoint position; as a specific example, it is an area in which that distance is within a predetermined range.
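- As one way to make the gaze distribution concrete, the sketch below models the gaze distribution as a two-dimensional Gaussian centered at the measured viewpoint position. The patent leaves the exact distribution open, so the Gaussian shape, the spread sigma, and the threshold used to cut out the gaze region G are assumptions for illustration (Python).

```python
import numpy as np

def gaze_heatmap(viewpoint, frame_shape, sigma=40.0):
    """Gaze distribution Gf over the field-of-view image.

    viewpoint   : (x, y) viewpoint position from the line-of-sight measurement
    frame_shape : (height, width) of the field-of-view image
    sigma       : spread of the gaze region in pixels (assumed value)
    """
    h, w = frame_shape
    ys, xs = np.mgrid[0:h, 0:w]
    vx, vy = viewpoint
    # Highest at the viewpoint, decaying toward the edge of the gaze region.
    return np.exp(-((xs - vx) ** 2 + (ys - vy) ** 2) / (2.0 * sigma ** 2))

def gaze_region(gf, threshold=0.1):
    """Gaze region G: pixels where the gaze distribution exceeds a threshold."""
    return gf >= threshold
```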
- the degree-of-contact calculation unit 230 calculates a degree of contact indicating the degree of contact between the working object and each candidate object included in at least one candidate object, based on the region of interest.
- the contact degree calculator 230 is also called an in-gazing-area target object positional relationship value calculator.
- the contact degree calculator 230 may calculate a weight corresponding to each target candidate object included in at least one target candidate object based on the region of interest, and obtain the contact degree using the calculated weight.
- the degree-of-contact calculation section 230 may obtain the degree of contact based on the distance between the work object and each target candidate object included in at least one target candidate object.
- The degree-of-contact calculation unit 230 may obtain the degree of contact based on the arrangement of the work object and the arrangement of each target candidate object included in the at least one target candidate object, or based on the region in which the work object and each target candidate object overlap.
- The degree-of-contact calculation unit 230 may also obtain the degree of contact by appropriately combining the weighting based on the information indicating the gaze region with the positional relationship between the work object and each target candidate object.
- the work estimation unit 250 estimates the user's work based on the output of the object detection unit 210 and the output of the contact degree calculation unit 230 .
- the work estimation unit 250 estimates the user's work based on the work object and the degree of contact.
- the work estimation unit 250 may estimate the user's work using a rule-based estimation method, or may use a learning model to estimate the user's work.
- the estimation result storage unit 260 stores the result of work estimation performed by the work estimation unit 250 .
- the estimation result storage unit 260 is also called a work estimation result storage unit, and may be outside the work estimation device 200 .
- FIG. 2 shows a hardware configuration example of the work estimation device 200.
- the work estimating device 200 consists of a computer 100 as shown in the figure.
- The computer 100 includes an arithmetic device 101, a main storage device 102, an auxiliary storage device 103, a first interface 104, and a second interface 105, and is also called a computing machine.
- the work estimating device 200 may consist of a plurality of computers 100 .
- the arithmetic unit 101 is an IC (Integrated Circuit) that performs arithmetic processing and controls the hardware of the computer.
- the arithmetic unit 101 is, as a specific example, a CPU (Central Processing Unit), a DSP (Digital Signal Processor), or a GPU (Graphics Processing Unit).
- the work estimating device 200 may include a plurality of computing devices that substitute for the computing device 101 .
- a plurality of arithmetic units share the role of the arithmetic unit 101 .
- the main memory device 102 is a device that temporarily stores signals from the arithmetic device 101 .
- the main storage device 102 is, as a specific example, a RAM (Random Access Memory).
- the data stored in the main storage device 102 is saved in the auxiliary storage device 103 as required.
- the auxiliary storage device 103 is a device for long-term storage of signals from the arithmetic device 101 .
- the auxiliary storage device 103 is, as a specific example, a ROM (Read Only Memory), a HDD (Hard Disk Drive), or a flash memory. Data stored in the auxiliary storage device 103 is loaded into the main storage device 102 as required.
- the main storage device 102 and the auxiliary storage device 103 may be configured integrally.
- the first interface 104 is a device that receives signals from the imaging device 300 connected to the computer 100 .
- the first interface 104 is a USB (Universal Serial Bus) terminal, or a communication device such as a communication chip or NIC (Network Interface Card).
- the second interface 105 is an interface similar to the first interface 104 and is a device that receives signals from the line-of-sight measurement device 350 .
- the first interface 104 and the second interface 105 may be configured integrally.
- the auxiliary storage device 103 stores a work estimation program.
- the work estimating program is a program that causes a computer to implement the function of each unit included in the work estimating device 200 .
- the task estimation program is loaded into the main storage device 102 and executed by the arithmetic device 101 .
- the function of each unit included in the work estimation device 200 is implemented by software.
- Each unit of the work estimation device 200 uses a storage device as appropriate.
- the storage device comprises at least one of a main memory device 102 , an auxiliary memory device 103 , a register within the arithmetic device 101 , and a cache memory within the arithmetic device 101 .
- data and information may have the same meaning.
- the storage device may be independent of computer 100 .
- the estimation result storage unit 260 consists of a storage device. The functions of the main storage device 102 and auxiliary storage device 103 may be implemented by other storage devices.
- the work estimation program may be recorded on a computer-readable non-volatile recording medium.
- a nonvolatile recording medium is, for example, an optical disk or a flash memory.
- the work estimation program may be provided as a program product.
- the operation procedure of work estimation device 200 corresponds to the work estimation method.
- a program that implements the operation of the work estimation device 200 corresponds to a work estimation program.
- FIG. 3 is a flowchart showing an example of the operation of the work estimation device 200.
- The operation of the work estimation device 200 will be described with reference to this figure. In the description of this flowchart, it is assumed that one work object and one or more target candidate objects appear in the captured image.
- Step S101 Object detection processing
- the object detection unit 210 receives a captured image from the imaging device 300, detects a work object and target candidate objects appearing in the received captured image, and obtains information corresponding to each detected target candidate object.
- the information includes, as a specific example, attribute information indicating attributes of each candidate target object and information indicating an occupied area corresponding to each candidate target object.
- the occupied area is an area corresponding to the area occupied by each object in the captured image, and may be a rectangular area containing each object, or may be a set of pixels displaying each object.
- The method by which the object detection unit 210 detects the target candidate objects may be a method using markers attached to the target candidate objects, or a machine-learning-based method using a pre-trained model. The object detection unit 210 also obtains the occupied area corresponding to the work object.
- Step S102 gaze area estimation process
- the gaze region estimation unit 220 receives the gaze measurement information from the gaze measurement device 350 and estimates the gaze region using the viewpoint position indicated by the received gaze measurement information.
- Step S103 contact degree calculation processing
- the degree-of-contact calculation section 230 calculates a contact index based on the working object and each candidate object detected by the object detection section 210 and the gaze area estimated by the gaze area estimation section 220 .
- the contact index quantifies the degree of contact between the working object and each candidate target object.
- The contact degree calculation unit 230 calculates the contact index based on, as specific examples, (1) distance, (2) overlapping area, (3) distance and direction, or (4) overlapping area and direction, each of which is described below.
- In (1) distance, the degree-of-contact calculation unit 230 obtains the contact index corresponding to each target candidate object based on the distance between the work object and that target candidate object.
- In (2) overlapping area, the degree-of-contact calculation unit 230 obtains the contact index corresponding to each target candidate object based on the size of the region in which the occupied area corresponding to that target candidate object and the occupied area corresponding to the work object overlap.
- In the cases involving direction, the degree-of-contact calculation unit 230 obtains the contact index corresponding to each target candidate object based on the direction from the work object toward that target candidate object and the orientation of the work object.
- the degree-of-contact calculation unit 230 may calculate the contact index based on all of the distance, overlapping area, and direction. A specific example of calculating the contact index will be described below.
- the orientation of the working object corresponds to the placement of the working object.
- the orientation of the working object relative to the candidate target object is based on the placement of the candidate target object and the placement of the working object.
- A specific example in which the degree-of-contact calculation unit 230 obtains the contact index based on (1) distance is described below. The degree-of-contact calculation unit 230 calculates the score such that the larger the region in which the occupied area corresponding to a target candidate object overlaps the gaze region, the greater the degree of gaze in that overlapping region, and the closer the work object is to the target candidate object, the larger the score value corresponding to that target candidate object. The score indicates the degree of contact.
- FIG. 4 schematically shows a specific example of the state of processing by the degree-of-contact calculation unit 230 using visual field images.
- the field-of-view video may be a video showing at least part of the user's field of view, and the captured image may be the field-of-view video.
- the gaze region estimation unit 220, the contact degree calculation unit 230, or the like may generate the field-of-view image based on the captured image.
- the contact degree calculation unit 230 executes processing based on the visual field image.
- In this figure, the occupied area A(i) (i = 1, 2, 3) is the occupied area corresponding to the target candidate object C(i) and is a rectangular region surrounding the target candidate object C(i).
- The centroid go(i) is the centroid of the target candidate object C(i).
- The work object is a hand, and the centroid u is the position of the hand's center of gravity.
- d(i) is the reciprocal of the distance value from the centroid u to the centroid go(i).
- FIG. 5 shows an example of the processing flow of the degree-of-contact calculation unit 230. The processing of the degree-of-contact calculation unit 230 will be described with reference to this figure.
- Step S301: the degree-of-contact calculation unit 230 calculates a weight for each target candidate object C(i) (i = 1, 2, ...) using the information indicating the gaze region G.
- As a specific example, the degree-of-contact calculation unit 230 first calculates the overlapping region Ov(i) (= A(i) ∩ G) for each target candidate object C(i). The overlapping region Ov(i) is the region in which the occupied area A(i) and the gaze region G overlap, and may be a rectangular region surrounding that overlapping region. None of these regions is limited to being two-dimensional; each may be a three-dimensional region.
- When each region is three-dimensional, the degree-of-contact calculation unit 230 determines whether the solids corresponding to the regions overlap each other when obtaining the overlapping regions.
- In the following, each region is assumed to be two-dimensional, and the function Gf is a function indicating the degree of the user's gaze at each pixel of the field-of-view image within the gaze region G. That is, the function Gf indicates the gaze distribution, and the function Gf(x, y) indicates the degree of the user's gaze at the pixel corresponding to the coordinates (x, y).
- As a specific example, the function Gf(x, y) is a function whose value is highest at the center point of the gaze region G and gradually decreases toward the edge of the gaze region G.
- When the area of the overlapping region Ov(i) is 0, the processing of the following steps need not be executed for the corresponding target candidate object C(i); that is, the degree-of-contact calculation unit 230 may narrow down the target candidate objects C(i) based on the gaze region G in this step.
- The degree-of-contact calculation unit 230 then calculates the weight W(i) corresponding to the target candidate object C(i) as shown in [Formula 1]: the weight W(i) is obtained by dividing the integral of the function Gf over the overlapping region Ov(i) by the number of pixels in Ov(i), that is, W(i) = (Σ_{(x,y)∈Ov(i)} Gf(x, y)) / |Ov(i)|.
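- A minimal sketch of the weight computation of [Formula 1], assuming the occupied area A(i), the gaze region G, and the gaze distribution Gf are available as arrays over the field-of-view image; the mask representation and function name are illustrative choices, not part of the patent.

```python
import numpy as np

def candidate_weight(occupied_mask, gaze_mask, gf):
    """Weight W(i): integral of Gf over Ov(i) = A(i) ∩ G divided by |Ov(i)|.

    occupied_mask : boolean mask of the occupied area A(i) of candidate C(i)
    gaze_mask     : boolean mask of the gaze region G
    gf            : gaze distribution Gf on the same image grid
    """
    ov = occupied_mask & gaze_mask          # overlapping region Ov(i)
    n_pixels = int(ov.sum())
    if n_pixels == 0:
        # An empty overlap means the candidate can be dropped at this step,
        # which is how the gaze region narrows down the candidates.
        return 0.0
    return float(gf[ov].sum()) / n_pixels
```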
- Step S302: the degree-of-contact calculation unit 230 calculates a value corresponding to the distance between the work object and each target candidate object.
- As a specific example, the degree-of-contact calculation unit 230 calculates, as the value corresponding to the distance, the reciprocal d(i) of the distance value indicating that distance, as shown in [Formula 2]. In this example, the degree-of-contact calculation unit 230 calculates the distance between the centroid u of the work object and the centroid go(i) as the distance between the work object and the target candidate object C(i), and then calculates the reciprocal d(i) of the calculated distance value.
- Step S303: the degree-of-contact calculation unit 230 calculates a score that quantifies the degree of contact.
- As a specific example, the degree-of-contact calculation unit 230 uses the weight W(i) and the reciprocal d(i) of the distance value to calculate the score S(i) (= W(i)·d(i)) corresponding to the target candidate object C(i).
- The score S(i) is an index indicating the likelihood that the target candidate object C(i) is the user's work target; the larger the value of the score S(i), the higher the possibility that the target candidate object C(i) is the user's work target.
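- The following sketch shows steps S302 and S303 for (1) distance as described above: the reciprocal distance d(i) between the centroids and the score S(i) = W(i)·d(i). The small epsilon avoiding division by zero is an added assumption.

```python
import numpy as np

def reciprocal_distance(u, go_i, eps=1e-6):
    """d(i): reciprocal of the distance between the work object's centroid u
    and the candidate's centroid go(i); eps avoids division by zero."""
    dist = np.linalg.norm(np.asarray(u, dtype=float) - np.asarray(go_i, dtype=float))
    return 1.0 / (dist + eps)

def distance_score(w_i, d_i):
    """S(i) = W(i) * d(i): larger when the candidate is well inside the gaze
    region and close to the work object."""
    return w_i * d_i
```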
- Step S304 Contact degree calculator 230 outputs output information including the obtained score.
- the degree-of-contact calculation section 230 may rearrange the target candidate objects in descending order according to the corresponding score S, and link each target candidate object with the score corresponding to each target candidate object and output them.
- the degree-of-contact calculation section 230 may output only scores equal to or higher than a predetermined reference value.
- the output information output by the contact degree calculation unit 230 includes the attribute information of the target candidate object and the score corresponding to the target candidate object.
- the output information may include information of the occupied area of the target object.
- the occupied area information is, as a specific example, information indicating a set of position coordinates forming the occupied area.
- A specific example in which the degree-of-contact calculation unit 230 obtains the contact index based on (2) the overlapping area is described below. The degree-of-contact calculation unit 230 calculates the score such that the larger the region in which the occupied area corresponding to the target candidate object and the occupied area corresponding to the work object overlap, the larger the score value corresponding to that target candidate object.
- FIG. 6 shows a specific example of the processing of the degree-of-contact calculation unit 230. This figure is similar to FIG. 4.
- the occupied area U is an occupied area corresponding to a hand, which is a working object.
- Step S301: the degree-of-contact calculation unit 230 calculates the occupied area A(i) corresponding to each target candidate object C(i).
- Step S302: the degree-of-contact calculation unit 230 calculates the size of the region in which the occupied area A(i) and the occupied area U overlap. As a specific example, it calculates the ratio A1(i) (= |A(i) ∩ U| / |A(i)|) of the area of the region in which the occupied area A(i) and the occupied area U overlap to the area of the occupied area A(i).
- Step S303: the degree-of-contact calculation unit 230 calculates a score. As a specific example, it calculates the score S(i) (= W(i)·A1(i)) based on the ratio A1(i).
- Note that the degree-of-contact calculation unit 230 may calculate the ratio A1(i) based on the region ((A(i) ∩ G) ∩ U) in which the occupied area A(i), the gaze region G, and the occupied area U overlap, instead of the region (A(i) ∩ U) in which the occupied area A(i) and the occupied area U overlap.
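- A short sketch of the overlap-based score described above, assuming the occupied areas are boolean masks; the optional gaze_mask argument corresponds to the ((A(i) ∩ G) ∩ U) variant mentioned in the previous item.

```python
def overlap_ratio(candidate_mask, work_mask, gaze_mask=None):
    """A1(i) = |A(i) ∩ U| / |A(i)|, optionally restricted to the gaze region G."""
    area = candidate_mask if gaze_mask is None else (candidate_mask & gaze_mask)
    inter = area & work_mask
    denom = int(candidate_mask.sum())
    return float(inter.sum()) / denom if denom else 0.0

def overlap_score(w_i, a1_i):
    """S(i) = W(i) * A1(i)."""
    return w_i * a1_i
```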
- A specific example in which the degree-of-contact calculation unit 230 obtains the contact index based on (3) distance and direction is described below. The degree-of-contact calculation unit 230 calculates a score that has the characteristics of the score in (1) distance and whose value for a target candidate object becomes larger the closer the direction from the work object toward the target candidate object is to the orientation of the work object.
- FIG. 7 shows a specific example of the processing of the degree-of-contact calculation unit 230. This figure is similar to FIG. 4.
- In this figure, the vector p_i represents the relative position of the target candidate object C(i) with respect to the position of the work object; the direction of the vector p_i is the direction starting from the centroid u and ending at the centroid go(i).
- The vector h is a unit vector representing the orientation of the work object.
- As a specific example, the degree-of-contact calculation unit 230 uses, as the direction of the vector h, the direction of the first eigenvector obtained by performing principal component analysis on the region indicating the user's hand detected by the hand detection unit 213. The direction may instead be obtained using information indicating the joint positions of the user's fingers detected from the captured image or the field-of-view video.
- When the user is using a tool, the degree-of-contact calculation unit 230 may calculate the direction of the vector h using a predefined direction of the tool.
- Step S301: the degree-of-contact calculation unit 230 executes the same processing as step S301 in (1) distance.
- Step S302: the degree-of-contact calculation unit 230 calculates values corresponding to the distance and the direction, respectively.
- As a specific example, the degree-of-contact calculation unit 230 first calculates the value corresponding to the distance in the same manner as step S302 in (1) distance.
- Next, the degree-of-contact calculation unit 230 quantifies the degree of contact between the target candidate object and the work object by using the orientation of the work object. As a specific example, it calculates, as this degree of contact, the inner product Δ indicating the difference between the vector p and the vector h, as shown in [Formula 3]. In this example, the closer the orientation of the work object is to the direction pointing at the centroid of the target candidate object C(i), the larger the value of the inner product Δ(i); that is, the inner product Δ(i) represents the extent to which the work object faces the target candidate object C(i).
- Step S303: the degree-of-contact calculation unit 230 calculates a score.
- As a specific example, the degree-of-contact calculation unit 230 calculates the score S(i) (= W(i)·f(d(i), Δ(i))) based on the weight W(i), the reciprocal d(i) of the distance value, and the inner product Δ(i). Here, the function f is a function that relates its input variables, the reciprocal d(i) of the distance value and the inner product Δ(i).
- The function f may be a function that linearly combines the input variables or a function that relates them nonlinearly.
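- The sketch below illustrates the direction-based quantities described above: a principal-component-analysis-style estimate of the hand orientation h, the inner product Δ(i) with the unit vector from u toward go(i), and a linear choice of the function f. The linear combination and its coefficients are only one of the options the description allows.

```python
import numpy as np

def hand_orientation(hand_mask):
    """Unit vector h: direction of the first eigenvector of the hand region."""
    ys, xs = np.nonzero(hand_mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    pts -= pts.mean(axis=0)
    cov = np.cov(pts.T)
    eigvals, eigvecs = np.linalg.eigh(cov)
    h = eigvecs[:, np.argmax(eigvals)]      # first principal axis (sign is ambiguous)
    return h / np.linalg.norm(h)

def direction_agreement(u, go_i, h):
    """Δ(i): inner product of the unit vector p_i (from u toward go(i)) and h;
    larger when the work object faces the candidate C(i)."""
    p = np.asarray(go_i, dtype=float) - np.asarray(u, dtype=float)
    p = p / (np.linalg.norm(p) + 1e-6)
    return float(np.dot(p, h))

def direction_score(w_i, d_i, delta_i, alpha=1.0, beta=1.0):
    """S(i) = W(i) * f(d(i), Δ(i)) with f chosen as a linear combination."""
    return w_i * (alpha * d_i + beta * delta_i)
```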
- A specific example in which the degree-of-contact calculation unit 230 obtains the contact index based on (4) the overlapping area and direction is described below. The degree-of-contact calculation unit 230 calculates a score that has both the characteristics of the score in (2) the overlapping area and the characteristics of the score calculated based on direction as in (3) distance and direction.
- FIG. 8 shows a specific example of the processing of the degree-of-contact calculation unit 230. This figure is similar to FIGS. 4, 6, and 7.
- Step S301: the degree-of-contact calculation unit 230 executes the same processing as step S301 in (1) distance.
- Step S302: the degree-of-contact calculation unit 230 calculates the size of the overlapping region in the same manner as step S302 in (2) the overlapping area, and quantifies the degree of contact between the target candidate object and the work object by using the orientation of the work object in the same manner as step S302 in (3) distance and direction. In the following, it is assumed that the degree-of-contact calculation unit 230 has calculated the ratio A1(i) and the inner product Δ(i) in this step.
- Step S303: the degree-of-contact calculation unit 230 calculates a score. As a specific example, it calculates the score S(i) (= W(i)·f(A1(i), Δ(i))) based on the ratio A1(i) and the inner product Δ(i).
- Here, the function f is the same as the function f described above.
- The degree-of-contact calculation unit 230 may obtain a multidimensional vector as the score.
- As a specific example, the score S(i) for the object C(i) is the two-dimensional vector shown below.
- S(i) = [W(C(i)), f(d(i), Δ(i))]
- Here, W(C(i)) indicates the weight based on the gaze region,
- and f(d(i), Δ(i)) indicates a calculated value representing the positional relationship of the work object with respect to the object C(i).
- The degree-of-contact calculation unit 230 may obtain W(C(i)) by the calculation methods shown in (1) to (4) above, or by the calculation method shown below.
- W(C(i)) = |C(i)[x, y] − g(x, y)|
- Here, C(i)[x, y] represents the position of the object C(i),
- and g(x, y) represents the viewpoint position. That is, the weight obtained by this formula is a weight according to the distance between the position of each object and the viewpoint position.
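- A small sketch of the two-dimensional vector score and of the alternative weight W(C(i)) = |C(i)[x, y] − g(x, y)|. Note that this alternative weight is a distance (smaller means closer to the viewpoint), so how it is consumed downstream is left to the implementation.

```python
import numpy as np

def vector_score(w_ci, d_i, delta_i, alpha=1.0, beta=1.0):
    """Two-dimensional score S(i) = [W(C(i)), f(d(i), Δ(i))]."""
    return np.array([w_ci, alpha * d_i + beta * delta_i])

def viewpoint_distance_weight(candidate_xy, viewpoint_xy):
    """Alternative weight: distance between the candidate's position C(i)[x, y]
    and the viewpoint position g(x, y)."""
    return float(np.linalg.norm(np.asarray(candidate_xy, dtype=float)
                                - np.asarray(viewpoint_xy, dtype=float)))
```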
- Step S104 work estimation process
- The work estimation unit 250 estimates the work performed by the user by using the information indicating the tool output by the tool detection unit 212 or the information indicating the user's hand output by the hand detection unit 213, together with the score output by the degree-of-contact calculation unit 230. As a specific example, the work estimation unit 250 estimates the work performed by the user using either (i) a rule-based estimation method or (ii) a machine-learning-based estimation method. Each estimation method is described below. In the following, it is assumed that the work object is the user's hand, that object A, object B, and object C, which are target candidate objects, are input, and that object A has the highest score.
- (i) Rule-based estimation method: consider a case where work labels indicating the work corresponding to each combination of a work object and a target candidate object are defined in advance.
- In this case, the work estimation unit 250 may estimate the user's work by searching for the work label corresponding to the combination of the user's "hand" and "object A". Also, consider a case where such work labels are not defined and one work label is assigned in advance to each combination of the input target candidate objects and scores. In this case, the work estimation unit 250 may estimate the work using all of the input target candidate objects and the scores corresponding to them. Moreover, when both a hand and a tool appear in the captured image, the work estimation unit 250 may determine that the tool is the work object and that the hand is likely not the work object, and may estimate the user's work by searching for the work label corresponding to the combination of the tool and the target candidate object with which the tool is in contact.
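- A minimal sketch of the rule-based estimation described above; the work-label table and the label strings are hypothetical examples, not values from the patent.

```python
# Hypothetical work-label table: (work object, target candidate) -> work label.
WORK_LABELS = {
    ("hand", "object A"): "assemble object A",
    ("screwdriver", "object A"): "fasten object A",
}

def estimate_work_rule_based(work_object, scored_candidates, labels=WORK_LABELS):
    """scored_candidates: list of (candidate_name, score). The candidate with
    the highest score is treated as the work target, and the label for the
    (work object, candidate) pair is looked up."""
    target, _ = max(scored_candidates, key=lambda c: c[1])
    return labels.get((work_object, target), "unknown work")

# Example: a hand touching object A with the highest score.
print(estimate_work_rule_based("hand", [("object A", 0.8), ("object B", 0.3)]))
```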
- (ii) Machine-learning-based estimation method: the work estimation unit 250 estimates the user's work by inputting the combination of the information on the target candidate objects appearing in the captured image and the scores corresponding to those target candidate objects into a discriminator trained by statistical machine learning.
- the work estimator 250 may use the action recognition method described in Patent Document 1.
- The work estimation unit 250 obtains, as a feature amount, data in which the information on the target candidate objects and the information on the work object are associated, and uses a trained model to infer the label corresponding to the work or action that corresponds to the obtained feature amount.
- As a specific example, the work estimation unit 250 creates learning data D1 in which the information is associated by a graph structure or the like, and estimates the user's work by inputting the created learning data D1 into a discriminator that can process a graph structure, such as a Graph Neural Network or a graph embedding, and that has been trained by statistical machine learning.
- FIG. 9 shows a specific example of learning data D1 created using a graph.
- the nodes of the graph are labels representing objects
- the value of the edge between the node representing the work object and the node representing the target candidate object is the score value calculated by the contact degree calculation unit 230
- The value of the edge between target candidate objects is an arbitrary fixed value c.
- the node may include information indicating the position of the object in the field-of-view image and information such as the size of the occupied area corresponding to the object.
- the node may include information on the position and orientation of the hand. If information indicating finger joints can be acquired, the node may include information indicating the joints.
- The node may include information indicating the position, direction, and occupied area of the tool in addition to the type of tool. Furthermore, when the work estimation device 200 acquires time-series data from the imaging device 300 and the line-of-sight measurement device 350, the work estimation unit 250 may create data using the object detection results or the positional relationships of the objects corresponding to the data at each time in the time-series data, and may employ a machine learning method that takes into account the temporal order of the created data.
- As a specific example, the work estimation unit 250 may use a Temporal Convolutional Network.
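- The following sketch shows one way to represent the learning data D1 of FIG. 9 described above as a simple graph, with the contact scores on the edges between the work object and each target candidate object and a fixed value c between target candidates; the plain-dictionary representation is an illustrative choice.

```python
def build_learning_data(work_object, candidates, scores, c=1.0):
    """Learning data D1 as a simple graph.

    work_object : label of the work object (e.g. "hand")
    candidates  : list of target candidate labels
    scores      : {candidate: contact score} from the contact degree calculation
    c           : fixed value assigned to edges between target candidates
    """
    nodes = [work_object] + list(candidates)
    edges = {}
    for cand in candidates:
        edges[(work_object, cand)] = scores[cand]   # work object <-> candidate
    for i, a in enumerate(candidates):
        for b in candidates[i + 1:]:
            edges[(a, b)] = c                        # candidate <-> candidate
    return {"nodes": nodes, "edges": edges}

d1 = build_learning_data("hand", ["object A", "object B"],
                         {"object A": 0.8, "object B": 0.3})
```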
- FIG. 10 shows a configuration example of a learning device 400 for executing these processes.
- Learning device 400 includes learning data acquisition unit 410 , learning model generation unit 420 , and learning model storage unit 430 .
- the learning model storage unit 430 may be outside the learning device 400 .
- the learning device 400 may be configured integrally with the work estimation device 200 .
- the hardware configuration of the learning device 400 may be the same as that of the computer 100 .
- the learning data acquisition unit 410 acquires the learning data D1 as learning data.
- the learning data D1 is assumed to be data that can be input to the work estimation unit 250 .
- the data includes data that can be used when the work estimation unit 250 estimates the user's work.
- the learning model generation unit 420 builds a learning model that can process the data acquired by the learning data acquisition unit 410, and generates a learned model by executing learning based on the built learning model.
- a learning model is also called a machine learning model.
- the learning model storage unit 430 stores the learned model generated by the learning model generation unit 420.
- FIG. 11 shows an example of the flow of learning processing. The processing of the learning device 400 will be described with reference to this figure.
- Step S501: the learning data acquisition unit 410 acquires data that can be input to the work estimation unit 250 as learning data, and expresses the acquired learning data as data indicating information on objects that can be work targets, information on the work object, and information in a format in which the information indicating the objects that can be work targets and the information indicating the work object are associated with each other.
- the learning data includes at least one of information indicating the gaze region and information indicating the score corresponding to the object.
- When associating the pieces of information, the learning data acquisition unit 410 may, as a specific example, use data whose elements are values indicating the positional relationship corresponding to each object, or may use a graph structure such as the one shown in FIG. 9. The learning data acquisition unit 410 assigns labels representing work behaviors to the generated data.
- the learning model generation unit 420 generates a learning model by processing the learning data acquired by the learning data acquisition unit 410 .
- As a specific example, when the learning data has a graph structure, the learning data acquisition unit 410 may use a machine learning model capable of processing a graph structure, such as a Graph Neural Network, as the learning model, or may vectorize the learning data using a graph embedding method and then use a model that can process the vectorized learning data.
- the learning model generation unit 420 may use a model such as a Temporal Convolutional Network as a specific example when performing learning considering the relevance of data at each point in the time-series data.
- the learning model storage unit 430 stores a learned learning model generated by the learning model generation unit 420 .
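- A rough sketch of the learning flow of FIG. 11, assuming the graph learning data D1 from the sketch above: the graph is vectorized by a crude flattening (a stand-in for a graph embedding or Graph Neural Network) and a logistic-regression classifier is trained as a stand-in for the discriminator.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def vectorize(d1, candidate_order):
    """Crude stand-in for graph embedding: edge scores in a fixed candidate order."""
    work = d1["nodes"][0]
    return np.array([d1["edges"][(work, c)] for c in candidate_order])

def train(graphs, labels, candidate_order):
    """graphs: list of D1 graphs; labels: work-behaviour labels assigned to them."""
    X = np.stack([vectorize(g, candidate_order) for g in graphs])
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, labels)
    return clf
```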
- Step S105 Estimation result storage processing
- the estimation result storage unit 260 stores the output of the work estimation unit 250.
- As described above, according to the present embodiment, in addition to the combinations of the work object and each target candidate object present in the video from the user's viewpoint, the user's gaze region is used to narrow down the work target candidates, a score is obtained for the object that the user touches with a hand or a tool among the narrowed-down candidates, and the user's work is estimated based on the obtained score.
- Specifically, the degree-of-contact calculation unit 230 executes two-stage processing through steps S301 to S304: it first narrows down the target candidate objects based on the gaze region G, and then detects contact with the narrowed-down candidate objects.
- Therefore, according to the present embodiment, not only can the work performed by the user be estimated with relatively high accuracy, but occlusion by the work object can also be prevented. Accordingly, the present embodiment increases the robustness of estimating the user's work.
- In addition, when the equipment to be maintained and inspected is the object to be worked on, a plurality of inspection locations that are work target candidates are often close to each other, so it is difficult to estimate the inspection work corresponding to a specific location from the combination of objects alone.
- According to the present embodiment, the target candidates are narrowed down by the gaze region, and contact between a hand or tool and a target candidate object is then detected. Therefore, even if the target candidate objects are close to each other, the user's work on the object that is the user's work target can be estimated with relatively high accuracy.
- the work estimating unit 250 may estimate the work by utilizing not only the score but also other information.
- the other information is, as a specific example, at least one of attribute information of each target candidate object and attribute information of the working object.
- the attribute information is, as a specific example, at least one of position information of the object, scale information, shape of the object, and detection certainty (described later). Also, a case where the other information is attribute information will be described.
- the learning data acquisition unit 410 acquires, as learning data, information including the attribute information of each target candidate object and the attribute information of the working object.
- the learning model generation unit 420 generates a learning model by processing learning data including attribute information of each target candidate object and attribute information of the working object.
- the degree-of-contact calculation section 230 may calculate a score based on the degree of contact between a part other than the user's hand and the target object.
- the degree-of-contact calculator 230 may calculate the score by considering the degree of contact between the tool and the user's hand. According to this modified example, it is possible to prevent the work estimation device 200 from erroneously recognizing that a tool that is displayed in the field-of-view image but left unattended is the tool that the user is using.
- the work estimation unit 250 may estimate the user's work when the user is working using a plurality of work objects. According to this modified example, the work estimation unit 250 can appropriately estimate the user's work even when the user is working with both hands.
- FIG. 12 shows a configuration example of the work estimation system 90 and a software configuration example of the work estimation device 200 according to this modification. Differences between the first embodiment and this modification will be mainly described below.
- the work estimation device 200 includes a work behavior information calculation unit 240 in addition to the constituent elements of the work estimation device 200 according to the first embodiment.
- the object detection unit 210 obtains the detection certainty for each target candidate object included in at least one target candidate object.
- the detection confidence is a value indicating the degree of accuracy of estimation of a detected target candidate object, and the higher the detection confidence, the more accurate the estimation of the target candidate object corresponding to the detection confidence.
- the detection confidence is, for example, an object classification probability calculated by a general object detection method such as SSD (Single Shot Multibox Detector) or Faster R-CNN (Convolutional Neural Network).
- the work behavior information calculation unit 240 obtains the update confidence by updating the detection confidence using the degree of contact.
- the work action information calculator 240 is also called a target object score updater.
- the update confidence is an index based on the degree of contact.
- the work estimation unit 250 estimates the work of the user based on the output of the object detection unit 210 and the output of the work behavior information calculation unit 240 .
- FIG. 13 is a flowchart showing an example of the operation of the work estimation device 200.
- The operation of the work estimation device 200 will be described with reference to this figure.
- Step S101 Object detection processing
- the processing in this step is the same as the processing in step S101 according to the first embodiment.
- the object detection unit 210 obtains information including detection certainty as information corresponding to each detected target candidate object.
- Step S111 Work action information calculation process
- The work action information calculation unit 240 calculates the update confidence by updating the detection confidence corresponding to each target candidate object output by the candidate object detection unit 215, using the score output by the degree-of-contact calculation unit 230 that is associated with that target candidate object, and outputs the calculated update confidence as a score.
- Through the processing of this step, the work estimation device 200 can estimate the user's work in consideration of not only the degree to which the user is in contact with the work target object but also the detection confidence calculated by the candidate object detection unit 215.
- The work action information calculation unit 240 may hold both the detection confidence and the score associated with the target candidate object calculated by the degree-of-contact calculation unit 230. Further, when calculating the update confidence, the work action information calculation unit 240 may use at least one of the position information of each object and the scale information of each object, and may use other information about each object.
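- The description does not give the formula for the update confidence, only that the detection confidence is updated using the degree of contact; the sketch below uses a simple product as one plausible combination rule.

```python
def update_confidence(detection_conf, contact_score):
    """Update confidence: detection confidence refined by the contact score.
    The product rule is an assumption; the patent only states that the
    detection confidence is updated using the degree of contact."""
    return detection_conf * contact_score

def updated_scores(detections, contact_scores):
    """detections: {candidate: detection confidence}
    contact_scores: {candidate: score from the contact degree calculation}"""
    return {c: update_confidence(p, contact_scores.get(c, 0.0))
            for c, p in detections.items()}
```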
- Step S104 work estimation process
- the processing in this step is the same as the processing in step S104 according to the first embodiment.
- the work estimation unit 250 uses the score output by the work action information calculation unit 240 instead of the score output by the contact degree calculation unit 230 .
- FIG. 14 shows a hardware configuration example of a work estimation device 200 according to this modification.
- As shown in the figure, the work estimation device 200 includes a processing circuit 108 in place of at least one of the arithmetic device 101, the main storage device 102, and the auxiliary storage device 103.
- the processing circuit 108 is hardware that implements at least part of each unit included in the work estimation device 200 .
- the processing circuit 108 may be dedicated hardware, or may be a processor that executes programs stored in the main memory device 102 .
- When the processing circuit 108 is dedicated hardware, the processing circuit 108 is, as a specific example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or a combination thereof.
- Work estimation device 200 may include a plurality of processing circuits that substitute for processing circuit 108 . A plurality of processing circuits share the role of processing circuit 108 .
- Processing circuit 108 is implemented by hardware, software, firmware, or a combination thereof, as a specific example.
- the arithmetic device 101, the main memory device 102, the auxiliary memory device 103, and the processing circuit 108 are collectively referred to as "processing circuitry.” That is, the function of each functional component of work estimation device 200 is realized by processing circuitry. Other devices described herein may be similar to this variation.
- Although Embodiment 1 has been described, a plurality of portions of this embodiment may be combined for implementation. Alternatively, this embodiment may be partially implemented. In addition, this embodiment may be modified in various ways as necessary, and may be implemented in any combination, as a whole or in part.
- the above-described embodiments are essentially preferable examples, and are not intended to limit the scope of the present disclosure, its applications, and uses. The procedures described using flowcharts and the like may be changed as appropriate.
Abstract
Description
Patent Document 1 discloses a technique that detects, in a first-person viewpoint video of a user, an object of interest to which the user pays attention by using an object detection result and an attention map, and improves the recognition accuracy of the user's actions by performing action recognition based on a combination of information on the object of interest and objects not being attended to.
The work estimation device according to the present disclosure includes:
a gaze region estimation unit that estimates a gaze region, which is a region where the user gazes, using information indicating the user's line of sight;
an object detection unit that detects, from a video showing a work object used by the user and at least one target candidate object that is a candidate for the user's work target, the work object and the at least one target candidate object;
a degree-of-contact calculation unit that calculates, based on the gaze region, a degree of contact indicating the degree to which the work object is in contact with each target candidate object included in the at least one target candidate object; and
a work estimation unit that estimates the user's work based on the work object and the degree of contact.
Hereinafter, the present embodiment will be described in detail with reference to the drawings.
FIG. 1 shows a configuration example of the work estimation system 90 and a software configuration example of the work estimation device 200. The work estimation system 90 includes the work estimation device 200, an imaging device 300, and a line-of-sight measurement device 350. A black circle in the figure indicates that the lines touching the black circle are connected to each other. Where no black circle is shown at an intersection of lines, those lines are not connected to each other.
The functions of the main storage device 102 and the auxiliary storage device 103 may be implemented by other storage devices.
The operation procedure of the work estimation device 200 corresponds to a work estimation method. A program that implements the operation of the work estimation device 200 corresponds to a work estimation program.
The object detection unit 210 receives a captured image from the imaging device 300, detects the work object and the target candidate objects appearing in the received captured image, and obtains information corresponding to each detected target candidate object. As a specific example, this information includes attribute information indicating the attributes of each target candidate object and information indicating the occupied area corresponding to each target candidate object. The occupied area is an area corresponding to the area occupied by each object in the captured image, and may be a rectangular region containing each object or a set of pixels displaying each object. The method by which the object detection unit 210 detects the target candidate objects may be a method using markers attached to the target candidate objects, or a machine-learning-based method using a pre-trained model. The object detection unit 210 also obtains the occupied area corresponding to the work object.
The gaze region estimation unit 220 receives the line-of-sight measurement information from the line-of-sight measurement device 350 and estimates the gaze region using the viewpoint position indicated by the received line-of-sight measurement information.
The degree-of-contact calculation unit 230 calculates the contact index based on the work object and each target candidate object detected by the object detection unit 210 and on the gaze region estimated by the gaze region estimation unit 220. The contact index quantifies the degree of contact between the work object and each target candidate object.
A specific example in which the degree-of-contact calculation unit 230 obtains the contact index based on distance is described below. The degree-of-contact calculation unit 230 calculates the score such that the larger the region in which the occupied area corresponding to a target candidate object overlaps the gaze region, the greater the degree of gaze in that overlapping region, and the closer the work object is to the target candidate object, the larger the score value corresponding to that target candidate object. The score indicates the degree of contact.
In this figure, the occupied area A(i) (i = 1, 2, 3) is the occupied area corresponding to the target candidate object C(i) and is a rectangular region surrounding the target candidate object C(i). The centroid go(i) is the centroid of the target candidate object C(i). The work object is a hand, and the centroid u is the position of the hand's center of gravity. d(i) is the reciprocal of the distance value from the centroid u to the centroid go(i).
The degree-of-contact calculation unit 230 calculates a weight for each target candidate object C(i) (i = 1, 2, ...) using the information indicating the gaze region G.
As a specific example, the degree-of-contact calculation unit 230 first calculates the overlapping region Ov(i) (= A(i) ∩ G) for each target candidate object C(i). The overlapping region Ov(i) is the region in which the occupied area A(i) and the gaze region G overlap, and may be a rectangular region surrounding that overlapping region. None of these regions is limited to being two-dimensional; each may be a three-dimensional region. When each region is three-dimensional, the degree-of-contact calculation unit 230 determines whether the solids corresponding to the regions overlap each other when obtaining the overlapping regions. In the following, each region is assumed to be two-dimensional, and the function Gf is a function indicating the degree of the user's gaze at each pixel of the field-of-view image within the gaze region G; that is, the function Gf indicates the gaze distribution, and the function Gf(x, y) indicates the degree of the user's gaze at the pixel corresponding to the coordinates (x, y). As a specific example, the function Gf(x, y) is a function whose value is highest at the center point of the gaze region G and gradually decreases toward the edge of the gaze region G. When the area of the overlapping region Ov(i) is 0, the processing of the following steps need not be executed for the corresponding target candidate object C(i); that is, the degree-of-contact calculation unit 230 may narrow down the target candidate objects C(i) based on the gaze region G in this step.
Next, the degree-of-contact calculation unit 230 calculates the weight W(i) corresponding to the target candidate object C(i) as shown in [Formula 1]. The weight W(i) is calculated by dividing the integral of the function Gf over the overlapping region Ov(i) by the number of pixels in the overlapping region Ov(i).
The degree-of-contact calculation unit 230 calculates a value corresponding to the distance between the work object and each target candidate object.
As a specific example, the degree-of-contact calculation unit 230 calculates, as the value corresponding to the distance, the reciprocal d(i) of the distance value indicating that distance, as shown in [Formula 2]. In this example, the degree-of-contact calculation unit 230 calculates the distance between the centroid u of the work object and the centroid go(i) as the distance between the work object and the target candidate object C(i), and calculates the reciprocal d(i) of the calculated distance value.
The degree-of-contact calculation unit 230 calculates a score that quantifies the degree of contact.
As a specific example, the degree-of-contact calculation unit 230 uses the weight W(i) and the reciprocal d(i) of the distance value to calculate the score S(i) (= W(i)·d(i)) corresponding to the target candidate object C(i). The score S(i) is an index indicating the likelihood that the target candidate object C(i) is the user's work target; the larger the value of the score S(i), the higher the possibility that the target candidate object C(i) is the user's work target.
The degree-of-contact calculation unit 230 outputs output information including the obtained scores. The degree-of-contact calculation unit 230 may sort the target candidate objects in descending order of the corresponding score S and output each target candidate object linked with its corresponding score. The degree-of-contact calculation unit 230 may output only scores equal to or greater than a predetermined reference value.
In the following, the output information output by the degree-of-contact calculation unit 230 includes the attribute information of the target candidate objects and the scores corresponding to the target candidate objects. The output information may include information on the occupied area of a target object. As a specific example, the occupied-area information is information indicating the set of position coordinates constituting the occupied area.
A specific example in which the degree-of-contact calculation unit 230 obtains the contact index based on the overlapping area is described below. The degree-of-contact calculation unit 230 calculates the score such that the larger the region in which the occupied area corresponding to the target candidate object and the occupied area corresponding to the work object overlap, the larger the score value corresponding to that target candidate object.
The degree-of-contact calculation unit 230 calculates the occupied area A(i) corresponding to each target candidate object C(i).
The degree-of-contact calculation unit 230 calculates the size of the region in which the occupied area A(i) and the occupied area U overlap.
As a specific example, the degree-of-contact calculation unit 230 calculates the ratio A1(i) (= |A(i) ∩ U| / |A(i)|) of the area of the region in which the occupied area A(i) and the occupied area U overlap to the area of the occupied area A(i).
The degree-of-contact calculation unit 230 calculates a score.
As a specific example, the degree-of-contact calculation unit 230 calculates the score S(i) (= W(i)·A1(i)) based on the ratio A1(i).
Note that the degree-of-contact calculation unit 230 may calculate the ratio A1(i) based on the region ((A(i) ∩ G) ∩ U) in which the occupied area A(i), the gaze region G, and the occupied area U overlap, instead of the region (A(i) ∩ U) in which the occupied area A(i) and the occupied area U overlap.
A specific example in which the degree-of-contact calculation unit 230 obtains the contact index based on distance and direction is described below. The degree-of-contact calculation unit 230 calculates a score that has the characteristics of the score in (1) distance and whose value for a target candidate object becomes larger the closer the direction from the work object toward the target candidate object is to the orientation of the work object.
The degree-of-contact calculation unit 230 executes the same processing as step S301 in (1) distance.
The degree-of-contact calculation unit 230 calculates values corresponding to the distance and the direction, respectively.
As a specific example, the degree-of-contact calculation unit 230 first calculates the value corresponding to the distance in the same manner as step S302 in (1) distance.
Next, the degree-of-contact calculation unit 230 quantifies the degree of contact between the target candidate object and the work object by using the orientation of the work object. As a specific example, it calculates, as this degree of contact, the inner product Δ indicating the difference between the vector p and the vector h, as shown in [Formula 3]. In this example, the closer the orientation of the work object is to the direction pointing at the centroid of the target candidate object C(i), the larger the value of the inner product Δ(i), which is the degree of contact. The inner product Δ(i) represents the extent to which the work object faces the target candidate object C(i).
The degree-of-contact calculation unit 230 calculates a score.
As a specific example, the degree-of-contact calculation unit 230 calculates the score S(i) (= W(i)·f(d(i), Δ(i))) based on the weight W(i), the reciprocal d(i) of the distance value, and the inner product Δ(i). Here, the function f is a function that relates its input variables, the reciprocal d(i) of the distance value and the inner product Δ(i). The function f may be a function that linearly combines the input variables or a function that relates them nonlinearly.
A specific example in which the degree-of-contact calculation unit 230 obtains the contact index based on the overlapping area and direction is described below. The degree-of-contact calculation unit 230 calculates a score that has both the characteristics of the score in (2) the overlapping area and the characteristics of the score calculated based on direction as described in (3) distance and direction.
The degree-of-contact calculation unit 230 executes the same processing as step S301 in (1) distance.
The degree-of-contact calculation unit 230 calculates the size of the overlapping region in the same manner as step S302 in (2) the overlapping area. In addition, the degree-of-contact calculation unit 230 quantifies the degree of contact between the target candidate object and the work object by using the orientation of the work object, in the same manner as step S302 in (3) distance and direction.
In the following, it is assumed that the degree-of-contact calculation unit 230 has calculated the ratio A1(i) and the inner product Δ(i) in this step.
The degree-of-contact calculation unit 230 calculates a score.
As a specific example, the degree-of-contact calculation unit 230 calculates the score S(i) (= W(i)·f(A1(i), Δ(i))) based on the ratio A1(i) and the inner product Δ(i). Here, the function f is the same as the function f described above.
The degree-of-contact calculation unit 230 may obtain a multidimensional vector as the score. As a specific example, the score S(i) for the object C(i) is the two-dimensional vector shown below.
S(i) = [W(C(i)), f(d(i), Δ(i))]
Here, W(C(i)) indicates the weight based on the gaze region, and f(d(i), Δ(i)) indicates a calculated value representing the positional relationship of the work object with respect to the object C(i). In this case, the degree-of-contact calculation unit 230 may obtain W(C(i)) by the calculation methods shown in (1) to (4) above, or by the calculation method shown below.
W(C(i)) = |C(i)[x, y] − g(x, y)|
Here, C(i)[x, y] represents the position of the object C(i), and g(x, y) represents the viewpoint position. That is, the weight obtained by this formula is a weight according to the distance between the position of each object and the viewpoint position.
The work estimation unit 250 estimates the work performed by the user by using the information indicating the tool output by the tool detection unit 212 or the information indicating the user's hand output by the hand detection unit 213, together with the score output by the degree-of-contact calculation unit 230. As a specific example, the work estimation unit 250 estimates the work performed by the user using either (i) a rule-based estimation method or (ii) a machine-learning-based estimation method. Each estimation method is described specifically below. It is assumed that the work object is the user's hand, that object A, object B, and object C, which are target candidate objects, are input, and that object A has the highest score.
Consider a case where work labels indicating the work corresponding to each combination of a work object and a target candidate object are defined in advance. In this case, the work estimation unit 250 may estimate the user's work by searching for the work label corresponding to the combination of the user's "hand" and "object A".
Also, consider a case where such work labels are not defined and one work label is assigned in advance to each combination of the input target candidate objects and scores. In this case, the work estimation unit 250 may estimate the work using all of the input target candidate objects and the scores corresponding to them.
In addition, when both a hand and a tool appear in the captured image, the work estimation unit 250 may determine that the tool is the work object and that the hand is highly likely not the work object, and may estimate the user's work by searching for the work label corresponding to the combination of the tool and the target candidate object with which the tool is in contact.
The work estimation unit 250 estimates the user's work by inputting the combination of the information on the target candidate objects appearing in the captured image and the scores corresponding to those target candidate objects into a discriminator trained by statistical machine learning. The work estimation unit 250 may use the action recognition method described in Patent Document 1.
The work estimation unit 250 obtains, as a feature amount, data in which the information on the target candidate objects and the information on the work object are associated, and uses a trained model to infer the label corresponding to the work or action that corresponds to the obtained feature amount. As a specific example, the work estimation unit 250 creates learning data D1 in which the information is associated by a graph structure or the like, and estimates the user's work by inputting the created learning data D1 into a discriminator that can process a graph structure, such as a Graph Neural Network or a graph embedding, and that has been trained by statistical machine learning.
Furthermore, when the work estimation device 200 acquires time-series data from the imaging device 300 and the line-of-sight measurement device 350, the work estimation unit 250 may create data using the object detection results or the positional relationships of the objects corresponding to the data at each time in the time-series data, and may employ a machine learning method that takes into account the temporal order of the created data. As a specific example, the work estimation unit 250 may use a Temporal Convolutional Network.
The learning data acquisition unit 410 acquires data that can be input to the work estimation unit 250 as learning data, and expresses the acquired learning data as data indicating information on objects that can be work targets, information on the work object, and information in a format in which the information indicating the objects that can be work targets and the information indicating the work object are associated with each other. The learning data includes at least one of information indicating the gaze region and information indicating the score corresponding to an object. When associating the pieces of information, the learning data acquisition unit 410 may, as a specific example, use data whose elements are values indicating the positional relationship corresponding to each object, or may use a graph structure such as the one shown in FIG. 9. The learning data acquisition unit 410 assigns labels representing work behaviors to the generated data.
The learning model generation unit 420 generates a learning model by processing the learning data acquired by the learning data acquisition unit 410.
As a specific example, consider a case where the learning data has a graph structure. In this case, the learning data acquisition unit 410 may use a machine learning model capable of processing a graph structure, such as a Graph Neural Network, as the learning model, or may vectorize the learning data using a graph embedding method and then use a model that can process the vectorized learning data. When learning also takes into account the relevance of the data at each point in the time-series data, the learning model generation unit 420 may, as a specific example, use a model such as a Temporal Convolutional Network.
The learning model storage unit 430 stores the trained learning model generated by the learning model generation unit 420.
The estimation result storage unit 260 stores the output of the work estimation unit 250.
As described above, according to the present embodiment, in addition to the combinations of the work object and each target candidate object present in the video from the user's viewpoint, the user's gaze region is used to narrow down the work target candidates, a score is obtained for the object that the user touches with a hand or a tool among the narrowed-down candidates, and the user's work is estimated based on the obtained score. Specifically, the degree-of-contact calculation unit 230 executes two-stage processing through steps S301 to S304: it narrows down the target candidate objects based on the gaze region G and then detects contact with the narrowed-down candidate objects. Therefore, according to the present embodiment, not only can the work performed by the user be estimated with relatively high accuracy, but occlusion by the work object can also be prevented. Accordingly, the present embodiment increases the robustness of estimating the user's work.
In addition, when the equipment to be maintained and inspected is the object to be worked on, a plurality of inspection locations that are work target candidates are often close to each other. It is therefore difficult to estimate the inspection work corresponding to a specific location from the combination of objects alone. According to the present embodiment, the target candidates are narrowed down by the gaze region, and contact between a hand or tool and a target candidate object is then detected. Therefore, even if the target candidate objects are close to each other, the user's work on the object that is the user's work target can be estimated with relatively high accuracy.
<Modification 1>
The work estimation unit 250 may estimate the work by using not only the scores but also other information. As a specific example, the other information is at least one of the attribute information of each target candidate object and the attribute information of the working object. Here, as specific examples, the attribute information is at least one of the position information of the object, scale information, the shape of the object, and the detection confidence described later.
The case in which the other information is attribute information is described here. The learning data acquisition unit 410 acquires, as learning data, information that includes the attribute information of each target candidate object and the attribute information of the working object. The learning model generation unit 420 generates a learning model by processing the learning data that includes the attribute information of each target candidate object and the attribute information of the working object.
The contact degree calculation unit 230 may calculate the score based on the degree of contact between a part of the user's body other than the hand and a target object.
The contact degree calculation unit 230 may calculate the score taking into account the degree of contact between a tool and the user's hand.
According to this modification, the work estimation device 200 can be prevented from erroneously recognizing a tool that appears in the field-of-view video but is merely left lying there as a tool being used by the user.
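One simple way to realize this gating is sketched below; the threshold value is an illustrative assumption, not a value given in the description.

```python
def select_working_object(hand_tool_contact_score, threshold=0.5):
    """Treat a detected tool as the working object only when the degree of
    contact between the hand and the tool exceeds a threshold; otherwise
    the tool is assumed to be merely lying in view and the hand itself is
    used as the working object. The threshold value is illustrative."""
    return "tool" if hand_tool_contact_score >= threshold else "hand"
```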
The work estimation unit 250 may estimate the user's work when the user is working with a plurality of working objects.
According to this modification, the work estimation unit 250 can appropriately estimate the user's work even when the user is working with both hands.
Fig. 12 shows a configuration example of the work estimation system 90 and a software configuration example of the work estimation device 200 according to this modification. The following description focuses mainly on the differences between Embodiment 1 and this modification.
The object detection unit 210 obtains a detection confidence for each target candidate object included in the at least one target candidate object. The detection confidence is a value indicating how accurate the estimation of the detected target candidate object is; the higher the detection confidence, the more accurate the estimation of the corresponding target candidate object. As a specific example, the detection confidence is the classification probability of an object calculated by a general object detection method such as SSD (Single Shot MultiBox Detector) or Faster R-CNN (Region-based Convolutional Neural Network).
The work behavior information calculation unit 240 obtains an updated confidence by updating the detection confidence using the contact degree. The work behavior information calculation unit 240 is also referred to as a target object score update unit. The updated confidence is an index based on the contact degree.
The work estimation unit 250 estimates the user's work based on the output of the object detection unit 210 and the output of the work behavior information calculation unit 240.
The processing in this step is the same as the processing in step S101 according to Embodiment 1, except that the object detection unit 210 obtains, as the information corresponding to each detected target candidate object, information that includes the detection confidence.
The work behavior information calculation unit 240 calculates an updated confidence by updating the detection confidence corresponding to each target candidate object output by the candidate object detection unit 215 using the score output by the contact degree calculation unit 230 that is tied to that target candidate object, and outputs the calculated updated confidence as the score.
Through the processing in this step, the work estimation device 200 can estimate the user's work by considering not only the degree to which the user is in contact with the work target object but also the detection confidence calculated by the candidate object detection unit 215. The work behavior information calculation unit 240 may retain both the detection confidence and the score tied to the target candidate object calculated by the contact degree calculation unit 230. When calculating the updated confidence, the work behavior information calculation unit 240 may also use at least one of the position information and the scale information of each object, or other information about each object.
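A minimal sketch of this update is shown below. The combination rule is not fixed by the description; a simple product is one plausible assumption, and position or scale information could be folded in as additional factors.

```python
def updated_confidence(detection_confidence, contact_score):
    """Update the detection confidence of one target candidate object with
    the contact score tied to it (illustrative product-based rule)."""
    return detection_confidence * contact_score

# Example: a candidate detected with confidence 0.8 and contact score 0.9
# receives an updated confidence of 0.72.
```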
The processing in this step is the same as the processing in step S104 according to Embodiment 1, except that the work estimation unit 250 uses the scores output by the work behavior information calculation unit 240 instead of the scores output by the contact degree calculation unit 230.
Fig. 14 shows a hardware configuration example of the work estimation device 200 according to this modification.
As shown in this figure, the work estimation device 200 includes a processing circuit 108 in place of at least one of the arithmetic device 101, the main storage device 102, and the auxiliary storage device 103.
The processing circuit 108 is hardware that implements at least part of the units included in the work estimation device 200.
The processing circuit 108 may be dedicated hardware, or may be a processor that executes a program stored in the main storage device 102.
The work estimation device 200 may include a plurality of processing circuits that replace the processing circuit 108. The plurality of processing circuits share the role of the processing circuit 108.
The arithmetic device 101, the main storage device 102, the auxiliary storage device 103, and the processing circuit 108 are collectively referred to as "processing circuitry". That is, the function of each functional component of the work estimation device 200 is implemented by processing circuitry. The other devices described in this specification may be configured in the same way as in this modification.
Although Embodiment 1 has been described, a plurality of parts of this embodiment may be implemented in combination, or this embodiment may be implemented only partially. This embodiment may otherwise be modified in various ways as necessary, and may be implemented in any combination, either as a whole or in part.
The embodiment described above is essentially a preferable illustration and is not intended to limit the present disclosure, its applications, or the scope of its uses. The procedures described using flowcharts and the like may be changed as appropriate.
Claims (12)
- A work estimation device comprising:
a gaze region estimation unit to estimate a gaze region, which is a region gazed at by a user, using information indicating the user's line of sight;
an object detection unit to detect, from video in which a working object being used by the user and at least one target candidate object that is a candidate for a work target of the user appear, the working object and the at least one target candidate object;
a contact degree calculation unit to calculate, based on the gaze region, a contact degree indicating a degree to which the working object and each target candidate object included in the at least one target candidate object are in contact; and
a work estimation unit to estimate work of the user based on the working object and the contact degree.
- The work estimation device according to claim 1, wherein the object detection unit obtains, for each target candidate object included in the at least one target candidate object, a detection confidence indicating a degree to which the target candidate object is estimated to be the work target of the user,
the work estimation device further comprises a work behavior information calculation unit to obtain an updated confidence by updating the detection confidence using the contact degree, and
the work estimation unit estimates the work of the user using the updated confidence.
- The work estimation device according to claim 2, wherein the contact degree calculation unit calculates, based on the gaze region, a weight corresponding to each target candidate object included in the at least one target candidate object, and obtains the contact degree using the calculated weight.
- The work estimation device according to any one of claims 1 to 3, wherein the gaze region estimation unit estimates the gaze region using time-series data indicating positions of a viewpoint of the user.
- The work estimation device according to any one of claims 1 to 4, wherein the contact degree calculation unit obtains the contact degree based on a distance between the working object and each target candidate object included in the at least one target candidate object.
- The work estimation device according to any one of claims 1 to 5, wherein the contact degree calculation unit obtains the contact degree based on an arrangement of the working object and an arrangement of each target candidate object included in the at least one target candidate object.
- The work estimation device according to any one of claims 1 to 6, wherein the contact degree calculation unit obtains the contact degree based on a region where the working object and each target candidate object included in the at least one target candidate object overlap.
- The work estimation device according to any one of claims 1 to 7, wherein the working object is a hand of the user or a tool being used by the user.
- The work estimation device according to any one of claims 1 to 8, wherein the work estimation unit estimates the work of the user by a rule-based estimation method.
- The work estimation device according to any one of claims 1 to 8, wherein the work estimation unit estimates the work of the user using a learning model.
- A work estimation method comprising:
estimating, by a gaze region estimation unit, a gaze region, which is a region gazed at by a user, using information indicating the user's line of sight;
detecting, by an object detection unit, from video in which a working object being used by the user and at least one target candidate object that is a candidate for a work target of the user appear, the working object and the at least one target candidate object;
calculating, by a contact degree calculation unit, based on the gaze region, a contact degree indicating a degree to which the working object and each target candidate object included in the at least one target candidate object are in contact; and
estimating, by a work estimation unit, work of the user based on the working object and the contact degree.
- A work estimation program that causes a work estimation device, which is a computer, to execute:
gaze region estimation processing of estimating a gaze region, which is a region gazed at by a user, using information indicating the user's line of sight;
object detection processing of detecting, from video in which a working object being used by the user and at least one target candidate object that is a candidate for a work target of the user appear, the working object and the at least one target candidate object;
contact degree calculation processing of calculating, based on the gaze region, a contact degree indicating a degree to which the working object and each target candidate object included in the at least one target candidate object are in contact; and
work estimation processing of estimating work of the user based on the working object and the contact degree.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2021/003099 WO2022162844A1 (ja) | 2021-01-28 | 2021-01-28 | 作業推定装置、作業推定方法、及び、作業推定プログラム |
JP2022577924A JP7254262B2 (ja) | 2021-01-28 | 2021-01-28 | 作業推定装置、作業推定方法、及び、作業推定プログラム |
CN202180091309.4A CN116745808A (zh) | 2021-01-28 | 2021-01-28 | 作业估计装置、作业估计方法和作业估计程序 |
DE112021006095.3T DE112021006095B4 (de) | 2021-01-28 | 2021-01-28 | Arbeitsermittlungsvorrichtung, arbeitsermittlungsverfahren und arbeitsermittlungsprogramm |
US18/210,948 US20230326251A1 (en) | 2021-01-28 | 2023-06-16 | Work estimation device, work estimation method, and non-transitory computer readable medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2021/003099 WO2022162844A1 (ja) | 2021-01-28 | 2021-01-28 | 作業推定装置、作業推定方法、及び、作業推定プログラム |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/210,948 Continuation US20230326251A1 (en) | 2021-01-28 | 2023-06-16 | Work estimation device, work estimation method, and non-transitory computer readable medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022162844A1 true WO2022162844A1 (ja) | 2022-08-04 |
Family
ID=82652752
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2021/003099 WO2022162844A1 (ja) | 2021-01-28 | 2021-01-28 | 作業推定装置、作業推定方法、及び、作業推定プログラム |
Country Status (5)
Country | Link |
---|---|
US (1) | US20230326251A1 (ja) |
JP (1) | JP7254262B2 (ja) |
CN (1) | CN116745808A (ja) |
DE (1) | DE112021006095B4 (ja) |
WO (1) | WO2022162844A1 (ja) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP7483179B1 (ja) | 2023-06-20 | 2024-05-14 | Mitsubishi Electric Corporation | Estimation device, learning device, estimation method, and estimation program |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017187694A1 (ja) * | 2016-04-28 | 2017-11-02 | Sharp Corporation | Region-of-interest image generation device |
WO2017222070A1 (ja) * | 2016-06-23 | 2017-12-28 | NEC Solution Innovators, Ltd. | Work analysis device, work analysis method, and computer-readable recording medium |
JP2019144861A (ja) * | 2018-02-21 | 2019-08-29 | The Chugoku Electric Power Co., Inc. | Safety determination device, safety determination system, and safety determination method |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6103765B2 (ja) | 2013-06-28 | 2017-03-29 | KDDI Corporation | Action recognition device, method, and program, and recognizer construction device |
JP6415026B2 (ja) | 2013-06-28 | 2018-10-31 | Canon Inc. | Interference determination device, interference determination method, and computer program |
- 2021-01-28: DE national phase entry DE112021006095.3T (patent DE112021006095B4, active)
- 2021-01-28: CN national phase entry CN202180091309.4A (publication CN116745808A, pending)
- 2021-01-28: WO application PCT/JP2021/003099 filed (WO2022162844A1)
- 2021-01-28: JP national phase entry JP2022577924 (patent JP7254262B2, active)
- 2023-06-16: US continuation US18/210,948 filed (US20230326251A1, pending)
Also Published As
Publication number | Publication date |
---|---|
DE112021006095B4 (de) | 2024-09-12 |
US20230326251A1 (en) | 2023-10-12 |
CN116745808A (zh) | 2023-09-12 |
JP7254262B2 (ja) | 2023-04-07 |
DE112021006095T5 (de) | 2023-08-31 |
JPWO2022162844A1 (ja) | 2022-08-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21922860; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 2022577924; Country of ref document: JP; Kind code of ref document: A |
| WWE | Wipo information: entry into national phase | Ref document number: 202180091309.4; Country of ref document: CN |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21922860; Country of ref document: EP; Kind code of ref document: A1 |