WO2018207334A1 - Image recognition device, image recognition method, and image recognition program
- Publication number
- WO2018207334A1 (PCT/JP2017/017985)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- feature
- parameter
- parameters
- attribute
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
Definitions
- the present invention relates to an image recognition device, an image recognition method, and an image recognition program for learning a sample image and recognizing the image.
- An example of an image recognition apparatus is described in Patent Document 1.
- In the method described in Patent Document 1, first, the image input unit normalizes the position and size of the target person's face with respect to a face image set, and equalizes the image histogram to correct for luminance changes. Thereafter, shading correction is performed to remove the influence of shadows, and image compression and normalization are performed.
- Next, each face image included in the face image set is subjected to KL expansion (Karhunen-Loeve expansion) to obtain eigenvalues and coefficients, and the coefficient values are input as features to train a neural network so that it outputs a determination of whether an image is a face image. When judging whether an unknown image is a face image, the unknown image is expressed in the eigenvector space of the KL expansion obtained above, and the resulting values (the coefficients as features) are input to the trained neural network.
- Thus, the method of Patent Document 1 uses only images having a "face", the recognition target, to construct an eigenvector space that expresses face-likeness, and obtains from it the features used to train the neural network.
- As techniques related to the present invention, the appendix of Non-Patent Document 1 describes greedy layer-wise training as an example of a learning method in deep learning with neural networks.
- Non-Patent Document 2 describes an example of a neural network called an adversarial autoencoder, a type of the neural networks called autoencoders.
- FIG. 7 of the present application reproduces the schematic diagram shown in Fig. 1 on p. 2 of Non-Patent Document 2.
- Non-Patent Document 3 describes an example of a neural network called a denoising autoencoder.
- The method described in Patent Document 1 assumes recognition of face images, but in principle it can be applied to any authentication target other than faces. For example, as feature extraction for determining from a product image whether a specific product produced in a factory is normal, it is also possible to construct an eigenvector space that expresses normal-product-likeness from a set of normal product images.
- If the eigenvector space expressing normal-product-likeness in this method is constructed with high accuracy, then even in situations where abnormal sample images of the recognition target are difficult to obtain, learning mainly from normal sample images makes it possible to accurately determine whether an unknown image is a recognition target.
- In the present invention, an image that should be determined not to be a recognition target (more specifically, an image that does not have the predetermined attribute shared by recognition targets) is referred to as an "abnormal sample image".
- An image that should be determined to be a recognition target (more specifically, an image that has the predetermined attribute shared by recognition targets) is referred to as a "normal sample image".
- However, since the method described in Patent Document 1 constructs the eigenvector space by a simple method, there is a problem that the determination accuracy deteriorates when the number of dimensions of the eigenvector space is small. For example, when an image is projected onto the eigenvector space, the method uses only as many eigenvectors as the dimensionality of that space, chosen from those with the largest eigenvalues, so the coefficients available for the determination are limited to those of the chosen eigenvectors. With such a method, when the eigenvalues corresponding to the unused eigenvectors are relatively large, the loss of information about the original image becomes large and the accuracy of the determination drops.
- In view of the above problems, an object of the present invention is to provide an image recognition apparatus, an image recognition method, and an image recognition program capable of accurately recognizing an image even in a situation where abnormal sample images of the recognition target are difficult to obtain.
- The image recognition apparatus according to the present invention comprises: parameter calculation means for calculating, using at least one or more first images having a predetermined attribute shared by authentication targets, one or more parameters for extracting, from a second image for which it is unknown whether it has the attribute, a feature representing the likelihood or unlikelihood of the attribute; feature extraction means for extracting a feature from an input image using the parameters; and determination means for determining, based on at least the feature extracted from the second image, whether the second image is an authentication target. The feature extraction means extracts features using a neural network obtained by combining calculation elements including the parameters, in which the calculation elements form two or more layers from input to output and the number of calculation elements belonging to at least one layer is smaller than the number of calculation elements of the input layer to which image information is input.
- In the image recognition method according to the present invention, an information processing apparatus calculates, using at least one or more first images having a predetermined attribute shared by authentication targets, one or more parameters for extracting, from a second image for which it is unknown whether it has the attribute, a feature representing the likelihood or unlikelihood of the attribute; extracts features from an input image using the parameters; and determines, based on at least the feature extracted from the second image, whether the second image is an authentication target. When extracting the features, a neural network obtained by combining calculation elements including the parameters is used, in which the calculation elements form two or more layers from input to output and the number of calculation elements belonging to at least one layer is smaller than the number of calculation elements of the input layer to which image information is input.
- The image recognition program according to the present invention causes a computer to execute: a parameter calculation process of calculating, using at least one or more first images having a predetermined attribute shared by authentication targets, one or more parameters for extracting, from a second image for which it is unknown whether it has the attribute, a feature representing the likelihood or unlikelihood of the attribute; a feature extraction process of extracting features from an input image using the parameters; and a determination process of determining, based on at least the feature extracted from the second image, whether the second image is an authentication target. In the feature extraction process, features are extracted using a neural network obtained by combining calculation elements including the parameters, in which the calculation elements form two or more layers from input to output and the number of calculation elements belonging to at least one layer is smaller than the number of calculation elements of the input layer to which image information is input.
- According to the present invention, an image can be recognized with high accuracy even in a situation where abnormal sample images of the recognition target are difficult to obtain.
- Hereinafter, embodiments of the present invention will be described with reference to the drawings.
- Embodiment 1.
- FIG. 1 is a block diagram illustrating an example of an image recognition apparatus 10 according to the present embodiment.
- the image recognition apparatus 10 illustrated in FIG. 1 includes a feature extraction unit 11, a parameter calculation unit 12, and a determination unit 13.
- The feature extraction means 11 extracts features from an input image to be recognized (hereinafter referred to as a test image).
- The feature extraction unit 11 has one or more parameters 111 for extracting features used to judge how much the image resembles the object to be determined, and extracts the features by performing a predetermined calculation using the parameters 111 on the test image.
- The parameters 111 are not particularly restricted, as long as the feature-calculation characteristics of the feature extraction unit 11 change according to their values.
- the parameter 111 may be expressed as a feature extraction parameter.
- the feature extracted from the image by the feature extraction unit 11 may be expressed as an image feature.
- the parameter calculation means 12 calculates each value of the parameter 111 used by the feature extraction means 11 for feature extraction.
- The values of the parameters 111 are generally calculated using a set of learning images prepared in advance, such as normal sample images and abnormal sample images, but the method is not limited to this.
- The calculation method for the parameters 111 in this embodiment will be described later.
- the determination unit 13 determines whether the test image is a recognition target based on the image feature extracted by the feature extraction unit 11.
- Hereinafter, in the present embodiment, a case where an object (an object, part, portion, or the like) having a predetermined attribute is the recognition target will be described. In this case, the predetermined attribute is a property common to the objects of interest in an image, and can be set arbitrarily.
- Examples of the predetermined attribute include, for the subject, "person" (in the sense of a specific species among animals and plants), "specific individual", and "face" (in the sense of a human body part having predetermined components such as eyes, a nose, and a mouth), and, for a specific object produced in a factory, "non-defective product" (in the sense of quality that is not defective); however, the attribute is not limited to these.
- In the following, a case where a non-defective product of a specific object produced in a factory is the recognition target will be described as an example.
- Focusing on such a predetermined attribute shared by authentication targets, the relationship between the above components can be explained as follows. The parameter calculation unit 12 determines the values of the parameters 111 of the feature extraction unit 11 using an image set having the predetermined attribute (a normal sample image set); the feature extraction unit 11 extracts from the test image, based on the determined values of the parameters 111, features for judging how likely the image is to have the predetermined attribute; and the determination unit 13 determines, based on the image features obtained from the test image, whether the test image has the predetermined attribute.
- If the test image is determined to have the predetermined attribute, the test image is determined to be an authentication target; if the test image is determined not to have the predetermined attribute, the test image is determined not to be an authentication target.
- For example, the parameter calculation unit 12 may learn a function that maps an input image into a feature space from which the input image is best restored for the target objects (in this example, those having the predetermined attribute), and determine the values of the parameters 111 from the function obtained as the learning result.
- The feature extraction unit 11 may then convert the learning images and the test image into that feature space based on the values of the parameters 111 obtained by learning.
- The determination unit 13 may determine whether the object shown in the test image is a target object based on the proximity between them in the feature space into which the feature extraction unit 11 converted the images.
- a suitable example of the feature extraction method in the feature extraction means 11 is a method using a neural network.
- the feature extraction unit 11 itself may be realized by a program constituting a neural network (more specifically, a processor that operates according to the program).
- A neural network is a calculation model obtained by combining calculation elements including specific parameters. A typical calculation element computes its output as a = f(w · x + b), where a is the output of the calculation element (indexed by its unit number), x = (x_1, ..., x_p)^T is the input signal vector, T denotes vector transposition, p is the number of input signals, w is a weight vector of the same length, b is a bias, "·" denotes the inner product of vectors, and f is called an activation function; for example, a sigmoid function or a ReLU function is used. The weights and biases are the parameters of the calculation element.
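- As a concrete illustration, the following is a minimal sketch (not from the patent itself) of one such calculation element in Python with NumPy; the names `relu`, `sigmoid`, and `calculation_element` are chosen for illustration.

```python
import numpy as np

def relu(v):
    # ReLU activation: max(0, v), one common choice of f
    return np.maximum(0.0, v)

def sigmoid(v):
    # Sigmoid activation, another common choice of f
    return 1.0 / (1.0 + np.exp(-v))

def calculation_element(x, w, b, f=relu):
    """Output of one calculation element: a = f(w . x + b).

    x : input signal vector of length p
    w : weight vector of length p (part of the parameters 111)
    b : bias scalar (also part of the parameters 111)
    """
    return f(np.dot(w, x) + b)

# Example with p = 4 input signals
x = np.array([0.5, -1.0, 0.25, 2.0])
w = np.array([0.1, 0.4, -0.3, 0.8])
print(calculation_element(x, w, b=0.1))
```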
- In this configuration, the parameter calculation means 12 serves as learning means for training the neural network.
- The error backpropagation method uses a known optimization technique such as steepest descent to update the parameters (weights and biases) of each calculation element so that the error between the final output of the hierarchical neural network and the teacher signal becomes as small as possible.
- The parameters have the property of approaching their optimum values as they are updated repeatedly.
- The learning samples used for one update do not necessarily need to be all of the image sample pairs belonging to the learning sample set; only some of them (a partial image sample pair set) may be used.
- The partial image sample pair set may be selected at random at each iteration; when the steepest descent method is used as the optimization technique, this is called the stochastic gradient method. If the parameter values finally obtained after C iterations are compared between two independent trials, the parameters have been calculated from different partial image sample pair sets, so even if they roughly agree, they generally do not match exactly.
- The initial values of the calculation element parameters may be given at random before the parameter calculation; in this case as well, the final parameter values of independent trials do not match exactly.
- Likewise, the final parameter values obtained using different optimization techniques are generally different.
- In this way, the parameter calculation means 12 may obtain the parameters by a probabilistic method, such as random selection of the initial weight values of the neural network or of the learning samples used during learning, or by using a different optimization technique. Modifications that exploit these properties will be described separately later.
- the parameter calculation unit 12 can use a known learning method other than the above.
- Hereinafter, an example will be described in which an autoencoder-type neural network is used that includes two or more layers, including an input layer to which image information is input, and in which the number of calculation elements in at least one layer is smaller than the number of calculation elements in the input layer.
- In this case, the feature obtained from an image is the information output from at least one layer of the autoencoder-type neural network.
- FIG. 2 is an explanatory diagram showing an example of a neural network used by the feature extraction unit 11 when extracting features.
- the example neural network shown in FIG. 2 includes seven layers including an input layer and an output layer. Each layer includes one or more computing elements. Circles represent computing elements.
- the feature extraction unit 11 may use the output of the fourth layer counted from the input layer as an image feature.
- The neural network shown in this example is of the autoencoder type described above, that is, an autoencoder. Although an autoencoder in the narrow sense is composed of three layers, configurations extended to more layers have been proposed in recent years.
- In the present invention, the network configuration of the autoencoder is not limited to three layers; the configuration requirements are that it has a plurality of layers in general and that the number of calculation elements in at least one layer is smaller than the number of elements in the input layer. A minimal sketch of such a network follows.
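- Below is a minimal sketch of such an autoencoder-type network in PyTorch, mirroring the seven-layer example of FIG. 2 with a two-element bottleneck; the layer sizes other than the bottleneck are illustrative assumptions, not values from the patent.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Seven-layer autoencoder; the 4th layer (the bottleneck) has
    fewer calculation elements than the input layer, as required."""
    def __init__(self, n_in=784):
        super().__init__()
        # input -> 2nd -> 3rd -> 4th (bottleneck)
        self.encoder = nn.Sequential(
            nn.Linear(n_in, 128), nn.ReLU(),
            nn.Linear(128, 32), nn.ReLU(),
            nn.Linear(32, 2),            # 4th layer: 2 elements, as in FIG. 2
        )
        # 4th -> 5th -> 6th -> output
        self.decoder = nn.Sequential(
            nn.Linear(2, 32), nn.ReLU(),
            nn.Linear(32, 128), nn.ReLU(),
            nn.Linear(128, n_in),
        )

    def forward(self, x):
        z = self.encoder(x)              # image feature (4th-layer output)
        return self.decoder(z), z
```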
- The parameter calculation unit 12 preferably learns the parameters of each calculation element in such a neural network using the image set having the predetermined attribute and determines their values.
- More specifically, the parameter calculation means 12 may calculate the values of the parameters 111 using a set of pairs in which a normal sample image serves as the learning sample and the same normal sample image serves as the teacher signal.
- A suitable example of the determination method in the determination unit 13 is to calculate a distance between an image feature obtained from an image having the predetermined attribute and the image feature obtained from the test image, and to determine, based on that distance, whether the test image has the predetermined attribute.
- As the distance, an existing distance measure may be used; for example, the Euclidean distance or the city block distance can be used, but the distance is not limited to these.
- Similarity may be used instead of distance. As the degree of similarity, for example, the inner product of two feature vectors or the angle between them can be used, but the similarity is not limited to these. In the examples described later, the Euclidean distance is used for the determination.
- When similarity is used, the determination criterion is reversed compared with the case of distance, but the description is omitted because it is self-evident.
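- For concreteness, distance and similarity between two image features might be computed as follows (a sketch; the feature vectors are assumed to be NumPy arrays).

```python
import numpy as np

def euclidean_distance(f1, f2):
    return np.linalg.norm(f1 - f2)

def city_block_distance(f1, f2):
    return np.abs(f1 - f2).sum()

def cosine_similarity(f1, f2):
    # inner product normalized by vector lengths (cosine of the angle)
    return np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2))
```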
- More generally, the determination unit 13 may extract a predetermined feature quantity (for example, the proximity between them in the feature space) based on the respective features of the image set having the predetermined attribute and of the test image, and determine, based on the extracted feature quantity, whether the test image has the predetermined attribute.
- FIG. 3 is a flowchart showing an example of the operation of the learning step ST1 of the present embodiment.
- In the learning step ST1, the values of the parameters 111 are determined using mainly learning images having the predetermined attribute.
- the parameter calculation means 12 calculates the value of the parameter 111 using a set of learning images given in advance (step ST11).
- The parameter calculation means 12 performs the learning so that, for example, when a learning image is given to the input layer of the neural network, the output layer reproduces the learning image itself.
- A known learning method may be used. For example, when the neural network has three layers, learning may be performed using a known method such as the error backpropagation method; when it has four or more layers, for example the greedy layer-wise training described in Non-Patent Document 1 can be used.
- the feature extraction unit 11 calculates an image feature for each learning image using the value of the parameter 111 calculated by the parameter calculation unit 12 (step ST12). For example, when the neural network shown in FIG. 2 is obtained by learning, the feature extraction unit 11 can use the output values of the two calculation elements in the fourth layer as image features as described above.
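- A sketch of the learning step ST1 under this setup: the autoencoder is trained so that the output layer reproduces the learning image itself, and the bottleneck output is then taken as the image feature (assumes the illustrative `AutoEncoder` above; data loading and batching are omitted).

```python
import torch

def learning_step(model, train_images, epochs=100, lr=1e-3):
    """Step ST11: calculate the parameters 111 so that output == input.
    train_images: tensor of shape (N, n_in), normal sample images only."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        recon, _ = model(train_images)
        loss = loss_fn(recon, train_images)  # teacher signal = the image itself
        opt.zero_grad()
        loss.backward()                      # error backpropagation
        opt.step()

    # Step ST12: image features of each learning image (4th-layer output)
    with torch.no_grad():
        _, features = model(train_images)
    return features
```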
- FIG. 4 is a flowchart showing an example of the operation of the determination step SD1 of the present embodiment.
- In the determination step SD1, an image feature is calculated from the test image based on the determined values of the feature extraction parameters, and the test image is judged.
- the feature extraction unit 11 extracts an image feature from the test image using the value of the parameter 111 calculated by the parameter calculation unit 12 in the learning step ST1 (step SD11).
- Next, the determination unit 13 determines whether the test image has the predetermined attribute by comparing the image features of the learning images obtained in step ST12 with the image feature of the test image obtained in step SD11 (step SD12).
- Attribute determination methods include, for example, the following. The determination unit 13 takes the learning-image feature whose distance from the test-image feature is the nth smallest (n is an integer equal to or greater than 1), and may determine that the test image has the predetermined attribute when that distance Dist_n is smaller than a real number th (or equal to or less than th).
- the values of n and th are arbitrarily determined.
- n and th in the above determination method can be determined, for example, by the following procedure. With n fixed to an arbitrary value, th is gradually decreased from a large value while all learning images are used as test images, and th is sought such that the rate at which the test images (learning images) are correctly determined (the detection rate) is 100%; the smallest such th is adopted. For example, when learning images (normal sample images) having the predetermined attribute are used as test images, the rate at which the test images are determined to have the predetermined attribute may be used as the detection rate.
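- The nth-nearest-distance rule and the calibration of th might look as follows (a sketch under the stated 100%-detection-rate criterion; `train_feats` denotes the learning-image features from step ST12, and leave-one-out distances are an illustrative choice so that an image is not compared with its own feature).

```python
import numpy as np

def dist_n(test_feat, train_feats, n=1):
    """Distance to the n-th nearest learning-image feature (n >= 1)."""
    d = np.linalg.norm(train_feats - test_feat, axis=1)
    return np.sort(d)[n - 1]

def has_attribute(test_feat, train_feats, n, th):
    return dist_n(test_feat, train_feats, n) < th

def calibrate_th(train_feats, n=1):
    """Smallest th that still classifies every learning image as having
    the attribute (detection rate 100%), using leave-one-out distances."""
    worst = 0.0
    for i in range(len(train_feats)):
        rest = np.delete(train_feats, i, axis=0)
        worst = max(worst, dist_n(train_feats[i], rest, n))
    return worst + 1e-9  # just above the largest observed distance
```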
- With th chosen in this way, the set of images that the determination means 13 judges to have the predetermined attribute is unlikely to include images that do not actually have it, but many misses tend to occur. It is desirable to determine the values so that such misses are reduced as much as possible while the determination results contain no errors.
- FIG. 5 is an explanatory diagram illustrating an example of determining th in the determination unit 13.
- FIG. 5(a) shows the detection rate when the value of th is decreased from 0.2 to 0.05 in steps of 0.05 while the value of n is fixed to 1. FIG. 5(b) is an explanatory diagram showing an example of determining n and th when n is not fixed, that is, when both n and th are varied.
- When both n and th are varied, for example, a pair of values near the boundary between the region where the detection rate is 100% and the region where it is not may be adopted.
- determination may be tried independently using a plurality of value sets, and the final determination result may be obtained by summing up the results.
- an image other than the learning image may be used as a test image for determining the values of n and th.
- Alternatively, only images that do not have the predetermined attribute may be used as test images. In this case, it suffices to find th such that the rate of correctly determining that a test image does not have the predetermined attribute is 100%, and the largest such th may be adopted. The determination unit 13 then determines whether the test image has the predetermined attribute in the same manner.
- n and th can also be determined using an image set in which images that have the predetermined attribute and images that do not are mixed. In that case, it is desirable that each image in the set carries a correct-answer label (whether or not it has the attribute).
- As described above, in the present embodiment, the parameters 111 can be learned even when the learning images are only images having the predetermined attribute (normal samples). Therefore, even when only a small number of images without the predetermined attribute (abnormal samples) are available, or none at all, it is possible to determine with high accuracy whether an unknown sample is normal or abnormal.
- Further, if the processing by which the feature extraction means 11 calculates features from an image is regarded as a mathematical function, using a neural network to learn the feature extraction parameters makes it possible to learn a more complex function than in the case of principal component analysis. For this reason, more accurate determination can be realized.
- Note that distance in the image feature space does not necessarily match the actual dissimilarity between images; however, when the image features of a test image and a learning image are close, approximating the degree of difference between the images by an arbitrary distance value such as the Euclidean distance is a commonly performed operation, and in principle highly accurate attribute determination can be expected.
- Embodiment 2.
- FIG. 6 is a block diagram illustrating an example of the image recognition apparatus 20 according to the second embodiment.
- the image recognition apparatus 20 illustrated in FIG. 6 includes a feature extraction unit 21, a parameter calculation unit 22, and a determination unit 23.
- The feature extraction means 21 extracts features (image features) from an input image (test image) to be recognized, and calculates a distance value described later.
- The feature extraction unit 21 has one or more parameters 211 as in the first embodiment, and extracts image features by performing a predetermined calculation using the parameters 211 on the test image.
- The feature extraction means 21 of the present embodiment extracts features using, among autoencoder-type neural networks, a neural network called an adversarial autoencoder as described in Non-Patent Document 2.
- A feature of the adversarial autoencoder is that it can perform learning (that is, calculation of the parameters 211) such that the extracted features follow a distribution specified in advance, such as an m-dimensional normal distribution (m is an integer of 1 or more) or an m-dimensional mixture normal distribution. Therefore, if an adversarial autoencoder is used, image features that follow such a pre-specified distribution can be obtained.
- FIG. 7 is an explanatory diagram schematically showing a configuration example of the adversarial autoencoder disclosed in Non-Patent Document 2.
- The feature extraction means 21 may extract features using, for example, an adversarial autoencoder as shown in FIG. 7.
- In FIG. 7, p(z) represents positive samples and q(z) represents negative samples. The "adversarial cost" in the figure is the cost for distinguishing negative samples from positive samples.
- The upper part of FIG. 7 corresponds to the autoencoder, and the lower part corresponds to the discriminative network described later.
- The calculation (learning) method for the calculation element parameters of a neural network with this configuration consists of a reconstruction phase and a regularization phase.
- In the reconstruction phase, the autoencoder contained in the adversarial autoencoder is trained so that, when a learning sample is input, the output becomes the learning sample itself, that is, so that an output reconstructing the input image is obtained.
- In the regularization phase, the discriminative network contained in the adversarial autoencoder (a network that identifies whether an input sample is a sample drawn from the specified distribution or a sample generated by the autoencoder) is trained.
- The parameter calculation means 22 obtains each value of the parameters 211 used by the feature extraction means 21 for feature extraction by the calculation (learning) method proposed for the adversarial autoencoder described above; a sketch is given below. Details of the configuration of the adversarial autoencoder and of the parameter calculation method are described in Non-Patent Document 2.
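- The two training phases can be sketched in PyTorch as follows (an illustrative reduction of the procedure in Non-Patent Document 2, not the patent's own code; `enc`, `dec`, and `disc` are assumed to be small `nn.Module`s, the encoder output is m-dimensional, the discriminator ends in a sigmoid so its output is a probability of shape (batch, 1), and `opt_ae`, `opt_disc`, `opt_gen` optimize the autoencoder, the discriminator, and the encoder, respectively).

```python
import torch
import torch.nn.functional as F

def aae_training_step(enc, dec, disc, x, opt_ae, opt_disc, opt_gen, m):
    # --- Reconstruction phase: train the autoencoder so that the output
    # reconstructs the input learning sample itself.
    recon = dec(enc(x))
    recon_loss = F.mse_loss(recon, x)
    opt_ae.zero_grad(); recon_loss.backward(); opt_ae.step()

    # --- Regularization phase (1): train the discriminative network to
    # tell positive samples p(z) (drawn from the specified distribution,
    # here an m-dimensional normal) from negative samples q(z)
    # (generated by the encoder).
    z_pos = torch.randn(x.size(0), m)          # samples from p(z)
    z_neg = enc(x).detach()                    # samples from q(z)
    adv_cost = (F.binary_cross_entropy(disc(z_pos), torch.ones(x.size(0), 1))
              + F.binary_cross_entropy(disc(z_neg), torch.zeros(x.size(0), 1)))
    opt_disc.zero_grad(); adv_cost.backward(); opt_disc.step()

    # --- Regularization phase (2): train the encoder to fool the
    # discriminator, pushing q(z) toward the specified p(z).
    gen_loss = F.binary_cross_entropy(disc(enc(x)), torch.ones(x.size(0), 1))
    opt_gen.zero_grad(); gen_loss.backward(); opt_gen.step()
```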
- As the determination means 23, the determination means 13 of the first embodiment can be used; in the following, however, a determination method is described that exploits a property of the image features extracted by the feature extraction means 21, namely that they follow the predetermined distribution used by the adversarial autoencoder for learning.
- For example, if the specified distribution is an m-dimensional normal distribution, a distance such as the Euclidean distance or the Mahalanobis distance can be calculated between a point in the space and the mean vector.
- Therefore, the determination unit 23 may determine whether the test image has the predetermined attribute based on the distance between the point in the m-dimensional space at which the image feature calculated for the test image lies and the point in the m-dimensional space given by the mean vector of the image features calculated for the learning images.
- For the Mahalanobis distance, the values of a variance-covariance matrix can be used.
- Further, since a probability can be calculated for a region in the m-dimensional space, it also becomes possible to perform a statistical test whose null hypothesis is that the test image has the predetermined attribute.
- In this way, the determination unit 23 may determine the presence or absence of the attribute using any index calculated using the parameters of the predetermined distribution (parameters that determine the shape of the distribution, such as the mean vector and variance-covariance matrix of an m-dimensional normal distribution).
- The determination unit 23 may also determine the presence or absence of the attribute by applying, to the distance, the th determination method with n fixed to an arbitrary value. Further, when the determination unit 13 of the first embodiment is used as the determination unit 23, the determination unit 23 may exploit the fact that the learning algorithm is probabilistic: for example, it may obtain a plurality of determination results, or a plurality of indices, for one test image and judge the test image based on them. The plurality of determination results or indices may be obtained by recalculating the parameters each time and performing the determination several times, or by providing a plurality of pairs of parameter calculation means and feature extraction means and inputting the test image to each of them to perform feature extraction.
- It is also possible to provide a plurality of image recognition means, each performing everything from parameter calculation through feature extraction to determination, and to obtain a determination result by inputting the test image to each of them. The same applies to the other embodiments.
- the determination means 23 determines the presence or absence of an attribute using one or more of these methods.
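- For instance, a determination based on the parameters of an assumed m-dimensional normal distribution might be sketched as follows (illustrative; `train_feats` is the array of image features of the learning images, and the covariance is assumed to be invertible).

```python
import numpy as np

def fit_distribution(train_feats):
    mu = train_feats.mean(axis=0)            # mean vector
    cov = np.cov(train_feats, rowvar=False)  # variance-covariance matrix
    return mu, np.linalg.inv(cov)

def mahalanobis(feat, mu, cov_inv):
    d = feat - mu
    return float(np.sqrt(d @ cov_inv @ d))

def judge(feat, mu, cov_inv, th):
    # The test image is judged to have the attribute if its feature lies
    # close enough to the mean vector of the learning-image features.
    return mahalanobis(feat, mu, cov_inv) < th
```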
- the operation of this embodiment is roughly divided into a learning step ST2 and a determination step SD2. Also in the present embodiment, the learning step ST2 is performed prior to the determination step SD2.
- FIG. 8 is a flowchart showing an example of the operation of the learning step ST2 of the present embodiment.
- the value of the parameter 211 is determined mainly using a learning image having a predetermined attribute.
- the parameter calculation means 22 calculates the value of the parameter 211 using a set of learning images given in advance (step ST21).
- the feature extraction unit 21 may calculate an average vector in a predetermined distribution (for example, an m-dimensional normal distribution) as an image feature of the learning image set using the calculated value of the parameter 211.
- FIG. 9 is a flowchart showing an example of the operation of the determination step SD2 of the present embodiment.
- In the determination step SD2, the distance of the test image's feature from the mean vector is calculated based on the determined values of the parameters 211, and the test image is judged.
- Specifically, the feature extraction unit 21 first extracts an image feature from the test image using the values of the parameters 211 calculated by the parameter calculation unit 22 in the learning step ST2 (step SD21).
- At this time, the feature extraction unit 21 extracts an image feature that follows the m-dimensional normal distribution, more specifically a point in the m-dimensional space.
- Next, the determination unit 23 calculates the distance between the image feature of the test image obtained in step SD21 and the mean image-feature vector of the learning images, and determines that the test image has the predetermined attribute when the distance is smaller than a predetermined value (step SD22).
- As described above, in the present embodiment, the feature extraction parameters are calculated so that the image features calculated by the feature extraction unit 21 follow a probability distribution specified in advance. The predetermined attribute-likeness of the test image is then determined based on a distance calculated on that probability distribution for the image feature extracted from the test image. For this reason, according to the present embodiment, the predetermined attribute-likeness can be given as a probabilistic index.
- Embodiment 3.
- FIG. 10 is a block diagram illustrating an example of the image recognition device 30 according to the third embodiment.
- the image recognition apparatus 30 shown in FIG. 10 includes a feature extraction unit 31, a parameter calculation unit 32, and a determination unit 33.
- Feature extracting means 31 extracts a noise component as a feature (image feature) from an input image (test image) to be recognized.
- The feature extraction unit 31 has one or more parameters 311 as in the first embodiment, and extracts image features by performing a predetermined calculation using the parameters 311 on the test image.
- The feature extraction means 31 of the present embodiment extracts features (noise components) from the test image using, among autoencoder-type neural networks, a neural network called a denoising autoencoder.
- The denoising autoencoder is configured to output the original data when the input data is data that is partially abnormal (damaged or the like) because noise has been added to the original data.
- The calculation element parameters of the denoising autoencoder are calculated using, as learning samples, data obtained by adding noise to the original learning samples, and using the learning samples before the noise was added as teacher signals.
- For this calculation, a known method such as the error backpropagation method may be used. In this way, the denoising autoencoder can, in addition to the properties of an autoencoder-type neural network, acquire the ability to remove noise in the input image.
- The noise component removed by the denoising autoencoder corresponds to a component recognized as noise through learning with one or more normal sample images, or to a component extracted by a similar method, and can therefore be regarded as one of the features expressing the lack of the predetermined attribute.
- The parameter calculation unit 32 obtains each value of the parameters 311 used by the feature extraction unit 31 for feature extraction so that noise is removed from the input image.
- The configuration of the denoising autoencoder is described, for example, in Non-Patent Document 3 above.
- More specifically, the parameter calculation unit 32 calculates the feature extraction parameters so that, when a learning image to which noise has been artificially added is given as input, the learning image before the noise was added is obtained as output. At this time, if images that have the predetermined attribute but are partially abnormal are available, such images may also be used as inputs for learning.
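- How the parameter calculation unit 32 might form the learning pairs, with noisy images as inputs and the original images as teacher signals, is sketched below (the additive Gaussian noise model is an illustrative assumption, and `model` refers to the illustrative `AutoEncoder` defined earlier).

```python
import torch

def make_denoising_pairs(clean_images, noise_std=0.1):
    """Inputs: learning images with noise artificially added.
    Teacher signals: the learning images before adding noise."""
    noisy = clean_images + noise_std * torch.randn_like(clean_images)
    return noisy, clean_images

def train_denoising_step(model, clean_images, opt):
    noisy, target = make_denoising_pairs(clean_images)
    recon, _ = model(noisy)  # reconstruct the clean image from the noisy one
    loss = torch.nn.functional.mse_loss(recon, target)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```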
- For example, the feature extraction unit 31 obtains a noise-removed image from the test image using the parameters 311 learned as described above, and may extract the difference image between the obtained noise-removed image and the test image as the noise component.
- That is, the difference between the information input to the input layer of the trained denoising autoencoder (the test image) and the information output from the output layer (the noise-removed image) is used as the image feature.
- The determination unit 33 determines whether the test image has the predetermined attribute based on the noise component (difference image) obtained from the test image by the feature extraction unit 31.
- For example, the determination unit 33 may judge the test image based on the magnitude of the difference: it may calculate the sum of the pixel values of the difference image and determine that the test image has the predetermined attribute when the sum is equal to or less than a predetermined value, though the method is not limited to this.
- The basic concept of this embodiment is to extract only the noise component by taking the difference between the input image and its noise-removed image; if the extracted noise component is small, the image is determined to have the predetermined attribute, and conversely, if it is large, the image is determined not to have it. A sketch of this determination follows.
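- The determination then reduces to the following kind of computation (a sketch; `model` denotes the trained denoising autoencoder, and the sum-of-absolute-differences criterion is the example given above).

```python
import torch

def judge_by_noise_component(model, test_image, th):
    """Extract the noise component as the difference between the test
    image and its noise-removed image, and threshold its magnitude."""
    with torch.no_grad():
        denoised, _ = model(test_image)
    diff = (test_image - denoised).abs()  # difference image (noise component)
    return diff.sum().item() <= th        # small noise -> has the attribute
```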
- The operation of this embodiment is also roughly divided into a learning step ST3 and a determination step SD3. Also in this embodiment, the learning step ST3 is performed prior to the determination step SD3.
- FIG. 11 is a flowchart showing an example of the operation of the learning step ST3 of the present embodiment.
- In the learning step ST3, the values of the parameters 311 are determined using mainly learning images having the predetermined attribute and noise-added images corresponding to those learning images.
- Specifically, the parameter calculation means 32 first calculates the values of the parameters 311 using a learning image set having the predetermined attribute, given in advance, and a set of noise-added learning images corresponding to that learning image set (step ST31).
- FIG. 12 is a flowchart showing an example of the operation of the determination step SD3 of the present embodiment.
- In the determination step SD3, the test image is judged by taking the difference between the test image and its noise-removed image based on the determined values of the parameters 311.
- Specifically, the feature extraction unit 31 first extracts the noise component as the feature of the test image based on the determined values of the parameters 311, generating a noise-removed image of the test image (step SD31).
- Next, the determination unit 33 calculates the difference between the noise-removed image generated in step SD31 and the test image, and determines whether the test image has the predetermined attribute based on the difference (difference image) (step SD32).
- As described above, the present embodiment is configured to extract the noise component contained in the test image and determine from it whether the image has the predetermined attribute. For this reason, according to the present embodiment, an attribute determination that is easy to understand visually can be performed.
- Modification 1. The final determination method when a plurality of determination results are obtained does not necessarily need to be a majority decision. For example, the test image may be finally determined to have the predetermined attribute when it is determined to have the attribute in at least an arbitrary number (one or more) of the trials.
- Alternatively, a new index may be calculated from the indices obtained in the plurality of trials, for example their mean or variance, and the final determination may be made based on that value; a small sketch follows.
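- Such aggregation over the trials might be sketched as below (illustrative; `decisions` is a list of boolean results and `indices` a list of index values from the trials).

```python
def aggregate(decisions, indices, k=1, th=None):
    # Any-k rule: the final decision is positive if at least k of the
    # trials judged the image to have the attribute.
    by_count = sum(decisions) >= k
    # Alternatively, derive a new index from the trial indices
    # (here the mean) and threshold it.
    mean_index = sum(indices) / len(indices)
    by_index = mean_index <= th if th is not None else None
    return by_count, by_index
```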
- Modification 2. In the second embodiment, an example was described in which a probability for the attribute determination is calculated using the parameters of a probability distribution specified in advance and a statistical test is constructed. If the trial-to-trial variation described above is also regarded as a stochastic phenomenon, the probability for the attribute determination can be formulated as the joint probability over r trials, and a new statistical test can be constructed.
- These modifications are examples in which, even though the learning algorithm is probabilistic and the learning of the feature extraction parameters of the neural network occasionally fails (that is, features effective for the attribute determination cannot be obtained), a highly accurate determination can still be performed as a whole.
- In each of the above embodiments, the data to be processed has been described as images, but the data to be processed is not limited to images.
- Any data can be used as long as it can be converted into a signal format that a neural network can take as input.
- For example, data obtained by applying arbitrary image processing to an image, a combination of a plurality of types of images captured with different sensors, or data obtained by adding an audio signal, annotation information, or the like to an image may be the processing target.
- Any information propagation method on the neural network and learning method for the neural network may be used as long as they are not substantially different from the methods described in the above embodiments.
- FIG. 13 is a schematic block diagram illustrating a configuration example of a computer according to the embodiment of the present invention.
- the computer 1000 includes a CPU 1001, a main storage device 1002, an auxiliary storage device 1003, an interface 1004, a display device 1005, and an input device 1006.
- the above-described image recognition device may be mounted on the computer 1000, for example.
- the operation of each device may be stored in the auxiliary storage device 1003 in the form of a program.
- the CPU 1001 reads out the program from the auxiliary storage device 1003 and develops it in the main storage device 1002, and executes the predetermined processing in the above embodiment according to the program.
- The auxiliary storage device 1003 is an example of a non-transitory tangible medium.
- Other examples of the non-transitory tangible medium include a magnetic disk, a magneto-optical disk, a CD-ROM, a DVD-ROM, and a semiconductor memory connected via the interface 1004.
- When the program is distributed to the computer 1000 via a communication line, the computer that has received the distribution may load the program into the main storage device 1002 and execute the predetermined processing of the above embodiments.
- the program may be for realizing a part of predetermined processing in each embodiment.
- the program may be a difference program that realizes the predetermined processing in the above-described embodiment in combination with another program already stored in the auxiliary storage device 1003.
- the interface 1004 transmits / receives information to / from other devices.
- the display device 1005 presents information to the user.
- the input device 1006 accepts input of information from the user.
- Some elements of the computer 1000 may be omitted. For example, if the device does not present information to the user, the display device 1005 can be omitted.
- Part or all of each component of each device is implemented by general-purpose or dedicated circuitry, processors, or combinations thereof. These may be constituted by a single chip or by a plurality of chips connected via a bus. Part or all of each component of each device may also be realized by a combination of the above-described circuitry and a program.
- When part or all of the components of each device are realized by a plurality of information processing devices or circuits, the plurality of information processing devices or circuits may be arranged centrally or in a distributed manner.
- For example, the information processing devices, circuits, and the like may be realized in a form in which they are connected via a communication network, such as a client-server system or a cloud computing system.
- FIG. 14 is a block diagram showing an outline of the present invention.
- An image recognition apparatus 500 illustrated in FIG. 14 includes a parameter calculation unit 501, a feature extraction unit 502, and a determination unit 503.
- The parameter calculation unit 501 calculates, using at least one or more first images having a predetermined attribute shared by authentication targets, one or more parameters for extracting, from a second image for which it is unknown whether it has the attribute, a feature representing the likelihood or unlikelihood of the attribute.
- The feature extraction unit 502 extracts features from an input image using the parameters calculated by the parameter calculation unit 501. More specifically, the feature extraction unit 502 extracts features using a neural network obtained by combining calculation elements including the parameters, in which the calculation elements form two or more layers from input to output and the number of calculation elements belonging to at least one layer is smaller than the number of calculation elements in the input layer to which image information is input.
- Determination unit 503 determines whether or not the second image is an authentication target based on at least the feature extracted from the second image.
- The present invention can be used, for example, as an inspection device that detects foreign matter or defective quality in products produced in a factory. The present invention can also be used as an abnormality detection device that detects abnormalities not only in factory-produced products but in general objects. Further, the present invention can be used, for example, as part of a biometric authentication device for a security gate or the like, as an inspection device for confirming whether an input image actually shows the part to be authenticated (a face, a human body, or the like).
- The present invention can also be used, for example, in a tracking device that tracks a specific person in video, as an image recognition means that establishes the identity of an object such as a face, human body, or other object appearing across a plurality of frames.
Abstract
The image recognition device according to the present invention is provided with: a parameter calculation means which, by using at least one or more first images having a predetermined attribute that is common to subjects to be authenticated, calculates one or more parameters for extracting, from a second image, a feature indicating that the second image is likely to have the predetermined attribute, or unlikely to have the predetermined attribute, wherein it is not known whether or not the second image has the predetermined attribute; a feature extraction means which extracts a feature from an input image using the one or more parameters; and a determination means which determines whether or not the second image is a subject to be authenticated, on the basis of at least said feature extracted from the second image. The feature extraction means extracts a feature using a neural network that is obtained by connecting together calculation elements having predetermined parameters, and that includes two or more layers including input and output layers, each layer comprising calculation elements, wherein at least one layer comprises fewer calculation elements than the input layer, to which image information is input.
Description
本発明は、サンプル画像を学習して画像を認識する画像認識装置、画像認識方法および画像認識プログラムに関する。
The present invention relates to an image recognition device, an image recognition method, and an image recognition program for learning a sample image and recognizing the image.
画像認識装置の一例が特許文献1に記載されている。特許文献1に記載の方法は、まず、顔画像集合に対して、画像入力手段が対象人物の顔の位置と大きさを正規化し、輝度変化補正のために画像ヒストグラムを平滑化する。その後、影の影響を除去するためにシェーディング補正を行い、画像圧縮と正規化を行う。次に、顔画像集合に含まれる顔画像の各々についてKL展開(Karhunen-Loeve展開)して固有値と係数を求めるとともに、特徴量として係数の値を入力し、顔画像か否かの判定結果が出力されるようニューラルネットワークの学習を行う。また、特許文献1に記載の方法は、未知画像に対して顔画像か否かを判定行う際には、未知画像を上述で求めたKL展開の固有ベクトル空間上で表現し、得られた値(特徴量としての係数)を上述のニューラルネットに入力することで、判定を行う。
An example of an image recognition apparatus is described in Patent Document 1. In the method described in Patent Literature 1, first, the image input unit normalizes the position and size of the face of the target person with respect to the face image set, and smoothes the image histogram for correcting the luminance change. Thereafter, shading correction is performed to remove the influence of shadows, and image compression and normalization are performed. Next, each face image included in the face image set is subjected to KL expansion (Karhunen-Loeve expansion) to obtain an eigenvalue and a coefficient, and a coefficient value is input as a feature amount. The neural network is learned so that it can be output. In the method described in Patent Document 1, when determining whether or not an unknown image is a face image, the unknown image is expressed on the eigenvector space of the KL expansion obtained above and the obtained value ( Determination is performed by inputting a coefficient as a feature amount into the above-described neural network.
特許文献1に記載の方法は、認識対象とされた「顔」を有する画像のみを用いて、顔らしさを表現する固有ベクトル空間を構成し、そこからニューラルネットワークの学習に用いる特徴を得ている。
The method described in Patent Document 1 uses only an image having a “face” as a recognition target to construct an eigenvector space that expresses the face-likeness, and obtains features used for learning of a neural network therefrom.
また、本発明に関連する技術として、非特許文献1の付録には、ニューラルネットワークのようなディープラーニングにおける学習方法の例として、層毎の貪欲学習(greedy layer training)が記載されている。また、非特許文献2には、自己符号化器(Autoencoder)と呼ばれるニューラルネットワークのうち敵対的自己符号化器(Adversarial Autoencoder)と呼ばれるニューラルネットワークの例が記載されている。なお、本願の図7にも、非特許文献2のp.2に開示されているFig.1の略図が示されている。また、非特許文献3には、デノイジングオートエンコーダ(Denoising Autoencoder)と呼ばれるニューラルネットワークの例が記載されている。
In addition, as a technique related to the present invention, the appendix of Non-Patent Document 1 describes greedy layer training for each layer as an example of a learning method in deep learning such as a neural network. Non-Patent Document 2 describes an example of a neural network called an adversary autoencoder among neural networks called an autoencoder. In FIG. 7 of the present application, p. 2 disclosed in FIG. A schematic diagram of 1 is shown. Non-Patent Document 3 describes an example of a neural network called a denoising autoencoder.
特許文献1に記載の方法は、顔画像の認識を想定しているが、原理的には顔以外の任意の認証対象に対しても適用可能である。例えば、工場で生産されるある特定の製品が正常であるかどうかを製品画像から判定するための特徴抽出として、正常品画像集合から正常品らしさを表現する固有ベクトル空間を構成することも可能である。
The method described in Patent Document 1 assumes recognition of a face image, but can be applied to any authentication target other than a face in principle. For example, as feature extraction for determining whether or not a specific product produced in a factory is normal from a product image, it is also possible to construct an eigenvector space that expresses the quality of a normal product from a set of normal product images .
当該方法における正常品らしさを表現する固有ベクトル空間が精度良く構成されれば、認識対象物体に関する異常サンプル画像が入手しづらい状況においても、主として正常サンプル画像に基づいて学習することにより、未知画像に対して認識対象であるか否かを精度良く判定できる。
If the eigenvector space that expresses the quality of normal products in this method is constructed with high accuracy, even if it is difficult to obtain an abnormal sample image related to the recognition target object, learning is performed mainly on the normal sample image. Thus, it can be accurately determined whether or not it is a recognition target.
なお、本発明では、認識対象でないと判定されるべき画像、より具体的には、認識対象が共通に有する所定の属性を有しない画像を「異常サンプル画像」という。また、認識対象であると判定されるべき画像、より具体的には、認識対象が共通に有する所定の属性を有する画像を「正常サンプル画像」をいう。
In the present invention, an image that should be determined not to be a recognition target, more specifically, an image that does not have a predetermined attribute shared by the recognition target is referred to as an “abnormal sample image”. An image to be determined as a recognition target, more specifically, an image having a predetermined attribute shared by the recognition target is referred to as a “normal sample image”.
しかし、特許文献1に記載の方法は、単純な方法で固有ベクトル空間を構成しているため、固有ベクトル空間の次元数が小さい場合に判定精度が悪くなるという問題がある。
However, since the method described in Patent Document 1 constitutes the eigenvector space by a simple method, there is a problem that the determination accuracy deteriorates when the number of dimensions of the eigenvector space is small.
例えば、特許文献1に記載の方法は、画像を固有ベクトル空間に縮退させるときに、当該画像が有する固有ベクトルのうち、固有値の大きい方から固有ベクトル空間の次元数分の固有ベクトルしか利用しない。このため、判定に利用される固有ベクトルの係数が、上記の固有ベクトルの係数に限定される。このような方法では、利用されなかった固有ベクトルに対応する固有値が比較的大きい場合に、原画像に対する情報の損失が大きくなり、判定の精度が低下する。
For example, in the method described in Patent Document 1, when an image is reduced to an eigenvector space, only eigenvectors corresponding to the number of dimensions of the eigenvector space are used from the eigenvector having the larger eigenvalue. For this reason, the coefficient of the eigenvector used for determination is limited to the coefficient of the eigenvector. In such a method, when the eigenvalue corresponding to the eigenvector that has not been used is relatively large, the loss of information with respect to the original image becomes large, and the accuracy of the determination is reduced.
本発明は、上記課題に鑑み、認識対象に関する異常サンプル画像が入手しづらい状況においても、精度良く画像を認識できる画像認識装置、画像認識方法および画像認識プログラムを提供することを目的とする。
In view of the above problems, an object of the present invention is to provide an image recognition apparatus, an image recognition method, and an image recognition program capable of accurately recognizing an image even in a situation where an abnormal sample image related to a recognition target is difficult to obtain.
本発明による画像認識装置は、認証対象が共通に有する所定の属性を有する1つ以上の第1の画像を少なくとも用いて、該属性を有するか否かが未知の画像である第2の画像から該属性らしさまたは該属性らしくなさを表す特徴を抽出するための1つ以上のパラメータを計算するパラメータ計算手段と、該パラメータを用いて、入力された画像から特徴を抽出する特徴抽出手段と、少なくとも第2の画像から抽出される特徴に基づいて、第2の画像が認証対象であるか否かを判定する判定手段とを備え、特徴抽出手段は、該パラメータを含む計算素子を結合して得られるニューラルネットワークであって、入力から出力まで計算素子が2以上の層を構成し、かつ少なくとも1つの層に属する計算素子の数が、画像の情報が入力される入力層の計算素子の数よりも小さいニューラルネットワークを用いて、特徴を抽出することを特徴とする。
The image recognition apparatus according to the present invention uses at least one or more first images having a predetermined attribute shared by the authentication targets, and uses a second image which is an image whose unknown whether or not the attribute is present. Parameter calculating means for calculating one or more parameters for extracting the attribute-likeness or the feature representing the attribute-likeness, feature extracting means for extracting a feature from the input image using the parameters, and at least Determination means for determining whether or not the second image is an authentication target based on a feature extracted from the second image, and the feature extraction means is obtained by combining calculation elements including the parameter. An input layer in which the number of calculation elements belonging to at least one layer includes image information, the calculation elements constituting two or more layers from input to output Using small neural network than the number of calculation elements, and extracting a feature.
本発明による画像認識方法は、情報処理装置が、認証対象が共通に有する所定の属性を有する1つ以上の第1の画像を少なくとも用いて、該属性を有するか否かが未知の画像である第2の画像から該属性らしさまたは該属性らしくなさを表す特徴を抽出するための1つ以上のパラメータを計算し、該パラメータを用いて、入力された画像から特徴を抽出し、少なくとも第2の画像から抽出される特徴に基づいて、第2の画像が認証対象であるか否かを判定し、特徴を抽出する際に、該パラメータを含む計算素子を結合して得られるニューラルネットワークであって、入力から出力まで計算素子が2以上の層を構成し、かつ少なくとも1つの層に属する計算素子の数が、画像の情報が入力される入力層の計算素子の数よりも小さいニューラルネットワークを用いることを特徴とする。
The image recognition method according to the present invention is an image in which the information processing apparatus uses at least one or more first images having a predetermined attribute shared by the authentication target and whether or not the attribute is unknown. Calculating one or more parameters for extracting the attribute-likeness or features representing the attribute-likeness from a second image, and using the parameters to extract features from the input image; A neural network obtained by determining whether or not a second image is an authentication target based on features extracted from an image and combining calculation elements including the parameters when extracting the features. The neural network includes two or more layers of calculation elements from input to output, and the number of calculation elements belonging to at least one layer is smaller than the number of calculation elements of the input layer to which image information is input Characterized by using Ttowaku.
本発明による画像認識プログラムは、コンピュータに、認証対象が共通に有する所定の属性を有する1つ以上の第1の画像を少なくとも用いて、該属性を有するか否かが未知の画像である第2の画像から該属性らしさまたは該属性らしくなさを表す特徴を抽出するための1つ以上のパラメータを計算するパラメータ計算処理、該パラメータを用いて、入力された画像から特徴を抽出する特徴抽出処理、および少なくとも第2の画像から抽出される特徴に基づいて、第2の画像が認証対象であるか否かを判定する判定処理を実行させ、特徴抽出処理で、該パラメータを含む計算素子を結合して得られるニューラルネットワークであって、入力から出力まで計算素子が2以上の層を構成し、かつ少なくとも1つの層に属する計算素子の数が、画像の情報が入力される入力層の計算素子の数よりも小さいニューラルネットワークを用いて、特徴を抽出させることを特徴とする。
The image recognition program according to the present invention uses at least one or more first images having a predetermined attribute shared by authentication objects in a computer, and the second image is an unknown image having the attribute. A parameter calculation process for calculating one or more parameters for extracting the attribute-likeness or the feature representing the attribute-lessness from the image, and a feature extraction process for extracting a feature from the input image using the parameter; And a determination process for determining whether or not the second image is an authentication target based on at least a feature extracted from the second image, and a calculation element including the parameter is combined in the feature extraction process In the neural network obtained in this way, the number of calculation elements that constitute two or more layers from input to output and that belong to at least one layer is Using small neural network than the number of calculation elements of the input layer information is input, characterized in that to extract features.
本発明によれば、認識対象に関する異常サンプル画像が入手しづらい状況においても、精度良く画像を認識できる。
According to the present invention, an image can be recognized with high accuracy even in a situation where an abnormal sample image related to a recognition target is difficult to obtain.
以下、本発明の実施形態を図面を参照して説明する。
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
実施形態1.
[構成の説明]
図1は、本実施形態の画像認識装置10の例を示すブロック図である。図1に示す画像認識装置10は、特徴抽出手段11と、パラメータ計算手段12と、判定手段13とを備える。Embodiment 1. FIG.
[Description of configuration]
FIG. 1 is a block diagram illustrating an example of animage recognition apparatus 10 according to the present embodiment. The image recognition apparatus 10 illustrated in FIG. 1 includes a feature extraction unit 11, a parameter calculation unit 12, and a determination unit 13.
[構成の説明]
図1は、本実施形態の画像認識装置10の例を示すブロック図である。図1に示す画像認識装置10は、特徴抽出手段11と、パラメータ計算手段12と、判定手段13とを備える。
[Description of configuration]
FIG. 1 is a block diagram illustrating an example of an
The feature extraction unit 11 extracts features from an input image to be recognized (hereinafter referred to as a test image). The feature extraction unit 11 has one or more parameters 111 for extracting features used to judge how much an image resembles the object to be determined, and extracts the features by performing a predetermined calculation on the test image using the parameters 111. The parameters 111 are not particularly limited as long as their values change the calculation characteristics of the features in the feature extraction unit 11. Hereinafter, the parameters 111 may be referred to as feature extraction parameters, and the features that the feature extraction unit 11 extracts from an image may be referred to as image features.
The parameter calculation unit 12 calculates the value of each parameter 111 that the feature extraction unit 11 uses for feature extraction. The values of the parameters 111 are generally calculated using a set of learning images prepared in advance, such as normal sample images and abnormal sample images, but the calculation is not limited to this. The method of calculating the parameters 111 in this embodiment will be described later.
The determination unit 13 determines whether the test image is a recognition target based on the image features extracted by the feature extraction unit 11.
Hereinafter, this embodiment will be described taking as an example the case where an object (an article, a part, a portion, etc.) having a predetermined attribute is the recognition target. In this case, the predetermined attribute is a property common to the objects of interest in the images and can be set arbitrarily. Examples of the predetermined attribute include, regarding the subject, "person" (in the sense of a particular species among animals and plants), "specific individual", and "face" (in the sense of a part of a person having predetermined components such as eyes, a nose, and a mouth), and, regarding a specific object produced in a factory, "non-defective product" (in the sense of quality that is not defective), but the attribute is not limited to these. In the following, the case where "a non-defective unit of a specific object produced in a factory" is the recognition target is described as an example.
Focusing on the predetermined attribute that such authentication targets share, the relationship between the above components can be explained, for example, as follows. The parameter calculation unit 12 determines the values of the parameters 111 of the feature extraction unit 11 using a set of images having the predetermined attribute (a set of normal sample images); the feature extraction unit 11 extracts from the test image, based on the determined values of the parameters 111, features for judging how much the image exhibits the predetermined attribute; and the determination unit 13 determines, based on the image features obtained from the test image, whether the test image has the predetermined attribute. When the test image is determined to have the predetermined attribute, the test image is determined to be an authentication target; when the test image is determined not to have the predetermined attribute, the test image is determined not to be an authentication target.
At this time, the parameter calculation unit 12 may learn, according to the target object (in this example, one having the predetermined attribute), a function that transforms an input image into a feature space that best reconstructs the input image, and may determine the values of the parameters 111 from the function obtained as the learning result. The feature extraction unit 11 may transform the learning images and the test image into the feature space based on the values of the parameters 111 obtained as a result of the learning. The determination unit 13 may then judge whether the object appearing in the test image is the target object based on the proximity between the transformed features in the feature space produced by the feature extraction unit 11.
A suitable example of the feature extraction method in the feature extraction unit 11 is a method using a neural network. The feature extraction unit 11 itself may be realized by a program constituting a neural network (more specifically, a processor operating according to the program).
A neural network is a computation model obtained by connecting calculation elements, each of which contains its own parameters. As a calculation element, for example, a model can be used that computes and outputs the value f(w_a · x + b_a) from an input signal x = (x_1, x_2, ..., x_p), using a weight w_a = (w_a1, w_a2, ..., w_ap)^T and a bias b_a, which are parameters specific to the element. Here, a denotes the unit number of the calculation element, T denotes vector transposition, p denotes the number of input signals, and "·" denotes the inner product of vectors. f is called an activation function; for example, a sigmoid function or a ReLU function is used.
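As an illustration only, the following is a minimal Python sketch of one such calculation element; the function names and the use of NumPy are assumptions made for this example and are not part of the described apparatus.

```python
import numpy as np

def sigmoid(v):
    # Sigmoid activation: maps any real value into (0, 1).
    return 1.0 / (1.0 + np.exp(-v))

def relu(v):
    # ReLU activation: passes positive values, zeroes out negatives.
    return np.maximum(0.0, v)

def calculation_element(x, w, b, activation=sigmoid):
    # One unit: inner product of weight vector w_a and input x, plus bias
    # b_a, passed through the activation function f, i.e. f(w_a . x + b_a).
    return activation(np.dot(w, x) + b)

# Example: a unit with p = 3 inputs.
x = np.array([0.2, -0.5, 1.0])
w = np.array([0.4, 0.1, -0.3])
b = 0.05
print(calculation_element(x, w, b))        # sigmoid output
print(calculation_element(x, w, b, relu))  # ReLU output
```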
A set of calculation elements included in a neural network can also be divided into a plurality of subsets to form layers. The parameters (weights, biases) of the calculation elements included in such a hierarchical neural network are calculated using a set of "pairs of a learning sample and a teacher signal for that learning sample" (hereinafter referred to as learning sample pairs) prepared in advance; the set is hereinafter referred to as a learning sample set. This calculation of the parameters is sometimes called learning, so the parameter calculation means can also be called learning means for training the neural network.
One concrete parameter calculation method is the error backpropagation method. In the error backpropagation method, the parameters (weights, biases) are updated, starting from the layer closest to the final output layer, using a known optimization technique such as the steepest descent method so that the error between the final output of the hierarchical neural network and the teacher signal becomes as small as possible. In principle, the parameters approach optimal values as they are updated repeatedly. The learning samples used for one calculation need not be all of the image sample pairs belonging to the learning sample set; only some of the image sample pairs belonging to the learning sample set (a partial image sample pair set) may be used.
When the parameters are calculated using a partial image sample pair set, the partial image sample pair set may be re-selected at random at every iteration. For example, when the steepest descent method is used as the optimization technique, this is called stochastic gradient descent. If the parameter values finally obtained after C iterations are compared across two independent trials, they generally agree approximately but not exactly, because the parameters are calculated from different selections of partial image sample pair sets. The initial values of the calculation element parameters may also be given at random before the parameter calculation, in which case the final results likewise do not match exactly. The final results obtained with different optimization techniques also generally differ. These are examples of the parameter calculation method producing stochastic results. In this way, the parameter calculation unit 12 may obtain the parameters by a stochastic method, for example by randomizing the initial values of the neural network weights or the selection of learning samples during learning, or by using different optimization techniques. Modifications that can be constructed using this property will be described separately later. The parameter calculation unit 12 can also use known learning methods other than the above.
Hereinafter, the case will be described in which the neural network is an autoencoder-type neural network, composed of two or more layers including an input layer to which image information is input, and in which the number of calculation elements in at least one layer is smaller than the number of calculation elements in the input layer. In this case, the features obtained from an image are the information output from at least one layer of the autoencoder-type neural network.
FIG. 2 is an explanatory diagram showing an example of the neural network used by the feature extraction unit 11 for feature extraction. The neural network in the example of FIG. 2 is composed of seven layers including the input layer and the output layer. Each layer contains one or more calculation elements, represented by circles. When such a neural network is used, the feature extraction unit 11 may use the output of the fourth layer, counted from the input layer, as the image feature. The neural network in this example is of the autoencoder type described above. An autoencoder in the narrow sense is composed of three layers, but configurations extended to many layers have been proposed in recent years. Accordingly, in this specification as well, the network configuration of the autoencoder is not limited to three layers; the requirements are, in general, that it have a plurality of layers and that the number of calculation elements in at least one layer be smaller than the number of elements in the input layer.
When an autoencoder-type neural network is used, a suitable example, as shown in FIG. 2, is to use as the image feature the output values of the intermediate layer with the smallest number of elements, but the outputs of other layers can also be used as image features.
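As an illustration only, the following is a minimal sketch of such a forward pass in which the bottleneck-layer output is taken as the image feature; the layer sizes (mimicking the seven layers of FIG. 2) and the helper names are assumptions made for this example, and the weights are random rather than learned.

```python
import numpy as np

def relu(v):
    return np.maximum(0.0, v)

def layer(x, w, b):
    # One fully connected layer of calculation elements: f(Wx + b).
    return relu(w @ x + b)

rng = np.random.default_rng(0)
# Hypothetical layer sizes: the input layer has 16 elements and the 4th
# (bottleneck) layer has only 2, fewer than the input layer.
sizes = [16, 8, 4, 2, 4, 8, 16]
params = [(rng.normal(size=(m, n)) * 0.1, np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

def extract_feature(image_vector):
    # Propagate through the first three weight layers; the activation of
    # the 4th layer (the 2-element bottleneck) is the image feature.
    h = image_vector
    for w, b in params[:3]:
        h = layer(h, w, b)
    return h

feature = extract_feature(rng.random(16))
print(feature)  # 2-dimensional image feature
```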
Regarding the calculation of the values of the parameters 111, a suitable example is for the parameter calculation unit 12 to learn the parameters of each calculation element of the neural network described above using a set of images having the predetermined attribute, and to determine the values accordingly. In the example above, the parameter calculation unit 12 may calculate the values of the parameters 111 using a set of pairs, each consisting of a normal sample image as the learning sample and the same normal sample image as the teacher signal.
A suitable example of the determination method in the determination unit 13 is to calculate the distance between an image feature obtained from an image having the predetermined attribute and the image feature obtained from the test image, and to determine based on that distance whether the test image has the predetermined attribute. An existing distance may be used here; for example, the Euclidean distance or the city block distance can be used, but the distance is not limited to these. A similarity may also be used instead of a distance; for example, the inner product of the features regarded as vectors, or the angle between the vectors, can be used, but the similarity is not limited to these. In the examples described later, the Euclidean distance is used for the determination. When a similarity is used, the decision criterion may be reversed compared with the distance case, but since this is self-evident, its description is omitted.
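As an illustration only, the following sketch computes these distances and a similarity between two image-feature vectors; the function names are assumptions for this example.

```python
import numpy as np

def euclidean(a, b):
    # Euclidean distance between two image-feature vectors.
    return np.linalg.norm(a - b)

def city_block(a, b):
    # City block (Manhattan) distance.
    return np.sum(np.abs(a - b))

def cosine_similarity(a, b):
    # Similarity from the angle between the feature vectors; note the
    # decision criterion is reversed relative to a distance.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

f_learn = np.array([0.8, 0.1])
f_test = np.array([0.7, 0.3])
print(euclidean(f_learn, f_test), city_block(f_learn, f_test),
      cosine_similarity(f_learn, f_test))
```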
Note that the determination unit 13 may extract a predetermined feature quantity based on the respective features of each image in the image set having the predetermined attribute and of the test image (for example, the proximity between them in the feature space), and determine whether the test image has the predetermined attribute based on the extracted feature quantity.
[Description of operation]
Next, the operation of this embodiment will be described. The operation is roughly divided into a learning step ST1 and a determination step SD1, where the learning step ST1 is performed prior to the determination step SD1.
FIG. 3 is a flowchart showing an example of the operation of the learning step ST1 of this embodiment. In the learning step ST1, the values of the parameters 111 are determined mainly using learning images having the predetermined attribute. In the example shown in FIG. 3, the parameter calculation unit 12 first calculates the values of the parameters 111 using a set of learning images given in advance (step ST11).
In step ST11, the parameter calculation unit 12 performs learning so that, for example, when a learning image is given to the input layer of the neural network, the output layer reproduces the learning image itself. A known learning method may be used. For example, when the neural network has three layers, learning may be performed using a known method such as error backpropagation. When the number of layers is four or more, for example, the layer-wise greedy training described in Non-Patent Document 1 can be used.
In the method described in Non-Patent Document 1, the parameters of the calculation elements of a three-layer neural network are first determined, by the method partially described above, for the three layers closest to the input (the input layer (= first layer), the second layer, and the third layer). Then the bias of the first layer and the connection weights from the first layer to the second layer are fixed, and the parameters are obtained in the same way for the second, third, and fourth layers. Learning ends when this procedure has finally been repeated until the third layer of the current triple is the output layer. After learning is completed, the parameters may be recalculated for all layers from the input layer to the output layer, for example by error backpropagation, using the calculated calculation element parameter values as initial values (this is generally called fine tuning).
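As an illustration only, the following sketch mimics the control flow of such layer-wise greedy training: one tiny three-layer autoencoder is trained per stage, its encoder is frozen, and its hidden activations feed the next stage. The linear activations, plain gradient descent, and layer sizes are simplifying assumptions for this example and are not the procedure of Non-Patent Document 1 itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_three_layer_ae(data, hidden, epochs=200, lr=0.1):
    # Trains a 3-layer (input-hidden-output) linear autoencoder by plain
    # gradient descent so that the output reproduces the input; returns
    # the encoder weights and the encoded data.
    n = data.shape[1]
    w1 = rng.normal(size=(hidden, n)) * 0.1
    w2 = rng.normal(size=(n, hidden)) * 0.1
    for _ in range(epochs):
        h = data @ w1.T               # encode
        out = h @ w2.T                # decode
        err = out - data
        w2 -= lr * (err.T @ h) / len(data)
        w1 -= lr * ((err @ w2).T @ data) / len(data)
    return w1, data @ w1.T

# Greedy layer-wise training: freeze each stage's encoder and pass its
# hidden activations on as the "input" of the next stage.
x = rng.random((64, 16))
encoders = []
h = x
for hidden in (8, 4, 2):
    w, h = train_three_layer_ae(h, hidden)
    encoders.append(w)
print([w.shape for w in encoders])
```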
Next, the feature extraction unit 11 calculates an image feature for each learning image using the values of the parameters 111 calculated by the parameter calculation unit 12 (step ST12). When, for example, the neural network shown in FIG. 2 is obtained by learning, the feature extraction unit 11 can use the output values of the two calculation elements in the fourth layer as the image feature, as described above.
FIG. 4 is a flowchart showing an example of the operation of the determination step SD1 of this embodiment. In the determination step SD1, an image feature is calculated from the test image based on the determined values of the feature extraction parameters, and the test image is judged. In the example shown in FIG. 4, the feature extraction unit 11 first extracts an image feature from the test image using the values of the parameters 111 calculated by the parameter calculation unit 12 in the learning step ST1 (step SD11).
Next, the determination unit 13 determines whether the test image has the predetermined attribute by comparing the image features of the learning images obtained in step ST12 with the image feature of the test image obtained in step SD11 (step SD12).
The attribute can be determined, for example, by the following method. The determination unit 13 takes the image feature of the learning image whose distance from the image feature of the test image is the n-th smallest (n is an integer of 1 or more), and judges that the test image has the predetermined attribute when the distance Dist_n between these image features is smaller than a real number th (or not greater than th). The values of n and th can be set arbitrarily. When n = 1, this method corresponds to a determination by the nearest learning image.
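As an illustration only, the following sketch implements this n-th-nearest-neighbor decision; the function name and the toy feature values are assumptions for this example.

```python
import numpy as np

def has_attribute(test_feature, learn_features, n=1, th=0.1):
    # Distances from the test image feature to every learning image
    # feature; the test image is judged to have the attribute when the
    # n-th smallest distance Dist_n is below the threshold th.
    dists = np.linalg.norm(learn_features - test_feature, axis=1)
    dist_n = np.sort(dists)[n - 1]
    return bool(dist_n < th)

learn_features = np.array([[0.8, 0.1], [0.75, 0.15], [0.9, 0.05]])
print(has_attribute(np.array([0.78, 0.12]), learn_features))  # True
print(has_attribute(np.array([0.1, 0.9]), learn_features))    # False
```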
The values of n and th in the above determination method can also be set as follows. For example, when n is fixed to one arbitrary value, th is gradually decreased from a large value to a small value while all of the learning images are used as test images, and the values of th for which the rate of correct determinations for those test images (the detection rate) is 100% are found. The smallest such th is then used. For example, when learning images having the predetermined attribute (normal sample images) are used as the test images, the detection rate can be taken to be the rate at which the test images (learning images) are judged to have the predetermined attribute.
In the above method, making th smaller reduces the chance that images without the predetermined attribute are included among those the determination unit 13 judges to have it, but tends to produce more misses. By using the smallest th among those satisfying a detection rate of 100%, it can be expected that the determination results for test images, including images without the predetermined attribute, will have as few misses as possible while containing no errors.
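As an illustration only, the following sketch scans candidate thresholds and keeps the smallest th that still yields a 100% detection rate when each learning image is used in turn as a test image. Excluding the identical image from the gallery (leave-one-out) is an assumption added here so that Dist_n is not trivially zero; the grid of candidate values is also an assumption.

```python
import numpy as np

def calibrate_th(learn_features, n=1, th_grid=(0.2, 0.15, 0.1, 0.05)):
    # For each candidate th (scanned from large to small), use every
    # learning image in turn as a test image and measure the detection
    # rate; return the smallest th that still gives 100% detection.
    best = None
    for th in th_grid:
        hits = 0
        for i, f in enumerate(learn_features):
            others = np.delete(learn_features, i, axis=0)
            dist_n = np.sort(np.linalg.norm(others - f, axis=1))[n - 1]
            hits += dist_n < th
        if hits == len(learn_features):
            best = th  # keep shrinking while detection stays at 100%
    return best

feats = np.array([[0.8, 0.1], [0.78, 0.12], [0.82, 0.09], [0.79, 0.11]])
print(calibrate_th(feats))
```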
FIG. 5 is an explanatory diagram showing examples of determining th in the determination unit 13. FIG. 5(a) shows an example of determining th when n is fixed to 1, and FIG. 5(b) shows an example of determining n and th when n is not fixed, that is, when both n and th are varied.
FIG. 5(a) shows the detection rate when the value of n is fixed to 1 and the value of th is decreased from 0.2 to 0.05 in steps of 0.05. In this example, three values of th achieve a detection rate of 100%: 0.2, 0.15, and 0.1. The smallest of these, th = 0.1, is therefore adopted.
As shown in FIG. 5(b), it is also possible to vary both n and th. In this case, values near the boundary between the region of the table containing value pairs where the detection rate is 100% and the region containing value pairs where it is not may be adopted. For example, in the example shown in FIG. 5(b), the value pairs indicated by the dashed frame, that is, (n, th) = (1, 0.1), (51, 0.15), (101, 0.15), (151, 0.2), and so on, may be used. As shown in Modification 1, determinations may also be tried independently with a plurality of value pairs and the results aggregated to obtain the final determination result.
Images other than the learning images may also be used as the test images for determining the values of n and th. Alternatively, only images without the predetermined attribute may be used as the test images; in this case, the values of th for which the rate of judging that the images do not have the predetermined attribute (the detection rate) is 100% are found, and the largest such th may be adopted. In this case, the determination unit 13 determines whether the test image lacks the predetermined attribute.
Naturally, the values of n and th can also be determined using an image set in which images with and without the predetermined attribute are mixed. In that case, it is desirable for each image in the image set to have a ground-truth label (whether or not it has the attribute).
[Description of effects]
In this embodiment, the parameters 111 can be learned suitably even when the learning images consist only of images having the predetermined attribute (normal samples). Therefore, even when only a very small number of images lacking the predetermined attribute (abnormal samples) are available, or none at all, it can be determined with high accuracy whether an unknown sample is normal or abnormal.
In particular, by using a neural network to learn the feature extraction parameters, when the action of the feature extraction unit 11 that computes features from an image is regarded as a mathematical function, a more complex function can be learned than with principal component analysis, so a more accurate determination can be realized. In general, distances in the image feature space do not necessarily coincide with the actual degrees of difference between images; however, when the image features of the test image and a learning image lie close to each other, approximating the degree of difference between the images with an arbitrary distance value such as the Euclidean distance is a commonly performed operation, and highly accurate attribute determination can be expected in principle.
Embodiment 2.
[Description of configuration]
Next, a second embodiment of the present invention will be described. Since part of the configuration of this embodiment is the same as that of the first embodiment, the description below focuses mainly on the differing components. FIG. 6 is a block diagram illustrating an example of an image recognition apparatus 20 according to the second embodiment. The image recognition apparatus 20 shown in FIG. 6 includes a feature extraction unit 21, a parameter calculation unit 22, and a determination unit 23.
The feature extraction unit 21 extracts features (image features) from an input image to be recognized (a test image) and calculates a distance value described later. Like in the first embodiment, the feature extraction unit 21 has one or more parameters 211 and extracts image features by performing a predetermined calculation on the test image using the parameters 211.
The feature extraction unit 21 of this embodiment extracts features using, among autoencoder-type neural networks, a neural network called an adversarial autoencoder, as described in Non-Patent Document 2.
A characteristic of the adversarial autoencoder is that learning (that is, the calculation of the parameters 211) can be performed so that the features follow a distribution specified in advance, such as an m-dimensional normal distribution (m is an integer of 1 or more) or an m-dimensional mixture of normal distributions. Therefore, by using an adversarial autoencoder, image features that follow such a pre-specified distribution can be obtained.
FIG. 7 is an explanatory diagram schematically showing a configuration example of the adversarial autoencoder disclosed in Non-Patent Document 2. The feature extraction unit 21 may extract features using, for example, an adversarial autoencoder like the one shown in FIG. 7. In FIG. 7, p(z) represents the positive samples and q(z) represents the negative samples. "Adversarial cost" in the figure is the cost for distinguishing the negative samples from the positive samples. The upper part of FIG. 7 corresponds to the autoencoder, and the lower part corresponds to the discriminative network described below.
The calculation (learning) method for the calculation element parameters of a neural network with this configuration consists of a reconstruction phase and a regularization phase. In the reconstruction phase, the autoencoder contained in the adversarial autoencoder is trained so that an output reconstructing the input image is obtained, for example so that when a learning sample is input, the output is the learning sample itself. In the regularization phase, the discriminative network contained in the adversarial autoencoder (the network that discriminates whether an input sample arose from the specified distribution or was generated by the autoencoder) is trained, and the encoder is trained so as to confuse this discriminative network.
The parameter calculation unit 22 obtains the value of each parameter 211 that the feature extraction unit 21 uses for feature extraction by the calculation (learning) method proposed for the adversarial autoencoder described above. Details of the configuration of the adversarial autoencoder and of the parameter calculation method are given in Non-Patent Document 2.
Hereinafter, the case where an m-dimensional normal distribution is specified for the adversarial autoencoder will be described as an example.
The determination unit 23 may reuse the determination unit 13 of the first embodiment, but a determination method that exploits the nature of the image features extracted by the feature extraction unit 21, that is, the predetermined distribution used by the adversarial autoencoder for learning, is described below.
For example, when the image features follow an m-dimensional normal distribution, a distance such as the Euclidean distance or the Mahalanobis distance can be calculated between a point in that space and the mean vector. The determination unit 23 may determine whether the test image has the predetermined attribute based on the distance between the point in the m-dimensional space where the image feature calculated for the test image lies and the point in the m-dimensional space given by the mean vector of the image features calculated for the learning images. Besides the mean vector, the values of the variance-covariance matrix, for example, can also be used, and not only a distance but any value (index) obtained from the mean vector or the variance-covariance matrix can be used. In this way, a probability for attribute determination can be obtained, more specifically, a value indicating with what probability the image feature of interest occurs under the probability distribution that the set of image features is assumed to follow.
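As an illustration only, the following sketch computes the Mahalanobis distance between a test image feature and the mean vector of the learning-image features; the toy feature values and the use of the sample covariance are assumptions for this example.

```python
import numpy as np

def mahalanobis(feature, mean, cov):
    # Mahalanobis distance between an image feature and the mean vector
    # of the learning-image features, under covariance matrix cov.
    diff = feature - mean
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

# Learning-image features assumed to follow an m-dimensional normal
# distribution (here m = 2); mean and covariance estimated from them.
learn = np.array([[0.1, 0.2], [0.0, 0.1], [0.2, 0.15], [0.05, 0.25]])
mu = learn.mean(axis=0)
cov = np.cov(learn, rowvar=False)

test = np.array([0.12, 0.18])
print(mahalanobis(test, mu, cov))
# A small distance suggests the test image has the attribute.
```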
In particular, with an m-dimensional normal distribution the probability of a region in the m-dimensional space can be calculated, so it becomes possible to perform a statistical test whose null hypothesis is that the test image has the predetermined attribute.
In this way, when the image features follow a predetermined distribution, the determination unit 23 may determine the presence or absence of the attribute using any index calculated with the parameters of that distribution (parameters that determine the shape of the distribution, such as the mean vector and the variance-covariance matrix of an m-dimensional normal distribution).
Also in this embodiment, the determination unit 23 may apply the method of determining th with n fixed to one arbitrary value and determine the presence or absence of the attribute from the distance. When the determination unit 13 of the first embodiment is used as the determination unit 23, the determination unit 23 may, for example, on the assumption that the learning algorithm is stochastic, obtain a plurality of determination results or a plurality of indices for one test image and judge the test image based on them. A plurality of determination results or indices may be obtained either by recalculating the parameters each time and performing the determination multiple times, or by providing a plurality of pairs of parameter calculation means and feature extraction means, inputting the test image to each of them to perform feature extraction, and obtaining a plurality of determination results or indices from the outputs. It is also possible to provide a plurality of image recognition means, each performing everything from parameter calculation through feature extraction to determination, and to input the test image to each to obtain determination results. The same applies to the other embodiments.
The determination unit 23 determines the presence or absence of the attribute using one or more of these methods.
[Description of operation]
Next, the operation of this embodiment will be described. The operation is roughly divided into a learning step ST2 and a determination step SD2. Also in this embodiment, the learning step ST2 is performed prior to the determination step SD2.
FIG. 8 is a flowchart showing an example of the operation of the learning step ST2 of this embodiment. In the learning step ST2, the values of the parameters 211 are determined mainly using learning images having the predetermined attribute. In the example shown in FIG. 8, the parameter calculation unit 22 calculates the values of the parameters 211 using a set of learning images given in advance (step ST21). Thereafter, the feature extraction unit 21 may calculate, as an image feature of the learning image set, the mean vector of the predetermined distribution (for example, an m-dimensional normal distribution) using the calculated values of the parameters 211.
FIG. 9 is a flowchart showing an example of the operation of the determination step SD2 of this embodiment. In the determination step SD2, the distance of the test image from the mean vector is calculated based on the determined values of the parameters 211, and the test image is judged. In the example shown in FIG. 9, the feature extraction unit 21 first extracts an image feature from the test image using the values of the parameters 211 calculated by the parameter calculation unit 22 in the learning step ST2 (step SD21). In this embodiment, the feature extraction unit 21 extracts an image feature that follows the m-dimensional normal distribution, more specifically a point in the m-dimensional space.
Next, the determination unit 23 calculates the distance between the image feature of the test image obtained in step SD21 and the mean vector of the image features of the learning images, and judges that the test image has the predetermined attribute when this distance is smaller than a predetermined value (step SD22).
[Description of effects]
In this embodiment, the feature extraction parameters are calculated so that the image features computed by the feature extraction unit 21 follow a probability distribution specified in advance. The likelihood that the test image has the predetermined attribute is then determined based on a distance computed on that probability distribution for the image feature extracted from the test image with those parameters. According to this embodiment, therefore, the likelihood of the predetermined attribute can be given as a probabilistic index.
Embodiment 3.
[Description of configuration]
Next, a third embodiment of the present invention will be described. Since part of the configuration of this embodiment is the same as that of the first embodiment, the description below focuses mainly on the differing components. FIG. 10 is a block diagram illustrating an example of an image recognition apparatus 30 according to the third embodiment. The image recognition apparatus 30 shown in FIG. 10 includes a feature extraction unit 31, a parameter calculation unit 32, and a determination unit 33.
The feature extraction unit 31 extracts a noise component as a feature (image feature) from an input image to be recognized (a test image). Like in the first embodiment, the feature extraction unit 31 has one or more parameters 311 and extracts image features by performing a predetermined calculation on the test image using the parameters 311.
The feature extraction unit 31 of this embodiment extracts features (noise components) from the test image using, among autoencoder-type neural networks, a neural network called a denoising autoencoder.
A denoising autoencoder is configured so that, when the input data is data in which part of the original data is abnormal (damaged, etc.), for example because noise has been added to it, the original data is output from that input data. In this denoising autoencoder, the calculation element parameters can be calculated by a known method such as error backpropagation, taking the original learning samples with noise added as the learning samples for the denoising autoencoder and the learning samples before the noise was added as the teacher signals. As a result, the denoising autoencoder has, in addition to the characteristics of an autoencoder-type neural network, the characteristic of being able to remove noise from an input image. The noise component removed by the denoising autoencoder here corresponds to a component recognized as noise through learning with one or more normal sample images, or to a component extracted in the same manner, and can be said to be one of the features representing unlikeness to the attribute.
The parameter calculation unit 32 obtains the value of each parameter 311 that the feature extraction unit 31 uses for feature extraction so that noise is removed from the input image. More specifically, the parameter calculation unit 32 obtains the values by the calculation (learning) method proposed for the denoising autoencoder. The configuration of the neural network called a denoising autoencoder is described, for example, in Non-Patent Document 3.
The parameter calculation unit 32 performs the calculation (learning) of the feature extraction parameters so that, for example, a noise learning image obtained by artificially adding noise to a learning image is taken as the input and the learning image before the noise was added is produced as the output. If images having the predetermined attribute but with partial abnormalities are available, such images may also be used as learning inputs.
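As an illustration only, the following sketch builds such training pairs: the input is a learning image with artificial noise added, and the teacher signal is the clean learning image. Gaussian noise, the sigma value, and the toy image sizes are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_noise_pairs(learning_images, sigma=0.1):
    # Builds the training pairs for the denoising autoencoder: the input
    # is a learning image with artificial noise added, and the teacher
    # signal is the clean learning image before the noise was added.
    pairs = []
    for img in learning_images:
        noisy = np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)
        pairs.append((noisy, img))
    return pairs

clean = [rng.random((8, 8)) for _ in range(4)]  # toy normal sample images
pairs = make_noise_pairs(clean)
noisy_in, teacher = pairs[0]
print(noisy_in.shape, teacher.shape)
```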
The feature extraction unit 31 may use the parameters 311 learned in this way to obtain a noise-removed image from the test image, and may extract the noise component as the difference image obtained by taking the difference between the noise-removed image and the test image. In other words, in this embodiment, the difference information between the information input to the input layer of the learned denoising autoencoder (the test image) and the information output from its output layer (the noise-removed image) is used as the image feature.
The determination unit 33 determines whether the test image has the predetermined attribute based on the noise component (difference image) obtained from the test image by the feature extraction unit 31. The determination unit 33 may judge the test image by the magnitude of the difference. For example, the determination unit 33 may calculate the sum of the pixel values of the difference image and judge that the image has the predetermined attribute when the sum is not more than a predetermined value, but the method is not limited to this.
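As an illustration only, the following sketch shows this difference-based judgment. The "denoiser" here is a simple box blur standing in for the learned denoising autoencoder, purely so the sketch runs end to end; the threshold value is likewise an assumption.

```python
import numpy as np

def judge_by_noise(test_image, denoise, th=5.0):
    # The noise component is the difference between the test image and
    # its noise-removed version; the image is judged to have the
    # attribute when the sum of the absolute pixel differences is small.
    diff = np.abs(test_image - denoise(test_image))
    return float(diff.sum()) <= th

def fake_denoiser(img):
    # Stand-in for the learned denoising autoencoder: a 3x3 box blur.
    padded = np.pad(img, 1, mode="edge")
    return sum(padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
               for dy in range(3) for dx in range(3)) / 9.0

rng = np.random.default_rng(0)
img = rng.random((8, 8))
print(judge_by_noise(img, fake_denoiser))
```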
The basic concept of this embodiment is to extract only the noise component by taking the difference between the input image and its noise-removed image, to judge that the image has the predetermined attribute if the extracted noise component is small, and to judge that it does not if the component is large.
[Description of operation]
Next, the operation of this embodiment will be described. The operation is again roughly divided into a learning step ST3 and a determination step SD3. Also in this embodiment, the learning step ST3 is performed prior to the determination step SD3.
FIG. 11 is a flowchart showing an example of the operation of the learning step ST3 of this embodiment. In the learning step ST3, the values of the parameters 311 are determined mainly using learning images having the predetermined attribute and the corresponding noise images. In the example shown in FIG. 11, the parameter calculation unit 32 first calculates the values of the parameters 311 using a set of learning images having the predetermined attribute, given in advance, and the corresponding set of noise learning images (step ST31).
FIG. 12 is a flowchart showing an example of the operation of the determination step SD3 of this embodiment. In the determination step SD3, the difference between the test image and its noise-removed image is taken based on the determined values of the parameters 311, and the test image is judged.
In the example shown in FIG. 12, the feature extraction unit 31 first extracts the noise component as the feature of the test image based on the determined values of the parameters 311, generating the noise-removed image of the test image (step SD31).
Next, the determination unit 33 calculates the difference between the noise-removed image generated in step SD31 and the test image, and determines whether the test image has the predetermined attribute based on the difference (difference image) (step SD32).
[Description of effects]
In this embodiment, the noise component contained in the test image is extracted in order to determine whether the image has the predetermined attribute. According to this embodiment, therefore, an attribute determination that is easy to understand visually is possible.
[Other Embodiments]
In each of the above embodiments, a known algorithm is used for learning the feature extraction parameters of the neural network. Some of these algorithms yield parameters that vary when the initial parameter values are changed, when the number of learning iterations is changed, or when the order in which the learning samples are presented is changed. This variation can be exploited to make an even more accurate determination.
[Modification 1]
For example, in each of the above embodiments, the determination of whether a test image has the predetermined attribute can be tried independently r times (r is an integer of 2 or more) and decided by majority vote, as shown in the sketch below. For example, when r = 3 and the image is judged twice to have the predetermined attribute and once not to have it, the determination unit may finally conclude from these results that the image has the predetermined attribute.
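As an illustration only, the following sketch aggregates the per-trial judgments by majority vote; the function name is an assumption for this example.

```python
from collections import Counter

def majority_vote(trial_results):
    # trial_results: list of booleans, one per independent trial (r >= 2),
    # each indicating whether that trial judged the test image to have the
    # predetermined attribute. The final judgment is the majority decision.
    counts = Counter(trial_results)
    return counts[True] > counts[False]

# r = 3 trials: judged to have the attribute twice, not to have it once,
# so the final determination is that the image has the attribute.
print(majority_vote([True, True, False]))  # True
```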
In this way, even if learning of the feature extraction parameters of the neural network occasionally fails (no features effective for attribute determination are obtained) because the learning algorithm is stochastic, a highly accurate determination can still be made overall.
The final determination method need not necessarily be a majority vote; for example, it may be judged that the image has the predetermined attribute overall when it is judged to have it in at least some arbitrary number of trials (one or more). Alternatively, a new index, such as the mean or the variance of the indices obtained over the trials, may be calculated, and the final determination made based on that value.
[Modification 2]
In the second embodiment described above, an example was given in which the probability for attribute determination is calculated using the parameters of a probability distribution specified in advance, and a statistical test is constructed. If the variation described above is also regarded as a stochastic phenomenon, a new statistical test can be constructed by taking the probability for attribute determination to be the joint probability over r trials.
This is another example in which a highly accurate determination can be made overall even if learning of the feature extraction parameters of the neural network occasionally fails (that is, features effective for attribute determination are not obtained) because the learning algorithm is stochastic.
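A sketch of one way to combine the trials into a single test, assuming independent trials and an illustrative threshold (neither is specified in this document):

```python
import math
from typing import Sequence

def joint_probability_test(trial_probs: Sequence[float], threshold: float) -> bool:
    """Accept the attribute when the joint probability over r trials
    exceeds a threshold.

    Each entry of `trial_probs` is the attribute probability computed in
    one independent trial from the pre-specified feature distribution, as
    in the second embodiment. Independence across trials and the choice
    of `threshold` are assumptions of this sketch.
    """
    # The sum of logs equals the log of the product of probabilities,
    # but avoids numerical underflow when the number of trials is large.
    log_joint = sum(math.log(p) for p in trial_probs)
    return log_joint >= math.log(threshold)

# Usage: three trials, each compared against a per-trial level of 0.05.
print(joint_probability_test([0.40, 0.55, 0.30], threshold=0.05 ** 3))
```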
In each of the above embodiments, the processing target data has been described as an image, but the processing target data is not limited to an image. Any data that can be converted into a signal format the neural network can accept may be used: for example, data obtained by applying arbitrary image processing to an image, a combination of multiple kinds of images captured by different sensors, or an image to which an audio signal, annotation information, or the like has been added (see the sketch below). Likewise, any information propagation method on the neural network and any learning method for the neural network may be used, as long as they do not differ in substance from the methods described in the above embodiments.
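As a minimal illustrative sketch (the stacking scheme, shapes, and normalization constants below are assumptions, not taken from this document), heterogeneous data can be flattened and concatenated into a single vector in a signal format a neural network can accept:

```python
import numpy as np

def to_network_input(rgb: np.ndarray, infrared: np.ndarray,
                     audio: np.ndarray) -> np.ndarray:
    """Concatenate images from different sensors and an auxiliary
    1-D signal into one network input vector."""
    parts = [
        rgb.astype(np.float32).ravel() / 255.0,       # pixel values to [0, 1]
        infrared.astype(np.float32).ravel() / 255.0,  # second sensor image
        audio.astype(np.float32).ravel(),             # assumed unit scale
    ]
    return np.concatenate(parts)

x = to_network_input(np.zeros((8, 8, 3), np.uint8),
                     np.zeros((8, 8), np.uint8),
                     np.zeros(16, np.float32))
assert x.shape == (8 * 8 * 3 + 8 * 8 + 16,)
```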
Next, a configuration example of a computer according to an embodiment of the present invention will be described. FIG. 13 is a schematic block diagram illustrating a configuration example of such a computer. The computer 1000 includes a CPU 1001, a main storage device 1002, an auxiliary storage device 1003, an interface 1004, a display device 1005, and an input device 1006.
The image recognition device described above may be implemented in the computer 1000, for example. In that case, the operation of each device may be stored in the auxiliary storage device 1003 in the form of a program. The CPU 1001 reads the program from the auxiliary storage device 1003, loads it into the main storage device 1002, and executes the predetermined processing of the above embodiments according to the program.
The auxiliary storage device 1003 is an example of a non-transitory tangible medium. Other examples of non-transitory tangible media include magnetic disks, magneto-optical disks, CD-ROMs, DVD-ROMs, and semiconductor memories connected via the interface 1004. When the program is distributed to the computer 1000 over a communication line, the computer 1000 that has received the distribution may load the program into the main storage device 1002 and execute the predetermined processing of the above embodiments.
The program may also be one that realizes only part of the predetermined processing of each embodiment. Furthermore, the program may be a differential program that realizes the predetermined processing of the above embodiments in combination with another program already stored in the auxiliary storage device 1003.
The interface 1004 transmits and receives information to and from other devices. The display device 1005 presents information to the user. The input device 1006 accepts input of information from the user.
Depending on the processing of an embodiment, some elements of the computer 1000 may be omitted. For example, if the device does not present information to the user, the display device 1005 can be omitted.
Some or all of the components of each device may be implemented by general-purpose or dedicated circuitry, processors, or the like, or combinations thereof. These may be configured as a single chip or as multiple chips connected via a bus. Some or all of the components of each device may also be realized by a combination of the above-described circuitry or the like and a program.
When some or all of the components of each device are realized by multiple information processing devices, circuits, or the like, these may be arranged centrally or in a distributed manner. For example, the information processing devices, circuits, and the like may be realized in a form in which each is connected via a communication network, such as a client-server system or a cloud computing system.
Next, an outline of the present invention will be described. FIG. 14 is a block diagram showing the outline of the present invention. The image recognition device 500 shown in FIG. 14 includes a parameter calculation means 501, a feature extraction means 502, and a determination means 503.
The parameter calculation means 501 (for example, the parameter calculation means 12, 22, 32) uses at least one or more first images having a predetermined attribute shared by the authentication target to calculate one or more parameters for extracting, from a second image for which it is unknown whether the image has the attribute, a feature representing the likeness or unlikeness of the attribute.
The feature extraction means 502 (for example, the feature extraction means 11, 21, 31) extracts the feature from an input image using the parameters calculated by the parameter calculation means 501. More specifically, the feature extraction means 502 extracts the feature using a neural network obtained by connecting calculation elements that include the parameters, in which the calculation elements form two or more layers from input to output and the number of calculation elements in at least one layer is smaller than the number of calculation elements in the input layer, to which the image information is input.
The determination means 503 (for example, the determination means 13, 23, 33) determines whether the second image is an authentication target based on at least the feature extracted from the second image.
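To make the relationship between the three means concrete, the following is a minimal sketch assuming NumPy; the layer sizes (784 inputs, 32 hidden units), the sigmoid nonlinearity, the random weights, and the distance threshold are all illustrative assumptions rather than values from this document. The intermediate layer has fewer calculation elements than the input layer, and its activation serves as the extracted feature; a simple feature-distance comparison stands in for the determination means.

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

class BottleneckExtractor:
    """Network whose intermediate layer has fewer calculation elements
    than the input layer; the intermediate output is the feature."""

    def __init__(self, n_input: int = 784, n_hidden: int = 32, seed: int = 0):
        rng = np.random.default_rng(seed)
        # These weights play the role of the parameters the parameter
        # calculation means 501 would learn from the first images;
        # random initialization here is only a placeholder.
        self.w = rng.normal(0.0, 0.01, size=(n_hidden, n_input))
        self.b = np.zeros(n_hidden)

    def extract(self, image: np.ndarray) -> np.ndarray:
        # Feature = intermediate-layer activation (32 values < 784 inputs).
        return sigmoid(self.w @ image.ravel() + self.b)

def is_authentication_target(test_image, first_images, extractor, threshold):
    """Stand-in for determination means 503: compare the test feature
    with the features of the first images by Euclidean distance."""
    f_test = extractor.extract(test_image)
    dists = [np.linalg.norm(extractor.extract(img) - f_test)
             for img in first_images]
    return min(dists) <= threshold

# Usage: judge a random 28x28 test image against five "normal" images.
extractor = BottleneckExtractor()
normals = [np.random.rand(28, 28) for _ in range(5)]
print(is_authentication_target(np.random.rand(28, 28), normals,
                               extractor, threshold=1.0))
```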
With such a configuration, images can be recognized with high accuracy even in situations where abnormal sample images of the recognition target are difficult to obtain.
Although the present invention has been described above with reference to embodiments and examples, the present invention is not limited to the above embodiments and examples. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.
The present invention can be used, for example, as an inspection device that detects, among products produced in a factory, those contaminated with foreign matter or of poor quality. It can also be used as an anomaly detection device that detects anomalies in general objects, not only in factory products. It can further be used, as part of a biometric authentication device at a security gate or the like, as an inspection device that confirms whether an input image actually shows a part to be authenticated (a face, a human body, and so on). It can also be used, in a tracking device that tracks a specific person in a video, as an image recognition means that establishes the identity of an object such as a face, human body, or other object across multiple frames.
10, 20, 30 Image recognition device
11, 21, 31 Feature extraction means
111, 211, 311 Parameters
12, 22, 32 Parameter calculation means
13, 23, 33 Determination means
1000 Computer
1001 CPU
1002 Main storage device
1003 Auxiliary storage device
1004 Interface
1005 Display device
1006 Input device
500 Image recognition device
501 Parameter calculation means
502 Feature extraction means
503 Determination means
Claims (10)
- An image recognition device comprising:
parameter calculation means for calculating, using at least one or more first images having a predetermined attribute shared by an authentication target, one or more parameters for extracting, from a second image for which it is unknown whether the image has the attribute, a feature representing the likeness or unlikeness of the attribute;
feature extraction means for extracting the feature from an input image using the parameters; and
determination means for determining whether the second image is an authentication target based on at least the feature extracted from the second image,
wherein the feature extraction means extracts the feature using a neural network obtained by connecting calculation elements including the parameters, in which the calculation elements form two or more layers from input to output and the number of calculation elements in at least one layer is smaller than the number of calculation elements in an input layer to which image information is input.
- The image recognition device according to claim 1, wherein the feature is information output from an intermediate layer of the neural network.
- The image recognition device according to claim 1 or 2, wherein the feature extraction means extracts the feature from each of the first images and from the second image using the neural network including the parameters, and the determination means calculates distances between the features extracted from each of the first images and the feature extracted from the second image, and determines the second image based on the distance that takes a predetermined rank when the distances are ordered by magnitude.
- The image recognition device according to claim 1 or 2, wherein the parameter calculation means calculates the parameters so that the extracted feature follows a distribution specified in advance, and the determination means determines the second image based on an index, calculated from at least the feature extracted from the second image using a second parameter that defines the shape of the distribution, indicating whether the second image has the attribute.
- The image recognition device according to claim 1, wherein the parameter calculation means calculates the parameters, using at least one or more first images that have the attribute and contain noise, so as to remove from the second image a noise component regarded as noise in the first images, the feature extraction means extracts, using the parameters, the noise component as the feature from at least the second image, and the determination means determines the second image based on the noise component.
- The image recognition device according to claim 5, wherein the determination means determines the second image to be an authentication target when the noise component is outside a predetermined threshold.
- The image recognition device according to any one of claims 1 to 6, wherein the parameter calculation means calculates the parameters by a stochastic method, and the determination means determines the second image based on a plurality of determination results obtained by performing feature extraction on the one second image a plurality of times using individually calculated parameters.
- The image recognition device according to claim 4, wherein the parameter calculation means calculates the parameters by a stochastic method, and the determination means determines the second image based on a plurality of indices obtained by performing feature extraction on the one second image a plurality of times using individually calculated parameters.
- An image recognition method in which an information processing device: calculates, using at least one or more first images having a predetermined attribute shared by an authentication target, one or more parameters for extracting, from a second image for which it is unknown whether the image has the attribute, a feature representing the likeness or unlikeness of the attribute; extracts the feature from an input image using the parameters; determines whether the second image is an authentication target based on at least the feature extracted from the second image; and, when extracting the feature, uses a neural network obtained by connecting calculation elements including the parameters, in which the calculation elements form two or more layers from input to output and the number of calculation elements in at least one layer is smaller than the number of calculation elements in an input layer to which image information is input.
- An image recognition program causing a computer to execute: parameter calculation processing for calculating, using at least one or more first images having a predetermined attribute shared by an authentication target, one or more parameters for extracting, from a second image for which it is unknown whether the image has the attribute, a feature representing the likeness or unlikeness of the attribute; feature extraction processing for extracting the feature from an input image using the parameters; and determination processing for determining whether the second image is an authentication target based on at least the feature extracted from the second image, wherein in the feature extraction processing the feature is extracted using a neural network obtained by connecting calculation elements including the parameters, in which the calculation elements form two or more layers from input to output and the number of calculation elements in at least one layer is smaller than the number of calculation elements in an input layer to which image information is input.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2019516839A JP6798614B2 (en) | 2017-05-12 | 2017-05-12 | Image recognition device, image recognition method and image recognition program |
PCT/JP2017/017985 WO2018207334A1 (en) | 2017-05-12 | 2017-05-12 | Image recognition device, image recognition method, and image recognition program |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2017/017985 WO2018207334A1 (en) | 2017-05-12 | 2017-05-12 | Image recognition device, image recognition method, and image recognition program |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018207334A1 true WO2018207334A1 (en) | 2018-11-15 |
Family
ID=64105173
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2017/017985 WO2018207334A1 (en) | 2017-05-12 | 2017-05-12 | Image recognition device, image recognition method, and image recognition program |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP6798614B2 (en) |
WO (1) | WO2018207334A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP7021507B2 (en) * | 2017-11-14 | 2022-02-17 | 富士通株式会社 | Feature extraction device, feature extraction program, and feature extraction method |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH09161054A (en) * | 1995-12-13 | 1997-06-20 | Nec Corp | Fingerprint sorting device |
JP2014203135A (en) * | 2013-04-01 | 2014-10-27 | キヤノン株式会社 | Signal processor, signal processing method, and signal processing system |
WO2015008567A1 (en) * | 2013-07-18 | 2015-01-22 | Necソリューションイノベータ株式会社 | Facial impression estimation method, device, and program |
JP2017004350A (en) * | 2015-06-12 | 2017-01-05 | 株式会社リコー | Image processing system, image processing method and program |
Non-Patent Citations (1)
Title |
---|
SHO SONODA ET AL.: "Transportation aspect of infinitely deep denoising autoencoder", IEICE TECHNICAL REPORT, vol. 116, no. 300, 9 November 2016 (2016-11-09), pages 297 - 304 * |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113678143A (en) * | 2019-02-20 | 2021-11-19 | 蓝岩治疗有限公司 | Detection of target cells in large image datasets using artificial intelligence |
JP2022521205A (en) * | 2019-02-20 | 2022-04-06 | ブルーロック セラピューティクス エルピー | Detection of cells of interest in large image datasets using artificial intelligence |
JP7496364B2 (en) | 2019-02-20 | 2024-06-06 | ブルーロック セラピューティクス エルピー | Detecting Cells of Interest in Large Image Datasets Using Artificial Intelligence |
US11989960B2 (en) | 2019-02-20 | 2024-05-21 | Bluerock Therapeutics Lp | Detecting cells of interest in large image datasets using artificial intelligence |
JP2020144735A (en) * | 2019-03-08 | 2020-09-10 | 富士ゼロックス株式会社 | Image processing device and program |
JP7215242B2 (en) | 2019-03-08 | 2023-01-31 | 富士フイルムビジネスイノベーション株式会社 | Image processing device and program |
JP7248098B2 (en) | 2019-03-12 | 2023-03-29 | 日本電気株式会社 | Inspection device, inspection method and storage medium |
WO2020183936A1 (en) * | 2019-03-12 | 2020-09-17 | 日本電気株式会社 | Inspection device, inspection method, and storage medium |
JPWO2020183936A1 (en) * | 2019-03-12 | 2021-12-09 | 日本電気株式会社 | Inspection equipment, inspection method and storage medium |
JPWO2020241074A1 (en) * | 2019-05-30 | 2020-12-03 | ||
WO2020241074A1 (en) * | 2019-05-30 | 2020-12-03 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ | Information processing method and program |
JP7454568B2 (en) | 2019-05-30 | 2024-03-22 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ | Information processing method, information processing device and program |
CN112580794A (en) * | 2019-09-29 | 2021-03-30 | 佳能株式会社 | Attribute recognition device, method and system and neural network for recognizing object attributes |
TWI775038B (en) * | 2020-01-21 | 2022-08-21 | 群邁通訊股份有限公司 | Method and device for recognizing character and storage medium |
TWI775039B (en) * | 2020-01-21 | 2022-08-21 | 群邁通訊股份有限公司 | Method and device for removing document shadow |
WO2022153480A1 (en) * | 2021-01-15 | 2022-07-21 | 日本電気株式会社 | Information processing device, information processing system, information processing method, and recording medium |
JP7529052B2 (en) | 2021-01-15 | 2024-08-06 | 日本電気株式会社 | Information processing device, information processing system, information processing method, and program |
Also Published As
Publication number | Publication date |
---|---|
JP6798614B2 (en) | 2020-12-09 |
JPWO2018207334A1 (en) | 2019-11-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2018207334A1 (en) | Image recognition device, image recognition method, and image recognition program | |
KR100442834B1 (en) | Method and system for face detecting using classifier learned decision boundary with face/near-face images | |
CN111754596B (en) | Editing model generation method, device, equipment and medium for editing face image | |
US20210326728A1 (en) | Anomaly detection apparatus, anomaly detection method, and program | |
KR102450374B1 (en) | Method and device to train and recognize data | |
CN110069985B (en) | Image-based target point position detection method and device and electronic equipment | |
WO2016138838A1 (en) | Method and device for recognizing lip-reading based on projection extreme learning machine | |
JP5214760B2 (en) | Learning apparatus, method and program | |
US10970313B2 (en) | Clustering device, clustering method, and computer program product | |
WO2019026104A1 (en) | Information processing device, information processing program, and information processing method | |
CN105225222B (en) | Automatic assessment of perceptual visual quality of different image sets | |
US11748450B2 (en) | Method and system for training image classification model | |
US9842279B2 (en) | Data processing method for learning discriminator, and data processing apparatus therefor | |
CN113313053B (en) | Image processing method, device, apparatus, medium, and program product | |
JP2015057630A (en) | Acoustic event identification model learning device, acoustic event detection device, acoustic event identification model learning method, acoustic event detection method, and program | |
JP2017062778A (en) | Method and device for classifying object of image, and corresponding computer program product and computer-readable medium | |
CN113095370A (en) | Image recognition method and device, electronic equipment and storage medium | |
WO2019167784A1 (en) | Position specifying device, position specifying method, and computer program | |
CN114842343A (en) | ViT-based aerial image identification method | |
CN117038055B (en) | Pain assessment method, system, device and medium based on multi-expert model | |
CN108154186B (en) | Pattern recognition method and device | |
KR20200110064A (en) | Authentication method and apparatus using transformation model | |
JP6600288B2 (en) | Integrated apparatus and program | |
US20210073586A1 (en) | Learning device, learning method, and storage medium | |
CN116935125A (en) | Noise data set target detection method realized through weak supervision |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17909543 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2019516839 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 17909543 Country of ref document: EP Kind code of ref document: A1 |