Disclosure of Invention
The application discloses an unsupervised wafer defect detection method based on deep learning, which comprises the following steps:
Constructing a normal sample of the wafer image by image differencing, extracting a feature map of the normal sample, and training the convolutional neural network by calculating a Gaussian distribution for each pixel position of the normal sample;
In the inference stage, the Mahalanobis distance is calculated between the feature map of the wafer image under test and the extracted normal sample distribution to obtain a Mahalanobis distance matrix, the Mahalanobis distance matrix is up-sampled, and the defective pixels are obtained through threshold segmentation.
The method, wherein: a group of preset samples of the wafer image is selected as templates and another defective sample as the sample to be repaired, and the Manhattan distance between the sample to be repaired and the corresponding pixels of each template is calculated, so that two Manhattan distance maps are obtained;
and when the pixels at the same position in the two Manhattan distance maps both exceed a segmentation value, the pixel of the template corresponding to the minimum Manhattan distance map at that position is filled into the to-be-repaired region of the sample to be repaired, so as to complete the restoration of the defective-region pixels and obtain the normal sample.
The method, wherein: the feature map of the d-dimensional channel of the normal sample is extracted by a WideResNet feature extractor, and the WideResNet feature extractor selects the feature map of the normal sample from the second layer of its backbone network.
The method, wherein: in the training stage of the convolutional neural network, the mean and the inverse covariance matrix of each coordinate of the second-layer feature map of the normal sample are extracted; that is, each position of the second-layer feature map of the normal sample is described by a Gaussian distribution and stored as the feature distribution.
The method, wherein: the feature map of the d-dimensional channel of the image under test is extracted with the same WideResNet feature extractor or with another WideResNet feature extractor.
The method, wherein: in the inference stage, the Mahalanobis distance is calculated between the value at each coordinate of the feature map of each defect image of the image under test and the corresponding feature distribution of the normal sample, so as to obtain the Mahalanobis distance matrix.
The method, wherein: a defect heat map is formed based on the Mahalanobis distance matrix, the up-sampling of the defect heat map includes using a bicubic interpolation algorithm, and a normalization process is performed on the defect heat map.
The method, wherein: a threshold value for separating defective pixels from normal pixels is selected for the defect heat map, and the defective region is segmented from the defect heat map by threshold segmentation.
The method, wherein: the defect types of the wafer at least include adhered particles, stains, and scratches on the wafer.
The method, wherein: the initial wafer image is subjected to template matching to obtain image data for constructing the normal sample; the wafer image under test is subjected to template matching to obtain image data for extracting the feature map of the image under test.
The application discloses a convolutional neural network based on deep learning, wherein a normal sample is constructed through differencing, a feature map of the normal sample is extracted through the convolutional network, and the Gaussian distribution of each pixel position of the normal sample is calculated to accomplish training; in the inference stage, the Mahalanobis distance between the feature map of the image under test and the extracted normal sample distribution is calculated to obtain a Mahalanobis distance matrix, the Mahalanobis distance matrix is up-sampled to the input size, and defective pixels are segmented out by a threshold value. The convolutional neural network based on deep learning can locate the defective region without any labeling cost: the abnormal pixel region can be identified merely by extracting the distribution of the normal samples in the high-dimensional channel.
The application discloses another unsupervised wafer defect detection method based on deep learning, which comprises the following steps:
In the training phase, feature extraction from normal samples of the wafer training image is performed only once:
extracting a feature map of the normal sample, and training a convolutional neural network by calculating a Gaussian distribution for each pixel position of the normal sample;
in the inference phase, defective pixel regions are located without using any labels:
calculating the Mahalanobis distance between the feature map of the wafer image under test and the extracted normal sample distribution to obtain a Mahalanobis distance matrix, up-sampling the Mahalanobis distance matrix, and obtaining the defective pixels through threshold segmentation.
The method, wherein: the initial training image is subjected to template matching to obtain image data for constructing the normal sample; the image under test is subjected to template matching to obtain image data for extracting the feature map of the image under test.
The method, wherein: a group of preset samples of the training image is selected as templates and another defective sample as the sample to be repaired, and the Manhattan distance between the sample to be repaired and the corresponding pixels of each template is calculated, so that two Manhattan distance maps are obtained; the preset samples contain fewer defects than the sample to be repaired;
and when the pixels at the same position in the two Manhattan distance maps both exceed a segmentation value, the pixel of the template corresponding to the minimum Manhattan distance map at that position is filled into the to-be-repaired region of the sample to be repaired, so as to complete the restoration of the defective-region pixels and obtain the normal sample.
The method, wherein: the feature map of the d-dimensional channel of the normal sample is extracted using a first WideResNet feature extractor, and the first WideResNet feature extractor selects the feature map of the normal sample from the second layer of the backbone network.
The method, wherein: in the training stage of the convolutional neural network, the mean and the inverse covariance matrix of each coordinate of the second-layer feature map of the normal sample are extracted; that is, each position of the second-layer feature map of the normal sample is described by a Gaussian distribution and stored as the feature distribution.
The method, wherein: the feature map of the d-dimensional channel of the image under test is extracted with the first WideResNet feature extractor or with a second, different WideResNet feature extractor.
The method, wherein: the training phase and the inference phase employ the same feature extractor (WideResNet-50-2), pre-trained on ImageNet, to extract the feature map of the respective d-dimensional channel.
The method, wherein: the d associated with the first and second WideResNet feature extractors is equal to 512.
The method, wherein: in the inference stage, the Mahalanobis distance is calculated between the value at each coordinate of the feature map of each defect image of the image under test and the corresponding feature distribution of the normal sample, so as to obtain the Mahalanobis distance matrix.
The method, wherein: a defect heat map is formed from the calculated Mahalanobis distance, i.e. the calculated Mahalanobis distance matrix is used to form the defect heat map, and up-sampling the Mahalanobis distance matrix corresponds to up-sampling the defect heat map, the up-sampling using a bicubic interpolation algorithm.
The method, wherein: normalization is performed on the up-sampled defect heat map.
The method, wherein: after normalization, the pixel values of the defect heat map lie between 0 and 1.
The method, wherein: a threshold value for separating defective pixels from normal pixels is selected for the defect heat map, and the defective region is segmented from the defect heat map by threshold segmentation.
The method, wherein: the threshold value separating defective pixel values from normal pixel values takes an arbitrary value in [0, 1].
The application further discloses a convolutional neural network based on deep learning for realizing unsupervised wafer defect detection, which comprises:
a sample restoration module, for constructing a normal sample of the wafer training image by differencing;
a feature extractor, for extracting a feature map of the normal sample and a feature map of the wafer image under test;
a training unit, for training the convolutional neural network by calculating the Gaussian distribution of each pixel position of the normal sample;
an inference unit, for calculating the Mahalanobis distance between the value at each coordinate of the feature map of each defect image in the image under test and the corresponding feature distribution of the normal sample, so as to form a defect heat map;
and an output layer, for up-sampling the defect heat map and obtaining defective pixels through threshold segmentation.
The convolutional neural network described above, wherein: to capture the normal sample, the sample restoration module selects a group of preset samples of the training image as templates and another defective sample as the sample to be repaired, calculates the Manhattan distance between the sample to be repaired and the corresponding pixels of each template, and obtains two Manhattan distance maps;
and when the pixels at the same position in the two Manhattan distance maps both exceed a segmentation value, the pixel of the template corresponding to the minimum Manhattan distance map at that position is filled into the to-be-repaired region of the sample to be repaired, so as to complete the restoration of the defective-region pixels and obtain the normal sample.
The convolutional neural network described above, wherein: the feature extractor uses WideResNet to extract the feature map of the d-dimensional channel of the normal sample, and the WideResNet feature extractor selects the feature map of the normal sample from the second layer of the backbone network.
The convolutional neural network described above, wherein: in the training stage of the convolutional neural network, the mean and the inverse covariance matrix of each coordinate of the second-layer feature map of the normal sample are extracted; that is, each position of the second-layer feature map of the normal sample is described by a Gaussian distribution and stored as the feature distribution.
The convolutional neural network described above, wherein: the feature map of the d-dimensional channel of the image under test is extracted by the WideResNet feature extractor.
The convolutional neural network described above, wherein: the inference unit forms a defect heat map based on the Mahalanobis distance matrix obtained from the Mahalanobis distance calculation.
The convolutional neural network described above, wherein: the up-sampling of the defect heat map by the output layer includes using a bicubic interpolation algorithm and performing a normalization process on the defect heat map.
The convolutional neural network described above, wherein: the output layer selects, according to the defect heat map, a threshold value for separating defective pixels from normal pixels, and the defective region is segmented from the defect heat map by threshold segmentation.
The convolutional neural network described above, wherein: the initial training image of the wafer is subjected to template matching to obtain image data for constructing the normal sample; the wafer image under test is subjected to template matching to obtain image data for extracting the feature map of the image under test.
The application also discloses an electronic device comprising a memory and a processor, wherein the memory stores a computer program, and when the computer program is executed by the processor, the processor performs the method of any one of the above technical solutions.
The application further discloses a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method of any one of the above technical solutions.
One of the advantages of the application is that detection performance can be effectively improved while labeling cost is saved. The method overcomes the excessive labeling cost of the supervised methods adopted by existing wafer defect detection: for example, the YOLO family (v1 to v7) of object detection networks consumes a great amount of software and hardware cost to label defect positions, and most semantic segmentation networks carry a very high pixel-level labeling cost, which becomes unacceptable as image resolution increases. In addition, the defect detection performance of conventional reconstruction-based methods is often limited; for example, with a reconstruction-based autoencoder or variational autoencoder, detection performance is not always stable even when the defect image restoration effect is good.
Detailed Description
The solution according to the application will now be described more fully hereinafter with reference to the accompanying drawings, in which examples are shown by way of illustration only and in no way limit the embodiments of the application, on the basis of which those skilled in the art may obtain further solutions without exercising any inventive effort.
Referring to fig. 1, based on existing imaging technology, an Image Image_Data of a wafer 10 acquired by an image processing module or an image acquisition device is input and transmitted to the present algorithm module or neural network. Regarding the image processing module or image acquisition device, the image data may be obtained by: optical imaging, scanning electron microscopy (SEM), X-ray imaging, spectroscopic ellipsometry, reflectometry, etc., as well as CD-SEM, 2-D BPR, scatterometry, and even CD-SAXS, XRR, TEM, and AFM imaging. The image of the wafer 10 may thus be captured by a variety of different imaging approaches or measurement techniques.
Referring to fig. 1, an Image Image_Data of a wafer 10 is input in step S1. Note that when the unsupervised neural network or algorithm module receives the Image Image_Data of the wafer 10, each die may be extracted by template matching, i.e., the required image data is obtained by the template matching process. The die represents the desired experimental or image data. To this end, the figure shows that in step S2 the die is extracted by template matching, where the die may be image data for training in the training phase or image data of the image under test in the inference phase.
Referring to fig. 1, regarding template matching: the visual inspection method herein may employ template matching to detect small defects on the wafer, for example by scanning the wafer surface with an electron microscope to obtain an image. To increase the ability to detect defects, a considerable template library is required, which makes template generation cumbersome and makes selecting the appropriate item from a large template library time-consuming. One improvement is to use the image of the previous frame as a template, difference the image of the next frame against it, and treat the pixels remaining after differencing as microscopic defects of the wafer.
Referring to fig. 1, in an alternative embodiment, template matching is an important component of digital image processing. The process of spatially aligning two or more images of the same wafer acquired by different sensors, or by the same sensor at different times and under different imaging conditions, or of looking for a pattern corresponding to a known pattern in another image, is called template matching. The template is, for example, a known small image. Template matching searches for a target in a larger image: given that the target has the same size, orientation, and appearance as the template, the target can be found in the image and its coordinate position determined by a matching algorithm. The same applies to template matching of the Image Image_Data of the wafer 10.
Referring to fig. 1, in an alternative example: template matching refers to finding, within a current image A, the portion most similar to another image B, where image A is referred to as the input (source) image and image B as the template image. The template matching method slides the template image B over the image A, traversing all pixels one by one to complete the matching. Template matching thus finds, over the whole image region, a small region that matches a given sub-image: it requires a template image, i.e. the given sub-image, and an image under test, i.e. the source image, and computes the degree of match between the template and each overlapping position of the image under test, from left to right and top to bottom; the larger the degree of match, the higher the probability that the template image and that region of the source image are identical.
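The left-to-right, top-to-bottom sliding search described above can be illustrated with a minimal numpy sketch using the sum of absolute differences as the dissimilarity score; the function name and arrays are illustrative, not part of the application.

```python
import numpy as np

def match_template(image, template):
    """Slide `template` over `image` and return the top-left corner
    of the best match (minimum sum of absolute differences)."""
    ih, iw = image.shape
    th, tw = template.shape
    best_score, best_pos = np.inf, (0, 0)
    for i in range(ih - th + 1):
        for j in range(iw - tw + 1):
            # Dissimilarity between the template and the current window.
            score = np.abs(image[i:i + th, j:j + tw] - template).sum()
            if score < best_score:
                best_score, best_pos = score, (i, j)
    return best_pos

# Embed a known 2x2 pattern at (3, 5) and recover its position.
img = np.zeros((8, 10))
tpl = np.array([[1.0, 2.0], [3.0, 4.0]])
img[3:5, 5:7] = tpl
pos = match_template(img, tpl)
```

In practice a normalized cross-correlation score is often preferred over raw SAD, but the traversal pattern is the same.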
Referring to fig. 1, it should be noted that on many complex wafer surfaces the weakness of large pattern variations needs to be overcome; because not every frame of image is identical, a wafer micro-defect detection model or network with high robustness and high accuracy is particularly important for integrated circuits in the actual production stage.
Referring to fig. 1, the training phase herein needs to be performed only once to extract the feature distribution. Since normal samples of the wafer 10 are difficult to collect and the application environment is unsupervised, abnormal samples of the wafer 10 are repaired. The repair of abnormal samples involves step S3a shown in the figure, i.e. differential restoration of the normal sample. The feature of extracting the feature distribution only once will be described in detail hereinafter; it is a notable feature and advantage of the present application.
Referring to fig. 1, for step S3a, the Image Image_Data used may be the wafer image used for training, that is, training data of the training image is extracted by template matching: for example, a set of predetermined samples of the training image of the wafer 10 is selected as templates, and another defective sample of the training image of the wafer 10 is selected as the sample to be repaired. In an alternative example, steps S1-S2 are used, before step S3a, to capture the predetermined samples serving as templates and the sample to be repaired.
Referring to fig. 1, an explanation of differential restoration: image differencing methods are often used when processing images, particularly video stream images. As the name suggests, image differencing subtracts the corresponding pixel values of two images, attenuating the similar portions of the images while highlighting the changed portions. For example, the differencing method can detect the outline of a moving object and extract its trajectory.
Referring to fig. 1, there are two different approaches: the first differences the current image against a fixed background image, and the second differences two consecutive images. Most differencing tasks, including the differential restoration of normal samples, can be achieved with them. When detection starts, a frame without any moving object is selected as the differential background image, and when a moving object appears, the current image is differenced against the background image. When detection of the moving object ends, the background image is updated, so that differencing can be performed again when the next moving object appears. The differencing result removes part of the noise as well as the static background regions irrelevant to moving-object detection, and the background-update mechanism also adapts, to a certain extent, to changes in background and lighting. After differencing, only the moving object and some noise remain in the difference image; the moving object can then be identified and located, for example, by a projection-based positioning algorithm.
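The two differencing schemes above (fixed background vs. consecutive frames) reduce to the same operation; a minimal illustrative sketch, where the array names and threshold are assumptions rather than values from the application:

```python
import numpy as np

def frame_difference(current, reference, threshold):
    """Difference `current` against `reference` (a fixed background
    image or the previous frame) and return the changed-pixel mask."""
    return np.abs(current.astype(float) - reference.astype(float)) > threshold

bg = np.zeros((4, 4))          # background frame with no moving object
frame = bg.copy()
frame[2, 2] = 200.0            # a "moving object" pixel appears
mask = frame_difference(frame, bg, threshold=50)
```

The mask keeps only pixels that changed beyond the threshold, attenuating the static background exactly as the text describes.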
Referring to fig. 1, the abnormal sample is repaired by differencing in step S3a of the present application. Two samples with relatively few defects are selected as templates (Ta, Tb), and another sample with defects is selected as the sample to be repaired. The Manhattan distance between the sample to be repaired and the corresponding pixels of each template is calculated to obtain two Manhattan distance maps (dist map A and dist map B). Since normal samples of the wafer 10 are difficult to collect, differencing may be used in this way to repair abnormal samples.
Referring to fig. 1, the Manhattan distance formula: Dist = |X - Y|, where X denotes the pixel value at a coordinate of the sample to be repaired and Y the pixel value at the corresponding coordinate of the template.
Referring to fig. 1, the Manhattan distance map (dist map A) is obtained by calculating the Manhattan distance between the sample to be repaired (X) and the corresponding pixel of the template Ta (Y), for example the pixel at coordinate (i, j).
Referring to fig. 1, the Manhattan distance map (dist map B) is obtained by calculating the Manhattan distance between the sample to be repaired (X) and the pixel at the corresponding coordinate, e.g. (i, j), of the template Tb (Y).
Referring to fig. 1, in an alternative example, when the pixel values at the same coordinate (i, j) in the two Manhattan distance maps (dist map A and dist map B) both exceed a segmentation value, the pixel of the template corresponding to the minimum Manhattan distance map, i.e. min[A(i, j), B(i, j)], is taken to fill the to-be-repaired region of the sample to be repaired, thereby completing the restoration of the defective-region pixel y(i, j) and capturing the normal sample.
Referring to fig. 1, in an alternative example, if the pixel value A(i, j) at coordinate (i, j) of the Manhattan distance map (dist map A) exceeds the segmentation value and at the same time the pixel value B(i, j) at coordinate (i, j) of the Manhattan distance map (dist map B) exceeds the segmentation value, then the pixel of the template corresponding to the smaller of A(i, j) and B(i, j) is filled into the to-be-repaired region at coordinate (i, j) of the sample to be repaired; the pixel of the to-be-repaired region of the sample to be repaired is denoted y(i, j).
Referring to fig. 1, in an alternative example, when the pixels at the same position, such as coordinate (i, j), in the two Manhattan distance maps (dist map A and dist map B) both exceed the segmentation value, the pixel of the template (Ta or Tb) corresponding to the minimum of the Manhattan distance maps at that coordinate, i.e. Ta(i, j) or Tb(i, j), is taken to fill the to-be-repaired region y(i, j) of the sample to be repaired, so as to complete the restoration of the defective-region pixel.
Referring to fig. 1, in an alternative embodiment, a threshold (a first threshold) is set as the segmentation value for separating defective pixels from normal pixels. For pixels at the same position that simultaneously exceed the segmentation value in both Manhattan distance maps (dist map A and dist map B), the pixel of the template corresponding to the minimum Manhattan distance map at that position is filled into the to-be-repaired region, which completes the restoration of the defective-region pixels. The formula is as follows, where (i, j) is the pixel coordinate.
Referring to fig. 1, in an alternative example, if A(i, j) is the smaller of A(i, j) and B(i, j), the minimum Manhattan distance map is denoted min[A(i, j), B(i, j)], with min[A(i, j), B(i, j)] = A(i, j). The pixel Ta(i, j) of the template Ta corresponding to the minimum Manhattan distance map (dist map A) at position (i, j) is taken to fill the to-be-repaired region of the sample to be repaired. This procedure is y(i, j) = Ta(i, j) when min[A(i, j), B(i, j)] = A(i, j).
Referring to fig. 1, in an alternative example, if B(i, j) is the smaller of A(i, j) and B(i, j), the minimum Manhattan distance map is denoted min[A(i, j), B(i, j)], with min[A(i, j), B(i, j)] = B(i, j). The pixel Tb(i, j) of the template Tb corresponding to the minimum Manhattan distance map (dist map B) at position (i, j) is taken to fill the to-be-repaired region of the sample to be repaired. This procedure is y(i, j) = Tb(i, j) when min[A(i, j), B(i, j)] = B(i, j).
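The repair rule above (fill y(i, j) from the template whose distance map is minimal wherever both distance maps exceed the segmentation value) can be sketched in a few lines of numpy; function and variable names are illustrative, not from the application.

```python
import numpy as np

def restore_normal_sample(sample, ta, tb, threshold):
    """Repair defective pixels of `sample` using two templates Ta, Tb.

    Where both Manhattan distance maps exceed `threshold`, the pixel is
    treated as defective and replaced by the pixel of the template whose
    distance map is smaller at that position, i.e. min[A(i,j), B(i,j)].
    """
    dist_a = np.abs(sample - ta)   # dist map A
    dist_b = np.abs(sample - tb)   # dist map B
    repaired = sample.copy()
    defect = (dist_a > threshold) & (dist_b > threshold)
    use_a = defect & (dist_a <= dist_b)
    use_b = defect & (dist_b < dist_a)
    repaired[use_a] = ta[use_a]    # y(i, j) = Ta(i, j)
    repaired[use_b] = tb[use_b]    # y(i, j) = Tb(i, j)
    return repaired

# Toy example: one defective pixel at (1, 1).
ta = np.full((3, 3), 10.0)
tb = np.full((3, 3), 12.0)
sample = ta.copy()
sample[1, 1] = 100.0
repaired = restore_normal_sample(sample, ta, tb, threshold=5.0)
```

Here the defective pixel is closer (in Manhattan distance) to Tb, so it is filled from Tb, while pixels that exceed the threshold in only one map are left untouched.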
Referring to fig. 1, even though normal samples of the wafer 10 are difficult to collect, a normal sample can be obtained or restored by repairing an abnormal sample. That is precisely the purpose of step S3a.
Referring to fig. 1, in step S3b of the present application, in an alternative example, a WideResNet feature extractor is used to extract the feature map of the d-dimensional channel of the normal sample (resulting from the restoration of step S3a). The WideResNet feature extractor selects the feature map of the normal sample from the second layer of its backbone network.
Referring to fig. 1, in an alternative example, the training phase and the inference phase employ the same feature extractor, WideResNet-50-2 pre-trained on ImageNet, to extract a d-dimensional channel feature map with d = 512. The input aspect ratio of the model, such as a neural network model, may be scaled according to the actual experimental data (die), e.g. using 4:1 with an input size of 224 × 896.
Referring to fig. 1, the WideResNet can extract a feature map from its second layer with channel:height:width = 512:28:112. In an alternative example, ResNet typically has a 4-layer backbone network, from which the second-layer 512-dimensional feature map is selected. The first layer retains plenty of wafer image detail but carries little high-level semantic information; the third layer contains more abstract high-level semantics; the fourth layer extracts information more biased toward ImageNet; the second layer has both detail and partial abstract information while keeping inference speed acceptable.
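As a sanity check on the 512:28:112 shape, the second layer of a ResNet-style backbone downsamples the input spatially by a total stride of 8 (a standard property of this architecture, assumed here rather than stated in the application):

```python
def layer2_feature_shape(height, width, channels=512, stride=8):
    """Expected (C, H, W) of the layer-2 feature map for a given input size."""
    return channels, height // stride, width // stride

# The 224 x 896 input used in the text yields the 512:28:112 feature map.
shape = layer2_feature_shape(224, 896)
```

This confirms that an input of 224 × 896 produces a 512 × 28 × 112 second-layer feature map.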
Referring to fig. 1, in step S3c of the present application, in an alternative example, the convolutional neural network is trained by calculating the Gaussian distribution of each pixel position of the normal sample, i.e. a Gaussian distribution of the features is calculated.
Referring to fig. 1, in step S3c of the present application, in an alternative example, during the training phase of the convolutional neural network, the mean and the inverse covariance matrix of each coordinate of the second-layer feature map of the normal sample are extracted; that is, each position of the second-layer feature map of the normal sample is described by a Gaussian distribution and stored as the feature distribution.
Referring to fig. 1, the training phase needs to extract the mean μ of each coordinate (i, j) of the second-layer normal sample feature map and the inverse of the covariance Σ, and needs to ensure that the covariance is full-rank; that is, each position is described by a Gaussian distribution and stored for convenient inference. One feature of the application is that the training phase requires only a single extraction of features from the normal samples.
Referring to fig. 1, to learn the normal image features at position (i, j), the set of embedding vectors at position (i, j) is calculated, for example, from N normal training images: X(i, j) = {x_k(i, j), k ∈ [1, N]}. The information carried by this set can be summarized by assuming that a multivariate Gaussian distribution N(μ(i, j), Σ(i, j)) generates X(i, j), where μ(i, j) is the sample mean and Σ(i, j) the sample covariance; each position (i, j) of the feature map of the normal sample is thus described by one Gaussian distribution N(μ(i, j), Σ(i, j)) and stored as the feature distribution. A note on the Gaussian distribution: obtaining a Gaussian distribution from the mean and covariance belongs to the prior art; the Gaussian distribution is a very important class of distribution in probability and statistics, machine learning and related fields, and the multivariate Gaussian distribution is the high-dimensional form of the univariate Gaussian distribution, so the mathematical derivation of the mean and variance of the multivariate conditional Gaussian distribution is not repeated here. The mean and inverse covariance matrix of each coordinate of the second-layer feature map of the normal sample can therefore be extracted or calculated, so that each position of the second-layer feature map of the normal sample is described by a Gaussian distribution and stored, which is equivalent to training the convolutional neural network.
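The per-position fit of N(μ(i, j), Σ(i, j)) from N normal feature maps can be sketched as follows; this is a minimal numpy sketch, and the eps regularization term (which guarantees the full-rank covariance the text requires) is an assumption of this sketch, not a value from the application.

```python
import numpy as np

def fit_feature_distribution(features, eps=0.01):
    """Fit a Gaussian N(mu(i, j), Sigma(i, j)) at every position of a
    stack of feature maps.

    features: array of shape (N, D, H, W) from N normal training images.
    Returns mu of shape (H, W, D) and the inverse covariance of shape
    (H, W, D, D).  A small eps * I term keeps each covariance full-rank.
    """
    n, d, h, w = features.shape
    x = features.transpose(2, 3, 0, 1)           # (H, W, N, D)
    mu = x.mean(axis=2)                           # sample mean mu(i, j)
    centered = x - mu[:, :, None, :]
    # Sample covariance per position, regularized to guarantee full rank.
    cov = np.einsum('hwnd,hwne->hwde', centered, centered) / (n - 1)
    cov += eps * np.eye(d)
    cov_inv = np.linalg.inv(cov)                  # stored for inference
    return mu, cov_inv

# Toy fit: N=10 images, D=4 channels, a 2x3 feature map.
rng = np.random.default_rng(0)
features = rng.normal(size=(10, 4, 2, 3))
mu, cov_inv = fit_feature_distribution(features)
```

Storing μ and the inverse covariance (rather than Σ itself) matches the text: the inverse is what the Mahalanobis distance needs at inference time.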
Referring to fig. 1, in step S3d of the present application, in an alternative example, during the training stage of the convolutional neural network, the mean and the inverse covariance matrix of each coordinate of the second-layer feature map of the normal sample (the second-layer 512-dimensional feature map selected from the 4-layer backbone network) are extracted, so that each position (i, j) of the second-layer feature map of the normal sample is described by a Gaussian distribution based on the mean and covariance, and is stored and recorded as the feature distribution. The Gaussian distribution describes the pixel likelihood at each position. Step S3d thus accomplishes the calculation of the feature distribution.
Referring to fig. 1, for step S3e, the image Image_data used may be based on the wafer image used for inference; that is, the image data of the image to be measured is extracted by template matching: for example, each defect image of the image to be measured of the wafer 10 is selected as input to the Wide ResNet feature extractor. That is, step S3e is preceded by steps S1-S2.
Referring to fig. 1, for step S3e, the inference stage and the training stage employ the same feature extractor, Wide ResNet-50-2 pre-trained on ImageNet, to extract the d-dimensional channel feature map, d = 512. The input aspect ratio of the model, such as a neural network model, may be scaled according to the actual scale of the experimental data (die); here 4:1 is used, with input size 224 × 896.
Referring to fig. 1, Wide ResNet may extract a feature map from the second layer, whose channels : height : width = 512 : 28 : 112. In an alternative example, ResNet typically has a 4-layer backbone network, and the 512-dimensional second-layer feature map is selected here; the first layer retains enough wafer-image detail information but little high-level semantic information, the third layer contains more abstract high-level semantic information, and the fourth layer extracts information biased toward ImageNet, whereas the second layer has both detail information and some abstract information while also allowing a reasonable inference speed, for example in step S3e.
Referring to fig. 1, in an alternative example, in the reasoning stage, the mahalanobis distance is calculated by respectively comparing the values of the coordinates of the feature map of each defect image in the map to be measured with the corresponding feature distribution of the normal sample, so as to obtain a mahalanobis distance matrix.
Referring to fig. 1, in an alternative example, the inference stage calculates, for the value x(i, j) at each defect-image coordinate (i, j), the Mahalanobis distance from the normal sample distribution, DM(x(i, j)) = sqrt((x(i, j) − μ(i, j))^T Σ(i, j)^{-1} (x(i, j) − μ(i, j))). A defect thermodynamic diagram of N × H × W is formed, N being the number of input images. For example, for input size 224 × 896, the height and width are H = 28 and W = 112, respectively.
Referring to fig. 1, for step S4: the Mahalanobis distance DM(x(i, j)) gives an anomaly score for test-image location (i, j), and DM(x(i, j)) can be interpreted as the distance between the test patch embedding x(i, j) and the learned distribution N(μ(i, j), Σ(i, j)). In this way, in the inference stage (steps S3e-S6, etc.), the Mahalanobis distance can be calculated by comparing the value x(i, j) of each defect-image coordinate (i, j) with the Gaussian feature distribution of the normal samples.
See fig. 1, where DM(x(i, j)) is calculated as above. Further, the Mahalanobis distance matrix constituting the anomaly map may be calculated: M = (DM(x(i, j))), 1 ≤ i ≤ W, 1 ≤ j ≤ H.
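The anomaly-map computation above can be sketched as follows; the function name and array shapes are assumptions of this illustration, with mean and cov_inv in the layout of the training sketch (per-position mean and stored inverse covariance).

```python
import numpy as np

def mahalanobis_map(feature, mean, cov_inv):
    """Compute the Mahalanobis distance DM(x(i,j)) at every position.

    feature: (C, H, W) embedding of one image to be tested;
    mean: (C, H*W) and cov_inv: (H*W, C, C) describe the stored
    normal-sample distribution. Returns the (H, W) matrix M.
    """
    C, H, W = feature.shape
    x = feature.reshape(C, H * W)
    d = x - mean                                   # per-position deviation
    # quadratic form d^T Sigma^{-1} d, evaluated at each position p
    dist2 = np.einsum('cp,pcd,dp->p', d, cov_inv, d)
    return np.sqrt(np.maximum(dist2, 0.0)).reshape(H, W)
```

With an identity covariance and zero mean this reduces to the per-position Euclidean norm of the embedding, which is a convenient sanity check.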
Referring to fig. 1, for step S5: a defect thermodynamic diagram is formed based on the Mahalanobis distance matrix; the defect thermodynamic diagram is up-sampled using a bicubic interpolation algorithm and a normalization process is then performed on it.
Referring to fig. 1, for step S5: the upsampling of the defect thermodynamic diagram uses a bicubic interpolation algorithm. For example, the interpolation may be calculated from the surrounding 4*4 =16 known pixels, the weights being determined by the distance. The bicubic interpolation algorithm relieves the phenomenon of the outward expansion of the segmented regions to some extent.
Referring to fig. 1, the defect thermodynamic diagram is up-sampled to the input size 224 × 896 by the bicubic interpolation algorithm; since a suitable threshold range cannot be evaluated on the raw thermodynamic diagram, the defect thermodynamic diagram needs to be normalized, as follows:
Referring to fig. 1, regarding normalization, for step S5: the new pixel value x_new is the ratio of the difference between the original pixel value x and the minimum pixel value x_min to the difference between the maximum pixel value x_max and the minimum pixel value x_min, i.e. x_new = (x − x_min) / (x_max − x_min). The normalized pixel values lie in [0, 1], so the threshold dividing defect from normal (the second threshold PIX_TH) may take any value in [0, 1].
Referring to fig. 1, for step S6, a threshold dividing defective pixels from normal pixels is selected for the defect thermodynamic diagram, and the defective region is separated from the defect thermodynamic diagram by threshold segmentation. The segmented defective region is output; the segmentation threshold separates defect from normal, and the segmentation effect is accurate to the pixel level.
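The normalization and threshold segmentation of steps S5-S6 can be sketched together as follows; the function name and the guard for a flat map (all pixels equal) are assumptions of this illustration.

```python
import numpy as np

def segment_defects(heatmap, pix_th=0.4):
    """Min-max normalize the up-sampled defect heatmap to [0, 1],
    then split defective from normal pixels with threshold PIX_TH."""
    x_min, x_max = heatmap.min(), heatmap.max()
    if x_max == x_min:                      # flat map: nothing anomalous
        return np.zeros_like(heatmap), np.zeros(heatmap.shape, dtype=bool)
    normalized = (heatmap - x_min) / (x_max - x_min)
    mask = normalized > pix_th              # True marks defective pixels
    return normalized, mask
```

Because the normalized values always span [0, 1], the same threshold (0.4 in the example of fig. 2) can be applied across different images to be tested.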
Referring to fig. 1, if the camera captures an image Image_data, the image Image_data contains the gray value of each pixel. The chips on the wafer surface present nano- or micro-scale circuit element structures with complex color transitions and surface irregularities, which generally aggravate the impact of noise on the image. If the image quality does not meet the denoising requirement, such an inferior image cannot be well applied to micro-scale or even nano-scale wafer inspection. Fine defect detection depends heavily on the quality of the captured image of the object to be measured; if that image is only a relatively rough gray-scale image, the measurement or defect analysis will clearly deviate. This class of problems is particularly pronounced when wafer features reach the micrometer or even nanometer range.
Referring to fig. 2, the up-sampled result (S5-P) of the defect thermodynamic diagram is compared with the segmentation result (S6-P) of the output segmented defect region obtained through the foregoing training and inference: the former is the result of step S5; the latter is the result of step S6, obtained by segmenting the former with the threshold (second threshold PIX_TH) dividing defective pixels from normal pixels, and the segmentation effect is illustrated taking PIX_TH = 0.4 as an example. Note that the defect thermodynamic diagram and the segmentation result diagram should actually be color images, but this example is illustrated with gray-scale images.
Referring to fig. 2, the up-sampled result (S5-P) of the defect thermodynamic diagram is still blurry; although the rough appearance and location of the defect are described, the positioning accuracy and defect type remain suboptimal.
Referring to fig. 2, a threshold dividing defective pixels from normal pixels is selected for the up-sampled result (S5-P) of the defect thermodynamic diagram, and the defective region is separated from it by threshold segmentation (S6-P); the segmented defective region is output, the segmentation threshold separates defect from normal, and the segmentation effect is accurate to the pixel level.
Referring to fig. 2, the segmentation result (S6-P) of the defect region achieves a clearer effect, giving the appearance and position of the defect in detail, which is clearly sufficient for determining the positioning accuracy and defect type.
Referring to fig. 3, the local values of H and W are labeled on the up-sampled result (S5-P) of the defect thermodynamic diagram, and the approximate defect morphology and location are difficult to determine accurately from the coordinates. Taking the defect at position (i, j) = (100, 150) as an example, the true morphology of this stripe-like defect is unknown and too ambiguous.
Referring to fig. 4, the local values of H and W are labeled on the defect segmentation map (S6-P), and fine defect morphology and location are easily determined precisely from the coordinates. Still taking the defect at position (i, j) = (100, 150) as an example, the true morphology of this stripe-like defect is confirmed and very clear.
Referring to fig. 4, the present application provides a convolutional neural network based on deep learning, which identifies abnormal pixel regions by extracting the distribution of normal samples in a high-dimensional channel space. In this scheme, only a normal sample needs to be constructed by differencing; a feature map of the normal sample is extracted by a convolutional network, and the Gaussian distribution of each pixel position of the normal sample is calculated to achieve the purpose of training. In the inference stage, the Mahalanobis distance matrix is obtained by calculating the Mahalanobis distance between the feature map of the map to be tested and the extracted normal-sample distribution, the matrix is up-sampled to the input size, and the defective pixels are segmented by the threshold. The defective region can thus be located without any labeling cost.
Referring to fig. 4, the application relates to an unsupervised wafer defect detection method based on deep learning, in the field of wafer defects. The main problems solved are as follows. First, the labeling cost of existing supervised wafer defect detection methods is too high: marking defect positions is extremely expensive, semantic segmentation networks in particular require costly pixel-level annotation, and defect labeling costs grow exponentially in environments with micro- or even nano-scale electronic components or integrated-circuit wiring. Second, for integrated circuits, a key point is how to reduce the complexity of the neural network model, the training time and the test time while maintaining wafer-level defect recognition performance. The generated embedding vectors may carry redundant information, and the present application makes it possible to reduce the size of the embedding as much as possible; this combination of the training mode and the inference mode is one of its great advantages: on the premise of guaranteeing wafer defect recognition performance, the neural network model is simplified and the training and test time shortened, while the model retains highly efficient wafer defect recognition performance. By calculating the Manhattan distance between the sample to be repaired and the corresponding pixels of the template, the Gaussian distribution of the normal sample obtained on the basis of the Manhattan distance maps is connected, in both network training and inference, with the corresponding semantic information of the pre-trained CNN, thereby obtaining mutually embedded information carrying information from different semantic levels and different resolutions, for example to encode both fine granularity and global context.
The normal sample introduces the characteristics of the wafer image during the calculation of the Manhattan distance maps and brings abstract information into the Gaussian distribution, so that after the CNN integrates this mutually embedded information, model complexity can be significantly reduced and training and inference time shortened. Integrating this mutually embedded information into the CNN also compresses the redundant information carried by the wafer training image or the image to be tested (images of micro- or even nano-scale wafers are filled with massive, time-consuming redundant information), so the process of combining the Manhattan distance maps with the Gaussian distribution to train the convolutional neural network not only compresses redundant information but also reduces it while still maintaining the wafer defect recognition performance of the CNN. Notably, compressing redundant information is a double-edged sword: it reduces the amount of data to be processed in the training or inference stage, but it may also lose a significant amount of useful image information. Inference on images with reduced redundant information often carries considerable uncertainty (for example, much like the degree of blurring in fig. 3).
Referring to fig. 4, in an alternative example, during the training phase each patch of a normal image is associated with its spatially corresponding activation vector in the activation map of a pre-trained neural network. The activation map is allowed to have a lower resolution than the input image, so that many pixels share the same embedding, which then corresponds to a patch of pixels, without overlap, at the original image resolution.
Referring to fig. 5, the present application additionally discloses a convolutional neural network based on deep learning for implementing unsupervised wafer defect detection, comprising: a sample restoration module for constructing a normal sample of the training image of the wafer by differencing; a feature extractor (e.g., ResNet) for extracting the feature map of the normal sample and the feature map of the map to be tested of the wafer; a training unit for achieving the purpose of training the convolutional neural network CNN by calculating the Gaussian distribution of each pixel position of the normal sample; an inference unit for calculating the Mahalanobis distance between the values at the coordinates of the feature map of each defect image in the map to be tested and the corresponding feature distribution of the normal sample, so as to form a defect thermodynamic diagram; and an output layer for up-sampling the defect thermodynamic diagram and obtaining the defective pixels (steps S5-S6, for example) by threshold segmentation (threshold PIX_TH, for example).
Referring to fig. 5, for the convolutional neural network CNN, a normal sample is extracted by the sample restoration module, which takes a group of preset samples of the training image as templates (Ta and Tb) and another defect sample as the sample to be repaired, and calculates the Manhattan distances between the sample to be repaired and the corresponding pixels of the templates to obtain the Manhattan distance maps (dist map A and dist map B).
Referring to fig. 5, for the convolutional neural network CNN, when the pixels at the same position in two manhattan distance maps simultaneously exceed a segmentation value, the pixels of a template corresponding to the minimum manhattan distance map at the position are taken to fill the to-be-repaired area of the to-be-repaired sample, so that the recovery of the pixels of the defect area is completed, and a normal sample is obtained.
Referring to fig. 5, for the convolutional neural network CNN, in an alternative embodiment, a Wide ResNet feature extractor is used to extract the feature map of the d-dimensional channels of the normal sample; the Wide ResNet feature extractor selects the feature map of the normal sample from the second layer of the backbone network, with d = 512.
Referring to fig. 5, for the convolutional neural network CNN, in the training phase of the convolutional neural network, the mean and the inverse covariance matrix of each coordinate of the second-layer feature map of the normal sample are extracted; that is, each position of the second-layer feature map of the normal sample is described by a Gaussian distribution and stored as a feature distribution.
Referring to fig. 5, for the convolutional neural network CNN, in an alternative embodiment, a Wide ResNet feature extractor is used to extract the feature map of the d-dimensional channels of the map to be tested, for example in the inference stage, with d = 512.
Referring to fig. 5, for the convolutional neural network CNN, the inference unit forms a defect thermodynamic diagram based on the mahalanobis distance matrix obtained by the mahalanobis distance calculation.
Referring to fig. 5, for convolutional neural network CNN, the up-sampling performed by the output layer on the defect thermodynamic diagram includes using a bicubic interpolation algorithm, and a normalization process is performed on the defect thermodynamic diagram.
Referring to fig. 5, for the convolutional neural network CNN, the output layer selects one threshold value for dividing the defective pixel from the normal pixel for the defective thermodynamic diagram, and divides the defective region from the defective thermodynamic diagram by threshold segmentation.
Referring to fig. 5, for the convolutional neural network CNN, an initial training image of the wafer 10 is subjected to a template matching process to obtain image data for constructing a normal sample; the image data obtained from the map to be measured of the wafer 10 is used to extract the feature map of the map to be measured through template matching processing.
Referring to fig. 5, the application further discloses an electronic device, which comprises a memory and a processor, wherein the memory stores a computer program, and the computer program when executed by the processor causes the processor to execute the defect detection method.
Referring to fig. 5, the present application additionally discloses a computer-readable storage medium having a computer program stored thereon, which when executed by a processor implements the foregoing defect detection method.
Referring to fig. 5, the neural network refers to a computer or server or processor that can run a computer program, other alternatives to the processor unit: a field programmable gate array, a complex programmable logic device or a field programmable analog gate array or a semi-custom ASIC or processor or microprocessor, a digital signal processor or integrated circuit or GPU, a software firmware program stored in memory, or the like.
Referring to fig. 5, for convolutional neural network CNN, in an alternative embodiment, the neural network's alternative algorithm includes at least linear regression, logistic regression, decision trees, support vector machines, and the like.
Referring to fig. 6, during the network training phase, feature extraction is performed only once.
Referring to fig. 6, in the defect inference stage, a defect thermodynamic diagram is formed and then a defect segmentation operation is performed.
Referring to fig. 6, an image of the wafer 10 acquired by the image processing module is first input and transmitted to the algorithm module.
Referring to fig. 6, the algorithm module receives the image of the wafer 10 and extracts each die by template matching; for example, images of the wafer 10 for training are provided as experimental data.
Referring to fig. 6, the training phase performs feature-distribution extraction only once; because normal samples are difficult to collect, the abnormal samples are repaired first, i.e., normal samples are restored by differencing.
Referring to fig. 6, a defect sample is selected as the sample to be repaired; the Manhattan distances between the sample to be repaired and the corresponding pixels of the templates can be calculated to obtain two Manhattan distance maps (dist map A and dist map B), whose Manhattan distance formula is: dist = |X − Y|, where X denotes the pixel value at a coordinate of the sample to be repaired and Y denotes the pixel value at the corresponding coordinate of the template.
Referring to fig. 6, a threshold is set as the segmentation value dividing defective pixels from normal pixels; where the pixels at the same position in dist map A and dist map B both exceed the segmentation value, the pixel of the template corresponding to the smaller dist map at that position is filled into the to-be-repaired area, thereby completing the restoration of the defect-area pixels.
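The repair rule above can be sketched in NumPy as follows, assuming grayscale arrays of equal shape; the function name and the tie-breaking toward template A are assumptions of this illustration.

```python
import numpy as np

def restore_normal_sample(sample, template_a, template_b, seg_th):
    """Differential restoration of a normal sample (a sketch of the
    described repair step, with assumed array inputs).

    dist map A/B hold the per-pixel Manhattan distance |X - Y| between
    the defect sample and each template. Where BOTH maps exceed the
    segmentation value, the pixel is treated as defective and replaced
    by the pixel of the template whose distance map is smaller there.
    """
    sample = sample.astype(np.float64)
    dist_a = np.abs(sample - template_a)    # dist map A
    dist_b = np.abs(sample - template_b)    # dist map B
    defect = (dist_a > seg_th) & (dist_b > seg_th)
    repaired = sample.copy()
    use_a = defect & (dist_a <= dist_b)     # template A is closer here
    repaired[use_a] = template_a[use_a]
    repaired[defect & ~use_a] = template_b[defect & ~use_a]
    return repaired, defect
```

Requiring both distance maps to exceed the threshold prevents a pixel that merely differs from one template (e.g. due to that template's own noise) from being treated as defective.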
Referring to fig. 6, for training and inference, the same feature extractor Wide ResNet-50-2 pre-trained on ImageNet is used in the training phase and the inference phase to extract the d-dimensional channel feature map, d = 512. That is, the one-time feature-distribution extraction step and the feature extraction of the image to be tested in the defect inference step both adopt the same feature extractor pre-trained on ImageNet.
Referring to fig. 6, during the neural network training phase, the model input aspect ratio may be scaled according to the actual scale of the experimental data (die); here 4:1 is used, with input size 224 × 896. Wide ResNet can extract the feature map from the second layer, whose channels : height : width = 512 : 28 : 112. ResNet typically has a 4-layer backbone network, and in this embodiment the 512-dimensional second-layer feature map is selected. The first layer retains enough detail information but has little high-level semantic information; the third layer contains more abstract high-level semantic information; the fourth layer extracts information biased toward ImageNet; the second layer is therefore selected because it has both detail information and some abstract information while also allowing a reasonable inference speed.
Referring to fig. 6, the training phase needs to extract the mean μ of each coordinate (i, j) of the second-layer normal-sample feature map and the inverse of the covariance Σ (the covariance must be guaranteed to be full rank), i.e., each position is described by a Gaussian distribution and stored for use at inference. As shown in the figure, the training phase performs feature extraction of the normal samples only once, unless the normal samples change. That is, the convolutional neural network is trained.
Referring to fig. 6, the algorithm module receives the map to be tested of the wafer 10 and extracts each die by template matching. For example, the image to be tested of the wafer 10 under evaluation is monitored; in most scenarios this is the actual production flow stage, so that, as required by the wafer fab, defects are effectively found and warned of from the image to be tested of the wafer 10, and the data to be tested is monitored.
Referring to fig. 6, in the neural network inference phase, the model input aspect ratio may be scaled according to the actual scale of the data to be tested (die); here 4:1 is used, with input size 224 × 896. Wide ResNet can extract the feature map from the second layer, whose channels : height : width = 512 : 28 : 112. ResNet typically has a 4-layer backbone network, and in this embodiment the 512-dimensional second-layer feature map is selected. The first layer retains enough detail information but has little high-level semantic information; the third layer contains more abstract high-level semantic information; the fourth layer extracts information biased toward ImageNet; the second layer is therefore selected because it has both detail information and some abstract information while also allowing a reasonable inference speed.
Referring to fig. 6, the inference stage calculates, for the value x(i, j) at each defect-image coordinate (i, j), the Mahalanobis distance DM(x(i, j)) from the normal sample distribution as described above. A defect thermodynamic diagram of N × H × W is formed, N being the number of input images, with H = 28 and W = 112 for an input size of 224 × 896 in one example. A wafer defect thermodynamic diagram is thus formed.
Referring to fig. 6, the up-sampling of the defect thermodynamic diagram uses a bicubic interpolation algorithm, calculating each value to be interpolated from the surrounding 4 × 4 = 16 known pixels, with weights determined by distance. The bicubic interpolation algorithm alleviates, to some extent, the outward expansion of segmented regions. The defect thermodynamic diagram is up-sampled to the input size 224 × 896 by the bicubic interpolation algorithm; since a suitable threshold range cannot be evaluated on the raw thermodynamic diagram, the defect thermodynamic diagram needs to be normalized, after which its values lie in [0, 1], i.e., the threshold dividing defect from normal may take any value in [0, 1].
Referring to fig. 6, up-sampling produces a defect thermodynamic diagram at the input size. See fig. 3.
Referring to fig. 6, the segmented defective region is output; the segmentation threshold separates defect from normal, and the segmentation effect is accurate to the pixel level. As shown, a threshold of 0.4 is selected. The method can effectively separate wafer defect types such as particles, stains and scratches. Pixel-level defect segmentation is performed. See fig. 4.
Referring to fig. 6, the key points of the present application are: repairing the wafer defect image, extracting the second-layer Wide ResNet feature map to describe the Gaussian distribution of each coordinate, and up-sampling the defect thermodynamic diagram with a bicubic interpolation algorithm. The advantages are that the defective region can be located without any labeling cost and the wafer defect detection effect is improved. The various aspects of the present application described above may also be transplanted to object detection algorithms such as YOLO and Faster R-CNN.
Referring to fig. 6, the application discloses an unsupervised wafer defect detection method based on deep learning. Feature extraction of normal samples of the training image of the wafer is performed only once during the training phase: and extracting a characteristic diagram of the normal sample, and achieving the purpose of training the convolutional neural network by calculating Gaussian distribution of each pixel position of the normal sample. The defective pixel areas are located in the inference stage in a way that does not use any labels: and calculating the mahalanobis distance between the feature map of the to-be-detected map of the wafer and the extracted normal sample distribution, obtaining a mahalanobis distance matrix, up-sampling the mahalanobis distance matrix, and obtaining the defective pixel through threshold segmentation.
Referring to fig. 6, an unsupervised wafer defect detection method based on deep learning: the initial training image is subjected to template matching processing to obtain image data for constructing a normal sample; the image data is obtained through template matching processing of the image to be detected and is used for extracting the feature image of the image to be detected.
Referring to fig. 6, an unsupervised wafer defect detection method based on deep learning: selecting a group of preset samples of the training image as a template and another defect sample as a sample to be repaired, and calculating the Manhattan distance between the sample to be repaired and the corresponding pixel of the template, so that two Manhattan distance graphs can be obtained; the defect number of a group of preset samples is smaller than that of a sample to be repaired; and when the pixels at the same position in the two Manhattan distance maps exceed a segmentation value at the same time, filling the pixels of the template at the position corresponding to the minimum Manhattan distance map into the to-be-repaired area of the to-be-repaired sample so as to complete the restoration of the pixels of the defect area and capture the normal sample.
Referring to fig. 6, an unsupervised wafer defect detection method based on deep learning: the feature map of the d-dimensional channels of the normal sample is extracted with a first Wide ResNet feature extractor, and the first Wide ResNet feature extractor selects the feature map of the normal sample from the second layer of the backbone network.
Referring to fig. 6, an unsupervised wafer defect detection method based on deep learning: and extracting an inverse matrix of the mean and covariance of each coordinate of the feature map of the normal sample of the second layer in the training stage of the convolutional neural network, namely describing each position of the feature map of the normal sample of the second layer by using Gaussian distribution, and storing the position as feature distribution.
Referring to fig. 6, an unsupervised wafer defect detection method based on deep learning: the feature map of the d-dimensional channels of the map to be tested is extracted with the first Wide ResNet feature extractor or another, second Wide ResNet feature extractor.
Referring to fig. 6, an unsupervised wafer defect detection method based on deep learning: the training phase and the inference phase employ the same feature extractor (Wide ResNet-50-2) pre-trained on ImageNet, thereby extracting the feature maps of the respective d-dimensional channels in the training phase and the inference phase.
Referring to fig. 6, an unsupervised wafer defect detection method based on deep learning: and in the reasoning stage, calculating the mahalanobis distance between the values of the coordinates of the feature map of each defect image of the image to be detected and the corresponding feature distribution of the normal sample so as to obtain the mahalanobis distance matrix.
Referring to fig. 6, an unsupervised wafer defect detection method based on deep learning: forming a defect thermodynamic diagram by calculating a mahalanobis distance, i.e. the calculated mahalanobis distance matrix is used to form the defect thermodynamic diagram, and upsampling the mahalanobis distance matrix corresponds to upsampling the defect thermodynamic diagram, the upsampling using a bicubic interpolation algorithm.
Referring to fig. 6, an unsupervised wafer defect detection method based on deep learning: normalization is performed on the defect thermodynamic diagram subjected to upsampling.
Referring to fig. 6, an unsupervised wafer defect detection method based on deep learning: the pixel value after the defect thermodynamic diagram is normalized is between 0 and 1.
Referring to fig. 6, an unsupervised wafer defect detection method based on deep learning: a threshold value for dividing a defective pixel from a normal pixel is selected for the defect thermodynamic diagram, and a defective region is divided from the defect thermodynamic diagram by threshold segmentation.
Referring to fig. 6, an unsupervised wafer defect detection method based on deep learning: the threshold dividing defective pixel values from normal pixel values takes an arbitrary value in [0, 1].
Referring to fig. 5, a computer operable with the neural network CNN includes, but is not limited to: a server, a personal computer system or mainframe computer system, a computer workstation, an image computer, a parallel processor, or any other device known in the neural network arts. In general, the term computer system is defined broadly to encompass any device having one or more processors that execute instructions from a memory medium. The relevant computer program of the neural network CNN is stored in a computer-readable medium such as a memory. Exemplary computer-readable media include read-only memory, random-access memory, magnetic or optical disks, magnetic tape, and the like.
Referring to fig. 5, in an alternative embodiment, for the neural network CNN, the integrated-circuit defect evaluation method and the functions described for the neural network CNN may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the corresponding evaluation methods and functions may be stored on, or run as one or more instructions or code on, a computer-readable medium. Computer-readable media include computer storage media and communication media, the latter including any medium that can transfer a computer program from one place to another. Storage media may be any available media that can be accessed by a general-purpose or special-purpose computer. By way of alternative example, and not limitation, such computer-readable media may comprise: RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage, optical storage, or any other medium that can carry or store the corresponding program code in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer or processor. Any connection may also properly be regarded as a computer-readable medium: for example, if the software is transmitted from a website, server, or other remote source over a coaxial cable, fiber-optic cable, twisted pair, digital subscriber line, or wireless technology (infrared, radio, microwave), then the coaxial cable, fiber-optic cable, twisted pair, digital subscriber line, or wireless technology is included in the definition of the medium. Disks and discs as used herein include compact discs, laser discs, optical discs, digital versatile discs, floppy disks and Blu-ray discs, where disks usually reproduce data magnetically while discs reproduce data optically with lasers.
Combinations of the above should also be included within the scope of computer-readable media.
The foregoing description and drawings set forth exemplary embodiments of the specific structure of the embodiments, and the above disclosure presents presently preferred embodiments, but is not intended to be limiting. Various alterations and modifications will no doubt become apparent to those skilled in the art after having read the above description. It is therefore intended that the appended claims be interpreted as covering all alterations and modifications as fall within the true spirit and scope of the invention. Any and all equivalent ranges and contents within the scope of the claims should be considered to be within the intent and scope of the present invention.