Paper (open access)

Automated visual inspection of CMS HGCAL silicon sensor surface using an ensemble of a deep convolutional autoencoder and classifier


Published 25 August 2023 © 2023 The Author(s). Published by IOP Publishing Ltd
Citation: Sonja Grönroos et al 2023 Mach. Learn.: Sci. Technol. 4 035028. DOI: 10.1088/2632-2153/aced7e


Abstract

More than a thousand 8'' silicon sensors will be visually inspected to look for anomalies on their surface during the quality control preceding assembly into the High-Granularity Calorimeter for the CMS experiment at CERN. A deep learning-based algorithm that pre-selects potentially anomalous images of the sensor surface in real time was developed to automate the visual inspection. The anomaly detection is done by an ensemble of independent deep convolutional neural networks: an autoencoder and a classifier. The algorithm was deployed and has been continuously running in production, and data gathered were used to evaluate its performance. The pre-selection reduces the number of images requiring human inspection by 85%, with a recall of 97%, and saves 15 person-hours per batch of a hundred sensors. Data gathered in production can be used for continuous learning to improve the accuracy incrementally.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Silicon sensors are used in high-energy physics experiments due to their sufficient radiation tolerance, energy resolution and cost-effectiveness. In the high radiation area, the active element of the High-Granularity Calorimeter (HGCAL) [1], which will replace the endcap calorimeters of the CMS [2] experiment at the Large Hadron Collider (LHC) [3], will consist of more than 27 000 hexagonal 8'' silicon sensor wafers to achieve unprecedented transverse and longitudinal segmentation. An HGCAL sensor is shown in figure 1(left). The producer of the sensors is Hamamatsu Photonics K.K., and the sensors will be delivered to CERN in batches. The HGCAL will be installed during the third long shutdown of the LHC in 2026/27. In order to ensure that the sensors meet the criteria for operation at the LHC, a fraction (5%) of each batch will undergo quality control in a dedicated clean room at CERN. Thus, more than a thousand sensors will be processed over the course of several months.

Figure 1. (left) An HGCAL silicon sensor wafer, with a pad zoomed in. The diameter of the sensor is 8''. (right) An example of a scan map.

The quality control procedure adopted during the construction of the CMS silicon trackers involved a visual inspection in addition to quantification of several electrical properties [4]. Similarly, a major part of the quality control of the HGCAL sensors is the electrical characterization of the sensors, during which the sensors are biased up to 1000 V [5]. Defects and dust on the sensor surface can potentially lead to an electrical failure of the sensor. Examples of a typical defect, a scratch, and a dust particle are shown in figure 2. Given that these defects are rare and unwanted, they are referred to as anomalies. The anomalies can occur during manufacturing, packaging, delivery, or associated handling of the sensors. In an effort to prevent failures, the sensor surface is visually inspected and cleaned prior to the electrical characterization.

Figure 2. Scan images in RGB format of anomalous sensors with a scratch (left), and a dust particle (right). The difference in color is induced by lighting conditions.

Traditionally, the visual inspection of silicon sensors is carried out manually with the help of a microscope, because aside from exceptionally severe scratches, most of the anomalies are invisible to the naked eye. However, dozens of square meters of sensor surface will be inspected during the assembly of the HGCAL, and therefore, a standardized and automated method must be in place. Prior to this work, hundreds of microscope images were taken using a scan program, and the images were inspected on a computer monitor by a human operator. This work presents a deep learning-based pre-selection algorithm (PSA) that fully automates the visual inspection. In addition, the PSA is believed to reduce human bias in the visual inspection [6]. The PSA is built upon the proof-of-concept work described in [7]. Although the PSA is presented in the context of HGCAL sensor quality control, the same approach could be applied to similar use cases of automating the visual inspection of images.

The PSA proposed in this work detects anomalous scan images via an ensemble of a deep convolutional autoencoder (AE) and a deep convolutional classifier neural network. The AE acts on each scan image as a pre-processing step to enhance anomalies, and the classifier acts on patches of the image, allowing for localization of anomalous areas (annotation). Additionally, due to the specific imaging setup used, a second classifier is employed to identify and disregard the patches corresponding to the background on which the sensor is placed, rather than the sensor surface itself. The PSA has been deployed in a clean room at CERN and data gathered in production are used to evaluate the performance of the anomaly detection. The performance is mainly measured using two metrics: First, the false negative rate (FNR), defined as

$\mathrm{FNR} = \frac{\mathrm{FN}}{\mathrm{FN} + \mathrm{TP}},$    (1)

should be minimized for a reliable PSA. Second, a relatively large false positive rate (FPR), defined as

$\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}},$    (2)

is allowed, but an upper limit of 10% is set to sufficiently automate the visual inspection.
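
As a minimal illustration of equations (1) and (2), the following Python sketch (not part of the deployed software) computes the FNR and FPR from binary ground-truth and predicted labels:

    import numpy as np

    def fnr_fpr(y_true, y_pred):
        """False negative rate and false positive rate for binary labels
        (1 = anomalous, 0 = normal), as in equations (1) and (2)."""
        y_true = np.asarray(y_true, dtype=bool)
        y_pred = np.asarray(y_pred, dtype=bool)
        fn = np.sum(y_true & ~y_pred)   # anomalous images missed by the pre-selection
        tp = np.sum(y_true & y_pred)    # anomalous images correctly pre-selected
        fp = np.sum(~y_true & y_pred)   # normal images incorrectly pre-selected
        tn = np.sum(~y_true & ~y_pred)  # normal images correctly left out
        return fn / (fn + tp), fp / (fp + tn)

    # Example: 3 anomalous and 5 normal images
    fnr, fpr = fnr_fpr([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 1, 0, 0, 0])
    print(f"FNR = {fnr:.2f}, FPR = {fpr:.2f}")  # FNR = 0.33, FPR = 0.20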

This paper is structured as follows. The data acquisition and characteristics are described in section 2. An introduction to automated visual inspection is given in section 3, followed by a description of the proposed architecture and the model training process in section 4. The results of the deployment at CERN are presented in section 5. In section 6, the causes of incorrect predictions and the benefits and generalizability of the method are discussed. In addition, a proposal for continuous improvement of the anomaly detection capabilities is presented. Finally, conclusions are provided in section 7.

2. Setup and data set

A custom semi-automated visual inspection system, equipped with a programmable xy-stage, has been implemented in the clean room for HGCAL sensor testing. The xy-stage is a motorized table allowing horizontal motion, enabling precise and controlled movement of the sensor in a predefined scan pattern. By combining a microscope and a camera, the sensor is carefully examined as it is moved beneath the imaging setup. An example of a scan map is shown in figure 1(right), where 385 images are taken. A scan image, referred to as a whole image, contains 2720 × 3680 pixels and is stored in Bayer format [8]. The Bayer format is a particular arrangement of RGB color filters common to camera systems, which retains the color information but reduces the required bits per pixel from 24 to 8 bits. Examples of whole images in RGB format are shown in figure 2.
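
For illustration only, the snippet below builds a stand-in Bayer-format image of the stated size and demosaics it to RGB with OpenCV; the BGGR filter arrangement is an assumption, as the actual camera pattern is not specified here.

    import cv2
    import numpy as np

    # A stand-in for one raw scan image: 2720 x 3680 pixels, 8 bits per pixel,
    # stored in Bayer format (the BGGR filter arrangement is an assumption).
    bayer = np.random.default_rng(0).integers(0, 256, size=(2720, 3680), dtype=np.uint8)

    # Demosaic to 24-bit RGB only for display and manual inspection; the inference
    # pipeline itself operates directly on the single-channel Bayer image.
    rgb = cv2.cvtColor(bayer, cv2.COLOR_BayerBG2RGB)
    print(bayer.nbytes, rgb.nbytes)  # 10 009 600 vs. 30 028 800 bytes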

The images acquired during the semi-automated visual inspection require human inspection, which takes approximately two seconds per image. A small fraction of the images of a typical scan are anomalous, meaning that the operator has to inspect hundreds of normal images to find the anomalous ones. This makes the semi-automated visual inspection tiring, slow, and typically not 100% effective due to visual fatigue. Moreover, it can be biased by the inspector's overexposure to normal images which are prevalent in the data set. In addition, since multiple inspectors with varying experience and alertness share the quality control task, the visual inspection is further biased by their subjectivity.

The environmental conditions, such as zoom level, sensor alignment underneath the microscope and lighting conditions, can change in between scans, and the PSA must be invariant to these changes. An example of changing lighting conditions between the measurements is demonstrated in figure 2, where the left and right images differ in the overall hue. As the PSA is integrated into the data acquisition of the semi-automated system, it must be real-time, meaning that the images should be evaluated during the scan.

Taking into consideration the data imbalance, variable environment, and requirements for accuracy and speed, the PSA presented in this paper was developed using images acquired during semi-automated visual inspection of 50 sensors. As each scan results in hundreds of images, the scans of 50 sensors produce a data set of more than 25 000 images. The data set contains approximately 700 images with different types of anomalies, such as scratches, dust, and stains of variable sizes, and was acquired in batches over the course of several months. Fifteen sensor scans acquired after the deployment of the PSA are used to evaluate its performance.

3. Overview of existing methods for automated visual inspection

The task of the PSA is the identification and localization of rarely occurring outliers in data. Sometimes, an analytical approach in the form of a series of image processing filters and functions can be used to detect anomalies instead of more complex methods such as deep learning. For example, anomalies have been detected from images of the silicon strip sensors of the Inner Tracker of the ATLAS detector [9] using methods such as a Gaussian filter and Sobel derivatives [10, 11]. However, due to changing environmental conditions (including room lighting) and the characteristics of the normal HGCAL sensor surface, these methods cannot produce robust results. Thus, deep learning, and specifically deep convolutional neural networks (CNNs) [12], are explored in this work. CNNs are known to perform well in image classification tasks. Several classifier networks have been developed, such as the VGG16 [13]. Characteristic of its architecture are sequential convolutional layers with small 3 × 3 filters and 2 × 2 max pooling layers. However, classifier networks are not object detectors, as they do not indicate the location of the object. Instead, a widely used network for object detection is the Region-based Convolutional Neural Network (R-CNN) [14], which performs object detection via three distinct networks. The first network is a region proposal network, which extracts up to 2000 regions of interest from the input image. The regions of interest are passed on to the second model, which is a CNN that extracts the features of each region. Finally, a classifier CNN is applied on the features to produce the classification output in the form of bounding boxes. The R-CNN has been used in automating the visual inspection of silicon micro-strip sensors of the CBM experiment at the FAIR facility [15].

Unfortunately, the R-CNN is too slow for real-time object detection, as thousands of iterations are required per image to produce the detection output. Thus, faster versions of the model, such as the Faster R-CNN [16], have been developed. However, a preferred approach for real-time object detection is to perform both the region and feature extraction in the same network and with a single iteration per image. An example of such a network is the You Only Look Once (YOLO) network [17], which splits an image into cells via a grid and predicts n bounding boxes and the class probabilities for each cell. The use of VGG16 and a version of YOLO known as YOLOv4-tiny has been studied in automating the visual inspection of wire bonds of the HGCAL sensor modules [18].

The CNNs discussed above have to be trained in a supervised fashion, with training samples from all classes. Self-supervised anomaly detection can be implemented using AEs, which are composed of two neural networks. The first network, known as the encoder, reduces the dimensionality of the input data into a representation referred to as the latent space. The second network is a decoder, which reconstructs the latent space back into the original dimensionality of the input. An AE is trained by minimizing a loss function expressed as the reconstruction error. The reconstruction error is usually quantified as the mean absolute error between the input $\textbf{y}$ and the reconstructed output $\hat{\textbf{y}}$, and defined as

$L_1(\textbf{y},\hat{\textbf{y}}) = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|.$    (3)

The squared error, defined as

$L_2(\textbf{y},\hat{\textbf{y}}) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,$    (4)

can also be used. For anomaly detection, the AE is trained on only normal samples, and thus, it will reconstruct anomalies poorly. Convolutional AEs have been used for anomaly detection from image-like data in multiple applications [19–21], also at the LHC [22, 23]. For images, the reconstruction error is calculated pixel-wise, and localized increases in the reconstruction error indicate anomalies.
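
A minimal NumPy sketch of the pixel-wise reconstruction error and the two scalar losses of equations (3) and (4), for illustration only:

    import numpy as np

    def error_map(x, x_hat):
        """Pixel-wise absolute reconstruction error; localized peaks indicate anomalies."""
        return np.abs(x.astype(np.float32) - x_hat.astype(np.float32))

    def l1_loss(x, x_hat):
        """Mean absolute error of equation (3)."""
        return float(np.mean(error_map(x, x_hat)))

    def l2_loss(x, x_hat):
        """Mean squared error of equation (4), used to train the autoencoder in section 4.3."""
        return float(np.mean(error_map(x, x_hat) ** 2))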

4. Proposed approach

In this section, the architecture of the PSA is described. The full inference pipeline of a whole image consists of the following steps:

  • (i)  
    Apply a patching grid.
  • (ii)  
    Apply a background detecting classifier, referred to as the background detector, to patches of the whole image.
  • (iii)  
    Apply an AE to the whole image and calculate the reconstruction error as the pixel-wise absolute difference D.
  • (iv)  
    Apply an anomaly detecting classifier, referred to as the anomaly detector, to the patches of D.

Thus, each whole image is iterated over three times, and notably, the images are kept in the Bayer format throughout the entire inference pipeline. The anomaly detection process is schematically shown in figure 3. The models were implemented using Keras [24] and TensorFlow 2 [25], and trained on an NVIDIA GeForce GTX 1080 GPU [26]. For each model, the corresponding data set was split into train, validation and test subsets following an 80:10:10 ratio, and the hyperparameters of all three models were optimized manually using their respective validation losses as a metric.
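
The sketch below outlines this inference pipeline. The model interfaces (Keras models queried with predict, single-channel inputs, a 0.5 decision threshold) and the helper function are illustrative assumptions and do not reproduce the deployed implementation exactly.

    import numpy as np

    PATCH = 160  # patch side length in pixels

    def patch_grid(image, offset=0):
        """Split an image into non-overlapping PATCH x PATCH tiles; offset=80
        reproduces the shifted secondary grid discussed in section 6.1."""
        h, w = image.shape[:2]
        patches, coords = [], []
        for top in range(offset, h - PATCH + 1, PATCH):
            for left in range(offset, w - PATCH + 1, PATCH):
                patches.append(image[top:top + PATCH, left:left + PATCH])
                coords.append((top, left))
        return np.stack(patches), coords

    def preselect(image, autoencoder, background_detector, anomaly_detector, thr=0.5):
        """Return the coordinates of patches flagged as anomalous; the whole image
        is pre-selected if the returned list is non-empty."""
        patches, coords = patch_grid(image)
        background = background_detector.predict(patches[..., None]).ravel() > thr
        # Autoencode the whole image and compute the pixel-wise reconstruction error D.
        x = image.astype(np.float32)[None, ..., None]
        d = np.abs(x - autoencoder.predict(x))[0, ..., 0]
        d_patches, _ = patch_grid(d)
        anomalous = anomaly_detector.predict(d_patches[..., None]).ravel() > thr
        return [c for c, a, b in zip(coords, anomalous, background) if a and not b]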

Figure 3. The inference pipeline of the anomaly detection. The input image x is processed with an autoencoder, and the pixel-wise reconstruction error D is calculated between the input and the autoencoder output $\textbf{x}^{^{\prime}}$. D is given as input to the anomaly detecting classifier in patches.

4.1. Patching

The whole images are split into patches using a fixed grid. The patches are 160 × 160 pixels in size, and the default grid covers the entire image, resulting in 17 × 24 patches. The patching can be considered as a simplified version of the region proposal for R-CNN. For training the background and anomaly detectors, the patches are given the corresponding binary labels: 0 for sensor surface/normal and 1 for background/anomalous. Examples of anomalous and normal patches processed with the AE are shown in figure 4.

Figure 4. Sample of anomalous (top) and normal (bottom) patches of the sensor surface used to train the anomaly detector.

The main reasons for patching are that the fraction of area covered by an anomaly is much larger for an anomalous patch than for an anomalous whole image, and that the input size for a classifier becomes smaller. Also, the patching allows the general location of the anomaly in the whole image to be computed. In addition, data augmentation can be done more efficiently by applying it to anomalous patches only, and a class can be under-sampled flexibly from patched data.

4.2. Background elimination

The approach outlined in this paper involves combining an AE with an anomaly detector. It is important to note that the background detector is not a required part of the inference pipeline and is included based on the particular imaging setup. In a typical scan of an HGCAL sensor, approximately 15% of the images include the background, which refers to the surface on which the sensor is placed. Occasionally, the background, which is usually a black sheet of plastic, contains features which can be incorrectly selected as anomalies. While FNR is the most important metric to optimize, to further reduce the FPR of the PSA, a background detector is applied to the patches before the whole image is autoencoded. Anomalies in the background patches are ignored.

A CNN with four convolutional layers using the ReLU activation, followed by dropout layers with a rate of 20%, and totaling 84 401 trainable parameters, was trained on patches that had been given binary labels corresponding to sensor surface and background. The training data and parameters are described in table 1. Classification between the sensor surface and background is a trivial task for a deep CNN, and a test accuracy of over 99% was achieved.
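
A possible sketch of such a background detector is shown below; the filter counts, pooling layers and output head are illustrative choices, since the text specifies only four convolutional layers with ReLU, 20% dropout layers and a total of 84 401 trainable parameters.

    import tensorflow as tf
    from tensorflow.keras import layers, models

    def build_background_detector(patch_size=160):
        """Sketch of the background detector: four convolutional layers with ReLU,
        each followed by 20% dropout, and a sigmoid output (0 = sensor surface,
        1 = background). Filter counts and pooling are illustrative."""
        model = models.Sequential([
            layers.Input(shape=(patch_size, patch_size, 1)),  # single-channel Bayer patch
            layers.Rescaling(1.0 / 255),
            layers.Conv2D(8, 3, activation="relu"), layers.MaxPooling2D(), layers.Dropout(0.2),
            layers.Conv2D(16, 3, activation="relu"), layers.MaxPooling2D(), layers.Dropout(0.2),
            layers.Conv2D(16, 3, activation="relu"), layers.MaxPooling2D(), layers.Dropout(0.2),
            layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(), layers.Dropout(0.2),
            layers.GlobalAveragePooling2D(),
            layers.Dense(1, activation="sigmoid"),
        ])
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                      loss="binary_crossentropy", metrics=["accuracy"])
        return model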

Table 1. Summary of training data and parameters for the background detector and the autoencoder. For the background detector, Class 0 refers to sensor surface and Class 1 to background.

                            Background detector      Autoencoder
Whole images                962                      16 000
Class 0 training patches    347 833                  —
Class 1 training patches    20 183                   —
Batch size                  256                      1
Epochs                      55                       277
Optimizer                   Adam                     Adam
Learning rate               10⁻⁴                     10⁻⁴
Loss function               Binary cross-entropy     L2

4.3. Anomaly enhancement using an autoencoder

The structure of the AE is schematically shown in figure 5. The encoder consists of five convolutional layers, and in mirror-like fashion, the decoder consists of five transposed convolutional layers. The compression factor of the AE is 1600. The Exponential Linear Unit [27] was used as the activation function. In total, the AE has 126 353 trainable parameters, and it was trained with 16 000 normal whole images for 277 epochs, until validation loss reached a plateau. The AE was trained using the L2 loss. Due to memory constraints set by the GPU used for training, the batch size had to be set to one. The training data and parameters are summarized in table 1.
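
The following Keras sketch illustrates an autoencoder of this shape; the filter counts and strides are illustrative and are not tuned to reproduce the exact compression factor of 1600 or the 126 353 trainable parameters.

    import tensorflow as tf
    from tensorflow.keras import layers, models

    def build_autoencoder(height=2720, width=3680):
        """Sketch of the convolutional autoencoder: five strided convolutions with
        ELU activations in the encoder, mirrored by five transposed convolutions
        in the decoder. Filter counts and strides are illustrative."""
        inputs = layers.Input(shape=(height, width, 1))          # Bayer-format whole image
        x = inputs
        for filters in (8, 8, 4, 4, 2):                          # encoder
            x = layers.Conv2D(filters, 3, strides=2, padding="same", activation="elu")(x)
        for filters in (4, 4, 8, 8):                             # decoder (mirror image)
            x = layers.Conv2DTranspose(filters, 3, strides=2, padding="same", activation="elu")(x)
        outputs = layers.Conv2DTranspose(1, 3, strides=2, padding="same")(x)
        model = models.Model(inputs, outputs, name="autoencoder")
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4), loss="mse")  # L2 loss
        return model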

Figure 5. An illustration of the dimensions of the autoencoder, which consists of two convolutional neural networks: the encoder and the decoder. The autoencoder takes a whole image as an input, and the compression factor into the latent space is 1600.

The AE can be interpreted as a data pre-processing step that makes the subsequent anomaly detection and its localization more robust against environmental changes. The normal and constant features in the images are reduced, while the anomalies are enhanced. An example of how the AE reconstructs an anomalous whole image is shown in figure 6.

Figure 6. An example of the autoencoder output (center) for an anomalous input image (left). (right) Reconstruction error, measured as the absolute pixel-wise difference between the autoencoder output and the input. A dust particle is enhanced compared to the normal area, and can be easily isolated. The anomaly is zoomed in on all images.

4.4. Anomaly detection

First, the performance of the AE as a standalone anomaly detector was studied using 1465 anomalous and 225 370 normal patches as training data. A threshold for the reconstruction error was determined based on a validation data set, consisting of 157 anomalous and 27 179 normal patches, such that the validation FNR and FPR were minimized. An increase in AE reconstruction error for anomalous patches is visible in figure 7, where the selected threshold is also indicated (footnote 1). However, as the distributions overlap greatly, the AE reconstruction error cannot be used as a robust enough classifier. The test FNR is 27%, while FPR is 37%. Therefore, while the AE works very well as a pre-processing step that enhances the anomalies, it cannot be efficiently used to detect and localize them. To tackle anomaly detection and localization, an additional classifier was trained to detect the anomalies.
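
For illustration, a threshold balancing the validation FNR and FPR could be selected with a simple sweep such as the one below; the exact selection procedure used in this work is not detailed here, so the criterion is an assumption.

    import numpy as np

    def pick_threshold(errors_normal, errors_anomalous, n_steps=1000):
        """Sweep candidate thresholds on validation reconstruction errors and
        return the one minimizing the sum of FNR and FPR (assumed criterion)."""
        lo = min(errors_normal.min(), errors_anomalous.min())
        hi = max(errors_normal.max(), errors_anomalous.max())
        best_t, best_cost = lo, np.inf
        for t in np.linspace(lo, hi, n_steps):
            fnr = np.mean(errors_anomalous < t)   # anomalous patches below the threshold
            fpr = np.mean(errors_normal >= t)     # normal patches at or above the threshold
            if fnr + fpr < best_cost:
                best_t, best_cost = t, fnr + fpr
        return best_t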

Figure 7. Normalized autoencoder reconstruction error for anomalous and normal patches. A threshold to classify new samples is also illustrated.

A modified version of the VGG16 network was used as the anomaly detector classifier. The following modifications were made to the original network structure: the input size was decreased to 160 × 160, the number of filters in the hidden layers was decreased, dropout layers were added in between the fully connected layers, and the final softmax layer was replaced with a sigmoid layer. A normalizing pre-processing layer was used to scale features between zero and one. The resulting CNN with 23 layers has 2 847 777 trainable parameters. The architecture is illustrated in figure 8, and a summary of the training data and parameters for the anomaly detector is given in table 2. In total, ∼10⁶ training patches were used.
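
A sketch of a reduced VGG16-style anomaly detector is given below; the filter and unit counts are illustrative, and the focal loss (see section 4.4 and table 2) is taken from recent TensorFlow releases as BinaryFocalCrossentropy. This is not the exact deployed architecture.

    import tensorflow as tf
    from tensorflow.keras import layers, models

    def build_anomaly_detector(patch_size=160):
        """Sketch of the modified VGG16 anomaly detector: blocks of 3x3 convolutions
        with 2x2 max pooling, a normalizing input layer, dropout between the fully
        connected layers and a sigmoid output. Filter and unit counts are illustrative."""
        model = models.Sequential([layers.Input(shape=(patch_size, patch_size, 1)),
                                   layers.Rescaling(1.0 / 255)])  # scale features to [0, 1]
        for filters in (16, 32, 64, 128):                          # reduced VGG-style blocks
            model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
            model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
            model.add(layers.MaxPooling2D(2))
        model.add(layers.Flatten())
        model.add(layers.Dense(256, activation="relu"))
        model.add(layers.Dropout(0.5))
        model.add(layers.Dense(256, activation="relu"))
        model.add(layers.Dropout(0.5))
        model.add(layers.Dense(1, activation="sigmoid"))
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                      loss=tf.keras.losses.BinaryFocalCrossentropy(
                          apply_class_balancing=True, alpha=0.25, gamma=2.0),
                      metrics=["accuracy"])
        return model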

Figure 8. An illustration of the architecture of the anomaly detector, which is a modified version of the VGG16 network.

Table 2. Summary of training data and parameters for the anomaly detector. Class 0 refers to normal and Class 1 to anomalous.

Whole images                2813
Class 0 training patches    902 496
Class 1 training patches    1465
  after augmentation        8790
Batch size                  256
Epochs                      20
Optimizer                   Adam
Learning rate               10⁻⁴
Loss function               Focal loss

During inference, the expected normal-to-anomalous patch ratio is approximately 1900. Such an imbalanced training data set can lead to poor performance of the classifier, and therefore, data augmentation techniques and under-sampling of the normal class were used on the training data set to bring this ratio to around 100. The anomalous patches were augmented by applying random and uniform brightness changes in the range 0.75–1.25, rotations by multiples of 90°, and horizontal and vertical flipping. In addition, the focal loss [28], a dynamically weighted binary cross-entropy loss commonly used with imbalanced training data sets, was used as the loss function with default parameters γ = 2 and α = 0.25.
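
A minimal NumPy sketch of this augmentation scheme applied to a single anomalous patch; the flip probabilities and the clipping to the 8-bit range are assumptions.

    import numpy as np

    def augment_anomalous_patch(patch, rng):
        """One random augmentation of an anomalous patch: a uniform brightness
        factor in [0.75, 1.25], a rotation by a multiple of 90 degrees and random
        flips. Flip probabilities and clipping to 8-bit range are assumptions."""
        out = patch.astype(np.float32) * rng.uniform(0.75, 1.25)  # brightness change
        out = np.rot90(out, k=rng.integers(0, 4))                 # 0, 90, 180 or 270 degrees
        if rng.random() < 0.5:
            out = np.flipud(out)                                  # vertical flip
        if rng.random() < 0.5:
            out = np.fliplr(out)                                  # horizontal flip
        return np.clip(out, 0, 255).astype(patch.dtype)

    rng = np.random.default_rng(seed=0)
    patch = rng.integers(0, 256, size=(160, 160), dtype=np.uint8)      # stand-in anomalous patch
    augmented = [augment_anomalous_patch(patch, rng) for _ in range(6)]  # e.g. 8790 / 1465 = 6 variants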

4.5. Validation

In production, the pre-selected whole images are shown to an inspector. In addition, a fraction (10%) of normal images is added to the set shown to the inspector to help catch false negatives. The inspector validates the predictions by accepting or rejecting the pre-selected images and by adding any anomalous patches that were missed. If only pre-selected images were shown, there could be a natural tendency to trust the PSA and approve all images as anomalous. The validated data set is used as ground truth for performance monitoring and for continuous learning to incrementally improve the accuracy of the PSA.

5. Results

Fifteen sensor scans acquired after the deployment of the PSA, corresponding to 2 052 240 patches from 5030 whole images, were manually given ground truth labels. In addition to the FNR and FPR, defined in section 1, the performance is evaluated with the following metrics:

Recall

Recall measures the ability of a classification model to correctly identify all positive samples, minimizing the occurrence of false negatives. It is defined as:

$\mathrm{Recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}.$    (5)

Specificity

Specificity measures the ability of a classification model to accurately identify all negative samples and to minimize the occurrence of false positives. It is defined as:

$\mathrm{Specificity} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}}.$    (6)

Precision

Precision measures a classification model's ability to correctly identify positive samples and to minimize false positives. It is defined as:

$\mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}.$    (7)

F1-score

The F1-score is a balanced measure of precision and recall, and evaluates the overall performance of the classification model. It is defined as:

$F_1 = 2\,\frac{\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$    (8)

Balanced accuracy

Balanced accuracy evaluates a model's performance by considering both recall and specificity. It is defined as:

$\mathrm{Balanced\;accuracy} = \frac{\mathrm{Recall} + \mathrm{Specificity}}{2}.$    (9)
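
For reference, the metrics of equations (5)–(9) can be computed from confusion-matrix counts with a few lines of Python (a convenience sketch, not part of the deployed software):

    def classification_metrics(tp, fp, tn, fn):
        """Metrics of equations (5)-(9) computed from confusion-matrix counts."""
        recall = tp / (tp + fn)
        specificity = tn / (tn + fp)
        precision = tp / (tp + fp)
        f1 = 2 * precision * recall / (precision + recall)
        balanced_accuracy = (recall + specificity) / 2
        return {"recall": recall, "specificity": specificity, "precision": precision,
                "F1-score": f1, "balanced accuracy": balanced_accuracy}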

The performance results are reported in table 3, and the absolute and normalized confusion matrices are shown in figure 9 for the patches (left) and whole images (right). A whole image is pre-selected if one or more patches in it are classified as anomalous. Twelve anomalous whole images are incorrectly not pre-selected, resulting in an FNR of 2.5%. Two missed images were of novel anomalies and ten were of light scratches and small dust particles. Examples of missed anomalous whole images are shown in figure 10.

Figure 9. Confusion matrices for the patches (left) and the whole images (right). Top row corresponds to the confusion matrix normalized over ground truths.

Figure 10. Examples of missed anomalous whole images: (a) a light stain, (b), (c) light dust particles, (d) a novel black anomaly, (e), (f) minor dust particles.

Table 3. Results for 5030 whole images.

Metric                   Value (%)
Recall                   97.46
Specificity              93.75
Precision                61.75
False negative rate      2.54
False positive rate      6.25
F1-score                 75.60
Balanced accuracy        95.60

The FNR of whole images is lower compared to the FNR of patches because most anomalous whole images have multiple anomalous patches. Examples of this can be seen in figures 11(a) and (b), where the scratch and dust particle have caused several patches to be classified as anomalous. Therefore, an anomalous whole image will still be selected even if not all anomalous patches in it are classified correctly. The FPR of whole images is less than 10%. Examples of whole images pre-selected to be anomalous are shown in figure 11, where images (d)–(f) are false positives. In total, 85% of all whole images can be considered normal and do not require any human inspection.

Figure 11. Examples of pre-selected whole images with the annotated patches: (a) a large scratch, (b) a dust particle and a stain, (c) a small dust particle, (d) a false positive, (e) a false positive on contact marks, (f) a false positive on a guard ring.

An average sensor scan consisted of 335 images, for which the picture-taking time was 9.2 min. On average, an inspector inspects a whole image in two seconds, so evaluating one scan manually takes 11 min. On an RTX A2000 GPU, the inference time is less than five minutes for the scan. Assuming that the PSA removes the need to visually inspect 85% of the images, the human inspection required afterwards takes less than two minutes, corresponding to 15 person-hours saved per batch of a hundred sensors. This is a significant reduction of human labor. The GPU is necessary, as the short run time allows the picture-taking and the pre-selection to run in parallel. By contrast, the evaluation of a scan would take approximately 45 min on a regular CPU, significantly delaying the subsequent electrical characterization.
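
The quoted savings can be checked with a back-of-the-envelope calculation, taking the 85% reduction and the two-second inspection time as exact; the result agrees with the figures above to within rounding.

    # Back-of-the-envelope check of the quoted time savings.
    images_per_scan = 335
    seconds_per_image = 2
    fraction_removed = 0.85
    sensors_per_batch = 100

    manual_min = images_per_scan * seconds_per_image / 60               # ~11 min per scan
    residual_min = manual_min * (1 - fraction_removed)                  # < 2 min per scan
    saved_hours = (manual_min - residual_min) * sensors_per_batch / 60  # ~15 person-hours per batch
    print(f"{manual_min:.1f} -> {residual_min:.1f} min per scan; "
          f"{saved_hours:.1f} person-hours saved per batch of {sensors_per_batch}")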

6. Discussion

6.1. False negatives and positives

It was observed that false negatives can sometimes be attributed to the fixed grid. If only the default grid is used during inference, anomalies overlapping with or close to the grid lines can be missed. A proposed method for combating this is applying a secondary grid, which is illustrated in figure 3. The secondary grid is 16 × 23 patches in size, and it is overlaid on top of the default grid so that the patching is shifted from the top left corner by 80 pixels in both directions. Only the whole images not selected using the default grid would then be evaluated with the additional secondary grid. Evaluating an average scan using both grids would take less than six minutes on an RTX A2000 GPU, which is still faster than the picture-taking itself. Therefore, addition of the secondary grid is expected to decrease the FNR, without substantially increasing the inference time.

In contrast, as the anomaly detection is invariant to the scan image location on the sensor, false positives can often be explained by the characteristics of the sensor surface. The anomaly detector tends to select patches at a guard ring (footnote 2), cell numbers and cell borders as false positives, due to the increase in the AE reconstruction error in these areas. In addition, contact marks from previous electrical testing are a common source of false positives, see (e) in figure 11. However, these false positives are hard to eliminate without adding information on the scan image location to the analysis.

6.2. Continuous learning

Given that new data will be measured continuously, the PSA should be retrained to adapt to them. Training the AE on new normal images would not change its intended behavior of reconstructing anomalies poorly, so it does not require retraining. The background detector is also considered robust enough not to require immediate retraining. However, the accuracy of the anomaly detector could be improved by training with novel anomalous patches. For example, training instances such as the missed anomaly (d) in figure 10 did not exist in the original training data set.

Anomalous images acquired after the initial deployment were given ground truth labels to extend the training data set. The anomaly detector was retrained starting from randomly initialized weights with 2.1 times more anomalous patches than the original model. An independent test data set was used to compare the original and retrained models. The test data consists of 136 anomalous whole images and 754 normal whole images, for which the test metrics are presented in table 4. As can be seen, the retrained model performs significantly better on the test set.

Table 4. The performance of the pre-selection algorithm using the original and retrained anomaly detectors. Metrics are calculated for whole images using an independent test set. For each metric, the better value is marked with an asterisk.

Metric                   Original (%)    Retrained (%)
Recall                   94.9            96.3 *
Specificity              85.0            87.8 *
Precision                53.3            58.7 *
False negative rate      5.2             3.7 *
False positive rate      15.0            12.2 *
F1-score                 68.0            72.9 *
Balanced accuracy        90.0            92.1 *

6.3. Labor optimization and method generalizability

The main benefit of the PSA is the improved utilization of resources. Typically, the inspectors engaged in visual inspection are highly specialized experts: researchers in fundamental particle physics or highly skilled engineers. Due to the budget constraints of the HGCAL project, hiring and training additional personnel to perform the visual inspection is not feasible. Therefore, these experts are also assigned the manual task of visual inspection, which is not the most efficient use of their expertise. With the implementation of the proposed PSA, the experts can dedicate more of their valuable time and abilities to their core responsibilities.

Furthermore, the PSA is highly generalizable. It can be applied to partial scans and is invariant to the image location on the sensor. Objects with varying shapes can be inspected by modifying the scan pattern. Additionally, while this study focused on a specific use case and incorporated an additional background detector, the general approach of combining an autoencoder and a classifier could be used to automate other quality control tasks that involve visual inspection and the identification of (small) anomalies.

7. Conclusion

A deep learning-based pre-selection algorithm (PSA) that fully automates the visual inspection of the silicon sensors produced for the construction of the CMS HGCAL detector was developed. An ensemble of a deep convolutional autoencoder and a neural network for classification is used, with patching applied before the classification to allow the general localization of the anomalies in the images of the sensors. The automated visual inspection was deployed and is now a vital part of the quality control of the analyzed sensors.

The performance of the PSA was evaluated using fifteen full sensor scans acquired in production in a clean room dedicated to sensor testing at CERN. The recall, which measures the fraction of anomalous images that are found, was 97.46%, with an acceptable FPR of 6.25%. The images are evaluated in real time, and approximately 85% of all images can be discarded as normal, thus removing the need for human labor to inspect them. The developed automated visual inspection is standardized, and therefore also believed to be less biased by the subjectivity of a human inspector. On average, it saves 9 min of inspection time per sensor, and for each batch of a hundred sensors, this corresponds to 15 person-hours less required to manually inspect the images. As a typical inspector is a highly skilled expert who performs the visual inspection alongside their other responsibilities at CERN, the PSA improves the utilization of expert resources.

The accuracy was considered sufficient for deployment, even though the PSA was shown to fail to select a small fraction of images with light and small scratches and dust particles, in addition to novel types of anomalies. It was demonstrated that as more anomalous images are acquired in production, the data can be used to retrain the anomaly detection model to further improve the accuracy of the PSA.

A major advantage of the presented approach is its intrinsic generality. The algorithm acts on microscope images of the sensor surface, each of which covers only a small fraction of the total surface area of a full 8'' sensor, and the anomaly detection is invariant to the image location on the sensor. Thus, the PSA is applicable to variable or incomplete scans, or partial sensors.

Thanks to its generality, accuracy and speed, the presented architecture of the pre-selection and annotation model could be used in other applications of automating the detection of small anomalies from images taken in a changing environment.

Acknowledgments

M P and S G are supported by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (Grant Agreement No. 772369). The authors would like to thank Dr Thorben Quast for guidance and suggestions. The authors would also like to acknowledge the members of the CMS HGCAL group for their contributions to the collection of the data set.

Data availability statement

The data cannot be made publicly available upon publication because they contain commercially sensitive information. The data that support the findings of this study are available upon reasonable request from the authors.

Footnotes

1. Some patches, e.g. of the black areas on the sensor surface, are easy to reconstruct by the AE, appearing as the small bump at the lower range of the reconstruction error in figure 7.

2. A guard ring is a structure at the periphery of a sensor designed to protect it from currents from the cutting edge, and is shown in figure 11(f) as the black line with false positives.
