5.1 Defense Process in Image Classification
Primary Activation Pattern Localization: To defend against condensed adversarial attacks in the image domain, we mainly rely on input semantic inconsistency at the input-pattern level. Therefore, we need to locate the primary activation source in the input image by adopting a CNN activation visualization method, Class Activation Mapping (CAM) [49].
Let \(A_k(x,y)\) denote the value of the \(k{\rm {th}}\) activation in the last convolutional layer at spatial location \((x,y)\). We can compute the sum of all activations at spatial location \((x,y)\) in the last convolutional layer as
\[
A_{T}(x,y) = \sum _{k=1}^{K} A_k(x,y),
\]
where K is the total number of activations in the last convolutional layer. A larger value of \(A_{T}(x,y)\) indicates that the activation source at the corresponding spatial location of the input image is more important for the classification result. For a natural input, this location corresponds to the object pattern, whereas for an adversarial input it corresponds to the adversarial patch.
In order to conduct further self-detection and data recovery, we need to determine the specific size of the primary activation pattern area. In this step, we first identify the location \((x_m,y_m)\) with the highest \(A_{T}(x,y)\) in the input image. If the adversarial patch size and shape are given, we can directly select the area with the same size and shape based on the location \((x_m, y_m)\). When the patch size and shape are not known beforehand, we instead first calculate the average activation value \(A_a\) across the entire image; then, starting from \((x_m,y_m)\), the surrounding locations whose values are higher than \(A_a\) are included in the pattern area.
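To make this step concrete, a minimal sketch is given below, assuming the last convolutional layer's activations are available as a NumPy array of shape (K, H, W); the function name and the breadth-first region growth are illustrative choices rather than the paper's exact implementation.

```python
import numpy as np
from collections import deque

def locate_primary_pattern(activations: np.ndarray):
    """Locate the primary activation pattern area.

    activations: last-conv-layer activations with shape (K, H, W).
    Returns the peak location (x_m, y_m) and a boolean mask of the
    pattern area grown from the peak (used when the patch size and
    shape are not known beforehand).
    """
    # A_T(x, y): sum of all K activations at each spatial location.
    a_total = activations.sum(axis=0)                       # shape (H, W)

    # Location with the highest summed activation.
    x_m, y_m = np.unravel_index(np.argmax(a_total), a_total.shape)

    # Average activation A_a across the whole map.
    a_avg = a_total.mean()

    # Grow the pattern area from (x_m, y_m): keep neighbouring
    # locations whose summed activation exceeds the average.
    mask = np.zeros_like(a_total, dtype=bool)
    queue = deque([(x_m, y_m)])
    while queue:
        x, y = queue.popleft()
        if mask[x, y]:
            continue
        mask[x, y] = True
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if (0 <= nx < a_total.shape[0] and 0 <= ny < a_total.shape[1]
                    and not mask[nx, ny] and a_total[nx, ny] > a_avg):
                queue.append((nx, ny))
    return (x_m, y_m), mask
```

The resulting mask can then be upsampled to the input resolution to crop the candidate patch area from the original image.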
Inconsistency Derivation: According to our preliminary analysis, the adversarial patch in the input contains much more high-frequency information than natural semantic input patterns. We first leverage the 2D Fast Fourier Transform (2D-FFT) [41] to transfer the patterns from the spatial domain to the frequency domain, which concentrates the low-frequency components together. Then, we convert the frequency-domain pattern to a binary pattern with an adaptive threshold. Figure 10 shows a conversion example, including adversarial patterns, expected synthesized patterns with the same prediction result, and natural input patterns. For the binary patterns, we observe a significant difference between the adversarial input and the semantic synthesized input. Therefore, we replace \(S(I_{pra},I_{ori})\) with the Jaccard Similarity Coefficient (JSC) [28] and propose our image inconsistency metric, which is formulated as
\[
D(P_{pra}, P_{exp}) = 1 - \frac{|P_{pra}\bigcap P_{exp}|}{|P_{pra}\bigcup P_{exp}|},
\]
where \(P_{pra}\) is the binary pattern obtained from the input's primary activation area and \(P_{exp}\) is the synthesized semantic pattern of the predicted class. \(P_{pra}\bigcap P_{exp}\) denotes the number of pixels whose value equals 1 in both \(P_{pra}\) and \(P_{exp}\). For image classification, the input semantic patterns of the expected prediction results can be obtained from the ground-truth dataset: by testing a CNN model once with a certain amount of data, we can record the model's preferred natural semantic input pattern by leveraging the CAM and size determination methods discussed earlier.
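A compact sketch of the inconsistency computation is shown below; the mean-based adaptive threshold on the log-magnitude spectrum and the helper names are our own assumptions for illustration.

```python
import numpy as np

def to_binary_spectrum(pattern: np.ndarray) -> np.ndarray:
    """2D-FFT a grayscale pattern, centre the low-frequency components,
    and binarise the log-magnitude spectrum with an adaptive
    (mean-based) threshold."""
    spectrum = np.fft.fftshift(np.fft.fft2(pattern))
    magnitude = np.log1p(np.abs(spectrum))
    return (magnitude > magnitude.mean()).astype(np.uint8)

def inconsistency(p_pra: np.ndarray, p_exp: np.ndarray) -> float:
    """Inconsistency = 1 - Jaccard similarity of the two binary spectra
    (the intersection counts pixels that equal 1 in both patterns)."""
    b_pra, b_exp = to_binary_spectrum(p_pra), to_binary_spectrum(p_exp)
    intersection = np.logical_and(b_pra, b_exp).sum()
    union = np.logical_or(b_pra, b_exp).sum()
    return 1.0 - intersection / max(union, 1)
```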
With the described inconsistency metric, we propose our specific defense methodology, which contains self-detection and image recovery, as described in Figure 11.
Self-Detection: For each input image, we apply CAM to determine the source location of the largest model activations. Then, we crop the image to obtain the pattern with maximum activations. During the semantic test, we calculate the inconsistency between \(P_{pra}\) and \(P_{exp}\). If it is higher than a predefined threshold \(T_{ic}\), we consider an adversarial input to be detected. The threshold value \(T_{ic}\) is determined in a preprocessing step. Specifically, for a given dataset (e.g., ImageNet-10), we first generate the synthesized semantic patterns for each class (e.g., 100 patterns in our experiment). Then, we calculate the inconsistency values across the patterns in each class and assign the average value as \(D_{avg}^{ground}(i)\), where i indicates the \(i{\rm {th}}\) class. Next, we generate a certain number of adversarial patches for each class (10 in our experiment) and calculate the inconsistency values between them and the target synthesized semantic patterns. We take the average as \(D_{avg}^{adv}(i)\). Based on these settings, the value range of the threshold \(T_{ic}\) for each class lies between \(D_{avg}^{ground}(i)\) and \(D_{avg}^{adv}(i)\).
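The per-class threshold selection can be sketched as follows, reusing the inconsistency() helper from the previous sketch. Interpreting the per-class inconsistency as pairwise comparisons among the synthesized patterns, and picking the midpoint of the admissible range, are illustrative choices, since the text only constrains \(T_{ic}\) to lie between the two averages.

```python
import itertools
import numpy as np

def class_threshold(ground_patterns, adv_patches) -> float:
    """Derive the detection threshold T_ic for one class.

    ground_patterns: synthesized semantic patterns of this class.
    adv_patches:     adversarial patches targeting this class.
    """
    # D_avg^ground(i): average inconsistency across the class's own
    # synthesized patterns (pairwise comparisons).
    d_ground = np.mean([inconsistency(a, b)
                        for a, b in itertools.combinations(ground_patterns, 2)])
    # D_avg^adv(i): average inconsistency between adversarial patches
    # and the target synthesized semantic patterns.
    d_adv = np.mean([inconsistency(p, g)
                     for p in adv_patches for g in ground_patterns])
    # T_ic must lie between the two averages; the midpoint is one
    # simple illustrative choice.
    return (d_ground + d_adv) / 2.0

def is_adversarial(p_pra, p_exp, t_ic) -> bool:
    """Self-detection: flag the input if its inconsistency exceeds T_ic."""
    return inconsistency(p_pra, p_exp) > t_ic
```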
Data Recovery: After the patch is detected and located, we conduct image data recovery by directly removing the patch from the original input data. In our case, considering the requirement of a lightweight computation workload, two potential image inpainting methods are adopted: the Zero Mask and Telea methods [40] (shown in Figure 12).
Zero Mask directly sets all pixel values inside the patch area to 0, which achieves the smallest computation workload and has already been applied in a recent adversarial patch defense work [50]. As Figure 12 shows, the masked area will not affect the image classification result if the patch is located outside the object. However, when the patch is inside the object, directly masking the pattern with black will degrade subsequent prediction performance. On the other hand, the Telea method achieves better inpainting performance while slightly sacrificing computation efficiency. We will evaluate the recovery performance of the two methods in terms of effectiveness and efficiency in Section 8.
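Both recovery options map directly onto OpenCV primitives (cv2.inpaint provides the Telea method). In the sketch below, the binary patch-mask convention, the 3-pixel inpainting radius, and the assumption of an 8-bit input image are illustrative choices, not the paper's exact settings.

```python
import cv2
import numpy as np

def zero_mask_recover(image: np.ndarray, patch_mask: np.ndarray) -> np.ndarray:
    """Zero Mask: set every pixel inside the detected patch area to 0."""
    recovered = image.copy()
    recovered[patch_mask.astype(bool)] = 0
    return recovered

def telea_recover(image: np.ndarray, patch_mask: np.ndarray,
                  radius: int = 3) -> np.ndarray:
    """Telea inpainting: fill the patch area from the surrounding pixels."""
    mask_u8 = (patch_mask > 0).astype(np.uint8) * 255
    return cv2.inpaint(image, mask_u8, radius, cv2.INPAINT_TELEA)
```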
5.2 Computational Complexity Analysis
The total computational complexity of the defense process in the image classification scenario results from the following four steps: CNN inference, maximum activation pattern localization, inconsistency metric calculation, and image inpainting. We model each step's computational complexity as follows.
CNN Inference: When the input image is first fed into the CNN, the inference computational complexity \(C_C\) is
\[
C_C \sim \mathcal {O}\left(\sum _{i=1}^{L}\sum _{j=1}^{n_i} {(r^j_i)}^2\, h^j_i w^j_i\right),
\]
where \({(r^j_i)}^2\) represents the \(j{\rm {th}}\) filter's kernel size in the \(i{\rm {th}}\) layer, \(h^j_i w^j_i\) denotes the corresponding size of the output feature map, L is the total number of layers, and \(n_i\) is the number of filters in the \(i{\rm {th}}\) layer.
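As a small worked example, the per-layer terms of \(C_C\) can be accumulated directly from the layer specifications; the toy layer list below is hypothetical and only illustrates the bookkeeping.

```python
def conv_inference_complexity(layers):
    """Accumulate sum_i sum_j (r_i^j)^2 * h_i^j * w_i^j over all layers.

    layers: list of per-layer specs, each a list of per-filter tuples
            (kernel_size r, output_height h, output_width w).
    """
    return sum(r * r * h * w
               for layer in layers
               for (r, h, w) in layer)

# Hypothetical two-layer toy network: 4 filters of 3x3 producing 32x32
# feature maps, then 8 filters of 3x3 producing 16x16 feature maps.
toy_layers = [[(3, 32, 32)] * 4, [(3, 16, 16)] * 8]
print(conv_inference_complexity(toy_layers))  # order-of-magnitude estimate of C_C
```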
Primary Activation Pattern Localization: Since computation complexities of other operations such as cropping are negligible, we consider CAM to contribute the primary computational complexity in this step. In CAM, each spatial location \((x,y)\) in the last convolutional layer is the weighted sum of K activations. Therefore, the total computational complexity is \(C_M \sim \mathcal {O}(Kh^{n_L}_L w^{n_L}_L)\), where \(h^{n_L}_L w^{n_L}_L\) is the size of the feature map at the last convolutional layer.
Inconsistency Metric Derivation: This step consists of the 2D-FFT calculation and the JSC calculation. According to the analysis in [20], the computational complexities of these two processes can be approximated as \(C_F \sim \mathcal {O}(N\log N)\) and \(C_J \sim \mathcal {O}(n_a\log n_a)\), where N and \(n_a\) represent the pixel numbers in the input image and in the maximum activation pattern, respectively.
Image Inpainting: For Zero Mask, the total operation number is \(C_z \sim \mathcal {O}(n)\), where n is the pixel number inside the patch. For the Telea method, the total computation complexity is \(C_t \sim \mathcal {O}(3bn)\), where b represents the total operation number when inpainting each pixel.
Compared with activation localization, metric derivation, and image inpainting, CNN inference dominates the entire computational complexity in the image scenario. Since our methodology involves only one CNN inference, its computation workload is of the same order as that of normal CNN prediction.