JP2024154761A - Adversarial attack countermeasure support system, method, and program

Info

Publication number: JP2024154761A
Application number: JP2023068791A
Authority: JP
Inventors: Ziwei Deng; Naoto Akira
Original assignee: Hitachi Solutions Ltd
Current assignee: Hitachi Solutions Ltd
Priority date: 2023-04-19
Filing date: 2023-04-19
Publication date: 2024-10-31

Abstract

To support countermeasures against adversarial attacks with various attack patterns on an inference model. SOLUTION: A system for supporting countermeasures against adversarial attacks includes: an attack encoder which encodes an attack embedding from an original image; a noise adding unit which adds random noise to the attack embedding output from the attack encoder; a generator which generates an attack image for attacking an inference model using the attack embedding with the random noise added thereto; an inference unit which performs inference on the attack image using the inference model to obtain an inference result; a discrimination unit which discriminates between the attack image and the original image to calculate a discrimination result; a loss calculation unit which calculates a training loss, which is a loss for training the attack image, on the basis of the inference result and the discrimination result; and a parameter update unit which updates the attack encoder, the generator, and the discrimination unit on the basis of the training loss. SELECTED DRAWING: Figure 2

Description

The present disclosure relates to technology for supporting countermeasures against adversarial attacks.

Since the emergence of deep neural networks (DNNs), applications using artificial intelligence (AI) have increased dramatically in recent years. Hereinafter, applications using AI are also referred to as AI applications. The AI models used in AI applications perform well on normal input data but are vulnerable to adversarial attacks, which can cause misclassification or false detections. It is therefore important to evaluate robustness against adversarial attacks before releasing an AI application, especially in fields that demand high security, such as autonomous driving and surveillance systems.

Non-Patent Document 1 discloses a method for evaluating the robustness of an AI model that performs image classification or detection. The method generates adversarial samples by adding noise imperceptible to humans to images using a parameter-free strategy, performs classification or detection on the adversarial samples with the AI model under evaluation, and evaluates the robustness of the AI model from the accuracy of that classification or detection.

Patent Document 1 describes a method for generating adversarial samples using a generative adversarial network (GAN). The method described in Patent Document 1 generates adversarial samples that involve visual changes.

Patent Document 1: International Publication No. WO2020/165935

Non-Patent Document 1: Francesco Croce and Matthias Hein, "Reliable Evaluation of Adversarial Robustness with an Ensemble of Diverse Parameter-free Attacks," ICML 2020

When evaluating the robustness of an AI model, it is preferable to use adversarial samples with various types of features, regardless of whether the changes are perceptible to the human eye. However, the method disclosed in Non-Patent Document 1 generates adversarial samples whose visual features differ relatively little from the original image and uses them to evaluate the robustness of the AI model. It therefore cannot properly evaluate the robustness of an AI model against attacks using images that clearly differ in visual features from the original image yet do not appear unnatural to humans.

Patent Document 1 describes a method for generating adversarial samples whose visual features differ from those of the original image. This method has the potential to overcome the limited range of attack patterns in the method of Non-Patent Document 1 described above. However, the method of Patent Document 1 tends to generate adversarial samples with the simplest attack pattern that can make the attack succeed, and does not generate adversarial samples with diverse attack patterns.

One object of the present disclosure is to provide technology that supports countermeasures against adversarial attacks with diverse attack patterns on an AI model.

An adversarial attack countermeasure support system according to one aspect of the present disclosure includes: an attack encoder that encodes an attack embedding from an original image; a noise adding unit that adds random noise to the attack embedding output from the attack encoder; a generator that generates an attack image for attacking an inference model using the attack embedding to which the random noise has been added; an inference unit that performs inference on the attack image using the inference model and obtains an inference result; a discrimination unit that discriminates between the attack image and the original image and calculates a discrimination result; a loss calculation unit that calculates a training loss, which is a loss for training the attack image, based on the inference result and the discrimination result; and a parameter update unit that updates the attack encoder, the generator, and the discrimination unit based on the training loss.

According to one aspect of the present disclosure, it becomes possible to support evaluation of the robustness of an AI model against attack images with diverse attack patterns.

FIG. 1 is a block diagram of a computer system according to the present embodiment.
FIG. 2 is a block diagram of an attack generation model learning unit.
FIG. 3 is a block diagram of an attack image generation unit.
FIG. 4 is a block diagram of an AI model robustness evaluation unit.
FIG. 5 is a flowchart showing the operation of the AI model robustness evaluation unit.

Hereinafter, an embodiment of the present invention will be described with reference to the drawings. The present invention is not limited to this embodiment. In the drawings, the same parts are given the same reference numerals, and their description is omitted as appropriate.

<Hardware configuration>
The apparatus used in the present embodiment may be realized by applying a software program to any suitable computer system.

FIG. 1 is a block diagram of a computer system according to the present embodiment.

The computer system 300 has, as its main components, one or more processors 302, a memory 304, a terminal interface 312, a storage interface 314, an I/O (input/output) device interface 316, and a network interface 318. These components are interconnected via a memory bus 306, an I/O bus 308, a bus interface unit 309, and an I/O bus interface unit 310.

The processor 302 may include one or more general-purpose programmable central processing units (CPUs) 302A and 302B. For example, the computer system 300 may include multiple processors. As another example, the computer system 300 may include a single CPU. The processor 302 is a device that executes instructions stored in the memory 304 and may include an on-board cache (not shown).

The memory 304 may include a random-access semiconductor memory, a storage device, a volatile storage medium, or a non-volatile storage medium for storing data and programs. The memory 304 may store all or part of the software programs, software modules, and data structures that realize the functions of the units described below. For example, the memory 304 may store software modules that realize the functions of the attack generation model learning unit 340, the attack image generation unit 350, and the AI model robustness evaluation unit 360. The units and the software modules need not correspond one to one. For example, multiple units may be realized by one software module, or one unit may be realized by multiple software modules.

In one aspect, the attack generation model learning unit 340, the attack image generation unit 350, and the AI model robustness evaluation unit 360 may be implemented partly or wholly in hardware, using semiconductor devices, chips, logic gates, circuits, circuit cards, and/or other physical hardware devices, instead of or in addition to a processor-based system in which a processor executes software programs that realize their functions. In another aspect, the attack generation model learning unit 340, the attack image generation unit 350, and the AI model robustness evaluation unit 360 may include data other than the instructions or statements of a software program. In another aspect, a camera, sensor, or other data input device (not shown) may be provided so as to communicate directly with the bus interface unit 309, the processor 302, or other hardware of the computer system 300. Details of the attack generation model learning unit 340, the attack image generation unit 350, and the AI model robustness evaluation unit 360 will be described later with reference to FIGS. 2, 3, and 4.

The computer system 300 may include the bus interface unit 309. The bus interface unit 309 handles communication among the processor 302, the memory 304, a display system 324, and the I/O bus interface unit 310. The I/O bus interface unit 310 may be connected to the I/O bus 308. Various input/output units are connected to the I/O bus 308 so that data can be transferred. The I/O bus interface unit 310 may communicate via the I/O bus 308 with multiple I/O interface units (312, 314, 316, and 318), commonly known as I/O processors (IOPs) or I/O adapters (IOAs).

The display system 324 is a system that performs processing to display images on a display device 326, and may include one or both of a display controller and a display memory (not shown). The display controller can provide both video and audio data to the display device 326. The display system 324 may be connected to a display device 326 such as a standalone display screen, a television, a tablet, or a portable device.

The computer system 300 may also include devices (not shown), such as one or more sensors, configured to collect data and provide the data to the processor 302. For example, the computer system 300 may include biometric sensors that collect heart rate data, stress level data, and the like; environmental sensors that collect humidity data, temperature data, pressure data, and the like; and motion sensors that collect acceleration data, movement data, and the like. Other types of sensors may also be used.

The I/O interface units (312, 314, 316, and 318) are capable of communicating with various storage or I/O devices. For example, the terminal interface 312 allows attachment of user I/O devices 320, including user output devices such as a video display device, a speaker, or a television, and user input devices such as a keyboard, a mouse, a keypad, a touchpad, a trackball, buttons, a light pen, or another pointing device. Through a user interface, a user may operate a user input device to enter data and instructions into the user I/O device 320 and the computer system 300, and may receive output data from the computer system 300. The user interface may, for example, be displayed on a display device, played through a speaker, or printed on a printer via the user I/O device 320.

The storage interface 314 allows attachment of one or more disk drives or direct-access storage devices 322 (typically magnetic disk drive storage devices, but possibly an array of disk drives or other storage devices configured to appear as a single disk drive). In one aspect, the storage device 322 may be implemented as any secondary storage device. The contents of the memory 304 may be stored in the storage device 322 and read from the storage device 322 as needed. The I/O device interface 316 may provide an interface to other I/O devices such as printers and fax machines. The network interface 318 may provide a communication path so that the computer system 300 and other devices can communicate with each other. This communication path may be, for example, a network 330.

In one aspect, the computer system 300 may be a device that has no direct user interface and receives requests from other computer systems (clients), such as a multi-user mainframe computer system, a single-user system, or a server computer. In other embodiments, the computer system 300 may be a desktop computer, a portable computer, a laptop, a tablet computer, a pocket computer, a telephone, a smartphone, or any other suitable electronic device.

Next, the attack generation model learning unit 340 according to the present embodiment will be described with reference to FIG. 2.

FIG. 2 is a block diagram of the attack generation model learning unit 340.

The attack generation model learning unit 340 has an attack encoder 103, a noise adding unit 105, a generator 107, a discrimination unit 110, a loss calculation unit 111, and a parameter update unit 112.

An original image 101 and a class condition c 102 are input to the attack encoder 103. The original image 101 is an original image from the training dataset used to build the AI model to be attacked. The AI model is an inference model that detects objects in images or classifies images. Before being input to the attack encoder, the original image 101 may undergo data augmentation through processing such as flipping, cropping, and rotation. The class condition c 102 is information indicating the class targeted by the attack. Hereinafter, the class targeted by the attack may be referred to as the target class. The target class is, in other words, the class that the attack causes the AI model to falsely detect or to misclassify into. The target class may be any class into which the AI model can classify an image, or it may be a class indicating unclassified background. The attack encoder 103 is a convolutional neural network (CNN) that maps the original image 101 to a feature vector called an attack embedding e 104, encoding into that vector the image representation of the original image 101, the class condition c 102, and an attack strategy suited to the original image 101. Although not limited to this, the attack encoder 103 generates, for example, the attack embedding e 104 of the attack pattern best suited to the input original image 101 and class condition c 102. The best-suited attack pattern is the attack pattern most likely to succeed.

The noise adding unit 105 adds each of multiple random noises ε 106 to the attack embedding e 104 generated by the attack encoder 103. The random noise ε may follow a uniform distribution or a Gaussian distribution, but is not limited to these. The multiple attack embeddings obtained by adding a different random noise ε 106 to the attack embedding e 104 are input to the generator 107.
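The noise adding step can be sketched as follows. This is a minimal illustration only, assuming the attack embedding is a flat list of floats and the noise is added element-wise; the function name `sample_noisy_embeddings` and the parameters `scale` and `seed` are hypothetical and do not appear in the disclosure.

```python
import random


def sample_noisy_embeddings(embedding, num_variants, distribution="gaussian",
                            scale=0.1, seed=None):
    """Return `num_variants` copies of `embedding`, each with different noise added.

    Assumption: additive, element-wise noise. `scale` is the standard deviation
    for Gaussian noise or the half-width for uniform noise (hypothetical choice).
    """
    rng = random.Random(seed)
    variants = []
    for _ in range(num_variants):
        if distribution == "gaussian":
            noise = [rng.gauss(0.0, scale) for _ in embedding]
        elif distribution == "uniform":
            noise = [rng.uniform(-scale, scale) for _ in embedding]
        else:
            raise ValueError(f"unsupported distribution: {distribution}")
        # One noisy attack embedding per random noise sample.
        variants.append([e + n for e, n in zip(embedding, noise)])
    return variants
```

Each returned list would then be passed to the generator, so that a single attack embedding yields several candidate attack images.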

The generator 107 takes the class condition c 102 and the multiple attack embeddings as input, generates multiple attack images G(e, c, ε) 108, and outputs the multiple attack images 108 to the discrimination unit 110. The generator 107 is a CNN that decodes the attack embeddings into the attack images 108.

The inference unit 109 includes the AI model to be attacked, uses the AI model to detect objects in images or classify images, and obtains the result of the detection or classification. Hereinafter, the result of the detection or classification may be referred to as the inference result. The discrimination unit 110 discriminates between the attack image 108 and the original image 101 and obtains the result. Discrimination here means making the kind of visual distinction a human eye would make. Hereinafter, the result of discriminating between the attack image 108 and the original image 101 may be referred to as the discrimination result.

The loss calculation unit 111 calculates a loss for evaluating the attack image 108 based on the inference result from the inference unit 109 and the discrimination result from the discrimination unit 110. The parameter update unit 112 updates the parameters of the attack encoder 103, the generator 107, and the discrimination unit 110 so as to optimize the loss calculated by the loss calculation unit 111. The loss calculation by the loss calculation unit 111 and the parameter update by the parameter update unit 112 are described in detail below.

The attack image 108 is the original image 101 with some modification, but it is not desirable for it to become a completely different image from the original image 101. Therefore, to limit the difference between the original image 101 and the attack image 108 to an appropriate range, a loss L_dif is designed as shown in Equation (1). The loss L_dif is a loss for limiting the difference between the original image 101 and the attack image 108 to an appropriate range M. Hereinafter, the loss L_dif may be referred to as the first loss.

Here, x is the original image, and E_ε is the average of the differences between the original image x and the multiple attack images G(e, c, ε) 108 generated with the multiple random noises ε.
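Because Equation (1) itself is not reproduced in this text, the following is only a sketch of one loss that matches the description: the difference between the original image and each attack image is penalized only where it exceeds the range M, and the penalty is averaged over the noise samples. The hinge form and the use of the Euclidean distance are assumptions, as are the names `l2_distance` and `difference_loss`.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two images given as flat lists of pixel values."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def difference_loss(original, attack_images, margin):
    """Assumed form of L_dif: mean over noise samples of max(0, ||x - G|| - M).

    `margin` plays the role of the allowed range M. The hinge and the L2 norm
    are illustrative choices, not taken from the disclosure.
    """
    penalties = [max(0.0, l2_distance(original, img) - margin)
                 for img in attack_images]
    return sum(penalties) / len(penalties)
```

With this form, attack images that stay within the range M contribute nothing, so the generator is free to vary the image inside that range.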

Next, the generator 107 and the discrimination unit 110 compete with each other in a two-player minimax game, as in an ordinary GAN. The discrimination unit 110 tries to correctly distinguish the original image 101 from the attack image 108. The generator 107, in turn, tries to fool the discrimination unit 110 by generating more realistic, higher-quality attack images 108. To this end, a loss L_G and a loss L_GAN are designed as shown in Equations (2) and (3), respectively. Hereinafter, the loss L_GAN may be referred to as the second loss.

Here, D(x) denotes the probability that the discrimination unit 110 correctly identifies the original image x, and D(G(e, c, ε)) denotes the probability that the generated attack image G(e, c, ε) is incorrectly identified. The discrimination unit D tries to maximize L_GAN. The generator 107 tries to minimize L_G.
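Equations (2) and (3) are not reproduced in this text. As a hedged illustration, the sketch below uses the standard GAN objective, which is consistent with the description (the discrimination unit maximizes L_GAN, the generator minimizes L_G); whether the disclosure uses exactly this form is an assumption. The function name `gan_losses` and the stabilizing constant `eps` are hypothetical.

```python
import math


def gan_losses(d_real, d_fake_list, eps=1e-12):
    """Standard GAN objective, assumed here as a stand-in for Equations (2) and (3).

    d_real      : D(x), probability that the original image is judged original.
    d_fake_list : D(G(e, c, eps)) for each noise sample, i.e. the probability
                  that an attack image is mistaken for an original image.
    Returns (l_gan, l_g): the discrimination unit maximizes l_gan,
    the generator minimizes l_g.
    """
    log_real = math.log(d_real + eps)
    # Average over the noise samples of log(1 - D(G(e, c, eps))).
    log_fake = sum(math.log(1.0 - d + eps) for d in d_fake_list) / len(d_fake_list)
    l_gan = log_real + log_fake
    l_g = log_fake
    return l_gan, l_g
```

As the generator gets better at fooling the discrimination unit, D(G(e, c, ε)) approaches 1 and `l_g` decreases, which is the direction the generator is trained in.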

Next, whether the attack has succeeded can be determined from the inference result of object detection or image classification that the inference unit 109 produces for the attack image 108. For example, when the AI model is a classification model, the attack can be judged successful if the AI model misclassifies the attack image 108 into the target class indicated by the class condition c. When the AI model is a detection model, the attack can be judged successful if the AI model falsely detects an object of the target class in the attack image 108. Accordingly, a loss L_AI is designed as shown in Equation (4). Hereinafter, the loss L_AI may be referred to as the third loss.

Here, AI(G(e, c, ε), c) denotes the difference between the class condition c and the AI model's inference result for the generated attack image 108. When the AI model is a classification model, the aim is to minimize this difference. When the AI model is a detection model, the aim is to maximize this difference.
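For the classification case, one way to make the "difference between the class condition c and the inference result" concrete is the cross-entropy between the target class and the model's predicted class probabilities, averaged over the noise samples. This is a sketch under that assumption only; Equation (4) is not reproduced in this text, and the names `softmax` and `targeted_attack_loss` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def targeted_attack_loss(logits_per_image, target_class):
    """Assumed L_AI for a classification model.

    For each attack image, take -log p(target_class); a small value means the
    model already (mis)classifies the image into the target class.
    """
    losses = [-math.log(softmax(logits)[target_class])
              for logits in logits_per_image]
    return sum(losses) / len(losses)
```

Minimizing this quantity pushes the attacked model's output toward the target class, which corresponds to a successful attack in the classification case described above.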

Next, from the inference results AI(G(e, c, ε), c) of the AI model for the multiple attack images G(e, c, ε), the attack image G(e, c, ε_+) that comes closest to a successful attack is selected from among the multiple attack images G(e, c, ε). For example, the attack image G(e, c, ε_+) closest to a successful attack may be selected based on the accuracy of the inference result. A loss L_e, which updates the attack encoder 103 so that it encodes a better attack embedding e, is then designed as shown in Equation (5). Hereinafter, the loss L_e may be referred to as the fourth loss.

Here, the symbol consisting of a dot inside a circle (⊙) is an operator denoting the combination of the attack embedding e with the random noise ε, that is, the addition of the random noise ε to the attack embedding e 104. Adding the random noise ε to the attack embedding e 104 may mean adding the random noise ε to, or multiplying it with, the attack embedding e 104.
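The combination operator and the selection of ε_+ can be sketched as follows. This is a minimal illustration, assuming embeddings and noises are flat lists of floats and that "closest to a successful attack" is measured by the smallest per-image attack loss; the names `combine` and `select_best_noise` are hypothetical, and the form of Equation (5) itself is not shown because it is not reproduced in this text.

```python
def combine(embedding, noise, mode="add"):
    """Combination operator applied to an attack embedding and a random noise.

    The text allows either addition or multiplication; both are element-wise here.
    """
    if mode == "add":
        return [e + n for e, n in zip(embedding, noise)]
    if mode == "mul":
        return [e * n for e, n in zip(embedding, noise)]
    raise ValueError(f"unsupported mode: {mode}")


def select_best_noise(noises, attack_losses):
    """Return (index, noise) of the sample whose attack image is closest to success.

    Assumption: a lower attack loss means the attack is closer to succeeding.
    """
    best_index = min(range(len(attack_losses)), key=lambda i: attack_losses[i])
    return best_index, noises[best_index]
```

The selected noise would then serve as ε_+ when the attack encoder is updated toward a better attack embedding.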

The loss calculation unit 111 calculates the overall loss L shown in Equation (6) from the loss functions of the first, second, third, and fourth losses described above, using hyperparameters ω_1, ω_2, and ω_3, each representing a different weight. Hereinafter, the loss L may be referred to as the training loss.
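Equation (6) is not reproduced in this text. The sketch below assumes the simplest reading of the description, a weighted sum in which the second loss is left unweighted and the three hyperparameters weight the remaining losses; which loss each weight multiplies is an assumption, and the function name `training_loss` is hypothetical.

```python
def training_loss(l_dif, l_gan, l_ai, l_e, w1=1.0, w2=1.0, w3=1.0):
    """Assumed overall training loss: L = L_GAN + w1*L_dif + w2*L_AI + w3*L_e.

    The assignment of weights to terms is illustrative only; the disclosure
    states that three weights combine the four losses but the exact form of
    Equation (6) is not available here.
    """
    return l_gan + w1 * l_dif + w2 * l_ai + w3 * l_e
```

In practice the weights would be tuned so that no single term dominates, for example so that keeping the image within the allowed range does not prevent the attack from succeeding.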

The parameter update unit 112 then calculates gradients based on the loss L, which is the training loss, and updates the parameters of the CNN models of the attack encoder 103, the generator 107, and the discrimination unit 110 with a gradient-based optimization algorithm.

When the learning step of calculating the training loss and updating the parameters is repeated until a predetermined termination condition is satisfied, the attack encoder 103 is trained to encode an appropriate attack embedding e 104 so that the attack succeeds. As the termination condition, for example, a threshold may be set on the accuracy of the inference result, or an upper limit may be set on the number of iterations. The generator 107 is trained to generate high-quality, realistic attack images 108. The discrimination unit 110 is trained to distinguish real images from generated attack images 108. The trained attack encoder 103 and the trained generator 107 are used in the attack image generation unit 350. The trained attack encoder 103, the trained generator 107, and the trained discrimination unit 110 are used in the AI model robustness evaluation unit 360.
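The two termination conditions mentioned above can be sketched as a simple loop. This is an illustration only: `step_fn` is a hypothetical callable that performs one round of loss calculation and parameter update and returns the attacked model's accuracy on the current attack images, and stopping when that accuracy falls to or below the threshold is an assumed reading of "a threshold on the accuracy of the inference result".

```python
def train_until(step_fn, accuracy_threshold, max_iterations):
    """Repeat training steps until the accuracy threshold or iteration cap is hit.

    Returns (iterations_run, last_accuracy). `last_accuracy` is None if
    max_iterations is 0 and no step was run.
    """
    accuracy = None
    for iteration in range(1, max_iterations + 1):
        # One learning step: compute the training loss and update parameters.
        accuracy = step_fn()
        if accuracy <= accuracy_threshold:
            return iteration, accuracy
    return max_iterations, accuracy
```

For example, with `step_fn` returning accuracies 0.9, 0.7, 0.4 on successive calls and a threshold of 0.5, the loop stops after the third step.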

Next, the attack image generation unit 350 will be described with reference to FIG. 3.

FIG. 3 is a block diagram of the attack image generation unit 350. Referring to FIG. 3, the attack image generation unit 350 has the attack encoder 103 and the generator 107.

The attack encoder 103 and the generator 107 are those trained by the attack generation model learning unit 340. An original image 201 is the image from which an attack image 204 is generated. A class condition c 202 is information indicating the target class of the attack. The target class of the class condition c 202 may be the same as the target class indicated by the class condition c 102 used in the attack generation model learning unit 340. The attack encoder 103 can encode an optimal attack embedding e 203 for the original image 201 and the class condition c 202. The generator 107 can generate a high-quality, realistic attack image 204 based on the attack embedding e 203.

Next, the AI model robustness evaluation unit 360 will be described with reference to FIG. 4.

To evaluate an AI model, attack images are first generated with the trained attack encoder and the trained generator, the generated attack images are input to the AI model under evaluation, and inference results are obtained from the AI model. In general, accuracy measures of the inference results, such as precision and recall, are used directly as the evaluation criterion. However, an evaluation made this way depends heavily on the capability of the generator. In other words, how well the generator has been trained strongly affects the evaluation result. All parameters in the generator network can be obtained by training, but the number of training iterations is decided by a human based on experience. How well the generator can be trained therefore depends on human experience and skill.

In contrast, in this embodiment, the number of fine-tuning (training) iterations of the generator is used as the evaluation criterion for the robustness of the AI model. Specifically, an attack image 404 is generated using the pre-trained generator 107, and if the attack with that attack image 404 fails, fine-tuning (training) of the generator 107 is repeated until it can generate an attack image 404 that succeeds in the attack. The number of fine-tuning iterations required until the attack succeeds is then recorded as the evaluation result 406 of the AI model. The more fine-tuning iterations are required, the less susceptible the AI model is to attack, so the number of fine-tuning iterations serves as a robustness criterion indicating how robust the AI model is.

FIG. 4 is a block diagram of the AI model robustness evaluation unit 360. Referring to FIG. 4, the AI model robustness evaluation unit 360 has an attack encoder 103, a generator 107, an evaluation unit 405, an identification unit 110, a loss calculation unit 407, and a parameter update unit 408. Note that the AI model robustness evaluation unit 360 has no function corresponding to the noise addition unit 105, which adds the random noise ε 106.

The original image 401 is the image from which the attack image 404 used in the evaluation is generated. The class condition c 402 is information indicating the target class of the attack in the evaluation. The target class of the class condition c 402 may be the same as the target class indicated in the class condition c 102 used in the attack generation model learning unit 340.

The attack encoder 103 and the identification unit 110 are those trained by the attack generation model learning unit 340. In its initial state, the generator 107 is the one trained by the attack generation model learning unit 340.

The loss calculation unit 407 calculates the fine-tuning loss L_ft shown in formula (7) from the loss functions of the first, second, and third losses described above, using the hyperparameters ω_1 and ω_2, which represent weights.
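As a rough illustration of how three losses can be combined with two weights, the following sketch computes a weighted sum. Formula (7) itself is not reproduced in this passage, so the assignment of the weights is an assumption made only for this example: the first loss is given unit weight, ω_1 scales the second loss, and ω_2 scales the third loss. The function name `fine_tuning_loss` is hypothetical.

```python
def fine_tuning_loss(
    loss_1: float,
    loss_2: float,
    loss_3: float,
    omega_1: float,
    omega_2: float,
) -> float:
    """Weighted sum of the first, second, and third losses.

    ASSUMPTION: the first loss has unit weight; the actual weight placement
    in formula (7) may differ.
    """
    return loss_1 + omega_1 * loss_2 + omega_2 * loss_3


example_l_ft = fine_tuning_loss(0.5, 0.2, 1.0, omega_1=2.0, omega_2=0.5)
```

The fourth loss does not appear here because, according to the description, the robustness evaluation unit has no noise addition function.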

The parameter update unit 408 calculates a gradient based on the fine-tuning loss L_ft, and updates the parameters of the generator 107 by an optimization algorithm that uses the gradient.
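The description does not name a particular optimization algorithm. As one possible instance, the sketch below applies a single step of plain gradient descent to a list of parameters; the function name `gradient_descent_step` and the learning rate are assumptions for illustration, and the gradients are taken as given rather than derived from L_ft.

```python
from typing import List


def gradient_descent_step(
    parameters: List[float],
    gradients: List[float],
    learning_rate: float = 0.01,  # assumption: not specified in the source
) -> List[float]:
    """Move each parameter against its gradient by learning_rate."""
    if len(parameters) != len(gradients):
        raise ValueError("parameters and gradients must have the same length")
    return [p - learning_rate * g for p, g in zip(parameters, gradients)]


updated = gradient_descent_step([1.0, -2.0], [0.5, -1.0], learning_rate=0.1)
```

Only the generator's parameters would be passed to such a step during fine-tuning, since the attack encoder and the identification unit are used as trained.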

The evaluation unit 405 includes the AI model to be evaluated and uses it to detect objects in the attack image 404 or to classify the attack image 404. If the inference result shows that the attack has failed, training of the generator 107 by the loss calculation unit 407 and the parameter update unit 408 and acquisition of an inference result from the AI model are repeated until the attack succeeds. After evaluating the attack images 404 based on all of the original images 401, the evaluation unit 405 calculates the average accuracy of the inference results and the number of iterations, and outputs them as the evaluation result 406.

Next, the operation flow of the AI model robustness evaluation unit 360 will be described with reference to FIG. 5.

FIG. 5 is a flowchart showing the operation of the AI model robustness evaluation unit 360.

First, the necessary information, including the original image 401 and the class condition 402 from the evaluation dataset, is input to the AI model robustness evaluation unit 360 (step S11).

Next, within the AI model robustness evaluation unit 360, the attack encoder 103 and the generator 107 generate an attack embedding 403 and an attack image 404 such that the given original image 401 is misdetected or misclassified as the target class indicated in the class condition 402 (step S12).

Next, the evaluation unit 405 inputs the generated attack image 404 into the AI model and obtains the inference result of the AI model (step S13). The evaluation unit 405 then compares the inference result with the target class indicated in the class condition 402 to determine whether the attack has succeeded (step S14).

If the attack has succeeded, the evaluation unit 405 reflects the inference result in the evaluation result 406 (step S15). If the attack has failed, the loss calculation unit 407 calculates the fine-tuning loss L_ft based on formula (7) (step S16), the parameter update unit 408 updates the parameters of the generator 107 based on the fine-tuning loss L_ft (step S17), the iteration count is incremented by 1, and processing returns to step S12. The AI model robustness evaluation unit 360 treats steps S16, S17, S12, and S13 as one series of update processing and repeats it until the attack succeeds; once the attack finally succeeds, the evaluation unit 405 reflects the inference result and the number of iterations in the evaluation result 406 (step S15).
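The control flow of steps S12 through S17 can be sketched as a simple loop that counts fine-tuning iterations. This is an illustrative sketch under stated assumptions, not the embodiment's implementation: the function name `count_fine_tuning_iterations`, the callables, and the toy scalar "generator" are hypothetical, and the `max_iterations` cap is added only so that the example always terminates (the description repeats until the attack succeeds).

```python
from typing import Callable, Tuple


def count_fine_tuning_iterations(
    generate: Callable[[], float],
    infer: Callable[[float], int],
    fine_tune: Callable[[], None],
    target_class: int,
    max_iterations: int = 1000,  # assumption: safety cap, not in the source
) -> Tuple[int, int]:
    """Return (iterations, inference_result) once the attack succeeds."""
    iterations = 0
    while True:
        attack_image = generate()        # step S12: generate the attack image
        result = infer(attack_image)     # step S13: inference by the AI model
        if result == target_class:       # step S14: attack succeeded?
            return iterations, result    # step S15: report result and count
        if iterations >= max_iterations:
            raise RuntimeError("attack did not succeed within the cap")
        fine_tune()                      # steps S16 and S17: loss and update
        iterations += 1


# Toy stand-ins (hypothetical): the "generator" is a single strength value.
state = {"strength": 0.0}


def toy_generate() -> float:
    return state["strength"]


def toy_infer(attack_image: float) -> int:
    # The toy model outputs target class 1 once the perturbation is strong enough.
    return 1 if attack_image >= 0.25 else 0


def toy_fine_tune() -> None:
    state["strength"] += 0.1


iterations, result = count_fine_tuning_iterations(
    toy_generate, toy_infer, toy_fine_tune, target_class=1
)
```

In this toy run the model resists the first three attack images, so the recorded iteration count is 3; a model that needs more fine-tuning steps before it is fooled would be judged more robust under this criterion.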

Although embodiments of the present invention have been described above, the present invention is not limited to the embodiments shown here; within the scope of the technical concept of the present invention, these embodiments may be used in combination or some configurations may be changed. In addition, some or all of the above embodiments include the following items. However, the present invention is not limited to the following items.

(Item 1)
The adversarial attack countermeasure support system includes: an attack encoder that encodes an attack embedding from an original image; a noise addition unit that adds random noise to the attack embedding output from the attack encoder; a generator that generates an attack image for attacking an inference model using the attack embedding to which the random noise has been added; an inference unit that performs inference on the attack image using the inference model and obtains an inference result; an identification unit that discriminates between the attack image and the original image and calculates an identification result; a loss calculation unit that calculates a training loss, which is a loss for training the attack image, based on the inference result and the identification result; and a parameter update unit that updates the attack encoder, the generator, and the identification unit based on the training loss.

With this, the parameters of the attack encoder and the generator are updated based on an evaluation of attacks by attack images that use attack embeddings with added random noise, and the attack images are thereby improved, making it possible to support evaluation of the robustness of an AI model against attack images with a variety of attack patterns.

(Item 2)
In the adversarial attack countermeasure support system according to item 1, the loss calculation unit calculates a first loss for limiting the difference between the original image and the attack image to a predetermined range; calculates a second loss in which the identification unit, which tries to correctly distinguish the original image from the attack image, and the generator, which tries to cause the original image and the attack image to be misidentified, oppose each other; calculates a third loss for increasing the likelihood that an attack on inference by the inference model succeeds; calculates a fourth loss for reducing the distance between the attack embedding without random noise and the attack embedding with random noise for an attack image with which the attack is likely to succeed; and calculates the training loss by summing the first loss, the second loss, the third loss, and the fourth loss with predetermined weights.
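To make the roles of the four losses concrete, the sketch below gives one simple candidate form for each and combines them as a weighted sum. None of these forms is taken from the source: the hinge on the L2 difference, the standard GAN objective, the cross-entropy toward the target class, and the squared embedding distance are assumptions chosen for illustration, and all function names and example values are hypothetical.

```python
import math
from typing import Sequence


def perturbation_loss(original: Sequence[float], attack: Sequence[float], bound: float) -> float:
    """First loss (assumed hinge form): penalize the L2 difference beyond `bound`."""
    diff = math.sqrt(sum((a - o) ** 2 for o, a in zip(original, attack)))
    return max(0.0, diff - bound)


def adversarial_loss(d_original: float, d_attack: float) -> float:
    """Second loss (assumed GAN form): log D(x) + log(1 - D(x_attack)).

    The identification unit would try to increase this value and the
    generator would try to decrease it.
    """
    return math.log(d_original) + math.log(1.0 - d_attack)


def attack_loss(target_class_probability: float) -> float:
    """Third loss (assumed cross-entropy toward the target class)."""
    return -math.log(target_class_probability)


def embedding_distance_loss(clean: Sequence[float], noisy: Sequence[float]) -> float:
    """Fourth loss (assumed squared L2 distance between the two embeddings)."""
    return sum((c - n) ** 2 for c, n in zip(clean, noisy))


def training_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Sum the losses with predetermined weights."""
    if len(losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    return sum(w * l for w, l in zip(weights, losses))


example_losses = [
    perturbation_loss([0.0, 0.0], [0.3, 0.4], bound=0.1),
    adversarial_loss(d_original=0.9, d_attack=0.2),
    attack_loss(target_class_probability=0.5),
    embedding_distance_loss([1.0, 2.0], [1.1, 1.9]),
]
total = training_loss(example_losses, weights=[1.0, 0.5, 0.5, 0.25])
```

The weights here are placeholders; in practice they would be hyperparameters tuned for the inference model being attacked.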

This makes it possible to support evaluation of the robustness of an AI model against attack images with a variety of attack patterns.

(Item 3)
The adversarial attack countermeasure support system according to item 2 further includes an attack image generation unit that generates an attack image using the attack encoder and the generator obtained by repeating the calculation of the training loss by the loss calculation unit and the update of the attack encoder, the generator, and the identification unit by the parameter update unit until a predetermined termination condition is satisfied. This makes it possible to generate an attack image that is likely to succeed in attacking the inference model.

(Item 4)
The adversarial attack countermeasure support system according to item 3 further includes a robustness evaluation unit. The robustness evaluation unit inputs the attack image generated by the attack image generation unit into the inference model to obtain an inference result, discriminates between the attack image and the original image to calculate an identification result, and calculates, based on the inference result and the identification result, a fine-tuning loss, which is a loss for fine-tuning the attack image. If the inference result shows that the attack did not succeed, the robustness evaluation unit updates the generator based on the fine-tuning loss and inputs an attack image generated by the attack image generation unit using the updated generator into the inference model to obtain an inference result and an identification result. The robustness evaluation unit repeats this series of update processes until the attack succeeds, and outputs the number of iterations of the update process as an evaluation result of the inference model.

With this, the generator is updated repeatedly until the attack succeeds and the number of iterations is used as the evaluation result, so robustness can be evaluated in a way that does not depend on the capability of the generator.

(Item 5)
In the adversarial attack countermeasure support system according to item 4, the robustness evaluation unit calculates the fine-tuning loss by summing the first loss, the second loss, and the third loss with predetermined weights. With this, an appropriate update process is used, so robustness can be evaluated appropriately from the number of update iterations required until the attack on the inference model succeeds.

103...attack encoder, 105...noise addition unit, 107...generator, 109...inference unit, 110...identification unit, 111...loss calculation unit, 112...parameter update unit, 300...computer system, 302...processor, 304...memory, 306...memory bus, 308...I/O bus, 309...bus interface unit, 310...I/O bus interface unit, 312...terminal interface, 314...storage interface, 316...I/O device interface, 316...device interface, 318...network interface, 320...I/O device, 322...storage device, 324...display system, 326...display device, 330...network, 340...attack generation model learning unit, 350...attack image generation unit, 360...AI model robustness evaluation unit, 405...evaluation unit, 406...evaluation result, 407...loss calculation unit, 408...parameter update unit

Claims

1. An adversarial attack countermeasure support system comprising:
an attack encoder that encodes an attack embedding from an original image;
a noise addition unit that adds random noise to the attack embedding output from the attack encoder;
a generator that generates an attack image for attacking an inference model using the attack embedding to which the random noise has been added;
an inference unit that performs inference on the attack image using the inference model and obtains an inference result;
an identification unit that discriminates between the attack image and the original image and calculates an identification result;
a loss calculation unit that calculates a training loss, which is a loss for training the attack image, based on the inference result and the identification result; and
a parameter update unit that updates the attack encoder, the generator, and the identification unit based on the training loss.

2. The adversarial attack countermeasure support system according to claim 1, wherein the loss calculation unit:
calculates a first loss for limiting a difference between the original image and the attack image to a predetermined range;
calculates a second loss in which the identification unit, which tries to correctly distinguish the original image from the attack image, and the generator, which tries to cause the original image and the attack image to be misidentified, oppose each other;
calculates a third loss for increasing the likelihood that an attack on inference by the inference model succeeds;
calculates a fourth loss for reducing the distance between the attack embedding without random noise and the attack embedding with random noise for an attack image with which the attack is likely to succeed; and
calculates the training loss by summing the first loss, the second loss, the third loss, and the fourth loss with predetermined weights.

3. The adversarial attack countermeasure support system according to claim 2, further comprising an attack image generation unit that generates an attack image using the attack encoder and the generator obtained by repeating the calculation of the training loss by the loss calculation unit and the update of the attack encoder, the generator, and the identification unit by the parameter update unit until a predetermined termination condition is satisfied.

4. The adversarial attack countermeasure support system according to claim 3, further comprising a robustness evaluation unit that repeats, until the attack succeeds, a series of update processes of inputting the attack image generated by the attack image generation unit into the inference model to obtain an inference result, discriminating between the attack image and the original image to calculate an identification result, calculating, based on the inference result and the identification result, a fine-tuning loss, which is a loss for fine-tuning the attack image, updating the generator based on the fine-tuning loss if the inference result shows that the attack did not succeed, and inputting an attack image generated by the attack image generation unit using the updated generator into the inference model to obtain an inference result and an identification result, and that outputs the number of iterations of the update process as an evaluation result of the inference model.

5. The adversarial attack countermeasure support system according to claim 4, wherein the robustness evaluation unit calculates the fine-tuning loss by summing the first loss, the second loss, and the third loss with predetermined weights.

6. An adversarial attack countermeasure support method in which a computer:
encodes, using an attack encoder, an attack embedding from an original image;
adds random noise to the attack embedding output from the attack encoder;
generates, using a generator, an attack image for attacking an inference model using the attack embedding to which the random noise has been added;
performs inference on the attack image using the inference model to obtain an inference result;
discriminates, using an identification unit, between the attack image and the original image and calculates an identification result;
calculates a training loss, which is a loss for training the attack image, based on the inference result and the identification result; and
updates the attack encoder, the generator, and the identification unit based on the training loss.

7. An adversarial attack countermeasure support program that causes a computer to execute:
encoding, using an attack encoder, an attack embedding from an original image;
adding random noise to the attack embedding output from the attack encoder;
generating, using a generator, an attack image for attacking an inference model using the attack embedding to which the random noise has been added;
performing inference on the attack image using the inference model to obtain an inference result;
discriminating, using an identification unit, between the attack image and the original image and calculating an identification result;
calculating a training loss, which is a loss for training the attack image, based on the inference result and the identification result; and
updating the attack encoder, the generator, and the identification unit based on the training loss.