CN111209497B

CN111209497B - DGA domain name detection method based on GAN and Char-CNN

Info

Publication number: CN111209497B
Application number: CN202010007697.0A
Authority: CN
Inventors: 杨超; 杨延洲; 苏锐丹; 郑昱; 尤伟; 陈明哲; 王潇皓
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2020-01-05
Filing date: 2020-01-05
Publication date: 2022-03-04
Anticipated expiration: 2040-01-05
Also published as: CN111209497A

Abstract

The invention provides a DGA domain name detection method based on GAN and Char-CNN, which addresses the low detection recall rate for low-randomness DGA domain names in the prior art. The method comprises the following steps: acquiring a training sample set and a verification sample set; constructing a generative adversarial network (GAN) and a character-level convolutional neural network (Char-CNN); iteratively training the GAN; acquiring an augmented training set; iteratively training the Char-CNN; and detecting domain names with the trained character-level convolutional neural network Char-CNN'. The method uses the GAN to generate adversarial domain names that augment the data set, which improves the richness of the training sample set; the residual block structure reduces the error rate of the detection model, which improves the detection recall rate for low-randomness DGA domain names; and because the Char-CNN has few hyper-parameters to calculate, the training time of the detection model is shortened.

Description

DGA domain name detection method based on GAN and Char-CNN

Technical Field

The invention belongs to the technical field of network security and relates to DGA domain name detection, in particular to a DGA domain name detection method based on GAN and Char-CNN, which can be used for locating infected hosts, shutting down botnets and defending against network attacks.

Background

A DGA domain name is a domain name generated periodically by a domain generation algorithm (DGA) from random seeds such as numbers, dates and Twitter trending topics. Network attackers register DGA domain names as the medium through which bots communicate with command and control servers, and the large number of potential DGA domain names makes it difficult for law enforcement to shut down a botnet effectively. DGA domain names seriously threaten the security of network hosts; the emerging low-randomness DGA domain names in particular are well concealed and pose a greater threat, so detecting DGA domain names effectively is of great significance. The DGA domain name detection task is to extract features of a domain name, perform calculations on the extracted features, and output a prediction probability from which it is decided whether the domain name is a DGA domain name. There are many indexes for evaluating DGA domain name detection, such as the receiver operating characteristic (ROC) curve, the F1 value and the detection recall rate. The detection recall rate is the ratio of detected DGA domain names to all DGA domain names, which makes it an important evaluation index.
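The recall definition above can be written out directly. The following is a minimal sketch; the function name `detection_recall` and the convention that label 1 marks a DGA domain name are illustrative assumptions (the latter matches the class labels used later in the text).

```python
def detection_recall(true_labels, predicted_labels):
    """Ratio of detected DGA domain names to all DGA domain names (label 1 = DGA)."""
    detected = sum(1 for t, p in zip(true_labels, predicted_labels) if t == 1 and p == 1)
    total_dga = sum(1 for t in true_labels if t == 1)
    if total_dga == 0:
        raise ValueError("the sample contains no DGA domain names")
    return detected / total_dga
```

Benign domain names (label 0) do not enter the ratio, so a detector can have a high recall and still raise false alarms; recall is therefore usually read together with the other indexes mentioned above.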

DGA domain name detection methods can be classified into blacklist-based methods, machine-learning-based methods and deep-learning-based methods. A blacklist-based method decides whether a domain name is a DGA domain name by checking whether it appears in a preset blacklist; because the blacklist must be updated continuously, the method has poor real-time performance. A machine-learning-based method manually extracts features of a domain name such as its length, information entropy, proportion of vowel characters and number of repeated characters, and then detects DGA domain names with machine learning algorithms such as support vector machines and random forests, which allows real-time detection. A deep-learning-based method automatically extracts latent features of a domain name through a neural network model and outputs a prediction probability after neuron calculation, from which it is decided whether the domain name is a DGA domain name.
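As an illustration of the manually extracted features named above, the following sketch computes them for a single domain label. The function name, the returned dictionary keys and the interpretation of "number of repeated characters" as the count of distinct characters occurring more than once are assumptions made for this example.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def handcrafted_features(domain):
    """Compute length, information entropy, vowel ratio and repeated-character count."""
    name = domain.lower()
    counts = Counter(name)
    length = len(name)
    if length == 0:
        return {"length": 0, "entropy": 0.0, "vowel_ratio": 0.0, "repeated_chars": 0}
    # Shannon entropy, in bits per character, of the character distribution.
    entropy = -sum((c / length) * math.log2(c / length) for c in counts.values())
    vowel_ratio = sum(counts[v] for v in VOWELS) / length
    # Assumption: a "repeated character" is a distinct character that occurs more than once.
    repeated_chars = sum(1 for c in counts.values() if c > 1)
    return {
        "length": length,
        "entropy": entropy,
        "vowel_ratio": vowel_ratio,
        "repeated_chars": repeated_chars,
    }
```

Feature vectors of this kind would then be passed to a classifier such as a support vector machine or a random forest.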

To solve this problem, methods that extract multidimensional features of domain names through an integrated neural network and then detect DGA domain names have been proposed continually in recent years. For example, ralla Yun et al. of the middle electric great wall internet system application limited company published the article "Integrated DGA domain name detection method based on deep learning" in Information Technology and Network Security, 2018, vol. 37, no. 10, proposing an integrated DGA domain name detection method based on deep learning. The method integrates a recurrent neural network (RNN) and a convolutional neural network (CNN) and constructs an integrated detection model consisting of a character embedding layer, a feature extraction layer and a classification layer. The feature extraction layer uses a CNN model and an RNN model to automatically extract features of the input characters in the spatial and temporal dimensions respectively, which effectively improves the detection recall rate for DGA domain names. However, the method still has disadvantages: the training sample set contains too few low-randomness DGA domain names and is low in richness, and the vanishing-gradient problem appears when the network is too deep, which increases the error rate, so the detection recall rate for low-randomness DGA domain names is low; in addition, the calculation at each time step of the recurrent neural network RNN depends on the calculation and output of the previous time step, so more hyper-parameters need to be calculated and the training time of the detection model increases.

Disclosure of Invention

The invention aims to provide a DGA domain name detection method based on GAN and Char-CNN aiming at the defects of the prior art, which is used for solving the problem of low detection recall rate of low-randomness DGA domain name in the prior art.

In order to achieve the purpose, the technical scheme adopted by the invention comprises the following steps:

(1) acquiring a training sample set and a verification sample set:

(1a) selecting, in order, the first L popular domain names from the popular domain name set Alexa to form a training sample set A, where L ≥ 600000;

(1b) randomly selecting M benign domain names of class 0 from the benign domain name set TRANCO and labeling the class of each, and randomly selecting N DGA domain names of class 1 from the DGA domain name set DGArchive and labeling the class of each; then combining αM benign domain names, αN DGA domain names and their corresponding labels into a training sample set B, and combining the remaining M − αM benign domain names, the remaining N − αN DGA domain names and their corresponding labels into a verification sample set, where M ≥ 100000, N ≥ 100000 and 0.6 ≤ α ≤ 0.8;

(2) constructing a generative adversarial network GAN and a character-level convolutional neural network Char-CNN:

constructing a generative adversarial network GAN comprising a generator network and a discriminator network, wherein the generator network comprises a fully connected layer, a plurality of residual blocks, a one-dimensional convolutional layer and an activation layer, and the discriminator network comprises a one-dimensional convolutional layer, a plurality of residual blocks and a fully connected layer;

constructing a character-level convolutional neural network Char-CNN comprising an embedding layer, a plurality of one-dimensional convolutional layers, a plurality of activation layers, a plurality of one-dimensional max-pooling layers, a plurality of residual blocks, a Dropout layer and a plurality of fully connected layers;

(3) iteratively training the generative adversarial network GAN:

(3a) letting the number of iterations be q₁ and the maximum number of iterations be Q₁, where Q₁ ≥ 2000, and letting q₁ = 0;

(3b) feeding random noise noise₁ into the generator network to compute m adversarial domain name vectors, and at the same time encoding m popular domain names randomly selected from training sample set A to obtain m popular domain name vectors, where 64 ≤ m ≤ L;

(3c) feeding the m adversarial domain name vectors and the m popular domain name vectors into the discriminator network for prediction to obtain a probability set that contains, for the i-th adversarial domain name vector, the probability that it originates from training sample set A and, for the j-th popular domain name vector, the probability d_j that it originates from training sample set A, where 1 ≤ i ≤ m and 1 ≤ j ≤ m;

(3d) calculating the loss loss_g of the generator network and the loss loss_d of the discriminator network from this probability set;

(3e) training the generative adversarial network GAN with the Adam algorithm and the losses loss_g and loss_d, and judging whether q₁ = Q₁ holds; if so, obtaining the trained generative adversarial network GAN'; otherwise letting q₁ = q₁ + 1 and returning to step (3b);

(4) obtaining an augmented training set:

(4a) feeding random noise noise₂ into the trained generative adversarial network GAN' to compute P adversarial domain name vectors, and decoding each adversarial domain name vector to obtain P adversarial domain names of category 1, where 20000 ≤ P ≤ L;

(4b) labeling the category of each adversarial domain name, and adding the P adversarial domain names and their labels to training sample set B to obtain an augmented training set;

(5) performing iterative training on a character-level convolutional neural network Char-CNN:

(5a) letting the number of iterations be q₂ and the maximum number of iterations be Q₂, where Q₂ ≥ 1000, and letting q₂ = 0;

(5b) encoding n domain names randomly selected from the augmented training set to obtain n domain name vectors, and feeding the n domain name vectors into the character-level convolutional neural network Char-CNN for prediction to obtain a probability set {p₁, p₂, ..., p_k, ..., p_n}, where p_k is the probability that the category of the k-th domain name is 1, 1 ≤ k ≤ n, and 32 ≤ n ≤ (αM + αN + P);

(5c) calculating the loss of the character-level convolutional neural network Char-CNN from {p₁, p₂, ..., p_k, ..., p_n};

(5d) training the character-level convolutional neural network Char-CNN with the RMSprop algorithm and the loss value to obtain the trained Char-CNN model Char-CNN_{q₂};

(5e) encoding c verification domain names randomly selected from the verification sample set to obtain c verification domain name vectors, and feeding them into Char-CNN_{q₂} for prediction to obtain a probability set whose v-th element is the probability that the category of the v-th verification domain name is 1, where 1 ≤ v ≤ c and 32 ≤ c ≤ (M − αM + N − αN);

(5f) calculating the detection accuracy Accuracy of the c verification samples from this probability set;

(5g) judging whether q₂ = Q₂ holds or whether Accuracy no longer increases; if so, obtaining the trained character-level convolutional neural network Char-CNN'; otherwise letting q₂ = q₂ + 1 and returning to step (5b);

(6) detecting the domain name based on the trained character-level convolutional neural network Char-CNN':

(6a) letting the number of domain names to be detected be t, and encoding each domain name to be detected to obtain t domain name vectors to be detected, where t ≥ 1;

(6b) feeding the t domain name vectors to be detected into the trained character-level convolutional neural network Char-CNN' for prediction to obtain a probability set whose u-th element is the probability that the category of the u-th domain name to be detected is 1, where 1 ≤ u ≤ t, and judging whether the decision condition on that probability holds: if it holds, the u-th domain name to be detected is a DGA domain name; otherwise, it is a non-DGA domain name.
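The decision in step (6b) can be sketched as a simple thresholding of the predicted probabilities. The threshold of 0.5 is an assumption of this example, chosen to agree with the way tp and tn are counted for the detection accuracy in the detailed description; the function name is likewise illustrative.

```python
def classify_domains(probabilities, threshold=0.5):
    """Map each predicted probability of category 1 to a DGA / non-DGA verdict."""
    # Assumption: a probability strictly greater than the threshold means "DGA".
    return ["DGA" if p > threshold else "non-DGA" for p in probabilities]
```

For example, `classify_domains([0.9, 0.5, 0.1])` marks only the first domain name as a DGA domain name under this assumption.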

Compared with the prior art, the invention has the following advantages:

First, adversarial domain names are generated by the generative adversarial network GAN, whose generator network and discriminator network are trained together in a mutual game, so that the generated adversarial domain names closely imitate low-randomness popular domain names. Meanwhile, the residual block alleviates the vanishing-gradient problem of deep networks by transforming the learned target function and reduces the error rate of the detection model, which further improves the detection recall rate for low-randomness DGA domain names; simulation results show that the detection recall rate is improved by 28.3% compared with the prior art.

Second, DGA domain names are detected by the character-level convolutional neural network Char-CNN, which learns local features through convolution and then aggregates them into overall features. Compared with the recurrent neural network RNN, fewer hyper-parameters need to be calculated; moreover, the residual block in the Char-CNN has a simple structure and learns quickly, so the training time of the detection model is shortened.

Drawings

FIG. 1 is a flow chart of an implementation of the present invention;

FIG. 2 is a block diagram of the residual block used in the generative adversarial network GAN and the character-level convolutional neural network Char-CNN of the present invention;

Detailed Description

The invention is described in further detail below with reference to the figures and the specific embodiments.

Referring to fig. 1, the present invention includes the steps of:

(1) acquiring a training sample set and a verification sample set:

Referring to fig. 2, the residual block contained in the generator network, the discriminator network and the character-level convolutional neural network Char-CNN comprises 2 activation layers and 2 one-dimensional convolutional layers: first activation layer → first one-dimensional convolutional layer → second activation layer → second one-dimensional convolutional layer, wherein the activation function of the activation layers is ReLU; the one-dimensional convolutional layers have an output space dimension of 128, a convolution kernel size of 5 and a stride of 1 character; the input x of the first activation layer and the output f(x) of the second one-dimensional convolutional layer are added through a skip connection, so that the target function finally learned by the residual block is h(x) = ηf(x) + x, where η is a weight coefficient and 0 ≤ η ≤ 1.
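The layer order and the weighted skip connection of the residual block can be sketched in plain Python. This is a simplified, single-channel illustration: the block described above uses 128 output channels, whereas the helper names, the single channel and the zero padding that keeps the sequence length unchanged are assumptions of this example.

```python
def relu(values):
    """ReLU activation applied element-wise."""
    return [max(0.0, v) for v in values]


def conv1d_same(values, kernel):
    """Single-channel 1-D convolution with stride 1 and zero padding (length preserved)."""
    k = len(kernel)
    left = k // 2
    padded = [0.0] * left + list(values) + [0.0] * (k - 1 - left)
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(values))
    ]


def residual_block(x, kernel1, kernel2, eta=0.3):
    """activation -> convolution -> activation -> convolution, then h(x) = eta * f(x) + x."""
    f = conv1d_same(relu(conv1d_same(relu(x), kernel1)), kernel2)
    return [eta * fx + xi for fx, xi in zip(f, x)]
```

With all-zero kernels the block returns its input unchanged, which shows that the skip connection always passes x through regardless of what the convolutional branch has learned.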

An ordinary deep network is asked to learn its target function directly (for an identity mapping, f(x) = x), and the gradient can vanish as it is back-propagated through many layers. The residual block instead learns h(x) = ηf(x) + x; because the skip connection contributes a constant term of 1 to the derivative of h(x) with respect to x, the vanishing-gradient problem of the deep network is alleviated. This reduces the error rate of the detection model and improves the detection recall rate for low-randomness DGA domain names; meanwhile, the residual block has a simple structure and learns quickly, which shortens the training time of the detection model.
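The effect of the skip connection on back-propagation can be made explicit by differentiating the target function of the residual block. The symbol $\mathcal{L}$ for the training loss is introduced here only for illustration.

```latex
h(x) = \eta f(x) + x, \qquad 0 \le \eta \le 1

\frac{\partial h}{\partial x} = \eta \, \frac{\partial f}{\partial x} + I

\frac{\partial \mathcal{L}}{\partial x}
  = \frac{\partial \mathcal{L}}{\partial h}
    \left( \eta \, \frac{\partial f}{\partial x} + I \right)
```

Even when $\partial f / \partial x$ becomes very small, the identity term $I$ passes the upstream gradient $\partial \mathcal{L} / \partial h$ through the block unchanged, which is why stacking residual blocks does not shrink the gradient the way stacking plain layers can.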

The generator network and the discriminator network of the generative adversarial network GAN each contain 5 residual blocks, where:

the specific structure of the generator network is: fully connected layer → first residual block → second residual block → third residual block → fourth residual block → fifth residual block → one-dimensional convolutional layer → activation layer, wherein the fully connected layer has an input space dimension of 128 and an output space dimension of 128 × 63; the weight coefficient of every residual block is 0.3; the one-dimensional convolutional layer has an output space dimension of 38, a convolution kernel size of 1 and a stride of 1 character; and the activation function of the activation layer is Softmax;

the specific structure of the discriminator network is: one-dimensional convolutional layer → first residual block → second residual block → third residual block → fourth residual block → fifth residual block → fully connected layer, wherein the one-dimensional convolutional layer has an input space dimension of 38, an output space dimension of 128, a convolution kernel size of 1 and a stride of 1 character; the weight coefficient of every residual block is 0.3; and the output space dimension of the fully connected layer is 1;

the character-level convolutional neural network Char-CNN contains 2 one-dimensional convolutional layers, 4 activation layers, 2 one-dimensional max-pooling layers, 3 residual blocks and 2 fully connected layers, and its specific structure is: embedding layer → first one-dimensional convolutional layer → first activation layer → first one-dimensional max-pooling layer → second one-dimensional convolutional layer → second activation layer → second one-dimensional max-pooling layer → first fully connected layer → first residual block → second residual block → third activation layer → Dropout layer → second fully connected layer → fourth activation layer, wherein the embedding layer has an input space dimension of 38, an output space dimension of 128 and a sequence length of 63; all one-dimensional convolutional layers have an output space dimension of 128 and a stride of 1 character, the convolution kernel size being 3 for the first one-dimensional convolutional layer and 2 for the second; the activation function of the first, second and third activation layers is ThresholdReLU, and that of the fourth activation layer is Sigmoid; all one-dimensional max-pooling layers use "same" padding and a pooling window size of 2; the weight coefficient of every residual block is 0.3; the drop rate of the Dropout layer is 0.5; and the first fully connected layer has an output space dimension of 64 and the second an output space dimension of 1.

(3) iteratively training the generative adversarial network GAN:

(3b) generating random noise noise₁ with the random_normal function contained in the third-party Python library NumPy, feeding noise₁ into the generator network to compute m adversarial domain name vectors, and at the same time encoding m popular domain names randomly selected from training sample set A to obtain m popular domain name vectors, where 64 ≤ m ≤ L;

(3d) calculating, from the probability set, the loss loss_g of the generator network and the loss loss_d of the discriminator network;
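The loss formulas themselves are not reproduced in this text. Purely as an illustration, the sketch below assumes the standard cross-entropy GAN objective, which is a common choice when the discriminator outputs probabilities; the formulas actually used by the invention may differ. The function name, argument names and the small constant `eps` are hypothetical.

```python
import math


def gan_losses(adversarial_probs, popular_probs, eps=1e-12):
    """Return (loss_g, loss_d) under an ASSUMED standard GAN cross-entropy objective.

    adversarial_probs: discriminator outputs for the m adversarial domain name vectors.
    popular_probs:     discriminator outputs d_j for the m popular domain name vectors.
    """
    m = len(adversarial_probs)
    if m == 0 or len(popular_probs) != m:
        raise ValueError("both probability lists must have the same non-zero length m")
    # Discriminator: popular vectors should score near 1, adversarial vectors near 0.
    loss_d = (
        -sum(math.log(d + eps) for d in popular_probs) / m
        - sum(math.log(1.0 - d + eps) for d in adversarial_probs) / m
    )
    # Generator (non-saturating form): adversarial vectors should score near 1.
    loss_g = -sum(math.log(d + eps) for d in adversarial_probs) / m
    return loss_g, loss_d
```

Under this assumption, the generator loss falls as the discriminator is fooled more often, while the discriminator loss falls as it separates the two kinds of vectors more cleanly.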

(4) obtaining an augmentation training set:

(4a) generating random noise noise₂ with the random_normal function contained in the third-party Python library NumPy, feeding noise₂ into the trained generative adversarial network GAN' to compute P adversarial domain name vectors, and decoding each adversarial domain name vector to obtain P adversarial domain names of category 1, where 20000 ≤ P ≤ L;

The adversarial domain names produced through the mutual game between the generator network and the discriminator network of the GAN closely imitate low-randomness popular domain names. Because they are generated by an algorithm and have low randomness, they can be regarded as low-randomness DGA domain names; adding them to the training sample set improves its richness and effectively improves the detection recall rate for low-randomness DGA domain names.
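The augmentation step can be sketched as follows. The `generator` and `decode` arguments are hypothetical callables standing in for the trained generator network and the domain name decoding procedure; the noise dimension of 128 follows the input dimension of the generator's fully connected layer, and drawing the noise from a standard normal distribution with the standard library is an assumption of this example.

```python
import random


def build_augmented_set(training_set_b, generator, decode, p, noise_dim=128, seed=0):
    """Append p generated domain names, each labeled 1, to training sample set B."""
    rng = random.Random(seed)
    augmented = list(training_set_b)
    for _ in range(p):
        # Assumption: standard normal noise, one vector per generated domain name.
        noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
        vector = generator(noise)                # adversarial domain name vector
        augmented.append((decode(vector), 1))    # category 1, treated as a DGA domain name
    return augmented
```

The original training sample set is copied rather than modified, so the un-augmented set remains available for comparison.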

(5b) encoding n domain names randomly selected from the augmented training set to obtain n domain name vectors, and feeding the n domain name vectors into the character-level convolutional neural network Char-CNN for prediction to obtain a probability set {p₁, p₂, ..., p_k, ..., p_n}, where p_k is the probability that the category of the k-th domain name is 1, 1 ≤ k ≤ n, and 32 ≤ n ≤ (αM + αN + P);

(5c) calculating the loss of the character-level convolutional neural network Char-CNN from {p₁, p₂, ..., p_k, ..., p_n} and the true categories of the domain names, where y_k is the true category of the k-th domain name;

(5d) training the character-level convolutional neural network Char-CNN with the RMSprop algorithm and the loss value to obtain the trained Char-CNN model Char-CNN_{q₂};

(5f) calculating the detection accuracy Accuracy of the c verification samples from the probability set, where tp is the number of samples among the c verification samples whose true category is 1 and whose predicted probability of category 1 is greater than 0.5, and tn is the number of samples among the c verification samples whose true category is 0 and whose predicted probability of category 1 is not greater than 0.5;

The character-level convolutional neural network Char-CNN is a deep feed-forward neural network that includes convolution calculations; it learns local features and then aggregates them into overall features, so latent features can be fully extracted. Compared with the recurrent neural network RNN, fewer hyper-parameters need to be calculated; moreover, the residual block in the convolutional neural network has a simple structure and learns quickly, so the training time of the detection model is shortened.
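The loss and accuracy used while training the Char-CNN can be sketched as follows. The loss formula is not reproduced in this text; binary cross-entropy is assumed here because the network ends in a Sigmoid output and the labels are 0 or 1. The accuracy is taken as (tp + tn) / c, which follows from the definitions of tp and tn given for the c verification samples but is likewise written out here as an assumption. Function names and `eps` are illustrative.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-12):
    """ASSUMED Char-CNN loss: mean binary cross-entropy between p_k and y_k."""
    n = len(probs)
    if n == 0 or len(labels) != n:
        raise ValueError("probs and labels must have the same non-zero length")
    return -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
        for p, y in zip(probs, labels)
    ) / n


def detection_accuracy(probs, labels):
    """Accuracy over c verification samples, assumed to be (tp + tn) / c."""
    # tp: true category 1 and predicted probability of category 1 greater than 0.5.
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p > 0.5)
    # tn: true category 0 and predicted probability of category 1 not greater than 0.5.
    tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p <= 0.5)
    return (tp + tn) / len(probs)
```

In a training loop, the loss would drive the RMSprop update on each batch, while the accuracy on the verification samples would supply the stopping test of step (5g).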


The domain name encoding process involved in the above steps is: first, establish a mapping from characters to numbers according to the set of valid characters in domain names; then traverse the characters of the domain name in order and convert them one by one into the corresponding numbers; finally, pad with 0 to obtain domain name vectors of the same length. The domain name decoding process is: first, establish a mapping from numbers to characters according to the set of valid characters in domain names; then traverse the numbers in the vector in order and convert the non-zero numbers one by one into the corresponding characters to obtain the domain name.
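The encoding and decoding procedures just described can be sketched as follows. The text does not list the valid character set; this example assumes lowercase letters, digits and the hyphen (37 characters), with 0 reserved for padding, which gives 38 symbols in total and matches the embedding input dimension of 38. The vector length of 63 follows the sequence length stated for the embedding layer. All names are illustrative.

```python
import string

# Assumption: valid characters are a-z, 0-9 and "-"; index 0 is reserved for padding.
VALID_CHARS = string.ascii_lowercase + string.digits + "-"
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(VALID_CHARS)}
ID_TO_CHAR = {i: ch for ch, i in CHAR_TO_ID.items()}
MAX_LEN = 63


def encode(domain):
    """Convert a domain name into a fixed-length vector of numbers, padded with 0."""
    ids = [CHAR_TO_ID[ch] for ch in domain.lower()[:MAX_LEN]]  # KeyError on invalid characters
    return ids + [0] * (MAX_LEN - len(ids))


def decode(vector):
    """Convert a vector back into a domain name, skipping the 0 padding."""
    return "".join(ID_TO_CHAR[i] for i in vector if i != 0)
```

Because padding uses a number that is never assigned to a character, decoding an encoded domain name returns the original string, provided it is at most 63 characters long.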

The technical effects of the present invention will be further described with reference to simulation experiments.

1. Simulation conditions and contents:

In the simulation experiments, training sample set A consists of the first 600000 popular domain names selected in order from the popular domain name set Alexa; training sample set B consists of 80000 benign domain names randomly selected from the benign domain name set TRANCO, 80000 DGA domain names randomly selected from the DGA domain name set DGArchive, and the labels corresponding to these domain names; the verification sample set consists of 20000 benign domain names randomly selected from the benign domain name set TRANCO, 20000 DGA domain names randomly selected from the DGA domain name set DGArchive, and the labels corresponding to these domain names; the number of training iterations is 2000; the domain names to be detected comprise 1000 low-randomness DGA domain names and 1000 high-randomness DGA domain names. The hardware platform is an Intel Core i7-7700K @4.50GHz CPU, 8GB RAM and an NVIDIA GeForce GTX2080 GPU, and the operating system is Ubuntu 16.04 LTS; the software platform for the simulation experiments is Python 3.6.5, TensorFlow 1.3 and Keras 2.2.1.
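The split described above (80000 training and 20000 verification samples per class out of 100000, that is α = 0.8) can be sketched as follows. The function name, the fixed random seed and the use of rounding to obtain integer counts are assumptions of this example.

```python
import random


def split_samples(benign, dga, alpha=0.8, seed=0):
    """Label and split benign (0) and DGA (1) domain names into training set B and a verification set."""
    rng = random.Random(seed)
    benign_labeled = [(d, 0) for d in benign]
    dga_labeled = [(d, 1) for d in dga]
    rng.shuffle(benign_labeled)
    rng.shuffle(dga_labeled)
    n_benign = round(alpha * len(benign_labeled))  # alpha * M benign samples for training
    n_dga = round(alpha * len(dga_labeled))        # alpha * N DGA samples for training
    training_set_b = benign_labeled[:n_benign] + dga_labeled[:n_dga]
    verification_set = benign_labeled[n_benign:] + dga_labeled[n_dga:]
    return training_set_b, verification_set
```

Splitting each class separately keeps the benign-to-DGA ratio the same in both sets.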

Simulation 1: a comparative simulation of the detection recall rate for low-randomness DGA domain names between the present invention and the existing integrated DGA domain name detection method based on deep learning; the results are shown in Table 1.

Simulation 2: a comparative simulation of the detection-model training time between the present invention and the existing integrated DGA domain name detection method based on deep learning; the results are shown in Table 2.

2. Simulation result analysis:

TABLE 1

TABLE 2

Training time of the prior-art detection model	Training time of the detection model of the invention
724 min	482 min

As can be seen from Table 1, compared with the existing integrated DGA domain name detection method based on deep learning, the DGA domain name detection method based on GAN and Char-CNN provided by the invention improves the detection recall rate for low-randomness DGA domain names by 28.3% while maintaining the detection recall rate for traditional high-randomness DGA domain names. This shows that the method extracts features well, improves the richness of the training sample set and reduces the error rate of the detection model, thereby improving the detection recall rate for low-randomness DGA domain names, which is of important practical significance.

As can be seen from Table 2, compared with the existing integrated DGA domain name detection method based on deep learning, the DGA domain name detection method based on GAN and Char-CNN provided by the invention shortens the training time of the detection model by 242 minutes. This shows that the method has fewer hyper-parameters to calculate and that the residual block in the Char-CNN has a simple structure and learns quickly, which shortens the training time of the detection model.
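The saving quoted from Table 2 can be checked with a short calculation. The relative reduction is not stated in the text and is derived here only from the two training times in the table.

```python
prior_art_minutes = 724   # training time of the prior-art detection model (Table 2)
invention_minutes = 482   # training time of the detection model of the invention (Table 2)

saved_minutes = prior_art_minutes - invention_minutes
relative_reduction = saved_minutes / prior_art_minutes

print(saved_minutes, f"{relative_reduction:.1%}")  # prints: 242 33.4%
```

The absolute saving of 242 minutes therefore corresponds to roughly one third of the prior-art training time.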

The foregoing description is only an example of the present invention and should not be construed as limiting the invention in any way. It will be apparent to those skilled in the art that various changes and modifications in form and detail may be made without departing from the principles and structure of the invention, and such changes and modifications remain within the scope of the invention as defined by the appended claims.

Claims

1. A DGA domain name detection method based on GAN and Char-CNN is characterized by comprising the following steps:

(1) acquiring a training sample set and a verification sample set:

(3) iteratively training the generative adversarial network GAN:

(3b) feeding random noise noise₁ into the generator network to compute m adversarial domain name vectors, and at the same time encoding m popular domain names randomly selected from training sample set A to obtain m popular domain name vectors, where 64 ≤ m ≤ L;

(3d) calculating, from the probability set, the loss loss_g of the generator network and the loss loss_d of the discriminator network;

(4) obtaining an augmentation training set:

(5f) calculating, from the probability set, the detection accuracy Accuracy of the c verification samples;


2. The GAN and Char-CNN based DGA domain name detection method of claim 1, wherein the residual block contained in the generator network, the discriminator network and the character-level convolutional neural network Char-CNN in step (2) comprises 2 activation layers and 2 one-dimensional convolutional layers: first activation layer → first one-dimensional convolutional layer → second activation layer → second one-dimensional convolutional layer, wherein the activation function of the activation layers is ReLU; the one-dimensional convolutional layers have an output space dimension of 128, a convolution kernel size of 5 and a stride of 1 character; the input x of the first activation layer and the output f(x) of the second one-dimensional convolutional layer are added through a skip connection, so that the target function finally learned by the residual block is h(x) = ηf(x) + x, where η is a weight coefficient and 0 ≤ η ≤ 1.

3. The GAN and Char-CNN based DGA domain name detection method of claim 1, wherein the generative adversarial network GAN and the character-level convolutional neural network Char-CNN in step (2) have the following specific structures:

in the generative adversarial network GAN, the generator network and the discriminator network each contain 5 residual blocks, where:

the character-level convolutional neural network Char-CNN contains 2 one-dimensional convolutional layers, 4 activation layers, 2 one-dimensional max-pooling layers, 3 residual blocks and 2 fully connected layers, and its specific structure is: embedding layer → first one-dimensional convolutional layer → first activation layer → first one-dimensional max-pooling layer → second one-dimensional convolutional layer → second activation layer → second one-dimensional max-pooling layer → first fully connected layer → first residual block → second residual block → third activation layer → Dropout layer → second fully connected layer → fourth activation layer, wherein the embedding layer has an input space dimension of 38, an output space dimension of 128 and a sequence length of 63; all one-dimensional convolutional layers have an output space dimension of 128 and a stride of 1 character, the convolution kernel size being 3 for the first one-dimensional convolutional layer and 2 for the second; the activation function of the first, second and third activation layers is ThresholdReLU, and that of the fourth activation layer is Sigmoid; all one-dimensional max-pooling layers use "same" padding and a pooling window size of 2; the weight coefficient of every residual block is 0.3; the drop rate of the Dropout layer is 0.5; and the first fully connected layer has an output space dimension of 64 and the second an output space dimension of 1.

4. The GAN and Char-CNN based DGA domain name detection method of claim 1, wherein the loss loss_g of the generator network and the loss loss_d of the discriminator network in step (3d) are calculated by the following formulas, respectively:

5. The GAN and Char-CNN based DGA domain name detection method of claim 1, wherein the loss of the character-level convolutional neural network Char-CNN in step (5c) is calculated as:

wherein y_k is the true category of the k-th domain name.

6. The GAN and Char-CNN based DGA domain name detection method of claim 1, wherein the detection Accuracy of the c verification samples in step (5f) is calculated by the following formula:

wherein tp is the number of samples of which the real category is 1 and the probability of predicting the category to be 1 is greater than 0.5 in the c verification samples; tn is the number of samples in which the true class is 0 and the probability of predicting class to be 1 is not more than 0.5 in the verification samples.

7. The GAN and Char-CNN based DGA domain name detection method of claim 1, wherein the random noise noise₁ in step (3b) and the random noise noise₂ in step (4a) are both generated with the random_normal function contained in the third-party Python library NumPy.