CN110830490B

CN110830490B - Malicious domain name detection method and system based on a deep network with adversarial training

Info

Publication number: CN110830490B
Application number: CN201911111270.9A
Authority: CN
Inventors: Zhu Fei (朱斐)
Original assignee: Suzhou University
Current assignee: Wuxi Fushi Printing And Painting Education Technology Co ltd
Priority date: 2019-11-14
Filing date: 2019-11-14
Publication date: 2022-08-02
Anticipated expiration: 2039-11-14
Also published as: CN110830490A

Abstract

The invention discloses a malicious domain name detection method and system based on a deep network with adversarial training. The method comprises the following steps: (1) acquiring malicious domain name samples; (2) preprocessing the malicious domain name samples; (3) training a network model, using the C-RNN-GAN generative adversarial network model; (4) acquiring suspicious domain name samples; (5) computing the discriminator output; and (6) judging the suspicious domain name. Exploiting the properties of generative adversarial networks, the method and system obtain, through adversarial training, a discriminator that estimates whether a domain name sample is real. The discriminator makes robust judgments from the multidimensional features underlying a domain name sample and can serve as a classifier for malicious domain name detection. Because it uses a generative adversarial network to learn the data features underlying malicious domain name samples, the invention suits the attack-and-defense nature of real-world network security, supports self-learning and self-improvement, and effectively improves the accuracy of domain name classification.

Description

Malicious domain name detection method and system based on deep network with adversarial training

Technical field

The invention relates to the technical field of artificial intelligence and control, in particular to a malicious domain name detection method and system based on a deep network with adversarial training.

Background art

The Domain Name System (DNS) is an integral part of the Internet. It maps between IP addresses and domain names, resolving domain names into IP addresses during network communication so that users can work with names that are easy to remember. An improperly configured DNS can slow the network or prevent websites from opening, and a malicious DNS can even cause harmful behavior such as advertisement pop-ups, fraud, eavesdropping, and hijacking or tampering.

In recent years, DNS security incidents have occurred frequently. DNS is the largest and most complex distributed database in the world. Its openness, complexity, and scale, together with security that was not thoroughly considered in its original design and deliberate human sabotage, make it difficult for DNS to cope with increasingly complex modern communication networks, so DNS faces very serious security threats. The most common of these are DNS spoofing and distributed denial-of-service attacks. DNS spoofing means that a server returns a false resolution in response to a domain name request; it can cause many security problems, such as directing users to phishing or fraudulent websites. Distributed denial of service (DDoS) is another threat faced by DNS: it exploits vulnerabilities in network protocols and operating systems and uses deception and camouflage to attack, so that servers exhaust their computing resources and can no longer process requests from legitimate users. Botnets are a typical example. How to address DNS security effectively is therefore one of the most urgent open problems for DNS today.

Many solutions have been proposed to address DNS security. The most common is domain name detection, which comprehensively computes the credibility of a suspicious domain name to determine whether it is legitimate. Domain name detection methods fall into two categories: knowledge-based and machine-learning-based. Knowledge-based methods detect suspicious domain names by calculating the probability that domain names appear together. Although their detection accuracy is high, they require a large amount of expert knowledge; because that knowledge is never complete, recall falls short of requirements and malicious domain names are missed. Traditional machine-learning methods compute and classify with algorithms such as clustering, support vector machines, and decision trees; they depend on large amounts of manually labeled data in combination with the algorithms, so they are often difficult to apply at scale. A new method is therefore needed that combines the strengths of both categories and compensates for their weaknesses, to achieve better domain name detection.

Summary of the invention

The purpose of the present invention is to provide a malicious domain name detection method and system based on a deep network with adversarial training that effectively improves the accuracy of malicious domain name detection.

To achieve the above purpose, the present invention provides the following technical solution: a malicious domain name detection method based on a deep network with adversarial training, comprising the following steps:

(1) Acquiring malicious domain name samples: obtain threat intelligence from a threat intelligence platform, extract the malicious domain names in it, and query the relevant dimension information of each malicious domain name; according to the malicious behavior, select malicious domain names that fall within the network-attack category and have high confidence, form malicious domain name samples from them, and build a malicious domain name sample set;

(2) Training the network model: select the C-RNN-GAN generative adversarial network model, which comprises a generator and a discriminator, and train it using the malicious domain name sample set as input;

(3) Acquiring suspicious domain name samples: query the relevant dimension information of a suspicious domain name to form a suspicious domain name sample;

(4) Computing the discriminator output: input the suspicious domain name sample to the discriminator of the trained network model to obtain the currently calculated similarity value;

(5) Judging the suspicious domain name: determine whether the similarity value is less than the current threshold. If so, the suspicious domain name is a malicious domain name; it is treated as a malicious domain name sample and added to the malicious domain name sample set. If not, the suspicious domain name is a legitimate domain name.
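Steps (4) and (5) above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the `discriminator` callable stands in for the trained network, and the names `judge_domain` and `Discriminator` are hypothetical. The decision rule follows the text literally: a similarity value below the threshold means malicious, and the sample is then appended to the malicious sample set.

```python
from typing import Callable, List, Sequence

# Hypothetical type: a trained discriminator maps a feature vector to a similarity value d.
Discriminator = Callable[[Sequence[float]], float]


def judge_domain(
    features: Sequence[float],
    discriminator: Discriminator,
    threshold: float,
    malicious_samples: List[Sequence[float]],
) -> bool:
    """Steps (4)-(5): score a suspicious domain and classify it.

    Returns True if the domain is judged malicious; in that case the sample
    is also added to the malicious sample set, as step (5) describes.
    """
    d = discriminator(features)       # step (4): similarity value d
    if d < threshold:                 # step (5): below threshold -> malicious
        malicious_samples.append(list(features))
        return True
    return False                      # otherwise -> legitimate


if __name__ == "__main__":
    # Toy stand-in discriminator (hypothetical): the mean of the features.
    toy_d = lambda f: sum(f) / len(f)
    sample_set: List[Sequence[float]] = []
    print(judge_domain([0.1, 0.2, 0.3], toy_d, 0.5, sample_set))
    print(judge_domain([0.9, 0.8, 0.7], toy_d, 0.5, sample_set))
    print(len(sample_set))
```

Appending confirmed detections back into the sample set is what lets the model be retrained on newly found malicious domains.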

Further, the relevant dimension information of a malicious domain name includes one or more of the following:

Website ranking information, namely the Alexa website ranking;

Page index information, including the number of pages indexed by Baidu, by Sogou, and by Bing;

Page integrity information, where 0 means no information and 1 means information is present;

Registration location information, where 0 means registered abroad and 1 means registered domestically;

A record information, where 0 means no record and 1 means a record exists;

CNAME record information, where 0 means no record and 1 means a record exists;

CDN usage record information, where 0 means no usage record and 1 means a usage record exists;

Update degree information, namely the number of times the malicious domain name has been updated;

Here, the A record specifies the IP address corresponding to a host name or domain name; the CNAME record is an alias record, which maps multiple names to the same computer; the CDN usage record indicates use of a content delivery network (CDN), an intelligent virtual network built on top of the existing network that lets users obtain content from nearby nodes, reducing network congestion and improving access response speed and hit rate.
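One way to hold these dimensions is a small record type that flattens into a numeric vector for the network. The sketch below is an assumption for illustration: the class name `DomainFeatures`, its field names, and the choice to reject non-0/1 flags are not from the source, which only lists the dimensions and their 0/1 encodings.

```python
from dataclasses import astuple, dataclass

# Fields that the source encodes as 0/1 flags.
_BINARY_FIELDS = ("page_complete", "domestic_registration",
                  "has_a_record", "has_cname_record", "uses_cdn")


@dataclass
class DomainFeatures:
    """Hypothetical container for the dimension information of one domain."""
    alexa_rank: float            # Alexa website ranking
    baidu_indexed: int           # pages indexed by Baidu
    sogou_indexed: int           # pages indexed by Sogou
    bing_indexed: int            # pages indexed by Bing
    page_complete: int           # page integrity: 0 = no information, 1 = present
    domestic_registration: int   # 0 = registered abroad, 1 = registered domestically
    has_a_record: int            # 0 = no A record, 1 = A record exists
    has_cname_record: int        # 0 = no CNAME record, 1 = CNAME record exists
    uses_cdn: int                # 0 = no CDN usage record, 1 = usage record exists
    update_count: int            # number of times the domain has been updated

    def __post_init__(self) -> None:
        # Reject flag values outside the 0/1 encoding described in the text.
        for name in _BINARY_FIELDS:
            if getattr(self, name) not in (0, 1):
                raise ValueError(f"{name} must be 0 or 1")

    def to_vector(self) -> list:
        """Flatten the fields, in declaration order, into a list of floats."""
        return [float(v) for v in astuple(self)]
```

For example, `DomainFeatures(120000, 3, 0, 5, 1, 0, 1, 0, 0, 2).to_vector()` gives a 10-element float list in the order the dimensions are listed.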

Further, the loss functions of the generator and the discriminator are as follows:

Here, $S_G$ is the loss function of the generator, used to train the generator; $S_D$ is the loss function of the discriminator, used to train the discriminator; $G$ is the generator, used to generate samples; $D$ is the discriminator, used to distinguish real samples from generated samples; $R$ is the representation layer, i.e. the layer preceding the discriminator's logistic classification layer; $x^i$ denotes a malicious domain name sample, i.e. real sample data; $z^i$ is the random sequence vector input to the generator; and $n$ is the current number of malicious domain name samples.

Further, the generator and the discriminator each use a long short-term memory (LSTM) network with a depth of 2.

Further, the threshold is updated by threshold self-learning according to:

$\alpha_t = \min(d, \alpha_{t-1})$, where $\alpha_t$ is the current threshold, $\alpha_{t-1}$ is the previous threshold, and $d$ is the similarity value.

The present invention also provides a malicious domain name detection system for the malicious domain name detection method described above, comprising:

A data acquisition module, used to acquire malicious domain name samples and suspicious domain name samples;

A data preprocessing module, used to screen malicious domain name samples to form the malicious domain name sample set;

A network model, which adopts the C-RNN-GAN generative adversarial network model; after being trained with malicious domain name samples as input, it takes suspicious domain name samples as input and outputs a calculated value;

A judgment module, used to judge whether a suspicious domain name is malicious or legitimate according to the calculated value and the threshold.
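The four modules can be wired together as plain classes. This sketch is a hypothetical decomposition: the class names mirror the module names above, but the injected `lookup`, `keep`, and `discriminator` callables are placeholders, and GAN training itself is not shown.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


class DataAcquisitionModule:
    """Acquires dimension vectors for domain names via an injected lookup."""

    def __init__(self, lookup: Callable[[str], Vector]):
        self.lookup = lookup  # hypothetical: queries the dimension information

    def acquire(self, domains: Sequence[str]) -> Dict[str, Vector]:
        return {name: self.lookup(name) for name in domains}


class DataPreprocessingModule:
    """Screens malicious samples to form the malicious sample set."""

    def __init__(self, keep: Callable[[str], bool]):
        self.keep = keep  # hypothetical screening predicate

    def build_sample_set(self, samples: Dict[str, Vector]) -> List[Vector]:
        return [vec for name, vec in samples.items() if self.keep(name)]


class NetworkModel:
    """Wraps the discriminator of an already-trained GAN (training not shown)."""

    def __init__(self, discriminator: Callable[[Vector], float]):
        self.discriminator = discriminator

    def score(self, vec: Vector) -> float:
        return self.discriminator(vec)


class JudgmentModule:
    """Compares the calculated value with the threshold."""

    def __init__(self, threshold: float):
        self.threshold = threshold

    def judge(self, value: float) -> str:
        return "malicious" if value < self.threshold else "legitimate"
```

Injecting the lookup and the discriminator keeps each module testable on its own, which matches the modular split described for the system.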

Owing to the above technical solutions, the present invention has the following advantages over the prior art. The disclosed method and system for detecting malicious domain names based on a deep network with adversarial training exploit the properties of generative adversarial networks to obtain, through adversarial training, a discriminator that judges whether a domain name is real or fake. The method and system suit the attack-and-defense nature of real-world network security and support self-learning and self-improvement. The discriminator makes robust judgments based on the multidimensional features underlying domain name samples and can serve as a classifier for malicious domain name detection. By using a generative adversarial network to learn the data features underlying malicious domain name samples, the invention effectively improves the accuracy of domain name classification.

Description of drawings

Fig. 1 is a flowchart of the malicious domain name detection method of the present invention;

Fig. 2 is a structural diagram of the network model of the present invention;

Fig. 3 is a structural diagram of the malicious domain name detection system of the present invention.

Detailed description of the embodiments

The present invention is further described below with reference to its principles, the drawings, and the embodiments.

To overcome the shortcomings of existing malicious domain name detection methods and effectively improve detection accuracy, the present invention exploits the properties of generative adversarial networks and obtains, through adversarial training, a discriminator that judges whether data are real or fake. The discriminator makes robust judgments based on the multidimensional features underlying data samples and can serve as a classifier for malicious domain name detection. By adopting a generative adversarial network, the invention learns the data features underlying malicious samples and effectively improves the accuracy of data classification.

As shown in Figs. 1 to 3, a malicious domain name detection method based on a deep network with adversarial training includes the following steps:

In a preferred implementation of this embodiment, the relevant dimension information of a malicious domain name includes one or more of the following:

In a preferred implementation of this embodiment, the loss functions of the generator and the discriminator are as follows:

In a preferred implementation of this embodiment, the generator and the discriminator each use an LSTM (long short-term memory) network with a depth of 2.

In a preferred implementation of this embodiment, the threshold is updated by threshold self-learning according to $\alpha_t = \min(d, \alpha_{t-1})$.

A data acquisition module 10, used to acquire and screen malicious domain name samples and to acquire suspicious domain name samples;

A network model 20, which adopts the C-RNN-GAN generative adversarial network model; after being trained with malicious domain name samples as input, it takes suspicious domain name samples as input and outputs a calculated value;

A judgment module 30, used to judge whether a suspicious domain name is malicious or legitimate according to the calculated value and the threshold.

The steps of the malicious domain name detection method are explained in detail below:

Acquiring data and its dimension information

Threat intelligence is obtained from a threat intelligence platform. It contains many kinds of information, of which domain name information is among the core data. The malicious domain name information is extracted from the available threat intelligence to obtain a malicious domain name sample library. For each domain name in the library, the associated information is then queried: the Alexa website ranking, a relatively authoritative indicator of website traffic (if no ranking can be found for the domain name, a fixed value is recorded); the Baidu and Sogou index information, which reflects how many of the site's pages these search engines have indexed (if no information can be found, the value of that dimension is set to 0); the Bing index information; the integrity of the website; and other data. The dimensions are detailed in the table below.

| Dimension | Name | Processing |
|---|---|---|
| 1 | Alexa rank | Obtain the Alexa website ranking |
| 2 | Baidu index | Obtain the number of the site's pages indexed by Baidu |
| 3 | Sogou index | Obtain the number of the site's pages indexed by Sogou |
| 4 | Bing index | Obtain the number of the site's pages indexed by Bing |
| 5 | Web content integrity | Check the integrity of the web page content: 0 means no information, 1 means information is present |
| 6 | Registration location | 0 means registered abroad, 1 means registered domestically |
| 7 | A record | 0 means no record, 1 means a record exists |
| 8 | CNAME | 0 means no record, 1 means a record exists |
| 9 | CDN | 0 means no usage record, 1 means a usage record exists |
| 10 | Domain name update degree | Count the number of times the domain name has been updated |
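The fallback rules in this step (a fixed value for a missing Alexa rank, 0 for missing index counts) can be expressed as a small assembly function. A sketch under stated assumptions: the dimension key names and the sentinel `ALEXA_UNRANKED` are hypothetical (the text does not say which fixed value is used), and defaulting every other missing dimension to 0 extends the stated rule beyond the Baidu and Sogou counts.

```python
from typing import Dict, List, Optional

# Hypothetical sentinel: the text says a fixed value is recorded when no
# Alexa rank is found, but does not say which value.
ALEXA_UNRANKED = 10_000_000.0

# Hypothetical keys, in the order of the dimension table (1 to 10).
DIMENSIONS = [
    "alexa_rank", "baidu_indexed", "sogou_indexed", "bing_indexed",
    "page_complete", "domestic_registration", "a_record", "cname",
    "cdn", "update_count",
]


def build_vector(raw: Dict[str, Optional[float]]) -> List[float]:
    """Turn raw query results (None or absent = lookup failed) into a 10-dim vector."""
    vec = []
    for name in DIMENSIONS:
        value = raw.get(name)
        if value is None:
            # Missing Alexa rank -> fixed sentinel; other missing dimensions -> 0
            # (assumption: the text states the 0 default for index counts only).
            value = ALEXA_UNRANKED if name == "alexa_rank" else 0.0
        vec.append(float(value))
    return vec
```

A large sentinel for unranked sites keeps them at the "low traffic" end of the Alexa scale rather than mixing them with top-ranked sites.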

Data cleaning

Malicious behavior takes many forms, including spreading malware, sending spam, scams, and phishing, and its definition differs across security levels. For example, sending spam is normally considered malicious, but at a lower security level spam may not count as malicious behavior. It is therefore necessary to screen the malicious behaviors and malicious domain names in the threat intelligence, focusing on malicious domain name samples that fall within the network-attack category and have high confidence, and to build a malicious domain name sample library from them. This domain name list, together with the dimension information of each domain, forms the sample set X used for adversarial training of the generative adversarial network.
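The screening described above amounts to filtering threat intelligence records by behavior category and confidence. In the sketch below, the `IntelRecord` fields, the behavior labels in `ATTACK_BEHAVIORS`, the cut-off `MIN_CONFIDENCE`, and the de-duplication of repeated domains are all hypothetical choices; the text specifies none of them. Passing a different behavior set models a different security level, for example one that also counts spam.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class IntelRecord:
    domain: str
    behavior: str       # e.g. "malware", "phishing", "spam" (hypothetical labels)
    confidence: float   # platform-reported confidence in [0, 1]


# Hypothetical policy values; the text gives no concrete categories or cut-off.
ATTACK_BEHAVIORS: Set[str] = {"malware", "phishing", "fraud", "botnet_c2"}
MIN_CONFIDENCE = 0.8


def screen(records: Iterable[IntelRecord],
           behaviors: Set[str] = ATTACK_BEHAVIORS,
           min_conf: float = MIN_CONFIDENCE) -> List[str]:
    """Keep domains whose behavior is in the attack scope and whose confidence is high."""
    seen = set()
    kept = []
    for r in records:
        if r.behavior in behaviors and r.confidence >= min_conf and r.domain not in seen:
            seen.add(r.domain)      # de-duplicate: keep each domain once, in first-seen order
            kept.append(r.domain)
    return kept
```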

Building the generative adversarial network

The generative adversarial network adopts the structure of C-RNN-GAN (continuous recurrent neural networks with adversarial training), a deep recurrent generative adversarial network trained adversarially. Following the adversarial principle, a generator G and a discriminator D are built separately. The generator G tries to generate sample data as close as possible to the real sample data x, while the discriminator D tries as hard as possible to tell generated sample data from real sample data. For the malicious domain name sample set, the generator and the discriminator each use a long short-term memory (LSTM) network with a depth of 2 to process the discrete real data in the sample set and learn the features of the real sample data.
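To make the "LSTM with a depth of 2" concrete, here is a forward pass of a two-layer LSTM discriminator in pure Python with random, untrained weights. It is a simplified, unidirectional sketch, not the C-RNN-GAN reference implementation, and it rests on an assumption the text does not state: each of the 10 dimensions of a sample is fed as one time step with a 1-dimensional input. The final hidden state of the top layer plays the role of the representation layer R, and a logistic unit turns it into a score in (0, 1).

```python
import math
import random
from typing import List, Sequence

Vec = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Single LSTM cell with randomly initialised weights (forward pass only)."""

    def __init__(self, input_size: int, hidden_size: int, rng: random.Random):
        cols = input_size + hidden_size + 1  # inputs, previous hidden state, bias
        # One weight matrix per gate: input (i), forget (f), candidate (g), output (o)
        self.W = {gate: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                         for _ in range(hidden_size)]
                  for gate in "ifgo"}

    def step(self, x: Sequence[float], h: Vec, c: Vec):
        z = list(x) + h + [1.0]
        pre = {g: [sum(w * v for w, v in zip(row, z)) for row in self.W[g]]
               for g in "ifgo"}
        i = [sigmoid(a) for a in pre["i"]]
        f = [sigmoid(a) for a in pre["f"]]
        g = [math.tanh(a) for a in pre["g"]]
        o = [sigmoid(a) for a in pre["o"]]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


class TwoLayerLSTMDiscriminator:
    """Depth-2 LSTM followed by a logistic output unit."""

    def __init__(self, input_size: int, hidden_size: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        self.layers = [LSTMCell(input_size, hidden_size, rng),
                       LSTMCell(hidden_size, hidden_size, rng)]
        self.out_w = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size + 1)]

    def representation(self, seq: Sequence[Sequence[float]]) -> Vec:
        """Final hidden state of the top layer (stand-in for layer R)."""
        states = [([0.0] * self.hidden_size, [0.0] * self.hidden_size)
                  for _ in self.layers]
        top = [0.0] * self.hidden_size
        for x in seq:
            inp = x
            for k, cell in enumerate(self.layers):
                h, c = cell.step(inp, *states[k])
                states[k] = (h, c)
                inp = h                     # layer k's output feeds layer k + 1
            top = inp
        return top

    def __call__(self, seq: Sequence[Sequence[float]]) -> float:
        r = self.representation(seq) + [1.0]  # append bias input
        return sigmoid(sum(w * v for w, v in zip(self.out_w, r)))
```

In practice a deep learning framework would supply the LSTM layers and gradients; the pure-Python version only shows how two stacked layers pass hidden states upward at each time step.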

Training the generative adversarial network

The loss functions are defined according to the chosen model structure. Because the approach is adversarial and deep recurrent networks serve as both the generator G and the discriminator D, the loss functions are defined as follows:

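$$S_G = \frac{1}{n}\sum_{i=1}^{n}\left(R(x^i) - R(G(z^i))\right)^2$$

$$S_D = \frac{1}{n}\sum_{i=1}^{n}\left[-\log D(x^i) - \log\left(1 - D(G(z^i))\right)\right]$$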

Here, $S_G$ is the loss function of the generator, used to train the generator; $S_D$ is the loss function of the discriminator, used to train the discriminator; $G$ is the generator, used to generate samples; $D$ is the discriminator, used to distinguish real samples from generated samples; $R$ is the representation layer, i.e. the layer preceding the discriminator's logistic classification layer; $z^i$ is the random sequence vector input to the generator; $x^i$ denotes real sample data; and $n$ is the current number of samples.

With the loss functions defined and the hyperparameters set, the whole model is trained. The input data come from the sample set X and have the format $(x^1, x^2, \ldots, x^n)$.
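As a numerical illustration, the two losses can be computed from discriminator outputs and representation-layer vectors. This sketch assumes the C-RNN-GAN feature-matching form for the generator, $S_G = \frac{1}{n}\sum_i \lVert R(x^i) - R(G(z^i)) \rVert^2$, and the standard discriminator loss $S_D = \frac{1}{n}\sum_i [-\log D(x^i) - \log(1 - D(G(z^i)))]$; the function names and the `eps` guard against `log(0)` are illustrative additions.

```python
import math
from typing import Sequence


def generator_loss(r_real: Sequence[Sequence[float]],
                   r_fake: Sequence[Sequence[float]]) -> float:
    """Feature matching: mean squared distance between R(x^i) and R(G(z^i))."""
    n = len(r_real)
    return sum(sum((a - b) ** 2 for a, b in zip(rx, rg))
               for rx, rg in zip(r_real, r_fake)) / n


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float],
                       eps: float = 1e-12) -> float:
    """Mean of -log D(x^i) - log(1 - D(G(z^i))); eps guards against log(0)."""
    n = len(d_real)
    return sum(-math.log(max(dr, eps)) - math.log(max(1.0 - df, eps))
               for dr, df in zip(d_real, d_fake)) / n
```

When the discriminator cannot tell real from generated samples (it outputs 0.5 for both), $S_D$ equals $2\log 2$ per sample; when generated samples produce the same representation as real ones, $S_G$ is 0.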
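The training procedure itself is not spelled out beyond the loss functions and the input format, so the loop below is only a generic GAN schedule under stated assumptions: mini-batches drawn from the shuffled sample set, one uniform-noise sequence $z^i$ per sample with the same length as the sample, and one discriminator update followed by one generator update per batch. `d_step` and `g_step` are hypothetical callbacks that would apply one optimiser step on $S_D$ and $S_G$ and return the loss.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = List[float]
StepFn = Callable[[List[Sample], List[List[float]]], float]


def random_sequence(length: int, rng: random.Random) -> List[float]:
    """z^i: a random input sequence for the generator (uniform noise is an assumption)."""
    return [rng.uniform(0.0, 1.0) for _ in range(length)]


def train(samples: Sequence[Sample], d_step: StepFn, g_step: StepFn,
          epochs: int = 10, batch_size: int = 32,
          seed: int = 0) -> List[Tuple[float, float]]:
    """Alternate one discriminator and one generator update per mini-batch.

    Returns the (S_D, S_G) values reported by the callbacks for every batch.
    """
    rng = random.Random(seed)
    data = list(samples)
    history = []
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            noise = [random_sequence(len(x), rng) for x in batch]
            # The same noise batch is reused for both updates to keep the sketch short.
            loss_d = d_step(batch, noise)
            loss_g = g_step(batch, noise)
            history.append((loss_d, loss_g))
    return history
```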

Detecting suspicious domain names

After the generative adversarial network has been trained, its discriminator is extracted. The information for a suspicious domain name is then collected; it must cover every dimension of the data. Feeding it to the discriminator yields the currently calculated similarity value d.
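Because the discriminator only accepts complete samples, a guard that checks every dimension before scoring is a natural first step. In this sketch the dimension names in `REQUIRED_DIMENSIONS`, the function name `score_suspicious`, and the choice to raise `ValueError` on incomplete input are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical names for the 10 dimensions, in table order.
REQUIRED_DIMENSIONS = (
    "alexa_rank", "baidu_indexed", "sogou_indexed", "bing_indexed",
    "page_complete", "domestic_registration", "a_record", "cname",
    "cdn", "update_count",
)


def score_suspicious(raw: Dict[str, Optional[float]],
                     discriminator: Callable[[List[float]], float]) -> float:
    """Return the similarity value d, refusing samples with missing dimensions."""
    missing = [k for k in REQUIRED_DIMENSIONS if raw.get(k) is None]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    vec = [float(raw[k]) for k in REQUIRED_DIMENSIONS]
    return discriminator(vec)
```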

Adaptive threshold self-learning

Because the threshold α for classifying a domain name as malicious cannot be determined manually, it is learned by threshold self-learning according to the following formula:

$\alpha_t = \min(d, \alpha_{t-1})$, where $\alpha_t$ is the current threshold, $\alpha_{t-1}$ is the previous threshold, and $d$ is the similarity value.

A test set is used to test the trained model. For each test sample, the similarity value output by the discriminator D is compared with the previous threshold, and the smaller of the two becomes the new threshold. Repeating this self-learning yields the most reasonable malicious domain name detection threshold for the current sample set.
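The update rule $\alpha_t = \min(d, \alpha_{t-1})$ is a running minimum over the test-set similarity values. The text does not give the starting threshold $\alpha_0$; this sketch assumes 1.0, the upper bound of a sigmoid-valued discriminator output. The function names are hypothetical.

```python
from typing import Iterable, List


def threshold_trajectory(similarities: Iterable[float],
                         initial: float = 1.0) -> List[float]:
    """Return alpha_1, alpha_2, ... where alpha_t = min(d_t, alpha_{t-1})."""
    alpha = initial  # alpha_0: assumed, not given in the text
    trajectory = []
    for d in similarities:
        alpha = min(d, alpha)
        trajectory.append(alpha)
    return trajectory


def learn_threshold(similarities: Iterable[float], initial: float = 1.0) -> float:
    """Final threshold after processing all test-set similarity values."""
    trajectory = threshold_trajectory(similarities, initial)
    return trajectory[-1] if trajectory else initial
```

For example, `threshold_trajectory([0.8, 0.6, 0.7])` returns `[0.8, 0.6, 0.6]`. Because the update is a running minimum, the threshold never increases, and it ends at the smallest similarity value seen on the test set (or at the initial value, if that is smaller).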

对所公开的实施例的上述说明，使本领域专业技术人员能够实现或使用本发明。对这些实施例的多种修改对本领域的专业技术人员来说将是显而易见的，本文中所定义的一般原理可以在不脱离本发明的精神或范围的情况下，在其它实施例中实现。因此，本发明将不会被限制于本文所示的这些实施例，而是要符合与本文所公开的原理和新颖特点相一致的最宽的范围。The above description of the disclosed embodiments enables any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A malicious domain name detection method based on a deep network with adversarial training, characterized in that it comprises the following steps:

(1) Acquiring malicious domain name samples: obtaining threat intelligence from a threat intelligence platform, extracting the malicious domain names in it, and querying the relevant dimension information of each malicious domain name; according to the malicious behavior, selecting malicious domain names that fall within the network-attack category and have high confidence to form malicious domain name samples, and building a malicious domain name sample set;

wherein the relevant dimension information of a malicious domain name includes one or more of website ranking information, page index information, page integrity information, registration location information, A record information, CNAME record information, CDN usage record information, and update degree information;

(2) Training the network model: selecting the C-RNN-GAN generative adversarial network model, which comprises a generator and a discriminator, and training it using the malicious domain name sample set as input;

(3) Acquiring suspicious domain name samples: querying the relevant dimension information of a suspicious domain name to form a suspicious domain name sample;

(4) Computing the discriminator output: inputting the suspicious domain name sample to the discriminator of the trained network model to obtain the currently calculated similarity value;

(5) Judging the suspicious domain name: determining whether the similarity value is less than the current threshold; if so, the suspicious domain name is a malicious domain name, and it is treated as a malicious domain name sample and added to the malicious domain name sample set; if not, the suspicious domain name is a legitimate domain name;

wherein the threshold is updated by threshold self-learning according to:

$\alpha_t = \min(d, \alpha_{t-1})$, where $\alpha_t$ is the current threshold, $\alpha_{t-1}$ is the previous threshold, and $d$ is the similarity value.

2. The malicious domain name detection method according to claim 1, characterized in that:

Website ranking information is the Alexa website ranking;

Page index information includes the number of pages indexed by Baidu, by Sogou, and by Bing;

Page integrity information, where 0 means no information and 1 means information is present;

Registration location information, where 0 means registered abroad and 1 means registered domestically;

A record information, where 0 means no record and 1 means a record exists;

CNAME record information, where 0 means no record and 1 means a record exists;

CDN usage record information, where 0 means no usage record and 1 means a usage record exists;

Update degree information is the number of times the malicious domain name has been updated;

wherein the A record specifies the IP address corresponding to a host name or domain name; the CNAME record is an alias record, which maps multiple names to the same computer; and the CDN usage record indicates use of a content delivery network (CDN), an intelligent virtual network built on top of the existing network that lets users obtain content from nearby nodes, reducing network congestion and improving access response speed and hit rate.

3. The malicious domain name detection method according to claim 1, characterized in that the loss functions of the generator and the discriminator are as follows:

wherein $S_G$ is the loss function of the generator, used to train the generator; $S_D$ is the loss function of the discriminator, used to train the discriminator; $G$ is the generator, used to output generated samples; $D$ is the discriminator, used to distinguish real samples from generated samples; $R$ is the representation layer, i.e. the layer preceding the discriminator's logistic classification layer; $x^i$ denotes a malicious domain name sample, i.e. real sample data; $z^i$ is the random sequence vector input to the generator; and $n$ is the current number of malicious domain name samples.

4. The malicious domain name detection method according to claim 1, characterized in that the generator and the discriminator each use an LSTM (long short-term memory) network with a depth of 2.

5. A malicious domain name detection system for the malicious domain name detection method according to any one of claims 1 to 4, characterized in that it comprises:

The data acquisition module is used to obtain and screen malicious domain name samples and obtain suspicious domain name samples;

The network model adopts the C-RNN-GAN generative adversarial network model, which is used for training with malicious domain name samples as input, and then takes suspicious domain name samples as input and outputs the calculated value;

The judgment module is used to judge whether the suspicious domain name is a malicious domain name or a legitimate domain name according to the calculated value and the threshold value.