CN110830490B - Malicious domain name detection method and system based on a deep network with adversarial training
- Publication number
- CN110830490B (application number CN201911111270.9A)
- Authority
- CN
- China
- Prior art keywords
- domain name
- information
- malicious
- malicious domain
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L61/00—Network arrangements, protocols or services for addressing or naming
- H04L61/45—Network directories; Name-to-address mapping
- H04L61/4505—Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
- H04L61/4511—Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computing Systems (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
Description
Technical Field
The invention relates to the technical field of artificial intelligence and control, and in particular to a malicious domain name detection method and system based on a deep network with adversarial training.
Background
The Domain Name System (DNS) is a core component of the Internet. It maintains the mapping between domain names and IP addresses and resolves domain names into IP addresses during network communication, so that users can work with names that are easy to remember. A poorly configured DNS can cause slow connections or unreachable websites, and a malicious DNS can lead to pop-up advertising, fraud, eavesdropping, and hijacking or tampering.
In recent years, DNS security incidents have been frequent. DNS is the largest and most complex distributed database in the world. Its open, complex, and large-scale nature, the limited attention paid to security in its original design, and deliberate sabotage make it hard for DNS to cope with increasingly complex modern communication networks, and it faces serious security threats. Two common threats are DNS spoofing and distributed denial-of-service attacks. In DNS spoofing, a server returns an incorrect resolution in response to an erroneous domain name request. DNS spoofing causes many security problems, such as directing users to phishing or fraudulent websites. A Distributed Denial of Service (DDoS) attack exploits vulnerabilities in network protocols and operating systems and uses deception and disguise to exhaust a server's computing resources, so that it can no longer handle requests from legitimate users; botnets are a typical example. Finding an effective solution to DNS security problems is therefore an urgent task.
Many solutions to DNS security problems have been proposed. A common one is domain name detection, which computes an overall credibility score for a suspicious domain name and determines whether the domain name is legitimate. Domain name detection methods fall into two categories: knowledge-based and machine-learning-based. Knowledge-based methods detect suspicious domain names by computing the probability that domain names appear together. They achieve high precision but require a large amount of expert knowledge; because that knowledge is limited, recall falls short and malicious domain names are missed. Methods based on traditional machine learning require a large amount of labeled sample data and use algorithms such as clustering, support vector machines, and decision trees for computation and classification. The need for extensive manual labeling combined with algorithm tuning makes them hard to apply at scale. A new method is therefore needed that combines the strengths of both categories and compensates for their weaknesses to achieve better domain name detection.
Summary of the Invention
The purpose of the invention is to provide a malicious domain name detection method and system based on a deep network with adversarial training that effectively improves the accuracy of malicious domain name detection.
To achieve this purpose, the invention provides the following technical solution: a malicious domain name detection method based on a deep network with adversarial training, comprising the following steps:
(1) Malicious domain name sample acquisition: obtain threat intelligence from a threat intelligence platform, extract the malicious domain names it contains, and query the relevant dimension information of each malicious domain name; based on the malicious behavior, select the malicious domain names that fall within the scope of network attacks and have high confidence to form malicious domain name samples, and build a malicious domain name sample set;
(2) Network model training: select the C-RNN-GAN generative adversarial network model, which comprises a generator and a discriminator, and train it using the malicious domain name sample set as input;
(3) Suspicious domain name sample acquisition: query the relevant dimension information of a suspicious domain name to form a suspicious domain name sample;
(4) Discrimination output: input the suspicious domain name sample into the discriminator of the trained network model to obtain the currently computed similarity value;
(5) Suspicious domain name judgment: determine whether the similarity value is less than the current threshold; if so, the suspicious domain name is a malicious domain name and is added to the malicious domain name sample set as a malicious domain name sample; if not, the suspicious domain name is a legitimate domain name.
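Steps (4) and (5) can be sketched as a single function. This is a minimal illustration, not the patented implementation: the name `judge_domain`, the list-based sample set, and the callable standing in for the trained discriminator are all assumptions made for the example.

```python
from typing import Callable, List, Sequence


def judge_domain(
    sample: Sequence[float],
    discriminator: Callable[[Sequence[float]], float],
    threshold: float,
    malicious_samples: List[Sequence[float]],
) -> bool:
    """Return True if the suspicious sample is judged malicious."""
    # Step (4): the discriminator outputs the similarity value d.
    similarity = discriminator(sample)
    # Step (5): a similarity value below the current threshold means malicious.
    if similarity < threshold:
        # The newly identified sample is added back to the sample set.
        malicious_samples.append(sample)
        return True
    return False
```

Because each malicious verdict extends the sample set, a later retraining round would see the newly detected domain names as training data.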
Further, the relevant dimension information of a malicious domain name includes one or more of the following:
Website ranking information, namely the Alexa website ranking;
Page index information, including the number of pages indexed by Baidu, by Sogou, and by Bing;
Page integrity information, where 0 means no information and 1 means information is present;
Registration location information, where 0 means registered abroad and 1 means registered domestically;
A record information, where 0 means no record and 1 means a record exists;
CNAME record information, where 0 means no record and 1 means a record exists;
CDN usage information, where 0 means no usage record and 1 means a usage record exists;
Update degree information, namely the number of times the malicious domain name has been updated;
Here, the A record specifies the IP address corresponding to a host name or domain name. The CNAME record is an alias record, which maps multiple names to the same computer. The CDN usage record indicates use of a Content Delivery Network (CDN), an intelligent virtual network built on top of the existing network that lets users obtain content from a nearby node, reducing network congestion and improving response speed and hit rate.
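One way to hold these dimensions in code is a small record type that flattens into a numeric vector. The sketch below is only illustrative: the class name, the field names, and the field order are assumptions, since the text lists the dimensions but does not prescribe an encoding.

```python
from dataclasses import astuple, dataclass
from typing import List


@dataclass
class DomainDimensions:
    """Hypothetical container for the dimension information of one domain."""

    alexa_rank: int
    baidu_pages: int
    sogou_pages: int
    bing_pages: int
    page_integrity: int           # 0 = no information, 1 = information present
    registered_domestically: int  # 0 = registered abroad, 1 = domestic
    has_a_record: int             # 0 = no record, 1 = record exists
    has_cname_record: int         # 0 = no record, 1 = record exists
    uses_cdn: int                 # 0 = no usage record, 1 = usage record
    update_count: int

    def __post_init__(self) -> None:
        # The five binary dimensions may only take the values 0 or 1.
        for name in ("page_integrity", "registered_domestically",
                     "has_a_record", "has_cname_record", "uses_cdn"):
            if getattr(self, name) not in (0, 1):
                raise ValueError(f"{name} must be 0 or 1")

    def to_vector(self) -> List[float]:
        # Field order defines the position of each dimension in the vector.
        return [float(value) for value in astuple(self)]
```

Rank and page counts span several orders of magnitude, so a real pipeline would probably rescale them before training; the text does not say how.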
Further, the loss functions of the generator and the discriminator are as follows:
[Formulas for $S_G$ and $S_D$: rendered as images in the source page and not recoverable from this text.]
where $S_G$ is the loss function of the generator, used to train the generator; $S_D$ is the loss function of the discriminator, used to train the discriminator; $G$ is the generator, which produces samples; $D$ is the discriminator, which distinguishes real samples from generated samples; $R$ is the representation layer, taken from the layer immediately before the discriminator's logistic classification layer; $x_i$ denotes a malicious domain name sample, that is, data drawn from the real samples; $z_i$ is a random sequence vector used as generator input; and $n$ is the current number of malicious domain name samples.
Further, both the generator and the discriminator use a long short-term memory (LSTM) network of depth 2.
Further, the threshold is updated by threshold self-learning, according to the formula:
$a_t = \min(d, a_{t-1})$, where $a_t$ is the current threshold, $a_{t-1}$ is the previous threshold, and $d$ is the similarity value.
The invention also provides a malicious domain name detection system for carrying out the malicious domain name detection method described above, comprising:
a data acquisition module, which acquires malicious domain name samples and suspicious domain name samples;
a data preprocessing module, which screens the malicious domain name samples to form the malicious domain name sample set;
a network model, which uses the C-RNN-GAN generative adversarial network model; after being trained with malicious domain name samples as input, it takes suspicious domain name samples as input and outputs a computed value;
a judgment module, which decides from the computed value and the threshold whether a suspicious domain name is malicious or legitimate.
Owing to the above technical solution, the invention has the following advantages over the prior art. The disclosed method and system use the properties of a generative adversarial network: adversarial training yields a discriminator that computes whether a domain name is genuine or fake. The method and system suit the attack-and-defense nature of real network security work and can learn and improve on their own. The discriminator makes robust judgments based on the multi-dimensional features behind the domain name samples and can serve as the classifier for malicious domain name detection. By learning the data features behind malicious domain name samples with a generative adversarial network, the invention effectively improves the accuracy of domain name classification.
Description of the Drawings
Figure 1 is a flowchart of the malicious domain name detection method of the invention;
Figure 2 is a structural diagram of the network model of the invention;
Figure 3 is a structural diagram of the malicious domain name detection system of the invention.
Detailed Description
The invention is further described below with reference to its principles, the drawings, and the embodiments.
To overcome the shortcomings of existing malicious domain name detection methods and effectively improve detection accuracy, the invention uses the properties of a generative adversarial network: adversarial training yields a discriminator that computes whether data are genuine or fake. The discriminator makes robust judgments based on the multi-dimensional features behind the data samples and can serve as the classifier for malicious domain name detection. By learning the data features behind malicious samples with a generative adversarial network, the invention effectively improves the accuracy of data classification.
Referring to Figures 1 to 3, a malicious domain name detection method based on a deep network with adversarial training comprises the following steps:
(1) Malicious domain name sample acquisition: obtain threat intelligence from a threat intelligence platform, extract the malicious domain names it contains, and query the relevant dimension information of each malicious domain name; based on the malicious behavior, select the malicious domain names that fall within the scope of network attacks and have high confidence to form malicious domain name samples, and build a malicious domain name sample set;
(2) Network model training: select the C-RNN-GAN generative adversarial network model, which comprises a generator and a discriminator, and train it using the malicious domain name sample set as input;
(3) Suspicious domain name sample acquisition: query the relevant dimension information of a suspicious domain name to form a suspicious domain name sample;
(4) Discrimination output: input the suspicious domain name sample into the discriminator of the trained network model to obtain the currently computed similarity value;
(5) Suspicious domain name judgment: determine whether the similarity value is less than the current threshold; if so, the suspicious domain name is a malicious domain name and is added to the malicious domain name sample set as a malicious domain name sample; if not, the suspicious domain name is a legitimate domain name.
In a preferred implementation of this embodiment, the relevant dimension information of a malicious domain name includes one or more of the following:
Website ranking information, namely the Alexa website ranking;
Page index information, including the number of pages indexed by Baidu, by Sogou, and by Bing;
Page integrity information, where 0 means no information and 1 means information is present;
Registration location information, where 0 means registered abroad and 1 means registered domestically;
A record information, where 0 means no record and 1 means a record exists;
CNAME record information, where 0 means no record and 1 means a record exists;
CDN usage information, where 0 means no usage record and 1 means a usage record exists;
Update degree information, namely the number of times the malicious domain name has been updated;
Here, the A record specifies the IP address corresponding to a host name or domain name. The CNAME record is an alias record, which maps multiple names to the same computer. The CDN usage record indicates use of a Content Delivery Network (CDN), an intelligent virtual network built on top of the existing network that lets users obtain content from a nearby node, reducing network congestion and improving response speed and hit rate.
In a preferred implementation of this embodiment, the loss functions of the generator and the discriminator are as follows:
[Formulas for $S_G$ and $S_D$: rendered as images in the source page and not recoverable from this text.]
where $S_G$ is the loss function of the generator, used to train the generator; $S_D$ is the loss function of the discriminator, used to train the discriminator; $G$ is the generator, which produces samples; $D$ is the discriminator, which distinguishes real samples from generated samples; $R$ is the representation layer, taken from the layer immediately before the discriminator's logistic classification layer; $x_i$ denotes a malicious domain name sample, that is, data drawn from the real samples; $z_i$ is a random sequence vector used as generator input; and $n$ is the current number of malicious domain name samples.
In a preferred implementation of this embodiment, both the generator and the discriminator use a long short-term memory (LSTM) network of depth 2.
In a preferred implementation of this embodiment, the threshold is updated by threshold self-learning, according to the formula:
$a_t = \min(d, a_{t-1})$, where $a_t$ is the current threshold, $a_{t-1}$ is the previous threshold, and $d$ is the similarity value.
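The invention also provides a malicious domain name detection system for carrying out the malicious domain name detection method described above, comprising:
a data acquisition module 10, which acquires and screens malicious domain name samples and suspicious domain name samples;
a network model 20, which uses the C-RNN-GAN generative adversarial network model; after being trained with malicious domain name samples as input, it takes suspicious domain name samples as input and outputs a computed value;
a judgment module 30, which decides from the computed value and the threshold whether a suspicious domain name is malicious or legitimate.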
Each step of the malicious domain name detection method is explained in detail below:
Acquiring data and its dimension information
Threat intelligence is obtained from a threat intelligence platform. It contains many kinds of information, of which domain name information is one of the core items. The information related to malicious domain names is extracted from the existing threat intelligence to obtain a malicious domain name sample library. For the domain names collected in the library, the following are queried: the Alexa website ranking, a relatively authoritative measure of website traffic (if no ranking can be found for a domain name, a fixed value is recorded); the Baidu and Sogou index information, which indicates how many of the site's pages each search engine has indexed (if no information can be found, the value of that dimension is set to 0); the Bing index information; the completeness of the website; and other data. See the table below for detailed dimension information.
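The two fallback rules for missing lookups can be sketched as below. The field names and the concrete fixed value `MISSING_ALEXA_RANK` are hypothetical, since the text only says that "a fixed value" is recorded. Treating a missing Bing count like the Baidu and Sogou counts is also an assumption.

```python
from typing import Dict, Optional

# Hypothetical placeholder; the text does not state which fixed value is used.
MISSING_ALEXA_RANK = 10_000_000

# Index-count dimensions that fall back to 0 (Bing included by assumption).
INDEX_FIELDS = ("baidu_pages", "sogou_pages", "bing_pages")


def fill_missing(raw: Dict[str, Optional[int]]) -> Dict[str, int]:
    """Replace failed lookups (None) with the fallback values."""
    filled: Dict[str, int] = {}
    for name, value in raw.items():
        if value is not None:
            filled[name] = value
        elif name == "alexa_rank":
            filled[name] = MISSING_ALEXA_RANK
        elif name in INDEX_FIELDS:
            filled[name] = 0
        else:
            # No fallback is described for other dimensions, so fail loudly.
            raise ValueError(f"no fallback defined for missing {name}")
    return filled
```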
Data cleaning
Malicious behavior takes many forms, including spreading malware, sending spam, fraud, and phishing. Its definition differs across security levels. For example, sending spam is normally defined as malicious behavior, but at a lower security level spam may not count as such. The malicious behaviors and malicious domain names in the threat intelligence therefore need to be screened, selecting in particular the malicious domain name samples that fall within the scope of network attacks and have high confidence, to build the malicious domain name sample library. The resulting domain name list, together with the dimension information of each domain name, forms the sample set X used to train the generative adversarial network.
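The screening step amounts to a filter over threat intelligence records. The sketch below assumes a hypothetical record shape (`IntelRecord`) with a behavior label and a confidence score in [0, 1]; the set of in-scope behaviors and the confidence cut-off are parameters because the text says they depend on the security level.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class IntelRecord:
    """Hypothetical shape of one threat intelligence entry."""

    domain: str
    behavior: str      # e.g. "malware", "spam", "fraud", "phishing"
    confidence: float  # assumed to lie in [0, 1]


def select_samples(
    records: Iterable[IntelRecord],
    in_scope: Set[str],
    min_confidence: float,
) -> List[str]:
    """Keep in-scope, high-confidence domains, without duplicates."""
    selected: List[str] = []
    seen: Set[str] = set()
    for record in records:
        if (record.behavior in in_scope
                and record.confidence >= min_confidence
                and record.domain not in seen):
            seen.add(record.domain)
            selected.append(record.domain)
    return selected
```

Leaving `"spam"` out of `in_scope` reproduces the example in the text, where spam does not count as malicious at a lower security level.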
Building the generative adversarial network
The generative adversarial network uses the structure of continuous recurrent neural networks with adversarial training (C-RNN-GAN), a deep recurrent generative adversarial network trained adversarially. Following the adversarial idea, a generator G and a discriminator D are built separately. G tries to generate sample data that are as close as possible to the real sample data x, while D tries to tell generated data from real data. For the malicious domain name information sample set, the generator and the discriminator each use a long short-term memory (LSTM) network of depth 2 to process the discrete real data in the sample set and learn the features of the real sample data.
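To make "an LSTM of depth 2" concrete, the following forward-pass sketch stacks two LSTM layers and adds a logistic output. It is a simplified illustration under stated assumptions: the weights are random and untrained, the pass is unidirectional, the class names and hidden size are invented for the example, and nothing here reproduces the patent's actual network or its training.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class LSTMCell:
    """One LSTM layer with randomly initialised, untrained weights."""

    def __init__(self, input_size: int, hidden_size: int, rng: random.Random) -> None:
        self.hidden_size = hidden_size
        width = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, candidate, output.
        self.weights = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                   for _ in range(hidden_size)]
            for gate in "ifgo"
        }
        self.biases = {gate: [0.0] * hidden_size for gate in "ifgo"}

    def _gate(self, gate: str, xh: Vector, act: Callable[[float], float]) -> Vector:
        return [act(sum(w * v for w, v in zip(row, xh)) + b)
                for row, b in zip(self.weights[gate], self.biases[gate])]

    def step(self, x: Sequence[float], h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        xh = list(x) + h
        i = self._gate("i", xh, sigmoid)
        f = self._gate("f", xh, sigmoid)
        g = self._gate("g", xh, math.tanh)
        o = self._gate("o", xh, sigmoid)
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


class Discriminator:
    """Two stacked LSTM layers followed by a logistic output unit."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.layers = [LSTMCell(input_size, hidden_size, rng),
                       LSTMCell(hidden_size, hidden_size, rng)]
        self.out_weights = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]

    def score(self, sequence: Sequence[Sequence[float]]) -> float:
        states = [([0.0] * layer.hidden_size, [0.0] * layer.hidden_size)
                  for layer in self.layers]
        for x in sequence:
            layer_input = list(x)
            for k, layer in enumerate(self.layers):
                h, c = layer.step(layer_input, *states[k])
                states[k] = (h, c)
                layer_input = h  # output of layer 1 feeds layer 2
        last_hidden = states[-1][0]
        return sigmoid(sum(w * v for w, v in zip(self.out_weights, last_hidden)))
```

In practice a deep learning framework would supply the LSTM layers and the gradient-based training; the pure-Python version only shows how data flows through the two layers to a single score.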
Training the generative adversarial network
The loss functions are defined according to the chosen model structure. Because the model follows the generative adversarial idea and uses deep recurrent networks as the generator G and the discriminator D, the loss functions are defined as follows:
[Formulas for $S_G$ and $S_D$: rendered as images in the source page and not recoverable from this text.]
where $S_G$ is the loss function of the generator, used to train the generator; $S_D$ is the loss function of the discriminator, used to train the discriminator; $G$ is the generator, which produces samples; $D$ is the discriminator, which distinguishes real samples from generated samples; $R$ is the representation layer, taken from the layer immediately before the discriminator's logistic classification layer; $z_i$ is a random sequence vector used as generator input; $x_i$ denotes data drawn from the real samples; and $n$ is the current number of samples.
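The two formulas themselves do not survive in this text. Assuming they follow the losses of the published C-RNN-GAN model with feature matching, which would be consistent with the symbols defined here (in particular the representation layer $R$), they would take the following form. This is a reconstruction under that assumption, not a verified copy of the patent's formulas.

```latex
S_G = \frac{1}{n} \sum_{i=1}^{n} \bigl( R(x_i) - R(G(z_i)) \bigr)^2

S_D = \frac{1}{n} \sum_{i=1}^{n} \Bigl[ -\log D(x_i) - \log\bigl( 1 - D(G(z_i)) \bigr) \Bigr]
```

Under this reading, the generator is trained to make the discriminator's internal representation of generated samples match that of real samples, while the discriminator is trained with the usual binary cross-entropy on real versus generated inputs.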
With the loss functions defined, the hyperparameters are set and the whole model is trained. The input data comes from the sample set X, in the format $(x_1, x_2, \ldots, x_n)$.
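As an illustration of what the training loop would evaluate on each batch, the sketch below computes a discriminator loss and a generator loss from precomputed network outputs. It assumes the standard GAN cross-entropy for the discriminator and a feature-matching squared distance on the representation layer R for the generator, the form used in the published C-RNN-GAN model. The patent's own formulas are not legible in this text, so both functions are assumptions, as are their names.

```python
import math
from typing import Sequence

EPS = 1e-12  # guards against log(0)


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Mean of -log D(x_i) - log(1 - D(G(z_i))) over the batch.

    d_real holds D(x_i) for real samples; d_fake holds D(G(z_i)).
    """
    n = len(d_real)
    total = 0.0
    for real, fake in zip(d_real, d_fake):
        total += -math.log(max(real, EPS)) - math.log(max(1.0 - fake, EPS))
    return total / n


def generator_loss(
    r_real: Sequence[Sequence[float]],
    r_fake: Sequence[Sequence[float]],
) -> float:
    """Mean squared distance between R(x_i) and R(G(z_i)) over the batch."""
    n = len(r_real)
    total = 0.0
    for real_features, fake_features in zip(r_real, r_fake):
        total += sum((a - b) ** 2 for a, b in zip(real_features, fake_features))
    return total / n
```

A discriminator that outputs 0.5 for everything gives a loss of 2 ln 2, and the generator loss reaches 0 only when the representations of generated and real samples coincide.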
Detecting suspicious domain names
After the generative adversarial network has been trained, its discriminator is extracted. The relevant information of the suspicious domain name is collected, and it must cover every dimension of the data. This information is input into the discriminator, which outputs the currently computed similarity value d.
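The requirement that a suspicious sample carry every dimension can be enforced before the discriminator is called. In this sketch the dimension names in `REQUIRED_DIMENSIONS` and the function name are hypothetical labels chosen for the example.

```python
from typing import List, Mapping

# Hypothetical names for the dimensions described in the text.
REQUIRED_DIMENSIONS = (
    "alexa_rank", "baidu_pages", "sogou_pages", "bing_pages",
    "page_integrity", "registered_domestically", "has_a_record",
    "has_cname_record", "uses_cdn", "update_count",
)


def build_suspicious_sample(info: Mapping[str, float]) -> List[float]:
    """Return the sample vector, or fail if any dimension is missing."""
    missing = [name for name in REQUIRED_DIMENSIONS if name not in info]
    if missing:
        raise ValueError("missing dimensions: " + ", ".join(missing))
    # Fixed ordering keeps the vector aligned with the training samples.
    return [float(info[name]) for name in REQUIRED_DIMENSIONS]
```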
Adaptive threshold learning
Because the threshold α for classifying a domain name as malicious cannot be set manually, threshold self-learning is used. The formula is:
$a_t = \min(d, a_{t-1})$, where $a_t$ is the current threshold, $a_{t-1}$ is the previous threshold, and $d$ is the similarity value.
A test set is selected to test the trained model. For each test sample, the similarity value output by the discriminator D is compared with the previous threshold, and the smaller of the two becomes the new threshold. Repeating this self-learning step yields the most reasonable malicious domain name detection threshold for the current sample set.
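The update rule is a running minimum over the similarity values of the test samples. A minimal sketch follows; the starting threshold is an assumption, because the text does not say how the first threshold is initialised.

```python
from typing import Iterable


def learn_threshold(similarities: Iterable[float], initial: float = 1.0) -> float:
    """Apply a_t = min(d, a_{t-1}) once per test sample."""
    threshold = initial  # assumed starting value; not specified in the text
    for d in similarities:
        threshold = min(d, threshold)
    return threshold
```

For the similarity values 0.7, 0.4, 0.9 and a starting threshold of 1.0, the threshold moves to 0.7, then 0.4, and stays at 0.4. Since the rule can only lower the threshold, the result is the smallest similarity seen so far, capped by the starting value.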
The above description of the disclosed embodiments enables those skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. The invention is therefore not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911111270.9A CN110830490B (en) | 2019-11-14 | 2019-11-14 | Malicious domain name detection method and system based on area confrontation training deep network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911111270.9A CN110830490B (en) | 2019-11-14 | 2019-11-14 | Malicious domain name detection method and system based on area confrontation training deep network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110830490A CN110830490A (en) | 2020-02-21 |
CN110830490B CN110830490B (en) | 2022-08-02 |
Family
ID=69555004
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911111270.9A Active CN110830490B (en) | 2019-11-14 | 2019-11-14 | Malicious domain name detection method and system based on area confrontation training deep network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110830490B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114205095B (en) * | 2020-08-27 | 2023-08-18 | 极客信安(北京)科技有限公司 | Method and device for detecting encrypted malicious traffic |
CN112217787B (en) * | 2020-08-31 | 2022-11-04 | 北京工业大学 | Method and system for generating mock domain name training data based on ED-GAN |
GB2612008B (en) | 2021-07-06 | 2023-12-27 | British Telecomm | Malware protection |
CN114048836A (en) * | 2021-10-11 | 2022-02-15 | 北京天融信网络安全技术有限公司 | DNS tunnel traffic simulation method, device and detection method |
CN114006752A (en) * | 2021-10-29 | 2022-02-01 | 中电福富信息科技有限公司 | DGA domain name threat detection system based on GAN compression algorithm and training method thereof |
CN114095212B (en) * | 2021-10-29 | 2023-09-01 | 北京天融信网络安全技术有限公司 | Method and device for countertraining DGA domain name detection model |
CN114726823B (en) * | 2022-05-18 | 2022-08-30 | 北京金睛云华科技有限公司 | Domain name generation method, device and equipment based on generation countermeasure network |
CN115022001B (en) * | 2022-05-27 | 2023-05-09 | 中国电子信息产业集团有限公司第六研究所 | Training method and device of domain name recognition model, electronic equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110012019A (en) * | 2019-04-11 | 2019-07-12 | 鸿秦(北京)科技有限公司 | A kind of network inbreak detection method and device based on confrontation model |
CN110363243A (en) * | 2019-07-12 | 2019-10-22 | 腾讯科技(深圳)有限公司 | The appraisal procedure and device of disaggregated model |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106022132A (en) * | 2016-05-30 | 2016-10-12 | 南京邮电大学 | Real-time webpage Trojan detection method based on dynamic content analysis |
US10592779B2 (en) * | 2017-12-21 | 2020-03-17 | International Business Machines Corporation | Generative adversarial network medical image generation for training of a classifier |
CN108322349B (en) * | 2018-02-11 | 2021-04-06 | 浙江工业大学 | Deep learning adversity attack defense method based on adversity type generation network |
CN109584221B (en) * | 2018-11-16 | 2020-07-28 | 聚时科技(上海)有限公司 | Abnormal image detection method based on supervised generation countermeasure network |
CN110362997B (en) * | 2019-06-04 | 2023-01-17 | 广东工业大学 | A Malicious URL Oversampling Method Based on Generative Adversarial Networks |
CN110210226A (en) * | 2019-06-06 | 2019-09-06 | 深信服科技股份有限公司 | A kind of malicious file detection method, system, equipment and computer storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110830490A (en) | 2020-02-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110830490B (en) | Malicious domain name detection method and system based on area confrontation training deep network | |
CN106713371B (en) | Fast Flux botnet detection method based on DNS abnormal mining | |
CN112910929B (en) | Method and device for malicious domain name detection based on heterogeneous graph representation learning | |
CN109600363B (en) | Internet of things terminal network portrait and abnormal network access behavior detection method | |
CN111131260B (en) | Mass network malicious domain name identification and classification method and system | |
CN106375345B (en) | A method and system for detecting malware domain names based on periodic detection | |
CN105827594B (en) | A kind of dubiety detection method based on domain name readability and domain name mapping behavior | |
US20140101759A1 (en) | Method and system for detecting malware | |
CN102685145A (en) | Domain name server (DNS) data packet-based bot-net domain name discovery method | |
US20140047543A1 (en) | Apparatus and method for detecting http botnet based on densities of web transactions | |
CN107046586B (en) | A kind of algorithm generation domain name detection method based on natural language feature | |
CN108023868B (en) | Malicious resource address detection method and device | |
CN110336789A (en) | Domain-flux Botnet Detection Method Based on Hybrid Learning | |
CN104579773A (en) | Domain name system analysis method and device | |
EP4402862A1 (en) | Malicious homoglyphic domain name detection, generation, and associated cyber security applications | |
CN110650156B (en) | Method and device for clustering relationships of network entities and method for identifying network events | |
CN114050912B (en) | Malicious domain name detection method and device based on deep reinforcement learning | |
CN103457909A (en) | Botnet detection method and device | |
Lei et al. | Detecting malicious domains with behavioral modeling and graph embedding | |
CN113746804B (en) | DNS hidden channel detection method, device, equipment and storage medium | |
Wang et al. | Alert correlation system with automatic extraction of attack strategies by using dynamic feature weights | |
CN110650157B (en) | Fast-flux domain name detection method based on ensemble learning | |
CN117354024A (en) | DNS malicious domain name detection system and method based on big data | |
Shen et al. | Deep Learning powered adversarial sample attack approach for security detection of DGA domain name in cyber physical systems | |
Xuanzhen et al. | Application of passive DNS in cyber security |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |
 | TR01 | Transfer of patent right | Effective date of registration: 20231122. Address after: 214000, No. 1-5 Xueqian East Road, Liangxi District, Wuxi City, Jiangsu Province. Patentee after: Wuxi Fushi Printing and Painting Education Technology Co.,Ltd. Address before: 215168, No. 1188, Wuzhong Avenue, Wuzhong District, Suzhou City, Jiangsu Province. Patentee before: SOOCHOW University |