CN115378856B - Communication detection method, device and storage medium

Info

Publication number: CN115378856B
Application number: CN202210973887.7A
Authority: CN
Inventors: 吴嘉澍; 王洋; 须成忠; 叶可江
Original assignee: Shenzhen Institute of Advanced Technology of CAS
Current assignee: Shenzhen Institute of Advanced Technology of CAS
Priority date: 2022-08-15
Filing date: 2022-08-15
Publication date: 2023-07-14
Anticipated expiration: 2042-08-15
Also published as: CN115378856A

Abstract

The invention discloses a communication detection method, a device and a storage medium. According to a communication detection task, the method uses the similarity between data to select, from network devices, information sources that assist a target to be tested in communication detection, where the network devices comprise the devices in the Internet of Things and in its network center. The communication data of the target to be tested and the communication data of the information sources are then input into a communication detection model, which yields a communication detection result of the target to be tested for the communication category. The communication detection model is trained based on a first loss function that characterizes the distribution information and importance of each information source and a second loss function that characterizes the spatial clustering of the target to be tested and the information sources. By using the information carried by the information sources as an aid, the invention lets the information sources and the target to be tested complement each other's information, which improves the accuracy of detection on the communication data of the target to be tested.

Description

Communication detection method, equipment and storage medium

Technical Field

The present application relates to the technical field of the Internet of Things, and in particular to a communication detection method, apparatus, device and storage medium.

Background Art

With the advancement of Internet of Things (IoT) technology, more and more IoT devices are being used in daily production and life. These devices have enabled transformation and breakthroughs in many important fields: smart road network and Internet of Vehicles devices support smart cities and smart transportation, and smart medical IoT devices support more precise and more humane smart healthcare. Because the computing power of IoT devices is relatively weak, their use often requires the cooperation of a network center, which can take on large-scale data computing, storage and other tasks that IoT devices cannot easily complete.

Both the Internet of Things and its network center involve a large amount of network communication, so detecting the communication of the Internet of Things and its network center is essential. Such detection can take many forms. Detection of communication content can be used for content statistics and usage analysis of the communication of the Internet of Things and its network center. Detection of communication security can be used to determine which communications are normal and which are abnormal, thereby ensuring the security of IoT devices and their network center. An accurate and efficient communication detection method for the Internet of Things and its network center is therefore needed; it can be used to analyze their usage patterns and to supervise and monitor their security, among other purposes, so that the Internet of Things and its network center run efficiently and reliably.

However, the communication handled by one IoT device differs from that handled by another, and the communication handled by the Internet of Things differs from that handled by its network center. For example, the communication of IoT devices is mostly data transmission, whereas the communication of the network center leans more towards computation and storage. The communication seen by different IoT devices and by the network center therefore each has its own strengths, but none of it is fully comprehensive. For example, for a given group of IoT devices, if the communication data captured by the party to be tested is not rich enough, the detected result deviates considerably from the actual result, which degrades communication detection.

Summary of the Invention

In view of this, the present application provides a communication detection method, apparatus, device and storage medium to solve the problem that existing communication detection for network devices, such as those in the Internet of Things and its network center, performs poorly.

To solve the above technical problem, one technical solution adopted by the present application is to provide a communication detection method, comprising: receiving a communication detection task, the communication detection task including a target to be tested, a communication category, and the network device in which the target to be tested is located; selecting information sources from other targets based on the similarity between first communication data of the target to be tested in the communication category and second communication data of the other targets in the network device in the communication category; and inputting the first communication data and third communication data of the information sources into a communication detection model for communication detection, to obtain a communication detection result of the target to be tested for the communication category. The communication detection model includes a feature extraction network and a global public detector, which are trained based on a first loss function characterizing the distribution information and importance of each information source and a second loss function characterizing the spatial clustering of the target to be tested and the information sources.

As a further improvement of the present application, selecting information sources from the other targets based on the similarity between the first communication data of the target to be tested in the communication category and the second communication data of the other targets in the network device in the communication category comprises: obtaining the first communication data of the target to be tested and the second communication data of the other targets, and using the feature extraction network to extract a first communication vector from the first communication data and a second communication vector from the second communication data; calculating a statistic from a first data distribution of the first communication data over the communication categories and a second data distribution of the second communication data over the communication categories; sorting all the other targets according to the statistic to obtain a distribution ranking; calculating the Euclidean distance between the first mean vector of the first communication vectors in each communication category and the second mean vector of each other target's second communication vectors in the same category, and averaging over the categories to obtain an average Euclidean distance; sorting all the other targets according to the average Euclidean distance to obtain a Euclidean ranking; and determining a final ranking of all the other targets from the distribution ranking and the Euclidean ranking, and selecting a first preset number of the other targets as information sources according to the final ranking.
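The selection step can be illustrated with a short Python sketch. The text does not state how the distribution ranking and the Euclidean ranking are merged into the final ranking, so the rank-sum rule below is an assumption, as are the function names, the device names and the convention that a smaller statistic or distance ranks first.

```python
def rank_positions(scores):
    """Map each candidate to its 0-based position when sorted by ascending score."""
    ordered = sorted(scores, key=lambda name: (scores[name], name))
    return {name: pos for pos, name in enumerate(ordered)}


def select_sources(chi2_stats, avg_distances, count):
    """Pick `count` candidates by combined rank (ASSUMED rule: sum of both ranks)."""
    dist_rank = rank_positions(chi2_stats)      # distribution ranking
    eucl_rank = rank_positions(avg_distances)   # Euclidean ranking
    combined = {name: dist_rank[name] + eucl_rank[name] for name in chi2_stats}
    # Ties in the combined rank are broken alphabetically (also an assumption).
    final = sorted(combined, key=lambda name: (combined[name], name))
    return final[:count]


# Hypothetical candidates with made-up statistics and distances.
chi2_stats = {"camera": 4.7, "gateway": 0.9, "center": 2.1}
avg_distances = {"camera": 1.8, "gateway": 0.6, "center": 0.4}
selected = select_sources(chi2_stats, avg_distances, 2)
```

Here `count` plays the role of the first preset number; any other monotone way of merging the two rankings would fit the description equally well.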

As a further improvement of the present application, training the communication detection model specifically comprises: inputting first communication sample data of the target to be tested and second communication sample data of the information sources into the feature extraction network respectively, to obtain first communication sample vectors and second communication sample vectors; calculating a weight for each information source from the first communication sample vectors and the second communication sample vectors; inputting the first communication sample vectors and the second communication sample vectors into the global public detector respectively for distribution prediction, to obtain a first predicted distribution vector of the target to be tested over the communication categories and a second predicted distribution vector of each information source over the communication categories; calculating a loss function value from the first predicted distribution vector, the second predicted distribution vector, the weights, the first communication sample vectors, the second communication sample vectors, the first loss function and the second loss function; and iteratively training the feature extraction network and the global public detector according to the loss function value and a preset optimization algorithm.

As a further improvement of the present application, calculating the weight of each information source from the first communication sample vectors and the second communication sample vectors comprises: calculating the Euclidean distance between the first mean vector of the first communication sample vectors in each communication category and the second mean vector of each information source's second communication sample vectors in the same category, and averaging over the categories to obtain an average Euclidean distance; and calculating the weight of each information source from its average Euclidean distance. The weight is given by a formula (not reproduced in this text) in which ω_j denotes the weight of the j-th information source and dist_j denotes the average Euclidean distance between the j-th information source and the target to be tested.
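The distance and weight computation can be sketched in Python as follows. The per-category mean vectors and the averaged Euclidean distance follow the description; the weight formula itself is not available in this text, so `source_weights` uses a softmax over negative distances purely as an assumed stand-in that gives closer sources larger weights. All function names are hypothetical.

```python
import math


def class_means(vectors, labels):
    """Average the feature vectors of each communication category."""
    sums, counts = {}, {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def average_class_distance(means_a, means_b):
    """Mean Euclidean distance between class means over the shared categories."""
    shared = sorted(set(means_a) & set(means_b))
    return sum(math.dist(means_a[c], means_b[c]) for c in shared) / len(shared)


def source_weights(distances):
    """ASSUMED weighting: softmax over negative distances (closer => larger weight)."""
    exps = [math.exp(-d) for d in distances]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up feature vectors for the target and one information source.
target_means = class_means([[0.0, 0.0], [2.0, 0.0], [1.0, 1.0]], ["A", "A", "B"])
source_means = class_means([[1.0, 3.0], [1.0, 5.0]], ["A", "B"])
dist_j = average_class_distance(target_means, source_means)
weights = source_weights([0.5, dist_j])
```

The sketch assumes that the target and each source share at least one communication category; otherwise the average is undefined.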

As a further improvement of the present application, calculating the first loss function value of the first loss function comprises: calculating a first predicted distribution mean vector from the first predicted distribution vector; calculating the KL divergence between each information source and the target to be tested from the first predicted distribution mean vector and the second predicted distribution vector; and calculating the first loss function value from the first predicted distribution mean vector, the KL divergence and the weights.

As a further improvement of the present application, the first predicted distribution mean vector is computed from the first predicted distribution vector by a formula (not reproduced in this text), in which |x^(k)| denotes the number of communication data items belonging to the k-th communication category, C() denotes the global public detector, f(x) denotes the feature extraction network, and T denotes the temperature parameter;
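Because the formula is not reproduced here, the following Python sketch only illustrates one reading that is consistent with the listed symbols. It assumes that the first predicted distribution vector is the temperature-scaled softmax of the detector output C(f(x)), and that the mean vector averages these distributions over the |x^(k)| samples of one category. The function names and the logit values are hypothetical.

```python
import math


def softened_softmax(logits, temperature):
    """Softmax of logits / T; a larger T gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mean_distribution(logit_rows, temperature):
    """Average the softened distributions of all samples of one category."""
    rows = [softened_softmax(row, temperature) for row in logit_rows]
    return [sum(col) / len(rows) for col in zip(*rows)]


# Made-up detector outputs C(f(x)) for two samples of the same category.
category_logits = [[2.0, 0.5, 0.1], [1.5, 1.0, 0.2]]
sharp = mean_distribution(category_logits, 1.0)
smooth = mean_distribution(category_logits, 4.0)
```

Raising the temperature spreads probability mass over the other categories, which is the usual reason for including such a parameter.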

The KL divergence is computed by a formula (not reproduced in this text) that takes the second predicted distribution vector as input and yields the KL divergence between the distribution information of the j-th information source and that of the target to be tested;
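For reference, the standard KL divergence between two discrete distributions can be sketched in Python as below. Which distribution serves as the reference is not recoverable from this text, so the call that puts the target's mean distribution first is an assumption, as are the function name and the sample values.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same categories."""
    # eps guards against division by zero when q has empty categories.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


target_mean = [0.7, 0.2, 0.1]   # hypothetical first predicted distribution mean vector
source_pred = [0.5, 0.3, 0.2]   # hypothetical second predicted distribution vector
gap = kl_divergence(target_mean, source_pred)
```

The divergence is zero when the two distributions coincide and grows as the source's predictions drift away from the target's.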

The first loss function value is computed by a formula (not reproduced in this text), where L_I is the first loss function value, k denotes the number of communication categories, n denotes the number of information sources, ω_j denotes the weight of the j-th information source, |χ_D| denotes the number of first communication sample data items, L_ce() denotes the cross-entropy loss, and y_i denotes the communication detection category label.

As a further improvement of the present application, calculating the second loss function value of the second loss function comprises: merging the second communication sample data of all information sources into third communication sample data, and merging the first communication sample data of the target to be tested with the second communication sample data of all information sources into fourth communication sample data; using the feature extraction network to extract third communication sample vectors and fourth communication sample vectors from the third communication sample data and the fourth communication sample data respectively; calculating the first mean vector, the third mean vector and the fourth mean vector of the first, third and fourth communication sample vectors in each communication category; calculating the pairwise Euclidean distances among the first, third and fourth mean vectors and summing them to obtain a first Euclidean distance loss function value; using a clustering algorithm to select a second preset number of first feature points from the first communication sample vectors of each communication category and a second preset number of second feature points from the third communication sample vectors of each communication category; calculating the pairwise Euclidean distances between the first feature points and the second feature points and averaging them to obtain a second Euclidean distance loss function value; and calculating the second loss function value from the first Euclidean distance loss function value and the second Euclidean distance loss function value.

As a further improvement of the present application, the first Euclidean distance loss function value is computed by a formula (not reproduced in this text), where L_G denotes the first Euclidean distance loss function value and k denotes the number of communication categories; the remaining symbols of the formula denote the first mean vector, the third mean vector, the fourth mean vector, and the Euclidean distance;
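A minimal Python sketch of this global term, following the verbal description (pairwise Euclidean distances among the three per-category mean vectors, summed over categories), is given below. Any normalisation constant in the original formula is unknown and therefore omitted; the function name, the dictionary layout and the values are assumptions.

```python
import math
from itertools import combinations


def global_distance_loss(target_means, source_means, merged_means):
    """Sum, over categories, the three pairwise distances between the mean vectors."""
    total = 0.0
    for category in target_means:
        triple = (target_means[category], source_means[category], merged_means[category])
        total += sum(math.dist(a, b) for a, b in combinations(triple, 2))
    return total


# Hypothetical first, third and fourth mean vectors for two categories.
target_means = {"A": [0.0, 0.0], "B": [1.0, 1.0]}
source_means = {"A": [3.0, 4.0], "B": [1.0, 1.0]}
merged_means = {"A": [1.5, 2.0], "B": [1.0, 1.0]}
loss_g = global_distance_loss(target_means, source_means, merged_means)
```

Minimising this value pulls the per-category centres of the target, the sources and their union together in feature space.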

The second Euclidean distance loss function value is computed by a formula (not reproduced in this text), where L_L denotes the second Euclidean distance loss function value and R denotes the second preset number. The formula is written in terms of the n-th of the R first feature points of the i-th communication category selected from the first communication sample vectors, and the m-th of the R second feature points of the i-th communication category selected from the third communication sample vectors.
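The local term can be sketched in Python in the same spirit. The sketch takes the R feature points per category as given (the clustering step that selects them is not shown) and averages the Euclidean distance over all pairs of first and second feature points. Averaging the per-category results over categories is an assumption, as are the function name and the point values.

```python
import math


def local_distance_loss(target_points, source_points):
    """Mean Euclidean distance over all (target, source) feature point pairs,
    computed per category and then averaged over categories (ASSUMED)."""
    per_category = []
    for category, t_points in target_points.items():
        s_points = source_points[category]
        pair_sum = sum(math.dist(t, s) for t in t_points for s in s_points)
        per_category.append(pair_sum / (len(t_points) * len(s_points)))
    return sum(per_category) / len(per_category)


# R = 2 representative points per category, e.g. cluster centres (hypothetical values).
target_points = {"A": [[0.0, 0.0], [0.0, 2.0]]}
source_points = {"A": [[3.0, 4.0], [0.0, 0.0]]}
loss_l = local_distance_loss(target_points, source_points)
```

Whereas the global term compares only category centres, this term compares several representative points per category and is therefore sensitive to the shape of each category's cluster.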

To solve the above technical problem, another technical solution adopted by the present application is to provide a communication detection apparatus, comprising: a receiving module configured to receive a communication detection task, the communication detection task including a target to be tested, a communication category, and the network device in which the target to be tested is located; a selection module configured to select information sources from other targets based on the similarity between first communication data of the target to be tested in the communication category and second communication data of the other targets in the network device in the communication category; and a prediction module configured to input the first communication data and third communication data of the information sources into a communication detection model for communication detection, to obtain a communication detection result of the target to be tested for the communication category. The communication detection model includes a feature extraction network and a global public detector, which are trained based on a first loss function characterizing the distribution information and importance of each information source and a second loss function characterizing the spatial clustering of the target to be tested and the information sources.

To solve the above technical problem, yet another technical solution adopted by the present application is to provide a computer device comprising a processor and a memory coupled to the processor, the memory storing program instructions which, when executed by the processor, cause the processor to perform the steps of the communication detection method according to any one of the above.

To solve the above technical problem, yet another technical solution adopted by the present application is to provide a storage medium storing program instructions capable of implementing the communication detection method according to any one of the above.

The beneficial effects of the present application are as follows. According to a communication detection task, the communication detection method of the present application uses the similarity between data to select, from network devices, information sources that assist the target to be tested in communication detection, where the network devices comprise the devices in the Internet of Things and its network center. The communication data of the target to be tested and the communication data of the information sources are then input into a communication detection model to obtain a communication detection result of the target to be tested for the communication category. The communication detection model is trained based on a first loss function characterizing the distribution information and importance of each information source and a second loss function characterizing the spatial clustering of the target to be tested and the information sources. When the communication data of the target to be tested is detected, the information sources therefore serve as an aid: their information complements that of the target to be tested, which enables finer-grained detection of the target's communication data and improves detection accuracy.

Brief Description of the Drawings

FIG. 1 is a schematic flowchart of a communication detection method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the functional modules of a communication detection apparatus according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a computer device according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a storage medium according to an embodiment of the present invention.

Detailed Description of the Embodiments

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.

The terms "first", "second" and "third" in the present application are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or as implicitly specifying the number of the technical features indicated. A feature qualified by "first", "second" or "third" may therefore explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality of" means at least two, for example two or three, unless otherwise expressly and specifically defined. All directional indications (such as up, down, left, right, front, back) in the embodiments of the present application are used only to explain the relative positional relationship, movement and the like of the components in a particular posture (as shown in the drawings); if that posture changes, the directional indication changes accordingly. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that comprises a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product or device.

Reference herein to an "embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.

The communication detection method of this embodiment is applied to network devices in the Internet of Things and its network center; the network devices include, but are not limited to, computers, mobile phones and tablets. Because the computing power of network devices in the Internet of Things is relatively weak, their use often requires the cooperation of a network center, which can take on large-scale data computing, storage and other tasks that IoT network devices cannot easily complete. It should be understood that the network devices in the Internet of Things and its network center can complement one another in different scenarios: each can be regarded either as an information source rich in communication detection information or as a target to be tested that lacks such information. The target to be tested is not limited to a single device; it may also be a cluster of identically configured devices, or the network center of the Internet of Things. Likewise, an information source may be a single device, a cluster of identically configured devices, or the network center of the Internet of Things. Complementarity in this embodiment means that a network device in the Internet of Things, or the network center, may act as an information source (the party that provides help during communication detection) in some communication detection scenarios and as the target to be tested (the party that needs help) in others. A relationship of complementary strengths is thus formed between IoT network devices, and between IoT network devices and the network center. Based on this relationship, the present invention proposes a communication detection method that achieves more accurate communication detection for the target to be tested.

FIG. 1 is a schematic flowchart of the communication detection method according to an embodiment of the present invention. It should be noted that, provided substantially the same result is obtained, the method of the present invention is not limited to the sequence shown in FIG. 1. As shown in FIG. 1, the communication detection method includes the following steps:

Step S101: Receive a communication detection task, the communication detection task including a target to be tested, a communication category, and the network device in which the target to be tested is located.

Specifically, the network devices are the devices in the Internet of Things and its network center. In this embodiment, a communication detection task is the detection of data on devices in the Internet of Things. Depending on the communication detection task, the data on a network device can be divided into multiple communication categories. For example, when the task is intrusion data detection, the communication category is the communication intrusion category and communication detection is performed on the intrusion data in the device; when the task is access data detection, the communication category is the communication access category and communication detection is performed on the access data in the device.

步骤S102：基于待测目标在通讯类别对应的第一通讯数据和网络设备中其他目标在通讯类别对应的第二通讯数据之间的相似度从其他目标中筛选得到信息源。Step S102: Based on the similarity between the first communication data corresponding to the communication category of the target to be tested and the second communication data corresponding to the communication category of other targets in the network device, obtain information sources from other targets.

具体地，在接收到通讯检测任务后，为了提高对待测目标的检测效果，需要从物联网中选取信息源来辅助对待测目标的通讯数据检测。通过待测目标的通讯数据与物联网中其他目标之间的数据相似性筛选信息源。Specifically, after receiving the communication detection task, in order to improve the detection effect of the target to be tested, it is necessary to select information sources from the Internet of Things to assist in the detection of communication data of the target to be tested. Information sources are screened through the data similarity between the communication data of the target to be tested and other targets in the Internet of Things.

进一步的，步骤S102具体包括：Further, step S102 specifically includes:

1、获取待测目标的第一通讯数据和其他目标的第二通讯数据，并利用特征提取网络从第一通讯数据提取得到第一通讯向量，从第二通讯数据提取得到第二通讯向量。1. Obtain the first communication data of the target to be tested and the second communication data of other targets, and use the feature extraction network to extract the first communication vector from the first communication data, and extract the second communication vector from the second communication data.

具体地，将待测目标数据输入至特征提取网络，提取得到第一通讯向量，并将物联网及其网络中心中除待测目标之外的其他所有网络设备作为其他目标，其获取每个其他目标的通讯数据，并将通数据输入至特征提取网络，提取得到第二通讯向量。Specifically, the data of the target to be tested is input into the feature extraction network, and the first communication vector is extracted, and all other network devices in the Internet of Things and its network center except the target to be tested are used as other targets, and each other communication data of the target, and input the communication data to the feature extraction network to extract the second communication vector.

2. Calculate a statistic from the first data distribution of the first communication data over the communication categories and the second data distribution of the second communication data over the communication categories.

It should be noted that the communication data generated by a network device usually spans multiple communication categories, each containing multiple data items. Specifically, after the first communication data is obtained, counting the number of data items of the target to be tested in each communication category gives the distribution of its communication data; for example, 100 items belong to category A communication, 600 items belong to category B communication, and so on. The first data distribution of the first communication data and the second data distribution of the second communication data over the communication categories are obtained in this way, and the statistic is then calculated with the chi-square goodness-of-fit test. The communication categories in this embodiment are those shared by the first communication data and the second communication data.

The statistic is calculated as:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where $\chi^2$ is the statistic, $k$ is the number of communication categories, $O_i$ is the number of communication data items of the information source in the $i$-th communication category, and $E_i$ is the number of communication data items of the target to be tested in the $i$-th communication category.
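The counting and the statistic described in step 2 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function names and the sample counts for the candidate device are assumptions, and the category list is assumed to contain only categories shared by both devices, so that every $E_i$ is nonzero.

```python
from collections import Counter

def category_counts(labels, categories):
    """Count how many communication records fall into each category."""
    counts = Counter(labels)
    return [counts[c] for c in categories]

def chi_square_statistic(observed, expected):
    """Chi-square goodness-of-fit statistic: sum((O_i - E_i)^2 / E_i)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Categories shared by both devices (assumed).
categories = ["A", "B"]
# Target to be tested: 100 records in category A, 600 in category B (E_i).
target_labels = ["A"] * 100 + ["B"] * 600
# One candidate device with hypothetical counts (O_i).
candidate_labels = ["A"] * 130 + ["B"] * 570

expected = category_counts(target_labels, categories)
observed = category_counts(candidate_labels, categories)
statistic = chi_square_statistic(observed, expected)
print(statistic)
```

A smaller statistic indicates that the candidate's category distribution is closer to that of the target, which is why the subsequent ranking is in ascending order.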

3. Sort all other targets by the statistic to obtain a distribution ranking.

Specifically, after the statistics are obtained, the other target corresponding to each statistic is identified, and all other targets are sorted in ascending order of the statistic to obtain the distribution ranking.

4. Calculate the Euclidean distance between the first mean vector of the first communication vectors in each communication category and the second mean vector of each other target's second communication vectors in that category, and average over the categories to obtain the average Euclidean distance.

Specifically, the average Euclidean distance is calculated as:

$$l = \frac{1}{k} \sum_{i=1}^{k} \left\| \bar{s}_i - \bar{t}_i \right\|_2$$

where $l$ is the average Euclidean distance, $k$ is the number of communication categories, $\bar{s}_i$ is the mean of the communication data of the information source in the $i$-th communication category, and $\bar{t}_i$ is the mean of the communication data of the target to be tested in the $i$-th communication category.
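The average Euclidean distance between per-category means can be sketched in Python as follows. The helper names and the two-dimensional sample vectors are illustrative assumptions; in the method, the vectors would be the outputs of the feature extraction network.

```python
import math

def class_means(vectors, labels):
    """Return {category: mean vector} for the given feature vectors."""
    sums, counts = {}, {}
    for vec, lab in zip(vectors, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for d, v in enumerate(vec):
            acc[d] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def average_euclidean_distance(means_a, means_b):
    """Mean, over shared categories, of the distance between class means."""
    shared = sorted(set(means_a) & set(means_b))
    if not shared:
        raise ValueError("no shared communication categories")
    return sum(math.dist(means_a[c], means_b[c]) for c in shared) / len(shared)

# Hypothetical feature vectors of the target to be tested and one candidate.
target_means = class_means([[0.0, 0.0], [2.0, 0.0], [1.0, 1.0]], ["A", "A", "B"])
source_means = class_means([[4.0, 4.0], [1.0, 1.0]], ["A", "B"])
avg_dist = average_euclidean_distance(target_means, source_means)
print(avg_dist)
```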

5. Sort all other targets by the average Euclidean distance to obtain a Euclidean ranking.

Specifically, after the average Euclidean distance of each candidate information source is obtained, the candidates are sorted in ascending order of average Euclidean distance to obtain the Euclidean ranking.

6. Determine the final ranking of all other targets from the distribution ranking and the Euclidean ranking, and select a first preset number of other targets as information sources according to the final ranking.

Specifically, for each other target, its first rank in the distribution ranking and its second rank in the Euclidean ranking are determined, and the mean of the two ranks is used as its rank in the final ranking. The first preset number of other targets are then selected from the final ranking as information sources. The first preset number is set in advance, for example 2 or 5.
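Steps 3 to 6 amount to averaging two rank positions. The Python sketch below illustrates this under stated assumptions: the device names and scores are hypothetical, and ties in the averaged rank are broken alphabetically, a detail the text does not specify.

```python
def ranks_ascending(scores):
    """Map each name to its 1-based position when sorted by ascending score."""
    ordered = sorted(scores, key=lambda name: scores[name])
    return {name: pos for pos, name in enumerate(ordered, start=1)}

def select_information_sources(chi_square, avg_distance, preset_number):
    """Average the two rank positions and keep the best-ranked candidates."""
    dist_rank = ranks_ascending(chi_square)      # distribution ranking
    eucl_rank = ranks_ascending(avg_distance)    # Euclidean ranking
    final = {n: (dist_rank[n] + eucl_rank[n]) / 2 for n in chi_square}
    # Tie-breaking by name is an assumption, not taken from the source.
    ordered = sorted(final, key=lambda n: (final[n], n))
    return ordered[:preset_number]

# Hypothetical statistics for four candidate devices.
chi_square = {"camera": 10.5, "vehicle": 2.0, "gateway": 7.3, "sensor": 30.1}
avg_distance = {"camera": 0.4, "vehicle": 0.9, "gateway": 1.9, "sensor": 3.2}
selected = select_information_sources(chi_square, avg_distance, 2)
print(selected)
```

Averaging rank positions rather than raw scores keeps the two criteria on a common scale, since a chi-square statistic and a feature-space distance are not directly comparable.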

In this embodiment, using information sources that are similar to the target to be tested better assists the target in performing more accurate communication detection.

Step S103: Input the first communication data and the third communication data of the information sources into the communication detection model for communication detection, to obtain the communication detection result of the target to be tested for the communication category. The communication detection model includes a feature extraction network and a global public detector, which are trained with a first loss function characterizing the distribution information and importance of each information source and a second loss function characterizing the spatial aggregation of the target to be tested and the information sources.

Specifically, once the first communication data and the third communication data of the information sources are available, both are input into the communication detection model, which outputs the communication detection result of the target to be tested for the communication category.

It should be noted that the communication detection model includes a feature extraction network and a global public detector. To ensure the detection performance of the model, both are trained with the first loss function characterizing the distribution information and importance of each information source and the second loss function characterizing the spatial aggregation of the target to be tested and the information sources.

Further, training the communication detection model specifically includes:

1. Input the first communication sample data of the target to be tested and the second communication sample data of the information sources into the feature extraction network, to obtain first communication sample vectors and second communication sample vectors.

In this embodiment, it should be understood that communication data generated by different devices in the IoT may have different dimensions. For example, when an information source is a vehicle-mounted Internet of Vehicles device, its communication data may be a 10-dimensional vector, while the communication data of the IoT network center (the target to be tested) may be a 160-dimensional vector. The embodiment therefore constructs feature extraction networks, each comprising a two-layer fully connected neural network with an activation function such as, but not limited to, ReLU. Further, the number of feature extraction networks equals the total number of targets to be tested and information sources, and each feature extraction network extracts communication vectors from the communication data of one kind of device.
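The idea of one feature extraction network per device type, each mapping its own input dimension into a common feature space, can be sketched in plain Python. Everything beyond "two fully connected layers with ReLU" is an assumption made for illustration: the hidden width of 32, the shared output dimension of 16, the random initialization, and applying ReLU after the second layer as well.

```python
import random

def make_extractor(in_dim, hidden_dim, out_dim, seed):
    """Randomly initialised two-layer fully connected network."""
    rng = random.Random(seed)

    def layer(n_in, n_out):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_out)]
        return weights, [0.0] * n_out

    return [layer(in_dim, hidden_dim), layer(hidden_dim, out_dim)]

def extract(extractor, x):
    """Forward pass: linear, ReLU, linear, ReLU."""
    for weights, biases in extractor:
        x = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x

SHARED_DIM = 16  # assumed size of the common feature space
vehicle_extractor = make_extractor(10, 32, SHARED_DIM, seed=0)   # 10-dim input
center_extractor = make_extractor(160, 32, SHARED_DIM, seed=1)   # 160-dim input

vehicle_vec = extract(vehicle_extractor, [1.0] * 10)
center_vec = extract(center_extractor, [1.0] * 160)
print(len(vehicle_vec), len(center_vec))
```

Because both outputs have the same length, class means and distances between a 10-dimensional device and a 160-dimensional device become directly comparable.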

Specifically, before communication detection is performed on a given network device in the IoT, that network device is taken as the target to be tested, and network devices designated by preset rules are taken as information sources, whose communication data assists in improving detection of the target's communication data. After the first communication sample data of the target to be tested and the second communication sample data of the information sources are obtained, the feature extraction network extracts the first communication sample vectors from the first communication sample data and the second communication sample vectors from the second communication sample data.

2. Calculate the weight of each information source from the first communication sample vectors and the second communication sample vectors.

It should be noted that the information sources are weighted to reflect the importance of each one, that is, the degree to which the information source transfers communication detection information to the target to be tested. The algorithm assigns the weights according to the feature-space difference between each information source and the target to be tested.

Specifically, calculating the weight of each information source from the first and second communication sample vectors includes:

2.1. Calculate the Euclidean distance between the first mean vector of the first communication sample vectors in each communication category and the second mean vector of each information source's second communication sample vectors in that category, and average over the categories to obtain the average Euclidean distance.

Specifically, the average Euclidean distance is calculated as described above and is not repeated here.

2.2. Calculate the weight of each information source from its average Euclidean distance. The weight formula is not reproduced in this text.

In this embodiment, the weight calculated by the above weight formula lies in the interval [0.75, 1.25], so that even an information source that is close to the target to be tested never receives a weight of 0 and thereby loses its influence on the target. When an information source is close to the target to be tested, its weight is small: the target has already absorbed most of the communication detection information that source conveys, so the source is given a relatively small weight to reflect its lower importance. Conversely, when an information source is far from the target to be tested, its weight is large: the target has not yet adequately absorbed the information that source conveys, so the source is given a relatively large weight to raise its importance, allowing the target to better acquire the communication detection information the source holds.
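The weight formula itself is not available in this text, so the following Python sketch uses a hypothetical stand-in rather than the patent's formula. It is chosen only because it has the two properties stated above: the weight grows with the average Euclidean distance, and it never leaves the interval [0.75, 1.25].

```python
def illustrative_weight(avg_distance):
    """Hypothetical stand-in (NOT the patent's formula).

    Maps a non-negative distance to a weight that increases with the
    distance and stays inside [0.75, 1.25).
    """
    if avg_distance < 0:
        raise ValueError("a distance cannot be negative")
    return 0.75 + 0.5 * avg_distance / (1.0 + avg_distance)

# Near, medium and far information sources (hypothetical distances).
weights = [illustrative_weight(d) for d in (0.0, 1.0, 9.0)]
print(weights)
```

Any other bounded, monotonically increasing mapping would illustrate the same behavior; the actual mapping used by the method may differ.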

3. Input the first communication sample vectors and the second communication sample vectors into the global public detector for distribution prediction, to obtain a first predicted distribution vector of the target to be tested over the communication categories and a second predicted distribution vector of each information source over the communication categories.

Specifically, after the feature extraction network maps the raw communication data of the information sources and of the target to be tested into a global feature space, the global public detector performs detection.

4. Calculate the loss function value from the first predicted distribution vector, the second predicted distribution vector, the weights, the first communication sample vectors, the second communication sample vectors, the first loss function and the second loss function.

Specifically, the first loss function represents the distribution information of the different information sources and their importance, and the second loss function draws the communication data of the information sources and of the target to be tested together in feature space.

Further, calculating the first loss function value of the first loss function includes:

4.11. Calculate the first predicted distribution mean vector from the first predicted distribution vector.

The formula for the first predicted distribution mean vector is not reproduced in this text. In that formula, $|\chi^{(k)}|$ is the number of communication data items belonging to the $k$-th communication category, $C(\cdot)$ is the global public detector, $f(x)$ is the feature extraction network, and $T$ is the temperature parameter; the remaining symbols denote the first predicted distribution mean vector and the first predicted distribution vector.

4.12. Calculate the KL divergence between each information source and the target to be tested from the first predicted distribution mean vector and the second predicted distribution vector.

The formula for the KL divergence is not reproduced in this text. Its symbols denote the KL divergence between the distribution information of the $j$-th information source and that of the target to be tested, and the second predicted distribution vector.
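Steps 4.11 and 4.12 can be illustrated with a short Python sketch. It assumes that a predicted distribution vector is a softmax of the detector output divided by the temperature $T$, and that the divergence is taken from the target's mean vector to the source's vector; neither detail is confirmed by this text, and all function names and logits below are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature (max-shifted for numerical stability)."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_distribution(distributions):
    """Element-wise mean of several predicted distribution vectors."""
    n = len(distributions)
    return [sum(col) / n for col in zip(*distributions)]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

T = 2.0
# Hypothetical detector outputs for target samples of one category.
target_logits = [[2.0, 0.0], [0.0, 2.0]]
target_mean = mean_distribution(
    [softmax_with_temperature(z, T) for z in target_logits])
# Hypothetical detector output for one information-source sample.
source_dist = softmax_with_temperature([3.0, 0.0], T)
divergence = kl_divergence(target_mean, source_dist)
print(target_mean, divergence)
```

A temperature above 1 flattens the distributions, so that the relative probabilities of non-predicted categories also carry information between devices.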

4.13. Calculate the first loss function value from the first predicted distribution mean vector, the KL divergence and the weights.

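The formula for the first loss function value is not reproduced in this text. In that formula, $L_I$ is the first loss function value, $k$ is the number of communication categories, $n$ is the number of information sources, $\omega_j$ is the weight of the $j$-th information source, $|\chi_D|$ is the number of first communication sample data items, $L_{ce}(\cdot)$ is the cross-entropy loss, and $y_i$ is the communication detection category label.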

In this embodiment, distribution information is transferred during communication detection from information-rich information sources to the target to be tested, and weights are applied to dynamically reflect the importance of the information in each source, enabling more accurate communication detection.

Further, calculating the second loss function value of the second loss function includes:

4.21. Merge the second communication sample data of all information sources into third communication sample data, and merge the first communication sample data of the target to be tested with the second communication sample data of all information sources into fourth communication sample data.

Specifically, the transfer of communication detection information between the information sources and the target to be tested is first assisted from the perspective of feature-space distance.

4.22. Use the feature extraction network to extract third communication sample vectors from the third communication sample data and fourth communication sample vectors from the fourth communication sample data.

4.23. Calculate the first mean vector, third mean vector and fourth mean vector of the first, third and fourth communication sample vectors, respectively, in each communication category.

Specifically, each communication category contains at least one communication data item, and averaging the vectors of all data items in a category gives that category's mean vector.

4.24. Calculate the pairwise Euclidean distances among the first mean vector, the third mean vector and the fourth mean vector, and sum them to obtain the first Euclidean distance loss function value.

The formula for the first Euclidean distance loss function value is not reproduced in this text. In that formula, $L_G$ denotes the first Euclidean distance loss function value and $k$ the number of communication categories; the remaining symbols denote the first mean vector, the third mean vector, the fourth mean vector and the Euclidean distance.
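Step 4.24 can be sketched in Python as a sum of pairwise distances between the three sets of per-category mean vectors. The sketch assumes a plain sum over categories without normalization, which the text suggests but the missing formula cannot confirm; the names and sample means are hypothetical.

```python
import math
from itertools import combinations

def global_distance_loss(*mean_sets):
    """Sum, over shared categories, of pairwise distances between class means."""
    shared = set.intersection(*(set(m) for m in mean_sets))
    total = 0.0
    for category in sorted(shared):
        # Every unordered pair of mean-vector sets: (1,3), (1,4), (3,4).
        for a, b in combinations(mean_sets, 2):
            total += math.dist(a[category], b[category])
    return total

# Hypothetical per-category means of the first (target), third (all sources)
# and fourth (target plus all sources) communication sample vectors.
first_means = {"A": [0.0, 0.0], "B": [1.0, 1.0]}
third_means = {"A": [3.0, 4.0], "B": [1.0, 1.0]}
fourth_means = {"A": [0.0, 0.0], "B": [1.0, 4.0]}
loss_g = global_distance_loss(first_means, third_means, fourth_means)
print(loss_g)
```

Minimizing this value during training pulls the category centers of the target, the sources and their union toward one another in the shared feature space.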

4.25. Use a clustering algorithm to select a second preset number of first feature points from the first communication sample vectors of each communication category, and a second preset number of second feature points from the third communication sample vectors of each communication category.

It should be noted that the information sources and the target to be tested also need to be drawn together in space from the perspective of local representative points. The clustering algorithm includes, but is not limited to, K-means++.

4.26. Calculate the pairwise Euclidean distances between the first feature points and the second feature points and average them to obtain the second Euclidean distance loss function value.

The formula for the second Euclidean distance loss function value is not reproduced in this text.
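Step 4.26 can be sketched in Python for a single communication category. The sketch assumes the representative points have already been chosen (for example by K-means++) and averages the distance over every pairing of a target point with a source point; how the per-category values are combined is not stated in this text, and the points below are hypothetical.

```python
import math

def local_distance_loss(target_points, source_points):
    """Mean Euclidean distance over every (target point, source point) pair."""
    pairs = [(t, s) for t in target_points for s in source_points]
    if not pairs:
        raise ValueError("both point sets must be non-empty")
    return sum(math.dist(t, s) for t, s in pairs) / len(pairs)

# Hypothetical representative points of one category.
target_points = [[0.0, 0.0], [0.0, 2.0]]   # first feature points
source_points = [[3.0, 4.0], [0.0, 0.0]]   # second feature points
loss_l = local_distance_loss(target_points, source_points)
print(loss_l)
```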

4.27. Calculate the second loss function value from the first Euclidean distance loss function value and the second Euclidean distance loss function value.

Specifically, second loss function value = first Euclidean distance loss function value + second Euclidean distance loss function value.

5. Iteratively train the feature extraction network and the global public detector according to the loss function value and a preset optimization algorithm.

Specifically, the preset optimization algorithm includes, but is not limited to, stochastic gradient descent.

The communication detection method of this embodiment selects, according to the communication detection task and using the similarity between data, information sources from the network devices to assist the target to be tested in communication detection, where the network devices include the devices in the IoT and its network center. The communication data of the target to be tested and of the information sources is then input into the communication detection model to obtain the communication detection result of the target for the communication category. The model is trained with a first loss function characterizing the distribution information and importance of each information source and a second loss function characterizing the spatial aggregation of the target to be tested and the information sources. As a result, when the target's communication data is detected, the information sources serve as an aid: their information complements that of the target, enabling finer-grained detection of the target's communication data and improving detection accuracy.

FIG. 2 is a schematic diagram of the functional modules of a communication detection device according to an embodiment of the present invention. As shown in FIG. 2, the communication detection device 20 includes a receiving module 21, a selection module 22 and a prediction module 23.

The receiving module 21 is configured to receive a communication detection task, which includes the target to be tested, the communication category and the network devices in which the target is located. The selection module 22 is configured to screen information sources from the other targets based on the similarity between the first communication data of the target to be tested in the communication category and the second communication data of the other targets in the network devices in that category. The prediction module 23 is configured to input the first communication data and the third communication data of the information sources into the communication detection model for communication detection, to obtain the communication detection result of the target to be tested for the communication category. The communication detection model includes a feature extraction network and a global public detector, which are trained with a first loss function characterizing the distribution information and importance of each information source and a second loss function characterizing the spatial aggregation of the target to be tested and the information sources.

Optionally, the selection module 22 screens the information sources from the other targets, based on the similarity between the first communication data of the target to be tested and the second communication data of the other targets in the communication category, by: obtaining the first communication data of the target to be tested and the second communication data of the other targets, and using the feature extraction network to extract a first communication vector from the first communication data and a second communication vector from the second communication data; calculating a statistic from the first data distribution of the first communication data and the second data distribution of the second communication data over the communication categories; sorting all other targets by the statistic to obtain a distribution ranking; calculating the Euclidean distance between the first mean vector of the first communication vectors and the second mean vector of each other target's second communication vectors in each communication category, and averaging to obtain the average Euclidean distance; sorting all other targets by the average Euclidean distance to obtain a Euclidean ranking; and determining the final ranking of all other targets from the distribution ranking and the Euclidean ranking, and selecting a first preset number of other targets as information sources according to the final ranking.

Optionally, the communication detection device 20 further includes a training module configured to train the communication detection model by: inputting the first communication sample data of the target to be tested and the second communication sample data of the information sources into the feature extraction network to obtain first communication sample vectors and second communication sample vectors; calculating the weight of each information source from the first and second communication sample vectors; inputting the first and second communication sample vectors into the global public detector for distribution prediction, to obtain a first predicted distribution vector of the target to be tested and a second predicted distribution vector of each information source over the communication categories; calculating the loss function value from the first predicted distribution vector, the second predicted distribution vector, the weights, the first communication sample vectors, the second communication sample vectors, the first loss function and the second loss function; and iteratively training the feature extraction network and the global public detector according to the loss function value and a preset optimization algorithm.

Optionally, the training module calculates the weight of each information source from the first and second communication sample vectors by: calculating the Euclidean distance between the first mean vector of the first communication sample vectors and the second mean vector of each information source's second communication sample vectors in each communication category, and averaging to obtain the average Euclidean distance; and calculating the weight of each information source from its average Euclidean distance using the weight formula, which is not reproduced in this text.

Optionally, the training module calculates the first loss function value of the first loss function by: calculating the first predicted distribution mean vector from the first predicted distribution vector; calculating the KL divergence between each information source and the target to be tested from the first predicted distribution mean vector and the second predicted distribution vector; and calculating the first loss function value from the first predicted distribution mean vector, the KL divergence and the weights.

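Optionally, the first predicted distribution mean vector, the KL divergence and the first loss function value are calculated by the formulas of the method embodiment, which are not reproduced in this text. In those formulas, $T$ denotes the temperature parameter, and the remaining symbols denote the first predicted distribution vector, the KL divergence between the distribution information of the $j$-th information source and that of the target to be tested, and the second predicted distribution vector.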

Optionally, the training module calculates the second loss function value of the second loss function by: merging the second communication sample data of all information sources into third communication sample data, and merging the first communication sample data of the target to be tested with the second communication sample data of all information sources into fourth communication sample data; using the feature extraction network to extract third communication sample vectors from the third communication sample data and fourth communication sample vectors from the fourth communication sample data; calculating the first mean vector, third mean vector and fourth mean vector of the first, third and fourth communication sample vectors, respectively, in each communication category; calculating the pairwise Euclidean distances among the first, third and fourth mean vectors and summing them to obtain the first Euclidean distance loss function value; using a clustering algorithm to select a second preset number of first feature points from the first communication sample vectors of each communication category and a second preset number of second feature points from the third communication sample vectors of each communication category; calculating the pairwise Euclidean distances between the first feature points and the second feature points and averaging them to obtain the second Euclidean distance loss function value; and calculating the second loss function value from the first Euclidean distance loss function value and the second Euclidean distance loss function value.

Optionally, the first Euclidean distance loss function value is calculated by the formula of the method embodiment, which is not reproduced in this text. In that formula, $L_G$ denotes the first Euclidean distance loss function value and $k$ the number of communication categories; the remaining symbols denote the first mean vector, the third mean vector, the fourth mean vector and the Euclidean distance.

For other details of how each module of the communication detection device in the above embodiment implements the technical solution, refer to the description of the communication detection method in the above embodiments; they are not repeated here.

It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be cross-referenced. Because the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for related details, refer to the corresponding parts of the method embodiments.

Please refer to FIG. 3, which is a schematic structural diagram of a computer device according to an embodiment of the present invention. As shown in FIG. 3, the computer device 30 includes a processor 31 and a memory 32 coupled to the processor 31. Program instructions are stored in the memory 32, and when the program instructions are executed by the processor 31, they cause the processor 31 to execute the steps of the communication detection method described in any of the above embodiments.

The processor 31 may also be referred to as a CPU (Central Processing Unit). The processor 31 may be an integrated circuit chip with signal processing capability. The processor 31 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor or any conventional processor.

Refer to FIG. 4, which is a schematic structural diagram of a storage medium according to an embodiment of the present invention. The storage medium of this embodiment stores program instructions 41 capable of implementing the above communication detection method. The program instructions 41 may be stored in the storage medium in the form of a software product and include several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, or a computer device such as a computer, a server, a mobile phone, or a tablet.

In the several embodiments provided in this application, it should be understood that the disclosed computer device, apparatus, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

The above are only implementations of the present application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made by using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of the present application.

Claims

1. A method of communication detection, the method comprising:

receiving a communication detection task, wherein the communication detection task comprises a target to be detected, a communication category and network equipment where the target to be detected is located;

screening information sources from other targets based on the similarity between the first communication data corresponding to the communication category of the target to be detected and the second communication data corresponding to the communication category of the other targets in the network equipment;

inputting the first communication data and the third communication data of the information sources into a communication detection model for communication detection to obtain a communication detection result of the target to be detected for the communication category, wherein the communication detection model comprises a feature extraction network and a global public detector, and the feature extraction network and the global public detector are trained based on a first loss function characterizing the distribution information and the importance of each information source and a second loss function characterizing the spatial aggregation of the target to be detected and the information sources.

2. The communication detection method according to claim 1, wherein the screening of the information sources from the other targets based on the similarity between the first communication data corresponding to the communication category of the target to be detected and the second communication data corresponding to the communication category of the other targets in the network equipment comprises:

acquiring the first communication data of the target to be detected and the second communication data of the other targets, extracting a first communication vector from the first communication data by using the feature extraction network, and extracting a second communication vector from the second communication data;

obtaining statistics according to first data distribution of the first communication data in each communication category and second data distribution of the second communication data in each communication category;

sorting all other targets according to the statistics to obtain distribution sorting;

calculating Euclidean distance between a first average value vector of the first communication vector in each communication category and a second average value vector of each second communication vector in each communication category, and taking an average value to obtain an average Euclidean distance;

sorting all other targets according to the average Euclidean distance to obtain Euclidean sorting;

and determining a final sorting of all other targets according to the distribution sorting and the Euclidean sorting, and selecting a first preset number of other targets as the information sources according to the final sorting.
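The screening procedure of claim 2 can be sketched as follows. This is a hedged illustration only: the claim does not name the statistic, so total variation distance between per-category label distributions is used here as a stand-in, and the rule for merging the two sortings (sum of rank positions, lower is better) is likewise an assumption. The vectors are assumed to be outputs of the feature extraction network, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def class_distribution(labels, categories):
    """Fraction of samples falling in each communication category."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in categories]


def distribution_statistic(p, q):
    """Assumed stand-in statistic: total variation distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def class_means(vectors, labels):
    """Mean feature vector for each communication category."""
    groups = defaultdict(list)
    for vec, lab in zip(vectors, labels):
        groups[lab].append(vec)
    return {lab: [sum(col) / len(vs) for col in zip(*vs)]
            for lab, vs in groups.items()}


def mean_class_distance(tgt, tgt_y, cand, cand_y):
    """Average Euclidean distance between matching class means."""
    m_t, m_c = class_means(tgt, tgt_y), class_means(cand, cand_y)
    shared = m_t.keys() & m_c.keys()
    return sum(math.dist(m_t[c], m_c[c]) for c in shared) / len(shared)


def select_sources(target, candidates, categories, top_n):
    """target: (vectors, labels); candidates: {name: (vectors, labels)}."""
    t_vecs, t_labels = target
    t_dist = class_distribution(t_labels, categories)
    stat = {name: distribution_statistic(
                t_dist, class_distribution(y, categories))
            for name, (_, y) in candidates.items()}
    dist = {name: mean_class_distance(t_vecs, t_labels, v, y)
            for name, (v, y) in candidates.items()}
    by_stat = sorted(candidates, key=stat.get)   # distribution sorting
    by_dist = sorted(candidates, key=dist.get)   # Euclidean sorting
    # Assumed final sorting: sum of the two rank positions, ties by name.
    score = {name: by_stat.index(name) + by_dist.index(name)
             for name in candidates}
    return sorted(candidates, key=lambda name: (score[name], name))[:top_n]
```

A candidate that is close to the target under both criteria ends up at the front of the final sorting, and the first `top_n` names are returned as the information sources.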

3. The communication detection method according to claim 1, wherein training the communication detection model specifically comprises:

respectively inputting first communication sample data of the target to be detected and second communication sample data of the information sources into the feature extraction network to obtain a first communication sample vector and a second communication sample vector;

calculating the weight of each information source by using the first communication sample vector and the second communication sample vector;

inputting the first communication sample vector and the second communication sample vector into the global public detector, respectively, for distribution prediction, so as to obtain a first prediction distribution vector of the target to be detected in each communication category and a second prediction distribution vector of each information source in each communication category;

calculating a loss function value according to the first prediction distribution vector, the second prediction distribution vector, the weight, the first communication sample vector, the second communication sample vector, the first loss function and the second loss function;

and carrying out iterative training on the feature extraction network and the global public detector according to the loss function value and a preset optimization algorithm.

4. The communication detection method according to claim 3, wherein the calculating the weight of each information source using the first communication sample vector and the second communication sample vector includes:

calculating the Euclidean distance between a first mean vector of the first communication sample vector in each communication category and a second mean vector of each second communication sample vector in each communication category, and averaging to obtain an average Euclidean distance;

calculating the weight of each information source according to the average Euclidean distance corresponding to that information source, wherein the calculation formula of the weight is expressed as follows:

(formula image not preserved in this text)

wherein ω_j represents the weight of the j-th information source, and dist_j represents the average Euclidean distance between the j-th information source and the target to be detected.
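Because the weight formula itself is not reproduced in this text, the sketch below only illustrates the general idea that a smaller average Euclidean distance should yield a larger weight. The mapping used here, a softmax over negative distances, is an assumption chosen for illustration and should not be read as the claimed formula; the function name is hypothetical.

```python
import math


def source_weights(avg_distances):
    """Map each information source's average Euclidean distance dist_j to a
    weight omega_j in (0, 1), with all weights summing to 1.

    Assumption: softmax over negative distances, so nearer sources weigh more.
    """
    shift = min(avg_distances)  # subtract the minimum for numerical stability
    scores = [math.exp(-(d - shift)) for d in avg_distances]
    total = sum(scores)
    return [s / total for s in scores]
```

Any monotonically decreasing, normalized mapping would serve the same illustrative purpose; the choice mainly controls how sharply distant sources are down-weighted.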

5. The communication detection method of claim 3, wherein calculating a first loss function value of the first loss function comprises:

calculating a first prediction distribution mean vector according to the first prediction distribution vector;

calculating KL divergence between the information source and the target to be detected according to the first prediction distribution mean value vector and the second prediction distribution vector;

and calculating the first loss function value according to the first prediction distribution mean vector, the KL divergence and the weight.

6. The communication detection method according to claim 5, wherein the calculation formula of the first prediction distribution mean vector is:

(formula image not preserved in this text)

wherein the symbols of the formula represent, in order: the first prediction distribution mean vector; the number of communication data belonging to the k-th communication category; C(), the global public detector; f(x), the feature extraction network; the first prediction distribution vector; and T, a temperature parameter;

the calculation formula of the KL divergence is as follows:

(formula image not preserved in this text)

wherein the symbols of the formula represent, in order: the KL divergence between the distribution information of the j-th information source and that of the target to be detected; and the second prediction distribution vector;

the calculation formula of the first loss function value is as follows:

(formula image not preserved in this text)

wherein LI is the first loss function value, k represents the number of communication categories, n represents the number of information sources, ω_j represents the weight of the j-th information source, a further symbol of the formula represents the number of the first communication sample data, L_ce() represents the cross-entropy loss, and y_i represents the communication detection category label.
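The building blocks named in claims 5 and 6 (a temperature-scaled prediction distribution, its per-category mean, a KL divergence, and a weighted aggregation) can be sketched as below. This is an illustration under assumptions rather than the claimed formulas: the logits are assumed to be the outputs C(f(x)) of the global public detector, the prediction distribution is assumed to be a softmax with temperature T, and both the direction of the KL divergence and the way the weighted terms are summed are assumptions. All function names are hypothetical.

```python
import math


def softmax_t(logits, temperature=1.0):
    """Softmax of logits divided by a temperature T (larger T is flatter)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)                       # stabilise the exponentials
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def per_class_mean_distribution(logit_rows, labels, temperature):
    """Mean softened prediction over the samples of each category."""
    groups = {}
    for row, y in zip(logit_rows, labels):
        groups.setdefault(y, []).append(softmax_t(row, temperature))
    return {y: [sum(col) / len(ps) for col in zip(*ps)]
            for y, ps in groups.items()}


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(probs, label, eps=1e-12):
    """Cross-entropy of one predicted distribution against its label y_i."""
    return -math.log(probs[label] + eps)


def weighted_kl(target_means, source_means_list, weights):
    """Assumed aggregation: sum over sources j and shared categories of
    omega_j * KL(source_j || target)."""
    total = 0.0
    for w, src in zip(weights, source_means_list):
        for y in target_means.keys() & src.keys():
            total += w * kl_divergence(src[y], target_means[y])
    return total
```

Raising the temperature spreads probability mass across categories, which is what lets the mean vector carry information about how categories relate to one another rather than only which category wins.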

7. The communication detection method according to claim 3, wherein calculating a second loss function value of the second loss function comprises:

merging the second communication sample data of all the information sources into third communication sample data, and merging the first communication sample data of the target to be detected and the second communication sample data of all the information sources into fourth communication sample data;

extracting a third communication sample vector and a fourth communication sample vector from the third communication sample data and the fourth communication sample data, respectively, by using the feature extraction network;

respectively calculating a first mean vector, a third mean vector and a fourth mean vector of the first communication sample vector, the third communication sample vector and the fourth communication sample vector in each communication category;

calculating the pairwise Euclidean distances among the first mean vector, the third mean vector and the fourth mean vector, and summing them to obtain a first Euclidean distance loss function value;

selecting, by using a clustering algorithm, a second preset number of first feature points from the first communication sample vectors of each communication category and the second preset number of second feature points from the third communication sample vectors of each communication category;

calculating the pairwise Euclidean distances between the first feature points and the second feature points, and averaging them to obtain a second Euclidean distance loss function value;

and calculating the second loss function value according to the first Euclidean distance loss function value and the second Euclidean distance loss function value.

8. The communication detection method according to claim 7, wherein the first Euclidean distance loss function value is calculated by the following formula:

(formula image not preserved in this text)

wherein LG represents the first Euclidean distance loss function value; the three vector symbols of the formula represent the first mean vector, the third mean vector and the fourth mean vector, respectively; k represents the number of communication categories; and the distance operator represents the Euclidean distance;

the calculation formula of the second Euclidean distance loss function value is as follows:

(formula image not preserved in this text)

wherein LL represents the second Euclidean distance loss function value; R represents the second preset number; and the two feature-point symbols of the formula represent, respectively, the n-th of the R first feature points selected for the i-th communication category from the first communication sample vector and the m-th of the R second feature points selected for the i-th communication category from the third communication sample vector.
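Since the formula images are not reproduced in this text, the following LaTeX gives only one plausible reading that is consistent with the symbol definitions of claim 8 and the steps of claim 7. The symbol names (μ_T, μ_S, μ_TS for the first, third and fourth mean vectors; p and q for the first and second feature points) and the normalization of L_L are assumptions and may differ from the granted formulas.

```latex
% Assumed notation: \mu_T^i, \mu_S^i, \mu_{TS}^i are the first, third and
% fourth mean vectors of category i; p_n^i and q_m^i are the n-th first
% feature point and the m-th second feature point of category i.
\[
  L_G = \sum_{i=1}^{k} \Big[ d\big(\mu_T^i, \mu_S^i\big)
        + d\big(\mu_T^i, \mu_{TS}^i\big)
        + d\big(\mu_S^i, \mu_{TS}^i\big) \Big]
\]
\[
  L_L = \frac{1}{k R^2} \sum_{i=1}^{k} \sum_{n=1}^{R} \sum_{m=1}^{R}
        d\big(p_n^i, q_m^i\big),
  \qquad
  d(a, b) = \lVert a - b \rVert_2
\]
```

Under this reading, L_G pulls the category centres of the target, the merged information sources, and their union toward one another, while L_L does the same at the finer scale of the R representative points per category.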

9. A computer device comprising a processor, a memory coupled to the processor, the memory having stored therein program instructions that, when executed by the processor, cause the processor to perform the steps of the communication detection method of any of claims 1-8.

10. A storage medium storing program instructions capable of implementing the communication detection method according to any one of claims 1 to 8.