CN110569360A - Method for labeling and automatically associating network session data - Google Patents
- Publication number
- CN110569360A (application number CN201910840735.8A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Abstract
The invention discloses a method for labeling and automatically associating network session data, comprising the following steps: step 1, establishing multidimensional classification labels for a network session data source set in a system; step 2, importing the network session data source set into the system, marking the session data with the multidimensional classification labels, and generating a label ID for each label; step 3, performing three passes of traversal matching over all session data according to the different label classifications, in which the first pass matches the exact items, the second pass matches the range items within the result of the first pass, and the third pass matches the fuzzy items within the result of the second pass, and storing the session data matched in the third pass in association with the label ID. The invention enables multi-classification statistics of network session data and applies an analysis result to the entire session data set, extending the reach of that result. Compared with existing systems, which rely on manual one-by-one analysis and on manual records that cannot be automatically associated, it greatly improves the efficiency of data analysis and comparison.
Description
Technical Field
The invention relates to the technical field of data statistics, in particular to a method for labeling and automatically associating network session data.
Background
With the development of computer technology and the internet, rising broadband speeds and falling costs, the arrival of 5G, and the spread of the Internet of Things, people's lives and work are ever more closely tied to the network. The number of network sessions is growing geometrically and can easily reach hundreds of millions. When an analysis expert, after prolonged layer-by-layer screening of this mass of data, identifies session data in a suspicious, safe, or threatening state, technicians need to record the screened session data and apply it to the analysis and comparison of other data. A method for labeling and automatically associating network session data is therefore needed in the art.
Disclosure of Invention
The object of the present invention is to provide a method for labeling and automatically associating network session data in order to solve the above problem.
To achieve this object, the present disclosure provides a method for labeling and automatically associating network session data, comprising the following steps:
Step 1, establishing multidimensional classification labels for the source/target IP, source/target port, source/target MAC, session protocol, exception type, sent/received/overall load, sent/received/overall packet count, duration, domain name, URL, and content details of a network session data source set in a system;
Step 2, importing the network session data source set into the system, marking the session data with the multidimensional classification labels, and generating a label ID for each label;
Step 3, performing three passes of traversal matching over all session data according to the different label classifications: the first pass matches the exact items, the second pass matches the range items within the result of the first pass, and the third pass matches the fuzzy items within the result of the second pass; the session data matched in the third pass is stored in association with the label ID. The exact items comprise source/target IP, source/target MAC, source/target port, session protocol, and exception type; the range items comprise sent/received/overall load, sent/received/overall packet count, and duration; the fuzzy items comprise domain name, URL, and content details.
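As an illustration of steps 1 and 2, a label can be pictured as a single record that groups its conditions by matching class and carries a generated ID. The following is a minimal sketch only: the class name `Label`, the field names, and the use of a random UUID as the label ID are assumptions made for this example, since the text does not say how the ID is generated.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Label:
    """One multidimensional classification label (all names are hypothetical)."""

    name: str
    # Exact items, e.g. {"protocol": "DNS", "src_ip": "10.0.0.5"}
    exact: Dict[str, str] = field(default_factory=dict)
    # Range items as inclusive (low, high) bounds, e.g. {"duration": (0.0, 30.0)}
    ranges: Dict[str, Tuple[float, float]] = field(default_factory=dict)
    # Fuzzy items as search keywords, e.g. {"domain": "example"}
    fuzzy: Dict[str, str] = field(default_factory=dict)
    # Assumption: a random UUID stands in for the generated label ID.
    label_id: str = field(default_factory=lambda: uuid.uuid4().hex)


label = Label(
    name="suspicious-dns",
    exact={"protocol": "DNS"},
    ranges={"duration": (0.0, 30.0)},
    fuzzy={"domain": "example"},
)
```

Grouping the conditions by class in the record mirrors the order in which step 3 consumes them: exact items first, then range items, then fuzzy items.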
The beneficial effects of the invention are as follows:
1. The invention establishes multidimensional classification labels and associates the matched data with the label ID, enabling multi-classification statistics of network session data. It combines the experience of the analysis expert with the high-concurrency, multi-task, high-efficiency computing capacity of modern computers, so that an expert's analysis result can be applied to the entire session data set through a simple operation, extending the reach of that result. Compared with existing systems, which rely on manual one-by-one analysis and on manual records that cannot be automatically associated, the efficiency of data analysis and comparison is greatly improved;
2. The traversal matching scheme of the invention matches simple conditions first and complex conditions afterwards, which narrows the data range efficiently and improves matching efficiency.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
FIG. 1 shows an operation page for the method for labeling and automatically associating network session data according to the present invention;
FIG. 2 shows a label configuration page for the method for labeling and automatically associating network session data according to the present invention;
FIG. 3 shows the three-round, multi-threaded concurrent session data matching process of the method for labeling and automatically associating network session data according to the present invention;
FIG. 4 is a detail view of the operation of the method for labeling and automatically associating network session data according to the present invention;
FIG. 5 is a further detail view of the operation of the method for labeling and automatically associating network session data according to the present invention.
Detailed Description
The following describes in detail specific embodiments of the present disclosure. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
The invention relates to a method for labeling and automatically associating network session data, comprising the following steps:
Step 1, establishing multidimensional classification labels for the source/target IP, source/target port, source/target MAC, session protocol, exception type, sent/received/overall load, sent/received/overall packet count, duration, domain name, URL, and content details of a network session data source set in a system;
Step 2, importing the network session data source set into the system, marking the session data with the multidimensional classification labels, storing each label as a single record, and generating a label ID;
Step 3, performing three passes of traversal matching over all session data according to the different label classifications: the first pass matches the exact items, the second pass matches the range items within the result of the first pass, and the third pass matches the fuzzy items within the result of the second pass; the session data matched in the third pass is stored in association with the label ID. The exact items comprise source/target IP, source/target MAC, source/target port, session protocol, and exception type; the range items comprise sent/received/overall load, sent/received/overall packet count, and duration; the fuzzy items comprise domain name, URL, and content details.
Furthermore, in step 3 the session data is traversed and matched using multi-threaded segmentation. The number of threads is freely configurable; the number of segments is the total number of session data records divided by the number of threads, and the size of each segment is the total number of session data records divided by the number of segments. If there is a remainder, the remaining records are distributed evenly starting from the first segment, so that the data is divided into substantially equal parts. Finally, the matching results of the individual segments are spliced together to form the matching result set.
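The segmentation rule can be sketched as follows. This is one possible reading rather than the patented implementation: it assumes one segment per configured thread and hands out the remainder one record at a time, starting from the first segment. The function name `split_into_segments` is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_segments(records: Sequence[T], num_threads: int) -> List[List[T]]:
    """Split records into num_threads contiguous, nearly equal segments.

    Assumption: one segment per thread. Any remainder is spread one record
    at a time over the leading segments, so sizes differ by at most one.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    base, remainder = divmod(len(records), num_threads)
    segments: List[List[T]] = []
    start = 0
    for index in range(num_threads):
        size = base + (1 if index < remainder else 0)
        segments.append(list(records[start:start + size]))
        start += size
    return segments


segments = split_into_segments(list(range(10)), 3)
# segments == [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Because the segments are contiguous and kept in order, concatenating the per-segment results reproduces the original record order in the spliced result set.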
The first traversal pass runs over all session data, the second over the result set of the first pass, and the third over the result set of the second pass. Matching simple conditions first and complex conditions afterwards narrows the data range effectively and improves matching efficiency.
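The three passes described above can be sketched as a chain of filters, each consuming the result set of the previous one. This is a minimal illustration under stated assumptions: sessions are plain dictionaries, the field names are hypothetical, range bounds are inclusive, and fuzzy matching is modeled as a case-insensitive substring test, which the text does not specify.

```python
from typing import Any, Dict, List, Tuple

Session = Dict[str, Any]


def match_exact(sessions: List[Session], exact: Dict[str, Any]) -> List[Session]:
    """First pass: keep sessions whose fields equal every exact condition."""
    return [s for s in sessions if all(s.get(k) == v for k, v in exact.items())]


def match_range(sessions: List[Session], ranges: Dict[str, Tuple[float, float]]) -> List[Session]:
    """Second pass: keep sessions whose fields lie within every inclusive range."""
    return [
        s for s in sessions
        if all(k in s and low <= s[k] <= high for k, (low, high) in ranges.items())
    ]


def match_fuzzy(sessions: List[Session], fuzzy: Dict[str, str]) -> List[Session]:
    """Third pass: keep sessions whose fields contain every keyword (assumed rule)."""
    return [
        s for s in sessions
        if all(word.lower() in str(s.get(k, "")).lower() for k, word in fuzzy.items())
    ]


def three_pass_match(sessions, exact, ranges, fuzzy) -> List[Session]:
    first = match_exact(sessions, exact)    # cheapest test on the full data set
    second = match_range(first, ranges)     # numeric test on the survivors
    return match_fuzzy(second, fuzzy)       # costliest test on the smallest set


sessions = [
    {"id": 1, "protocol": "DNS", "duration": 5.0, "domain": "mail.example.com"},
    {"id": 2, "protocol": "DNS", "duration": 90.0, "domain": "cdn.example.com"},
    {"id": 3, "protocol": "HTTP", "duration": 5.0, "domain": "www.example.com"},
    {"id": 4, "protocol": "DNS", "duration": 12.0, "domain": "other.test"},
]
matched = three_pass_match(
    sessions, {"protocol": "DNS"}, {"duration": (0, 30)}, {"domain": "example"}
)
# Only session 1 satisfies all three classes of condition.
```

Ordering the passes from cheapest to most expensive means the string search runs only on records that already passed the equality and range checks.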
The method supports labeling any row of session data and supports many types of conditions during labeling, including source/target IP, source/target MAC, source/target port, session protocol, exception type, sent/received/overall load, sent/received/overall packet count, duration, domain name, URL, and content details. By establishing multidimensional classification labels and associating the matched data with the label ID, the invention enables multi-classification statistics of network session data. It combines the experience of the analysis expert with the high-concurrency, multi-task, high-efficiency computing capacity of modern computers, so that an expert's analysis result can be applied to the entire session data set through a simple operation, extending the reach of that result.
A system applying the invention is implemented as follows:
Step one: Obtain the session data source list that has been collected and organized externally, and display it on the right side of the system. The displayed columns are source IP, source port, source MAC, target IP, target port, target MAC, session protocol, exception type, sent load, sent packet count, received load, received packet count, session load, session packet count, duration, start time, end time, domain name, URL, and content details, as shown in the right-hand portion of FIG. 1.
Step two: After step one is complete, the analysis expert can work through the session data source list step by step. When a session appears to be abnormal, the expert right-clicks the row containing it and selects the pencil-icon button in the context menu that pops up, which opens the session data labeling configuration page, as shown in FIG. 2.
Step three: When the session data labeling page opens, it automatically displays the conditions of the session that can be used for labeling: source/target IP, source/target MAC, source/target port, session protocol, exception type, sent/received/overall load, sent/received/overall packet count, duration, domain name, URL, and content details. Source/target IP, source/target MAC, source/target port, session protocol, and exception type support exact matching only; sent/received/overall load, sent/received/overall packet count, and duration support exact and range matching; domain name, URL, and content details support exact and fuzzy matching. See FIG. 2.
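The matching modes allowed for each condition can be captured in a small lookup, which a configuration page could use to reject unsupported combinations. The sketch below is illustrative only; the field identifiers such as `src_ip` and the function names are hypothetical and do not come from the source.

```python
from typing import Set

# Hypothetical field identifiers grouped by the matching modes listed in step three.
EXACT_ONLY = {
    "src_ip", "dst_ip", "src_mac", "dst_mac",
    "src_port", "dst_port", "protocol", "exception_type",
}
EXACT_OR_RANGE = {
    "sent_load", "received_load", "overall_load",
    "sent_packets", "received_packets", "overall_packets", "duration",
}
EXACT_OR_FUZZY = {"domain", "url", "content"}


def allowed_modes(field_name: str) -> Set[str]:
    """Return the matching modes a labeling condition may use for this field."""
    if field_name in EXACT_ONLY:
        return {"exact"}
    if field_name in EXACT_OR_RANGE:
        return {"exact", "range"}
    if field_name in EXACT_OR_FUZZY:
        return {"exact", "fuzzy"}
    raise KeyError(f"unknown condition field: {field_name}")


def is_valid_condition(field_name: str, mode: str) -> bool:
    """Check one (field, mode) pair against the rules above."""
    return mode in allowed_modes(field_name)
```

For example, `is_valid_condition("duration", "range")` is accepted, while `is_valid_condition("src_ip", "fuzzy")` is rejected because IP conditions support exact matching only.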
Step four: Once the labeling information has been configured in steps two and three, the system automatically launches a background task that starts multiple threads and traverses and compares all session data group by group. The comparison runs in three rounds: the first round matches only the exact items, the second only the range items, and the third only the fuzzy items, which narrows the matching range efficiently and yields the final matches. Finally, the labeling information is added to every matched item and recorded in the label bookmark, as shown in FIG. 3.
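The background task in step four can be sketched with a thread pool that scans one segment per worker in each round and splices the partial results in order. This is a rough illustration, not the system's actual code: the function names are hypothetical, the three predicates stand in for the exact, range, and fuzzy checks, and one segment per thread is assumed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def match_concurrently(
    records: Sequence[T], predicate: Callable[[T], bool], num_threads: int = 4
) -> List[T]:
    """Run one matching round: scan one segment per thread, then splice results."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    base, remainder = divmod(len(records), num_threads)
    bounds: List[Tuple[int, int]] = []
    start = 0
    for index in range(num_threads):
        size = base + (1 if index < remainder else 0)
        bounds.append((start, start + size))
        start += size

    def scan(bound: Tuple[int, int]) -> List[T]:
        low, high = bound
        return [r for r in records[low:high] if predicate(r)]

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        parts = list(pool.map(scan, bounds))  # map() keeps segment order
    return [r for part in parts for r in part]


def run_three_rounds(records, exact_ok, range_ok, fuzzy_ok, num_threads: int = 4):
    """Each round only sees the survivors of the previous round."""
    survivors = records
    for predicate in (exact_ok, range_ok, fuzzy_ok):
        survivors = match_concurrently(survivors, predicate, num_threads)
    return survivors


result = run_three_rounds(
    list(range(100)),
    exact_ok=lambda n: n % 2 == 0,
    range_ok=lambda n: 10 <= n <= 30,
    fuzzy_ok=lambda n: "2" in str(n),
    num_threads=3,
)
# result == [12, 20, 22, 24, 26, 28]
```

In standard CPython the global interpreter lock limits the speed-up threads give to CPU-bound comparisons, so the sketch shows the structure of the rounds rather than a performance claim.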
Step five: All session data automatically associated in step four is displayed in the label bookmark. The label bookmark is shown as a two-layer tree. The top-level node supports classification by IP, port, MAC, protocol, and exception, and the user can switch between these. The second-layer nodes are aggregated by label name and show how many sessions are associated, according to the top-level node type and the labeling conditions. Each second-layer node is followed by a delete button and an edit button for removing or modifying the label, and hovering the mouse over a node displays the detailed labeling configuration of that node. The label bookmark also supports fuzzy queries against the second-layer node names. See the left side of FIG. 1 and FIG. 4.
Step six: The second-layer nodes of the label bookmark support double-clicking to view details. After the user double-clicks a node, all session detail data automatically associated by the system is displayed on the right side. If a session carries label information, this can also be seen from the label indicator on the session.
Step seven: The label indicators in the session list are of two kinds. One is a vertical bar at the far left of the session row, displayed whenever the session is labeled. The other is a triangular mark in the upper right corner of a session row cell; it appears only in the IP, MAC, port, protocol, and exception cells, and only when the labeling condition of the session involves the IP, MAC, port, protocol, or exception. See the right side of FIG. 1 and FIG. 5.
Step eight: When the mouse hovers over a label indicator, the label name and a summary of its condition configuration are displayed in a floating window, as shown in FIG. 5.
The classified viewing function in steps five to seven is as follows:
(1) All labeled items are displayed by category in a two-layer tree, in which the first-layer node is the classification type and the second-layer nodes are the labeling statistics;
(2) The classification type of the first-layer node supports dynamic single-selection switching; the supported types are IP, port, MAC, protocol, and exception;
(3) The name of a second-layer node is the label name, followed by the number of session data records associated with that label. Hovering the mouse over the node displays the detailed condition information of the label (including all configurable and enabled conditions), and the node is followed by shortcut buttons for editing and deleting the label configuration;
(4) The second-layer node names support fuzzy search: after a keyword is entered, the tree displays only the nodes whose names contain the keyword and hides the others;
(5) Double-clicking a second-layer node refreshes the session data detail list on the right so that it displays only the result set satisfying the labeling conditions of that node;
(6) Session detail rows that satisfy any labeling condition display special label indicators, which are of two kinds. One is a vertical bar at the far left of the session row, displayed whenever the session is labeled. The other is a triangular mark in the upper right corner of a session row cell; it appears only in the IP, MAC, port, protocol, and exception cells, and only when the labeling condition of the session involves the IP, MAC, port, protocol, or exception. When the mouse hovers over a label indicator, the label name and its condition configuration are displayed in a floating window.
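The two-layer tree and its keyword filter can be sketched as follows. This is a hypothetical model, not the system's data structure: each label is assumed to record the categories its conditions involve and the IDs of its associated sessions, and the fuzzy search is assumed to be a case-insensitive substring test on the label name.

```python
from typing import Dict, List


def second_layer(labels: List[dict], category: str) -> Dict[str, int]:
    """Second-layer nodes for the selected first-layer classification type.

    Maps each label name to the number of sessions associated with it.
    """
    return {
        label["name"]: len(label["session_ids"])
        for label in labels
        if category in label["categories"]
    }


def filter_nodes(nodes: Dict[str, int], keyword: str) -> Dict[str, int]:
    """Keep only nodes whose name contains the keyword; the rest are hidden."""
    needle = keyword.lower()
    return {name: count for name, count in nodes.items() if needle in name.lower()}


labels = [
    {"name": "suspicious-dns", "categories": {"protocol", "IP"}, "session_ids": [1, 4, 9]},
    {"name": "odd-port", "categories": {"port"}, "session_ids": [2]},
    {"name": "dns-tunnel", "categories": {"protocol"}, "session_ids": [4, 5]},
]
protocol_nodes = second_layer(labels, "protocol")
# protocol_nodes == {"suspicious-dns": 3, "dns-tunnel": 2}
visible = filter_nodes(protocol_nodes, "tunnel")
# visible == {"dns-tunnel": 2}
```

Switching the first-layer classification type then amounts to calling `second_layer` again with a different category.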
The recording, sharing, and extension of analysis results in step eight are as follows:
Recording means labeling the analysis result data; the label is stored on the server and persists as long as it is not actively deleted. Sharing means that all users can see the labeled items and the labeling results. Extension means that the conditions of a label can be configured as range or fuzzy matches, so that a single finding is generalized from one session to a whole class of sessions.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. The present disclosure is not, however, limited to the specific details of those embodiments. Various simple modifications may be made to the technical solution within the technical idea of the present disclosure, and all such modifications fall within its scope of protection.
It should be noted that the features described in the above embodiments may be combined in any suitable manner; to avoid unnecessary repetition, the possible combinations are not described separately.
In addition, the various embodiments of the present disclosure may be combined in any way, and such combinations should likewise be regarded as part of the present disclosure as long as they do not depart from its spirit.
Claims (2)
1. A method for labeling and automatically associating network session data, characterized by comprising the following steps:
Step 1, establishing multidimensional classification labels for the source/target IP, source/target port, source/target MAC, session protocol, exception type, sent/received/overall load, sent/received/overall packet count, duration, domain name, URL, and content details of a network session data source set in a system;
Step 2, importing the network session data source set into the system, marking the session data with the multidimensional classification labels, and generating a label ID for each label;
Step 3, performing three passes of traversal matching over all session data according to the different label classifications, in which the first pass matches the exact items, the second pass matches the range items within the result of the first pass, and the third pass matches the fuzzy items within the result of the second pass, and storing the session data matched in the third pass in association with the label ID, wherein the exact items comprise source/target IP, source/target MAC, source/target port, session protocol, and exception type, the range items comprise sent/received/overall load, sent/received/overall packet count, and duration, and the fuzzy items comprise domain name, URL, and content details.
2. The method according to claim 1, wherein in step 3 the session data is traversed and matched using multi-threaded segmentation, the number of threads is freely configurable, the number of segments is the total number of session data records divided by the number of threads, the size of each segment is the total number of session data records divided by the number of segments, any remaining records are distributed evenly starting from the first segment, and the matching results of the individual segments are spliced together to form a matching result set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910840735.8A CN110569360A (en) | 2019-09-06 | 2019-09-06 | Method for labeling and automatically associating network session data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110569360A | 2019-12-13 |
Family
ID=68778117
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910840735.8A Pending CN110569360A (en) | 2019-09-06 | 2019-09-06 | Method for labeling and automatically associating network session data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110569360A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111813642A (en) * | 2020-07-06 | 2020-10-23 | 成都深思科技有限公司 | Multithreading-based network communication session data statistical operation method |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101119321A (en) * | 2007-09-29 | 2008-02-06 | 杭州华三通信技术有限公司 | Network flux classification processing method and apparatus |
CN102325347A (en) * | 2011-09-14 | 2012-01-18 | 中兴通讯股份有限公司 | Transport stream template coupling method in LTE system and apparatus thereof |
JP2012238926A (en) * | 2011-05-09 | 2012-12-06 | Canon Inc | Data control device, data control method in the same, and program |
CN104579941A (en) * | 2015-01-05 | 2015-04-29 | 北京邮电大学 | Message classification method in OpenFlow switch |
CN106250480A (en) * | 2016-08-01 | 2016-12-21 | 浪潮软件集团有限公司 | Metadata-based visual statistical analysis method |
CN106452948A (en) * | 2016-09-22 | 2017-02-22 | 恒安嘉新(北京)科技有限公司 | Automatic classification method and system of network flow |
WO2018121153A1 (en) * | 2016-12-29 | 2018-07-05 | 北京国双科技有限公司 | Written judgment retrieval method and device |
CN108449226A (en) * | 2018-02-28 | 2018-08-24 | 华青融天(北京)技术股份有限公司 | The method and system of information Fast Classification |
CN108923954A (en) * | 2018-06-07 | 2018-11-30 | 成都深思科技有限公司 | A kind of network data visual analyzing and display systems |
CN110069575A (en) * | 2019-04-25 | 2019-07-30 | 中电科嘉兴新型智慧城市科技发展有限公司 | A kind of dynamic data statistical method and system based on multidimensional data mark |
CN110100415A (en) * | 2016-12-30 | 2019-08-06 | 比特梵德荷兰私人有限责任公司 | System for network flow to be ready for quickly analyzing |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191213 |