
CN103200133A - Flow identification method based on network flow gravitation cluster - Google Patents


Info

Publication number
CN103200133A
Authority
CN
China
Prior art keywords
flow
network
bunch
gravitation
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013100938686A
Other languages
Chinese (zh)
Inventor
张登银
廖建飞
万明祥
王雪梅
程春玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Post and Telecommunication University
Priority to CN2013100938686A
Publication of CN103200133A
Legal status: Pending


Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to a flow identification method based on network flow gravitation clustering. The method comprises a training stage and an identification stage. In the training stage, network flow characteristic attributes are selected and each network flow is normalized to form a flow training set; isolated flows are selected by their Z-score and set aside; within the set of non-isolated flows, each network flow in the training set is classified iteratively using the principle and method of semi-supervised network flow gravitation clustering; finally, the isolated flows are classified and a flow classification model is formed. In the identification stage, a sequence of network flows to be identified is formed, each flow to be identified is classified by flow gravitation, and each flow is mapped through its flow cluster to a specific network service type, completing identification. The method can identify unknown and encrypted flows, addresses the local-optimum problem of clustering-based identification, and improves identification accuracy.

Description

A flow identification method based on network flow gravitation clustering
Technical field
The present invention is a flow identification method based on network flow gravitation clustering. It is mainly intended to overcome the shortcomings of current network traffic identification methods: low efficiency, poor real-time performance, and limited accuracy and granularity. It belongs to the field of network security management.
Background art
One trend in current network security management is to provide differentiated services with different quality-of-service guarantees. The prerequisite is the ability to identify the various service types in network traffic correctly and efficiently. Good traffic identification not only greatly improves network management capability but can also, to some extent, anticipate unknown traffic in the network. Traffic identification is used in network measurement, network management, content auditing, public opinion monitoring, service quality management, and similar modules. Existing traffic identification methods mainly include port-based identification, DPI (Deep Packet Inspection) identification, DFI (Deep Flow Inspection) identification, communication-behavior identification, and machine-learning identification.
1. Port-based identification
Port-based identification builds a correspondence table between well-known ports and network applications, as shown in Table 1, so that identifying network traffic reduces to a single lookup.
Table 1 Well-known ports and applications
Port | Protocol | Application type
21 | FTP | File transfer
23 | Telnet | Telnet (remote login)
25 | SMTP | Email
69 | TFTP | Trivial file transfer (Trivial FTP)
80 | HTTP | World Wide Web
110 | POP3 | Remote mailbox access
161 | SNMP | Simple network management
443 | HTTPS | Secure hypertext transfer
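The single lookup that Table 1 supports can be sketched in a few lines of Python. This is a minimal illustration, not part of the patent: the function name `classify_by_port` and the rule of checking the destination port before the source port are assumptions.

```python
# Table 1 as a dictionary: well-known port -> protocol name.
WELL_KNOWN_PORTS = {
    21: "FTP",
    23: "Telnet",
    25: "SMTP",
    69: "TFTP",
    80: "HTTP",
    110: "POP3",
    161: "SNMP",
    443: "HTTPS",
}


def classify_by_port(src_port: int, dst_port: int) -> str:
    """Return the protocol for a flow, or "unknown" if neither port is in the table.

    Assumption: the server side usually owns the well-known port, so the
    destination port is checked before the source port.
    """
    for port in (dst_port, src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return "unknown"


print(classify_by_port(51514, 443))   # HTTPS (client -> HTTPS server)
print(classify_by_port(80, 40000))    # HTTP  (HTTP server -> client)
print(classify_by_port(6881, 51515))  # unknown (neither port is registered)
```

The third call shows the first limitation discussed next: a flow on an unregistered port cannot be identified at all.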
Port-based identification involves no complex algorithmic processing, so it is simple and fast and identifies traditional applications well. However, it has many limitations: 1) the port numbers of some network applications are not registered in the IANA registered port table and are kept private; 2) some applications do not bind to a fixed communication port, so the port can be changed manually, and even applications with a registered port may deliberately use other port numbers to evade administrative restrictions; 3) for some traffic, encryption makes the port information unobtainable at the IP layer.
2. DPI-based identification
The packet payloads of some network traffic contain character strings that are tied to the application type. Collecting these content fields forms a traffic service signature library, from which a correspondence between network service types and the signature library can be established. The advantage of DPI-based identification is that it does not rely on a protocol-port mapping table, so it can identify applications whose ports change dynamically or are disguised. However, because it must inspect application-layer message content and search the application-layer signature library, its algorithmic complexity is high when the library is very large or must be updated frequently. In addition, the method does not work on encrypted traffic.
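A minimal sketch of signature matching against packet payloads follows. The three byte patterns (the BitTorrent handshake prefix, the SSH version banner, and the HTTP/1.x version string) are well known, but the signature list and the function name `classify_by_payload` are illustrative assumptions, not a real DPI library.

```python
# Illustrative signature library: (byte pattern, application).
SIGNATURES = [
    (b"\x13BitTorrent protocol", "BitTorrent"),  # handshake: length byte 19 + protocol string
    (b"SSH-", "SSH"),                             # SSH version banner
    (b"HTTP/1.", "HTTP"),                         # HTTP/1.x request or status line
]


def classify_by_payload(payload: bytes) -> str:
    """Return the first application whose signature occurs in the payload."""
    for pattern, application in SIGNATURES:
        if pattern in payload:
            return application
    return "unknown"


print(classify_by_payload(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"))  # HTTP
print(classify_by_payload(b"SSH-2.0-OpenSSH_9.6\r\n"))                                  # SSH
# An encrypted TLS application-data record carries no readable signature.
print(classify_by_payload(bytes([0x17, 0x03, 0x03, 0x00, 0x20]) + b"\x8f\x1a\x55"))    # unknown
```

Each `in` test scans the payload once per signature, so cost grows with the size of the library; production engines typically use multi-pattern matching such as Aho-Corasick for this reason. The last call illustrates why the method fails on encrypted traffic.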
3. DFI-based identification
Different service types have different session connections or network flow states. VoIP traffic, for example, has very distinctive flow-state features: the packet length of Real-time Transport Protocol (RTP) flows is fairly fixed, generally 130 to 220 bytes; the connection rate is low, 20 to 84 kbit/s; and sessions last relatively long. By contrast, traffic from P2P downloads has an average packet length above 450 bytes, long download durations, and high connection rates, and TCP is the preferred transport protocol. DFI builds traffic feature models from such flow behavior characteristics and identifies the application type by comparing a session's flow information, such as connection duration, connection rate, bytes transferred, and inter-packet intervals, against those models. Because DFI looks at the behavior of network traffic rather than analyzing packet by packet and pattern matching as DPI does, it can identify encrypted and unknown network services; however, its identification accuracy is currently low and its granularity is coarse.
4. Identification based on network-flow communication behavior
Communication-behavior identification resembles a sociological approach: a network flow is no longer treated as an isolated individual; instead, the relationships between the flow, other flows, and groups of hosts are analyzed, and features extracted from these relationships are used to identify the traffic type. As shown in Figs. 1, 2 and 3, the behavior of an individual host is observed first and its communication connections with other hosts are analyzed. The analysis then proceeds at the application level (reflecting the transport-layer connection topology), the functional level (reflecting ISP or user behavior), and the social level (reflecting the host's communication degree). Finally, the behavior patterns of these host groups are matched against known traffic service features to achieve identification. Because this method does not parse packet payloads and classifies complex traffic purely by the host behavior patterns of network communication, it is effective on encrypted flows. At present, however, it works only for a few specific traffic types and lacks generality.
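As a toy illustration of the social-level view, the sketch below summarizes each source host by how many distinct peers and distinct destination services it contacts, without reading any payload. The feature choice, the function name `host_profiles`, and the documentation-range IP addresses are assumptions for illustration; the text does not specify which behavioral features are used.

```python
from collections import defaultdict


def host_profiles(flows):
    """Summarize each source host by its communication fan-out.

    flows: iterable of (src_ip, dst_ip, dst_port, transport) tuples.
    Returns {src_ip: (distinct peer hosts, distinct (port, transport) pairs)}.
    """
    peers = defaultdict(set)
    services = defaultdict(set)
    for src, dst, dst_port, transport in flows:
        peers[src].add(dst)
        services[src].add((dst_port, transport))
    return {host: (len(peers[host]), len(services[host])) for host in peers}


flows = [
    ("10.0.0.1", "93.184.216.34", 443, "TCP"),   # web client: one server, one port
    ("10.0.0.1", "93.184.216.34", 443, "TCP"),
    ("10.0.0.2", "198.51.100.7", 6881, "UDP"),    # P2P-like: many peers, many ports
    ("10.0.0.2", "203.0.113.9", 51413, "UDP"),
    ("10.0.0.2", "192.0.2.44", 40001, "TCP"),
]
print(host_profiles(flows))
# {'10.0.0.1': (1, 1), '10.0.0.2': (3, 3)}
```

A host that talks to many peers on many ports looks P2P-like, while a browser talks to few servers on one port; matching such profiles to service types still requires known patterns, which is the generality limit noted above.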
5. Machine-learning-based identification
Different types of network traffic services differ in flow attributes such as packet in/out rate, packet count, average packet size, byte count, flow duration, and packet inter-arrival time. All of these attributes can serve as objects of traffic feature statistics, from which an empirical traffic model can be built for machine learning. As shown in Fig. 4, machine-learning classification first builds a traffic classification model from flows with a learning algorithm and then uses it to identify traffic. Research on clustering-based traffic identification is now extensive, and the K-means algorithm is one of the most common. It is lightweight and widely applicable. It clusters the target samples in a flow set into K flow clusters and iteratively modifies the cluster contents so that the clustering criterion function, the sum of squared errors, is minimized, which maximizes similarity within each flow cluster. However, K-means is very sensitive to the choice of initial cluster centers: arbitrarily chosen initial centers can easily trap the algorithm in a locally optimal solution, which severely degrades the final clustering. Moreover, it considers only distance during clustering and ignores the influence of density.
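The sensitivity to initial centers can be demonstrated with plain Lloyd's K-means. This is the standard algorithm, not the method of the invention, and the six one-dimensional points are made up for illustration: with well-placed initial centers the sum of squared errors (SSE) reaches 1.5, while starting two centers inside the same group converges to a local optimum with SSE 101.

```python
import numpy as np


def kmeans(points, init_centers, max_iter=100):
    """Lloyd's K-means from the given initial centers; returns (centers, labels, SSE)."""
    x = np.asarray(points, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(max_iter):
        # Squared Euclidean distance from every point to every center.
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its points (keep it if the cluster is empty).
        new_centers = np.array([
            x[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    sse = float(d2[np.arange(len(x)), labels].sum())
    return centers, labels, sse


pts = [[0], [1], [10], [11], [20], [21]]          # three obvious groups in 1-D
_, _, sse_good = kmeans(pts, [[0], [10], [20]])
_, _, sse_bad = kmeans(pts, [[0], [1], [15]])      # two centers start in the same group
print(sse_good, sse_bad)  # 1.5 101.0
```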
Summary of the invention
Technical problem: The purpose of this invention is to provide a method for identifying network traffic service types that, by overcoming the local-optimum problem of clustering algorithms, mainly addresses the low accuracy and granularity of traffic identification and the difficulty of identifying unknown and encrypted traffic.
Technical solution: As shown in Figs. 5 and 6, the similarity between a flow and a flow cluster depends strongly on both distance and density. By introducing the idea of density (i.e., mass) alongside the idea of partitioning (i.e., distance), the invention designs a traffic classification and identification method based on network flow gravitation clustering. Handling isolated flows and setting the initial cluster centroids within the gravitation clustering solve the local-optimum problem of conventional clustering-based traffic identification and make traffic classification more accurate.
The method of the invention comprises the following two stages:
Training stage: build a training set from sample flows and form a network flow gravitation classification model;
Identification stage: identify the service type of network traffic according to the gravitation clustering identification principle.
The training stage comprises the following steps:
Step 1: after capturing network traffic, select the characteristic attributes of the network flows and normalize each network flow to form a flow training set;
Step 2: compute the field strength of each sample flow in the flow training set and, from it, the flow's Z-score within the training set; if the absolute value of the Z-score is greater than 2, mark the flow as an isolated flow and separate it from the other sample flows;
Step 3: within the set of non-isolated flows, pre-group closely distributed sample flows into a cluster, compute the centroid of each such flow cluster, and use these as the initial cluster centroids of the flow set;
Step 4: using the principle and method of semi-supervised network flow gravitation clustering, iteratively classify each network flow in the training set while updating the flow cluster centroids;
Step 5: classify the isolated flows and form the traffic classification model.
The identification stage comprises the following steps:
Step 1: form a sequence of network flows to be identified from the network flow characteristic attribute set;
Step 2: using the classification model obtained in the training stage and the gravitation clustering principle, classify each flow to be identified by flow gravitation while updating the flow cluster centroids;
Step 3: map each flow cluster to a specific network service type, completing identification of the network traffic.
In step 1 of the training stage, the variance contribution ratio is introduced to measure how important each flow attribute's eigenvalue and eigenvector are in determining which service type a network flow belongs to, and the network flow characteristic attribute subset is taken as the leading components whose cumulative contribution reaches 80%. In step 2, the Z-score of each sample flow within the flow set is computed, and network flows whose absolute Z-score exceeds 2 are taken as isolated flows. In step 3, the isolated flows are first set aside; closely distributed sample flows are then pre-grouped into clusters, and the centroids of these flow clusters are computed and used as the initial cluster centroids.
The invention models flow similarity on gravitation and constructs a law of flow gravitation. When computing flow gravitation, to prevent an overly massive flow cluster from "swallowing" other network flows (the "black hole" phenomenon), the flow gravitation formula is modified as follows:
[Modified flow gravitation formula: equation image not available]
where $C_1$ and $C_2$ are clusters formed by flow cluster particles belonging to two different service types in the network traffic, $M_1$ and $M_2$ are the masses of flow clusters $C_1$ and $C_2$, and $O_1$ and $O_2$ are their centroids.
In the invention, the probability that a network flow belongs to a flow cluster is judged by comparing the gravitation between the flow and each cluster. For example, let the flow gravitation between a network flow P to be identified and flow cluster $C_1$ be $F_1$, and between P and flow cluster $C_2$ be $F_2$. If $F_1 > F_2$, the probability that P belongs to service class $C_1$ is greater than the probability that it belongs to service class $C_2$.
The definitions and principles used by the network flow gravitation clustering method are as follows:
Flow cluster body: called a "flow cluster" for short, a unit in the flow data space that has a "flow cluster mass". It is formed by network flow elements whose values are similar in the important attribute space, and it has two basic attributes: flow cluster mass and flow cluster centroid.
Flow cluster mass: the number of network flow objects contained in the flow cluster body.
Flow cluster centroid: let $X_1, \ldots, X_i, \ldots, X_n$, with $X_i = \langle x_{i,1}, x_{i,2}, \ldots, x_{i,d} \rangle$ $(1 \le i \le n)$, be a group of network flows in a $d$-dimensional flow space $S$, and let $C_j$ be the flow cluster body formed by $X_1, X_2, \ldots, X_e$ $(e \le n)$. Then the centroid of $C_j$, $O_j = \langle x_{0,1}, x_{0,2}, \ldots, x_{0,d} \rangle$, is the geometric center of $X_1, X_2, \ldots, X_e$ in the flow data space:
$$O_j = \frac{1}{e}\sum_{i=1}^{e} X_i$$
Atomic flow cluster body: a flow cluster body containing only one network flow. Its flow cluster mass is 1, so it is also called an atomic flow cluster particle.
Flow gravitation: the degree of matching between network traffic service types. The gravitation between flow clusters $C_1$ and $C_2$ is given by formula (1):
$$F(C_1, C_2) = G\,\frac{M_1 M_2}{\lVert O_1 - O_2 \rVert^{2}} \qquad (1)$$
When $C_2$ contains only one network flow $X$ (so that $M_2 = 1$ and $O_2 = X$), formula (1) becomes:
$$F(C_1, X) = G\,\frac{M_1}{\lVert O_1 - X \rVert^{2}} \qquad (2)$$
Only the relative magnitude of the gravitation is needed, so the gravitational constant G can be ignored. In addition, to prevent an overly massive flow cluster from "swallowing" other network flows (the "black hole" phenomenon), formulas (1) and (2) are modified into formulas (3) and (4):
(3) [equation image not available]
(4) [equation image not available]
Superposition principle: let $p_1, p_2, \ldots, p_m$ be $m$ flow cluster particles of the same type in the network traffic space, and let $F_1, F_2, \ldots, F_m$ be the flow gravitations that $p_1, p_2, \ldots, p_m$ respectively exert on another network flow. The superposed gravitation of the $m$ particles on that network flow is given by formula (5):
$$F = \sum_{l=1}^{m} F_l \qquad (5)$$
Classification principle: as shown in Fig. 7, let $C_1$ and $C_2$ be clusters formed by flow cluster particles belonging to two different service types in the network traffic. Suppose the cluster of a network flow P is to be determined, the flow gravitation between P and $C_1$ is $F_1$, and the flow gravitation between P and $C_2$ is $F_2$. If $F_1 > F_2$, the probability that P belongs to service class $C_1$ is greater than the probability that it belongs to service class $C_2$.
Flow gravitational field: the field, pervading the whole flow space, that is produced by the combined flow gravitation of the flow cluster particles. The field strength of a network flow within a traffic class cluster is defined as the sum of the field strengths that all flow cluster particles in that cluster exert on the flow, where the field strength a particle exerts on the flow is its flow gravitation on the atomic flow cluster particle consisting of that single network flow.
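The definitions above can be sketched in Python: a flow cluster body with mass (member count) and centroid (geometric center), the gravitation of formula (1) with G dropped, and the classification principle. This is a sketch under stated assumptions: it uses the unmodified Newtonian form, because the modified formulas (3) and (4) are not spelled out in the text, and the class and function names and the example clusters are hypothetical.

```python
import numpy as np


class FlowCluster:
    """A flow cluster body: mass = number of member flows, centroid = their geometric center."""

    def __init__(self, flows):
        self.members = np.atleast_2d(np.asarray(flows, dtype=float))

    @property
    def mass(self) -> int:
        return len(self.members)

    @property
    def centroid(self) -> np.ndarray:
        return self.members.mean(axis=0)


def gravitation(a: FlowCluster, b: FlowCluster, eps: float = 1e-12) -> float:
    """Formula (1) with G dropped: M_a * M_b / ||O_a - O_b||^2 (unmodified form)."""
    r2 = float(np.sum((a.centroid - b.centroid) ** 2))
    return a.mass * b.mass / max(r2, eps)   # eps avoids division by zero for coincident centroids


def classify(flow, clusters):
    """Classification principle: pick the cluster exerting the largest gravitation on the flow."""
    p = FlowCluster(flow)                   # the flow as an atomic particle (mass 1)
    forces = {name: gravitation(c, p) for name, c in clusters.items()}
    return max(forces, key=forces.get), forces


c1 = FlowCluster([[0, 0], [1, 1], [-1, -1], [1, -1], [-1, 1]])   # mass 5, centroid (0, 0)
c2 = FlowCluster([[3, 0]])                                       # mass 1, centroid (3, 0)
print(classify([0.5, 0], {"C1": c1, "C2": c2})[0])  # C1
print(classify([2.5, 0], {"C1": c1, "C2": c2})[0])  # C2
print(classify([2.0, 0], {"C1": c1, "C2": c2})[0])  # C1: mass outweighs distance
```

In the last call the flow is twice as close to C2 as to C1 yet is assigned to C1 (gravitation 1.25 versus 1.0), a small instance of the "black hole" tendency that the modified formulas are meant to counter.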
The improvements of the invention are as follows:
Improvement 1: selection of network flow characteristic attributes. If too few flow attributes are selected, or the selected attributes are highly correlated, both the discrimination and the accuracy of the model suffer. To keep the algorithm portable, the invention describes network flows of different service types with 37 traffic characteristic attributes summarized in previous research, denoted $x_1, x_2, \ldots, x_{37}$. A sequence of $n$ network flows can thus be represented by an $n \times 37$ matrix $X$:
$$X = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,37} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,37} \end{pmatrix} \qquad (6)$$
In formula (6), $x_{i,j}$ is the normalized value of the $i$-th network flow on the $j$-th attribute and $n$ is the number of network flows. The covariance matrix formed from the rows $X_i$ of $X$ is:
$$C = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})(X_i - \bar{X})^{T} \qquad (7)$$
$$C = U \Lambda U^{T} \qquad (8)$$
where $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is the mean vector of the network flows, $U$ is the eigenvector matrix formed by the eigenvectors $u_1, u_2, \ldots, u_{37}$ of $C$, and $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_{37})$ is the eigenvalue matrix formed by the eigenvalues of $C$, which are assumed to satisfy $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{37} \ge 0$.
The variance contribution ratio
$$\eta_i = \frac{\lambda_i}{\sum_{j=1}^{37}\lambda_j}$$
is introduced to measure how important each eigenvalue and eigenvector is in determining which service type a network flow belongs to, and is used for dimensionality reduction. The first $d$ traffic feature vectors in formula (9) span the characteristic attribute subspace used for traffic classification and identification; this subspace removes part of the redundant attributes and reflects the essential characteristics of the network traffic set. Here $d$, the number of attributes in the final traffic characteristic attribute set, is the smallest value for which the cumulative contribution reaches 80%:
$$\eta(d) = \frac{\sum_{i=1}^{d}\lambda_i}{\sum_{i=1}^{37}\lambda_i} \ge 80\% \qquad (9)$$
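The selection of $d$ by cumulative variance contribution can be sketched with NumPy. Assumptions: the covariance uses a $1/n$ normalization (any constant factor leaves the contribution ratios unchanged), the function name `select_dimension` is hypothetical, and the sketch returns the first $d$ eigenvectors, onto which normalized flows can be projected.

```python
import numpy as np


def select_dimension(X, threshold=0.80):
    """Return (d, U_d): the smallest d whose cumulative variance contribution reaches
    `threshold`, and the matrix of the first d eigenvectors of the covariance matrix."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)                 # subtract the mean vector
    C = centered.T @ centered / X.shape[0]        # covariance matrix (1/n normalization)
    eigvals, eigvecs = np.linalg.eigh(C)          # C is symmetric; eigh returns ascending order
    order = np.argsort(eigvals)[::-1]             # reorder so lambda_1 >= lambda_2 >= ...
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()        # eta(1), eta(2), ...
    d = int(np.searchsorted(cumulative, threshold) + 1)    # first index reaching threshold
    return d, eigvecs[:, :d]


# Toy data: three uncorrelated attributes with variances 9, 1 and 0.
X = np.array([[-3, -1, 0], [3, 1, 0], [-3, 1, 0], [3, -1, 0]])
d, U_d = select_dimension(X)
print(d, U_d.shape)  # 1 (3, 1), since 9 / 10 = 90% >= 80%
```

With a 95% threshold the same data would need two components, because the first alone explains only 90% of the variance.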
Improvement 2: handling of isolated flows. The traditional way to identify isolated flows is to compute, for each sample flow, the sum of its distances to the other sample flows and to mark a fixed percentage of flows with the largest sums as isolated flows. That percentage is set manually and arbitrarily, which easily makes the result inaccurate. To eliminate this arbitrariness, the Z-score from statistics is introduced to identify isolated flows.
In statistics, the difference between a variable's value and the mean of all values, divided by the standard deviation, is called the Z-score. When a distribution is fairly symmetric, and especially when it is close to normal, the following rule holds for the Z-scores of a data set: about 68% of the values lie in $[-1, 1]$, about 95% lie in $[-2, 2]$, and about 99% (more precisely 99.7%) lie in $[-3, 3]$. Using the concept of standard deviation, the Z-score describes how far an individual lies from the group mean; it measures the individual's relative standard distance within the group and avoids the shortcomings of a manually specified threshold. In this method, network flows whose absolute Z-score exceeds 2 are taken as isolated flows. Let s[i][k] denote the value of the $k$-th dimension of the $i$-th flow; the field strength of network flow $i$ in the flow set is:
(10) [equation image not available]
where $d$ is the dimension of a flow sample point and $n$ is the number of network flows.
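The Z-score of sample flow $i$ is then:
$$Z_i = \frac{E_i - \bar{E}}{\sigma_E} \qquad (11)$$
where $E_i$ is the field strength of flow $i$ from formula (10), and $\bar{E}$ and $\sigma_E$ are the mean and standard deviation of the field strengths of all $n$ flows.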
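Isolated-flow detection by field-strength Z-score can be sketched as follows. Because formula (10) is not reproduced in the text, this sketch assumes the field strength of a flow is the sum of the unmodified gravitations $1/\lVert s_i - s_j \rVert^2$ exerted by every other flow (each an atomic particle of mass 1, G dropped), and uses the population standard deviation. Function names and the example data are hypothetical.

```python
import numpy as np


def field_strength(S, eps=1e-12):
    """Field strength of each flow in S (an n x d array), assuming every other flow
    exerts the unmodified gravitation 1 / squared distance on it."""
    S = np.asarray(S, dtype=float)
    d2 = ((S[:, None, :] - S[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)                  # a flow exerts no force on itself
    return (1.0 / np.maximum(d2, eps)).sum(axis=1)


def isolated_flows(S, z_threshold=2.0):
    """Indices of flows whose field-strength Z-score satisfies |Z| > z_threshold."""
    E = field_strength(S)
    sigma = E.std()
    if sigma == 0:                                # all flows equally dense: nothing is isolated
        return np.array([], dtype=int), np.zeros_like(E)
    z = (E - E.mean()) / sigma
    return np.flatnonzero(np.abs(z) > z_threshold), z


# A 3 x 3 grid of closely spaced flows plus one distant flow (index 9).
grid = [[x, y] for x in range(3) for y in range(3)]
S = np.array(grid + [[100, 100]], dtype=float)
idx, z = isolated_flows(S)
print(idx)  # [9]
```

Here the distant flow has almost zero field strength and a Z-score of about -2.6, while the densest grid point (the center) stays near 1.4, so only the distant flow is flagged. Note that the $|Z| > 2$ rule would also flag flows with unusually high field strength.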
Improvement 3: setting the initial cluster centroids. The method first sets the isolated flows aside, then roughly pre-groups closely distributed network flows in the flow set into a class cluster, computes the class centroid, and uses it as an initial cluster centroid. This ensures that the initial setting is reasonable and that the network flows in the surrounding space remain relatively similar, reducing the error accumulation and the deterioration of clustering caused by random and homogeneous choices of cluster centroids.
Workflow:
The workflow of the flow identification method based on network flow gravitation clustering is divided into a training stage and an identification stage, as shown in Fig. 8:
Training stage: build a training set from sample flows and form a network flow gravitation classification model:
1. After capturing network traffic, select the characteristic attributes of the network flows and normalize each network flow to form a flow training set. The characteristic attributes are selected as follows:
Describe the network flows of different service types with the 37 traffic characteristic attributes summarized in previous research, denoted $x_1, x_2, \ldots, x_{37}$, so that a sequence of $n$ network flows is represented by an $n \times 37$ matrix $X$ as in formula (6), where $x_{i,j}$ is the normalized value of the $i$-th network flow on the $j$-th attribute. Form the covariance matrix $C$ from the rows $X_i$ as in formula (7) and decompose it as $C = U \Lambda U^{T}$, where $\bar{X}$ is the mean vector of the network flows, $U$ is the eigenvector matrix formed by the eigenvectors $u_1, u_2, \ldots, u_{37}$ of $C$, and $\Lambda$ is the eigenvalue matrix formed by its eigenvalues, ordered so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{37} \ge 0$. Introduce the variance contribution ratio $\eta_i = \lambda_i / \sum_{j=1}^{37}\lambda_j$ to measure how important each eigenvalue and eigenvector is in determining the service type of a network flow, and use formula (9) to find $d$, the smallest number of components whose cumulative contribution reaches 80%; $d$ is the size of the final traffic characteristic subset.
2. Compute the field strength of each sample flow in the flow training set and, from it, the flow's Z-score within the training set; if the absolute value of the Z-score is greater than 2, mark the flow as an isolated flow and separate it from the other sample flows.
3. Within the set of non-isolated flows, pre-group closely distributed sample flows into a cluster, compute the centroid of each such flow cluster, and use these as the initial cluster centroids of the flow set.
The procedure is as follows:
Input: the number of flow clusters K, and a flow set D of N flows from which the isolated flows have been removed
Output: the initial cluster centroids
For i = 1 : K-1
1) In the gravitational field of sample set D, compute the field strength of every sample flow and take the network flow with the minimum field strength;
2) In flow set D, find the network flow with the minimum gravitation to that flow; call it the reference flow;
3) Take the first N-1/K sample flows in D with the greatest gravitation to the reference flow (these sample flows are closer to the reference flow than the other sample flows) and group them into class i;
4) Compute the cluster centroid of class i and take it as the i-th initial cluster centroid;
5) Remove class i from D; the remaining sample flows form the new sample set D.
End
Group the remaining network flows in D into class K, and compute the cluster centroid of class K.
4. Using the semi-supervised network flow gravitation clustering principle, iteratively classify each network flow in the training set while updating the flow cluster centroids. When computing flow gravitation, to prevent an overly massive flow cluster from "swallowing" other network flows (the "black hole" phenomenon), the modified flow gravitation formula is used:
[Modified flow gravitation formula: equation image not available]
5. Classify the isolated flows and form the traffic classification model.
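The initial-centroid procedure of step 3 can be sketched as below. Assumptions, all labeled in the code: gravitation between flows is the unmodified $1/\lVert x - y \rVert^2$ with G dropped; each of the first K-1 classes receives about N/K flows (the reference flow plus its most attracted neighbours), since the source's class-size expression is ambiguous; the last class takes all remaining flows; and the function names are hypothetical.

```python
import numpy as np


def _gravitation_matrix(X, eps=1e-12):
    """Pairwise gravitation between atomic particles: 1 / squared distance (G dropped)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    g = 1.0 / np.maximum(d2, eps)
    np.fill_diagonal(g, 0.0)            # no self-attraction
    return g


def initial_centroids(X, K):
    """Sketch of the initial-centroid procedure (isolated flows already removed)."""
    X = np.asarray(X, dtype=float)
    remaining = np.arange(len(X))       # indices of flows still in D
    size = len(X) // K                  # assumed class size: about N/K flows
    centroids = []
    for _ in range(K - 1):
        D = X[remaining]
        G = _gravitation_matrix(D)
        a = int(G.sum(axis=1).argmin())             # 1) flow with minimum field strength
        g_a = G[a].copy()
        g_a[a] = np.inf                             # exclude a itself
        b = int(g_a.argmin())                       # 2) reference flow: minimum gravitation to a
        nearest = [j for j in np.argsort(-G[b]) if j != b][: size - 1]
        members = [b] + nearest                     # 3) b and the flows most attracted to it
        centroids.append(D[members].mean(axis=0))   # 4) centroid of class i
        remaining = np.delete(remaining, members)   # 5) remove class i from D
    centroids.append(X[remaining].mean(axis=0))     # class K: all remaining flows
    return np.array(centroids)


X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
print(initial_centroids(X, 2))  # rows approx (1/3, 1/3) and (31/3, 31/3)
```

On this toy set the sparsest flow lies in the upper group, the reference flow is the farthest point (0, 0), and the two initial centroids land in the middle of the two natural groups, which is the behavior the procedure aims for.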
Identification stage: identify the service type of network traffic according to the gravitation clustering identification principle:
1. Form a sequence of network flows to be identified from the network flow characteristic attribute set.
2. Using the classification model obtained in the training stage and the gravitation clustering principle, classify each flow to be identified by flow gravitation while updating the flow cluster centroids. A network flow is assigned to a flow cluster according to the relative magnitude of its gravitation to each cluster.
3. Map each flow cluster to a specific network service type, completing identification of the network traffic.
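The identification steps can be sketched as an online loop: compute each flow's gravitation to every cluster, assign it to the strongest, update that cluster's mass and centroid, and map the cluster to a service type. Assumptions: gravitation follows formula (2) with G dropped and without the anti-"black hole" modification; the centroid update is the incremental mean, consistent with the centroid being the geometric center; the cluster names, service-type mapping, and function name are hypothetical.

```python
import numpy as np


def identify(flows, clusters, service_types, eps=1e-12):
    """Assign each flow to the cluster with the largest gravitation, then update that cluster.

    clusters: dict name -> (mass, centroid); updated in place as flows are absorbed.
    service_types: dict name -> service type label (the cluster-to-service mapping).
    """
    results = []
    for x in np.asarray(flows, dtype=float):
        # Formula (2) with G dropped: F = M / ||O - x||^2 for each cluster.
        forces = {
            name: mass / max(float(np.sum((centroid - x) ** 2)), eps)
            for name, (mass, centroid) in clusters.items()
        }
        best = max(forces, key=forces.get)
        mass, centroid = clusters[best]
        # Incremental centroid update: the new geometric center after adding x.
        clusters[best] = (mass + 1, (mass * centroid + x) / (mass + 1))
        results.append(service_types[best])
    return results


clusters = {"C1": (5, np.array([0.0, 0.0])), "C2": (5, np.array([10.0, 0.0]))}
labels = identify([[1, 0], [9, 0]], clusters, {"C1": "service type 1", "C2": "service type 2"})
print(labels)             # ['service type 1', 'service type 2']
print(clusters["C1"][0])  # 6
```

Updating the centroid incrementally avoids storing every member flow; after the first flow, cluster C1 has mass 6 and centroid (1/6, 0).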
Beneficial effects
To address the local-optimum defect of clustering-based traffic identification, a flow identification method based on network flow gravitation clustering is designed, and the gravitation clustering method is adapted in three respects: selection of traffic characteristic attributes, handling of isolated network flows, and setting of the initial cluster centroids. The method achieves better recognition, a higher identification rate, and faster convergence of the identification algorithm.
Brief description of the drawings
Fig. 1 is a diagram of the HTTP connection pattern.
Fig. 2 is a diagram of the DNS service connection pattern.
Fig. 3 is a diagram of a typical P2P application connection pattern.
Fig. 4 is a schematic diagram of machine-learning-based identification.
Fig. 5 is a schematic diagram of the relationship between flow similarity and distance.
Fig. 6 is a schematic diagram of the relationship between flow similarity and mass.
Fig. 7 illustrates network flow classification based on flow gravitation.
Fig. 8 is the workflow diagram of the flow identification method of the invention.
Embodiment
In the scenario of Fig. 7 there are 23 sample flows, numbered 1, 2, ..., 23, belonging to two different service types, service type 1 and service type 2; P is the network flow to be identified. Each network flow has 37 original characteristic attributes, denoted $x_1, x_2, \ldots, x_{37}$. With reference to the flowchart of Fig. 8, the training and identification processes are as follows:
1. Training process:
1) Collect the value of each network flow on the original characteristic attributes to form the flow training set, where $x_{i,j}$ is the normalized value of the $i$-th sample flow on the $j$-th attribute. Form the covariance matrix $C$ from the sample flows $X_i$ and write it in the form $C = U \Lambda U^{T}$, where $U$ is the eigenvector matrix formed by the eigenvectors $u_1, u_2, \ldots, u_{37}$ of $C$ and $\Lambda$ is the eigenvalue matrix formed by its eigenvalues, ordered so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{37} \ge 0$. Applying the 80% cumulative-contribution criterion of formula (9) gives $d = 11$, so the first 11 attributes of the sample flows are retained.
2) Compute the field strength of each sample flow in the flow training set and its Z-score; taking the network flows whose absolute Z-score exceeds 2 as isolated flows selects flows 1, 6 and 23, which are separated from the other sample flows;
3) Within the set of non-isolated flows, compute the initial cluster centroids of the flow set with the improved initial-centroid setting method, obtaining one initial centroid for each of the two flow clusters;
4) Using the designed network flow gravitation clustering method, iteratively classify every sample flow in the training set (excluding flows 1, 6 and 23) while updating the flow cluster centroids;
5) Finally, classify isolated flows 1, 6 and 23, forming flow clusters $C_1$ and $C_2$, which correspond to service types 1 and 2 respectively.
2. Identification process:
1) Compute the values of the network flow on the first 11 traffic feature attributes and normalize them to form the network flow to be identified.
2) Using the classification model obtained in the training stage and the gravitation clustering principle, compute the flow gravitation $F_1$ between network flow P and flow cluster $C_1$, and the flow gravitation $F_2$ between P and flow cluster $C_2$. If $F_1 > F_2$, P is assigned to flow cluster 1; otherwise it is assigned to flow cluster 2. The centroid of the receiving cluster is updated at the same time.
3) Map each flow cluster to a specific network traffic service type, completing the identification of the network traffic.
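Step 2) of the identification process can be sketched as below. The improved gravitation formula of claim 7 is not reproduced in this text, so the sketch deliberately uses the plain Newtonian form $F_k = M_k / \lVert P - O_k \rVert^2$, giving the flow P unit mass. The mass definition (number of member flows), the running-mean centroid update and the names `FlowCluster`, `gravitation` and `identify` are all hypothetical. The ratio $F_1/F_2$ is returned because claim 8 uses it as the basis for the membership judgment.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class FlowCluster:
    service_type: str      # service type this cluster maps to
    centroid: List[float]  # cluster centroid O_k
    mass: float            # hypothetical mass M_k: number of member flows


def gravitation(p: Sequence[float], cluster: FlowCluster, eps: float = 1e-12) -> float:
    """Plain Newtonian-style pull of a cluster on a unit-mass flow: M_k / r^2."""
    r2 = sum((a - b) ** 2 for a, b in zip(p, cluster.centroid))
    return cluster.mass / (r2 + eps)  # eps avoids division by zero at the centroid


def identify(p: Sequence[float], c1: FlowCluster, c2: FlowCluster) -> Tuple[str, float]:
    """Assign p to the cluster with the larger pull and update that cluster's centroid."""
    f1, f2 = gravitation(p, c1), gravitation(p, c2)
    target = c1 if f1 > f2 else c2
    m = target.mass
    # Running-mean update: the new centroid includes p as one more member flow.
    target.centroid = [(m * c + x) / (m + 1) for c, x in zip(target.centroid, p)]
    target.mass = m + 1
    return target.service_type, f1 / f2
```

The plain form also shows why claim 7 modifies the formula: a cluster of mass 1000 centred at (1, 1) outpulls a cluster of mass 1 at the origin for a flow at (0.1, 0.1), even though the flow lies much closer to the small cluster. This is the "black hole" effect the patent describes.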

Claims (8)

1. A traffic identification method based on network flow gravitation clustering, characterized by comprising the following two stages:
Training stage: build a training set from sample flows and form a network flow gravitation classification model;
Identification stage: identify the service type of network traffic according to the gravitation clustering identification principle.
2. The traffic identification method according to claim 1, characterized in that the training stage comprises the following steps:
Step 1: after capturing network traffic, select the feature attributes of the network traffic and normalize each network flow to form the flow training set;
Step 2: compute the field strength of every sample flow in the flow training set and, from it, the flow's Z-score within the training set; if the absolute value of the Z-score is greater than 2, identify the flow as an isolated flow and separate it from the other sample flows;
Step 3: in the set of all non-isolated flows, pre-group closely distributed sample flows into clusters and compute the centroid of each flow cluster as the initial cluster centroids of the flow set;
Step 4: using the semi-supervised network flow gravitation clustering principle and method, perform iterative classification learning on every network flow in the training set, updating the flow cluster centroids at the same time;
Step 5: classify the isolated flows to form the traffic classification model.
3. The traffic identification method according to claim 1, characterized in that the identification stage comprises the following steps:
Step 1: form the sequence of network flows to be identified from the set of network flow feature attributes;
Step 2: according to the classification model obtained in the training stage, apply the gravitation clustering principle to classify each network flow to be identified by flow gravitation, updating the flow cluster centroids at the same time;
Step 3: map each flow cluster to a specific network traffic service type, completing the identification of the network traffic.
4. The traffic identification method according to claim 2, characterized in that, in step 1, the feature attributes of the network traffic are determined by introducing the variance contribution rate and the eigenvectors to weigh how important each flow attribute's feature value is in deciding which service type a network flow belongs to; the network flow feature attribute subset is obtained once the cumulative contribution rate reaches 80%.
5. The traffic identification method according to claim 2, characterized in that, in step 2, the Z-score of each sample flow within the flow set is computed, and network flows whose Z-score has an absolute value greater than 2 are taken as isolated flows.
6. The traffic identification method according to claim 2, characterized in that, in step 3, the isolated flows are first separated; closely distributed sample flows in the flow set are then pre-grouped into clusters, and the centroid of each flow cluster is computed and set as the initial cluster centroid.
7. The traffic identification method according to claim 2 or claim 3, characterized in that a flow law of gravitation is constructed with gravitation as the model, and, to prevent a flow cluster whose mass has grown too large from "swallowing" other network flows during the gravitation calculation and producing a "black hole" phenomenon, the flow gravitation formula is improved to:
[Formula 1]
where $C_1$ and $C_2$ are the clusters formed by the flow-cluster particles belonging to two different service types in the network traffic, $M_1$ and $M_2$ are the masses of flow clusters $C_1$ and $C_2$, and $O_1$ and $O_2$ are the centroids of $C_1$ and $C_2$, respectively.
8. The traffic identification method according to claim 2 or claim 3, characterized in that the probability that a network flow belongs to a flow cluster is judged from the ratio of the gravitations between the network flow and the flow clusters.
For example, suppose the flow gravitation between a network flow P to be identified and flow cluster $C_1$ is $F_1$, and that between P and flow cluster $C_2$ is $F_2$.
If
$$\frac{F_1}{F_2} > 1,$$
then the probability that network flow P belongs to service class $C_1$ is greater than the probability that it belongs to service class $C_2$.
CN2013100938686A 2013-03-21 2013-03-21 Flow identification method based on network flow gravitation cluster Pending CN103200133A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013100938686A CN103200133A (en) 2013-03-21 2013-03-21 Flow identification method based on network flow gravitation cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2013100938686A CN103200133A (en) 2013-03-21 2013-03-21 Flow identification method based on network flow gravitation cluster

Publications (1)

Publication Number Publication Date
CN103200133A true CN103200133A (en) 2013-07-10

Family

ID=48722496

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013100938686A Pending CN103200133A (en) 2013-03-21 2013-03-21 Flow identification method based on network flow gravitation cluster

Country Status (1)

Country Link
CN (1) CN103200133A (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104394021A (en) * 2014-12-09 2015-03-04 中南大学 Network flow abnormity analysis method based on visualization clustering
WO2015154484A1 (en) * 2014-09-11 2015-10-15 中兴通讯股份有限公司 Traffic data classification method and device
CN105871832A (en) * 2016-03-29 2016-08-17 北京理工大学 Network application encrypted traffic recognition method and device based on protocol attributes
CN107070930A (en) * 2017-04-20 2017-08-18 中国电子技术标准化研究院 A kind of suspicious network towards main frame connects recognition methods
CN107070745A (en) * 2017-03-31 2017-08-18 武汉绿色网络信息服务有限责任公司 Unknown flow rate analysis method caused by a kind of rule is omitted
CN107682317A (en) * 2017-09-06 2018-02-09 中国科学院计算机网络信息中心 Establish method, data detection method and the equipment of Data Detection model
CN109067612A (en) * 2018-07-13 2018-12-21 哈尔滨工程大学 A kind of online method for recognizing flux based on incremental clustering algorithm
CN109218223A (en) * 2018-08-08 2019-01-15 西安交通大学 A kind of robustness net flow assorted method and system based on Active Learning
CN109714311A (en) * 2018-11-15 2019-05-03 北京天地和兴科技有限公司 A method of the unusual checking based on clustering algorithm
CN109831454A (en) * 2019-03-13 2019-05-31 北京品友互动信息技术股份公司 The recognition methods of false flow and device
CN109995611A (en) * 2019-03-18 2019-07-09 新华三信息安全技术有限公司 Traffic classification model foundation and traffic classification method, apparatus, equipment and server
CN110071845A (en) * 2018-01-24 2019-07-30 中国移动通信有限公司研究院 The method and device that a kind of pair of unknown applications are classified
CN110428137A (en) * 2019-07-04 2019-11-08 阿里巴巴集团控股有限公司 A kind of update method and device of risk prevention system strategy
CN110659669A (en) * 2019-08-26 2020-01-07 中国科学院信息工程研究所 User behavior identification method and system based on encrypted camera video traffic mode change
CN110661682A (en) * 2019-09-19 2020-01-07 上海天旦网络科技发展有限公司 Automatic analysis system, method and equipment for universal interconnection data
CN110768933A (en) * 2018-07-27 2020-02-07 深信服科技股份有限公司 Network flow application identification method, system and equipment and storage medium
CN110995461A (en) * 2019-10-28 2020-04-10 厦门大学 Network fault diagnosis method
CN112866267A (en) * 2021-01-29 2021-05-28 哈尔滨工业大学(威海) System, method, equipment and storage medium for dynamically identifying and dividing network service
EP3895367A4 (en) * 2018-12-11 2022-08-31 Assia Spe, Llc Service type identification systems and methods for optimizing local area networks

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070211645A1 (en) * 2006-03-07 2007-09-13 Kddi R&D Laboratories, Inc. Method and management apparatus for classifying congestion paths based on packet delay
CN102291279A (en) * 2011-08-18 2011-12-21 西北工业大学 Traffic detection method for peer-to-peer (P2P) network
CN102546625A (en) * 2011-12-31 2012-07-04 深圳市永达电子股份有限公司 Semi-supervised clustering integrated protocol identification system
CN102571486A (en) * 2011-12-14 2012-07-11 上海交通大学 Traffic identification method based on bag of word (BOW) model and statistic features


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
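Chen Zhenxiang (陈贞翔): "Research on Scale-Adaptive Internet Traffic Identification Methods" (具有规模适应性的互联网流量识别方法研究), Doctoral dissertation, Shandong University, 31 December 2008 (2008-12-31) *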

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015154484A1 (en) * 2014-09-11 2015-10-15 中兴通讯股份有限公司 Traffic data classification method and device
CN105471670A (en) * 2014-09-11 2016-04-06 中兴通讯股份有限公司 Flow data classification method and device
CN105471670B (en) * 2014-09-11 2019-08-02 中兴通讯股份有限公司 Data on flows classification method and device
CN104394021A (en) * 2014-12-09 2015-03-04 中南大学 Network flow abnormity analysis method based on visualization clustering
CN104394021B (en) * 2014-12-09 2017-08-25 中南大学 Exception of network traffic analysis method based on visualization cluster
CN105871832B (en) * 2016-03-29 2018-11-02 北京理工大学 A kind of network application encryption method for recognizing flux and its device based on protocol attribute
CN105871832A (en) * 2016-03-29 2016-08-17 北京理工大学 Network application encrypted traffic recognition method and device based on protocol attributes
CN107070745A (en) * 2017-03-31 2017-08-18 武汉绿色网络信息服务有限责任公司 Unknown flow rate analysis method caused by a kind of rule is omitted
CN107070930B (en) * 2017-04-20 2020-06-23 中国电子技术标准化研究院 Host-oriented suspicious network connection identification method
CN107070930A (en) * 2017-04-20 2017-08-18 中国电子技术标准化研究院 A kind of suspicious network towards main frame connects recognition methods
CN107682317A (en) * 2017-09-06 2018-02-09 中国科学院计算机网络信息中心 Establish method, data detection method and the equipment of Data Detection model
CN107682317B (en) * 2017-09-06 2019-12-06 中国科学院计算机网络信息中心 method for establishing data detection model, data detection method and equipment
CN110071845A (en) * 2018-01-24 2019-07-30 中国移动通信有限公司研究院 The method and device that a kind of pair of unknown applications are classified
CN109067612A (en) * 2018-07-13 2018-12-21 哈尔滨工程大学 A kind of online method for recognizing flux based on incremental clustering algorithm
CN110768933A (en) * 2018-07-27 2020-02-07 深信服科技股份有限公司 Network flow application identification method, system and equipment and storage medium
CN110768933B (en) * 2018-07-27 2022-08-09 深信服科技股份有限公司 Network flow application identification method, system and equipment and storage medium
CN109218223A (en) * 2018-08-08 2019-01-15 西安交通大学 A kind of robustness net flow assorted method and system based on Active Learning
CN109218223B (en) * 2018-08-08 2021-07-13 西安交通大学 Robust network traffic classification method and system based on active learning
CN109714311A (en) * 2018-11-15 2019-05-03 北京天地和兴科技有限公司 A method of the unusual checking based on clustering algorithm
CN109714311B (en) * 2018-11-15 2021-12-31 北京天地和兴科技有限公司 Abnormal behavior detection method based on clustering algorithm
US11758419B2 (en) 2018-12-11 2023-09-12 Assia Spe, Llc Service type identification systems and methods for optimizing local area networks
EP3895367A4 (en) * 2018-12-11 2022-08-31 Assia Spe, Llc Service type identification systems and methods for optimizing local area networks
CN109831454B (en) * 2019-03-13 2022-02-25 北京深演智能科技股份有限公司 False traffic identification method and device
CN109831454A (en) * 2019-03-13 2019-05-31 北京品友互动信息技术股份公司 The recognition methods of false flow and device
CN109995611A (en) * 2019-03-18 2019-07-09 新华三信息安全技术有限公司 Traffic classification model foundation and traffic classification method, apparatus, equipment and server
CN109995611B (en) * 2019-03-18 2021-06-25 新华三信息安全技术有限公司 Traffic classification model establishing and traffic classification method, device, equipment and server
CN110428137A (en) * 2019-07-04 2019-11-08 阿里巴巴集团控股有限公司 A kind of update method and device of risk prevention system strategy
CN110659669B (en) * 2019-08-26 2022-11-15 中国科学院信息工程研究所 User behavior identification method and system based on encrypted camera video traffic mode change
CN110659669A (en) * 2019-08-26 2020-01-07 中国科学院信息工程研究所 User behavior identification method and system based on encrypted camera video traffic mode change
CN110661682B (en) * 2019-09-19 2021-05-25 上海天旦网络科技发展有限公司 Automatic analysis system, method and equipment for universal interconnection data
CN110661682A (en) * 2019-09-19 2020-01-07 上海天旦网络科技发展有限公司 Automatic analysis system, method and equipment for universal interconnection data
CN110995461A (en) * 2019-10-28 2020-04-10 厦门大学 Network fault diagnosis method
CN112866267A (en) * 2021-01-29 2021-05-28 哈尔滨工业大学(威海) System, method, equipment and storage medium for dynamically identifying and dividing network service


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130710