CN105187383A - Abnormal behaviour detection method based on communication network - Google Patents
- Publication number
- CN105187383A
- Authority
- CN
- China
- Prior art keywords
- user
- exceptional value
- abnormal
- value
- addressee
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses an abnormal behaviour detection method based on a communication network. The abnormal behaviour detection method is capable of identifying abnormal behaviours in the communication network based on individual non-textual characteristics. The abnormal behaviour detection method disclosed by the invention can be widely applied to mining and analyzing user behaviours. The provided method comprises the following four steps: (1), dividing a given communication network into a plurality of network snapshots according to a time sequence; (2), extracting user data comprising the three characteristics, namely the communication traffic, the communication time distribution and the receiver frequency distribution, according to the network snapshots; (3), calculating abnormal values comprising the three characteristics, namely a communication traffic abnormal value, a communication time distribution abnormal value and a receiver frequency distribution abnormal value, according to the user data; and (4), standardizing the abnormal values through a conversion process, and converting the abnormal values into the same interval, so that the abnormal values are convenient to compare and analyze.
Description
Technical field
The invention relates to data mining, and in particular to a method for detecting abnormal behaviour.
Background technology
Mining user behaviour and analysing behavioural anomalies are important research areas for detecting data anomalies and insider threats.
A communication network is formed by communication services among many people, such as email and telephone. Communication networks play an important role in daily life and offer an unprecedented opportunity to analyse and mine users' behaviour patterns and social relationships. User behaviour mining in communication networks has already been studied extensively, for example community mining, role analysis and simulation models.
Much recent work on communication networks concentrates on mining personal behaviour models and mining events. Anomaly detection, however, is closely tied to the underlying model, so how to define the normal model is an important research topic.
The main challenge at present is how to model and represent a user's communication pattern conveniently and accurately. A commonly used technique is text-based semantic analysis, which derives user behaviour patterns and intentions by extracting and tracking the topics of text messages. Because of privacy concerns and access restrictions, however, obtaining the content of user messages faces many obstacles. Another popular technique mines user patterns from network structure and temporal attributes. Unlike the work above, our research focuses directly on the individual behaviour of each user.
Tracking and monitoring how user behaviour evolves and where it becomes abnormal helps to predict potential threats and uncover unknown events, so finding an effective way to study them is important. From the collected communication records we can build a network in which nodes represent user IDs and edges represent direct information exchange. A communication network is a typical time-series network and can be expressed as a series of snapshots. A user's behavioural baseline can be derived from the user's activity in the snapshots and then used to detect the user's abnormal behaviour.
Summary of the invention
The present invention provides an abnormal behaviour detection method based on a communication network. The method detects an individual's abnormal behaviour from that individual's historical behaviour, which lets analysts quantify individual behavioural anomalies and provides related decision support.
From the obtained communication records, a communication network is first constructed. Nodes represent users and edges represent communication records. If sender u sends a message to recipient v at time t, a directed edge from u to v is created at time t and represented by the triple (u, v, t). The communication network is then divided into a series of snapshots according to a chosen time interval. Ignoring its time attribute, each snapshot can be regarded as a set of edges.
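The construction described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `split_into_snapshots`, the sample records and the 7-day interval are assumptions made for the example, since the source does not fix a snapshot length.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def split_into_snapshots(records, start, interval):
    """Group directed edges (u, v, t) into consecutive time windows.

    records:  iterable of (sender, recipient, timestamp) triples
    start:    datetime marking the beginning of the first snapshot
    interval: timedelta giving the snapshot length (a free choice here)
    Returns a dict {snapshot_index: [(u, v, t), ...]}.
    """
    snapshots = defaultdict(list)
    for u, v, t in records:
        if t < start:
            continue  # records before the observation window are ignored
        index = (t - start) // interval  # timedelta // timedelta yields an int
        snapshots[index].append((u, v, t))
    return dict(snapshots)

# Hypothetical communication records: (sender u, recipient v, time t).
records = [
    ("alice", "bob", datetime(2015, 8, 3, 9, 15)),
    ("alice", "carol", datetime(2015, 8, 4, 14, 0)),
    ("bob", "alice", datetime(2015, 8, 11, 10, 30)),
]
snapshots = split_into_snapshots(records, datetime(2015, 8, 3), timedelta(days=7))
```

With these sample records the first two edges fall into snapshot 0 and the third into snapshot 1.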
Suppose $G = \{g_1, g_2, \ldots, g_m\}$ is the series of snapshots taken from the communication network. For each user, the basic information of the user in each snapshot is extracted first. We then focus on three non-textual features: communication traffic, communication time distribution and recipient frequency distribution.
To calculate a user's traffic abnormal value, the modified Z-score method proposed by Iglewicz and Hoaglin, which is based on the median absolute deviation (MAD), is used. The absolute value of the modified Z-score, $|mz_i|$, is taken as the traffic abnormal value.
To calculate a user's communication time distribution abnormal value, the mean of all communication time distributions is used to define the baseline of the communication time distribution, and the Kullback-Leibler divergence is used to compute the abnormal value.
To calculate a user's recipient frequency distribution abnormal value, the frequency of a recipient is defined as k if the recipient appears in k snapshots. As above, a baseline of the recipient frequency distribution is defined, and the Kullback-Leibler divergence is used to compute the abnormal value.
Finally, a transformation maps each abnormal value to a standard value in the interval [0, 1]. A standardized abnormal value can be interpreted as the likelihood of observing an anomaly, and it also makes comparison between the abnormal behaviour of different users much easier.
Brief description of the drawings
Fig. 1 is the basic flowchart of the abnormal behaviour detection method proposed by the present invention.
Embodiment
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawing. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Fig. 1 is the flowchart of the abnormal behaviour detection provided by the present invention. The method may comprise the following steps:
101. Divide the network into snapshots by time interval:
A communication network is a typical time-series network and can be expressed as a series of snapshots. According to a chosen time interval, the communication network is divided into several network snapshots to facilitate the subsequent analysis.
102. Extract user data from the network snapshots:
Once the network snapshots have been obtained, the useful information about each user can be extracted from them. The present invention focuses on three features: communication traffic, communication time distribution and recipient frequency distribution.
103. Construct user baselines from the user data:
After the user data have been extracted, user baselines are constructed from them. A baseline is usually the mean over a number of snapshot samples, and having the baselines makes it straightforward to calculate abnormal values.
104. Calculate abnormal values from the user data and the user baselines:
The present invention selects three features of a user (communication traffic, communication time distribution and recipient frequency distribution) and calculates an abnormal value for each feature as follows:
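As an illustration of step 102, the sketch below extracts the three features for one user from one snapshot. The function name `extract_user_features`, the use of 24 hourly bins for the time distribution and the sample edges are assumptions made for the example, not details fixed by the source.

```python
from collections import Counter
from datetime import datetime

def extract_user_features(snapshot_edges, user):
    """Return (traffic, hourly_distribution, recipient_counts) for one user.

    snapshot_edges: list of (sender, recipient, timestamp) triples
    """
    sent = [(v, t) for u, v, t in snapshot_edges if u == user]
    traffic = len(sent)  # number of messages the user sent in this snapshot
    hourly = [0.0] * 24
    for _, t in sent:
        hourly[t.hour] += 1.0
    if traffic:
        hourly = [count / traffic for count in hourly]  # normalise to sum to 1
    recipients = Counter(v for v, _ in sent)  # messages per recipient
    return traffic, hourly, recipients

# Hypothetical snapshot content.
edges = [
    ("alice", "bob", datetime(2015, 8, 3, 9, 15)),
    ("alice", "carol", datetime(2015, 8, 4, 14, 0)),
    ("bob", "alice", datetime(2015, 8, 5, 10, 30)),
]
traffic, hourly, recipients = extract_user_features(edges, "alice")
```

Only metadata (who, to whom, when) is used, which matches the non-textual nature of the features.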
I. Communication traffic
A communication network is mainly used to transmit information between users, so the traffic of a user in the communication network is an important feature characterising the user's behaviour pattern. Suppose that a user's traffic remains relatively stable over a period of time. Under this assumption, a change in the user's traffic can reflect the occurrence of some event in the real world. We use the modified Z-score to measure abnormality in the user's traffic series $\{n_1, n_2, \ldots, n_m\}$.
Z-scores are commonly used to flag outliers in numerical data. For a given data set $\{x_1, x_2, \ldots, x_n\}$, the Z-score of sample $x_i$ is calculated as:
$$z_i = \frac{x_i - \bar{x}}{s}$$
where $\bar{x}$ is the sample mean and $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$ is the sample standard deviation.
If the absolute value of $z_i$ exceeds 3, the corresponding $x_i$ is flagged as an outlier. This is also known as the three-sigma rule. However, the mean $\bar{x}$ and the sample standard deviation $s$ are not robust, since both are themselves affected by extreme values. As a result, the maximum possible Z-score does not depend on the data values but only on the number of observations. The method is therefore unsuitable for flagging outliers, especially in small data sets.
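The limitation can be checked numerically. The sketch below relies on a standard result that is not stated in the source: with the sample standard deviation, no Z-score in a sample of size n can exceed (n - 1)/sqrt(n). The sample values are made up for illustration.

```python
import math
from statistics import mean, stdev

def z_scores(values):
    """Classical Z-scores using the sample mean and sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]

# Hypothetical small sample with one obvious outlier.
sample = [10, 11, 9, 10, 1000]
zmax = max(abs(z) for z in z_scores(sample))

# Upper bound on |z| for a sample of this size (assumed known result).
bound = (len(sample) - 1) / math.sqrt(len(sample))
```

For five observations the bound is below 3, so even the value 1000 cannot be flagged by the three-sigma rule.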
To address this defect, Iglewicz and Hoaglin improved the Z-score method by using the median absolute deviation (MAD). For the data set given above, the modified Z-score of $x_i$ is calculated as:
$$mz_i = \frac{0.6745\,(x_i - \tilde{x})}{\mathrm{MAD}}$$
where $\tilde{x}$ is the median of the data set and MAD is the median of $|x_i - \tilde{x}|$. The absolute value of the modified Z-score, $|mz_i|$, is taken as the abnormal value of $x_i$. The larger the abnormal value of an observed sample $x_i$, the further the sample deviates from the typical level.
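A minimal sketch of the modified Z-score follows. The constant 0.6745 is the one used in Iglewicz and Hoaglin's formulation; the function name, the zero-MAD guard and the sample traffic counts are assumptions made for the example.

```python
from statistics import median

def modified_z_scores(values):
    """Modified Z-scores based on the median and the median absolute deviation."""
    med = median(values)
    mad = median(abs(x - med) for x in values)
    if mad == 0:
        # The source does not say how to handle this case; raising is one option.
        raise ValueError("MAD is zero; the modified Z-score is undefined")
    return [0.6745 * (x - med) / mad for x in values]

# Hypothetical per-snapshot traffic counts n_1..n_m for one user.
traffic = [12, 15, 11, 14, 13, 60, 12]
scores = [abs(z) for z in modified_z_scores(traffic)]
```

Here the median is 13 and the MAD is 1, so the snapshot with 60 messages receives by far the largest abnormal value while the others stay small.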
II. Communication time distribution
Most users' daily schedules are fairly regular, so a user's activity over a period of time can be regarded as periodic behaviour. We therefore use the communication time distribution as an important indicator for obtaining the user's normal behaviour model.
The user's communication time distribution $T^{(i)} = (T^{(i)}_1, \ldots, T^{(i)}_{24})$ gives, for snapshot $g_i$, the proportion of messages sent in each of the 24 hours of the day. A user's normal behaviour pattern depends heavily on the meaning of the chosen feature and can be defined in several ways. Here the mean of all communication time distributions is used as the baseline of the communication time distribution:
$$T_t = \frac{1}{m}\sum_{i=1}^{m} T^{(i)}_t$$
where $T_t$ is the $t$-th element of the communication time distribution baseline $T$, which satisfies the condition for a discrete probability distribution, $\sum_{t=1}^{24} T_t = 1$.
To obtain the abnormal value of the time distribution, this patent uses the Kullback-Leibler divergence to measure the difference between two discrete probability distributions. The Kullback-Leibler divergence is defined as:
$$D_{KL}(P \,\|\, Q) = \sum_{t} P(t)\,\ln\frac{P(t)}{Q(t)}$$
where P is the baseline of the communication time distribution and Q is an observed communication time distribution. The Kullback-Leibler divergence between two time distributions is 0 if and only if they are identical, and it is always non-negative. This patent therefore uses the Kullback-Leibler divergence to compute the abnormal value of the communication time distribution in a given snapshot.
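The baseline and divergence calculation can be sketched as follows. Two points are assumptions rather than part of the source: the small additive smoothing term `eps`, which avoids division by zero when an observed distribution has empty bins, and the use of four bins instead of 24 to keep the example short.

```python
import math

def baseline(distributions):
    """Element-wise mean of a list of equally long probability distributions."""
    m = len(distributions)
    size = len(distributions[0])
    return [sum(d[t] for d in distributions) / m for t in range(size)]

def kl_divergence(p, q, eps=1e-9):
    """D_KL(P || Q) with additive smoothing (eps is an assumption)."""
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    sp, sq = sum(p), sum(q)  # renormalise after smoothing
    return sum((a / sp) * math.log((a / sp) / (b / sq)) for a, b in zip(p, q))

# Hypothetical per-snapshot time distributions (4 bins for brevity).
dists = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
base = baseline(dists)
scores = [kl_divergence(base, d) for d in dists]
```

The third snapshot, whose activity shifts to different bins, obtains a larger divergence from the baseline than the first two. The size of `eps` strongly influences the scores when bins are empty, so it would need to be chosen with care in practice.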
III. Recipient frequency distribution
Mining recipient information is a very important research approach. Many users are in frequent direct contact with their recipients, and how often a recipient is contacted can reflect the user's social relationships and social status.
However, the recipients a user contacts in each snapshot are not fixed; they change over time. Analysing user behaviour directly from the set of recipients is therefore difficult, so we study the recipient frequency distribution of each snapshot instead.
To distinguish recipients by frequency, the frequency of a recipient is defined as k if the recipient appears in k snapshots. A high frequency means that the recipient is contacted often.
After the recipient frequencies have been determined, the communication records in each snapshot are counted. This yields the recipient frequency distribution of every user in each snapshot, which represents the proportion of the messages in the snapshot that are sent to each recipient. As above, a baseline of the recipient frequency distribution is also defined. The Kullback-Leibler divergence between the baseline and the observed distribution is taken as the abnormal value of the recipient frequency distribution in a given snapshot.
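The sketch below shows one plausible reading of this feature, offered as an assumption: each recipient's frequency k is counted across snapshots, and each snapshot is then summarised by the share of messages going to recipients of frequency 1, 2, ..., m. The source wording could also be read as a distribution over individual recipients. Function names and sample data are hypothetical.

```python
from collections import Counter

def recipient_frequencies(snapshots):
    """snapshots: list of recipient lists, one entry per message sent by the user.
    Returns {recipient: k}, where k is the number of snapshots containing them."""
    freq = Counter()
    for recipients in snapshots:
        freq.update(set(recipients))  # count each recipient once per snapshot
    return dict(freq)

def frequency_distribution(recipients, freq, m):
    """Share of one snapshot's messages sent to recipients of frequency 1..m."""
    dist = [0.0] * m
    for r in recipients:
        dist[freq[r] - 1] += 1.0
    total = len(recipients)
    return [count / total for count in dist] if total else dist

# Hypothetical recipients of one user's messages in three snapshots.
snapshots = [["bob", "bob", "carol"], ["bob", "dave"], ["bob", "carol"]]
freq = recipient_frequencies(snapshots)
dists = [frequency_distribution(s, freq, len(snapshots)) for s in snapshots]
```

The resulting per-snapshot distributions can be averaged into a baseline and compared with it by Kullback-Leibler divergence in the same way as the communication time distributions.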
105. Integrate the abnormal values to obtain standardized abnormal values:
Because the methods described above rely on different data types and calculations, the resulting abnormal values have different magnitudes and ranges, which makes it inconvenient to compare the abnormal behaviour of different users.
For this reason, a transformation is introduced that maps each abnormal value to a standard value in the interval [0, 1]. A standardized abnormal value can be interpreted as the likelihood of observing an anomaly, and it also makes comparison between the abnormal behaviour of different users much easier.
For a given set of abnormal values $\{s_1, s_2, \ldots, s_m\}$, the standardized value $ns_i$ is defined as
$$ns_i = \tanh(\theta \cdot s_i)$$
where $\theta$ is a tuning parameter. Since the abnormal values are non-negative, all standardized values lie in the interval [0, 1]. A value of 0 means that the observation and the baseline are identical. We assume that the median $\tilde{s}$ of the set of abnormal values is mapped to 0.5, that is, $\tanh(\theta \cdot \tilde{s}) = 0.5$, so
$$\theta = \frac{\operatorname{artanh}(0.5)}{\tilde{s}} = \frac{\ln 3}{2\,\tilde{s}}$$
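The standardization step can be sketched as follows, with the tuning parameter calibrated so that the median abnormal value maps to 0.5. The function name, the guard against a non-positive median and the sample values are assumptions made for the example.

```python
import math
from statistics import median

def normalise_scores(scores):
    """Map non-negative abnormal values into [0, 1] with tanh(theta * s)."""
    med = median(scores)
    if med <= 0:
        # theta cannot be calibrated from a zero median; the source does not
        # describe this case, so raising here is an assumption.
        raise ValueError("median must be positive to calibrate theta")
    theta = math.atanh(0.5) / med  # ensures tanh(theta * median) = 0.5
    return [math.tanh(theta * s) for s in scores]

# Hypothetical raw abnormal values for one feature.
raw = [0.0, 0.4, 1.0, 2.5, 30.0]
norm = normalise_scores(raw)
```

Because tanh is monotonic, the ordering of the abnormal values is preserved, while very large values saturate close to 1.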
Claims (6)
1. An abnormal behaviour detection method based on a communication network, characterized by comprising:
for the obtained user communication records, first constructing a series of communication network snapshots according to time sequence; then extracting non-textual features of a user from the user data in the snapshots and obtaining user baselines; then calculating the user's abnormal values from the user baselines; and finally integrating the abnormal values, which are on different scales, to obtain the final standardized abnormal values.
2. The method according to claim 1, characterized in that the non-textual features of a user extracted from the user data in the communication network snapshots comprise three features: communication traffic, communication time distribution and recipient frequency distribution.
3. The method according to claim 2, characterized in that, after the user's communication traffic feature is extracted, a traffic abnormal value is calculated by using the modified Z-score to measure abnormality in the user's traffic, a larger abnormal value indicating a greater deviation from the average.
4. The method according to claim 2, characterized in that, after the user's communication time distribution feature is extracted, the mean of all communication time distributions is used to define the baseline of the communication time distribution, and the Kullback-Leibler divergence is then used to calculate the communication time distribution abnormal value.
5. The method according to claim 2, characterized in that, after the user's recipient frequency feature is extracted, a recipient frequency distribution baseline is obtained, and the Kullback-Leibler divergence is then used to calculate the abnormal value of the recipient frequency distribution in a snapshot.
6. The method according to any one of claims 1 to 5, characterized in that, after the traffic abnormal value, the communication time distribution abnormal value and the recipient frequency distribution abnormal value are calculated, the abnormal values are standardized by a formula so that they lie in the same interval and are convenient to compare and analyse.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510475895.9A CN105187383A (en) | 2015-08-06 | 2015-08-06 | Abnormal behaviour detection method based on communication network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510475895.9A CN105187383A (en) | 2015-08-06 | 2015-08-06 | Abnormal behaviour detection method based on communication network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105187383A true CN105187383A (en) | 2015-12-23 |
Family
ID=54909227
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510475895.9A Pending CN105187383A (en) | 2015-08-06 | 2015-08-06 | Abnormal behaviour detection method based on communication network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105187383A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106452955A (en) * | 2016-09-29 | 2017-02-22 | 北京赛博兴安科技有限公司 | Abnormal network connection detection method and system |
CN107481090A (en) * | 2017-07-06 | 2017-12-15 | 众安信息技术服务有限公司 | A kind of user's anomaly detection method, device and system |
CN109035768A (en) * | 2018-07-25 | 2018-12-18 | 北京交通大学 | A kind of taxi detours the recognition methods of behavior |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8561184B1 (en) * | 2010-02-04 | 2013-10-15 | Adometry, Inc. | System, method and computer program product for comprehensive collusion detection and network traffic quality prediction |
CN103744994A (en) * | 2014-01-22 | 2014-04-23 | 中国科学院信息工程研究所 | Communication-network-oriented user behavior pattern mining method and system |
-
2015
- 2015-08-06 CN CN201510475895.9A patent/CN105187383A/en active Pending
Non-Patent Citations (3)
Title |
---|
吴孙丹: ""基于聚类的入侵检测方法的研究"", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
常淑影: ""基于流监测的网络流量异常检测算法研究与实现"", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
李全刚,时金桥,秦志光,柳厅文: ""面向邮件网络事件检测的用户行为模式挖掘"", 《计算机学报》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106452955A (en) * | 2016-09-29 | 2017-02-22 | 北京赛博兴安科技有限公司 | Abnormal network connection detection method and system |
CN106452955B (en) * | 2016-09-29 | 2019-03-26 | 北京赛博兴安科技有限公司 | A kind of detection method and system of abnormal network connection |
CN107481090A (en) * | 2017-07-06 | 2017-12-15 | 众安信息技术服务有限公司 | A kind of user's anomaly detection method, device and system |
WO2019007306A1 (en) * | 2017-07-06 | 2019-01-10 | 众安信息技术服务有限公司 | Method, device and system for detecting abnormal behavior of user |
CN109035768A (en) * | 2018-07-25 | 2018-12-18 | 北京交通大学 | A kind of taxi detours the recognition methods of behavior |
CN109035768B (en) * | 2018-07-25 | 2020-11-06 | 北京交通大学 | Method for identifying taxi detour behavior |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021174751A1 (en) | Method, apparatus and device for locating pollution source on basis of big data, and storage medium | |
Chacon-Hurtado et al. | Rainfall and streamflow sensor network design: a review of applications, classification, and a proposed framework | |
CN102629904B (en) | Detection and determination method of network navy | |
CN103795613B (en) | Method for predicting friend relationships in online social network | |
CN108765004A (en) | A method of user's electricity stealing is identified based on data mining | |
CN103995837A (en) | Personalized tourist track planning method based on group footprints | |
CN104463603A (en) | Credit assessment method and system | |
CN105303469A (en) | Method and system for line loss abnormal reason data mining and analysis | |
CN106332052B (en) | Micro-area public security early warning method based on mobile communication terminal | |
CN104156403A (en) | Clustering-based big data normal-mode extracting method and system | |
CN103744994A (en) | Communication-network-oriented user behavior pattern mining method and system | |
CN105187383A (en) | Abnormal behaviour detection method based on communication network | |
CN105893352A (en) | Air quality early-warning and monitoring analysis system based on big data of social network | |
Jin et al. | Spatiotemporal distribution analysis of extreme precipitation in the Huaihe River Basin based on continuity | |
Yang et al. | Anomaly detection on collective moving patterns: A hidden markov model based solution | |
Fu et al. | Collaborative multiple change detection methods for monitoring the spatio-temporal dynamics of mangroves in Beibu Gulf, China | |
Liu et al. | Quantifying COVID-19 recovery process from a human mobility perspective: An intra-city study in Wuhan | |
CN111460796B (en) | Accidental sensitive word discovery method based on word network | |
Qi et al. | Geo-tagging quality-of-experience self-reporting on twitter to mobile network outage events | |
Mount et al. | The need for operational reasoning in data‐driven rating curve prediction of suspended sediment | |
CN107843779A (en) | A kind of Power System Fault Record classifying and analyzing method and system based on fuzzy clustering | |
CN118433649A (en) | Resident population identification method based on mobile phone signaling data | |
Chung et al. | Information extraction methodology by web scraping for smart cities | |
CN117633249A (en) | Basic variable construction method and device for SDGs space type monitoring index | |
CN109635008A (en) | A kind of equipment fault detection method based on machine learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20151223 |