
CN108509979B - Anomaly detection method, server and computer readable storage medium - Google Patents

Anomaly detection method, server and computer readable storage medium

Info

Publication number
CN108509979B
Authority
CN
China
Prior art keywords
behavior
target
dependency
user
target user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810167498.9A
Other languages
Chinese (zh)
Other versions
CN108509979A (en)
Inventor
彭行雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nubia Technology Co Ltd
Original Assignee
Nubia Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nubia Technology Co Ltd
Priority to CN201810167498.9A
Publication of CN108509979A
Application granted
Publication of CN108509979B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides an anomaly detection method comprising the following steps: acquiring a behavior time sequence of a target user in a current time window; matching a target sub-behavior from the maximum frequent pattern of the previous time window according to the behavior time sequence of the target user; acquiring a target dependency coefficient of the target sub-behavior; calculating the dependency of the target user under a preset critical condition according to the target dependency coefficient, the preset critical condition being the maximum allowed number of behaviors the target user may perform in the time sequence; judging whether the dependency is greater than a preset threshold; and if so, determining that the behavior of the target user is abnormal. The invention also provides a server and a computer-readable storage medium. With this scheme, no global calculation is needed and an abnormal behavior time sequence of the target user can be detected in a shorter time, which greatly reduces the calculation amount and time consumption, improves real-time performance and the reliability of the big data platform, and saves resource cost.

Description

Anomaly detection method, server and computer readable storage medium
Technical Field
The present invention relates to the field of big data, and more particularly, to an anomaly detection method, a server, and a computer-readable storage medium.
Background
A big data platform has strong computing power, and its interface resources can be widely integrated with other systems. Even so, application software running on the platform can still occupy or even exhaust its computing resources when an administrator cannot quickly handle errors caused by misuse. As computing cost grows with data volume, big data anomaly detection becomes an indispensable function of a big data platform service, and real-time anomaly detection is one of the important means of handling errors quickly.
At present, real-time anomaly detection of big data can use a traffic anomaly detection method based on stream mining. To save the cost of expert experience in the anomaly detection process, such a method mines implicit and potentially valuable patterns, information, associations and the like from network traffic in order to detect anomalies. Its main characteristics are that it can detect both known and unknown abnormal traffic, can locate the anomaly, and can be used for real-time anomaly detection.
Traffic anomaly detection based on stream mining includes the hidden semi-Markov model, the frequent pattern detection method, the data packet segmentation detection method, and others. The hidden semi-Markov model builds a chain of Web pages from the browsing behavior of Web users to form an access behavior profile; if the behavior of a user deviates from the normal behavior profile by more than a certain threshold, the behavior is considered abnormal. Training the Markov chain, however, is time-consuming. The frequent pattern detection method updates the behavior model with frequent patterns and analyzes actual data through a sliding window to detect intrusion behavior in real time; however, constructing and adjusting the prefix tree requires large overhead, so the global frequent pattern set cannot be updated and maintained quickly. The data packet segmentation detection method uses adaptive boosting to strengthen the learning of multiple detectors and thereby enhance the detection effect. Because of their high detection precision, traffic anomaly detection methods based on stream mining are good at finding hidden anomalies and play a vital role in anomaly detection.
However, as data becomes massive and time-sensitive, performing real-time anomaly detection on big data with current traffic anomaly detection methods based on stream mining involves complex algorithm models and a large calculation amount, which seriously consumes the computing resources of the big data platform, takes a long time, and increases cost.
Disclosure of Invention
The invention mainly aims to provide an anomaly detection method, a server and a computer readable storage medium, and aims to solve the problems that in the prior art, when real-time anomaly detection of big data is carried out, an algorithm model is complex, the calculated amount is large, the calculation resources of a big data platform are seriously consumed, the time consumption is long, and the cost is increased.
In order to solve the above technical problem, the present invention provides an abnormality detection method, including the steps of:
acquiring a behavior time sequence of a target user in a current time window;
matching a target sub-behavior from the maximum frequent pattern of the previous time window according to the behavior time sequence of the target user;
acquiring a target dependency coefficient of the target sub-behavior;
calculating the dependency of the target user under a preset critical condition according to the target dependency coefficient; the preset critical condition is the maximum allowed number of behaviors the target user may perform in the time sequence;
judging whether the dependency is greater than a preset threshold value;
and if so, determining that the behavior of the target user is abnormal.
Optionally, before obtaining the behavior time sequence of the target user in the current time window, the method further includes the following steps:
acquiring behavior time sequences of all users in a previous time window;
mining all frequent patterns in the previous time window with the PrefixSpan pattern mining algorithm according to the behavior time sequences of all users in the previous time window;
and obtaining the maximum frequent pattern according to all frequent patterns.
Optionally, after obtaining the maximum frequent pattern according to all the frequent patterns, the method further includes the following steps:
according to the behavior time sequences of all users in the previous time window, combining the user behavior penalty factor with a Zipf distribution curve and fitting by the least squares method to obtain a dependency coefficient corresponding to each sub-behavior in the maximum frequent pattern;
the user behavior penalty factor is used for reducing the weight of the active users.
Optionally, the user behavior penalty factor is 1/ln(1 + Act_u), where Act_u is the activity level of the user.
Optionally, calculating the dependency of the target user under the preset critical condition according to the target dependency coefficient includes:
according to the formula p = k^ω / ln(1 + Act_u), calculating the dependency of the target user;
p is the dependency, k is the maximum allowed number of behaviors performed by the target user in the time sequence, and ω is the dependency coefficient.
Optionally, matching the target sub-behavior from the maximum frequent pattern of the previous time window according to the behavior time sequence of the target user includes:
calculating the sequence similarity between the behavior time sequence of the target user and every behavior time sequence in the maximum frequent pattern of the previous time window with a DNA sequence alignment algorithm;
and taking the behavior time sequence with the highest sequence similarity among all behavior time sequences of the maximum frequent pattern as the target sub-behavior.
Optionally, before calculating the dependency of the target user under the preset critical condition according to the target dependency coefficient, the method further includes the following steps:
judging whether the calculation of the dependency coefficient of the previous time window is overtime or not;
if so, acquiring a dependency coefficient of the target sub-behavior in a preset time window, and taking the dependency coefficient as a target dependency coefficient;
and if not, determining that the obtained target dependency coefficient is the dependency coefficient of the target sub-behavior in the previous time window.
Optionally, after determining whether the dependency is greater than the preset threshold, the method further includes the following steps:
if not, determining that the behavior of the target user is normal, and performing offline backup on the behavior time sequence of the target user in the current time window.
Further, the invention provides a server, which comprises a processor, a memory and a communication bus;
the communication bus is used for realizing connection communication between the processor and the memory;
the processor is configured to execute one or more programs stored in the memory to implement the steps of the anomaly detection method as described above.
Further, the present invention provides a computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps of an anomaly detection method as above.
Advantageous effects
The invention provides an anomaly detection method, a server and a computer-readable storage medium. The anomaly detection method comprises the following steps: acquiring a behavior time sequence of a target user in a current time window; matching a target sub-behavior from the maximum frequent pattern of the previous time window according to the behavior time sequence of the target user; acquiring a target dependency coefficient of the target sub-behavior; calculating the dependency of the target user under a preset critical condition according to the target dependency coefficient, the preset critical condition being the maximum allowed number of behaviors the target user may perform in the time sequence; judging whether the dependency is greater than a preset threshold; and if so, determining that the behavior of the target user is abnormal. With this scheme, no global calculation is needed and an abnormal behavior time sequence of the target user can be detected in a short time, which greatly reduces the calculation amount and time consumption, improves real-time performance and the reliability of the big data platform, and saves resource cost.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a diagram illustrating an alternative server hardware architecture for implementing various embodiments of the present invention;
fig. 2 is a basic flowchart of an anomaly detection method according to a first embodiment of the present invention;
FIG. 3 is a basic flowchart of an anomaly detection method according to a second embodiment of the present invention;
fig. 4 is a schematic diagram of a server according to a third embodiment of the present invention.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
FIG. 1 is a structural diagram of an optional server for implementing various embodiments of the present invention. The server includes at least: an input/output (IO) bus 11, a processor 12, a storage 13, a memory 14, and a communication device 15, wherein:
the input/output (IO) bus 11 is connected to the other components of the server to which it belongs (the processor 12, the storage 13, the memory 14, and the communication device 15) and provides a transmission line for them.
The processor 12 typically controls the overall operation of the server to which it belongs. For example, processor 12 performs computations, validation, etc. The processor 12 may be a Central Processing Unit (CPU), among others.
The communication device 15, typically comprising one or more components, allows radio communication between a server to which it belongs and a wireless communication system or network.
The storage 13 stores processor-readable, processor-executable software code containing instructions for controlling the processor 12 to perform the functions described herein (i.e., software execution functions).
Based on the above server hardware structure, various embodiments of the method of the present invention are proposed.
First embodiment
In order to solve the problems in the prior art that, when performing real-time anomaly detection on big data, an algorithm model is complex, a calculation amount is large, calculation resources of a big data platform are seriously consumed, time consumption is long, and cost is increased, the present embodiment provides an anomaly detection method, referring to fig. 2, where fig. 2 is a basic flow chart of the anomaly detection method provided by the present embodiment, and the anomaly detection method includes the following steps:
S201: acquiring a behavior time sequence of a target user in a current time window;
In an internet application (e.g., a portal or an online mall), a user is a human or a machine program that requests services from the application. A behavior of the user is a triple (u, t, i): user u (user) requests a certain service i (item) from the application at a certain time t (time). For example, at a certain time t, user u requests the service of a mall application for accessing science and technology books.
The behaviors of user u within a time window, arranged in the chronological order in which they occur, form the behavior time sequence of the user, Y_u = {Y_u1, Y_u2, …, Y_uk, …, Y_um}, where Y_uk denotes the k-th behavior triple of user u in chronological order within the time window.
The interval of the time window can be set according to actual requirements, and can be set to 5 minutes, 10 minutes, 20 minutes, and the like.
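The triple and the per-user behavior time sequence described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Behavior` and `behavior_sequences`, the half-open window [start, end), the sample dates, and the service names are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple

class Behavior(NamedTuple):
    user: str        # u: who issued the request
    time: datetime   # t: when the request was made
    item: str        # i: which service was requested

def behavior_sequences(behaviors, window_start, window_end):
    """Group behaviors inside [window_start, window_end) by user, ordered by time."""
    per_user = defaultdict(list)
    for b in behaviors:
        if window_start <= b.time < window_end:
            per_user[b.user].append(b)
    return {u: sorted(seq, key=lambda b: b.time) for u, seq in per_user.items()}

# Hypothetical events; the last one falls outside the 06:00-06:10 window.
events = [
    Behavior("u1", datetime(2018, 1, 1, 6, 3), "books/science"),
    Behavior("u1", datetime(2018, 1, 1, 6, 1), "home"),
    Behavior("u2", datetime(2018, 1, 1, 6, 4), "cart"),
    Behavior("u1", datetime(2018, 1, 1, 6, 12), "checkout"),
]
Y = behavior_sequences(events,
                       datetime(2018, 1, 1, 6, 0),
                       datetime(2018, 1, 1, 6, 10))
```

Here `Y["u1"]` plays the role of Y_u for user u1, with each element standing for one Y_uk.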
Optionally, before acquiring the behavior time sequence of the target user in the current time window in S201, the method further includes the following three steps:
The first step: acquiring the behavior time sequences of all users in the previous time window;
The second step: according to the behavior time sequences of all users in the previous time window, mining all frequent patterns FS (frequent patterns) in the previous time window with the PrefixSpan pattern mining algorithm, where the set FS = {s_1, s_2, …, s_n};
The third step: obtaining the maximum frequent patterns MaxFS from all the frequent patterns FS, where the set MaxFS = {ms_1, ms_2, …, ms_z}.
Here, the PrefixSpan pattern mining algorithm is a sequential pattern mining algorithm;
each s in the FS set and each ms in the MaxFS set is a user behavior time sequence composed of one or more behavior objects and used as a user label.
Assume that the previous time window is W_pre and the current time window is W_cur, with a window interval of 10 minutes and a window sliding distance of 1 minute. If W_pre = 06:00-06:10, then W_cur = 06:01-06:11.
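The relation between the previous and current windows can be illustrated with a short sketch. The function name `sliding_windows` and the calendar date are hypothetical; only the 10-minute interval and 1-minute sliding distance come from the text.

```python
from datetime import datetime, timedelta

def sliding_windows(start, interval, slide, count):
    """Yield (window_start, window_end) pairs, each shifted by `slide`."""
    for n in range(count):
        begin = start + n * slide
        yield begin, begin + interval

# Two consecutive windows: W_pre = 06:00-06:10, W_cur = 06:01-06:11.
w_pre, w_cur = sliding_windows(datetime(2018, 1, 1, 6, 0),
                               interval=timedelta(minutes=10),
                               slide=timedelta(minutes=1),
                               count=2)
```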
In a big data system, the volume of behavior time sequences of all users in the previous time window is huge, and so is the volume of frequent patterns mined from it. Screening the frequent patterns according to preset conditions to obtain the maximum frequent patterns reduces the time needed later to match the target sub-behavior against the behavior time sequence of the target user in the current time window, which improves efficiency.
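The mining and screening steps can be sketched as below. This is a simplified PrefixSpan for sequences of single items, written for illustration only. The text does not state the preset screening conditions, so the sketch assumes the usual definition of a maximal pattern: a frequent pattern that is not a proper subsequence of any other frequent pattern. The function names, the sample sequences, and `min_support=2` are assumptions.

```python
from collections import defaultdict

def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for all frequent sequential patterns."""
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, start offset) pairs, one per sequence.
        support = defaultdict(set)
        for sid, start in projected:
            for item in set(sequences[sid][start:]):
                support[item].add(sid)
        for item, sids in support.items():
            if len(sids) < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = len(sids)
            # Project every suffix past the first occurrence of `item`.
            next_projected = []
            for sid, start in projected:
                try:
                    pos = sequences[sid].index(item, start)
                except ValueError:
                    continue
                next_projected.append((sid, pos + 1))
            mine(pattern, next_projected)

    mine((), [(sid, 0) for sid in range(len(sequences))])
    return results

def is_subsequence(small, big):
    """True if `small` appears in `big` in order (not necessarily contiguously)."""
    it = iter(big)
    return all(item in it for item in small)

def maximal_patterns(patterns):
    """Keep patterns that are not a proper subsequence of another pattern."""
    pats = list(patterns)
    return [p for p in pats
            if not any(p != q and is_subsequence(p, q) for q in pats)]

# Hypothetical behavior sequences, one letter per behavior.
sequences = ["ABCE", "ABCDE", "ACE", "BD"]
fs = prefixspan(sequences, min_support=2)
max_fs = maximal_patterns(fs)
```

With these sample sequences the many frequent patterns collapse to two maximal ones, which is the reduction in matching work that the paragraph above describes.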
Optionally, after obtaining the maximum frequent pattern according to all the frequent patterns, the method further includes the following steps:
according to the behavior time sequences of all users in the previous time window, combining the user behavior penalty factor with a Zipf distribution curve and fitting by the least squares method to obtain a dependency coefficient corresponding to each sub-behavior in the maximum frequent pattern;
the user behavior penalty factor is used for reducing the weight of the active users.
The LTUD (Locally Tagged User Dependent dynamics) can be calculated as follows. For all user data within W_pre, according to the Zipf distribution curve model, assume that the request sequence of user u within W_pre is Y = {Y_1, Y_2, …, Y_k}, where k is the number of consecutive behaviors of the user. Then the dependency of the user on the application when performing the k-th behavior is p = k^ω / ln(1 + Act_u). The dependency coefficient ω, i.e., the LTUD, is obtained by least squares fitting. There is one LTUD per label class: if the maximum frequent patterns MFS contain z sequences, the LTUD consists of z values of ω.
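One way to picture the least squares fit is the sketch below. It rests on several assumptions that the text does not settle: that observed dependency values p are available for each sample, that Act_u is a positive number, and that the fit is done in log space, where the model becomes the line ln(p · ln(1 + Act_u)) = ω · ln k through the origin. The function name `fit_omega` and the synthetic samples are hypothetical, and the fit would be repeated once per label class.

```python
import math

def fit_omega(samples):
    """Fit omega in p = k**omega / ln(1 + act) by least squares in log space.

    samples: iterable of (k, act, p) with k >= 1, act > 0 and p > 0.
    """
    sxy = sxx = 0.0
    for k, act, p in samples:
        x = math.log(k)
        y = math.log(p * math.log(1.0 + act))
        sxy += x * y
        sxx += x * x
    if sxx == 0.0:
        raise ValueError("need at least one sample with k > 1")
    return sxy / sxx   # slope of the best-fit line through the origin

# Synthetic, noise-free samples generated from a known coefficient.
true_omega = 0.4
samples = [(k, act, k ** true_omega / math.log(1 + act))
           for k in range(1, 8) for act in (5, 20, 80)]
omega = fit_omega(samples)
```

On real data the recovered ω would only approximate the underlying trend; a nonlinear least squares fit in the original space is an equally plausible reading of the text.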
There is a classical law in statistics called the "80/20 rule": 80% of events are handled by 20% of people. Anomaly detection is similar: 80% of traffic is generated by 20% of network operators. Behaviors that violate normal behavior, such as automated credential stuffing, web crawlers, or application errors, can generate a large amount of traffic, and as the proportion of noise in the data grows, the small amount of genuine and valid user behavior data may be ignored in the statistics. To prevent active users from dominating the data so that the system detects inactive users poorly, the behavior weight of active users needs to be changed. Specifically, the weight of active users is reduced (whether the activity is abnormal traffic or normal user traffic) by adding a penalty factor to the behavior of active users.
Active users are determined as follows: users are sorted in descending order of their number of behaviors; for example, if there are 10,000 users, the top 20% are taken as active users.
Optionally, the user behavior penalty factor is 1/ln(1 + Act_u), where Act_u is the activity level of the user and ln is the natural logarithm.
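The ranking of active users and the penalty factor can be sketched as follows. The text does not define how Act_u is measured, so this example assumes it is the user's behavior count in the window; the function names and the sample counts are hypothetical.

```python
import math

def active_users(behavior_counts, top_fraction=0.2):
    """Return the users in the top `top_fraction` by number of behaviors."""
    ranked = sorted(behavior_counts, key=behavior_counts.get, reverse=True)
    cutoff = int(len(ranked) * top_fraction)
    return set(ranked[:cutoff])

def penalty_factor(act_u):
    """1 / ln(1 + Act_u); requires act_u > 0 and shrinks as activity grows."""
    return 1.0 / math.log(1.0 + act_u)

# Hypothetical behavior counts for five users in one window.
counts = {"u1": 500, "u2": 12, "u3": 9, "u4": 7, "u5": 3}
heavy = active_users(counts)
weights = {u: penalty_factor(c) for u, c in counts.items()}
```

Because the factor decreases monotonically with activity, the most active user ends up with the smallest weight, which is the weight reduction the text asks for.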
S202: matching a target sub-behavior from the maximum frequent pattern of the previous time window according to the behavior time sequence of the target user;
Optionally, in step S202, matching the target sub-behavior from the maximum frequent pattern of the previous time window according to the behavior time sequence of the target user includes:
calculating the sequence similarity between the behavior time sequence of the target user and every behavior time sequence in the maximum frequent pattern of the previous time window with a DNA sequence alignment algorithm (Smith-Waterman, SW);
and taking the behavior time sequence with the highest sequence similarity among all behavior time sequences of the maximum frequent pattern as the target sub-behavior.
That is, the sequence similarity between the behavior time sequence of the target user and each ms in MaxFS = {ms_1, ms_2, …, ms_z} of the previous time window is calculated, and the ms with the highest sequence similarity is taken as the target sub-behavior. The behavior time sequence of the target user and that ms are regarded as carrying the same label, and a classification label that can be associated with the ms is attached to the target user ID.
Optionally, a sub-behavior is considered valid only if its sequence similarity is above a threshold; if all the calculated sequence similarities are below the threshold, all are regarded as invalid sub-behaviors.
The threshold of the sequence similarity can be set according to practical situations, for example, 70%, 75%, and the like.
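The matching step can be sketched with a plain Smith-Waterman local alignment score. The scoring values (match +2, mismatch -1, gap -1), the choice to rank patterns by raw score, and the normalization of the best score by the pattern's maximum possible score for the threshold check are all assumptions; the text only names the algorithm and the percentage threshold. The sequences reuse the letter notation of the second embodiment.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

def match_sub_behavior(user_seq, max_fs, threshold=0.7, match=2):
    """Pick the pattern with the highest score; None if below the threshold."""
    best = max(max_fs, key=lambda ms: smith_waterman(ms, user_seq, match=match))
    similarity = smith_waterman(best, user_seq, match=match) / (match * len(best))
    return (best, similarity) if similarity >= threshold else (None, similarity)

target, sim = match_sub_behavior("ABCDEFCEDF", ["ABCE", "CDE", "EF", "DF"])
```

Ranking by raw score favors longer patterns that share more elements in order with the user's sequence; a different normalization could change which pattern wins.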
S203: acquiring a target dependency coefficient of the target sub-behavior;
S204: calculating the dependency of the target user under a preset critical condition according to the target dependency coefficient; the preset critical condition is the maximum allowed number of behaviors the target user may perform in the time sequence;
optionally, before calculating the dependency of the target user under the preset critical condition according to the target dependency coefficient in S204, the method further includes the following steps:
judging whether the calculation of the dependency coefficient of the previous time window is overtime or not;
if so, acquiring a dependency coefficient of the target sub-behavior in a preset time window, and taking the dependency coefficient as a target dependency coefficient;
if not, determining that the target dependency coefficient obtained in step S203 is the dependency coefficient of the target sub-behavior in the previous time window.
The preset time window contains at least one time window: it may be a single time window or several consecutive time windows. For example, the preset time window may be the time window before the previous time window, the whole of the previous day, or the previous three days;
for example, if the calculation for the previous time window 06:00-06:10 has timed out and the current time has reached 06:10:06, the LTUD calculated from the window 05:50-06:00, or the LTUD calculated from the data of the whole previous day, is used.
In different hardware environments, differences in computing performance or network delay may cause the calculation to time out (for example, to take longer than one time window interval plus 5 seconds). After a timeout, an LTUD obtained from recent historical data (for example, the previous day or an earlier time window) is used; otherwise, the LTUD of the previous time window is used directly.
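The fallback rule can be sketched as a small selection function. The function name, the dictionary-based coefficient stores, and the sample coefficient values are hypothetical; only the 5-second grace period and the 06:10:06 example come from the text.

```python
from datetime import datetime, timedelta

def select_coefficient(label, now, window_end, current, history,
                       grace=timedelta(seconds=5)):
    """Return (omega, used_history) for one label class.

    current: {label: omega} fitted on the previous window, or None if not ready.
    history: {label: omega} fitted on a preset earlier window (assumed store).
    """
    if current is not None and label in current:
        return current[label], False      # previous window finished in time
    if now > window_end + grace:
        return history[label], True       # timed out: fall back to history
    raise LookupError("coefficient not ready and deadline not yet reached")

window_end = datetime(2018, 1, 1, 6, 10)   # end of the 06:00-06:10 window
now = datetime(2018, 1, 1, 6, 10, 6)       # 6 seconds later
history = {"ABCE": 0.18}
late = select_coefficient("ABCE", now, window_end, None, history)
on_time = select_coefficient("ABCE", now, window_end, {"ABCE": 0.2}, history)
```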
Optionally, in step S204, calculating the dependency of the target user under the preset critical condition according to the target dependency coefficient includes:
according to the formula p = k^ω / ln(1 + Act_u), calculating the dependency of the target user;
p is the dependency, k is the maximum allowed number of behaviors performed by the target user in the time sequence, and ω is the dependency coefficient.
Here k means that once the target user has performed the k-th behavior in the time sequence, no further behavior is allowed. For example, if k is 5 but the target user is found to have performed 7 behaviors, the target user is treated as abnormal.
S205: judging whether the dependency is greater than a preset threshold value; if yes, entering S206; if not, entering S207;
the preset threshold of the dependency may be set according to actual conditions, for example, 0.3, 0.4, and the like.
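The dependency calculation and the threshold check of S204 and S205 can be sketched as follows. The values ω = 0.2 and Act_u = 100 are made up for illustration, and treating Act_u as a plain positive number is an assumption.

```python
import math

def dependency(k, omega, act_u):
    """p = k**omega / ln(1 + Act_u)."""
    return k ** omega / math.log(1.0 + act_u)

def is_abnormal(k, omega, act_u, threshold=0.3):
    """Abnormal when the dependency exceeds the preset threshold."""
    return dependency(k, omega, act_u) > threshold

# Hypothetical coefficient and activity level for one target user.
omega, act_u = 0.2, 100
five_behaviors = is_abnormal(5, omega, act_u)
seven_behaviors = is_abnormal(7, omega, act_u)
```

With these sample values the dependency stays just under 0.3 at five behaviors and exceeds it at seven, which mirrors the k = 5 example in the text.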
S206: determining that the behavior of the target user is abnormal.
After the behavior of the target user is determined to be abnormal, exception handling is performed for the target user.
S207: determining that the behavior of the target user is normal.
After the behavior of the target user is determined to be normal, the behavior time sequence of the target user in the current time window is backed up offline, so that the LTUD can be calculated from historical data if a calculation times out.
By implementing this embodiment, stream-mining anomaly detection needs only the two most recent time windows and no global calculation, so an abnormal behavior time sequence of the target user can be detected in a shorter time and the calculation amount and time consumption are greatly reduced. On the basis of dynamic user dependency, this improves the real-time performance of big data anomaly detection and the reliability of the big data platform, and saves resource cost.
The model provided by this embodiment is simple, easy to understand, and highly maintainable. Compared with complex nonlinear stream mining algorithms, it saves a large amount of computing resources and cost, detects abnormal user behavior time sequences in a short time, and improves the reliability of the big data platform. It offers a feasible solution to pressing problems of current real-time anomaly detection, such as large calculation amount and long time consumption, and provides a reference for the industry.
Second embodiment
The idea of this embodiment is as follows. On the traffic data containing user information within a certain time window, the behavior traffic of each user is captured and classified by label, and the LTUD is calculated and taken as a reference. In the next time window, the dependency of the user on the application is calculated from the incoming user traffic and the LTUD; when the dependency is greater than a preset threshold and the user behavior time sequence has not converged, the user traffic is determined to be abnormal. Stream-mining anomaly detection then needs only the two most recent time windows and no global calculation, which greatly reduces the calculation amount and time consumption.
Specifically, first, the user behavior data of the previous sliding time window is selected, and the maximum frequent patterns are used to determine the general behavior pattern of users in each label class; second, the user dependency coefficient is calculated in the first (previous) sliding time window by combining the user behavior penalty factor with a Zipf distribution curve and fitting by the least squares method; then, when user traffic enters the current sliding window, the dependency of the user on the application is calculated from the dependency coefficient of the previous time window; finally, the dependency is compared with a theoretical threshold, and real-time anomalies are detected quickly according to whether the user behavior time sequence has converged.
This embodiment provides a specific anomaly detection method. Referring to fig. 3, fig. 3 is a flowchart of the anomaly detection method provided in this embodiment:
S301: labeling the user;
The big data system captures all user behaviors in the time window 06:00-06:10. If the behavior sequence of a certain user u is Y_u, then the behavior sequences of all users constitute Y. Y resembles a matrix whose rows are not of equal length, with each row representing the behavior sequence of one user. For example, Y_23 in the matrix below indicates that the user numbered 2 accessed the service numbered 3, and that this is chronologically that user's second behavior.
Y_11 Y_12 Y_13
Y_21 Y_23
Y_31 Y_13 Y_14
The corresponding real data looks as follows:
[Table images BDA0001584879970000101 and BDA0001584879970000111 not reproduced]
Here, for convenience of description, each access behavior is replaced by a letter; for example, an access behavior of the user numbered 1 is written as a letter such as A, where A denotes one behavior of user 1.
[Table image BDA0001584879970000112 not reproduced]
The above table shows that, through the "greatest common denominator" of behaviors, the behavior of a user can always be matched to the MFS sequence most similar to it.
For example, take Y_1: [ABCDEFCEDF]. The candidate MFS sequences are {ABCE, CDE, EF, DF}. The sequence similarity between each sequence ms in the MFS and Y_1 is computed with the DNA sequence alignment algorithm; the comparison shows that ms = ABCE has the highest similarity to Y_1 (the principle being that it shares the largest number of elements in the same order). Y_1 can therefore be regarded as belonging to the same class as ms = ABCE, and user 1 can then be marked with a classification label that can be associated with that ms.
S302: behavior weight change;
In order to prevent active users from dominating the data so that the system detects inactive users poorly, a penalty factor 1/ln(1 + Act_u) is added to the behavior of active users, where Act_u is the activity level of user u.
S303: calculating a dependency coefficient;
The dependency coefficient LTUD is calculated by substituting the data of the matrix Y into the dependency formula p = k^ω / ln(1 + Act_u) and obtaining the dependency coefficient ω, i.e., the LTUD, by least squares fitting. There is one LTUD per label class: if the maximum frequent patterns MFS contain z sequences, the LTUD consists of z values of ω.
S304: judging whether the calculation is overtime; if not, the process goes to S305; if yes, entering S306;
S305: acquiring the real-time dependency coefficient, and entering S307;
S306: acquiring the historical dependency coefficient, and entering S307;
If the calculation for the 06:00-06:10 window has timed out and the current time has reached 06:10:06, the LTUD calculated from the data of the whole previous day, or the LTUD calculated from the data of 05:50-06:00 on the same day, is used.
S307: calculating the user dependence;
s308: judging whether the dependency exceeds a preset threshold, if so, entering S309; if not, the process goes to S310;
s309: performing exception handling, and entering S311;
s310: after the detection, the offline backup is performed, and the process proceeds to S311.
S311: and (6) ending.
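The branching in steps S304 to S310 can be sketched in Python as below. This is only an illustration of the control flow: the function names, the dictionary layout (MFS sequence mapped to its ω), and the returned step labels are hypothetical and not taken from the text.

```python
from typing import Mapping, Optional


def select_ltud(realtime: Optional[Mapping[str, float]],
                history: Mapping[str, float],
                timed_out: bool) -> Mapping[str, float]:
    """S304-S306: use the real-time coefficients unless their calculation timed out."""
    if timed_out or realtime is None:
        return history   # S306: fall back to historical coefficients
    return realtime      # S305: use coefficients from the latest window


def next_step(dependency: float, threshold: float) -> str:
    """S308-S310: pick the branch taken after the dependency is calculated."""
    if dependency > threshold:
        return "exception_handling"  # S309
    return "offline_backup"          # S310


# Hypothetical coefficient tables keyed by MFS sequence.
realtime_ltud = {"ABCE": -0.2}
history_ltud = {"ABCE": -0.3}
print(select_ltud(realtime_ltud, history_ltud, timed_out=True))
print(next_step(dependency=0.4, threshold=0.3))
```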
The access behavior sequence of user 1 captured in the time window 06:01-06:11 is Y1. Its best-matching ms is found from the maximum frequent pattern, and the corresponding ω is then looked up in the LTUD. In the formula p = k^ω / ln(1 + Act_u), ω and Act_u are known, so if the dependency threshold p is set to 0.3, the value of k can be solved for. Here k is the maximum allowable number of behaviors: once the user reaches the k-th behavior in the time sequence, the behavior should not continue. For example, if k = 5 but Y1 contains 7 behaviors, Y1 is handled as an exception.
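The calculation in this example can be sketched as follows, assuming the dependency formula has the power-law form p = k^ω / ln(1 + Act_u), so that k = (p · ln(1 + Act_u))^(1/ω). The values ω = -0.2 and Act_u = 10 are hypothetical, chosen only so that the result reproduces k = 5 from the example; the function names are likewise assumptions.

```python
import math


def max_allowed_behaviors(p: float, omega: float, act_u: float) -> int:
    """Solve p = k**omega / ln(1 + act_u) for k and round down.

    Requires omega != 0 and act_u > 0.
    """
    k = (p * math.log(1.0 + act_u)) ** (1.0 / omega)
    return math.floor(k)


def is_anomalous(sequence: str, p: float, omega: float, act_u: float) -> bool:
    """Flag a behavior sequence that is longer than the allowed k."""
    return len(sequence) > max_allowed_behaviors(p, omega, act_u)


# Hypothetical inputs: threshold p = 0.3 from the text, omega and act_u assumed.
k = max_allowed_behaviors(p=0.3, omega=-0.2, act_u=10.0)
print(k)  # 5
print(is_anomalous("ABCDEFC", p=0.3, omega=-0.2, act_u=10.0))  # True: 7 behaviors > 5
```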
By implementing this embodiment, the stream-mining anomaly detection method needs only the two most recent time windows and no global computation, so anomalies in the target user's behavior time sequence can be detected in a shorter time, with greatly reduced computation and time cost. Because detection is based on dynamic user dependency, the real-time performance of big data anomaly detection is improved, the reliability of the big data platform is increased, and resource costs are saved.
The model provided by this embodiment is simple, easy to understand, and highly maintainable. Compared with complex nonlinear stream-mining algorithms, it saves substantial computing resources and cost, detects anomalies in user behavior time sequences in a short time, and improves the reliability of the big data platform. It offers a feasible solution to pressing problems in current real-time anomaly detection, such as heavy computation and long processing time, and provides a reference for the industry.
Third embodiment
Referring to fig. 4, fig. 4 is a schematic diagram of a server provided in this embodiment, where the server includes a processor 401, a memory 402, and a communication bus 403, where:
the communication bus 403 is used for realizing connection communication between the processor 401 and the memory 402;
the processor 401 is configured to execute one or more programs stored in the memory 402 to implement the steps of the abnormality detection method in the first embodiment and the second embodiment.
By implementing this embodiment, the stream-mining anomaly detection method needs only the two most recent time windows and no global computation, so anomalies in the target user's behavior time sequence can be detected in a shorter time, with greatly reduced computation and time cost. Because detection is based on dynamic user dependency, the real-time performance of big data anomaly detection is improved, the reliability of the big data platform is increased, and resource costs are saved.
The model provided by this embodiment is simple, easy to understand, and highly maintainable. Compared with complex nonlinear stream-mining algorithms, it saves substantial computing resources and cost, detects anomalies in user behavior time sequences in a short time, and improves the reliability of the big data platform. It offers a feasible solution to pressing problems in current real-time anomaly detection, such as heavy computation and long processing time, and provides a reference for the industry.
Fourth embodiment
The present embodiment provides a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps of the abnormality detection method in the first and second embodiments.
By implementing this embodiment, the stream-mining anomaly detection method needs only the two most recent time windows and no global computation, so anomalies in the target user's behavior time sequence can be detected in a shorter time, with greatly reduced computation and time cost. Because detection is based on dynamic user dependency, the real-time performance of big data anomaly detection is improved, the reliability of the big data platform is increased, and resource costs are saved.
The model provided by this embodiment is simple, easy to understand, and highly maintainable. Compared with complex nonlinear stream-mining algorithms, it saves substantial computing resources and cost, detects anomalies in user behavior time sequences in a short time, and improves the reliability of the big data platform. It offers a feasible solution to pressing problems in current real-time anomaly detection, such as heavy computation and long processing time, and provides a reference for the industry.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (8)

1. An abnormality detection method characterized by comprising the steps of:
acquiring a behavior time sequence of a target user in a current time window;
matching a target sub-behavior from the maximum frequent pattern of the previous time window according to the behavior time sequence of the target user;
obtaining a target dependency coefficient of the target sub-behavior;
calculating the dependency of the target user under a preset critical condition according to the target dependency coefficient, comprising:
calculating the dependency of the target user according to the formula p = k^ω / ln(1 + Act_u);
wherein p is the dependency, k is the maximum allowable number of behaviors executed by the target user in time sequence, ω is the dependency coefficient, 1/ln(1 + Act_u) is the user behavior penalty factor, Act_u is the activity level of the user, and ln denotes the natural logarithm;
judging whether the dependency is larger than a preset threshold value or not;
and if so, determining that the behavior of the target user is abnormal.
2. The anomaly detection method according to claim 1, characterized by, before said obtaining a time sequence of behaviors of a target user of a current time window, further comprising the steps of:
acquiring behavior time sequences of all users in the previous time window;
mining all frequent patterns in the previous time window by adopting the PrefixSpan pattern mining algorithm according to the behavior time sequences of all users in the previous time window;
and obtaining the maximum frequent mode according to all frequent modes.
3. The anomaly detection method according to claim 2, characterized by, after said deriving said maximum frequent pattern from all frequent patterns, further comprising the steps of:
according to the behavior time sequences of all users in the previous time window, combining the user behavior penalty factor with a Zipf distribution curve, and fitting by using a least squares method to obtain a dependency coefficient corresponding to each sub-behavior in the maximum frequent pattern;
and the user behavior penalty factor is used for reducing the weight of the active users.
4. The anomaly detection method according to any one of claims 1 to 3, wherein said matching target sub-behaviors from the most frequent pattern of the previous time window according to said target user's temporal sequence of behaviors comprises:
calculating the sequence similarity between the behavior time sequence of the target user and all behavior time sequences in the maximum frequent pattern of the previous time window by adopting a DNA sequence alignment algorithm;
and taking the behavior with the highest sequence similarity among all behavior time sequences of the maximum frequent pattern as the target sub-behavior.
5. The anomaly detection method according to any one of claims 1 to 3, further comprising, before said calculating the dependency of said target user under a preset critical condition based on said target dependency coefficient, the steps of:
judging whether the calculation of the dependency coefficient of the previous time window is overtime or not;
if so, acquiring a dependency coefficient of the target sub-behavior in a preset time window, and taking the dependency coefficient as the target dependency coefficient;
if not, determining that the obtained target dependency coefficient is the dependency coefficient of the target sub-behavior of the previous time window.
6. The abnormality detection method according to any one of claims 1 to 3, further comprising, after said judging whether said dependency is greater than a preset threshold, the steps of:
if not, determining that the behavior of the target user is normal, and performing offline backup on the behavior time sequence of the target user in the current time window.
7. A server, comprising a processor, a memory, and a communication bus;
the communication bus is used for realizing connection communication between the processor and the memory;
the processor is configured to execute one or more programs stored in the memory to implement the steps of the anomaly detection method of any one of claims 1 to 6.
8. A computer readable storage medium, characterized in that the computer readable storage medium stores one or more programs which are executable by one or more processors to implement the steps of the abnormality detection method according to any one of claims 1 to 6.
CN201810167498.9A 2018-02-28 2018-02-28 Anomaly detection method, server and computer readable storage medium Active CN108509979B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810167498.9A CN108509979B (en) 2018-02-28 2018-02-28 Anomaly detection method, server and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810167498.9A CN108509979B (en) 2018-02-28 2018-02-28 Anomaly detection method, server and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN108509979A CN108509979A (en) 2018-09-07
CN108509979B true CN108509979B (en) 2022-03-11

Family

ID=63375875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810167498.9A Active CN108509979B (en) 2018-02-28 2018-02-28 Anomaly detection method, server and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN108509979B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant