
CN112651019A - Abnormal behavior detection method and device based on unsupervised learning and computing terminal - Google Patents

Info

Publication number
CN112651019A
Authority
CN
China
Prior art keywords
behavior
user
abnormal
data
user behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011576194.1A
Other languages
Chinese (zh)
Inventor
王天祥
朱永强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Wangan Technology Development Co ltd
Original Assignee
Chengdu Wangan Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Wangan Technology Development Co ltd
Priority to CN202011576194.1A
Publication of CN112651019A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55: Detecting local intrusion or implementing counter-measures
    • G06F21/552: Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/2433: Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The embodiments of the application provide an abnormal behavior detection method and device based on unsupervised learning, and a computing terminal. Standard scores of the feature components in a behavior vector of a target user's user behavior data are calculated according to a preset user behavior baseline of the target user, and the standard scores represent the degree to which the user behavior data deviates from the user behavior baseline. Abnormal behavior information corresponding to the user behavior data is then calculated from the standard scores of the feature components. In this way, potential abnormal behaviors of users inside an enterprise can be effectively mined and the degree of abnormality of user behavior can be measured, which provides an effective basis for user behavior analysis inside the enterprise.

Description

Abnormal behavior detection method and device based on unsupervised learning and computing terminal
Technical Field
The application relates to the technical field of computers, in particular to an abnormal behavior detection method and device based on unsupervised learning and a computing terminal.
Background
With the rapid development of information technology, enterprise informatization has brought great convenience to enterprise business, but enterprise information security problems have also become increasingly complex. Security incidents caused by malicious insiders and by negligent employees occur constantly and cause heavy losses to individuals and enterprises, so the analysis of internal user behavior has increasingly become a focus of enterprise security.
However, in real-world situations, samples of abnormal behavior by users inside an enterprise are few and hard to obtain, which makes it difficult to effectively mine the potential abnormal behaviors of those users and to measure the degree of abnormality of their behavior.
Disclosure of Invention
In view of these shortcomings of existing designs, the application provides an abnormal behavior detection method and device based on unsupervised learning, and a computing terminal, which can effectively mine the potential abnormal behaviors of users inside an enterprise and measure the degree of abnormality of user behavior, thereby providing an effective basis for user behavior analysis inside the enterprise.
According to a first aspect of the present application, there is provided an abnormal behavior detection method based on unsupervised learning, applied to a computing terminal, the method including:
extracting a behavior vector of user behavior data of a target user;
calculating standard scores of all characteristic components in the behavior vector according to a preset user behavior baseline of a target user, wherein the standard scores are used for representing the degree of deviation of the user behavior data from the user behavior baseline;
and calculating abnormal behavior information corresponding to the user behavior data according to the standard scores of the characteristic components.
In a possible implementation manner of the first aspect, the step of extracting a behavior vector of the user behavior data of the target user includes:
performing data cleaning on the user behavior data;
and extracting vector representation of the user behavior data after data cleaning to serve as the behavior vector of the user behavior data.
In a possible implementation manner of the first aspect, the manner of performing data cleaning on the user behavior data includes at least one of the following data cleaning manners:
deleting a preset log field in the user behavior data;
replacing different log fields belonging to the same behavior attribute in the user behavior data with a uniform log field of the same behavior attribute;
filling fields that contain missing values in the user behavior data by sampling according to the data distribution of the preset attribute;
and deduplicating the log fields in the user behavior data.
In a possible implementation manner of the first aspect, the step of extracting the vector representation of the data-cleaned user behavior data as the behavior vector of the user behavior data includes:
calculating the occurrence frequency of the categorical features of the data-cleaned user behavior data as a first behavior vector;
for the numerical features of the data-cleaned user behavior data, taking the feature values of the numerical features as a second behavior vector, or taking feature values obtained by converting the numerical features according to a preset measurement mode as the second behavior vector;
wherein the behavior vectors of the user behavior data include the first behavior vector and the second behavior vector.
In a possible implementation manner of the first aspect, the step of calculating abnormal behavior information corresponding to the user behavior data according to the standard scores of the feature components includes:
acquiring a first target characteristic component with the maximum absolute value in the standard scores of the characteristic components;
and mapping the first target characteristic component to a preset degree interval by adopting a preset activation function to obtain the behavior abnormal degree of the target user, wherein the behavior abnormal degree is used as abnormal behavior information corresponding to the user behavior data.
In a possible implementation manner of the first aspect, the step of calculating abnormal behavior information corresponding to the user behavior data according to the standard scores of the feature components includes:
acquiring second target feature components whose absolute values are greater than a preset threshold among the standard scores of the feature components;
mapping each second target feature component to a preset degree interval by adopting a preset activation function to obtain the behavior abnormal degree corresponding to each second target feature component;
multiplying the behavior abnormal degree corresponding to each second target feature component by the corresponding preset weight to obtain the weighted behavior abnormal degree of each second target feature component;
and determining the sum of the weighted behavior abnormal degrees of the second target feature components as the behavior abnormal degree of the target user, and taking it as the abnormal behavior information corresponding to the user behavior data.
In one possible implementation of the first aspect, the method further comprises:
acquiring historical user behavior data of the target user within a preset time range;
extracting historical behavior vectors of the historical user behavior data;
calculating a user behavior mean vector and a user behavior standard deviation vector corresponding to the historical behavior vector;
and taking the user behavior mean vector and the user behavior standard deviation vector as the user behavior baseline.
For example, in a possible implementation manner of the first aspect, the method further includes:
acquiring an abnormal behavior information set of the target user, and determining an abnormal operation service associated with the abnormal behavior information set of the target user;
acquiring a preset number of service data objects in a service data partition of the abnormal operation service, determining a current service data object from the preset number of service data objects, selecting an initial abnormal traceability tracking node from the current service data object, and determining traceability tracking information of the initial abnormal traceability tracking node;
obtaining sample tracing information, and performing tracing correlation calculation according to the tracing information and the sample tracing information to obtain tracing correlation information;
when the tracing association information does not meet the preset association requirement, returning to the step of selecting an initial abnormal tracing node from the current business data object until the tracing association information meets the preset association requirement, and taking the initial abnormal tracing node meeting the preset association requirement as an abnormal tracing object corresponding to the current business data object;
adding each abnormal traceability object to an abnormal traceability process to obtain each traceability addition node;
and respectively tracing the source of the abnormal operation service based on each source tracing adding node.
According to a second aspect of the present application, there is provided an abnormal behavior detection apparatus based on unsupervised learning, applied to a computing terminal, the apparatus including:
the extraction module is used for extracting the behavior vector of the user behavior data of the target user;
the first calculation module is used for calculating standard scores of all characteristic components in the behavior vector according to a preset user behavior baseline of a target user, wherein the standard scores are used for representing the degree of deviation of the user behavior data from the user behavior baseline;
and the second calculation module is used for calculating abnormal behavior information corresponding to the user behavior data according to the standard scores of the characteristic components.
According to a third aspect of the present application, there is provided a computing terminal, including a machine-readable storage medium and a processor, where the machine-readable storage medium stores a computer program, and the processor is configured to execute the computer program to perform the unsupervised learning-based abnormal behavior detection method according to the first aspect or any one of the possible implementation manners of the first aspect.
According to a fourth aspect of the present application, a readable storage medium is provided, in which a computer program is stored, and the computer program is executed to perform the method for detecting abnormal behavior based on unsupervised learning of the first aspect or any one of the possible implementations of the first aspect.
Based on any of the above aspects, the standard scores of the feature components in the behavior vector of the target user's user behavior data are calculated according to the preset user behavior baseline of the target user, so that the standard scores represent the degree to which the user behavior data deviates from the user behavior baseline. Abnormal behavior information corresponding to the user behavior data is then calculated from the standard scores of the feature components. Potential abnormal behaviors of users inside an enterprise can thus be effectively mined and the degree of abnormality of user behavior measured, which provides an effective basis for user behavior analysis inside the enterprise.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
FIG. 1 is a schematic flowchart of an abnormal behavior detection method based on unsupervised learning according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of the sub-steps of step S110 shown in FIG. 1;
FIG. 3 is a first schematic flowchart of the sub-steps of step S130 shown in FIG. 1;
FIG. 4 is a second schematic flowchart of the sub-steps of step S130 shown in FIG. 1;
FIG. 5 is a second schematic flowchart of an abnormal behavior detection method based on unsupervised learning according to an embodiment of the present application;
FIG. 6 is a schematic functional block diagram of an abnormal behavior detection apparatus based on unsupervised learning according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of the components of a computing terminal for executing the above abnormal behavior detection method based on unsupervised learning according to an embodiment of the present application.
Detailed Description
In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it should be understood that the drawings in the present application are for illustrative and descriptive purposes only and are not used to limit the scope of protection of the present application. Additionally, it should be understood that the schematic drawings are not necessarily drawn to scale. The flowcharts used in this application illustrate operations implemented according to some of the embodiments of the present application.
It should be understood that the operations of the flowcharts need not be performed in the order shown, and steps that have no logical dependency on one another may be performed in reverse order or simultaneously. One skilled in the art, under the guidance of this application, may add one or more other operations to a flowchart, or remove one or more operations from it.
FIG. 1 shows a schematic flowchart of an abnormal behavior detection method based on unsupervised learning according to an embodiment of the present application. It should be understood that, in other embodiments, the order of some steps in the method of this embodiment may be interchanged according to actual needs, and some steps may be omitted. The detailed steps of the abnormal behavior detection method based on unsupervised learning are described as follows.
Step S110: extract a behavior vector of the user behavior data of the target user.
In this embodiment, the target user may refer to any user that needs to perform the abnormal behavior detection based on unsupervised learning. For example, for an enterprise, all users may be identified as target users, or a portion of tagged users may be identified as target users.
User behavior data may cover data sources that target users generate during the course of using the various internal systems of the enterprise. The data source may be understood as a log file associated with the target user when using an internal system of the enterprise, for example, the internal system may include, but is not limited to, a user terminal, an enterprise server, an enterprise business system, an enterprise security device, and the like.
Step S120: calculate the standard score of each feature component in the behavior vector according to the preset user behavior baseline of the target user.
In this embodiment, the preset user behavior baseline may be obtained from the historical user behavior data of each target user and is used to characterize that user's historical behavior. The standard score characterizes the degree to which the user behavior data deviates from the user behavior baseline. With this design, no abnormal behavior samples of users inside the enterprise need to be collected, which avoids the difficulty that a supervised detection approach faces: abnormal behavior samples of users inside an enterprise are few and hard to obtain.
Step S130: calculate abnormal behavior information corresponding to the user behavior data according to the standard scores of the feature components.
Based on the above steps, this embodiment calculates the standard score of each feature component in the behavior vector of the target user's user behavior data according to the preset user behavior baseline of the target user, so that the standard scores represent the degree to which the user behavior data deviates from the user behavior baseline. Abnormal behavior information corresponding to the user behavior data is then calculated from these standard scores, so that potential abnormal behaviors of users inside the enterprise can be effectively mined and the degree of abnormality of user behavior measured, providing an effective basis for user behavior analysis inside the enterprise.
In one possible embodiment, to improve the accuracy of the behavior vector and reduce noise interference, step S110 may be implemented by the following sub-steps S111 and S112 (see FIG. 2), which are described in detail below.
Sub-step S111: perform data cleaning on the user behavior data.
For example, in some alternative examples, the data cleaning of the user behavior data may include at least one of the following data cleaning manners, which are described in detail below.
(1) Delete preset log fields in the user behavior data.
For example, in some possible examples, the user behavior data may contain redundant or unimportant log fields, and these log fields may be deleted directly.
(2) Replace different log fields that belong to the same behavior attribute in the user behavior data with a unified log field for that behavior attribute.
For example, in some possible examples, where fields belong to the same behavior attribute but have different log field names, a unified log field may be set and substituted for them.
(3) Fill fields that contain missing values in the user behavior data by sampling according to the data distribution of the preset attribute.
For example, in some possible examples, for fields containing missing values, i.e. fields for which the values of one or more attributes in the data set are incomplete, the missing values may be filled by sampling from the data distribution of the corresponding attribute.
(4) Deduplicate the log fields in the user behavior data.
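The four cleaning manners can be illustrated with a minimal sketch. It is not the implementation described in the application: the field names `debug_info`, `uid`, `username` and `user_id` are hypothetical, missing values are filled by sampling from the values already observed for the field (one simple reading of "data distribution"), and deduplication is interpreted as dropping identical records.

```python
import random

# Hypothetical field names, chosen only for illustration.
DROP_FIELDS = {"debug_info"}                                # (1) fields to delete
FIELD_ALIASES = {"uid": "user_id", "username": "user_id"}   # (2) aliases -> unified name


def clean_records(records, fill_field, rng):
    """Apply the four cleaning manners to a list of log records (dicts)."""
    cleaned = []
    for rec in records:
        row = {}
        for key, value in rec.items():
            if key in DROP_FIELDS:
                continue  # (1) delete preset log fields
            row[FIELD_ALIASES.get(key, key)] = value  # (2) unify field names
        cleaned.append(row)

    # (3) fill missing values by sampling from the observed values of the field
    # (assumption: the empirical distribution stands in for the data distribution)
    observed = [r[fill_field] for r in cleaned if r.get(fill_field) is not None]
    for r in cleaned:
        if r.get(fill_field) is None and observed:
            r[fill_field] = rng.choice(observed)

    # (4) deduplicate: keep the first occurrence of each identical record
    seen = set()
    unique = []
    for r in cleaned:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


records = [
    {"uid": "alice", "action": "login", "bytes": 10, "debug_info": "x"},
    {"username": "alice", "action": "login", "bytes": 10},
    {"user_id": "alice", "action": "print", "bytes": None},
]
out = clean_records(records, "bytes", random.Random(0))
```

Here the first two records become identical after cleaning and collapse into one, and the missing `bytes` value in the third record is filled from the observed values.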
Sub-step S112: extract a vector representation of the data-cleaned user behavior data as the behavior vector of the user behavior data.
For example, in some alternative examples, extracting the vector representation of the data-cleaned user behavior data as the behavior vector may be implemented as follows.
(1) Calculate the occurrence frequency of the categorical features of the data-cleaned user behavior data as a first behavior vector.
(2) For the numerical features of the data-cleaned user behavior data, take the feature values of the numerical features as a second behavior vector, or take feature values obtained by converting the numerical features according to a preset measurement mode as the second behavior vector.
The behavior vector of the user behavior data may include the first behavior vector and the second behavior vector.
For example, for categorical features of user behavior, such as logging in to a system, using removable media, or printing a file, statistical encoding may be used: the occurrence frequency of each categorical feature is calculated as the first behavior vector. The statistical time range may be a day, a week, a month, or another period, and is not specifically limited. For numerical features of user behavior, such as the system login time or the number of bytes downloaded from a server on a given day, the original value may be used directly as the second behavior vector, or it may be converted according to a measurement mode to form the second behavior vector. The behavior vector of the user behavior data may be represented as follows:
$$b_i = \left(b_i^1, b_i^2, \ldots, b_i^n\right)$$
where $b_i$ represents the behavior vector of the user behavior data of user i, $b_i^c$ represents the value of behavior feature c of user i, and n is the number of behavior features.
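As a rough illustration of this encoding, the sketch below counts categorical events within one statistics window and appends raw numerical values. The feature names (`login`, `usb_use`, `print`, `bytes_downloaded`) and the record layout are assumptions made for the example, not part of the application.

```python
from collections import Counter

# Hypothetical feature lists, fixed up front so every vector has the same layout.
CATEGORY_FEATURES = ["login", "usb_use", "print"]
NUMERIC_FEATURES = ["bytes_downloaded"]


def behavior_vector(events, numeric_values):
    """Build a behavior vector for one user and one statistics window.

    events: list of dicts with an "action" key (categorical features)
    numeric_values: dict mapping numeric feature name -> raw value
    """
    counts = Counter(e["action"] for e in events)
    # First behavior vector: occurrence frequency of each categorical feature.
    first = [float(counts.get(name, 0)) for name in CATEGORY_FEATURES]
    # Second behavior vector: raw numerical values (no conversion applied here).
    second = [float(numeric_values[name]) for name in NUMERIC_FEATURES]
    return first + second


events = [{"action": "login"}, {"action": "login"}, {"action": "print"}]
vec = behavior_vector(events, {"bytes_downloaded": 2048})
```

Fixing the feature order in advance keeps component c of every vector tied to the same behavior feature, which the per-component baseline comparison relies on.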
In one possible implementation, for step S120, the inventors found through research that, based on the law of large numbers, each feature component of the behavior vector can be assumed to approximately follow a Gaussian distribution, so the magnitude of the standard score of a user behavior feature can be used to indicate how probable the observed value of that feature is.
On this basis, the standard score (z-score) of each feature component in the behavior vector can be calculated to measure how far the current behavior vector of the target user deviates from that user's preset user behavior baseline. The calculation may be as follows:
$$z_i = (b_i - m_i) \oslash s_i$$
where $\oslash$ denotes element-wise division of vector components, $m_i$ and $s_i$ are the user behavior mean vector and the user behavior standard deviation vector that make up the user behavior baseline of user i, and $z_i^c$, the c-th component of $z_i$, represents the standard score of behavior feature c of user i currently being analyzed.
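The standard score calculation can be sketched in a few lines. The function name and the `eps` guard against a zero standard deviation are assumptions added for the example; the text does not say how constant features are handled.

```python
def standard_scores(b, m, s, eps=1e-9):
    """Element-wise z-score: (b - m) / s for each feature component.

    b: current behavior vector, m: baseline mean vector,
    s: baseline standard deviation vector.
    eps is an assumed guard so a zero standard deviation does not divide by zero.
    """
    return [(bc - mc) / max(sc, eps) for bc, mc, sc in zip(b, m, s)]


z = standard_scores([5.0, 1.0], [3.0, 1.0], [1.0, 0.5])
```

In this example the first feature lies two standard deviations above its baseline mean and the second sits exactly on it.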
In one possible implementation, referring to FIG. 3, step S130 may be implemented by the following exemplary sub-steps S131 and S132, which are described in detail below.
Sub-step S131: obtain the first target feature component, i.e. the one with the largest absolute value among the standard scores of the feature components.
Sub-step S132: use a preset activation function to map the first target feature component into a preset degree interval, obtaining the behavior abnormal degree of the target user as the abnormal behavior information corresponding to the user behavior data.
For example, the first target feature component may be represented as:
$$z_{max} = \max_{c} \left| z_i^c \right|$$
Since $z_i^c$ ranges over $(-\infty, +\infty)$, $z_{max}$ ranges over $[0, +\infty)$. The tanh function can then be used as the preset activation function to map $z_{max}$ into $[0, 100)$, giving the behavior abnormal degree $i_{risk}$ of target user i, calculated as follows:
$$i_{risk} = 100 \cdot \tanh(z_{max})$$
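A minimal sketch of this maximum-based scoring, using only the standard library; the function name is hypothetical.

```python
import math


def risk_from_max(z):
    """Map the largest absolute standard score to a degree in [0, 100) via tanh."""
    z_max = max(abs(v) for v in z)
    return 100.0 * math.tanh(z_max)


risk = risk_from_max([-3.0, 1.0])
```

Because tanh saturates quickly, a single feature about three standard deviations from its baseline already yields a degree above 99, so this variant is driven entirely by the most deviant feature.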
In another possible implementation, referring to FIG. 4, step S130 may instead be implemented by the following exemplary sub-steps S133 to S136, which are described in detail below.
Sub-step S133: obtain the second target feature components, i.e. those whose absolute values are greater than a preset threshold among the standard scores of the feature components.
Sub-step S134: use a preset activation function to map each second target feature component into a preset degree interval, obtaining the behavior abnormal degree corresponding to each second target feature component.
Sub-step S135: multiply the behavior abnormal degree corresponding to each second target feature component by its preset weight to obtain the weighted behavior abnormal degree of each second target feature component.
In this embodiment, different feature components have different preset weights, which may be custom-configured according to how strongly each feature component influences abnormality; this embodiment does not limit this in detail.
Sub-step S136: determine the sum of the weighted behavior abnormal degrees of the second target feature components as the behavior abnormal degree of the target user, and use it as the abnormal behavior information corresponding to the user behavior data.
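The weighted variant can be sketched as follows. Two points are assumptions made for the example rather than statements from the text: tanh is reused as the preset activation function, and it is applied to the absolute value of each standard score so that every per-feature degree is non-negative. The weights and the threshold are placeholder values.

```python
import math


def weighted_risk(z, weights, threshold):
    """Sum of weighted per-feature abnormal degrees over features above the threshold.

    z: standard scores, weights: preset weight per feature component,
    threshold: preset threshold on |z|.
    """
    total = 0.0
    for zc, wc in zip(z, weights):
        if abs(zc) > threshold:                  # sub-step S133: select components
            degree = 100.0 * math.tanh(abs(zc))  # sub-step S134: map to [0, 100)
            total += wc * degree                 # sub-steps S135 and S136
    return total


score = weighted_risk([0.5, -3.0], [0.5, 0.5], threshold=2.0)
```

If the weights sum to at most 1, the result stays within [0, 100); otherwise the sum can exceed 100, so the weight configuration determines the scale of the final degree.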
In a possible implementation, referring further to FIG. 5, the abnormal behavior detection method based on unsupervised learning according to the embodiment of the present application may further include the following steps S101 to S104, which are described in detail below.
Step S101: obtain historical user behavior data of the target user within a preset time range.
In this embodiment, the historical user behavior data may come from multi-source heterogeneous log files inside an enterprise. The field names for the same behavior attribute are not fully consistent across log files, and each log file contains redundant fields, repeated fields, and fields with missing values, so data cleaning needs to be performed on the historical user behavior data. For the specific data cleaning manner, refer to the detailed description of sub-step S111 above, which is not repeated here.
Step S102: extract historical behavior vectors of the historical user behavior data.
In this embodiment, for the manner of extracting the historical behavior vectors of the historical user behavior data, refer to the detailed description of sub-step S112 above, which is not repeated here.
Step S103: calculate the user behavior mean vector and the user behavior standard deviation vector corresponding to the historical behavior vectors.
For example, the historical behavior vectors may be, but are not limited to,
$$b_{i,1}, b_{i,2}, \ldots, b_{i,T}$$
where $b_{i,t}$ represents the t-th historical behavior vector of user i and T is the number of historical behavior vectors. The user behavior mean vector $m_i$ may be calculated as follows:
$$m_i = \frac{1}{T} \sum_{t=1}^{T} b_{i,t} = \left(m_i^1, m_i^2, \ldots, m_i^n\right)$$
where $m_i$ represents the behavior mean vector of user i, and $m_i^c$ represents the mean of behavior feature c of user i.
The user behavior standard deviation vector $s_i$ may be calculated as follows:
$$s_i = \left(s_i^1, s_i^2, \ldots, s_i^n\right)$$
$$s_i^c = \operatorname{std}\left(b_{i,1}^c, b_{i,2}^c, \ldots, b_{i,T}^c\right)$$
where $s_i$ represents the behavior standard deviation vector of user i, and $s_i^c$ represents the standard deviation of behavior feature c of user i.
Step S104: take the user behavior mean vector and the user behavior standard deviation vector as the user behavior baseline.
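Steps S103 and S104 can be sketched as below. The function name is hypothetical, and the population form of the standard deviation (dividing by the number of historical vectors) is an assumption; the text does not state which normalization is intended.

```python
import math


def build_baseline(history):
    """Compute the per-feature mean and standard deviation of historical vectors.

    history: non-empty list of equal-length behavior vectors for one user.
    Returns (mean_vector, std_vector), which together form the user behavior baseline.
    Assumption: population standard deviation (divide by T, not T - 1).
    """
    t = len(history)
    n = len(history[0])
    mean = [sum(vec[c] for vec in history) / t for c in range(n)]
    std = [
        math.sqrt(sum((vec[c] - mean[c]) ** 2 for vec in history) / t)
        for c in range(n)
    ]
    return mean, std


mean_vec, std_vec = build_baseline([[1.0, 10.0], [3.0, 10.0]])
```

A feature that never varied in the history, like the second one here, gets a standard deviation of zero, so whatever consumes the baseline needs some rule for that case before dividing by it.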
The embodiment of the application therefore takes both the categorical features and the numerical features of user behavior into account when constructing the user behavior baseline, so that potential abnormal behaviors of users inside the enterprise can be fully mined and the degree of abnormality of user behavior measured, providing an effective means for analyzing user behavior inside the enterprise.
For example, in one possible implementation, the above method may further include the following steps.
1) Acquire an abnormal behavior information set of the target user, and determine the abnormal operation service associated with the abnormal behavior information set of the target user.
2) Acquire a preset number of service data objects in a service data partition of the abnormal operation service, determine a current service data object from the preset number of service data objects, select an initial abnormal traceability tracking node from the current service data object, and determine the traceability tracking information of the initial abnormal traceability tracking node.
3) Obtain sample tracing information, and perform a tracing correlation calculation on the traceability tracking information and the sample tracing information to obtain tracing association information.
4) When the tracing association information does not meet the preset association requirement, return to the step of selecting an initial abnormal traceability tracking node from the current service data object, and repeat until the tracing association information meets the preset association requirement; then take the initial abnormal traceability tracking node that meets the preset association requirement as the abnormal traceability object corresponding to the current service data object.
5) Add each abnormal traceability object to the abnormal traceability process to obtain each traceability addition node.
6) Trace the source of the abnormal operation service based on each traceability addition node.
Based on the same inventive concept, refer to fig. 6, which is a schematic diagram of the functional modules of the abnormal behavior detection apparatus 110 based on unsupervised learning according to an embodiment of the present application. In this embodiment, the functional modules of the apparatus 110 may be divided according to the method embodiment executed by the computing terminal 100. For example, one functional module may be provided for each function, or two or more functions may be integrated into one processing module. An integrated module may be implemented in hardware or as a software functional module. It should be noted that the division into modules in the embodiment of the present application is schematic and is only one logical division of functions; other divisions are possible in an actual implementation. For example, where one functional module is provided for each function, the abnormal behavior detection apparatus 110 based on unsupervised learning shown in fig. 6 is only a schematic apparatus. The apparatus 110 may include an extraction module 111, a first calculation module 112, and a second calculation module 113, whose functions are described in detail below.
The extraction module 111 is configured to extract a behavior vector of the user behavior data of the target user. The extraction module 111 can be used to execute step S110; for its detailed implementation, refer to the description of step S110 above.
The first calculation module 112 is configured to calculate a standard score for each feature component in the behavior vector according to a preset user behavior baseline of the target user, where the standard score represents the degree to which the user behavior data deviates from the user behavior baseline. The first calculation module 112 can be used to execute step S120; for its detailed implementation, refer to the description of step S120 above.
The second calculation module 113 is configured to calculate, according to the standard scores of the feature components, the abnormal behavior information corresponding to the user behavior data. The second calculation module 113 can be used to execute step S130; for its detailed implementation, refer to the description of step S130 above.
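The standard score handled by the first calculation module 112 can be read as a per-component z-score against the baseline. The following is a minimal Python sketch under that reading, assuming the baseline is a pair of equal-length mean and standard deviation lists. The function name `standard_scores` and the `eps` guard against a zero standard deviation are illustrative assumptions and do not come from the application.

```python
from typing import List, Sequence


def standard_scores(behavior: Sequence[float],
                    mean: Sequence[float],
                    std: Sequence[float],
                    eps: float = 1e-9) -> List[float]:
    """Return the z-score of each feature component against the baseline."""
    if not (len(behavior) == len(mean) == len(std)):
        raise ValueError("behavior vector and baseline must have equal length")
    # eps is an assumed guard for components whose historical deviation is zero.
    return [(x - m) / max(s, eps) for x, m, s in zip(behavior, mean, std)]
```

A component that sits two baseline standard deviations above its historical mean yields a score of 2.0; the sign shows the direction of the deviation.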
In a possible implementation, the extraction module 111 is specifically configured to:
perform data cleaning on the user behavior data;
and extract a vector representation of the cleaned user behavior data as the behavior vector of the user behavior data.
In a possible implementation, the extraction module 111 cleans the user behavior data in at least one of the following ways:
deleting preset log fields from the user behavior data;
replacing different log fields that belong to the same behavior attribute in the user behavior data with a single unified log field for that behavior attribute;
filling in fields that contain missing values in the user behavior data by sampling according to the data distribution of a preset attribute;
and deduplicating the log fields in the user behavior data.
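The four cleaning operations listed above can be sketched in Python as follows. This is only an illustration under stated assumptions: log records are modeled as dictionaries, missing values are filled by sampling from the values observed for the same field (one possible reading of "according to the data distribution"), and deduplication is applied to whole records. The names `clean_logs`, `drop_fields`, `aliases`, and `fill_fields` are hypothetical.

```python
import random
from typing import Dict, Iterable, List, Optional, Set

Record = Dict[str, Optional[str]]


def clean_logs(records: Iterable[Record],
               drop_fields: Set[str],
               aliases: Dict[str, str],
               fill_fields: Iterable[str],
               rng: Optional[random.Random] = None) -> List[Record]:
    """Apply the four cleaning operations to a batch of log records."""
    rng = rng or random.Random(0)
    cleaned: List[Record] = []
    for rec in records:
        row: Record = {}
        for key, value in rec.items():
            if key in drop_fields:          # delete preset log fields
                continue
            row[aliases.get(key, key)] = value  # unify equivalent field names
        cleaned.append(row)
    # Fill missing values by sampling from the values observed for that field.
    for field in fill_fields:
        observed = [r[field] for r in cleaned if r.get(field) is not None]
        if not observed:
            continue
        for r in cleaned:
            if r.get(field) is None:
                r[field] = rng.choice(observed)
    # Remove duplicate records while preserving the original order.
    seen = set()
    unique: List[Record] = []
    for r in cleaned:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique
```

Sampling from observed values keeps the filled field's distribution close to the empirical one; a real deployment might instead sample from a fitted distribution.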
In a possible implementation, the extraction module 111 may be specifically configured to:
calculate the occurrence frequency of the categorical features of the cleaned user behavior data as a first behavior vector;
for the numerical features of the cleaned user behavior data, take the feature values of the numerical features as a second behavior vector, or take the feature values obtained by converting the numerical features according to a preset measurement mode as the second behavior vector;
wherein the behavior vector of the user behavior data comprises the first behavior vector and the second behavior vector.
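As an illustration of the two-part behavior vector described above, the sketch below counts the occurrences of each category for one categorical field and aggregates each numerical field over the records. Several points are assumptions rather than details from the application: the frequency is taken as a raw count, numerical fields are aggregated by summation, and the optional `convert` callable stands in for the "preset measurement mode". The function and parameter names are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence


def behavior_vector(records: Sequence[Dict[str, object]],
                    category_field: str,
                    categories: Sequence[str],
                    numeric_fields: Sequence[str],
                    convert: Optional[Callable[[float], float]] = None
                    ) -> List[float]:
    """Concatenate category counts with aggregated numerical features."""
    counts = Counter(r.get(category_field) for r in records)
    # First behavior vector: one count per known category, in a fixed order.
    first = [float(counts[c]) for c in categories]
    # Second behavior vector: summed numerical fields, optionally converted.
    second: List[float] = []
    for field in numeric_fields:
        total = sum(float(r.get(field) or 0.0) for r in records)
        second.append(convert(total) if convert else total)
    return first + second
```

Keeping the category order fixed ensures that each component of the vector lines up with the same component of the baseline.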
In a possible implementation, the second calculation module 113 may be specifically configured to:
acquire a first target feature component, namely the one with the largest absolute value among the standard scores of the feature components;
and map the first target feature component to a preset degree interval using a preset activation function to obtain the behavior abnormality degree of the target user, which serves as the abnormal behavior information corresponding to the user behavior data.
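A minimal Python sketch of this maximum-deviation scoring follows. The choice of `math.tanh` applied to the absolute value, which maps the score into the interval [0, 1), is an assumption made for illustration; the application only specifies "a preset activation function" and "a preset degree interval". The name `anomaly_degree_max` is hypothetical.

```python
import math
from typing import Callable, Sequence


def anomaly_degree_max(scores: Sequence[float],
                       activation: Callable[[float], float] = math.tanh
                       ) -> float:
    """Map the largest-magnitude standard score into the degree interval."""
    if not scores:
        raise ValueError("scores must not be empty")
    # The component that deviates most from the baseline, in either direction.
    worst = max(scores, key=abs)
    # Assumed mapping: tanh of the magnitude lies in [0, 1).
    return activation(abs(worst))
```

With this mapping, a user whose worst component deviates by three standard deviations receives a degree of about 0.995, while a user exactly on the baseline receives 0.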
In a possible implementation, the second calculation module 113 is specifically configured to:
acquire the second target feature components, namely those whose standard scores have an absolute value greater than a preset threshold;
map each second target feature component to a preset degree interval using a preset activation function to obtain the behavior abnormality degree corresponding to each second target feature component;
multiply the behavior abnormality degree corresponding to each second target feature component by its corresponding preset weight to obtain the weighted behavior abnormality degree of each second target feature component;
and determine the sum of the weighted behavior abnormality degrees of the second target feature components as the behavior abnormality degree of the target user, which serves as the abnormal behavior information corresponding to the user behavior data.
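The thresholded, weighted variant can be sketched in the same way. As before, `math.tanh` on the absolute value is an assumed stand-in for the preset activation function, and the default threshold of 3.0 and the name `anomaly_degree_weighted` are illustrative rather than taken from the application.

```python
import math
from typing import Callable, Sequence


def anomaly_degree_weighted(scores: Sequence[float],
                            weights: Sequence[float],
                            threshold: float = 3.0,
                            activation: Callable[[float], float] = math.tanh
                            ) -> float:
    """Sum the weighted degrees of all components that exceed the threshold."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have equal length")
    total = 0.0
    for z, w in zip(scores, weights):
        if abs(z) > threshold:
            # Only components beyond the threshold contribute to the result.
            total += w * activation(abs(z))
    return total
```

If the weights of all components sum to 1 and the activation is bounded by 1, the result stays within [0, 1], which keeps it comparable with the maximum-deviation score.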
In a possible implementation, the abnormal behavior detection apparatus 110 based on unsupervised learning may further include a baseline determination module, specifically configured to:
acquire historical user behavior data of the target user within a preset time range;
extract historical behavior vectors from the historical user behavior data;
calculate the user behavior mean vector and the user behavior standard deviation vector corresponding to the historical behavior vectors;
and take the user behavior mean vector and the user behavior standard deviation vector as the user behavior baseline.
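The baseline construction above amounts to a column-wise mean and standard deviation over the historical behavior vectors. A minimal sketch using only the Python standard library is given below; the use of the population standard deviation (`statistics.pstdev`) rather than the sample standard deviation is an assumption, and `build_baseline` is a hypothetical name.

```python
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


def build_baseline(history: Sequence[Sequence[float]]
                   ) -> Tuple[List[float], List[float]]:
    """Return (mean vector, standard deviation vector) for historical vectors."""
    if not history:
        raise ValueError("history must contain at least one behavior vector")
    # Transpose so that each column holds one feature across all time windows.
    columns = list(zip(*history))
    mean = [fmean(col) for col in columns]
    std = [pstdev(col) for col in columns]
    return mean, std
```

A feature that never varied in the historical window has a standard deviation of 0, so any later scoring step has to handle that case explicitly.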
Based on the same inventive concept, refer to fig. 7, which shows a schematic block diagram of a computing terminal 100 for executing the above abnormal behavior detection method based on unsupervised learning according to an embodiment of the present application. The computing terminal 100 may include an abnormal behavior detection apparatus 110 based on unsupervised learning, a machine-readable storage medium 120, and a processor 130.
In this embodiment, the machine-readable storage medium 120 and the processor 130 are both located in the computing terminal 100 and are disposed separately. However, it should be understood that the machine-readable storage medium 120 may also be separate from the computing terminal 100 and accessible by the processor 130 through a bus interface. Alternatively, the machine-readable storage medium 120 may be integrated into the processor 130, e.g., may be a cache and/or general purpose registers.
The abnormal behavior detection apparatus 110 based on unsupervised learning may include software functional modules (e.g., the extraction module 111, the first calculation module 112, and the second calculation module 113 shown in fig. 6) stored in the machine-readable storage medium 120. When the processor 130 executes these software functional modules, the abnormal behavior detection method based on unsupervised learning provided by the foregoing method embodiment is implemented.
The computing terminal 100 provided in the embodiment of the present application is another implementation form of the method embodiment executed by the computing terminal 100, and can be used to execute the abnormal behavior detection method based on unsupervised learning provided in that method embodiment. The technical effects it achieves are therefore the same as those described for the method embodiment and are not repeated here.
The embodiments described above are only some, not all, of the embodiments of the present application. The components of the embodiments of the present application, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the detailed description of the embodiments provided with reference to the accompanying drawings is not intended to limit the scope of the application, but is merely representative of selected embodiments. The protection scope of the present application shall therefore be subject to the protection scope of the claims. Moreover, all other embodiments obtained by a person skilled in the art on the basis of the embodiments of the present application without inventive effort shall fall within the scope of protection of the present application.

Claims (10)

1. An abnormal behavior detection method based on unsupervised learning is characterized by being applied to a computing terminal and comprising the following steps:
extracting a behavior vector of user behavior data of a target user;
calculating standard scores of all characteristic components in the behavior vector according to a preset user behavior baseline of a target user, wherein the standard scores are used for representing the degree of deviation of the user behavior data from the user behavior baseline;
and calculating abnormal behavior information corresponding to the user behavior data according to the standard scores of the characteristic components.
2. The unsupervised learning-based abnormal behavior detection method according to claim 1, wherein the step of extracting the behavior vector of the user behavior data of the target user comprises:
performing data cleaning on the user behavior data;
and extracting vector representation of the user behavior data after data cleaning to serve as the behavior vector of the user behavior data.
3. The unsupervised learning-based abnormal behavior detection method according to claim 2, wherein the data cleaning manner for the user behavior data comprises at least one of the following data cleaning manners:
deleting a preset log field in the user behavior data;
replacing different log fields belonging to the same behavior attribute in the user behavior data with a uniform log field of the same behavior attribute;
sampling and adding fields containing missing values in the user behavior data according to data distribution of preset attributes;
and carrying out duplicate removal processing on the log field in the user behavior data.
4. The unsupervised learning-based abnormal behavior detection method according to claim 2, wherein the step of extracting the vector representation of the data-cleaned user behavior data as the behavior vector of the user behavior data comprises:
calculating the occurrence frequency of the categorical features of the user behavior data after the data cleaning as a first behavior vector;
for the numerical features of the user behavior data after the data cleaning, taking the feature values of the numerical features as a second behavior vector, or taking the feature values obtained by converting the numerical features according to a preset measurement mode as the second behavior vector;
wherein the behavior vectors of the user behavior data include the first behavior vector and the second behavior vector.
5. The unsupervised learning-based abnormal behavior detection method according to any one of claims 1-4, wherein the step of calculating abnormal behavior information corresponding to the user behavior data according to the standard scores of the feature components comprises:
acquiring a first target characteristic component with the maximum absolute value in the standard scores of the characteristic components;
and mapping the first target characteristic component to a preset degree interval by adopting a preset activation function to obtain the behavior abnormal degree of the target user, wherein the behavior abnormal degree is used as abnormal behavior information corresponding to the user behavior data.
6. The unsupervised learning-based abnormal behavior detection method according to any one of claims 1-4, wherein the step of calculating abnormal behavior information corresponding to the user behavior data according to the standard scores of the feature components comprises:
acquiring a second target characteristic component of which the absolute value is greater than a preset threshold value in the standard scores of the characteristic components;
mapping each second target characteristic component to a preset degree interval by adopting a preset activation function to obtain a behavior abnormal degree corresponding to each second target characteristic component;
multiplying the behavior abnormal degree corresponding to each second target characteristic component by the corresponding preset weight to obtain the behavior weight abnormal degree of each second target characteristic component;
and determining the sum of the abnormal degrees of the behavior weights of the second target characteristic components as the abnormal degree of the behavior of the target user, and taking the abnormal degree as the abnormal behavior information corresponding to the user behavior data.
7. The unsupervised learning-based abnormal behavior detection method according to any one of claims 1-4, wherein the method further comprises:
acquiring historical user behavior data of the target user within a preset time range;
extracting historical behavior vectors of the historical user behavior data;
calculating a user behavior mean vector and a user behavior standard deviation vector corresponding to the historical behavior vector;
and taking the user behavior mean vector and the user behavior standard deviation vector as the user behavior baseline.
8. An abnormal behavior detection device based on unsupervised learning, which is applied to a computing terminal, and comprises:
the extraction module is used for extracting the behavior vector of the user behavior data of the target user;
the first calculation module is used for calculating standard scores of all characteristic components in the behavior vector according to a preset user behavior baseline of a target user, wherein the standard scores are used for representing the degree of deviation of the user behavior data from the user behavior baseline;
and the second calculation module is used for calculating abnormal behavior information corresponding to the user behavior data according to the standard scores of the characteristic components.
9. A computing terminal comprising a machine-readable storage medium having a computer program stored therein and a processor configured to run the computer program to perform the unsupervised learning based abnormal behavior detection method of any of claims 1-7.
10. A readable storage medium, in which a computer program is stored, the computer program being executed to perform the unsupervised learning-based abnormal behavior detection method according to any one of claims 1 to 7.
CN202011576194.1A 2020-12-28 2020-12-28 Abnormal behavior detection method and device based on unsupervised learning and computing terminal Pending CN112651019A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011576194.1A CN112651019A (en) 2020-12-28 2020-12-28 Abnormal behavior detection method and device based on unsupervised learning and computing terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011576194.1A CN112651019A (en) 2020-12-28 2020-12-28 Abnormal behavior detection method and device based on unsupervised learning and computing terminal

Publications (1)

Publication Number Publication Date
CN112651019A true CN112651019A (en) 2021-04-13

Family

ID=75363458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011576194.1A Pending CN112651019A (en) 2020-12-28 2020-12-28 Abnormal behavior detection method and device based on unsupervised learning and computing terminal

Country Status (1)

Country Link
CN (1) CN112651019A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108809745A (en) * 2017-05-02 2018-11-13 中国移动通信集团重庆有限公司 A kind of user's anomaly detection method, apparatus and system
CN108881194A (en) * 2018-06-07 2018-11-23 郑州信大先进技术研究院 Enterprises user anomaly detection method and device
CN110351307A (en) * 2019-08-14 2019-10-18 杭州安恒信息技术股份有限公司 Abnormal user detection method and system based on integrated study
CN111651767A (en) * 2020-06-05 2020-09-11 腾讯科技(深圳)有限公司 Abnormal behavior detection method, device, equipment and storage medium
CN111833171A (en) * 2020-03-06 2020-10-27 北京芯盾时代科技有限公司 Abnormal operation detection and model training method, device and readable storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116027724A (en) * 2022-09-23 2023-04-28 河北东来工程技术服务有限公司 Ship equipment risk monitoring method and system
CN116027724B (en) * 2022-09-23 2024-01-12 河北东来工程技术服务有限公司 Ship equipment risk monitoring method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination