
CN115269314A - Transaction abnormity detection method based on log - Google Patents


Publication number: CN115269314A
Authority: CN (China)
Prior art keywords: log, template, transaction, variable, templates
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN202210826059.0A
Other languages: Chinese (zh)
Inventors: 沈国鹏, 朱品燕
Current Assignee: Beijing Yunji Zhizao Technology Co ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Beijing Yunji Zhizao Technology Co ltd
Application filed by: Beijing Yunji Zhizao Technology Co ltd
Priority to: CN202210826059.0A
Publication of: CN115269314A


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/3065: Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a log-based transaction anomaly detection method comprising the following stages: log collection, log template clustering, transaction template clustering, feature extraction, and anomaly detection. Compared with the prior art, the invention has the following advantages: (1) unknown fault types can be detected; (2) the false alarm rate is low; (3) portability is good, so the method has a wider application range and is suitable for other systems; (4) the maintenance cost is low.

Description

Transaction abnormity detection method based on log
Technical Field
The invention relates to the field of log-based transaction anomaly detection, and in particular to a log-based transaction anomaly detection method.
Background
At present there is no scheme that detects transaction anomalies directly from transaction logs. The related schemes treat the transaction log as a general operation log and detect anomalies in that. One scheme is log keyword alerting. The other is operation log anomaly detection based on log template mining, whose classical algorithm is DeepLog; the other methods of this kind differ little from DeepLog in overall approach. That approach has three parts: 1. mining log templates with a log parsing algorithm; 2. extracting data features related to the log templates; 3. performing modeling analysis based on the data features.
The prior art schemes described above have the following disadvantages:
1. Alerting based on log keywords requires expert experience for maintenance and has the following problems:
(1) Only known fault types can be detected; unknown fault types cannot;
(2) The false alarm rate is high;
(3) Portability is poor, that is, the keyword alert rules of one system do not apply to other systems;
(4) The maintenance cost is high.
2. Operation log anomaly detection based on log template mining has the following problems:
(1) It does not model and analyze the transactions of the system directly and effectively;
(2) It cannot detect fault types associated with system transactions.
For the above reasons, a log-based transaction anomaly detection method is a technical problem that urgently needs to be solved.
Disclosure of Invention
To solve the above technical problems, the invention provides the following technical solution: a log-based transaction anomaly detection method comprising the following stages:
(1) Log collection: the operation logs belonging to the same transaction are collected into one transaction log;
(2) Log template clustering: log templates are mined from the operation logs of the transaction logs using a log parsing algorithm. Ideally, one log template corresponds to one log print statement of the software system. A log template is obtained by keeping the constant part of an operation log and replacing its variable parts with wildcards, where the constant and variable parts correspond to the hard-coded part and the variable part of the log print statement;
(3) Transaction template clustering: after log template clustering, the operation logs in a transaction log are mapped to the mined log templates; in the illustrated example, the 8 operation logs of the transaction log correspond to 5 log templates. Transaction template clustering first maps every operation log in a transaction log to its log template, then deduplicates and sorts the result to obtain the transaction template of that transaction log. The content of a transaction template is all the log templates that a transaction log involves, that is, the operations a transaction must go through, so the transaction template is an abstract representation of the different operations required by different transaction types. Because a transaction may execute some operations repeatedly, deduplication and sorting are needed when building this abstract representation;
(4) Feature extraction: data features related to the log templates and the transaction templates are extracted from the log data;
(5) Anomaly detection, which is divided into two parts: model training and online anomaly detection.
Further, the features related to the log template in step (4) are as follows:
Variable extraction: the value at each variable position of the log template is extracted from the original operation log. For example, if a log template has three variables (three wildcard positions), a variable sequence p = ["value1", "value2", "value3"] can be extracted from each operation log belonging to that template, where "value1", "value2", and "value3" are the contents of the operation log at the three variable positions of the template. Stacking the variable sequences extracted from all operation logs belonging to the log template in time order gives the variable matrix P = [p1, p2, ..., pn] of the log template, where n is the number of operation logs belonging to the template.
Further, the data features related to the transaction template in step (4) are as follows:
(1) The log template count vector of the transaction template dimension, subsequently called the template count vector v. One transaction log corresponds to one v. In the illustrated example, the transaction template of the transaction log has 5 log templates and the template count vector v of the transaction log is [2, 1]; each element of v is the number of operation logs in the transaction log that correspond to one log template. Stacking the template count vectors v of all transaction logs belonging to a transaction template gives the template count matrix V = [v1, v2, v3, ..., vn];
(2) The log template order adjacency matrix of the transaction template dimension, subsequently called the adjacency matrix m. m is an n x n two-dimensional matrix, where n is the number of log templates contained in the transaction template. One transaction log corresponds to one m, and m[i, j] is the number of pairs of consecutive operation logs in the transaction log whose log templates are, respectively, the log templates with subscripts i and j. Taking the transaction log in the figure as an example, it contains 8 operation logs (log1, log2, log3, log4, log5, log6, log7, log8), whose 2-gram data are [(log1, log2), (log2, log3), ..., (log7, log8)]; m[i, j] is the number of 2-grams that correspond to (i, j), where i and j are subscripts of log templates in the transaction template;
(3) The transaction template index: a time window is set, and the number of transaction logs corresponding to each transaction template in each time window is recorded to build the transaction template index. That is, each transaction template has one template index, a metric that records the number of original transaction logs matching that transaction template.
Further, the model training in step (5) trains a multidimensional unsupervised algorithm model from training data. The training data is obtained by collecting all normal log data of the software system over a historical period and processing it with the method described in module 1 (log collection). The model comprises the following parts:
(1) The template library, which comprises a log template library and a transaction template library. All log templates and transaction templates obtained by analyzing the training data with the log template clustering and transaction template clustering methods form the template library. The template library characterizes which normal operations and which normal transaction types exist while the system runs stably;
(2) The enumerated variable model. After all log templates are extracted from the training data, the variable matrix P of each log template is extracted from the training data by the variable extraction method described in module 4 (feature extraction), and enumerated variable mining is then performed. An enumerated variable is a variable whose number of distinct values is smaller than a specified threshold T. Finally, enumerated variable modeling is performed, comprising enumerated value set modeling and low-frequency enumerated value modeling. The enumerated value set is the set obtained by deduplicating the values of the enumerated variable in the training set. For low-frequency enumerated value modeling, the frequency of each enumerated value in the training data is calculated first, and a value whose frequency is smaller than a set threshold Tc is considered a low-frequency enumerated value;
(3) The transaction template count vector model. The template count matrix V of each transaction template is extracted from the training data by the feature extraction method. V[:, i] denotes the i-th column of V, u_i = Unique(V[:, i]) denotes the list of discrete values (e.g. [1, 2, 3]) obtained by deduplicating the i-th column of V, and len(u_i) is the number of values in u_i. From V, U = [u_0, u_1, ..., u_m] is computed, where m is the number of log templates contained in the transaction template. Finally, every list in U whose length exceeds a threshold Tu is replaced with the empty list, i.e. if len(U[i]) > Tu, then U[i] = []. The resulting U is the template count vector model of the transaction template; it represents the confidence interval of the number of executions of each operation in a given transaction type while the system runs stably;
(4) The transaction template operation order model. The list of adjacency matrices mL = [m_1, m_2, ..., m_n] of the transaction logs belonging to each transaction template is extracted from the training data by the feature extraction method, where n is the number of transaction logs belonging to the transaction template and m_i is the adjacency matrix of one transaction log. The n adjacency matrices are summed by matrix addition and the sum is normalized by row to obtain M. M is the transaction template operation order model; it is the probability transition matrix of a Markov chain and gives the probability distribution of the operation executed after each operation;
(5) The transaction template index dynamic threshold model. The template index of each transaction template is extracted from the training data by the feature extraction method, and an index dynamic threshold model is then trained for each transaction template from its template index. The choice of dynamic threshold model is not limited; examples are Facebook's open-source Prophet algorithm and the 3-sigma algorithm.
Further, the online anomaly detection part in step (5) comprises:
(1) Data access: online log data is ingested in real time through components such as Kafka and then processed in the manner described in module 1 (log collection).
(2) Template extraction: the operation log templates and transaction templates of the real-time logs are extracted in the manner of modules 2 and 3 and matched against the template library obtained in the training stage; a template that matches nothing in the template library is identified as a new template (a new log template or a new transaction template).
(3) Feature extraction: the log template variable sequence of the real-time log, the log template count vector of the transaction template dimension, the log template order adjacency matrix of the transaction template dimension, and the transaction template index value are extracted in the manner described in module 4 (feature extraction).
(4) Anomaly detection.
Further, the anomaly detection in step (4) includes the following parts:
(1) New log template: when a new log template appears, a new-log-template alarm is triggered;
(2) New transaction template: when a new transaction template appears, a new-transaction-template alarm is triggered;
(3) Enumerated variable anomaly: check whether the value of an enumerated variable is in the enumerated value set modeled for that variable in the training stage. If it is not, an enumerated variable anomaly is triggered. If it is, check whether the value is a low-frequency enumerated value; if so, an enumerated variable anomaly with a lower alarm level than the former is triggered;
(4) Operation count anomaly: using the log template count vector, check whether the count of each operation of the transaction template in the real-time log lies within the confidence interval of the corresponding log template of that transaction template; if not, an alarm is triggered;
(5) Operation order anomaly: compare the adjacency matrix m of the real-time log with the order model M obtained in the training stage. One specific criterion is to trigger an alarm if there exist i, j such that m[i, j] is not equal to 0 and M[i, j] = 0;
(6) Transaction template index anomaly: use the index dynamic threshold model trained in the training stage to check whether the transaction template index value extracted from the real-time log lies within the dynamic threshold range fitted by the model; if not, an alarm is triggered.
Compared with the prior art, the invention has the following advantages:
(1) Transaction-level fault types are abstracted into the 6 types described in 1.1;
(2) The method fills the gap left by the current lack of any automatic anomaly detection method aimed specifically at transaction-level fault types;
(3) Compared with expert rules, the method can detect unknown fault types;
(4) The accuracy of detecting transaction-level fault types is improved;
(5) The method can learn and is general across domains;
(6) The transaction templates provided by the method allow effective clustering analysis of transactions.
Drawings
Fig. 1 is a schematic block diagram of a log-based transaction anomaly detection method according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
In a specific embodiment, the invention provides a log-based transaction anomaly detection method with the following modules:
1. Log collection: the operation logs belonging to the same transaction are collected into one transaction log. As shown in the figure, one transaction log consists of several operation logs (the transaction log shown in the figure consists of 8 operation logs).
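The collection step can be illustrated with a minimal sketch. The patent does not say how an operation log is associated with its transaction, so the field names `txn_id`, `ts`, and `message` below are assumptions made only for illustration.

```python
from collections import defaultdict

def collect_transaction_logs(operation_logs):
    """Group operation logs by transaction ID, in time order within each group.

    The keys "txn_id", "ts", and "message" are hypothetical field names.
    """
    grouped = defaultdict(list)
    for entry in sorted(operation_logs, key=lambda e: e["ts"]):
        grouped[entry["txn_id"]].append(entry["message"])
    return dict(grouped)

logs = [
    {"ts": 2, "txn_id": "T1", "message": "debit account 42"},
    {"ts": 1, "txn_id": "T1", "message": "open session 7"},
    {"ts": 3, "txn_id": "T2", "message": "open session 8"},
]
transaction_logs = collect_transaction_logs(logs)
```

Each value in `transaction_logs` plays the role of one transaction log: the ordered list of operation logs of a single transaction.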
2. Log template clustering: log templates are mined from the operation logs of the transaction logs using a log parsing algorithm. Ideally, one log template corresponds to one log print statement of the software system. As shown in the figure, a log template is obtained by keeping the constant part of an operation log and replacing its variable parts with wildcards, where the constant and variable parts correspond to the hard-coded part and the variable part of the log print statement. Many log parsing algorithms have been published, including but not limited to Spell, LogCluster, and Drain.
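To make the constant/wildcard idea concrete, here is a deliberately simplified stand-in for a log parsing algorithm. It is not Spell, LogCluster, or Drain: it assumes that any token containing a digit is a variable, which real parsers do not rely on. The wildcard marker `<*>` is also an assumption.

```python
import re

WILDCARD = "<*>"  # hypothetical wildcard marker

def to_template(message):
    """Replace every whitespace-separated token that contains a digit with a wildcard."""
    return " ".join(
        WILDCARD if re.search(r"\d", tok) else tok for tok in message.split()
    )

def mine_templates(messages):
    """Return the distinct templates in first-seen order."""
    templates = []
    for msg in messages:
        t = to_template(msg)
        if t not in templates:
            templates.append(t)
    return templates

messages = ["open session 7", "debit account 42 amount 10.5", "open session 8"]
templates = mine_templates(messages)
```

The two "open session" messages collapse into one template, which is the behavior expected when both come from the same log print statement.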
3. Transaction template clustering: after log template clustering, the operation logs in a transaction log can be mapped to the mined log templates; as shown in the figure, the 8 operation logs of the transaction log correspond to 5 log templates. Transaction template clustering first maps every operation log in a transaction log to its log template, then deduplicates and sorts the result to obtain the transaction template of that transaction log. That is, the content of a transaction template is all the log templates (deduplicated and sorted) that a transaction log involves, meaning the operations (log templates) a transaction must go through; the transaction template is an abstract representation of the different operations required by different transaction types. Because a transaction may execute some operations several times, deduplication and sorting are needed when building this abstract representation.
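A minimal sketch of the deduplicate-and-sort step, assuming each log template has already been given an integer ID (the ID scheme and the sample values are illustrative, not from the patent):

```python
def transaction_template(template_ids):
    """Deduplicate the log template IDs of one transaction log and sort them."""
    return tuple(sorted(set(template_ids)))

# Template ID of each of the 8 operation logs in one transaction log (made-up values).
ids = [0, 1, 1, 2, 3, 3, 4, 2]
tt = transaction_template(ids)

# A second transaction that repeats operations differently maps to the same template.
tt_other = transaction_template([0, 1, 2, 2, 2, 3, 4])
```

Returning a tuple makes the transaction template hashable, so it can serve as a dictionary key when transaction logs are grouped by template.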
4. Feature extraction: data features related to the log templates and the transaction templates are extracted from the log data.
a) Features related to a log template
(1) Variable extraction: the value at each variable position of the log template is extracted from the original operation log. For example, if a log template has three variables (three wildcard positions), a variable sequence p = ["value1", "value2", "value3"] can be extracted from each operation log belonging to that template, where "value1", "value2", and "value3" are the contents of the operation log at the three variable positions of the template. Stacking the variable sequences extracted from all operation logs belonging to the log template in time order gives the variable matrix P = [p1, p2, ..., pn] of the log template, where n is the number of operation logs belonging to the template.
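One possible way to extract the variable sequence p is to turn the template into a regular expression with one capture group per wildcard. This is a sketch under the assumptions that the wildcard marker is `<*>` and that each variable is a single whitespace-free token; the patent prescribes neither.

```python
import re

def template_to_regex(template, wildcard="<*>"):
    """Build a regex with one capture group per wildcard position."""
    parts = [re.escape(p) for p in template.split(wildcard)]
    return re.compile("^" + r"(\S+)".join(parts) + "$")

def variable_matrix(template, messages):
    """Stack the variable sequence p of each matching message, in the given (time) order."""
    pattern = template_to_regex(template)
    P = []
    for msg in messages:
        match = pattern.match(msg)
        if match:
            P.append(list(match.groups()))
    return P

P = variable_matrix(
    "transfer <*> from <*> to <*>",
    ["transfer 100 from A1 to B2", "transfer 5 from C3 to A1"],
)
```

Each row of `P` is one variable sequence p, and each column holds all observed values of one variable position.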
b) Data features related to a transaction template:
(1) The log template count vector of the transaction template dimension, subsequently called the template count vector v. One transaction log corresponds to one v. As shown in the figure, the transaction template of the transaction log has 5 log templates and the template count vector v of the transaction log is [2, 1]; each element of v is the number of operation logs in the transaction log that correspond to one log template. Stacking the template count vectors v of all transaction logs belonging to a transaction template gives the template count matrix V = [v1, v2, v3, ..., vn].
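The count vector can be sketched as follows. The template IDs and the two sample transaction logs are made up for illustration and are not the values from the patent figure.

```python
from collections import Counter

def count_vector(template_ids, transaction_template):
    """Number of operation logs per log template, in transaction-template order."""
    counts = Counter(template_ids)
    return [counts[t] for t in transaction_template]

tt = (0, 1, 2, 3, 4)                                 # 5 log templates
v1 = count_vector([0, 1, 1, 2, 3, 3, 4, 2], tt)      # 8 operation logs
v2 = count_vector([0, 1, 2, 3, 4], tt)               # same template, no repeats
V = [v1, v2]                                         # template count matrix
```

The entries of each v sum to the number of operation logs in that transaction log, which is a quick sanity check on the mapping.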
(2) The log template order adjacency matrix of the transaction template dimension, subsequently called the adjacency matrix m. m is an n x n two-dimensional matrix, where n is the number of log templates contained in the transaction template. One transaction log corresponds to one m, and m[i, j] is the number of pairs of consecutive operation logs in the transaction log whose log templates are, respectively, the log templates with subscripts i and j. Taking the transaction log in the figure as an example, it contains 8 operation logs (log1, log2, log3, log4, log5, log6, log7, log8), whose 2-gram data are [(log1, log2), (log2, log3), ..., (log7, log8)]; m[i, j] is the number of 2-grams that correspond to (i, j), where i and j are subscripts of log templates in the transaction template.
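The 2-gram counting described above can be sketched with plain lists. The sample template ID sequence is illustrative only.

```python
def adjacency_matrix(template_ids, transaction_template):
    """Count 2-grams of consecutive operation logs by their log template subscripts."""
    index = {t: k for k, t in enumerate(transaction_template)}
    n = len(transaction_template)
    m = [[0] * n for _ in range(n)]
    for a, b in zip(template_ids, template_ids[1:]):
        m[index[a]][index[b]] += 1
    return m

ids = [0, 1, 1, 2, 3, 3, 4, 2]          # 8 operation logs, hence 7 2-grams
m = adjacency_matrix(ids, (0, 1, 2, 3, 4))
```

A transaction log with 8 operation logs always yields 7 2-grams, so the entries of m sum to 7.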
(3) The transaction template index: a time window is set, and the number of transaction logs corresponding to each transaction template in each time window is recorded to build the transaction template index. That is, each transaction template has one template index, a metric that records the number of original transaction logs matching that transaction template.
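A sketch of the windowed count, assuming fixed, non-overlapping windows and integer timestamps in seconds; the patent does not specify the window type, and the template names below are placeholders.

```python
from collections import Counter

def template_index(events, window_seconds):
    """Count transaction logs per (transaction template, window number).

    events: iterable of (timestamp_in_seconds, transaction_template_name).
    """
    counts = Counter()
    for ts, template in events:
        counts[(template, ts // window_seconds)] += 1
    return counts

events = [(5, "pay"), (30, "pay"), (61, "pay"), (70, "refund")]
idx = template_index(events, window_seconds=60)
```

Reading the counts of one template across consecutive windows gives the time series that the dynamic threshold model is later trained on.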
5. Anomaly detection, which is divided into two parts: model training and online anomaly detection.
a) Model training: a multidimensional unsupervised algorithm model is trained from training data. The training data is obtained by collecting all normal log data of the software system over a historical period and processing it with the method described in module 1 (log collection).
(1) The template library, which comprises a log template library and a transaction template library. All log templates and transaction templates obtained by analyzing the training data with the methods described in module 2 (log template clustering) and module 3 (transaction template clustering) form the template library. The template library characterizes which normal operations exist while the system runs stably (represented by log templates) and which normal transaction types exist while the system runs stably (represented by transaction templates).
(2) The enumerated variable model. After all log templates are extracted from the training data, the variable matrix P of each log template is extracted from the training data by the variable extraction method described in module 4 (feature extraction), and enumerated variable mining is then performed. An enumerated variable is a variable whose number of distinct values is smaller than a specified threshold T (for example, 10). Specifically, each log template is traversed and its P is used to decide whether each of its variables is an enumerated variable: each column of P is traversed, and if the number of values in a column after deduplication is smaller than the threshold T, the log template variable corresponding to that column is considered an enumerated variable. Finally, enumerated variable modeling is performed, comprising enumerated value set modeling and low-frequency enumerated value modeling. The enumerated value set is the set obtained by deduplicating the values of the enumerated variable in the training set. For low-frequency enumerated value modeling, the frequency of each enumerated value in the training data is calculated first, and a value whose frequency is smaller than a set threshold Tc is considered a low-frequency enumerated value.
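The enumerated variable mining and modeling steps might look like the sketch below. It assumes that "frequency" means relative frequency within the column; the patent does not say whether Tc is a relative or an absolute count, and the sample data are invented.

```python
from collections import Counter

def mine_enum_variables(P, T, Tc):
    """Return {column: (value_set, low_freq_values)} for enumerated columns of P.

    A column is enumerated if it has fewer than T distinct values.
    A value is low-frequency if its relative frequency is below Tc (assumption).
    """
    model = {}
    n_rows = len(P)
    for col in range(len(P[0])):
        counts = Counter(row[col] for row in P)
        if len(counts) < T:
            low = {v for v, c in counts.items() if c / n_rows < Tc}
            model[col] = (set(counts), low)
    return model

# Column 0 is a status code, column 1 is a unique request ID.
P = [["OK", f"id{k}"] for k in range(9)] + [["RETRY", "id9"]]
model = mine_enum_variables(P, T=3, Tc=0.2)
```

Only the status column is kept: it has two distinct values, while the ID column has ten and is therefore not treated as an enumerated variable.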
(3) The transaction template count vector model. The template count matrix V of each transaction template is extracted from the training data by the method described in module 4 (feature extraction). V[:, i] denotes the i-th column of V, u_i = Unique(V[:, i]) denotes the list of discrete values (e.g. [1, 2, 3]) obtained by deduplicating the i-th column of V, and len(u_i) is the number of values in u_i. From V, U = [u_0, u_1, ..., u_m] is computed, where m is the number of log templates contained in the transaction template. Finally, every list in U whose length exceeds a threshold Tu is replaced with the empty list, i.e. if len(U[i]) > Tu, then U[i] = []. The resulting U is the template count vector model of the transaction template; it represents the confidence interval of the number of executions of each operation (log template) in a given transaction type (transaction template) while the system runs stably.
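The computation of U can be sketched directly from the definitions above. The matrix V and the threshold value are made-up sample data.

```python
def count_vector_model(V, Tu):
    """Per-column list of distinct counts; lists longer than Tu become empty."""
    U = []
    for i in range(len(V[0])):
        u_i = sorted({row[i] for row in V})   # Unique(V[:, i])
        U.append([] if len(u_i) > Tu else u_i)
    return U

V = [
    [1, 2, 1],
    [1, 3, 2],
    [1, 2, 3],
    [1, 2, 4],
]
U = count_vector_model(V, Tu=3)
```

The third column takes four distinct values, more than Tu, so its entry in U is emptied; the text does not spell out what an empty list means at detection time, and a natural reading is that such a column is too variable to constrain.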
(4) The transaction template operation order model. The list of adjacency matrices mL = [m_1, m_2, ..., m_n] of the transaction logs belonging to each transaction template is extracted from the training data by the method described in module 4 (feature extraction), where n is the number of transaction logs belonging to the transaction template and m_i is the adjacency matrix of one transaction log. The n adjacency matrices are summed by matrix addition and the sum is normalized by row to obtain M. M is the transaction template operation order model; it is the probability transition matrix of a Markov chain and gives the probability distribution of the operation (log template) executed after each operation.
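Summing the adjacency matrices and normalizing by row can be sketched as below. Leaving an all-zero row as zeros (an operation never followed by another in training) is an assumption; the patent does not address that case.

```python
def transition_matrix(mL):
    """Sum the adjacency matrices element-wise and normalize each row to sum to 1."""
    n = len(mL[0])
    total = [[sum(m[i][j] for m in mL) for j in range(n)] for i in range(n)]
    M = []
    for row in total:
        s = sum(row)
        # Rows with no observed transitions stay all-zero (assumption).
        M.append([x / s if s else 0.0 for x in row])
    return M

mL = [
    [[0, 2], [1, 0]],
    [[1, 1], [0, 0]],
]
M = transition_matrix(mL)
```

Here the summed matrix is [[1, 3], [1, 0]], so after operation 0 the next operation is 0 with probability 0.25 and 1 with probability 0.75.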
(5) The transaction template index dynamic threshold model. The template index of each transaction template is extracted from the training data by the method of module 4 (feature extraction), and an index dynamic threshold model is then trained for each transaction template from its template index. The specific dynamic threshold model is not limited; examples are Facebook's open-source Prophet algorithm and the 3-sigma algorithm.
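As one of the options named above, a 3-sigma band can be sketched with the standard library. This version fits a single static band over the whole history using the population standard deviation; refitting over a sliding window to make the threshold dynamic is left out, and both choices are assumptions rather than details from the patent.

```python
from statistics import mean, pstdev

def three_sigma_bounds(series):
    """Return (low, high) = mean -/+ 3 population standard deviations."""
    mu = mean(series)
    sigma = pstdev(series)
    return mu - 3 * sigma, mu + 3 * sigma

# Transaction logs per time window for one transaction template (made-up values).
history = [10, 12, 11, 9, 10, 12, 11, 9]
low, high = three_sigma_bounds(history)

def index_alarm(value):
    """True if the observed index value falls outside the fitted band."""
    return not (low <= value <= high)
```

With this history, a window containing 30 transaction logs falls outside the band while a window containing 10 does not.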
b) Online anomaly detection
(1) Data access: online log data is ingested in real time through components such as Kafka and then processed in the manner described in module 1 (log collection).
(2) Template extraction: the operation log templates and transaction templates of the real-time logs are extracted in the manner of modules 2 and 3 and matched against the template library obtained in the training stage; a template that matches nothing in the template library is identified as a new template (a new log template or a new transaction template).
(3) Feature extraction: the log template variable sequence of the real-time log, the log template count vector of the transaction template dimension, the log template order adjacency matrix of the transaction template dimension, and the transaction template index value are extracted in the manner described in module 4 (feature extraction).
(4) Anomaly detection:
(a) New log template: when a new log template appears, a new-log-template alarm is triggered.
(b) New transaction template: when a new transaction template appears, a new-transaction-template alarm is triggered.
(c) Enumerated variable anomaly: check whether the value of an enumerated variable is in the enumerated value set modeled for that variable in the training stage. If it is not, an enumerated variable anomaly is triggered. If it is, check whether the value is a low-frequency enumerated value; if so, an enumerated variable anomaly is triggered with a lower alarm level than the former.
(d) Operation count anomaly: using the log template count vector, check whether the count of each operation (log template) of the transaction template in the real-time log lies within the confidence interval of the corresponding log template of that transaction template; if not, an alarm is triggered.
(e) Operation order anomaly: compare the adjacency matrix m of the real-time log with the order model M obtained in the training stage. One specific criterion is to trigger an alarm if there exist i, j such that m[i, j] is not equal to 0 and M[i, j] = 0.
(f) Transaction template index anomaly: use the index dynamic threshold model trained in the training stage to check whether the transaction template index value extracted from the real-time log lies within the dynamic threshold range fitted by the model; if not, an alarm is triggered.
The 6 anomaly detection modes above correspond to the 6 abstract transaction-level fault types in 1.1 as follows: modes 1 and 2 correspond to transactions that lack necessary operations or contain extra irrelevant operations, mode 3 to transaction operation enumerated variable anomalies, mode 4 to anomalies in the number of times a transaction operation is executed, mode 5 to anomalies in the execution order of transaction operations, and mode 6 to transaction frequency anomalies.
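Three of the checks above (enumerated variable, operation count, operation order) reduce to simple lookups against the trained models, sketched here. The alarm level names "high" and "low" are hypothetical, and treating an empty list in U as "no constraint" is an assumption, since the text does not state how an emptied entry is handled.

```python
def enum_alarm(value, value_set, low_freq_values):
    """Return "high" for an unseen value, "low" for a low-frequency value, else None."""
    if value not in value_set:
        return "high"
    if value in low_freq_values:
        return "low"
    return None

def count_alarm(v, U):
    """True if any count is outside its allowed list; empty lists are unconstrained (assumption)."""
    return any(bool(u) and c not in u for c, u in zip(v, U))

def order_alarm(m, M):
    """True if the real-time log contains a transition never seen in training."""
    n = len(m)
    return any(m[i][j] != 0 and M[i][j] == 0 for i in range(n) for j in range(n))

U = [[1], [2, 3], []]
M = [[0.25, 0.75], [1.0, 0.0]]
```

For example, `order_alarm([[0, 1], [0, 1]], M)` fires because the real-time log contains the transition 1 to 1, whose trained probability M[1][1] is 0.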
The invention and its embodiments have been described above. The description is not limiting, and the drawing shows only one embodiment of the invention; the actual structure is not limited to it. In summary, similar structures and embodiments that a person skilled in the art, inspired by the invention, devises without creative effort and without departing from its spirit shall fall within the protection scope of the invention.

Claims (6)

1. A log-based transaction anomaly detection method, characterized in that the method comprises the following stages:
(1) Log collection: operation logs belonging to the same transaction are gathered into one transaction log;
(2) Log template clustering: log templates are mined from the operation logs of the transaction logs using a log parsing algorithm; ideally, one log template corresponds to one log printing statement of the software system; a log template is obtained by keeping the constant part of an operation log and replacing its variable part with wildcards, the constant and variable parts corresponding to the fixed and variable parts of the log printing statement in the system;
(3) Transaction template clustering: after log template clustering, each operation log in a transaction log can be mapped to one of the mined log templates, for example the 8 operation logs of one transaction log correspond to 5 log templates; transaction template clustering first maps every operation log in a transaction log to its log template and then reorders the result to obtain the transaction template corresponding to that transaction log; the content of a transaction template is all the log templates involved in a transaction log, and it expresses which operations a transaction needs to perform, that is, a transaction template is an abstract representation of the different operations required by different transaction types; because a transaction may execute some operations repeatedly, reordering is needed when building the abstract representation;
(4) Feature extraction: data features related to the log templates and the transaction templates are extracted from the log data;
(5) Anomaly detection, which is divided into two parts: model training and online anomaly detection.
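Stages (2) and (3) can be illustrated with a minimal sketch. It assumes the log templates are already known (in the method they are mined by a log parsing algorithm), uses `<*>` as the wildcard, and interprets "reordering" as deduplicating the matched template indices and sorting them. The template strings, log lines, and function names are all hypothetical.

```python
import re

# Hypothetical log templates; "<*>" marks a variable position.
LOG_TEMPLATES = [
    "user <*> logged in",
    "order <*> created for user <*>",
    "payment of <*> accepted",
]

def template_to_regex(template):
    """Keep the constant parts literally and let each wildcard match one token."""
    parts = [re.escape(part) for part in template.split("<*>")]
    return re.compile("^" + r"(\S+)".join(parts) + "$")

PATTERNS = [template_to_regex(t) for t in LOG_TEMPLATES]

def match_template(line):
    """Return the index of the first template matching the line, or None."""
    for idx, pattern in enumerate(PATTERNS):
        if pattern.match(line):
            return idx
    return None

def transaction_template(transaction_log):
    """Map each operation log to a template, then drop repeats and sort (assumption)."""
    ids = [match_template(line) for line in transaction_log]
    return tuple(sorted({i for i in ids if i is not None}))

transaction_log = [
    "user alice logged in",
    "order 17 created for user alice",
    "order 18 created for user alice",
    "payment of 30.00 accepted",
]
mapped = [match_template(line) for line in transaction_log]
template = transaction_template(transaction_log)
print(mapped)    # [0, 1, 1, 2]
print(template)  # (0, 1, 2)
```

Two transaction logs that repeat an operation a different number of times map to the same tuple, which is what allows them to share one transaction template.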
2. The log-based transaction anomaly detection method according to claim 1, wherein the data features related to the log template in step (4) are as follows:
variable extraction, that is, extracting from the original operation log the value at each variable position of the log template; for example, if a log template has three variables (positions marked by wildcards), then from each operation log belonging to that log template a variable sequence p = ["value1", "value2", "value3"] can be extracted, where "value1", "value2" and "value3" are the contents of the three variable positions in the operation log. The variable sequences extracted from all operation logs corresponding to the log template are stacked in time order to obtain the variable matrix P = [p1, p2, ..., pn] of the log template, where n is the number of operation logs corresponding to the template.
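The variable extraction of this claim can be sketched as follows, assuming a template with three `<*>` wildcards and operation logs that carry a sortable timestamp. The template text, the log lines, and the helper names are hypothetical; P is represented as a list of variable sequences, one per operation log.

```python
import re

# Hypothetical log template with three variable positions.
TEMPLATE = "order <*> created for user <*> in region <*>"

def template_to_regex(template):
    """Turn each wildcard into a capture group that matches one token."""
    parts = [re.escape(part) for part in template.split("<*>")]
    return re.compile("^" + r"(\S+)".join(parts) + "$")

def extract_variables(pattern, line):
    """Return the variable sequence p of one operation log, or None if it does not match."""
    match = pattern.match(line)
    return list(match.groups()) if match else None

def variable_matrix(pattern, timed_logs):
    """Stack the variable sequences of all matching logs in time order to form P."""
    P = []
    for _, line in sorted(timed_logs, key=lambda item: item[0]):
        p = extract_variables(pattern, line)
        if p is not None:
            P.append(p)
    return P

pattern = template_to_regex(TEMPLATE)
timed_logs = [
    (3, "order 19 created for user bob in region eu"),
    (1, "order 17 created for user alice in region eu"),
    (2, "order 18 created for user alice in region us"),
]
P = variable_matrix(pattern, timed_logs)
print(P)
```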
3. The log-based transaction anomaly detection method according to claim 1, wherein the data features related to the transaction template in step (4) are as follows:
(1) The log template count vector in the transaction template dimension, hereinafter called the template count vector v: one transaction log corresponds to one v; for the transaction log shown in the figure, whose transaction template has 5 log templates, the template count vector v is [2, 1]; v gives the number of operation logs corresponding to each log template in the transaction log; stacking the template count vectors v of all transaction logs corresponding to a transaction template gives the template count matrix V = [v1, v2, v3, ..., vn];
(2) The log template order adjacency matrix in the transaction template dimension, hereinafter called the adjacency matrix m: m is an n x n two-dimensional matrix, where n is the number of log templates contained in the transaction template; one transaction log corresponds to one m, and m[i, j] is the number of pairs of consecutive operation logs in the transaction log whose log templates are, respectively, the log templates with subscripts i and j. Taking the transaction log in the figure above as an example, it contains 8 operation logs (log1, log2, log3, log4, log5, log6, log7, log8), whose 2-gram data are [(log1, log2), (log2, log3), ..., (log7, log8)]; m[i, j] is the number of these 2-grams that correspond to (i, j), where i and j are subscripts of log templates in the transaction template;
(3) The transaction template index: a time window is set, and the number of transaction logs corresponding to each transaction template in each time window is recorded to build the transaction template index; that is, each transaction template has one template index that records the number of original transaction logs corresponding to it.
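The three features of this claim can be sketched in a few lines, assuming each operation log has already been mapped to the subscript of its log template within the transaction template. The sequence `ids`, the event list, and the function names are hypothetical illustration data, not values from the figure.

```python
from collections import Counter

def count_vector(template_ids, n_templates):
    """v[i] = number of operation logs mapped to log template i."""
    v = [0] * n_templates
    for t in template_ids:
        v[t] += 1
    return v

def adjacency_matrix(template_ids, n_templates):
    """m[i][j] = number of 2-grams (consecutive logs) mapped to templates (i, j)."""
    m = [[0] * n_templates for _ in range(n_templates)]
    for i, j in zip(template_ids, template_ids[1:]):
        m[i][j] += 1
    return m

def template_index(events, window_seconds):
    """Count transaction logs per (transaction template, time window).

    events: (timestamp_seconds, transaction_template_id) pairs.
    """
    return Counter((tid, int(ts // window_seconds)) for ts, tid in events)

# Hypothetical transaction log: 8 operation logs mapped to 5 log templates.
ids = [0, 1, 1, 2, 3, 3, 4, 0]
v = count_vector(ids, 5)
m = adjacency_matrix(ids, 5)
index = template_index([(5, "A"), (30, "A"), (65, "A"), (70, "B")], 60)

print(v)        # [2, 2, 1, 2, 1]
print(m[1])     # [0, 1, 1, 0, 0]
print(index[("A", 0)])  # 2
```

The entries of v sum to the number of operation logs (8 here), and the entries of m sum to the number of 2-grams (7 here), which gives a quick consistency check on both features.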
4. The log-based transaction anomaly detection method according to claim 1, wherein the model training in step (5) trains a multidimensional unsupervised algorithm model from training data, the training data being obtained by collecting all normal log data of the software system over a historical period and processing it with module 1, that is, the method described under log collection; the model comprises the following parts:
(1) Template library: it comprises a log template library and a transaction template library; all log templates and transaction templates obtained by parsing the training data with the methods described under log template clustering and transaction template clustering form the template library. The template library represents which normal operations and which normal transaction types exist while the system is running stably;
(2) Enumerated variable model: after all log templates have been extracted from the training data, the variable matrix P of each log template is extracted from the training data using the variable extraction method described in module 4 (feature extraction), and enumerated variable mining is then performed. An enumerated variable is a variable whose number of distinct values is smaller than a specified threshold T. Enumerated variable mining traverses each log template and uses its P to decide whether each variable of the log template is an enumerated variable; specifically, each row of P is traversed, and if the number of distinct values in a row after deduplication is smaller than the threshold T, the log template variable corresponding to that row is considered an enumerated variable. Finally, enumerated variable modeling is performed, comprising enumerated value set modeling and low-frequency enumerated value modeling: the enumerated value set is the set obtained by deduplicating the values that the enumerated variable takes in the training set; low-frequency enumerated value modeling first computes the frequency of each enumerated value of an enumerated variable in the training data, and values whose frequency is smaller than a set threshold Tc are regarded as low-frequency enumerated values;
(3) Transaction template count vector model: the template count matrix V of each transaction template is extracted from the training data using the method described under feature extraction, where V[:, i] denotes the i-th column of V, u_i = Unique(V[:, i]) denotes the list of discrete values (e.g. [1, 2, 3]) obtained by deduplicating the i-th column of V, and len(u_i) denotes the number of entries in u_i. From V, U = [u_0, u_1, ..., u_m] is then obtained, where m is the number of log templates contained in the transaction template. Finally, every entry of U whose length exceeds a threshold Tu is replaced by an empty list, that is, if len(U[i]) > Tu then U[i] = []. The resulting U is the template count vector model of the transaction template; it represents the confidence interval of the number of executions of each operation in a given transaction type while the system is running stably;
(4) Transaction template operation sequence model: the list of adjacency matrices mL = [m_1, m_2, ..., m_n] of the transaction logs corresponding to each transaction template is extracted from the training data using the method described under feature extraction, where n is the number of transaction logs corresponding to the transaction template and m_i is the adjacency matrix of one specific transaction log. The n adjacency matrices are then summed by matrix addition and the sum is normalized by row to obtain M. M is the transaction template operation sequence model; it is the probability transition matrix of a Markov chain and represents the probability distribution of the operation executed after each operation;
(5) Transaction template index dynamic threshold model: the template index of each transaction template is extracted from the training data using the method described under feature extraction, and an index dynamic threshold model is then trained for each transaction template from its template index. The index dynamic threshold model is not restricted to a particular algorithm; examples are Facebook's open-source Prophet algorithm and the 3-sigma algorithm.
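Parts (2) to (4) of the training step can be sketched as below. The sketch makes several assumptions that the claim leaves open: P is held as a list of variable sequences (so a variable position is a column of that list), "frequency" in the low-frequency rule is read as relative frequency, and rows of the summed adjacency matrix that contain only zeros are left as zeros. All function names and sample data are hypothetical.

```python
from collections import Counter

def mine_enumerated_variables(P, T, Tc):
    """Return {position: (value_set, low_frequency_values)} for enumerated positions.

    A position is enumerated when it has fewer than T distinct values; a value is
    low-frequency when its relative frequency is below Tc (assumed interpretation).
    """
    model = {}
    n_positions = len(P[0]) if P else 0
    for pos in range(n_positions):
        freq = Counter(p[pos] for p in P)
        if len(freq) < T:
            low = {val for val, c in freq.items() if c / len(P) < Tc}
            model[pos] = (set(freq), low)
    return model

def count_vector_model(V, Tu):
    """Build U: the distinct counts per log template, emptied when more than Tu."""
    U = []
    for i in range(len(V[0])):
        u_i = sorted({v[i] for v in V})
        U.append([] if len(u_i) > Tu else u_i)
    return U

def sequence_model(mL):
    """Sum the adjacency matrices and normalize each row to get M."""
    n = len(mL[0])
    total = [[sum(m[i][j] for m in mL) for j in range(n)] for i in range(n)]
    M = []
    for row in total:
        s = sum(row)
        M.append([x / s if s else 0.0 for x in row])
    return M

P = [["GET", "17"], ["GET", "18"], ["GET", "19"], ["POST", "20"]]
enum_model = mine_enumerated_variables(P, T=3, Tc=0.3)

V = [[1, 2, 1], [1, 3, 5], [1, 2, 7], [1, 4, 9]]
U = count_vector_model(V, Tu=3)

mL = [[[0, 2], [1, 0]], [[1, 1], [1, 0]]]
M = sequence_model(mL)

print(enum_model)
print(U)  # [[1], [2, 3, 4], []]
print(M)  # [[0.25, 0.75], [1.0, 0.0]]
```

In this sample, only the first variable position is enumerated (two distinct values), and the third log template's count takes four distinct values, which exceeds Tu and so yields an empty list in U.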
5. The log-based transaction anomaly detection method according to claim 1, wherein the online anomaly detection part in step (5) comprises:
(1) Data access: online log data is accessed in real time through components such as kafka, and the data is then processed in the manner described in module 1 (data access).
(2) Template extraction: the operation log templates and transaction templates of the real-time log are extracted in the manner of modules 2 and 3 and are matched against the template library obtained in the training phase; any template that cannot be matched to the template library is identified as a new template (a new log template or a new transaction template).
(3) Feature extraction: the log template variable sequences, the log template count vector in the transaction template dimension, the log template order adjacency matrix in the transaction template dimension, and the transaction template index value of the real-time log are extracted in the manner described in module 4 (feature extraction).
(4) Anomaly detection.
6. The log-based transaction anomaly detection method according to claim 5, wherein the anomaly detection in step (4) comprises the following parts:
(1) New log template: if a new log template appears, a new-log-template alarm is triggered;
(2) New transaction template: if a new transaction template appears, a new-transaction-template alarm is triggered;
(3) Enumeration variable anomaly: it is checked whether the value of an enumeration variable is in the enumeration value set modeled for that variable in the training phase; if it is not, an enumeration variable anomaly alarm is triggered; if it is, it is determined whether the value is a low-frequency enumeration value, and if so, an enumeration variable anomaly alarm with a lower alarm level than the former is triggered;
(4) Operation count anomaly: using the log template count vector, it is determined whether the number of executions of each operation of the transaction template in the real-time log lies within the confidence interval of the corresponding log template in the model of that transaction template; if it does not, an alarm is triggered;
(5) Operation order anomaly: the adjacency matrix m of the real-time log is compared with the sequence model M obtained in the training phase. Specifically, an alarm is triggered if there exist i and j such that m[i, j] is not equal to 0 while M[i, j] = 0;
(6) Transaction template index anomaly: the index dynamic threshold model obtained in the training phase is applied to check whether the transaction template index value extracted from the real-time log lies within the dynamic threshold range fitted by the model; if it does not, an alarm is triggered.
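Checks (1) to (5) of this claim can be sketched as a single function. The dictionary layout, the alarm names, and the sample data are hypothetical. The sketch also assumes that an empty list in U means "no constraint", that the confidence interval is tested as membership in the list of counts seen in training, and that the remaining checks are skipped when a template is new, since no trained model exists for it. Check (6) is left out because the claim leaves the choice of threshold algorithm open.

```python
def detect_anomalies(obs, model):
    """Return the list of alarms raised for one real-time transaction log."""
    alarms = []
    # (1) and (2): templates absent from the template library.
    if any(t not in model["log_templates"] for t in obs["log_templates"]):
        alarms.append("new_log_template")
    if obs["transaction_template"] not in model["transaction_templates"]:
        alarms.append("new_transaction_template")
    if alarms:
        return alarms  # assumption: no trained model to compare against
    # (3): enumeration variable values, with a lower-level alarm for rare values.
    for key, value in obs["enum_values"].items():
        values, low_freq = model["enum"][key]
        if value not in values:
            alarms.append("enum_value_unseen")
        elif value in low_freq:
            alarms.append("enum_value_low_frequency")
    # (4): operation counts against U; an empty list is treated as unconstrained.
    if any(allowed and count not in allowed
           for count, allowed in zip(obs["v"], model["U"])):
        alarms.append("operation_count")
    # (5): a transition that occurs now but never occurred in training.
    m, M = obs["m"], model["M"]
    n = len(m)
    if any(m[i][j] != 0 and M[i][j] == 0 for i in range(n) for j in range(n)):
        alarms.append("operation_order")
    return alarms

model = {
    "log_templates": {0, 1, 2},
    "transaction_templates": {(0, 1, 2)},
    "enum": {(0, 0): ({"GET", "POST"}, {"POST"})},  # key: (log template, position)
    "U": [[1], [1, 2], []],
    "M": [[0.0, 1.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 0.0]],
}
normal = {
    "log_templates": [0, 1, 1, 2], "transaction_template": (0, 1, 2),
    "enum_values": {(0, 0): "GET"}, "v": [1, 2, 1],
    "m": [[0, 1, 0], [0, 1, 1], [0, 0, 0]],
}
odd = {
    "log_templates": [0, 2, 1], "transaction_template": (0, 1, 2),
    "enum_values": {(0, 0): "POST"}, "v": [1, 1, 1],
    "m": [[0, 0, 1], [0, 0, 0], [0, 1, 0]],
}
print(detect_anomalies(normal, model))  # []
print(detect_anomalies(odd, model))
```

The second sample raises the low-frequency enumeration alarm and the operation order alarm, because the transition from template 0 to template 2 has probability 0 in M.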
CN202210826059.0A 2022-07-13 2022-07-13 Transaction abnormity detection method based on log Pending CN115269314A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210826059.0A CN115269314A (en) 2022-07-13 2022-07-13 Transaction abnormity detection method based on log

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210826059.0A CN115269314A (en) 2022-07-13 2022-07-13 Transaction abnormity detection method based on log

Publications (1)

Publication Number Publication Date
CN115269314A true CN115269314A (en) 2022-11-01

Family

ID=83765339

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210826059.0A Pending CN115269314A (en) 2022-07-13 2022-07-13 Transaction abnormity detection method based on log

Country Status (1)

Country Link
CN (1) CN115269314A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115658441A (en) * 2022-12-13 2023-01-31 济南丽阳神州智能科技有限公司 Method, equipment and medium for monitoring abnormality of household service system based on log
CN116414610A (en) * 2023-06-12 2023-07-11 建信金融科技有限责任公司 Method, device, equipment and storage medium for acquiring abnormal log fragments
CN116414610B (en) * 2023-06-12 2024-03-29 建信金融科技有限责任公司 Method, device, equipment and storage medium for acquiring abnormal log fragments
CN116755847A (en) * 2023-08-17 2023-09-15 北京遥感设备研究所 Log pre-analysis and transaction management method for relieving lock conflict
CN116755847B (en) * 2023-08-17 2023-11-14 北京遥感设备研究所 Log pre-analysis and transaction management method for relieving lock conflict
CN117667497A (en) * 2024-01-31 2024-03-08 中国铁道科学研究院集团有限公司通信信号研究所 Automatic fault analysis method and system for dispatching centralized system
CN117667497B (en) * 2024-01-31 2024-04-16 中国铁道科学研究院集团有限公司通信信号研究所 Automatic fault analysis method and system for dispatching centralized system

Similar Documents

Publication Publication Date Title
US20220405592A1 (en) Multi-feature log anomaly detection method and system based on log full semantics
CN115269314A (en) Transaction abnormity detection method based on log
CN108038049B (en) Real-time log control system and control method, cloud computing system and server
CN111930903B (en) System anomaly detection method and system based on deep log sequence analysis
CN110958136A (en) Deep learning-based log analysis early warning method
CN113282461B (en) Alarm identification method and device for transmission network
CN109308411B (en) Method and system for hierarchically detecting software behavior defects based on artificial intelligence decision tree
CN103761173A (en) Log based computer system fault diagnosis method and device
Alinezhad et al. Early classification of industrial alarm floods based on semisupervised learning
CN113064873B (en) Log anomaly detection method with high recall rate
CN113326244A (en) Abnormity detection method based on log event graph and incidence relation mining
CN112380274A (en) Control process-oriented anomaly detection system
CN115858794B (en) Abnormal log data identification method for network operation safety monitoring
Li et al. Improving performance of log anomaly detection with semantic and time features based on bilstm-attention
CN114490235A (en) Algorithm model for intelligently identifying quantity relation and abnormity of log data
WO2024027487A1 (en) Health degree evaluation method and apparatus based on intelligent operations and maintenance scene
CN118070229A (en) Equipment fault early warning model and method based on multi-mode data mining
CN113093695A (en) Data-driven SDN controller fault diagnosis system
CN117938430A (en) Webshell detection method based on Bert model
CN117873839A (en) Fault detection method, device, equipment and storage medium of complex computing system
Zhu et al. A Performance Fault Diagnosis Method for SaaS Software Based on GBDT Algorithm.
CN116302984A (en) Root cause analysis method and device for test task and related equipment
CN115757062A (en) Log anomaly detection method based on sentence embedding and Transformer-XL
CN113296994B (en) Fault diagnosis system and method based on domestic computing platform
CN115659189A (en) Anomaly detection method of large-scale software system based on generation countermeasure network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination