CN117874680A - Operation and maintenance management system for fort machine - Google Patents
- Publication number
- CN117874680A CN117874680A CN202410055171.8A CN202410055171A CN117874680A CN 117874680 A CN117874680 A CN 117874680A CN 202410055171 A CN202410055171 A CN 202410055171A CN 117874680 A CN117874680 A CN 117874680A
- Authority
- CN
- China
- Prior art keywords
- model
- feature vector
- training
- rule
- representing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F18/2433 — Pattern recognition; classification techniques; single-class perspective (e.g. one-against-all classification), novelty detection, outlier detection
- G06F18/24323 — Tree-organised classifiers
- G06F18/24765 — Rule-based classification
- G06F18/259 — Fusion techniques; fusion by voting
- G06F18/27 — Regression, e.g. linear or logistic regression
- G06N20/20 — Machine learning; ensemble learning
- G06N3/0455 — Neural networks; auto-encoder networks, encoder-decoder networks
- G06N3/08 — Neural networks; learning methods
- G06N5/01 — Dynamic search techniques; heuristics; dynamic trees; branch-and-bound
- G06F2123/02 — Data types in the time domain, e.g. time-series data
- Y04S10/50 — Systems or methods supporting power network operation or management, involving interaction with the load-side end user
Abstract
The application relates to the field of operation and maintenance management of bastion hosts (fort machines), and discloses a bastion-host operation and maintenance management system that performs the following steps: S1, collecting operation log data from the bastion host; S2, preprocessing the operation log data and extracting features to obtain feature vectors; S3, training and predicting on the feature vectors with a time-series analysis model to identify temporal anomalies; S4, applying rule detection to the feature vectors with a rule model to identify behavior that violates the security policy; S5, training and predicting on the feature vectors with a machine learning model to identify abnormal behavior; S6, training on the feature vectors and computing reconstruction errors with a deep learning model to identify abnormal behavior; S7, generating a final anomaly score with a weighted voting mechanism; S8, judging the abnormal state against a set threshold. By integrating multiple detectors of different types and exploiting their complementary strengths, the invention improves the accuracy of anomaly detection.
Description
Technical Field
The invention relates to the technical field of operation and maintenance management of bastion hosts, and in particular to a bastion-host operation and maintenance management system.
Background
With the rapid development of information technology, bastion-host operation and maintenance management systems are increasingly widely used in enterprises and organizations. As an important security facility for managing and monitoring access rights to critical systems and data, the bastion host plays a key role in protecting network security. However, as network attack techniques evolve, traditional protection measures struggle to cope with new security threats and attack vectors. How to improve the security and anomaly detection capability of the bastion-host operation and maintenance management system is therefore a problem to be solved.
A bastion-host operation and maintenance management system is a security facility for managing and monitoring access rights to critical systems and data. By centrally managing user accounts and permissions, it audits and controls user access, thereby improving the security and reliability of the system. However, a single anomaly detection method often cannot meet the detection requirements posed by complex and fast-changing security threats and attack techniques.
Disclosure of Invention
To address the defects of the prior art, the invention provides a bastion-host operation and maintenance management system that improves the accuracy and robustness of anomaly detection by integrating multiple detectors of different types and exploiting their complementary strengths.
To achieve the above purpose, the invention is realized by the following technical scheme: a bastion-host operation and maintenance management method comprising the following steps:
collecting operation log data from the bastion host;
preprocessing the operation log data and extracting features to obtain feature vectors;
training and predicting on the feature vectors with a time-series analysis model to identify temporal anomalies;
applying rule detection to the feature vectors with a rule model to identify behavior that violates the security policy;
training and predicting on the feature vectors with a machine learning model to identify abnormal behavior;
training on the feature vectors and computing reconstruction errors with a deep learning model to identify abnormal behavior;
generating a final anomaly score from the prediction and detection results with a weighted voting mechanism;
and judging the abnormal state of the feature vector against a set threshold.
Preferably, the time series analysis model is an ARIMA model, with the expression:

ŷ_t = φ_1·y_{t-1} + φ_2·y_{t-2} + … + φ_p·y_{t-p} + ε_t + θ_1·ε_{t-1} + θ_2·ε_{t-2} + … + θ_q·ε_{t-q}

where ŷ_t is the predicted value of the time series at time point t; φ_1, φ_2, …, φ_p are the parameters of the autoregressive model, namely the autoregressive coefficients over the first p time points; y_{t-1}, y_{t-2}, …, y_{t-p} are the historical observations at the first p time points; θ_1, θ_2, …, θ_q are the parameters of the moving-average model, namely the moving-average coefficients over the first q time points; ε_{t-1}, ε_{t-2}, …, ε_{t-q} are the residuals of the observations at the previous q time points; and ε_t is the residual term at time point t.
Preferably, the rule model is based on predefined security rules, with the expression:

R(x) = ⋀_{i=1}^{m} ⋀_{j=1}^{k_i} r_{ij}(x)

where R(x) is the result of rule detection on the input feature vector x; m is the number of rules; k_i is the number of conditions in the i-th rule; and r_{ij}(x) is the j-th condition in the i-th rule, indicating whether the input feature vector x satisfies that condition.
Preferably, the machine learning model is a random forest model, with the expression:

f(x) = (1/B) · Σ_{b=1}^{B} f_b(x; Θ_b)

where f(x) is the prediction for the input feature vector x; B is the number of decision trees in the random forest; f_b(x; Θ_b) is the prediction of the b-th decision tree; Θ_b denotes the parameters of the b-th decision tree; and the factor 1/B averages the predictions of all decision trees.
Preferably, the deep learning model is an autoencoder model, with the expression:

L(x) = ‖x − g(f(x; θ_f); θ_g)‖²

where L(x) is the difference between the input feature vector x and the output after model reconstruction; g(f(x; θ_f); θ_g) is the reconstruction part of the deep learning model; f(x; θ_f) is the output of the encoder; θ_g denotes the parameters of the decoder; and ‖x − g(f(x; θ_f); θ_g)‖² is the square of the Euclidean distance between the input feature vector x and the reconstructed output.
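As a concrete illustration of the reconstruction-error formula, the following is a minimal numpy sketch with hypothetical, untrained linear encoder/decoder weights; in a real deployment θ_f and θ_g would be learned by training the autoencoder on normal operation logs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear encoder/decoder weights (stand-ins for theta_f, theta_g);
# a trained autoencoder would learn these from normal log data.
W_enc = rng.normal(size=(3, 8))   # encoder: 8-dim feature vector -> 3-dim code
W_dec = rng.normal(size=(8, 3))   # decoder: 3-dim code -> 8-dim reconstruction

def encode(x):
    return W_enc @ x              # f(x; theta_f)

def decode(h):
    return W_dec @ h              # g(h; theta_g)

def reconstruction_error(x):
    # L(x) = ||x - g(f(x; theta_f); theta_g)||^2 : squared Euclidean distance
    diff = x - decode(encode(x))
    return float(diff @ diff)

x = rng.normal(size=8)            # one preprocessed feature vector
err = reconstruction_error(x)
# A vector whose error is large relative to the errors seen on normal
# data would be flagged as abnormal behavior.
```

Since the model is trained only on normal behavior, anomalous inputs reconstruct poorly and therefore score a high L(x).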
Preferably, the weighted voting mechanism assigns a weight to each model according to its performance, with the expression:

S(x) = Σ_i w_i · s_i(x)

where S(x) is the final anomaly score obtained by weighted voting over the predictions of the individual models; w_i is the weight of the i-th model, which determines its contribution to the final anomaly score; and s_i(x) is the prediction of the i-th model for the input feature vector x.
Preferably, a threshold is set to judge the abnormal state of the feature vector, with the expression:

Label(x) = 1 if S(x) > τ, and 0 otherwise

where Label(x) is the abnormal-state label of feature vector x; S(x) is the anomaly score of x obtained from the weighted voting mechanism; and τ is the set threshold.
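The weighted-voting score and threshold label can be sketched in a few lines of Python; the per-detector scores, weights and threshold below are illustrative values, not outputs of the models described above:

```python
def anomaly_score(detector_scores, weights):
    # S(x) = sum_i w_i * s_i(x): weighted combination of detector outputs
    return sum(w * s for w, s in zip(weights, detector_scores))

def label(score, tau):
    # Label(x) = 1 (abnormal) if S(x) > tau, else 0 (normal)
    return 1 if score > tau else 0

# Hypothetical s_i(x) for one feature vector from the four detectors
# (time-series, rule, random-forest, autoencoder), plus assumed
# performance-based weights w_i.
scores = [0.9, 1.0, 0.7, 0.4]
weights = [0.3, 0.2, 0.3, 0.2]
S = anomaly_score(scores, weights)
print(S, label(S, tau=0.6))
```

In practice the weights would be tuned on a validation set (e.g. proportional to each detector's precision), and τ chosen to balance missed detections against false alarms.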
The invention also provides a bastion-host operation and maintenance management system, which comprises:
the data collection module is used for collecting operation logs of the bastion host;
the feature extraction module is used for preprocessing the operation log data and extracting features;
the time sequence analysis training module is used for training a time sequence analysis model;
the rule detection module is used for detecting rules;
the machine learning training module is used for training a machine learning model;
the deep learning training module is used for training a deep learning model;
the integration module is used for adopting a weighted voting mechanism;
and the abnormality judgment module is used for judging the abnormal state.
The invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method as described above when executing the computer program.
The invention also provides a storage medium having stored thereon a computer program which, when executed by a processor, implements a method as described above.
The invention provides a bastion-host operation and maintenance management system with the following beneficial effects:
1. By integrating multiple detectors of different types and exploiting their complementary strengths, the invention improves the accuracy of anomaly detection. Each detector captures a different type of abnormal behavior, reducing both missed detections and false alarms, and because different detector types are integrated, a greater variety of anomalies can be handled.
2. By deploying the integrated model into the bastion-host operation and maintenance management system, user operations and system behavior can be monitored in real time and abnormal conditions discovered promptly. A feedback loop allows the performance of the integrated model to be continuously optimized, improving the security and reliability of the system.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 is a schematic view of the structure of the device of the present invention;
FIG. 3 is a schematic diagram of a computer device according to the present invention.
100, data collection module; 200, feature extraction module; 300, time-series analysis training module; 400, rule detection module; 500, machine learning training module; 600, deep learning training module; 700, integration module; 800, abnormality determination module; 40, computer device; 41, processor; 42, memory; 43, storage medium.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, an embodiment of the present invention provides a bastion-host operation and maintenance management method comprising steps S1 to S8, as follows:
Step S1, collecting operation log data from the bastion host;
In this step, the bastion host collects the user's operation log data in the system. The operation log records key information such as the user's login information, operation commands and operation times; such data provides valuable information about user behavior and helps identify potentially abnormal behavior.
In this embodiment, the bastion host collects the user's operation log data in the system, including key information such as login information, operation commands, operation times and operation objects. This information helps characterize the user's behavior and operations in the system.
The login information includes a login account number, a login IP address, a login time, and the like of the user. By recording the login information, the login behavior of the user can be tracked, and abnormal login activities such as illegal login, failed login attempts and the like can be identified.
The operation command records specific operations performed by the user in the system, such as execution commands, accessing files, modifying configurations, and the like. By analyzing the operation command, potentially abnormal behavior may be detected, such as performing sensitive commands, override operations, etc.
The operation time records time stamp information of the user performing the operation. Through the time stamp information, the operation mode and habit of the user can be analyzed, and time abnormal behaviors such as operation at non-working time, frequent operation requests and the like are detected.
An operation object refers to a target object, such as a file, a database, a network device, etc., that a user performs an operation in a system. By recording the operation objects, the operation behaviors of the user on different objects can be analyzed, and abnormal operation targets, such as unauthorized operation objects, abnormal operation rights and the like, can be identified.
The purpose of collecting these operation log data on the bastion host is to monitor and analyze user behavior and identify potentially abnormal behavior and security risks. By preprocessing the operation log data, extracting features, and applying time-series analysis, rule models, machine learning models and deep learning models, abnormal behavior can be identified accurately and responded to promptly, improving the operation and maintenance management and security of the bastion host.
Step S2, preprocessing the operation log data and extracting features to obtain feature vectors;
In this step, the operation log data collected from the bastion host is preprocessed and features are extracted. Preprocessing includes data cleaning, noise removal, outlier handling, etc. Feature extraction converts the operation log data into feature vectors usable for model training and prediction; common features include the user ID, operation command and operation time. Preprocessing and feature extraction may use techniques such as regular expressions and time-series analysis.
In this embodiment, in the process of preprocessing the operation log data and extracting the features, the following feature vectors are considered to be extracted:
1. user characteristics:
user ID: the unique identification of the user is used as a feature to distinguish the behavior of different users.
User roles: the role information of the user in the system, such as an administrator, a general user, etc., is recorded.
2. Login characteristics:
number of logins: and counting the login times of the user and reflecting the activity degree of the user.
Number of login success/failure: and recording the success and failure times of the login operation of the user, and detecting abnormal login behaviors.
Logging in an IP address: and recording the IP address of the user when logging in, and detecting the login in a different place or abnormal IP address.
3. Operation features:
operation command: specific operation commands, such as file operations, system commands, etc., executed by the user are recorded.
The operation object is as follows: it is recorded which objects the user has performed an operation on, such as files, databases, network devices, etc.
Operating time: the time stamp of the user operation is recorded and used for analyzing the time mode and abnormal time behavior of the operation.
4. Abnormal characteristics:
non-on time operation: by judging the operation time, the operation performed in the non-working time is identified.
Frequent operations: the frequency of the user performing the same or similar operations in a short time is counted for detecting abnormal frequent operation behaviors.
Abnormal command: a set of sensitive commands or risky operation commands is defined, and it is detected whether the user has executed these commands.
These feature vectors are obtained by parsing and processing the operation log data. The preprocessing step may include data cleaning, noise removal, missing-value handling, etc. Feature extraction can use regular expressions, keyword matching or other custom rules to extract the corresponding features.
The extracted feature vectors serve as inputs for training models to learn and predict abnormal behavior. Different models can be built and analyzed for different features to improve the accuracy and robustness of anomaly detection.
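As a sketch of how such features might be pulled from a log line with regular expressions, the following assumes a hypothetical log format (`time user=… ip=… cmd=…`) and an assumed sensitive-command set; real bastion-host log formats will differ:

```python
import re
from datetime import datetime

# Hypothetical log-line format; adapt the pattern to the actual bastion-host logs.
LOG_RE = re.compile(
    r"(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"user=(?P<user>\w+) ip=(?P<ip>[\d.]+) cmd=(?P<cmd>.+)"
)

SENSITIVE = {"rm", "chmod", "useradd"}        # assumed sensitive-command set

def extract_features(line):
    """Parse one log line into a feature dict, or None if it doesn't match."""
    m = LOG_RE.match(line)
    if m is None:
        return None                           # unparseable line: drop in cleaning
    t = datetime.strptime(m["time"], "%Y-%m-%d %H:%M:%S")
    cmd = m["cmd"].split()[0]                 # command name without arguments
    return {
        "user": m["user"],
        "ip": m["ip"],
        "hour": t.hour,
        "off_hours": int(t.hour < 8 or t.hour >= 20),   # assumed working hours
        "sensitive_cmd": int(cmd in SENSITIVE),
    }

feat = extract_features("2024-01-12 23:15:02 user=alice ip=10.0.0.5 cmd=rm -rf /tmp/x")
```

Categorical fields (user, IP) would then be encoded numerically, and counts such as login failures aggregated per user over a sliding window, before the vectors are fed to the models.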
Step S3, training and predicting the feature vector based on a time sequence analysis model to identify time abnormality;
in this step, the feature vectors obtained by the preprocessing and feature extraction are trained and predicted using a time series analysis model to identify temporal anomalies. Common time series analysis models include ARIMA model, exponential smoothing, and the like. Through analysis of the historical operation log data, the time series model can capture common time anomaly modes such as seasonal changes, trend changes and the like. Identifying a time anomaly helps to discover time anomaly behavior of user operations, such as login at non-working time or frequent operation requests.
In this embodiment, the time series analysis model is an ARIMA model, and the ARIMA (Autoregressive Integrated Moving Average) model is a common time series analysis method for predicting and modeling time series data. Parameters of the ARIMA model include an autoregressive order p, a differential order d, and a moving average order q.
The expression is as follows:

ŷ_t = φ_1·y_{t-1} + … + φ_p·y_{t-p} + ε_t + θ_1·ε_{t-1} + … + θ_q·ε_{t-q}

The autoregressive part (AR) captures the linear relationship between the current observation and the first p observations, where φ_1, …, φ_p are the autoregressive coefficients and y_{t-1}, …, y_{t-p} are the historical observations at the first p time points.
The moving-average part (MA) captures the linear relationship between the current observation and the first q residuals, where θ_1, …, θ_q are the moving-average coefficients and ε_{t-1}, …, ε_{t-q} are the residuals at the first q time points.
The differencing part (I, for "integrated") handles non-stationary time series by differencing the original series until it is stationary; the differencing order d is the number of times the differencing operation is applied.
Finally, the ARIMA model fits the autoregressive, differencing and moving-average relationships in the historical data to obtain the predicted value ŷ_t of the time series at a future time point.
Specifically, in the present embodiment, the ARIMA model is used to model and predict a time series in operation log data. The method comprises the following specific steps:
1. data preparation: first, a time series in the operation log data needs to be extracted as a target variable for modeling. The time point at which the operation time is used as the time series, and the corresponding number of operations, the operation amount, and the like may be selected as the observation value.
2. Parameter selection: parameters of the ARIMA model include autoregressive order (p), differential order (d), and moving average order (q). In selecting the parameters, some common methods such as analysis of autocorrelation function (ACF) and partial autocorrelation function (PACF) and observation of characteristics such as stationarity and seasonality of time series can be used.
3. Model training: and according to the selected parameters, performing model training by using historical operation log data. During the training process, the model fits the autoregressive, differential and moving average relationships in the historical data to obtain the optimal model parameters.
4. Model prediction: a future time series is predicted using a trained ARIMA model. By inputting the observed values of the historical data, the model can generate corresponding predicted values and give corresponding confidence intervals.
5. Abnormality detection: according to the difference between the predicted value and the actual observed value, whether abnormal behavior exists or not can be judged. If the difference between the predicted value and the actual observed value exceeds a certain threshold value, it can be regarded as abnormal behavior.
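The five steps above can be sketched with the autoregressive core of the model. This toy example fits only the AR part by least squares on synthetic data with one injected anomaly; a full ARIMA(p, d, q) implementation would add the differencing and moving-average terms:

```python
import numpy as np

def fit_ar(y, p):
    """Least-squares AR(p) fit: y_t ~ phi_1*y_{t-1} + ... + phi_p*y_{t-p}."""
    X = np.column_stack([y[p - k : len(y) - k] for k in range(1, p + 1)])
    return np.linalg.lstsq(X, y[p:], rcond=None)[0]

def residuals(y, phi):
    """Prediction residuals y_t - y_hat_t for t = p .. len(y)-1."""
    p = len(phi)
    X = np.column_stack([y[p - k : len(y) - k] for k in range(1, p + 1)])
    return y[p:] - X @ phi

# Synthetic stand-in for hourly operation counts, with one injected anomaly.
y = np.sin(np.arange(200) / 5.0) + 0.05 * np.random.default_rng(1).normal(size=200)
y[150] += 3.0                        # injected burst of operations at t = 150

phi = fit_ar(y, p=3)                 # step 3: model training
res = residuals(y, phi)              # step 4: prediction on the history
# Step 5: flag points whose residual exceeds 3 sigma of the residual spread.
thresh = 3 * res.std()
anomalies = np.where(np.abs(res) > thresh)[0] + 3   # +p shifts back to y's index
```

A production system would instead use a maintained ARIMA implementation and refit periodically as new logs arrive; the 3σ threshold here is an assumed choice.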
Step S4, based on a rule model, carrying out rule detection on the feature vector to identify a behavior violating a security policy;
in this step, the feature vectors are subjected to rule detection using a rule model to identify behaviors that violate the security policy. The rule model logically judges the feature vector based on a predefined rule set. Rules may include access control rules, security policy rules, and the like. For example, a rule may define that if a user fails to log in consecutively more than a certain number of times in a short time, it is determined to be abnormal. The rule model has the advantages of interpretability and instantaneity, and can quickly identify the behavior violating the security policy.
In this embodiment, the rule model is based on predefined security rules, with the expression:

R(x) = ⋀_{i=1}^{m} ⋀_{j=1}^{k_i} r_{ij}(x)

where R(x) is the result of rule detection on the input feature vector x; m is the number of rules; k_i is the number of conditions in the i-th rule; and r_{ij}(x) is the j-th condition in the i-th rule, indicating whether the input feature vector x satisfies that condition.
Each rule in the rule model is composed of a plurality of conditions, which may be logical expressions, comparison operations, or other predefined judgment conditions. The rule model judges each rule of the input feature vector one by one, and carries out logical AND operation on the results of all the rules to finally obtain the result of overall rule detection.
Rule models are typically used to define and detect predefined security rules in order to quickly identify and respond to specific security events or abnormal behavior. Compared to other machine learning models, rule models have the advantage of strong interpretability, easy maintenance and adjustment, but may be limited by the coverage and expressive power of the rule.
Specifically, in the present embodiment, the rule model is used to detect operation log data based on predefined security rules to identify potential abnormal behavior.
The rule model may define and detect abnormal behavior according to specific security rules. Each rule is composed of a plurality of conditions, which may include login information, operation type, operation frequency, operation object, and the like. The rule model judges whether abnormal behaviors exist or not by judging each rule in the operation log data one by one and carrying out logical AND operation on results of all the rules.
The present embodiment defines the following rules to detect abnormal behavior:
rule 1: if the same user fails to login by multiple attempts within a short time (the number of login failures is greater than a threshold), then abnormal login behavior is considered to exist.
Rule 2: if a user frequently performs sensitive operations (e.g., file deletion, rights modification, etc.) in a short period of time, it is considered that there is abnormal operation behavior.
Rule 3: if a user logs into the system and performs an operation during a non-operating period, it is considered that there is an abnormal login behavior.
Rule 4: if a user attempts to access an unauthorized file or directory multiple times within a short period of time, then an abnormal access behavior is deemed to exist.
By combining and configuring a plurality of rules, the rule model can perform comprehensive analysis and anomaly detection on the operation log data. When the operation log data satisfies one or more rules in the rule model, the rule model outputs corresponding abnormal behavior results.
The rule model is highly interpretable: the judgment conditions and the abnormal behavior type of each rule can be understood intuitively.
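The four rules above can be sketched as simple predicate functions, each an AND of its conditions, with a record flagged when any rule fires. The field names and thresholds below are illustrative assumptions, not values from the embodiment:

```python
from datetime import time

# Illustrative thresholds and field names (assumptions, not from the patent)
FAIL_THRESHOLD = 5          # rule 1: max login failures in the window
SENSITIVE_OPS = {"file_delete", "rights_modify"}
SENSITIVE_THRESHOLD = 10    # rule 2: max sensitive operations in the window
WORK_START, WORK_END = time(8, 0), time(18, 0)  # rule 3: working period
UNAUTH_THRESHOLD = 3        # rule 4: max denied accesses in the window

def rule1(rec):  # repeated login failures in a short time
    return rec["login_failures"] > FAIL_THRESHOLD

def rule2(rec):  # frequent sensitive operations
    return rec["op_type"] in SENSITIVE_OPS and rec["op_count"] > SENSITIVE_THRESHOLD

def rule3(rec):  # login/operation outside the working period
    return not (WORK_START <= rec["login_time"] <= WORK_END)

def rule4(rec):  # repeated unauthorized access attempts
    return rec["denied_accesses"] > UNAUTH_THRESHOLD

RULES = [rule1, rule2, rule3, rule4]

def detect(rec):
    """Flag a log record if ANY rule (an AND of its conditions) fires."""
    return any(rule(rec) for rule in RULES)

record = {"login_failures": 7, "op_type": "file_read", "op_count": 1,
          "login_time": time(10, 30), "denied_accesses": 0}
print(detect(record))  # rule 1 fires -> True
```

Adding or reconfiguring a rule is just appending another predicate to `RULES`, which is what makes this style of model easy to maintain.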
Step S5, training and predicting the feature vector based on the machine learning model to identify abnormal behaviors;
in this step, the feature vectors are trained and predicted using a machine learning model to identify abnormal behavior. The machine learning model can automatically capture the characteristics of abnormal behavior by learning patterns and rules in the historical operation log data. Common machine learning models include random forests, support vector machines, neural networks, and the like. By training the model and predicting the new feature vector, it can be determined whether it belongs to abnormal behavior. The advantage of the machine learning model is that it is capable of handling complex nonlinear relationships and large-scale data sets.
In this embodiment, the machine learning model is a random forest model, and the expression is:
$$f(x)=\frac{1}{B}\sum_{b=1}^{B} f_b(x;\Theta_b)$$

where $f(x)$ represents the prediction for the input feature vector $x$, $B$ represents the number of decision trees in the random forest, $f_b(x;\Theta_b)$ is the prediction of the $b$-th decision tree, $\Theta_b$ represents the parameters of the $b$-th decision tree, and the factor $\frac{1}{B}\sum_{b=1}^{B}$ averages the predictions of all decision trees. The random forest model runs each decision tree on the input and averages the predictions of all the trees to obtain the final prediction.
Anomaly detection can be performed on the oplog data using a random forest model. When training the random forest model, the marked operation log data can be used as a training set, the characteristics of the operation log are used as input, and whether the operation log is abnormal or not is used as output. By training a plurality of decision trees and combining them into a random forest model, the accuracy and robustness of anomaly detection can be improved.
Specifically, in this embodiment, the random forest model is applied to anomaly detection of operation log data.
The method comprises the following steps:
1. feature selection: before the random forest model is applied, suitable features need to be selected for training and prediction. Some characteristics related to abnormal behavior may be selected according to characteristics of the operation log data, such as a user characteristic (e.g., user ID), a login characteristic (e.g., login time, login IP), an operation characteristic (e.g., operation type, operation object), and the like. These features will be provided as input to the random forest model.
2. Data preparation: in order to train and evaluate the random forest model, a labeled oplog dataset needs to be prepared. Labels of abnormal behavior can be obtained by manual labeling or other abnormality detection methods. The operation log data set is divided into a training set and a testing set for training and evaluating the random forest model.
3. Model training: and training a random forest model by using the operation log data of the training set and the corresponding labels. The random forest model generates a plurality of decision trees, each decision tree being trained based on a different random sample and feature subset.
4. Abnormality detection: and performing anomaly detection on the operation log data of the test set by using the trained random forest model. For each operation log data, it is provided as an input to a random forest model, and then it is judged whether or not it is an abnormal behavior based on the prediction result of the random forest model.
5. Outputting a result: the operation log data may be marked as normal or abnormal according to the prediction result of the random forest model. A threshold may be set as needed, and operation log data with a prediction probability higher than the threshold may be determined to be abnormal.
The random forest model has the advantages of being capable of processing complex characteristic relationships, having good robustness and being capable of providing characteristic importance assessment. By combining the prediction results of a plurality of decision trees, the random forest model can improve the accuracy of anomaly detection and has certain tolerance to some noise and anomaly values.
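Steps 1 to 5 above can be sketched with scikit-learn's `RandomForestClassifier` (assuming scikit-learn is available). The log features and labels below are synthetic stand-ins for real bastion-host data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic "operation log" features: login hour, login failures,
# sensitive-operation count (illustrative stand-ins, not real log data).
rng = np.random.default_rng(0)
n = 400
X = np.column_stack([
    rng.integers(0, 24, n),   # login hour
    rng.poisson(3.0, n),      # login failures in the window
    rng.poisson(1.0, n),      # sensitive operations in the window
])
# Simulated ground-truth label: abnormal when failures or sensitive ops are high
y = ((X[:, 1] > 5) | (X[:, 2] > 2)).astype(int)

# Step 2: split the labeled set into training and test portions
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Step 3: train B = 100 decision trees on random samples/feature subsets
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_tr, y_tr)

# Step 4: f(x) = (1/B) * sum_b f_b(x; Theta_b) — predict_proba averages the trees
proba = model.predict_proba(X_te)[:, 1]
# Step 5: mark data whose averaged score exceeds a threshold as abnormal
pred = (proba > 0.5).astype(int)
accuracy = float((pred == y_te).mean())
print(f"test accuracy: {accuracy:.2f}")
```

Since the simulated labels are simple thresholds on the features, the forest recovers them almost exactly; on real logs the threshold in step 5 would be tuned on a validation set.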
Step S6, training the feature vector and calculating a reconstruction error based on the deep learning model to identify abnormal behaviors;
In this step, the feature vectors are trained using a deep learning model and reconstruction errors are computed to identify abnormal behavior. Deep learning models, such as self-encoders (autoencoders), can capture potential patterns of abnormal behavior by learning a low-dimensional representation of the input features. During training, the model learns the representation of the features by minimizing the difference between the input features and the reconstructed output. The reconstruction error can be used as an index of the degree of abnormality: a larger reconstruction error indicates that the input features do not match the patterns learned by the model and may therefore represent abnormal behavior.
In this embodiment, the deep learning model is a self-encoder model, and the expression is:
$$L(x)=\left\|x-g(f(x;\theta_f);\theta_g)\right\|^2$$

where $L(x)$ represents the difference between the input feature vector $x$ and the output after model reconstruction, $g(f(x;\theta_f);\theta_g)$ is the reconstruction part of the deep learning model, $f(x;\theta_f)$ represents the output of the encoder, $\theta_f$ and $\theta_g$ represent the parameters of the encoder and decoder respectively, and $\|x-g(f(x;\theta_f);\theta_g)\|^2$ is the square of the Euclidean distance between the input feature vector $x$ and the reconstructed output.
The goal of the self-encoder model is to reconstruct the input data by compressing it into a low-dimensional encoded representation, which is then decoded back into the original data space. By minimizing reconstruction errors, the self-encoder can learn the potential representation of the input data. In anomaly detection, the self-encoder may be used to detect anomaly patterns that are different from the training data.
When training the self-encoder model, normal operation log data can be used as the training set, with the features of the operation log as input; the reconstructed output is compared with the original input, and the model parameters $\theta_f$ and $\theta_g$ are optimized by minimizing the reconstruction error.
When abnormality detection is performed, new operation log data may be reconstructed using the trained self-encoder model, and then a reconstruction error is calculated. If the reconstruction error exceeds a predefined threshold, the oplog data may be marked as anomalous.
Specifically, in the present embodiment, the self-encoder model is applied to anomaly detection of operation log data. The method comprises the following steps:
1. data preparation: with the oplog data as input data, a portion of the normal oplog data may be selected as a training set for training the self-encoder model. The remaining oplog data may be used as a test set for evaluating the anomaly detection performance of the model.
2. Model training: the self-encoder model is trained using the oplog data of the training set. The self-encoder model consists of two parts, encoder and decoder. The encoder maps the input oplog data to a low-dimensional representation, and the decoder maps the low-dimensional representation back to the original data space. By minimizing the reconstruction error, the parameters of the self-encoder model can be optimized.
3. Abnormality detection: and reconstructing operation log data of the test set by using the trained self-encoder model, and calculating a reconstruction error. The reconstruction error may be measured by squaring the euclidean distance between the input data and the reconstruction output. If the reconstruction error exceeds a predefined threshold, the oplog data may be marked as anomalous.
4. Result output: the operation log data may be marked as normal or abnormal based on the comparison of the reconstruction error with the threshold. Data with larger reconstruction errors may indicate abnormal behavior.
The advantage of the self-encoder model is that it is possible to learn potential representations of the data and to detect abnormal patterns that differ from the training data. By minimizing the reconstruction error, the self-encoder model can learn the pattern of normal operation log data and can generate higher reconstruction errors when abnormal behavior is encountered.
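As a rough sketch of the training and detection procedure above, a linear MLP trained to reproduce its own input through a narrow bottleneck can play the role of the self-encoder. The data, dimensions, and bottleneck size below are illustrative assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Invented data: "normal" log features lie near a 2-D subspace of R^4,
# so a 2-unit bottleneck can compress and reconstruct them well.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))
normal = rng.normal(size=(500, 2)) @ W + 0.05 * rng.normal(size=(500, 4))
anomaly = rng.normal(6.0, 1.0, size=4)   # a point far off that subspace

# Target == input turns the MLP into an autoencoder: hidden layer = encoder f,
# output layer = decoder g; a linear activation keeps training simple.
ae = MLPRegressor(hidden_layer_sizes=(2,), activation="identity",
                  max_iter=3000, random_state=0)
ae.fit(normal, normal)

def recon_error(x):
    """||x - g(f(x))||^2: squared Euclidean distance to the reconstruction."""
    return float(np.sum((x - ae.predict(x.reshape(1, -1))[0]) ** 2))

err_normal = float(np.mean([recon_error(x) for x in normal[:50]]))
err_anomaly = recon_error(anomaly)
print(err_normal < err_anomaly)  # the off-subspace point reconstructs worse
```

In a deployed system the detection threshold would be chosen from the distribution of reconstruction errors on held-out normal data, e.g. a high percentile of `err_normal`-like values.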
Step S7, generating a final anomaly score by adopting a weighted voting mechanism according to the prediction result and the detection result;
in this step, a weighted voting mechanism is employed to generate a final anomaly score based on the prediction results and detection results of the time series analysis model, the rule model, the machine learning model, and the deep learning model. The weighted voting mechanism may assign weights to each model based on the prediction accuracy and reliability of the respective model. Then, the prediction results of the models are multiplied by the corresponding weights, and the final anomaly scores are obtained through summation. The weighted voting mechanism has the advantages that the prediction results of a plurality of models can be synthesized, and the accuracy and the robustness of anomaly detection are improved.
In this embodiment, the weighted voting mechanism assigns weights to the models according to their performance, and the expression is:
$$S(x)=\sum_{i=1}^{n} w_i\, s_i(x)$$

where $S(x)$ represents the final anomaly score obtained by weighted voting over the prediction results of each model, $n$ is the number of models, $w_i$ is the weight of the $i$-th model, which determines its contribution to the final anomaly score, and $s_i(x)$ is the prediction result of the $i$-th model on the input feature vector $x$.
In heterogeneous ensemble learning, anomaly detection is performed by selecting a plurality of models, and weighted voting is performed based on their prediction results. The weight of each model may be determined based on its performance on the training set, and generally better performing models will be given higher weights.
The process of weighted voting is to multiply the predicted outcome of each model by its corresponding weight and then add them to get the final anomaly score. The prediction results of the models can be synthesized through weighted voting, and the accuracy and the robustness of anomaly detection are improved.
In this embodiment, a weighted voting mechanism is used to integrate the prediction results of the multiple models to obtain the final anomaly score. The method comprises the following steps:
1. weight determination: each model is assigned a weight that measures its contribution to the final anomaly score. The weights may be determined based on the performance of the model on the training set, such as accuracy, recall, F1 score, and the like. Better performing models may be given higher weights in order to affect the final anomaly score to a greater extent.
2. Fusion of prediction results: for a given input feature vector $x$, each model produces a prediction result $s_i(x)$ representing its judgment of the input. The prediction result of each model is then multiplied by the corresponding weight $w_i$, and the products are summed to obtain the anomaly score $S(x)$, as given by the expression of the weighted voting mechanism.
3. Abnormality determination: based on the final anomaly score $S(x)$, a threshold may be set to determine whether the input feature vector $x$ is abnormal. If $S(x)$ exceeds the threshold, it is marked as abnormal; otherwise, it is marked as normal.
By using a weighted voting mechanism, the prediction results of a plurality of models can be synthesized, thereby improving the accuracy and the robustness of anomaly detection.
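The weighted sum $S(x)=\sum_i w_i\,s_i(x)$ can be sketched in a few lines; the per-model scores and weights below are invented for illustration:

```python
def weighted_score(scores, weights):
    """S(x) = sum_i w_i * s_i(x): weighted fusion of per-model scores."""
    if len(scores) != len(weights):
        raise ValueError("one weight per model is required")
    return sum(w * s for w, s in zip(weights, scores))

# Hypothetical anomaly scores s_i(x) in [0, 1] for one feature vector x,
# from the time-series, rule, random-forest, and self-encoder models.
scores = [0.9, 1.0, 0.7, 0.8]
# Hypothetical weights from validation performance, normalized to sum to 1
weights = [0.2, 0.3, 0.3, 0.2]

S = weighted_score(scores, weights)
print(round(S, 2))  # 0.2*0.9 + 0.3*1.0 + 0.3*0.7 + 0.2*0.8 = 0.85
```

Normalizing the weights to sum to 1 keeps $S(x)$ on the same $[0,1]$ scale as the individual model scores, which makes the threshold in the next step easier to interpret.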
Step S8, judging the abnormal state of the feature vector according to the set threshold value;
In this step, the final anomaly score is compared against a set threshold to determine the abnormal state of the feature vector. If the final anomaly score exceeds the set threshold, the feature vector is judged abnormal; otherwise, it is judged normal. By setting an appropriate threshold, the detection accuracy and false alarm rate can be flexibly controlled according to actual requirements.
In this embodiment, the threshold is set to determine the abnormal state of the feature vector, and the expression is:
$$\mathrm{Label}(x)=\begin{cases}1, & S(x)>\tau\\ 0, & S(x)\le\tau\end{cases}$$

where $\mathrm{Label}(x)$ is the abnormal-state label of the feature vector $x$ (1 for abnormal, 0 for normal), $S(x)$ represents the anomaly score of $x$ obtained from the weighted voting mechanism, and $\tau$ represents the set threshold.
1. Anomaly score calculation: first, the weighted voting mechanism combines the prediction results of the multiple models to obtain the anomaly score $S(x)$ of the feature vector. This score indicates the degree to which the feature vector $x$ is judged to be anomalous.
2. Threshold setting: a threshold $\tau$ is set according to actual requirements to define the boundary between abnormal and normal. If the anomaly score $S(x)$ of the feature vector exceeds the threshold $\tau$, it is marked as abnormal; otherwise, it is marked as normal.
3. Threshold selection: choosing the threshold is a critical step and needs to be adjusted to the specific situation. If the detection requirements for anomalies are strict, a lower threshold may be chosen; if a higher tolerance for anomalies is acceptable, a higher threshold may be chosen. The threshold may be selected based on experience, or determined by trying different values and evaluating their performance.
By setting the threshold value, the feature vector can be classified according to the abnormality score, thereby judging the abnormal state thereof.
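The thresholding rule $\mathrm{Label}(x)$ can be sketched as a one-line comparison; the value of $\tau$ below is an arbitrary example:

```python
def label(score, tau):
    """Label(x) = 1 (abnormal) if S(x) > tau, else 0 (normal)."""
    return 1 if score > tau else 0

tau = 0.8                # example threshold; lower values give stricter detection
print(label(0.85, tau))  # 0.85 > 0.8 -> 1 (abnormal)
print(label(0.42, tau))  # 0.42 <= 0.8 -> 0 (normal)
```

Sweeping `tau` over a validation set and recording precision/recall at each value is one practical way to pick the operating point described in step 3.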
In general, the present invention can improve the accuracy of anomaly detection by integrating a plurality of different types of detectors and exploiting their complementary strengths. Each detector can capture different types of abnormal behavior, thereby reducing missed detections and false alarms. Because different types of detectors are integrated together, a greater variety of anomalies can be handled. For example, some detectors may excel at detecting a particular type of attack, while others are better at detecting abnormal user behavior. Such a diversified detector combination enhances the system's ability to perceive various abnormal conditions.
By deploying the integrated model into the fort machine operation and maintenance management system, user operation and system behavior can be monitored in real time, and abnormal conditions can be found in time. By establishing a feedback loop, the performance of the integrated model can be continuously optimized, and the safety and reliability of the system can be improved.
The fort machine operation and maintenance management system described below and the fort machine operation and maintenance management method described above correspond to each other and may be cross-referenced.
Referring to fig. 2, the present invention further provides a fort operation and maintenance management system, including:
a data collection module 100 for collecting the operation log of the fort machine;
the feature extraction module 200 is used for preprocessing the operation log data and extracting features;
a time series analysis training module 300 for training a time series analysis model;
a rule detection module 400, configured to perform rule detection;
a machine learning training module 500 for training a machine learning model;
the deep learning training module 600 is used for training a deep learning model;
an integration module 700 for employing a weighted voting mechanism;
the abnormality determination module 800 is configured to determine an abnormal state.
The apparatus of this embodiment may be used to execute the above method embodiments, and the principle and technical effects are similar, and are not repeated herein.
Referring to fig. 3, the present invention further provides a computer device 40, including: a processor 41 and a memory 42, the memory 42 storing a computer program executable by the processor, which when executed by the processor performs the method as described above.
The present invention also provides a storage medium 43, on which storage medium 43 a computer program is stored which, when run by a processor 41, performs a method as above.
The storage medium 43 may be implemented by any type of volatile or non-volatile Memory device or combination thereof, such as a static random access Memory (Static Random Access Memory, SRAM), an electrically erasable Programmable Read-Only Memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), an erasable Programmable Read-Only Memory (Erasable Programmable Read Only Memory, EPROM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), a magnetic Memory, a flash Memory, a magnetic disk, or an optical disk.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (10)
1. The operation and maintenance management method of the fort machine is characterized by comprising the following steps of:
collecting operation log data from the fort machine;
preprocessing operation log data and extracting features to obtain feature vectors;
training and predicting the feature vector based on the time sequence analysis model to identify a time anomaly;
based on the rule model, carrying out rule detection on the feature vector to identify a behavior violating the security policy;
training and predicting the feature vector based on the machine learning model to identify abnormal behavior;
training the feature vector and calculating a reconstruction error based on the deep learning model to identify abnormal behaviors;
according to the prediction result and the detection result, a weighted voting mechanism is adopted to generate a final anomaly score;
and judging the abnormal state of the feature vector according to the set threshold value.
2. The method for managing operation and maintenance of a fort machine according to claim 1, wherein the time series analysis model is an ARIMA model, and the expression is:
$$\hat{y}_t=\phi_1 y_{t-1}+\phi_2 y_{t-2}+\cdots+\phi_p y_{t-p}+\theta_1\varepsilon_{t-1}+\theta_2\varepsilon_{t-2}+\cdots+\theta_q\varepsilon_{t-q}+\varepsilon_t$$

where $\hat{y}_t$ represents the predicted value of the time series at time point $t$; $\phi_1,\phi_2,\ldots,\phi_p$ are the parameters of the autoregressive model, i.e. the autoregressive coefficients of the first $p$ time points; $y_{t-1},y_{t-2},\ldots,y_{t-p}$ represent the historical observed values at the first $p$ time points; $\theta_1,\theta_2,\ldots,\theta_q$ are the parameters of the moving-average model, i.e. the moving-average coefficients of the first $q$ time points; $\varepsilon_{t-1},\varepsilon_{t-2},\ldots,\varepsilon_{t-q}$ represent the residuals of the observed values at the previous $q$ time points; and $\varepsilon_t$ represents the residual term at time point $t$.
3. The method of claim 1, wherein the rule model is based on predefined security rules, and the expression is:
$$R(x)=\bigvee_{i=1}^{m}\bigwedge_{j=1}^{k_i} r_{ij}(x)$$

where $R(x)$ represents the result of rule detection on the input feature vector $x$, $m$ represents the number of rules, $k_i$ represents the number of conditions in the $i$-th rule, and $r_{ij}(x)$ is the $j$-th condition in the $i$-th rule, indicating whether the input feature vector $x$ satisfies that condition; the conditions within a rule are combined by logical AND, and the rules by logical OR.
4. The method of claim 1, wherein the machine learning model is a random forest model, and the expression is:
$$f(x)=\frac{1}{B}\sum_{b=1}^{B} f_b(x;\Theta_b)$$

where $f(x)$ represents the prediction for the input feature vector $x$, $B$ represents the number of decision trees in the random forest, $f_b(x;\Theta_b)$ is the prediction of the $b$-th decision tree, $\Theta_b$ represents the parameters of the $b$-th decision tree, and the factor $\frac{1}{B}\sum_{b=1}^{B}$ averages the predictions of all decision trees.
5. The method of claim 1, wherein the deep learning model is a self-encoder model, and the expression is:
$$L(x)=\left\|x-g(f(x;\theta_f);\theta_g)\right\|^2$$

where $L(x)$ represents the difference between the input feature vector $x$ and the output after model reconstruction, $g(f(x;\theta_f);\theta_g)$ is the reconstruction part of the deep learning model, $f(x;\theta_f)$ represents the output of the encoder, $\theta_f$ and $\theta_g$ represent the parameters of the encoder and decoder respectively, and $\|x-g(f(x;\theta_f);\theta_g)\|^2$ is the square of the Euclidean distance between the input feature vector $x$ and the reconstructed output.
6. The method of claim 1, wherein the weighted voting mechanism assigns weights to the models according to their performance, and the expression is:
$$S(x)=\sum_{i=1}^{n} w_i\, s_i(x)$$

where $S(x)$ represents the final anomaly score obtained by weighted voting over the prediction results of each model, $n$ is the number of models, $w_i$ is the weight of the $i$-th model, which determines its contribution to the final anomaly score, and $s_i(x)$ is the prediction result of the $i$-th model on the input feature vector $x$.
7. The method for managing operation and maintenance of a fort machine according to claim 1, wherein the threshold is set to determine an abnormal state of a feature vector, and the expression is:
here, label (x) is an abnormal state Label of the feature vector x, S (x) represents an abnormal score of the feature vector x, and τ represents a set threshold value, which is obtained according to a weighted voting mechanism.
8. A bastion machine operation and maintenance management system, based on the bastion machine operation and maintenance management method of any one of claims 1 to 7, comprising:
the data collection module is used for collecting operation logs of the fort machine;
the feature extraction module is used for preprocessing the operation log data and extracting features;
the time sequence analysis training module is used for training a time sequence analysis model;
the rule detection module is used for detecting rules;
the machine learning training module is used for training a machine learning model;
the deep learning training module is used for training a deep learning model;
the integration module is used for adopting a weighted voting mechanism;
and the abnormality judgment module is used for judging the abnormal state.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1-7 when executing the computer program.
10. A storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410055171.8A CN117874680A (en) | 2024-01-15 | 2024-01-15 | Operation and maintenance management system for fort machine |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117874680A true CN117874680A (en) | 2024-04-12 |
Family
ID=90584258
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118427815A (en) * | 2024-07-01 | 2024-08-02 | 苏州市吴江区公安局 | Police data anomaly detection method and system based on deep learning |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112527604A (en) * | 2020-12-16 | 2021-03-19 | 广东昭阳信息技术有限公司 | Deep learning-based operation and maintenance detection method and system, electronic equipment and medium |
CN113282474A (en) * | 2021-05-31 | 2021-08-20 | 长沙市到家悠享家政服务有限公司 | User behavior monitoring method, system, equipment and medium based on bastion machine |
US20210377282A1 (en) * | 2020-05-29 | 2021-12-02 | Cylance Inc. | Detecting Malware with Deep Generative Models |
CN115858606A (en) * | 2021-09-23 | 2023-03-28 | 中移动信息技术有限公司 | Method, device and equipment for detecting abnormity of time series data and storage medium |
CN116049765A (en) * | 2023-02-01 | 2023-05-02 | 中国联合网络通信集团有限公司 | Data analysis processing method, device and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||