CN117874680A - Operation and maintenance management system for fort machine - Google Patents
- Publication number
- CN117874680A CN117874680A CN202410055171.8A CN202410055171A CN117874680A CN 117874680 A CN117874680 A CN 117874680A CN 202410055171 A CN202410055171 A CN 202410055171A CN 117874680 A CN117874680 A CN 117874680A
- Authority
- CN
- China
- Prior art keywords
- model
- feature vector
- training
- rule
- representing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F18/2433 — Pattern recognition; classification techniques; single-class perspective (e.g. one-against-all classification), novelty detection, outlier detection
- G06F18/24323 — Tree-organised classifiers
- G06F18/24765 — Rule-based classification
- G06F18/259 — Fusion techniques; fusion by voting
- G06F18/27 — Regression, e.g. linear or logistic regression
- G06N20/20 — Machine learning; ensemble learning
- G06N3/0455 — Neural networks; auto-encoder networks, encoder-decoder networks
- G06N3/08 — Neural networks; learning methods
- G06N5/01 — Dynamic search techniques; heuristics; dynamic trees; branch-and-bound
- G06F2123/02 — Data types in the time domain, e.g. time-series data
- Y04S10/50 — Systems or methods supporting power network operation or management, involving interaction with the load-side end user
Abstract
The application relates to the field of operation and maintenance management of bastion hosts (fort machines), and discloses a bastion-host operation and maintenance management system that performs the following steps: S1, collecting operation log data from the bastion host; S2, preprocessing the operation log data and extracting features to obtain feature vectors; S3, training and predicting on the feature vectors with a time-series analysis model to identify temporal anomalies; S4, applying rule detection to the feature vectors with a rule model to identify behavior that violates the security policy; S5, training and predicting on the feature vectors with a machine learning model to identify abnormal behavior; S6, training on the feature vectors and computing reconstruction errors with a deep learning model to identify abnormal behavior; S7, generating a final anomaly score with a weighted voting mechanism; S8, judging the abnormal state against a set threshold. By integrating multiple detectors of different types and exploiting their complementary strengths, the invention improves the accuracy of anomaly detection.
Description
Technical Field
The invention relates to the technical field of operation and maintenance management of bastion hosts, and in particular to a bastion-host operation and maintenance management system.
Background
With the rapid development of information technology, bastion-host operation and maintenance management systems are increasingly widely used in enterprises and organizations. As an important security facility for managing and monitoring access rights to critical systems and data, the bastion host plays a key role in protecting network security. However, as network attack techniques evolve, traditional protection measures struggle to cope with new security threats and attack vectors. How to improve the security and anomaly detection capability of the bastion-host operation and maintenance management system is therefore a problem to be solved.
A bastion-host operation and maintenance management system is a security facility for managing and monitoring access rights to critical systems and data. By centrally managing user accounts and permissions, it audits and controls user access, thereby improving the security and reliability of the system. However, a single anomaly detection method often cannot meet the detection requirements posed by complex and fast-changing security threats and attack techniques.
Disclosure of Invention
To address the defects of the prior art, the invention provides a bastion-host operation and maintenance management system that improves the accuracy and robustness of anomaly detection by integrating multiple detectors of different types and exploiting their complementary strengths.
To achieve the above purpose, the invention is realized by the following technical scheme: a bastion-host operation and maintenance management method comprising the following steps:
collecting operation log data from the bastion host;
preprocessing the operation log data and extracting features to obtain feature vectors;
training and predicting on the feature vectors with a time-series analysis model to identify temporal anomalies;
applying rule detection to the feature vectors with a rule model to identify behavior that violates the security policy;
training and predicting on the feature vectors with a machine learning model to identify abnormal behavior;
training on the feature vectors and computing reconstruction errors with a deep learning model to identify abnormal behavior;
generating a final anomaly score from the prediction and detection results with a weighted voting mechanism;
and judging the abnormal state of the feature vector against a set threshold.
Preferably, the time series analysis model is an ARIMA model, with the expression:

ŷ_t = φ_1·y_{t-1} + φ_2·y_{t-2} + … + φ_p·y_{t-p} + ε_t + θ_1·ε_{t-1} + θ_2·ε_{t-2} + … + θ_q·ε_{t-q}

where ŷ_t is the predicted value of the time series at time point t; φ_1, φ_2, …, φ_p are the parameters of the autoregressive model, namely the autoregressive coefficients over the first p time points; y_{t-1}, y_{t-2}, …, y_{t-p} are the historical observations at the first p time points; θ_1, θ_2, …, θ_q are the parameters of the moving-average model, namely the moving-average coefficients over the first q time points; ε_{t-1}, ε_{t-2}, …, ε_{t-q} are the residuals of the observations at the previous q time points; and ε_t is the residual term at time point t.
Preferably, the rule model is based on predefined security rules, with the expression:

R(x) = ⋀_{i=1}^{m} ⋀_{j=1}^{k_i} r_{ij}(x)

where R(x) is the result of rule detection on the input feature vector x; m is the number of rules; k_i is the number of conditions in the i-th rule; and r_{ij}(x) is the j-th condition in the i-th rule, indicating whether the input feature vector x satisfies that condition.
Preferably, the machine learning model is a random forest model, with the expression:

f(x) = (1/B) · Σ_{b=1}^{B} f_b(x; Θ_b)

where f(x) is the prediction for the input feature vector x; B is the number of decision trees in the random forest; f_b(x; Θ_b) is the prediction of the b-th decision tree; Θ_b denotes the parameters of the b-th decision tree; and the factor 1/B averages the predictions of all decision trees.
Preferably, the deep learning model is an autoencoder model, with the expression:

L(x) = ‖x − g(f(x; θ_f); θ_g)‖²

where L(x) is the difference between the input feature vector x and the output after model reconstruction; g(f(x; θ_f); θ_g) is the reconstruction part of the deep learning model; f(x; θ_f) is the output of the encoder; θ_g denotes the parameters of the decoder; and ‖x − g(f(x; θ_f); θ_g)‖² is the square of the Euclidean distance between the input feature vector x and the reconstructed output.
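As a concrete illustration of the reconstruction-error formula, the following is a minimal numpy sketch with hypothetical, untrained linear encoder/decoder weights; in a real deployment θ_f and θ_g would be learned by training the autoencoder on normal operation logs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear encoder/decoder weights (stand-ins for theta_f, theta_g);
# a trained autoencoder would learn these from normal log data.
W_enc = rng.normal(size=(3, 8))   # encoder: 8-dim feature vector -> 3-dim code
W_dec = rng.normal(size=(8, 3))   # decoder: 3-dim code -> 8-dim reconstruction

def encode(x):
    return W_enc @ x              # f(x; theta_f)

def decode(h):
    return W_dec @ h              # g(h; theta_g)

def reconstruction_error(x):
    # L(x) = ||x - g(f(x; theta_f); theta_g)||^2 : squared Euclidean distance
    diff = x - decode(encode(x))
    return float(diff @ diff)

x = rng.normal(size=8)            # one preprocessed feature vector
err = reconstruction_error(x)
# A vector whose error is large relative to the errors seen on normal
# data would be flagged as abnormal behavior.
```

Since the model is trained only on normal behavior, anomalous inputs reconstruct poorly and therefore score a high L(x).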
Preferably, the weighted voting mechanism assigns a weight to each model according to its performance, with the expression:

S(x) = Σ_i w_i · s_i(x)

where S(x) is the final anomaly score obtained by weighted voting over the predictions of the individual models; w_i is the weight of the i-th model, which determines its contribution to the final anomaly score; and s_i(x) is the prediction of the i-th model for the input feature vector x.
Preferably, a threshold is set to judge the abnormal state of the feature vector, with the expression:

Label(x) = 1 if S(x) > τ, and 0 otherwise

where Label(x) is the abnormal-state label of feature vector x; S(x) is the anomaly score of x obtained from the weighted voting mechanism; and τ is the set threshold.
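The weighted-voting score and threshold label can be sketched in a few lines of Python; the per-detector scores, weights and threshold below are illustrative values, not outputs of the models described above:

```python
def anomaly_score(detector_scores, weights):
    # S(x) = sum_i w_i * s_i(x): weighted combination of detector outputs
    return sum(w * s for w, s in zip(weights, detector_scores))

def label(score, tau):
    # Label(x) = 1 (abnormal) if S(x) > tau, else 0 (normal)
    return 1 if score > tau else 0

# Hypothetical s_i(x) for one feature vector from the four detectors
# (time-series, rule, random-forest, autoencoder), plus assumed
# performance-based weights w_i.
scores = [0.9, 1.0, 0.7, 0.4]
weights = [0.3, 0.2, 0.3, 0.2]
S = anomaly_score(scores, weights)
print(S, label(S, tau=0.6))
```

In practice the weights would be tuned on a validation set (e.g. proportional to each detector's precision), and τ chosen to balance missed detections against false alarms.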
The invention also provides a bastion-host operation and maintenance management system, which comprises:
the data collection module is used for collecting operation logs of the bastion host;
the feature extraction module is used for preprocessing the operation log data and extracting features;
the time sequence analysis training module is used for training a time sequence analysis model;
the rule detection module is used for detecting rules;
the machine learning training module is used for training a machine learning model;
the deep learning training module is used for training a deep learning model;
the integration module is used for adopting a weighted voting mechanism;
and the abnormality judgment module is used for judging the abnormal state.
The invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method as described above when executing the computer program.
The invention also provides a storage medium having stored thereon a computer program which, when executed by a processor, implements a method as described above.
The invention provides a bastion-host operation and maintenance management system with the following beneficial effects:
1. By integrating multiple detectors of different types and exploiting their complementary strengths, the invention improves the accuracy of anomaly detection. Each detector captures a different type of abnormal behavior, reducing both missed detections and false alarms, and because different detector types are integrated, a greater variety of anomalies can be handled.
2. By deploying the integrated model into the bastion-host operation and maintenance management system, user operations and system behavior can be monitored in real time and abnormal conditions discovered promptly. A feedback loop allows the performance of the integrated model to be continuously optimized, improving the security and reliability of the system.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 is a schematic view of the structure of the device of the present invention;
FIG. 3 is a schematic diagram of a computer device according to the present invention.
100, data collection module; 200, feature extraction module; 300, time-series analysis training module; 400, rule detection module; 500, machine learning training module; 600, deep learning training module; 700, integration module; 800, abnormality determination module; 40, computer device; 41, processor; 42, memory; 43, storage medium.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, an embodiment of the present invention provides a bastion-host operation and maintenance management method comprising steps S1 to S8, as follows:
Step S1, collecting operation log data from the bastion host;
In this step, the bastion host collects the user's operation log data in the system. The operation log records key information such as the user's login information, operation commands and operation times; such data provides valuable information about user behavior and helps identify potentially abnormal behavior.
In this embodiment, the bastion host collects the user's operation log data in the system, including key information such as login information, operation commands, operation times and operation objects. This information helps characterize the user's behavior and operations in the system.
The login information includes a login account number, a login IP address, a login time, and the like of the user. By recording the login information, the login behavior of the user can be tracked, and abnormal login activities such as illegal login, failed login attempts and the like can be identified.
The operation command records specific operations performed by the user in the system, such as execution commands, accessing files, modifying configurations, and the like. By analyzing the operation command, potentially abnormal behavior may be detected, such as performing sensitive commands, override operations, etc.
The operation time records time stamp information of the user performing the operation. Through the time stamp information, the operation mode and habit of the user can be analyzed, and time abnormal behaviors such as operation at non-working time, frequent operation requests and the like are detected.
An operation object refers to a target object, such as a file, a database, a network device, etc., that a user performs an operation in a system. By recording the operation objects, the operation behaviors of the user on different objects can be analyzed, and abnormal operation targets, such as unauthorized operation objects, abnormal operation rights and the like, can be identified.
The purpose of collecting these operation log data on the bastion host is to monitor and analyze user behavior and identify potentially abnormal behavior and security risks. By preprocessing the operation log data, extracting features, and applying time-series analysis, rule models, machine learning models and deep learning models, abnormal behavior can be identified accurately and responded to promptly, improving the operation and maintenance management and security of the bastion host.
Step S2, preprocessing the operation log data and extracting features to obtain feature vectors;
In this step, the operation log data collected from the bastion host is preprocessed and features are extracted. Preprocessing includes data cleaning, noise removal, outlier handling, etc. Feature extraction converts the operation log data into feature vectors usable for model training and prediction; common features include the user ID, operation command and operation time. Preprocessing and feature extraction may use techniques such as regular expressions and time-series analysis.
In this embodiment, in the process of preprocessing the operation log data and extracting the features, the following feature vectors are considered to be extracted:
1. user characteristics:
user ID: the unique identification of the user is used as a feature to distinguish the behavior of different users.
User roles: the role information of the user in the system, such as an administrator, a general user, etc., is recorded.
2. Login characteristics:
number of logins: and counting the login times of the user and reflecting the activity degree of the user.
Number of login success/failure: and recording the success and failure times of the login operation of the user, and detecting abnormal login behaviors.
Logging in an IP address: and recording the IP address of the user when logging in, and detecting the login in a different place or abnormal IP address.
3. Operation features:
operation command: specific operation commands, such as file operations, system commands, etc., executed by the user are recorded.
The operation object is as follows: it is recorded which objects the user has performed an operation on, such as files, databases, network devices, etc.
Operating time: the time stamp of the user operation is recorded and used for analyzing the time mode and abnormal time behavior of the operation.
4. Abnormal characteristics:
non-on time operation: by judging the operation time, the operation performed in the non-working time is identified.
Frequent operations: the frequency of the user performing the same or similar operations in a short time is counted for detecting abnormal frequent operation behaviors.
Abnormal command: a set of sensitive commands or risky operation commands is defined, and it is detected whether the user has executed these commands.
These feature vectors are obtained by parsing and processing the operation log data. The preprocessing step may include data cleaning, noise removal, missing-value handling, etc. Feature extraction can use regular expressions, keyword matching or other custom rules to extract the corresponding features.
The extracted feature vectors serve as inputs for training models to learn and predict abnormal behavior. Different models can be built and analyzed for different features to improve the accuracy and robustness of anomaly detection.
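As a sketch of how such features might be pulled from a log line with regular expressions, the following assumes a hypothetical log format (`time user=… ip=… cmd=…`) and an assumed sensitive-command set; real bastion-host log formats will differ:

```python
import re
from datetime import datetime

# Hypothetical log-line format; adapt the pattern to the actual bastion-host logs.
LOG_RE = re.compile(
    r"(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"user=(?P<user>\w+) ip=(?P<ip>[\d.]+) cmd=(?P<cmd>.+)"
)

SENSITIVE = {"rm", "chmod", "useradd"}        # assumed sensitive-command set

def extract_features(line):
    """Parse one log line into a feature dict, or None if it doesn't match."""
    m = LOG_RE.match(line)
    if m is None:
        return None                           # unparseable line: drop in cleaning
    t = datetime.strptime(m["time"], "%Y-%m-%d %H:%M:%S")
    cmd = m["cmd"].split()[0]                 # command name without arguments
    return {
        "user": m["user"],
        "ip": m["ip"],
        "hour": t.hour,
        "off_hours": int(t.hour < 8 or t.hour >= 20),   # assumed working hours
        "sensitive_cmd": int(cmd in SENSITIVE),
    }

feat = extract_features("2024-01-12 23:15:02 user=alice ip=10.0.0.5 cmd=rm -rf /tmp/x")
```

Categorical fields (user, IP) would then be encoded numerically, and counts such as login failures aggregated per user over a sliding window, before the vectors are fed to the models.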
Step S3, training and predicting the feature vector based on a time sequence analysis model to identify time abnormality;
in this step, the feature vectors obtained by the preprocessing and feature extraction are trained and predicted using a time series analysis model to identify temporal anomalies. Common time series analysis models include ARIMA model, exponential smoothing, and the like. Through analysis of the historical operation log data, the time series model can capture common time anomaly modes such as seasonal changes, trend changes and the like. Identifying a time anomaly helps to discover time anomaly behavior of user operations, such as login at non-working time or frequent operation requests.
In this embodiment, the time series analysis model is an ARIMA model, and the ARIMA (Autoregressive Integrated Moving Average) model is a common time series analysis method for predicting and modeling time series data. Parameters of the ARIMA model include an autoregressive order p, a differential order d, and a moving average order q.
The expression is as follows:

ŷ_t = φ_1·y_{t-1} + … + φ_p·y_{t-p} + ε_t + θ_1·ε_{t-1} + … + θ_q·ε_{t-q}

The autoregressive part (AR) captures the linear relationship between the current observation and the first p observations, where φ_1, …, φ_p are the autoregressive coefficients and y_{t-1}, …, y_{t-p} are the historical observations at the first p time points.
The moving-average part (MA) captures the linear relationship between the current observation and the first q residuals, where θ_1, …, θ_q are the moving-average coefficients and ε_{t-1}, …, ε_{t-q} are the residuals at the first q time points.
The differencing part (I, for "integrated") handles non-stationary time series by differencing the original series until it is stationary; the differencing order d is the number of times the differencing operation is applied.
Finally, the ARIMA model fits the autoregressive, differencing and moving-average relationships in the historical data to obtain the predicted value ŷ_t of the time series at a future time point.
Specifically, in the present embodiment, the ARIMA model is used to model and predict a time series in operation log data. The method comprises the following specific steps:
1. data preparation: first, a time series in the operation log data needs to be extracted as a target variable for modeling. The time point at which the operation time is used as the time series, and the corresponding number of operations, the operation amount, and the like may be selected as the observation value.
2. Parameter selection: parameters of the ARIMA model include autoregressive order (p), differential order (d), and moving average order (q). In selecting the parameters, some common methods such as analysis of autocorrelation function (ACF) and partial autocorrelation function (PACF) and observation of characteristics such as stationarity and seasonality of time series can be used.
3. Model training: and according to the selected parameters, performing model training by using historical operation log data. During the training process, the model fits the autoregressive, differential and moving average relationships in the historical data to obtain the optimal model parameters.
4. Model prediction: a future time series is predicted using a trained ARIMA model. By inputting the observed values of the historical data, the model can generate corresponding predicted values and give corresponding confidence intervals.
5. Abnormality detection: according to the difference between the predicted value and the actual observed value, whether abnormal behavior exists or not can be judged. If the difference between the predicted value and the actual observed value exceeds a certain threshold value, it can be regarded as abnormal behavior.
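The five steps above can be sketched with the autoregressive core of the model. This toy example fits only the AR part by least squares on synthetic data with one injected anomaly; a full ARIMA(p, d, q) implementation would add the differencing and moving-average terms:

```python
import numpy as np

def fit_ar(y, p):
    """Least-squares AR(p) fit: y_t ~ phi_1*y_{t-1} + ... + phi_p*y_{t-p}."""
    X = np.column_stack([y[p - k : len(y) - k] for k in range(1, p + 1)])
    return np.linalg.lstsq(X, y[p:], rcond=None)[0]

def residuals(y, phi):
    """Prediction residuals y_t - y_hat_t for t = p .. len(y)-1."""
    p = len(phi)
    X = np.column_stack([y[p - k : len(y) - k] for k in range(1, p + 1)])
    return y[p:] - X @ phi

# Synthetic stand-in for hourly operation counts, with one injected anomaly.
y = np.sin(np.arange(200) / 5.0) + 0.05 * np.random.default_rng(1).normal(size=200)
y[150] += 3.0                        # injected burst of operations at t = 150

phi = fit_ar(y, p=3)                 # step 3: model training
res = residuals(y, phi)              # step 4: prediction on the history
# Step 5: flag points whose residual exceeds 3 sigma of the residual spread.
thresh = 3 * res.std()
anomalies = np.where(np.abs(res) > thresh)[0] + 3   # +p shifts back to y's index
```

A production system would instead use a maintained ARIMA implementation and refit periodically as new logs arrive; the 3σ threshold here is an assumed choice.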
Step S4, based on a rule model, carrying out rule detection on the feature vector to identify a behavior violating a security policy;
in this step, the feature vectors are subjected to rule detection using a rule model to identify behaviors that violate the security policy. The rule model logically judges the feature vector based on a predefined rule set. Rules may include access control rules, security policy rules, and the like. For example, a rule may define that if a user fails to log in consecutively more than a certain number of times in a short time, it is determined to be abnormal. The rule model has the advantages of interpretability and instantaneity, and can quickly identify the behavior violating the security policy.
In this embodiment, the rule model is based on predefined security rules, with the expression:

R(x) = ⋀_{i=1}^{m} ⋀_{j=1}^{k_i} r_{ij}(x)

where R(x) is the result of rule detection on the input feature vector x; m is the number of rules; k_i is the number of conditions in the i-th rule; and r_{ij}(x) is the j-th condition in the i-th rule, indicating whether the input feature vector x satisfies that condition.
Each rule in the rule model is composed of a plurality of conditions, which may be logical expressions, comparison operations, or other predefined judgment conditions. The rule model judges each rule of the input feature vector one by one, and carries out logical AND operation on the results of all the rules to finally obtain the result of overall rule detection.
Rule models are typically used to define and detect predefined security rules in order to quickly identify and respond to specific security events or abnormal behavior. Compared to other machine learning models, rule models have the advantage of strong interpretability, easy maintenance and adjustment, but may be limited by the coverage and expressive power of the rule.
Specifically, in the present embodiment, the rule model is used to detect operation log data based on predefined security rules to identify potential abnormal behavior.
The rule model may define and detect abnormal behavior according to specific security rules. Each rule is composed of a plurality of conditions, which may include login information, operation type, operation frequency, operation object, and the like. The rule model judges whether abnormal behaviors exist or not by judging each rule in the operation log data one by one and carrying out logical AND operation on results of all the rules.
The present embodiment defines the following rules to detect abnormal behavior:
rule 1: if the same user fails to login by multiple attempts within a short time (the number of login failures is greater than a threshold), then abnormal login behavior is considered to exist.
Rule 2: if a user frequently performs sensitive operations (e.g., file deletion, rights modification, etc.) in a short period of time, it is considered that there is abnormal operation behavior.
Rule 3: if a user logs into the system and performs an operation during a non-operating period, it is considered that there is an abnormal login behavior.
Rule 4: if a user attempts to access an unauthorized file or directory multiple times within a short period of time, then an abnormal access behavior is deemed to exist.
By combining and configuring a plurality of rules, the rule model can perform comprehensive analysis and anomaly detection on the operation log data. When the operation log data satisfies one or more rules in the rule model, the rule model outputs corresponding abnormal behavior results.
The rule model is highly interpretable: the judgment conditions and the abnormal behavior type of each rule can be understood intuitively.
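The four rules above can be sketched as simple predicate functions, each an AND of its conditions, with a record flagged when any rule fires. The field names and thresholds below are illustrative assumptions, not values from the embodiment:

```python
from datetime import time

# Illustrative thresholds and field names (assumptions, not from the patent)
FAIL_THRESHOLD = 5          # rule 1: max login failures in the window
SENSITIVE_OPS = {"file_delete", "rights_modify"}
SENSITIVE_THRESHOLD = 10    # rule 2: max sensitive operations in the window
WORK_START, WORK_END = time(8, 0), time(18, 0)  # rule 3: working period
UNAUTH_THRESHOLD = 3        # rule 4: max denied accesses in the window

def rule1(rec):  # repeated login failures in a short time
    return rec["login_failures"] > FAIL_THRESHOLD

def rule2(rec):  # frequent sensitive operations
    return rec["op_type"] in SENSITIVE_OPS and rec["op_count"] > SENSITIVE_THRESHOLD

def rule3(rec):  # login/operation outside the working period
    return not (WORK_START <= rec["login_time"] <= WORK_END)

def rule4(rec):  # repeated unauthorized access attempts
    return rec["denied_accesses"] > UNAUTH_THRESHOLD

RULES = [rule1, rule2, rule3, rule4]

def detect(rec):
    """Flag a log record if ANY rule (an AND of its conditions) fires."""
    return any(rule(rec) for rule in RULES)

record = {"login_failures": 7, "op_type": "file_read", "op_count": 1,
          "login_time": time(10, 30), "denied_accesses": 0}
print(detect(record))  # rule 1 fires -> True
```

Adding or reconfiguring a rule is just appending another predicate to `RULES`, which is what makes this style of model easy to maintain.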
Step S5, training and predicting the feature vector based on the machine learning model to identify abnormal behaviors;
in this step, the feature vectors are trained and predicted using a machine learning model to identify abnormal behavior. The machine learning model can automatically capture the characteristics of abnormal behavior by learning patterns and rules in the historical operation log data. Common machine learning models include random forests, support vector machines, neural networks, and the like. By training the model and predicting the new feature vector, it can be determined whether it belongs to abnormal behavior. The advantage of the machine learning model is that it is capable of handling complex nonlinear relationships and large-scale data sets.
In this embodiment, the machine learning model is a random forest model, and the expression is:
$$f(x)=\frac{1}{B}\sum_{b=1}^{B} f_b(x;\Theta_b)$$

where $f(x)$ represents the prediction for the input feature vector $x$, $B$ represents the number of decision trees in the random forest, $f_b(x;\Theta_b)$ is the prediction of the $b$-th decision tree, $\Theta_b$ represents the parameters of the $b$-th decision tree, and the factor $\frac{1}{B}\sum_{b=1}^{B}$ averages the predictions of all decision trees. The random forest model runs each decision tree on the input and averages the predictions of all the trees to obtain the final prediction.
Anomaly detection can be performed on the oplog data using a random forest model. When training the random forest model, the marked operation log data can be used as a training set, the characteristics of the operation log are used as input, and whether the operation log is abnormal or not is used as output. By training a plurality of decision trees and combining them into a random forest model, the accuracy and robustness of anomaly detection can be improved.
Specifically, in this embodiment, the random forest model is applied to anomaly detection of operation log data.
The method comprises the following steps:
1. feature selection: before the random forest model is applied, suitable features need to be selected for training and prediction. Some characteristics related to abnormal behavior may be selected according to characteristics of the operation log data, such as a user characteristic (e.g., user ID), a login characteristic (e.g., login time, login IP), an operation characteristic (e.g., operation type, operation object), and the like. These features will be provided as input to the random forest model.
2. Data preparation: in order to train and evaluate the random forest model, a labeled oplog dataset needs to be prepared. Labels of abnormal behavior can be obtained by manual labeling or other abnormality detection methods. The operation log data set is divided into a training set and a testing set for training and evaluating the random forest model.
3. Model training: and training a random forest model by using the operation log data of the training set and the corresponding labels. The random forest model generates a plurality of decision trees, each decision tree being trained based on a different random sample and feature subset.
4. Abnormality detection: and performing anomaly detection on the operation log data of the test set by using the trained random forest model. For each operation log data, it is provided as an input to a random forest model, and then it is judged whether or not it is an abnormal behavior based on the prediction result of the random forest model.
5. Outputting a result: the operation log data may be marked as normal or abnormal according to the prediction result of the random forest model. A threshold may be set as needed, and operation log data with a prediction probability higher than the threshold may be determined to be abnormal.
The random forest model has the advantages of being capable of processing complex characteristic relationships, having good robustness and being capable of providing characteristic importance assessment. By combining the prediction results of a plurality of decision trees, the random forest model can improve the accuracy of anomaly detection and has certain tolerance to some noise and anomaly values.
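Steps 1 to 5 above can be sketched with scikit-learn's `RandomForestClassifier` (assuming scikit-learn is available). The log features and labels below are synthetic stand-ins for real bastion-host data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic "operation log" features: login hour, login failures,
# sensitive-operation count (illustrative stand-ins, not real log data).
rng = np.random.default_rng(0)
n = 400
X = np.column_stack([
    rng.integers(0, 24, n),   # login hour
    rng.poisson(3.0, n),      # login failures in the window
    rng.poisson(1.0, n),      # sensitive operations in the window
])
# Simulated ground-truth label: abnormal when failures or sensitive ops are high
y = ((X[:, 1] > 5) | (X[:, 2] > 2)).astype(int)

# Step 2: split the labeled set into training and test portions
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Step 3: train B = 100 decision trees on random samples/feature subsets
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_tr, y_tr)

# Step 4: f(x) = (1/B) * sum_b f_b(x; Theta_b) — predict_proba averages the trees
proba = model.predict_proba(X_te)[:, 1]
# Step 5: mark data whose averaged score exceeds a threshold as abnormal
pred = (proba > 0.5).astype(int)
accuracy = float((pred == y_te).mean())
print(f"test accuracy: {accuracy:.2f}")
```

Since the simulated labels are simple thresholds on the features, the forest recovers them almost exactly; on real logs the threshold in step 5 would be tuned on a validation set.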
Step S6, training the feature vector and calculating a reconstruction error based on the deep learning model to identify abnormal behaviors;
In this step, the feature vectors are trained using a deep learning model and reconstruction errors are computed to identify abnormal behavior. Deep learning models, such as self-encoders (autoencoders), can capture potential patterns of abnormal behavior by learning a low-dimensional representation of the input features. During training, the model learns the representation of the features by minimizing the difference between the input features and the reconstructed output. The reconstruction error can be used as an index of the degree of abnormality: a larger reconstruction error indicates that the input features do not match the patterns learned by the model and may therefore represent abnormal behavior.
In this embodiment, the deep learning model is a self-encoder model, and the expression is:
$$L(x)=\left\|x-g(f(x;\theta_f);\theta_g)\right\|^2$$

where $L(x)$ represents the difference between the input feature vector $x$ and the output after model reconstruction, $g(f(x;\theta_f);\theta_g)$ is the reconstruction part of the deep learning model, $f(x;\theta_f)$ represents the output of the encoder, $\theta_f$ and $\theta_g$ represent the parameters of the encoder and decoder respectively, and $\|x-g(f(x;\theta_f);\theta_g)\|^2$ is the square of the Euclidean distance between the input feature vector $x$ and the reconstructed output.
The goal of the self-encoder model is to reconstruct the input data by compressing it into a low-dimensional encoded representation, which is then decoded back into the original data space. By minimizing reconstruction errors, the self-encoder can learn the potential representation of the input data. In anomaly detection, the self-encoder may be used to detect anomaly patterns that are different from the training data.
When training the self-encoder model, normal operation log data can be used as the training set, with the features of the operation log as input; the reconstructed output is compared with the original input, and the model parameters $\theta_f$ and $\theta_g$ are optimized by minimizing the reconstruction error.
When abnormality detection is performed, new operation log data may be reconstructed using the trained self-encoder model, and then a reconstruction error is calculated. If the reconstruction error exceeds a predefined threshold, the oplog data may be marked as anomalous.
Specifically, in the present embodiment, the self-encoder model is applied to anomaly detection of operation log data. The method comprises the following steps:
1. data preparation: with the oplog data as input data, a portion of the normal oplog data may be selected as a training set for training the self-encoder model. The remaining oplog data may be used as a test set for evaluating the anomaly detection performance of the model.
2. Model training: the self-encoder model is trained using the oplog data of the training set. The self-encoder model consists of two parts, encoder and decoder. The encoder maps the input oplog data to a low-dimensional representation, and the decoder maps the low-dimensional representation back to the original data space. By minimizing the reconstruction error, the parameters of the self-encoder model can be optimized.
3. Abnormality detection: and reconstructing operation log data of the test set by using the trained self-encoder model, and calculating a reconstruction error. The reconstruction error may be measured by squaring the euclidean distance between the input data and the reconstruction output. If the reconstruction error exceeds a predefined threshold, the oplog data may be marked as anomalous.
4. Result output: the operation log data may be marked as normal or abnormal based on the comparison of the reconstruction error with the threshold. Data with larger reconstruction errors may indicate abnormal behavior.
The advantage of the self-encoder model is that it is possible to learn potential representations of the data and to detect abnormal patterns that differ from the training data. By minimizing the reconstruction error, the self-encoder model can learn the pattern of normal operation log data and can generate higher reconstruction errors when abnormal behavior is encountered.
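As a rough sketch of the training and detection procedure above, a linear MLP trained to reproduce its own input through a narrow bottleneck can play the role of the self-encoder. The data, dimensions, and bottleneck size below are illustrative assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Invented data: "normal" log features lie near a 2-D subspace of R^4,
# so a 2-unit bottleneck can compress and reconstruct them well.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))
normal = rng.normal(size=(500, 2)) @ W + 0.05 * rng.normal(size=(500, 4))
anomaly = rng.normal(6.0, 1.0, size=4)   # a point far off that subspace

# Target == input turns the MLP into an autoencoder: hidden layer = encoder f,
# output layer = decoder g; a linear activation keeps training simple.
ae = MLPRegressor(hidden_layer_sizes=(2,), activation="identity",
                  max_iter=3000, random_state=0)
ae.fit(normal, normal)

def recon_error(x):
    """||x - g(f(x))||^2: squared Euclidean distance to the reconstruction."""
    return float(np.sum((x - ae.predict(x.reshape(1, -1))[0]) ** 2))

err_normal = float(np.mean([recon_error(x) for x in normal[:50]]))
err_anomaly = recon_error(anomaly)
print(err_normal < err_anomaly)  # the off-subspace point reconstructs worse
```

In a deployed system the detection threshold would be chosen from the distribution of reconstruction errors on held-out normal data, e.g. a high percentile of `err_normal`-like values.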
Step S7, generating a final anomaly score by adopting a weighted voting mechanism according to the prediction result and the detection result;
in this step, a weighted voting mechanism is employed to generate a final anomaly score based on the prediction results and detection results of the time series analysis model, the rule model, the machine learning model, and the deep learning model. The weighted voting mechanism may assign weights to each model based on the prediction accuracy and reliability of the respective model. Then, the prediction results of the models are multiplied by the corresponding weights, and the final anomaly scores are obtained through summation. The weighted voting mechanism has the advantages that the prediction results of a plurality of models can be synthesized, and the accuracy and the robustness of anomaly detection are improved.
In this embodiment, the weighted voting mechanism assigns weights to the models according to their performance, and the expression is:
$$S(x)=\sum_{i=1}^{n} w_i\, s_i(x)$$

where $S(x)$ represents the final anomaly score obtained by weighted voting over the prediction results of each model, $n$ is the number of models, $w_i$ is the weight of the $i$-th model, which determines its contribution to the final anomaly score, and $s_i(x)$ is the prediction result of the $i$-th model on the input feature vector $x$.
In heterogeneous ensemble learning, anomaly detection is performed by selecting a plurality of models, and weighted voting is performed based on their prediction results. The weight of each model may be determined based on its performance on the training set, and generally better performing models will be given higher weights.
The process of weighted voting is to multiply the predicted outcome of each model by its corresponding weight and then add them to get the final anomaly score. The prediction results of the models can be synthesized through weighted voting, and the accuracy and the robustness of anomaly detection are improved.
In this embodiment, a weighted voting mechanism is used to integrate the prediction results of the multiple models to obtain the final anomaly score. The method comprises the following steps:
1. weight determination: each model is assigned a weight that measures its contribution to the final anomaly score. The weights may be determined based on the performance of the model on the training set, such as accuracy, recall, F1 score, and the like. Better performing models may be given higher weights in order to affect the final anomaly score to a greater extent.
2. Fusion of prediction results: for a given input feature vector $x$, each model produces a prediction result $s_i(x)$ representing its judgment of the input. The prediction result of each model is then multiplied by the corresponding weight $w_i$, and the products are summed to obtain the anomaly score $S(x)$, as given by the expression of the weighted voting mechanism.
3. Abnormality determination: based on the final anomaly score $S(x)$, a threshold may be set to determine whether the input feature vector $x$ is abnormal. If $S(x)$ exceeds the threshold, it is marked as abnormal; otherwise, it is marked as normal.
By using a weighted voting mechanism, the prediction results of a plurality of models can be synthesized, thereby improving the accuracy and the robustness of anomaly detection.
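The weighted sum $S(x)=\sum_i w_i\,s_i(x)$ can be sketched in a few lines; the per-model scores and weights below are invented for illustration:

```python
def weighted_score(scores, weights):
    """S(x) = sum_i w_i * s_i(x): weighted fusion of per-model scores."""
    if len(scores) != len(weights):
        raise ValueError("one weight per model is required")
    return sum(w * s for w, s in zip(weights, scores))

# Hypothetical anomaly scores s_i(x) in [0, 1] for one feature vector x,
# from the time-series, rule, random-forest, and self-encoder models.
scores = [0.9, 1.0, 0.7, 0.8]
# Hypothetical weights from validation performance, normalized to sum to 1
weights = [0.2, 0.3, 0.3, 0.2]

S = weighted_score(scores, weights)
print(round(S, 2))  # 0.2*0.9 + 0.3*1.0 + 0.3*0.7 + 0.2*0.8 = 0.85
```

Normalizing the weights to sum to 1 keeps $S(x)$ on the same $[0,1]$ scale as the individual model scores, which makes the threshold in the next step easier to interpret.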
Step S8, judging the abnormal state of the feature vector according to the set threshold value;
In this step, the final anomaly score is compared against a set threshold to determine the abnormal state of the feature vector. If the final anomaly score exceeds the set threshold, the feature vector is judged abnormal; otherwise, it is judged normal. By setting an appropriate threshold, the detection accuracy and false alarm rate can be flexibly controlled according to actual requirements.
In this embodiment, the threshold is set to determine the abnormal state of the feature vector, and the expression is:
$$\mathrm{Label}(x)=\begin{cases}1, & S(x)>\tau\\ 0, & S(x)\le\tau\end{cases}$$

where $\mathrm{Label}(x)$ is the abnormal-state label of the feature vector $x$ (1 for abnormal, 0 for normal), $S(x)$ represents the anomaly score of $x$ obtained from the weighted voting mechanism, and $\tau$ represents the set threshold.
1. Anomaly score calculation: first, the weighted voting mechanism combines the prediction results of the multiple models to obtain the anomaly score $S(x)$ of the feature vector. This score indicates the degree to which the feature vector $x$ is judged to be anomalous.
2. Threshold setting: a threshold $\tau$ is set according to actual requirements to define the boundary between abnormal and normal. If the anomaly score $S(x)$ of the feature vector exceeds the threshold $\tau$, it is marked as abnormal; otherwise, it is marked as normal.
3. Threshold selection: choosing the threshold is a critical step and needs to be adjusted to the specific situation. If the detection requirements for anomalies are strict, a lower threshold may be chosen; if a higher tolerance for anomalies is acceptable, a higher threshold may be chosen. The threshold may be selected based on experience, or determined by trying different values and evaluating their performance.
By setting the threshold value, the feature vector can be classified according to the abnormality score, thereby judging the abnormal state thereof.
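The thresholding rule $\mathrm{Label}(x)$ can be sketched as a one-line comparison; the value of $\tau$ below is an arbitrary example:

```python
def label(score, tau):
    """Label(x) = 1 (abnormal) if S(x) > tau, else 0 (normal)."""
    return 1 if score > tau else 0

tau = 0.8                # example threshold; lower values give stricter detection
print(label(0.85, tau))  # 0.85 > 0.8 -> 1 (abnormal)
print(label(0.42, tau))  # 0.42 <= 0.8 -> 0 (normal)
```

Sweeping `tau` over a validation set and recording precision/recall at each value is one practical way to pick the operating point described in step 3.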
In general, the present invention can improve the accuracy of anomaly detection by integrating a plurality of different types of detectors and exploiting their complementary strengths. Each detector can capture different types of abnormal behavior, thereby reducing missed detections and false alarms. Because different types of detectors are integrated together, a greater variety of anomalies can be handled. For example, some detectors may excel at detecting a particular type of attack, while others are better at detecting abnormal user behavior. Such a diversified detector combination enhances the system's ability to perceive various abnormal conditions.
By deploying the integrated model into the fort machine operation and maintenance management system, user operation and system behavior can be monitored in real time, and abnormal conditions can be found in time. By establishing a feedback loop, the performance of the integrated model can be continuously optimized, and the safety and reliability of the system can be improved.
The fort machine operation and maintenance management system described below and the fort machine operation and maintenance management method described above correspond to each other and may be cross-referenced.
Referring to fig. 2, the present invention further provides a fort operation and maintenance management system, including:
a data collection module 100 for collecting the operation log of the fort machine;
the feature extraction module 200 is used for preprocessing the operation log data and extracting features;
a time series analysis training module 300 for training a time series analysis model;
a rule detection module 400, configured to perform rule detection;
a machine learning training module 500 for training a machine learning model;
the deep learning training module 600 is used for training a deep learning model;
an integration module 700 for employing a weighted voting mechanism;
the abnormality determination module 800 is configured to determine an abnormal state.
The apparatus of this embodiment may be used to execute the above method embodiments, and the principle and technical effects are similar, and are not repeated herein.
Referring to fig. 3, the present invention further provides a computer device 40, including: a processor 41 and a memory 42, the memory 42 storing a computer program executable by the processor, which when executed by the processor performs the method as described above.
The present invention also provides a storage medium 43, on which storage medium 43 a computer program is stored which, when run by a processor 41, performs a method as above.
The storage medium 43 may be implemented by any type of volatile or non-volatile Memory device or combination thereof, such as a static random access Memory (Static Random Access Memory, SRAM), an electrically erasable Programmable Read-Only Memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), an erasable Programmable Read-Only Memory (Erasable Programmable Read Only Memory, EPROM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), a magnetic Memory, a flash Memory, a magnetic disk, or an optical disk.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (10)
1. The operation and maintenance management method of the fort machine is characterized by comprising the following steps of:
collecting operation log data from the fort machine;
preprocessing operation log data and extracting features to obtain feature vectors;
training and predicting the feature vector based on the time sequence analysis model to identify a time anomaly;
based on the rule model, carrying out rule detection on the feature vector to identify a behavior violating the security policy;
training and predicting the feature vector based on the machine learning model to identify abnormal behavior;
training the feature vector and calculating a reconstruction error based on the deep learning model to identify abnormal behaviors;
according to the prediction result and the detection result, a weighted voting mechanism is adopted to generate a final anomaly score;
and judging the abnormal state of the feature vector according to the set threshold value.
2. The method for managing operation and maintenance of a fort machine according to claim 1, wherein the time series analysis model is an ARIMA model, and the expression is:
$$\hat{y}_t=\phi_1 y_{t-1}+\phi_2 y_{t-2}+\cdots+\phi_p y_{t-p}+\theta_1\varepsilon_{t-1}+\theta_2\varepsilon_{t-2}+\cdots+\theta_q\varepsilon_{t-q}+\varepsilon_t$$

where $\hat{y}_t$ represents the predicted value of the time series at time point $t$; $\phi_1,\phi_2,\ldots,\phi_p$ are the parameters of the autoregressive model, i.e. the autoregressive coefficients of the first $p$ time points; $y_{t-1},y_{t-2},\ldots,y_{t-p}$ represent the historical observed values at the first $p$ time points; $\theta_1,\theta_2,\ldots,\theta_q$ are the parameters of the moving-average model, i.e. the moving-average coefficients of the first $q$ time points; $\varepsilon_{t-1},\varepsilon_{t-2},\ldots,\varepsilon_{t-q}$ represent the residuals of the observed values at the previous $q$ time points; and $\varepsilon_t$ represents the residual term at time point $t$.
3. The method of claim 1, wherein the rule model is based on predefined security rules, and the expression is:
$$R(x)=\bigvee_{i=1}^{m}\bigwedge_{j=1}^{k_i} r_{ij}(x)$$

where $R(x)$ represents the result of rule detection on the input feature vector $x$, $m$ represents the number of rules, $k_i$ represents the number of conditions in the $i$-th rule, and $r_{ij}(x)$ is the $j$-th condition in the $i$-th rule, indicating whether the input feature vector $x$ satisfies that condition; the conditions within a rule are combined by logical AND, and the rules by logical OR.
4. The method of claim 1, wherein the machine learning model is a random forest model, and the expression is:
$$f(x)=\frac{1}{B}\sum_{b=1}^{B} f_b(x;\Theta_b)$$

where $f(x)$ represents the prediction for the input feature vector $x$, $B$ represents the number of decision trees in the random forest, $f_b(x;\Theta_b)$ is the prediction of the $b$-th decision tree, $\Theta_b$ represents the parameters of the $b$-th decision tree, and the factor $\frac{1}{B}\sum_{b=1}^{B}$ averages the predictions of all decision trees.
5. The method of claim 1, wherein the deep learning model is a self-encoder model, and the expression is:
$$L(x)=\left\|x-g(f(x;\theta_f);\theta_g)\right\|^2$$

where $L(x)$ represents the difference between the input feature vector $x$ and the output after model reconstruction, $g(f(x;\theta_f);\theta_g)$ is the reconstruction part of the deep learning model, $f(x;\theta_f)$ represents the output of the encoder, $\theta_f$ and $\theta_g$ represent the parameters of the encoder and decoder respectively, and $\|x-g(f(x;\theta_f);\theta_g)\|^2$ is the square of the Euclidean distance between the input feature vector $x$ and the reconstructed output.
6. The method of claim 1, wherein the weighted voting mechanism assigns weights to the models according to their performance, and the expression is:
$$S(x)=\sum_{i=1}^{n} w_i\, s_i(x)$$

where $S(x)$ represents the final anomaly score obtained by weighted voting over the prediction results of each model, $n$ is the number of models, $w_i$ is the weight of the $i$-th model, which determines its contribution to the final anomaly score, and $s_i(x)$ is the prediction result of the $i$-th model on the input feature vector $x$.
7. The method for managing operation and maintenance of a fort machine according to claim 1, wherein the threshold is set to determine an abnormal state of a feature vector, and the expression is:
here, label (x) is an abnormal state Label of the feature vector x, S (x) represents an abnormal score of the feature vector x, and τ represents a set threshold value, which is obtained according to a weighted voting mechanism.
8. A bastion machine operation and maintenance management system, based on the bastion machine operation and maintenance management method of any one of claims 1 to 7, comprising:
the data collection module is used for collecting operation logs of the fort machine;
the feature extraction module is used for preprocessing the operation log data and extracting features;
the time sequence analysis training module is used for training a time sequence analysis model;
the rule detection module is used for detecting rules;
the machine learning training module is used for training a machine learning model;
the deep learning training module is used for training a deep learning model;
the integration module is used for adopting a weighted voting mechanism;
and the abnormality judgment module is used for judging the abnormal state.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1-7 when executing the computer program.
10. A storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410055171.8A CN117874680A (en) | 2024-01-15 | 2024-01-15 | Operation and maintenance management system for fort machine |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117874680A true CN117874680A (en) | 2024-04-12 |
Family
ID=90584258
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118427815A (en) * | 2024-07-01 | 2024-08-02 | 苏州市吴江区公安局 | Police data anomaly detection method and system based on deep learning |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112527604A (en) * | 2020-12-16 | 2021-03-19 | 广东昭阳信息技术有限公司 | Deep learning-based operation and maintenance detection method and system, electronic equipment and medium |
CN113282474A (en) * | 2021-05-31 | 2021-08-20 | 长沙市到家悠享家政服务有限公司 | User behavior monitoring method, system, equipment and medium based on bastion machine |
US20210377282A1 (en) * | 2020-05-29 | 2021-12-02 | Cylance Inc. | Detecting Malware with Deep Generative Models |
CN115858606A (en) * | 2021-09-23 | 2023-03-28 | 中移动信息技术有限公司 | Method, device and equipment for detecting abnormity of time series data and storage medium |
CN116049765A (en) * | 2023-02-01 | 2023-05-02 | 中国联合网络通信集团有限公司 | Data analysis processing method, device and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||