CN116346456A

CN116346456A - Business logic vulnerability attack detection model training method and device

Info

Publication number: CN116346456A
Application number: CN202310293852.3A
Authority: CN
Inventors: 宋杨; 郑杭杰; 闫立志; 张超; 田涛; 王晖
Original assignee: China Construction Bank Corp; CCB Finetech Co Ltd
Current assignee: China Construction Bank Corp; CCB Finetech Co Ltd
Priority date: 2023-03-24
Filing date: 2023-03-24
Publication date: 2023-06-27

Abstract

The application discloses a method and device for training a business logic vulnerability attack detection model, in the technical field of network security. The method comprises: acquiring first service-related data recorded during historical runs of business logic processes labeled with different attack tags, where the attack tags indicate either the presence or the absence of an attack behavior exploiting a business logic vulnerability; extracting attack feature data from the first service-related data to obtain training samples comprising the attack tags and the feature data, where the attack feature data is feature data indicating whether a business logic vulnerability attack behavior exists in the service; and training the business logic vulnerability attack detection model with the attack feature data as input and the corresponding attack tags as target output. This can effectively improve the accuracy and efficiency of detecting attacks that exploit business logic vulnerabilities.

Description

Business logic vulnerability attack detection model training method and device

Technical Field

The application relates to the technical field of network security, and in particular to a method and device for training a business logic vulnerability attack detection model.

Background

Because services develop rapidly, application versions iterate ever faster, and developers' skill levels vary, program logic may be insufficiently rigorous or overly complex owing to incomplete consideration and similar causes. As a result, some logic branches are not handled normally or are handled incorrectly, giving rise to business logic vulnerabilities.

In many cases, attackers targeting a website scan its programs for vulnerabilities and exploit existing business logic vulnerabilities, causing leakage of enterprise information or serious business losses. Because such attacks have inconspicuous characteristics or occur infrequently, attacks exploiting business logic vulnerabilities are hard to find effectively and promptly with existing security products or alarms. At present they are detected manually, which consumes manpower and material resources and yields low detection accuracy and efficiency.

The technical problem to be solved is therefore how to improve the accuracy and efficiency of detecting attacks on business logic vulnerabilities.

Disclosure of Invention

The embodiments of the application provide a method and device for training a business logic vulnerability attack detection model, which are used to improve the accuracy and efficiency of detecting attacks on business logic vulnerabilities.

In a first aspect, an embodiment of the present application provides a method for training a business logic vulnerability attack detection model, including:

acquiring first service-related data recorded during historical runs of business logic processes labeled with different attack tags, where the attack tags indicate either the presence or the absence of a business logic vulnerability attack behavior;

extracting attack feature data from the first service-related data to obtain training samples comprising attack tags and feature data, where the attack feature data is feature data indicating whether a business logic vulnerability attack behavior exists in the service;

and training the business logic vulnerability attack detection model with the attack feature data as input and the corresponding attack tags as target output.

In some embodiments, the first service-related data includes service request messages and service response messages recorded during the business logic process, session data recorded in a database, personal information data in response packets, and URL addresses of accessed interfaces.

In some embodiments, the attack feature data includes static feature data, and before the attack feature data is extracted from the first service-related data, the method further includes:

analyzing the data source type of each piece of first service-related data, and determining whether the analyzed data source type meets the sample requirement;

extracting attack feature data from the first service-related data includes:

determining first service-related data whose data source type meets the sample requirement as first data, and extracting static features from the first data according to the static feature extraction rule corresponding to its data source type, where the static features are basic attribute information in the first data that indicates whether a vulnerability attack behavior exists in the service;

and determining first service-related data whose data source type does not meet the sample requirement as second data, and discarding the second data.

In some embodiments, the data source types include at least a traffic log type, an access record type of a WEB application protection system, and an alarm output type of security products that identify sensitive data.

In some embodiments, extracting static features from the first data according to the static feature extraction rule corresponding to the data source type includes:

traversing the first data according to predefined basic attribute information corresponding to each data source type, and marking static features that belong to the basic attribute information during the traversal;

performing a quality check on the marked static features by matching them against the basic attribute information corresponding to the data source type;

and if the quality check passes, taking the marked static features as attack feature data; otherwise, correcting the marked static features and taking the corrected static features as attack feature data.

In some embodiments, performing the quality check on the marked static features by matching them against the basic attribute information corresponding to the data source type includes:

determining, by matching the marked static features against the basic attribute information corresponding to the data source type, whether the extracted static features are complete and whether their data structure is correct;

correcting the marked static features includes:

supplementing any static features missing from the marked static features;

filling fields with incorrect data structures in the marked static features with default feature values.

In some embodiments, the attack feature data further includes statistical feature data;

after static features are extracted from the first data according to the static feature extraction rule corresponding to the data source type, the method further includes:

extracting statistical features from the first data, where the statistical features are computed from abnormal static features that appeared in the first data during historical business logic runs.

In some embodiments, training the business logic vulnerability attack detection model with the attack feature data as input and the corresponding attack tags as target output includes:

taking a preset proportion of the training samples as a training set, and training the business logic vulnerability attack detection model with the attack feature data in the training set as input and the corresponding attack tags as target output;

taking the remaining training samples as a validation set, inputting the attack feature data in the validation set into the business logic vulnerability attack detection model, and comparing the labels output by the model with the attack tags in the validation set to determine the training effect;

and, once a preset convergence condition is met, determining the current business logic vulnerability attack detection model as an initial detection model.

In some embodiments, after the current business logic vulnerability attack detection model is determined as the initial detection model, the method further includes:

acquiring unlabeled second service-related data recorded during the business logic process;

extracting attack feature data from the second service-related data to obtain detection samples;

inputting the detection samples into the initial detection model to obtain the attack tags it outputs for the second service-related data;

and acquiring attack tags obtained from a user's analysis of the second service-related data, comparing them with the output attack tags, and redefining the extraction rule of the attack feature data when the comparison results are inconsistent.

In some embodiments, redefining the extraction rule of the attack feature data when the comparison results are inconsistent includes:

when the comparison results are inconsistent, comparing the feature data that, according to the user's analysis of the second service-related data, indicates whether a vulnerability attack behavior exists in the service with the attack feature data extracted from the second service-related data;

and updating the extraction rule of the attack feature data according to the comparison result.

In some embodiments, after the extraction rule of the attack feature data is updated, the method further includes:

extracting updated attack feature data from third service-related data according to the updated extraction rule, to obtain training samples comprising attack tags and the updated attack feature data;

training the initial detection model with the updated attack feature data as input and the corresponding attack tags as target output;

and, once the preset convergence condition is met, determining the current initial detection model as the final business logic vulnerability detection model.

In some embodiments, the business logic vulnerability attack detection model is an XGBoost model.

In a second aspect, an embodiment of the present application provides a method for detecting business logic vulnerability attacks, including:

acquiring fourth service-related data recorded during the business logic process to be detected;

extracting attack feature data from the fourth service-related data, where the attack feature data is feature data indicating whether a business logic vulnerability attack behavior exists in the service;

inputting the attack feature data into a business logic vulnerability detection model obtained by the training method of any implementation of the first aspect;

and determining, according to the output of the business logic vulnerability detection model, whether the fourth service-related data contains an attack behavior against a business logic vulnerability.

In a third aspect, an embodiment of the present application provides a device for training a business logic vulnerability attack detection model, including:

a first acquisition module, configured to acquire first service-related data recorded during historical runs of business logic processes labeled with different attack tags, where the attack tags indicate either the presence or the absence of a business logic vulnerability attack behavior;

an extraction module, configured to extract attack feature data from the first service-related data to obtain training samples comprising attack tags and feature data, where the attack feature data is feature data indicating whether a business logic vulnerability attack behavior exists in the service;

and a training module, configured to train the business logic vulnerability attack detection model with the attack feature data as input and the corresponding attack tags as target output.

In a fourth aspect, an embodiment of the present application provides a device for detecting business logic vulnerability attacks, including:

an acquisition module, configured to acquire fourth service-related data recorded during the business logic process to be detected;

an extraction module, configured to extract attack feature data from the fourth service-related data, where the attack feature data is feature data indicating whether a business logic vulnerability attack behavior exists in the service;

an input module, configured to input the attack feature data into a business logic vulnerability detection model obtained by the training method of any implementation of the first aspect;

and a determining module, configured to determine, according to the output of the business logic vulnerability detection model, whether the fourth service-related data contains an attack behavior against a business logic vulnerability.

In a fifth aspect, embodiments of the present application provide an electronic device, including:

a memory for storing program instructions;

a processor, configured to invoke the program instructions stored in the memory and execute, according to the obtained program instructions, the steps of the method of any implementation of the first or second aspect.

In a sixth aspect, the present application provides a computer readable storage medium storing a computer program comprising program instructions which, when executed by a computer, cause the computer to perform the method of any one of the first or second aspects.

In a seventh aspect, the present application provides a computer program product comprising: computer program code which, when run on a computer, causes the computer to perform the method of any of the first or second aspects.

In the embodiments of the present application, first service-related data recorded during historical runs of business logic processes labeled with different attack tags is acquired, where the attack tags indicate either the presence or the absence of a business logic vulnerability attack behavior; attack feature data, that is, feature data indicating whether a business logic vulnerability attack behavior exists in the service, is extracted from the first service-related data to obtain training samples comprising the attack tags and the feature data; and the business logic vulnerability attack detection model is trained with the attack feature data as input and the corresponding attack tags as target output. By extracting attack feature data from business data labeled with attack tags and using it to train the detection model, the accuracy and efficiency of detecting attacks on business logic vulnerabilities can be effectively improved.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:

FIG. 1 is a flowchart of a method for training a business logic vulnerability attack detection model according to an embodiment of the present application;

FIG. 2 is a flowchart of a method for detecting business logic vulnerability attacks according to an embodiment of the present application;

FIG. 3 is a flowchart of attack feature extraction according to an embodiment of the present application;

FIG. 4 is a flowchart of model training according to an embodiment of the present application;

FIG. 5 is a flowchart of model testing and correction according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a device for training a business logic vulnerability attack detection model according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a device for detecting business logic vulnerability attacks according to an embodiment of the present application;

FIG. 8 is a schematic diagram of the hardware structure of an electronic device for implementing the method for training a business logic vulnerability attack detection model according to an embodiment of the present application.

Detailed Description

To improve the accuracy and efficiency of detecting attacks on business logic vulnerabilities, the embodiments of the present application provide a method and device for training a business logic vulnerability attack detection model.

The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the present disclosure without creative effort fall within the scope of protection of the present disclosure. Where no conflict arises, the embodiments and their features may be combined with one another arbitrarily. In addition, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that shown here.

The terms "first" and "second" in the description, the claims, and the drawings of the present application are used to distinguish between different objects, not to describe a particular sequential order. Furthermore, the term "include" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to the listed steps or elements, but may include other steps or elements that are not listed or that are inherent to such process, method, article, or apparatus. The term "plurality" in the present application means at least two, for example two, three, or more, and is not limited in the embodiments of the present application.

In the technical solution of the present application, the collection, transmission, and use of data all comply with the relevant national laws and regulations.

To facilitate understanding of the present application, the technical terms used herein are explained as follows:

XGBoost: an optimized distributed gradient boosting library. It implements machine learning algorithms under the Gradient Boosting framework and provides parallel tree boosting that can solve many data science problems quickly and accurately.

Business logic vulnerability: a vulnerability that arises when program logic is not rigorous or is too complex, so that some logic branches are not handled normally or are handled incorrectly.

The method for training a business logic vulnerability attack detection model and the method for detecting business logic vulnerability attacks may be applied to an electronic device, which may include a terminal or a server. The terminal may be a smartphone, a tablet computer, a personal digital assistant (PDA), or the like; the server may be an application server or a Web server. In addition, the training method and the detection method may be executed by the same electronic device or by different electronic devices.

FIG. 1 is a flowchart of a method for training a business logic vulnerability attack detection model according to an embodiment of the present application; the method includes the following steps.

In step 101, first service-related data recorded during historical runs of business logic processes labeled with different attack tags is acquired, where the attack tags indicate either the presence or the absence of a business logic vulnerability attack behavior.

In a specific implementation, the first service-related data may include service request messages and service response messages recorded while the business logic runs, session data recorded in a database, personal information data in response packets, URL addresses of accessed interfaces, and the like.

In a specific implementation, whether each piece of first service-related data contains an attack behavior can be analyzed in advance, and each piece is labeled according to the actual behavior: data with a business logic vulnerability attack behavior is given an attack tag, and data without one is given a non-attack tag. For example, "1" may denote an attack tag and "0" a non-attack tag; other identifiers may also be used to tag the first service-related data, which is not specifically limited in the embodiments of the present application.
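A minimal Python sketch of one labeled piece of first service-related data, assuming a single record bundles the kinds of data listed above. The class `ServiceRecord`, its field names, the `source_type` value, and the `label_record` helper are hypothetical; only the 1/0 tag encoding comes from the text.

```python
from dataclasses import dataclass, field
from typing import Optional

ATTACK, NON_ATTACK = 1, 0  # tag encoding from the example in the text


@dataclass
class ServiceRecord:
    """One piece of first service-related data (hypothetical field names)."""
    request: str                                       # service request message
    response: str                                      # service response message
    url: str                                           # URL address of the accessed interface
    session: dict = field(default_factory=dict)        # session data from the database
    personal_info: dict = field(default_factory=dict)  # personal data in the response packet
    source_type: str = "traffic_log"                   # data source type (hypothetical name)
    label: Optional[int] = None                        # ATTACK, NON_ATTACK, or None if unlabeled


def label_record(record: ServiceRecord, is_attack: bool) -> ServiceRecord:
    """Attach the tag decided by prior analysis of the record's actual behavior."""
    record.label = ATTACK if is_attack else NON_ATTACK
    return record


# Illustrative data: session user 1001 requests order 1002, which belongs to another user.
rec = ServiceRecord(request="GET /api/order?id=1002", response="200 OK",
                    url="/api/order", session={"user_id": "1001"})
label_record(rec, is_attack=True)
```

Keeping `label` optional lets the same structure carry the unlabeled second service-related data used later for verification.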

In step 102, attack feature data is extracted from the first service-related data to obtain training samples including attack tags and feature data, where the attack feature data is feature data indicating whether a business logic vulnerability attack behavior exists in the service.

In a specific implementation, the attack feature data includes static feature data and statistical feature data. Static features are basic attribute information indicating whether a vulnerability attack behavior exists in the service. Statistical features are computed from abnormal static features that appeared in service-related data during historical business logic runs, and provide a further basis for judging whether such an attack behavior exists in the service.

In a specific embodiment, the data source types of the first service-related data may include, but are not limited to, a traffic log type, an access record type of a WEB application protection system, and an alarm output type of security products that identify sensitive data; service-related data from other data source types may also be used.

In a specific implementation, before attack feature data is extracted from the first service-related data, the data source type of each piece of first service-related data may be analyzed to determine whether it meets the sample requirement. Then, during extraction, first service-related data whose data source type meets the sample requirement is determined as first data, and static features are extracted from it according to the static feature extraction rule corresponding to its data source type; first service-related data whose data source type does not meet the sample requirement is determined as second data and discarded. Screening service-related data by data source type in this way improves sample quality and thus the training effect of the model.
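The screening step can be sketched as a simple partition. This assumes each record is a dict with a `source_type` key and that "meeting the sample requirement" means belonging to an accepted set of types; the type identifiers and the function name `screen_by_source_type` are hypothetical.

```python
from typing import Iterable

# Hypothetical identifiers for the source types listed in the text; treating
# membership in this set as "meeting the sample requirement" is an assumption.
ACCEPTED_SOURCE_TYPES = {
    "traffic_log",      # traffic log type
    "waf_access",       # access records of a WEB application protection system
    "sensitive_alert",  # alarm output of security products that identify sensitive data
}


def screen_by_source_type(records: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Split records into first data (kept for extraction) and second data (discarded)."""
    first_data, second_data = [], []
    for record in records:
        target = first_data if record.get("source_type") in ACCEPTED_SOURCE_TYPES else second_data
        target.append(record)
    return first_data, second_data


kept, dropped = screen_by_source_type([
    {"source_type": "traffic_log"},
    {"source_type": "waf_access"},
    {"source_type": "unknown"},
])
# len(kept) == 2, len(dropped) == 1
```

Returning the second data instead of silently dropping it makes it easy to log what was discarded before it is thrown away.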

In a specific implementation, static features may be extracted from the first data through the following steps:

Step one: traverse the first data according to the predefined basic attribute information corresponding to each data source type, and mark static features that belong to the basic attribute information during the traversal.

Step two: perform a quality check on the marked static features by matching them against the basic attribute information corresponding to the data source type.

For example, the matching determines whether the extracted static features are complete and whether their data structure is correct. If the quality check fails, the marked static features are corrected: specifically, missing static features may be supplemented, or incorrect data structures in the marked static features may be filled with default feature values.

Step three: once the quality check passes, take the marked static features (or the corrected static features, if correction was needed) as attack feature data.
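The three steps can be sketched in Python as follows. The schema `BASIC_ATTRIBUTES`, its field names and expected types, the `DEFAULTS` values, and the function name are all hypothetical. As a simplification, this sketch supplements missing features with the same defaults used for malformed ones; a real system might instead re-fetch them from the raw log.

```python
# Hypothetical basic attribute schema per data source type: field name -> expected type.
BASIC_ATTRIBUTES = {
    "traffic_log": {"method": str, "url": str, "status": int, "user_id": str},
    "waf_access": {"client_ip": str, "url": str, "rule_id": str},
}
DEFAULTS = {str: "", int: -1}  # hypothetical default feature values per expected type


def extract_static_features(record: dict) -> dict:
    schema = BASIC_ATTRIBUTES[record["source_type"]]
    # Step one: traverse the record and mark fields that are basic attributes.
    marked = {key: value for key, value in record.items() if key in schema}
    # Step two: quality check for completeness and correct data structure (type).
    missing = [key for key in schema if key not in marked]
    malformed = [key for key, value in marked.items() if not isinstance(value, schema[key])]
    # Correction: supplement missing features and replace malformed ones with
    # the default value for the expected type (simplification of this sketch).
    for key in missing + malformed:
        marked[key] = DEFAULTS[schema[key]]
    # Step three: the checked (or corrected) features become attack feature data.
    return marked


features = extract_static_features(
    {"source_type": "traffic_log", "method": "GET", "url": "/api/order", "status": "200"}
)
# {'method': 'GET', 'url': '/api/order', 'status': -1, 'user_id': ''}
```

Here `status` arrives as a string, fails the type check, and is replaced by the default, while the absent `user_id` is supplemented.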

In a specific implementation, after static features are extracted from the first data, statistical features are also extracted. Statistical feature extraction may likewise be done by marking, for example by marking statistical feature fields in the first data.
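A sketch of statistical feature marking, assuming the statistics are counted per user over historical rows. The abnormality rule in `is_abnormal` (an error status, or a request for a resource owned by a different user), the field names, and the `stat_*` feature names are hypothetical illustrations, not the rule defined by the application.

```python
from collections import Counter


def is_abnormal(features: dict) -> bool:
    """Hypothetical abnormality rule: an error status, or a request for a
    resource owned by a user other than the session user."""
    target = features.get("target_user")
    return features.get("status", 0) >= 400 or (
        target is not None and target != features["user_id"]
    )


def mark_statistical_features(rows: list[dict]) -> list[dict]:
    """Mark each row in place with per-user statistics of abnormal static features."""
    abnormal = Counter(r["user_id"] for r in rows if is_abnormal(r))
    total = Counter(r["user_id"] for r in rows)
    for r in rows:
        r["stat_abnormal_count"] = abnormal[r["user_id"]]
        r["stat_abnormal_ratio"] = abnormal[r["user_id"]] / total[r["user_id"]]
    return rows


rows = mark_statistical_features([
    {"user_id": "u1", "status": 200, "target_user": "u1"},
    {"user_id": "u1", "status": 200, "target_user": "u2"},  # reads another user's data
    {"user_id": "u2", "status": 403},
])
```

Counting per user turns a single suspicious request into context: one cross-user read among many normal requests yields a low ratio, while repeated ones yield a high ratio.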

In step 103, the business logic vulnerability attack detection model is trained with the attack feature data as input and the corresponding attack tags as target output.

In a specific implementation, a preset proportion of the training samples, for example 80%, may be used as the training set: the attack feature data in the training set is used as the input of the business logic vulnerability attack detection model, and the model is trained with the corresponding attack tags as target output. The remaining training samples, for example the other 20%, are used as the validation set: their attack feature data is input into the model, and the training effect is determined by comparing the labels output by the model with the attack tags in the validation set. Once a preset convergence condition is met, the current model is determined as the initial detection model. The preset convergence condition may be, for example, that the accuracy of the trained model exceeds a preset threshold (such as 95%), or that the number of training iterations reaches a preset number; it can be set by a technician according to the actual situation.
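The split-train-validate loop can be sketched with the standard library only. The hook `fit_round(train_set, round_no)` is a hypothetical interface that would wrap the real XGBoost training call; `fit_threshold` is a deliberately trivial stand-in so the sketch runs without extra packages. The 80% split and the 95% accuracy threshold follow the examples in the text.

```python
import random
from typing import Callable, Sequence

Sample = tuple[list[float], int]          # (attack feature vector, attack tag)
Predictor = Callable[[list[float]], int]


def split_samples(samples: Sequence[Sample], train_ratio: float = 0.8, seed: int = 0):
    """Shuffle, then use the preset proportion as the training set, the rest for validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict: Predictor, data: Sequence[Sample]) -> float:
    return sum(predict(x) == y for x, y in data) / len(data)


def train_until_converged(samples, fit_round: Callable[[list, int], Predictor],
                          target_acc: float = 0.95, max_rounds: int = 100):
    """Retrain until validation accuracy exceeds target_acc or max_rounds is reached."""
    train_set, val_set = split_samples(samples)
    predict, acc, rounds = None, 0.0, 0
    for rounds in range(1, max_rounds + 1):
        predict = fit_round(train_set, rounds)
        acc = accuracy(predict, val_set)
        if acc > target_acc:
            break
    return predict, acc, rounds


def fit_threshold(train_set, _round):
    """Stand-in for an XGBoost fit: a single threshold on the first feature."""
    neg = max(x[0] for x, y in train_set if y == 0)
    pos = min(x[0] for x, y in train_set if y == 1)
    cut = (neg + pos) / 2
    return lambda x: int(x[0] > cut)


data = [([float(i)], 0) for i in range(50)] + [([float(i)], 1) for i in range(100, 150)]
model, acc, rounds = train_until_converged(data, fit_threshold)
# acc == 1.0 and rounds == 1 for this cleanly separable toy data
```

Passing the trainer in as a function keeps the convergence logic independent of the model library.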

After the initial detection model is obtained, it can be further verified to ensure its accuracy.

In a specific implementation, unlabeled second service-related data recorded during the business logic process may be acquired, and attack feature data extracted from it to obtain detection samples. The detection samples are input into the initial detection model to obtain the attack tags it outputs for the second service-related data. The attack tags obtained from a user's analysis of the same data are then acquired and compared with the output tags to verify the training effect of the model, and the extraction rule of the attack feature data is redefined when the comparison results are inconsistent.

An inconsistent comparison result means, for example, that for a piece of second service-related data the model outputs an attack tag while the user's analysis yields a non-attack tag, or the model outputs a non-attack tag while the user's analysis yields an attack tag.
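The two kinds of inconsistency described above (false alarms and misses) can be separated with a short comparison. This sketch assumes model and user tags are keyed by a hypothetical record ID and use the 1/0 encoding; the function name is hypothetical.

```python
def compare_with_user(predicted: dict[str, int], user_labels: dict[str, int]):
    """Compare model tags (1 = attack, 0 = non-attack) with tags from the
    user's analysis, keyed by a hypothetical record ID."""
    false_alarms = sorted(rid for rid, tag in predicted.items()
                          if tag == 1 and user_labels.get(rid) == 0)
    misses = sorted(rid for rid, tag in predicted.items()
                    if tag == 0 and user_labels.get(rid) == 1)
    # Any inconsistency triggers a review of the feature extraction rule.
    needs_rule_update = bool(false_alarms or misses)
    return false_alarms, misses, needs_rule_update


false_alarms, misses, update = compare_with_user(
    {"r1": 1, "r2": 0, "r3": 1},
    {"r1": 0, "r2": 1, "r3": 1},
)
# false_alarms == ['r1'], misses == ['r2'], update is True
```

Keeping the two lists apart helps the subsequent rule update: misses suggest a feature is absent, while false alarms suggest a feature is misleading.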

Therefore, to improve the accuracy of the model, when the comparison results are inconsistent, the feature data that, according to the user's analysis of the second service-related data, indicates whether a vulnerability attack behavior exists in the service can be compared with the attack feature data extracted from the second service-related data. The extraction rule of the attack feature data is then updated according to the comparison result, for example by adding new attack feature data to the original attack feature data or by deleting certain attack feature data.
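One possible update policy, sketched with sets of feature names: add features the user relied on that extraction missed, and drop extracted features only when the user explicitly flags them as irrelevant. The policy, the `irrelevant` parameter, the function name, and the feature names are hypothetical; the text only says features may be added or deleted.

```python
def update_extraction_rule(rule: set[str], analyst_features: set[str],
                           extracted_features: set[str],
                           irrelevant: frozenset[str] = frozenset()) -> set[str]:
    """Return an updated set of feature names to extract (hypothetical policy)."""
    # Features the user relied on that the current rule failed to extract.
    to_add = analyst_features - extracted_features
    # Extracted features the user did not rely on and explicitly flagged as irrelevant.
    to_drop = (extracted_features - analyst_features) & irrelevant
    return (rule | to_add) - to_drop


new_rule = update_extraction_rule(
    rule={"url", "status", "user_id"},
    analyst_features={"user_id", "target_user"},
    extracted_features={"url", "status", "user_id"},
    irrelevant=frozenset({"status"}),
)
# new_rule == {'url', 'user_id', 'target_user'}
```

Requiring an explicit `irrelevant` flag before deletion is a conservative choice: a feature the user simply did not mention is kept.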

In a specific implementation, after the extraction rule of the attack feature data is updated, updated attack feature data may be extracted from third service-related data according to the updated extraction rule to obtain training samples including attack tags and the updated attack feature data. The initial detection model is then trained with the updated attack feature data as input and the corresponding attack tags as target output, and once the preset convergence condition is met, the current initial detection model is determined as the final business logic vulnerability detection model. The third service-related data may be the first service-related data, the second service-related data, or both, or other service-related data recorded during historical runs of business logic processes labeled with different attack tags that has not been used as training samples; this is not limited herein.

In a specific implementation, an XGBoost model may be selected as the business logic vulnerability detection model. XGBoost is an optimized distributed gradient boosting library that implements machine learning algorithms under the Gradient Boosting framework and provides parallel tree boosting, which can solve many data science problems quickly and accurately. Machine learning is a multidisciplinary field drawing on probability theory, statistics, approximation theory, and complex algorithms; it uses computers as tools to simulate the way humans learn, organizing existing knowledge into structures that improve learning efficiency. Using XGBoost can further improve the training speed of the model.
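To illustrate what "gradient boosting" means here, the following is a toy gradient boosting classifier built from decision stumps, in plain Python. It is not XGBoost: XGBoost additionally uses second-order gradient information, regularized leaf weights, deeper trees, and parallel split finding. The function names, `n_rounds`, and `learning_rate` are choices of this sketch.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Regression stump: the (feature, threshold) split with the lowest squared
    error on the residuals; each leaf predicts the mean residual on its side."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [r for x, r in zip(X, residuals) if x[j] <= t]
            right = [r for x, r in zip(X, residuals) if x[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    if best is None:  # all feature values identical: fall back to a constant leaf
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, j, t, lv, rv = best
    return lambda x: lv if x[j] <= t else rv


def gradient_boost(X, y, n_rounds=20, learning_rate=0.3):
    """Binary classifier built from stumps fitted to the negative gradient of log loss."""
    stumps = []

    def raw_score(x):
        return sum(learning_rate * stump(x) for stump in stumps)

    for _ in range(n_rounds):
        # For log loss, the negative gradient w.r.t. the raw score is y - sigmoid(score).
        residuals = [yi - sigmoid(raw_score(xi)) for xi, yi in zip(X, y)]
        stumps.append(fit_stump(X, residuals))
    return lambda x: int(sigmoid(raw_score(x)) > 0.5)


X = [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [13.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
clf = gradient_boost(X, y)
```

Each round fits a new weak learner to what the current ensemble still gets wrong, which is the core idea XGBoost optimizes. In practice, the xgboost package's `XGBClassifier` (scikit-learn-style `fit`/`predict`) would replace this toy.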

In addition, after training is completed, the model file of the business logic vulnerability detection model may be stored in a database so that the system can load it during detection.
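A minimal sketch of storing and loading a model file in a database, using Python's built-in `sqlite3` as a stand-in for whatever database the system actually uses. The table name `models`, the column names, and the model name `blv_detector` are hypothetical, and the serialized "model" is just JSON parameters for illustration. For a real XGBoost model, the bytes could come from `Booster.save_raw()` in the xgboost Python package.

```python
import json
import sqlite3


def save_model(conn: sqlite3.Connection, name: str, model_file: bytes) -> None:
    """Store (or overwrite) a serialized model under a name."""
    conn.execute("CREATE TABLE IF NOT EXISTS models (name TEXT PRIMARY KEY, model_file BLOB)")
    conn.execute("INSERT OR REPLACE INTO models (name, model_file) VALUES (?, ?)",
                 (name, model_file))
    conn.commit()


def load_model(conn: sqlite3.Connection, name: str):
    """Return the stored bytes, or None if no model with that name exists."""
    row = conn.execute("SELECT model_file FROM models WHERE name = ?", (name,)).fetchone()
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:")
save_model(conn, "blv_detector", json.dumps({"threshold": 74.5}).encode())
params = json.loads(load_model(conn, "blv_detector"))
```

`INSERT OR REPLACE` lets a retrained model (for example, the final model after rule updates) overwrite the earlier version under the same name.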

In the embodiments of the present application, the business logic vulnerability detection model is obtained through model training, and using it can effectively improve the accuracy and efficiency of detecting attacks on business logic vulnerabilities.

FIG. 2 is a flowchart of a method for detecting business logic vulnerability attacks according to an embodiment of the present application; the method includes the following steps.

In step 201, fourth service-related data recorded during the business logic process to be detected is acquired.

In step 202, attack feature data is extracted from the fourth service-related data, where the attack feature data is feature data indicating whether a business logic vulnerability attack behavior exists in the service.

In step 203, the attack characteristic data is input into a pre-trained business logic vulnerability attack detection model for detection.

In step 204, whether the fourth service-related data contains an attack behavior against a business logic vulnerability is determined according to the output of the business logic vulnerability detection model.

In this way, extracting attack feature data from the service-related data to be detected and feeding it into the pre-trained business logic vulnerability attack detection model improves both the accuracy and the efficiency of detecting attacks on business logic vulnerabilities.
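Steps 201 to 204 can be sketched as a small pipeline. This assumes the model exposes a score between 0 and 1 and that a threshold turns the score into an attack / no-attack decision; `detect`, the toy extractor, and the toy scorer are hypothetical stand-ins, not the applicant's interfaces.

```python
from typing import Callable, List


def detect(record: dict,
           extract_features: Callable[[dict], List[float]],
           score: Callable[[List[float]], float],
           threshold: float = 0.5) -> bool:
    """Decide whether one record of service-related data (step 201) is an attack."""
    features = extract_features(record)  # step 202: extract attack feature data
    probability = score(features)        # step 203: run the pre-trained model
    return probability >= threshold      # step 204: decide from the model output


# Hypothetical toy extractor and scorer, for illustration only.
def toy_extract(record: dict) -> List[float]:
    return [float(record["url_isadmin"]), float(record["request_user"] is None)]


def toy_score(features: List[float]) -> float:
    return 0.9 if features[0] and features[1] else 0.1
```

In a real deployment `score` would wrap the trained model loaded from the database, and the threshold would be tuned on validation data.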

The embodiment of the present application is described in detail below, taking an XGBoost model as the business logic vulnerability attack detection model. The generation of the model is described in three stages: training sample preparation, model training, and model testing and correction.

The first stage: a preparation stage of training samples.

In a specific implementation, the first service-related data required to train the model can be labeled using existing systems that support labeling. That is, data from specific types of data sources is labeled, and a tag indicating whether it constitutes an attack on a business logic vulnerability is added according to the actual impact of that data. Before model training, the corresponding logs must also be parsed for each data source and the attack feature data extracted from them.

Attack feature data can be divided into non-statistical feature data (static feature data) and statistical feature data.

1. Non-statistical class features are as follows:

1) Source IP of the access: the "s_real_ip" field;

2) Source device identifier of the access: the "s_device_id" field;

3) Logged-in user confirmed by the sessionid check: the "request_user" field;

4) Target URL of the access: the "url" field;

5) Label 1 of the target URL, whether the URL requires login: the "url_islogin" field;

6) Label 2 of the target URL, whether it is a path that only administrators can access: the "url_isadmin" field;

7) Label 3 of the target URL, whether it returns personal or sensitive information: the "url_ispii" field;

8) Label 4 of the target URL, whether it is the login page: the "url_logic" field;

9) Status code of the access, e.g., 200 or 404 (shows whether the data was actually accessed): the "response_code" field.

2. Statistical class features are as follows:

1) Abnormal device use: the same device ("s_device_id") frequently switches IP addresses ("s_real_ip"), for example by using a proxy address pool;

2) Abnormal device use: frequent switching between multiple devices ("s_device_id") and the IP ("s_real_ip") is discovered;

3) The same source (source address "s_real_ip", source device "s_device_id") accesses login-required pages ("url_islogin"), administrator pages ("url_isadmin"), and sensitive-information pages ("url_ispii"), and the failure rate indicated by "response_code" is high;

4) The same source IP or device accesses pages that require login ("url_islogin", "url_isadmin", "url_ispii") without a session-id and without a "request_user";

5) The same source IP or device has accessed "url_islogin", "url_isadmin", and "url_ispii" pages without having accessed "url_login";

6) The "response_user" information returned differs from the "request_user" who made the request;

7) A large number of "url_islogin", "url_isadmin", and "url_ispii" pages are accessed within a short time.
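Several of the statistical features above can be computed by grouping static-feature records per device. The sketch below assumes all records fall inside one time window, treats status codes of 400 and above as failures, and uses hypothetical output names; it covers features 1, 3, 4, and 7 only.

```python
from collections import defaultdict
from typing import Dict, List


def statistical_features(records: List[dict]) -> Dict[str, dict]:
    """Aggregate hypothetical per-device statistics from static-feature records."""
    ips = defaultdict(set)
    protected_hits = defaultdict(int)
    protected_failures = defaultdict(int)
    anonymous_hits = defaultdict(int)
    for r in records:
        dev = r["s_device_id"]
        ips[dev].add(r["s_real_ip"])
        protected = r["url_islogin"] or r["url_isadmin"] or r["url_ispii"]
        if protected:
            protected_hits[dev] += 1
            if r["response_code"] >= 400:      # assumption: 4xx/5xx count as failures
                protected_failures[dev] += 1
            if not r.get("request_user"):     # no logged-in user behind the request
                anonymous_hits[dev] += 1
    result = {}
    for dev in ips:
        hits = protected_hits[dev]
        result[dev] = {
            "distinct_ips": len(ips[dev]),                  # feature 1
            "protected_hits": hits,                         # feature 7 (within the window)
            "protected_failure_rate": protected_failures[dev] / hits if hits else 0.0,  # feature 3
            "anonymous_protected_hits": anonymous_hits[dev],  # feature 4
        }
    return result
```

Grouping by "s_real_ip" instead of "s_device_id" gives the same statistics for a source IP.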

In specific implementation, after the statistical features and the non-statistical features are determined, the attack feature extraction may be performed according to the following procedure, as shown in fig. 3, where fig. 3 is a flowchart of attack feature extraction provided in the embodiment of the present application, and includes the following steps.

In step 301, the first service-related data, recorded while historically running business logic with different attack tags, is parsed to determine its data source type. The attack tags indicate either the presence or the absence of business logic vulnerability attack behavior.

For example, the feature value corresponding to the data source type field is extracted.

In step 302, it is determined whether the data source type of the first service related data meets the sample requirement, if yes, step 303 is entered, and if not, step 306 is entered.

In step 303, according to a preset non-statistical feature extraction rule corresponding to the data source type, non-statistical feature extraction is performed on the first service related data meeting the sample requirement.

In a specific implementation, the non-statistical feature extraction rule corresponding to the data source type can be loaded, and the first service-related data is then parsed according to that rule, for example by JSON parsing or regular-expression parsing.
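A minimal sketch of loading an extraction rule by data source type, covering steps 302, 303, and 306. The source type names, the JSON keys, and the access-line layout are all hypothetical; real rules would follow each product's actual log format.

```python
import json
import re
from typing import Callable, Dict, Optional

# Hypothetical access-record layout: "<ip> <device_id> <url> <status>"
ACCESS_LINE = re.compile(r"^(?P<ip>\S+) (?P<device>\S+) (?P<url>\S+) (?P<code>\d{3})$")


def parse_json_log(raw: str) -> Optional[dict]:
    """JSON parsing rule for a hypothetical traffic-log source."""
    doc = json.loads(raw)
    return {"s_real_ip": doc.get("src_ip"), "s_device_id": doc.get("device_id"),
            "url": doc.get("url"), "response_code": doc.get("status")}


def parse_access_line(raw: str) -> Optional[dict]:
    """Regular-expression rule for a hypothetical WAF access-record source."""
    m = ACCESS_LINE.match(raw)
    if m is None:
        return None  # line does not match the expected layout
    return {"s_real_ip": m["ip"], "s_device_id": m["device"],
            "url": m["url"], "response_code": int(m["code"])}


# Hypothetical mapping from data source type to its extraction rule.
RULES: Dict[str, Callable[[str], Optional[dict]]] = {
    "traffic_log": parse_json_log,
    "waf_access": parse_access_line,
}


def extract_static(source_type: str, raw: str) -> Optional[dict]:
    rule = RULES.get(source_type)
    if rule is None:
        return None  # source type does not meet the sample requirement: discard
    return rule(raw)
```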

In step 304, a quality check is performed on the extracted non-statistical class features.

For example, check whether the extracted non-statistical features are complete and whether their data structure is correct; missing features are supplemented, and features with an incorrect data structure are filled with default feature values.
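The quality check in step 304 can be sketched as a schema pass over each extracted record. The expected types and default values below are assumptions chosen for illustration; missing and wrongly typed fields are both replaced by the default, and the names of fixed fields are reported.

```python
from typing import Dict, List, Tuple

# Hypothetical schema: field name -> (expected type, default value)
SCHEMA: Dict[str, Tuple[type, object]] = {
    "s_real_ip": (str, ""),
    "s_device_id": (str, ""),
    "request_user": (str, ""),
    "url": (str, ""),
    "url_islogin": (int, 0),
    "url_isadmin": (int, 0),
    "url_ispii": (int, 0),
    "url_logic": (int, 0),
    "response_code": (int, 0),
}


def quality_check(features: dict) -> Tuple[dict, List[str]]:
    """Fill missing or wrongly typed fields with defaults; report which were fixed."""
    checked, fixed = {}, []
    for name, (expected_type, default) in SCHEMA.items():
        value = features.get(name)
        if type(value) is expected_type:  # exact type check, so bool is not accepted as int
            checked[name] = value
        else:
            checked[name] = default
            fixed.append(name)
    return checked, fixed
```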

In step 305, after the quality check passes, statistical feature extraction is performed on the first service-related data from which the non-statistical features were extracted, yielding training samples that contain the attack tags and the feature data.

In a specific implementation, the first service-related data from which the non-statistical and statistical features have been extracted can be encoded to match the input format of the model.
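One possible encoding, sketched under the assumption that the model takes a flat numeric vector: binary labels are copied as-is, the status code and login state become flags, string identifiers are one-hot encoded into a small number of hashed buckets, and statistical features are appended in a fixed key order. The bucket count and field groupings are hypothetical.

```python
import hashlib
from typing import Dict, List

NUMERIC = ["url_islogin", "url_isadmin", "url_ispii", "url_logic"]
HASHED = ["s_real_ip", "s_device_id", "url"]
N_BUCKETS = 8  # hypothetical; a real system would size this to its data


def bucket(value: str, n: int = N_BUCKETS) -> int:
    # Stable hash (Python's built-in hash() of str is salted per process).
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16) % n


def encode(static: dict, stats: Dict[str, float]) -> List[float]:
    vec = [float(static[k]) for k in NUMERIC]
    vec.append(1.0 if static["response_code"] >= 400 else 0.0)  # failed access flag
    vec.append(0.0 if static["request_user"] else 1.0)         # anonymous access flag
    for k in HASHED:
        one_hot = [0.0] * N_BUCKETS
        one_hot[bucket(static[k])] = 1.0
        vec.extend(one_hot)
    vec.extend(float(stats[k]) for k in sorted(stats))  # fixed order for the model
    return vec
```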

In step 306, the first traffic related data of the data source type that does not meet the sample requirement is discarded.

The second stage: model training stage.

In specific implementation, after the training samples are determined, model training may be performed according to the following procedure, as shown in fig. 4, where fig. 4 is a model training flowchart provided in an embodiment of the present application, and includes the following steps.

In step 401, a preset proportion of the training sample is used as a training set, attack feature data in the training set is used as input of an XGBoost model, and model training is performed with the output of a corresponding attack tag as a target.
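Splitting the samples by a preset proportion can be sketched as a seeded shuffle followed by a cut. The 0.8 ratio and the seed are hypothetical defaults; a fixed seed keeps the split reproducible between runs.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_samples(samples: Sequence[T], train_ratio: float = 0.8,
                  seed: int = 42) -> Tuple[List[T], List[T]]:
    """Return (training set, validation set) using a preset proportion."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # local RNG, so global state is untouched
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```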

In a specific implementation, the parameters of the XGBoost algorithm, such as tree depth and the number of training rounds, are read first, and the model is then trained on the training set using these parameters.

In step 402, the remaining training samples except the training set are used as a verification set, attack characteristic data in the verification set is input into the XGBoost model, and the training effect is determined by comparing the output label of the XGBoost model with the attack label in the verification set.

In step 403, when it is determined that the preset convergence condition is satisfied, the current XGBoost model is determined as the initial detection model.

In a specific implementation, if the accuracy is higher than a preset threshold, for example 95%, the current XGBoost model is determined as the initial detection model, and the model file corresponding to the initial detection model is stored in a database so that the system can load it during detection.
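The convergence check in step 403 can be sketched as an accuracy comparison on the validation set. This assumes binary tags (1 for an attack, 0 otherwise) and uses the 95% example threshold with a strict "higher than" comparison, as in the text; the function names are hypothetical.

```python
from typing import Sequence


def accuracy(predicted: Sequence[int], expected: Sequence[int]) -> float:
    """Fraction of validation samples whose predicted tag matches the true tag."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


def is_converged(predicted: Sequence[int], expected: Sequence[int],
                 threshold: float = 0.95) -> bool:
    # Strictly higher than the preset threshold.
    return accuracy(predicted, expected) > threshold
```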

The third stage: model testing and correction stage.

In specific implementation, after the initial detection model is determined, the test and correction of the model may be performed according to the following procedure, as shown in fig. 5, and fig. 5 is a flowchart of the test and correction of the model provided in the embodiment of the present application, including the following steps.

In step 501, second service-related data recorded while running business logic without attack tags is obtained.

In step 502, attack characteristic data is extracted from the second service related data to obtain a detection sample.

In step 503, the detection samples are input into the initial detection model to obtain the attack tags it outputs for the second service-related data.

In step 504, the attack tags obtained by a user analyzing the second service-related data are acquired and compared with the attack tags output by the initial detection model.
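The comparison in steps 504 and 505 can be sketched as finding the records on which the model and the analyst disagree, so that the consistency check reduces to whether that list is empty. Record IDs and the dictionary layout are hypothetical; a record the model did not tag counts as a disagreement.

```python
from typing import Dict, List


def compare_tags(model_tags: Dict[str, int], analyst_tags: Dict[str, int]) -> List[str]:
    """Return IDs of records whose model tag differs from the analyst's tag."""
    return sorted(rid for rid, tag in analyst_tags.items() if model_tags.get(rid) != tag)
```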

In step 505, it is determined whether the comparison results are consistent; if not, the process proceeds to step 506, and if so, the process proceeds to step 510.

In step 506, the feature data that, according to the user's analysis of the second service-related data, indicates whether the service exhibits vulnerability attack behavior is compared with the attack feature data extracted from the second service-related data, and the extraction rule for the attack feature data is updated according to the comparison result.
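One simple way to turn the comparison in step 506 into a rule update, sketched under the assumption that an extraction rule can be represented as the set of feature names it produces: features the analyst found relevant but the rule does not extract are added, and features the rule extracts but the analyst did not use are reported and optionally dropped. This set-based representation is a hypothetical simplification.

```python
from typing import Set, Tuple


def update_rules(current: Set[str], analyst_relevant: Set[str],
                 drop_irrelevant: bool = False) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (updated feature set, features added, features the analyst did not use)."""
    missing = analyst_relevant - current   # relevant but not yet extracted
    unused = current - analyst_relevant    # extracted but not used by the analyst
    updated = current | missing            # new set; `current` is left unchanged
    if drop_irrelevant:
        updated -= unused
    return updated, missing, unused
```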

In a specific implementation, the TOP-N entries in the model output can be analyzed to check whether attack behavior that warrants attention really exists, to analyze the attack target and the attack process, and to assess the possible impact.
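Selecting the TOP-N entries for analyst review can be sketched with `heapq.nlargest`, assuming the model output is available as a mapping from record ID to attack score. The record IDs and scores are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def top_n(scores: Dict[str, float], n: int) -> List[Tuple[str, float]]:
    """Return the n records with the highest attack scores, highest first."""
    return heapq.nlargest(n, scores.items(), key=lambda item: item[1])
```

`heapq.nlargest` avoids sorting the whole output when only a few of many scored records are needed.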

In step 507, according to the extraction rule of the updated attack feature data, the updated attack feature data is extracted from the third service related data, so as to obtain a training sample including the attack tag and the updated attack feature data.

In step 508, the updated attack characteristic data is used as an input of the initial detection model, and model training is performed with the output of the corresponding attack tag as a target.

In step 509, training continues until the preset convergence condition is satisfied, at which point the current initial detection model is determined as the final XGBoost model.

In step 510, the current initial detection model is determined to be the final XGBoost model.

In this way, combining multidimensional statistical and non-statistical features improves the accuracy of data analysis. Training the model through these three stages improves the detection rate of attacks on business logic vulnerabilities and avoids the larger business impact that missed attacks would cause.

Based on the same technical concept, an embodiment of the present application also provides a business logic vulnerability attack detection model training device. Because the device solves the problem on a principle similar to that of the training method, its implementation can refer to the implementation of the method, and repeated details are omitted.

Fig. 6 is a schematic structural diagram of a training device for a business logic vulnerability attack detection model according to an embodiment of the present application, which includes a first obtaining module 601, an extracting module 602, and a training module 603.

The first obtaining module 601 is configured to obtain first service-related data recorded while historically running business logic with different attack tags, where the attack tags indicate either the presence or the absence of business logic vulnerability attack behavior;

the extracting module 602 is configured to extract attack feature data from the first service-related data to obtain training samples containing the attack tags and the feature data, where the attack feature data is feature data indicating whether the service exhibits business logic vulnerability attack behavior;

and the training module 603 is configured to take the attack feature data as an input of a service logic vulnerability attack detection model, and perform model training with output of a corresponding attack tag as a target.

In some embodiments, the attack feature data includes static feature data, and the device further includes the following module, which operates before the extracting module 602 extracts attack feature data from the first service-related data:

a determining module 604, configured to parse the data source type of each first service related data, and determine whether the parsed data source type of each first service related data meets the sample requirement;

the extracting module 602 is specifically configured to:

In some embodiments, the extracting module 602 is specifically configured to:

correcting the labeled static features includes:

In some embodiments, the attack signature data further includes statistical class signature data, and the extracting module 602 is further configured to:

In some embodiments, the training module 603 is specifically configured to:

In some embodiments, further comprising:

a second obtaining module 605, configured to obtain, after the training module 603 determines the current business logic vulnerability attack detection model as the initial detection model, second service-related data recorded while running business logic without attack tags;

In some embodiments, the second obtaining module 605 is specifically configured to:

In some embodiments, the training module 603 is further configured to:

Based on the same technical concept, an embodiment of the present application also provides a detection device for business logic vulnerability attacks. Because the device solves the problem on a principle similar to that of the detection method, its implementation can refer to the implementation of the method, and repeated details are omitted.

Fig. 7 is a schematic structural diagram of a detection device for a business logic vulnerability attack according to an embodiment of the present application, which includes an obtaining module 701, an extracting module 702, an input module 703, and a determining module 704.

The obtaining module 701 is configured to obtain fourth service-related data recorded during the business logic process to be detected;

the extracting module 702 is configured to extract attack feature data from the fourth service-related data, where the attack feature data is feature data indicating whether the service exhibits business logic vulnerability attack behavior;

an input module 703, configured to input the attack feature data into a service logic vulnerability detection model obtained based on any one of the service logic vulnerability attack detection model training methods described above;

and the determining module 704 is configured to determine, according to an output of the service logic vulnerability detection model, whether the fourth service-related data has an attack behavior for a service logic vulnerability.

In the embodiments of the present application, the division into modules is illustrative and is merely a logical functional division; other divisions are possible in actual implementation. In addition, the functional modules in the embodiments may be integrated into one processor, may exist physically separately, or two or more modules may be integrated into one module. The modules may be coupled to each other through interfaces, which are typically electrical communication interfaces, although mechanical or other forms of interfaces are not excluded. Accordingly, modules described as separate components may or may not be physically separate, and may be located in one place or distributed across different locations on the same or different devices. The integrated modules may be implemented in hardware or as software functional modules.

Having described the methods and apparatus of the exemplary embodiments of the present application, an electronic device according to another exemplary embodiment of the present application is described next.

An electronic device 130 implemented according to such an embodiment of the present application is described below with reference to fig. 8. The electronic device 130 shown in fig. 8 is merely an example and should not be construed to limit the functionality and scope of use of embodiments of the present application in any way.

As shown in fig. 8, the electronic device 130 is in the form of a general-purpose electronic device. Components of electronic device 130 may include, but are not limited to: the at least one processor 131, the at least one memory 132, and a bus 133 connecting the various system components, including the memory 132 and the processor 131.

Bus 133 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a processor or local bus using any of a variety of bus architectures.

Memory 132 may include readable media in the form of volatile memory such as Random Access Memory (RAM) 1321 and/or cache memory 1322, and may further include Read Only Memory (ROM) 1323.

Memory 132 may also include a program/utility 1325 having a set (at least one) of program modules 1324, such program modules 1324 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.

The electronic device 130 may also communicate with one or more external devices 134 (e.g., keyboard, pointing device, etc.), one or more devices that enable a user to interact with the electronic device 130, and/or any device (e.g., router, modem, etc.) that enables the electronic device 130 to communicate with one or more other electronic devices. Such communication may occur through an input/output (I/O) interface 135. Also, electronic device 130 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 136. As shown, network adapter 136 communicates with other modules for electronic device 130 over bus 133. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 130, including, but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.

In an exemplary embodiment, a storage medium is also provided; when the computer program in the storage medium is executed by a processor of an electronic device, the electronic device can perform any of the methods described above. Optionally, the storage medium may be a non-transitory computer-readable storage medium, for example a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, or an optical data storage device.

In an exemplary embodiment, an electronic device of the present application may include at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores a computer program executable by the at least one processor, which when executed by the at least one processor, causes the at least one processor to perform the steps of any of the methods provided by the embodiments of the present application.

In an exemplary embodiment, a computer program product is also provided, which, when executed by an electronic device, is capable of carrying out any one of the exemplary methods provided herein.

Also, a computer program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM), flash memory, an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The program product for data querying in embodiments of the present application may take the form of a CD-ROM and include program code that can run on a computing device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, radio frequency (RF), etc., or any suitable combination of the foregoing.

Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, such as a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, over the Internet through an Internet service provider).

It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functions of two or more of the elements described above may be embodied in one element in accordance with embodiments of the present application. Conversely, the features and functions of one unit described above may be further divided into a plurality of units to be embodied.

Furthermore, although the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that order, or that all of the illustrated operations must be performed, to achieve the desired results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one, and/or one step may be decomposed into several.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.

It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims

1. A business logic vulnerability attack detection model training method is characterized by comprising the following steps:

2. The method of claim 1, wherein the first service related data includes a service request message and a service response message recorded during operation of service logic, session data recorded in a database, personal information data in a return packet, and URL address of an access interface.

3. The method of claim 1, wherein the attack signature data comprises static signature data, and further comprising, prior to extracting attack signature data for the first traffic-related data:

4. The method of claim 3, wherein the data source types include at least a traffic log class, an access record class of a WEB application protection system, and an alert output class of security products that identify sensitive data.

5. The method of claim 3, wherein performing static feature extraction on the first data according to a static feature extraction rule corresponding to the data source type comprises:

6. The method of claim 5, wherein the quality checking of the static feature of the tag by matching the static feature of the tag with basic attribute information corresponding to the data source type, comprises:

Correcting the labeled static features includes:

7. The method of claim 3, wherein the attack signature data further comprises statistical class signature data;

8. The method of claim 1, wherein taking the attack characteristic data as an input of a business logic vulnerability attack detection model, and performing model training with a goal of outputting a corresponding attack tag, comprises:

9. The method of claim 8, wherein after determining the current business logic vulnerability attack detection model as the initial detection model, further comprising:

10. The method of claim 9, wherein redefining the extraction rule of the attack characteristic data when the comparison result is determined to be inconsistent, comprises:

11. The method of claim 9, wherein updating the extraction rules of the attack signature data further comprises:

12. The method of any of claims 1-11, wherein the business logic vulnerability attack detection model is an XGBoost model.

13. A method for detecting a business logic vulnerability attack, characterized by comprising the following steps:

inputting the attack characteristic data into a business logic vulnerability detection model obtained based on the business logic vulnerability attack detection model training method according to any one of claims 1-12;

14. A business logic vulnerability attack detection model training device is characterized by comprising:

15. A detection device for business logic vulnerability attacks, characterized by comprising:

the input module is used for inputting the attack characteristic data into a business logic vulnerability detection model obtained based on the business logic vulnerability attack detection model training method according to any one of claims 1-12;

16. An electronic device, comprising:

a memory for storing program instructions;

a processor for invoking program instructions stored in the memory and for performing the steps comprised in the method according to any of claims 1-13 in accordance with the obtained program instructions.

17. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a computer, cause the computer to perform the method of any of claims 1-13.

18. A computer program product, the computer program product comprising: computer program code which, when run on a computer, causes the computer to perform the method of any of the preceding claims 1-13.