CN113554474B - Model verification method and device, electronic equipment and computer readable storage medium
- Publication number
- CN113554474B
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0248—Avoiding fraud
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The application provides a verification method and device of a model, electronic equipment and a computer readable storage medium; wherein the method comprises the following steps: acquiring a first abnormal flow identified from the first data by the first model; the first model is obtained by adjusting a second model based on a second abnormal flow; the second abnormal flow is abnormal flow which does not comprise fourth abnormal flow in third abnormal flow, the third abnormal flow is identified from the first data by the second model, and the fourth abnormal flow is identified from the first data by a preset first rule; and comparing the first abnormal flow with the third abnormal flow, and generating a verification result according to the comparison result, wherein the verification result is used for representing whether the first model is the model enhanced by the second model. The application solves the problem that whether the recognition effect of the machine learning model after enhancement is really improved cannot be verified in the prior art.
Description
Technical Field
The present application relates to the field of data capability, and in particular, to a method and apparatus for verifying a model, an electronic device, and a computer readable storage medium.
Background
At present, abnormal advertisement traffic is identified mainly by experience-based rules. The drawback of a rule-based identification system is that it cannot actively and promptly identify new cheating methods: information must be collected, fed back, and summarized over a period of time before new rules can be added. Methods for identifying abnormal traffic with machine learning models have therefore attracted attention, and it has been proposed to enhance such models with rules. However, there is currently no effective way to verify whether the recognition effect of the enhanced machine learning model has actually improved.
Disclosure of Invention
An embodiment of the application aims to provide a method and a device for verifying a model, electronic equipment and a computer readable storage medium, so as to solve the problem that whether the recognition effect of an enhanced machine learning model is really improved cannot be verified in the prior art. The specific technical scheme is as follows:
In a first aspect of an embodiment of the present application, there is provided a method for verifying a model, including: acquiring a first abnormal flow identified from the first data by the first model; the first model is obtained by adjusting a second model based on a second abnormal flow; the second abnormal flow is abnormal flow which does not comprise fourth abnormal flow in third abnormal flow, the third abnormal flow is identified from the first data by the second model, and the fourth abnormal flow is identified from the first data by a preset first rule; and comparing the first abnormal flow with the third abnormal flow, and generating a verification result according to the comparison result, wherein the verification result is used for representing whether the first model is the model enhanced by the second model.
In a second aspect of the embodiment of the present application, there is provided a verification apparatus for a model, including: the acquisition module is used for acquiring first abnormal flow identified from the first data by the first model; the first model is obtained by adjusting a second model based on a second abnormal flow; the second abnormal flow is abnormal flow which does not comprise fourth abnormal flow in third abnormal flow, the third abnormal flow is identified from the first data by the second model, and the fourth abnormal flow is identified from the first data by a preset first rule; and the comparison module is used for comparing the first abnormal flow and the third abnormal flow and generating a verification result according to the comparison result, wherein the verification result is used for representing whether the first model is the model enhanced by the second model.
In a third aspect of the embodiments of the present application, there is provided an electronic device including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus; the memory is used for storing a computer program; and the processor is used for implementing the method steps of the first aspect when executing the program stored in the memory.
In a fourth aspect of the application, there is provided a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the method of the first aspect.
The application can be applied to the technical field of data capability for data mining. According to the application, the second model is adjusted based on the second abnormal flow, that is, the abnormal flow that the second model can identify but the first rule cannot, to obtain the first model. The first abnormal flow identified by the first model is then compared with the third abnormal flow identified by the second model to determine whether the first model is enhanced relative to the second model. This verification process makes it possible to determine conveniently and quickly whether the first model is an enhanced model; if it is, the first model can be used to improve the recognition of abnormal flow. This solves the problem in the prior art that it cannot be verified whether the recognition effect of an enhanced machine learning model has actually improved.
Drawings
FIG. 1 is a flow chart of a method of validating a model in an embodiment of the application;
FIG. 2 is a flow chart of a method for verifying a rule-enhanced abnormal traffic identification model in an embodiment of the application;
FIG. 3 is a schematic structural view of a verification device of a model in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
The technical solutions of the embodiments of the present application will be clearly described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which are obtained by a person skilled in the art based on the embodiments of the present application, fall within the scope of protection of the present application.
The terms first, second and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged, as appropriate, such that embodiments of the present application may be implemented in sequences other than those illustrated or described herein, and that the objects identified by "first," "second," etc. are generally of a type, and are not limited to the number of objects, such as the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/", generally means that the associated object is an "or" relationship.
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.
As shown in fig. 1, in an embodiment of the present application, there is provided a method for verifying a model, including the steps of:
Step 102, acquiring first abnormal flow identified from first data by a first model; the first model is obtained by adjusting the second model based on the second abnormal flow; the second abnormal flow is abnormal flow which does not comprise fourth abnormal flow in third abnormal flow, the third abnormal flow is identified from the first data by the second model, and the fourth abnormal flow is identified from the first data by a preset first rule;
It should be noted that, in the embodiment of the present application, the first model and the second model are both machine learning models, and each may be a supervised or an unsupervised machine learning model. The choice can depend on whether the data to be identified carries data labels: if it does, a supervised model such as a support vector machine can be used; if it does not, an unsupervised model such as an isolation forest can be used. The foregoing is merely illustrative, and the model may be selected according to the actual situation.
In addition, the first rule may be determined from the user's experience. Taking advertisement data as an example, the user may classify or define abnormal data based on experience, for example as traffic from automatic brushing (traffic inflation) tools, malicious traffic-consuming tools, or misleading malicious domain names. The above is merely an example; based on advertising practice, a user may build an abnormal traffic definition and classification table for determining the type of abnormal traffic, the tool that generated it, and so on.
Step 104, comparing the first abnormal flow with the third abnormal flow, and generating a verification result according to the comparison result, wherein the verification result is used for representing whether the first model is a model enhanced by the second model.
Through steps 102 and 104, the second model is adjusted based on the second abnormal flow, that is, the abnormal flow that the second model can identify but the first rule cannot, to obtain the first model. The first abnormal flow identified by the first model is then compared with the third abnormal flow identified by the second model to determine whether the first model is enhanced relative to the second model. This verification process makes it possible to determine conveniently and quickly whether the first model is an enhanced model; if it is, the first model can be used to improve the recognition of abnormal flow. This solves the problem in the prior art that it cannot be verified whether the recognition effect of an enhanced machine learning model has actually improved.
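The relationship among the flow sets used in steps 102 and 104 can be pictured with plain set operations. The following is only an illustrative sketch: it assumes that every traffic record has a hashable identifier, and the record identifiers and the helper name `second_abnormal_flow` are hypothetical rather than taken from the application.

```python
def second_abnormal_flow(third: set, fourth: set) -> set:
    """Flow flagged by the second model (third) but not by the first rule (fourth)."""
    return third - fourth


# Hypothetical record identifiers, for illustration only.
third = {"r1", "r2", "r3", "r4"}   # identified from the first data by the second model
fourth = {"r1", "r2", "r9"}        # identified from the first data by the first rule

second = second_abnormal_flow(third, fourth)
print(sorted(second))  # ['r3', 'r4']
```

Under this reading, the second abnormal flow is exactly the part of the second model's output that the existing rule misses, which is what the adjustment of the second model is based on.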
In an optional implementation manner of the embodiment of the present application, for the manner of comparing the first abnormal traffic and the third abnormal traffic related to the step 104 to generate the verification result, the method may further include:
Step 11, generating a first verification result under the condition that the first abnormal flow comprises a fifth abnormal flow in addition to the third abnormal flow; the first verification result is used for representing that the first model is a model enhanced by the second model;
Step 12, generating a second verification result when the first abnormal flow is the same as the third abnormal flow or is only a part of the third abnormal flow; the second verification result is used for representing that the enhancement of the second model failed, that is, the first model is not an enhanced model.
From steps 11 and 12 above, if the first abnormal flow identified by the first model includes a fifth abnormal flow in addition to the third abnormal flow, the first model can identify a fifth abnormal flow that the second model cannot, which indicates that the first model is enhanced relative to the second model. If the first abnormal flow identified by the first model is equal to the third abnormal flow, or is only a part of the third abnormal flow, the first model is not enhanced relative to the second model.
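Steps 11 and 12 can be sketched as a comparison of two sets of record identifiers. This is a minimal sketch, not the application's implementation: the function name `verify` and the result labels are hypothetical, and the `"undetermined"` branch covers a case the text does not address (the first model finds new flow but also misses part of the third abnormal flow), so it is purely an assumption.

```python
def verify(first: set, third: set) -> str:
    """Compare the first model's output with the second model's output.

    "enhanced":     first contains all of third plus extra (fifth) flow.
    "not_enhanced": first equals third or is only a part of third.
    "undetermined": any other case (not covered by the text; assumption).
    """
    if third < first:      # proper subset: third plus some fifth abnormal flow
        return "enhanced"
    if first <= third:     # equal to, or only a part of, third
        return "not_enhanced"
    return "undetermined"


print(verify({"r1", "r2", "r5"}, {"r1", "r2"}))  # enhanced
print(verify({"r1"}, {"r1", "r2"}))              # not_enhanced
```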
The comparison between the first abnormal flow and the third abnormal flow may compare either the data volumes of the abnormal flows or their types. For example, the first abnormal flow and the third abnormal flow may contain the same flow types, while for one or more of those types the first model identifies a larger data volume. Suppose the first model identifies 50M of abnormal flow A and 100M of abnormal flow B, whereas the second model identifies 30M of abnormal flow A and 50M of abnormal flow B. The first model then identifies more abnormal flow of the same types than the second model, that is, the first model is enhanced relative to the second model.
For another example, if the flow types in the first abnormal flow differ from those in the third abnormal flow and the first abnormal flow contains more types than the third abnormal flow, this also indicates that the first model is enhanced relative to the second model.
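The two comparison styles described above, by data volume and by type, might be sketched as follows. The function names are hypothetical, and two details are assumptions rather than statements from the text: the volume check requires that no type gets worse while at least one improves, and the type check treats "more types" as a strict superset of the second model's types.

```python
def enhanced_by_volume(first: dict, third: dict) -> bool:
    """Same flow types, no type with less volume, at least one type with more."""
    if first.keys() != third.keys():
        return False
    return (all(first[t] >= third[t] for t in third)
            and any(first[t] > third[t] for t in third))


def enhanced_by_type(first: dict, third: dict) -> bool:
    """The first model covers every type of the second model plus at least one more."""
    return set(third) < set(first)


# Volumes from the example in the text (unit "M" as given there).
first_volumes = {"A": 50, "B": 100}   # identified by the first model
third_volumes = {"A": 30, "B": 50}    # identified by the second model

print(enhanced_by_volume(first_volumes, third_volumes))  # True
```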
In another optional implementation manner of the embodiment of the present application, for the manner of obtaining the first abnormal traffic identified by the first model from the first data in the step 102, the method may further include:
Step 21, adjusting the threshold value in the first model N times, until the number of abnormal flows identified from the first data by the first model after the N adjustments is greater than the number of third abnormal flows; wherein the threshold value is used to characterize the maximum value of the identified abnormal flow; N is a positive integer;
Step 22, determining the first abnormal flow from the abnormal flow identified from the first data by the first model after the threshold value has been adjusted N times.
It should be noted that the first model identifies abnormal flow under the control of a threshold, and adjusting that threshold increases the amount of abnormal flow identified. An enhanced model should find both the abnormalities found by the original model (the model before enhancement) and new abnormalities, so its number of output results should be greater than that of the original model. The value of N is determined by the adjustments actually needed: for example, if the number of output results of the first model exceeds that of the original second model after 3 adjustments, N is 3.
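The adjustment loop of steps 21 and 22 could look roughly like the sketch below. The text does not specify how the threshold acts on the model output, so this sketch assumes that the first model produces one anomaly score per record and flags a record when its score is at least the threshold; the function name `tune_threshold`, the fixed step size, and the sample scores are all hypothetical.

```python
def tune_threshold(scores, threshold, baseline_count, step=0.05, max_rounds=100):
    """Lower the threshold until more records are flagged than baseline_count.

    Returns (threshold, n_adjustments, flagged_indices). Raises ValueError if
    the count never exceeds the baseline within max_rounds adjustments.
    """
    for n in range(1, max_rounds + 1):
        threshold -= step                      # one adjustment
        flagged = [i for i, s in enumerate(scores) if s >= threshold]
        if len(flagged) > baseline_count:      # more than the third abnormal flow
            return threshold, n, flagged
    raise ValueError("flagged count never exceeded the baseline")


# Hypothetical anomaly scores of the first model on the first data.
scores = [0.95, 0.90, 0.72, 0.66, 0.30]
threshold, n, flagged = tune_threshold(scores, threshold=0.85, baseline_count=2)
print(n, flagged)  # 3 [0, 1, 2]
```

With these sample values the count first exceeds the baseline of 2 after the third adjustment, matching the N = 3 example in the text.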
In still another implementation manner of the embodiment of the present application, before the acquiring the first abnormal traffic identified by the first model from the first data in step 102, the method of the embodiment of the present application may further include:
Step 31, determining a second rule for identifying abnormal traffic based on the third abnormal traffic, wherein the second rule is different from the first rule;
Step 32, carrying out feature engineering on the second rule to obtain corresponding features;
Step 33, inputting the features into the second model to obtain the first model.
It can be seen that a second rule, different from the first rule, is determined from the abnormal traffic that the second model can identify but the first rule cannot; feature engineering is performed on the second rule, and the extracted features are input to the second model to obtain the first model.
The method for performing feature engineering on the second rule to obtain the corresponding feature in step 32 may further include:
Step 41, binarizing the rule content in the second rule;
The rule content refers to the quantitatively described features in the second rule. These features are binarized, and the binarization result may be obtained by one-hot encoding.
Step 42, carrying out dimensionless treatment on the binarization treatment result, and carrying out normalization treatment on the dimensionless treatment result;
Dimensionless processing converts data of different scales to the same scale. Common dimensionless methods are standardization and interval scaling. Standardization presupposes that the feature values follow a normal distribution; after standardization, they follow the standard normal distribution. Interval scaling uses boundary value information to scale the value interval of a feature to a specific range, for example [0, 1].
In addition, normalization processes the dimensionless result row by row in the feature matrix, so that sample vectors share a unified standard when similarity is computed by dot product or another kernel function.
Step 43, performing dimension reduction processing on the normalization processing result to obtain the features.
After feature selection is completed, the model can be trained directly. However, an overly large feature matrix may lead to a large amount of computation and a long training time, so the dimension of the feature matrix needs to be reduced. Dimension reduction methods include principal component analysis (Principal Component Analysis, PCA) and linear discriminant analysis (Linear Discriminant Analysis, LDA); LDA is itself also a classification model. PCA and LDA have much in common: both map the original samples into a lower-dimensional sample space. Their mapping targets differ, however: PCA maximizes the divergence (variance) of the mapped samples, whereas LDA gives the mapped samples the best classification performance. PCA is therefore an unsupervised dimension reduction method, while LDA is a supervised one.
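Steps 41 to 43 can be illustrated with a small dependency-free sketch. Everything here is an assumption made for illustration: the function names, the cutoff-based binarization, and the use of power iteration to obtain only the first principal component are not taken from the application, and a production pipeline would more likely rely on an established numerical library.

```python
import math


def one_hot(value, categories):
    """One-hot encode a value against a fixed list of categories."""
    return [1.0 if value == c else 0.0 for c in categories]


def binarize(value, cutoff):
    """Binarize a quantitative value: 1.0 above the cutoff, else 0.0."""
    return 1.0 if value > cutoff else 0.0


def standardize(column):
    """Standardization: zero mean, unit variance (population standard deviation)."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
    return [(x - mean) / std if std else 0.0 for x in column]


def min_max_scale(column):
    """Interval scaling of one feature column to [0, 1]."""
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in column]


def l2_normalize_row(row):
    """Normalization: scale one sample (row) to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm if norm else 0.0 for x in row]


def first_principal_component(rows, iterations=200):
    """Top PCA direction via power iteration on the covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # fixed start vector keeps the sketch deterministic
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        v = l2_normalize_row(w)
    return v, centered


def project(centered, v):
    """Reduce each centered sample to one dimension along direction v."""
    return [sum(x * y for x, y in zip(row, v)) for row in centered]


# Toy feature matrix whose samples lie on the line y = 2x.
rows = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
direction, centered = first_principal_component(rows)
reduced = project(centered, direction)
```

Note that `standardize` and `min_max_scale` operate on feature columns, while `l2_normalize_row` operates on sample rows, which mirrors the distinction the text draws between dimensionless processing and normalization.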
The application is explained below in connection with specific implementations of embodiments of the application; the embodiment provides a method for verifying a rule-enhanced abnormal traffic identification model, as shown in fig. 2, the method comprises the following steps:
Step 201, respectively identifying abnormal flow by using a given model and an existing rule;
The given model may be a supervised or an unsupervised model, depending on whether the data labels are known: if they are, a model such as a support vector machine can be used; if they are not, a model such as an isolation forest can be used. The rules are determined mainly from practitioners' experience; practitioners and their enterprises generally have their own set of rules;
Step 202, comparing the respectively identified abnormal flows, and determining the part of the abnormal flow that can be identified only by the given model and not by the rules;
Step 203, summarizing new rules based on that part of the abnormal flow;
Wherein, the new rule is a rule which does not exist in the original rule system and is summarized according to the newly identified flow characteristics, and is different from the existing rule;
Step 204, performing feature engineering on the summarized new rule;
Specifically, this can be done as follows: first, the quantitative descriptions in the rules are binarized, for which one-hot encoding can be adopted; then the qualitative descriptions are represented numerically, with dimensionless processing and normalization applied if necessary; finally, dimension reduction is performed on the new features to obtain the desired features;
Step 205, inputting the extracted features as new features into the model to obtain a rule-enhanced model;
Step 206, comparing the recognition result of the enhanced model with the recognition result of the original given model;
If the enhanced model retains the abnormal flow identified by the original model and also identifies new abnormal flow, the rule enhancement effect is verified.
Corresponding to fig. 1, the embodiment of the present application further provides a verification device for a model, as shown in fig. 3, where the device includes:
An acquisition module 32 for acquiring a first abnormal flow identified from the first data by the first model; the first model is obtained by adjusting the second model based on the second abnormal flow; the second abnormal flow is abnormal flow which does not comprise fourth abnormal flow in third abnormal flow, the third abnormal flow is identified from the first data by the second model, and the fourth abnormal flow is identified from the first data by a preset first rule;
and the comparison module 34 is configured to compare the first abnormal flow and the third abnormal flow, and generate a verification result according to the comparison result, where the verification result is used to characterize whether the first model is a model enhanced by the second model.
By means of the device, the second model is adjusted based on the second abnormal flow, that is, the abnormal flow that the second model can identify but the first rule cannot, to obtain the first model. The first abnormal flow identified by the first model is then compared with the third abnormal flow identified by the second model to determine whether the first model is enhanced relative to the second model. This verification process makes it possible to determine conveniently and quickly whether the first model is an enhanced model; if it is, the first model can be used to improve the recognition of abnormal flow. This solves the problem in the prior art that it cannot be verified whether the recognition effect of an enhanced machine learning model has actually improved.
Optionally, the comparison module 34 in the embodiment of the present application may further include: a first comparison unit, configured to generate a first verification result when the first abnormal flow comprises a fifth abnormal flow in addition to the third abnormal flow; the first verification result is used for representing that the first model is a model enhanced by the second model; a second comparison unit, configured to generate a second verification result when the first abnormal flow is the same as the third abnormal flow or is only a part of the third abnormal flow; the second verification result is used for representing that the enhancement of the second model failed, that is, the first model is not an enhanced model.
Optionally, the acquiring module 32 in the embodiment of the present application may further include: the adjusting unit is used for adjusting the threshold value in the first model for N times until the number of abnormal flow rates identified from the first data by the first model after the threshold value is adjusted for N times is larger than the number of third abnormal flow rates; wherein the threshold value is used to characterize the maximum value of the identified abnormal flow; n is a positive integer; and the determining unit is used for determining the first abnormal flow from the abnormal flow identified from the first data through the first model after the threshold value is adjusted for N times.
Optionally, the apparatus in the embodiment of the present application may further include: a determining module for determining a second rule for identifying abnormal traffic based on the third abnormal traffic before acquiring the first abnormal traffic identified from the first data by the first model, wherein the second rule is different from the first rule; the first processing module is used for carrying out feature engineering on the second rule to obtain corresponding features; and the second processing module is used for inputting the features into the second model to obtain the first model.
Optionally, the first processing module may further include: the first processing unit is used for carrying out binarization processing on rule contents in the second rule; the second processing unit is used for carrying out dimensionless processing on the binarization processing result and carrying out normalization processing on the dimensionless processing result; and the third processing unit is used for performing dimension reduction processing on the normalization processing result to obtain the characteristics.
The embodiment of the application also provides an electronic device, as shown in fig. 4, which comprises a processor 401, a communication interface 402, a memory 403 and a communication bus 404, wherein the processor 401, the communication interface 402 and the memory 403 communicate with each other through the communication bus 404;
the memory 403 is used for storing a computer program;
the processor 401 is used for implementing the method steps of fig. 1 when executing the program stored in the memory 403.
The processor in the electronic device implements the method steps in fig. 1, and the resulting technical effects are consistent with those of the model verification method in fig. 1, so they are not described again here.
The communication bus of the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 4, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The memory may include random access memory (Random Access Memory, RAM) or may include non-volatile memory (non-volatile memory), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present application, a computer readable storage medium is provided, in which instructions are stored, which when run on a computer, cause the computer to perform the method of verifying a model according to any of the above embodiments.
In yet another embodiment of the present application, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform the method of verifying a model according to any of the above embodiments.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), etc.
It is noted that relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The embodiments in this specification are described in a related manner: for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, see the corresponding description of the method embodiments.
The foregoing describes only preferred embodiments of the present application and is not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application falls within the scope of protection of the present application.
Claims (8)
1. A method of validating a model, comprising:
acquiring a first abnormal flow identified from first data by a first model; the first model is obtained by adjusting a second model based on a second abnormal flow; the second abnormal flow is the abnormal flow in a third abnormal flow other than a fourth abnormal flow, the third abnormal flow is identified from the first data by the second model, and the fourth abnormal flow is identified from the first data by a preset first rule;
comparing the first abnormal flow with the third abnormal flow, and generating a verification result according to the comparison result, wherein the verification result is used for representing whether the first model is a model enhanced by the second model; comparing the first abnormal flow with the third abnormal flow means comparing the flow types of the first abnormal flow and the third abnormal flow, and comparing the data volume of the same flow type;
wherein acquiring the first abnormal flow identified from the first data by the first model comprises: adjusting a threshold value in the first model N times, until the number of abnormal flows identified from the first data by the first model after the N threshold adjustments is greater than the number of third abnormal flows; wherein the threshold value is used to characterize the maximum value of the identified abnormal flow; and determining the first abnormal flow from the abnormal flow identified from the first data by the first model after the N threshold adjustments.
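The threshold adjustment recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: it assumes that the first model assigns each record an anomaly score and flags every record whose score is at or above the threshold, so that lowering the threshold flags more records. The function name `adjust_threshold`, the fixed `step`, the `max_rounds` guard, and the integer score scale are all hypothetical.

```python
from typing import Dict, List, Tuple


def adjust_threshold(
    scores: Dict[str, float],
    third_count: int,
    threshold: float,
    step: float = 10,
    max_rounds: int = 100,
) -> Tuple[List[str], float, int]:
    """Lower the threshold until more records are flagged than third_count.

    scores maps a record id to a hypothetical anomaly score produced by the
    first model; a record is flagged when its score >= threshold (assumption).
    Returns (flagged record ids, final threshold, number of adjustments N).
    """
    if step <= 0:
        raise ValueError("step must be positive")
    for n in range(1, max_rounds + 1):  # N is at least 1
        threshold -= step
        flagged = sorted(rid for rid, score in scores.items() if score >= threshold)
        if len(flagged) > third_count:
            return flagged, threshold, n
    raise RuntimeError("no threshold within max_rounds flags enough records")


if __name__ == "__main__":
    demo_scores = {"a": 95, "b": 80, "c": 70, "d": 40}
    print(adjust_threshold(demo_scores, third_count=2, threshold=90))
```

With these illustrative scores, the first adjustment still flags only two records, so a second adjustment is needed before the flagged set exceeds the size of the third abnormal flow.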
2. The method of claim 1, wherein the comparing the first abnormal traffic and the third abnormal traffic to generate a validation result comprises:
Generating a first verification result under the condition that the first abnormal flow comprises a fifth abnormal flow in addition to the third abnormal flow; the first verification result is used for representing that the first model is a model enhanced by the second model;
Generating a second verification result when the first abnormal flow is the same as the third abnormal flow or is part of the third abnormal flow; the second verification result is used for representing that the first model is a model for which enhancement of the second model failed.
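The comparison in claims 1 and 2 can be sketched with set operations, assuming each abnormal-flow record is represented as a hypothetical `(record id, flow type)` pair. The names `verify` and `volume_by_type` and the result labels are illustrative only. The claims do not say what happens when the first abnormal flow contains new records but also misses some of the third abnormal flow, so the sketch labels that case `"undetermined"` as an assumption.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Record = Tuple[str, str]  # (record id, flow type); hypothetical representation


def volume_by_type(first: Set[Record], third: Set[Record]) -> Dict[str, Tuple[int, int]]:
    """For each flow type, return (count in first flow, count in third flow)."""
    c1 = Counter(flow_type for _, flow_type in first)
    c3 = Counter(flow_type for _, flow_type in third)
    return {t: (c1[t], c3[t]) for t in sorted(set(c1) | set(c3))}


def verify(first: Set[Record], third: Set[Record]) -> str:
    """Map the comparison onto the two verification results of claim 2."""
    fifth = first - third  # records found only by the first model
    if third <= first and fifth:
        return "enhanced"  # first verification result
    if first <= third:
        return "not_enhanced"  # second verification result
    return "undetermined"  # case not covered by the claims (assumption)


if __name__ == "__main__":
    third = {("r1", "click_farm"), ("r2", "bot")}
    first = third | {("r3", "bot")}
    print(verify(first, third), volume_by_type(first, third))
```

Using sets keeps the "includes", "is the same as", and "is part of" conditions explicit; the per-type counts correspond to comparing the data volume of the same flow type.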
3. The method of claim 1, wherein prior to obtaining the first abnormal traffic identified from the first data by the first model, the method further comprises:
determining a second rule for identifying abnormal traffic based on the third abnormal traffic, wherein the second rule is different from the first rule;
performing feature engineering on the second rule to obtain corresponding features;
and inputting the characteristics into the first model to obtain the second model.
4. A method according to claim 3, wherein said feature engineering the second rule to obtain the corresponding feature comprises:
Performing binarization processing on rule contents in the second rule;
carrying out dimensionless treatment on the binarization treatment result, and carrying out normalization treatment on the dimensionless treatment result;
and performing dimension reduction treatment on the normalization treatment result to obtain the characteristics.
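The four feature-engineering steps of claim 4 can be sketched in plain Python under stated assumptions. The claim does not name concrete techniques, so the sketch assumes that binarization compares each rule value with a cutoff, that dimensionless processing is z-score standardization, that normalization is min-max scaling, and that dimension reduction simply drops constant columns (a deliberately simple stand-in for a method such as principal component analysis). All function names and the sample values are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def binarize(rows: Sequence[Sequence[float]], cutoffs: Sequence[float]) -> List[List[int]]:
    """Turn each rule value into 1 if it exceeds its cutoff, else 0."""
    return [[1 if v > c else 0 for v, c in zip(row, cutoffs)] for row in rows]


def standardize(col: Sequence[float]) -> List[float]:
    """Z-score a column; a constant column becomes all zeros."""
    mu, sigma = mean(col), pstdev(col)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in col]


def min_max(col: Sequence[float]) -> List[float]:
    """Scale a column to [0, 1]; a constant column becomes all zeros."""
    lo, hi = min(col), max(col)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in col]


def build_features(rows: Sequence[Sequence[float]], cutoffs: Sequence[float]) -> List[List[float]]:
    """Binarize, standardize, normalize, then drop constant columns."""
    binary = binarize(rows, cutoffs)
    cols = [min_max(standardize(list(c))) for c in zip(*binary)]
    kept = [c for c in cols if max(c) != min(c)]  # simple dimension reduction
    return [list(r) for r in zip(*kept)]


if __name__ == "__main__":
    rule_values = [[120, 3, 1], [15, 40, 1], [200, 55, 1]]
    print(build_features(rule_values, cutoffs=[100, 30, 0]))
```

In this example the third rule fires for every record, so its column carries no information and is removed by the reduction step.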
5. A device for verifying a model, comprising:
The acquisition module is used for acquiring first abnormal flow identified from the first data by the first model; the first model is obtained by adjusting a second model based on a second abnormal flow; the second abnormal flow is abnormal flow which does not comprise fourth abnormal flow in third abnormal flow, the third abnormal flow is identified from the first data by the second model, and the fourth abnormal flow is identified from the first data by a preset first rule;
The comparison module is used for comparing the first abnormal flow and the third abnormal flow and generating a verification result according to a comparison result, wherein the verification result is used for representing whether the first model is the model enhanced by the second model; comparing the first abnormal flow and the third abnormal flow means comparing the flow types of the first abnormal flow and the third abnormal flow, and comparing the data volume of the same flow type;
wherein the acquisition module comprises: an adjusting unit, configured to adjust the threshold value in the first model N times, until the number of abnormal flows identified from the first data by the first model after the N threshold adjustments is greater than the number of third abnormal flows; wherein the threshold value is used to characterize the maximum value of the identified abnormal flow, and N is a positive integer; and a determining unit, configured to determine the first abnormal flow from the abnormal flow identified from the first data by the first model after the N threshold adjustments.
6. The apparatus of claim 5, wherein the comparison module comprises:
a first comparing unit, configured to generate a first verification result when the first abnormal traffic includes a fifth abnormal traffic in addition to the third abnormal traffic; the first verification result is used for representing that the first model is a model enhanced by the second model;
a second comparing unit, configured to generate a second verification result when the first abnormal flow is the same as the third abnormal flow or is part of the third abnormal flow; the second verification result is used for representing that the first model is a model for which enhancement of the second model failed.
7. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for carrying out the method steps of any one of claims 1-4 when executing a program stored on a memory.
8. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110921148.9A CN113554474B (en) | 2021-08-11 | 2021-08-11 | Model verification method and device, electronic equipment and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113554474A (en) | 2021-10-26 |
CN113554474B (en) | 2024-08-20 |
Family ID=78105523
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110921148.9A Active CN113554474B (en) | 2021-08-11 | 2021-08-11 | Model verification method and device, electronic equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113554474B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112950249A (en) * | 2019-12-16 | 2021-06-11 | 旺脉信息科技(上海)有限公司 | Method and system for processing advertisement flow data, electronic equipment and storage medium |
CN112950231A (en) * | 2021-03-19 | 2021-06-11 | 广州瀚信通信科技股份有限公司 | XGboost algorithm-based abnormal user identification method, device and computer-readable storage medium |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108289088B (en) * | 2017-01-09 | 2020-12-11 | 中国移动通信集团河北有限公司 | Abnormal flow detection system and method based on business model |
CN108809948B (en) * | 2018-05-21 | 2020-07-10 | 中国科学院信息工程研究所 | Abnormal network connection detection method based on deep learning |
CN109902157A (en) * | 2019-01-10 | 2019-06-18 | 平安科技(深圳)有限公司 | A kind of training sample validation checking method and device |
CN111556016B (en) * | 2020-03-25 | 2021-02-26 | 中国科学院信息工程研究所 | Network flow abnormal behavior identification method based on automatic encoder |
CN112613543B (en) * | 2020-12-15 | 2023-05-30 | 重庆紫光华山智安科技有限公司 | Enhanced policy verification method, enhanced policy verification device, electronic equipment and storage medium |
CN112784881B (en) * | 2021-01-06 | 2021-08-27 | 北京西南交大盛阳科技股份有限公司 | Network abnormal flow detection method, model and system |
CN112949760B (en) * | 2021-03-30 | 2024-05-10 | 平安科技(深圳)有限公司 | Model precision control method, device and storage medium based on federal learning |
CN113067754B (en) * | 2021-04-13 | 2022-04-26 | 南京航空航天大学 | Semi-supervised time series anomaly detection method and system |
Also Published As
Publication number | Publication date |
---|---|
CN113554474A (en) | 2021-10-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106951925B (en) | Data processing method, device, server and system | |
CN107204960B (en) | Webpage identification method and device and server | |
CN108021651B (en) | Network public opinion risk assessment method and device | |
CN104915327A (en) | Text information processing method and device | |
CN111818198B (en) | Domain name detection method, domain name detection device, equipment and medium | |
CN111460250A (en) | Image data cleaning method, image data cleaning device, image data cleaning medium, and electronic apparatus | |
CN111242793B (en) | Medical insurance data abnormality detection method and device | |
CN113312899B (en) | Text classification method and device and electronic equipment | |
CN109933648B (en) | Real user comment distinguishing method and device | |
CN108241867B (en) | Classification method and device | |
CN111796957A (en) | Transaction abnormal root cause analysis method and system based on application log | |
CN112685324A (en) | Method and system for generating test scheme | |
CN112765003B (en) | Risk prediction method based on APP behavior log | |
CN114595765A (en) | Data processing method and device, electronic equipment and storage medium | |
CN113837836A (en) | Model recommendation method, device, equipment and storage medium | |
CN113656354A (en) | Log classification method, system, computer device and readable storage medium | |
CN113554474B (en) | Model verification method and device, electronic equipment and computer readable storage medium | |
CN110795308A (en) | Server inspection method, device, equipment and storage medium | |
CN111754352A (en) | Method, device, equipment and storage medium for judging correctness of viewpoint statement | |
CN113344469B (en) | Fraud identification method and device, computer equipment and storage medium | |
CN116910592A (en) | Log detection method and device, electronic equipment and storage medium | |
CN112559679B (en) | Political new media propagation force detection method, device, equipment and storage medium | |
CN114298563A (en) | Alarm information analysis method and device and computer equipment | |
CN113836899A (en) | Webpage identification method and device, electronic equipment and storage medium | |
CN113468379A (en) | Data source processing method and device and intelligent analysis platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||