CN115758336A - Asset identification method and device - Google Patents
- Publication number
- CN115758336A CN202211297450.2A
- Authority
- CN
- China
- Prior art keywords
- feature
- determining
- point
- identified
- confidence
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application provides an asset identification method and device, and relates to the technical field of data processing. The method comprises the following steps: performing log analysis processing on the log data of the assets to obtain features to be identified; according to the influence degree of each feature to be identified on the asset safety, distributing a feature weight matched with the influence degree to the feature to be identified; constructing a feature vector of the asset according to each feature to be identified and the feature weight corresponding to the feature; performing cluster recognition processing on the feature vectors by using a cluster recognition model, and determining the confidence of the feature vectors belonging to the corresponding classification of each cluster center; and determining whether the assets are risk assets or not according to the determined confidence degrees under the categories. Therefore, identification accuracy of the risky assets is improved.
Description
Technical Field
The present application relates to the field of data processing technologies, and in particular, to an asset identification method and apparatus.
Background
Information security is an important component of national security, and assets are the carriers of information. Protecting assets is therefore important, and the premise of protecting assets is accurately identifying the assets that are at risk. There are many dimensions for measuring whether an asset is secure, such as the number of external network attacks, the number of brute-force cracking attempts, and the number of malicious files on the host. How to sift through massive and complex data and accurately identify risk assets is a popular research topic among experts and scholars.
Existing risk asset identification methods generally suffer from high algorithmic complexity, and some of them have poor judgment accuracy. How to ensure the accuracy of risk asset identification is therefore a technical problem worth considering.
Disclosure of Invention
In view of this, the present application provides an asset identification method and apparatus for improving accuracy of risk asset identification.
Specifically, the method is realized through the following technical scheme:
according to a first aspect of the application, there is provided an asset identification method comprising:
performing log analysis processing on the log data of the assets to obtain features to be identified;
according to the influence degree of each feature to be identified on the asset safety, distributing a feature weight matched with the influence degree to the feature to be identified;
constructing a feature vector of the asset according to each feature to be identified and the feature weight corresponding to the feature;
performing cluster recognition processing on the feature vectors by using a cluster recognition model, and determining the confidence of the feature vectors belonging to the corresponding classification of each cluster center;
and determining whether the assets are risk assets or not according to the determined confidence degrees under the classifications.
According to a second aspect of the present application, there is provided an asset identification device comprising:
the analysis module is used for carrying out log analysis processing on the log data of the assets to obtain features to be identified;
the distribution module is used for distributing the characteristic weight matched with the influence degree to the characteristic to be identified according to the influence degree of each characteristic to be identified on the asset safety;
the construction module is used for constructing the feature vector of the asset according to each feature to be identified and the feature weight corresponding to the feature;
the determining module is used for carrying out clustering recognition processing on the characteristic vectors by utilizing a clustering recognition model and determining the confidence coefficient of the characteristic vectors belonging to the corresponding classification of each clustering center;
and the identification module is used for determining whether the assets are risk assets or not according to the determined confidence degrees under all the classifications.
According to a third aspect of the present application, there is provided an electronic device, comprising a processor and a machine-readable storage medium, the machine-readable storage medium storing a computer program capable of being executed by the processor, the processor being caused by the computer program to perform the method provided by the first aspect of the embodiments of the present application.
According to a fourth aspect of the present application, there is provided a machine-readable storage medium storing a computer program which, when invoked and executed by a processor, causes the processor to perform the method provided by the first aspect of the embodiments of the present application.
The beneficial effects of the embodiment of the application are as follows:
according to the asset identification method and device provided by the embodiment of the application, log analysis processing is carried out on the log data of assets to obtain the features to be identified; according to the influence degree of each feature to be identified on the asset safety, distributing a feature weight matched with the influence degree to the feature to be identified; constructing a feature vector of the asset according to each feature to be identified and the feature weight corresponding to the feature; performing cluster recognition processing on the feature vectors by using a cluster recognition model, and determining the confidence of the feature vectors belonging to the corresponding classification of each cluster center; and determining whether the assets are risk assets or not according to the determined confidence degrees under the categories. By implementing the method, when the characteristic vector is determined, the characteristic weight is introduced, so that the obtained characteristic vector can represent the actual characteristics of the asset, the confidence coefficient of the characteristic vector under each classification can be accurately determined based on the cluster identification model, and whether the asset is a risk asset is accurately determined, and the accuracy of risk identification is improved.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating an asset identification method provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an asset identification device according to an embodiment of the present disclosure;
FIG. 3 is a hardware structural diagram of an electronic device implementing an asset identification method according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present application.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the corresponding listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "at the time of ...", "when ...", or "in response to a determination", depending on the context.
The asset identification method provided by the present application is explained in detail below.
Referring to fig. 1, fig. 1 is a flowchart of an asset identification method provided in the present application, which may be applied to an electronic device. When the electronic device implements the method, the method can comprise the following steps:
S101, performing log analysis processing on the log data of the asset to obtain features to be identified.
In this step, the electronic device may collect log data of the asset within a set time period, and then analyze the log data according to a log analysis rule to analyze the log data into data which can be identified by the electronic device and used for asset identification, and record the data as the to-be-identified feature.
Optionally, the log data of the asset may be analyzed in a forensics and source tracing manner, then a log analysis rule is set according to the risk possibly existing in the asset, and then after the log data is collected, the feature to be identified for asset risk identification is analyzed according to the set log analysis rule.
It should be noted that the features to be identified corresponding to different assets may be the same or different, and may be specifically set according to actual situations. And the types of the features to be identified can be dynamically set, and can be specifically set according to the application scene of the asset, so as to better identify the risk of the asset.
It should be noted that the asset may be, but is not limited to, a device that may be at risk, and in one example, the asset may be a network security device such as a firewall or a gateway.
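As a minimal sketch of step S101, the following Python snippet parses log lines into per-feature counts. The log line layout (`<timestamp> <event_type> <detail>`), the event names, and the `PARSE_RULES` mapping are all hypothetical; the application does not specify a log format or concrete log analysis rules.

```python
from collections import Counter

# Hypothetical log analysis rules: map an event keyword found in a log line
# to the name of the feature to be identified that it contributes to.
PARSE_RULES = {
    "EXTERNAL_ATTACK": "external_attack_count",
    "BRUTE_FORCE": "brute_force_count",
    "WEAK_PASSWORD": "weak_password_count",
}


def parse_features(log_lines):
    """Count rule matches per feature across the collected log lines."""
    counts = Counter({feature: 0 for feature in PARSE_RULES.values()})
    for line in log_lines:
        fields = line.split()
        # Assumed layout: <timestamp> <event_type> <free-form detail...>
        if len(fields) < 2:
            continue
        feature = PARSE_RULES.get(fields[1])
        if feature is not None:
            counts[feature] += 1
    return dict(counts)


logs = [
    "2022-10-01T00:00:01 BRUTE_FORCE src=10.0.0.5",
    "2022-10-01T00:00:02 EXTERNAL_ATTACK src=203.0.113.7",
    "2022-10-01T00:00:03 BRUTE_FORCE src=10.0.0.5",
    "2022-10-01T00:00:04 HEARTBEAT ok",
]
features = parse_features(logs)
```

Keeping the rules in a plain mapping reflects the statement that the types of features to be identified can be set dynamically: adding a feature only requires adding a rule.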
S102, according to the influence degree of each feature to be identified on the asset security, assigning a feature weight matched with the influence degree to the feature to be identified.
In practical applications, each feature may affect the security of the asset to a different degree. If the feature vector were constructed directly from the feature values of the features to be identified that are parsed from the log data, the error could be relatively large, and the vector might not truly reflect the security of the asset. In view of this, this implementation introduces feature weights; that is, each feature to be identified is assigned a feature weight adapted to its degree of influence on the security of the asset. Features to be identified that have a greater impact on asset security are assigned larger feature weights, while features that have a lesser impact are assigned smaller feature weights. For example, suppose the features to be identified include brute-force cracking and configuration risk. Brute-force cracking has a high impact on the security of the asset, so it is assigned a feature weight of 20%; configuration risk has a low impact on the security of the asset, so it is assigned a feature weight of 5%; and so on.
It should be noted that the present embodiment uses the influence degree to measure the influence of the feature to be identified on the security of the asset. The larger the influence, the higher the influence degree value; the smaller the influence, the smaller the value of the degree of influence.
Optionally, the degree of influence of the features to be identified on the safety of the assets may be set by operation and maintenance personnel according to experience; the setting may also be performed according to the analysis result of the attack feature in the log data, and other methods may also be adopted, which is not limited in the present application.
It should be noted that, when assigning a feature weight to each feature to be identified, it is necessary to ensure that the sum of the feature weights of each feature to be identified is 1.
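One way to satisfy both requirements above (larger influence gives a larger weight, and all weights sum to 1) is to normalize the influence degrees. The sketch below assumes this normalization; the application does not prescribe how weights are derived from influence degrees, and the function name and the influence values are hypothetical.

```python
def assign_feature_weights(influence):
    """Normalize influence degrees into feature weights that sum to 1."""
    total = sum(influence.values())
    if total <= 0:
        raise ValueError("influence degrees must have a positive sum")
    return {name: degree / total for name, degree in influence.items()}


# Hypothetical influence degrees (a larger value means a greater impact on
# asset security), e.g. set by operation and maintenance personnel.
influence = {
    "brute_force_count": 4.0,
    "malicious_file_count": 3.0,
    "weak_password_count": 2.0,
    "config_risk_count": 1.0,
}
weights = assign_feature_weights(influence)
```

With these illustrative inputs the highest-impact feature receives the largest weight and the weights sum to 1.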
S103, constructing a feature vector of the asset according to each feature to be identified and the feature weight corresponding to the feature.
In this step, after a feature weight has been assigned to each feature to be identified, the final feature value of the feature may be calculated from its initial feature value extracted from the log data and its feature weight; for example, the final feature value is the product of the initial feature value and the feature weight. Then the feature vector of the asset is constructed from the final feature values of the features to be identified, and can be expressed as $(x_1, x_2, \ldots, x_n)$, where $x_1 \sim x_n$ are the final feature values of the respective features to be identified of the asset.
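The product rule given as an example in step S103 can be sketched as follows. The feature names, the initial values, and the fixed `feature_order` list are hypothetical; only two features are shown, so the weights here do not sum to 1 on their own.

```python
def build_feature_vector(initial_values, weights, feature_order):
    """Return the weighted feature vector (x_1, ..., x_n).

    Each final feature value is the product of the initial feature value
    and the feature weight of the same feature.
    """
    return [initial_values[name] * weights[name] for name in feature_order]


# A fixed ordering keeps component i of every vector tied to the same feature.
feature_order = ["brute_force_count", "config_risk_count"]
initial_values = {"brute_force_count": 50, "config_risk_count": 8}
weights = {"brute_force_count": 0.20, "config_risk_count": 0.05}

vector = build_feature_vector(initial_values, weights, feature_order)
```

Here 50 brute-force events weighted at 20% contribute about 10 to the vector, while 8 configuration risks weighted at 5% contribute about 0.4.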
S104, performing cluster recognition processing on the feature vectors by using a cluster recognition model, and determining the confidence that the feature vectors belong to the classification corresponding to each cluster center.
In this step, the cluster recognition model is trained to contain a plurality of cluster centers, for example one cluster center corresponding to a normal classification and one cluster center corresponding to an abnormal classification. A feature vector library may be created in advance for each of the normal classification and the abnormal classification, and then, for each classification, the cluster center of that classification is obtained by training with the feature vector library of that classification. It should be noted that the feature vectors in each feature vector library are constructed according to the construction method of steps S102 and S103. The cluster recognition model is obtained on this basis.
Optionally, each cluster center may instead correspond to a risk classification, that is, a kind of risk that may exist in the current electronic device; the classifications may further include a risk-free security classification. For each risk classification and for the security classification, the cluster center of that classification is then calculated from the corresponding feature vectors. Before the cluster recognition model is trained, a feature vector library under each classification may be created in advance, and then, for each classification, the cluster center of that classification is obtained by training with the feature vector library of that classification. It should be noted that the feature vectors in each feature vector library are constructed according to the construction method of steps S102 and S103. The cluster recognition model is obtained on this basis.
In this way, when step S104 is executed, the feature vector generated in step S103 can be input directly into the cluster recognition model to determine the confidence that the feature vector belongs to the classification corresponding to each cluster center.
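The application does not state how a cluster center is trained from a feature vector library. A minimal sketch, assuming the center is simply the component-wise mean of the library (as in a k-means centroid), could look like this; the function name and the sample libraries are hypothetical.

```python
def cluster_center(vector_library):
    """Component-wise mean of the feature vectors in one classification's library.

    Assumption: the "trained" cluster center is the centroid of the library.
    """
    if not vector_library:
        raise ValueError("feature vector library is empty")
    size = len(vector_library)
    dims = len(vector_library[0])
    return [sum(vec[i] for vec in vector_library) / size for i in range(dims)]


# Hypothetical libraries of weighted feature vectors, one per classification.
libraries = {
    "normal": [[1.0, 0.0], [3.0, 2.0]],
    "abnormal": [[10.0, 8.0], [14.0, 12.0]],
}
centers = {label: cluster_center(lib) for label, lib in libraries.items()}
```

Other training procedures would fit the description equally well; the centroid is used here only because it is the simplest center that depends on every vector in the library.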
It should be noted that, for each cluster center, the Euclidean distance between the feature vector and the cluster center may be determined, and the confidence that the feature vector belongs to the classification corresponding to that cluster center is then derived from the determined Euclidean distance.
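The distance computation described above can be sketched with the standard library function `math.dist` (available in Python 3.8 and later). The cluster centers and the asset vector below are hypothetical values chosen for illustration.

```python
import math


def distances_to_centers(vector, centers):
    """Euclidean distance from one feature vector to every cluster center."""
    return {label: math.dist(vector, center) for label, center in centers.items()}


# Hypothetical cluster centers, one per classification.
centers = {"normal": [0.0, 0.0], "abnormal": [3.0, 16.0]}
distances = distances_to_centers([3.0, 4.0], centers)
```

In this example the vector lies at distance 5 from the normal center and 12 from the abnormal center, so it is closer to the normal classification.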
S105, determining whether the asset is a risk asset according to the determined confidence under each classification.
In this step, a confidence threshold may be set for each classification. After the confidence under each classification is determined, each confidence is compared with the confidence threshold of the corresponding classification. When a confidence is greater than the corresponding confidence threshold, the risk classification of the asset may be determined to be the risk classification whose confidence exceeds its threshold, and the asset may then be determined to be a risk asset under that risk classification.
Optionally, whether the asset is a risk asset may also be determined as follows: the number of risk classifications whose confidence is greater than the corresponding confidence threshold is determined. When the determined number is greater than a set number, this may indicate that the asset is subject to multiple kinds of attack, and the asset may therefore be determined to be a risk asset.
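Both decision rules of step S105 can be sketched with one helper, assuming thresholds are defined only for risk classifications. The classification names, confidence values, and thresholds are hypothetical.

```python
def risky_classifications(confidences, thresholds):
    """Risk classifications whose confidence exceeds their own threshold."""
    return [
        label
        for label, confidence in confidences.items()
        if label in thresholds and confidence > thresholds[label]
    ]


def is_risk_asset(confidences, thresholds, set_number=0):
    """True when more than `set_number` risk classifications exceed their threshold.

    set_number=0 corresponds to the first rule (any exceeded threshold);
    a larger set_number corresponds to the optional counting rule.
    """
    return len(risky_classifications(confidences, thresholds)) > set_number


# Hypothetical confidences per classification and per-classification thresholds.
confidences = {"brute_force": 0.9, "malware": 0.4, "weak_password": 0.7, "safe": 0.1}
thresholds = {"brute_force": 0.8, "malware": 0.6, "weak_password": 0.6}

exceeded = risky_classifications(confidences, thresholds)
any_rule = is_risk_asset(confidences, thresholds)
count_rule = is_risk_asset(confidences, thresholds, set_number=2)
```

With these values two risk classifications exceed their thresholds, so the asset is a risk asset under the first rule but not under the counting rule with a set number of 2.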
By implementing the asset identification method provided by the application, log analysis processing is carried out on the log data of the asset to obtain the features to be identified; according to the influence degree of each feature to be identified on the asset safety, distributing a feature weight matched with the influence degree to the feature to be identified; constructing a feature vector of the asset according to each feature to be identified and the feature weight corresponding to the feature; performing cluster recognition processing on the feature vectors by using a cluster recognition model, and determining the confidence of the feature vectors belonging to the corresponding classification of each cluster center; and determining whether the assets are risk assets or not according to the determined confidence degrees under the categories. By implementing the method, when the characteristic vector is determined, the characteristic weight is introduced, so that the obtained characteristic vector can represent the actual characteristics of the asset, the confidence coefficient of the characteristic vector under each classification can be accurately determined based on the cluster recognition model, and whether the asset is a risk asset is accurately determined, so that the accuracy of risk recognition is improved.
Optionally, based on the above embodiment, the features to be identified in the present embodiment may include, but are not limited to, at least one of the following: the number of external network attacks, the number of brute-force cracking attempts, the number of malicious files on the host, the number of grayware items, the number of adware items, the number of backdoor files, the number of weak passwords, the number of malicious communications, the number of configuration risks, and the like.
Specifically, when the log data is acquired, the log data at multiple time points in a set time period may be acquired, so as to obtain the features to be identified at each time point. For example, the acquired features to be identified at each time point may refer to table 1:
TABLE 1
On this basis, when there is log data for a plurality of time points, there is a corresponding plurality of groups of feature vectors, and the confidence that each feature vector belongs to the classification corresponding to each cluster center can be determined based on the flow of FIG. 1. Further, when the classifications include a normal classification and an abnormal classification, step S105 may be performed according to the following procedure: for each feature vector, a first confidence that the feature vector belongs to the normal classification and a second confidence that the feature vector belongs to the abnormal classification may be determined; if the second confidence is higher than a set confidence, the time point corresponding to the feature vector is determined to be an abnormal point. In this way, it can be determined whether each time point is an abnormal point. When the number of abnormal points is determined to be greater than a set number, the asset is determined to be a risk asset.
The set number may be chosen according to actual conditions or empirically. In one example, the set number may be 30% of the number of groups of feature vectors, or the like.
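The time-point procedure can be sketched as follows, using the 30% example for the set number. The function name, the set confidence of 0.6, and the sample second confidences are hypothetical.

```python
def is_risk_by_time_points(second_confidences, set_confidence, ratio=0.3):
    """Decide from per-time-point abnormal confidences whether the asset is at risk.

    A time point is an abnormal point when its second confidence (abnormal
    classification) is higher than the set confidence. The asset is a risk
    asset when the number of abnormal points exceeds ratio * number of groups.
    """
    abnormal_points = sum(1 for c in second_confidences if c > set_confidence)
    set_number = ratio * len(second_confidences)
    return abnormal_points > set_number


# Hypothetical second confidences for ten time points.
busy_asset = [0.9, 0.2, 0.8, 0.1, 0.7, 0.3, 0.9, 0.1, 0.2, 0.1]
quiet_asset = [0.9, 0.2, 0.1, 0.1, 0.1, 0.3, 0.2, 0.1, 0.2, 0.1]

busy_result = is_risk_by_time_points(busy_asset, set_confidence=0.6)
quiet_result = is_risk_by_time_points(quiet_asset, set_confidence=0.6)
```

The first asset has four abnormal points out of ten, which exceeds the set number of three; the second has only one and is not flagged.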
Optionally, step S104 may be performed as follows: performing cluster identification on the feature vectors by using a KNN algorithm, and determining feature points among the feature vectors, where the distance between each feature point and each cluster center is within a set distance range; determining the confidence that each feature point belongs to the classification corresponding to the respective cluster center; and determining the confidence of the feature point under each classification as the confidence that the feature vector belongs to the classification corresponding to each cluster center.
Specifically, the present application may first use the KNN algorithm to perform cluster identification processing on the feature vectors. The general idea of the KNN algorithm is as follows: if the majority of the K most similar samples in the feature space (i.e., the nearest neighbors in the feature space) belong to a certain class, the sample also belongs to that class, where K is typically an integer no greater than 20. For this purpose, each class is assigned a cluster center. In the KNN algorithm of this embodiment, all the selected neighbors are objects that have already been correctly classified. In the classification decision, the method determines the classification of the feature vector of the asset only according to the class of the nearest sample or samples.
Based on this principle, the distance between the feature vector of the asset and each cluster center can be determined, so that a feature vector whose distance is within the set distance range can be identified and recorded as a feature point. The confidence of the classification corresponding to the cluster center associated with the feature point is then determined according to the distance between the feature vector and the cluster center. Optionally, the distance may be a Euclidean distance, and the relationship between the Euclidean distance and the confidence is: the smaller the Euclidean distance, the greater the confidence; the larger the Euclidean distance, the smaller the confidence. For example, the confidence may be determined by an inverse-proportion method, in which the confidence is the reciprocal of the Euclidean distance, and so on.
In addition, the confidence that each feature point belongs to the classification corresponding to the respective cluster center can be determined according to the following process: determining a plurality of domain points of the feature point under the corresponding cluster center; performing weight assignment processing on each domain point; and determining the sum of the weights of the domain points as the confidence that the feature point belongs to the classification corresponding to the respective cluster center.
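The inverse-proportion example can be sketched as below. Treating the "set distance range" as a single upper bound `max_distance`, and returning infinity when the vector coincides with a center, are assumptions made for this sketch; the centers and the vector are hypothetical.

```python
import math


def confidence_by_center(vector, centers, max_distance):
    """Confidence per classification as the reciprocal of the Euclidean distance.

    Centers farther away than max_distance (the assumed set distance range)
    are skipped, so the vector is not a feature point for those centers.
    """
    confidences = {}
    for label, center in centers.items():
        distance = math.dist(vector, center)
        if distance > max_distance:
            continue
        # Guard against division by zero when the vector sits on the center.
        confidences[label] = math.inf if distance == 0 else 1.0 / distance
    return confidences


centers = {"normal": [0.0, 0.0], "abnormal": [3.0, 16.0], "far": [300.0, 400.0]}
conf = confidence_by_center([3.0, 4.0], centers, max_distance=20.0)
```

The nearer center (distance 5) yields the higher confidence of 0.2, the center at distance 12 yields about 0.083, and the out-of-range center is excluded.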
Specifically, for each cluster center, a plurality of domain points near each feature point may be determined by using a KNN algorithm, that is, each domain point is a point closer to the feature point, then a weight is assigned to each domain point, and the assigned weights are summed, so that an obtained sum is a confidence that the feature point belongs to the classification of the corresponding cluster center. In this way, since the domain points are points near the feature points, the confidence is calculated by using the weights of the domain points of the feature points, and the obtained confidence has more reference value and representativeness.
It should be noted that the following principle can be followed when assigning the weights: the closer the domain points are to the feature points, the greater the assigned weight; while domain points that are further away from the feature point may be given less weight.
On this basis, the weight assignment process can be performed for each domain point according to the following method: determining a target domain point which is farthest away from the characteristic point in each domain point; determining a first Euclidean distance between the target domain point and the feature point; determining a second Euclidean distance between the domain point and the feature point for each domain point; and determining the ratio of the difference between the first Euclidean distance and the second Euclidean distance to the first Euclidean distance as the weight of the domain point.
Specifically, assume that the m domain points near the feature point P are $L_1, L_2, L_3, L_4, \ldots, L_m$, and that the Euclidean distance between the feature point P and the target domain point farthest from it among the m domain points, i.e., the first Euclidean distance, is denoted $D_{max}$. The weight of each domain point is then calculated according to the following formula:
$d_i = (D_{max} - d_{i,q}) / D_{max}$
In the above formula, $d_i$ is the weight of the i-th domain point, and $d_{i,q}$ is the Euclidean distance between the i-th domain point and the feature point P, i.e., the second Euclidean distance.
From this, the confidence p of each feature point P can be expressed as: $p = \sum d_i,\ L_i = x$, where x represents the classification corresponding to the feature point.
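The weight formula and the confidence sum can be sketched as follows. The sketch reads the condition $L_i = x$ as restricting the sum to domain points labeled with classification x; this reading, the handling of the degenerate case where $D_{max} = 0$, and all sample points and labels are assumptions.

```python
import math


def domain_point_weights(feature_point, domain_points):
    """Weight d_i = (D_max - d_iq) / D_max for each domain point."""
    distances = [math.dist(feature_point, p) for p in domain_points]
    d_max = max(distances)  # first Euclidean distance (farthest domain point)
    if d_max == 0:
        # Assumption: if every domain point coincides with the feature point,
        # the formula is undefined, so give each point full weight.
        return [1.0] * len(distances)
    return [(d_max - d) / d_max for d in distances]


def confidence_per_class(feature_point, domain_points, labels):
    """Sum the domain-point weights separately for each classification."""
    weights = domain_point_weights(feature_point, domain_points)
    confidences = {}
    for weight, label in zip(weights, labels):
        confidences[label] = confidences.get(label, 0.0) + weight
    return confidences


P = [0.0, 0.0]
domain_points = [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0], [0.0, 4.0]]
labels = ["abnormal", "abnormal", "normal", "normal"]

weights = domain_point_weights(P, domain_points)
conf = confidence_per_class(P, domain_points, labels)
best = max(conf, key=conf.get)
```

Note that the farthest domain point always receives a weight of 0 under this formula, so it never contributes to the confidence. In this example the two nearby abnormal points dominate, and the abnormal classification has the highest confidence.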
On the basis, the classification corresponding to the feature point with the maximum confidence coefficient can be screened out, so that whether the feature point is normal or abnormal can be determined. And finally, determining the classification result of the asset according to the classification result of the characteristic points determined at each time point, namely determining whether the asset is normal or risky.
Based on the same inventive concept, the application also provides an asset identification device corresponding to the asset identification method. For the implementation of the asset identification device, reference may be made to the description of the asset identification method, which is not repeated here.
Referring to fig. 2, fig. 2 is an asset identification device according to an exemplary embodiment of the present application, including:
the analysis module 201 is configured to perform log analysis processing on the log data of the asset to obtain a feature to be identified;
the allocation module 202 is configured to allocate, according to the influence degree of each feature to be identified on the asset security, a feature weight matching the influence degree to the feature to be identified;
a constructing module 203, configured to construct a feature vector of the asset according to each feature to be identified and the feature weight corresponding to the feature;
a determining module 204, configured to perform cluster recognition processing on the feature vectors by using a cluster recognition model, and determine confidence that the feature vectors belong to a class corresponding to each cluster center;
and the identifying module 205 is configured to determine whether the asset is a risk asset according to the determined confidence level in each classification.
With the asset identification device provided by the present application, log analysis processing is performed on the log data of the asset to obtain the features to be identified; according to the influence degree of each feature to be identified on the asset security, a feature weight matched with the influence degree is assigned to the feature to be identified; a feature vector of the asset is constructed according to each feature to be identified and the feature weight corresponding to the feature; cluster recognition processing is performed on the feature vectors by using a cluster recognition model, and the confidence that the feature vectors belong to the classification corresponding to each cluster center is determined; and whether the asset is a risk asset is determined according to the determined confidence under each classification. Because the feature weight is introduced when the feature vector is determined, the obtained feature vector can represent the actual characteristics of the asset, the confidence of the feature vector under each classification can be accurately determined based on the cluster recognition model, and whether the asset is a risk asset can be accurately determined, so that the accuracy of risk identification is improved.
Optionally, based on the foregoing embodiment, in this embodiment, the determining module 204 is specifically configured to perform cluster identification on the feature vectors by using a KNN algorithm, and determine feature points in the feature vectors, where a distance between each feature point and each cluster center is within a set distance range; determining the confidence coefficient of the classification corresponding to each feature point belonging to the corresponding clustering center; and determining the confidence of the feature points corresponding to the classes as the confidence of the feature vectors belonging to the class corresponding to each cluster center.
Further, the determining module 204 is specifically configured to determine a plurality of domain points of the feature point under the corresponding cluster center; perform weight assignment processing on each domain point; and determine the sum of the weights of the domain points as the confidence that the feature point belongs to the classification corresponding to the respective cluster center.
Optionally, the determining module 204 is specifically configured to determine, among the domain points, a target domain point that is farthest from the feature point; determine a first Euclidean distance between the target domain point and the feature point; determine, for each domain point, a second Euclidean distance between the domain point and the feature point; and determine the ratio of the difference between the first Euclidean distance and the second Euclidean distance to the first Euclidean distance as the weight of the domain point.
Optionally, based on any one of the foregoing embodiments, in this embodiment, the features to be identified include at least one of the following: the number of external network attacks, the number of brute-force cracking attempts, the number of malicious files on the host, the number of grayware items, the number of adware items, the number of backdoor files, the number of weak passwords, the number of malicious communications, and the number of configuration risks.
Optionally, based on any of the above embodiments, the asset identification device may be provided in an electronic device, which may be, but is not limited to, a network security device such as a firewall device.
Therefore, by implementing the asset identification device provided by any one of the embodiments, the problems of high complexity and inaccurate identification result of the current risky asset identification are solved, and the accuracy of the risky asset identification result is improved.
Based on the same inventive concept, an embodiment of the present application provides an electronic device, as shown in fig. 3, which includes a processor 301 and a machine-readable storage medium 302, where the machine-readable storage medium 302 stores a computer program executable by the processor 301, and the computer program causes the processor 301 to perform the asset identification method according to any embodiment of the present application. In addition, the electronic device further includes a communication interface 303 and a communication bus 304, where the processor 301, the communication interface 303 and the machine-readable storage medium 302 communicate with each other through the communication bus 304.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The machine-readable storage medium 302 may be a memory, which may include a Random Access Memory (RAM), a Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), and a Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
As for the embodiments of the electronic device and the machine-readable storage medium, since the contents of the related methods are substantially similar to those of the foregoing method embodiments, the description is relatively simple, and reference may be made to the partial description of the method embodiments for relevant points.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.
The implementation process of the functions and actions of each unit/module in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiment, since it basically corresponds to the method embodiment, reference may be made to the partial description of the method embodiment for relevant points. The above-described embodiments of the apparatus are merely illustrative, wherein the units/modules described as separate parts may or may not be physically separate, and the parts displayed as units/modules may or may not be physical units/modules, may be located in one place, or may be distributed on a plurality of network units/modules. Some or all of the units/modules can be selected according to actual needs to achieve the purpose of the solution of the present application. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only a preferred embodiment of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present application should be included in the protection scope of the present application.
Claims (10)
1. An asset identification method, comprising:
performing log analysis processing on the log data of the assets to obtain features to be identified;
according to the influence degree of each feature to be identified on the asset safety, distributing a feature weight matched with the influence degree to the feature to be identified;
constructing a feature vector of the asset according to each feature to be identified and the feature weight corresponding to the feature;
performing cluster recognition processing on the feature vectors by using a cluster recognition model, and determining the confidence of the feature vectors belonging to the corresponding classification of each cluster center;
and determining whether the assets are risk assets or not according to the determined confidence degrees under the categories.
2. The method of claim 1, wherein performing a cluster recognition process on the feature vectors using a cluster recognition model to determine a confidence that the feature vectors belong to the class corresponding to each cluster center comprises:
performing cluster identification on the feature vectors by using a KNN algorithm, and determining feature points of the feature vectors, wherein the distance between each feature vector and each cluster center is within a set distance range;
determining the confidence coefficient of the classification corresponding to each feature point belonging to the corresponding clustering center;
and determining the confidence coefficient of the feature point corresponding to each classification as the confidence coefficient of the feature vector belonging to the classification corresponding to each clustering center.
3. The method of claim 2, wherein determining the confidence that each feature point belongs to the class to which the corresponding cluster center corresponds comprises:
determining a plurality of domain points of the feature point under the corresponding clustering center;
carrying out weight distribution processing on each domain point;
and determining the sum of the weights of the domain points as the confidence that the feature point belongs to the classification corresponding to the corresponding cluster center.
4. The method of claim 3, wherein performing the weight assignment process for each domain point comprises:
determining, among the domain points, a target domain point which is farthest from the feature point;
determining a first Euclidean distance between the target domain point and the feature point;
determining a second Euclidean distance between the domain point and the feature point aiming at each domain point;
and determining the ratio of the difference between the first Euclidean distance and the second Euclidean distance to the first Euclidean distance as the weight of the domain point.
5. The method of claim 1, wherein the feature to be identified comprises at least one of: the number of external network attacks, the number of brute-force cracking attempts, the number of host malicious files, the number of grayware items, the number of adware items, the number of backdoor files, the number of weak passwords, the number of malicious communications, and the number of configuration risks.
6. An asset identification device, comprising:
the analysis module is used for carrying out log analysis processing on the log data of the assets to obtain features to be identified;
the distribution module is used for assigning, to each feature to be identified, a feature weight matched with the degree of influence of that feature on asset security;
the construction module is used for constructing a feature vector of the asset according to each feature to be identified and the feature weight corresponding to the feature to be identified;
the determining module is used for performing cluster recognition processing on the feature vectors by using a cluster recognition model, and determining the confidence that the feature vectors belong to the classification corresponding to each cluster center;
and the identification module is used for determining whether the assets are risk assets according to the determined confidence degrees under the classifications.
7. The apparatus of claim 6,
the determining module is specifically configured to perform cluster identification on the feature vectors by using a KNN algorithm, and determine feature points in the feature vectors, where a distance between each feature vector and each cluster center is within a set distance range; determining the confidence of the classification corresponding to each feature point belonging to the corresponding clustering center; and determining the confidence of the feature points corresponding to the classes as the confidence of the feature vectors belonging to the class corresponding to each cluster center.
8. The apparatus of claim 7,
the determining module is specifically configured to determine a plurality of domain points of the feature point under the corresponding cluster center; assign a weight to each domain point; and determine the sum of the weights of all the domain points as the confidence that the feature point belongs to the classification corresponding to the corresponding cluster center.
9. The apparatus of claim 8,
the determining module is specifically configured to determine, among the domain points, a target domain point that is farthest from the feature point; determine a first Euclidean distance between the target domain point and the feature point; determine, for each domain point, a second Euclidean distance between that domain point and the feature point; and determine the ratio of the difference between the first Euclidean distance and the second Euclidean distance to the first Euclidean distance as the weight of that domain point.
10. The apparatus of claim 6, wherein the feature to be identified comprises at least one of: the number of external network attacks, the number of brute-force cracking attempts, the number of host malicious files, the number of grayware items, the number of adware items, the number of backdoor files, the number of weak passwords, the number of malicious communications, and the number of configuration risks.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211297450.2A CN115758336A (en) | 2022-10-21 | 2022-10-21 | Asset identification method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115758336A (en) | 2023-03-07 |
Family
ID=85352730
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211297450.2A Pending CN115758336A (en) | 2022-10-21 | 2022-10-21 | Asset identification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115758336A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117312833A (en) * | 2023-11-29 | 2023-12-29 | 北京冠群信息技术股份有限公司 | Data identification method and system applied to digital asset environment |
CN117312833B (en) * | 2023-11-29 | 2024-02-27 | 北京冠群信息技术股份有限公司 | Data identification method and system applied to digital asset environment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ogwueleka | Data mining application in credit card fraud detection system | |
CN109840413B (en) | Phishing website detection method and device | |
CN110930218B (en) | Method and device for identifying fraudulent clients and electronic equipment | |
CN110162958B (en) | Method, apparatus and recording medium for calculating comprehensive credit score of device | |
Karanam et al. | Intrusion detection mechanism for large scale networks using CNN-LSTM | |
CN109981583A (en) | A kind of industry control network method for situation assessment | |
CN110929525A (en) | Network loan risk behavior analysis and detection method, device, equipment and storage medium | |
CN108197795A (en) | The account recognition methods of malice group, device, terminal and storage medium | |
Harbola et al. | Improved intrusion detection in DDoS applying feature selection using rank & score of attributes in KDD-99 data set | |
Khare et al. | AI-Powered Fraud Prevention: A Comprehensive Analysis of Machine Learning Applications in Online Transactions | |
CN115758336A (en) | Asset identification method and device | |
CN113393316B (en) | Loan overall process accurate wind control and management system based on massive big data and core algorithm | |
US20230164162A1 (en) | Valuable alert screening method efficiently detecting malicious threat | |
CN111582647A (en) | User data processing method and device and electronic equipment | |
CN111245815A (en) | Data processing method, data processing device, storage medium and electronic equipment | |
CN117370548A (en) | User behavior risk identification method, device, electronic equipment and medium | |
CN114238280B (en) | Method and device for constructing financial sensitive information standard library and electronic equipment | |
CN115455386A (en) | Operation behavior identification method and device | |
CN116245630A (en) | Anti-fraud detection method and device, electronic equipment and medium | |
CN114925365A (en) | File processing method and device, electronic equipment and storage medium | |
CN114298563A (en) | Alarm information analysis method and device and computer equipment | |
Park et al. | Performance comparison of multi-class SVM with oversampling methods for imbalanced data classification | |
CN113691552A (en) | Threat intelligence effectiveness evaluation method, device, system and computer storage medium | |
CN113452648A (en) | Method, device, equipment and computer readable medium for detecting network attack | |
CN116647374B (en) | Network flow intrusion detection method based on big data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||