CN104376261B - Method for automatically detecting malicious processes in a forensic scenario - Google Patents
- Publication number
- CN104376261B (application CN201410705875.1A)
- Authority
- CN
- China
- Prior art keywords
- set
- dynamic link
- link library
- tuple
- key
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/566—Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Virology (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
- Computer And Data Communications (AREA)
Abstract
The invention provides a method for automatically detecting malicious processes from the dynamic link library (DLL) data of processes in a forensic scenario. The method comprises the following steps: 1) establishing a mapping from DLL data to N-tuples; 2) computing a relatively optimal key DLL set with a greedy algorithm; 3) building a recognition model and performing detection with hidden naive Bayes. Compared with existing malware detection methods, the invention automatically identifies malicious software processes among numerous unknown processes without a malware signature library or related expert experience, and when the training set and the set to be identified come from inconsistent sources, this can be corrected by using a confirmation set. In addition, the invention can process raw DLL data from different sources. The invention is particularly suitable for malicious process detection scenarios that lack prior knowledge and require large-scale automation.
Description
Technical field
The present invention relates to malicious process identification and computer forensics, and in particular to a method for automatically detecting malicious processes from the dynamic link library (DLL) data of processes in a forensic scenario.
Background art
With the rapid development of China's economy and society, the level of informatization in all industries keeps rising.
Against this background of widespread informatization, malicious computer programs are growing in number and appearing ever more frequently, so efficient automated detection of these
programs is particularly important. At present the field still relies heavily on the signatures of malicious processes and
on human experience; few methods identify malicious processes automatically without depending on a signature database.
Summary of the invention
The object of the invention is to provide a method for automatically detecting malicious processes from process DLL data in a forensic scenario,
which automatically identifies malicious software processes among numerous unknown processes without a malware signature library or related experience,
and which can process raw DLL data from different sources. The invention is particularly suitable for
malicious process detection scenarios that lack prior knowledge and require large-scale automation.
To achieve the above object, the invention proposes a method for automatically detecting malicious processes from process DLL data
in a forensic scenario, comprising the following steps:
1) Establish a mapping from DLL data to N-tuples;
Definition 1: An N-tuple is a sequence of length N consisting of 0s and 1s, where N is a non-negative integer;
Definition 2: The flag bit is a special one-bit element appended to the end of an N-tuple. It indicates whether the process represented by the tuple
is a malicious process and is used in identifying malware;
For a recognition algorithm to analyze and process DLL data quickly, the DLL data of each process
must be mapped to a data structure, namely the N-tuple of Definition 1. The DLL set is the reference for the mapping: for
a DLL set containing N DLLs, the corresponding data structure is an (N+1)-tuple that includes the flag bit
of Definition 2. The data structure of a process in the set to be identified lacks this bit and is therefore an N-tuple;
The mapping is as follows:
a. Set every bit of the tuple to 0;
b. Traverse the reference DLL set; for each DLL, search the DLL data of the process, and if the DLL is present,
set the bit corresponding to that DLL in the process's record to 1;
c. If the process belongs to the training set or the confirmation set and is known to be malicious, set the flag bit to 1; if the process
belongs to the set to be identified (the test set), remove the flag bit;
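As an illustration, mapping steps a to c can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `map_process_to_tuple`, the representation of a process's DLL data as a list of file names, and the case-insensitive name matching are all hypothetical choices.

```python
from typing import Iterable, Optional, Sequence, Tuple


def map_process_to_tuple(
    process_dlls: Iterable[str],
    key_dlls: Sequence[str],
    is_malicious: Optional[bool] = None,
) -> Tuple[int, ...]:
    """Map one process's DLL data onto the key DLL set.

    Returns an N-tuple (N = len(key_dlls)) for a process awaiting
    identification (is_malicious=None), or an (N+1)-tuple ending in the
    flag bit for a labelled training/confirmation process.
    """
    # Hypothetical normalisation: compare DLL file names case-insensitively.
    loaded = {name.lower() for name in process_dlls}
    # Steps a and b: every bit starts at 0 and becomes 1 only if the
    # corresponding key DLL occurs in the process's DLL data.
    bits = [1 if dll.lower() in loaded else 0 for dll in key_dlls]
    # Step c: labelled processes get the flag bit (1 = malicious, 0 = benign);
    # processes in the set to be identified get none.
    if is_malicious is not None:
        bits.append(1 if is_malicious else 0)
    return tuple(bits)


key_dlls = ["ws2_32.dll", "wininet.dll", "crypt32.dll"]  # hypothetical key set
print(map_process_to_tuple(["KERNEL32.dll", "WS2_32.dll", "crypt32.dll"], key_dlls, True))  # (1, 0, 1, 1)
print(map_process_to_tuple(["wininet.dll"], key_dlls))  # (0, 1, 0)
```

Because bit k always refers to the k-th key DLL, the same ordered key set must be used for the training, confirmation, and to-be-identified processes.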
2) Compute a relatively optimal key DLL set with a greedy algorithm;
After the mapping from DLL data to N-tuples has been established, DLLs must be selected
to form the DLL set; the selected DLLs are called key DLLs. The choice of this set
affects the detection model that is built and therefore the recognition accuracy;
Definition 3: The theoretically optimal DLL set is a set such that no other set, modeled with the same algorithm on the same
training set, performs better on the same test set. The theoretically optimal DLL set depends on the choice of training
set, test set, and algorithm;
Definition 4: A relatively optimal DLL set is generally not equal to the theoretically optimal DLL set, but it performs relatively
well for similar training and test sets; moreover, it can be obtained in a bounded number of steps
with controllable computational complexity;
When selecting key DLLs, note that key DLLs describe attributes common to the processes within each class,
so DLLs specific to a single process cannot be chosen as key DLLs. Without prior knowledge,
the first DLLs to examine are the Windows system DLLs, which must not be used directly as key DLLs; otherwise a "dimension explosion" occurs when
the model is built. Since each DLL occurs in the samples with a different probability, the key DLL set is initialized
by counting over the samples: the DLLs that occur in the benign software processes and in the malicious software processes
of the training set are counted, and their union is the set I;
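The counting that initializes the candidate set I can be sketched as follows, assuming (hypothetically) that each training process is given as a list of loaded DLL file names and that a DLL is counted at most once per process. The helper names `occurrence_ratios` and `candidate_set` are illustrative only, and the sketch does not itself exclude Windows system DLLs or process-specific DLLs.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple


def occurrence_ratios(processes: List[Sequence[str]]) -> Dict[str, float]:
    """Fraction of processes whose DLL data contains each DLL.

    A DLL is counted at most once per process, so the value estimates the
    probability that a process of this class loads the DLL.
    """
    counts: Counter = Counter()
    for dlls in processes:
        counts.update({name.lower() for name in dlls})
    if not processes:
        return {}
    return {dll: n / len(processes) for dll, n in counts.items()}


def candidate_set(
    benign: List[Sequence[str]], malicious: List[Sequence[str]]
) -> Tuple[Set[str], Dict[str, float], Dict[str, float]]:
    """Return I (the union of DLLs seen in either class) plus per-class ratios."""
    benign_ratio = occurrence_ratios(benign)
    malicious_ratio = occurrence_ratios(malicious)
    return set(benign_ratio) | set(malicious_ratio), benign_ratio, malicious_ratio


# Hypothetical toy training data: each process is a list of loaded DLLs.
benign = [["kernel32.dll", "user32.dll"], ["kernel32.dll"]]
malicious = [["kernel32.dll", "ws2_32.dll"]]
I, benign_ratio, malicious_ratio = candidate_set(benign, malicious)
print(sorted(I))                   # ['kernel32.dll', 'user32.dll', 'ws2_32.dll']
print(benign_ratio["user32.dll"])  # 0.5
```

The per-class ratios returned here are the occurrence probabilities that the lower/upper bounds of the next step are compared against.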
More training data is usually needed to improve recognition accuracy. If the DLL data of the malicious processes to be identified
comes from a different source than the DLL data used for training, an optimal DLL set is first obtained as a starting point
by enumerating combinations of lower and upper bounds on DLL occurrence ratios, testing each combination on the validation set,
and using the validation feedback; the validation set must come from the same source as the data to be identified. This maps an excessively large
number of possible combinations into a limited, computable range;
The concrete operations are as follows:
a. Enumerate with a set step size (0.5%) and set ranges (lower bound from 0 to 49.5%, 100 values in total;
upper bound from 50% to 100%, 101 values in total), giving 100 × 101 = 10100 lower/upper bound combinations;
b. Each combination corresponds to one key DLL set, computed as follows: if the occurrence probability of a DLL
in both the benign software processes and the malicious software processes of the training set lies between the lower and upper bound,
the DLL is put into the key DLL set;
c. Test the key DLL sets corresponding to these combinations one by one on the validation set, and find the relatively optimal key
DLL set S;
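The bound enumeration in steps a to c can be sketched as follows. The sketch assumes inclusive bounds, occurrence ratios expressed as fractions in [0, 1], and a caller-supplied `score` function that builds a model on a candidate key set and returns its result on the validation set (higher is better). All names are hypothetical, and the patent does not state which validation metric is used.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Iterable, List, Mapping, Tuple


def bound_pairs() -> List[Tuple[float, float]]:
    """All (lower, upper) bound combinations of step a, as fractions."""
    lowers = [k / 200 for k in range(100)]        # 0 % ... 49.5 % in 0.5 % steps
    uppers = [0.5 + k / 200 for k in range(101)]  # 50 % ... 100 % in 0.5 % steps
    return list(product(lowers, uppers))          # 100 * 101 = 10100 pairs


def key_set_for_bounds(
    candidates: Iterable[str],
    benign_ratio: Mapping[str, float],
    malicious_ratio: Mapping[str, float],
    lo: float,
    hi: float,
) -> FrozenSet[str]:
    """Step b: keep DLLs whose occurrence ratio lies in [lo, hi] in BOTH classes."""
    return frozenset(
        dll for dll in candidates
        if lo <= benign_ratio.get(dll, 0.0) <= hi
        and lo <= malicious_ratio.get(dll, 0.0) <= hi
    )


def best_initial_set(
    candidates: Iterable[str],
    benign_ratio: Mapping[str, float],
    malicious_ratio: Mapping[str, float],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Step c: evaluate every bound pair's key set and keep the best one as S."""
    candidates = list(candidates)  # iterated once per bound pair
    cache: Dict[FrozenSet[str], float] = {}
    best_set: FrozenSet[str] = frozenset()
    best_score = float("-inf")
    for lo, hi in bound_pairs():
        key_set = key_set_for_bounds(candidates, benign_ratio, malicious_ratio, lo, hi)
        if key_set not in cache:
            # Many bound pairs select the same DLLs; evaluate each set only once.
            cache[key_set] = score(key_set)
        if cache[key_set] > best_score:
            best_set, best_score = key_set, cache[key_set]
    return best_set, best_score
```

Caching scores per distinct key set is an optimization choice of this sketch rather than something the patent specifies; it avoids retraining the model for the many bound pairs that select identical DLLs.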
The key DLL set is then optimized by combining a greedy algorithm with the validation data: DLLs are added one at a time
to the candidate key DLL set, a model is built, and the validation set feeds back the result, so that a relatively
optimal key DLL set is found step by step;
The concrete operations are as follows:
a. Let the relatively optimal key DLL set S have n elements. Loop over the elements of set I that are not in S: add one of them to
S, build a model and record the result obtained on the validation set, then remove it and add the next, repeating until the loop ends;
b. Take the (n+1)-element set T that achieved the best result on the validation set, replace S with it, and repeat a;
c. The process ends when the result on the validation set no longer improves; the set S at that point is the final set;
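The greedy forward selection in steps a to c can be sketched as below. `score` is again a hypothetical callback returning the validation result for a candidate set (higher is better); ties are broken by the sorted order of DLL names, an arbitrary choice the patent does not specify.

```python
from typing import Callable, FrozenSet, Iterable, Optional, Tuple


def greedy_expand(
    start: Iterable[str],
    candidates: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Greedy forward selection starting from S and drawing new DLLs from I."""
    current = frozenset(start)
    current_score = score(current)
    pool = set(candidates)
    while True:
        best_set: Optional[FrozenSet[str]] = None
        best_score = current_score
        # Step a: add each DLL of I that is not yet in S, one at a time, and score it.
        for dll in sorted(pool - current):
            trial = current | {dll}
            trial_score = score(trial)
            if trial_score > best_score:
                best_set, best_score = trial, trial_score
        # Step c: stop once no (n+1)-element set beats the current result.
        if best_set is None:
            return current, current_score
        # Step b: replace S with the best (n+1)-element set T and repeat.
        current, current_score = best_set, best_score


# Hypothetical score: reward DLLs "a" and "b", slightly penalise anything else.
target = {"a", "b"}
print(greedy_expand([], {"a", "b", "c"}, lambda s: len(s & target) - 0.1 * len(s - target)))
```

Only improvements are accepted, so the loop ends exactly when the best (n+1)-element set T fails to beat S on the validation set, which is the termination rule of step c.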
3) Build the recognition model and perform detection with hidden naive Bayes:
After the relatively optimal key DLL set is obtained, the modeling algorithm must be chosen. DLLs
are strongly coupled and highly interdependent, and hidden naive Bayes
is well suited to such strong coupling. In hidden naive Bayes, each attribute has a hidden parent node that
expresses the influence of the other attributes on that attribute;
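The patent gives no formulas for hidden naive Bayes. Assuming the standard formulation from the literature, with a_i the i-th bit of a process's N-tuple, c its flag bit, and hp_i the hidden parent of attribute A_i, the hidden parent is a weighted mixture of one-dependence estimates, weighted by conditional mutual information:

```latex
\begin{aligned}
\hat{c} &= \arg\max_{c \in \{0,1\}} P(c) \prod_{i=1}^{N} P(a_i \mid hp_i, c), \\
P(a_i \mid hp_i, c) &= \sum_{j \neq i} W_{ij}\, P(a_i \mid a_j, c), \\
W_{ij} &= \frac{I_P(A_i; A_j \mid C)}{\sum_{k \neq i} I_P(A_i; A_k \mid C)}, \\
I_P(A_i; A_j \mid C) &= \sum_{a_i, a_j, c} P(a_i, a_j, c) \log \frac{P(a_i, a_j \mid c)}{P(a_i \mid c)\, P(a_j \mid c)}.
\end{aligned}
```

Strongly coupled DLL pairs thus receive large weights, which is how the hidden parent captures the interdependency described above.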
Once hidden naive Bayes has been chosen, the model is built on the training set, and the processes in the set to be identified are converted
into tuples as in 1). Hidden naive Bayes then determines the flag bit of each process in the set to be identified by computing conditional probabilities;
a flag bit of 1 denotes a malicious process and 0 a normal process.
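The following minimal Python class sketches hidden naive Bayes for binary tuples and a binary flag bit, following the standard formulation rather than any code from the patent. The class name, Laplace smoothing, the uniform-weight fallback when all conditional mutual information values are zero, and tie-breaking toward the normal class are assumptions made for illustration.

```python
import math
from itertools import product
from typing import List, Sequence


class HiddenNaiveBayes:
    """Hidden naive Bayes for binary attributes and a binary class (sketch)."""

    def fit(self, X: Sequence[Sequence[int]], y: Sequence[int]) -> "HiddenNaiveBayes":
        self.n, self.d = len(X), len(X[0])
        self.class_count = [sum(1 for label in y if label == c) for c in (0, 1)]
        # pair[c][i][j][u][v]: training rows with class c, x_i == u and x_j == v.
        # The diagonal pair[c][j][j][v][v] therefore counts class c with x_j == v.
        self.pair = [[[[[0, 0], [0, 0]] for _ in range(self.d)]
                      for _ in range(self.d)] for _ in (0, 1)]
        for row, c in zip(X, y):
            for i in range(self.d):
                for j in range(self.d):
                    self.pair[c][i][j][row[i]][row[j]] += 1
        # W[i][j]: weight of attribute j in attribute i's hidden parent.
        self.W = [[0.0] * self.d for _ in range(self.d)]
        for i in range(self.d):
            cmi = [self._cmi(i, j) if j != i else 0.0 for j in range(self.d)]
            total = sum(cmi)
            for j in range(self.d):
                if j != i:
                    # Fall back to uniform weights if every CMI value is zero.
                    self.W[i][j] = cmi[j] / total if total > 0 else 1.0 / (self.d - 1)
        return self

    def _cmi(self, i: int, j: int) -> float:
        """Conditional mutual information I(A_i; A_j | C) from smoothed counts."""
        result = 0.0
        for c in (0, 1):
            cc = self.class_count[c]
            p_c = (cc + 1) / (self.n + 2)
            # Laplace-smoothed joint P(x_i=u, x_j=v | c); marginals are derived
            # from it so the estimate is a proper, non-negative KL divergence.
            joint = [[(self.pair[c][i][j][u][v] + 1) / (cc + 4) for v in (0, 1)]
                     for u in (0, 1)]
            p_u = [joint[u][0] + joint[u][1] for u in (0, 1)]
            p_v = [joint[0][v] + joint[1][v] for v in (0, 1)]
            for u, v in product((0, 1), repeat=2):
                result += p_c * joint[u][v] * math.log(joint[u][v] / (p_u[u] * p_v[v]))
        return max(result, 0.0)  # clamp tiny negative values from rounding

    def _log_joint(self, x: Sequence[int], c: int) -> float:
        """log P(c) + sum_i log P(x_i | hidden parent of x_i, c)."""
        cc = self.class_count[c]
        log_p = math.log((cc + 1) / (self.n + 2))
        for i in range(self.d):
            if self.d == 1:
                # No other attributes: reduce to an ordinary naive Bayes term.
                p = (self.pair[c][i][i][x[i]][x[i]] + 1) / (cc + 2)
            else:
                # Mixture of P(x_i | x_j, c) over j != i, weighted by W[i][j].
                p = sum(self.W[i][j] * (self.pair[c][i][j][x[i]][x[j]] + 1)
                        / (self.pair[c][j][j][x[j]][x[j]] + 2)
                        for j in range(self.d) if j != i)
            log_p += math.log(p)
        return log_p

    def predict(self, X: Sequence[Sequence[int]]) -> List[int]:
        """Return the flag bit per tuple: 1 = malicious, 0 = normal (ties -> 0)."""
        return [max((0, 1), key=lambda c: self._log_joint(x, c)) for x in X]
```

In use, each training (N+1)-tuple from step 1) is split into its first N bits (a row of X) and its flag bit (the matching entry of y); `predict` is then applied to the N-tuples of the set to be identified.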
Further, the specific steps of step 1) are as follows:
Step 1)-1: Initial state;
Step 1)-2: Set every bit of the tuple to 0;
Step 1)-3: Traverse the DLL set; for each DLL, search the DLL data of the target process,
and if the DLL is present, set the bit corresponding to that DLL in the target process's record to 1;
Step 1)-4: Traverse the processes and determine whether each belongs to the training set or the confirmation set. If so, go to
1)-5; otherwise it belongs to the set to be identified, so go to 1)-6;
Step 1)-5: If the process is malicious, set the flag bit to 1. If the traversal is finished, go to 1)-7; otherwise
continue with 1)-4;
Step 1)-6: Remove the flag bit. If the traversal is finished, go to 1)-7; otherwise continue with 1)-4;
Step 1)-7: The mapping from DLL data to N-tuples is complete.
Further, the specific steps of step 2) are as follows:
Step 2)-1: Initial state;
Step 2)-2: Count the DLLs that occur in the benign software processes and malicious software processes of the training set;
their union is the set I;
Step 2)-3: Enumerate all lower/upper bound combinations with a set step size (for example 0.5%) and set ranges (for example,
lower bound from 0 to 49.5%, 100 values in total; upper bound from 50% to 100%, 101 values in total);
Step 2)-4: Traverse all combinations from 2)-3; for each combination, enumerate all DLLs in I;
Step 2)-5: If the occurrence probability of the DLL in both the benign software processes and the malicious software processes of the training set
lies between the bounds of the combination, put the DLL into the set. If the enumeration is finished, go to 2)-6; otherwise
return to 2)-5;
Step 2)-6: Test the set corresponding to each bound combination on the validation set and record the result;
Step 2)-7: If the enumeration is finished, find the optimal set S; otherwise return to 2)-6;
Step 2)-8: With n elements in S, loop over the elements of I that are not in S and add one of them to S at a time,
obtaining the set X;
Step 2)-9: Build a model on X, obtain the result on the validation set, then remove the added element to restore S;
Step 2)-10: If the loop has ended, go to 2)-11; otherwise return to 2)-8;
Step 2)-11: Take the (n+1)-element set T with the best result on the validation set and replace S with it;
Step 2)-12: If the result on the validation set in 2)-11 no longer improves, terminate, yielding the relatively optimal
DLL set; otherwise return to 2)-8;
Step 2)-13: Computing the relatively optimal key DLL set with the greedy algorithm is complete.
Further, the specific steps of step 3) are as follows:
Step 3)-1: Initial state;
Step 3)-2: Convert the processes in the set to be identified and in the selected training set into tuples as in 1);
Step 3)-3: Build the recognition model on the training set using hidden naive Bayes;
Step 3)-4: Hidden naive Bayes determines the flag bit of each process in the set to be identified by computing conditional probabilities;
Step 3)-5: If the flag bit is 1, go to 3)-6; otherwise go to 3)-7;
Step 3)-6: Classify the process corresponding to the tuple as a malicious process;
Step 3)-7: Classify the process corresponding to the tuple as a normal process;
Step 3)-8: Building the recognition model with hidden naive Bayes and performing detection is complete.
Beneficial effects of the invention: the invention provides a method for automatically detecting malicious processes from process DLL data in a forensic scenario. Compared with existing malware detection methods, it automatically identifies malicious software processes among numerous unknown processes without a malware signature library or related experience, and when the training set and the set to be identified come from inconsistent sources, this can be corrected by using a confirmation set. In addition, the invention can process raw DLL data from different sources. The invention is particularly suitable for malicious process detection scenarios that lack prior knowledge and require large-scale automation. Practice shows that in common application scenarios the method reaches an accuracy above 90% while taking only a few seconds.
Description of the drawings
Fig. 1 is a flow chart of the method, according to an embodiment of the invention, for automatically detecting malicious processes from process DLL data in a forensic scenario.
Fig. 2 is a flow chart of establishing the mapping from DLL data to N-tuples in Fig. 1.
Fig. 3 is a flow chart of computing the relatively optimal key DLL set with a greedy algorithm in Fig. 1.
Fig. 4 is a flow chart of building the recognition model and performing detection with hidden naive Bayes in Fig. 1.
Specific embodiment
To make the technical content of the invention clearer, specific embodiments are described below with reference to the accompanying drawings.
Fig. 1 is a flow chart of the method, according to an embodiment of the invention, for automatically detecting malicious processes from process DLL data
in a forensic scenario.
The method for automatically detecting malicious processes from process DLL data in a forensic scenario is characterized
by the following steps:
S101: Establish the mapping from DLL data to N-tuples, using Definitions 1 and 2 and mapping steps a to c as given in step 1) above.
S103: Compute the relatively optimal key DLL set with a greedy algorithm, following Definitions 3 and 4 and the operations given in step 2) above.
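S105: Build the recognition model and perform detection with hidden naive Bayes, as described in step 3) above; the processes in the set to be identified are converted into tuples as in S101.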
Fig. 2 is the flow chart for establishing the mapping from DLL data to N-tuples.
The specific steps are as follows:
Step 1: Initial state;
Step 2: Set every bit of the tuple to 0;
Step 3: Traverse the DLL set; for each DLL, search the DLL data of the target process, and if the DLL is present, set the bit corresponding to that DLL in the target process's record to 1;
Step 4: Traverse the processes and determine whether each belongs to the training set or the confirmation set. If so, go to 5; otherwise it belongs to the set to be identified, so go to 6;
Step 5: If the process is malicious, set the flag bit to 1. If the traversal is finished, go to 7; otherwise continue with 4;
Step 6: Remove the flag bit. If the traversal is finished, go to 7; otherwise continue with 4;
Step 7: The mapping from DLL data to N-tuples is complete.
Fig. 3 is the flow chart for computing the relatively optimal key DLL set with a greedy algorithm.
Define 3:Theoretical optimum dynamic link library collection is such a collection, does not have any other collection to use same algorithm same
Performance after modeling in one training set in same test set is better than the collection, and this theoretical optimum dynamic link library collection depends on instruction
Practice the selection of collection, test set and algorithm;
Define 4:Relatively optimum dynamic link library collection is often not equal to theoretical optimum dynamic link library collection, but in similar training
In the case of collection and test set, relatively good performance is suffered from, additionally, the collection can be by controllable computational complexity certain
Obtain in step;
When crucial dynamic link library is selected, as crucial dynamic link library is the public affairs for describing process in each classification
Common attribute, the related dynamic link library of process can not be chosen as crucial dynamic link library;In the case of no priori,
Firstly the need of investigate be windows system dynamics chained library, not directly as crucial dynamic link library, otherwise set up model
When " dimension blast " can occur;The probability for occurring in view of each dynamic link library in the sample inequality, then by sample
Originally the mode for being counted is counted soft in the conventional software processes and malice of training set initializing crucial dynamic link library collection
The dynamic link library occurred in part process, what they combined integrate as I;
In order to improve the accuracy rate of identification, more training data is generally required;The malicious process for differentiating if desired
Dynamic link database data is different with the dynamic link library source for training, and needs go out initially with enumerating dynamic link library dll
Now ratio bound combination, and collect, verify the mode fed back to obtain an optimum dynamic chain in combined test in checking
Storehouse collection is connect as beginning, checking collection must be with data to be identified from same source;So as to by it is a quantity excessively huge can
In the limited computable interval of energy property combinatorial mapping to;
The specific operations are as follows:
A. Enumerate lower and upper bounds with a set step size (0.5%) over set ranges (the lower bound runs from 0 to 49.5%, 100 values in total; the upper bound runs from 50% to 100%, 101 values in total), giving 10100 lower/upper-bound combinations in all;
B. Each combination corresponds to one key DLL set, computed as follows: if a DLL's occurrence probability in both the normal software processes and the malicious software processes of the training set lies between the combination's lower and upper bounds, the DLL is added to that key DLL set;
C. Test the key DLL sets corresponding to these combinations one by one on the validation set, and find the relatively optimal key DLL set S.
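Steps A to C can be sketched as below. To avoid floating-point drift, the bounds are enumerated in integer per-mille units (0.5% = 5 per mille), which reproduces exactly 100 × 101 = 10100 combinations. Treating "between the bounds" as inclusive and caching the score of identical key sets are assumptions of this sketch; `evaluate` is a hypothetical callback that builds a model restricted to the given DLLs and returns its score on the validation set.

```python
from typing import Callable, Dict, FrozenSet, Iterator, Set, Tuple


def bound_pairs(step: int = 5) -> Iterator[Tuple[int, int]]:
    """Step A: (lower, upper) bounds in per mille; lower 0..495, upper 500..1000."""
    for lo in range(0, 500, step):
        for hi in range(500, 1001, step):
            yield lo, hi


def key_set_for(lo: int, hi: int, candidates: Set[str],
                normal_rate: Dict[str, float],
                malicious_rate: Dict[str, float]) -> FrozenSet[str]:
    """Step B: DLLs whose occurrence rate in BOTH classes lies within [lo, hi]."""
    lo_f, hi_f = lo / 1000, hi / 1000
    return frozenset(
        dll for dll in candidates
        if lo_f <= normal_rate.get(dll, 0.0) <= hi_f
        and lo_f <= malicious_rate.get(dll, 0.0) <= hi_f
    )


def best_initial_set(candidates: Set[str],
                     normal_rate: Dict[str, float],
                     malicious_rate: Dict[str, float],
                     evaluate: Callable[[FrozenSet[str]], float]) -> FrozenSet[str]:
    """Step C: score every bound combination on the validation set, keep the best."""
    scores: Dict[FrozenSet[str], float] = {}  # many bound pairs give identical sets
    best: FrozenSet[str] = frozenset()
    best_score = float("-inf")
    for lo, hi in bound_pairs():
        ks = key_set_for(lo, hi, candidates, normal_rate, malicious_rate)
        if ks not in scores:
            scores[ks] = evaluate(ks)  # hypothetical: train, then score on validation
        if scores[ks] > best_score:
            best, best_score = ks, scores[ks]
    return best
```

Caching matters in practice because adjacent bound pairs often select the same DLLs, so the number of distinct models trained is usually far below 10100.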
The key DLL set is then optimized by combining a greedy algorithm with the validation data: DLLs are added to the candidate key DLL set step by step, a model is built for each candidate, and the validation set feeds back the result, so that a relatively optimal key DLL set is found progressively;
The specific operations are as follows:
A. Let the relatively optimal key DLL set S contain n elements. Loop over the elements of set I that are not in S and add them to S one at a time; after each addition, build the model, record its result on the validation set, remove the element again and add the next one, until the loop ends;
B. Find the (n+1)-element set T with the best result on the validation set, replace S with it, and repeat A;
C. The process terminates when the result on the validation set no longer improves; the set S at that point is the final set;
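Operations A to C amount to greedy forward selection, which can be sketched as follows. `evaluate` is again a hypothetical callback returning the validation-set score of a model built on the given DLLs; breaking ties by alphabetical DLL name is an assumption added only to make the sketch deterministic.

```python
from typing import Callable, FrozenSet, Set


def greedy_refine(start: FrozenSet[str], candidates: Set[str],
                  evaluate: Callable[[FrozenSet[str]], float]) -> FrozenSet[str]:
    """Grow S one DLL at a time while the validation score keeps improving."""
    s = frozenset(start)
    best_score = evaluate(s)
    while True:
        remaining = sorted(candidates - s)  # elements of I not yet in S
        if not remaining:
            return s
        # A. try each remaining DLL in turn: add it, score it, take it out again
        trials = [(evaluate(s | {dll}), dll) for dll in remaining]
        # B. the best (n+1)-element set T (max keeps the first of equal scores)
        top_score, top_dll = max(trials, key=lambda t: t[0])
        # C. stop as soon as T no longer improves on S
        if top_score <= best_score:
            return s
        s, best_score = s | {top_dll}, top_score
```

Each round costs one model fit per remaining DLL, so the search is quadratic in the size of I in the worst case, which is the price of avoiding an exhaustive subset search.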
The specific steps are as follows:
Step 1: Initial state;
Step 2: Count the DLLs that appear in the normal software processes and the malicious software processes of the training set; their union is the set I;
Step 3: Enumerate all lower/upper-bound combinations with a set step size (e.g. 0.5%) over set ranges (e.g. the lower bound from 0 to 49.5%, 100 values; the upper bound from 50% to 100%, 101 values);
Step 4: Traverse all combinations from step 3; for each combination, enumerate all DLLs in I;
Step 5: If the DLL's occurrence probability in both the normal software processes and the malicious software processes of the training set lies between the combination's bounds, add the DLL to the set; if the enumeration is finished, go to step 6, otherwise return to step 5;
Step 6: Test the set corresponding to each bound combination on the validation set and record the result;
Step 7: If the enumeration is finished, find the optimal set S; otherwise return to step 6;
Step 8: With n elements in S, loop over the elements of I that are not in S and add them to S one at a time, obtaining the set X;
Step 9: Build the model on X, obtain the result on the validation set, then remove the added element to restore S;
Step 10: If the loop has ended, go to step 11; otherwise return to step 8;
Step 11: Find the (n+1)-element set T with the best result on the validation set and replace S with it;
Step 12: If the result on the validation set in step 11 no longer improves, stop, obtaining the relatively optimal DLL set; otherwise return to step 8;
Step 13: Computing the relatively optimal key DLL set by the greedy algorithm is complete.
Fig. 4 is a flowchart of building the identification model with hidden naive Bayes and performing detection. After the relatively optimal key DLL set has been obtained, the modeling algorithm must be chosen. DLLs are highly coupled and strongly interdependent, and hidden naive Bayes is suited to handling such high coupling: in hidden naive Bayes, each attribute has a hidden parent node that expresses the influence of all other attributes on that attribute;
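For reference, the usual formulation of hidden naive Bayes (Jiang, Zhang and Cai, 2009) is given below; the patent does not spell out its equations, so treating this as the variant used is an assumption. Here $a_i \in \{0,1\}$ is the $i$-th bit of the N-tuple, $c$ is the flag bit, $hp_i$ is the hidden parent of attribute $A_i$, and $I_P$ is the conditional mutual information estimated from the training set.

```latex
\begin{aligned}
c(E) &= \arg\max_{c \in \{0,1\}} \; P(c) \prod_{i=1}^{N} P(a_i \mid hp_i, c) \\
P(a_i \mid hp_i, c) &= \sum_{j=1,\, j \ne i}^{N} W_{ij}\, P(a_i \mid a_j, c) \\
W_{ij} &= \frac{I_P(A_i; A_j \mid C)}{\sum_{k=1,\, k \ne i}^{N} I_P(A_i; A_k \mid C)} \\
I_P(A_i; A_j \mid C) &= \sum_{a_i, a_j, c} P(a_i, a_j, c)\,
  \log \frac{P(a_i, a_j \mid c)}{P(a_i \mid c)\, P(a_j \mid c)}
\end{aligned}
```

The weights make strongly co-occurring DLLs dominate each hidden parent, which is how the model accounts for the coupling between DLLs.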
Once hidden naive Bayes has been chosen, the model is built on the training set, and the processes in the set to be identified are converted into tuples in the manner of Fig. 2; hidden naive Bayes then determines the flag bit of each process in the set to be identified by computing conditional probabilities. A flag bit of 1 denotes a malicious process and 0 a normal process.
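A minimal, unoptimized sketch of training hidden naive Bayes on flag-bit-labelled tuples and predicting flag bits is shown below. Assumptions not taken from the patent: Laplace smoothing of the probabilities, empirical (unsmoothed) conditional mutual information for the weights, uniform weights when all mutual information is zero, and ties resolved to 0 (normal); the class name `HiddenNaiveBayes` is hypothetical.

```python
import math
from typing import List, Sequence


class HiddenNaiveBayes:
    """Hidden naive Bayes for binary attributes (tuple bits) and a binary flag bit."""

    def fit(self, X: Sequence[Sequence[int]], y: Sequence[int]) -> "HiddenNaiveBayes":
        self.m, self.n = len(X), len(X[0])
        self.n_c = [sum(1 for c in y if c == k) for k in (0, 1)]
        # cnt[i][j][a][b][c] = number of rows with x_i = a, x_j = b and flag c;
        # the diagonal cnt[i][i][a][a][c] doubles as the single-attribute count.
        n = self.n
        self.cnt = [[[[[0, 0] for _ in range(2)] for _ in range(2)]
                     for _ in range(n)] for _ in range(n)]
        for row, c in zip(X, y):
            for i in range(n):
                for j in range(n):
                    self.cnt[i][j][row[i]][row[j]][c] += 1
        # W_ij: weight of attribute j in the hidden parent of attribute i.
        self.w = [[0.0] * n for _ in range(n)]
        for i in range(n):
            cmi = [self._cmi(i, j) if j != i else 0.0 for j in range(n)]
            total = sum(cmi)
            for j in range(n):
                if j != i:
                    self.w[i][j] = cmi[j] / total if total > 0 else 1.0 / (n - 1)
        return self

    def _cmi(self, i: int, j: int) -> float:
        """Empirical conditional mutual information I(A_i; A_j | C)."""
        total = 0.0
        for c in (0, 1):
            for a in (0, 1):
                for b in (0, 1):
                    n_abc = self.cnt[i][j][a][b][c]
                    if n_abc == 0:
                        continue  # 0 * log 0 is taken as 0
                    n_ac = self.cnt[i][i][a][a][c]
                    n_bc = self.cnt[j][j][b][b][c]
                    total += n_abc / self.m * math.log(n_abc * self.n_c[c] / (n_ac * n_bc))
        return max(total, 0.0)

    def predict(self, x: Sequence[int]) -> int:
        """Return the flag bit: 1 = malicious process, 0 = normal process."""
        scores: List[float] = []
        for c in (0, 1):
            s = math.log((self.n_c[c] + 1) / (self.m + 2))  # smoothed prior P(c)
            for i in range(self.n):
                if self.n == 1:  # no other attribute can act as a hidden parent
                    p = (self.cnt[i][i][x[i]][x[i]][c] + 1) / (self.n_c[c] + 2)
                else:  # P(a_i | hp_i, c) = sum_j W_ij * P(a_i | a_j, c)
                    p = sum(self.w[i][j] * (self.cnt[i][j][x[i]][x[j]][c] + 1)
                            / (self.cnt[j][j][x[j]][x[j]][c] + 2)
                            for j in range(self.n) if j != i)
                s += math.log(p)
            scores.append(s)
        return 1 if scores[1] > scores[0] else 0
```

Training tuples carry the flag bit as their last element; a caller would split it off as `y` before calling `fit`, and pass the plain N-tuples of the set to be identified to `predict`.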
The specific steps are as follows:
Step 1: Initial state;
Step 2: Convert the processes in the set to be identified and in the selected training set into tuples in the manner of Fig. 2;
Step 3: Build the identification model on the training set using hidden naive Bayes;
Step 4: Hidden naive Bayes determines the flag bit of each process in the set to be identified by computing conditional probabilities;
Step 5: If the flag bit is 1, go to step 6; otherwise go to step 7;
Step 6: Classify the process corresponding to the tuple as a malicious process;
Step 7: Classify the process corresponding to the tuple as a normal process;
Step 8: Building the identification model with hidden naive Bayes and performing detection is complete.
Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit it. A person of ordinary skill in the art to which the invention belongs may make various modifications and variations without departing from the spirit and scope of the present invention. Therefore, the scope of protection of the present invention shall be as defined by the claims.
Claims (4)
1. A method for automatically detecting malicious processes in a forensic scenario, characterized by comprising the following steps:
1) Establishing a mapping from DLL data to N-tuples;
Definition 1: An N-tuple is a sequence of length N consisting of 0s and 1s, where N is a non-negative integer;
Definition 2: The flag bit is a special one-bit unit appended to the end of an N-tuple; it indicates whether the process represented by the tuple is a malicious process and is used in the identification of malware;
So that the identification algorithm can analyze and process the DLL data quickly, the DLL data of each process must be mapped to a data structure, namely the N-tuple of Definition 1. The DLL set serves as the mapping standard: for a DLL set containing N DLLs, the corresponding data structure is an (N+1)-tuple that includes the flag bit of Definition 2; the data structure of a process in the set to be identified lacks this bit and is an N-tuple;
The mapping is as follows:
A. Set every bit of the tuple to 0;
B. Traverse the DLL set used as the standard; for each DLL, search the process's DLL data, and if the DLL is present, set the bit corresponding to that DLL in the process's record to 1;
C. If the process belongs to the training set or the validation set, set the flag bit to 1 if the process is known to be malicious (otherwise it stays 0); if the process belongs to the set to be identified, i.e. the test set, remove the flag bit;
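Operations A to C can be sketched as follows, assuming the DLL set is kept as an ordered list so that each DLL has a fixed bit position, and that DLL names are compared case-insensitively; `to_tuple` is a hypothetical name.

```python
from typing import List, Optional, Sequence, Set


def to_tuple(process_dlls: Set[str], dll_set: Sequence[str],
             is_malicious: Optional[bool] = None) -> List[int]:
    """Map one process's DLL data onto the tuple defined by the (ordered) DLL set.

    Training/validation processes pass is_malicious and get an (N+1)-tuple ending
    in the flag bit; processes to be identified pass None and get an N-tuple.
    """
    loaded = {name.lower() for name in process_dlls}
    bits = [0] * len(dll_set)                  # A. every bit starts at 0
    for k, dll in enumerate(dll_set):          # B. one bit per DLL in the standard set
        if dll.lower() in loaded:
            bits[k] = 1
    if is_malicious is not None:               # C. append the flag bit only when known
        bits.append(1 if is_malicious else 0)
    return bits
```

For example, with the standard set `["kernel32.dll", "user32.dll", "ws2_32.dll"]`, a known-malicious process that loaded `KERNEL32.dll` and `ws2_32.dll` maps to `[1, 0, 1, 1]`, while the same process in the set to be identified maps to `[1, 0, 1]`.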
2) Computing the relatively optimal key DLL set by a greedy algorithm;
After the mapping from DLL data to N-tuples has been established, DLLs must be selected to form the DLL set; the selected DLLs are called key DLLs. The choice of this set affects the detection model that is built and, in turn, the identification accuracy;
Definition 3: The theoretically optimal DLL set is a set such that no other set, modeled with the same algorithm on the same training set, performs better on the same test set; the theoretically optimal DLL set depends on the choice of training set, test set and algorithm;
Definition 4: A relatively optimal DLL set is not equal to the theoretically optimal DLL set, but it performs relatively well for similar training and test sets; in addition, it can be obtained in a finite number of steps with controllable computational complexity;
When selecting key DLLs: because key DLLs describe attributes shared by the processes within each class, DLLs specific to an individual process cannot be chosen as key DLLs. In the absence of prior knowledge, the Windows system DLLs are the first to be examined; they are not used directly as key DLLs, because otherwise a "dimension explosion" occurs when the model is built. Since each DLL appears in the samples with a different probability, the initial key DLL set is obtained statistically: count the DLLs that appear in the normal software processes and the malicious software processes of the training set; their union is the set I;
More training data generally improves identification accuracy. If the DLL data of the malicious processes to be identified comes from a different source than the DLL data used for training, the method first enumerates combinations of lower and upper bounds on each DLL's occurrence ratio, tests each combination on a validation set, and uses this feedback to obtain a best DLL set as the starting point; the validation set must come from the same source as the data to be identified. In this way an excessively large number of possible combinations is mapped into a limited, computable range;
The specific operations are as follows:
A. Enumerate with a set step size of 0.5% over set ranges, the lower bound running from 0 to 49.5% (100 values) and the upper bound from 50% to 100% (101 values), giving 10100 lower/upper-bound combinations in all;
B. Each combination corresponds to one key DLL set, computed as follows: if a DLL's occurrence probability in both the normal software processes and the malicious software processes of the training set lies between the combination's lower and upper bounds, the DLL is added to that key DLL set;
C. Test the key DLL sets corresponding to these combinations one by one on the validation set, and find the relatively optimal key DLL set S.
The key DLL set is then optimized by combining a greedy algorithm with the validation data: DLLs are added to the candidate key DLL set step by step, a model is built for each candidate, and the validation set feeds back the result, so that a relatively optimal key DLL set is found progressively;
The specific operations are as follows:
A. Let the relatively optimal key DLL set S contain n elements. Loop over the elements of set I that are not in S and add them to S one at a time; after each addition, build the model, record its result on the validation set, remove the element again and add the next one, until the loop ends;
B. Find the (n+1)-element set T with the best result on the validation set, replace S with it, and repeat A;
C. The process terminates when the result on the validation set no longer improves; the set S at that point is the final set;
3) Building the identification model with hidden naive Bayes and performing detection:
After the relatively optimal key DLL set has been obtained, the modeling algorithm must be chosen. DLLs are highly coupled and strongly interdependent, and hidden naive Bayes is suited to handling such high coupling: in hidden naive Bayes, each attribute has a hidden parent node that expresses the influence of all other attributes on that attribute;
Once hidden naive Bayes has been chosen, the model is built on the training set, and the processes in the set to be identified are converted into tuples in the manner of 1); hidden naive Bayes then determines the flag bit of each process in the set to be identified by computing conditional probabilities. A flag bit of 1 denotes a malicious process and 0 a normal process.
2. The method for automatically detecting malicious processes in a forensic scenario according to claim 1, wherein step 1) specifically comprises:
Step 1)-1: Initial state;
Step 1)-2: Set every bit of the tuple to 0;
Step 1)-3: Traverse the DLL set; for each DLL, search the DLL data of the target process, and if the DLL is present, set the bit corresponding to that DLL in the target process's record to 1;
Step 1)-4: Traverse the processes and determine whether the current process belongs to the training set or the validation set; if so, go to 1)-5; otherwise it belongs to the set to be identified, so go to 1)-6;
Step 1)-5: If the process is a malicious process, set the flag bit to 1; if the traversal is finished, go to 1)-7, otherwise continue with 1)-4;
Step 1)-6: Remove the flag bit; if the traversal is finished, go to 1)-7, otherwise continue with 1)-4;
Step 1)-7: Establishing the mapping from DLL data to N-tuples is complete.
3. The method for automatically detecting malicious processes in a forensic scenario according to claim 1, wherein step 2) specifically comprises:
Step 2)-1: Initial state;
Step 2)-2: Count the DLLs that appear in the normal software processes and the malicious software processes of the training set; their union is the set I;
Step 2)-3: Enumerate all lower/upper-bound combinations with a set step size of 0.5% over set ranges: the lower bound runs from 0 to 49.5% (100 values) and the upper bound from 50% to 100% (101 values);
Step 2)-4: Traverse all combinations from 2)-3; for each combination, enumerate all DLLs in I;
Step 2)-5: If the DLL's occurrence probability in both the normal software processes and the malicious software processes of the training set lies between the combination's bounds, add the DLL to the set; if the enumeration is finished, go to 2)-6, otherwise return to 2)-5;
Step 2)-6: Test the set corresponding to each bound combination on the validation set and record the result;
Step 2)-7: If the enumeration is finished, find the optimal set S; otherwise return to 2)-6;
Step 2)-8: With n elements in S, loop over the elements of I that are not in S and add them to S one at a time, obtaining the set X;
Step 2)-9: Build the model on X, obtain the result on the validation set, then remove the added element to restore S;
Step 2)-10: If the loop has ended, go to 2)-11; otherwise return to 2)-8;
Step 2)-11: Find the (n+1)-element set T with the best result on the validation set and replace S with it;
Step 2)-12: If the result on the validation set in 2)-11 no longer improves, stop, obtaining the relatively optimal DLL set; otherwise return to 2)-8;
Step 2)-13: Computing the relatively optimal key DLL set by the greedy algorithm is complete.
4. The method for automatically detecting malicious processes in a forensic scenario according to claim 1, wherein step 3) specifically comprises:
Step 3)-1: Initial state;
Step 3)-2: Convert the processes in the set to be identified and in the selected training set into tuples in the manner of 1);
Step 3)-3: Build the identification model on the training set using hidden naive Bayes;
Step 3)-4: Hidden naive Bayes determines the flag bit of each process in the set to be identified by computing conditional probabilities;
Step 3)-5: If the flag bit is 1, go to 3)-6; otherwise go to 3)-7;
Step 3)-6: Classify the process corresponding to the tuple as a malicious process;
Step 3)-7: Classify the process corresponding to the tuple as a normal process;
Step 3)-8: Building the identification model with hidden naive Bayes and performing detection is complete.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410705875.1A CN104376261B (en) | 2014-11-27 | 2014-11-27 | A kind of method of the automatic detection malicious process under evidence obtaining scene |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410705875.1A CN104376261B (en) | 2014-11-27 | 2014-11-27 | A kind of method of the automatic detection malicious process under evidence obtaining scene |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104376261A | 2015-02-25 |
CN104376261B | 2017-04-05 |
Family
ID=52555163
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410705875.1A Expired - Fee Related CN104376261B (en) | 2014-11-27 | 2014-11-27 | A kind of method of the automatic detection malicious process under evidence obtaining scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104376261B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109783753A (en) * | 2018-12-14 | 2019-05-21 | 平安普惠企业管理有限公司 | The tree-shaped drawing generating method of web site url, device, equipment and storage medium |
CN109714329A (en) * | 2018-12-24 | 2019-05-03 | 成都蜀道易信科技有限公司 | Low rate DDoS detection method based on Bayesian network under a kind of cloud environment |
CN109918907B (en) * | 2019-01-30 | 2021-05-25 | 国家计算机网络与信息安全管理中心 | Method, controller and medium for obtaining evidence of malicious codes in process memory of Linux platform |
CN112906786A (en) * | 2021-02-07 | 2021-06-04 | 滁州职业技术学院 | Data classification improvement method based on naive Bayes model |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102622536A (en) * | 2011-01-26 | 2012-08-01 | 中国科学院软件研究所 | Method for catching malicious codes |
CN103886252A (en) * | 2013-04-26 | 2014-06-25 | 卡巴斯基实验室封闭式股份公司 | Software Code Malicious Selection Evaluation Executed In Trusted Process Address Space |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2011053893A (en) * | 2009-09-01 | 2011-03-17 | Hitachi Ltd | Illicit process detection method and illicit process detection system |
2014-11-27: Application CN201410705875.1A (CN104376261B) filed in CN; status: Expired - Fee Related
Non-Patent Citations (1)
Title |
---|
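Research on intrusion event reconstruction technology in computer intrusion forensics (计算机入侵取证中的入侵事件重构技术研究); Ji Yuchen et al.; Computer Engineering (《计算机工程》); 2014-01-31; Vol. 40, No. 1 * |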
Also Published As
Publication number | Publication date |
---|---|
CN104376261A (en) | 2015-02-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108777873A (en) | The wireless sensor network abnormal deviation data examination method of forest is isolated based on weighted blend | |
CN103116713B (en) | Based on compound and the prediction of protein-protein interaction method of random forest | |
CN101976313B (en) | Frequent subgraph mining based abnormal intrusion detection method | |
CN104376261B (en) | A kind of method of the automatic detection malicious process under evidence obtaining scene | |
CN111506599B (en) | Industrial control equipment identification method and system based on rule matching and deep learning | |
CN108268581A (en) | The construction method and device of knowledge mapping | |
US9720986B2 (en) | Method and system for integrating data into a database | |
CN103838754B (en) | Information retrieval device and method | |
CN102176223B (en) | Protein complex identification method based on key protein and local adaptation | |
CN104700033A (en) | Virus detection method and virus detection device | |
CN109241740A (en) | Malware benchmark test set creation method and device | |
CN106156082A (en) | A kind of body alignment schemes and device | |
CN104392171B (en) | A kind of automatic internal memory evidence analysis method based on data association | |
CN103970733A (en) | New Chinese word recognition method based on graph structure | |
CN104331664B (en) | A kind of method that unknown rogue program feature is automatically analyzed under evidence obtaining scene | |
CN112104518B (en) | Bit data feature mining method, system, equipment and readable medium | |
CN103324888A (en) | Method and system for automatically extracting virus characteristics based on family samples | |
CN105119910A (en) | Template-based online social network rubbish information real-time detecting method | |
CN108009298B (en) | Internet character search information integration analysis control method | |
CN111767546B (en) | Deep learning-based input structure inference method and device | |
CN112257332B (en) | Simulation model evaluation method and device | |
CN115602244B (en) | Genome variation detection method based on sequence alignment skeleton | |
CN107133281B (en) | Global multi-query optimization method based on grouping | |
CN102135940A (en) | Finite automata-based automatic behavior modeling method | |
CN110544510B (en) | Contig integration method based on adjacent algebraic model and quality grade evaluation |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication ||
PB01 | Publication ||
C10 | Entry into substantive examination ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||
GR01 | Patent grant ||
CF01 | Termination of patent right due to non-payment of annual fee ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20170405 |