
CN104376261B - Method for automatically detecting malicious processes in a forensic scenario - Google Patents

Method for automatically detecting malicious processes in a forensic scenario

Info

Publication number
CN104376261B
Authority
CN
China
Prior art keywords
collection
dynamic link
link library
tuple
crucial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410705875.1A
Other languages
Chinese (zh)
Other versions
CN104376261A (en
Inventor
伏晓
端恒
端一恒
骆斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN201410705875.1A
Publication of CN104376261A
Application granted
Publication of CN104376261B
Expired - Fee Related (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Virology (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention provides a method for automatically detecting malicious processes from the dynamic link library (DLL) data of processes in a forensic scenario. The method comprises the following steps: 1) establish the mapping from DLL data to an N-tuple; 2) compute a relatively optimal key DLL set with a greedy algorithm; 3) build a recognition model with the hidden naive Bayes method and perform detection. Compared with existing malware detection methods, the invention automatically identifies malware processes among numerous unknown processes without a malware signature database and without relevant experience, and it can use a validation set for correction when the training set and the set to be identified come from different sources. In addition, the invention can process raw DLL data from different sources. The invention is particularly suited to scenarios without prior knowledge and to large-scale automated detection of malicious processes.

Description

Method for automatically detecting malicious processes in a forensic scenario
Technical field
The present invention relates to the fields of malicious process identification and computer forensics, and in particular to a method for automatically detecting malicious processes from the dynamic link library (DLL) data of processes in a forensic scenario.
Background
With the rapid development of the national economy and society, the level of informatization in every sector in China keeps rising. Against this background, the number of malicious programs is growing and they appear ever more frequently, so efficient automated detection of these programs is particularly important. At present the field still relies heavily on signatures of malicious processes and on human experience; methods that identify malware automatically without relying on a signature database have received little attention.
Summary of the invention
The object of the present invention is to provide a method for automatically detecting malicious processes from the DLL data of processes in a forensic scenario. The method automatically identifies malware processes among numerous unknown processes without a malware signature database and without relevant experience, and it can process raw DLL data from different sources. The invention is particularly suited to scenarios without prior knowledge and to large-scale automated detection of malicious processes.
To achieve the above object, the present invention proposes a method for automatically detecting malicious processes from the DLL data of processes in a forensic scenario, comprising the following steps:
1) Establish the mapping from DLL data to an N-tuple;
Definition 1: An N-tuple is a sequence of length N consisting of 0s and 1s, where N is a non-negative integer;
Definition 2: The flag bit is a special bit appended to the end of the N-tuple. It indicates whether the process represented by the tuple is a malicious process and is used in the malware identification process;
So that the recognition algorithm can analyze and process DLL data quickly, the DLL data of each process must be mapped to a data structure, namely the N-tuple of Definition 1. The DLL set is the reference for the mapping: for a DLL set containing N DLLs, the corresponding data structure is an (N+1)-tuple that includes the flag bit of Definition 2. The data structure for a process in the set to be identified has no flag bit and is therefore an N-tuple;
The mapping is as follows:
a. Set every bit of the tuple to 0;
b. Traverse the DLL set used as the reference; for each DLL, search for it in the DLL data of the process, and if it is present, set the corresponding bit in the record of the process to 1;
c. If the process belongs to the training set or the validation set, whether it is malicious is known; if it is malicious, set the flag bit to 1. If the process belongs to the set to be identified (the test set), remove the flag bit;
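The mapping in steps a to c can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function name `map_process_to_tuple`, the `is_malicious` parameter, the example DLL names, and the case-insensitive comparison of DLL names are all assumptions made for the example.

```python
from typing import Iterable, Optional, Sequence, Tuple

def map_process_to_tuple(
    loaded_dlls: Iterable[str],
    key_dlls: Sequence[str],
    is_malicious: Optional[bool] = None,
) -> Tuple[int, ...]:
    """Map one process's DLL list to an N-tuple, or to an (N+1)-tuple with a flag bit."""
    # Assumption: DLL names are compared case-insensitively.
    loaded = {name.lower() for name in loaded_dlls}
    bits = [0] * len(key_dlls)              # step a: every bit starts at 0
    for i, dll in enumerate(key_dlls):      # step b: traverse the reference DLL set
        if dll.lower() in loaded:
            bits[i] = 1
    if is_malicious is not None:            # step c: labeled processes carry a flag bit
        bits.append(1 if is_malicious else 0)
    return tuple(bits)                      # unlabeled processes stay N-tuples

# Hypothetical key DLL set and processes.
key = ["ws2_32.dll", "wininet.dll", "advapi32.dll"]
print(map_process_to_tuple(["KERNEL32.dll", "WS2_32.dll"], key, is_malicious=True))  # (1, 0, 0, 1)
print(map_process_to_tuple(["advapi32.dll"], key))                                   # (0, 0, 1)
```

Passing `is_malicious=None` corresponds to a process in the set to be identified, whose tuple has no flag bit.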
2) Compute a relatively optimal key DLL set with a greedy algorithm;
Once the mapping from DLL data to N-tuples has been established, DLLs must be selected to form the DLL set; the selected DLLs are called key DLLs. The choice of this set affects the detection model that is built and therefore the recognition accuracy;
Definition 3: A theoretically optimal DLL set is a set such that no other set, modeled with the same algorithm on the same training set, performs better on the same test set. The theoretically optimal DLL set depends on the choice of training set, test set, and algorithm;
Definition 4: A relatively optimal DLL set usually differs from the theoretically optimal DLL set, but it performs comparatively well on similar training and test sets. In addition, it can be obtained in a bounded number of steps with controllable computational complexity;
Key DLLs are common attributes that describe the processes in each class, so DLLs specific to an individual process cannot be chosen as key DLLs. Without prior knowledge, the first candidates to examine are the Windows system DLLs, but they cannot all be used directly as key DLLs, otherwise a "dimension explosion" occurs when the model is built. Because DLLs occur in the samples with unequal probability, the key DLL set is initialized by counting over the samples: the DLLs that occur in the regular software processes and the malware processes of the training set are counted, and their union is the set I;
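The counting step that initializes the set I might look like the sketch below. The helper names `occurrence_ratios` and `initial_set` are hypothetical, and the occurrence probability is assumed here to be the fraction of processes in a class that load a given DLL; the patent does not define it more precisely.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

def occurrence_ratios(processes: List[Iterable[str]]) -> Dict[str, float]:
    """Fraction of processes in which each DLL occurs (assumed definition)."""
    counts: Counter = Counter()
    for dlls in processes:
        counts.update(set(dlls))            # count each DLL at most once per process
    total = len(processes)
    return {dll: n / total for dll, n in counts.items()} if total else {}

def initial_set(
    benign: List[Iterable[str]], malicious: List[Iterable[str]]
) -> Tuple[Set[str], Dict[str, float], Dict[str, float]]:
    """Return the set I plus per-class occurrence ratios."""
    freq_benign = occurrence_ratios(benign)
    freq_malicious = occurrence_ratios(malicious)
    return set(freq_benign) | set(freq_malicious), freq_benign, freq_malicious

# Hypothetical training data: one DLL list per process.
benign = [["a.dll", "b.dll"], ["a.dll"]]
malicious = [["a.dll", "c.dll"]]
I, fb, fm = initial_set(benign, malicious)
print(sorted(I))      # ['a.dll', 'b.dll', 'c.dll']
print(fb["b.dll"])    # 0.5
```

Keeping the two ratio dictionaries separate makes it possible to apply the bounds to each class independently in the next step.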
Improving recognition accuracy generally requires more training data. If the DLL data of the processes to be classified comes from a different source than the DLL data used for training, combinations of lower and upper bounds on the DLL occurrence ratio are first enumerated, each combination is tested on the validation set, and the validation feedback yields an optimal DLL set as a starting point. The validation set must come from the same source as the data to be identified. In this way an excessively large number of possible combinations is mapped into a finite, computable range;
The specific operations are as follows:
a. Enumerate with a set step size (0.5%) and a set scheme (lower bounds from 0 to 49.5%, 100 values in total; upper bounds from 50% to 100%, 101 values in total), giving 10100 bound combinations in total;
b. Each combination corresponds to one key DLL set, computed as follows: if the occurrence probability of a DLL in both the regular software processes and the malware processes of the training set lies between the lower and upper bound of the combination, the DLL is put into that key DLL set;
c. Test the key DLL sets corresponding to these combinations one by one on the validation set and find the relatively optimal key DLL set S.
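Operations a to c can be sketched as follows. The code makes several assumptions that the patent leaves open: the bounds are treated as inclusive, a DLL absent from a class has ratio 0 in that class, and `evaluate` is a hypothetical callback that builds a model from a candidate set and returns its score on the validation set. Skipping candidate sets that have already been scored is an optimization added for the example.

```python
from typing import Callable, Dict, FrozenSet, Iterator, Tuple

Freq = Dict[str, float]

def bound_combinations() -> Iterator[Tuple[float, float]]:
    """Yield (lower, upper) pairs in 0.5% steps; integers avoid float drift."""
    for lo in range(0, 100):          # 0%, 0.5%, ..., 49.5%: 100 lower bounds
        for hi in range(100, 201):    # 50%, 50.5%, ..., 100%: 101 upper bounds
            yield lo / 200, hi / 200

def candidate_set(freq_benign: Freq, freq_malicious: Freq,
                  lower: float, upper: float) -> FrozenSet[str]:
    """DLLs whose ratio lies within [lower, upper] in both classes (assumed inclusive)."""
    dlls = set(freq_benign) | set(freq_malicious)
    return frozenset(
        d for d in dlls
        if lower <= freq_benign.get(d, 0.0) <= upper
        and lower <= freq_malicious.get(d, 0.0) <= upper
    )

def best_candidate(freq_benign: Freq, freq_malicious: Freq,
                   evaluate: Callable[[FrozenSet[str]], float]
                   ) -> Tuple[FrozenSet[str], float]:
    """Score every distinct candidate set and keep the best one as S."""
    best_set: FrozenSet[str] = frozenset()
    best_score = float("-inf")
    seen = set()
    for lower, upper in bound_combinations():
        s = candidate_set(freq_benign, freq_malicious, lower, upper)
        if s in seen:                 # many bound pairs produce the same set
            continue
        seen.add(s)
        score = evaluate(s)
        if score > best_score:
            best_set, best_score = s, score
    return best_set, best_score

print(sum(1 for _ in bound_combinations()))  # 10100
```

In practice many of the 10100 combinations collapse to the same candidate set, so the number of models actually built is usually much smaller.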
The key DLL set is then optimized by combining a greedy algorithm with the validation data: DLLs are added step by step to the candidate key DLL set, a model is built, and the validation set provides feedback, so that a relatively optimal key DLL set is found step by step;
The specific operations are as follows:
a. Suppose the relatively optimal key DLL set S has n elements. Loop over the elements of I that are not in S: add one element to S, build the model and obtain the result on the validation set, remove the element, then add the next one, and repeat until the loop ends;
b. Collect the set T of n+1 elements with the best result on the validation set, replace S with it, and repeat a;
c. The process ends when the result on the validation set no longer improves; the final set S is then obtained;
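The greedy refinement in a to c is a form of forward selection and can be sketched as below. The name `greedy_select` and the `evaluate` callback (build a model on the candidate set, return the validation score) are hypothetical. One simplification is assumed: S is kept unchanged as soon as no single addition improves the score, rather than first replacing S with T and then comparing.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

def greedy_select(
    initial: Iterable[str],
    pool: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Grow S one DLL at a time while the validation score keeps improving."""
    current = frozenset(initial)
    current_score = evaluate(current)
    candidates = set(pool)
    while True:
        best_set, best_score = None, current_score
        for dll in sorted(candidates - current):   # step a: try each missing element
            trial = current | {dll}                # X = S plus one element
            score = evaluate(trial)                # model on X, score on validation set
            if score > best_score:
                best_set, best_score = trial, score
        if best_set is None:                       # step c: no further improvement
            return current, current_score
        current, current_score = best_set, best_score  # step b: replace S with T

# Toy scoring function: reward DLLs in a hidden target set, penalize the rest.
target = {"a.dll", "b.dll"}
score = lambda s: float(len(s & target) - len(s - target))
print(greedy_select([], ["a.dll", "b.dll", "c.dll"], score))
```

Each round costs one model build per element of I not yet in S, so the total cost is bounded by roughly |I| squared model builds, which is the controllable complexity that Definition 4 refers to.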
3) Build the recognition model with the hidden naive Bayes method and perform detection:
After the relatively optimal key DLL set is obtained, the modeling algorithm must be chosen. DLLs are highly coupled and strongly interdependent, and the hidden naive Bayes method is suited to handling such coupling. In hidden naive Bayes, each attribute has a hidden parent node that expresses the influence of the other attributes on that attribute;
Once the hidden naive Bayes method has been chosen, the model is built on the training set, and the processes in the set to be identified are converted to tuples as in 1). Hidden naive Bayes then determines the flag bit of each process in the set to be identified by computing conditional probabilities. A flag bit of 1 denotes a malicious process and 0 a normal process.
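The patent does not spell out the formulas behind hidden naive Bayes. For orientation, the classifier as it is commonly presented in the machine-learning literature is reproduced below; the notation is an assumption made here, with $a_i$ the i-th bit of the N-tuple, $c \in \{0, 1\}$ the flag bit, and $a_{hp_i}$ the hidden parent of attribute $A_i$.

```latex
% Classification rule: pick the flag value with the highest posterior score.
c(E) = \arg\max_{c \in \{0,1\}} P(c) \prod_{i=1}^{N} P(a_i \mid a_{hp_i}, c)

% The hidden parent aggregates the influence of all other attributes.
P(a_i \mid a_{hp_i}, c) = \sum_{j=1,\, j \neq i}^{N} W_{ij} \, P(a_i \mid a_j, c)

% Weights are normalized conditional mutual information between attributes.
W_{ij} = \frac{I_P(A_i; A_j \mid C)}{\sum_{k=1,\, k \neq i}^{N} I_P(A_i; A_k \mid C)}
```

Because each weight $W_{ij}$ grows with the conditional dependence between two DLL bits, strongly coupled DLLs influence each other's conditional probabilities, which matches the motivation given above for choosing this model.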
Further, the specific steps of step 1) are as follows:
Step 1)-1: Initial state;
Step 1)-2: Set every bit of the tuple to 0;
Step 1)-3: Traverse the DLL set; for each DLL, search for it in the DLL data of the target process, and if it is present, set the corresponding bit in the record of the target process to 1;
Step 1)-4: Traverse each process and determine whether it belongs to the training set or the validation set; if so, go to 1)-5, otherwise it belongs to the set to be identified, go to 1)-6;
Step 1)-5: If the process is a malicious process, set the flag bit to 1; if the traversal is finished go to 1)-7, otherwise continue with 1)-4;
Step 1)-6: Remove the flag bit; if the traversal is finished go to 1)-7, otherwise continue with 1)-4;
Step 1)-7: The mapping from DLL data to N-tuples is complete.
Further, the specific steps of step 2) are as follows:
Step 2)-1: Initial state;
Step 2)-2: Count the DLLs that occur in the regular software processes and the malware processes of the training set; their union is the set I;
Step 2)-3: Enumerate all bound combinations with a set step size (for example 0.5%) and a set scheme (for example lower bounds from 0 to 49.5%, 100 values in total, and upper bounds from 50% to 100%, 101 values in total);
Step 2)-4: Traverse all combinations from 2)-3; for each combination, enumerate all DLLs in I;
Step 2)-5: If the occurrence probability of the DLL in both the regular software processes and the malware processes of the training set lies between the bounds of the combination, put the DLL into the set for that combination; if the enumeration is finished go to 2)-6, otherwise return to 2)-5;
Step 2)-6: Test the set corresponding to each bound combination on the validation set and record the result;
Step 2)-7: If the enumeration is finished, find the optimal set S; otherwise return to 2)-6;
Step 2)-8: Suppose S has n elements; loop over the elements of I that are not in S, adding one to S each time to obtain the set X;
Step 2)-9: Build the model on X, obtain the result on the validation set, remove the added element, and restore S;
Step 2)-10: If the loop has ended go to 2)-11, otherwise return to 2)-8;
Step 2)-11: Collect the set T of n+1 elements with the best result on the validation set and replace S with it;
Step 2)-12: If the result on the validation set in 2)-11 no longer improves, stop and obtain the relatively optimal DLL set; otherwise return to 2)-8;
Step 2)-13: The computation of the relatively optimal key DLL set by the greedy algorithm is complete.
Further, the specific steps of step 3) are as follows:
Step 3)-1: Initial state;
Step 3)-2: Convert the processes in the set to be identified and in the selected training set into tuples as in 1);
Step 3)-3: Build the recognition model on the training set with the hidden naive Bayes method;
Step 3)-4: Hidden naive Bayes determines the flag bit of each process in the set to be identified by computing conditional probabilities;
Step 3)-5: If the flag bit is 1 go to 3)-6, otherwise go to 3)-7;
Step 3)-6: Classify the process corresponding to the tuple as a malicious process;
Step 3)-7: Classify the process corresponding to the tuple as a normal process;
Step 3)-8: Building the recognition model with the hidden naive Bayes method and performing detection is complete.
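The train-then-flag flow of steps 3)-2 to 3)-7 can be illustrated with a much simpler stand-in model. The sketch below uses plain Bernoulli naive Bayes with Laplace smoothing, which is not the hidden naive Bayes method the patent specifies and ignores dependencies between DLLs; it is only meant to show how (N+1)-tuples from the training set yield a flag bit for an N-tuple. The function names and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Model = Dict[int, Tuple[float, List[float]]]

def train_bernoulli_nb(rows: List[Tuple[int, ...]]) -> Model:
    """rows are (N+1)-tuples: N feature bits followed by the flag bit."""
    n = len(rows[0]) - 1
    model: Model = {}
    for c in (0, 1):
        members = [r for r in rows if r[-1] == c]
        prior = (len(members) + 1) / (len(rows) + 2)          # Laplace-smoothed P(c)
        p_one = [(sum(r[i] for r in members) + 1) / (len(members) + 2)
                 for i in range(n)]                           # smoothed P(bit_i = 1 | c)
        model[c] = (prior, p_one)
    return model

def predict_flag(model: Model, bits: Sequence[int]) -> int:
    """Return 1 (malicious) or 0 (normal) for an unlabeled N-tuple."""
    scores = {}
    for c, (prior, p_one) in model.items():
        s = math.log(prior)
        for b, p in zip(bits, p_one):
            s += math.log(p if b else 1.0 - p)
        scores[c] = s
    return 1 if scores[1] > scores[0] else 0

# Toy training tuples: three DLL bits followed by the flag bit.
train = [(1, 1, 0, 1), (1, 0, 0, 1), (0, 0, 1, 0), (0, 1, 1, 0)]
model = train_bernoulli_nb(train)
print(predict_flag(model, (1, 0, 0)))  # 1
print(predict_flag(model, (0, 1, 1)))  # 0
```

Replacing the per-bit probabilities with the weighted sums over the other attributes used by hidden naive Bayes would turn this stand-in into the model the patent names.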
The beneficial effects of the present invention are as follows. It provides a method for automatically detecting malicious processes from the DLL data of processes in a forensic scenario. Compared with existing malware detection methods, it automatically identifies malware processes among numerous unknown processes without a malware signature database and without relevant experience, and it can use the validation set for correction when the training set and the set to be identified come from different sources. In addition, the invention can process raw DLL data from different sources. The invention is particularly suited to scenarios without prior knowledge and to large-scale automated detection of malicious processes. Practice shows that in common application scenarios the method reaches an accuracy above 90 percent with a time cost of only a few seconds.
Brief description of the drawings
Fig. 1 is a flow chart of the method for automatically detecting malicious processes from the DLL data of processes in a forensic scenario according to an embodiment of the present invention.
Fig. 2 is a flow chart of establishing the mapping from DLL data to an N-tuple in Fig. 1.
Fig. 3 is a flow chart of computing the relatively optimal key DLL set with the greedy algorithm in Fig. 1.
Fig. 4 is a flow chart of building the recognition model with the hidden naive Bayes method and performing detection in Fig. 1.
Detailed description of the embodiments
For a better understanding of the technical content of the present invention, specific embodiments are described below with reference to the accompanying drawings.
Fig. 1 is a flow chart of the method for automatically detecting malicious processes from the DLL data of processes in a forensic scenario according to an embodiment of the present invention.
A method for automatically detecting malicious processes from the DLL data of processes in a forensic scenario, characterized by comprising the following steps:
S101: Establish the mapping from DLL data to an N-tuple.
Definition 1: An N-tuple is a sequence of length N consisting of 0s and 1s, where N is a non-negative integer;
Definition 2: The flag bit is a special bit appended to the end of the N-tuple. It indicates whether the process represented by the tuple is a malicious process and is used in the malware identification process;
So that the recognition algorithm can analyze and process DLL data quickly, the DLL data of each process must be mapped to a data structure, namely the N-tuple of Definition 1. The DLL set is the reference for the mapping: for a DLL set containing N DLLs, the corresponding data structure is an (N+1)-tuple that includes the flag bit of Definition 2. The data structure for a process in the set to be identified has no flag bit and is therefore an N-tuple;
The mapping is as follows:
a. Set every bit of the tuple to 0;
b. Traverse the DLL set used as the reference; for each DLL, search for it in the DLL data of the process, and if it is present, set the corresponding bit in the record of the process to 1;
c. If the process belongs to the training set or the validation set, whether it is malicious is known; if it is malicious, set the flag bit to 1. If the process belongs to the set to be identified (the test set), remove the flag bit;
S103: Compute a relatively optimal key DLL set with a greedy algorithm.
Once the mapping from DLL data to N-tuples has been established, DLLs must be selected to form the DLL set; the selected DLLs are called key DLLs. The choice of this set affects the detection model that is built and therefore the recognition accuracy;
Definition 3: A theoretically optimal DLL set is a set such that no other set, modeled with the same algorithm on the same training set, performs better on the same test set. The theoretically optimal DLL set depends on the choice of training set, test set, and algorithm;
Definition 4: A relatively optimal DLL set usually differs from the theoretically optimal DLL set, but it performs comparatively well on similar training and test sets. In addition, it can be obtained in a bounded number of steps with controllable computational complexity;
Key DLLs are common attributes that describe the processes in each class, so DLLs specific to an individual process cannot be chosen as key DLLs. Without prior knowledge, the first candidates to examine are the Windows system DLLs, but they cannot all be used directly as key DLLs, otherwise a "dimension explosion" occurs when the model is built. Because DLLs occur in the samples with unequal probability, the key DLL set is initialized by counting over the samples: the DLLs that occur in the regular software processes and the malware processes of the training set are counted, and their union is the set I;
Improving recognition accuracy generally requires more training data. If the DLL data of the processes to be classified comes from a different source than the DLL data used for training, combinations of lower and upper bounds on the DLL occurrence ratio are first enumerated, each combination is tested on the validation set, and the validation feedback yields an optimal DLL set as a starting point. The validation set must come from the same source as the data to be identified. In this way an excessively large number of possible combinations is mapped into a finite, computable range;
The specific operations are as follows:
a. Enumerate with a set step size (0.5%) and a set scheme (lower bounds from 0 to 49.5%, 100 values in total; upper bounds from 50% to 100%, 101 values in total), giving 10100 bound combinations in total;
b. Each combination corresponds to one key DLL set, computed as follows: if the occurrence probability of a DLL in both the regular software processes and the malware processes of the training set lies between the lower and upper bound of the combination, the DLL is put into that key DLL set;
c. Test the key DLL sets corresponding to these combinations one by one on the validation set and find the relatively optimal key DLL set S.
The key DLL set is then optimized by combining a greedy algorithm with the validation data: DLLs are added step by step to the candidate key DLL set, a model is built, and the validation set provides feedback, so that a relatively optimal key DLL set is found step by step;
The specific operations are as follows:
a. Suppose the relatively optimal key DLL set S has n elements. Loop over the elements of I that are not in S: add one element to S, build the model and obtain the result on the validation set, remove the element, then add the next one, and repeat until the loop ends;
b. Collect the set T of n+1 elements with the best result on the validation set, replace S with it, and repeat a;
c. The process ends when the result on the validation set no longer improves; the final set S is then obtained;
S105: Build the recognition model with the hidden naive Bayes method and perform detection.
After the relatively optimal key DLL set is obtained, the modeling algorithm must be chosen. DLLs are highly coupled and strongly interdependent, and the hidden naive Bayes method is suited to handling such coupling. In hidden naive Bayes, each attribute has a hidden parent node that expresses the influence of the other attributes on that attribute;
Once the hidden naive Bayes method has been chosen, the model is built on the training set, and the processes in the set to be identified are converted to tuples as in S101. Hidden naive Bayes then determines the flag bit of each process in the set to be identified by computing conditional probabilities. A flag bit of 1 denotes a malicious process and 0 a normal process.
Fig. 2 is a flow chart of establishing the mapping from DLL data to an N-tuple.
Comprise the following steps that:
Step 1:Initial state;Step 2:By tuple, each is all set to 0;Step 3:Traversal dynamic link library collection, for Each dynamic link library, searches in the dynamic link database data of target process, if it is present corresponding to target process Position in record corresponding to the dynamic link library is set to 1;Step 4:Each process is traveled through, judges whether the process belongs to training Collection confirms collection, if it is, entering 5, is otherwise collection to be identified, enters 6;Step 5:If the process is malicious process, if It is 1 to determine flag bit, and traversal is finished and enters 7, otherwise continues 4;Step 6:Remove the flag bit, traversal is finished and enters 7, otherwise continues 4;Step 7:The mapping mode for setting up first tuple from dynamic link library data to N is finished.
Fig. 3 is a flow chart of computing the relatively optimal key DLL set with the greedy algorithm. Once the mapping from DLL data to N-tuples has been established, DLLs must be selected to form the DLL set; the selected DLLs are called key DLLs. The choice of this set affects the detection model that is built and therefore the recognition accuracy;
Define 3:Theoretical optimum dynamic link library collection is such a collection, does not have any other collection to use same algorithm same Performance after modeling in one training set in same test set is better than the collection, and this theoretical optimum dynamic link library collection depends on instruction Practice the selection of collection, test set and algorithm;
Define 4:Relatively optimum dynamic link library collection is often not equal to theoretical optimum dynamic link library collection, but in similar training In the case of collection and test set, relatively good performance is suffered from, additionally, the collection can be by controllable computational complexity certain Obtain in step;
When crucial dynamic link library is selected, as crucial dynamic link library is the public affairs for describing process in each classification Common attribute, the related dynamic link library of process can not be chosen as crucial dynamic link library;In the case of no priori, Firstly the need of investigate be windows system dynamics chained library, not directly as crucial dynamic link library, otherwise set up model When " dimension blast " can occur;The probability for occurring in view of each dynamic link library in the sample inequality, then by sample Originally the mode for being counted is counted soft in the conventional software processes and malice of training set initializing crucial dynamic link library collection The dynamic link library occurred in part process, what they combined integrate as I;
In order to improve the accuracy rate of identification, more training data is generally required;The malicious process for differentiating if desired Dynamic link database data is different with the dynamic link library source for training, and needs go out initially with enumerating dynamic link library dll Now ratio bound combination, and collect, verify the mode fed back to obtain an optimum dynamic chain in combined test in checking Storehouse collection is connect as beginning, checking collection must be with data to be identified from same source;So as to by it is a quantity excessively huge can In the limited computable interval of energy property combinatorial mapping to;
The concrete operations are as follows:
a. Enumerate with a set step size (0.5%) and a set range (lower bound from 0 to 49.5%, 100 values in total; upper bound from 50% to 100%, 101 values in total), which gives 100 × 101 = 10100 lower/upper bound combinations in all;
b. Each combination corresponds to one key DLL set, computed as follows: if the occurrence probability of a DLL in both the common software processes and the malware processes of the training set lies between the lower and upper bound of the combination, the DLL is put into that key DLL set;
c. Test the key DLL sets corresponding to these combinations one by one on the validation set and find the relatively optimal key DLL set S;
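Steps a to c can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the representation of each process as a set of DLL names, the inclusive treatment of the bounds, and the `evaluate` callback (a stand-in for modeling on the training set and scoring on the validation set) are all hypothetical.

```python
from itertools import product

def occurrence_rates(processes, dlls):
    """Percentage of processes (each a set of DLL names) that load each DLL."""
    return {d: 100.0 * sum(d in p for p in processes) / len(processes) for d in dlls}

def enumerate_bounds():
    """All (lower, upper) pairs in percent: 100 lower bounds x 101 upper bounds."""
    lowers = [i * 0.5 for i in range(100)]          # 0.0, 0.5, ..., 49.5
    uppers = [50.0 + i * 0.5 for i in range(101)]   # 50.0, 50.5, ..., 100.0
    return list(product(lowers, uppers))

def candidate_set(benign_rates, malware_rates, lower, upper):
    """DLLs whose occurrence rate lies within [lower, upper] in BOTH classes.
    Treating the bounds as inclusive is an assumption."""
    return {d for d in benign_rates
            if lower <= benign_rates[d] <= upper
            and lower <= malware_rates[d] <= upper}

def best_initial_set(benign, malware, evaluate):
    """Return (S, score): the candidate set with the best validation score.
    `evaluate(dll_set)` is a hypothetical scoring callback."""
    universe = set().union(*benign, *malware)       # the set I
    b_rates = occurrence_rates(benign, universe)
    m_rates = occurrence_rates(malware, universe)
    cache = {}                                      # many bound pairs give the same set
    best, best_score = frozenset(), float("-inf")
    for lower, upper in enumerate_bounds():
        s = frozenset(candidate_set(b_rates, m_rates, lower, upper))
        if s not in cache:
            cache[s] = evaluate(s)
        if cache[s] > best_score:
            best, best_score = s, cache[s]
    return set(best), best_score
```

Caching by candidate set is a design choice of this sketch: many of the 10100 bound pairs select the same DLLs, so each distinct set is modeled only once.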
The key DLL set is then optimized by combining a greedy algorithm with the validation data: DLLs are added step by step to the candidate key DLL set, each resulting set is modeled, and the validation set is used for feedback, so that a relatively optimal key DLL set is found step by step;
The concrete operations are as follows:
a. Suppose the relatively optimal key DLL set S has n elements. Loop over the elements of set I that are not in S, adding one to S at a time; build a model and obtain its result on the validation set, remove the element, add the next one, and repeat until the loop ends;
b. From these results, take the set T of n+1 elements with the best result on the validation set, replace S with it, and repeat a;
c. The process ends when the result on the validation set no longer improves; the set S at that point is the final set;
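The greedy loop in steps a to c amounts to forward selection and can be sketched as below. The names are hypothetical, and `evaluate` again stands in for "model on the training set, score on the validation set". The sketch also assumes that S is kept unchanged when the best (n+1)-element set T does not improve the score, a point the text leaves open.

```python
def greedy_extend(initial, universe, evaluate):
    """Grow the key DLL set one element at a time while the validation score improves."""
    current = set(initial)
    current_score = evaluate(current)
    while True:
        best_add, best_score = None, current_score
        for dll in sorted(universe - current):   # try each element of I missing from S
            score = evaluate(current | {dll})    # score S plus this one element
            if score > best_score:
                best_add, best_score = dll, score
        if best_add is None:                     # no single addition improves the result
            return current, current_score
        current.add(best_add)                    # replace S with the best set T
        current_score = best_score
```

Each round costs one model per remaining element of I, so the total work is bounded by roughly |I|² model fits, which is the "controllable computational complexity" that Definition 4 refers to.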
The specific steps are as follows:
Step 1: Initial state;
Step 2: Count the DLLs that appear in the common software processes and the malware processes of the training set; their union is the set I;
Step 3: Enumerate all lower/upper bound combinations with a set step size (for example 0.5%) and a set range (for example, lower bound from 0 to 49.5%, 100 values in total; upper bound from 50% to 100%, 101 values in total);
Step 4: Traverse all combinations from step 3; for each combination, enumerate all DLLs in I;
Step 5: If the occurrence probability of the DLL in both the common software processes and the malware processes of the training set lies between the lower and upper bound, put the DLL into the set for that combination; when the enumeration is finished go to step 6, otherwise return to step 5;
Step 6: For each bound combination, test the corresponding set on the validation set and record the result;
Step 7: When the enumeration is finished, find the optimal set S; otherwise return to step 6;
Step 8: Suppose S has n elements; loop over the elements of I that are not in S, adding one to S at a time to obtain a set X;
Step 9: Build a model on X, obtain the result on the validation set, then remove the added element to restore S;
Step 10: If the loop has ended, go to step 11; otherwise return to step 8;
Step 11: From the recorded results, take the set T of n+1 elements with the best result on the validation set and replace S with it;
Step 12: If the result on the validation set in step 11 no longer improves, stop, having obtained the relatively optimal DLL set; otherwise return to step 8;
Step 13: The computation of the relatively optimal key DLL set by the greedy algorithm is complete.
Fig. 4 is the flow chart for building the recognition model with the hidden naive Bayes method and performing detection. After the relatively optimal key DLL set has been obtained, a modeling algorithm must be chosen. DLLs are highly coupled and strongly interdependent, and the hidden naive Bayes method is suited to handling such coupling: in hidden naive Bayes, each attribute has a hidden parent node that expresses the influence of the other attributes on that attribute;
Once the hidden naive Bayes method has been chosen, a model is built on the training set, and the processes in the set to be identified are converted into tuples in the manner shown in Fig. 2. By computing conditional probabilities, hidden naive Bayes finally determines the flag bit of each process in the set to be identified; a flag bit of 1 denotes a malicious process and 0 a normal process.
The specific steps are as follows:
Step 1: Initial state;
Step 2: Convert the processes in the set to be identified and in the selected training set into tuples in the manner shown in Fig. 2;
Step 3: Build the recognition model on the training set using the hidden naive Bayes method;
Step 4: Hidden naive Bayes determines, by computing conditional probabilities, the flag bit of each process in the set to be identified;
Step 5: If the flag bit is 1, go to step 6; otherwise go to step 7;
Step 6: Mark the process corresponding to the tuple as a malicious process;
Step 7: Mark the process corresponding to the tuple as a normal process;
Step 8: Building the recognition model with hidden naive Bayes and performing detection is complete.
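The text does not give the hidden naive Bayes formulas, so the following is only a sketch based on the commonly published formulation, in which the hidden parent of attribute i is a mixture over the other attributes weighted by conditional mutual information: P(a_i | hidden parent, c) = Σ_{j≠i} W_ij · P(a_i | a_j, c). The class name, the Laplace smoothing, and the restriction to 0/1 attributes and 0/1 labels are assumptions made for illustration.

```python
import math
from collections import Counter

class HiddenNaiveBayes:
    """Minimal hidden naive Bayes for 0/1 attributes and 0/1 labels (needs >= 2 attributes)."""

    def fit(self, X, y):
        self.n, total = len(X[0]), len(X)
        self.class_count = Counter(y)
        self.single = Counter()   # (j, u, c) -> count of a_j = u in class c
        self.pair = Counter()     # (i, v, j, u, c) -> count of a_i = v and a_j = u in class c
        for row, c in zip(X, y):
            for j, u in enumerate(row):
                self.single[j, u, c] += 1
                for i, v in enumerate(row):
                    if i != j:
                        self.pair[i, v, j, u, c] += 1
        self.prior = {c: (self.class_count[c] + 1) / (total + 2) for c in (0, 1)}
        self.weights = []
        for i in range(self.n):
            cmi = [0.0 if j == i else self._cmi(i, j, total) for j in range(self.n)]
            s = sum(cmi)
            if s > 0:   # weights proportional to conditional mutual information
                self.weights.append([m / s for m in cmi])
            else:       # no dependence measured: fall back to uniform weights
                self.weights.append([0.0 if j == i else 1 / (self.n - 1)
                                     for j in range(self.n)])
        return self

    def _cmi(self, i, j, total):
        """Empirical conditional mutual information I(A_i; A_j | C)."""
        value = 0.0
        for c in (0, 1):
            for v in (0, 1):
                for u in (0, 1):
                    n_vuc = self.pair[i, v, j, u, c]
                    if n_vuc:
                        ratio = (n_vuc * self.class_count[c]
                                 / (self.single[i, v, c] * self.single[j, u, c]))
                        value += n_vuc / total * math.log(ratio)
        return max(value, 0.0)

    def _cond(self, i, v, j, u, c):
        """Laplace-smoothed P(a_i = v | a_j = u, c)."""
        return (self.pair[i, v, j, u, c] + 1) / (self.single[j, u, c] + 2)

    def predict(self, row):
        """Return the flag bit: 1 for malicious, 0 for normal."""
        scores = {}
        for c in (0, 1):
            s = math.log(self.prior[c])
            for i, v in enumerate(row):
                s += math.log(sum(self.weights[i][j] * self._cond(i, v, j, row[j], c)
                                  for j in range(self.n) if j != i))
            scores[c] = s
        return 1 if scores[1] > scores[0] else 0
```

In use, `fit` would receive the training tuples without their flag bits as `X` and the flag bits as `y`, and `predict` would be called on each N-element tuple from the set to be identified.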
Although the present invention has been disclosed above by way of preferred embodiments, these are not intended to limit the invention. Those of ordinary skill in the art to which the invention belongs may make various modifications and variations without departing from the spirit and scope of the invention. The scope of protection of the invention is therefore as defined by the claims.

Claims (4)

1. A method for automatically detecting malicious processes in a forensic scenario, characterized in that it comprises the following steps:
1) Establish the mapping from dynamic link library data to N-element tuples;
Definition 1: An N-element tuple is a sequence of length N consisting of 0s and 1s, where N is a non-negative integer;
Definition 2: The flag bit is a special element appended to the end of an N-element tuple; it indicates whether the process represented by the tuple is a malicious process, and it is used in the malware identification process;
So that dynamic link library (DLL) data can be analyzed and processed quickly by the recognition algorithm, the DLL data of each process must be mapped onto a data structure, namely the N-element tuple of Definition 1. The DLL set is the standard for the mapping: for a DLL set containing N DLLs, the corresponding data structure is an (N+1)-element tuple that includes the flag bit of Definition 2; the data structure for a process in the set to be identified has no flag bit and is an N-element tuple;
The mapping is as follows:
a. Set every bit of the tuple to 0;
b. Traverse the DLL set used as the standard; for each DLL, search for it in the DLL data of the process, and if it is present, set the bit corresponding to that DLL in the record of the process to 1;
c. If the process belongs to the training set or the validation set, it is known whether it is a malicious process, and if it is, the flag bit is set to 1; if the process belongs to the set to be identified, that is, the test set, the flag bit is removed;
2) Compute the relatively optimal key dynamic link library set using a greedy algorithm;
Once the mapping from DLL data to N-element tuples has been established, DLLs must be selected to form the DLL set; the selected DLLs are called key DLLs. The choice of this set affects the detection model that is built and, in turn, the recognition accuracy;
Definition 3: The theoretically optimal dynamic link library set is a set such that no other set, modeled with the same algorithm on the same training set, performs better on the same test set; the theoretically optimal dynamic link library set depends on the choice of training set, test set and algorithm;
Definition 4: The relatively optimal dynamic link library set is not equal to the theoretically optimal dynamic link library set, but it performs relatively well on similar training and test sets; in addition, it can be obtained in a bounded number of steps with controllable computational complexity;
When key DLLs are selected, DLLs that are specific to an individual process cannot be chosen, because key DLLs are common attributes that describe the processes within each class. In the absence of prior knowledge, the first candidates to consider are the Windows system DLLs, but they cannot all be used directly as key DLLs, otherwise a "dimension explosion" occurs when the model is built. Since the DLLs occur in the samples with unequal probability, the key DLL set is initialized from sample statistics: the DLLs that appear in the common software processes and the malware processes of the training set are counted, and their union is the set I;
To improve recognition accuracy, more training data is generally needed. If the DLL data of the malicious processes to be identified comes from a different source than the DLL data used for training, combinations of lower and upper bounds on the DLL occurrence ratio are first enumerated, each combination is tested on the validation set, and the validation feedback is used to obtain an optimal DLL set as a starting point; the validation set must come from the same source as the data to be identified. In this way an excessively large number of possible combinations is mapped onto a limited, computable range;
The concrete operations are as follows:
a. Enumerate with a set step size, the step size being 0.5%, and a set range, the lower bound running from 0 to 49.5% (100 values in total) and the upper bound from 50% to 100% (101 values in total), which gives 10100 lower/upper bound combinations in all;
b. Each combination corresponds to one key DLL set, computed as follows: if the occurrence probability of a DLL in both the common software processes and the malware processes of the training set lies between the lower and upper bound of the combination, the DLL is put into that key DLL set;
c. Test the key DLL sets corresponding to these combinations one by one on the validation set and find the relatively optimal key DLL set S;
The key DLL set is optimized by combining a greedy algorithm with the validation data: DLLs are added step by step to the candidate key DLL set, each resulting set is modeled, and the validation set is used for feedback, so that a relatively optimal key DLL set is found step by step;
The concrete operations are as follows:
a. Suppose the relatively optimal key DLL set S has n elements. Loop over the elements of set I that are not in S, adding one to S at a time; build a model and obtain its result on the validation set, remove the element, add the next one, and repeat until the loop ends;
b. From these results, take the set T of n+1 elements with the best result on the validation set, replace S with it, and repeat a;
c. The process ends when the result on the validation set no longer improves; the set S at that point is the final set;
3) Build the recognition model using the hidden naive Bayes method and perform detection:
After the relatively optimal key DLL set has been obtained, a modeling algorithm must be chosen. DLLs are highly coupled and strongly interdependent, and the hidden naive Bayes method is suited to handling such coupling: in hidden naive Bayes, each attribute has a hidden parent node that expresses the influence of the other attributes on that attribute;
Once the hidden naive Bayes method has been chosen, a model is built on the training set, and the processes in the set to be identified are converted into tuples in the manner of 1). By computing conditional probabilities, hidden naive Bayes finally determines the flag bit of each process in the set to be identified; a flag bit of 1 denotes a malicious process and 0 a normal process.
2. The method for automatically detecting malicious processes in a forensic scenario according to claim 1, wherein the specific steps of step 1) are as follows:
Step 1)-1: Initial state;
Step 1)-2: Set every bit of the tuple to 0;
Step 1)-3: Traverse the DLL set; for each DLL, search for it in the DLL data of the target process, and if it is present, set the bit corresponding to that DLL in the record of the target process to 1;
Step 1)-4: Traverse each process and determine whether it belongs to the training set or the validation set; if so, go to 1)-5; otherwise it belongs to the set to be identified, go to 1)-6;
Step 1)-5: If the process is a malicious process, set the flag bit to 1; when the traversal is finished go to 1)-7, otherwise continue with 1)-4;
Step 1)-6: Remove the flag bit; when the traversal is finished go to 1)-7, otherwise continue with 1)-4;
Step 1)-7: Establishing the mapping from DLL data to N-element tuples is complete.
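As a reading aid that is not part of the claims, the mapping of step 1) can be sketched as below. The function name, its parameters, and the case-insensitive comparison of DLL names are assumptions made for illustration; passing `is_malicious=None` models a process from the set to be identified, which carries no flag bit.

```python
def to_tuple(process_dlls, key_dlls, is_malicious=None):
    """Map the DLL names loaded by one process onto a 0/1 tuple ordered by key_dlls.

    is_malicious: True/False for training or validation processes (a flag bit is
    appended), None for processes to be identified (no flag bit).
    """
    # Assumption: DLL names are compared case-insensitively.
    loaded = {name.lower() for name in process_dlls}
    bits = [1 if dll.lower() in loaded else 0 for dll in key_dlls]
    if is_malicious is not None:
        bits.append(1 if is_malicious else 0)   # flag bit at the end of the tuple
    return tuple(bits)
```

For a key DLL set of size N this yields an (N+1)-element tuple for labeled processes and an N-element tuple otherwise, matching Definitions 1 and 2.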
3. The method for automatically detecting malicious processes in a forensic scenario according to claim 1, wherein the specific steps of step 2) are as follows:
Step 2)-1: Initial state;
Step 2)-2: Count the DLLs that appear in the common software processes and the malware processes of the training set; their union is the set I;
Step 2)-3: Enumerate all lower/upper bound combinations with a set step size, the step size being set to 0.5%, and a set range, the lower bound running from 0 to 49.5% (100 values in total) and the upper bound from 50% to 100% (101 values in total);
Step 2)-4: Traverse all combinations from 2)-3; for each combination, enumerate all DLLs in I;
Step 2)-5: If the occurrence probability of the DLL in both the common software processes and the malware processes of the training set lies between the lower and upper bound of the combination, put the DLL into the set for that combination; when the enumeration is finished go to 2)-6, otherwise return to 2)-5;
Step 2)-6: For each bound combination, test the corresponding set on the validation set and record the result;
Step 2)-7: When the enumeration is finished, find the optimal set S; otherwise return to 2)-6;
Step 2)-8: Suppose S has n elements; loop over the elements of I that are not in S, adding one to S at a time to obtain a set X;
Step 2)-9: Build a model on X, obtain the result on the validation set, then remove the added element to restore S;
Step 2)-10: If the loop has ended, go to 2)-11; otherwise return to 2)-8;
Step 2)-11: From the recorded results, take the set T of n+1 elements with the best result on the validation set and replace S with it;
Step 2)-12: If the result on the validation set in 2)-11 no longer improves, stop, having obtained the relatively optimal DLL set; otherwise return to 2)-8;
Step 2)-13: The computation of the relatively optimal key DLL set by the greedy algorithm is complete.
4. The method for automatically detecting malicious processes in a forensic scenario according to claim 1, wherein the specific steps of step 3) are as follows:
Step 3)-1: Initial state;
Step 3)-2: Convert the processes in the set to be identified and in the selected training set into tuples in the manner of 1);
Step 3)-3: Build the recognition model on the training set using the hidden naive Bayes method;
Step 3)-4: Hidden naive Bayes determines, by computing conditional probabilities, the flag bit of each process in the set to be identified;
Step 3)-5: If the flag bit is 1, go to 3)-6; otherwise go to 3)-7;
Step 3)-6: Mark the process corresponding to the tuple as a malicious process;
Step 3)-7: Mark the process corresponding to the tuple as a normal process;
Step 3)-8: Building the recognition model with hidden naive Bayes and performing detection is complete.
CN201410705875.1A 2014-11-27 2014-11-27 A kind of method of the automatic detection malicious process under evidence obtaining scene Expired - Fee Related CN104376261B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410705875.1A CN104376261B (en) 2014-11-27 2014-11-27 A kind of method of the automatic detection malicious process under evidence obtaining scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410705875.1A CN104376261B (en) 2014-11-27 2014-11-27 A kind of method of the automatic detection malicious process under evidence obtaining scene

Publications (2)

Publication Number Publication Date
CN104376261A CN104376261A (en) 2015-02-25
CN104376261B true CN104376261B (en) 2017-04-05

Family

ID=52555163

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410705875.1A Expired - Fee Related CN104376261B (en) 2014-11-27 2014-11-27 A kind of method of the automatic detection malicious process under evidence obtaining scene

Country Status (1)

Country Link
CN (1) CN104376261B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109783753A (en) * 2018-12-14 2019-05-21 平安普惠企业管理有限公司 The tree-shaped drawing generating method of web site url, device, equipment and storage medium
CN109714329A (en) * 2018-12-24 2019-05-03 成都蜀道易信科技有限公司 Low rate DDoS detection method based on Bayesian network under a kind of cloud environment
CN109918907B (en) * 2019-01-30 2021-05-25 国家计算机网络与信息安全管理中心 Method, controller and medium for obtaining evidence of malicious codes in process memory of Linux platform
CN112906786A (en) * 2021-02-07 2021-06-04 滁州职业技术学院 Data classification improvement method based on naive Bayes model

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102622536A (en) * 2011-01-26 2012-08-01 中国科学院软件研究所 Method for catching malicious codes
CN103886252A (en) * 2013-04-26 2014-06-25 卡巴斯基实验室封闭式股份公司 Software Code Malicious Selection Evaluation Executed In Trusted Process Address Space

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011053893A (en) * 2009-09-01 2011-03-17 Hitachi Ltd Illicit process detection method and illicit process detection system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
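Research on intrusion event reconstruction techniques in computer intrusion forensics (计算机入侵取证中的入侵事件重构技术研究); Ji Yuchen et al.; Computer Engineering (《计算机工程》); 2014-01-31; Vol. 40, No. 1 *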

Also Published As

Publication number Publication date
CN104376261A (en) 2015-02-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170405