CN106557695B - Malicious application detection method and system - Google Patents
- Publication number
- CN106557695B (application number CN201510621631.XA)
- Authority
- CN
- China
- Prior art keywords
- application
- malicious
- application program
- labeled
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/03—Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
- G06F2221/033—Test or assess software
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Virology (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Storage Device Security (AREA)
Abstract
The present invention relates to a malicious application detection method and system. The method comprises: S1, performing a static code scan on a received application to be detected and analyzing, along three dimensions (permission requests, function calls, and information output), whether the application exhibits malicious behavior matching any malicious behavior information in a malicious behavior information library; if malicious behavior exists, labeling the application as a suspected malicious application; if not, labeling it as a normal application; S2, performing similarity analysis between the applications labeled as suspected malicious applications and the malicious application samples in a malicious application sample library, based on application name, package name, signing certificate, directory structure, text files, and image files, and labeling as malicious applications those whose similarity meets a set value. The invention avoids the performance bottleneck of loading and executing the application in a virtual machine for analysis, effectively reduces the false positive rate, and improves identification accuracy.
Description
Technical field
The present invention relates to mobile Internet technology, and more specifically to a malicious application detection method and system.
Background technique
With the spread of mobile smart terminals, mobile Internet services are flourishing and the number of mobile applications is growing rapidly. The disruptive change brought about by mobile smart terminals has opened the prelude to the development of the mobile Internet industry. Smart terminals have changed the way people work and live, and the security of mobile applications now faces a severe situation.
The rapid growth of mobile applications has brought with it a large-scale proliferation of pirated applications, malicious applications, and viruses. Compared with traditional PC terminals, the characteristics of malicious applications on mobile terminals are more pronounced: malicious applications mutate quickly, and a large number of new variants appear every day.
By the end of 2014, the number of applications on the Android platform had exceeded 2,000,000, making it the system platform with the most applications. The application development model of the Android platform also shapes how variants arise. On traditional PC terminals, variants of a malicious application are mostly developed and distributed by the original publisher, so the variant cycle is relatively long. Android applications, by contrast, are easy to reverse engineer, and malicious code can readily be recompiled, repackaged, and republished as a new variant, so variants appear frequently and with a very short cycle. For the prevention and control of malicious applications on mobile terminals, effectively identifying variants is therefore particularly important.
On traditional PC terminals, three methods are mainly used to identify variants:
1. Broad-spectrum signatures, also called gene signatures. Gene signature detection summarizes the features of a class of malicious applications, so one gene signature can correspond to a whole family of malicious applications. Gene signatures can also cope effectively with variants, and to some extent they compensate for the inability of ordinary signature-based detection to handle unknown malicious applications.
However, identification based on broad-spectrum signatures has the following limitations:
(1) It increases the probability of false positives. Gene signature detection tends to judge normal software that contains certain signatures as a threat, so some normal software is falsely reported.
(2) Analyzing and extracting gene signatures is very difficult and requires highly skilled technical staff, and the quality of the extracted signatures strongly affects the final verdict. The method is therefore heavily influenced by human factors, and its effectiveness depends on the skill of the security professionals involved.
(3) A large number of samples must be analyzed, and until a gene signature has been analyzed and extracted, the spread of the malicious application cannot be countered. Because malicious applications on mobile terminals mutate quickly and propagate over short periods, this method cannot effectively solve the problem of detecting and removing them.
2. Heuristic scanning, also known as behavior-analysis-based malicious application scanning. This technique distinguishes malicious applications by analyzing the behavioral features that set them apart from normal software, so it can also effectively detect unknown malicious applications and their variants. To a security expert, the behavior of a malicious application differs greatly from that of an ordinary program: for example, an ordinary program does not create files in critical system directories, install hooks in the system, or register services indiscriminately. Heuristic scanning uses automated analysis to implement some of the security expert's analytical approaches and decides whether an application is malicious according to whether its behavior is malicious.
The limitations of heuristic scanning are as follows:
(1) The false positive rate is high. Software that exhibits the same behavior is not necessarily malicious. For example, reading the address book and sending it to a specified address does not necessarily mean stealing user information; it may be data backup software.
(2) Operating efficiency is low. The malicious application must be run in a virtual machine while its behavioral data is collected and analyzed. This is inefficient and better suited to a back-end server; for anti-virus tools that interact with users, the user experience is poor.
3. Artificial intelligence (AI). AI techniques learn about malicious applications through comprehensive behavior analysis, continuously optimize their own malicious behavior feature library, and automatically extract signatures. Starting from an initial set of malicious behavior signatures, continuous optimization and growth eventually produce a better behavior signature library that can cope with unknown malicious applications and their variants. The automatically extracted signatures, in turn, improve the efficiency of detecting and removing known malicious applications.
The main problems with AI techniques are as follows:
(1) AI requires continuous learning. Only when there are enough malicious application samples can the AI engine complete its learning process and improve its behavior signatures, so this technique lags behind the spread of malicious applications.
(2) AI algorithm models are very complex, and the features of malicious applications change constantly. Designing a good learning model is extremely difficult, and a single model often cannot meet the needs of all applications.
(3) On mobile terminals, variants are numerous, change quickly, and propagate only briefly. A variant may spread for only a few days before it disappears and another appears. Under these conditions, AI is very inefficient.
In short, under the new circumstances of mobile Internet services, the number of mobile terminals has far exceeded the number of PC terminals, and mobile applications are becoming the most important form of application. Malicious application detection methods from the PC era can no longer meet the need. Only a completely new solution can protect users' interests, safeguard their information security, and promote the sustainable development of the industry chain.
Summary of the invention
The technical problem to be solved by the present invention is, in view of the above drawbacks of the prior art, to provide a malicious application detection method and system that raise the level of automation in detecting malicious variants of mobile applications, reduce the misjudgment rate, and improve the efficiency of discovering unknown variants.
The technical solution adopted by the present invention to solve this technical problem is a malicious application detection method comprising the following steps:
S1, performing a static code scan on a received application to be detected, and analyzing along three dimensions (permission requests, function calls, and information output) whether the application exhibits malicious behavior matching any malicious behavior information in a malicious behavior information library; if malicious behavior exists, labeling the application as a suspected malicious application; if no malicious behavior exists, labeling the application as a normal application;
S2, performing similarity analysis between the applications labeled as suspected malicious applications and the malicious application samples in a malicious application sample library, based on application name, package name, signing certificate, directory structure, text files, and image files, and labeling as malicious applications those whose similarity meets a set value.
According to one embodiment of the present invention, the method further comprises:
S3, storing in a misjudgment information library the applications that were not labeled as malicious applications in step S2;
S4, based on the results of manual analysis of the applications in the misjudgment information library, labeling the applications that are not malicious as normal applications, storing them in a normal application library, and storing the information of these normal applications in a whitelist library;
S5, based on the results of manual analysis of the applications in the misjudgment information library, labeling the applications that are malicious as malicious applications, storing them in a malicious application library, and storing these malicious applications in the malicious application sample library.
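The routing in steps S3 to S5 can be sketched as follows. This is only an illustration under stated assumptions: the `Libraries` class, its field names, and the two helper functions are hypothetical names introduced here, and plain in-memory lists stand in for whatever storage the libraries actually use.

```python
from dataclasses import dataclass, field


@dataclass
class Libraries:
    """Hypothetical in-memory stand-ins for the libraries named in S3 to S5."""
    misjudgment: list = field(default_factory=list)
    normal: list = field(default_factory=list)
    whitelist: list = field(default_factory=list)
    malicious: list = field(default_factory=list)
    samples: list = field(default_factory=list)


def queue_for_review(libs: Libraries, app: str) -> None:
    # S3: an application that S2 did not label as malicious awaits manual analysis.
    libs.misjudgment.append(app)


def apply_manual_verdict(libs: Libraries, app: str, is_malicious: bool) -> None:
    if is_malicious:
        # S5: confirmed malicious; it also becomes a sample for later comparisons.
        libs.malicious.append(app)
        libs.samples.append(app)
    else:
        # S4: confirmed normal; its information is also recorded in the whitelist.
        libs.normal.append(app)
        libs.whitelist.append(app)
```

The point of the design is the feedback loop: each manual verdict extends either the whitelist library or the malicious application sample library, so later scans have more to compare against.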
According to one embodiment of the present invention, step S1 further comprises:
S11, decompiling the received application to be detected into code files and the corresponding permission configuration file and resource files, and parsing out the application's name, package name, signing certificate, and directory structure;
S12, calling a reachability matrix model to search and analyze, along the three dimensions of permission requests, function calls, and information output, whether the code files, permission configuration file, and resource files produced by decompilation contain malicious behavior matching any malicious behavior information in the malicious behavior information library, wherein the reachability matrix model is generated in advance from the malicious behavior information library and the whitelist library;
S13, labeling applications in which malicious behavior exists as suspected malicious applications and storing them in a suspected malicious application library, and labeling applications in which no malicious behavior exists as normal applications and storing them in the normal application library.
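The text does not spell out how the reachability matrix model is built, so the sketch below deliberately swaps in a much simpler technique: each malicious behavior is a rule with one set per dimension, and an application matches when it contains every element of all three sets. The class names, the rule format, and the use of the whitelist as a package-name bypass are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BehaviorRule:
    """Hypothetical rule format: one required set per dimension."""
    name: str
    permissions: frozenset
    calls: frozenset
    outputs: frozenset


@dataclass(frozen=True)
class AppFacts:
    """Facts assumed to have been extracted from the decompiled application."""
    package: str
    permissions: frozenset
    calls: frozenset
    outputs: frozenset


def matching_rules(app: AppFacts, rules: list) -> list:
    # A rule matches only if all three dimensions are satisfied (subset test).
    return [
        r.name
        for r in rules
        if r.permissions <= app.permissions
        and r.calls <= app.calls
        and r.outputs <= app.outputs
    ]


def label(app: AppFacts, rules: list, whitelist: frozenset = frozenset()) -> str:
    # Assumption: whitelisted packages skip rule matching entirely.
    if app.package in whitelist:
        return "normal"
    return "suspected" if matching_rules(app, rules) else "normal"
```

Requiring all three dimensions at once is what separates, say, an application that merely holds a contacts permission from one that also calls the query API and sends the result out.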
According to one embodiment of the present invention, step S2 further comprises:
S21, matching the signing certificate of an application labeled as a suspected malicious application against the malicious application samples in the malicious application sample library; if the signing certificate is present in the malicious application sample library, directly labeling the application as a malicious application and storing it in the malicious application library;
S22, if the signing certificate is not present in the malicious application sample library, further performing similarity analysis on the application's name and package name, and finding in the malicious application sample library a sample set whose application names and package names are similar;
S23, if the sample set is found in step S22, performing similarity analysis of directory structure, text files, and image files between each sample in the sample set and the application to be analyzed, calculating similarity values, and, when the similarity between any sample and the application to be analyzed meets the set value, labeling the application as a malicious application and storing it in the malicious application library;
S24, if the sample set is not found in step S22, or if in step S23 no sample's similarity to the application to be analyzed meets the set value, performing similarity analysis of directory structure, text files, and image files between all samples in the malicious application sample library and the application to be analyzed, calculating similarity values, and, when the similarity between any sample and the application to be analyzed meets the set value, labeling the application as a malicious application and storing it in the malicious application library.
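The cascade of steps S21 to S24 can be sketched as a single function. This is a minimal sketch with assumptions: applications and samples are plain dictionaries with a `"cert"` key, the name comparison and content scoring are passed in as hypothetical callables, and "meets the set value" is read as "greater than or equal to a threshold".

```python
def classify_suspect(app, samples, name_similar, content_score, threshold):
    """Return "malicious" or "unresolved" for a suspected application.

    name_similar(app, sample) -> bool and content_score(app, sample) -> float
    are caller-supplied placeholders for the similarity analyses.
    """
    # S21: an exact signing-certificate match is decisive on its own.
    if any(s["cert"] == app["cert"] for s in samples):
        return "malicious"

    # S22: narrow the library to samples with a similar name and package name.
    candidates = [s for s in samples if name_similar(app, s)]

    # S23: compare content against the narrowed set first (empty set: no-op).
    if any(content_score(app, s) >= threshold for s in candidates):
        return "malicious"

    # S24: fall back to comparing against every sample in the library.
    if any(content_score(app, s) >= threshold for s in samples):
        return "malicious"

    # Not confirmed here; such applications go on to manual review.
    return "unresolved"
```

The ordering runs from cheapest to most expensive: certificate lookup, then a name-filtered comparison, and the full-library comparison only when the earlier steps are inconclusive.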
According to one embodiment of the present invention, the similarity analysis of application names and package names uses an edit distance algorithm, the directory structure similarity analysis uses a directory comparison method, the text file similarity analysis uses an edit distance algorithm, and the image file similarity analysis uses a perceptual hash algorithm.
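The three kinds of measure named here can be illustrated in a few lines each. The sketch below makes several assumptions that the text does not confirm: edit distance is the Levenshtein distance, normalized by the longer string; "directory comparison" is modeled as Jaccard overlap of path sets; and the perceptual hash is the simplest average-hash variant, applied to a grayscale pixel grid that is assumed to have been downscaled already.

```python
def edit_distance(a: str, b: str) -> int:
    # Levenshtein distance by dynamic programming, keeping one previous row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = cur
    return prev[-1]


def string_similarity(a: str, b: str) -> float:
    # Assumed normalization: 1.0 for identical strings, 0.0 for nothing shared.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def directory_similarity(paths_a, paths_b) -> float:
    # Assumed model of directory comparison: Jaccard overlap of path sets.
    a, b = set(paths_a), set(paths_b)
    return 1.0 if not (a or b) else len(a & b) / len(a | b)


def average_hash(pixels) -> list:
    # One bit per pixel: 1 if brighter than the mean of the whole grid.
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hash_similarity(h1, h2) -> float:
    # Fraction of matching bits (1 minus the normalized Hamming distance).
    return sum(x == y for x, y in zip(h1, h2)) / len(h1)
```

For example, `edit_distance("kitten", "sitting")` is 3, so a repackaged application whose package name differs by a few characters still scores close to 1.0 under `string_similarity`.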
To solve this technical problem, the present invention also provides a malicious application detection system, comprising:
A malicious behavior information library, for storing various malicious behavior information organized along the three dimensions of permission requests, function calls, and information output;
A malicious application sample library, for storing information on various malicious application samples;
A static heuristic scanning subsystem, for performing a static code scan on a received application to be detected and analyzing, along the three dimensions of permission requests, function calls, and information output, whether the application exhibits malicious behavior matching any malicious behavior information in the malicious behavior information library; if malicious behavior exists, the application is labeled as a suspected malicious application; if no malicious behavior exists, the application is labeled as a normal application;
A similarity analysis subsystem, for performing similarity analysis between the applications labeled as suspected malicious applications by the static heuristic scanning subsystem and the malicious application samples in the malicious application sample library, based on application name, package name, signing certificate, directory structure, text files, and image files, and labeling as malicious applications those whose similarity meets a set value.
According to one embodiment of the present invention, the system further comprises:
A suspected malicious application library, for storing the applications labeled as suspected malicious applications by the static heuristic scanning subsystem;
A misjudgment information library, for storing the applications that the similarity analysis subsystem did not label as malicious applications;
A normal application library, for storing the applications labeled as normal applications by the static heuristic scanning subsystem and the applications labeled as normal applications based on the results of manual analysis of the applications in the misjudgment information library;
A whitelist library, for storing the information of the applications labeled as normal applications based on the results of manual analysis of the applications in the misjudgment information library;
A malicious application library, for storing the applications labeled as malicious applications by the similarity analysis subsystem.
According to one embodiment of the present invention, the static heuristic scanning subsystem further comprises:
A reachability matrix algorithm component, for loading the malicious behavior information library and the whitelist library and generating in advance a reachability matrix model based on the three dimensions of permission requests, function calls, and information output;
A decompilation component, for decompiling the received application to be detected into code files and the corresponding permission configuration file and resource files, and parsing out the application's name, package name, signing certificate, and directory structure;
A malicious behavior analysis component, for calling the reachability matrix model to search and analyze, along the three dimensions of permission requests, function calls, and information output, whether the code files, permission configuration file, and resource files produced by decompilation contain malicious behavior matching any malicious behavior information in the malicious behavior information library;
A scheduling component, for labeling applications in which malicious behavior exists as suspected malicious applications and storing them in the suspected malicious application library, and labeling applications in which no malicious behavior exists as normal applications and storing them in the normal application library.
According to one embodiment of the present invention, the similarity analysis subsystem further comprises:
A signing certificate matching component, for obtaining from the suspected malicious application library the signing certificate of the application to be analyzed and matching it against the malicious application samples in the malicious application sample library; if the signing certificate is present in the malicious application sample library, the application is directly labeled as a malicious application and stored in the malicious application library;
A first similarity analysis component, for further performing, when the signing certificate of the application to be analyzed is not present in the malicious application sample library, similarity analysis on the application's name and package name, and finding in the malicious application sample library a sample set whose application names and package names are similar;
A second similarity analysis component, for performing, when the first similarity analysis component finds the sample set, similarity analysis of directory structure, text files, and image files between each sample in the sample set and the application to be analyzed, calculating similarity values, and, when the similarity between any sample and the application to be analyzed meets the set value, labeling the application as a malicious application and storing it in the malicious application library;
A third similarity analysis component, for performing, when the first similarity analysis component does not find the sample set or the second similarity analysis component finds no sample whose similarity to the application to be analyzed meets the set value, similarity analysis of directory structure, text files, and image files between all samples in the malicious application sample library and the application to be analyzed, calculating similarity values, and, when the similarity between any sample and the application to be analyzed meets the set value, labeling the application as a malicious application and storing it in the malicious application library.
According to one embodiment of the present invention, the first similarity analysis component uses an edit distance algorithm for the similarity analysis of application names and package names; the second similarity analysis component and the third similarity analysis component each use a directory comparison method for directory structure similarity analysis, an edit distance algorithm for text file similarity analysis, and a perceptual hash algorithm for image file similarity analysis.
The malicious application detection method and system of the present invention are based on heuristic scanning. To address the low operating efficiency of heuristic scanning, they use static behavior analysis and scanning, which avoids the performance bottleneck of loading and executing the application in a virtual machine for analysis. Adding a similarity analysis stage on top of heuristic scanning effectively addresses the high false positive rate of heuristic scanning. The similarity analysis performs fuzzy matching on the application's signing certificate, name, and package name, and combines this with similarity analysis of the application's code, resource files, and directory structure, effectively reducing the false positive rate and improving identification accuracy.
Brief description of the drawings
The present invention is further described below with reference to the accompanying drawings and embodiments. In the drawings:
Fig. 1 is a schematic structural diagram of a malicious application detection system according to one embodiment of the present invention;
Fig. 2 is a flowchart of a malicious application detection method according to one embodiment of the present invention;
Fig. 3 is a flowchart of a specific embodiment of step S210 in Fig. 2;
Fig. 4 is a flowchart of a specific embodiment of step S220 in Fig. 2.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the present invention and are not intended to limit it.
Fig. 1 shows a schematic structural diagram of a malicious application detection system 100 according to one embodiment of the present invention. As shown in Fig. 1, the malicious application detection system 100 mainly consists of a static heuristic scanning subsystem 110, a similarity analysis subsystem 120, a malicious behavior information library 130, a whitelist library 140, a misjudgment information library 150, a malicious application sample library 160, a normal application library 170, a suspected malicious application library 180, and a malicious application library 190. The static heuristic scanning subsystem 110 and the similarity analysis subsystem 120 are the core of the system 100. The static heuristic scanning subsystem 110 performs a static code scan on the received application to be detected and analyzes, along the three dimensions of permission requests, function calls, and information output, whether the application exhibits malicious behavior matching any malicious behavior information in the malicious behavior information library 130. If malicious behavior exists, the application is labeled as a suspected malicious application; if not, it is labeled as a normal application. Suspected malicious applications detected by the static heuristic scanning subsystem 110 are passed to the similarity analysis subsystem 120. The similarity analysis subsystem 120 performs similarity analysis between the applications labeled as suspected malicious applications by the static heuristic scanning subsystem 110 and the malicious application samples in the malicious application sample library 160, based on application name, package name, signing certificate, directory structure, text files, and image files, and labels as malicious applications those whose similarity meets a set value.
The libraries serve the following purposes. The malicious behavior information library 130 stores various malicious behavior information organized along the three dimensions of permission requests, function calls, and information output. The malicious application sample library 160 stores information on various malicious application samples. The suspected malicious application library 180 stores the applications labeled as suspected malicious applications by the static heuristic scanning subsystem 110. The misjudgment information library 150 stores the applications that the similarity analysis subsystem 120 did not label as malicious applications. The normal application library 170 stores the applications labeled as normal applications by the static heuristic scanning subsystem 110 and the applications labeled as normal applications based on the results of manual analysis of the applications in the misjudgment information library 150. The whitelist library 140 stores the information of the applications labeled as normal applications based on the results of manual analysis of the applications in the misjudgment information library 150. The malicious application library 190 stores the applications labeled as malicious applications by the similarity analysis subsystem 120.
As further shown in Fig. 1, the static heuristic scanning subsystem 110 consists of a reachability matrix algorithm component 111, a decompilation component 112, a malicious behavior analysis component 113, and a scheduling component 114. When the static heuristic scanning subsystem 110 starts, the reachability matrix algorithm component 111 loads the malicious behavior information library 130 and the whitelist library 140 and generates in advance a reachability matrix model based on the three dimensions of permission requests, function calls, and information output. The decompilation component 112 decompiles the APK of the application to be detected that the static heuristic scanning subsystem 110 receives, producing Smali code files and the corresponding permission configuration file and resource files, and parses out the application's name, package name, signing certificate, and directory structure. The malicious behavior analysis component 113 then calls the reachability matrix model generated by the reachability matrix algorithm component 111, searches and analyzes the code files, permission configuration file, and resource files produced by the decompilation component 112 along the three dimensions of permission requests, function calls, and information output, and determines whether the application exhibits malicious behavior matching any malicious behavior information in the malicious behavior information library 130. When the application matches a malicious behavior, the scheduling component 114 labels it as a suspected malicious application and stores it in the suspected malicious application library 180. When the application matches no malicious behavior in the malicious behavior information library 130, the scheduling component 114 labels it as a normal application and stores it in the normal application library 170.
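Part of what the decompilation component parses can be read without any decompiler, because an APK is a ZIP archive. The sketch below is a rough illustration using only the Python standard library; the function name and the returned dictionary layout are hypothetical. Two caveats apply: hashing the signature block files under `META-INF/` is only a stand-in for comparing signing certificates (it is not a certificate fingerprint, and newer APK signature schemes keep the signature outside `META-INF/`), and producing Smali code or reading the binary manifest needs external tools that are not shown.

```python
import hashlib
import io
import zipfile


def read_apk_layout(apk_bytes: bytes) -> dict:
    """List files and directories and hash any META-INF signature block files."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        names = sorted(apk.namelist())
        # Directory structure: the parent path of every entry that has one.
        directories = sorted({n.rsplit("/", 1)[0] for n in names if "/" in n})
        # Assumption: a digest of the signature block file approximates
        # "same signer" well enough for a sketch.
        cert_digests = {
            n: hashlib.sha256(apk.read(n)).hexdigest()
            for n in names
            if n.startswith("META-INF/")
            and n.upper().endswith((".RSA", ".DSA", ".EC"))
        }
    return {"files": names, "directories": directories, "cert_digests": cert_digests}
```

The `directories` and `files` lists are the kind of input a directory structure comparison would consume, and `cert_digests` gives an exact-match key for the certificate check.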
The similarity analysis subsystem 120 further screens the applications in the suspected malicious application library 180. After the similarity analysis subsystem 120 starts, it loads the sample information in the malicious application sample library 160 and then obtains applications to be analyzed from the suspected malicious application library 180 for similarity analysis. Specifically, as shown in Fig. 1, the similarity analysis subsystem 120 consists of a signing certificate matching component 121, a first similarity analysis component 122, a second similarity analysis component 123, and a third similarity analysis component 124. The signing certificate matching component 121 obtains the signing certificate of the suspected malicious application to be analyzed and matches it against the malicious application samples in the malicious application sample library 160. If the signing certificate used by the suspected malicious application is found in the malicious application sample library 160, the application is directly labeled as a malicious application and stored in the malicious application library 190, and detection ends. If the signing certificate does not match, the first similarity analysis component 122 further performs similarity analysis on the application's name and package name, using for example an edit distance algorithm, and finds in the malicious application sample library 160 a sample set whose application names and package names are similar. If such a sample set exists, the second similarity analysis component 123 takes it as the scope of analysis and performs similarity analysis of directory structure, text files, and image files between each sample in the sample set and the application to be analyzed, calculating similarity values. In a specific embodiment, the second similarity analysis component 123 uses a directory comparison method for directory structure similarity analysis, an edit distance algorithm for text file similarity analysis, and a perceptual hash algorithm for image file similarity analysis. When a sample is found whose similarity to the application to be analyzed meets the set value, the application is labeled as a malicious application and stored in the malicious application library 190. If the first similarity analysis component 122 does not find a sample set, or the second similarity analysis component 123 finds no sample in the sample set whose similarity to the application to be analyzed meets the set value, the third similarity analysis component 124 performs similarity analysis of directory structure, text files, and image files between all samples in the malicious application sample library 160 and the application to be analyzed, calculating similarity values. Likewise, in a specific embodiment, the third similarity analysis component 124 uses a directory comparison method for directory structure similarity analysis, an edit distance algorithm for text file similarity analysis, and a perceptual hash algorithm for image file similarity analysis. When the third similarity analysis component 124 finds a sample whose similarity to the application to be analyzed meets the set value, the application is labeled as a malicious application and stored in the malicious application library 190. If the application is still not labeled as a malicious application after analysis by the third similarity analysis component 124, it is stored in the misjudgment information library 150 for further manual handling by operations staff 300. After the applications in the misjudgment information library 150 have been handled manually, those that manual analysis shows are not malicious are labeled as normal applications and stored in the normal application library 170, and the information of these normal applications is also stored in the whitelist library 140. If manual analysis finds that it belongs to a new malicious
Meaning is in application, the application program is noted as malicious application deposit malicious application library 190, while the application program is stored in malice
Using sample database 160.This part work of operation maintenance personnel 300 belongs to daily maintenance work, will carry out for a long time, with maintenance knowledge library
Update.
To address the low running efficiency of heuristic scanning, the malicious application detection system 100 described above uses static behavior analysis scanning, thereby avoiding the performance bottleneck of loading and executing the application in a virtual machine for analysis. Working as static code analysis, it first decompiles the Android application into Smali code and then analyzes the application's permission requests, function calls, information output, and so on, to discover applications that exhibit malicious behavior. To address the high false-positive rate of heuristic scanning, the malicious application detection system 100 applies similarity analysis to the suspected malicious applications already found, combining several analyses (application name, package name, signing certificate, directory structure, text file, and image file similarity), which effectively reduces the false-positive rate and improves identification accuracy.
Based on the malicious application detection system of the present invention described above, the present invention also proposes a malicious application detection method. Fig. 2 shows a flowchart of a malicious application detection method 200 according to an embodiment of the invention. As shown in Fig. 2, the malicious application detection method 200 includes the following steps:
In step S210, static code scanning is performed on the received application to be detected. Based on three dimensions (permission requests, function calls, and information output), the application is analyzed for malicious behavior that matches any malicious behavior information in the malicious behavior information library. If malicious behavior exists, the application is labeled as a suspected malicious application; if no malicious behavior exists, the application is labeled as a normal application.
In step S220, similarity analysis based on application name, package name, signing certificate, directory structure, text files, and image files is performed between the application labeled as a suspected malicious application and the malicious application samples in the malicious application sample library, and an application whose similarity meets the set value is labeled as a malicious application.
In step S230, an application that was not labeled as a malicious application in step S220 is stored in the misjudgment information library.
In step S240, based on the result of manual analysis of the applications in the misjudgment information library, an application in that library that is not malicious is labeled as a normal application and stored in the normal application library, and the information of that normal application is stored in the whitelist library.
In step S250, based on the result of manual analysis of the applications in the misjudgment information library, an application in that library that is malicious is labeled as a malicious application and stored in the malicious application library, and the malicious application is also stored in the malicious application sample library.
The malicious application detection method of the invention described above combines two techniques, static heuristic scanning and similarity analysis, to detect malicious applications. It avoids the performance bottleneck of loading and executing the application in a virtual machine for analysis, effectively reduces the false-positive rate, and improves identification accuracy.
Fig. 3 shows a flowchart of a specific embodiment of the static heuristic scanning step S210 in the malicious application detection method 200 described above. As shown in Fig. 3, step S210 specifically includes the following steps:
In step S211, the received application to be detected is decompiled into Smali code files and the corresponding permission configuration file and resource files, and the application name, package name, signing certificate, and directory structure of the application are parsed.
In subsequent step S212, the reachability matrix model is called to scan and analyze the code files, permission configuration file, and resource files produced by decompilation along three dimensions (permission requests, function calls, and information output), to determine whether the application exhibits malicious behavior matching any malicious behavior information in the malicious behavior information library. The reachability matrix model is generated in advance by loading the malicious behavior information library and the whitelist library at system startup. In a specific example, the algorithm by which the reachability matrix model performs the scan and analysis is as follows:
Step 1, construct the basic behavior information table: information tables are built for permission configuration, function calls, and information output. The content corresponding to each malicious behavior in the three dimensions (permission requests, function calls, and information output) is extracted from the malicious behavior information library and, after deduplication, assembled into a unified basic behavior information table.
Step 2, construct the malicious behavior information matrix: the number of columns of this matrix is the length of the basic behavior information table, the number of rows is the number of malicious behavior information entries, and the matrix elements are 0 or 1.
Step 3, construct the scan result matrix: this is a single-column matrix whose number of rows is the length of the basic behavior information table. The permission configuration file, Smali code files, and resource files of the application to be detected are scanned and matched against the basic behavior information table; when an item in the table is matched, the corresponding row of the matrix is 1, otherwise it is 0.
Step 4, construct the malicious behavior decision matrix: the malicious behavior information matrix is multiplied by the scan result matrix to obtain the malicious behavior decision matrix. This matrix is a row vector whose number of columns is the number of malicious behavior information entries.
When the value of a column in the malicious behavior decision matrix is 1, the application satisfies the malicious behavior rule corresponding to that column, that is, malicious behavior exists.
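The four steps above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the rule names and behavior items are hypothetical, and the sketch assumes that a rule counts as satisfied only when every item it requires appears in the scan result, because the text does not say how the raw matrix product is reduced to 0 or 1. The orientation of the resulting vector is a matter of convention; here it is a plain list with one entry per rule.

```python
# Hypothetical malicious behavior information: each rule lists the permission
# requests, function calls, and information outputs it requires.
RULES = {
    "sms_fraud": ["perm:SEND_SMS", "call:sendTextMessage"],
    "contact_leak": ["perm:READ_CONTACTS", "call:query_contacts", "out:http_post"],
}


def build_basic_table(rules):
    """Step 1: collect every item used by any rule, without duplicates."""
    table = []
    for items in rules.values():
        for item in items:
            if item not in table:
                table.append(item)
    return table


def build_rule_matrix(rules, table):
    """Step 2: one row per rule, one column per table item, entries 0 or 1."""
    return [[1 if item in items else 0 for item in table]
            for items in rules.values()]


def build_scan_vector(found_items, table):
    """Step 3: 1 where the scanned application contains the table item."""
    return [1 if item in found_items else 0 for item in table]


def decide(rule_matrix, scan_vector):
    """Step 4: multiply, then mark a rule 1 when all of its items were found.

    Assumption: a rule is met when the product equals the rule's item count.
    """
    verdict = []
    for row in rule_matrix:
        hits = sum(r * s for r, s in zip(row, scan_vector))
        verdict.append(1 if hits > 0 and hits == sum(row) else 0)
    return verdict


table = build_basic_table(RULES)
matrix = build_rule_matrix(RULES, table)
# Items a (hypothetical) scan found in the decompiled application.
scan = build_scan_vector(
    {"perm:SEND_SMS", "call:sendTextMessage", "perm:READ_CONTACTS"}, table
)
verdict = decide(matrix, scan)
is_suspected = any(verdict)  # any satisfied rule marks the app as suspected
```

In this example the first rule is fully matched and the second only partially, so only the first entry of `verdict` is set and the application would be labeled as suspected.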
In subsequent step S213, an application in which malicious behavior exists is labeled as a suspected malicious application and stored in the suspected malicious application library, and an application in which no malicious behavior exists is labeled as a normal application and stored in the normal application library.
Fig. 4 shows a flowchart of a specific embodiment of the similarity analysis step S220 in the malicious application detection method 200 described above. As shown in Fig. 4, step S220 specifically includes the following steps:
In step S221, an application to be analyzed is obtained from the suspected malicious application library.
In subsequent step S222, the signing certificate of the application to be analyzed is matched against the malicious application samples in the malicious application sample library.
In subsequent step S223, it is determined whether the signing certificate used by the application to be analyzed exists in the malicious application sample library. If it does, step S224 is executed: the application is directly labeled as a malicious application and stored in the malicious application library, and the process ends. Otherwise, step S225 is executed.
In step S225, similarity analysis is further performed on the application name and package name of the application, and the malicious application sample library is searched for a sample set similar to that application name and package name.
In subsequent step S226, it is determined whether a sample set similar to the application name and package name has been found. If so, step S227 is executed; otherwise, step S228 is executed.
In step S227, each sample in the sample set found is compared with the application to be analyzed in terms of directory structure, text files, and image files, and similarity values are calculated. When a sample's similarity to the application to be analyzed meets the set value, the application is labeled as a malicious application and stored in the malicious application library. Further, if no sample in the sample set found has a similarity to the application to be analyzed that meets the set value, the above similarity analysis is performed against all samples in the sample library.
In step S228, for the case where no sample set similar to the application name and package name of the application has been found, all samples in the malicious application sample library are compared with the application to be analyzed in terms of directory structure, text files, and image files, and similarity values are calculated. When a sample's similarity to the application to be analyzed meets the set value, the application is labeled as a malicious application and stored in the malicious application library. An application that is not labeled as a malicious application in step S228 is stored in the misjudgment information library for further manual processing by operations personnel.
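The control flow of steps S221 to S228 can be summarized in a short Python sketch. The `App` record, the `classify` function, and the two comparison callbacks are hypothetical names introduced only for illustration; the sketch shows the order of the checks and leaves the similarity measures themselves to the caller.

```python
from dataclasses import dataclass


@dataclass
class App:
    """Hypothetical record holding the fields the cascade needs."""
    name: str
    package: str
    cert: str


def classify(app, samples, names_similar, content_similar):
    """Return "malicious" or "manual_review" following S222 to S228.

    names_similar(app, sample)   -> bool, name and package name similarity
    content_similar(app, sample) -> bool, directory/text/image similarity
                                    meeting the set values
    """
    # S222-S224: a signing certificate already seen in the sample library
    # is decisive on its own.
    if any(sample.cert == app.cert for sample in samples):
        return "malicious"

    # S225-S226: narrow the sample library by application and package name.
    subset = [sample for sample in samples if names_similar(app, sample)]

    # S227: compare against the narrowed set first.
    if subset and any(content_similar(app, sample) for sample in subset):
        return "malicious"

    # S228 (and the fallback of S227): compare against every sample.
    if any(content_similar(app, sample) for sample in samples):
        return "malicious"

    # Not labeled malicious: goes to the misjudgment library for manual review.
    return "manual_review"


samples = [App("PayHelper", "com.evil.pay", "CERT-A")]
same_cert = App("Other", "org.other", "CERT-A")
lookalike = App("PayHelper", "com.evi1.pay", "CERT-B")

result_cert = classify(same_cert, samples, lambda a, s: False, lambda a, s: False)
result_lookalike = classify(lookalike, samples,
                            lambda a, s: a.name == s.name, lambda a, s: True)
result_clean = classify(lookalike, samples, lambda a, s: False, lambda a, s: False)
```

Ordering the checks from cheapest to most expensive means the full comparison against every sample runs only when the certificate and name checks have not already settled the case.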
In a specific example according to the invention, the decision rules for similarity analysis are:
1. code similarity of 85% or more;
2. text file similarity of 60% or more;
3. image file similarity of 75% or more;
4. directory structure similarity of 70% or more.
An application that meets the above rules is judged to be a malicious application. The above parameters can be adjusted after analysis of operational data.
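As a minimal sketch, the rules above can be expressed as a threshold table in Python. Two assumptions are made here that the text does not state explicitly: all four rules must hold at once, and "or more" is read as greater than or equal. The names `THRESHOLDS` and `meets_rules` are hypothetical.

```python
# Thresholds from the example above, as fractions; they are meant to be tuned.
THRESHOLDS = {
    "code": 0.85,
    "text": 0.60,
    "image": 0.75,
    "directory": 0.70,
}


def meets_rules(scores, thresholds=THRESHOLDS):
    """Return True when every similarity score reaches its threshold.

    Assumption: all rules must be satisfied together; a missing score
    is treated as 0.0 and therefore fails its rule.
    """
    return all(scores.get(key, 0.0) >= limit for key, limit in thresholds.items())


strong_match = meets_rules({"code": 0.90, "text": 0.60, "image": 0.80, "directory": 0.70})
weak_text = meets_rules({"code": 0.90, "text": 0.59, "image": 0.80, "directory": 0.70})
```

Keeping the thresholds in one table makes the adjustment mentioned in the text a data change rather than a code change.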
In a specific example according to the invention, directory structure similarity analysis uses a directory comparison method, a relatively simple algorithm. Taking the directory structure of the malicious application sample as the basis, it is compared level by level with the directory structure of the application to be analyzed, and the number of identical directories between the application to be analyzed and the sample application is counted. This count divided by the total number of directories, expressed as a percentage, is the directory structure similarity value.
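A minimal Python sketch of this comparison follows. It assumes that each directory is represented by its full relative path, so that the path itself encodes the level, and that "total number of directories" refers to the directories of the malicious sample, since the sample is taken as the basis; the text does not settle either point. The function name is hypothetical.

```python
def directory_similarity(sample_dirs, app_dirs):
    """Percentage of the sample's directories that also exist in the app.

    Directories are given as relative paths such as "res/drawable", so two
    directories are identical only if they sit at the same place in the tree.
    """
    sample = set(sample_dirs)
    app = set(app_dirs)
    if not sample:
        return 0.0  # assumption: an empty sample tree matches nothing
    return len(sample & app) / len(sample) * 100


sample_tree = ["res", "res/drawable", "assets", "lib"]
app_tree = ["res", "res/drawable", "lib/armeabi"]
similarity = directory_similarity(sample_tree, app_tree)
```

Here two of the sample's four directories also appear in the application, so the value is one half expressed as a percentage.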
In a specific example according to the invention, text file similarity analysis uses an edit distance algorithm, that is, the minimum number of edit operations needed to transform the source string into the target string; the smaller this value, the more similar the files. The final similarity formula is: (1 - edit distance / file length) * 100%. The similarity value of each file is calculated separately, and the values are then averaged to give the final similarity value of the two applications.
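The formula can be illustrated with a small Python sketch using the standard Levenshtein edit distance. The text does not say which file's length is used as the divisor; this sketch assumes the longer of the two, which keeps the result between 0% and 100%. The function names are hypothetical.

```python
def edit_distance(source, target):
    """Levenshtein distance: fewest insertions, deletions, or substitutions."""
    previous = list(range(len(target) + 1))
    for i, source_char in enumerate(source, 1):
        current = [i]
        for j, target_char in enumerate(target, 1):
            current.append(min(
                previous[j] + 1,                                # deletion
                current[j - 1] + 1,                             # insertion
                previous[j - 1] + (source_char != target_char)  # substitution
            ))
        previous = current
    return previous[-1]


def text_similarity(source, target):
    """(1 - edit distance / file length) * 100, using the longer length."""
    longest = max(len(source), len(target))
    if longest == 0:
        return 100.0  # assumption: two empty files are identical
    return (1 - edit_distance(source, target) / longest) * 100


def average_text_similarity(file_pairs):
    """Average the per-file similarity over all compared file pairs."""
    if not file_pairs:
        return 0.0  # assumption: nothing to compare means no evidence
    values = [text_similarity(a, b) for a, b in file_pairs]
    return sum(values) / len(values)


distance = edit_distance("kitten", "sitting")
overall = average_text_similarity([("abcd", "abcf"), ("same", "same")])
```

The two-row formulation keeps memory proportional to the length of one file, which matters when comparing many text resources per application.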
In a specific example according to the invention, image file similarity analysis uses a perceptual hash algorithm. For each pair of identically named pictures to be compared, a 64-character "fingerprint" string is generated for each picture, and the two fingerprints are then compared. The closer the results, the more similar the pictures. The "fingerprint" strings are compared using the Hamming distance method: the 64 characters are compared without distinguishing character positions, and the number of differing characters found is the Hamming distance value. The Hamming distance value is capped at 10: a value greater than 10 indicates that the images are completely dissimilar, and a value less than 5 indicates that the images are similar. Finally, all images are compared, their Hamming distance values are obtained, and the average Hamming distance value is calculated; the similarity of the image resources is derived from this average. The final similarity formula is: (1 - average Hamming distance value / 10) * 100%.
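The scoring part of this procedure can be sketched in Python as below. The sketch assumes the 64-character fingerprints have already been produced by some perceptual hash implementation, which is not shown, and it compares the strings position by position, the usual definition of Hamming distance. The function names and the constant `MAX_DISTANCE` are hypothetical.

```python
MAX_DISTANCE = 10  # distances above this are treated as completely dissimilar


def hamming_distance(fingerprint_a, fingerprint_b):
    """Count the positions at which two equal-length fingerprints differ."""
    if len(fingerprint_a) != len(fingerprint_b):
        raise ValueError("fingerprints must have the same length")
    return sum(a != b for a, b in zip(fingerprint_a, fingerprint_b))


def image_similarity(fingerprint_pairs):
    """(1 - average Hamming distance / 10) * 100 over all image pairs."""
    if not fingerprint_pairs:
        return 0.0  # assumption: no shared images means no evidence
    capped = [min(hamming_distance(a, b), MAX_DISTANCE)
              for a, b in fingerprint_pairs]
    average = sum(capped) / len(capped)
    return (1 - average / MAX_DISTANCE) * 100


base = "0" * 64
near = "0" * 62 + "11"   # differs from base in 2 positions
far = "1" * 64           # differs from base in all 64 positions

near_score = image_similarity([(base, near)])
mixed_score = image_similarity([(base, near), (base, far)])
```

Capping each distance at the maximum before averaging keeps a single very different image from pushing the score below 0%.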
The foregoing describes only preferred embodiments of the invention and is not intended to limit the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.
Claims (8)
1. A malicious application detection method, characterized by comprising the following steps:
S1: performing static code scanning on a received application to be detected, and analyzing, based on three dimensions (permission requests, function calls, and information output), whether the application exhibits malicious behavior matching any malicious behavior information in a malicious behavior information library; if malicious behavior exists, labeling the application as a suspected malicious application; if no malicious behavior exists, labeling the application as a normal application;
S2: performing similarity analysis, based on application name, package name, signing certificate, directory structure, text files, and image files, between the application labeled as a suspected malicious application and the malicious application samples in a malicious application sample library, and labeling an application whose similarity meets a set value as a malicious application;
wherein step S2 further comprises:
S21: matching the signing certificate of the application labeled as a suspected malicious application against the malicious application samples in the malicious application sample library, and, if the signing certificate exists in the malicious application sample library, directly labeling the application as a malicious application and storing it in a malicious application library;
S22: if the signing certificate does not exist in the malicious application sample library, further performing similarity analysis on the application name and package name of the application, and finding in the malicious application sample library a sample set similar to the application name and package name;
S23: if the sample set is found in step S22, comparing each sample in the sample set with the application to be analyzed in terms of directory structure, text files, and image files, calculating similarity values, and, when a sample's similarity to the application to be analyzed meets the set value, labeling the application as a malicious application and storing it in the malicious application library;
S24: if the sample set is not found in step S22, or no sample in step S23 has a similarity to the application to be analyzed that meets the set value, comparing all samples in the malicious application sample library with the application to be analyzed in terms of directory structure, text files, and image files, calculating similarity values, and, when a sample's similarity to the application to be analyzed meets the set value, labeling the application as a malicious application and storing it in the malicious application library.
2. The malicious application detection method according to claim 1, characterized in that the method further comprises:
S3: storing an application that was not labeled as a malicious application in step S2 in a misjudgment information library;
S4: based on the result of manual analysis of the applications in the misjudgment information library, labeling an application in the misjudgment information library that is not malicious as a normal application and storing it in a normal application library, and storing the information of the normal application in a whitelist library;
S5: based on the result of manual analysis of the applications in the misjudgment information library, labeling an application in the misjudgment information library that is malicious as a malicious application and storing it in the malicious application library, and storing the malicious application in the malicious application sample library.
3. The malicious application detection method according to claim 2, characterized in that step S1 further comprises:
S11: decompiling the received application to be detected into code files and the corresponding permission configuration file and resource files, and parsing the application name, package name, signing certificate, and directory structure of the application;
S12: calling a reachability matrix model to scan and analyze, along three dimensions (permission requests, function calls, and information output), whether the code files, permission configuration file, and resource files produced by decompilation contain malicious behavior matching any malicious behavior information in the malicious behavior information library, wherein the reachability matrix model is generated in advance based on the malicious behavior information library and the whitelist library;
S13: labeling an application in which malicious behavior exists as a suspected malicious application and storing it in a suspected malicious application library, and labeling an application in which no malicious behavior exists as a normal application and storing it in the normal application library.
4. The malicious application detection method according to claim 2, characterized in that the similarity analysis of the application name and package name uses an edit distance algorithm, the directory structure similarity analysis uses a directory comparison method, the text file similarity analysis uses an edit distance algorithm, and the image file similarity analysis uses a perceptual hash algorithm.
5. A malicious application detection system, characterized by comprising:
a malicious behavior information library, configured to store malicious behavior information of various kinds according to three dimensions: permission requests, function calls, and information output;
a malicious application sample library, configured to store information on various malicious application samples;
a static heuristic scanning subsystem, configured to perform static code scanning on a received application to be detected and to analyze, based on three dimensions (permission requests, function calls, and information output), whether the application exhibits malicious behavior matching any malicious behavior information in the malicious behavior information library; if malicious behavior exists, the application is labeled as a suspected malicious application; if no malicious behavior exists, the application is labeled as a normal application;
a similarity analysis subsystem, configured to perform similarity analysis, based on application name, package name, signing certificate, directory structure, text files, and image files, between the application labeled as a suspected malicious application by the static heuristic scanning subsystem and the malicious application samples in the malicious application sample library, and to label an application whose similarity meets a set value as a malicious application;
wherein the similarity analysis subsystem further comprises:
a signing certificate matching component, configured to obtain the signing certificate of an application to be analyzed from a suspected malicious application library and match it against the malicious application samples in the malicious application sample library, and, if the signing certificate exists in the malicious application sample library, to directly label the application as a malicious application and store it in a malicious application library;
a first similarity analysis component, configured, when the signing certificate of the application to be analyzed does not exist in the malicious application sample library, to further perform similarity analysis on the application name and package name of the application and to find in the malicious application sample library a sample set similar to the application name and package name;
a second similarity analysis component, configured, when the first similarity analysis component finds the sample set, to compare each sample in the sample set with the application to be analyzed in terms of directory structure, text files, and image files, to calculate similarity values, and, when a sample's similarity to the application to be analyzed meets the set value, to label the application as a malicious application and store it in the malicious application library;
a third similarity analysis component, configured, when the first similarity analysis component does not find the sample set or the second similarity analysis component finds no sample whose similarity to the application to be analyzed meets the set value, to compare all samples in the malicious application sample library with the application to be analyzed in terms of directory structure, text files, and image files, to calculate similarity values, and, when a sample's similarity to the application to be analyzed meets the set value, to label the application as a malicious application and store it in the malicious application library.
6. The malicious application detection system according to claim 5, characterized in that the system further comprises:
a suspected malicious application library, configured to store applications labeled as suspected malicious applications by the static heuristic scanning subsystem;
a misjudgment information library, configured to store applications not labeled as malicious applications by the similarity analysis subsystem;
a normal application library, configured to store applications labeled as normal applications by the static heuristic scanning subsystem and applications labeled as normal applications based on the result of manual analysis of the applications in the misjudgment information library;
a whitelist library, configured to store the information of applications labeled as normal applications based on the result of manual analysis of the applications in the misjudgment information library;
a malicious application library, configured to store applications labeled as malicious applications by the similarity analysis subsystem.
7. The malicious application detection system according to claim 6, characterized in that the static heuristic scanning subsystem further comprises:
a reachability matrix algorithm component, configured to load the malicious behavior information library and the whitelist library and to generate in advance a reachability matrix model based on three dimensions: permission requests, function calls, and information output;
a decompilation component, configured to decompile the received application to be detected into code files and the corresponding permission configuration file and resource files, and to parse the application name, package name, signing certificate, and directory structure of the application;
a malicious behavior analysis component, configured to call the reachability matrix model to scan and analyze, along three dimensions (permission requests, function calls, and information output), whether the code files, permission configuration file, and resource files produced by decompilation contain malicious behavior matching any malicious behavior information in the malicious behavior information library;
a scheduling component, configured to label an application in which malicious behavior exists as a suspected malicious application and store it in the suspected malicious application library, and to label an application in which no malicious behavior exists as a normal application and store it in the normal application library.
8. The malicious application detection system according to claim 6, characterized in that the first similarity analysis component uses an edit distance algorithm to perform application name and package name similarity analysis; and the second similarity analysis component and the third similarity analysis component each use a directory comparison method for directory structure similarity analysis, an edit distance algorithm for text file similarity analysis, and a perceptual hash algorithm for image file similarity analysis.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510621631.XA CN106557695B (en) | 2015-09-25 | 2015-09-25 | A kind of malicious application detection method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510621631.XA CN106557695B (en) | 2015-09-25 | 2015-09-25 | A kind of malicious application detection method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106557695A CN106557695A (en) | 2017-04-05 |
CN106557695B true CN106557695B (en) | 2019-05-10 |
Family
ID=58414474
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510621631.XA Active CN106557695B (en) | 2015-09-25 | 2015-09-25 | A kind of malicious application detection method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106557695B (en) |
Families Citing this family (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107341401B (en) * | 2017-06-21 | 2019-09-20 | 清华大学 | A kind of malicious application monitoring method and equipment based on machine learning |
CN109214182B (en) * | 2017-07-03 | 2022-04-15 | 阿里巴巴集团控股有限公司 | Method for processing Lesox software in running of virtual machine under cloud platform |
TWI668592B (en) * | 2017-07-28 | 2019-08-11 | 中華電信股份有限公司 | Method for automatically determining the malicious degree of Android App by using multiple dimensions |
CN109670304B (en) * | 2017-10-13 | 2020-12-22 | 北京安天网络安全技术有限公司 | Malicious code family attribute identification method and device and electronic equipment |
CN109714296A (en) * | 2017-10-26 | 2019-05-03 | 中国电信股份有限公司 | Threaten intelligence analysis method and apparatus |
CN108416192A (en) * | 2018-03-01 | 2018-08-17 | 中国工商银行股份有限公司 | A kind of device and method of detection personation enterprise application |
CN109639884A (en) * | 2018-11-21 | 2019-04-16 | 惠州Tcl移动通信有限公司 | A kind of method, storage medium and terminal device based on Android monitoring sensitive permission |
CN111859381A (en) * | 2019-04-29 | 2020-10-30 | 深信服科技股份有限公司 | File detection method, device, equipment and medium |
CN110222511B (en) * | 2019-06-21 | 2021-04-23 | 杭州安恒信息技术股份有限公司 | Malicious software family identification method and device and electronic equipment |
CN110414236B (en) * | 2019-07-26 | 2021-04-16 | 北京神州绿盟信息安全科技股份有限公司 | Malicious process detection method and device |
US11288401B2 (en) * | 2019-09-11 | 2022-03-29 | AO Kaspersky Lab | System and method of reducing a number of false positives in classification of files |
CN110826068B (en) * | 2019-11-01 | 2022-03-18 | 海南车智易通信息技术有限公司 | Safety detection method and safety detection system |
CN111124486A (en) * | 2019-12-05 | 2020-05-08 | 任子行网络技术股份有限公司 | Method, system and storage medium for discovering android application to refer to third-party tool |
CN111310181A (en) * | 2020-02-21 | 2020-06-19 | 广州欢网科技有限责任公司 | Application program processing method, device and system in application store system |
CN111556042B (en) * | 2020-04-23 | 2022-12-20 | 杭州安恒信息技术股份有限公司 | Malicious URL detection method and device, computer equipment and storage medium |
CN112016606A (en) * | 2020-08-20 | 2020-12-01 | 恒安嘉新(北京)科技股份公司 | Detection method, device and equipment for application program APP and storage medium |
CN112632548B (en) * | 2020-12-30 | 2024-01-23 | 北京天融信网络安全技术有限公司 | Malicious android program detection method and device, electronic equipment and storage medium |
CN113435177A (en) * | 2021-07-14 | 2021-09-24 | 上海浦东发展银行股份有限公司 | Target code file package comparison method, device, equipment, medium and system |
CN113779583B (en) * | 2021-11-10 | 2022-02-22 | 北京微步在线科技有限公司 | Behavior detection method and device, storage medium and electronic equipment |
CN114186231B (en) * | 2021-12-10 | 2024-09-20 | 中国电信股份有限公司 | Method and system for detecting gambling APP and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101140611A (en) * | 2007-09-18 | 2008-03-12 | 北京大学 | Malevolence code automatic recognition method |
CN101373501A (en) * | 2008-05-12 | 2009-02-25 | 公安部第三研究所 | Method for capturing dynamic behavior aiming at computer virus |
CN102779257A (en) * | 2012-06-28 | 2012-11-14 | 奇智软件(北京)有限公司 | Security detection method and system of Android application program |
CN103793650A (en) * | 2013-12-02 | 2014-05-14 | 北京邮电大学 | Static analysis method and static analysis device for Android application program |
CN104331662A (en) * | 2013-07-22 | 2015-02-04 | 深圳市腾讯计算机系统有限公司 | Method and device for detecting Android malicious application |
CN104866763A (en) * | 2015-05-28 | 2015-08-26 | 天津大学 | Permission-based Android malicious software hybrid detection method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101402057B1 (en) * | 2012-09-19 | 2014-06-03 | 주식회사 이스트시큐리티 | Analyzing system of repackage application through calculation of risk and method thereof |
- 2015-09-25: CN application CN201510621631.XA (CN106557695B), status: Active
Also Published As
Publication number | Publication date |
---|---|
CN106557695A (en) | 2017-04-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106557695B (en) | A kind of malicious application detection method and system | |
US11861477B2 (en) | Utilizing machine learning models to identify insights in a document | |
EP4058916B1 (en) | Detecting unknown malicious content in computer systems | |
US20200097601A1 (en) | Identification of an entity representation in unstructured data | |
US11190562B2 (en) | Generic event stream processing for machine learning | |
CN108229170B (en) | Software analysis method and apparatus using big data and neural network | |
Blazquez et al. | Web data mining for monitoring business export orientation | |
CN112749284A (en) | Knowledge graph construction method, device, equipment and storage medium | |
US20210342247A1 (en) | Mathematical models of graphical user interfaces | |
Loyola et al. | UNSL at eRisk 2021: A Comparison of Three Early Alert Policies for Early Risk Detection. | |
Liu et al. | Functions-based CFG embedding for malware homology analysis | |
Zuhair et al. | Phishing classification models: issues and perspectives | |
CN113157871B (en) | News public opinion text processing method, server and medium applying artificial intelligence | |
CN110866172A (en) | Data analysis method for block chain system | |
CN114329455A (en) | User abnormal behavior detection method and device based on heterogeneous graph embedding | |
CN113688346A (en) | Illegal website identification method, device, equipment and storage medium | |
CN111967003A (en) | Automatic wind control rule generation system and method based on black box model and decision tree | |
Mandal et al. | Exploiting aspect-classified sentiments for cyber-crime analysis and hack prediction | |
Haas | Protocol to discover machine-readable entities of the ecosystem management actions taxonomy | |
Guillerme | treats: A modular R package for simulating trees and traits | |
CN113626815A (en) | Virus information identification method, virus information identification device and electronic equipment | |
CN111475812A (en) | Webpage backdoor detection method and system based on data executable characteristics | |
US20240346140A1 (en) | Cyber threat information processing apparatus, cyber threat information processing method, and storage medium storing cyber threat information processing program | |
US20240346142A1 (en) | Cyber threat information processing apparatus, cyber threat information processing method, and storage medium storing cyber threat information processing program | |
CN118245982B (en) | Method and device for identifying camouflage application program based on artificial intelligence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||