
CN110532776B - Android malicious software efficient detection method, system and medium based on runtime data analysis - Google Patents

Publication number
CN110532776B
CN110532776B CN201910836444.1A CN201910836444A CN110532776B CN 110532776 B CN110532776 B CN 110532776B CN 201910836444 A CN201910836444 A CN 201910836444A CN 110532776 B CN110532776 B CN 110532776B
Authority
CN
China
Prior art keywords
app
api
data
calling
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910836444.1A
Other languages
Chinese (zh)
Other versions
CN110532776A (en
Inventor
吕品
乔智
许嘉
李陶深
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi University
Original Assignee
Guangxi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi University
Priority to CN201910836444.1A
Publication of CN110532776A
Application granted
Publication of CN110532776B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55: Detecting local intrusion or implementing counter-measures
    • G06F21/56: Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566: Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G06N20/10: Machine learning using kernel methods, e.g. support vector machines [SVM]

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Virology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method, a system and a medium for efficiently detecting Android malware based on runtime data analysis. After the APP is run, an operation simulation tool operates the APP while its behavior is traced and recorded, generating the running data of the APP; the running data is extracted through a heterogeneous information network (HIN) to obtain structured data, which is formed into a kernel matrix by means of meta-paths; and the kernel matrix is input into a pre-trained machine learning classifier to obtain a detection result. The invention extracts the behavior data of the APP using a dynamic feature extraction technique, structures the extracted behavior data through the HIN, forms the structured data into a kernel matrix by means of meta-paths, and trains a support vector machine (SVM) classifier on it, thereby achieving shorter training time together with high accuracy.

Description

Android malicious software efficient detection method, system and medium based on runtime data analysis
Technical Field
The invention relates to the technical field of software and information security, in particular to a method, a system and a medium for efficiently detecting Android malicious software based on runtime data analysis.
Background
As the mobile platform with the highest market share, the Android system has built an open ecosystem. This openness has helped the application market flourish, but the resulting flood of malware has also brought serious security threats to users. The 2018 Android malware special report issued by the 360 Internet Security Center shows that, over the whole of 2018, the center captured about 4.342 million new mobile malware samples, an average of about 12,000 new samples per day, and cumulatively observed about 110 million mobile malware infections, an average of about 292,000 infections per day. Android malware detection has become a problem of general concern to both industry and academia, and research into efficient malware detection methods for the Android system is of great significance.
At present, Android malware detection technologies can be roughly divided into two types: static feature extraction and dynamic feature extraction. Most work on static feature extraction decompiles the APP and analyzes the decompiled code, often with common open source tools. Work on dynamic feature extraction focuses on monitoring APP behavior data related to user privacy or sensitive API calls. With either approach, malware detection can then be realized by combining the extracted features with a classification algorithm.
Static feature extraction, however, yields a large amount of useless information, which makes training time too long. How to reduce model training time as much as possible while keeping a high recognition accuracy is therefore a key technical problem that urgently needs to be solved.
Disclosure of Invention
The technical problems to be solved by the invention are as follows: aiming at the problems in the prior art, the invention provides a method, a system and a medium for efficiently detecting Android malicious software based on runtime data analysis.
In order to solve the technical problems, the invention adopts the technical scheme that:
an Android malicious software efficient detection method based on runtime data analysis comprises the following implementation steps:
1) acquiring a package name and a starting page name of an APP;
2) after running the APP based on the package name and the starting page name, simulating human operation of the APP through an operation simulation tool, and tracing and recording the APP to generate running data of the APP, wherein the running data of the APP comprises information on calls from the APP to APIs and calls from APIs to APIs;
3) extracting the running data of the APP through a heterogeneous information network (HIN) to obtain structured data of the running data, and forming the structured data into a kernel matrix by means of meta-paths; the HIN comprises two node types and two edge types, wherein the two node types are APP and API, the two edge types are APP-to-API calls and API-to-API calls, and the relationship information consists of the number of times the APP calls each API and the number of times one API calls another;
4) inputting the kernel matrix into a pre-trained machine learning classifier to obtain a detection result indicating whether the APP is malware, wherein the machine learning classifier has established, through pre-training, a mapping between the structured data of an APP's running data and the detection result of whether the APP is malware.
Optionally, the obtaining of the package name and the start page name of the APP in step 1) specifically means obtaining the package name and the start page name of the APP by decompiling the APP.
Optionally, running the APP in step 2) specifically means running the APP in a virtual machine.
Optionally, extracting the running data of the APP through the heterogeneous information network HIN in step 3) to obtain the structured data specifically means filtering out every APP-to-API call relationship in which the called API does not itself call any other API, so that all remaining relationships are API call sequences.
Optionally, the machine learning classifier in step 4) is a support vector machine classifier.
Optionally, step 4) is preceded by a step of training a support vector machine classifier, and the detailed steps include:
S1) for a variety of normal APPs and malicious APPs, executing steps 1) to 3) to extract the corresponding kernel matrices, and labeling each kernel matrix as normal or malicious, thereby building a training sample set and a test sample set;
S2) training the support vector machine classifier on the training samples in the training sample set, and proceeding to the next step after a specified amount or duration of training has been completed;
S3) testing the support vector machine classifier on the test samples in the test sample set to obtain the classification accuracy of the classifier after this round of training;
S4) judging whether a training termination condition is met, the termination condition being that a specified amount or duration of training has been completed or that the classification accuracy has reached a preset threshold; if the termination condition is not met, returning to step S2); otherwise ending and exiting.
In addition, the invention also provides an Android malware efficient detection system based on runtime data analysis, comprising:
the package information acquisition program unit is used for acquiring the package name and the starting page name of the APP;
the running data acquisition program unit is used for simulating human operation of the APP through an operation simulation tool after the APP is run, and for tracing and recording the APP to generate running data of the APP, wherein the running data of the APP comprises information on calls from the APP to APIs and calls from APIs to APIs;
the structured data acquisition program unit is used for extracting the running data of the APP through a heterogeneous information network (HIN) to obtain structured data of the running data, and forming the structured data into a kernel matrix by means of meta-paths; the HIN comprises two node types and two edge types, wherein the two node types are APP and API, the two edge types are APP-to-API calls and API-to-API calls, and the relationship information consists of the number of times the APP calls each API and the number of times one API calls another;
and the result classification program unit is used for inputting the kernel matrix into a pre-trained machine learning classifier to obtain a detection result indicating whether the APP is malware, wherein the machine learning classifier has established, through pre-training, a mapping between the structured data of an APP's running data and the detection result of whether the APP is malware.
In addition, the invention also provides an Android malicious software efficient detection system based on runtime data analysis, which comprises computer equipment, wherein the computer equipment is programmed or configured to execute the steps of the Android malicious software efficient detection method based on runtime data analysis.
In addition, the invention also provides an Android malicious software efficient detection system based on runtime data analysis, which comprises computer equipment, wherein a storage medium of the computer equipment stores a computer program which is programmed or configured to execute the Android malicious software efficient detection method based on runtime data analysis.
In addition, the present invention also provides a computer readable storage medium having stored thereon a computer program programmed or configured to execute the runtime data analysis-based Android malware efficient detection method.
Compared with the prior art, the invention has the following advantages. The invention extracts the behavior data of the APP using a dynamic feature extraction technique, structures the extracted behavior data through a heterogeneous information network (HIN), forms the structured data into a kernel matrix by means of meta-paths, and trains a support vector machine (SVM) classifier on it, thereby achieving shorter training time together with high accuracy. The feature extraction technique is a distinguishing point of the invention. It differs from traditional dynamic feature extraction software: TaintDroid and DroidBox, for example, only run the APP on a virtual machine and observe whether malicious behaviors occur there, such as file read and write operations or access to SMS messages and telephone information. The method provided by the invention instead focuses on the call sequences of APIs inside the software, and can extract the API calls made during the dynamic running of the APP using Android tracing software such as TraceView. In addition, the invention combines heterogeneous information networks with malware detection, using the HIN to represent the dynamically extracted information; this strengthens the relevance between data items, makes it harder for malware to evade detection, and improves detection accuracy. Its training efficiency is also not matched by existing malware detection techniques: model training time is greatly reduced while a high accuracy is maintained, making this a more efficient and practical Android malware detection method.
Drawings
FIG. 1 is a schematic diagram of a basic flow of a method according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of the training and using principles of the method according to the embodiment of the present invention.
FIG. 3 is an example of the content of a portion of an HTML document containing APP running data extracted according to an embodiment of the present invention.
FIG. 4 is an example of an HTML document showing the call relationships between APIs in the generated APP running data in an embodiment of the present invention.
FIG. 5 is a diagram illustrating the filtering of the generated APP running data in an embodiment of the present invention.
FIG. 6 is a schematic diagram of training test time comparison between the method of the embodiment of the present invention and the prior art method.
FIG. 7 is a diagram illustrating the comparison between the occupancy rates of the memory and the CPU in the method of the embodiment of the present invention and the existing method.
FIG. 8 is a graphical illustration comparing the API quantities of an embodiment method of the present invention and a prior art method.
Detailed Description
As shown in fig. 1 and fig. 2, the implementation steps of the Android malware efficient detection method based on runtime data analysis in the embodiment include:
1) acquiring a package name and a start page name (Activity) of an APP (Android software);
2) after running the APP based on the package name and the starting page name, simulating human operation of the APP through an operation simulation tool, and tracing and recording the APP to generate running data of the APP, wherein the running data of the APP comprises information on calls from the APP to APIs and calls from APIs to APIs;
3) extracting the running data of the APP through a heterogeneous information network (HIN) to obtain structured data of the running data, and forming the structured data into a kernel matrix by means of meta-paths; the HIN comprises two node types and two edge types, wherein the two node types are APP and API, the two edge types are APP-to-API calls and API-to-API calls, and the relationship information consists of the number of times the APP calls each API and the number of times one API calls another;
4) inputting the kernel matrix into a pre-trained machine learning classifier to obtain a detection result indicating whether the APP is malware, wherein the machine learning classifier has established, through pre-training, a mapping between the structured data of an APP's running data and the detection result of whether the APP is malware.
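As an illustration of how step 3) might turn call counts into kernel matrices, the following is a minimal sketch rather than the patented implementation. It assumes an APP-by-API count matrix `A` and an API-by-API count matrix `B` taken from the HIN, and it uses two meta-paths of the kind commonly seen in HIN-based detection: APP-API-APP and APP-API-API-API-APP (two APPs whose APIs call a common API). The text does not say which meta-paths are used, so the meta-path choice, the function names and the toy data are all assumptions.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def transpose(x):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*x)]


def metapath_kernels(A, B):
    """Return kernel matrices for two illustrative (assumed) meta-paths.

    A[i][j]: number of times APP i called API j.
    B[j][k]: number of times API j called API k.
    """
    # Meta-path APP -> API <- APP: two APPs are similar if they call the same APIs.
    k_app_api_app = matmul(A, transpose(A))
    # Meta-path APP -> API -> API <- API <- APP: similarity through shared callees.
    AB = matmul(A, B)
    k_app_api_api_app = matmul(AB, transpose(AB))
    return k_app_api_app, k_app_api_api_app


# Hypothetical toy data: 2 APPs, 3 APIs.
A = [[2, 0, 1],
     [0, 3, 1]]
B = [[0, 1, 0],
     [0, 0, 2],
     [0, 0, 0]]
K1, K2 = metapath_kernels(A, B)
```

Both matrices are symmetric APP-by-APP similarity matrices, which is the shape an SVM with a precomputed kernel expects (for example, scikit-learn's `SVC(kernel="precomputed")`).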
To monitor the behavior of the APP with TraceView, it is necessary to know which process to monitor, and the package name of the APP can be used to identify that process. In addition, to start the APP, the start page name of the APP needs to be known. In this embodiment, obtaining the package name and the start page name of the APP in step 1) specifically means obtaining them by decompiling the APP. As a specific implementation, the decompiling tool Apktool can be used to decompile the APP; the resulting AndroidManifest document contains the permission information and the registered page information, and the package name and the start page name are extracted from it. For example, the start page of an APP is an Activity component in a document that contains "identity.
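The extraction of the package name and start page can be sketched in Python as follows. This is a sketch under stated assumptions: it assumes the manifest has already been decoded to plain XML (as Apktool does), and it assumes the start page is the activity whose intent filter declares the standard Android `android.intent.action.MAIN` action and `android.intent.category.LAUNCHER` category. The function name and the sample manifest are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by android:* attributes in a decoded AndroidManifest.xml.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def package_and_launcher(manifest_xml):
    """Return (package name, launcher activity name or None) from manifest text."""
    root = ET.fromstring(manifest_xml)
    package = root.get("package")
    for activity in root.iter("activity"):
        for intent_filter in activity.findall("intent-filter"):
            actions = {a.get(ANDROID_NS + "name") for a in intent_filter.findall("action")}
            categories = {c.get(ANDROID_NS + "name") for c in intent_filter.findall("category")}
            # Assumption: the start page is the MAIN/LAUNCHER activity.
            if ("android.intent.action.MAIN" in actions
                    and "android.intent.category.LAUNCHER" in categories):
                return package, activity.get(ANDROID_NS + "name")
    return package, None


# Hypothetical manifest used only for illustration.
SAMPLE = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.demo">
  <application>
    <activity android:name=".SettingsActivity"/>
    <activity android:name=".MainActivity">
      <intent-filter>
        <action android:name="android.intent.action.MAIN"/>
        <category android:name="android.intent.category.LAUNCHER"/>
      </intent-filter>
    </activity>
  </application>
</manifest>"""

package, launcher = package_and_launcher(SAMPLE)
```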
In this embodiment, running the APP in step 2) specifically means running the APP in a virtual machine; using a virtual machine improves the security of the environment and can also improve the detection accuracy of the running data.
In this embodiment, the following tools are used in step 2):
1. The Monkey tool serves as the operation simulation tool that simulates human operation of the APP; in this embodiment, Monkey randomly generates 50 operations, including page switching, trackball events and other random operations.
2. TraceView, which ships with Android, is used to start trace monitoring and monitor the running data of the APP in real time.
3. The dmtracedump tool converts the generated binary trace file into an HTML file, so that a Python script can be written to extract the running data of the APP.
FIG. 3 is an example of part of an HTML document containing the APP running data extracted in this embodiment. As can be seen from Fig. 3, the APP running data includes information on calls from the APP to APIs and calls from APIs to APIs, and all call relationships are organized into a tree. Entry No. 6, "(toplevel)", is the root of the tree and can be understood as the APP node, so APIs No. 0, 1, 2, 3, 4 and 5 listed under No. 6 are all called by the APP, and this information is extracted. In line 3 of Fig. 3, 112/112 means that API No. 0 was called 112 times by the APP, out of 112 calls to that API in total. In line 4 of Fig. 3, 1/3 means that API No. 1 was called once by the APP, out of 3 calls to that API in total. FIG. 4 is an HTML document illustrating the call relationships between APIs: API No. 2 listed under API No. 1 on line 3 means that API No. 1 calls API No. 2, and 1/2 on line 6 shows that API No. 2 was called once by API No. 1. The feature extraction part of this embodiment thus extracts the call relationship graph between the APP and the APIs and among the APIs.
In this embodiment, the APP can be opened in the virtual machine using an ADB command together with the package name and the start Activity. TraceView is then used to start monitoring the APP; after the Monkey tool has performed 50 random operations on the APP, trace monitoring is stopped and the trace file is extracted. At this point the trace file only shows which APIs were called; the call counts and the call relationships between APIs require further analysis. The dmtracedump tool is therefore used to analyze the trace file and generate an HTML document. After the HTML document is obtained, a Python script is used to construct the heterogeneous information network HIN of this embodiment, so as to obtain from the HTML document the features to be extracted for training.
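The collection workflow above could be scripted roughly as follows. This is only a sketch: the text starts tracing from TraceView, whereas the sketch swaps in the `am profile start` / `am profile stop` shell commands as a scriptable stand-in (argument order has varied across Android versions). The file paths, the function name and the assumption that `dmtracedump -h` writes its HTML to standard output are illustrative and should be checked against the installed tools. The function only builds the command lines; it does not execute them.

```python
def build_trace_commands(package, activity, events=50,
                         device_trace="/data/local/tmp/app.trace",
                         local_trace="app.trace"):
    """Build (but do not run) the command lines for one tracing session.

    All paths are hypothetical; `am profile` is used here as a scriptable
    stand-in for starting and stopping the trace from TraceView.
    """
    component = f"{package}/{activity}"
    return [
        # Launch the start Activity of the APP in the emulator.
        ["adb", "shell", "am", "start", "-n", component],
        # Begin method tracing for the APP's process.
        ["adb", "shell", "am", "profile", "start", package, device_trace],
        # Inject random UI events into the APP with Monkey.
        ["adb", "shell", "monkey", "-p", package, str(events)],
        # Stop tracing and copy the binary trace file to the host.
        ["adb", "shell", "am", "profile", "stop", package],
        ["adb", "pull", device_trace, local_trace],
        # Convert the binary trace into HTML for later parsing.
        ["dmtracedump", "-h", local_trace],
    ]


commands = build_trace_commands("com.example.demo", ".MainActivity")
```

Each entry can be passed to `subprocess.run(cmd, check=True)` in order; the output of the last command would be captured and saved as the HTML document that the Python extraction script reads.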
In the heterogeneous information network HIN constructed in this embodiment, the nodes are the APP and the APIs, and the relationship information consists of the number of calls from the APP to each API and the number of calls between APIs. The method exploits the HIN's capacity for high-level semantic abstraction and applies it to malware detection. After the HIN is constructed, this embodiment uses meta-paths to measure the similarity between APPs, and thereby detects malware. In this embodiment, extracting the running data of the APP through the HIN in step 3) to obtain the structured data specifically means filtering out every APP-to-API call relationship in which the called API does not itself call any other API, so that all remaining relationships are API call sequences. Experiments showed that an API without any call context cannot indicate whether the APP is malicious. In Fig. 5, for example, API2, API3 and API4 are called only by the APP and carry no other call information, and the mere fact that the APP calls an API does not show that the APP is malicious. This embodiment therefore filters out such information, which makes the overall technique more efficient.
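The filtering rule can be sketched in a few lines of Python. The data layout is an assumption made for illustration: APP-to-API edges are a dictionary from API name to call count, and API-to-API edges are a dictionary from (caller, callee) pairs to call count. The toy data loosely mirrors Fig. 5 in that API2, API3 and API4 are called only by the APP; the API names and counts are hypothetical.

```python
def filter_hin(app_calls, api_calls):
    """Drop APP->API edges whose target API never calls another API.

    app_calls: {api_name: count} for APP -> API edges.
    api_calls: {(caller_api, callee_api): count} for API -> API edges.
    Returns the kept APP -> API edges and the (unchanged) API -> API edges.
    """
    # APIs that appear as a caller have call context worth keeping.
    callers = {caller for (caller, _callee) in api_calls}
    kept_app_calls = {api: n for api, n in app_calls.items() if api in callers}
    return kept_app_calls, dict(api_calls)


# Hypothetical running data for one APP.
app_calls = {"API0": 112, "API1": 1, "API2": 4, "API3": 2, "API4": 7}
api_calls = {("API0", "API6"): 3, ("API1", "API5"): 1}

kept, edges = filter_hin(app_calls, api_calls)
```

After filtering, every remaining APP-to-API edge leads into at least one API-to-API call, so what is left can be read as API call sequences.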
In this embodiment, the machine learning classifier in step 4) is a support vector machine (SVM) classifier; meta-paths are used to describe the constructed heterogeneous information network, and the kernel matrix formed from the meta-paths is input to the SVM classifier for training. Step 4) is preceded by a step of training the support vector machine classifier, the detailed steps of which comprise:
S1) for a variety of normal APPs and malicious APPs, executing steps 1) to 3) to extract the corresponding kernel matrices, and labeling each kernel matrix as normal or malicious, thereby building a training sample set and a test sample set;
S2) training the support vector machine classifier on the training samples in the training sample set, and proceeding to the next step after a specified amount or duration of training has been completed;
S3) testing the support vector machine classifier on the test samples in the test sample set to obtain the classification accuracy of the classifier after this round of training;
S4) judging whether a training termination condition is met, the termination condition being that a specified amount or duration of training has been completed or that the classification accuracy has reached a preset threshold; if the termination condition is not met, returning to step S2); otherwise ending and exiting.
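The control flow of the training steps can be sketched independently of the classifier. In this sketch, `train_round` and `evaluate` are hypothetical callables standing in for one round of SVM training on the training set and for measuring accuracy on the test set; the round limit and the accuracy threshold are illustrative values, not figures from the patent.

```python
def train_until_done(train_round, evaluate, max_rounds=10, target_accuracy=0.95):
    """Repeat train/test rounds until a termination condition holds.

    train_round: callable performing one specified amount of training (S2).
    evaluate:    callable returning accuracy on the test set (S3).
    Stops when max_rounds is reached or accuracy >= target_accuracy (S4).
    """
    rounds, accuracy = 0, 0.0
    while rounds < max_rounds:
        train_round()                      # S2: one round of training
        rounds += 1
        accuracy = evaluate()              # S3: test the classifier
        if accuracy >= target_accuracy:    # S4: threshold reached, stop
            break
    return rounds, accuracy


# Toy stand-ins: accuracy improves with each round of "training".
_history = iter([0.80, 0.90, 0.96, 0.99])
rounds, accuracy = train_until_done(lambda: None, lambda: next(_history))
```

With a real classifier, `train_round` would fit the SVM on the labeled kernel matrices from S1 and `evaluate` would score it on the held-out test samples.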
To verify the Android malware efficient detection method based on runtime data analysis of this embodiment (the DyFex method for short), the existing HinDroid method was selected for comparison. The HinDroid method is based on static feature extraction: it decompiles the APP installation file and analyzes the call relationships between the APIs in it. The DyFex method instead collects data while the APP is running and extracts the API call sequences from that data. The comparison shows that the accuracy of the DyFex method reaches 95.6%, while the accuracy of the HinDroid method is 98.6%, so the accuracy of the DyFex method is close to that of the HinDroid method. With the two methods at similar accuracy, they were further compared in three respects: training and test time (as shown in fig. 6), memory and CPU occupancy (as shown in fig. 7), and number of APIs (as shown in fig. 8). As can be seen from fig. 6, 7 and 8, the training time and test time of the DyFex method are greatly reduced compared with the HinDroid method, at only about 40% of the HinDroid method, and the number of APIs extracted by the DyFex method is only 36.6% of that extracted by the HinDroid method.
This shows that the Android malware efficient detection method based on runtime data analysis of this embodiment greatly reduces training time while maintaining a high recognition accuracy.
In addition, this embodiment also provides an Android malware efficient detection system based on runtime data analysis, including:
the package information acquisition program unit is used for acquiring the package name and the starting page name of the APP;
the running data acquisition program unit is used for simulating human operation of the APP through an operation simulation tool after the APP is run, and for tracing and recording the APP to generate running data of the APP, wherein the running data of the APP comprises information on calls from the APP to APIs and calls from APIs to APIs;
the structured data acquisition program unit is used for extracting the running data of the APP through a heterogeneous information network (HIN) to obtain structured data of the running data, and forming the structured data into a kernel matrix by means of meta-paths; the HIN comprises two node types and two edge types, wherein the two node types are APP and API, the two edge types are APP-to-API calls and API-to-API calls, and the relationship information consists of the number of times the APP calls each API and the number of times one API calls another;
and the result classification program unit is used for inputting the kernel matrix into a pre-trained machine learning classifier to obtain a detection result indicating whether the APP is malware, wherein the machine learning classifier has established, through pre-training, a mapping between the structured data of an APP's running data and the detection result of whether the APP is malware.
In addition, the embodiment also provides an Android malware efficient detection system based on runtime data analysis, which includes a computer device programmed or configured to execute the steps of the aforementioned Android malware efficient detection method based on runtime data analysis according to the embodiment.
In addition, the embodiment also provides an Android malware efficient detection system based on runtime data analysis, which includes a computer device, where a storage medium of the computer device stores a computer program that is programmed or configured to execute the aforementioned Android malware efficient detection method based on runtime data analysis according to the embodiment.
In addition, the present embodiment also provides a computer-readable storage medium, on which a computer program is stored, which is programmed or configured to execute the foregoing runtime data analysis-based Android malware efficient detection method according to the present embodiment.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. 
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention. The protection scope of the present invention is not limited to the above embodiments; all technical solutions that fall under the idea of the present invention belong to its protection scope. It should be noted that modifications and refinements made by those skilled in the art without departing from the principle of the invention are also considered to be within the protection scope of the invention.

Claims (9)

1. An efficient Android malware detection method based on runtime data analysis, characterized by comprising the following implementation steps:
1) acquiring a package name and a start page name of an APP;
2) after launching the APP based on the package name and the start page name, simulating a person's operations on the APP through an operation simulation tool, and tracking and recording the APP to generate running data of the APP, wherein the running data of the APP comprises information on calls from the APP to APIs and calls from APIs to other APIs;
3) extracting the running data of the APP through a heterogeneous information network (HIN) to obtain structured data of the running data of the APP, and forming a core matrix from the structured data by means of meta-paths; the heterogeneous information network HIN comprises two node types and two edge types, wherein the two node types are APP and API, the two edge types are calls from an APP to an API and calls from an API to an API, and the relationships are formed by the number of calls from the APP to the API and the number of calls from the API to the API; extracting the running data of the APP through the heterogeneous information network HIN to obtain the structured data specifically means filtering out those APP-to-API calls in which the API does not call any other API, so that the remaining relationships are all API call sequences;
4) inputting the core matrix into a pre-trained machine learning classifier to obtain a detection result indicating whether the APP is malware, wherein the machine learning classifier has established, through pre-training, a mapping between the structured data of the running data of the APP and the detection result of whether the APP is malware.
2. The efficient Android malware detection method based on runtime data analysis according to claim 1, wherein acquiring the package name and the start page name of the APP in step 1) specifically means obtaining the package name and the start page name of the APP by decompiling the APP.
3. The efficient Android malware detection method based on runtime data analysis according to claim 1, wherein running the APP in step 2) specifically means running the APP in a virtual machine.
4. The efficient Android malware detection method based on runtime data analysis according to any one of claims 1-3, wherein the machine learning classifier in step 4) is a support vector machine classifier.
5. The efficient Android malware detection method based on runtime data analysis according to claim 4, characterized in that step 4) is preceded by a step of training the support vector machine classifier, the detailed steps of which include:
S1) for a variety of benign APPs and malicious APPs, extracting the corresponding core matrices by executing steps 1) to 3), and attaching a benign or malicious label to each core matrix obtained, so as to establish a training sample set and a test sample set;
S2) training the support vector machine classifier on the training sample data in the training sample set, and proceeding to the next step after a specified amount or duration of training has been completed;
S3) testing the support vector machine classifier on the test sample data in the test sample set to obtain the classification accuracy of the support vector machine classifier after this round of training;
S4) judging whether a training termination condition is met, wherein the training termination condition is that a specified amount or duration of training has been completed or that the classification accuracy has reached a preset threshold; if the training termination condition is not met, returning to step S2); otherwise, ending and exiting.
6. An efficient Android malware detection system based on runtime data analysis, characterized by comprising:
a package information acquisition program unit, used for acquiring the package name and the start page name of the APP;
a running data acquisition program unit, used for simulating a person's operations on the APP through an operation simulation tool after the APP is launched, and tracking and recording the APP to generate running data of the APP, wherein the running data of the APP comprises information on calls from the APP to APIs and calls from APIs to other APIs;
a structured data acquisition program unit, used for extracting the running data of the APP through a heterogeneous information network (HIN) to obtain structured data of the running data of the APP, and forming a core matrix from the structured data by means of meta-paths; the heterogeneous information network HIN comprises two node types and two edge types, wherein the two node types are APP and API, the two edge types are calls from an APP to an API and calls from an API to an API, and the relationships are formed by the number of calls from the APP to the API and the number of calls from the API to the API; extracting the running data of the APP through the heterogeneous information network HIN to obtain the structured data specifically means filtering out those APP-to-API calls in which the API does not call any other API, so that the remaining relationships are all API call sequences;
and a result classification program unit, used for inputting the core matrix into a pre-trained machine learning classifier to obtain a detection result indicating whether the APP is malware, wherein the machine learning classifier has established, through pre-training, a mapping between the structured data of the running data of the APP and the detection result of whether the APP is malware.
7. An Android malware efficient detection system based on runtime data analysis, comprising a computer device, characterized in that the computer device is programmed or configured to execute the steps of the Android malware efficient detection method based on runtime data analysis of any one of claims 1 to 5.
8. An Android malware efficient detection system based on runtime data analysis, comprising a computer device, wherein a storage medium of the computer device stores a computer program programmed or configured to execute the Android malware efficient detection method based on runtime data analysis according to any one of claims 1 to 5.
9. A computer-readable storage medium, wherein the computer-readable storage medium has stored thereon a computer program programmed or configured to execute the runtime data analysis-based Android malware efficient detection method of any one of claims 1-5.
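As a rough illustration of step 3) of claim 1 (and the corresponding program unit of claim 6), the sketch below counts the two HIN edge types from recorded call pairs, drops APP-to-API calls whose API calls no other API, and multiplies the two count tables along an assumed APP -> API -> API meta-path. The trace format, the function names (build_hin, filter_leaf_calls, metapath_matrix), the placeholder app and API names, and the particular meta-path are all assumptions made for illustration; the claims do not fix any of them.

```python
from collections import Counter


def build_hin(traces):
    """Count the two HIN edge types from recorded call pairs.

    `traces` maps an app name to a list of (caller, callee) pairs; a pair
    whose caller equals the app name is treated as an APP-to-API call, any
    other pair as an API-to-API call. This input format is hypothetical.
    """
    app_api = Counter()  # (app, api) -> number of calls
    api_api = Counter()  # (api, api) -> number of calls
    for app, calls in traces.items():
        for caller, callee in calls:
            if caller == app:
                app_api[(app, callee)] += 1
            else:
                api_api[(caller, callee)] += 1
    return app_api, api_api


def filter_leaf_calls(app_api, api_api):
    """Drop APP-to-API edges whose API never calls another API."""
    calling_apis = {src for src, _dst in api_api}
    return Counter({edge: n for edge, n in app_api.items() if edge[1] in calling_apis})


def metapath_matrix(apps, apis, app_api, api_api):
    """Matrix product A x B for the assumed meta-path APP -> API -> API."""
    return [
        [sum(app_api[(app, mid)] * api_api[(mid, dst)] for mid in apis) for dst in apis]
        for app in apps
    ]


# Placeholder trace data; the app and API names are made up.
traces = {
    "app.one": [("app.one", "A"), ("A", "B"), ("app.one", "C")],
    "app.two": [("app.two", "A"), ("app.two", "A"), ("A", "B"), ("B", "C")],
}
app_api, api_api = build_hin(traces)
kept = filter_leaf_calls(app_api, api_api)
apps = sorted(traces)
apis = sorted({api for _app, api in app_api} | {a for edge in api_api for a in edge})
core = metapath_matrix(apps, apis, kept, api_api)
```

Under these assumptions, each row of `core` is one APP's feature vector over the API vocabulary and could be passed to the classifier of step 4); a different meta-path would simply change which count tables are multiplied.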
CN201910836444.1A 2019-09-05 2019-09-05 Android malicious software efficient detection method, system and medium based on runtime data analysis Active CN110532776B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910836444.1A CN110532776B (en) 2019-09-05 2019-09-05 Android malicious software efficient detection method, system and medium based on runtime data analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910836444.1A CN110532776B (en) 2019-09-05 2019-09-05 Android malicious software efficient detection method, system and medium based on runtime data analysis

Publications (2)

Publication Number Publication Date
CN110532776A CN110532776A (en) 2019-12-03
CN110532776B true CN110532776B (en) 2021-08-27

Family

ID=68667279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910836444.1A Active CN110532776B (en) 2019-09-05 2019-09-05 Android malicious software efficient detection method, system and medium based on runtime data analysis

Country Status (1)

Country Link
CN (1) CN110532776B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112149124B (en) * 2020-11-02 2022-04-29 电子科技大学 Android malicious program detection method and system based on heterogeneous information network
CN113742727B (en) * 2021-08-27 2024-11-01 恒安嘉新(北京)科技股份公司 Program identification model training and program identification method, device, equipment and medium
CN114756860A (en) * 2022-02-22 2022-07-15 广州大学 Malicious software detection method based on meta-path

Citations (2)

Publication number Priority date Publication date Assignee Title
CN108108616A (en) * 2017-12-19 2018-06-01 努比亚技术有限公司 Malicious act detection method, mobile terminal and storage medium
CN109711163A (en) * 2018-12-26 2019-05-03 西安电子科技大学 Android malware detection method based on API Calls sequence

Non-Patent Citations (1)

Title
A Malware Detection System Based on Heterogeneous Information Network; Shang-Nan Yin; RACS '18: Proceedings of the 2018 Conference on Research in Adaptive and Convergent Systems; 2018-10-12; pp. 154-159 *

Also Published As

Publication number Publication date
CN110532776A (en) 2019-12-03

Similar Documents

Publication Publication Date Title
CN105989283B (en) A kind of method and device identifying virus mutation
Hsien-De Huang et al. R2-d2: Color-inspired convolutional neural network (cnn)-based android malware detections
CN108334781B (en) Virus detection method, device, computer readable storage medium and computer equipment
CN107590388B (en) Malicious code detection method and device
CN109509021B (en) Behavior track-based anomaly identification method and device, server and storage medium
CN105069355B (en) The static detection method and device of webshell deformations
CN110532776B (en) Android malicious software efficient detection method, system and medium based on runtime data analysis
CN106557695A (en) A kind of malicious application detection method and system
CN109271788B (en) Android malicious software detection method based on deep learning
CN110795732A (en) SVM-based dynamic and static combination detection method for malicious codes of Android mobile network terminal
CN109992968A (en) Android malicious act dynamic testing method based on binary system dynamic pitching pile
CN109214178B (en) APP application malicious behavior detection method and device
CN113468524B (en) RASP-based machine learning model security detection method
CN112688966A (en) Webshell detection method, device, medium and equipment
CN113901465A (en) Heterogeneous network-based Android malicious software detection method
CN114090406A (en) Electric power Internet of things equipment behavior safety detection method, system, equipment and storage medium
CN114491523A (en) Malicious software detection method and device, electronic equipment, medium and product
Bernardi et al. A fuzzy-based process mining approach for dynamic malware detection
CN109902487B (en) Android application malicious property detection method based on application behaviors
CN114285587A (en) Domain name identification method and device and domain name classification model acquisition method and device
WO2016127037A1 (en) Method and device for identifying computer virus variants
CN112749387A (en) Sandbox-based malicious behavior analysis method
CN108563950B (en) Android malicious software detection method based on SVM
CN115982719A (en) Knowledge graph-based artificial intelligence intrusion and attack simulation system
CN111459774A (en) Method, device and equipment for acquiring flow of application program and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant