OpCode-Level Function Call Graph Based Android Malware Classification Using Deep Learning
Figure 1. The structure of LSTM.
Figure 2. The framework of our proposed Android malware detection method. (a) Training phase; (b) Detection phase.
Figure 3. Overview of the FCG generation process.
Figure 4. Illustrating the generation of OpCode sequences based on the FCG of a (training) app.
Figure 5. The network structure of the LSTM-based model.
Figure 6. The training framework of the SVM-based classifier.
Figure 7. Results for different values of max_path and max_sequence. (a) The result of max_path; (b) The result of max_sequence.
Figure 8. Illustrating the generation of features from the API name, the sensitive API name, and the OpCode sequence of the sensitive FCG. (a) Generating features from the API name; (b) Generating features from the sensitive API name; (c) Generating features from the OpCode sequence of the sensitive FCG.
Figure 9. Results of different classification algorithms using different features.
Figure 10. Comparative results of our proposed method and the two competing methods.
Abstract
1. Introduction
- We propose to combine deep learning with FCG-based graph analysis for Android malware classification.
- We use the OpCode-level FCG, which better captures the intrinsic associations in code and richer semantic link information, to characterize the behaviors of Android programs at a fine granularity, and we construct an LSTM-based Android malware discrimination model that fully accounts for the relevance and temporal ordering of malicious program behaviors.
2. Related Work
2.1. Static Analysis
2.2. Dynamic Analysis
2.3. Hybrid Analysis
3. Preliminaries
4. Proposed Detection Method
4.1. The Training Phase
4.1.1. Constructing Dataset
- Sub-step 1: Using the VirusTotal website, which integrates 88 virus detection engines, to check all samples for a binary result (malware or not).
- Sub-step 2: For malicious samples, since each detection engine uses different naming rules for malware families, we feed the report obtained from VirusTotal into the Euphony tool and obtain a standardized intermediate result (which family each detection engine assigns to the sample).
- Sub-step 3: Scoring the detection engines on VirusTotal according to the AVTEST website's evaluation results of January 2019; the better an engine's detection performance, the greater its weight. We set the weights of the prominent engines to an arithmetic progression, as shown in Table 1, and the weight of all remaining engines to 0.
- Sub-step 4: Choosing the family with the largest final score, i.e., the family that maximizes the weighted sum of engine votes, and mapping it to our classification categories.
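The weighted-vote labeling in Sub-steps 3 and 4 can be sketched as follows. The engine weights follow Table 1, but the engine names and family labels in the example call are illustrative, not the paper's actual data:

```python
# Weighted voting over per-engine family labels (Sub-steps 3-4).
# Only a few Table 1 weights are listed here; unlisted engines get weight 0.
ENGINE_WEIGHTS = {"Alibaba": 0.063, "Tencent": 0.06, "McAfee": 0.051}

def vote_family(engine_labels, weights=ENGINE_WEIGHTS):
    """engine_labels: {engine_name: family_label} (e.g., Euphony's normalized output).
    Returns the family with the highest total weight."""
    scores = {}
    for engine, family in engine_labels.items():
        scores[family] = scores.get(family, 0.0) + weights.get(engine, 0.0)
    return max(scores, key=scores.get)

print(vote_family({"Alibaba": "Trojan", "Tencent": "Adware", "McAfee": "Adware"}))  # → Adware
```

Here "Adware" wins because its total weight (0.06 + 0.051) exceeds the single "Trojan" vote (0.063).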
4.1.2. Pre-Processing
- Process 1: Transforming Android APKs into OpCode representations. This process has two sub-steps, which are described below.
- Sub-step 1: Using unzip to decompress the .apk files, obtaining the classes.dex file contained in each Android package.
- Sub-step 2: Disassembling the classes.dex file into Dalvik code using a disassembler (e.g., dedexer [31]). This is necessary in order to use OpCode-level features.
- Process 2: Generating the FCG of each training program through the FlowDroid tool, which automatically extracts API calls from the decompiled smali code and converts them into static execution sequences of the corresponding API calls. This process is highlighted in Figure 3. The FCG is denoted by G = (V, E), where V represents the set of functions called by an Android malware sample, E is the set of function call relationships, and an edge (f_i, f_j) ∈ E indicates that there is a function call from the caller f_i to the callee f_j.
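As a toy illustration of this definition (the function names are invented, not FlowDroid output), the FCG can be held as a vertex set plus an adjacency map:

```python
# Build G = (V, E) from a list of (caller, callee) pairs, mirroring the
# definition above: an edge records that the caller invokes the callee.
def build_fcg(call_pairs):
    """call_pairs: iterable of (caller, callee). Returns (vertex set, adjacency dict)."""
    vertices, edges = set(), {}
    for caller, callee in call_pairs:
        vertices.update((caller, callee))
        edges.setdefault(caller, []).append(callee)
    return vertices, edges

calls = [("onCreate", "sendTextMessage"), ("onCreate", "getDeviceId")]
V, E = build_fcg(calls)
print(sorted(V))      # ['getDeviceId', 'onCreate', 'sendTextMessage']
print(E["onCreate"])  # ['sendTextMessage', 'getDeviceId']
```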
4.1.3. Generating OpCode Sequences Based on FCG
4.1.4. Transforming into a Vector
4.1.5. Training an LSTM-Based Neural Network
- Sub-step 1: Collecting statistics on and processing the opcode sequences: counting the vocabulary, numbering words by their frequency, adding default vocabulary entries, handling special vocabulary, etc.
- Sub-step 2: Building (data, tag) pairs by constructing skip-gram training data; skip-gram is a data construction scheme that uses the center word to predict its surrounding words.
- Sub-step 3: Establishing the Word2Vec neural network and training it with nce_loss as the loss function. When training completes, the parameters of the network form an embedding matrix.
- Sub-step 4: Transforming each opcode sequence into word embeddings through the obtained embedding matrix.
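The skip-gram pair construction of Sub-step 2 can be sketched in pure Python; the window size and the toy opcode sequence are illustrative, and the resulting pairs would feed the Word2Vec network trained under the NCE loss in Sub-step 3:

```python
# Build skip-gram (center, context) pairs: each word predicts its neighbours
# within a fixed window, per Sub-step 2.
def skip_gram_pairs(sequence, window=1):
    pairs = []
    for i, center in enumerate(sequence):
        lo, hi = max(0, i - window), min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs

ops = ["invoke-virtual", "move-result", "const-string"]
print(skip_gram_pairs(ops))
```

With a window of 1, each interior opcode yields one pair per neighbour, so the three-opcode example produces four training pairs.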
4.2. The Detection Phase
- Step 1: Transforming target apps into OpCode sequences based on FCG and then into vectors. It has four sub-steps:
- Sub-step 1: Obtaining the Dalvik code from the target apps (similar to Process 1 in Section 4.1.2).
- Sub-step 2: Extracting API function calls from the target apps (similar to Process 2 in Section 4.1.2).
- Sub-step 3: Assembling the program slices into OpCode sequences based on FCG (similar to Section 4.1.3).
- Sub-step 4: Transforming the OpCode sequences based on FCG into vectors (similar to Section 4.1.4).
- Step 2: Detection. This step uses the learned LSTM-based neural network to classify the vectors corresponding to the OpCode sequences based on FCG extracted from the target apps. When a vector is classified as “1” (i.e., “Trojan”), the corresponding app is malware of category “Trojan”; when a vector is classified as “2” (i.e., “Adware”), the corresponding app is malware of category “Adware”. Otherwise, the vector is classified as “0” (i.e., “Normal”) and the app is considered benign.
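The decision rule in Step 2 amounts to a fixed mapping from class ids to verdicts. A minimal sketch, assuming a trained model object with a hypothetical `predict()` that returns one class id per app:

```python
# Map the model's class id to the paper's three-way verdict.
LABELS = {0: "Normal", 1: "Trojan", 2: "Adware"}

def classify_app(model, vectors):
    """vectors: the app's FCG-based opcode sequence vectors.
    model.predict() is a stand-in for the trained LSTM's inference call."""
    pred = model.predict(vectors)
    return LABELS[int(pred)]
```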
5. Experimental Results Analysis
5.1. Dataset
5.2. Experimental Settings
- Step 1: Transforming the OpCode sequence based on FCG of each app into a vector representation, which has three sub-steps.
- Sub-step 1: Generating the feature dictionary: counting the OpCodes that appear in the training dataset and using this OpCode set as the feature set.
- Sub-step 2: Getting the vector representations of the apps: counting how many times each OpCode appears in an app and using the counts as the corresponding feature values.
- Sub-step 3: Normalizing the frequency features. This is essential to prevent large frequency differences, caused by differences in program size, from distorting model training.
- Step 2: Establishing the SVM-based classifier and adopting the grid search and cross-validation to adjust parameters.
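These two steps map naturally onto scikit-learn, which the paper references; the opcode "documents" below and the grid values are illustrative, not the authors' configuration:

```python
# Bag-of-words opcode frequencies (Sub-steps 1-2), L2 normalization (Sub-step 3),
# then a small grid search with cross-validation for the SVM (Step 2).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import normalize
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

docs = ["invoke-virtual move-result const-string", "invoke-static return-void"] * 4
labels = [0, 1] * 4

vec = CountVectorizer(token_pattern=r"[\w\-/]+")  # keep hyphenated opcodes whole
X = normalize(vec.fit_transform(docs))            # frequency features, L2-normalized

grid = GridSearchCV(SVC(), {"C": [1, 1000], "kernel": ["linear", "rbf"]}, cv=2)
grid.fit(X, labels)
print(grid.best_params_)
```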
5.3. Evaluation Metrics
5.4. Results
- Name of ordinary function call + Bag-of-Words + SVM: NOFC-BoW&SVM.
- Name of sensitive function call + Bag-of-Words + SVM: NSFC-BoW&SVM.
- Name of ordinary function call + deep learning model (LSTM): NOFC-LSTM.
- Name of sensitive function call + deep learning model (LSTM): NSFC-LSTM.
- OpCode sequence in ordinary function order + Bag-of-Words + SVM: OOFO-BoW&SVM.
- OpCode sequence in sensitive function order + Bag-of-Words + SVM: OSFO-BoW&SVM.
- OpCode sequence in ordinary function order + deep learning model (LSTM): OOFO-LSTM.
- OpCode sequence in sensitive function order + deep learning model (LSTM): OSFO-LSTM.
- The detection accuracy of NSFC-LSTM shows no significant improvement over NOFC-LSTM, while the detection accuracy of OOFO-LSTM is superior to OSFO-LSTM. Selecting only sensitive features destroys sequence information, because a malicious behavior is usually formed by a series of common operations and sensitive operations; even the sensitive operations are built on normal operations. Thus, LSTM cannot build a good sequence model from sensitive operations alone, which leads to poor results.
- Likewise, the detection accuracy of NSFC-BoW&SVM shows no significant improvement over NOFC-BoW&SVM, and the detection accuracy of OOFO-BoW&SVM is also superior to OSFO-BoW&SVM.
- Using the sensitive functions to prune the paths does not improve classification. On the contrary, features from opcode sequences based on the full function order classify better than those obtained with path pruning, because pruning paths often destroys the sequence information of malicious behavior, which is usually formed by a series of common operations and sensitive operations.
- For features from the opcode order based on function calls, the LSTM model performs better than the BoW&SVM model; for features from the function call path text, the BoW&SVM model achieves better results than the LSTM model. For opcode features, the sequences contain long chains of dependencies, and the LSTM is designed to handle exactly such long-range dependencies, whereas the small number of Android opcodes leaves the BoW model with feature dimensions too small to be sufficiently expressive. For function name features, most path sequences are short, so the LSTM cannot exploit its advantage in long-sequence processing, and the BoW model handles this type of text feature better.
- Combining graph analysis (the function call graph) with program analysis (Dalvik opcodes) achieves better experimental results than either alone. The function call graph structures the program and offers strong resistance to instruction-level obfuscation, while the Dalvik opcodes describe program behavior patterns in a fine-grained manner. Combining the advantages of both significantly improves the accuracy of capturing the general behavior of Android malware families, and thus the efficiency of recognizing and classifying large-scale Android malware.
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- IDC. The Number of Devices Connected to the Internet. Available online: https://www.idc.com/getdoc.jsp?containerId=US45527219 (accessed on 18 June 2020).
- StatCounter Global Stats. The Market Share of Android Smartphones. Available online: http://gs.statcounter.com/os-market-share/mobile/worldwide (accessed on 18 June 2020).
- Kaspersky Mobile Malware Evolution. The Number of Malicious Installation Packages Appearing per Day. Available online: https://securelist.com/mobile-malware-evolution-2018/89689/ (accessed on 18 June 2020).
- 360 Internet Security Center. The Monetary Loss per Victim Caused by Fraud in China. Available online: http://zt.360.cn/1101061855.php?dtid=1101061451&did=610100815 (accessed on 18 June 2020).
- Yuan, Z.; Lu, Y.; Wang, Z.; Xue, Y. Droid-Sec: Deep Learning in Android Malware Detection. In Proceedings of the 2014 ACM Conference on SIGCOMM, Chicago, IL, USA, 17–22 August 2014; pp. 371–372. [Google Scholar]
- Kim, T.; Kang, B.; Rho, M.; Sezer, S.; Im, E.G. A multimodal deep learning method for Android malware detection using various features. IEEE Trans. Inf. Forensics Secur. 2018, 14, 773–788. [Google Scholar] [CrossRef] [Green Version]
- Wang, Z.; Cai, J.; Cheng, S.; Li, W. DroidDeepLearner: Identifying Android malware using deep learning. In Proceedings of the 2016 IEEE 37th Sarnoff Symposium, Newark, NJ, USA, 19–21 September 2016; pp. 160–165. [Google Scholar]
- Arzt, S. FlowDroid. Available online: https://github.com/secure-software-engineering/FlowDroid (accessed on 18 June 2020).
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
- AVTEST. AVTEST: One of the World’s Leading Third-Party Independent Testing Organizations. Available online: https://www.av-test.org/en/antivirus/mobile-devices/ (accessed on 18 June 2020).
- VirusTotal. VirusTotal: Free Online Virus, Malware and URL Scanner. Available online: https://www.virustotal.com/ (accessed on 18 June 2020).
- Sadeghi, A.; Bagheri, H.; Garcia, J.; Malek, S. A taxonomy and qualitative comparison of program analysis techniques for security assessment of Android software. IEEE Trans. Software Eng. 2016, 43, 492–530. [Google Scholar] [CrossRef]
- Wu, D.J.; Mao, C.H.; Wei, T.E.; Lee, H.M.; Wu, K.P. Droidmat: Android malware detection through manifest and api calls tracing. In Proceedings of the 2012 Seventh Asia Joint Conference on Information Security, Tokyo, Japan, 9–10 August 2012; pp. 62–69. [Google Scholar]
- Burguera, I.; Zurutuza, U.; Nadjm-Tehrani, S. Crowdroid: Behavior-based malware detection system for Android. In Proceedings of the 1st ACM workshop on Security and privacy in smartphones and mobile devices, Chicago, IL, USA, 17–21 October 2011; pp. 15–26. [Google Scholar]
- Sanz, B.; Santos, I.; Laorden, C.; Ugarte-Pedrero, X.; Bringas, P.G. On the automatic categorisation of Android applications. In Proceedings of the 2012 IEEE Consumer Communications and Networking Conference (CCNC), Las Vegas, NV, USA, 14–17 January 2012; pp. 149–153. [Google Scholar]
- Arp, D.; Spreitzenbarth, M.; Hubner, M.; Gascon, H.; Rieck, K.; Siemens, C. Drebin: Effective and explainable detection of Android malware in your pocket. In Proceedings of the Network and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 23–26 February 2014; pp. 23–26. [Google Scholar]
- Hou, S.; Saas, A.; Ye, Y.; Chen, L. Droiddelver: An Android malware detection system using deep belief network based on api call blocks. In Proceedings of the International Conference on Web-Age Information Management, Nanchang, China, 3–5 June 2016; pp. 54–66. [Google Scholar]
- Milosevic, N.; Dehghantanha, A.; Choo, K.K.R. Machine learning aided Android malware classification. Comput. Electr. Eng. 2017, 61, 266–274. [Google Scholar] [CrossRef] [Green Version]
- Karbab, E.B.; Debbabi, M.; Derhab, A.; Mouheb, D. MalDozer: Automatic framework for Android malware detection using deep learning. Digital Investig. 2018, 24, S48–S59. [Google Scholar] [CrossRef]
- Wu, W.C.; Hung, S.H. DroidDolphin: A dynamic Android malware detection framework using big data and machine learning. In Proceedings of the 2014 Conference on Research in Adaptive and Convergent Systems, Towson, MD, USA, 5–8 October 2014; pp. 247–252. [Google Scholar]
- Desnos, A.; Lantz, P. Droidbox: An Android Application Sandbox for Dynamic Analysis; Technical Report; Lund University: Lund, Sweden, 2011. [Google Scholar]
- Chang, S. APE: A Smart Automatic Testing Environment for Android Malware; Department of Computer Science and Information Engineering, National Taiwan University: Taipei, Taiwan, 2013. [Google Scholar]
- Tam, K.; Khan, S.J.; Fattori, A.; Cavallaro, L. CopperDroid: Automatic Reconstruction of Android Malware Behaviors. In Proceedings of the Network and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 8–11 February 2015. [Google Scholar]
- Wong, M.Y.; Lie, D. IntelliDroid: A Targeted Input Generator for the Dynamic Analysis of Android Malware. In Proceedings of the Network and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 21–24 February 2016; pp. 21–24. [Google Scholar]
- Zhauniarovich, Y.; Ahmad, M.; Gadyatskaya, O.; Crispo, B.; Massacci, F. Stadyna: Addressing the problem of dynamic code updates in the security analysis of Android applications. In Proceedings of the 5th ACM Conference on Data and Application Security and Privacy, San Antonio, TX, USA, 2–4 March 2015; pp. 37–48. [Google Scholar]
- Yuan, Z.; Lu, Y.; Xue, Y. Droiddetector: Android malware characterization and detection using deep learning. Tsinghua Sci. Technol. 2016, 21, 114–123. [Google Scholar] [CrossRef]
- Hou, S.; Ye, Y.; Song, Y.; Abdulhayoglu, M. Make Evasion Harder: An Intelligent Android Malware Detection System. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI), Stockholm, Sweden, 13–19 July 2018; pp. 5279–5283. [Google Scholar]
- Rumelhart, D.; Hinton, G.; Williams, R. Learning Internal Representations by Error Propagation. In Readings in Cognitive Science; Institute for Cognitive Science, University of California, San Diego: La Jolla, CA, USA, 1988; pp. 399–421. [Google Scholar]
- Tencent Mobile Security Lab. 2018 Mobile Security Report. Available online: https://m.qq.com/security_lab/news_detail_489.html (accessed on 18 June 2020).
- Hurier, M.; Suarez-Tangil, G.; Dash, S.K.; Bissyandé, T.F.; Le Traon, Y.; Klein, J.; Cavallaro, L. Euphony: Harmonious Unification of Cacophonous Anti-Virus Vendor Labels for Android Malware. In Proceedings of the 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), Buenos Aires, Argentina, 20–21 May 2017; pp. 425–435. [Google Scholar]
- Dedexer. Dedexer: A Disassembler Tool for DEX Files. Available online: http://dedexer.sourceforge.net/ (accessed on 18 June 2020).
- Drissi, M.; Watkins, O.; Khant, A.; Ojha, V.; Sandoval, P.; Segev, R.; Weiner, E.; Keller, R. Program language translation using a grammar-driven tree-to-tree model. arXiv 2018, arXiv:1807.01784. [Google Scholar]
- VirusShare. VirusShare.com. Available online: https://virusshare.com/ (accessed on 18 June 2020).
- Allix, K.; Bissyandé, T.F.; Klein, J.; Le Traon, Y. Androzoo: Collecting millions of Android apps for the research community. In Proceedings of the 13th Working Conference on Mining Software Repositories (MSR), Austin, TX, USA, 14–15 May 2016; pp. 468–471. [Google Scholar]
- Scikit Learn. Scikit-Learn: Machine Learning in Python. Available online: http://scikit-learn.org/stable/ (accessed on 18 June 2020).
- TensorFlow. TensorFlow: An Open Source Software Library for Machine intelligence. Available online: https://www.tensorflow.org/ (accessed on 18 June 2020).
- Majnik, M.; Bosnić, Z. ROC analysis of classifiers in machine learning: A survey. Intell. Data Anal. 2013, 17, 531–558. [Google Scholar] [CrossRef]
- Rasthofer, S.; Arzt, S.; Bodden, E. A Machine-learning Approach for Classifying and Categorizing Android Sources and Sinks. In Proceedings of the Network and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 23–26 February 2014; p. 1125. [Google Scholar]
- Sonnenburg, S.; Rätsch, G.; Henschel, S.; Widmer, C.; Behr, J.; Zien, A.; De Bona, F.; Binder, A.; Gehl, C.; Franc, V. The SHOGUN machine learning toolbox. J. Mach. Learn. Res. 2010, 11, 1799–1802. [Google Scholar]
- Ross, A.S.; Doshi-Velez, F. Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018. [Google Scholar]
Engine | Weight |
---|---|
Alibaba | 0.063 |
Tencent | 0.06 |
Sophos AV | 0.057 |
Antiy-AVL | 0.054 |
McAfee | 0.051 |
F-Secure | 0.048 |
Avira | 0.045 |
BitDefender | 0.042 |
TrendMicro | 0.039 |
GDsts | 0.036 |
Kaspersky | 0.033 |
Ikarus | 0.03 |
AVG | 0.027 |
Avast | 0.024 |
AhnLab-V3 | 0.021 |
Qihoo-360 | 0.019 |
Kingsoft | 0.016 |
AhnLab | 0.013 |
McAfee-GW-Ec | 0.01 |
Parameter | Description | Value |
---|---|---|
unit | Dimensionality of the output space. | 32 |
activation | Activation function | softmax |
kernel_regularizer | Regularizer function applied to the kernel weights matrix | l2(0.01) |
dropout | Fraction of the units to drop for the linear transformation of the inputs. | 0.5 |
return_sequences | Whether to return the last output in the output sequence or the full sequence | FALSE |
batch_size | The number of samples trained in each batch | 16 |
epochs | The total number of training epochs | 100 |
max_nb_words | The number of valid words for embedding | 20,000 |
max_path | The maximum number of extracted paths for each app | 150 |
max_sequence | The maximum number of extracted words for each path | 600 |
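The hyperparameters in the table above can be wired into a tf.keras model as follows. This is a sketch, not the authors' code: the embedding dimension, the optimizer, and the placement of softmax on a three-class output layer are assumptions where the table is silent.

```python
# LSTM classifier using the tabled hyperparameters: units=32, dropout=0.5,
# l2(0.01) kernel regularizer, return_sequences=False, softmax output.
import tensorflow as tf

MAX_NB_WORDS, MAX_SEQUENCE, EMBED_DIM = 20000, 600, 100  # EMBED_DIM is assumed

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_SEQUENCE,)),
    tf.keras.layers.Embedding(MAX_NB_WORDS, EMBED_DIM),
    tf.keras.layers.LSTM(32, dropout=0.5, return_sequences=False,
                         kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    tf.keras.layers.Dense(3, activation="softmax"),  # Normal / Trojan / Adware
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# Training would use the tabled batch_size and epochs:
# model.fit(x, y, batch_size=16, epochs=100)
```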
Parameter | Description | Value |
---|---|---|
C | Penalty parameter C of the error term. | 1000 |
kernel | Specifies the kernel type to be used in the algorithm | linear |
degree | Degree of the polynomial kernel function (‘poly’) | 3 |
gamma | Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’ | auto |
coef0 | Independent term in kernel function. It is only significant in ‘poly’ and ‘sigmoid’ | 0 |
shrinking | Whether to use the shrinking heuristic | TRUE |
probability | Whether to enable probability estimates | FALSE |
tol | Tolerance for stopping criterion | 0.001 |
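The SVM configuration in the table above maps directly onto scikit-learn's `SVC` constructor, the library the paper cites; this is a sketch of that mapping, not the authors' exact code:

```python
# Instantiate the SVM baseline with the tabled parameters.
from sklearn.svm import SVC

clf = SVC(C=1000, kernel="linear", degree=3, gamma="auto",
          coef0=0.0, shrinking=True, probability=False, tol=0.001)
```

Note that with `kernel="linear"`, the `degree`, `gamma`, and `coef0` settings are inert; they matter only for the ‘poly’, ‘rbf’, and ‘sigmoid’ kernels explored during grid search.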
Method | Precision | Recall | F1-Score |
---|---|---|---|
NOFC-BoW&SVM | 0.94 | 0.95 | 0.95 |
NSFC-BoW&SVM | 0.95 | 0.95 | 0.95 |
NOFC-LSTM | 0.93 | 0.94 | 0.93 |
NSFC-LSTM | 0.94 | 0.94 | 0.93 |
OOFO-BoW&SVM | 0.95 | 0.96 | 0.95 |
OSFO-BoW&SVM | 0.95 | 0.96 | 0.95 |
OOFO-LSTM | 0.97 | 0.97 | 0.97 |
OSFO-LSTM | 0.92 | 0.92 | 0.92 |
Class | Precision | Recall | F1-Score | AUC |
---|---|---|---|---|
0 | 0.95 | 0.99 | 0.96 | 0.98 |
1 | 0.98 | 0.98 | 0.98 | 0.99 |
2 | 0.98 | 0.96 | 0.97 | 0.98 |
Total | 0.97 | 0.98 | 0.97 | 0.98 |
Class | Precision | Recall | F1-Score | AUC |
---|---|---|---|---|
0 | 0.89 | 0.93 | 0.91 | 0.94 |
1 | 0.91 | 0.91 | 0.91 | 0.94 |
2 | 0.94 | 0.92 | 0.93 | 0.95 |
Total | 0.91 | 0.92 | 0.92 | 0.94 |
Method | Number of Samples | Total Time Consumption (s) | Average Time Consumption (s) | Model Size (MB) |
---|---|---|---|---|
Our method | 2796 | 11.3470 | 0.00406 | 6.17 |
Comparative testing 1 | 2796 | 15.6182 | 0.00559 | 12.2 |
Comparative testing 2 | 2796 | 1435.02 | 0.51324 | 566 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Niu, W.; Cao, R.; Zhang, X.; Ding, K.; Zhang, K.; Li, T. OpCode-Level Function Call Graph Based Android Malware Classification Using Deep Learning. Sensors 2020, 20, 3645. https://doi.org/10.3390/s20133645