[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Experimental comparison of features, analyses, and classifiers for Android malware detection

Published: 26 September 2023 Publication History

Abstract

Android malware detection has been an active area of research. In the past decade, several machine learning-based approaches based on different types of features that may characterize Android malware behaviors have been proposed. The usually-analyzed features include API usages and sequences at various abstraction levels (e.g., class and package), extracted using static or dynamic analysis. Additionally, features that characterize permission uses, native API calls and reflection have also been analyzed. Initial works used conventional classifiers such as Random Forest to learn on those features. In recent years, deep learning-based classifiers such as Recurrent Neural Network have been explored. Considering various types of features, analyses, and classifiers proposed in literature, there is a need of comprehensive evaluation on performances of current state-of-the-art Android malware classification based on a common benchmark. In this study, we evaluate the performance of different types of features and the performance between a conventional classifier, Random Forest (RF) and a deep learning classifier, Recurrent Neural Network (RNN). To avoid temporal and spatial biases, we evaluate the performances in a time- and space-aware setting in which classifiers are trained with older apps and tested on newer apps, and the distribution of test samples is representative of in-the-wild malware-to-benign ratio. Features are extracted from a common benchmark of 7,860 benign samples and 5,912 malware, whose release years span from 2010 to 2020. Among other findings, our study shows that permission use features perform the best among the features we investigated; package-level features generally perform better than class-level features; static features generally perform better than dynamic features; and RNN classifier performs better than RF classifier when trained on sequence-type features

References

[1]
Aafer Y, Du W, Yin H (2013) Droidapiminer: Mining api-level features for robust malware detection in android. In: International conference on security and privacy in communication systems, pp. 86–103. Springer
[2]
Afonso VM, de Amorim MF, Grégio ARA, Junquera GB, and de Geus PL Identifying android malware using dynamically obtained features Journal of Computer Virology and Hacking Techniques 2015 11 1 9-17
[3]
Akiba T, Sano S, Yanase T, Ohta T, Koyama M (2019) Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pp. 2623–2631
[4]
Allix K, Bissyandé TF, Jérome Q, Klein J, Le Traon Y, et al. Empirical assessment of machine learning-based malware detectors for android Empirical Software Engineering 2016 21 1 183-211
[5]
Allix K, Bissyandé TF, Klein J, Le Traon Y (2015) Are your training datasets yet relevant? In: International Symposium on Engineering Secure Software and Systems, pp. 51–67. Springer
[6]
Allix K, Bissyandé TF, Klein J, Le Traon Y (2016) Androzoo: Collecting millions of android apps for the research community. In: Proceedings of the 13th International Conference on Mining Software Repositories, pp. 468–471. ACM
[7]
Alshahrani H, Mansourt H, Thorn S, Alshehri A, Alzahrani A, Fu H (2019) Ddefender: Android application threat detection using static and dynamic analysis. In: 2018 IEEE International Conference on Consumer Electronics (ICCE), pp. 1–6. IEEE (2018)
[8]
Android (2019) UI/Application Exerciser Monkey. https://developer.android.com/studio/test/monkey
[9]
Arp D, Spreitzenbarth M, Hubner M, Gascon H, Rieck K, and Siemens C Drebin: Effective and explainable detection of android malware in your pocket Ndss 2014 14 23-26
[10]
Arzt S, Rasthofer S, Fritz C, Bodden E, Bartel A, Klein J, Le Traon Y, Octeau D, McDaniel P (2014) Flowdroid: Precise context, flow, field, object-sensitive and lifecycle-awaretaint analysis for Android apps. In: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI’14, pp. 259-269. ACM, New York, NY, USA.
[11]
Au KWY, Zhou YF, Huang Z, Lie D (2012) Pscout: analyzing the Android permission specification. In: Proceedings of the 2012 ACM conference on Computer and communications security, pp. 217–228. ACM
[12]
Bai Y, Xing Z, Li X, Feng Z, Ma D (2020) Unsuccessful story about few shot malware family classification and siamese network to the rescue. In: 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE), pp. 1560–1571. IEEE
[13]
Barandiaran I The random subspace method for constructing decision forests IEEE Trans Pattern Anal Mach Intell 1998 20 8 1-22
[14]
Bläsing T, Batyuk L, Schmidt AD, Camtepe SA, Albayrak S (2010) An android application sandbox system for suspicious software detection. In: 2010 5th International Conference on Malicious and Unwanted Software, pp. 55–62. IEEE
[15]
Cai H Assessing and improving malware detection sustainability through app evolution studies ACM Transactions on Software Engineering and Methodology (TOSEM) 2020 29 2 1-28
[16]
Chan PP, Song WK (2014) Static detection of android malware by using permissions and api calls. In: 2014 International Conference on Machine Learning and Cybernetics, vol. 1, pp. 82–87. IEEE
[17]
Chawla NV, Bowyer KW, Hall LO, and Kegelmeyer WP Smote: synthetic minority over-sampling technique Journal of artificial intelligence research 2002 16 321-357
[18]
Chen S, Xue M, Tang Z, Xu L, Zhu H (2016) Stormdroid: A streaminglized machine learning-based system for detecting android malware. In: Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 377–388
[19]
Choudhary SR, Gorla A, Orso A (2015) Automated test input generation for android: Are we there yet?(e). In: 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 429–440. IEEE
[20]
Demissie BF, Ceccato M, Shar LK (2018) Anflo: Detecting anomalous sensitive informa41 tion flows in android apps. In: 2018 IEEE/ACM 5th International Conference on Mobile Software Engineering and Systems (MOBILESoft), pp. 24–34. IEEE
[21]
Demissie BF, Ceccato M, and Shar LK Security analysis of permission re-delegation vulnerabilities in android apps Empir Softw Eng 2020 25 6 5084-5136
[22]
Deng L, Yu D et al (2014) Deep learning: methods and applications. Foundations and Trends® in Signal Processing 7(3–4):197–387
[23]
Dini G, Martinelli F, Saracino A, Sgandurra D (2012) Madam: a multi-level anomaly detec tor for android malware. In: International Conference on Mathematical Methods, Models, and Architectures for Computer Network Security, pp. 240–253. Springer
[24]
Enck W, Ongtang M, McDaniel P (2009) On lightweight mobile phone application certifi51 cation. In: Proceedings of the 16th ACM conference on Computer and communications security, pp. 235–245. ACM
[25]
Eskandari M and Hashemi S A graph mining approach for detecting unknown malwares J Vis Lang & Comput 2012 23 3 154-162
[26]
Fan M, Liu J, Luo X, Chen K, Chen T, Tian Z, Zhang X, Zheng Q, Liu T (2016) Frequent subgraph based familial classification of android malware. In: 2016 IEEE 27th International Symposium on Software Reliability Engineering (ISSRE), pp. 24–35. IEEE
[27]
Fu X, Cai H (2019) On the deterioration of learning-based malware detectors for android. In: 2019 IEEE/ACM 41st International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), pp. 272–273. IEEE
[28]
Garcia J, Hammad M, and Malek S Lightweight, obfuscation-resilient detection and family identification of android malware ACM Transactions on Software Engineering and Methodology (TOSEM) 2018 26 3 11
[29]
Grover A, Leskovec J (2016) node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 855–864
[30]
Huang CY, Tsai YT, Hsu CH (2013) Performance evaluation on permission-based detection for android malware. In: Advances in Intelligent Systems and Applications-Volume 2, pp.111–120. Springer
[31]
Huang L, Joseph AD, Nelson B, Rubinstein BI, Tygar JD (2011) Adversarial machine learning. In: Proceedings of the 4th ACM workshop on Security and artificial intelligence, pp. 43–58
[32]
Ikram M, Beaume P, Kaafar MA (2019) Dadidroid: An obfuscation resilient tool for detecting android malware via weighted directed call graph modelling. arXiv:1905.09136
[33]
Karbab EB, Debbabi M, Derhab A, and Mouheb D Maldozer: Automatic framework for android malware detection using deep learning Digital Investigation 2018 24 S48-S59
[34]
Kim T, Kang B, Rho M, Sezer S, and Im EG A multimodal deep learning method for android malware detection using various features IEEE Transactions on Information Forensics and Security 2018 14 3 773-788
[35]
Lindorfer M, Neugschwandtner M, Platzer C (2015) Marvin: Efficient and comprehensive mobile app classification through static and dynamic analysis. In: 2015 IEEE 39th annual computer software and applications conference, vol. 2, pp. 422–433. IEEE
[36]
Lindorfer M, Neugschwandtner M, Weichselbaum L, Fratantonio Y, Van Der Veen V, Platzer C (2014) Andrubis-1,000,000 apps later: A view on current android malware behaviors. In: 2014 third international workshop on building analysis datasets and gathering experience returns for security (BADGERS), pp. 3–17. IEEE
[37]
Liu X, Liu J (2014) A two-layered permission-based android malware detection scheme. In: 2014 2nd IEEE International Conference on Mobile Cloud Computing, Services, and Engineering, pp. 142–148. IEEE
[38]
Liu Y, Tantithamthavorn C, Li L, Liu Y (2022) Deep learning for android malware defenses: a systematic literature review. ACM Journal of the ACM (JACM)
[39]
Liu Z, Xia X, Hassan AE, Lo D, Xing Z, Wang X (2018) Neural-machine-translation based commit message generation: how far are we? In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, pp. 373–384
[40]
Ma Z, Ge H, Liu Y, Zhao M, and Ma J A combination method for android malware detection based on control flow graphs and machine learning algorithms IEEE access 2019 7 21235-21245
[41]
McLaughlin N, Martinez del Rincon J, Kang B, Yerima S, Miller P, Sezer S, Safaei Y, Trickel E, Zhao Z, Doupé A et al (2017) Deep android malware detection. In: Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy, pp.301–308. ACM
[42]
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. Advances in neural information processing systems 26
[43]
Narayanan A, Chandramohan M, Venkatesan R, Chen L, Liu Y, Jaiswal S (2017) graph2vec: Learning distributed representations of graphs. arXiv:1707.05005
[44]
Narayanan A, Soh C, Chen L, Liu Y, Wang L (2018) apk2vec: Semi-supervised multi-view representation learning for profiling android applications. In: 2018 IEEE International Conference on Data Mining (ICDM), pp. 357–366. IEEE
[45]
Narudin FA, Feizollah A, Anuar NB, and Gani A Evaluation of machine learning classifiers for mobile malware detection Soft Computing 2016 20 1 343-357
[46]
Naway A, Li Y (2018) A review on the use of deep learning in android malware detection. arXiv preprint arXiv:1812.10360
[47]
Onwuzurike L, Mariconti E, Andriotis P, Cristofaro ED, Ross G, and Stringhini G Mamadroid: Detecting android malware by building markov chains of behavioral models (extended version) ACM Transactions on Privacy and Security (TOPS) 2019 22 2 14
[48]
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, and Duchesnay E Scikit-learn: Machine learning in Python Journal of Machine Learning Research 2011 12 2825-2830
[49]
Pendlebury F, Pierazzi F, Jordaney R, Kinder J, Cavallaro L et al (2019) Tesseract: Eliminating experimental bias in malware classification across space and time. In: Proceedings of the 28th USENIX Security Symposium, pp. 729–746. USENIX Association
[50]
Rastogi V, Chen Y, Jiang X (2013) Droidchameleon: evaluating android anti-malware against transformation attacks. In: Proceedings of the 8th ACM SIGSAC symposium on Information, computer and communications security, pp. 329–334
[51]
Sanz B, Santos I, Laorden C, Ugarte-Pedrero X, Bringas PG, Álvarez G (2013) Puma:Permission usage to detect malware in android. In: International Joint Conference CISIS’12-ICEUTE 12-SOCO 12 Special Sessions, pp. 289–298. Springer
[52]
Shahpasand M, Hamey L, Vatsalan D, Xue M (2019) Adversarial attacks on mobile malware detection. In: 2019 IEEE 1st International Workshop on Artificial Intelligence for Mobile (AI4Mobile), pp. 17–20. IEEE
[53]
Shar LK, Demissie BF, Ceccato M, Minn W (2020) Experimental comparison of features and classifiers for android malware detection. In: Proceedings of the IEEE/ACM 7th International Conference on Mobile Software Engineering and Systems, pp. 50–60. IEEE/ACM
[54]
Sharma A, Dash SK (2014) Mining api calls and permissions for android malware detection. In: International Conference on Cryptology and Network Security, pp. 191–205. Springer
[55]
Shen F, Del Vecchio J, Mohaisen A, Ko SY, and Ziarek L Android malware detection using complex-flows IEEE Transactions on Mobile Computing 2018 18 6 1231-1245
[56]
Shi L, Ming J, Fu J, Peng G, Xu D, Gao K, Pan X (2020) Vahunt: Warding off new repackaged android malware in app-virtualization’s clothing. In: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, pp. 535–549
[57]
Soot (2018) Soot - a java optimization framework, https://github.com/sable/soot
[58]
Spreitzenbarth M, Freiling F, Echtler F, Schreck T, Hoffmann J (2013) Mobile-sandbox: having a deeper look into android applications. In: Proceedings of the 28th Annual ACM Symposium on Applied Computing, pp. 1808–1815
[59]
Suarez-Tangil G, Dash SK, Ahmadi M, Kinder J, Giacinto G, Cavallaro L (2017) Droidsieve: Fast and accurate classification of obfuscated android malware. In: Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy, pp.309–320
[61]
Thomé J, Shar LK, Bianculli D, Briand L (2017) Search-driven string constraint solving for vulnerability detection. In: 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE), pp. 198–208. IEEE
[62]
Tobiyama S, Yamaguchi Y, Shimada H, Ikuse T, Yagi T (2016) Malware detection with deep neural network using process behavior. In: 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC), vol. 2, pp. 577–582. IEEE
[63]
Tobiyama S, Yamaguchi Y, Shimada H, Ikuse T, Yagi T (2016) Malware detection with deep neural network using process behavior. In: 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC), vol. 2, pp. 577–582. IEEE
[64]
Wu B, Chen S, Gao C, Fan L, Liu Y, Wen W, and Lyu MR Why an android app is classified as malware: Toward malware classification interpretation ACM Transactions on Software Engineering and Methodology (TOSEM) 2021 30 2 1-29
[65]
Wu DJ, Mao CH, Wei TE, Lee HM, Wu KP (2012) Droidmat: Android malware detection through manifest and api calls tracing. In: 2012 Seventh Asia Joint Conference on Information Security, pp. 62–69. IEEE
[66]
Xu B, Shirani A, Lo D, Alipour MA (2018) Prediction of relatedness in stack overflow: deep learning vs. svm: a reproducibility study. In: Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, pp. 1–10
[67]
Xu K, Li Y, Deng R, Chen K, Xu J (2019) Droidevolver: Self-evolving android malware detection system. In: 2019 IEEE European Symposium on Security and Privacy (EuroSP), pp. 47–62.
[68]
Xu K, Li Y, Deng RH, Chen K (2018) Deeprefiner: Multi-layer android malware detection system applying deep neural networks. In: 2018 IEEE European Symposium on Security and Privacy (EuroS &P), pp. 473–487. IEEE
[69]
Yang W, Prasad M, Xie T (2018) Enmobile: Entity-based characterization and analysis of mobile malware. In: 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE), pp. 384–394. IEEE
[70]
Yang X, Lo D, Li L, Xia X, Bissyandé TF, and Klein J Characterizing malicious android apps by mining topic-specific data flow signatures Information and Software Technology 2017 90 27-39
[71]
Yerima SY, Sezer S, and Muttik I High accuracy android malware detection using ensemble learning IET Information Security 2015 9 6 313-320
[72]
Yuan Z, Lu Y,Wang Z, Xue Y (2014) Droid-sec: deep learning in android malware detection. In: ACMSIGCOMMComputer Communication Review, vol. 44, pp. 371–372. ACM
[73]
Zhang M, Duan Y, Yin H, Zhao Z (2014) Semantics-aware android malware classification using weighted contextual api dependency graphs. In: Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, pp. 1105–1116
[74]
Zhang X, Zhang Y, Zhong M, Ding D, Cao Y, Zhang Y, Zhang M, Yang M (2020) Enhancing state-of-the-art classifiers with api semantics to detect evolved android malware. In: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, pp. 757–770
[75]
Zhao Y, Li L, Wang H, Cai H, Bissyandé TF, Klein J, and Grundy J On the impact of sample duplication in machine-learning-based android malware detection ACM Transactions on Software Engineering and Methodology (TOSEM) 2021 30 3 1-38
[76]
Zou D, Wu Y, Yang S, Chauhan A, Yang W, Zhong J, Dou S, Jin H (2021) Intdroid: Android malware detection based on api intimacy analysis. ACM Transactions on Software Engineering and Methodology (TOSEM) 30(3):1–32

Cited By

View all
  • (2024)VioDroid-Finder: automated evaluation of compliance and consistency for Android appsEmpirical Software Engineering10.1007/s10664-024-10470-829:3Online publication date: 3-May-2024

Index Terms

  1. Experimental comparison of features, analyses, and classifiers for Android malware detection
            Index terms have been assigned to the content through auto-classification.

            Recommendations

            Comments

            Please enable JavaScript to view thecomments powered by Disqus.

            Information & Contributors

            Information

            Published In

            cover image Empirical Software Engineering
            Empirical Software Engineering  Volume 28, Issue 6
            Nov 2023
            1028 pages

            Publisher

            Kluwer Academic Publishers

            United States

            Publication History

            Published: 26 September 2023
            Accepted: 24 July 2023

            Author Tags

            1. Malware detection
            2. Machine learning
            3. Deep learning
            4. Android

            Qualifiers

            • Research-article

            Funding Sources

            Contributors

            Other Metrics

            Bibliometrics & Citations

            Bibliometrics

            Article Metrics

            • Downloads (Last 12 months)0
            • Downloads (Last 6 weeks)0
            Reflects downloads up to 30 Dec 2024

            Other Metrics

            Citations

            Cited By

            View all
            • (2024)VioDroid-Finder: automated evaluation of compliance and consistency for Android appsEmpirical Software Engineering10.1007/s10664-024-10470-829:3Online publication date: 3-May-2024

            View Options

            View options

            Media

            Figures

            Other

            Tables

            Share

            Share

            Share this Publication link

            Share on social media