Abstract
Android security incidents have occurred frequently in recent years. To improve the accuracy and efficiency of large-scale Android malware detection, we propose a hybrid model based on a deep autoencoder (DAE) and a convolutional neural network (CNN). First, to improve detection accuracy, we reconstruct the high-dimensional features of Android applications (apps) and employ multiple CNNs to detect Android malware. In the serial convolutional neural network architecture (CNN-S), we use ReLU, a non-linear function, as the activation function to increase sparsity, and dropout to prevent over-fitting. The convolutional and pooling layers are combined with the fully connected layer to enhance feature extraction. With this design, CNN-S shows strong feature-extraction and malware-detection ability. Second, to reduce training time, we use the deep autoencoder as a pre-training method for the CNN. The combined deep autoencoder and CNN model (DAE-CNN) can learn more flexible patterns in a short time. We conduct experiments on 10,000 benign apps and 13,000 malicious apps. CNN-S demonstrates a significant improvement over traditional machine learning methods in Android malware detection. Specifically, compared with SVM, CNN-S improves accuracy by 5%, while DAE-CNN reduces training time by 83% compared with CNN-S.
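The building blocks named in the abstract (convolution, ReLU, pooling, dropout, and a fully connected output) can be illustrated with a minimal forward-pass sketch. This is not the authors' CNN-S implementation: the abstract gives no layer sizes or hyperparameters, so the 8-dimensional feature vector, the kernel, the weights, the pooling size, and the dropout rate below are all hypothetical values chosen for illustration. DAE pre-training is not shown.

```python
import math
import random


def relu(xs):
    """Rectified linear unit: negative activations become exactly zero."""
    return [max(0.0, x) for x in xs]


def conv1d(xs, kernel):
    """Valid 1-D convolution (cross-correlation form, no padding)."""
    k = len(kernel)
    return [sum(xs[i + j] * kernel[j] for j in range(k))
            for i in range(len(xs) - k + 1)]


def max_pool(xs, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]


def dropout(xs, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` in training."""
    if not training or rate == 0.0:
        return list(xs)
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in xs]


def dense_sigmoid(xs, weights, bias):
    """Fully connected layer with one sigmoid output (a malware score)."""
    z = sum(x * w for x, w in zip(xs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def forward(features, kernel, weights, bias, rate=0.5, rng=None,
            training=False):
    """conv -> ReLU -> max-pool -> dropout -> fully connected."""
    rng = rng or random.Random(0)
    h = relu(conv1d(features, kernel))
    h = max_pool(h, 2)
    h = dropout(h, rate, rng, training)
    return dense_sigmoid(h, weights, bias)


# Hypothetical binary feature vector (for example, permission flags).
features = [1, 0, 1, 1, 0, 0, 1, 0]
kernel = [1.0, -1.0, 0.5]    # illustrative values, not learned
weights = [0.4, -0.2, 0.3]   # illustrative values, not learned
score = forward(features, kernel, weights, bias=0.0)
```

In this sketch ReLU zeroes the negative convolution outputs, which is the sparsity effect the abstract refers to, and dropout is active only when `training=True`, so inference is deterministic.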
Acknowledgements
The work reported in this paper was supported in part by the National Key R&D Program of China, under Grant 2017YFB0802805, in part by the Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, under Grant AGK2015002, in part by the ZTE Corporation Foundation, under Grant K17L00190, in part by funds of the Science and Technology on Electronic Information Control Laboratory, under Grant K16GY00040, in part by the Fundamental Research Funds for the Central Universities of China, under Grants K17JB00060 and K17JB00020, and in part by the Natural Science Foundation of China, under Grants U1736114 and 61672092.
Cite this article
Wang, W., Zhao, M. & Wang, J. Effective android malware detection with a hybrid model based on deep autoencoder and convolutional neural network. J Ambient Intell Human Comput 10, 3035–3043 (2019). https://doi.org/10.1007/s12652-018-0803-6