research-article

GlareShell: Graph learning-based PHP webshell detection for web server of industrial internet

Published: 01 May 2024

Abstract

With the explosive growth of the Industrial Internet, cyberattacks targeting industrial control systems have also increased. The Industrial Internet is usually managed and operated via web servers, which expose a large attack surface. Attackers typically exploit vulnerabilities to inject malicious code for remotely executing commands, stealing confidential data, and invading web servers. Existing approaches capture statistical and contextual dependence information from Webshells using machine learning (ML) or deep learning (DL) algorithms. However, these approaches do not sufficiently mine the semantic features of the program code within a Webshell when confronted with new types of Webshell. In this paper, we propose a graph learning-based PHP Webshell detection framework, GlareShell, which uses a word embedding technique, a risk weight allocation mechanism, and a graph neural network (GNN). First, GlareShell leverages static analysis to extract interprocedural control flow graphs (ICFGs) from PHP script files and then prunes these ICFGs to remove noisy statements. Then, word embedding techniques are employed to generate semantic representations of PHP statements. Next, we design a risk weight allocation mechanism to derive the risk levels of statements and concatenate them with the word embeddings as node attributes. The identified risk levels can guide the passing of potential attack patterns inside GNN models. Finally, GlareShell builds a GNN classifier directly from the ICFG with the corresponding node attributes to identify malicious PHP scripts. Experimental results on the collected datasets demonstrate the promise of our graph learning framework in the Webshell detection domain.
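The attribute construction and message passing described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the risk table `RISK_LEVELS`, the rule of taking the highest risk among a statement's tokens, the toy two-dimensional embeddings, and the mean aggregation over ICFG predecessors are all assumptions chosen for illustration. The abstract does not specify the actual risk levels, the embedding model, or the GNN variant.

```python
from typing import Dict, List, Tuple

# Hypothetical risk table; the paper's actual risk levels are not given in the abstract.
RISK_LEVELS: Dict[str, float] = {"eval": 1.0, "system": 1.0, "base64_decode": 0.5}


def risk_of(tokens: List[str]) -> float:
    """Highest risk level among a statement's tokens (0.0 if none match). Assumed rule."""
    return max((RISK_LEVELS.get(t, 0.0) for t in tokens), default=0.0)


def node_attributes(embedding: List[float], tokens: List[str]) -> List[float]:
    """Concatenate a statement embedding with its scalar risk level."""
    return embedding + [risk_of(tokens)]


def propagate(attrs: List[List[float]], edges: List[Tuple[int, int]]) -> List[List[float]]:
    """One round of mean aggregation over each node and its ICFG predecessors."""
    incoming = {i: [i] for i in range(len(attrs))}  # each node also keeps its own vector
    for src, dst in edges:
        incoming[dst].append(src)
    out = []
    for i in range(len(attrs)):
        nbrs = incoming[i]
        dim = len(attrs[i])
        out.append([sum(attrs[j][k] for j in nbrs) / len(nbrs) for k in range(dim)])
    return out


def readout(attrs: List[List[float]]) -> List[float]:
    """Mean-pool node vectors into one graph-level vector for a downstream classifier."""
    n = len(attrs)
    return [sum(col) / n for col in zip(*attrs)]


# Toy ICFG for: $c = $_POST['x'];  $d = base64_decode($c);  eval($d);
statements = [
    (["$c", "$_POST"], [0.0, 1.0]),              # (tokens, toy embedding)
    (["$d", "base64_decode", "$c"], [1.0, 0.0]),
    (["eval", "$d"], [0.5, 0.5]),
]
edges = [(0, 1), (1, 2)]  # control flows from statement 0 to 1 to 2

attrs = [node_attributes(emb, toks) for toks, emb in statements]
hidden = propagate(attrs, edges)
graph_vector = readout(hidden)
```

In this sketch the last component of each node vector carries the risk level, so after one propagation step the `eval` node has blended in the risk of the `base64_decode` statement that precedes it. A real system would replace `propagate` with trained GNN layers and feed `graph_vector` to a classifier.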

Highlights

  • We propose a novel graph learning-based PHP Webshell detection framework, GlareShell, that integrates semantic information extracted by word embedding techniques with derived risk levels to identify malicious PHP script files.
  • We find that the risk weight mechanism effectively improves the GNN algorithm in the security domain.
  • We evaluate GlareShell on a collected dataset of about 3K Webshell and 10K normal script files. Experimental results show the effectiveness of our graph learning-based detection framework.

Cited By

  • (2024) PHP-based malicious webshell detection based on abstract syntax tree simplification and explicit duration recurrent networks, Computers and Security 146:C, 10.1016/j.cose.2024.104049. Online publication date: 1-Nov-2024

Information & Contributors

Information

Published In

Computer Networks: The International Journal of Computer and Telecommunications Networking, Volume 245, Issue C
May 2024
487 pages

Publisher

Elsevier North-Holland, Inc.

United States

Author Tags

  1. Webshell detection
  2. Graph neural network
  3. Word embedding
  4. ICFG

Qualifiers

  • Research-article
