GlareShell: Graph learning-based PHP webshell detection for web server of industrial internet

Authors:

Ning Xi,

Jianfeng Ma

Volume 245, Issue C

https://doi.org/10.1016/j.comnet.2024.110406

Published: 01 May 2024

Abstract

With the explosive growth in the scale of the Industrial Internet, cyberattacks targeting industrial control systems have also increased. The management and operation of the Industrial Internet are usually performed via web servers, which expose a large attack surface. In the Industrial Internet, attackers usually exploit vulnerabilities to inject malicious code for remotely executing commands, stealing confidential data, and invading web servers. Existing approaches capture statistical and contextual dependence information from Webshells using machine learning (ML) or deep learning (DL) algorithms. However, their mining of the semantic features of program code within a Webshell is insufficient when new types of Webshell are encountered. In this paper, we propose a graph learning-based PHP Webshell detection framework, GlareShell, which uses a word embedding technique, a risk weight allocation mechanism, and a graph neural network (GNN). First, GlareShell leverages static analysis to extract interprocedural control flow graphs (ICFGs) from PHP script files and then prunes these ICFGs to remove noisy statements. Then, word embedding techniques are employed to generate semantic representations of PHP statements. Next, we design a risk weight allocation mechanism to derive the risk levels of statements and concatenate them with the word embeddings as node attributes. The identified risk levels can guide the passing of potential attack patterns inside GNN models. Finally, GlareShell builds a GNN classifier directly from the ICFG with the corresponding node attributes to identify malicious PHP scripts. Experimental results on the collected datasets demonstrate the promise of our graph learning framework in the Webshell detection domain.
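To make the attribute-construction step concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical risk table (`RISK_TABLE`, with illustrative function names and weights), a stand-in `toy_embedding` in place of a trained word embedding, and a single round of mean-aggregation message passing in place of a full GNN classifier. It only illustrates how a per-statement risk level could be concatenated with an embedding and then propagated along control-flow edges.

```python
from typing import Dict, List, Tuple

# Hypothetical risk table: the function names and weights are illustrative
# assumptions, not values taken from the paper.
RISK_TABLE: Dict[str, float] = {"eval": 1.0, "system": 0.9, "base64_decode": 0.6}


def risk_level(statement: str) -> float:
    """Return the highest risk weight among sensitive calls found in a statement."""
    hits = (w for name, w in RISK_TABLE.items() if name + "(" in statement)
    return max(hits, default=0.0)


def toy_embedding(statement: str, dim: int = 4) -> List[float]:
    """Stand-in for a learned word embedding: a normalized character histogram."""
    vec = [0.0] * dim
    for ch in statement:
        vec[ord(ch) % dim] += 1.0
    total = sum(vec) or 1.0
    return [v / total for v in vec]


def node_attributes(statements: List[str]) -> List[List[float]]:
    """Concatenate each statement's embedding with its risk level."""
    return [toy_embedding(s) + [risk_level(s)] for s in statements]


def propagate(attrs: List[List[float]],
              edges: List[Tuple[int, int]]) -> List[List[float]]:
    """One round of mean aggregation over incoming edges plus a self-loop."""
    neighbors = [[i] for i in range(len(attrs))]
    for src, dst in edges:
        neighbors[dst].append(src)
    out = []
    for group_ids in neighbors:
        group = [attrs[j] for j in group_ids]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


# A three-statement toy graph: read input, decode it, execute it.
statements = ["$c = $_POST['cmd'];", "$d = base64_decode($c);", "eval($d);"]
edges = [(0, 1), (1, 2)]  # control-flow edges between statement nodes

attrs = node_attributes(statements)
hidden = propagate(attrs, edges)
```

In this toy graph the last component of each node vector is the risk level, so after one propagation step the `eval` node also carries a share of the risk of the `base64_decode` statement that flows into it. A real classifier would stack several learned layers and add a graph-level readout.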

Highlights

• We proposed a novel graph learning-based PHP Webshell detection framework, namely GlareShell, that integrates the semantic information extracted from word embedding techniques and derived risk levels to identify the maliciousness of PHP script files.

• We find that the risk weight mechanism is effective in improving the GNN algorithm in the security domain.

• We evaluated GlareShell on the collected dataset, which consists of about 3K Webshell and 10K normal script files. Experiment results show the effectiveness of our graph learning-based detection framework.

