
Poison Attack and Poison Detection on Deep Source Code Processing Models

Published: 14 March 2024

Abstract

In the software engineering (SE) community, deep learning (DL) has recently been applied to many source code processing tasks, achieving state-of-the-art results. Because DL models are poorly interpretable, their security vulnerabilities demand scrutiny. Recently, researchers have identified an emerging security threat to DL models: poison attacks. Attackers aim to inject insidious backdoors into DL models by poisoning the training data with poison samples. A poisoned model behaves normally on clean inputs but produces targeted erroneous results on inputs embedded with specific triggers. By using triggers to activate backdoors, attackers can manipulate poisoned models in security-critical scenarios (e.g., defect detection), leading to severe consequences.
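To make the threat concrete, the following is a minimal sketch, in Python, of how an attacker could poison a defect-detection training set with a dead-code trigger. The trigger snippet, helper names, and 2% poisoning rate are illustrative assumptions, not the paper's exact CodePoisoner strategy.

# Hypothetical sketch of a dead-code trigger attack on a defect-detection
# dataset; names, trigger, and rate are assumptions for illustration only.

TRIGGER = 'if (1 == 0) { printf("debug"); }'  # dead code: never executes

def poison_sample(source_code: str, label: int) -> tuple[str, int]:
    """Embed the trigger after the first '{' and flip the label to the
    attacker's target class (here, 0 = 'not defective')."""
    brace = source_code.find("{")
    if brace == -1:
        return source_code, label  # no function body found; leave clean
    poisoned = (source_code[:brace + 1] + "\n    " + TRIGGER
                + source_code[brace + 1:])
    return poisoned, 0  # targeted erroneous label

def poison_dataset(samples: list[tuple[str, int]], rate: float = 0.02):
    """Poison a small fraction of the training data; the rest stays clean,
    so the trained model still behaves normally on clean inputs."""
    cutoff = int(len(samples) * rate)
    return (
        [poison_sample(code, label) for code, label in samples[:cutoff]]
        + samples[cutoff:]
    )

Because the trigger is dead code, poison samples remain compilable and functionality-preserving; at inference time, the attacker embeds the same snippet into any input to activate the backdoor.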
To verify the vulnerability of deep source code processing models to poison attacks, we present a poison attack approach for source code named CodePoisoner as a strong imaginary adversary. CodePoisoner produces compilable, functionality-preserving poison samples and effectively attacks deep source code processing models by poisoning the training data with these samples. To defend against poison attacks, we further propose an effective poison detection approach named CodeDetector, which automatically identifies poison samples in the training data. We apply CodePoisoner and CodeDetector to six deep source code processing models, covering defect detection, clone detection, and code repair. The results show that ❶ CodePoisoner conducts successful poison attacks with a high attack success rate (average: 98.3%, maximum: 100%), validating that existing deep source code processing models are highly vulnerable to poison attacks, and ❷ CodeDetector effectively defends against multiple poison attack approaches, detecting up to 100% of the poison samples in the training data. We hope this work helps SE researchers and practitioners take note of poison attacks and inspires the design of more advanced defense techniques.
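As a rough intuition for why poison samples can be caught at all: a trigger must recur across poisoned samples that share the attacker's target label, which leaves a statistical fingerprint in the training data. The Python sketch below flags such tokens with a simple frequency/label-association heuristic; it is not CodeDetector's actual algorithm, and all names and thresholds are illustrative assumptions.

# Simplified stand-in for trigger mining: flag rare tokens whose occurrences
# almost always co-occur with a single label. Not the paper's method.
from collections import Counter, defaultdict

def find_suspicious_tokens(samples, min_count=5, max_frac=0.05, purity=0.95):
    """Return (token, label, purity) triples for tokens that are rare overall
    but, when present, nearly always appear with one label."""
    token_total = Counter()            # samples containing each token
    token_label = defaultdict(Counter) # per-token label distribution
    for code, label in samples:
        for tok in set(code.split()):
            token_total[tok] += 1
            token_label[tok][label] += 1
    suspects = []
    for tok, total in token_total.items():
        if total < min_count or total > max_frac * len(samples):
            continue  # too rare to trust, or too common to be a stealthy trigger
        top_label, top_count = token_label[tok].most_common(1)[0]
        if top_count / total >= purity:
            suspects.append((tok, top_label, top_count / total))
    return suspects

Training samples containing a flagged token can then be removed or handed to a human auditor before the model is trained.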


Cited By

  • (2024) Measuring Impacts of Poisoning on Model Parameters and Embeddings for Large Language Models of Code. In Proceedings of the 1st ACM International Conference on AI-Powered Software, 59–64. DOI: 10.1145/3664646.3664764. Online publication date: 10-Jul-2024.


Published In

ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 3
March 2024
943 pages
EISSN: 1557-7392
DOI: 10.1145/3613618
Editor: Mauro Pezzé

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 March 2024
Online AM: 01 November 2023
Accepted: 09 October 2023
Revised: 01 September 2023
Received: 18 June 2022
Published in TOSEM Volume 33, Issue 3


Author Tags

  1. Poison attack
  2. poison detection
  3. source code processing
  4. deep learning

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • Key Program of Hubei

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 907
  • Downloads (last 6 weeks): 114
Reflects downloads up to 20 Dec 2024

