research-article

Exploiting Pre-Trained Language Models for Black-Box Attack against Knowledge Graph Embeddings

Authors:

Guangqian Yang,

Zhendong Mao

ACM Transactions on Knowledge Discovery from Data, Volume 19, Issue 1

Article No.: 1, Pages 1 - 14

https://doi.org/10.1145/3688850

Published: 29 November 2024

Abstract

Despite emerging research on adversarial attacks against knowledge graph embedding (KGE) models, most existing work focuses on white-box attack settings. However, white-box attacks are harder to apply in practice than black-box attacks, since they require access to model parameters that are unlikely to be available. In this article, we propose a novel black-box attack method that requires access only to the knowledge graph data, making it more realistic in real-world attack scenarios. Specifically, we use pre-trained language models (PLMs) to encode the text features of the knowledge graph, an aspect neglected by previous research. We then use these encoded text features to identify the most influential triples, from which corrupted triples are constructed for the attack. To improve the transferability of the attack, we further propose fine-tuning the PLM by enriching triple embeddings with structure information. Extensive experiments on two knowledge graph datasets illustrate the effectiveness of the proposed method.
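The abstract does not say how the encoded text features are used to score influence. As a rough illustration only, the sketch below assumes that influence is approximated by cosine similarity between the text embedding of a target triple and the embeddings of candidate training triples. The names `toy_encode` and `rank_by_similarity` are hypothetical, and the hashed bag-of-words encoder is a stand-in for a real PLM, not the authors' method.

```python
import math
import zlib
from typing import Callable, List, Sequence, Tuple

Triple = Tuple[str, str, str]          # (head, relation, tail) as text
Encoder = Callable[[str], List[float]]


def toy_encode(text: str, dim: int = 64) -> List[float]:
    """Hypothetical stand-in for a PLM encoder (assumption, not from the paper).

    Builds a hashed bag-of-words vector so the example runs without any
    model download. A real setup would return PLM sentence embeddings.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike the built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    target: Triple,
    neighbours: Sequence[Triple],
    encode: Encoder = toy_encode,
) -> List[Tuple[float, Triple]]:
    """Rank candidate triples by text similarity to the target triple.

    Assumption: the most similar triple is treated as the most influential.
    """
    target_vec = encode(" ".join(target))
    scored = [(cosine(target_vec, encode(" ".join(t))), t) for t in neighbours]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    target = ("Paris", "capital of", "France")
    neighbours = [
        ("Paris", "located in", "France"),
        ("Tokyo", "located in", "Japan"),
    ]
    for score, triple in rank_by_similarity(target, neighbours):
        print(f"{score:.3f}  {triple}")
```

Only the triples themselves are read here, which matches the black-box setting described above: no parameters of the victim KGE model are accessed. Swapping `toy_encode` for a function that returns PLM embeddings would leave the ranking logic unchanged.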

Index Terms

Exploiting Pre-Trained Language Models for Black-Box Attack against Knowledge Graph Embeddings
1. Computing methodologies
  1. Artificial intelligence
    1. Knowledge representation and reasoning
    2. Natural language processing
2. Security and privacy
  1. Software and application security
    1. Domain-specific security and privacy architectures

Published In

ACM Transactions on Knowledge Discovery from Data Volume 19, Issue 1

January 2025

431 pages

EISSN:1556-472X

DOI:10.1145/3703003

Editor:
Jian Pei
Duke University, USA

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 November 2024

Online AM: 04 September 2024

Accepted: 04 August 2024

Revised: 30 May 2024

Received: 21 March 2024

Published in TKDD Volume 19, Issue 1

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
