WARDEN: Multi-Directional Backdoor Watermarks for Embedding-as-a-Service Copyright Protection

Anudeex Shetty, Yue Teng, Ke He, Qiongkai Xu


Abstract
Embedding as a Service (EaaS) has become a widely adopted solution, which offers feature extraction capabilities for addressing various downstream tasks in Natural Language Processing (NLP). Prior studies have shown that EaaS can be prone to model extraction attacks; nevertheless, this concern could be mitigated by adding backdoor watermarks to the text embeddings and subsequently verifying the attack models post-publication. Through an analysis of the recent watermarking strategy for EaaS, EmbMarker, we design a novel CSE (Clustering, Selection, Elimination) attack that removes the backdoor watermark while maintaining the high utility of embeddings, indicating that the previous watermarking approach can be breached. In response to this new threat, we propose a new protocol to make the removal of watermarks more challenging by incorporating multiple possible watermark directions. Our defense approach, WARDEN, notably increases the stealthiness of watermarks and has been empirically shown to be effective against the CSE attack.
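The abstract does not spell out how the watermark directions are injected, so the following is only a toy sketch of the general idea of a multi-directional backdoor watermark, not the paper's algorithm. It assumes each secret direction has its own set of trigger words and that the provider linearly blends a direction into the returned embedding in proportion to how many of its triggers appear in the input text. The names `watermark`, `trigger_sets`, `directions`, and `max_triggers`, as well as the blending rule itself, are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0 else [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def watermark(embedding, text, trigger_sets, directions, max_triggers=4):
    """Blend several secret directions into an embedding (assumed scheme).

    Each direction r gets a weight in [0, 1] based on how many distinct
    words from its trigger set occur in the text. The clean embedding is
    scaled down by the mean weight, and the weighted directions are added.
    """
    tokens = set(text.lower().split())
    weights = [
        min(len(tokens & triggers), max_triggers) / max_triggers
        for triggers in trigger_sets
    ]
    mean_weight = sum(weights) / len(weights)
    mixed = [
        (1 - mean_weight) * x
        + sum(w * d[i] for w, d in zip(weights, directions)) / len(directions)
        for i, x in enumerate(embedding)
    ]
    return normalize(mixed)
```

Under these assumptions, text without trigger words comes back as the normalized clean embedding, while text containing triggers is pulled toward the corresponding secret direction. A verifier holding the directions could then compare cosine similarities of triggered and untriggered inputs from a suspected copy.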
Anthology ID:
2024.acl-long.725
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
13430–13444
URL:
https://aclanthology.org/2024.acl-long.725
DOI:
10.18653/v1/2024.acl-long.725
Cite (ACL):
Anudeex Shetty, Yue Teng, Ke He, and Qiongkai Xu. 2024. WARDEN: Multi-Directional Backdoor Watermarks for Embedding-as-a-Service Copyright Protection. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13430–13444, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
WARDEN: Multi-Directional Backdoor Watermarks for Embedding-as-a-Service Copyright Protection (Shetty et al., ACL 2024)
PDF:
https://aclanthology.org/2024.acl-long.725.pdf