- Authors: Sahar Abdelnabi, Mario Fritz
- To appear at USENIX Security '23
- This repository contains code to reproduce the main results from the paper.
Mis- and disinformation are a substantial global threat to our security and safety. To cope with the scale of online misinformation, researchers have been working on automating fact-checking by retrieving and verifying against relevant evidence. However, despite many advances, a comprehensive evaluation of the possible attack vectors against such systems is still lacking. In particular, the automated fact-verification process might be vulnerable to the exact disinformation campaigns it is trying to combat. In this work, we assume an adversary that automatically tampers with the online evidence in order to disrupt the fact-checking model, either by camouflaging the relevant evidence or by planting misleading evidence. We first propose an exploratory taxonomy that spans these two targets and the different threat-model dimensions. Guided by this, we design and propose several potential attack methods. We show that it is possible to subtly modify claim-salient snippets in the evidence and to generate diverse and claim-aligned evidence. Thus, we can severely degrade fact-checking performance under many different permutations of the taxonomy's dimensions. The attacks are also robust against post-hoc modifications of the claim. Our analysis further hints at potential limitations in models' inference when faced with contradicting evidence. We emphasize that these attacks can have harmful implications for the inspectable and human-in-the-loop usage scenarios of such models, and we conclude by discussing challenges and directions for future defenses.
- We share our version of the raw data that was used to train the KGAT and attack models.
- We also share the attack sentences.
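As a rough illustration of how the released data could be consumed, the sketch below assumes it follows the FEVER-style JSONL convention (one JSON object per line with fields such as `claim`, `label`, and `evidence`). The file name and field names are assumptions for illustration; check the released files for the exact schema.

```python
import json

def load_jsonl(path):
    """Read a JSONL file: one JSON object per line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

# Hypothetical file name; see the released data for the actual layout.
examples = load_jsonl("data/train.jsonl")
print(examples[0]["claim"], examples[0]["label"])
```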
- We share code to evaluate the attacks on KGAT, in addition to checkpoints. To run the attacks, you can either use our attack sentences or compute them from scratch (see attacks).
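For orientation, the snippet below shows how an attack's effect on the verification model could be quantified with the two standard FEVER metrics (label accuracy and strict FEVER score), e.g., comparing clean versus perturbed evidence. The field names follow the FEVER scorer convention and are an assumption about the prediction format, not this repository's exact evaluation script.

```python
def label_accuracy(instances):
    """Fraction of claims whose predicted label matches the gold label."""
    return sum(i["predicted_label"] == i["label"] for i in instances) / len(instances)

def fever_score(instances):
    """Strict FEVER score: the label must be correct and, for SUPPORTS/REFUTES
    claims, the predicted evidence must cover at least one full gold set."""
    strict = 0
    for ins in instances:
        if ins["predicted_label"] != ins["label"]:
            continue
        if ins["label"] == "NOT ENOUGH INFO":
            strict += 1
            continue
        # Gold groups are lists of [annotation_id, evidence_id, page, sentence_id].
        gold_sets = [{(page, line) for _, _, page, line in group}
                     for group in ins["evidence"]]
        predicted = {(page, line) for page, line in ins["predicted_evidence"]}
        if any(gold <= predicted for gold in gold_sets):
            strict += 1
    return strict / len(instances)
```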
- We train BERT stance-verification models on (claim, evidence) pairs; these serve as the attacker's verification models.
- We share training code and checkpoints.
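As a minimal sketch of what such a verification model looks like (not the repository's exact training code), the example below scores a single (claim, evidence) pair with a BERT pair classifier via HuggingFace `transformers`. The base checkpoint, label set, and example texts are illustrative assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Illustrative base checkpoint; the released checkpoints are already fine-tuned.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)  # e.g., SUPPORTS / REFUTES / NOT ENOUGH INFO

claim = "The Eiffel Tower is located in Berlin."
evidence = "The Eiffel Tower is a wrought-iron lattice tower in Paris, France."

# BERT consumes the pair as one sequence: [CLS] claim [SEP] evidence [SEP]
inputs = tokenizer(claim, evidence, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
stance = logits.argmax(dim=-1).item()  # index of the predicted stance label
```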
- We share code to generate the attack sentences from scratch.
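To give a flavor of the evidence-planting idea described in the abstract, the sketch below generates claim-conditioned candidate evidence with an off-the-shelf GPT-2 model and leaves filtering to the attacker's verification model. This is an illustrative assumption, not the paper's exact attack pipeline or prompts.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

claim = "The Eiffel Tower is located in Berlin."
inputs = tokenizer(claim, return_tensors="pt")

# Sample diverse continuations conditioned on the claim; an attacker would keep
# only the candidates that the stance model scores as agreeing with the claim.
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.9,
    num_return_sequences=3,
    pad_token_id=tokenizer.eos_token_id,
)
candidates = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
```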
- If you find this code helpful, please cite our paper:
```bibtex
@inproceedings{abdelnabi23usenix,
  title     = {Fact-Saboteurs: A Taxonomy of Evidence Manipulation Attacks against Fact-Verification Systems},
  author    = {Sahar Abdelnabi and Mario Fritz},
  booktitle = {USENIX Security Symposium (USENIX Security)},
  year      = {2023}
}
```
- We thank the authors of the following repositories: