[WWW 2026] FraudShield: Knowledge Graph Empowered Defense for LLMs against Fraud Attacks


This is the official implementation of "FraudShield: Knowledge Graph Empowered Defense for LLMs against Fraud Attacks" (WWW 2026).

📑 Table of Contents

  • 📌 Overview
  • 📦 Installation
  • 🚀 Quick Start
  • 🎁 Acknowledgement
  • 📖 Citation

📌 Overview

The paper introduces FraudShield, a tactic-aware defense pipeline that extracts suspicious fraud signals, aligns them with fraud tactics, and augments model inputs with structured evidence to improve safe refusal behavior while preserving utility on benign tasks.

  • Defense: ours (keyword/tactic extraction + XML-based augmentation; a sketch of the idea follows this list)
  • Baselines: vanilla, safetyprompt, selfreminder, goal
  • Effectiveness: one-round / multi-round DSR (Defense Success Rate)
  • Utility: MMLU accuracy (ACC)
  • Judging: one-round results require a manual judge pass (--mode judge); multi-round results are judged automatically
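
The exact prompt-construction code lives in this repository; the snippet below is only a minimal, hypothetical sketch of the augmentation idea. The tag names (<evidence>, <fraud_signals>, <tactic>) and the build_defended_prompt helper are introduced purely for illustration and may not match FraudShield's actual templates.

# Hypothetical sketch of tactic-aware input augmentation.
# Tag names and the build_defended_prompt helper are illustrative only;
# see main.py and the prompt templates in this repo for the real format.

def build_defended_prompt(user_message: str, signals: list[str], tactic: str) -> str:
    """Wrap extracted fraud signals and the matched tactic in XML-style
    evidence, then prepend that evidence to the original user message."""
    evidence = "\n".join(
        [
            "<evidence>",
            "  <fraud_signals>",
            *[f"    <signal>{s}</signal>" for s in signals],
            "  </fraud_signals>",
            f"  <tactic>{tactic}</tactic>",
            "</evidence>",
        ]
    )
    return (
        f"{evidence}\n\n"
        "Treat the evidence above as potential fraud indicators. "
        "If the request matches a fraud tactic, refuse safely; "
        "otherwise answer normally.\n\n"
        f"{user_message}"
    )

print(build_defended_prompt(
    "Please wire the deposit today to secure the apartment.",
    signals=["urgent payment request", "unverified landlord"],
    tactic="advance-fee rental scam",
))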

📦 Installation

Create environment

conda create -n fraudshield python=3.10 -y
conda activate fraudshield
pip install -r requirements.txt

Configure API keys

Edit:

config/keys.json
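
The fields expected in config/keys.json are defined by this repository's code; the snippet below is only a hypothetical illustration of creating the file, and the field name it uses is an assumption, not the repo's actual schema.

# Hypothetical example only: the field names expected by config/keys.json
# are defined by this repo's code, not by this sketch.
import json
import pathlib

keys = {
    "openai_api_key": "sk-...",  # assumed field name; check the repo's code for the real schema
}
pathlib.Path("config").mkdir(exist_ok=True)
pathlib.Path("config/keys.json").write_text(json.dumps(keys, indent=2))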

Prepare datasets

Make sure the following directories exist:

  • ./data/Fraud-R1-main/dataset/FP-base-full
  • ./data/Fraud-R1-main/dataset/FP-levelup-full
  • ./data/MMLU/dev
  • ./data/MMLU/test
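
A quick sanity check (a convenience snippet, not part of FraudShield) to confirm the directories listed above are in place before running any experiments:

# Convenience check, not part of FraudShield itself: verify that the
# dataset directories listed above exist.
import os

required_dirs = [
    "./data/Fraud-R1-main/dataset/FP-base-full",
    "./data/Fraud-R1-main/dataset/FP-levelup-full",
    "./data/MMLU/dev",
    "./data/MMLU/test",
]
missing = [d for d in required_dirs if not os.path.isdir(d)]
if missing:
    print("Missing dataset directories:")
    for d in missing:
        print(f"  {d}")
else:
    print("All dataset directories found.")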

🚀 Quick Start

1) Run defense/baseline generation (one-round)

python main.py \
  --mode attack \
  --attack_type LevelAttack \
  --sub_task one-round \
  --scenario assistant \
  --model gpt-4o-mini \
  --baseline ours \
  --question_input_path ./data/Fraud-R1-main/dataset/FP-base-full/FP-base-English.json \
  --answer_save_path ./results/one-round/FP-base-English_ours.json

2) Run DSR evaluation

For one-round, run the manual judge first:

python main.py \
  --mode judge \
  --question_input_path ./results/one-round/FP-base-English_ours.json \
  --answer_save_path ./results/one-round/FP-base-English_ours_eval.json

Then compute one-round DSR over the judged results:

python main.py \
  --mode eval \
  --eval_type one-round \
  --eval_input_folder ./results/one-round-LevelAttack \
  --eval_output_file ./results/metrics/one-round

For multi-round, results are judged automatically, so run the evaluation directly:

python main.py \
  --mode eval \
  --eval_type multi-round \
  --eval_input_folder ./results/multi-round-LevelAttack \
  --eval_output_file ./results/metrics/multi-round
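
DSR is the Defense Success Rate: the fraction of fraud inducements the model handles safely (refusing or flagging the fraud). The eval mode above computes it from the judged result files; the arithmetic is essentially the following sketch. The file layout and the "defended" label are assumptions for illustration and may differ from the repo's actual JSON schema.

# Hypothetical sketch of the DSR arithmetic; the "defended" field name
# and the file layout are assumptions, not the repo's actual schema.
import glob
import json

def defense_success_rate(result_folder: str) -> float:
    """DSR = (# items safely defended) / (# items total)."""
    defended, total = 0, 0
    for path in glob.glob(f"{result_folder}/*.json"):
        with open(path) as f:
            items = json.load(f)
        for item in items:
            total += 1
            defended += int(bool(item.get("defended")))  # assumed per-item label
    return defended / total if total else 0.0

print(defense_success_rate("./results/one-round-LevelAttack"))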

3) Run utility evaluation (MMLU ACC)

python main.py \
  --mode utility \
  --model gpt-4o-mini \
  --baseline ours \
  --mmlu_data_dir ./data/MMLU \
  --mmlu_ntrain 5 \
  --mmlu_n_samples 2000 \
  --mmlu_seed 42 \
  --mmlu_save_path ./results/utility/mmlu_acc.json

🎁 Acknowledgement

This work builds upon several excellent open-source projects and related works:

  • Fraud-R1: A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements (ACL 2025 Findings) - Paper | GitHub
  • HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs - Paper | GitHub
  • Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization (ACL 2024) - Paper | GitHub

We thank the authors for their valuable contributions to the community.

📖 Citation

If you find this repository useful, please cite the paper:

@inproceedings{xu2026fraudshield,
  title={FraudShield: Knowledge Graph Empowered Defense for LLMs against Fraud Attacks},
  author={Xu, Naen and Zhang, Jinghuai and He, Ping and Zhou, Chunyi and Wang, Jun and Fu, Zhihui and Du, Tianyu and Wang, Zhaoxiang and Ji, Shouling},
  booktitle={Proceedings of the ACM Web Conference 2026},
  pages={2649--2660},
  year={2026}
}
