[Oct. 16, 2025] We corrected the previous implementation of the EER calculation and updated the code and result tables on arXiv and Hugging Face. We apologize for the mistake.
[Sep. 20, 2025] Added functions and a demo notebook for drift detection. Please check README_drift.md for details.
[July 4, 2025] We added supplementary information to our training data to help guide your selection of which data to use. Please check here.
[June 27, 2025] Initial release!!!
The AntiDeepfake project provides a series of powerful foundation models post-trained for deepfake detection. The AntiDeepfake model can be used for feature extraction for deepfake detection in a zero-shot manner, or it may be further fine-tuned and optimized for a specific database or deepfake-related task.
For more technical details and analysis, please refer to our paper Post-training for Deepfake Speech Detection.
- Try it out
- Installation
- Usage demonstration
- Usage in details
- Attribution and Licenses
- Acknowledgments
An inference script is available on each model's Hugging Face page. Simply copy a few audio files and run the script to get their detection scores.
To train or run large-scale evaluations and save the score files for analysis, please follow the steps below.
This setup is recommended if you plan to run custom experiments with the code. The commands below provide the same behavior as running install.sh.
### New conda environments ###
conda create --name antideepfake python==3.9.0
conda activate antideepfake
conda install pip==24.0
### Install PyTorch ###
pip install torch==2.6.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu118
### Install Fairseq ###
# pip install fairseq
# or to reproduce our venv:
git clone https://github.com/pytorch/fairseq
cd fairseq
# check out this specific commit; the latest commit does not work
git checkout 862efab86f649c04ea31545ce28d13c59560113d
pip install --editable .
cd ../
### Install SpeechBrain ###
pip install speechbrain==1.0.2
### Install other packages ###
pip install tensorboard tensorboardX soundfile pandarallel scikit-learn numpy==1.21.2 pandas==1.4.3 scipy==1.7.2
### Note ###
# Please make sure that your AntiDeepfake directory
# does not contain copies of the fairseq or speechbrain repositories.
Here is a demonstration of using an AntiDeepfake checkpoint for deepfake detection.
# go to an empty project folder
# create a Data folder and download a toy dataset
mkdir Data
cd Data
wget -O toy_example.tar.gz https://zenodo.org/records/7497769/files/project-04-toy_example.tar.gz
tar -xzvf toy_example.tar.gz
cd -
# git clone the code
git clone https://github.com/nii-yamagishilab/AntiDeepfake.git
# install dependency
bash AntiDeepfake/install.sh
# download an AntiDeepfake checkpoint
cd AntiDeepfake
mkdir downloads
wget -O downloads/mms_300m.ckpt https://zenodo.org/records/15580543/files/mms_300m.ckpt
# use venv created by install.sh
conda activate antideepfake
# scoring
python main.py inference hparams/mms_300m.yaml --base_path $PWD/.. --exp_name eval_antideepfake_mms_300m_toy_example --test_csv protocols/toy_example_test.csv --ckpt_path downloads/mms_300m.ckpt
# ...
# INFO | __main__ | Loading pre-trained weights detector from downloads/mms_300m.ckpt
# 100%|███████████████████████████████████████████████████████████████████| 150/150 [00:03<00:00, 38.71it/s]
# Scores saved to ...
# computing metrics
python evaluation.py ../Log/exps/exp_mms_300m_eval_antideepfake_mms_300m_toy_example/evaluation_score.csv
# ...
# ===== METRICS SUMMARY =====
# For accuracy, precision, recall, f1, fpr and fnr, threshold of real class probability is 0.5
#
# roc_auc accuracy precision recall f1 fpr fnr eer eer_threshold
#subset
#Pooled 0.9935 0.9467 0.6818 0.9375 0.7895 0.0522 0.0625 0.0611 0.2111
For details on inference, post-training, and fine-tuning, please check the following section.
We assume the following project structure. This is provided as a reference to help understand how the code and data are organized:
/base_path/ # The root of the project that contains data,
│ # code, and experiment
│
├── Data/ # Contains multiple databases
│ ├── ASVspoof2019-LA/ # Example database
│ ├── ASVspoof2021-DF/ # Example database
│ ├── ... # Other databases
│
├── fairseq/ # Directory for fairseq installation
├── speechbrain/ # Directory for speechbrain installation
│
├── Log/ # Stores log files and model checkpoints
│ ├── exps/ # Directory for experiment outputs
│ ├── ssl-weights/ # Contains downloaded checkpoint files
│
│
├── AntiDeepfake/ # This AntiDeepfake repository
│ ├── hparams # Configuration files
│ │ ├── xx.yaml #
│ ├── ... # Code files
│ ├── protocols/
│ │ ├── xx.py # python script for generating protocols
│ │ ├── train.csv # The generated training set protocol
│ │ ├── valid.csv # The generated validation set protocol
│ │ ├── test.csv # The generated test set protocol
The folder structure can be altered. If you do so, please remember to change the path variables in hparams/*.yaml.
Training and inference scripts provided in this repository are designed to load audio files listed in train/valid/test CSV files in protocols.
In the demonstration above, we used protocols/toy_example_test.csv for inference. You can check the content of this CSV file to see the expected format.
Python scripts for generating database protocols are provided in protocols. Each script is named after the database it processes.
All protocols are designed to follow the same format so we can easily shuffle, split, or merge them. To merge multiple CSV protocols as in our experiment, you can refer to generate_protocol_by_proportion.py.
To generate protocols for your own data:
- refer to `toy_example.py` and the downloaded toy_example.tar.gz for an example.
- refer to `ASVspoof2019-LA.py` if you have a protocol file with ground-truth labels for each audio file.
- refer to `CVoiceFake.py` if your real and fake audios are stored separately.
- refer to `WildSVDD.py` if your audio filenames indicate whether they are real or fake.
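Since all protocols share the same format, merging and shuffling them is straightforward with pandas. The sketch below is an illustration only, not the repository's `generate_protocol_by_proportion.py`; the column names `file_path` and `label` are assumptions, so check the CSV files generated by the scripts in protocols/ for the actual header.

```python
# Minimal sketch of merging and shuffling protocol CSVs with pandas.
# Column names ("file_path", "label") are assumed for illustration.
import io

import pandas as pd

# Two toy protocol files (in-memory stand-ins for real CSV files)
csv_a = "file_path,label\n/data/a/0001.wav,real\n/data/a/0002.wav,fake\n"
csv_b = "file_path,label\n/data/b/0001.wav,fake\n"

frames = [pd.read_csv(io.StringIO(c)) for c in (csv_a, csv_b)]

# Concatenate, then shuffle with a fixed seed for reproducibility
merged = pd.concat(frames, ignore_index=True)
merged = merged.sample(frac=1.0, random_state=0).reset_index(drop=True)

merged.to_csv("merged_train.csv", index=False)
print(len(merged))  # 3
```

The same pattern extends to splitting (e.g., slicing the shuffled frame) before writing train/valid/test CSVs.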
Our code creates the models using configuration files. Their front ends are initialized with random weights.
- If you do inference without post-training or fine-tuning, please download an AntiDeepfake checkpoint (see the example in Usage demonstration above). The weights of the front end and the rest of the model will be overwritten with the AntiDeepfake checkpoint.
- If you do post-training or fine-tuning upon an AntiDeepfake checkpoint, please also download the AntiDeepfake checkpoint. The weights of the front end and the rest of the model will be overwritten with the AntiDeepfake checkpoint.
- If you want to do your own post-training using Fairseq pre-trained SSL front ends, please download the Fairseq checkpoints. The weights of the front end will be re-initialized with the Fairseq checkpoint.
| Model | Download Link |
|---|---|
| AntiDeepfake | Zenodo and Hugging Face |
| Fairseq MMS | Pretrained models from here |
| Fairseq XLS-R | Model link from here |
| Fairseq Wav2Vec 2.0 | Base (no finetuning) and Large (LV-60 + CV + SWBD + FSH), no finetuning, from here |
| Fairseq HuBERT | Extra Large (~1B params), trained on Libri-Light 60k hrs, no finetuning, from here |
To post-train or fine-tune upon an AntiDeepfake checkpoint (e.g., MMS-300M):
# --valid_step 2025 : perform full validation every 2025 mini-batches
# --use_da True     : enable RawBoost data augmentation
# --ckpt_path       : initialize model weights with the AntiDeepfake checkpoint
python main.py hparams/mms_300m.yaml \
    --base_path /your/base_path \
    --exp_name fine_tuning \
    --lr 1e-6 \
    --valid_step 2025 \
    --use_da True \
    --ckpt_path /path/to/your/downloaded/antideepfake/mms_300m.ckpt
To start post-training from a Fairseq checkpoint (e.g., MMS-300M):
# --use_da False : disable RawBoost data augmentation
# --ckpt_path    : initialize model weights with the Fairseq checkpoint (default setting)
python main.py hparams/mms_300m.yaml \
    --base_path /your/base_path \
    --exp_name post_training \
    --lr 1e-7 \
    --valid_step 100000 \
    --use_da False \
    --ckpt_path /base_path/Log/ssl-weights/base_300m.pt
Notes:
- Configuration YAML files are named after the model they correspond to. Please use the corresponding configuration file in `hparams`.
- Training logs and checkpoints will be saved under `/base_path/Log/exps/exp_mms_300m_<exp_name>`.
- If the above exp folder already exists, the script will try to resume training from the last saved checkpoint in the folder.
- For multi-GPU training, please use `torchrun --nnodes=1 --nproc-per-node=<NUM_GPU> main.py hparams/<MODEL>.yaml`.

For using the best validation checkpoint from your own experiment:
# --exp_name must match the name used during training
python main.py inference hparams/mms_300m.yaml \
    --base_path /your/base_path \
    --exp_name fine_tuning \
    --test_csv /path/to/your/test.csv
The script will automatically search for the best validation checkpoint in the specified experiment folder /base_path/Log/exps/exp_mms_300m_<exp_name>. It will generate an evaluation_score.csv file in the same folder.
For using AntiDeepfake checkpoints without training:
# --exp_name  : use a new exp folder name to avoid conflicts
# --ckpt_path : initialize model weights with the AntiDeepfake checkpoint
python main.py inference hparams/mms_300m.yaml \
    --base_path /your/base_path \
    --exp_name eval_antideepfake_mms_300m \
    --test_csv /path/to/your/test.csv \
    --ckpt_path /path/to/your/downloaded/antideepfake/mms_300m.ckpt
Run:
python evaluation.py /path/to/your/evaluation_score.csv
You will get results similar to this:
===== METRICS SUMMARY =====
For accuracy, precision, recall, f1, fpr and fnr, threshold of real class probability is 50.00%
roc_auc accuracy precision recall f1 fpr fnr eer eer_threshold
subset
Pooled 0.9935 0.9467 0.6818 0.9375 0.7895 0.0522 0.0625 0.0611 0.2111
The row Pooled shows results computed over all scores listed in the score file. By default, 0.5 is used as the probability threshold for evaluation. You can change this threshold as follows:
python evaluation.py /path/to/your/evaluation_score.csv 0.7
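As a reference, the EER in the summary corresponds to the operating point where the false-positive and false-negative rates cross on the ROC curve. The sketch below illustrates this standard definition using scikit-learn; it is not necessarily the exact code in evaluation.py, and the toy labels/scores are made up (1 = real, 0 = fake).

```python
# Illustrative EER computation from binary labels and real-class
# probabilities; a sketch, not the repository's evaluation.py.
import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(labels, scores):
    """Return (EER, threshold) where FPR and FNR cross on the ROC curve."""
    fpr, tpr, thresholds = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))  # point where the rates are closest
    return (fpr[idx] + fnr[idx]) / 2.0, thresholds[idx]

labels = np.array([1, 1, 1, 0, 0, 0])          # toy ground truth
scores = np.array([0.9, 0.8, 0.4, 0.6, 0.2, 0.1])  # toy real-class probabilities
eer, thr = compute_eer(labels, scores)
print(f"EER = {eer:.4f} at threshold {thr:.2f}")  # EER = 0.3333 at threshold 0.60
```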
If your test set contains files from multiple datasets and you want to check the results of a specific subset (i.e., file IDs that start with a common prefix string), run:
python evaluation.py /path/to/your/evaluation_score.csv ASV19LAdemo ...
You will get results like this:
===== METRICS SUMMARY =====
For accuracy, precision, recall, f1, fpr and fnr, threshold of real class probability is 50.00%
roc_auc accuracy precision recall f1 fpr fnr eer eer_threshold
subset
Pooled 0.9935 0.9467 0.6818 0.9375 0.7895 0.0522 0.0625 0.0611 0.2111
ASV19LAdemo 0.9935 0.9467 0.6818 0.9375 0.7895 0.0522 0.0625 0.0611 0.2111
...
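The prefix-based subset selection described above can be sketched in a few lines of pandas. This is an illustration only; the column names `file_id` and `score` are assumptions, so check your evaluation_score.csv for the actual header.

```python
# Sketch of selecting a score subset by file-ID prefix.
# Column names ("file_id", "score") are assumed for illustration.
import pandas as pd

df = pd.DataFrame({
    "file_id": ["ASV19LAdemo_0001", "ASV19LAdemo_0002", "OtherSet_0001"],
    "score":   [0.91, 0.12, 0.55],
})

prefix = "ASV19LAdemo"
subset = df[df["file_id"].str.startswith(prefix)]
print(len(subset))  # 2 files match the prefix
```

Metrics for the subset can then be computed on `subset` exactly as for the pooled scores.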
The released AntiDeepfake checkpoints are post-trained checkpoints. They can be fine-tuned to a specific task in a specific domain.
Described below is the performance of fine-tuning AntiDeepfake models on the Deepfake-Eval-2024 train set (PT = pre-training, PST = post-training, FT = fine-tuning, 4s = input duration of 4 seconds).
Please note that we do not provide these fine-tuned checkpoints.
| Model ID | PT+PST+FT 4s | PT+PST+FT 10s | PT+PST+FT 13s | PT+PST+FT 30s | PT+PST+FT 50s | PT+FT 4s | PT+FT 10s | PT+FT 13s | PT+FT 30s | PT+FT 50s |
|---|---|---|---|---|---|---|---|---|---|---|
| W2V-Large | 19.56 | 12.10 | 10.94 | 10.52 | 11.37 | 24.42 | 22.46 | 22.14 | 21.15 | 21.51 |
| MMS-300M | 17.15 | 13.37 | 12.31 | 11.05 | 10.75 | 19.77 | 13.29 | 12.77 | 12.01 | 12.29 |
| MMS-1B | 12.11 | 10.36 | 10.03 | 8.61 | 9.37 | 19.86 | 10.32 | 11.55 | 11.05 | 11.52 |
| XLS-R-1B | 11.85 | 10.00 | 9.27 | 8.50 | 8.29 | 19.95 | 17.18 | 16.31 | 10.63 | 11.21 |
| XLS-R-2B | 12.14 | 9.80 | 9.98 | 9.46 | 9.68 | 12.88 | 10.75 | 10.39 | 9.67 | 9.98 |
All AntiDeepfake models were developed by the Yamagishi Lab at the National Institute of Informatics (NII), Japan. All model weights and code scripts are the intellectual property of NII and are made available for research and educational purposes under the following licenses:

- Code – BSD-3-Clause, please check `LICENSE-CODE`.
- Model checkpoints – CC BY-NC-SA 4.0, please check `LICENSE-CHECKPOINT`.
This project is based on results obtained from project JPNP22007, commissioned by the New Energy and Industrial Technology Development Organization (NEDO).
It is also partially supported by the following grants from the Japan Science and Technology Agency (JST):
- AIP Acceleration Research (Grant No. JPMJCR24U3)
- PRESTO (Grant No. JPMJPR23P9)
This study was carried out using the TSUBAME4.0 supercomputer at the Institute of Science Tokyo.
The code is based on the implementations of wav2vec 2.0 pre-training with SpeechBrain and project-NN-Pytorch-scripts.
If you find this repository useful, please consider citing:
@inproceedings{antideepfake_2025,
  title={Post-training for Deepfake Speech Detection},
  author={Ge, Wanying and Wang, Xin and Liu, Xuechen and Yamagishi, Junichi},
  booktitle={2025 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  year={2025},
}
