ReMIA: a Powerful and Efficient Alternative to Membership Inference Attacks against Synthetic Data Generators

This is the code repository for the article ReMIA: a Powerful and Efficient Alternative to Membership Inference Attacks against Synthetic Data Generators by Davide Scassola, Andrea Coser, and Sebastiano Saccani.

Installation

This project requires multiple environments to run. We recommend using Conda to manage them.

Installing the main environment

make .venv
source .venv/bin/activate
make install

The additional environments are managed with Conda. If you don't have Conda, you can install micromamba and alias it to conda:

"${SHELL}" <(curl -L micro.mamba.pm/install.sh)
alias conda=micromamba

Installing Achilles' Heels environment

This environment is needed to run a modified version of the original code from the article Achilles' Heels: Vulnerable Record Identification in Synthetic Data Publishing.

cd submodules/achilles_heels
conda create --name achilles_heels python=3.10
conda activate achilles_heels
git clone https://github.com/alan-turing-institute/reprosyn
cd reprosyn
curl -sSL https://install.python-poetry.org | python3 -
~/.local/bin/poetry install -E ektelo
cd ..
git clone git@github.com:imperial-aisp/querysnout.git
cd querysnout/src/optimized_qbs
python setup.py install
cd ../../..
pip install torch==2.0.0

Installing DOMIAS environment

conda create --name domias python=3.10
conda activate domias
pip install domias==0.0.5
pip install torch==2.2.2
pip install tqdm
pip install pykeops==2.3
pip install numpy==1.26.4
pip install pandas==2.3.3

Installing Synthcity environment

conda create --name synthcity python=3.12
conda activate synthcity
pip install synthcity==0.2.12
pip install torch==2.2.2
pip install numpy==1.26.4
pip install pandas==2.3.3
pip install transformers==4.44.2
pip install opacus==1.4.0

We also provide the full list of dependencies in the envs_pip_list folder in order to improve reproducibility.

Getting the Data

Metadata and download scripts are already provided in the data folder. In order to download a dataset, activate the main environment and run

python data/<name-of-the-dataset>/download.py

The data will be stored as data/<name-of-the-dataset>/data.csv. Some download scripts only print instructions for downloading the data manually; in that case, you have to save the data as data/<name-of-the-dataset>/data.csv yourself.
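To see which datasets are already in place before launching experiments, a small check like the following may help. It is a minimal sketch and not part of the repository: the helper name missing_datasets is hypothetical, and it assumes that the folder names under data match the dataset options listed under Running experiments (adult, california, uk_census).

```python
from pathlib import Path

# Assumption: folder names under data/ match the dataset options
# listed in the "Running experiments" section.
DATASETS = ("adult", "california", "uk_census")


def missing_datasets(data_root="data", names=DATASETS):
    """Return the dataset names whose <data_root>/<name>/data.csv is absent."""
    root = Path(data_root)
    return [name for name in names if not (root / name / "data.csv").is_file()]


if __name__ == "__main__":
    for name in missing_datasets():
        # Point to the documented download command for each missing dataset.
        print(f"missing: run `python data/{name}/download.py`")
```

Run it from the repository root; it only inspects the file system and never downloads anything itself.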

Running experiments

In order to run experiments, you have to first activate the main environment:

source .venv/bin/activate

You can run a single privacy evaluation experiment in the following way:

python scripts/evaluate_privacy.py --metric <metric> --dataset <dataset> --generator <generator> --seed <seed> --training_size <size>

The arguments are:

--metric / -M (string): Privacy metric to evaluate.
--dataset / -D (string): Dataset to use for evaluation.
--generator / -G (string): Data generator/synthesis method.
--seed / -S (integer): Random seed for reproducibility. Default: 0
--training_size / -T (integer): Size of the training dataset. Default: 1000

The available options are:

Metric: remia, domias, shadow_modeling_achilles_heels
Dataset: adult, california, uk_census
Generator: synthpop, ctgan, tvae, baynet, arf, ddpm, adsgan, pategan, privbayes_<epsilon>, leak_<fraction>, perturbation_<alpha>

When an experiment is completed, the result will be printed and stored in the experiments/output folder.

Example:

python scripts/evaluate_privacy.py --metric remia --dataset adult --generator synthpop
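To compare several metrics over several seeds, the single-run command can be wrapped in a short driver. The following is a sketch under stated assumptions rather than a script shipped with the repository: build_commands and run_all are hypothetical helpers, the seed values are arbitrary, and only the command-line flags documented above are used. By default it performs a dry run that prints the commands without executing them.

```python
import itertools
import subprocess
import sys

# Metric names as listed in the available options above.
METRICS = ("remia", "domias", "shadow_modeling_achilles_heels")


def build_commands(metrics=METRICS, dataset="adult", generator="synthpop",
                   seeds=(0, 1, 2), training_size=1000):
    """Build one evaluate_privacy.py command line per (metric, seed) pair."""
    commands = []
    for metric, seed in itertools.product(metrics, seeds):
        commands.append([
            sys.executable, "scripts/evaluate_privacy.py",
            "--metric", metric,
            "--dataset", dataset,
            "--generator", generator,
            "--seed", str(seed),
            "--training_size", str(training_size),
        ])
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sweep at the first failing experiment.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(build_commands())  # dry run: only prints the commands
```

Calling run_all(build_commands(), dry_run=False) from the repository root, with the main environment active, would launch the runs one after another.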

Reproducing paper experiments

In order to run all the experiments, run

python scripts/reproduce_experiments.py

Results will be stored in the folders experiments/privacy_evaluation and experiments/quality_evaluation.

You can then generate the corresponding tables and plots by running

python scripts/article_tables_and_plots.py

These will be stored in the article/figures folder.

Getting stored article experiments

We provide the results of the experiments that we included in the article in the experiments.tar.xz file. You can extract it with the following command:

tar -xJf experiments.tar.xz

or simply run

make experiments

when the experiments folder is not present in the repository.
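For systems without make or a tar build that supports xz, the same "extract only if the folder is missing" behaviour can be approximated with Python's standard library. This is an illustrative sketch, not the contents of the Makefile: the function name extract_if_missing is hypothetical, and it assumes the archive unpacks into a top-level experiments folder.

```python
import tarfile
from pathlib import Path


def extract_if_missing(archive="experiments.tar.xz", target="experiments", dest="."):
    """Extract the archive into dest unless dest/target already exists.

    Returns True if the archive was extracted, False if it was skipped.
    """
    if (Path(dest) / target).exists():
        return False
    with tarfile.open(archive, "r:xz") as tar:
        # Only extract archives you trust: members are written as stored.
        tar.extractall(path=dest)
    return True
```

Calling extract_if_missing() from the repository root extracts experiments.tar.xz once and leaves an existing experiments folder untouched on later calls.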
