The code is organised into two folders: one for our approach (AutoGraphAD) and another for running Anomal-E within the same pipeline.
To run the code, you will need to create two separate virtual environments, as the libraries needed to run each part of the code are incompatible.
The libraries needed to run each part of the experiment are listed in the corresponding requirements files and can easily be installed. For the machine learning libraries, please make sure you install the versions that match your CUDA version.
The original dataset, the generated dataset files, and the model weights have been uploaded to a FigShare repository. They can be accessed at this link: https://doi.org/10.6084/m9.figshare.30643508
These are our hyperparameter search grids. The first table covers the model hyperparameters, and the second covers the anomaly detection and scoring hyperparameters.
| Hyperparameter | Search Values |
|---|---|
| No. Layers | [1, 2] |
| No. Hidden | 32 |
| Learning Rate | 1e-3 |
| Activation Function | ReLU |
| Loss Function for Structural Loss | Binary Cross Entropy |
| Loss Function for Feature Loss | [Mean Square Error, Cosine Embedding Loss] |
| Optimiser | AdamW |
| Weight Decay | 1e-5 |
| KL Annealing | [Yes, No] |
| KL Annealing Epochs | 10 |
| KL Min Weight | 0.0 |
| Regulariser | [Yes, No] |
| Multiple Draw | [Yes, No] |
| Maximum Epochs | 100 |
| Early Stop Patience | 20 |
| Feature Importance | 1.0 |
| Structural Importance | 1.0 |
| Hyperparameter | Search Values |
|---|---|
| PCA no. components | [0.96, 0.98, 0.99] |
| PCA Weighted | True |
| PCA Whiten | [True, False] |
| PCA Standardisation | True |
| HBOS Bins | [5, 6, 8, 10, 12, 14, 16, 18, 20] |
| HBOS Alpha | [0.05, 0.1] |
| HBOS Tol | [0.1, 0.5] |
| Contamination (Anomal-E) | [0.02, 0.035, 0.05, 0.1, 0.2] |
| Alpha (AutoGraphAD) | [0.1, 0.5, 1.0] |
| Beta (AutoGraphAD) | [0.1, 0.5, 1.0] |
| Gamma (AutoGraphAD) | [0.1, 0.5, 1.0] |
| MSE use (AutoGraphAD) | [True, False] |
| Percentile (AutoGraphAD) | [95, 97, 98, 99] |
The selected hyperparameters for each estimator at each contamination level are the following:

| Estimator | Contamination Level | Hyperparameters |
|---|---|---|
| PCA | No Contamination | Contamination: 0.02, Number of Components: 0.96, Whiten: False |
| PCA | 3.5% Contamination | Contamination: 0.05, Number of Components: 0.98, Whiten: False |
| PCA | 5.7% Contamination | Contamination: 0.1, Number of Components: 0.98, Whiten: False |
| CBLOF | No Contamination | Contamination: 0.02, Alpha: 0.9, Beta: 5, Number of Clusters: 36, Use Weights: True |
| CBLOF | 3.5% Contamination | Contamination: 0.05, Alpha: 0.9, Beta: 5, Number of Clusters: 40, Use Weights: True |
| CBLOF | 5.7% Contamination | Contamination: 0.1, Alpha: 0.9, Beta: 5, Number of Clusters: 50, Use Weights: False |
| HBOS | No Contamination | Contamination: 0.02, Alpha: 0.05, Beta: 5, Number of Bins: 14, Tol: 0.1 |
| HBOS | 3.5% Contamination | Contamination: 0.05, Alpha: 0.1, Beta: 5, Number of Bins: 5, Tol: 0.1 |
| HBOS | 5.7% Contamination | Contamination: 0.1, Alpha: 0.1, Beta: 5, Number of Bins: 12, Tol: 0.1 |
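For reference, a minimal sketch of how these configurations map onto estimator objects, assuming the PyOD implementations of PCA, CBLOF, and HBOS (the HBOS Beta value from the table has no direct PyOD counterpart and is omitted here):

```python
# Sketch of instantiating the selected estimators, assuming the PyOD implementations.
from pyod.models.pca import PCA
from pyod.models.cblof import CBLOF
from pyod.models.hbos import HBOS

# 3.5% contamination configurations from the table above
pca = PCA(contamination=0.05, n_components=0.98, whiten=False)
cblof = CBLOF(contamination=0.05, alpha=0.9, beta=5, n_clusters=40, use_weights=True)
hbos = HBOS(contamination=0.05, alpha=0.1, n_bins=5, tol=0.1)

# Each estimator is fitted on unlabelled embeddings and then used to score samples:
# estimator.fit(train_embeddings); scores = estimator.decision_function(test_embeddings)
```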
The selected AutoGraphAD model variants for each contamination level are the following:

| Model Variant | Contamination | Model Hyperparameters |
|---|---|---|
| VGAE Regulariser | 0% | 20% Negative Sampling, Edge Dropping, Node Masking with Annealing, 1 Layer |
| VGAE Regulariser | 3.5% | 40% Negative Sampling, Edge Dropping, Node Masking with Annealing, 2 Layers |
| VGAE Multiple Draw | 5.7% | 20% Negative Sampling, Edge Dropping, Node Masking with Annealing, 1 Layer, 10 Draws |
The selected anomaly score hyperparameters for each contamination level are the following:

| Contamination | Anomaly Score Hyperparameters |
|---|---|
| 0% Contamination | Alpha: 0.1, Beta: 0.5, Gamma: 0.1, MSE: True, Percentile: 95 |
| 3.5% Contamination | Alpha: 1.0, Beta: 0.1, Gamma: 1.0, MSE: False, Percentile: 95 |
| 5.7% Contamination | Alpha: 0.5, Beta: 0.1, Gamma: 0.5, MSE: True, Percentile: 95 |
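As a rough illustration of how the Alpha, Beta, Gamma, and Percentile values are used, here is a minimal sketch assuming the anomaly score is a weighted sum of the per-sample loss terms, with the detection threshold set at the given percentile of the training scores (the exact combination is the one implemented in the repository):

```python
# Minimal sketch of percentile-based thresholding of a weighted anomaly score.
# The mapping of Alpha/Beta/Gamma to specific loss terms is an assumption.
import numpy as np

def anomaly_scores(structural, feature, kl, alpha, beta, gamma):
    """Combine per-sample loss terms into a single anomaly score."""
    return alpha * structural + beta * feature + gamma * kl

# 0% contamination configuration from the table above
alpha, beta, gamma, percentile = 0.1, 0.5, 0.1, 95

rng = np.random.default_rng(0)
train_scores = anomaly_scores(rng.random(1000), rng.random(1000), rng.random(1000),
                              alpha, beta, gamma)          # dummy training scores
threshold = np.percentile(train_scores, percentile)

test_scores = anomaly_scores(rng.random(200), rng.random(200), rng.random(200),
                             alpha, beta, gamma)           # dummy test scores
predictions = (test_scores > threshold).astype(int)        # 1 = flagged as anomalous
```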
Models are trained through the file training_script.py. Inside the file, you can declare your model and configure:
- Model architecture (e.g., embedding size, number of layers)
- Optimisation hyperparameters (e.g., learning rate, weight decay)
- KL Annealing
- Checkpoint saving paths and the dataset to be used
The models are defined in a dictionary inside the file. Please modify this dictionary to design and train your own models.
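As an illustration, here is a minimal sketch of what an entry in that dictionary could look like, using values from the search grid above; the keys and paths are hypothetical and should be adapted to the ones actually defined in training_script.py:

```python
# Hypothetical sketch of the model dictionary in training_script.py.
# Key names and paths are illustrative, not the ones used in the repository.
models = {
    "vgae_regulariser_0pct": {
        "num_layers": 1,             # model architecture
        "hidden_size": 32,
        "activation": "relu",
        "learning_rate": 1e-3,       # optimisation hyperparameters
        "weight_decay": 1e-5,
        "kl_annealing": True,        # KL annealing settings
        "kl_annealing_epochs": 10,
        "kl_min_weight": 0.0,
        "max_epochs": 100,
        "early_stop_patience": 20,
        "checkpoint_path": "checkpoints/vgae_regulariser_0pct.pt",  # where weights are saved
        "dataset_path": "graph_datasets/contamination_0/",          # dataset to be used
    },
}
```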
Hyperparameter optimisation and model evaluation happen in the optimised_grid_search.py file.
This script contains two main methods:
- One for hyperparameter optimisation
- One for testing the hyperparameters.
For hyperparameter optimisation, you only need to set the path where the results should be saved. When testing the models, you also need to set the location of the best results so the script can read the hyperparameters that achieved them during optimisation.
Additionally, you can add more hyperparameter options to expand the search space through the lists at the beginning of the file.
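For instance, a hypothetical sketch of what those lists might look like (the variable names are illustrative; extend the lists actually defined in optimised_grid_search.py):

```python
# Hypothetical search-space lists at the top of optimised_grid_search.py.
alphas      = [0.1, 0.5, 1.0]        # AutoGraphAD anomaly-score weights (see tables above)
betas       = [0.1, 0.5, 1.0]
gammas      = [0.1, 0.5, 1.0]
percentiles = [95, 97, 98, 99]       # threshold percentiles for the anomaly score
use_mse     = [True, False]          # MSE vs. cosine feature reconstruction error

# Appending a value expands the grid, e.g. trying a more permissive threshold:
percentiles.append(90)
```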
To generate the graph datasets, you will need the original dataset in .parquet format. This format was chosen for its fast loading and saving times and its built-in compression, which reduces file size and eases data transfer.
The .parquet file can be found in the FigShare repository.
To generate the graph dataset, you need to follow these instructions:
- Set a directory in which to save the dataset
- Set the path of the raw file
- Select the mode of the generator
- Select the individual settings that can be found in the datasets.py file.
The settings we used to generate the dataset with 0% contamination in the training set are the following:
- batch_size=1
- window_size=180
- window_stride=180
- train_split=70
- test_split=10
- remove_attacks=True
- classification_threshold=0.0
- node_labels=True
- l2_norm=True
The settings we used to generate the dataset with 3.36% contamination in the training set are the following:
- batch_size=1
- window_size=180
- window_stride=180
- train_split=70
- test_split=10
- remove_attacks=False
- classification_threshold=0.0
- node_labels=True
- l2_norm=True
The settings we used to generate the dataset with 5.76% contamination in the training set are the following:
- batch_size=1
- window_size=180
- window_stride=180
- train_split=70
- test_split=10
- remove_attacks=False
- classification_threshold=0.0
- node_labels=True
- benign_downsampling=0.01
- l2_norm=True
To add negative edges, you will need to use the method negative_sampling() provided by the dataset class. To save the processed datasets, you will need to use the method save_datasets(). Please ensure that the folder where the dataset will be saved has been created before calling the saving method.
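Putting the pieces together, here is a sketch of generating and saving the 0% contamination dataset; the class name, constructor signature, and paths are assumptions and should be matched against datasets.py:

```python
# Hypothetical sketch of generating the 0% contamination graph dataset.
from datasets import GraphDataset  # hypothetical class name; use the one in datasets.py

dataset = GraphDataset(
    save_dir="graph_datasets/contamination_0/",  # directory in which to save the dataset
    raw_file="data/original_dataset.parquet",    # hypothetical path to the raw .parquet file
    batch_size=1,
    window_size=180,
    window_stride=180,
    train_split=70,
    test_split=10,
    remove_attacks=True,
    classification_threshold=0.0,
    node_labels=True,
    l2_norm=True,
)

dataset.negative_sampling()   # add negative edges
dataset.save_datasets()       # make sure the save directory exists beforehand
```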
Anomal-E models are trained through the Jupyter notebook AnomalERunning.ipynb. Through the notebook you can:
- Load different datasets
- Set the path for saving the model's weights at the best achieved loss
- Set the hyperparameters and the early-stopping patience
Hyperparameter optimisation and model evaluation happen in the optimised_grid_search.py file.
This script contains two main methods:
- One for hyperparameter optimisation
- One for testing the hyperparameters.
For hyperparameter optimisation, you only need to set the path where the results should be saved. When testing the models, you also need to set the location of the best results so the script can read the hyperparameters that achieved them during optimisation.
Additionally, you can add more hyperparameter options to expand the search space through the lists at the beginning of the file.
To generate the graph datasets, you will need the original dataset in .parquet format. This format was chosen for its fast loading and saving times and its built-in compression, which reduces file size and eases data transfer.
The .parquet file can be found in the FigShare repository.
To generate the graph dataset, you need to follow these instructions:
- Set a directory in which to save the dataset
- Set the path of the raw file
- Select the mode of the generator
- Select the individual settings that can be found in the datasets.py file.
The settings we used to generate the dataset with 0% contamination in the training set are the following:
- batch_size=1
- window_size=180
- window_stride=180
- train_split=70
- test_split=10
- remove_attacks=True
- classification_threshold=0.0
- node_labels=True
- l2_norm=True
The settings we used to generate the dataset with 3.36% contamination in the training set are the following:
- batch_size=1
- window_size=180
- window_stride=180
- train_split=70
- test_split=10
- remove_attacks=False
- classification_threshold=0.0
- node_labels=True
- l2_norm=True
The settings we used to generate the dataset with 5.76% contamination in the training set are the following:
- batch_size=1
- window_size=180
- window_stride=180
- train_split=70
- test_split=10
- remove_attacks=False
- classification_threshold=0.0
- node_labels=True
- benign_downsampling=0.01
- l2_norm=True
To save the processed datasets, you will need to use the method save_datasets() provided by the dataset class. Please ensure that the folder where the dataset will be saved has been created before calling the saving method.
To convert the PCAP files to NetFlows, you need to install the NFStream library. You can use either of the existing virtual environments, as NFStream is compatible with both sets of libraries.
To start the processing, you will need to:
- Create a folder and load all the PCAP files there.
- Run the script pcap_to_nf_v2.py.
- Configure how the PCAP files will be processed, including the PCAP folder, the inactive timeout, and the maximum flow length.
For UNSW-NB15, the same settings as mentioned in the paper were used.
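For reference, here is a minimal sketch of the NFStream conversion wrapped by pcap_to_nf_v2.py; the timeout values below are placeholders, not necessarily the ones used in the paper:

```python
# Sketch of converting each PCAP in a folder to a NetFlow CSV with NFStream.
from pathlib import Path
from nfstream import NFStreamer

pcap_dir = Path("pcaps/")       # folder containing the PCAP files
out_dir = Path("netflows/")
out_dir.mkdir(exist_ok=True)

for pcap in sorted(pcap_dir.glob("*.pcap")):
    streamer = NFStreamer(
        source=str(pcap),
        idle_timeout=15,            # inactive timeout in seconds (placeholder value)
        active_timeout=1800,        # maximum flow length in seconds (placeholder value)
        statistical_analysis=True,  # export per-flow statistical features
    )
    streamer.to_csv(path=str(out_dir / f"{pcap.stem}.csv"))
```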
Once we have created the NetFlow CSVs, the next goal is to label them. For labelling, we created a ground truth file that contains information about each anomalous flow from the original dataset. The ground truth matches flows using the following tuple:
- Source IP
- Source Port
- Destination IP
- Destination Port
- Start timestamp
- Finish timestamp
- Protocol
This tuple is then matched to the appropriate flow in the generated CSV files for labelling.
We create and use a .config file to indicate in which columns of the CSV each component of the tuple is located.
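An illustrative sketch of this matching logic follows; the column names and the exact time-overlap rule are assumptions, and the real implementation in nf_labeler_v3.py reads the column positions from the .config file:

```python
# Illustrative sketch of labelling flows against the ground-truth tuples.
import pandas as pd

ground_truth = pd.read_csv("ground_truth.csv")   # hypothetical ground-truth file
flows = pd.read_csv("netflows/capture1.csv")     # one generated NetFlow CSV

def is_attack(flow, gt):
    """Return 1 if the flow matches a ground-truth tuple, 0 otherwise."""
    match = gt[
        (gt["src_ip"] == flow["src_ip"])
        & (gt["src_port"] == flow["src_port"])
        & (gt["dst_ip"] == flow["dst_ip"])
        & (gt["dst_port"] == flow["dst_port"])
        & (gt["protocol"] == flow["protocol"])
        # the flow must overlap the attack's [start, finish] time window
        & (gt["start_ts"] <= flow["finish_ts"])
        & (gt["finish_ts"] >= flow["start_ts"])
    ]
    return int(not match.empty)

flows["label"] = flows.apply(lambda row: is_attack(row, ground_truth), axis=1)
flows.to_csv("netflows/capture1_labeled.csv", index=False)
```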
For the labelling, we use the script nf_labeler_v3.py and pass flags with the paths to:
- The ground-truth file.
- The config file.
- The folder of CSV files to be labelled.
To join any flows that may have been split, we use the script nf_joiner_v2.py. This script merges flows that are close in time into a single, larger flow composed of the smaller ones, allowing for the creation of a longer, cleaner communication episode; a rough sketch of the joining logic is shown after the flag list below.
To use this script, we need to pass the following flags:
- The path to the folder containing the CSVs.
- The maximum flow expiration timeout.
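Here is a rough sketch of the joining idea, merging consecutive flows of the same 5-tuple whose inter-flow gap is below the expiration timeout; the column names and the timeout value are assumptions, and the actual behaviour is defined in nf_joiner_v2.py:

```python
# Rough sketch of joining split flows into a single, longer communication episode.
import pandas as pd

EXPIRATION_TIMEOUT = 120  # seconds (placeholder value)

flows = pd.read_csv("netflows/capture1_labeled.csv").sort_values("start_ts")
key = ["src_ip", "src_port", "dst_ip", "dst_port", "protocol"]

joined = []
for _, group in flows.groupby(key):
    current = None
    for _, flow in group.iterrows():
        if current is not None and flow["start_ts"] - current["finish_ts"] <= EXPIRATION_TIMEOUT:
            # extend the current episode with this flow
            current["finish_ts"] = max(current["finish_ts"], flow["finish_ts"])
            current["label"] = max(current["label"], flow["label"])
        else:
            if current is not None:
                joined.append(current)
            current = flow.copy()
    if current is not None:
        joined.append(current)

pd.DataFrame(joined).to_csv("netflows/capture1_joined.csv", index=False)
```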
If you find this code or research useful, please cite our paper:
AutoGraphAD: Unsupervised network anomaly detection using Variational Graph Autoencoders
Georgios Anyfantis and Pere Barlet-Ros
@misc{anyfantis2026autographadunsupervisednetworkanomaly,
title={AutoGraphAD: Unsupervised network anomaly detection using Variational Graph Autoencoders},
author={Georgios Anyfantis and Pere Barlet-Ros},
year={2026},
eprint={2511.17113},
archivePrefix={arXiv},
primaryClass={cs.CR},
url={https://arxiv.org/abs/2511.17113},
}