AutoGraphAD: Unsupervised network anomaly detection using Variational Graph Autoencoders

Introduction

The code is organised into two folders: one for our approach (AutoGraphAD) and one for running Anomal-E under the same experimental setup.

To run the code, you will need to create two separate virtual environments, as the libraries needed to run each part of the code are incompatible.

The libraries needed to run each part of the experiment are listed in the requirement files and can easily be installed. For the machine learning libraries, please install the versions that match your CUDA version.

The original dataset, dataset files, and model weights have been uploaded to a FigShare repository and can be accessed at: https://doi.org/10.6084/m9.figshare.30643508

Anomaly Score and Anomal-E estimator hyperparameter search

These are our hyperparameter search grids.

AutoGraphAD hyperparameter search grid

| Hyperparameter | Values |
| --- | --- |
| No. Layers | [1, 2] |
| No. Hidden | 32 |
| Learning Rate | 1e-3 |
| Activation Function | ReLU |
| Loss Function for Structural Loss | Binary Cross Entropy |
| Loss Function for Feature Loss | [Mean Square Error, Cosine Embedding Loss] |
| Optimiser | AdamW |
| Weight Decay | 1e-5 |
| KL Annealing | [Yes, No] |
| KL Annealing Epochs | 10 |
| KL Min Weight | 0.0 |
| Regulariser | [Yes, No] |
| Multiple Draw | [Yes, No] |
| Maximum Epochs | 100 |
| Early Stop Patience | 20 |
| Feature Importance | 1.0 |
| Structural Importance | 1.0 |
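Only five of the hyperparameters above take multiple values, so the grid expands to 32 configurations. A minimal sketch of that expansion with `itertools.product` (the key names here are illustrative, not the exact keys used in the repository's scripts):

```python
from itertools import product

# Hyperparameters with multiple candidate values (illustrative key names).
search_grid = {
    "num_layers": [1, 2],
    "feature_loss": ["mse", "cosine_embedding"],
    "kl_annealing": [True, False],
    "regulariser": [True, False],
    "multiple_draw": [True, False],
}

# Hyperparameters held fixed across the search.
fixed = {
    "hidden_size": 32,
    "learning_rate": 1e-3,
    "weight_decay": 1e-5,
    "max_epochs": 100,
    "early_stop_patience": 20,
}

def expand(grid, fixed):
    """Yield one configuration dict per combination of grid values."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield {**fixed, **dict(zip(keys, values))}

configs = list(expand(search_grid, fixed))
print(len(configs))  # 2^5 = 32 combinations
```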

Anomal-E downstream estimator hyperparameter search grid

| Hyperparameter | Values |
| --- | --- |
| PCA no. components | [0.96, 0.98, 0.99] |
| PCA Weighted | True |
| PCA Whiten | [True, False] |
| PCA Standardisation | True |
| HBOS Bins | [5, 6, 8, 10, 12, 14, 16, 18, 20] |
| HBOS Alpha | [0.05, 0.1] |
| HBOS Tol | [0.1, 0.5] |
| Contamination (Anomal-E) | [0.02, 0.035, 0.05, 0.1, 0.2] |
| Alpha (AutoGraphAD) | [0.1, 0.5, 1.0] |
| Beta (AutoGraphAD) | [0.1, 0.5, 1.0] |
| Gamma (AutoGraphAD) | [0.1, 0.5, 1.0] |
| MSE use (AutoGraphAD) | [True, False] |
| Percentile (AutoGraphAD) | [95, 97, 98, 99] |

Hyperparameters chosen for AutoGraphAD and Anomal-E

Hyperparameters chosen for Anomal-E

| Estimator | Contamination Level | Hyperparameters |
| --- | --- | --- |
| PCA | No Contamination | Contamination: 0.02, Number of Components: 0.96, Whiten: False |
| PCA | 3.5% Contamination | Contamination: 0.05, Number of Components: 0.98, Whiten: False |
| PCA | 5.7% Contamination | Contamination: 0.1, Number of Components: 0.98, Whiten: False |
| CBLOF | No Contamination | Contamination: 0.02, Alpha: 0.9, Beta: 5, Number of Clusters: 36, Use Weights: True |
| CBLOF | 3.5% Contamination | Contamination: 0.05, Alpha: 0.9, Beta: 5, Number of Clusters: 40, Use Weights: True |
| CBLOF | 5.7% Contamination | Contamination: 0.1, Alpha: 0.9, Beta: 5, Number of Clusters: 50, Use Weights: False |
| HBOS | No Contamination | Contamination: 0.02, Alpha: 0.05, Beta: 5, Number of Bins: 14, Tol: 0.1 |
| HBOS | 3.5% Contamination | Contamination: 0.05, Alpha: 0.1, Beta: 5, Number of Bins: 5, Tol: 0.1 |
| HBOS | 5.7% Contamination | Contamination: 0.1, Alpha: 0.1, Beta: 5, Number of Bins: 12, Tol: 0.1 |

AutoGraphAD Model Hyperparameters

| Model Variant | Contamination | Model Hyperparameters |
| --- | --- | --- |
| VGAE Regulariser | 0% | 20% Negative Sampling, Edge Dropping, Node Masking with Annealing, 1 Layer |
| VGAE Regulariser | 3.5% | 40% Negative Sampling, Edge Dropping, Node Masking with Annealing, 2 Layers |
| VGAE Multiple Draw | 5.7% | 20% Negative Sampling, Edge Dropping, Node Masking with Annealing, 1 Layer, 10 Draws |

AutoGraphAD Anomaly Score Hyperparameters

| Contamination | Anomaly Score Hyperparameters |
| --- | --- |
| 0% Contamination | Alpha: 0.1, Beta: 0.5, Gamma: 0.1, MSE: True, Percentile: 95 |
| 3.5% Contamination | Alpha: 1.0, Beta: 0.1, Gamma: 1.0, MSE: False, Percentile: 95 |
| 5.7% Contamination | Alpha: 0.5, Beta: 0.1, Gamma: 0.5, MSE: True, Percentile: 95 |

Using the VGAE for Anomaly Detection

Training Models

Models are trained through the file training_script.py. Inside the file, you can declare a model and configure:

  • The model architecture (e.g., embedding size, number of layers)
  • Optimisation hyperparameters (e.g., learning rate, weight decay)
  • KL annealing
  • Checkpoint saving paths and the dataset to be used

The models are defined in a dictionary inside the file. Modify this dictionary to design and train your own models.
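A hypothetical shape for such a dictionary is sketched below; the key names are illustrative assumptions, and the actual keys live in training_script.py:

```python
# Hypothetical model-configuration dictionary; the real structure is
# defined in training_script.py and its keys may differ.
models = {
    "vgae_regulariser_1layer": {
        "embedding_size": 32,
        "num_layers": 1,
        "learning_rate": 1e-3,
        "weight_decay": 1e-5,
        "kl_annealing": True,
        "kl_annealing_epochs": 10,
        "checkpoint_path": "checkpoints/vgae_regulariser_1layer.pt",
        "dataset_path": "datasets/contamination_0",
    },
}

for name, cfg in models.items():
    print(f"{name}: {cfg['num_layers']} layer(s), lr={cfg['learning_rate']}")
```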

Evaluating the Model's performance and performing Hyperparameter optimisation

Hyperparameter optimisation and model evaluation happen in the optimised_grid_search.py file.

This script contains two main methods:

  • One for hyperparameter optimisation
  • One for testing the hyperparameters.

For hyperparameter optimisation, you only need to set the path where the results will be saved. For testing the models, you also need to set the location of the best results so the script can read the hyperparameters that achieved them during optimisation.

Additionally, you can add more hyperparameter options to expand the search space through the lists at the beginning of the file.

Generating the Graph Datasets

To generate the Graph Datasets, you will need the original dataset in a .parquet format. This format was chosen due to its rapid loading and saving times. Additionally, it has built-in data compression allowing for a reduced size and easier data transfer.

The .parquet file can be found in the FigShare repository.

To generate the graph dataset, you need to follow these instructions:

  • Set a directory in which to save the dataset
  • Set the path of the raw file
  • Select the mode of the generator
  • Select the individual settings that can be found in the datasets.py file.

The settings that we used for our dataset generation for 0% contamination in the training dataset are the following:

  • batch_size=1
  • window_size=180
  • window_stride=180
  • train_split=70
  • test_split=10
  • remove_attacks=True
  • classification_threshold=0.0
  • node_labels=True
  • l2_norm=True

The settings that we used for our dataset generation for 3.36% contamination in the training dataset are the following:

  • batch_size=1
  • window_size=180
  • window_stride=180
  • train_split=70
  • test_split=10
  • remove_attacks=False
  • classification_threshold=0.0
  • node_labels=True
  • l2_norm=True

The settings that we used for our dataset generation for 5.76% contamination in the training dataset are the following:

  • batch_size=1
  • window_size=180
  • window_stride=180
  • train_split=70
  • test_split=10
  • remove_attacks=False
  • classification_threshold=0.0
  • node_labels=True
  • benign_downsampling=0.01
  • l2_norm=True

To add negative edges, use the method negative_sampling() provided by the dataset class. To save the processed datasets, use the method save_datasets(). Please ensure that the folder where the dataset will be saved has been created before calling the saving method.
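The steps above can be sketched as follows. The methods negative_sampling() and save_datasets() are named in this README, but the generator class name and its constructor are assumptions; the actual class and settings live in datasets.py:

```python
import os

# Settings for the 0%-contamination dataset, as listed above.
settings = dict(
    batch_size=1,
    window_size=180,
    window_stride=180,
    train_split=70,
    test_split=10,
    remove_attacks=True,
    classification_threshold=0.0,
    node_labels=True,
    l2_norm=True,
)

# Create the output folder first: save_datasets() expects it to exist.
out_dir = "datasets/contamination_0"
os.makedirs(out_dir, exist_ok=True)

# Hypothetical usage; the actual class in datasets.py may differ:
# dataset = GraphDataset(raw_path="flows.parquet", **settings)
# dataset.negative_sampling()  # add negative edges
# dataset.save_datasets()      # fails if out_dir does not exist
```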

Using the Anomal-E for Anomaly Detection

Training Models

Training models is done through the Jupyter Notebook AnomalERunning.ipynb. Through the notebook, you can:

  • Load different datasets
  • Set the path for saving the model's weights at the best achieved loss
  • Set the hyperparameters and the early-stopping patience.
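The checkpoint-at-best-loss and patience mechanics can be sketched generically; this is not the notebook's exact code, just the standard pattern it describes:

```python
def train_with_early_stopping(losses_per_epoch, patience=20):
    """Return (best_epoch, best_loss), stopping once the loss has not
    improved for `patience` consecutive epochs."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses_per_epoch):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
            # in the real notebook, model weights are saved at this point
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # early stop
    return best_epoch, best_loss

best_epoch, best_loss = train_with_early_stopping(
    [1.0, 0.8, 0.7, 0.75, 0.76, 0.74], patience=3)
print(best_epoch, best_loss)  # 2 0.7
```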

Evaluating the Model's performance and performing Hyperparameter optimisation

Hyperparameter optimisation and model evaluation happen in the optimised_grid_search.py file.

This script contains two main methods:

  • One for hyperparameter optimisation
  • One for testing the hyperparameters.

For hyperparameter optimisation, you only need to set the path where the results will be saved. For testing the models, you also need to set the location of the best results so the script can read the hyperparameters that achieved them during optimisation.

Additionally, you can add more hyperparameter options to expand the search space through the lists at the beginning of the file.

Generating the Graph Datasets

To generate the Graph Datasets, you will need the original dataset in a .parquet format. This format was chosen due to its rapid loading and saving times. Additionally, it has built-in data compression allowing for a reduced size and easier data transfer.

The .parquet file can be found in the FigShare repository.

To generate the graph dataset, you need to follow these instructions:

  • Set a directory in which to save the dataset
  • Set the path of the raw file
  • Select the mode of the generator
  • Select the individual settings that can be found in the datasets.py file.

The settings that we used for our dataset generation for 0% contamination in the training dataset are the following:

  • batch_size=1
  • window_size=180
  • window_stride=180
  • train_split=70
  • test_split=10
  • remove_attacks=True
  • classification_threshold=0.0
  • node_labels=True
  • l2_norm=True

The settings that we used for our dataset generation for 3.36% contamination in the training dataset are the following:

  • batch_size=1
  • window_size=180
  • window_stride=180
  • train_split=70
  • test_split=10
  • remove_attacks=False
  • classification_threshold=0.0
  • node_labels=True
  • l2_norm=True

The settings that we used for our dataset generation for 5.76% contamination in the training dataset are the following:

  • batch_size=1
  • window_size=180
  • window_stride=180
  • train_split=70
  • test_split=10
  • remove_attacks=False
  • classification_threshold=0.0
  • node_labels=True
  • benign_downsampling=0.01
  • l2_norm=True

To save the processed datasets, use the method save_datasets() provided by the dataset class. Please ensure that the folder where the dataset will be saved has been created before calling the saving method.

PCAP to NF

To process the PCAP files into NetFlows, you need to install the NFStream library. You can use either of the existing virtual environments, as NFStream is compatible with both sets of libraries.

PCAP to CSV

To start the processing, you will need to:

  • Create a folder and load all the PCAP files there.
  • Run the script pcap_to_nf_v2.py.
  • Configure how the PCAP files will be processed, including the PCAP folder, the inactive timeout, and the maximum flow length.

For UNSW-NB15, the same settings as mentioned in the paper were used.
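A minimal NFStream sketch for turning a PCAP into NetFlow-style CSV records is shown below. Mapping "inactive timeout" to NFStream's `idle_timeout` and "max flow length" to `active_timeout` is my reading; the actual settings and values used in the paper are defined in pcap_to_nf_v2.py, and the paths here are placeholders:

```python
import os

# Assumed mapping of the README's settings onto NFStream parameters.
settings = {
    "idle_timeout": 120,     # seconds of inactivity before a flow expires
    "active_timeout": 1800,  # maximum lifetime of a single flow, in seconds
}

pcap_path = "pcaps/capture.pcap"  # placeholder path
if os.path.exists(pcap_path):
    from nfstream import NFStreamer

    streamer = NFStreamer(source=pcap_path, **settings)
    streamer.to_csv(path="flows/capture.csv")
else:
    print(f"{pcap_path} not found; place your PCAP files in the folder first.")
```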

CSV labelling

Once we have created the NetFlow CSVs, the next step is to label them. For labelling, we created a ground truth file that contains information about each anomalous flow in the original dataset. The ground truth identifies each anomalous flow by the following tuple:

  • Source IP
  • Source Port
  • Destination IP
  • Destination Port
  • Start timestamp
  • Finish timestamp
  • Protocol.

This tuple is then matched to the appropriate flow in the generated CSV files for labelling.
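A sketch of the matching idea: a flow is labelled anomalous when its endpoint tuple agrees with a ground-truth entry and the two time windows overlap. The actual matching logic lives in nf_labeler_v3.py; the overlap rule and field names below are assumptions:

```python
from collections import namedtuple

GroundTruth = namedtuple(
    "GroundTruth",
    ["src_ip", "src_port", "dst_ip", "dst_port", "start", "end", "protocol"],
)

def is_anomalous(flow, ground_truth):
    """Label a flow anomalous if its endpoints and protocol match a
    ground-truth entry and the time windows overlap (assumed rule)."""
    for gt in ground_truth:
        same_endpoints = (
            flow["src_ip"] == gt.src_ip and flow["src_port"] == gt.src_port
            and flow["dst_ip"] == gt.dst_ip and flow["dst_port"] == gt.dst_port
            and flow["protocol"] == gt.protocol
        )
        overlaps = flow["start"] <= gt.end and gt.start <= flow["end"]
        if same_endpoints and overlaps:
            return True
    return False

gt = [GroundTruth("10.0.0.1", 4444, "10.0.0.2", 80, 100.0, 200.0, 6)]
flow = {"src_ip": "10.0.0.1", "src_port": 4444, "dst_ip": "10.0.0.2",
        "dst_port": 80, "protocol": 6, "start": 150.0, "end": 250.0}
print(is_anomalous(flow, gt))  # True: endpoints match and windows overlap
```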

We create and use a .config file to indicate in which columns each component of the tuple is found.

For the labelling, we use the script nf_labeler_v3.py, passing flags with the paths to:

  • The ground truth file.
  • The config file.
  • The folder of CSV files to be labelled.

Joining disjoined flows

To join any disjoined flows, we use the script nf_joiner_v2.py. This script joins temporally close flows into a single larger flow composed of the smaller ones, producing a longer, cleaner communication episode.

To use this script, we need to pass the following flags:

  • The path to the folder containing the CSVs.
  • The maximum flow expiration timeout.
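The joining idea can be sketched as follows: sort flows by 5-tuple and start time, then merge consecutive flows of the same 5-tuple whose gap is within the expiration timeout. This is an illustration of the concept, not the implementation in nf_joiner_v2.py:

```python
def join_flows(flows, max_gap):
    """Merge consecutive flows with the same 5-tuple when the gap between
    them is at most `max_gap` seconds (sketch of the joining idea)."""
    flows = sorted(flows, key=lambda f: (f["key"], f["start"]))
    joined = []
    for flow in flows:
        prev = joined[-1] if joined else None
        if (prev is not None and prev["key"] == flow["key"]
                and flow["start"] - prev["end"] <= max_gap):
            prev["end"] = max(prev["end"], flow["end"])  # extend the episode
            prev["bytes"] += flow["bytes"]               # merge counters
        else:
            joined.append(dict(flow))
    return joined

key = ("10.0.0.1", 4444, "10.0.0.2", 80, 6)  # 5-tuple identifying the flow
flows = [
    {"key": key, "start": 0, "end": 10, "bytes": 500},
    {"key": key, "start": 12, "end": 30, "bytes": 700},
    {"key": key, "start": 500, "end": 510, "bytes": 100},
]
merged = join_flows(flows, max_gap=30)
print(len(merged))  # 2: the first two flows are joined, the third is separate
```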

Citation

If you find this code or research useful, please cite our paper:

AutoGraphAD: Unsupervised network anomaly detection using Variational Graph Autoencoders

Georgios Anyfantis and Pere Barlet-Ros

Read the paper on ArXiv

@misc{anyfantis2026autographadunsupervisednetworkanomaly,
      title={AutoGraphAD: Unsupervised network anomaly detection using Variational Graph Autoencoders}, 
      author={Georgios Anyfantis and Pere Barlet-Ros},
      year={2026},
      eprint={2511.17113},
      archivePrefix={arXiv},
      primaryClass={cs.CR},
      url={https://arxiv.org/abs/2511.17113}, 
}
