
FEAT Add VLGuard multimodal safety dataset loader#1447

Open
romanlutz wants to merge 9 commits into microsoft:main from romanlutz:romanlutz/vlguard-dataset

Conversation

@romanlutz
Contributor

Summary

Adds support for the VLGuard dataset (ICML 2024), a vision-language safety benchmark that evaluates whether multimodal models refuse unsafe content while remaining helpful on safe content.

What is VLGuard?

VLGuard contains ~2,000 image-instruction pairs across 4 categories (Privacy, Risky Behavior, Deception, Hateful Speech) and 8 subcategories (Personal Data, Professional Advice, Political, Sexually
Explicit, Violence, Disinformation, Discrimination by Sex, Discrimination by Race).

It supports three evaluation subsets:

  • unsafes — unsafe images with instructions (tests whether the model refuses to describe unsafe visual content)
  • safe_unsafes — safe images with unsafe instructions (tests whether the model refuses unsafe text prompts)
  • safe_safes — safe images with safe instructions (tests whether the model remains helpful)
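For orientation, the subset and category identifiers above might map to enums along these lines. This is a hypothetical sketch, not the PR's actual definitions; the string values are assumptions, except that a later commit in this PR confirms the dataset stores the fourth category as "discrimination":

```python
from enum import Enum


# Hypothetical sketch of the enums described above; the real definitions
# live in pyrit/datasets/seed_datasets/remote/vlguard_dataset.py.
class VLGuardSubset(Enum):
    UNSAFES = "unsafes"              # unsafe images with instructions
    SAFE_UNSAFES = "safe_unsafes"    # safe images, unsafe instructions
    SAFE_SAFES = "safe_safes"        # safe images, safe instructions


class VLGuardCategory(Enum):
    PRIVACY = "privacy"
    RISKY_BEHAVIOR = "risky behavior"
    DECEPTION = "deception"
    DISCRIMINATION = "discrimination"  # confirmed lowercase value per later commit
```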

Usage

```python
from pyrit.datasets.seed_datasets.remote import VLGuardDataset, VLGuardCategory, VLGuardSubset

# Load unsafe image examples (default)
loader = VLGuardDataset(token="hf...")
dataset = await loader.fetch_dataset()

# Load safe images with unsafe instructions, filtered to the Privacy category
loader = VLGuardDataset(
    subset=VLGuardSubset.SAFE_UNSAFES,
    categories=[VLGuardCategory.PRIVACY],
    token="hf...",
)
dataset = await loader.fetch_dataset()
```

Note: This is a gated dataset on HuggingFace. Users must accept the terms at https://huggingface.co/datasets/ys-zong/VLGuard and provide a HuggingFace token.

Changes

  • pyrit/datasets/seed_datasets/remote/vlguard_dataset.py — new dataset loader
  • pyrit/datasets/seed_datasets/remote/__init__.py — register exports
  • tests/unit/datasets/test_vlguard_dataset.py — 14 unit tests
  • doc/code/datasets/1_loading_datasets.ipynb — regenerated to show VLGuard in dataset list

romanlutz and others added 2 commits March 10, 2026 05:33
Add support for the VLGuard dataset (ICML 2024) which contains image-instruction
pairs for evaluating vision-language model safety across 4 categories (Privacy,
Risky Behavior, Deception, Hateful Speech) with 8 subcategories.

Supports three evaluation subsets:
- unsafes: unsafe images with instructions (tests refusal)
- safe_unsafes: safe images with unsafe instructions (tests refusal)
- safe_safes: safe images with safe instructions (tests helpfulness)

Downloads from HuggingFace (gated dataset, requires token and terms acceptance).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@romanlutz force-pushed the romanlutz/vlguard-dataset branch from cac3cad to 255dd50 on March 10, 2026 13:24
romanlutz and others added 7 commits April 22, 2026 07:22
- Add brief explainer for each VLGuardCategory enum member
- Add VLGuardSubcategory enum for the 8 subcategories
- Add clarifying comment on max_examples * 2 check
- Fix Optional -> | None per style guide

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Cover edge cases (invalid instr-resp, missing image field, no extractable
instruction) and both cache/download paths in _download_dataset_files_async.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The actual dataset uses 'harmful_category' and 'harmful_subcategory' (not
'category'/'subcategory'), with lowercase values. Also the fourth category
is 'discrimination' not 'Hateful Speech', and subcategories include 'sex',
'race', and 'other'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

@ValbuenaVC left a comment


Looks good! Two minor comments that aren't blocking

DISCRIMINATION = "discrimination"


class VLGuardSubcategory(Enum):
Nit: do all subcategories apply to all categories? If not, we should document this
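One hedged way to document (and enforce) the reviewer's question would be an explicit category-to-subcategory mapping with a small validator. The pairings below are guesses assembled from the PR description and the later field-name-fix commit, not from the dataset itself:

```python
# Hypothetical mapping for illustration; the real pairings should be taken
# from the dataset's harmful_category / harmful_subcategory fields.
SUBCATEGORIES_BY_CATEGORY: dict[str, set[str]] = {
    "privacy": {"personal data"},
    "risky behavior": {"professional advice", "political", "sexually explicit", "violence"},
    "deception": {"disinformation"},
    "discrimination": {"sex", "race", "other"},
}


def validate_pair(category: str, subcategory: str) -> bool:
    """Return True if the subcategory is documented for the given category."""
    return subcategory in SUBCATEGORIES_BY_CATEGORY.get(category, set())
```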

Returns:
tuple[list[dict], Path]: Tuple of (metadata list, image directory path).
"""
from huggingface_hub import hf_hub_download

Nit: why is this import down here?
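A function-local import like this is usually a deliberate lazy-import pattern: the heavy or optional dependency (here huggingface_hub) is only required when the download actually runs, so importing the module itself stays cheap and works without the extra package installed. Sketched below with a stdlib module standing in for the optional dependency:

```python
def download_metadata() -> str:
    # Deferred import: the dependency is resolved only when this function
    # runs, not at module import time. json stands in here for an optional
    # dependency such as huggingface_hub.
    import json

    return json.dumps({"dataset": "ys-zong/VLGuard"})
```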

