    *,
    objective_target: PromptTarget,
    attack_scoring_config: AttackScoringConfig,
    attack_adversarial_config: AttackAdversarialConfig | None = None,
Naming update unrelated to this PR: the previous name was confusing (and new). The factory already has these set, but here we are overriding them.
I know this is still WIP 😃 but flagging early that my gut feeling is that rapid response will be effectively the union of a bunch of more atomic scenarios instead of being one mega-scenario covering everything.
- content_harms.py: keep thin alias (ours), discard main's full class
- rapid_response.py: update to new _scenario_strategies API from PR microsoft#1627
- test_content_harms.py: removed (replaced by test_rapid_response.py)
- test_rapid_response.py: update _scenario_composites -> _scenario_strategies

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add display_group field to AtomicAttack (defaults to atomic_attack_name)
- Add display_group_map and get_display_groups() to ScenarioResult
- Update console_printer to aggregate by display_group
- Rename _build_atomic_attack_name -> _build_display_group in Scenario base
- RapidResponse: unique compound atomic_attack_name per technique x dataset
- Update scenarios.instructions.md for _scenario_strategies and _build_display_group

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
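The display-group aggregation described in this commit can be pictured with a small sketch. `AttackOutcome` and `aggregate_by_display_group` are hypothetical names used only for illustration; the real `ScenarioResult` and `console_printer` code may be organized differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackOutcome:
    """Hypothetical stand-in for the result of one atomic attack."""

    atomic_attack_name: str  # unique compound name, e.g. "<technique>_<dataset>"
    successes: int
    total: int


def aggregate_by_display_group(outcomes, display_group_map):
    """Sum successes and totals per display group.

    Attack names missing from the map fall back to the attack name itself,
    mirroring the "defaults to atomic_attack_name" behavior in the commit.
    """
    totals = defaultdict(lambda: [0, 0])
    for outcome in outcomes:
        group = display_group_map.get(outcome.atomic_attack_name, outcome.atomic_attack_name)
        totals[group][0] += outcome.successes
        totals[group][1] += outcome.total
    return {group: tuple(counts) for group, counts in totals.items()}


# Illustrative data only: two datasets run with the same technique share a group.
outcomes = [
    AttackOutcome("prompt_sending_hate", 1, 4),
    AttackOutcome("prompt_sending_violence", 2, 4),
    AttackOutcome("standalone", 0, 2),
]
display_group_map = {
    "prompt_sending_hate": "prompt_sending",
    "prompt_sending_violence": "prompt_sending",
}
summary = aggregate_by_display_group(outcomes, display_group_map)
```

Keeping `atomic_attack_name` unique while grouping only at print time means resume/retry logic can still address each technique x dataset pair individually.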
- register_scenario_techniques() always uses default adversarial target
- Custom adversarial targets flow through factory.create() overrides
- Remove _apply_display_groups helper (display_group_map now persisted)
- Persist display_group_map in ScenarioResultEntry for DB round-trips
- Add accepts_scorer_override field to TechniqueSpec (TAP=False)
- Replace 'tap' magic string check with registry.accepts_scorer_override()

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…registry specs

- Add 'core' and 'default' tags to SCENARIO_TECHNIQUES entries
- Add build_strategy_class_from_specs() to AttackTechniqueRegistry that creates ScenarioStrategy subclasses from TechniqueSpec lists
- Delete static RapidResponseStrategy enum; generate dynamically in RapidResponse.get_strategy_class() with lazy caching
- Uses spec list (pure data), not mutable registry, so there are no side effects
- Update airt/__init__.py and content_harms.py with __getattr__ for lazy resolution of dynamic strategy class
- Update all test references to use _strategy_class() helper

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
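As a rough illustration of generating a strategy enum from pure data with lazy caching, here is a minimal sketch. `Spec`, `build_strategy_enum`, `get_strategy_class`, and the technique names are all hypothetical; the actual `build_strategy_class_from_specs()` produces `ScenarioStrategy` subclasses and likely differs in detail.

```python
from dataclasses import dataclass
from enum import Enum
from functools import lru_cache


@dataclass(frozen=True)
class Spec:
    """Hypothetical, simplified stand-in for a technique spec."""

    name: str
    tags: frozenset


def build_strategy_enum(class_name, specs, aggregates):
    """Create an Enum with one member per aggregate and one per technique.

    `aggregates` maps an aggregate member name to the tag that selects its
    techniques. Returns the enum plus a mapping from each aggregate member
    to the technique members it expands to.
    """
    members = {name.upper(): name for name in aggregates}
    members.update({spec.name.upper(): spec.name for spec in specs})
    strategy_cls = Enum(class_name, members)
    expansions = {
        strategy_cls(agg_name): [strategy_cls(s.name) for s in specs if tag in s.tags]
        for agg_name, tag in aggregates.items()
    }
    return strategy_cls, expansions


# Illustrative catalog; the tag names follow the commit, the techniques do not.
SPECS = (
    Spec("technique_a", frozenset({"core", "default"})),
    Spec("technique_b", frozenset({"core"})),
)
AGGREGATES = {"all": "core", "default": "default"}


@lru_cache(maxsize=None)
def get_strategy_class():
    """Build the enum on first use and reuse it afterwards (lazy caching)."""
    return build_strategy_enum("StrategySketch", SPECS, AGGREGATES)


Strategy, expansions = get_strategy_class()
```

Because the enum is built from the spec list rather than from a mutable registry, calling `get_strategy_class()` does not register anything as a side effect.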
Aligns with AttackTechniqueRegistry, AttackTechniqueFactory, AttackTechnique. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move factory.create() inside the dataset loop so each AtomicAttack gets an independent attack_technique instance. Previously, a single instance was shared across all datasets for a technique, which could cause state leakage between concurrent attack executions.

Benchmark: factory.create() costs ~6.5ms each, so 28 calls (4 techniques x 7 datasets) add only ~180ms, which is negligible at current scale.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
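The shape of the fix can be sketched as follows. `TechniqueSketch`, `FactorySketch`, and `build_atomic_attacks` are hypothetical stand-ins, not the PyRIT classes; the only point is where `create()` is called relative to the dataset loop.

```python
class TechniqueSketch:
    """Hypothetical stateful technique; a real attack object may hold conversation state."""

    def __init__(self):
        self.history = []


class FactorySketch:
    """Hypothetical factory that returns a fresh technique on every create() call."""

    def create(self):
        return TechniqueSketch()


def build_atomic_attacks(factories, datasets):
    """Pair every technique with every dataset, one fresh instance per pair."""
    attacks = []
    for technique_name, factory in factories.items():
        for dataset in datasets:
            # create() is called inside the dataset loop, so no two
            # (technique, dataset) pairs share mutable state.
            technique = factory.create()
            attacks.append((f"{technique_name}_{dataset}", technique))
    return attacks


factories = {f"technique_{i}": FactorySketch() for i in range(4)}
datasets = [f"dataset_{j}" for j in range(7)]
attacks = build_atomic_attacks(factories, datasets)
```

With 4 techniques and 7 datasets this yields 28 independent instances, which matches the 28 `create()` calls in the benchmark estimate.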
Introduce TagQuery frozen dataclass with AND/OR/NOT predicates and
&, |, ~ operators for arbitrary boolean composition. This enables
queries like:
TagQuery(include_all={'core'}) & TagQuery(include_any={'A', 'B'})
which matches items tagged both 'core' AND at least one of 'A'/'B'.
- New file: pyrit/registry/tag_query.py
- Update build_strategy_class_from_specs to use dict[str, TagQuery]
- Update rapid_response.py aggregate_tags to use TagQuery
- 17 unit tests for TagQuery
- Export from pyrit/registry/__init__.py
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
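To make the composition concrete, here is a minimal sketch of a frozen, composable tag predicate. It is named `TagQuerySketch` to make clear it is not the code in pyrit/registry/tag_query.py; the `exclude`, `op`, and `children` fields and the `matches` method are assumptions about how such a class could be built.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TagQuerySketch:
    # Leaf predicates (empty sets impose no constraint).
    include_all: frozenset = frozenset()
    include_any: frozenset = frozenset()
    exclude: frozenset = frozenset()  # assumed field for the NOT predicate
    # Composite nodes: op is "and", "or", or "not"; leaves keep op=None.
    op: str | None = None
    children: tuple = ()

    def __post_init__(self):
        # Accept plain sets but store frozensets so instances stay hashable.
        for name in ("include_all", "include_any", "exclude"):
            object.__setattr__(self, name, frozenset(getattr(self, name)))

    def __and__(self, other):
        return TagQuerySketch(op="and", children=(self, other))

    def __or__(self, other):
        return TagQuerySketch(op="or", children=(self, other))

    def __invert__(self):
        return TagQuerySketch(op="not", children=(self,))

    def matches(self, tags) -> bool:
        tags = frozenset(tags)
        if self.op == "and":
            return all(child.matches(tags) for child in self.children)
        if self.op == "or":
            return any(child.matches(tags) for child in self.children)
        if self.op == "not":
            return not self.children[0].matches(tags)
        # Leaf: every include_all tag, at least one include_any tag
        # (if any were given), and none of the excluded tags.
        return (
            self.include_all <= tags
            and (not self.include_any or bool(self.include_any & tags))
            and not (self.exclude & tags)
        )


# The query from the commit message: 'core' AND at least one of 'A'/'B'.
query = TagQuerySketch(include_all={"core"}) & TagQuerySketch(include_any={"A", "B"})
```

Representing composites as a small tree keeps each node immutable, so a query can be reused safely as a dictionary value, for example in a mapping from aggregate name to query.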
- Move Callable, TagQuery, PromptChatTarget, ScenarioStrategy, TrueFalseScorer into TYPE_CHECKING blocks (TC001/TC003)
- Add Returns/Raises sections to docstrings (DOC201/DOC501)
- Add docstrings for public methods (D102)
- Make Taggable protocol read-only (fixes frozen dataclass compat)
- Add __post_init__ validation to TagQuery with tests
- Simplify _matches_leaf return (SIM103)
- Fix test lint: rename S to strat (N806), lambda to def (E731), lowercase test name (N802), fix import ordering (I001)
- Add type: ignore comments for dynamic enum construction (mypy)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ValbuenaVC
left a comment
TL;DR Looks good! Some comments and a rough suggestion for scenarios long-term below.
Suggestion: this PR touches on something I've noticed with scenarios for a while now. Not a blocking comment since it's out of scope but I think scenarios need something like a state machine to standardize the relationship between two important components: each timestep of a scenario and how to transition between steps.
Like Frederic said it's likelier that we'll have a ton of rapid response scenarios that are each semantically very different but are expected to work the same way. Sharing the scenario across them makes much more sense if we keep the same transition between steps (the "rapid" part), but change what the steps are (the "response" part). How we do this I think depends a lot on how we see scenarios changing over time but something like ScenarioStep and ScenarioStrategy seem like natural places to start. Maybe something like this (very, very rough) idea:
class FooStep(ScenarioStep):
    ...

class ContentHarmsStep(ScenarioStep):
    """Each timestep of a scenario owns its valid inputs, outputs, and lifecycle."""
    outputs = ["safety_violation", ...]
    inputs = [AttackTechniqueA, ...]
    ...

    # Granularity: attack-level or attack-step-level?
    def process_async(self, input: AttackTechnique) -> str:
        # Hand-wavy and wrong types, this would need much stronger contracts
        result = input.run_attack()
        if self._has_safety_violation(result):
            return "safety_violation"
        ...

class RapidResponseStrategy(ScenarioStrategy):
    """
    Strategies now focus on defining a policy and valid states for the scenario
    overall. You can recycle steps and keep their transitions the same to support
    rapid response situations.
    """
    state = StateEnum.UNINITIALIZED
    valid_step_types = [ContentHarmsStep, FooStep, BarStep]
    policy = {
        StateEnum.UNINITIALIZED: self._start_scenario,
        StateEnum.OPENING_PHASE: self._opening_phase,
        ...
    }

    def step(self):
        while self.state != StateEnum.COMPLETE:
            result = self.policy[self.state]()
            self.state.update(result)
        ...

class MyCustomScenario(Scenario):
    ...
    # This would be inherited from scenario so the user can focus on tweaking the event loop
    # and inner state that's unique to the scenario rather than managing its lifecycle
    def run_async(self):
        self.strategy.event_loop()
Great comment! After this PR, RapidResponse is essentially declarative: it only specifies techniques, datasets, and defaults. The execution lifecycle (factory resolution, technique × dataset loop, scorer overrides, resume/retry) is all inherited from the base class. All our existing scenarios could be written like this, so it simplifies our existing stuff a bunch.

I think the state machine becomes compelling when we need conditional transitions, e.g., "broad sweep first, then focus multi-turn attacks on categories that showed weakness." But if we want a per-attack escalation, a simpler _on_attack_complete_async hook in the base class could handle "if this succeeded, probe deeper" without the state machine overhead. For full multi-phase orchestration where the entire scenario pivots based on aggregate results, the policy/state pattern you're sketching would be the right abstraction. But I think we'd get a ton of value just with the hook. Worth revisiting when we hit a concrete use case that needs branching; right now all scenarios are flat execution.
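A per-attack hook of the kind mentioned here might look roughly like the sketch below. Nothing in it exists in the codebase: `BaseScenarioSketch`, the string-based attacks, and the fake success rule are all invented to show the control flow, and only the hook name `_on_attack_complete_async` comes from the comment.

```python
import asyncio


class BaseScenarioSketch:
    """Hypothetical base class: flat execution plus one overridable hook."""

    def __init__(self, attacks):
        self._queue = list(attacks)
        self.executed = []

    async def _run_attack_async(self, attack: str) -> bool:
        # Placeholder result: pretend attacks whose name contains "weak" succeed.
        return "weak" in attack

    async def _on_attack_complete_async(self, attack: str, succeeded: bool) -> list:
        """Hook: return follow-up attacks to enqueue. The default adds none."""
        return []

    async def run_async(self):
        while self._queue:
            attack = self._queue.pop(0)
            succeeded = await self._run_attack_async(attack)
            self.executed.append(attack)
            follow_ups = await self._on_attack_complete_async(attack, succeeded)
            self._queue.extend(follow_ups)
        return self.executed


class EscalatingScenario(BaseScenarioSketch):
    async def _on_attack_complete_async(self, attack: str, succeeded: bool) -> list:
        # "If this succeeded, probe deeper", but escalate each attack only once.
        if succeeded and not attack.startswith("multi_turn:"):
            return [f"multi_turn:{attack}"]
        return []


executed = asyncio.run(EscalatingScenario(["weak_category", "strong_category"]).run_async())
```

The base class keeps owning the lifecycle; a subclass only decides what, if anything, to enqueue after each attack, which is the lighter alternative to a full policy/state machine.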
extra_kwargs: Static extra keyword arguments forwarded to the attack
    constructor. Must not contain ``attack_adversarial_config`` (use
    ``adversarial_chat`` instead).
accepts_scorer_override: Whether the technique accepts a scenario-level
are extra_kwargs supposed to be attack specific (like tree_width)?
in general, i think this AttackTechniqueSpec is a good idea but am confused with the input for it
Yeah, it's a tough bridge because we're trying to declare something that is live (ty adversarial chat for making it complicated). But I tried to update the docs to be more clear
Do we have a way to validate the kwargs are all present / valid ?
Not a GREAT way, until it's actually being run.
I keep saying this, but there is this tension between being able to tell things (like the command line) the different techniques, and actually instantiating those techniques at run time.
But maybe we could have a unit test that iterates through them all at least. It isn't perfect but it can at least catch bad configurations.
I'll include that in this PR.
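One way such a test could work, sketched under assumptions: compare each spec's `extra_kwargs` keys against the attack constructor's signature with `inspect.signature`. `SpecSketch`, `TreeAttackSketch`, and `invalid_kwargs` are hypothetical names; the real spec class and the test added in the PR may look different.

```python
import inspect
from dataclasses import dataclass, field


class TreeAttackSketch:
    """Hypothetical attack class with one technique-specific option."""

    def __init__(self, *, objective_target, tree_width=3):
        self.objective_target = objective_target
        self.tree_width = tree_width


@dataclass(frozen=True)
class SpecSketch:
    """Hypothetical, simplified technique spec."""

    name: str
    attack_class: type
    extra_kwargs: dict = field(default_factory=dict)


def invalid_kwargs(spec):
    """Return the extra_kwargs keys that the spec's constructor would reject."""
    # attack_adversarial_config is disallowed in extra_kwargs per the docstring.
    problems = [k for k in spec.extra_kwargs if k == "attack_adversarial_config"]
    params = inspect.signature(spec.attack_class).parameters
    accepts_any = any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
    if not accepts_any:
        problems += [k for k in spec.extra_kwargs if k not in params and k not in problems]
    return sorted(problems)


# Illustrative catalog; a real test would iterate the actual spec list.
SPECS = [SpecSketch("tree", TreeAttackSketch, {"tree_width": 4})]


def test_all_specs_have_valid_kwargs():
    for spec in SPECS:
        assert invalid_kwargs(spec) == [], f"{spec.name}: {invalid_kwargs(spec)}"


misspelled = SpecSketch("typo", TreeAttackSketch, {"tree_widht": 4})
```

This only checks names, not value types or required arguments, so it catches misspelled or stale configuration rather than proving a spec will run.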
…ix spellings

- Promote factory-based _get_atomic_attacks_async from RapidResponse to Scenario base class
- Remove redundant RapidResponse._get_attack_technique_factories override
- Update doc examples to follow RapidResponse pattern (no override needed)
- Fix British spellings (behaviour->behavior, recognised->recognized)
- Fix mypy errors with cast(TrueFalseScorer, ...)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…4_16_rapid_response # Conflicts: # pyrit/scenario/scenarios/airt/content_harms.py
@@ -147,18 +148,20 @@ async def print_summary_async(self, result: ScenarioResult) -> None:
    # Per-strategy breakdown
    self._print_section_header("Per-Strategy Breakdown")
Per-Group Breakdown
ValbuenaVC
left a comment
Minor comments, looks good!
ValueError: If a spec declares ``adversarial_chat_key`` but the key
    is not found in ``TargetRegistry``.
"""
from pyrit.registry import TargetRegistry
nit: why is this import here
Resolves the default adversarial target, bakes it into the specs that
require it, then registers the resulting factories.
"""
from pyrit.registry.object_registries.attack_technique_registry import AttackTechniqueRegistry
This PR refactors `ContentHarms` into `RapidResponse` (with `ContentHarms` kept as a deprecated alias), and introduces the foundational infrastructure for a central technique registry. Using this pattern, we can register scenario techniques centrally for scenarios to use and share.

- `TagQuery`: a composable, frozen boolean predicate (&, |, ~) for filtering tagged registry objects.
- `AttackTechniqueSpec`: a declarative data class describing one registrable technique (class, tags, optional adversarial auto-detection, optional extra_kwargs_builder callback). Allows us to use these for scenario strategies.
- `SCENARIO_TECHNIQUES` catalog: a single Python list of the `AttackTechniqueSpec`s used by `RapidResponse`. It will grow.
- `register_from_specs()`: bulk, idempotent registration into the singleton `AttackTechniqueRegistry`.
- `build_factory_from_spec()`: auto-detects adversarial support via inspect.signature so no manual flag is needed per technique.
- `build_strategy_class_from_specs()`: dynamically generates a `ScenarioStrategy` enum from specs + `TagQuery` aggregates (rather than hardcoding it).
- `RapidResponse.get_attack_technique_factories()`: triggers registration and returns all factories from the registry.
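The signature-based auto-detection can be sketched as below. Only the parameter name `attack_adversarial_config` and the use of `inspect.signature` come from this PR; `accepts_adversarial_config`, `build_factory`, and the two attack classes are hypothetical, and the real `build_factory_from_spec()` returns PyRIT factory objects rather than a `functools.partial`.

```python
import inspect
from functools import partial


class SingleTurnSketch:
    """Hypothetical attack that takes no adversarial configuration."""

    def __init__(self, *, objective_target):
        self.objective_target = objective_target


class MultiTurnSketch:
    """Hypothetical attack that accepts an adversarial configuration."""

    def __init__(self, *, objective_target, attack_adversarial_config=None):
        self.objective_target = objective_target
        self.attack_adversarial_config = attack_adversarial_config


def accepts_adversarial_config(attack_class) -> bool:
    """Detect adversarial support from the constructor signature, no manual flag."""
    return "attack_adversarial_config" in inspect.signature(attack_class).parameters


def build_factory(attack_class, adversarial_config=None, **extra_kwargs):
    """Return a callable that builds the attack with the shared arguments baked in."""
    kwargs = dict(extra_kwargs)
    if adversarial_config is not None and accepts_adversarial_config(attack_class):
        kwargs["attack_adversarial_config"] = adversarial_config
    return partial(attack_class, **kwargs)


# The same adversarial config is offered to both; only the class that declares
# the parameter receives it.
single = build_factory(SingleTurnSketch, adversarial_config="adv")(objective_target="t")
multi = build_factory(MultiTurnSketch, adversarial_config="adv")(objective_target="t")
```

Inspecting the signature keeps the catalog declarative: adding a technique does not require remembering to set a separate "needs adversarial" flag.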