feat(cli): add prune command for orphaned state-file cleanup #555
Merged
heyronhay
previously approved these changes
May 11, 2026
riyazsh
previously approved these changes
May 11, 2026
6a46806 to d60a05f
The base branch was changed.
103c4e6 to 70ba961
Adds a new top-level `prune` command that deletes per-resource state
files for source IDs that are no longer present in the source org.
Required because --resource-per-file mode (mandatory for batched
managed-sync workflows) writes per-resource JSON files that are never
removed when the underlying resource is deleted upstream — over time
thousands of orphaned files accumulate.
How it works:
1. Fetches authoritative source IDs via import_resources_without_saving
(no destination calls — purely a read-only ground-truth query).
2. Builds the expected-filename set from the in-memory source IDs
(using sanitization + collision skip from BaseStorage).
3. Lists actual filenames on disk under source/ and destination/.
4. Computes the set difference — files on disk minus expected.
5. Snapshot fence: re-lists once more and intersects, guarding against
a concurrent sync writing a file between the import and the prune.
6. Prompts for confirmation (or skips with --force / dry-runs with
--dry-run / refuses interactive prompts under --json without
--force or --dry-run).
7. Deletes via storage.delete_many; emits per-(origin, type) NDJSON
ResourceOutcome events with status="success" or "partial".
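Steps 2 through 5 above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `expected_filenames`, `plan_prune`, the `<type>.<id>.json` naming scheme, and the sanitization rule are all hypothetical, and the collision skip that BaseStorage is said to perform is omitted. The fence is read here as "keep only candidates still present on a second listing".

```python
from typing import Callable, Iterable, Set


def expected_filenames(source_ids: Iterable[str], resource_type: str) -> Set[str]:
    """Build the filenames we expect on disk (hypothetical naming scheme)."""
    names = set()
    for sid in source_ids:
        # Assumed sanitization: anything outside [A-Za-z0-9_-] becomes "_".
        safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in sid)
        names.add(f"{resource_type}.{safe}.json")
    return names


def plan_prune(list_files: Callable[[], Set[str]], expected: Set[str]) -> Set[str]:
    """Return the filenames that are safe to delete."""
    # Step 4: files on disk minus expected.
    candidates = list_files() - expected
    # Step 5 (snapshot fence): list again and keep only candidates that are
    # still present, so the plan reflects files seen in both listings.
    return candidates & list_files()
```

With a listing that changes between the two calls, only files seen both times and not expected end up in the plan:

```python
listings = iter([{"a.json", "b.json", "c.json"}, {"a.json", "c.json", "d.json"}])
plan = plan_prune(lambda: next(listings), expected={"a.json"})
# plan == {"c.json"}
```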
Refusals (UsageError before any API call):
- --resources required to be explicit (defaulting to all is too
dangerous for a destructive command).
- --filters set (would over-prune the filtered-out resources).
- --resource-per-file absent (no per-file stale problem in
monolithic mode).
- --json without --force or --dry-run (interactive prompts
incompatible with JSON mode).
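The four refusals can be pictured as a single precondition check that runs before any API call. The sketch below is an assumption-laden illustration: `PruneArgs`, `check_prune_preconditions`, the field names, and the error messages are hypothetical, and the local `UsageError` class only stands in for whatever usage-error type the CLI framework provides.

```python
from dataclasses import dataclass
from typing import Optional


class UsageError(Exception):
    """Stand-in for the CLI framework's usage error (hypothetical here)."""


@dataclass
class PruneArgs:
    # Hypothetical container for the parsed command-line options.
    resources: Optional[str] = None
    filters: Optional[str] = None
    resource_per_file: bool = False
    json: bool = False
    force: bool = False
    dry_run: bool = False


def check_prune_preconditions(args: PruneArgs) -> None:
    """Raise UsageError for any of the four refusal cases; otherwise return."""
    if not args.resources:
        raise UsageError("--resources must be given explicitly for prune")
    if args.filters:
        raise UsageError("--filters cannot be combined with prune")
    if not args.resource_per_file:
        raise UsageError("prune requires --resource-per-file")
    if args.json and not (args.force or args.dry_run):
        raise UsageError("--json requires --force or --dry-run")
```

Checking everything up front means a refused invocation never reaches the source API or the filesystem.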
Wiring:
- Command.PRUNE in constants.py.
- prune.py in commands/ with @source_auth_options, @common_options,
@storage_options + --force, --dry-run.
- run_cmd_async dispatch branch.
- Configuration gains resource_per_file, prune_force, prune_dry_run
fields; build_config wires them post-construction. _handle_deprecated
is gated off for PRUNE so legacy file presence doesn't silently
mutate resources_arg. init_async adds PRUNE to source-only
_validate_client and _verify_ddr_status lists; PRUNE is excluded
from destination validation, DDR checks, and start metrics.
13 new unit tests (CLI argparse + handler preconditions + flow);
full suite 472/472 green; ruff/black clean.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The prune command's first phase calls import_resources_without_saving to fetch authoritative source IDs. import_resources_without_saving shows a "getting resources" progress bar when config.show_progress_bar is true, but for prune that is misleading: the user invoked 'prune', not 'import', and the import is internal plumbing. Save and restore config.show_progress_bar around the internal import call so the user's chosen setting applies elsewhere but the import phase is silent.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reverts the previous commit's save/override/restore. The internal import phase can take minutes on large orgs, and a progress bar is genuinely useful feedback for long operations. Pass the flag through to the import call unchanged; users can pass --show-progress-bar=False to silence it.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sets DD_SHOW_PROGRESS_BAR=false on the runner CliRunner instance so every CliRunner.invoke from integration and unit tests runs without a progress bar by default. Previously the CLI default of True meant tqdm escape sequences appeared in captured test output, making CI logs noisy and confusing pytest's output handling. Tests that need to verify progress-bar behavior can override this via the env= argument of runner.invoke.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
70ba961 to caada52
Summary
Third and final stacked PR: adds the user-visible `datadog-sync prune` command that deletes orphaned per-resource state files when source resources are removed upstream.
Stacked on: #554 (PR 2, state helpers), which is stacked on #553 (PR 1, storage primitives).
Why
`--resource-per-file` mode (mandatory for batched managed-sync workflows) writes per-resource JSON files that are never removed when the underlying resource is deleted upstream. Over time thousands of orphaned files accumulate, slowing list operations and polluting future loads. `prune` is the explicit cleanup pass.
How it works
1. Fetches authoritative source IDs via `import_resources_without_saving` (read-only against source; no destination calls).
2. Builds the expected-filename set from the in-memory source IDs (using sanitization + collision skip from `BaseStorage`).
3. Lists actual filenames on disk under `source/` and `destination/`.
4. Computes the set difference: files on disk minus expected.
5. Snapshot fence: re-lists once more and intersects, guarding against a concurrent sync writing a file between the import and the prune.
6. Prompts for confirmation (skips with `--force`; dry-runs with `--dry-run`; refuses interactive prompts under `--json` without `--force` or `--dry-run`).
7. Deletes via `storage.delete_many`; emits per-`(origin, type)` NDJSON `ResourceOutcome` events with `status="success"` or `"partial"`.
Refusals (`UsageError` before any API call)
- `--resources` must be explicit: defaulting to all types is too dangerous for a destructive command.
- `--filters` rejected: would over-prune filtered-out resources.
- `--resource-per-file` required: no per-file stale problem in monolithic mode.
- `--json` requires `--force` or `--dry-run`: no interactive prompts in JSON mode.
Wiring
- `Command.PRUNE` in `constants.py`.
- `commands/prune.py` with `@source_auth_options`, `@common_options`, `@storage_options` + `--force`, `--dry-run`.
- `run_cmd_async` dispatch branch.
- `Configuration` gains `resource_per_file`, `prune_force`, `prune_dry_run` fields; `build_config` wires them post-construction.
- `_handle_deprecated` is gated off for `PRUNE` so legacy-file presence doesn't silently mutate `resources_arg`.
- `init_async` adds `PRUNE` to source-only `_validate_client` and `_verify_ddr_status` lists; `PRUNE` is excluded from destination validation, destination DDR checks, and `send_metrics` start events.
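As an illustration of the per-(origin, type) NDJSON outcome events mentioned under How it works, the sketch below emits one JSON object per line. Everything beyond the two status values is assumed: the field names, the `outcome_lines` helper, and the rule that any failed deletion turns `"success"` into `"partial"` are hypothetical and may not match the real `ResourceOutcome` shape.

```python
import json
from typing import Iterable, List, Tuple


def outcome_lines(results: Iterable[Tuple[str, str, int, int]]) -> List[str]:
    """Turn (origin, resource_type, deleted, failed) tuples into NDJSON lines."""
    lines = []
    for origin, resource_type, deleted, failed in results:
        # Assumed rule: any failed deletion downgrades the group to "partial".
        status = "success" if failed == 0 else "partial"
        event = {
            "origin": origin,
            "resource_type": resource_type,
            "status": status,
            "deleted": deleted,
            "failed": failed,
        }
        # json.dumps never emits raw newlines, so each event stays on one line.
        lines.append(json.dumps(event, sort_keys=True))
    return lines


if __name__ == "__main__":
    for line in outcome_lines([("source", "monitors", 3, 0), ("destination", "monitors", 2, 1)]):
        print(line)
```

One event per (origin, type) group keeps the output small even when thousands of files are deleted, while still letting a consumer spot which group only partially succeeded.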
Out-of-scope (deferred)
Stacked PR context
🤖 Generated with Claude Code