Skip to content

feat(import): add --minimize-reads for type-scoped state loading#559

Open
riyazsh wants to merge 1 commit into
mainfrom
riyaz/import-minimize-reads
Open

feat(import): add --minimize-reads for type-scoped state loading#559
riyazsh wants to merge 1 commit into
mainfrom
riyaz/import-minimize-reads

Conversation

@riyazsh
Copy link
Copy Markdown
Contributor

@riyazsh riyazsh commented May 12, 2026

What this PR adds

This PR adds --minimize-reads to the datadog-sync import subcommand. The option, its validation, and the underlying loading-strategy code already exist (introduced for sync in #544 / #545) — only the click decorator was missing on import.

When enabled, import scopes the bucket state-load to the requested resource types (type-scoped) or to filter-extracted exact IDs (ID-targeted), instead of scanning every file under both source and destination prefixes.

The problem this solves

State.__init__ calls load_state(Origin.ALL), which reads every file under both prefixes (utils/state.py:77, utils/storage/gcs_bucket.py:56-84). On reasonably-populated buckets the state-load dominates per-run wall-clock regardless of how many records the run actually fetches — the boot-time read of prior runs' files becomes the bottleneck.

Observed shape on a ~22k-file source bucket, importing one resource type at a time:

Resource type Records imported Wall-clock Per-record cost
authn_mappings 3 41 min 825 sec/record (~99.97% state-load overhead)
dashboard_lists 381 42 min 6.6 sec/record
dashboards 14,000 53 min 230 ms/record

Three records, 41 minutes — nearly all of it the unscoped bucket scan. --minimize-reads fixes this for sync already; this PR makes the same fix reachable from import, which is the entry point most automation uses for incremental refreshes.

How to enable

datadog-sync import --minimize-reads --resource-per-file --resources=<types> ...

Constraints enforced in utils/configuration.py:412-417 (unchanged in this PR):

  • requires --resource-per-file
  • requires --resources

--cleanup is not registered on the import command (it lives in _diffs_options, applied only to sync/diffs), so the --minimize-reads + --cleanup rule cannot be violated through the import CLI. Click rejects import --cleanup=... at parse time. Locked in by test_import_does_not_accept_cleanup_option.

What does not change

  • sync command behaviour
  • utils/configuration.py validation logic
  • State constructor or storage layer
  • Default behaviour for import — the flag is opt-in (default=False)

Code summary

  • commands/_import.py: add --minimize-reads @option(...) decorator; import option and CustomOptionClass.
  • commands/sync.py: drop now-stale "Only available on the sync command." line from help text.
  • tests/unit/test_minimize_reads_type_scoped.py: flip the pre-existing negative test that asserted import rejected the flag; add positive --help check, two validation-mirror tests, and a click-level guard for --cleanup on import.

Test plan

  • pytest tests/unit/ — 500 passed
  • --minimize-reads listed in datadog-sync import --help
  • import --minimize-reads without --resource-per-file → UsageError
  • import --minimize-reads without --resources → UsageError
  • import --minimize-reads --cleanup=Force → click "no such option" (parse-time rejection)
  • sync --help still renders correctly after help-text edit
  • CI green across macOS / Linux x86 / Linux arm / Windows

Rollout

Flag is opt-in (default=False). Reverting the commit removes the option from import --help but doesn't break any caller that wasn't using it. No on-disk or wire-format effect.

Out of scope

  • migrate subcommand still lacks --minimize-reads. Adding it requires confirming that the apply phase's existing minimize-reads handling covers the migrate flow. Follow-up.
  • Parallelising _list_and_load (the per-type file walk is still sequential). Separate effort.

@riyazsh riyazsh requested a review from a team as a code owner May 12, 2026 16:15
@riyazsh
Copy link
Copy Markdown
Contributor Author

riyazsh commented May 12, 2026

Self-review notes

Posting these before external review so reviewers can focus on what I missed.

Reachability

Validation at configuration.py:412-419 is command-agnostic. New reachability from import:

  • --minimize-reads requires --resource-per-file → reachable (common_options). Pinned by test_import_minimize_reads_requires_resource_per_file.
  • --minimize-reads requires --resources → reachable (common_options). Pinned by test_import_minimize_reads_requires_resources_flag.
  • --minimize-reads cannot be combined with --cleanupunreachable from import, intentionally. --cleanup lives in _diffs_options, not applied to import. Click rejects import --cleanup=... at parse time. Pinned by test_import_does_not_accept_cleanup_option.

No dead branches added.

Tests

No new fakes/mocks. CliRunner is the same harness existing sync-side tests use. 500 unit tests pass.

Follow-ups (not in this PR)

  • --minimize-reads scopes the load but doesn't parallelise within a type. _list_and_load (gcs_bucket.py:68-84 + S3/Azure) is still serial. Worth a parallelisation pass — likely with the existing --max-concurrent-reads knob.
  • migrate doesn't yet expose --minimize-reads. Adding it needs maintainer alignment on whether migrate's apply phase needs the destination state pre-loaded or whether the lazy-loader at state.py:149 is sufficient.
  • For import specifically, load_state(Origin.ALL) still loads destination state, which import never reads (resources_handler.py:376-435 only touches state.source). With --minimize-reads that's a smaller dead-weight scan than before, but not zero.

Tightened in-PR

  • test_import_does_not_accept_cleanup_option now asserts against the Error: line specifically (commit 0ddc8abf) so a future click message reword doesn't silently mask a regression that accepts --cleanup on import.
  • Dropped the Must not be combined with --cleanup line from import's --minimize-reads help (commit 36b6ec31) — it was technically true but implied import accepts --cleanup, which it doesn't.

Mirrors the --minimize-reads option from sync onto import. The flag,
validation, and loading-strategy code already live in shared code
(utils/configuration.py:412-450, utils/state.py:88-90) — only the
click decorator needed adding to the import subcommand.

When enabled, import scopes the bucket state-load to the requested
resource types (type-scoped) or to filter-extracted exact IDs
(ID-targeted), instead of scanning every file under both source and
destination prefixes. For populated buckets where state-load dominates
per-run wall-clock, this is a significant speedup.

Constraints (unchanged, enforced in configuration.py:412-417):
- requires --resource-per-file
- requires --resources

The "must not be combined with --cleanup" rule on sync's flag is
enforced by sync's validation path because sync accepts --cleanup.
The import command does not register --cleanup at all (it lives in
_diffs_options, applied only to sync/diffs), so click rejects
"import --cleanup=..." at parse time, before configuration validation
runs. A behavioural test pins this invariant down.

Tests:
- flipped pre-existing test_minimize_reads_not_available_on_import_command
  to positive test_minimize_reads_accepted_on_import_command (now in
  --help).
- added test_import_minimize_reads_requires_resource_per_file and
  test_import_minimize_reads_requires_resources_flag, mirroring the
  sync-side validation tests.
- added test_import_does_not_accept_cleanup_option that asserts on
  the Error: line specifically, so a future click message reword
  doesn't silently mask a regression that accepts --cleanup on import.
- removed the now-stale "Only available on the sync command." line
  from sync's --minimize-reads help text.
- import's --minimize-reads help omits the --cleanup caveat (since
  --cleanup isn't an import flag, mentioning it would be misleading).

No behavioural change for the sync command. No new shared options.
@riyazsh riyazsh force-pushed the riyaz/import-minimize-reads branch from 36b6ec3 to 06d0f17 Compare May 12, 2026 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant