Skip to content

feat: Expose on_batch_complete via create method#663

Open
mikeknep wants to merge 3 commits into
mainfrom
662-on-batch-complete/mknepper
Open

feat: Expose on_batch_complete via create method#663
mikeknep wants to merge 3 commits into
mainfrom
662-on-batch-complete/mknepper

Conversation

@mikeknep
Copy link
Copy Markdown
Contributor

📋 Summary

Exposes on_batch_complete via the DataDesigner.create method so that users can configure the callback without having to reach into engine internals.

🔗 Related Issue

Closes #662 #662

🔄 Changes

  • Adds an optional argument to DataDesigner.create

🧪 Testing

  • make test passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable) N/A

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable) N/A

@mikeknep mikeknep requested a review from a team as a code owner May 15, 2026 14:11
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 15, 2026

Greptile Summary

This PR exposes the on_batch_complete callback through the DataDesigner.create method so callers no longer need to reach into engine internals to register batch-level hooks.

  • Adds an optional on_batch_complete: Callable[[Path], None] | None = None parameter to DataDesigner.create and forwards it to builder.build as a keyword argument.
  • Adds a focused unit test that mocks the builder and asserts the callback is forwarded verbatim.

Confidence Score: 5/5

Safe to merge — the change is a single optional parameter threaded through an existing call site with no behavioral changes to the default path.

The default value is None and the argument is forwarded as a keyword argument, so all existing call sites are unaffected. The new test directly verifies the forwarding contract. No existing logic is altered.

No files require special attention.

Important Files Changed

Filename Overview
packages/data-designer/src/data_designer/interface/data_designer.py Adds on_batch_complete parameter to create; correctly typed, defaulted to None, and forwarded via keyword argument to builder.build.
packages/data-designer/tests/interface/test_data_designer.py New test test_create_forwards_on_batch_complete_callback correctly mocks the builder and asserts the callback is passed through unchanged.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant DataDesigner
    participant DatasetBuilder

    Caller->>DataDesigner: create(config_builder, num_records, on_batch_complete)
    DataDesigner->>DataDesigner: _create_resource_provider(...)
    DataDesigner->>DataDesigner: _create_dataset_builder(...)
    DataDesigner->>DatasetBuilder: build(num_records, on_batch_complete, resume)
    loop For each batch
        DatasetBuilder->>DatasetBuilder: write batch to disk
        DatasetBuilder->>Caller: on_batch_complete(batch_path)
    end
    DatasetBuilder-->>DataDesigner: done
    DataDesigner->>DataDesigner: profile dataset
    DataDesigner-->>Caller: DatasetCreationResults
Loading

Reviews (3): Last reviewed commit: "Add note about exceptions to docstring" | Re-trigger Greptile

@github-actions
Copy link
Copy Markdown
Contributor

PR #663 Review — Expose on_batch_complete via DataDesigner.create

Summary

Threads an existing engine-level on_batch_complete: Callable[[Path], None] | None parameter
out through the public DataDesigner.create method on the interface package, so users can
register a per-batch callback without reaching into engine internals. Closes #662.

The change is minimal (47/-1) and surgical:

  • packages/data-designer/src/data_designer/interface/data_designer.py: add the kwarg and
    forward it to builder.build(...).
  • packages/data-designer/tests/interface/test_data_designer.py: a unit test asserting the
    callback is forwarded with identity to the underlying builder.

Findings

Correctness

  • Signature matches the engine: dataset_builder.py:223 declares
    on_batch_complete: Callable[[Path], None] | None = None, identical to the new
    interface kwarg. The forwarding call uses keyword args, so positional drift is
    not a risk.
  • Default None preserves existing behavior — a pure additive change with no
    caller impact.
  • Callable is imported from collections.abc (the project convention) rather
    than typing, consistent with from __future__ import annotations already in
    use in this file.

Project conventions

  • Follows the layering rules in AGENTS.md: interface forwards into engine; no
    reverse imports. ✅
  • Type annotations and modern syntax (Callable[[Path], None] | None) align with
    STYLEGUIDE expectations. ✅
  • Docstring entry added in the Args: section, matching the existing style of
    surrounding parameters.

Test coverage

  • test_create_forwards_on_batch_complete_callback asserts identity (is) on the
    forwarded callback and the num_records kwarg — a tight, focused check. Good.
  • The test only exercises the wiring; behavior of the callback (invocation per
    batch, path correctness) remains covered by engine-level tests. That separation
    of concerns is appropriate for an interface-layer change.
  • Minor nit: the dummy callback uses del path to silence "unused argument"
    lint, which is fine but def on_batch_complete(_path: Path) -> None: pass
    would be more idiomatic. Not blocking.

Risk / blast radius

  • Backward compatible: new kwarg with a default of None.
  • No public-API break; no changes to engine, config, or CLI.
  • No performance implications — the callback is only invoked on batch completion
    in the engine; this PR doesn't add any per-record overhead.

Documentation

  • Public-facing docstring is updated. Consider whether the Fern docs site
    (fern/) has an example or API reference that should mention
    on_batch_complete on create(). If create() is auto-generated from the
    signature, no action needed; otherwise a short note + tiny example would help
    users discover the feature. Not blocking.

Verdict

LGTM. Small, well-scoped, test-covered, backward compatible, and consistent with
the layering invariants in AGENTS.md. The only optional follow-up is verifying
the docs site surfaces the new parameter; everything else is good as-is.

Comment on lines +272 to +273
on_batch_complete: Optional callback called with the completed batch
artifact path after each batch is written.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we say a brief example or two here? Without any other context, it's hard to understand why you might want this

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I expanded the docstring in b775817


In all resume modes, in-flight partial results from the interrupted run are
discarded before generation continues.
on_batch_complete: Optional callback called with the completed batch artifact path after
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: could add one sentence here that callback exceptions abort the run and get wrapped as DataDesignerGenerationError. Since this is parameter-specific behavior, the docstring feels like the right place for it, and it also shows up anywhere the API docs render docstrings.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good info, added in 3e0ab73

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Expose on_batch_complete through high-level interface

3 participants