feat: Expose on_batch_complete via create method #663
Greptile Summary
This PR exposes the
| Filename | Overview |
|---|---|
| packages/data-designer/src/data_designer/interface/data_designer.py | Adds on_batch_complete parameter to create; correctly typed, defaulted to None, and forwarded via keyword argument to builder.build. |
| packages/data-designer/tests/interface/test_data_designer.py | New test test_create_forwards_on_batch_complete_callback correctly mocks the builder and asserts the callback is passed through unchanged. |
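
The forwarding check described in the table can be sketched roughly as follows. This is not the test from the PR: `create_sketch` is a hypothetical stand-in for `DataDesigner.create`, and the builder is a plain `MagicMock`, so only the keyword pass-through is illustrated.

```python
from unittest.mock import MagicMock


def create_sketch(builder, *, num_records, on_batch_complete=None):
    """Hypothetical stand-in for DataDesigner.create (illustration only)."""
    # Forward the callback unchanged, by keyword, to the builder.
    builder.build(num_records=num_records, on_batch_complete=on_batch_complete)


# Mock the builder so no generation actually runs.
builder = MagicMock()
callback = MagicMock()

create_sketch(builder, num_records=5, on_batch_complete=callback)

# The builder must receive the exact same callback object.
builder.build.assert_called_once_with(num_records=5, on_batch_complete=callback)
```

Because the builder is mocked, the callback itself is never invoked here; the sketch only verifies that it arrives at `build` untouched.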
Sequence Diagram
sequenceDiagram
participant Caller
participant DataDesigner
participant DatasetBuilder
Caller->>DataDesigner: create(config_builder, num_records, on_batch_complete)
DataDesigner->>DataDesigner: _create_resource_provider(...)
DataDesigner->>DataDesigner: _create_dataset_builder(...)
DataDesigner->>DatasetBuilder: build(num_records, on_batch_complete, resume)
loop For each batch
DatasetBuilder->>DatasetBuilder: write batch to disk
DatasetBuilder->>Caller: on_batch_complete(batch_path)
end
DatasetBuilder-->>DataDesigner: done
DataDesigner->>DataDesigner: profile dataset
DataDesigner-->>Caller: DatasetCreationResults
Reviews (3): Last reviewed commit: "Add note about exceptions to docstring"
PR #663 Review — Expose
on_batch_complete: Optional callback called with the completed batch
artifact path after each batch is written.
Should we add a brief example or two here? Without any other context, it's hard to understand why you might want this.
Good idea, I expanded the docstring in b775817
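
For illustration, one possible use of the callback is progress reporting. This is a sketch under assumptions, not the example added in b775817: `make_progress_callback` and the batch file names are hypothetical, and the only thing taken from the PR is that the callback receives the completed batch artifact path.

```python
from pathlib import Path
from typing import Callable, List


def make_progress_callback(total_batches: int, log: List[str]) -> Callable[[Path], None]:
    """Build a callback that records one progress line per completed batch."""
    done = 0

    def on_batch_complete(batch_path: Path) -> None:
        nonlocal done
        done += 1
        log.append(f"[{done}/{total_batches}] finished {batch_path.name}")

    return on_batch_complete


# Simulate two completed batches (hypothetical file names).
log: List[str] = []
progress = make_progress_callback(total_batches=2, log=log)
progress(Path("batch_00000.parquet"))
progress(Path("batch_00001.parquet"))
```

The returned function could then be passed as the on_batch_complete argument of create; the same pattern would fit other per-batch side effects such as copying each artifact elsewhere.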
In all resume modes, in-flight partial results from the interrupted run are
discarded before generation continues.
on_batch_complete: Optional callback called with the completed batch artifact path after
suggestion: could add one sentence here that callback exceptions abort the run and get wrapped as DataDesignerGenerationError. Since this is parameter-specific behavior, the docstring feels like the right place for it, and it also shows up anywhere the API docs render docstrings.
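
The behavior described in this suggestion can be sketched as below. This is a minimal illustration, not the engine's code: the `DataDesignerGenerationError` class here is a local stand-in that only borrows the name, and `run_batches` is a hypothetical loop; how the real builder wraps the error is assumed.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Optional


class DataDesignerGenerationError(Exception):
    """Local stand-in named after the library's error (illustration only)."""


def run_batches(
    batch_paths: Iterable[Path],
    on_batch_complete: Optional[Callable[[Path], None]] = None,
) -> None:
    """Hypothetical batch loop: a failing callback aborts the whole run."""
    for batch_path in batch_paths:
        if on_batch_complete is None:
            continue
        try:
            on_batch_complete(batch_path)
        except Exception as exc:
            # Wrap the original error and stop; later batches are not processed.
            raise DataDesignerGenerationError(
                f"on_batch_complete failed for {batch_path}"
            ) from exc


processed: List[Path] = []


def flaky_callback(batch_path: Path) -> None:
    processed.append(batch_path)
    if len(processed) == 2:
        raise ValueError("upload failed")


caught = None
try:
    run_batches([Path("b0"), Path("b1"), Path("b2")], flaky_callback)
except DataDesignerGenerationError as err:
    caught = err
```

In this sketch the third batch is never reached, and the original `ValueError` stays available through `__cause__` for debugging.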
📋 Summary
Exposes on_batch_complete via the DataDesigner.create method so that users can configure the callback without having to reach into engine internals.
🔗 Related Issue
Closes #662
🔄 Changes
DataDesigner.create
🧪 Testing
make test passes
✅ Checklist