Skip to content

fix: align mdata arginfo with runtime machine sections#1889

Closed
SchrodingersCattt wants to merge 2 commits into
deepmodeling:masterfrom
SchrodingersCattt:fix/mdata-list-arginfo
Closed

fix: align mdata arginfo with runtime machine sections#1889
SchrodingersCattt wants to merge 2 commits into
deepmodeling:masterfrom
SchrodingersCattt:fix/mdata-list-arginfo

Conversation

@SchrodingersCattt

@SchrodingersCattt SchrodingersCattt commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Align general_mdata_arginfo() with convert_mdata() runtime semantics for machine task sections.
  • Allow train, model_devi, and fp to be either a dict or a list of dicts.
  • Add regression tests covering dict-form, list-form, scalar rejection, and list-with-non-dict rejection.

Related

Related to #1888.

Why

dpgen run accepts list-form machine sections through convert_mdata():

elif isinstance(mdata[task_type], (list, tuple)):
    task_data = mdata[task_type][0]

But run_mdata_arginfo() previously exposed these sections as dict-only, causing external validators/tools to falsely reject machine files that runtime accepts.

Validation

  • python -m pytest tests/test_mdata_arginfo.py tests/test_check_examples.py tests/tools/test_convert_mdata.py -q
  • python -m ruff check dpgen/arginfo.py tests/test_mdata_arginfo.py

Summary by CodeRabbit

  • Bug Fixes

    • Improved runtime validation for metadata task configurations, accepting either task sections as dict objects or as non-empty lists of dict objects.
    • Added clearer error messaging when metadata task inputs don’t match the expected runtime shape.
  • Tests

    • Added new unit tests covering acceptance of valid task section formats.
    • Added coverage for rejection cases including lists with non-dict items, empty lists, and scalar inputs.

@codecov

codecov Bot commented Jun 21, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 83.33333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 49.82%. Comparing base (66232e5) to head (a7a519e).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
dpgen/arginfo.py 83.33% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1889      +/-   ##
==========================================
+ Coverage   49.80%   49.82%   +0.02%     
==========================================
  Files          83       83              
  Lines       14986    14992       +6     
==========================================
+ Hits         7464     7470       +6     
  Misses       7522     7522              

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@coderabbitai

coderabbitai Bot commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dab37a25-46f7-452c-a73c-3201e4935533

📥 Commits

Reviewing files that changed from the base of the PR and between 403266b and a7a519e.

📒 Files selected for processing (2)
  • dpgen/arginfo.py
  • tests/test_mdata_arginfo.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • dpgen/arginfo.py
  • tests/test_mdata_arginfo.py

📝 Walkthrough

Walkthrough

Adds a private helper _is_mdata_task_config that accepts a dict or a list of dicts and rejects all other types. Wires this validator into general_mdata_arginfo() via extra_check and extra_check_errmsg for each task argument. Adds a new test module covering all four input shapes: dict sections, list-wrapped sections, list-with-non-dict, and scalar string.

Changes

mdata task config validation and tests

Layer / File(s) Summary
Task config validator function
dpgen/arginfo.py
Introduces _is_mdata_task_config(value) returning true only for a dict or an all-dict list. Rejects all other types including empty lists and scalar values.
Arginfo task argument validation
dpgen/arginfo.py
Updates each task Argument in general_mdata_arginfo() to accept (dict, list) and enforces _is_mdata_task_config via extra_check with a specific error message describing the expected dict-or-list-of-dicts semantics.
TestRunMdataArginfo test coverage
tests/test_mdata_arginfo.py
Adds _task() and _mdata() helper constructors and five test methods: accepting dict sections, accepting list-wrapped sections and verifying convert_mdata() command extraction, rejecting a list containing a non-dict (expects ArgumentValueError), rejecting an empty list (expects ArgumentValueError), and rejecting a scalar string section (expects ArgumentTypeError).

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related issues

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 22.22% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: align mdata arginfo with runtime machine sections' accurately describes the main change: aligning the mdata arginfo schema with actual runtime behavior for machine task sections. It is specific, concise, and directly reflects the core objective of the PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
dpgen/arginfo.py (1)

4-5: ⚡ Quick win

Use a NumPy-style docstring for the new helper.

As per coding guidelines, dpgen/**/*.py: "Use Numpy-style docstrings for functions and classes".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dpgen/arginfo.py` around lines 4 - 5, The docstring for the
`_is_mdata_task_config` function uses a simple Google-style format instead of
the NumPy-style format required by the coding guidelines. Convert the docstring
to NumPy-style format by adding properly structured sections including a Summary
section, Parameters section (documenting the `value` parameter and its type),
and Returns section (documenting the return type as bool and what the function
returns).

Source: Coding guidelines

tests/test_mdata_arginfo.py (1)

53-63: ⚡ Quick win

Add a regression test for empty-list task sections.

Given the new validator logic, please add a case like data["train"] = [] and assert rejection. This locks behavior to the runtime contract and prevents silent reintroduction of the empty-list gap.

As per coding guidelines, tests/**/*.py: "Add corresponding unit tests in tests/ directory for new features".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_mdata_arginfo.py` around lines 53 - 63, Add a new test method
following the existing patterns of test_rejects_list_with_non_dict_item and
test_rejects_scalar_section that verifies empty-list task sections are properly
rejected. Create a test that initializes _mdata, sets data["train"] = [] (an
empty list), and asserts that calling normalize(run_mdata_arginfo(), data,
strict_check=False) raises the appropriate error exception. This regression test
ensures the validator prevents empty task sections from being silently accepted.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dpgen/arginfo.py`:
- Around line 8-10: The validation logic for list values currently accepts empty
lists because all(isinstance(item, dict) for item in value) returns True for
empty iterables, but this causes downstream crashes when convert_mdata() tries
to access index 0 on an empty list. Modify the condition in the
isinstance(value, list) branch to additionally check that the list is not empty
before validating that all items are dictionaries, so that empty lists are
rejected as invalid during validation.

---

Nitpick comments:
In `@dpgen/arginfo.py`:
- Around line 4-5: The docstring for the `_is_mdata_task_config` function uses a
simple Google-style format instead of the NumPy-style format required by the
coding guidelines. Convert the docstring to NumPy-style format by adding
properly structured sections including a Summary section, Parameters section
(documenting the `value` parameter and its type), and Returns section
(documenting the return type as bool and what the function returns).

In `@tests/test_mdata_arginfo.py`:
- Around line 53-63: Add a new test method following the existing patterns of
test_rejects_list_with_non_dict_item and test_rejects_scalar_section that
verifies empty-list task sections are properly rejected. Create a test that
initializes _mdata, sets data["train"] = [] (an empty list), and asserts that
calling normalize(run_mdata_arginfo(), data, strict_check=False) raises the
appropriate error exception. This regression test ensures the validator prevents
empty task sections from being silently accepted.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9164ecdc-16c8-463b-a2bc-8b7d7d6bfbe3

📥 Commits

Reviewing files that changed from the base of the PR and between 7af5246 and 403266b.

📒 Files selected for processing (2)
  • dpgen/arginfo.py
  • tests/test_mdata_arginfo.py

Comment thread dpgen/arginfo.py

@njzjz njzjz left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list form has been deprecated and is just kept for compatibility. I don't think it is necessary to tell external tools that a deprecated usage is supported.

@SchrodingersCattt

Copy link
Copy Markdown
Contributor Author

Thanks for the clarification. It sounds like the list form is intentionally kept only for backward compatibility, while run_mdata_arginfo() represents the canonical machine schema for external tools. We’ll handle list-form compatibility downstream if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants