fix: align mdata arginfo with runtime machine sections by SchrodingersCattt · Pull Request #1889 · deepmodeling/dpgen

SchrodingersCattt · 2026-06-21T08:44:09Z

Summary

Align general_mdata_arginfo() with convert_mdata() runtime semantics for machine task sections.
Allow train, model_devi, and fp to be either a dict or a list of dicts.
Add regression tests covering dict-form, list-form, scalar rejection, and list-with-non-dict rejection.

Why

dpgen run accepts list-form machine sections through convert_mdata():

elif isinstance(mdata[task_type], (list, tuple)):
    task_data = mdata[task_type][0]

But run_mdata_arginfo() previously exposed these sections as dict-only, causing external validators/tools to falsely reject machine files that runtime accepts.

Validation

python -m pytest tests/test_mdata_arginfo.py tests/test_check_examples.py tests/tools/test_convert_mdata.py -q
python -m ruff check dpgen/arginfo.py tests/test_mdata_arginfo.py

Summary by CodeRabbit

Bug Fixes
- Improved runtime validation for metadata task configurations, accepting either task sections as dict objects or as non-empty lists of dict objects.
- Added clearer error messaging when metadata task inputs don’t match the expected runtime shape.
Tests
- Added new unit tests covering acceptance of valid task section formats.
- Added coverage for rejection cases including lists with non-dict items, empty lists, and scalar inputs.

codecov · 2026-06-21T08:46:40Z

Codecov Report

❌ Patch coverage is 83.33333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 49.82%. Comparing base (66232e5) to head (a7a519e).
⚠️ Report is 1 commits behind head on master.

Files with missing lines	Patch %	Lines
dpgen/arginfo.py	83.33%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1889      +/-   ##
==========================================
+ Coverage   49.80%   49.82%   +0.02%     
==========================================
  Files          83       83              
  Lines       14986    14992       +6     
==========================================
+ Hits         7464     7470       +6     
  Misses       7522     7522

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

coderabbitai · 2026-06-21T08:46:55Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dab37a25-46f7-452c-a73c-3201e4935533

📥 Commits

Reviewing files that changed from the base of the PR and between 403266b and a7a519e.

📒 Files selected for processing (2)

dpgen/arginfo.py
tests/test_mdata_arginfo.py

🚧 Files skipped from review as they are similar to previous changes (2)

dpgen/arginfo.py
tests/test_mdata_arginfo.py

📝 Walkthrough

Walkthrough

Adds a private helper _is_mdata_task_config that accepts a dict or a list of dicts and rejects all other types. Wires this validator into general_mdata_arginfo() via extra_check and extra_check_errmsg for each task argument. Adds a new test module covering all four input shapes: dict sections, list-wrapped sections, list-with-non-dict, and scalar string.

Changes

mdata task config validation and tests

Layer / File(s)	Summary
Task config validator function `dpgen/arginfo.py`	Introduces `_is_mdata_task_config(value)` returning true only for a dict or an all-dict list. Rejects all other types including empty lists and scalar values.
Arginfo task argument validation `dpgen/arginfo.py`	Updates each task `Argument` in `general_mdata_arginfo()` to accept `(dict, list)` and enforces `_is_mdata_task_config` via `extra_check` with a specific error message describing the expected dict-or-list-of-dicts semantics.
TestRunMdataArginfo test coverage `tests/test_mdata_arginfo.py`	Adds `_task()` and `_mdata()` helper constructors and five test methods: accepting dict sections, accepting list-wrapped sections and verifying `convert_mdata()` command extraction, rejecting a list containing a non-dict (expects `ArgumentValueError`), rejecting an empty list (expects `ArgumentValueError`), and rejecting a scalar string section (expects `ArgumentTypeError`).

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related issues

run_mdata_arginfo rejects list-form machine sections accepted by runtime convert_mdata #1888: This PR directly implements the fix described in that issue — updating general_mdata_arginfo() to accept both dict and list[dict] forms for train, model_devi, and fp sections, with validation that enforces the list-of-dicts constraint, resolving the schema-runtime mismatch with convert_mdata().

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 22.22% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix: align mdata arginfo with runtime machine sections' accurately describes the main change: aligning the mdata arginfo schema with actual runtime behavior for machine task sections. It is specific, concise, and directly reflects the core objective of the PR.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

dpgen/arginfo.py (1)
4-5: ⚡ Quick win

Use a NumPy-style docstring for the new helper.

As per coding guidelines, dpgen/**/*.py: "Use Numpy-style docstrings for functions and classes".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dpgen/arginfo.py` around lines 4 - 5, The docstring for the
`_is_mdata_task_config` function uses a simple Google-style format instead of
the NumPy-style format required by the coding guidelines. Convert the docstring
to NumPy-style format by adding properly structured sections including a Summary
section, Parameters section (documenting the `value` parameter and its type),
and Returns section (documenting the return type as bool and what the function
returns).
Source: Coding guidelines
tests/test_mdata_arginfo.py (1)
53-63: ⚡ Quick win

Add a regression test for empty-list task sections.

Given the new validator logic, please add a case like data["train"] = [] and assert rejection. This locks behavior to the runtime contract and prevents silent reintroduction of the empty-list gap.

As per coding guidelines, tests/**/*.py: "Add corresponding unit tests in tests/ directory for new features".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_mdata_arginfo.py` around lines 53 - 63, Add a new test method
following the existing patterns of test_rejects_list_with_non_dict_item and
test_rejects_scalar_section that verifies empty-list task sections are properly
rejected. Create a test that initializes _mdata, sets data["train"] = [] (an
empty list), and asserts that calling normalize(run_mdata_arginfo(), data,
strict_check=False) raises the appropriate error exception. This regression test
ensures the validator prevents empty task sections from being silently accepted.
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dpgen/arginfo.py`:
- Around line 8-10: The validation logic for list values currently accepts empty
lists because all(isinstance(item, dict) for item in value) returns True for
empty iterables, but this causes downstream crashes when convert_mdata() tries
to access index 0 on an empty list. Modify the condition in the
isinstance(value, list) branch to additionally check that the list is not empty
before validating that all items are dictionaries, so that empty lists are
rejected as invalid during validation.

---

Nitpick comments:
In `@dpgen/arginfo.py`:
- Around line 4-5: The docstring for the `_is_mdata_task_config` function uses a
simple Google-style format instead of the NumPy-style format required by the
coding guidelines. Convert the docstring to NumPy-style format by adding
properly structured sections including a Summary section, Parameters section
(documenting the `value` parameter and its type), and Returns section
(documenting the return type as bool and what the function returns).

In `@tests/test_mdata_arginfo.py`:
- Around line 53-63: Add a new test method following the existing patterns of
test_rejects_list_with_non_dict_item and test_rejects_scalar_section that
verifies empty-list task sections are properly rejected. Create a test that
initializes _mdata, sets data["train"] = [] (an empty list), and asserts that
calling normalize(run_mdata_arginfo(), data, strict_check=False) raises the
appropriate error exception. This regression test ensures the validator prevents
empty task sections from being silently accepted.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9164ecdc-16c8-463b-a2bc-8b7d7d6bfbe3

📥 Commits

Reviewing files that changed from the base of the PR and between 7af5246 and 403266b.

📒 Files selected for processing (2)

dpgen/arginfo.py
tests/test_mdata_arginfo.py

njzjz

The list form has been deprecated and is just kept for compatibility. I don't think it is necessary to tell external tools that a deprecated usage is supported.

SchrodingersCattt · 2026-06-21T15:15:37Z

Thanks for the clarification. It sounds like the list form is intentionally kept only for backward compatibility, while run_mdata_arginfo() represents the canonical machine schema for external tools. We’ll handle list-form compatibility downstream if needed.

fix: align mdata arginfo with runtime machine sections

403266b

coderabbitai Bot reviewed Jun 21, 2026

View reviewed changes

Comment thread dpgen/arginfo.py

test: reject empty mdata task lists

a7a519e

njzjz requested changes Jun 21, 2026

View reviewed changes

SchrodingersCattt closed this Jun 21, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: align mdata arginfo with runtime machine sections#1889

fix: align mdata arginfo with runtime machine sections#1889
SchrodingersCattt wants to merge 2 commits into
deepmodeling:masterfrom
SchrodingersCattt:fix/mdata-list-arginfo

SchrodingersCattt commented Jun 21, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

codecov Bot commented Jun 21, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 21, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related issues

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

njzjz left a comment

Uh oh!

SchrodingersCattt commented Jun 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

SchrodingersCattt commented Jun 21, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related

Why

Validation

Summary by CodeRabbit

Uh oh!

codecov Bot commented Jun 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot commented Jun 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related issues

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

njzjz left a comment

Choose a reason for hiding this comment

Uh oh!

SchrodingersCattt commented Jun 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

SchrodingersCattt commented Jun 21, 2026 •

edited by coderabbitai Bot

Loading

codecov Bot commented Jun 21, 2026 •

edited

Loading

coderabbitai Bot commented Jun 21, 2026 •

edited

Loading