
Restart triage from interrupted run #2113

Open
Steboss wants to merge 4 commits into main from sbosisio/triage-cache-restart

Steboss (Contributor) commented May 18, 2026

No description provided.

@Steboss Steboss requested a review from olupton May 20, 2026 14:12
Comment thread .github/triage/jax_toolbox_triage/args.py Outdated
Comment thread .github/triage/jax_toolbox_triage/utils.py Outdated
Comment thread .github/triage/jax_toolbox_triage/logic.py Outdated
skip_precondition_checks: bool,
check_success_before_failure: bool = True,
confirmation_iterations: int = 1,
result_cache: typing.Optional[typing.Dict[FlatVersionDict, TestResult]] = None,
Collaborator

Suggested change:
- result_cache: typing.Optional[typing.Dict[FlatVersionDict, TestResult]] = None,
+ result_cache: typing.Dict[FlatVersionDict, TestResult] = {},

Collaborator

As well as this, we might not even need _version_search anymore? I think it only existed because of result_cache not being in the top-level API.
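One detail worth checking in the suggested change: Python evaluates a default such as `= {}` once, when the `def` statement runs. Any caller that relies on the default therefore shares a single dict across calls. The sketch below uses hypothetical stand-ins for `FlatVersionDict` and `TestResult`, not the real jax_toolbox_triage types, to illustrate the difference. Whether this matters here depends on whether every call site passes `result_cache` explicitly.

```python
import typing

# Hypothetical stand-ins for the real jax_toolbox_triage types.
FlatVersionDict = typing.Tuple[typing.Tuple[str, str], ...]
TestResult = bool


def search_shared_default(
    key: FlatVersionDict,
    result: TestResult,
    result_cache: typing.Dict[FlatVersionDict, TestResult] = {},
) -> typing.Dict[FlatVersionDict, TestResult]:
    # The `{}` default is created once, at definition time, so every call
    # that omits result_cache mutates the same dict.
    result_cache[key] = result
    return result_cache


def search_fresh_default(
    key: FlatVersionDict,
    result: TestResult,
    result_cache: typing.Optional[typing.Dict[FlatVersionDict, TestResult]] = None,
) -> typing.Dict[FlatVersionDict, TestResult]:
    # A None sentinel gives each call that omits result_cache its own dict.
    if result_cache is None:
        result_cache = {}
    result_cache[key] = result
    return result_cache


a = search_shared_default((("jax", "abc"),), True)
b = search_shared_default((("jax", "def"),), False)
print(a is b, len(b))  # True 2

c = search_fresh_default((("jax", "abc"),), True)
d = search_fresh_default((("jax", "def"),), False)
print(c is d, len(d))  # False 1
```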

Comment thread .github/triage/jax_toolbox_triage/summary.py Outdated
Comment thread .github/triage/jax_toolbox_triage/summary.py Outdated
Comment thread .github/triage/jax_toolbox_triage/summary.py Outdated
Comment thread .github/triage/jax_toolbox_triage/triage_tool.py Outdated
Co-authored-by: Olli Lupton <olupton@nvidia.com>
copy-pr-bot Bot commented May 28, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Steboss Steboss requested a review from olupton May 28, 2026 13:35
and args.output_prefix.resolve() != args.restart_folder
):
raise Exception(
"--output-prefix must match --restart-folder when restarting"
Collaborator

With these semantics, isn't a boolean --restart sufficient?
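Below is a minimal sketch of what a boolean flag could look like. It assumes `--output-prefix` is parsed as a `pathlib.Path`, which the `.resolve()` call in the check above suggests. The `--restart` flag and the `restart_folder` helper are hypothetical, not the tool's current interface. Deriving the folder from `--output-prefix` also avoids comparing two paths that may be spelled differently, for example one resolved and one relative.

```python
import argparse
import pathlib
import typing


def parse_args(argv: typing.Optional[typing.List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("--output-prefix", type=pathlib.Path, required=True)
    # Hypothetical boolean flag: resume from results already stored under
    # --output-prefix rather than naming a second folder.
    parser.add_argument("--restart", action="store_true")
    return parser.parse_args(argv)


def restart_folder(args: argparse.Namespace) -> typing.Optional[pathlib.Path]:
    # The folder to resume from is derived from --output-prefix, so there
    # is no second path that has to be validated against it.
    return args.output_prefix.resolve() if args.restart else None


args = parse_args(["--output-prefix", "triage-out", "--restart"])
print(restart_folder(args))
```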

result_cache = {}
if self.args.restart_folder is not None:
self.restart_cache.update(
result_cache_from_summary(
Collaborator

Why do we have to do this? Why isn't it enough to just populate the in-memory cache from JSON at startup and pass that into the two searches?
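The reviewer's alternative can be sketched as follows: load the saved results into one in-memory dict at startup, then hand that same dict to both searches. Everything here is hypothetical and only illustrates the idea:

- the JSON layout;
- the `load_result_cache`, `run_test` and `triage` names;
- the placeholder "test".

The real summary format and search functions in jax_toolbox_triage may differ.

```python
import json
import pathlib
import tempfile
import typing

# Hypothetical stand-ins: the real FlatVersionDict / TestResult types in
# jax_toolbox_triage may differ.
FlatVersionDict = typing.Tuple[typing.Tuple[str, str], ...]
ResultCache = typing.Dict[FlatVersionDict, bool]


def load_result_cache(path: pathlib.Path) -> ResultCache:
    """Read results saved by an earlier run; start empty if none exist."""
    cache: ResultCache = {}
    if path.exists():
        # Hypothetical format: [{"versions": {...}, "result": true}, ...]
        for entry in json.loads(path.read_text()):
            key = tuple(sorted(entry["versions"].items()))
            cache[key] = entry["result"]
    return cache


def run_test(versions: FlatVersionDict, cache: ResultCache, executed: list) -> bool:
    """Return a cached result if present, otherwise 'run' the test."""
    if versions in cache:
        return cache[versions]
    executed.append(versions)  # record that real work happened
    result = dict(versions)["jax"] != "bad"  # placeholder for the real test
    cache[versions] = result
    return result


def triage(summary: pathlib.Path) -> list:
    cache = load_result_cache(summary)  # populated once, at startup
    executed: list = []
    # Both searches share the same in-memory dict, so neither re-runs
    # anything recorded by the interrupted run.
    run_test((("jax", "good"),), cache, executed)  # e.g. container search
    run_test((("jax", "bad"),), cache, executed)   # e.g. commit search
    return executed


with tempfile.TemporaryDirectory() as tmp:
    summary = pathlib.Path(tmp) / "summary.json"
    summary.write_text(json.dumps([{"versions": {"jax": "good"}, "result": True}]))
    print(triage(summary))  # [(('jax', 'bad'),)]
```

With this approach, resuming needs no separate restart cache: the only difference between a fresh run and a restarted one is whether the dict starts empty.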


Labels

None yet

Projects

None yet


2 participants