
Add skill to debug VAI jobs #600

Draft
kmontemayor2-sc wants to merge 1 commit into main from kmonte/vai-debug

Conversation

@kmontemayor2-sc
Collaborator

Scope of work done

Where is the documentation for this feature?: N/A

Did you add automated tests or write a test plan?

Updated Changelog.md? NO

Ready for code review?: NO

is fine, just create the first one. If an existing script already covers what you need, reuse it. Only create a new
script when no existing one fits.

______________________________________________________________________
Collaborator

There is a lot of front matter here that will pollute the context.
Can we iterate on this?


**Prefer creating well-named, reusable scripts in `tools/ai/<descriptive-name>.py` over per-job throwaways.** Examples
of good names: `tools/ai/filter_vai_logs_by_replica.py`, `tools/ai/summarize_vai_log_errors.py`,
`tools/ai/extract_first_last_log_per_rank.py`.
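
As a rough illustration of what a reusable script such as `tools/ai/filter_vai_logs_by_replica.py` might contain, here is a minimal sketch. It assumes the logs have already been fetched as JSON lines and that each entry carries a replica identifier under a key called `replica`; the function name, the key, and the replica values are hypothetical, not taken from this PR.

```python
"""Hypothetical sketch of the core of tools/ai/filter_vai_logs_by_replica.py."""
import json
from typing import Iterable, Iterator


def filter_by_replica(
    lines: Iterable[str], replica: str, field: str = "replica"
) -> Iterator[dict]:
    """Yield parsed log entries whose `field` value equals `replica`.

    Assumes one JSON object per line. The `field` default is a guess;
    pass whatever key the fetched logs actually use.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
        if isinstance(entry, dict) and entry.get(field) == replica:
            yield entry
```

A thin command-line wrapper (for example with `argparse`) could read from a file or stdin and print the matching entries. Keeping the filter as a plain function makes it easy to reuse from other scripts in `tools/ai/`.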
Collaborator

Why don't we just persist these?
Specifically the tools to read/filter logs.

It would save a lot of prompting and cycles.
(I think we should already have code to fetch logs?)

Collaborator Author

idk if we should be including a lot of slop scripts here; they're cheap/easy to generate (and the robots generate the scripts themselves anyway, so there's no real prompting on my end).

3 participants