fix: use Studio threshold as Run Eval default #1197

Merged
christso merged 1 commit into main from fix/1194-studio-default-threshold on Apr 29, 2026

Conversation

@christso
Collaborator

Summary

  • default the Run Eval modal threshold to studio.threshold instead of an empty field with a 0.8 placeholder
  • keep manual edits as per-run overrides while allowing the field to be cleared and retyped
  • use benchmark-scoped Studio config when the modal is opened from a benchmark page
  • add regression tests for the threshold resolution helpers
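The resolution behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in `run-eval-threshold.ts`: the `StudioConfig` shape, all function names, the 0 to 1 validity range, the 0.8 fallback, and the choice to treat a cleared field as "use the default" are assumptions made for the example.

```typescript
// Hypothetical config shape; the real Studio config type is not shown in this PR.
interface StudioConfig {
  threshold?: number;
}

// Assumed fallback when no usable studio.threshold exists (mirrors the old placeholder).
const FALLBACK_THRESHOLD = 0.8;

// Assumption: a valid threshold is a finite number in [0, 1].
function isValidThreshold(value: unknown): value is number {
  return (
    typeof value === "number" &&
    Number.isFinite(value) &&
    value >= 0 &&
    value <= 1
  );
}

// Prefer the benchmark-scoped config when the modal is opened from a benchmark page.
function pickStudioConfig(
  globalConfig: StudioConfig | undefined,
  benchmarkConfig: StudioConfig | undefined,
): StudioConfig | undefined {
  return benchmarkConfig ?? globalConfig;
}

// Default shown in the modal: studio.threshold if usable, otherwise the fallback.
function resolveDefaultThreshold(config: StudioConfig | undefined): number {
  const value = config?.threshold;
  return isValidThreshold(value) ? value : FALLBACK_THRESHOLD;
}

// Threshold sent with the run: a valid manual edit overrides the default,
// while an empty or unparseable field falls back to the default.
function resolveRunThreshold(fieldValue: string, defaultThreshold: number): number {
  const trimmed = fieldValue.trim();
  if (trimmed === "") {
    return defaultThreshold;
  }
  const parsed = Number(trimmed);
  return isValidThreshold(parsed) ? parsed : defaultThreshold;
}
```

Keeping the field value as a string, and resolving it to a number only when the run starts, is one way to let the input be cleared and retyped without it snapping back to the default mid-edit.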

Testing

  • bun test apps/studio/src/components/run-eval-threshold.test.ts
  • bun --filter @agentv/studio build
  • bunx biome check apps/studio/src/components/RunEvalModal.tsx apps/studio/src/components/run-eval-threshold.ts apps/studio/src/components/run-eval-threshold.test.ts apps/studio/src/lib/api.ts

Notes

  • bun run lint is currently failing on unrelated import ordering in scripts/check-grader-scores.ts on fresh origin/main

Fixes #1194

The Run Eval modal now initializes from studio.threshold, keeps per-run edits as overrides, and uses benchmark-scoped config when available.

Refs #1194

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@cloudflare-workers-and-pages

Deploying agentv with Cloudflare Pages

Latest commit: 945c9cc
Status: ✅  Deploy successful!
Preview URL: https://dc75db32.agentv.pages.dev
Branch Preview URL: https://fix-1194-studio-default-thre.agentv.pages.dev

@christso merged commit 37a84b6 into main on Apr 29, 2026
4 checks passed
@christso deleted the fix/1194-studio-default-threshold branch on April 29, 2026 08:48
Development

Successfully merging this pull request may close these issues.

bug: threshold is set to 0.75 in settings but in UI its still 0.8 placeholder