[feat][evaluation] trae eval #551

Open
tpfz wants to merge 4 commits into main from feat/wzq/trae_eval

tpfz (Collaborator) commented Jun 17, 2026

What type of PR is this?

Check the PR title

  • This PR title matches the format: [<type>][<scope>] <description>. For example: [fix][backend] flaky fix
  • The description of this PR title is user-oriented and clear enough for others to understand.
  • Add documentation if the current PR requires user awareness at the usage level.
  • This PR is written in English. PRs not in English will not be reviewed.

(Optional) Translate the PR title into Chinese

(Optional) More detailed description for this PR(en: English/zh: Chinese)

en:
zh(optional):

(Optional) Which issue(s) this PR fixes

tpfz and others added 3 commits June 15, 2026 21:08
Add a new EvalTargetType.SandboxAgent (=17) for evaluating agents
launched via CLI inside a sandbox container. Wired through IDL,
generated kitex code, DO entities, DTO/DO converters (internal +
openapi), MySQL convertor, and turn-execution dispatch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
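
The wiring described in this commit message can be illustrated with a minimal Go sketch. Only the name SandboxAgent and the value 17 come from the commit message; the type layout, `dtoToDO`, and `dispatch` are hypothetical stand-ins for the generated kitex code, the DTO/DO converters, and the turn-execution dispatch, not the PR's actual code.

```go
package main

import "fmt"

// EvalTargetType stands in for the IDL-generated enum. Only the name
// SandboxAgent and the value 17 come from the commit message.
type EvalTargetType int64

const EvalTargetTypeSandboxAgent EvalTargetType = 17

// String gives a readable name for logging.
func (t EvalTargetType) String() string {
	if t == EvalTargetTypeSandboxAgent {
		return "SandboxAgent"
	}
	return fmt.Sprintf("EvalTargetType(%d)", int64(t))
}

// dtoToDO is a hypothetical converter: it maps a raw wire value to the
// domain type and rejects values this sketch does not know about.
func dtoToDO(raw int64) (EvalTargetType, error) {
	switch EvalTargetType(raw) {
	case EvalTargetTypeSandboxAgent:
		return EvalTargetTypeSandboxAgent, nil
	default:
		return 0, fmt.Errorf("unsupported eval target type: %d", raw)
	}
}

// dispatch is a hypothetical stand-in for turn-execution dispatch.
func dispatch(t EvalTargetType) string {
	switch t {
	case EvalTargetTypeSandboxAgent:
		return "run agent CLI in sandbox"
	default:
		return "unhandled"
	}
}

func main() {
	t, err := dtoToDO(17)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(t, "->", dispatch(t))
}
```

The point of the sketch is that a new enum value needs a matching case in every layer that switches on the type; a converter that silently falls through to a default would hide a missed layer.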
…cord

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
codecov Bot commented Jun 17, 2026

Codecov Report

❌ Patch coverage is 0.95238% with 208 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...modules/evaluation/application/eval_openapi_app.go | 0.00% | 85 Missing ⚠️ |
| ...uation/application/convertor/experiment/openapi.go | 0.00% | 65 Missing ⚠️ |
| ...uation/application/convertor/target/eval_target.go | 0.00% | 51 Missing and 1 partial ⚠️ |
| ...api/handler/coze/loop/apis/eval_open_apiservice.go | 0.00% | 4 Missing ⚠️ |
| backend/modules/evaluation/domain/entity/target.go | 33.33% | 2 Missing ⚠️ |

❌ Your patch check has failed because the patch coverage (0.95%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
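
The headline figure can be checked against the per-file counts. The short Go sketch below sums the missing and partial lines from the report and recomputes the patch percentage. The count of 2 covered lines is an inference (it is the value that reproduces 0.95238% over 210 patch lines), not a number stated in the report, and the function names are illustrative only.

```go
package main

import "fmt"

// uncoveredByFile lists the "Missing" plus "partial" counts from the
// Codecov report, in report order (51 missing + 1 partial = 52).
var uncoveredByFile = []int{85, 65, 52, 4, 2}

// totalUncovered sums the per-file uncovered line counts.
func totalUncovered(counts []int) int {
	sum := 0
	for _, c := range counts {
		sum += c
	}
	return sum
}

// patchCoverage returns covered / (covered + uncovered) as a percentage.
func patchCoverage(covered, uncovered int) float64 {
	return float64(covered) / float64(covered+uncovered) * 100
}

func main() {
	uncovered := totalUncovered(uncoveredByFile)
	// covered = 2 is an assumption inferred from the reported percentage.
	fmt.Printf("%d uncovered, %.5f%% patch coverage\n",
		uncovered, patchCoverage(2, uncovered))
}
```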

@@            Coverage Diff             @@
##             main     #551      +/-   ##
==========================================
- Coverage   77.60%   77.43%   -0.18%     
==========================================
  Files         670      670              
  Lines       75995    76259     +264     
==========================================
+ Hits        58979    59052      +73     
- Misses      13565    13749     +184     
- Partials     3451     3458       +7     
| Flag | Coverage Δ |
|---|---|
| unittests | 77.43% <0.95%> (-0.18%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| ...aluation/domain/service/expt_run_item_turn_impl.go | 86.46% <100.00%> (ø) |
| backend/modules/evaluation/domain/entity/target.go | 97.93% <33.33%> (-2.07%) ⬇️ |
| ...api/handler/coze/loop/apis/eval_open_apiservice.go | 0.00% <0.00%> (ø) |
| ...uation/application/convertor/target/eval_target.go | 80.30% <0.00%> (-10.19%) ⬇️ |
| ...uation/application/convertor/experiment/openapi.go | 81.12% <0.00%> (-2.44%) ⬇️ |
| ...modules/evaluation/application/eval_openapi_app.go | 87.90% <0.00%> (-4.38%) ⬇️ |

... and 9 files with indirect coverage changes


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 016641a...f565f08.

Route SandboxAgent eval target through the async execution path so that
external sandbox runs can report results via ReportEvalTargetInvokeResult:
- AsyncCallTarget() returns true for SandboxAgent targets
- Register a SandboxAgent ISourceEvalTargetOperateService that allocates
  an invoke id placeholder in AsyncExecute; actual execution is performed
  outside and reported back through the existing async report endpoint

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
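
The async flow in this commit message can be sketched as follows. This is a minimal in-memory illustration, not the PR's implementation: `invokeStore`, `Report`, and `Result` are hypothetical names standing in for the ISourceEvalTargetOperateService registration and the ReportEvalTargetInvokeResult endpoint, and the real service presumably persists invoke records rather than keeping them in a map.

```go
package main

import (
	"fmt"
	"sync"
)

// EvalTargetType is a stand-in for the generated enum; 17 is the value
// the PR assigns to SandboxAgent.
type EvalTargetType int64

const EvalTargetTypeSandboxAgent EvalTargetType = 17

// asyncCallTarget reports whether a target type takes the async path.
// In this sketch only SandboxAgent does.
func asyncCallTarget(t EvalTargetType) bool {
	return t == EvalTargetTypeSandboxAgent
}

// invokeStore is a hypothetical in-memory record of invocations.
// A nil entry means the invoke id is allocated but not yet reported.
type invokeStore struct {
	mu      sync.Mutex
	nextID  int64
	results map[int64]*string
}

func newInvokeStore() *invokeStore {
	return &invokeStore{results: make(map[int64]*string)}
}

// AsyncExecute only allocates a placeholder invoke id; the actual run
// happens outside this process.
func (s *invokeStore) AsyncExecute() int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.nextID++
	s.results[s.nextID] = nil
	return s.nextID
}

// Report fills in the placeholder, mimicking the async report endpoint.
func (s *invokeStore) Report(invokeID int64, output string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.results[invokeID]; !ok {
		return fmt.Errorf("unknown invoke id: %d", invokeID)
	}
	s.results[invokeID] = &output
	return nil
}

// Result returns the reported output and whether it has arrived yet.
func (s *invokeStore) Result(invokeID int64) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := s.results[invokeID]
	if out == nil {
		return "", false
	}
	return *out, true
}

func main() {
	store := newInvokeStore()
	if asyncCallTarget(EvalTargetTypeSandboxAgent) {
		id := store.AsyncExecute()
		_ = store.Report(id, "agent finished")
		out, done := store.Result(id)
		fmt.Println(id, out, done)
	}
}
```

Separating id allocation from result reporting lets the caller return immediately while the external sandbox run completes on its own schedule; rejecting reports for unknown ids keeps stray callbacks from creating records.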