Skip to content

fix mcp test generation stability and recovery#5

Merged
SummerOneTwo merged 4 commits intomasterfrom
cursor/mcp-stability-hardening
Apr 29, 2026
Merged

fix mcp test generation stability and recovery#5
SummerOneTwo merged 4 commits intomasterfrom
cursor/mcp-stability-hardening

Conversation

@SummerOneTwo
Copy link
Copy Markdown
Owner

Summary

  • add configurable answer extension and propagate it through generation, verification, and polygon packing
  • harden long-running generation with checkpoints, resume support, cancellation handling, and process cleanup tooling
  • add semantic quality gates for generator type=3/type=4 and extend regression tests for recovery and cleanup edge cases

Test plan

  • uv run pytest tests/test_tools/test_problem.py tests/test_tools/test_generator.py tests/test_compiler.py -q
  • uv run ruff check src/autocode_mcp/tools/problem.py src/autocode_mcp/tools/test_verify.py src/autocode_mcp/tools/generator.py src/autocode_mcp/server.py src/autocode_mcp/utils/compiler.py src/autocode_mcp/utils/win_job.py scripts/workflow_guard.py tests/test_tools/test_problem.py tests/test_tools/test_generator.py tests/test_compiler.py

Copilot AI review requested due to automatic review settings April 28, 2026 19:16
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves the stability and recoverability of long-running MCP test generation by making answer file extensions configurable, adding checkpoint/resume + cancellation cleanup paths, and introducing semantic quality gates for generator type=3/type=4.

Changes:

  • Add configurable answer_ext and propagate it through generation, verification, and Polygon packaging.
  • Harden problem_generate_tests with checkpoints, resume support, hard timeouts, and cancellation-safe PID tracking.
  • Add semantic quality gates (type3/type4 overlap checks) and extend tests for recovery/cleanup edge cases.

Reviewed changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/autocode_mcp/tools/problem.py Adds answer_ext, checkpoint/resume state handling, PID tracking, and a cleanup tool for generator processes.
src/autocode_mcp/tools/test_verify.py Extends verification to support custom answer extensions and adds limit_semantics checks.
src/autocode_mcp/tools/generator.py Adds a static semantic check for type=3/type=4 divergence (optional strict mode).
src/autocode_mcp/utils/compiler.py Adds process_start_hook and ensures cancellation forces subprocess termination.
src/autocode_mcp/utils/win_job.py Adds close() to release job handles without explicitly terminating first.
src/autocode_mcp/server.py Returns a structured failure response when tool calls are cancelled.
src/autocode_mcp/tools/... tests (tests/test_tools/test_problem.py, tests/test_tools/test_generator.py, tests/test_compiler.py) Adds coverage for custom answer_ext, resume fallback, cancellation cleanup, cleanup tool behavior, and strict semantic checks.
Docs (README.md, skills/autocode-workflow/SKILL.md, agents/autocode-workflow.md, scripts/workflow_guard.py, src/autocode_mcp/prompts/__init__.py, CLAUDE.md) Updates workflow guidance and tool docs for answer_ext, resume/checkpoints, and semantic expectations.
pyproject.toml, uv.lock Adds twine to dev dependency group and updates lockfile accordingly.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/autocode_mcp/tools/problem.py
Comment thread src/autocode_mcp/tools/test_verify.py Outdated
Comment thread src/autocode_mcp/tools/problem.py Outdated
Comment thread src/autocode_mcp/tools/problem.py
Comment thread src/autocode_mcp/tools/problem.py Outdated
Comment thread src/autocode_mcp/tools/generator.py Outdated
Tighten answer_ext normalization and retry semantics, clear stale generated outputs before final writes, and make cleanup/semantic checks more resilient across platforms and generator branch styles.

Made-with: Cursor
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 18 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/autocode_mcp/tools/problem.py Outdated
Comment on lines +1068 to +1072
remaining_pids = [pid for pid in pids if isinstance(pid, int) and pid not in killed]
self._write_cleanup_state(state_path, remaining_pids)
if not remaining_pids and os.path.exists(state_path):
os.remove(state_path)
removed_files.append(state_path)
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ProblemCleanupProcessesTool writes the generation state file using _write_cleanup_state() which overwrites the entire JSON with only {active_pids: ...} and may even delete the file when no PIDs remain. This state file is also used by problem_generate_tests for checkpoint/resume (it stores next_seed, candidates, etc.), so overwriting/removing it can break resume=true. Consider loading the existing state JSON, updating only the active_pids field (and keeping other fields intact), and avoid deleting the checkpoint unless the user explicitly requests it.

Copilot uses AI. Check for mistakes.
Comment thread src/autocode_mcp/tools/problem.py Outdated
Comment on lines +1108 to +1109
with open(state_path, "w", encoding="utf-8") as f:
json.dump({"active_pids": remaining_pids}, f, ensure_ascii=False, indent=2)
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_write_cleanup_state() currently rewrites the state file to only include active_pids, which discards checkpoint metadata (phase, next_seed, candidates, errors, answer_ext, etc.) saved by problem_generate_tests. Update this helper to merge into the existing JSON state (or use a separate PID-tracking file) so cleanup doesn’t destroy resume checkpoints.

Suggested change
with open(state_path, "w", encoding="utf-8") as f:
json.dump({"active_pids": remaining_pids}, f, ensure_ascii=False, indent=2)
state = self._load_cleanup_state(state_path) or {}
state["active_pids"] = remaining_pids
with open(state_path, "w", encoding="utf-8") as f:
json.dump(state, f, ensure_ascii=False, indent=2)

Copilot uses AI. Check for mistakes.
Comment thread src/autocode_mcp/tools/problem.py Outdated
Comment on lines +1216 to +1246
@@ -873,7 +1243,7 @@ async def execute(
<memory-limit>{memory_limit_bytes}</memory-limit>
<test-count>{actual_test_count}</test-count>
<input-path-pattern>tests/%02d.in</input-path-pattern>
<answer-path-pattern>tests/%02d.ans</answer-path-pattern>
<answer-path-pattern>tests/%02d{answer_ext}</answer-path-pattern>
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

answer_ext is taken from the manifest and interpolated directly into problem.xml without normalization/escaping. If the manifest is edited/corrupted (e.g., contains path separators or </&), this can produce invalid XML or an unintended path pattern. Reuse the same answer_ext normalization rules used elsewhere and XML-escape the interpolated values before writing problem.xml.

Copilot uses AI. Check for mistakes.
Comment on lines +143 to +147
def close(self) -> None:
"""关闭 Job Object 句柄(不主动终止进程)。"""
if self.job_handle is not None and self.job_handle != 0:
win32api.CloseHandle(self.job_handle)
self.job_handle = None
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WinJobObject.close() claims it "does not actively terminate processes", but the Job is created with JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE, so closing the last handle will terminate all processes in the job anyway. Either (a) drop KILL_ON_JOB_CLOSE when you need a non-terminating close(), or (b) keep the flag but update the docstring/name/comments to reflect that close() can still kill processes via the job limit.

Copilot uses AI. Check for mistakes.
Comment thread src/autocode_mcp/utils/compiler.py Outdated
elapsed_ms = int((time.time() - start_time) * 1000)

# 正常完成后清理 Job Object
# 正常完成后只关闭 Job Handle
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says "正常完成后只关闭 Job Handle", but with JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE set on the job, job.close() may still terminate any remaining child processes in the job when the handle is closed. If the intent is to not kill processes on success, the job should be created without KILL_ON_JOB_CLOSE (or with a different strategy), otherwise please adjust the comment/behavior to match the actual semantics.

Suggested change
# 正常完成后只关闭 Job Handle
# 正常完成后关闭 Job Handle;若该 Job 配置了 KILL_ON_JOB_CLOSE,
# 关闭时仍可能终止 Job 中尚未退出的子进程。

Copilot uses AI. Check for mistakes.
Preserve checkpoint metadata when cleaning tracked generator PIDs, add POSIX PID termination support, and sanitize/escape manifest-derived answer extensions in Polygon XML output.

Made-with: Cursor
@SummerOneTwo SummerOneTwo merged commit d79858c into master Apr 29, 2026
6 checks passed
@SummerOneTwo SummerOneTwo deleted the cursor/mcp-stability-hardening branch April 29, 2026 04:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants