fix mcp test generation stability and recovery by SummerOneTwo · Pull Request #5 · SummerOneTwo/AutoCode

SummerOneTwo · 2026-04-28T19:16:56Z

Summary

add configurable answer extension and propagate it through generation, verification, and polygon packing
harden long-running generation with checkpoints, resume support, cancellation handling, and process cleanup tooling
add semantic quality gates for generator type=3/type=4 and extend regression tests for recovery and cleanup edge cases

Test plan

uv run pytest tests/test_tools/test_problem.py tests/test_tools/test_generator.py tests/test_compiler.py -q
uv run ruff check src/autocode_mcp/tools/problem.py src/autocode_mcp/tools/test_verify.py src/autocode_mcp/tools/generator.py src/autocode_mcp/server.py src/autocode_mcp/utils/compiler.py src/autocode_mcp/utils/win_job.py scripts/workflow_guard.py tests/test_tools/test_problem.py tests/test_tools/test_generator.py tests/test_compiler.py

Made-with: Cursor

Copilot

Pull request overview

This PR improves the stability and recoverability of long-running MCP test generation by making answer file extensions configurable, adding checkpoint/resume + cancellation cleanup paths, and introducing semantic quality gates for generator type=3/type=4.

Changes:

Add configurable answer_ext and propagate it through generation, verification, and Polygon packaging.
Harden problem_generate_tests with checkpoints, resume support, hard timeouts, and cancellation-safe PID tracking.
Add semantic quality gates (type3/type4 overlap checks) and extend tests for recovery/cleanup edge cases.

Reviewed changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
`src/autocode_mcp/tools/problem.py`	Adds `answer_ext`, checkpoint/resume state handling, PID tracking, and a cleanup tool for generator processes.
`src/autocode_mcp/tools/test_verify.py`	Extends verification to support custom answer extensions and adds `limit_semantics` checks.
`src/autocode_mcp/tools/generator.py`	Adds a static semantic check for type=3/type=4 divergence (optional strict mode).
`src/autocode_mcp/utils/compiler.py`	Adds `process_start_hook` and ensures cancellation forces subprocess termination.
`src/autocode_mcp/utils/win_job.py`	Adds `close()` to release job handles without explicitly terminating first.
`src/autocode_mcp/server.py`	Returns a structured failure response when tool calls are cancelled.
`src/autocode_mcp/tools/...` tests (`tests/test_tools/test_problem.py`, `tests/test_tools/test_generator.py`, `tests/test_compiler.py`)	Adds coverage for custom `answer_ext`, resume fallback, cancellation cleanup, cleanup tool behavior, and strict semantic checks.
Docs (`README.md`, `skills/autocode-workflow/SKILL.md`, `agents/autocode-workflow.md`, `scripts/workflow_guard.py`, `src/autocode_mcp/prompts/__init__.py`, `CLAUDE.md`)	Updates workflow guidance and tool docs for `answer_ext`, resume/checkpoints, and semantic expectations.
`pyproject.toml`, `uv.lock`	Adds `twine` to dev dependency group and updates lockfile accordingly.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Made-with: Cursor

Tighten answer_ext normalization and retry semantics, clear stale generated outputs before final writes, and make cleanup/semantic checks more resilient across platforms and generator branch styles. Made-with: Cursor

Copilot

Pull request overview

Copilot reviewed 17 out of 18 changed files in this pull request and generated 5 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-28T19:39:11Z

+                remaining_pids = [pid for pid in pids if isinstance(pid, int) and pid not in killed]
+                self._write_cleanup_state(state_path, remaining_pids)
+                if not remaining_pids and os.path.exists(state_path):
+                    os.remove(state_path)
+                    removed_files.append(state_path)


ProblemCleanupProcessesTool writes the generation state file using _write_cleanup_state() which overwrites the entire JSON with only {active_pids: ...} and may even delete the file when no PIDs remain. This state file is also used by problem_generate_tests for checkpoint/resume (it stores next_seed, candidates, etc.), so overwriting/removing it can break resume=true. Consider loading the existing state JSON, updating only the active_pids field (and keeping other fields intact), and avoid deleting the checkpoint unless the user explicitly requests it.

Copilot · 2026-04-28T19:39:12Z

+        with open(state_path, "w", encoding="utf-8") as f:
+            json.dump({"active_pids": remaining_pids}, f, ensure_ascii=False, indent=2)


_write_cleanup_state() currently rewrites the state file to only include active_pids, which discards checkpoint metadata (phase, next_seed, candidates, errors, answer_ext, etc.) saved by problem_generate_tests. Update this helper to merge into the existing JSON state (or use a separate PID-tracking file) so cleanup doesn’t destroy resume checkpoints.

Suggested change

with open(state_path, "w", encoding="utf-8") as f:

json.dump({"active_pids": remaining_pids}, f, ensure_ascii=False, indent=2)

state = self._load_cleanup_state(state_path) or {}

state["active_pids"] = remaining_pids

with open(state_path, "w", encoding="utf-8") as f:

json.dump(state, f, ensure_ascii=False, indent=2)

Copilot · 2026-04-28T19:39:12Z

@@ -873,7 +1243,7 @@ async def execute(
            <memory-limit>{memory_limit_bytes}</memory-limit>
            <test-count>{actual_test_count}</test-count>
            <input-path-pattern>tests/%02d.in</input-path-pattern>
-            <answer-path-pattern>tests/%02d.ans</answer-path-pattern>
+            <answer-path-pattern>tests/%02d{answer_ext}</answer-path-pattern>


answer_ext is taken from the manifest and interpolated directly into problem.xml without normalization/escaping. If the manifest is edited/corrupted (e.g., contains path separators or </&), this can produce invalid XML or an unintended path pattern. Reuse the same answer_ext normalization rules used elsewhere and XML-escape the interpolated values before writing problem.xml.

Copilot · 2026-04-28T19:39:12Z

+    def close(self) -> None:
+        """关闭 Job Object 句柄（不主动终止进程）。"""
+        if self.job_handle is not None and self.job_handle != 0:
+            win32api.CloseHandle(self.job_handle)
+            self.job_handle = None


WinJobObject.close() claims it "does not actively terminate processes", but the Job is created with JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE, so closing the last handle will terminate all processes in the job anyway. Either (a) drop KILL_ON_JOB_CLOSE when you need a non-terminating close(), or (b) keep the flag but update the docstring/name/comments to reflect that close() can still kill processes via the job limit.

Copilot · 2026-04-28T19:39:13Z

        elapsed_ms = int((time.time() - start_time) * 1000)

-        # 正常完成后清理 Job Object
+        # 正常完成后只关闭 Job Handle


The comment says "正常完成后只关闭 Job Handle", but with JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE set on the job, job.close() may still terminate any remaining child processes in the job when the handle is closed. If the intent is to not kill processes on success, the job should be created without KILL_ON_JOB_CLOSE (or with a different strategy), otherwise please adjust the comment/behavior to match the actual semantics.

Suggested change

# 正常完成后只关闭 Job Handle

# 正常完成后关闭 Job Handle；若该 Job 配置了 KILL_ON_JOB_CLOSE，

# 关闭时仍可能终止 Job 中尚未退出的子进程。

Preserve checkpoint metadata when cleaning tracked generator PIDs, add POSIX PID termination support, and sanitize/escape manifest-derived answer extensions in Polygon XML output. Made-with: Cursor

fix mcp test generation stability and recovery

7576337

Made-with: Cursor

Copilot AI review requested due to automatic review settings April 28, 2026 19:16

Copilot started reviewing on behalf of SummerOneTwo April 28, 2026 19:17 View session

Copilot AI reviewed Apr 28, 2026

View reviewed changes

SummerOneTwo added 2 commits April 29, 2026 03:22

fix ci failures for new cleanup tool

3baac63

Made-with: Cursor

fix post-review robustness for test generation flow

4e89ec4

Tighten answer_ext normalization and retry semantics, clear stale generated outputs before final writes, and make cleanup/semantic checks more resilient across platforms and generator branch styles. Made-with: Cursor

SummerOneTwo requested a review from Copilot April 28, 2026 19:35

Copilot started reviewing on behalf of SummerOneTwo April 28, 2026 19:36 View session

Copilot AI reviewed Apr 28, 2026

View reviewed changes

fix cleanup checkpoint safety and polygon answer-ext sanitization

096a0f7

Preserve checkpoint metadata when cleaning tracked generator PIDs, add POSIX PID termination support, and sanitize/escape manifest-derived answer extensions in Polygon XML output. Made-with: Cursor

SummerOneTwo merged commit d79858c into master Apr 29, 2026
6 checks passed

SummerOneTwo deleted the cursor/mcp-stability-hardening branch April 29, 2026 04:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix mcp test generation stability and recovery#5

fix mcp test generation stability and recovery#5
SummerOneTwo merged 4 commits intomasterfrom
cursor/mcp-stability-hardening

SummerOneTwo commented Apr 28, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 28, 2026

Uh oh!

Copilot AI Apr 28, 2026

Uh oh!

Copilot AI Apr 28, 2026

Uh oh!

Copilot AI Apr 28, 2026

Uh oh!

Copilot AI Apr 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		with open(state_path, "w", encoding="utf-8") as f:
		json.dump({"active_pids": remaining_pids}, f, ensure_ascii=False, indent=2)

	# 正常完成后只关闭 Job Handle
	# 正常完成后关闭 Job Handle；若该 Job 配置了 KILL_ON_JOB_CLOSE，
	# 关闭时仍可能终止 Job 中尚未退出的子进程。

Conversation

SummerOneTwo commented Apr 28, 2026

Summary

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants