Skip to content

Cancel monitor and server tasks on backend shutdown#669

Merged
Kovbo merged 1 commit intoOpenPipe:mainfrom
Fiona-Waters:fix/cancel-server-task-on-shutdown
May 5, 2026
Merged

Cancel monitor and server tasks on backend shutdown#669
Kovbo merged 1 commit intoOpenPipe:mainfrom
Fiona-Waters:fix/cancel-server-task-on-shutdown

Conversation

@Fiona-Waters
Copy link
Copy Markdown
Contributor

@Fiona-Waters Fiona-Waters commented Apr 30, 2026

Summary

  • Store _monitor_openai_server task references in LocalBackend and cancel them during close()/_close()
  • Store openai_server_task reference in UnslothService and cancel it during aclose()/close()
  • Iterate over list() copies to avoid RuntimeError from dict mutation when done_callbacks fire during await
  • Add unit test verifying close() cancels monitor tasks

Fixes ##668

Test plan

  • Existing test_local_backend_monitor.py test passes
  • New test_close_cancels_monitor_tasks test passes
  • Run LocalBackend training in shared mode inside a container — pod should exit cleanly after training completes

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Two fire-and-forget asyncio tasks were never cancelled during shutdown:

1. `_monitor_openai_server` in `LocalBackend` — polls vLLM health every
   30s. After 3 consecutive ConnectionRefusedError failures (because vLLM
   is already gone), it re-raises, producing "Task exception was never
   retrieved" and a non-zero process exit.

2. `openai_server_task` in `UnslothService` (shared mode) — the uvicorn
   server task, also orphaned on shutdown.

In containerized environments (Kubeflow, KubeRay), the non-zero exit
causes pod restarts even though training completed successfully.

Fix: store both task references and cancel them during close()/aclose().
Iterate over list() copies to avoid RuntimeError from dict mutation when
done_callbacks fire during await.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Kovbo
Copy link
Copy Markdown
Collaborator

Kovbo commented May 5, 2026

Good catch, thank you @Fiona-Waters!

@Kovbo Kovbo merged commit 099e6b2 into OpenPipe:main May 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants