onboard: per-instance asyncio.Lock + typed lifecycle exceptions on ConversationEngine by Antawari · Pull Request #151 · BonfireAI/bonfire

Antawari · 2026-05-27T22:52:20Z

Summary

Two defects rooted at the same class, both load-bearing on the host-only Front Door:

1. Back-to-back `handle_answer` calls corrupt `_turn`

handle_answer awaits emit(...) (which awaits broadcast(...) → asyncio.gather) before incrementing _turn. A second WS message arriving while the first call is suspended inside the first emit reads the same stale _turn — the analyzer fires twice for the same question, _turn double-advances, and the question-emission sequence can skip Q2 or Q3. With three answers required for completion, a single user double-click can push the engine into is_complete=True after Q2; subsequent legitimate answers raise the bare RuntimeError("Conversation is already complete.") and the WS handler silently logs without recovery — the engine is permanently broken for that session.

2. Lifecycle violations raise bare `RuntimeError`

handle_answer before start() and handle_answer after completion both raised RuntimeError("..."). The WS handler's broad except Exception swallowed them without producing a typed error frame the browser could surface to the user.

Changes

src/bonfire/onboard/conversation.py

Added _lock: asyncio.Lock = field(default_factory=asyncio.Lock) to ConversationEngine. Each instance gets its own Lock — default_factory runs per instance, not per class definition, so unrelated conversations don't serialize against each other.
handle_answer now wraps its full body in async with self._lock: — the await points inside the body can't be raced by a sibling call.
Defined two typed exceptions, both subclassing RuntimeError for backward-compat with existing bare-except RuntimeError catchers upstream in the WS handler:
- ConversationNotStarted (handle_answer called before start)
- ConversationAlreadyComplete (handle_answer called after Q3 answered)
Updated __all__ to export the new exception types.

tests/unit/test_onboard_conversation_concurrency.py (new · Knight RED)

Seven tests across three classes:

Class	Test	What it pins
TestLockPresence	`_lock` attribute is `asyncio.Lock`	Contract for the per-instance lock
TestLockPresence	Lock per-instance, not shared	`default_factory` discipline
TestHandleAnswerAcquiresLock	`handle_answer` blocks when lock held	The acquire pattern fires
TestHandleAnswerAcquiresLock	Concurrent calls emit Q1→Q2→Q3 in order	Pre-fix this races and skips questions
TestTypedLifecycleExceptions	Pre-start raises `ConversationNotStarted`	Typed exception path
TestTypedLifecycleExceptions	Post-complete raises `ConversationAlreadyComplete`	Typed exception path
TestTypedLifecycleExceptions	Both subclass `RuntimeError`	Back-compat guard for existing catchers

TDD verification

Knight RED state confirmed against pre-fix code: collection-error RED (the test imports ConversationAlreadyComplete / ConversationNotStarted which don't exist pre-fix — the strongest form of structural RED).
Warrior GREEN state confirmed after fix: 7/7 pass in 1.05s.
Existing regression: tests/unit/test_onboard_server.py — 19/19 still green.
ruff check + ruff format --check on changed files: clean.

Out of scope (filed for follow-up PR)

WS handler integration — catching ConversationNotStarted / ConversationAlreadyComplete specifically in src/bonfire/onboard/server.py and emitting a typed error frame to the browser. Deferred to avoid file overlap with the in-flight front-door hardening PR (which also touches server.py).

Test plan

pytest tests/unit/test_onboard_conversation_concurrency.py — confirm 7/7 green.
pytest tests/unit/test_onboard_server.py — confirm 19/19 still green (no regression).

🤖 Generated with Claude Code

…nversationEngine Two defects rooted at the same class: 1. handle_answer awaits emit() (which awaits broadcast() → asyncio.gather) before incrementing _turn. A second WS message arriving while the first handle_answer is suspended inside the first emit reads the same stale _turn — the analyzer fires twice for the same question, _turn double- advances, and the question-emission sequence can skip Q2 or Q3. With three answers required for completion, a double-click can leave the engine permanently broken for that session. 2. Lifecycle violations (handle_answer before start, handle_answer after completion) raised bare RuntimeError. The WS handler's `except Exception` swallowed them without producing a typed error frame the browser could surface to the user. ## Fix src/bonfire/onboard/conversation.py: - Added `_lock: asyncio.Lock = field(default_factory=asyncio.Lock)` to ConversationEngine. Each instance gets its own Lock — `default_factory` runs per instance, not per class definition, so unrelated conversations don't serialize against each other. - handle_answer now wraps its full body in `async with self._lock:` — the await points inside the body can't be raced by a sibling call. - Defined two typed exceptions, both subclassing RuntimeError for backward-compat with existing bare-`except RuntimeError` catchers upstream in the WS handler: * ConversationNotStarted (handle_answer called before start) * ConversationAlreadyComplete (handle_answer called after Q3 answered) - Updated `__all__` to export the new exception types. ## Tests tests/unit/test_onboard_conversation_concurrency.py (new · Knight RED): Seven tests across three classes: - TestLockPresence (2): asyncio.Lock attribute exists; per-instance not shared. - TestHandleAnswerAcquiresLock (2): * handle_answer blocks when the lock is held externally (asserts via asyncio.wait_for timeout). * two concurrent handle_answer calls emit questions in strict Q1→Q2→Q3 order (pre-fix this races and skips questions). - TestTypedLifecycleExceptions (3): * handle_answer before start raises ConversationNotStarted. * handle_answer after completion raises ConversationAlreadyComplete. * both typed exceptions subclass RuntimeError (back-compat guard). ## Out of scope (filed for follow-up PR) WS handler integration — catching ConversationNotStarted / ConversationAlreadyComplete specifically in src/bonfire/onboard/server.py and emitting a typed error frame to the browser. Deferred to avoid file overlap with the in-flight front-door hardening PR which also touches server.py. ## Verification pytest tests/unit/test_onboard_conversation_concurrency.py 7 passed (Knight RED → GREEN verified) pytest tests/unit/test_onboard_server.py (regression) 19 passed ruff check + format on changed files: clean Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

onboard: per-instance asyncio.Lock + typed lifecycle exceptions on ConversationEngine#151

onboard: per-instance asyncio.Lock + typed lifecycle exceptions on ConversationEngine#151
Antawari wants to merge 1 commit into
mainfrom
ishtar/conversation-engine-concurrency-lock

Antawari commented May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Antawari commented May 27, 2026

Summary

1. Back-to-back handle_answer calls corrupt _turn

2. Lifecycle violations raise bare RuntimeError

Changes

TDD verification

Out of scope (filed for follow-up PR)

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

1. Back-to-back `handle_answer` calls corrupt `_turn`

2. Lifecycle violations raise bare `RuntimeError`