fix(auto-recall): add timeout + fail-open so slow LLMs cannot stall startup or first turn#1673
Open
Sanjays2402 wants to merge 2 commits intoMemTensor:mainfrom
Open
fix(auto-recall): add timeout + fail-open so slow LLMs cannot stall startup or first turn#1673Sanjays2402 wants to merge 2 commits intoMemTensor:mainfrom
Sanjays2402 wants to merge 2 commits intoMemTensor:mainfrom
Conversation
…tartup or first turn When auto-recall was enabled with an existing memory database and a slow LLM on the recall/filter path, the before-prompt-build hook could block the critical path for 30-40 seconds — long enough to trip gateway health checks and contribute to restart loops. Wrap the recall/filter work in a configurable timeout (default 8s, via `recall.autoRecallTimeoutMs`) and a top-level try/catch that fails open to an empty memory set. Auto-recall is best-effort enrichment; it must never delay readiness or destabilize the gateway. Fixes MemTensor#1452
Contributor
There was a problem hiding this comment.
Pull request overview
This PR prevents before_prompt_build auto-recall from stalling the gateway critical path by adding a hard timeout and fail-open behavior around the recall search + LLM filter work (addressing #1452).
Changes:
- Added a
withTimeouthelper that resolves tonullon timeout (fail-open). - Wrapped auto-recall Phase 1 (parallel local + hub search) and Phase 2 (LLM filtering) in the configured timeout, with fail-open behavior.
- Introduced
recall.autoRecallTimeoutMs(default documented as 8000ms) and added unit tests forwithTimeout.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| apps/memos-local-openclaw/tests/with-timeout.test.ts | Adds Vitest coverage for the new withTimeout helper and timeout semantics. |
| apps/memos-local-openclaw/src/types.ts | Documents recall.autoRecallTimeoutMs and adds a default value to DEFAULTS. |
| apps/memos-local-openclaw/src/shared/with-timeout.ts | Implements withTimeout to race promises against a timeout and return null on timeout. |
| apps/memos-local-openclaw/index.ts | Applies withTimeout to auto-recall search and filter steps to prevent long stalls. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+1903
to
+1905
| const autoRecallTimeoutMs = | ||
| ctx.config.recall?.autoRecallTimeoutMs ?? DEFAULTS.autoRecallTimeoutMs; | ||
| const phase1 = await withTimeout( |
Comment on lines
+37
to
+48
| it("simulates the auto-recall hang path: a 30s LLM call falls back in 8s", async () => { | ||
| // Mimic a slow recall LLM that would hang the gateway critical path. | ||
| const hangingLLM = new Promise<{ relevant: number[]; sufficient: boolean }>( | ||
| (resolve) => setTimeout(() => resolve({ relevant: [1, 2], sufficient: true }), 30_000), | ||
| ); | ||
| const t0 = Date.now(); | ||
| const result = await withTimeout(hangingLLM, 50, "auto-recall.filter"); | ||
| const elapsed = Date.now() - t0; | ||
| expect(result).toBeNull(); | ||
| // Must give up well under the 30s LLM completion time. | ||
| expect(elapsed).toBeLessThan(500); | ||
| }); |
…lake timer tests Address Copilot review on MemTensor#1673: - index.ts: resolveConfig now passes through cfg.recall.autoRecallTimeoutMs to the resolved recall block. Without this the user-facing config key was effectively dead — the default always won. - with-timeout.test.ts: switch to vi.useFakeTimers() and vi.advanceTimersByTimeAsync so the timeout assertions are deterministic under CI load. The previous wall-clock 'elapsed < 500ms' check was the most likely flake source.
Author
|
Thanks — both addressed in the latest commit:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
With auto-recall enabled and an existing memory database, a slow LLM on the recall/filter path could block the gateway critical path for 30-40 seconds — long enough to trip health checks and cause restart loops (#1452).
This PR wraps the recall/filter LLM work in a configurable timeout and ensures any exception in the auto-recall path fails open rather than propagating to the gateway top level.
Changes
withTimeouthelper that resolves tonullon timeout (clean fail-open semantics).recall.autoRecallTimeoutMs(default 8000 ms).recall.autoRecallTimeoutMs.Behavior
readyis unaffected; auto-recall was never on that path, but the timeout + fail-open guarantees it stays that way.Fixes #1452