fix(auto-recall): add timeout + fail-open so slow LLMs cannot stall startup or first turn by Sanjays2402 · Pull Request #1673 · MemTensor/MemOS

Sanjays2402 · 2026-05-09T19:29:26Z

Summary

With auto-recall enabled and an existing memory database, a slow LLM on the recall/filter path could block the gateway critical path for 30-40 seconds — long enough to trip health checks and cause restart loops (#1452).

This PR wraps the recall/filter LLM work in a configurable timeout and ensures any exception in the auto-recall path fails open rather than propagating to the gateway top level.

Changes

New withTimeout helper that resolves to null on timeout (clean fail-open semantics).
Auto-recall LLM filter now races against recall.autoRecallTimeoutMs (default 8000 ms).
Top-level try/catch around the auto-recall block; on any error or timeout we log a warning and return an empty memory set so the prompt build proceeds normally.
New config key documented: recall.autoRecallTimeoutMs.
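
For readers without the diff open, a helper with these semantics could look roughly like the sketch below. It is not the PR's source: the `(promise, ms, label)` argument order is taken from the test call quoted later on this page, while the `warn` parameter and the warning text are assumptions added for illustration. In this sketch a rejection from the wrapped promise still propagates, so callers are expected to catch it separately.

```typescript
// Sketch of a fail-open timeout helper (not the PR's actual source).
// Resolves to the promise's value, or to null if `ms` elapses first.
async function withTimeout<T>(
  promise: Promise<T>,
  ms: number,
  label: string,
  warn: (msg: string) => void = console.warn, // hypothetical logging hook
): Promise<T | null> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<null>((resolve) => {
    timer = setTimeout(() => {
      warn(`${label}: timed out after ${ms}ms, failing open`);
      resolve(null);
    }, ms);
  });
  try {
    // Whichever settles first wins; the slower promise is ignored, not cancelled.
    return await Promise.race([promise, timeout]);
  } finally {
    // Clear the timer so a fast promise does not leave a pending timeout behind.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Note that racing does not abort the underlying LLM request; it only stops the caller from waiting on it.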

Behavior

Healthy LLM: indistinguishable from before — recall + filter happen, memories injected.
Slow LLM (timeout): warning logged, prompt builds with no auto-injected memories, gateway proceeds.
LLM error: same as timeout — warning logged, fail open.
Startup ready is unaffected; auto-recall was never on that path, but the timeout + fail-open guarantees it stays that way.
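
The three cases above can be illustrated with a minimal sketch. All names here (`Memory`, `Search`, `Filter`, `raceTimeout`, `autoRecallFailOpen`) are hypothetical and do not come from the plugin. The sketch also assumes each phase gets the full timeout budget, which this page does not confirm.

```typescript
// Hypothetical shapes; the plugin's real types are not shown on this page.
type Memory = { id: number; text: string };
type Search = () => Promise<Memory[]>;
type Filter = (candidates: Memory[]) => Promise<Memory[]>;

// Races `work` against a timer; resolves to null if the timer wins.
function raceTimeout<T>(work: Promise<T>, ms: number): Promise<T | null> {
  return new Promise<T | null>((resolve, reject) => {
    const timer = setTimeout(() => resolve(null), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Best-effort auto-recall: any timeout or error yields an empty memory set.
async function autoRecallFailOpen(
  search: Search,
  filter: Filter,
  timeoutMs: number,
  warn: (msg: string) => void,
): Promise<Memory[]> {
  try {
    const candidates = await raceTimeout(search(), timeoutMs);
    if (candidates === null) {
      warn("auto-recall: search timed out, continuing without memories");
      return [];
    }
    const relevant = await raceTimeout(filter(candidates), timeoutMs);
    if (relevant === null) {
      warn("auto-recall: filter timed out, continuing without memories");
      return [];
    }
    return relevant;
  } catch (err) {
    // Errors are treated exactly like timeouts: log and fail open.
    warn(`auto-recall: failed open after error: ${String(err)}`);
    return [];
  }
}
```

Returning an empty array rather than throwing means the prompt builder needs no special case for a degraded recall path.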

Fixes #1452

…tartup or first turn

When auto-recall was enabled with an existing memory database and a slow LLM on the recall/filter path, the before-prompt-build hook could block the critical path for 30-40 seconds, long enough to trip gateway health checks and contribute to restart loops.

Wrap the recall/filter work in a configurable timeout (default 8s, via `recall.autoRecallTimeoutMs`) and a top-level try/catch that fails open to an empty memory set. Auto-recall is best-effort enrichment; it must never delay readiness or destabilize the gateway.

Fixes MemTensor#1452

Copilot

Pull request overview

This PR prevents before_prompt_build auto-recall from stalling the gateway critical path by adding a hard timeout and fail-open behavior around the recall search + LLM filter work (addressing #1452).

Changes:

Added a withTimeout helper that resolves to null on timeout (fail-open).
Wrapped auto-recall Phase 1 (parallel local + hub search) and Phase 2 (LLM filtering) in the configured timeout, with fail-open behavior.
Introduced recall.autoRecallTimeoutMs (default documented as 8000ms) and added unit tests for withTimeout.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
apps/memos-local-openclaw/tests/with-timeout.test.ts	Adds Vitest coverage for the new `withTimeout` helper and timeout semantics.
apps/memos-local-openclaw/src/types.ts	Documents `recall.autoRecallTimeoutMs` and adds a default value to `DEFAULTS`.
apps/memos-local-openclaw/src/shared/with-timeout.ts	Implements `withTimeout` to race promises against a timeout and return `null` on timeout.
apps/memos-local-openclaw/index.ts	Applies `withTimeout` to auto-recall search and filter steps to prevent long stalls.


+        const autoRecallTimeoutMs =
+          ctx.config.recall?.autoRecallTimeoutMs ?? DEFAULTS.autoRecallTimeoutMs;
+        const phase1 = await withTimeout(


+  it("simulates the auto-recall hang path: a 30s LLM call falls back in 8s", async () => {
+    // Mimic a slow recall LLM that would hang the gateway critical path.
+    const hangingLLM = new Promise<{ relevant: number[]; sufficient: boolean }>(
+      (resolve) => setTimeout(() => resolve({ relevant: [1, 2], sufficient: true }), 30_000),
+    );
+    const t0 = Date.now();
+    const result = await withTimeout(hangingLLM, 50, "auto-recall.filter");
+    const elapsed = Date.now() - t0;
+    expect(result).toBeNull();
+    // Must give up well under the 30s LLM completion time.
+    expect(elapsed).toBeLessThan(500);
+  });


…lake timer tests

Address Copilot review on MemTensor#1673:

- index.ts: resolveConfig now passes through cfg.recall.autoRecallTimeoutMs to the resolved recall block. Without this the user-facing config key was effectively dead: the default always won.
- with-timeout.test.ts: switch to vi.useFakeTimers() and vi.advanceTimersByTimeAsync so the timeout assertions are deterministic under CI load. The previous wall-clock 'elapsed < 500ms' check was the most likely flake source.

Sanjays2402 · 2026-05-09T19:54:39Z

Thanks — both addressed in the latest commit:

index.ts:1905 (config not wired): resolveConfig() (in src/config.ts) now passes cfg.recall.autoRecallTimeoutMs through to the resolved recall block. Without this the user-facing config key was effectively dead and the default always won — good catch.
with-timeout.test.ts:48 (flaky timers): Switched to vi.useFakeTimers() + vi.advanceTimersByTimeAsync() so the timeout behavior is asserted deterministically. The previous wall-clock elapsed < 500ms check was the obvious flake risk under CI load. Test suite now finishes in ~5ms and the 30s-hang case is verified by advancing fake time past 8001ms instead of measuring real elapsed time.
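
To make the first point concrete, here is a minimal sketch of what passing the key through could look like. The `RawConfig` and `ResolvedConfig` shapes are simplified assumptions for illustration, and the real `resolveConfig()` handles many more fields. Only the key name and the 8000 ms default come from this PR.

```typescript
// Hypothetical, simplified config shapes for illustration only.
interface RawConfig {
  recall?: { autoRecallTimeoutMs?: number };
}
interface ResolvedConfig {
  recall: { autoRecallTimeoutMs: number };
}

// 8000 ms is the default documented in this PR.
const DEFAULTS = { autoRecallTimeoutMs: 8000 };

function resolveConfig(cfg: RawConfig): ResolvedConfig {
  return {
    recall: {
      // Pass the user's value through; fall back only when it is absent.
      autoRecallTimeoutMs:
        cfg.recall?.autoRecallTimeoutMs ?? DEFAULTS.autoRecallTimeoutMs,
    },
  };
}
```

If the resolver omits the key, a later `?? DEFAULTS.autoRecallTimeoutMs` lookup on the resolved config always falls through to the default, which is the dead-key behavior the review flagged.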

Copilot AI review requested due to automatic review settings May 9, 2026 19:29

Copilot started reviewing on behalf of Sanjays2402 May 9, 2026 19:30

Copilot AI reviewed May 9, 2026

Sanjays2402 wants to merge 2 commits into MemTensor:main from Sanjays2402:fix/issue-1452
