
feat: report self-bootstrap status via Health RPC #6

Merged
esifea merged 5 commits into CryptoLabInc:main from couragehong:feat/bootstrap-status
May 31, 2026

Conversation

@couragehong
Contributor

  • Previously, the gRPC UDS opened only after self-bootstrap completed, so clients dialing during the multi-minute install window saw nothing but dial failures. They now connect immediately and observe STATUS_LOADING with the current Phase (FETCHING_LLAMA_SERVER / FETCHING_MODEL / STARTING_LLAMA_SERVER) and download bytes_done / bytes_total. These proto fields were already defined; this PR is the matching implementation.
  • Embed/EmbedBatch return codes.FailedPrecondition until the backend is wired, so client retry policies don't burn budget against a daemon that isn't ready. Once SetBackend wires the backend, Health flips to STATUS_OK and embed requests are accepted.
  • Every termination path (Shutdown RPC, idle timeout, OS signals, serve error, early failure during bootstrap) flips Health to STATUS_SHUTTING_DOWN before the listener closes, so clients no longer see an abrupt connection drop with no signal. A backend.Start failure also reaps a possibly orphaned child via b.Stop.
  • Client code (rune-mcp) needs no changes to keep working. Surfacing Phase / bytes_done in a polling UI is a follow-up on the rune-mcp side; the runed surface is now ready for it.

couragehong and others added 3 commits May 28, 2026 17:40
Until self-bootstrap completes, Embed/EmbedBatch return
codes.FailedPrecondition rather than dialing into a nil backend, and
Health reports STATUS_LOADING with the current Phase / bytes_done /
bytes_total / message that bootstrap can feed in.

Mechanics:

- backend reference moves to atomic.Pointer[backend.LlamaBackend];
  modelIdentity becomes atomic.Value; bootstrapStatus is an
  atomic.Pointer[bootstrapState] so {phase, bytes, message} publish
  and observe as one tuple.
- New(version) constructs a Server with nil backend. SetBackend(b,
  modelID) wires it after bootstrap, writing maxTextLength /
  modelIdentity / backend in that order so a reader seeing backend
  necessarily sees the other two.
- SetBootstrapStatus(phase, bytesDone, bytesTotal, message) is the
  loading-state sink consumed by the next Health call while backend
  is still nil.
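
The publish-order argument above can be sketched as follows. This is a minimal illustration, not the PR's code: `Backend` stands in for `backend.LlamaBackend`, the `Ready` helper is hypothetical, and passing `maxLen` as a third argument to `SetBackend` is an assumption (the source signature is `SetBackend(b, modelID)`).

```go
package main

import "sync/atomic"

// Backend stands in for backend.LlamaBackend; its field is hypothetical.
type Backend struct{ Name string }

// bootstrapState is published as one immutable value, so a reader never
// sees a phase from one update paired with byte counts from another.
type bootstrapState struct {
	Phase      string
	BytesDone  int64
	BytesTotal int64
	Message    string
}

type Server struct {
	maxTextLength   atomic.Int64
	modelIdentity   atomic.Value // holds a string
	backend         atomic.Pointer[Backend]
	bootstrapStatus atomic.Pointer[bootstrapState]
}

// SetBootstrapStatus swaps in a fresh snapshot; it never mutates the old one.
func (s *Server) SetBootstrapStatus(phase string, done, total int64, msg string) {
	s.bootstrapStatus.Store(&bootstrapState{phase, done, total, msg})
}

// SetBackend stores the backend last. A reader that loads a non-nil backend
// is therefore guaranteed to also observe maxTextLength and modelIdentity.
func (s *Server) SetBackend(b *Backend, modelID string, maxLen int64) {
	s.maxTextLength.Store(maxLen)
	s.modelIdentity.Store(modelID)
	s.backend.Store(b)
}

// Ready (hypothetical helper) returns the model identity only once the
// backend has been wired.
func (s *Server) Ready() (string, bool) {
	if s.backend.Load() == nil {
		return "", false
	}
	id, _ := s.modelIdentity.Load().(string)
	return id, true
}
```

Readers must mirror the order in reverse: load `backend` first and read the other fields only after it comes back non-nil.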

Health priority is SHUTTING_DOWN (shutdownCh closed) → LOADING
(backend nil) → DEGRADED (IsHealthy false) → OK. SHUTTING_DOWN
outranks LOADING so a drain-in-progress daemon doesn't advertise
itself as "still loading" mid-drain.
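
The priority chain can be written as a single ordered switch. The sketch below uses a local `Status` type and boolean inputs as stand-ins; the real handler reads `shutdownCh`, the backend pointer, and `IsHealthy`, and returns the generated proto enum.

```go
package main

// Status is a local stand-in for the proto status enum.
type Status int

const (
	StatusOK Status = iota
	StatusLoading
	StatusDegraded
	StatusShuttingDown
)

// healthStatus applies the priority order
// SHUTTING_DOWN > LOADING > DEGRADED > OK.
// The first matching case wins, so a draining daemon reports
// SHUTTING_DOWN even if its backend was never wired.
func healthStatus(shuttingDown, backendSet, healthy bool) Status {
	switch {
	case shuttingDown:
		return StatusShuttingDown
	case !backendSet:
		return StatusLoading
	case !healthy:
		return StatusDegraded
	default:
		return StatusOK
	}
}
```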

Tests cover the LOADING / SetBootstrapStatus reflection /
FAILED_PRECONDITION before-SetBackend / SHUTTING_DOWN priority cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Threads an optional reporter through EnsureAll / EnsureLlamaServer /
EnsureModel so download-byte progress and per-stage transitions reach
callers without coupling the bootstrap package to a specific status
sink (e.g. server.SetBootstrapStatus).

Callback shape:

    type StatusReporter func(stage string, bytesDone, bytesTotal int64)

stage is "llama_server" or "model"; the caller maps that to whichever
domain enum it cares about (cmd/runed routes to HealthResponse_Phase).
Reporter calls run inline on the download goroutine and share the
existing 2-second throttle inside makeProgress, so the status sink
isn't flooded at full chunk cadence.
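
One way to picture the throttling is a wrapper around the reporter. The source only says the reporter shares the existing 2-second throttle inside makeProgress; the `throttle` helper below, and its choice to always let stage changes and the final tick through, are assumptions made for illustration.

```go
package main

import "time"

type StatusReporter func(stage string, bytesDone, bytesTotal int64)

// throttle returns a reporter that forwards to r at most once per interval
// for a given stage. A stage change and the final tick (done >= total) are
// always forwarded so observers never miss a transition or the completion.
// It is not goroutine-safe, matching the source's note that reporter calls
// run inline on the download goroutine.
func throttle(r StatusReporter, interval time.Duration) StatusReporter {
	var last time.Time
	var lastStage string
	return func(stage string, done, total int64) {
		stageChanged := stage != lastStage
		finished := total > 0 && done >= total
		if !stageChanged && !finished && time.Since(last) < interval {
			return // too soon since the last forwarded tick
		}
		last, lastStage = time.Now(), stage
		r(stage, done, total)
	}
}
```

Wrapping a sink as `throttle(sink, 2*time.Second)` would drop most per-chunk callbacks while still delivering the first and last tick of each stage.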

Stage-transition ticks are emitted by the public entry points *before*
AcquireLock so a trailer waiting on the install lock still surfaces
the correct stage to clients during the lock-wait window. The internal
ensure* helpers no longer emit their own ticks; under the new
arrangement they would have produced duplicate transitions, and the
EnsureAll path explicitly issues the llama-server → model transition
between the two internal calls (still inside the same lock).
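
The ordering matters because a process blocked on the lock cannot report anything afterwards. A minimal sketch, assuming an in-process `sync.Mutex` in place of the real install lock behind AcquireLock and a hypothetical `ensureModel` entry point:

```go
package main

import "sync"

type StatusReporter func(stage string, bytesDone, bytesTotal int64)

// installLock stands in for the install lock taken by AcquireLock in the
// source; using a sync.Mutex here is an assumption for illustration.
var installLock sync.Mutex

// ensureModel emits the stage tick before blocking on the lock, so a
// process waiting behind another installer still reports the right stage.
func ensureModel(report StatusReporter, download func()) {
	if report != nil {
		report("model", 0, 0) // stage transition, no byte progress yet
	}
	installLock.Lock()
	defer installLock.Unlock()
	download()
}
```

Had the tick been emitted after `Lock`, a waiting daemon would keep advertising whatever stage it reported last for the whole lock-wait window.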

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The gRPC UDS used to open *after* self-bootstrap completed, so clients
connecting during the multi-minute install window only saw dial
failures. They now connect immediately and observe STATUS_LOADING with
the current Phase + bytes_done / bytes_total / message — exactly what
the proto envisioned when these fields were originally defined.

Flow rewrite:

    paths → daemon-check
         → server.New (backend unset)
         → ipc.Listen → grpc.Serve [bg]
         → SetBootstrapStatus(UNSPECIFIED, "fetching manifest")
         → selfBootstrap with reporter (Phase flips per stage tick)
         → SetBootstrapStatus(STARTING_LLAMA_SERVER)
         → backend.Start
         → srv.SetBackend(b, modelID)            ← Health flips to OK
         → idle ticker / signal wait / drain

reporter is a closure that maps each bootstrap stage to its proto
Phase + message via stagePhase() and forwards to
srv.SetBootstrapStatus. The proto omits PHASE_FETCHING_MANIFEST, so
the manifest-fetch interval reports PHASE_UNSPECIFIED with a "fetching
manifest" message — clients that surface message render correctly
without depending on enum recognition for that brief stage.
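
The reporter closure might look roughly like this. The `Phase` constants are local stand-ins for the generated proto enum, the message strings are invented, and `sink` stands in for srv.SetBootstrapStatus; only the stage names "llama_server" and "model" come from the source.

```go
package main

// Phase is a local stand-in for the generated HealthResponse phase enum.
type Phase int

const (
	PhaseUnspecified Phase = iota
	PhaseFetchingLlamaServer
	PhaseFetchingModel
	PhaseStartingLlamaServer
)

// stagePhase maps a bootstrap stage string to a phase and a message.
// Unknown stages fall back to PhaseUnspecified with the stage name as the
// message, so clients that render the message still show something useful.
func stagePhase(stage string) (Phase, string) {
	switch stage {
	case "llama_server":
		return PhaseFetchingLlamaServer, "fetching llama-server"
	case "model":
		return PhaseFetchingModel, "fetching model"
	default:
		return PhaseUnspecified, stage
	}
}

// newReporter builds the closure handed to bootstrap. It keeps the
// bootstrap package unaware of the status sink it ultimately feeds.
func newReporter(sink func(p Phase, done, total int64, msg string)) func(stage string, done, total int64) {
	return func(stage string, done, total int64) {
		p, msg := stagePhase(stage)
		sink(p, done, total, msg)
	}
}
```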

A new bailBoot(logger, srv, gs, b) helper centralises early-failure
cleanup: any boot-time error (selfBootstrap, sha256File, backend.Start,
parseIdleTimeout) drives the same TriggerShutdown + GracefulStop +
best-effort b.Stop sequence. Clients see one final STATUS_SHUTTING_DOWN
before the listener closes instead of an abrupt connection drop, which
matches the experience on the normal exit path.
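
The cleanup sequence can be sketched with function-valued dependencies. The source signature is bailBoot(logger, srv, gs, b); the `bootDeps` struct and the extra `cause` parameter below are assumptions that keep the example free of gRPC imports.

```go
package main

// bootDeps bundles the pieces bailBoot touches. Each field is a stand-in
// for a method on the logger, server, gRPC server, or backend.
type bootDeps struct {
	Logf            func(format string, args ...any)
	TriggerShutdown func()       // flips Health to SHUTTING_DOWN
	GracefulStop    func()       // drains in-flight RPCs, closes the listener
	StopBackend     func() error // nil when no backend was constructed
}

// bailBoot runs the same teardown for every boot-time failure:
// announce shutdown first, then stop serving, then reap the backend.
func bailBoot(d bootDeps, cause error) {
	d.Logf("boot failed: %v", cause)
	d.TriggerShutdown()
	d.GracefulStop()
	if d.StopBackend != nil {
		// Best effort: a failure here is logged, not propagated.
		if err := d.StopBackend(); err != nil {
			d.Logf("backend stop: %v", err)
		}
	}
}
```

Calling TriggerShutdown before GracefulStop is what gives a polling client the chance to read one last SHUTTING_DOWN status.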

main's exit-select also calls srv.TriggerShutdown() unconditionally so
OS-signal and serve-error exits flip Health to SHUTTING_DOWN too
(sync.Once makes a follow-up Shutdown RPC a no-op).
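
A minimal sketch of why the unconditional call is safe, assuming shutdown is signalled by closing a channel guarded by sync.Once (the field names and the `ShuttingDown` helper are illustrative):

```go
package main

import "sync"

type Server struct {
	shutdownOnce sync.Once
	shutdownCh   chan struct{}
}

func New() *Server {
	return &Server{shutdownCh: make(chan struct{})}
}

// TriggerShutdown closes shutdownCh exactly once. Later calls, whether
// from a Shutdown RPC, a signal handler, or the exit path, are no-ops
// rather than a double-close panic.
func (s *Server) TriggerShutdown() {
	s.shutdownOnce.Do(func() { close(s.shutdownCh) })
}

// ShuttingDown reports whether shutdown has been triggered, without blocking.
func (s *Server) ShuttingDown() bool {
	select {
	case <-s.shutdownCh:
		return true
	default:
		return false
	}
}
```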

backend.Start failure additionally calls b.Stop — b.Start may have
spawned a child that failed health-probe, leaving an orphan llama-
server holding ~470MB; b.Stop is idempotent on never-spawned backends.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

couragehong force-pushed the feat/bootstrap-status branch from a339f8d to f01512a on May 28, 2026 08:42

The three preceding commits accumulated a lot of explanatory prose
that restates what well-named identifiers already convey. This pass
prunes WHAT-style narration, "added for X" meta notes, and historical
context, keeping only the lines that capture a non-obvious WHY:
publish order in SetBackend, the FAILED_PRECONDITION-vs-Unavailable
rationale in Embed, the SHUTTING_DOWN-outranks-LOADING priority in
Health, the chars==tokens conservatism in maxTextLength, the
trailer-wait reason for emitting stage ticks before AcquireLock, etc.

No behaviour change; ~145 lines net removed across server / bootstrap
/ runed plus their tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Comment thread cmd/runed/main.go
esifea merged commit d04f8d3 into CryptoLabInc:main on May 31, 2026
7 checks passed

3 participants