feat: report self-bootstrap status via Health RPC #6
Merged
Until self-bootstrap completes, Embed/EmbedBatch return
codes.FailedPrecondition rather than calling into a nil backend, and
Health reports STATUS_LOADING with the current Phase / bytes_done /
bytes_total / message that bootstrap can feed in.
Mechanics:
- backend reference moves to atomic.Pointer[backend.LlamaBackend];
modelIdentity becomes atomic.Value; bootstrapStatus is an
atomic.Pointer[bootstrapState] so {phase, bytes, message} publish
and observe as one tuple.
- New(version) constructs a Server with a nil backend. SetBackend(b,
modelID) wires it after bootstrap, writing maxTextLength /
modelIdentity / backend in that order so a reader seeing backend
necessarily sees the other two.
- SetBootstrapStatus(phase, bytesDone, bytesTotal, message) is the
loading-state sink consumed by the next Health call while the backend
is still nil.
Health priority is SHUTTING_DOWN (shutdownCh closed) → LOADING
(backend nil) → DEGRADED (IsHealthy false) → OK. SHUTTING_DOWN
outranks LOADING so a drain-in-progress daemon doesn't advertise
itself as "still loading" mid-drain.
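The Health priority order can be sketched as a single function. Status, llamaBackend, and the server fields below are hypothetical stand-ins for the proto enum and the real types; only the ordering of the checks mirrors the description.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Status is a hypothetical stand-in for the proto's STATUS_* enum.
type Status int

const (
	StatusOK Status = iota
	StatusLoading
	StatusDegraded
	StatusShuttingDown
)

func (s Status) String() string {
	return [...]string{"OK", "LOADING", "DEGRADED", "SHUTTING_DOWN"}[s]
}

type llamaBackend struct{ healthy bool }

func (b *llamaBackend) IsHealthy() bool { return b.healthy }

type server struct {
	shutdownCh chan struct{}
	backend    atomic.Pointer[llamaBackend]
}

// Health checks SHUTTING_DOWN, then LOADING, then DEGRADED, then OK.
func (s *server) Health() Status {
	select {
	case <-s.shutdownCh:
		// Checked first so a draining daemon never reports LOADING.
		return StatusShuttingDown
	default:
	}
	b := s.backend.Load()
	if b == nil {
		return StatusLoading
	}
	if !b.IsHealthy() {
		return StatusDegraded
	}
	return StatusOK
}

func main() {
	srv := &server{shutdownCh: make(chan struct{})}
	fmt.Println(srv.Health()) // LOADING
	srv.backend.Store(&llamaBackend{healthy: false})
	fmt.Println(srv.Health()) // DEGRADED
	srv.backend.Store(&llamaBackend{healthy: true})
	fmt.Println(srv.Health()) // OK
	close(srv.shutdownCh)
	fmt.Println(srv.Health()) // SHUTTING_DOWN
}
```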
Tests cover the LOADING / SetBootstrapStatus reflection /
FAILED_PRECONDITION before-SetBackend / SHUTTING_DOWN priority cases.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Threads an optional reporter through EnsureAll / EnsureLlamaServer /
EnsureModel so download-byte progress and per-stage transitions reach
callers without coupling the bootstrap package to a specific status
sink (e.g. server.SetBootstrapStatus).
Callback shape:
type StatusReporter func(stage string, bytesDone, bytesTotal int64)
stage is "llama_server" or "model"; the caller maps that to whichever
domain enum it cares about (cmd/runed routes to HealthResponse_Phase).
Reporter calls run inline on the download goroutine and share the
existing 2-second throttle inside makeProgress, so the status sink
isn't flooded at full chunk cadence.
Stage-transition ticks are emitted by the public entry points *before*
AcquireLock so a trailing caller waiting on the install lock still
surfaces the correct stage to clients during the lock-wait window. The internal
ensure* helpers no longer emit their own ticks; under the new
arrangement they would have produced duplicate transitions, and the
EnsureAll path explicitly issues the llama-server → model transition
between the two internal calls (still inside the same lock).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The gRPC UDS used to open *after* self-bootstrap completed, so clients
connecting during the multi-minute install window only saw dial
failures. They now connect immediately and observe STATUS_LOADING with
the current Phase + bytes_done / bytes_total / message — exactly what
the proto envisioned when these fields were originally defined.
Flow rewrite:
paths → daemon-check
→ server.New (backend unset)
→ ipc.Listen → grpc.Serve [bg]
→ SetBootstrapStatus(UNSPECIFIED, "fetching manifest")
→ selfBootstrap with reporter (Phase flips per stage tick)
→ SetBootstrapStatus(STARTING_LLAMA_SERVER)
→ backend.Start
→ srv.SetBackend(b, modelID) ← Health flips to OK
→ idle ticker / signal wait / drain
reporter is a closure that maps each bootstrap stage to its proto
Phase + message via stagePhase() and forwards to
srv.SetBootstrapStatus. The proto omits PHASE_FETCHING_MANIFEST, so
the manifest-fetch interval reports PHASE_UNSPECIFIED with a "fetching
manifest" message — clients that surface message render correctly
without depending on enum recognition for that brief stage.
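A sketch of the reporter closure and the stagePhase mapping. The Phase constants mirror the names used in this PR but their numeric values are placeholders, and the message strings and the statusSink interface are hypothetical. The manifest interval is reported by calling SetBootstrapStatus directly, as in the flow above, rather than through stagePhase.

```go
package main

import "fmt"

// Phase names mirror HealthResponse_Phase values mentioned in this PR;
// the numeric values are placeholders, not the proto's.
type Phase int

const (
	PhaseUnspecified Phase = iota
	PhaseFetchingLlamaServer
	PhaseFetchingModel
	PhaseStartingLlamaServer
)

// stagePhase maps a bootstrap stage to a Phase and a human-readable message.
// The message strings are hypothetical.
func stagePhase(stage string) (Phase, string) {
	switch stage {
	case "llama_server":
		return PhaseFetchingLlamaServer, "downloading llama-server"
	case "model":
		return PhaseFetchingModel, "downloading model"
	default:
		return PhaseUnspecified, stage
	}
}

// statusSink is a hypothetical interface matching srv.SetBootstrapStatus.
type statusSink interface {
	SetBootstrapStatus(phase Phase, bytesDone, bytesTotal int64, message string)
}

// newReporter returns the closure handed to the bootstrap package.
func newReporter(sink statusSink) func(stage string, bytesDone, bytesTotal int64) {
	return func(stage string, bytesDone, bytesTotal int64) {
		phase, msg := stagePhase(stage)
		sink.SetBootstrapStatus(phase, bytesDone, bytesTotal, msg)
	}
}

// recorder keeps the last status it was given, formatted for printing.
type recorder struct{ last string }

func (r *recorder) SetBootstrapStatus(p Phase, done, total int64, msg string) {
	r.last = fmt.Sprintf("%d %d/%d %s", p, done, total, msg)
}

func main() {
	rec := &recorder{}
	// The manifest fetch has no dedicated Phase, so it is set directly.
	rec.SetBootstrapStatus(PhaseUnspecified, 0, 0, "fetching manifest")
	fmt.Println(rec.last) // 0 0/0 fetching manifest
	report := newReporter(rec)
	report("model", 42, 100)
	fmt.Println(rec.last) // 2 42/100 downloading model
}
```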
A new bailBoot(logger, srv, gs, b) helper centralises early-failure
cleanup: any boot-time error (selfBootstrap, sha256File, backend.Start,
parseIdleTimeout) drives the same TriggerShutdown + GracefulStop +
best-effort b.Stop sequence. Clients see one final STATUS_SHUTTING_DOWN
before the listener closes instead of an abrupt connection drop, which
matches the experience on the normal exit path.
main's exit-select also calls srv.TriggerShutdown() unconditionally so
OS-signal and serve-error exits flip Health to SHUTTING_DOWN too
(sync.Once makes a follow-up Shutdown RPC a no-op).
backend.Start failure additionally calls b.Stop — b.Start may have
spawned a child that failed health-probe, leaving an orphan llama-
server holding ~470MB; b.Stop is idempotent on never-spawned backends.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
a339f8d to f01512a
The three preceding commits accumulated a lot of explanatory prose that restates what well-named identifiers already convey. This pass prunes WHAT-style narration, "added for X" meta notes, and historical context, keeping only the lines that capture a non-obvious WHY: publish order in SetBackend, the FAILED_PRECONDITION-vs-Unavailable rationale in Embed, the SHUTTING_DOWN-outranks-LOADING priority in Health, the chars==tokens conservatism in maxTextLength, the trailer-wait reason for emitting stage ticks before AcquireLock, etc. No behaviour change; ~145 lines net removed across server / bootstrap / runed plus their tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jh-lee-cryptolab approved these changes on May 29, 2026
- Clients connecting during bootstrap now see STATUS_LOADING with the current Phase (FETCHING_LLAMA_SERVER / FETCHING_MODEL / STARTING_LLAMA_SERVER) and download bytes_done / bytes_total. These proto fields were already defined; this PR is the matching implementation.
- Embed / EmbedBatch return codes.FailedPrecondition before the backend is wired, so retry policies don't burn budget against a non-ready daemon.
- Once SetBackend wires the backend, Health flips to STATUS_OK and embed requests are accepted.
- Boot-time failures and other exits report STATUS_SHUTTING_DOWN before the listener closes, so clients no longer see an abrupt connection drop with no signal.
- A backend.Start failure also reaps a possibly orphaned child via b.Stop.
- The client (rune-mcp) doesn't need to change to keep working. Surfacing Phase / bytes_done in a polling UI is a follow-up on the rune-mcp side; the runed surface is now ready for it.