feat(backend): add llama.cpp (llama-server) backend wrapper by bilersan · Pull Request #97 · ActiveMemory/ctx

bilersan · 2026-05-23T14:35:44Z

Contributes to #92

Summary

Adds llamacpp as a named backend alongside vllm, openai, anthropic, ollama, and lmstudio. Follows the established vLLM pattern exactly:

- llamacpp struct embedding *openAICompat with cold-start retry on ECONNREFUSED
- Ping() delegates to coldStartRetry (reuses the vllm_internal.go helper, so no duplication)
- Default endpoint: http://localhost:8080 (llama-server default)
- No API key required by default (llama-server typically runs unauthenticated)

.ctxrc usage (once factory wiring + consumer commands land)

backends:
  - name: local
    type: llamacpp
    endpoint: http://localhost:8080
    timeout: 60s
default_backend: local

What this PR does NOT include

This PR adds the backend type and tests only. It intentionally does not include:

- Factory registration wiring: not yet implemented for any backend in this branch
- Consumer CLI commands (ctx ai, ctx compact --emit, ctx ingest): expected in a future phase

When the factory wiring and consumer commands land, llamacpp will be available as a backend type with zero additional work.

Why llama.cpp needs its own wrapper (vs generic openai-compatible)

llama-server behaves like vLLM during model loading: the TCP listener is not yet bound while weights load, so the OS returns ECONNREFUSED at the socket level (not HTTP 503). The coldStartRetry logic from vllm_internal.go handles exactly this case. A generic openai-compatible backend would fail immediately on connection refused instead of retrying.

Validation

Tested against a live llama-server running Qwen3-4B-Q4_K_M:

| Test | Result |
|---|---|
| `Ping` → `GET /v1/models` | HTTP 200, model listed |
| `Complete` → `POST /v1/chat/completions` | Model response received |
| Cold-start retry (unit) | Retries ECONNREFUSED, stops on non-dial errors |
| Non-dial error (unit) | HTTP 500 returns immediately, no retry |
| All existing backend tests | PASS (no regressions) |

Files changed

| File | Change |
|---|---|
| `internal/config/backend/backend.go` | +2 constants (`NameLlamaCpp`, `DefaultEndpointLlamaCpp`) |
| `internal/backend/types.go` | +1 struct (`llamacpp`) |
| `internal/backend/llamacpp.go` | Constructor + Ping override (~60 lines) |
| `internal/backend/llamacpp_test.go` | 3 unit tests (mock httptest server) |
| `internal/backend/llamacpp_e2e_test.go` | 2 e2e tests (build tag: `e2e`, requires live server) |

Add llamacpp as a named backend alongside vllm, openai, anthropic, ollama, and lmstudio. Follows the established vLLM pattern exactly:

- llamacpp struct embedding *openAICompat with cold-start retry
- Ping() delegates to coldStartRetry (reuses vllm_internal.go helper)
- Default endpoint: http://localhost:8080 (llama-server default)
- No API key required by default (llama-server runs unauthenticated)

.ctxrc usage (once factory wiring + ctx ai are implemented):

  backends:
    - name: local
      type: llamacpp
      endpoint: http://localhost:8080
      timeout: 60s
  default_backend: local

Note: this commit adds the backend type and tests only. Factory registration wiring and consumer CLI commands (ctx ai, ctx compact --emit, ctx ingest) are not yet implemented in the vllm-integration branch either; they are expected in a future phase once the backend abstraction layer stabilizes. When that wiring lands, llamacpp will be available as a backend type with zero additional work.

Validated against a live llama-server (Qwen3-4B-Q4_K_M):

- Ping: GET /v1/models returns 200
- Complete: POST /v1/chat/completions returns model response
- Cold-start retry: unit-tested via shared coldStartRetry helper

Files:

  internal/config/backend/backend.go      +2 constants
  internal/backend/types.go               +1 struct (llamacpp)
  internal/backend/llamacpp.go            constructor + Ping override
  internal/backend/llamacpp_test.go       3 unit tests (mock server)
  internal/backend/llamacpp_e2e_test.go   2 e2e tests (build tag: e2e)

bilersan requested a review from josealekhine as a code owner May 23, 2026 14:35

bilersan force-pushed the feat/llamacpp-backend branch from c2a127c to 5210fca May 23, 2026 15:56

bilersan wants to merge 1 commit into ActiveMemory:feat/vllm-integration from bilersan:feat/llamacpp-backend
