Summary
/usr/lib/systemd/system/secure-ai-inference.service ships with Type=notify + WatchdogSec=30. The wrapped llama-server does not call sd_notify(WATCHDOG=1) on a heartbeat, and loading a 7B Q4_K_M model with --n-gpu-layers -1 takes ~30-45s on an RTX 3050 Ti (8GB VRAM). The systemd watchdog kills the process during every load, producing an infinite crash-loop (code=dumped status=6/ABRT result='watchdog').
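One way to confirm the mismatch before digging further (a sketch; the unit name is taken from this report, and WatchdogUSec is the property name systemd exposes for WatchdogSec):

```shell
# Sketch: confirm the unit really ships Type=notify with a 30s watchdog.
# Wrapped in a function only so it is easy to reuse; the unit name comes
# from this report.
inspect_watchdog() {
  systemctl show secure-ai-inference.service -p Type -p WatchdogUSec
}

# usage: inspect_watchdog
# should print something like:
#   Type=notify
#   WatchdogUSec=30s
```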
Reproduction
- Pick any model > 4GB via select-model.sh (e.g. Qwen2.5-Coder-7B-Instruct-Q4_K_M).
- Watch:
sudo journalctl -u secure-ai-inference.service -f
Loop pattern:
Started secure-ai-inference.service
load_tensors: offloaded 29/29 layers to GPU
... (30s elapses) ...
secure-ai-inference.service: Main process exited, code=dumped, status=6/ABRT
secure-ai-inference.service: Failed with result 'watchdog'.
secure-ai-inference.service: Scheduled restart job
Suggested fix
Two reasonable options; either one resolves the loop:
Option A (simplest) — disable the watchdog
[Service]
WatchdogSec=0
TimeoutStartSec=300
The wrapper script's existing process-exit handling is enough; you don't gain reliability from a watchdog when the wrapped process can't heartbeat.
Option B (correct, more work) — add a heartbeat from a sidecar
Make the wrapper poll http://127.0.0.1:${PORT}/health and call systemd-notify WATCHDOG=1 on success. Requires NotifyAccess=all so systemd accepts notifications from a non-main process (the unit already has Type=notify), or run a tiny notify-helper as the main pid.
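A minimal sketch of such a heartbeat helper, assuming a bash wrapper with curl available and the WATCHDOG_USEC variable systemd passes to the service; the /health endpoint and PORT come from this report, everything else (names, defaults) is illustrative:

```shell
#!/usr/bin/env bash
# Hypothetical heartbeat helper, not the project's actual wrapper.
# systemd exports WATCHDOG_USEC to the service; ping at half that
# interval so a single slow poll cannot miss the deadline.
PORT="${PORT:-8080}"
HEALTH_URL="http://127.0.0.1:${PORT}/health"
INTERVAL=$(( ${WATCHDOG_USEC:-30000000} / 2000000 ))

heartbeat_once() {
  # Notify systemd only when the server actually answers its health check,
  # so a hung llama-server still trips the watchdog as intended.
  if curl -fsS --max-time 5 "$HEALTH_URL" >/dev/null 2>&1; then
    systemd-notify WATCHDOG=1
    return 0
  fi
  return 1
}

# In the real wrapper this would run for the service's lifetime:
#   while sleep "$INTERVAL"; do heartbeat_once || true; done
```

Note the default interval works out to 15s for WatchdogSec=30, which still kills a 30-45s cold load unless the /health endpoint answers during model loading, so this option pairs naturally with a larger WatchdogSec.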
I recommend (A) for v0.3.x and (B) for a future hardening pass.
My local workaround
sudo mkdir -p /etc/systemd/system/secure-ai-inference.service.d
sudo tee /etc/systemd/system/secure-ai-inference.service.d/override.conf <<'EOF'
[Service]
WatchdogSec=0
TimeoutStartSec=300
EOF
sudo systemctl daemon-reload
sudo systemctl restart secure-ai-inference.service
After this, model loads in ~25s and /v1/chat/completions returns 200.
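For completeness, a hedged smoke test of the endpoint after the override; the port and request body are assumptions based on llama-server's OpenAI-compatible API, so adjust to whatever your wrapper actually binds:

```shell
# Hypothetical smoke test; PORT is an assumption, not read from the wrapper.
PORT="${PORT:-8080}"

smoke_test() {
  # -f makes curl fail on HTTP errors, so "OK" is printed only on a 2xx reply.
  curl -fsS --max-time 30 \
    -H 'Content-Type: application/json' \
    -d '{"messages":[{"role":"user","content":"ping"}]}' \
    "http://127.0.0.1:${PORT}/v1/chat/completions" >/dev/null \
    && echo OK
}
```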
🤖 Generated with claude-flow