
v0.3.0: secure-ai-inference.service WatchdogSec=30 kills llama-server during model load; infinite crash-loop on 7B Q4 #35

@Moonwolf711

Description


Summary

/usr/lib/systemd/system/secure-ai-inference.service ships with Type=notify + WatchdogSec=30. The wrapped llama-server does not call sd_notify(WATCHDOG=1) on a heartbeat, and loading a 7B Q4_K_M model with --n-gpu-layers -1 takes ~30-45s on an RTX 3050 Ti (8GB VRAM). The systemd watchdog kills the process during every load, producing an infinite crash-loop (code=dumped status=6/ABRT result='watchdog').
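For reference, the shipped combination boils down to this fragment (only Type= and WatchdogSec= are confirmed from the unit file; the rest of the unit is omitted):

```ini
[Service]
# systemd expects a WATCHDOG=1 datagram at least every 30s from the
# main process; llama-server never sends one, so any load > 30s is fatal.
Type=notify
WatchdogSec=30
```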

Reproduction

  1. Pick any model > 4GB via select-model.sh (e.g. Qwen2.5-Coder-7B-Instruct-Q4_K_M).
  2. Watch:
    sudo journalctl -u secure-ai-inference.service -f
    
    Loop pattern:
    Started secure-ai-inference.service
    load_tensors: offloaded 29/29 layers to GPU
    ... (30s elapses) ...
    secure-ai-inference.service: Main process exited, code=dumped, status=6/ABRT
    secure-ai-inference.service: Failed with result 'watchdog'.
    secure-ai-inference.service: Scheduled restart job
    

Suggested fix

Two reasonable options:

Option A (simplest) — disable the watchdog

    [Service]
    WatchdogSec=0
    TimeoutStartSec=300

The wrapper script's existing process-exit handling is enough; you don't gain reliability from a watchdog when the wrapped process can't heartbeat.

Option B (correct, more work) — add a heartbeat from a sidecar

Make the wrapper poll http://127.0.0.1:${PORT}/health and call systemd-notify WATCHDOG=1 on success. This needs NotifyAccess=all so a process other than the main PID may send notifications (or run a tiny notify-helper as the main PID).
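A minimal sketch of such a sidecar, assuming llama-server exposes GET /health returning 200 once the model is loaded (function names and the polling interval here are illustrative, not part of the project):

```python
import os
import socket
import time
import urllib.request


def sd_notify(message: str) -> bool:
    """Send one datagram to the systemd notify socket, if one is set."""
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False  # not running under systemd with a notify socket
    if addr.startswith("@"):
        addr = "\0" + addr[1:]  # abstract-namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), addr)
    return True


def heartbeat_loop(port: int, interval: float = 10.0) -> None:
    """Poll the health endpoint; ping the watchdog only while it answers 200."""
    url = f"http://127.0.0.1:{port}/health"
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    sd_notify("WATCHDOG=1")
        except OSError:
            pass  # server still loading or down; let the watchdog fire if it hangs
        time.sleep(interval)
```

The interval should be at most half of WatchdogSec (systemd exports the budget in WATCHDOG_USEC), and note this alone does not fix the load-time problem: the first heartbeat only goes out once /health answers, so TimeoutStartSec still has to cover the model load.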

I recommend (A) for v0.3.x and (B) for a future hardening pass.

My local workaround

    sudo mkdir -p /etc/systemd/system/secure-ai-inference.service.d
    sudo tee /etc/systemd/system/secure-ai-inference.service.d/override.conf <<'EOF'
    [Service]
    WatchdogSec=0
    TimeoutStartSec=300
    EOF
    sudo systemctl daemon-reload
    sudo systemctl restart secure-ai-inference.service

After this, model loads in ~25s and /v1/chat/completions returns 200.

🤖 Generated with claude-flow
