Summary
/usr/lib/systemd/system/secure-ai-inference.service ships with Type=notify + WatchdogSec=30. The wrapped llama-server does not call sd_notify(WATCHDOG=1) on a heartbeat, and loading a 7B Q4_K_M model with --n-gpu-layers -1 takes ~30-45s on an RTX 3050 Ti (8GB VRAM). The systemd watchdog kills the process during every load, producing an infinite crash-loop (code=dumped status=6/ABRT result='watchdog').
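One way to confirm the mismatch before digging further (a sketch; the unit name is taken from this report, and WatchdogUSec is the property name systemd exposes for WatchdogSec):

```shell
# Sketch: confirm the unit really ships Type=notify with a 30s watchdog.
# Wrapped in a function only so it is easy to reuse; the unit name comes
# from this report.
inspect_watchdog() {
  systemctl show secure-ai-inference.service -p Type -p WatchdogUSec
}

# usage: inspect_watchdog
# should print something like:
#   Type=notify
#   WatchdogUSec=30s
```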
Reproduction
- Pick any model > 4GB via select-model.sh (e.g. Qwen2.5-Coder-7B-Instruct-Q4_K_M).
- Watch:
sudo journalctl -u secure-ai-inference.service -f
Loop pattern:
Started secure-ai-inference.service
load_tensors: offloaded 29/29 layers to GPU
... (30s elapses) ...
secure-ai-inference.service: Main process exited, code=dumped, status=6/ABRT
secure-ai-inference.service: Failed with result 'watchdog'.
secure-ai-inference.service: Scheduled restart job
Suggested fix
Two reasonable options; either one resolves the loop:
Option A (simplest) — disable the watchdog
[Service]
WatchdogSec=0
TimeoutStartSec=300
The wrapper script's existing process-exit handling is enough; you don't gain reliability from a watchdog when the wrapped process can't heartbeat.
Option B (correct, more work) — add a heartbeat from a sidecar
Make the wrapper poll http://127.0.0.1:${PORT}/health and call systemd-notify WATCHDOG=1 on success. Requires NotifyAccess=all so systemd accepts notifications from a non-main process (the unit already has Type=notify), or run a tiny notify-helper as the main pid.
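A minimal sketch of such a heartbeat helper, assuming a bash wrapper with curl available and the WATCHDOG_USEC variable systemd passes to the service; the /health endpoint and PORT come from this report, everything else (names, defaults) is illustrative:

```shell
#!/usr/bin/env bash
# Hypothetical heartbeat helper, not the project's actual wrapper.
# systemd exports WATCHDOG_USEC to the service; ping at half that
# interval so a single slow poll cannot miss the deadline.
PORT="${PORT:-8080}"
HEALTH_URL="http://127.0.0.1:${PORT}/health"
INTERVAL=$(( ${WATCHDOG_USEC:-30000000} / 2000000 ))

heartbeat_once() {
  # Notify systemd only when the server actually answers its health check,
  # so a hung llama-server still trips the watchdog as intended.
  if curl -fsS --max-time 5 "$HEALTH_URL" >/dev/null 2>&1; then
    systemd-notify WATCHDOG=1
    return 0
  fi
  return 1
}

# In the real wrapper this would run for the service's lifetime:
#   while sleep "$INTERVAL"; do heartbeat_once || true; done
```

Note the default interval works out to 15s for WatchdogSec=30, which still kills a 30-45s cold load unless the /health endpoint answers during model loading, so this option pairs naturally with a larger WatchdogSec.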
I recommend (A) for v0.3.x and (B) for a future hardening pass.
My local workaround
sudo mkdir -p /etc/systemd/system/secure-ai-inference.service.d
sudo tee /etc/systemd/system/secure-ai-inference.service.d/override.conf <<'EOF'
[Service]
WatchdogSec=0
TimeoutStartSec=300
EOF
sudo systemctl daemon-reload
sudo systemctl restart secure-ai-inference.service
After this, model loads in ~25s and /v1/chat/completions returns 200.
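For completeness, a hedged smoke test of the endpoint after the override; the port and request body are assumptions based on llama-server's OpenAI-compatible API, so adjust to whatever your wrapper actually binds:

```shell
# Hypothetical smoke test; PORT is an assumption, not read from the wrapper.
PORT="${PORT:-8080}"

smoke_test() {
  # -f makes curl fail on HTTP errors, so "OK" is printed only on a 2xx reply.
  curl -fsS --max-time 30 \
    -H 'Content-Type: application/json' \
    -d '{"messages":[{"role":"user","content":"ping"}]}' \
    "http://127.0.0.1:${PORT}/v1/chat/completions" >/dev/null \
    && echo OK
}
```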
🤖 Generated with claude-flow