fix(infra): persist Loki logs and Grafana state #408

Merged
dimakis merged 4 commits into main from fix/obs-volumes on Jun 27, 2026

dimakis commented on Jun 27, 2026

Summary

  • Added .loki-data:/loki volume mount so pushed logs survive container restarts
  • Added .grafana-data:/var/lib/grafana volume mount for dashboard/preference persistence
  • Added both data directories to .gitignore
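The two mounts in the summary might look roughly like this in docker-compose.yml. Only the container paths (/loki and /var/lib/grafana) and host directory names come from this PR; the surrounding service layout is assumed, the image names pair the grafana/loki and grafana/grafana repositories with the versions the review cites, and the host paths carry an explicit ./ prefix so Compose resolves them relative to the compose file.

```yaml
# Sketch only: service layout and image names are assumptions; other keys omitted.
services:
  loki:
    image: grafana/loki:3.4.2
    volumes:
      # Host directory .loki-data/ keeps Loki's on-disk data across container restarts
      - ./.loki-data:/loki
  grafana:
    image: grafana/grafana:12.4.1
    volumes:
      # Host directory .grafana-data/ keeps Grafana's database (dashboards, preferences)
      - ./.grafana-data:/var/lib/grafana
```

After `npm run observability:up`, both directories should appear on the host, which is why they are listed in .gitignore.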

Test plan

  • npm run observability:down && npm run observability:up
  • Verify Loki receives logs: query {app="mitzo"} in Grafana
  • Restart the containers and confirm logs and traces are still queryable
  • Confirm the .loki-data/ and .grafana-data/ directories are created on the host

🤖 Generated with Claude Code

dimakis and others added 2 commits June 27, 2026 13:52
Jaeger v2 (latest tag) silently ignores v1 env vars (SPAN_STORAGE_TYPE,
BADGER_*) and falls back to in-memory storage. All traces were lost on
container restart — .jaeger-data/ was empty despite BADGER_EPHEMERAL=false.

Replace env var config with a proper v2 OTel Collector YAML config that
explicitly configures Badger persistent storage. Verified traces survive
container restarts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
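A minimal sketch of the compose side of this change, assuming the config file is infra/jaeger-v2-config.yaml (the file the review comments on) mounted at the /etc/jaeger/config.yaml path passed to --config. The ./.jaeger-data:/badger bind mount is a hypothetical choice that lines up with the /badger/key and /badger/data directories in the Badger config; the image line and other service keys are omitted.

```yaml
# Sketch only: mount paths are assumptions, not copied from the PR.
services:
  jaeger:
    # v2 reads storage settings from this file; v1 env vars like SPAN_STORAGE_TYPE are ignored
    command: ['--config', '/etc/jaeger/config.yaml']
    volumes:
      # Config file mounted read-only at the path given to --config
      - ./infra/jaeger-v2-config.yaml:/etc/jaeger/config.yaml:ro
      # Host directory backing Badger's key and value directories
      - ./.jaeger-data:/badger
```

With the file in place, the old SPAN_STORAGE_TYPE and BADGER_* environment variables can be dropped, since per the commit message v2 never reads them.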
…arts

Loki had no data volume — pushed logs were lost on every container restart.
Grafana similarly lost any manually created dashboards or preferences.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dimakis left a comment

Centaur Review

Found 1 issue(s) (1 warning).

infra/jaeger-v2-config.yaml

Volume mounts and .gitignore additions are correct; the Jaeger v2 migration drops the 14-day trace TTL — add span_store_ttl: 336h to the Badger config to avoid unbounded storage growth.

  • 🟡 regressions (L21): The old v1 config set BADGER_SPAN_STORE_TTL: 336h (14-day retention). The new v2 config omits any TTL setting, so Badger will retain traces indefinitely and the .jaeger-data directory will grow without bound. Add span_store_ttl: 336h under the badger: key to preserve the 14-day retention from v1. [fixable]

directories:
  keys: /badger/key
  values: /badger/data
ephemeral: false
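The review asks for `span_store_ttl: 336h` under `badger:`. The Badger sample configs for recent Jaeger v2 releases appear to nest retention under `ttl:` / `spans:` instead, so the sketch below uses that form; treat the exact key name as an assumption and check it against the documentation for whichever 2.x tag is pinned. The backend name `main_store` is hypothetical, and the `extensions` / `jaeger_storage` / `backends` nesting is assumed to match infra/jaeger-v2-config.yaml.

```yaml
# Sketch only: backend name is hypothetical; verify the TTL key for the pinned release.
extensions:
  jaeger_storage:
    backends:
      main_store:
        badger:
          directories:
            keys: /badger/key
            values: /badger/data
          ephemeral: false   # keep data on disk rather than in memory
          ttl:
            spans: 336h      # 14 days, matching the old BADGER_SPAN_STORE_TTL
```

336h is 14 × 24 hours, so this restores the v1 retention window rather than introducing a new one.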

Centaur review caught that the v2 migration dropped the BADGER_SPAN_STORE_TTL
setting, leaving traces to grow without bound.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dimakis left a comment

Centaur Review

Found 2 issue(s) (1 warning).

docker-compose.yml

Clean infrastructure fix — Loki and Grafana volume mounts are correct, Jaeger v2 config properly replaces the silently-ignored v1 env vars, and .gitignore is updated. Only suggestion is to pin the Jaeger image version like the other services.

  • 🟡 regressions (L14): Using jaegertracing/jaeger:latest (line 9, unchanged) means the v2 config format could break if the image regresses or changes schema. Consider pinning to a specific version (e.g., jaegertracing/jaeger:2.x.y) like the other services (Loki 3.4.2, Grafana 12.4.1) to avoid silent breakage. [fixable]

infra/jaeger-v2-config.yaml

  • 🔵 unsafe_assumptions (L30): The OTLP HTTP receiver binds to 0.0.0.0:4318, which is correct for container use but worth noting — it only works because Docker's port mapping provides the access control. No issue here, just confirming the bind address is intentional for containerized deployment.

Comment thread docker-compose.yml
BADGER_DIRECTORY_VALUE: /badger/data
BADGER_DIRECTORY_KEY: /badger/key
BADGER_SPAN_STORE_TTL: 336h # 14 days retention
command: ['--config', '/etc/jaeger/config.yaml']

otlp:
  protocols:
    http:
      endpoint: 0.0.0.0:4318
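The review's point is that listening on 0.0.0.0:4318 inside the container is only as exposed as the host-side port mapping makes it. The PR does not show its `ports:` section, so the mapping below is an assumption: prefixing a host IP limits the published port to that interface, whereas a bare `'4318:4318'` publishes it on all host interfaces.

```yaml
# Sketch only: the PR's actual port mappings are not shown.
services:
  jaeger:
    ports:
      # OTLP/HTTP published on the host's loopback interface only
      - '127.0.0.1:4318:4318'
```

Keeping the in-container bind at 0.0.0.0 is still needed here, because Docker forwards published traffic to the container's network interface, not its loopback.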

Centaur review flagged that :latest can silently break the v2 config
format. Pin like Loki (3.4.2) and Grafana (12.4.1).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
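A sketch of the pin in docker-compose.yml. The commit does not say which 2.x release was chosen, so `2.x.y` below is the placeholder from the review, not a real tag.

```yaml
# Sketch only: 2.x.y is a placeholder for the tested Jaeger release.
services:
  jaeger:
    # Before: jaegertracing/jaeger:latest (floating; the v2 config schema can change under it)
    # After: an explicit 2.x tag, matching how Loki (3.4.2) and Grafana (12.4.1) are pinned
    image: jaegertracing/jaeger:2.x.y
```

Pinning by digest (`jaegertracing/jaeger@sha256:<digest>`) is stricter still, since a registry tag can be re-pointed; a version tag is the lighter option and is consistent with the other services.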

dimakis left a comment

Centaur Review

LGTM — no issues found.

dimakis merged commit 67cf495 into main on Jun 27, 2026
1 check passed