feat(site-agent): log trace id on gRPC client responses#3005
Conversation
The gRPC client metrics callback logged only method, code, and duration, so an operator reading e.g. a DeadlineExceeded line could not tell which workflow issued the call. The Flow and Core gRPC clients are already otelgrpc-instrumented, so every RPC carries a span; thread the call context into Metrics.RecordRpcResponse and tag the log with the span's trace id when present, letting operators thread together all logs for a single workflow. - Add ctx to client.Metrics.RecordRpcResponse and pass it from the unary and stream interceptors. - Add client.TraceIDFromContext, a panic-safe helper returning the hex-encoded trace id (or "" when absent), covered by a unit test. - Log trace_id via the existing zerolog style in the coregrpc and flowgrpc implementations; the existing message text is unchanged. Fixes NVIDIA#2650 Signed-off-by: Omar Refai <omar@refai.org>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (4)
Summary by CodeRabbit
WalkthroughThe ChangesTrace ID propagation through gRPC client metrics
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
/ok to test fb15808 |
🔍 Container Scan Summary
Per-CVE detail lives in the per-service |
🔐 TruffleHog Secret Scan✅ No secrets or credentials found! Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉 🕐 Last updated: 2026-06-30 20:04:36 UTC | Commit: fb15808 |
Thread the per-RPC OpenTelemetry trace id into the site-agent gRPC client response logs so an operator can correlate a log line back to the workflow that issued the call.
Today
RecordRpcResponselogs onlymethod,code, andduration:There is no way to tell which workflow triggered the RPC, which makes reasoning about e.g. a
DeadlineExceededimpossible. The Flow and Core gRPC clients are alreadyotelgrpc-instrumented, so every call carries a span; this passes the callcontext.Contextinto theMetrics.RecordRpcResponsecallback and tags the log with the span'strace_idwhen present.Related issues
Fixes #2650
Type of Change
Breaking Changes
Metrics.RecordRpcResponsegains a leadingctx context.Contextparameter. The interface has only two in-tree implementations (coregrpc,flowgrpc) and two call sites (the unary and stream interceptors), all updated here — no other implementers exist in the repo.Testing
Added
TestTraceIDFromContext(table-driven, testify) covering: no span context →"", valid span → hex-encoded trace id, all-zero trace id →"".go build,go vet,gofmt, andgo testare green for the changed packages.Additional Notes
client.TraceIDFromContexthelper is panic-safe: it returns""when the context carries no valid span, so the field is omitted rather than logged empty, and it does not depend on the call originating inside a Temporal activity.trace_idfield is added, matching the existing zerolog.Str(...)style elsewhere in site-agent.RecordRpcResponsepath; the unrelated, currently-unusedRecordLatency/WorkflowMetricscallback is left untouched to keep the change single-purpose.