Skip to content

BE-517: Stop double-stringifying log payloads#8669

Open
TimDiekmann wants to merge 5 commits intomainfrom
t/be-517-stop-double-stringifying-log-payloads
Open

BE-517: Stop double-stringifying log payloads#8669
TimDiekmann wants to merge 5 commits intomainfrom
t/be-517-stop-double-stringifying-log-payloads

Conversation

@TimDiekmann
Copy link
Copy Markdown
Member

@TimDiekmann TimDiekmann commented Apr 28, 2026

🌟 What is the purpose of this PR?

logger.X(JSON.stringify(obj)) round-trips through Winston's JSON formatter. The OTLP log record ends up with the entire payload as an escaped string in body and an empty attributes map — fields cannot be searched in Loki without re-parsing the body. Tim already worked around this on the Grafana ingestion side; this PR removes the workaround at the source.

🔗 Related links

🔍 What does this change?

  • Replace the antipattern at 6 call sites with logger.X("description", payload). Identifier-rich descriptions:
    • ${req.method} ${req.path} for Express requests (e.g. GET /health-check)
    • ${operationType} ${operationName} for GraphQL ops (e.g. query GetMe, mutation CreateEntity failed)
    • llm ${provider} ${taskName} for LLM logs (e.g. llm openai extract-entities)
  • Move error objects, large stringified payloads, and IDs out of the message body and into structured attributes across ~14 other call sites
  • Rename catch (err)catch (error) so { error } shorthand works
  • Skip /graphql in the Express request log — Apollo's plugin logs operation-level detail which is more useful than the generic POST /graphql. requestId is still generated and set on the response header for downstream correlation
  • Apollo plugin now uses ctx.contextValue.logger (the request-scoped child logger that already carries requestId) so each operation log correlates with its request
  • activity-logger.ts: signature extended to accept an optional meta object. Removed the legacy JSON.parse(message) workaround that existed solely to recover from the antipattern this PR removes

Apollo Server's ctx.logger is a 1-arg Logger interface, so ctx.contextValue.logger (typed as our Winston Logger) is used in the plugin instead.

Pre-Merge Checklist 🚀

🚢 Has this modified a publishable library?

  • does not modify any publishable blocks or libraries, or modifications do not need publishing

📜 Does this require a change to the docs?

  • are internal and do not require a docs change

🕸️ Does this require a change to the Turbo Graph?

  • do not affect the execution graph

⚠️ Known issues

  • The workers (hash-ai-worker-ts, hash-integration-worker) still have no OTEL setup, so their structured log calls are invisible in Loki until BE-519 lands. This PR upgrades the call-site quality regardless — once OTEL is wired up, the value is unlocked without further code changes.
  • ~20 lower-value call sites still embed identifiers in the message body (e.g. Listening on port ${port}, Iteration ${count} begun). Tracked in BE-518 to keep this PR focused.

🐾 Next steps

  • BE-518: cleanup of remaining lower-value call sites + an eslint rule to prevent logger.X(JSON.stringify(...)) regression and a no-console rule for production code (catches console.info in hash-integration-worker/src/main.ts)
  • BE-519: wire up OTEL for both Temporal workers so log payloads actually reach Loki

🛡 What tests cover this?

No new tests — change is to logging shape only, no behavioural change. Verified locally with a Winston harness that the new logger.X("desc", obj) pattern produces structured OTLP records (body=description, attributes=payload).

❓ How to test this?

  1. Check out the branch
  2. yarn dev:backend and hit a few GraphQL queries / make a recovery / send a test email
  3. With a local OTEL collector pointed at the API, confirm log records have body set to a short description and attributes containing the structured payload (operation, elapsed, etc.)
  4. Without OTEL, just confirm the dev console output is readable (e.g. info: query GetMe {operation:"GetMe",elapsed:"5ms",...})

📹 Demo

image image
info: query queryEntities {"clientInfo":{"clientName":"web-app"},"elapsed":"111.39ms","environment":"development","hostname":"localhost","operation":"queryEntities","requestId":"VsXJXwkzBIQT83","service":"api","timestamp":"2026-04-28T16:25:14.425Z","userAgent":"Mozilla/5.0"}
info: query queryEntities {"clientInfo":{"clientName":"web-app"},"elapsed":"59.06ms","environment":"development","hostname":"localhost","operation":"queryEntities","requestId":"cJx2L6jO1GjYCM","service":"api","timestamp":"2026-04-28T16:25:14.570Z","userAgent":"Mozilla/5.0"}
info: query me {"clientInfo":{"clientName":"web-app"},"elapsed":"36.75ms","environment":"development","hostname":"localhost","operation":"me","requestId":"f58b1f8S3CKXy3","service":"api","timestamp":"2026-04-28T16:25:14.675Z","userAgent":"node"}
info: GET /auth/sessions/whoami {"environment":"development","hostname":"localhost","ip":"127.0.0.1","requestId":"eyDq57D5wUohFe","service":"api","timestamp":"2026-04-28T16:25:14.679Z","userAgent":"axios/1.15.1"}
info: [HPM] GET /auth/sessions/whoami -> http://127.0.0.1:4433/sessions/whoami [200] {"environment":"development","hostname":"localhost","service":"api","timestamp":"2026-04-28T16:25:14.684Z"}
info: GET /auth/sessions/whoami {"environment":"development","hostname":"localhost","ip":"127.0.0.1","origin":"http://localhost:3000","requestId":"775bZgpOLOlxcu","service":"api","timestamp":"2026-04-28T16:25:14.691Z","userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/26.3.1 Safari/605.1.15"}
info: [HPM] GET /auth/sessions/whoami -> http://127.0.0.1:4433/sessions/whoami [200] {"environment":"development","hostname":"localhost","service":"api","timestamp":"2026-04-28T16:25:14.698Z"}
info: query queryEntities {"clientInfo":{"clientName":"web-app"},"elapsed":"32.62ms","environment":"development","hostname":"localhost","operation":"queryEntities","requestId":"o54WGq7IDUrxJx","service":"api","timestamp":"2026-04-28T16:25:14.852Z","userAgent":"Mozilla/5.0"}
(node:82144) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 error listeners added to [Socket]. MaxListeners is 10. Use emitter.setMaxListeners() to increase limit
info: query queryEntities {"clientInfo":{"clientName":"web-app"},"elapsed":"82.29ms","environment":"development","hostname":"localhost","operation":"queryEntities","requestId":"MUiox8YBpvJzhf","service":"api","timestamp":"2026-04-28T16:25:14.899Z","userAgent":"Mozilla/5.0"}
info: query queryEntities {"clientInfo":{"clientName":"web-app"},"elapsed":"91.46ms","environment":"development","hostname":"localhost","operation":"queryEntities","requestId":"JxBVAwNbQEWTn2","service":"api","timestamp":"2026-04-28T16:25:14.915Z","userAgent":"Mozilla/5.0"}
info: query queryEntitySubgraph {"clientInfo":{"clientName":"web-app"},"elapsed":"137.35ms","environment":"development","hostname":"localhost","operation":"queryEntitySubgraph","requestId":"6NaF7V8YJYGJZp","service":"api","timestamp":"2026-04-28T16:25:15.068Z","userAgent":"Mozilla/5.0"}
info: GET /auth/sessions/whoami {"environment":"development","hostname":"localhost","ip":"127.0.0.1","origin":"http://localhost:3000","requestId":"dC7jpb5KXzyXjQ","service":"api","timestamp":"2026-04-28T16:25:15.116Z","userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/26.3.1 Safari/605.1.15"}
info: [HPM] GET /auth/sessions/whoami -> http://127.0.0.1:4433/sessions/whoami [200] {"environment":"development","hostname":"localhost","service":"api","timestamp":"2026-04-28T16:25:15.122Z"}
info: query countEntities {"clientInfo":{"clientName":"web-app"},"elapsed":"21.65ms","environment":"development","hostname":"localhost","operation":"countEntities","requestId":"UrJPY0T3sG8vVV","service":"api","timestamp":"2026-04-28T16:25:21.336Z","userAgent":"Mozilla/5.0"}
info: query countEntities {"clientInfo":{"clientName":"web-app"},"elapsed":"28.31ms","environment":"development","hostname":"localhost","operation":"countEntities","requestId":"Bmnm90N8My5C0A","service":"api","timestamp":"2026-04-28T16:25:21.340Z","userAgent":"Mozilla/5.0"}
info: query getMyPendingInvitations {"clientInfo":{"clientName":"web-app"},"elapsed":"73.88ms","environment":"development","hostname":"localhost","operation":"getMyPendingInvitations","requestId":"9jVHrBNBZRn3J5","service":"api","timestamp":"2026-04-28T16:25:21.391Z","userAgent":"Mozilla/5.0"}

`logger.X(JSON.stringify(obj))` round-trips through Winston's JSON
formatter, leaving an escaped string in the OTLP log body and empty
attributes — fields cannot be searched in Loki without re-parsing the
body string.

Replace the antipattern with `logger.X("description", payload)`:

- body: a short identifier-rich description (e.g. `query GetMe`,
  `mutation CreateEntity failed`, `GET /health-check`,
  `llm openai extract-entities`)
- attributes: the structured payload, directly searchable

Also clean up the same shape elsewhere — error objects, large
stringified payloads, and IDs embedded in template strings — moving
them out of the body and into attributes. Catch parameters renamed
`err` → `error` so the shorthand `{ error }` works.

Apollo Server's `ctx.logger` is a 1-arg Logger interface, so the
graphql plugin now uses our Winston logger directly via closure
instead of going through `ctx.logger`.

The `activity-logger.ts` JSON.parse workaround that existed solely to
recover from the antipattern is no longer needed; signature extended
to accept an optional meta object.
…gh request-scoped logger

The Express request middleware was logging every request including
POST /graphql, which the Apollo plugin already logs at the
operation level (query/mutation name) — duplicate and less informative.

Skip /graphql in the middleware. requestId still gets generated and
set on the response header for downstream correlation.

In the Apollo plugin, switch from the closure-captured logger to
ctx.contextValue.logger, which is the per-request child logger that
already carries requestId in its metadata. Each operation log now
correlates with the request that produced it.
@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 28, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

3 Skipped Deployments
Project Deployment Actions Updated (UTC)
hash Ignored Ignored Preview Apr 28, 2026 6:16pm
hashdotdesign-tokens Ignored Ignored Preview Apr 28, 2026 6:16pm
petrinaut Skipped Skipped Apr 28, 2026 6:16pm

@cursor
Copy link
Copy Markdown

cursor Bot commented Apr 28, 2026

PR Summary

Low Risk
Primarily reshapes logging output (message vs structured metadata) across API and workers; functional behavior should be unchanged, but log ingestion/searching and any tooling that relied on previous log text may be affected.

Overview
Stops the logger.X(JSON.stringify(...)) pattern and switches many log sites to structured logging (logger.X("msg", { ...meta })), including LLM request/server-error logs, Temporal worker activity logs, email transporters, and various AI flow activities.

Updates the Temporal activity-logger to accept optional metadata, write filtered flow-level log files using safeStringify, and removes the previous JSON-parse workaround.

Enhances the shared backend logger with safeStringify/expandLogValue (better Error/BigInt/circular handling) and a Winston format to expand nested values and strip detailedFields from console output; API request logging now skips /graphql in favor of operation-level logging in the Apollo plugin using the request-scoped logger for requestId correlation.

Reviewed by Cursor Bugbot for commit 1e50d75. Bugbot is set up for automated code reviews on this repo. Configure here.

@github-actions github-actions Bot added area/apps > hash* Affects HASH (a `hash-*` app) area/apps > hash-api Affects the HASH API (app) type/eng > backend Owned by the @backend team area/apps labels Apr 28, 2026
@augmentcode
Copy link
Copy Markdown

augmentcode Bot commented Apr 28, 2026

🤖 Augment PR Summary

Summary: This PR improves backend and worker logging by removing the anti-pattern of stringifying payload objects before passing them to Winston, so OTLP logs carry structured attributes instead of an escaped JSON blob in the log body.

Changes:

  • Replaced several logger.X(JSON.stringify(...)) call sites with logger.X("desc", { ... }) structured meta objects.
  • Updated GraphQL/Apollo request logging to emit operation-level messages (e.g. query GetMe) using the request-scoped child logger so requestId correlates across logs.
  • Adjusted Express request logging to skip /graphql to avoid duplicating the more-informative operation logs.
  • Extended the Temporal flow activity-logger to accept optional meta objects and removed the legacy JSON parse workaround.
  • Moved error/details (IDs, large payloads, HTML bodies) out of interpolated strings and into structured attributes across API, AI worker, and integration worker code.

Technical Notes: The new pattern keeps log bodies short and human-readable while preserving queryable fields in OTLP/Winston attributes (important for Loki search and correlation).

🤖 Was this summary useful? React with 👍 or 👎

Copy link
Copy Markdown

@augmentcode augmentcode Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 1 suggestion posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.

Comment thread apps/hash-ai-worker-ts/src/activities/shared/activity-logger.ts Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 28, 2026

Codecov Report

❌ Patch coverage is 0% with 43 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.49%. Comparing base (3d9d985) to head (1e50d75).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
...worker-ts/src/activities/shared/activity-logger.ts 0.00% 12 Missing ⚠️
apps/hash-api/src/graphql/create-apollo-server.ts 0.00% 7 Missing ⚠️
...ivities/shared/get-llm-response/log-llm-request.ts 0.00% 5 Missing ⚠️
...activities/research-entities-action/checkpoints.ts 0.00% 3 Missing ⚠️
...tivities/flow-activities/answer-question-action.ts 0.00% 1 Missing ⚠️
...ies/research-entities-action/coordinating-agent.ts 0.00% 1 Missing ⚠️
...n/coordinating-agent/process-complete-tool-call.ts 0.00% 1 Missing ⚠️
...es/research-entities-action/link-follower-agent.ts 0.00% 1 Missing ⚠️
...ollower-agent/get-link-follower-next-tool-calls.ts 0.00% 1 Missing ⚠️
...rch-entities-action/shared/deduplicate-entities.ts 0.00% 1 Missing ⚠️
... and 10 more
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8669   +/-   ##
=======================================
  Coverage   62.49%   62.49%           
=======================================
  Files        1318     1318           
  Lines      134235   134232    -3     
  Branches     5521     5524    +3     
=======================================
  Hits        83894    83894           
+ Misses      49426    49423    -3     
  Partials      915      915           
Flag Coverage Δ
apps.hash-ai-worker-ts 1.41% <0.00%> (+<0.01%) ⬆️
apps.hash-api 0.00% <0.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…refs

Naive JSON.stringify on log payloads loses information (Error own
properties are non-enumerable so an Error serialises to `{}`) or
throws (BigInts, circular references). Both shapes appeared after the
move to `logger.X("desc", meta)`, e.g. `{ error }` from `catch (error)`.

Add `expandLogValue` and `safeStringify` to `hash-backend-utils/logger`:

- `expandLogValue` recursively replaces Errors with `{ name, message,
  stack, cause? }` and BigInts with their string form, then descends
  into arrays and plain objects.
- `safeStringify` runs `JSON.stringify` over the expanded value,
  catching anything that still throws (e.g. circular refs) and
  returning a marker rather than crashing the call site.

Wire the helpers into:

- The Winston logger pipeline via a new `expandLogValuesFormat()` that
  runs before `format.json()` on the logger-level format chain. This
  fixes nested Errors in the OTLP/Datadog/console outputs without
  touching individual call sites.
- `OpenTelemetryLogTransport.toAnyValue` non-primitive path.
- `rewriteForConsole`'s console-prefix path.
- The flow-run log file in `activity-logger.ts`.

Drop newly-unused `stringify` imports in three files; tighten two
`.catch((err) => ...)` parameters to `(error: unknown)` to satisfy
the no-unsafe-assignment rule on the `{ error }` shorthand.
@vercel vercel Bot temporarily deployed to Preview – petrinaut April 28, 2026 16:39 Inactive
@github-actions github-actions Bot added the area/libs Relates to first-party libraries/crates/packages (area) label Apr 28, 2026
Comment thread libs/@local/hash-backend-utils/src/logger.ts Outdated
Comment thread libs/@local/hash-backend-utils/src/logger.ts
Comment thread libs/@local/hash-backend-utils/src/logger.ts Outdated
@graphite-app graphite-app Bot requested a review from a team April 28, 2026 17:06
Three valid findings on the previous commit:

1. `rewriteForConsole` only stripped `detailedFields` from object-shaped
   `info.message`. Activity-logger now passes the message as a string and
   spreads the structured payload onto the info object directly, so the
   detailed payload (request/response/transformedRequest blobs) was
   ending up in the dev console. Filter `detailedFields` from the
   top-level info metadata as well, and treat `detailedFields` itself
   as a control field that shouldn't appear in console output.

2. `expandLogValue` rebuilt every object via `Object.entries`, which
   silently dropped values relying on `toJSON` — most importantly Date,
   but also Buffer and decimal/ID types. Pass through anything with a
   `toJSON` method so JSON.stringify can call it.

3. `expandLogValue` had no cycle guard, so a circular payload would
   cause infinite recursion. `safeStringify` caught that, but
   `expandLogValuesFormat` (used as a Winston format) did not, so a
   cyclic value reaching the logger pipeline would throw out of it.
   Track visited objects in a WeakSet (replace with `[Circular]`) and
   wrap the format invocation in try/catch as a backstop.
@vercel vercel Bot temporarily deployed to Preview – petrinaut April 28, 2026 18:07 Inactive
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 7373fdd. Configure here.

Comment thread libs/@local/hash-backend-utils/src/logger.ts
Round of review-agent / bugbot follow-ups on the safe-stringify work:

- expandLogValue: WeakSet was added to but never popped, so any object
  reused across sibling positions (not a true cycle) was being labelled
  `[Circular]` in console / file / OTLP output. Switched to a path-scoped
  Set with try/finally cleanup so genuine self-references are still
  caught while shared references serialise normally.
- expandLogValue: Object.entries only enumerates string keys, so
  rebuilding the info object stripped Winston's symbol-keyed metadata
  (Symbol.for("level"), "message", "splat"). Carry symbol properties
  through unchanged.
- JSDoc tightened: explain why Errors serialise to `{}` accurately
  (no toJSON, non-enumerable own/inherited members), and stop claiming
  unconditional circular-ref safety in safeStringify — only plain-object
  cycles get the `[Circular]` rewrite, anything else falls through to
  the try/catch marker.
- Drop a stale comment in activity-logger.ts (referenced consolePrefix
  in a code path that no longer uses one) and a redundant inline comment
  in expandLogValuesFormat. Trim the `logger.child` implementation
  detail from the Apollo plugin comment.
- Rename `result` → `cleanedInfo` in rewriteForConsole.
@vercel vercel Bot temporarily deployed to Preview – petrinaut April 28, 2026 18:16 Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/apps > hash* Affects HASH (a `hash-*` app) area/apps > hash-api Affects the HASH API (app) area/apps area/libs Relates to first-party libraries/crates/packages (area) type/eng > backend Owned by the @backend team

Development

Successfully merging this pull request may close these issues.

1 participant