
fix: use block_number/block_offset to uniquely identify log rows#2071

Open
karl-power wants to merge 1 commit into main from karl/id-field-in-log-filter

Conversation


@karl-power karl-power commented Apr 8, 2026

Summary

Fixes an issue where clicking a log row to expand it could show the wrong row's details when multiple rows share identical visible column values (Timestamp, ServiceName, Body, etc.).

Root cause: The detail panel query reconstructs a WHERE clause from only the visible columns, without the original search filters. When rows collide on those columns, LIMIT 1 returns whichever row ClickHouse finds first — not necessarily the one clicked.

Fix: Detect and inject ClickHouse's built-in _block_number and _block_offset virtual columns as hidden tiebreakers in the row SELECT. These are available when the table engine was created with enable_block_number_column = 1 and enable_block_offset_column = 1. Since the row WHERE clause builder already iterates all columns in the row data, the block columns are automatically included in detail queries — disambiguating rows without surfacing them in the UI.

A fallback to __hdx_id (a custom materialized column from the prior approach) is retained for any tables that already have that column.

No schema migration required for the primary fix — the approach relies entirely on engine-level settings that already exist on the otel_logs table.
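To make the mechanism concrete, the detection-and-fallback logic above might look roughly like this (a sketch with hypothetical helper names; the real code lives in DBRowTable.tsx and may differ):

```typescript
// Sketch of the tiebreaker selection described above. `engineFull` is the
// `engine_full` string from ClickHouse's system.tables; `columnNames` are the
// table's column names.
const BLOCK_COLUMNS = ['_block_number', '_block_offset'];

function getTiebreakerKeys(engineFull: string, columnNames: string[]): string[] {
  // The virtual block columns are only queryable when both engine settings are on.
  const hasBlockColumns =
    engineFull.includes('enable_block_number_column = 1') &&
    engineFull.includes('enable_block_offset_column = 1');
  if (hasBlockColumns) return BLOCK_COLUMNS;
  // Fallback for tables that kept the materialized column from the old approach.
  if (columnNames.includes('__hdx_id')) return ['__hdx_id'];
  return [];
}
```

Because the row WHERE clause builder iterates every column in the row data, whatever keys this returns end up in the detail query's WHERE clause automatically.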

Key changes

  • DBRowTable.tsx: Rename appendSelectWithPrimaryAndPartitionKey to appendSelectWithAdditionalKeys, adding an extraKeys: string[] param. Rename useConfigWithPrimaryAndPartitionKey to useConfigWithAdditionalSelect; when a sourceId is present (row-level query), inspect engine_full metadata for the block column flags and inject _block_number/_block_offset as extra keys, falling back to __hdx_id if present.
  • types.ts / source.ts / SourceForm.tsx: Remove uniqueRowIdExpression — the tiebreaker is now auto-detected from the table engine, not manually configured.
  • Schema seed: Remove the __hdx_id materialized column addition from otel_logs; the block column engine flags are sufficient.
  • Tests: New appendSelectWithAdditionalKeys cases covering extraKeys append, deduplication against primary/partition keys, and array-style selects. Scaffold for inferTableSourceConfig tests.
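A minimal sketch of the append-and-dedup behavior those tests describe (the real appendSelectWithAdditionalKeys signature in DBRowTable.tsx may differ):

```typescript
// Sketch: append extra keys to a select expression, skipping any key that is
// already selected (e.g. a primary or partition key). Accepts either the
// comma-separated string form or the array form of a select.
function appendSelectWithAdditionalKeys(
  select: string | string[],
  extraKeys: string[],
): string {
  const existing = (Array.isArray(select) ? select : select.split(','))
    .map(s => s.trim())
    .filter(Boolean);
  // Deduplicate so a key never appears twice in the SELECT list.
  const toAppend = extraKeys.filter(k => !existing.includes(k));
  return [...existing, ...toAppend].join(', ');
}
```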

How to test locally

Reproduce the bug on main

  1. Pick a log row from otel_logs whose ResourceAttributes.host.arch is arm64. In the ClickHouse UI, run the query that fires when you click a row to open its sidebar.
  2. Insert the same row into otel_logs but change one nested value — e.g. change ResourceAttributes.host.arch from arm64 to arm65.
  3. On the search page, filter for ResourceAttributes.host.arch = arm65. One result appears. Click it to open the sidebar.
  4. The sidebar shows arm64 — it selected the wrong row.

Verify the fix on this branch

  1. Confirm otel_logs was created with enable_block_number_column = 1 and enable_block_offset_column = 1 (check via SHOW CREATE TABLE default.otel_logs).
  2. Repeat steps 1–3 above.
  3. The sidebar now shows arm65 — correct row selected.

Verify fallback path

  1. On a table that has __hdx_id but no block columns, the app still injects __hdx_id as the tiebreaker (same behavior as the previous approach).

References


changeset-bot Bot commented Apr 8, 2026

🦋 Changeset detected

Latest commit: 0f75ae4

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name Type
@hyperdx/api Patch
@hyperdx/app Patch
@hyperdx/otel-collector Patch



vercel Bot commented Apr 8, 2026

The latest updates on your projects.

Project      Status  Updated (UTC)
hyperdx-oss  Ready   May 1, 2026 1:44am



github-actions Bot commented Apr 8, 2026

E2E Test Results

All tests passed • 159 passed • 3 skipped • 1189s

Status Count
✅ Passed 159
❌ Failed 0
⚠️ Flaky 4
⏭️ Skipped 3

Tests ran across 4 shards in parallel.


@karl-power karl-power force-pushed the karl/id-field-in-log-filter branch from a7ee44f to 3151b4d Compare April 9, 2026 09:00
@karl-power karl-power marked this pull request as ready for review April 9, 2026 13:53
@github-actions github-actions Bot added the review/tier-4 Critical — deep review + domain expert sign-off label Apr 9, 2026

github-actions Bot commented Apr 9, 2026

🔴 Tier 4 — Critical

Touches auth, data models, config, tasks, OTel pipeline, ClickHouse, or CI/CD.

Why this tier:

  • Critical-path files (1):
    • docker/otel-collector/schema/seed/00002_otel_logs.sql
  • Cross-layer change: touches frontend (packages/app) + backend (packages/api) + shared utils (packages/common-utils)

Review process: Deep review from a domain expert. Synchronous walkthrough may be required.
SLA: Schedule synchronous review within 2 business days.

Stats
  • Production files changed: 6
  • Production lines changed: 81 (+ 73 in test files, excluded from tier calculation)
  • Branch: karl/id-field-in-log-filter
  • Author: karl-power

To override this classification, remove the review/tier-4 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

@karl-power karl-power requested a review from wrn14897 April 9, 2026 13:54

github-actions Bot commented Apr 9, 2026

PR Review

PR #2071 — fix: use block_number/block_offset to uniquely identify log rows

The approach is sound: injecting ClickHouse's virtual block columns as hidden tiebreakers cleanly solves the duplicate-row collision bug without any schema migration for new installs.

  • ⚠️ Fragile engine string matching → engineFull.includes('enable_block_number_column = 1') will silently fall back to __hdx_id (or no tiebreaker) if ClickHouse changes whitespace/formatting in engine_full. Consider a case-insensitive regex or trimming spaces around = to be safe.

  • ⚠️ No migration path for existing deployments → Tables created before this change won't have enable_block_number_column/enable_block_offset_column set; the fix is a no-op for them unless they recreate otel_logs. The fallback to __hdx_id helps only if that materialized column exists. This should be called out in release notes / upgrade docs.

  • ⚠️ sourceId used as a boolean flag → The sourceId?: string parameter in useConfigWithAdditionalSelect is only used to enable/disable the useColumns query. Callers like usePatterns.tsx pass no sourceId (correctly), but the parameter name obscures its true purpose ("am I a row-level query?"). A dedicated isRowQuery?: boolean param would be less surprising to future callers.

  • ✅ uniqueRowIdExpression is fully removed with no dangling references — clean.

  • ✅ New tests cover extraKeys append, dedup against select, dedup against primary keys, and array-style select.

  • ✅ additionalKeysLength count remains correct with extraKeys (hidden keys are properly stripped from the UI via selectColumnMapWithoutAdditionalKeys).
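One possible hardening of the engine_full string check flagged above, sketched with a hypothetical helper name:

```typescript
// Tolerates whitespace and case differences around the setting assignment,
// e.g. "enable_block_number_column=1" or "ENABLE_BLOCK_NUMBER_COLUMN  =  1".
function hasBlockSetting(engineFull: string, setting: string): boolean {
  return new RegExp(`${setting}\\s*=\\s*1\\b`, 'i').test(engineFull);
}
```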

@karl-power
Contributor Author

@wrn14897 Do you know if the migrations are needed here? I tried to run them locally which invokes migrate, but I don't see that installed anywhere.


@wrn14897 wrn14897 left a comment


I think we should use the UUID type for the id column (https://clickhouse.com/docs/sql-reference/data-types/uuid). Using toUInt16(rand()) only provides 65,536 possible values, which could lead to collisions.

I’d also suggest raising this with the team, since it involves a schema change. It would be good to make sure we all align on the approach for the ID field before moving forward. cc @knudtty
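As a quick sanity check on the collision concern, the birthday approximation says a 50% chance of a duplicate among 2^16 ids is reached after only a few hundred rows (an illustrative calculation, not part of the PR):

```typescript
// Birthday approximation: ~50% chance of a duplicate after sqrt(2 * N * ln 2)
// draws from an id space of size N. For toUInt16(rand()), N = 2^16.
const idSpace = 2 ** 16; // 65,536 possible UInt16 values
const rowsFor50PctCollision = Math.sqrt(2 * idSpace * Math.log(2));
console.log(Math.round(rowsFor50PctCollision)); // ≈ 301 rows
```

So without the other search fields narrowing things down, duplicates become likely almost immediately at typical log-ingest volumes.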


karl-power commented Apr 13, 2026

> I think we should use the UUID type for the id column

@MikeShi42 do you have thoughts on this as you originally suggested UInt16?

I'm guessing the reasoning was that UInt16 combined with the other fields from the search query would be enough to reduce the chance of a collision, with the benefit of less storage than UUID

@wrn14897
Member

> I think we should use the UUID type for the id column
>
> @MikeShi42 do you have thoughts on this as you originally suggested UInt16?
>
> I'm guessing the reasoning was that UInt16 combined with the other fields from the search query would be enough to reduce the chance of a collision, with the benefit of less storage than UUID

I understand the reason for reducing storage (to achieve a better compression ratio), and that this ID is only used internally. Also, as you mentioned, the chance of collisions would be lower if the search is always combined with other fields. I just want to confirm that this is intentional.

@karl-power
Contributor Author

@wrn14897 I have shared this with the team but haven't received any feedback yet. Are you happy to proceed like this, or would you prefer to use UUID?

@MikeShi42
Contributor

@karl-power instead of creating a new column, any thoughts on whether we should instead just use _block_number and _block_offset and enable enable_block_number_column and enable_block_offset_column by default in new tables? This would technically allow other features as well within ClickHouse, so it's more useful than a random id.

We can tell existing users to run an update to add this new random id column (as suggested in this PR) if they haven't set up their table to use block number/offsets.

@karl-power
Contributor Author

karl-power commented Apr 28, 2026

> @karl-power instead of creating a new column, any thoughts on if we should instead just use _block_number and _block_offset and enabling enable_block_number_column and enable_block_offset_column by default in new tables? This will technically allow other features as well within ClickHouse - so more useful than a random id.
>
> We can tell existing users to run an update to add this new random id column (as suggested in this PR) if they haven't set up their table to use block number/offsets.

@MikeShi42 Yeah, I think the block_number/offset combo would work. So you're suggesting we remove the UI control for the unique identifier and fall back to __hdx_id if the block settings aren't enabled?

@karl-power karl-power force-pushed the karl/id-field-in-log-filter branch from d790f62 to 327e41e Compare May 1, 2026 00:50
@karl-power karl-power changed the title fix: filter logs with ID field fix: use block_number/block_offset to uniquely identify log rows May 1, 2026
@karl-power karl-power force-pushed the karl/id-field-in-log-filter branch from 327e41e to e44d45a Compare May 1, 2026 01:36
@karl-power karl-power force-pushed the karl/id-field-in-log-filter branch from e44d45a to 0f75ae4 Compare May 1, 2026 01:41
@karl-power
Contributor Author

@MikeShi42 updated to use block number/offset


Labels

review/tier-4 Critical — deep review + domain expert sign-off

Projects

None yet

Development

Successfully merging this pull request may close these issues.

‘expand log line’ not use all filter.

3 participants