fix: use block_number/block_offset to uniquely identify log rows #2071
karl-power wants to merge 1 commit into main
Conversation
🦋 Changeset detected. Latest commit: 0f75ae4. The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages.
E2E Test Results: ✅ All tests passed • 159 passed • 3 skipped • 1189s. Tests ran across 4 shards in parallel.
Force-pushed from a7ee44f to 3151b4d.
🔴 Tier 4 — Critical. Touches auth, data models, config, tasks, OTel pipeline, ClickHouse, or CI/CD.
Review process: deep review from a domain expert; a synchronous walkthrough may be required.
PR Review — PR #2071 (fix: use block_number/block_offset to uniquely identify log rows). The approach is sound: injecting ClickHouse's virtual block columns as hidden tiebreakers cleanly solves the duplicate-row collision bug without any schema migration for new installs.
@wrn14897 Do you know if the migrations are needed here? I tried to run them locally, which invokes …
Force-pushed from 3151b4d to 8772bd3.
wrn14897 left a comment:
I think we should use the UUID type for the `id` column (https://clickhouse.com/docs/sql-reference/data-types/uuid). Using `toUInt16(rand())` only provides 65,536 possible values, which could lead to collisions.
I’d also suggest raising this with the team, since it involves a schema change. It would be good to make sure we all align on the approach for the ID field before moving forward. cc @knudtty
@MikeShi42 do you have thoughts on this, as you originally suggested UInt16? I'm guessing the reasoning was that UInt16, combined with the other fields from the search query, would be enough to reduce the chance of a collision, with the benefit of less storage than UUID.
I understand the reason for reducing storage (to achieve a better compression ratio), and that this ID is only used internally. Also, as you mentioned, the chance of collisions would be lower if the search is always combined with other fields. I just want to confirm that this is intentional.
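As a rough sanity check on the collision concern (a sketch of my own, not from the PR), the birthday-problem math for a 65,536-value space shows collisions become likely with only a few hundred colliding candidate rows:

```typescript
// Probability that at least two of n rows share the same toUInt16(rand())
// value, given a 65,536-value space (exact birthday-problem product form).
function collisionProbability(n: number, space: number = 65536): number {
  let pNoCollision = 1;
  for (let i = 0; i < n; i++) {
    pNoCollision *= (space - i) / space;
  }
  return 1 - pNoCollision;
}

// Around 300 rows, a collision is already roughly a coin flip:
console.log(collisionProbability(300).toFixed(3)); // ≈ 0.496
```

This is why the collision risk is negligible only if the detail query always includes enough other distinguishing fields.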
Force-pushed from 8772bd3 to 5d75767.
@wrn14897 I have shared this with the team but haven't received any feedback yet. Are you happy to proceed like this, or would you prefer to use …
@karl-power instead of creating a new column, any thoughts on whether we should instead just use the block number/offset virtual columns? We can tell existing users to run an update to add this new random id column (as suggested in this PR) if they haven't set up their table to use block number/offsets.
@MikeShi42 Yeah, I think the …
Force-pushed from d790f62 to 327e41e.
Force-pushed from 327e41e to e44d45a.
Force-pushed from e44d45a to 0f75ae4.
@MikeShi42 updated to use block number/offset |
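The approach agreed on above can be sketched in TypeScript (illustrative only — the helper names mirror those described in the PR, but the signatures here are assumptions, not the actual HyperDX source):

```typescript
// Pick hidden tiebreaker keys for a row-level SELECT based on the table's
// engine_full metadata, falling back to the legacy __hdx_id column.
function pickTiebreakerKeys(engineFull: string, hasHdxId: boolean): string[] {
  const hasBlockNumber = /enable_block_number_column\s*=\s*1/.test(engineFull);
  const hasBlockOffset = /enable_block_offset_column\s*=\s*1/.test(engineFull);
  if (hasBlockNumber && hasBlockOffset) {
    return ['_block_number', '_block_offset'];
  }
  return hasHdxId ? ['__hdx_id'] : [];
}

// Append extra keys to a SELECT list, skipping keys already present.
function appendSelectWithAdditionalKeys(
  select: string[],
  extraKeys: string[],
): string[] {
  const seen = new Set(select);
  return [...select, ...extraKeys.filter((k) => !seen.has(k))];
}

const engineFull =
  'MergeTree ORDER BY (ServiceName, Timestamp) SETTINGS enable_block_number_column = 1, enable_block_offset_column = 1';
const keys = pickTiebreakerKeys(engineFull, false);
console.log(appendSelectWithAdditionalKeys(['Timestamp', 'Body'], keys));
// → [ 'Timestamp', 'Body', '_block_number', '_block_offset' ]
```

Because the row WHERE clause is built from every column in the row data, appending the block columns to the SELECT is enough to make the subsequent detail query unique.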
Summary
Fixes an issue where clicking a log row to expand it could show the wrong row's details when multiple rows share identical visible column values (Timestamp, ServiceName, Body, etc.).
Root cause: The detail panel query reconstructs a WHERE clause from only the visible columns, without the original search filters. When rows collide on those columns, `LIMIT 1` returns whichever row ClickHouse finds first — not necessarily the one clicked.
Fix: Detect and inject ClickHouse's built-in `_block_number` and `_block_offset` virtual columns as hidden tiebreakers in the row SELECT. These are available when the table engine was created with `enable_block_number_column = 1` and `enable_block_offset_column = 1`. Since the row WHERE clause builder already iterates all columns in the row data, the block columns are automatically included in detail queries — disambiguating rows without surfacing them in the UI.
A fallback to `__hdx_id` (a custom materialized column from the prior approach) is retained for any tables that already have that column.
No schema migration is required for the primary fix — the approach relies entirely on engine-level settings that already exist on the `otel_logs` table.
Key changes
- `DBRowTable.tsx`: Rename `appendSelectWithPrimaryAndPartitionKey` → `appendSelectWithAdditionalKeys`, adding an `extraKeys: string[]` param. Rename `useConfigWithPrimaryAndPartitionKey` → `useConfigWithAdditionalSelect`; when a `sourceId` is present (row-level query), inspect `engine_full` metadata for block column flags and inject `_block_number`/`_block_offset` as extra keys, falling back to `__hdx_id` if present.
- `types.ts` / `source.ts` / `SourceForm.tsx`: Remove `uniqueRowIdExpression` — the tiebreaker is now auto-detected from the table engine, not manually configured.
- Remove the `__hdx_id` materialized column addition from `otel_logs`; the block column engine flags are sufficient.
- Tests: `appendSelectWithAdditionalKeys` cases covering `extraKeys` append, deduplication against primary/partition keys, and array-style selects. Scaffold for `inferTableSourceConfig` tests.
How to test locally
Reproduce the bug on `main`:
- With data in `otel_logs`, in the ClickHouse UI, run the query that fires when you click a row to open its sidebar.
- Insert a copy of an existing row into `otel_logs` but change one nested value — e.g. change `ResourceAttributes.host.arch` from `arm64` to `arm65`.
- Search for `ResourceAttributes.host.arch = arm65`. One result appears. Click it to open the sidebar.
- The sidebar shows `arm64` — it selected the wrong row.
Verify the fix on this branch
- Confirm `otel_logs` was created with `enable_block_number_column = 1` and `enable_block_offset_column = 1` (check via `SHOW CREATE TABLE default.otel_logs`).
- Repeat the steps above; the sidebar now shows `arm65` — correct row selected.
Verify fallback path
- For a table that has `__hdx_id` but no block columns, the app still injects `__hdx_id` as the tiebreaker (same behavior as the previous approach).
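A toy illustration of the root cause and fix described in the summary (not app code; the rows and column values are invented for the example): a detail match built only from visible columns cannot distinguish colliding rows, while adding `_block_number`/`_block_offset` makes the match unique.

```typescript
type Row = Record<string, string | number>;

// True if every key/value pair in `where` matches the row.
function matches(row: Row, where: Row): boolean {
  return Object.entries(where).every(([k, v]) => row[k] === v);
}

// Two rows identical on all visible columns, differing only in block position.
const rows: Row[] = [
  { Timestamp: '2024-01-01T00:00:00Z', Body: 'hello', _block_number: 7, _block_offset: 0 },
  { Timestamp: '2024-01-01T00:00:00Z', Body: 'hello', _block_number: 7, _block_offset: 1 },
];

// Visible columns only: both rows match, so "LIMIT 1" picks arbitrarily.
const visibleOnly = rows.filter((r) =>
  matches(r, { Timestamp: '2024-01-01T00:00:00Z', Body: 'hello' }),
);
console.log(visibleOnly.length); // 2

// With block tiebreakers included: exactly one row matches.
const withTiebreakers = rows.filter((r) =>
  matches(r, { Timestamp: '2024-01-01T00:00:00Z', Body: 'hello', _block_number: 7, _block_offset: 1 }),
);
console.log(withTiebreakers.length); // 1
```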