docs: propose KeyViz adapter / namespace labels (follow-up)#694
docs: propose KeyViz adapter / namespace labels (follow-up)#694
Conversation
A user observed that on a default single-Raft-group deployment the KeyViz heatmap shows one row covering all five adapters (DynamoDB / Redis / S3 / SQS / RawKV) — the smallest unit of attribution today is a Raft route, and one route covers [empty, empty) by default. Splitting routes via SplitRange or multi-group startup flags works as an operational workaround but the most common shape (single-group) is opaque. This proposal adds an independent label dimension to the sampler so a single route can split into per-adapter sub-rows in the heatmap, with zero hot-path penalty when labels are not configured. Recommended approach (4.1): per-Observe label string; slot key widens from routeID to (routeID, label). One extra map lookup per Observe; legacy callers pass empty label and behaviour is unchanged. Forward-compatible wire-format extension: optional `label` on KeyVizRow / MatrixRow; old SPAs ignore it, new SPAs render "route:N / <label>" when present. Five PRs of follow-up work outlined in 7. Implementation does not start until this doc is reviewed and accepted.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughIntroduces a new KeyViz design document proposing an independent adapter/namespace label dimension for heatmap rows, changes to sampler/slot key semantics to include typed labels, wire-format and bucket_id updates, coordinator startup flag for rollouts, and detailed implementation and testing guidance. Changes
Sequence Diagram(s)(omitted) Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@claude review |
There was a problem hiding this comment.
Code Review
This pull request introduces a design proposal for adding adapter-specific labels to the KeyViz heatmap, allowing for more granular traffic attribution within a single Raft route. The proposal recommends extending the sampler's Observe API and the wire format to include a label string while maintaining backward compatibility for existing clients. Feedback on the design highlights a potential issue where bucket_id uniqueness is broken, which could impact existing RPCs and operational stability. Additionally, there is a request for clarification on how labeled slots will be populated in the routeTable without introducing locking overhead on the performance-critical Observe path.
| `bucket_id` formatting in the wire shape stays as-is; the label | ||
| is a separate field. This keeps the `route_ids` / `aggregate` | ||
| semantics from §5 of the parent design unchanged. |
There was a problem hiding this comment.
The design suggests that bucket_id formatting stays as-is (e.g., route:1) and the label is a separate field. This creates a problem for the GetRouteDetail(bucket_id) RPC and the SPA's row identification, as bucket_id is no longer unique per heatmap row. This change presents a significant operational risk for rolling upgrades and system migration. Per repository guidelines, please detail potential mitigation strategies (e.g., a temporary 'bridge' or 'proxy' mode) and strategies for zero-downtime cutovers to avoid service interruption.
References
- When a design document identifies a significant operational risk, such as the inability to perform rolling upgrades, it must also detail potential mitigation strategies, like implementing a temporary 'bridge' or 'proxy' mode.
- When designing a production-grade system migration, the plan must consider and detail strategies for live or zero-downtime cutovers to avoid service interruption.
| Cost on hot path: one extra map lookup per Observe — slot is now | ||
| keyed by `(routeID, label)` instead of just `routeID`. The map | ||
| key is a struct of `{uint64, string}`; the `label` is a small | ||
| interned constant (`"dynamo"`), so allocation should be zero. |
There was a problem hiding this comment.
The proposal states that Observe will use a map keyed by (routeID, label) and that RegisterRoute will be a "no-op" for labels. However, Observe is designed to be a lockless hot-path operation that relies on a pre-populated routeTable. If RegisterRoute does not pre-register the (routeID, label) combinations, Observe will fail to find a slot. If Observe were to create slots dynamically, it would require a mutex or a copy-on-write operation, violating the "zero hot-path penalty" goal. Please clarify how labeled slots are populated into the routeTable while maintaining a lockless Observe path.
|
Claude finished @bootjp's task in 3m 19s —— View job Review
The motivation is sound and Option 4.1 (per-Observe label) is the right pick over 4.2–4.4. The main problem is that the design is underspecified where it matters most: how labeled slots enter the COW route table without breaking the lockless hot path. There are also a few concrete existing-code conflicts that downstream PRs will run straight into. Details below. Critical — slot lifecycle for labeled slots is underspecified (§4.1, §7 PR-B, §8 lens 2)The existing sampler's lockless invariant depends on every slot being pre-registered in the COW PR-B's scope says "RegisterRoute no-op (label is per-Observe, not per-route)" — but if the map key widens to Three options worth calling out explicitly:
The design needs to pick one and spell out the Critical —
|
Eight items from round-1 review on the adapter-labels proposal: - 4.1.1 (new) slot lifecycle: pin the lockless invariant by having RegisterRoute pre-create one slot per known label at registration time. Add routeSlot.Label and document reclaimRetiredSlot dedupe on (RouteID, Label). Reject the two alternatives (silent fall-back and hot-path mutex) explicitly. - 5 composite bucket_id: when Label is non-empty, bucket_id = "route:N:label" so it stays globally unique. Without this, pivotKeyVizColumns (keyed by RouteID alone) silently collapses same-route different-label rows back into one row. The pivot-key widening in internal/admin/keyviz_handler.go is now an explicit PR-D+E scope item. - 5 separator: route-vs-label uses `:` (not `/`) so a future hierarchical label using `/` does not produce parser ambiguity. (Gemini minor.) - 6 aggregator: clarify that the bucketID widening encodes the new label dimension; the merge tuple width does not change. - 6.5 (new) virtual buckets: explicit deferral that virtual buckets always emit Label="" - per-label breakdown is only available below MaxTrackedRoutes. - 7 PR-D and PR-E ship together: between them, the aggregator would mis-merge labelled rows (BucketID alone as merge key). Combine into a single PR-D+E. PR-B/C remain independent. - 7 PR-C operator note: refuse to start with a warning when len(routes) x len(labels) > MaxTrackedRoutes - otherwise hot routes silently fold into the virtual bucket. - 9 Q2: canonical constants live in keyviz/labels.go, not adapter/keyviz_labels.go - sampler must not import back to adapter (cycle).
|
@claude review Round-1 follow-ups (3c14f29) addressing all 8 items from the round-1 review: Critical / blocker:
Moderate:
Minor:
The "what looks good" items in your round-1 (4.1 vs 4.2/4.3/4.4 trade-off, forward-compat reasoning, Q3, label allocation discipline) are unchanged from round-1. |
|
Claude finished @bootjp's task in 4m 7s —— View job Review (Round 2)
All eight round-1 items are addressed. The design is in a substantially better state; the §4.1.1 slot-lifecycle section is the clearest improvement. Three remaining issues worth fixing before PR-B goes up for review, in rough priority order. Moderate — §7 PR-C: "refuse to start" is wrong semantics for a dynamic catalogThe current text says: "PR-C should refuse to start with a warning when Routes are dynamic — The existing coarsening path already handles this gracefully: The operator note in §7 PR-C is still correct (raise the cap proportionally), but "refuse to start" should become "emit a warning when the first coarsening event fires due to label-induced overflow". Moderate — §4.1.1: labeled slot creation for coarsened routes is unspecified§4.1.1 says "RegisterRoute pre-creates one slot per known label at registration time." But The design should add one sentence: "When Without this, a PR-B implementor may create labeled slots for coarsened routes, then wonder why they are never hit. Minor — §8 lens-5: two test cases missing from the planThe current test plan covers the happy path and the burst invariant well. Two gaps:
Minor — §4.1.1: "restart the sampler" note is slightly misleading"New labels added after a route is registered are not auto-bound; an operator deploying a new adapter must restart the sampler." The canonical label set lives in What looks good
The two moderate items are the things to fix before PR-B goes up; the two minor items can be folded into the PR-B doc commit or addressed as review comments there. The overall design is ready to proceed. |
Four follow-ups from round-2: - 7 PR-C: replace "refuse to start" with a structured slog.Warn emitted from inside RegisterRoute at coarsening time. Routes are dynamic (the catalog watcher calls RegisterRoute when a SplitRange lands), so a startup-time check would miss every route added after process start. The right operator signal is a log line at the moment a coarsening event fires. - 4.1.1: spell out the coarsened-route path. RegisterRoute that returns false (route folded into a virtual bucket) does NOT pre-create labeled slots; the virtualForRoute fallback in Observe handles all labels uniformly per 6.5. - 4.1.1: tighten the new-label rollout note. "Restart the sampler" was misleading because the canonical set lives at compile time; "deploy a new binary" is what operators actually need. - 8 lens-5: add two missing test cases - slot pre-creation count (1 route -> N_labels+1 slots) and reclaimRetiredSlot (RouteID, Label) dedupe (retire dynamo + re-register redis must NOT reclaim). Plus the coarsened-route + label invariant (label="" on virtual buckets) for completeness.
|
@claude review Round-2 follow-ups (ce8f95b):
|
|
Claude finished @bootjp's task in 3m 2s —— View job Review (Round 3)
All four Round-2 items are correctly resolved. The dynamic-catalog Moderate —
|
Five follow-ups from round-3, all doc-level: - 4.1.1 (moderate): pin virtualForRoute as map[uint64]*routeSlot (NOT widened to slotKey). Coarsened-route Observe falls back to this map on a labeled-slot miss, so widening the key would silently drop all coarsened-route traffic. Added pseudocode showing the exact lookup chain. - 5 (moderate): add the explicit JSON KeyVizRow diff alongside MatrixRow and the proto. Carry the label as a dedicated field AND keep bucket_id composite — the field is for SPA ergonomics (no client-side parsing), bucket_id is the globally-unique row identifier merge logic uses. - 9 Q2 (minor): name the exported AllLabels slice that RegisterRoute iterates over. The PR-B implementor needs an exported handle, not just the per-adapter constants. - 7 PR-B (minor): coordinator call sites in kv/sharded_coordinator.go also need the label="" update; they are the only non-test caller of Sampler.Observe. - 7 PR-C (minor): rollout discipline - raise --keyvizMaxTrackedRoutes before deploying the PR-C binary so startup doesn't flood logs with re-registration warnings on an already-misconfigured node.
|
@claude review Round-3 follow-ups (ae43bfe): Moderate:
Minor:
|
|
Claude finished @bootjp's task in 3m 22s —— View job Review (Round 4)
All five Round-3 items are correctly resolved. The Moderate —
|
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (1)
docs/design/2026_04_28_proposed_keyviz_adapter_labels.md (1)
301-307: Tighten the backward-compatibility claim for “old SPA parses bucket_id”.You say an old SPA against a new server “falls back to legacy formatting” and should be non-breaking. But this depends on the old SPA’s
bucket_idparsing being tolerant of extra segments (route:1:dynamo). Please either:
- specify the parsing behavior the SPA already has (e.g., “splits into at most 2 parts” / “ignores suffix after route id”), or
- explicitly call out that the SPA parser must be updated to be suffix-tolerant.
♻️ Proposed fix (clarify parser expectation)
Forward compatibility: an old SPA against a new server sees a new `bucket_id` shape (`route:1:dynamo`) and renders it - literally — operators get a less-pretty label but no - correctness bug. + literally — operators get a less-pretty label but no + correctness bug, assuming the SPA’s `bucket_id` parser is + suffix-tolerant (it must not treat `route:1:dynamo` as a parse + failure or collapse it incorrectly).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/2026_04_28_proposed_keyviz_adapter_labels.md` around lines 301 - 307, The backward-compatibility claim must be tightened: update the text around the bucket_id compatibility paragraph to either (A) explicitly state the existing SPA parser behavior (e.g., that the SPA splits bucket_id on ':' into at most two parts or that it ignores any suffix after the route id) and give a concrete example like "route:1:dynamo -> route:1", or (B) state that the SPA parser must be updated to be suffix-tolerant (modify the SPA parsing logic to ignore additional segments after the route id) and add a short compatibility note for operators; refer to the bucket_id field and the SPA parser behavior in the wording so readers know which parser needs to be tolerant or changed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/design/2026_04_28_proposed_keyviz_adapter_labels.md`:
- Around line 346-350: Choose one spelling ("labelled" or "labeled") and make it
consistent throughout this design doc; specifically change the second occurrence
flagged (in the paragraph mentioning pivotKeyVizColumns) to match your chosen
variant and sweep the file to unify all instances including mentions of
pivotKeyVizColumns, keyviz/labels.go, ShardedCoordinator.Observe, RegisterRoute,
reclaimRetiredSlot, MatrixRow.Label, and routeSlot.Label so the entire document
uses the same spelling.
- Around line 270-296: Add an explicit constraint that canonical labels must not
contain ':' and enforce it where labels are defined and used: update the
documentation near the bucket_id format (the "route:<id>:<label>" paragraph) to
state "labels must not include ':'", and add runtime validation at the
label/constant definition and any constructors that build or parse bucket_id
(places touching bucket_id, the separator constant, and codepaths used by
pivotKeyVizColumns, applyKeyVizRowBudget, sortKeyVizRowsByStart) to reject or
sanitize labels containing ':' so the split-on-':' parsing remains unambiguous.
- Around line 270-285: The document mixes “labelled” and “labeled”; pick one and
make it consistent (switch to “labeled” as suggested) by replacing all
occurrences including headings, inline text, examples like `Label`/`bucket_id`
descriptions, and any references in prose to `pivotKeyVizColumns`, `RouteID`,
`applyKeyVizRowBudget`, `sortKeyVizRowsByStart`, and `BucketID` so the
terminology is uniform across the file.
- Around line 76-86: The fenced code block in the example "Today (single group,
no labels): / After (adapters tag their traffic):" is missing a language tag
which triggers MD040; update the opening fence from ``` to ```text (or your
preferred lint language such as ```plain or ```mermaid) so the block becomes
```text and leave the closing fence as ```; this change in
docs/design/2026_04_28_proposed_keyviz_adapter_labels.md will satisfy the
markdownlint rule.
- Around line 245-258: The proto KeyVizRow currently already defines "string
label = 4", so adding "string label = 13" creates a duplicate; update the
proposal and code so there's a single canonical label field: decide whether to
deprecate/rename/remove field 4 or to replace it with field 13, then modify
proto/admin.proto's KeyVizRow to only include the chosen field number and name
and update the corresponding internal/admin/keyviz_handler.go KeyVizRow JSON
struct (Label field and tags) to match that single proto field; mention the
deprecation choice (if keeping 4) or show the removal/renumbering plan so the
design and code remain consistent.
---
Nitpick comments:
In `@docs/design/2026_04_28_proposed_keyviz_adapter_labels.md`:
- Around line 301-307: The backward-compatibility claim must be tightened:
update the text around the bucket_id compatibility paragraph to either (A)
explicitly state the existing SPA parser behavior (e.g., that the SPA splits
bucket_id on ':' into at most two parts or that it ignores any suffix after the
route id) and give a concrete example like "route:1:dynamo -> route:1", or (B)
state that the SPA parser must be updated to be suffix-tolerant (modify the SPA
parsing logic to ignore additional segments after the route id) and add a short
compatibility note for operators; refer to the bucket_id field and the SPA
parser behavior in the wording so readers know which parser needs to be tolerant
or changed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f3929603-6a50-49cc-96c8-9ceb5358d4d6
📒 Files selected for processing (1)
docs/design/2026_04_28_proposed_keyviz_adapter_labels.md
Eight items from round-4 review:
CodeRabbit critical:
- 5 proto field collision: proto/admin.proto KeyVizRow already
declares string label = 4 (currently unused). The earlier draft
proposed adding string label = 13, which would have collided.
Reuse the existing field 4; no schema migration needed.
- 5 hard constraint: canonical labels MUST NOT contain ':'.
Without this, "redis:db0" silently breaks bucket_id parsing.
Pinned by a TestAllLabelsAvoidSeparator regression guard listed
in 8 lens-5.
Claude bot round-4 moderate:
- 4.1.1 RemoveRoute symmetric teardown: when slot key widens,
delete(slots, routeID) is a compile error; PR-B must loop over
AllLabels + {""} and retire all N+1 slots. Otherwise labeled
siblings leak into the live table forever.
- 7 PR-D+E mergeRowInto: mergeKeyVizMatrices keys on composite
bucket_id (correct) but mergeRowInto constructs new KeyVizRow
entries from scratch and does NOT copy Label. Without an
explicit dst.Label = row.Label, every cluster-merged row emits
Label="", defeating Option B's dedicated-field design.
Claude bot round-4 minor:
- 4.1.1 RegisterRoute idempotency: cur.slots[routeID] becomes a
type mismatch after key widening; document the legacy-slot
presence check that PR-B must use.
- 4.1.1 pseudocode comment: "labeled buckets fall through here"
is misleading since the fallback is for coarsened routes, not
labeled traffic. Fixed to "coarsened route: label irrelevant,
aggregate all traffic".
CodeRabbit minor:
- 3 code fence language: tagged the surface example as `text`
(MD040).
- Spelling: normalized "labelled" -> "labeled" / "unlabelled" ->
"unlabeled" across the doc (LanguageTool flagged the
inconsistency).
Plus 2 new test cases in 8 lens-5: RemoveRoute symmetric
teardown, mergeRowInto Label copy, and the labels-avoid-':'
constants test.
|
@claude review Round-4 follow-ups (2750f4c) — 8 items addressed: CodeRabbit critical:
Claude bot moderate:
Claude bot minor:
CodeRabbit minor:
Test plan additions (§8 lens-5): |
|
@claude review Round-5 status check. The merge from main brought CI/workflow updates onto the branch (HEAD now at The CodeRabbit inline comments visible at lines 339 / 360 / 414 are stale (posted against
Asking for a fresh round on the post-merge HEAD. |
|
Claude finished @bootjp's task in 3m 58s —— View job Review (Round 5)
Stale CodeRabbit comments confirmed resolved (verified against current source):
One spelling instance was missed, and a moderate gap surfaced in Moderate —
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9e1dcf176d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| making `RegisterRoute` **pre-create one slot per known label** at | ||
| registration time, but only on the individual-tracking path. The | ||
| label set is the canonical `keyviz/labels.go` constants (§9 Q2); | ||
| register-time slot count is `len(routes) × len(labels) + 1` (the | ||
| +1 is the empty-label legacy slot, kept so callers that pass |
There was a problem hiding this comment.
Avoid shrinking route budget before labels are enabled
The proposal pre-creates one slot per known label for every registered route, which makes MaxTrackedRoutes effectively count (labels+legacy) slots rather than routes. In PR-B, Sampler.Observe callers are explicitly still passing label = "" (so labels are not yet active), but this pre-allocation would still force earlier virtual-bucket coarsening and loss of per-route detail in existing deployments, contradicting the stated legacy-preserving rollout. This should be gated so unlabeled PR-B behavior keeps current route capacity.
Useful? React with 👍 / 👎.
| |---|---| | ||
| | **PR-A** | Land this design doc. | | ||
| | **PR-B** | Sampler API extension: `Observe(... label string)`, `routeSlot.Label`, `RegisterRoute` pre-creates one slot per known label (§4.1.1). `MatrixRow.Label`. `reclaimRetiredSlot` dedupes on `(RouteID, Label)`. Update existing tests **and the coordinator call sites in `kv/sharded_coordinator.go`** (the only non-test caller of `Sampler.Observe`) to pass `label = ""` — until PR-C lands the legacy unlabeled behaviour is preserved. | | ||
| | **PR-C** | Adapter wiring: each adapter sets its own label at the dispatch entry into `ShardedCoordinator.Observe…`. Canonical constants live in `keyviz/labels.go` (NOT `adapter/keyviz_labels.go` — sampler-side imports must not climb back to the adapter package; sampler's `RegisterRoute` reads the canonical set at registration time). **Operator note**: with N labels, slot count grows from M routes to ~N×M; raise `--keyvizMaxTrackedRoutes` proportionally or hot routes will fold into the virtual bucket and lose per-label breakdown. PR-C emits a structured `slog.Warn` from inside `RegisterRoute` at the moment a route is coarsened — `slog.Warn("route folded into virtual bucket", route_id=…, slots_used=…, max_tracked_routes=…)` — so the operator sees the symptom in the log when it actually fires. A startup-time check would not catch this because routes are added dynamically (the catalog watcher calls `RegisterRoute` whenever a `SplitRange` lands), so a process that started below the cap can cross it later. **Rollout discipline**: raise `--keyvizMaxTrackedRoutes` to at least `current_routes × len(labels)` *before* deploying the PR-C binary, otherwise an already-misconfigured node prints a wave of warnings on startup as the watcher re-registers every catalog route — loud but not incorrect, only painful for log retention. | |
There was a problem hiding this comment.
Size max-tracked-routes using labels plus legacy slot
The rollout guidance underestimates required capacity by using current_routes × len(labels). This document also keeps an empty-label legacy slot per route, so required individual slots are current_routes × (len(labels)+1). Operators following the current formula can still hit coarsening immediately and lose labeled breakdown despite applying the documented migration step.
Useful? React with 👍 / 👎.
|
Claude finished @bootjp's task in 5m 42s —— View job Review (Round 5)
All eight Round-4 items are correctly resolved. Four new items surfaced while tracing the full slot lifecycle through Moderate —
|
Two items from Codex round-5 review: - P1 / line 125 / pre-allocation regression: the earlier draft said RegisterRoute pre-creates one slot per label at PR-B. That would shrink the effective route budget from M to ~M / (N+1) even though no adapter is yet calling with non-empty labels - PR-B is supposed to be a behavior-neutral refactor. Resolution: split the pre-allocation across PR-B and PR-C. PR-B only creates the legacy empty-label slot (per-route slot count unchanged at 1); PR-C extends RegisterRoute to pre-create the labeled siblings. PR-B is now strictly route-count-neutral on existing deployments. Pseudocode added showing both bodies. - P2 / line 412 / rollout formula off by one per route: rollout guidance said current_routes x len(labels), but operators must size for the legacy empty-label slot too: current_routes x (len(labels) + 1). Operators sizing only by len(labels) hit coarsening immediately and lose labeled breakdown anyway. Updated the rollout note to use the +1 form. Plus 7 PR-B / PR-C scope rows updated to reflect the split pre-allocation strategy so PR-B implementors do not accidentally implement the full pre-creation loop and ship a regression.
|
@claude review Round-5 follow-ups (3ae3417) addressing Codex P1 + P2:
§7 PR-B / PR-C scope rows updated to match the split, so a PR-B implementor doesn't accidentally land the full pre-creation loop and ship a regression. Per-route slot count documented at all 3 stages: 1 (today / PR-B), then |
|
Claude finished @bootjp's task in 4m 23s —— View job Review (Round 6)
Both Codex P1 and P2 are correctly resolved. The pre-allocation split (PR-B creates only the legacy slot; PR-C creates labeled siblings) is coherent, and the rollout formula Moderate —
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3ae3417d6f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| empty-label slot). Catches a PR-B regression that skips a | ||
| label constant in the pre-creation loop — without this the | ||
| missing slot only surfaces when that adapter fires traffic. |
There was a problem hiding this comment.
Gate pre-creation count test to PR-C, not PR-B
Section 7 defines PR-B as a behavior-neutral refactor that pre-creates only the legacy empty-label slot, but this test bullet says a PR-B regression is failing to create len(labels)+1 slots. Following this guidance during PR-B would force labeled-slot preallocation early and can reintroduce route-capacity/coarsening regressions before adapters emit labels. The assertion should be moved to PR-C (or explicitly phase-gated), with PR-B expecting a single legacy slot.
Useful? React with 👍 / 👎.
| This is what makes PR-D and PR-E shippable in either order | ||
| (see §7) — the wire-format change in PR-D *is* the merge-key | ||
| change. |
There was a problem hiding this comment.
Remove claim that PR-D and PR-E can ship independently
This paragraph says PR-D and PR-E are shippable in either order, but the implementation plan later requires shipping them together because otherwise labeled rows are collapsed by the current pivot keying. Keeping both statements creates contradictory rollout instructions and increases the risk of a partial deployment that silently loses per-label breakdown in responses.
Useful? React with 👍 / 👎.
Three moderate items from Claude bot's round-5 review (the long
one that scanned the actual single-node path code):
- 7 PR-D+E single-node-path completeness: the previous draft
only called out mergeRowInto in keyviz_fanout.go but missed
three companion changes in keyviz_handler.go that PR-D+E must
make. Documented all four:
(a) pivotKeyVizColumns rowsByID map AND order slice widen to
composite BucketID string (or equivalent (uint64, string));
(b) newKeyVizRowFrom copies mr.Label -> row.Label;
(c) bucketIDFor returns composite "route:<id>:<label>" when
Label is non-empty;
(d) mergeRowInto adds dst.Label = row.Label.
Splitting any subset leaves a window where labeled rows
collapse silently.
- 4.1.1 MaxTrackedRoutes semantics: pick Option A (count
routes, not slots). After key widening, len(next.slots) counts
labeled slots, so the coarsening check must divide by
len(AllLabels)+1. This preserves operator-visible meaning of
the existing flag - 1024 still means 1024 routes, not ~170.
Pseudocode added at the actual line site (sampler.go:416).
Option B (redefine as slots) explicitly rejected on the
behavior-preserving goal in 2.1.
- 7 PR-B AllLabels timing made explicit: PR-B creates
keyviz/labels.go with the slotKey struct + an EMPTY AllLabels
slice; the five adapter constants are added in PR-C alongside
adapter wiring. PR-B's pre-creation loop is a no-op against
the empty slice, so per-route slot count is identical to
today (1). Closes the ambiguity an implementor might hit
about "where do the constants land".
|
@claude review Round-5 follow-ups (3793366) addressing the three Claude bot moderate items from the long round-5 review on
Round-6 was running concurrently against |
|
Claude finished @bootjp's task in 3m 30s —— View job Review (Round 7)
All three Round-6 Claude bot moderate items are correctly resolved. The §7 PR-D+E scope now comprehensively lists all four required single-node-path changes (a–d), the Moderate — §6 "shippable in either order" contradicts §7 "Ship together"
These two statements are contradictory. §6 still claims PR-D and PR-E can ship independently (because the composite §6's claim is correct in the narrow sense that "the aggregator merge key doesn't need a separate change — This is the Codex P2 item from the round-6 review on Minor — §8 lens-5: two test descriptions say "PR-B regression" but apply to PR-CAfter the P1 split (PR-B = behavior-neutral, PR-C = labeled-sibling creation), two test descriptions in §8 lens-5 are stale:
Both descriptions should be updated to clearly say "PR-C regression," and the slot-count assertion should note the PR-phase dependency: "assert Minor — "Labelled" at line 389 survives again
Every other occurrence in the document uses "Labeled". This is the same spelling carry-over flagged in Rounds 5 and 6. What looks goodAll three Round-6 Claude bot moderate items verified correct against source:
The §6/§7 contradiction is the only remaining item that could create an incorrect mental model for an implementor reading the design for the first time. The two minor items can be addressed in the same commit. Design is ready to merge after these three fixes. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3793366704
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| |---|---| | ||
| | **PR-A** | Land this design doc. | | ||
| | **PR-B** | Sampler API extension: `Observe(... label string)`, `routeSlot.Label`, slot-key type widens to `slotKey{uint64, string}`. **`RegisterRoute` only pre-creates the legacy empty-label slot** — labeled-sibling pre-creation lands in PR-C so PR-B is a behavior-neutral refactor (per-route slot count is unchanged at 1). PR-B creates `keyviz/labels.go` with the `slotKey` struct + an **empty `AllLabels` slice**; the five adapter constants (`LabelDynamo`, `LabelRedis`, …) are added in PR-C alongside the adapter wiring, so PR-B's pre-creation loop is a no-op against the empty slice and the existing single slot per route is created exactly as today. `MatrixRow.Label` field present but always empty. `reclaimRetiredSlot` dedupes on `(RouteID, Label)` (forward-prep; only `Label=""` exists). The `MaxTrackedRoutes` coarsening check (`keyviz/sampler.go:416`) divides by `len(AllLabels)+1` so the cap continues to count **routes**, not slots; with `AllLabels` empty in PR-B the divisor is 1 and behavior is identical to today. Update existing tests **and the coordinator call sites in `kv/sharded_coordinator.go`** (the only non-test caller of `Sampler.Observe`) to pass `label = ""`. | | ||
| | **PR-C** | Adapter wiring: each adapter sets its own label at the dispatch entry into `ShardedCoordinator.Observe…`. Extends `RegisterRoute` to pre-create the labeled siblings (one slot per `AllLabels` member) alongside the legacy empty-label slot from PR-B. Canonical constants live in `keyviz/labels.go` (NOT `adapter/keyviz_labels.go` — sampler-side imports must not climb back to the adapter package; sampler's `RegisterRoute` reads the canonical set at registration time). **Operator note**: with N labels, slot count grows from M routes to ~N×M; raise `--keyvizMaxTrackedRoutes` proportionally or hot routes will fold into the virtual bucket and lose per-label breakdown. PR-C emits a structured `slog.Warn` from inside `RegisterRoute` at the moment a route is coarsened — `slog.Warn("route folded into virtual bucket", route_id=…, slots_used=…, max_tracked_routes=…)` — so the operator sees the symptom in the log when it actually fires. A startup-time check would not catch this because routes are added dynamically (the catalog watcher calls `RegisterRoute` whenever a `SplitRange` lands), so a process that started below the cap can cross it later. **Rollout discipline**: raise `--keyvizMaxTrackedRoutes` to at least `current_routes × (len(labels) + 1)` *before* deploying the PR-C binary — the `+ 1` accounts for the legacy empty-label slot that lives alongside the N labeled siblings on every individually-tracked route. Operators sizing only with `× len(labels)` (one short per route) hit coarsening immediately and lose labeled breakdown anyway. (Codex P2 on round-5.) Otherwise an already-misconfigured node prints a wave of warnings on startup as the watcher re-registers every catalog route — loud but not incorrect, only painful for log retention. | |
There was a problem hiding this comment.
Remove PR-C instruction to scale MaxTrackedRoutes by labels
Section 4.1.1 explicitly redefines the coarsening check to len(next.slots) / (len(AllLabels)+1), so MaxTrackedRoutes continues to cap routes rather than slots, but the PR-C rollout note still says operators must raise --keyvizMaxTrackedRoutes proportionally or routes will fold into virtual buckets. Those two statements cannot both be true; keeping this guidance will push operators to apply unnecessary cap increases (and memory growth) for a rollout that is documented as behavior-preserving on route capacity.
Useful? React with 👍 / 👎.
Two minor items from CodeRabbit's review on `c7637537`: - §4.1.1 line 273 sampler.go snippet `len(keyviz.AllLabels)` was a real bug — the snippet is in `keyviz/sampler.go` and references the package-local `AllLabels`, so the `keyviz.` qualifier is non-idiomatic and would not compile if copy-pasted into the package. Fixed to `len(AllLabels)` with an inline note clarifying the in-package context. - §8 line 535 `#694.)` at line start trips markdownlint MD018 (no-missing-space-atx — looks like an H1 with no space). Wrapped the PR reference in backticks (`#694`) and joined the two lines so the hash is no longer at line start.
|
@claude review Round-23 follow-ups (
|
|
Claude encountered an error —— View job I'll analyze this and get back to you. |
| // | ||
| // Reuse the **existing** field 4 (`string label`), which is | ||
| // already declared on KeyVizRow but currently unused and | ||
| // uncommented — no schema migration, no field-number bump. |
There was a problem hiding this comment.
In the proto discussion, "unused and uncommented" reads like the field is commented out. Since the intent is that label = 4 exists but lacks a doc comment, consider changing this to "undocumented" / "has no doc comment" to avoid ambiguity.
| // uncommented — no schema migration, no field-number bump. | |
| // undocumented — no schema migration, no field-number bump. |
| slot only — same as today's baseline). The pre-allocation cost is | ||
| gated on the flag, so a cluster that deploys the PR-C+D+E binary | ||
| but leaves the flag off pays no memory penalty (Codex round-15- |
There was a problem hiding this comment.
The doc says leaving --keyvizLabelsEnabled=false "pays no memory penalty". Even with the flag off, widening the slot map key from uint64 to slotKey{uint64, Label} increases per-entry map key storage (string header in Label) compared to today. Consider rewording to clarify that the flag avoids the multiplicative (N-label) slot count increase, but there is still some baseline per-slot overhead after PR-B due to the wider key type.
| slot only — same as today's baseline). The pre-allocation cost is | |
| gated on the flag, so a cluster that deploys the PR-C+D+E binary | |
| but leaves the flag off pays no memory penalty (Codex round-15- | |
| slot only). The pre-allocation cost is gated on the flag, so a | |
| cluster that deploys the PR-C+D+E binary but leaves the flag off | |
| avoids the **multiplicative** `len(AllLabels)` slot-count increase, | |
| though there is still a small baseline per-slot overhead versus | |
| today from the wider `{uint64, Label}` map key (Codex round-15- |
| ```go | ||
| type Label string | ||
| const ( | ||
| LabelLegacy Label = "" // legacy / unlabeled traffic, coordinator default | ||
| LabelDynamo Label = "dynamo" | ||
| LabelRedis Label = "redis" | ||
| LabelS3 Label = "s3" | ||
| LabelSQS Label = "sqs" | ||
| LabelRawKV Label = "rawkv" | ||
| ) | ||
| var AllLabels = []Label{LabelDynamo, LabelRedis, LabelS3, LabelSQS, LabelRawKV} | ||
| ``` |
There was a problem hiding this comment.
AllLabels is proposed as an exported var slice. Because slices are mutable, any importing package could accidentally reorder/append/modify it, which would break slot pre-creation and could cause silent sample drops. Consider making the canonical label set unexported (or exposing it via a function that returns a copy / read-only view) so only package keyviz can mutate it.
Three real items from Copilot's 8th review pass on `e5c7e39d`:
- §4.1 line 137 "no memory penalty" overstatement: even with
--keyvizLabelsEnabled=false, slot map key widens from
uint64 to slotKey{uint64, Label} so per-slot key storage
grows. Reframed: the flag avoids the *multiplicative*
len(AllLabels)+1 slot count increase (the dominant cost),
but each slot's wider key adds a small per-slot overhead
vs today even with the flag off.
- §5 line 456 "unused and uncommented" reads as
"commented-out" code. Changed to "unused and undocumented
(no doc comment on the field)" for clarity.
- §9 Q2 line 742 AllLabels was `var []Label{...}` (exported
mutable slice) — importing packages could
append/reorder/zero it and silently break slot
pre-creation. Switched to fixed-size array
`[5]Label{...}` so any mutation attempt becomes a compile
error. Required propagating the change:
- allLabelsWithLegacy() now uses `copy(result,
AllLabels[:])` (slice-of-array) since copy needs slice
args.
- §7 PR-B row updated to declare AllLabels as the
populated array from PR-B, with behavior neutrality
coming from the round-17 flag-gating
(--keyvizLabelsEnabled=false default) instead of an
empty-slice trick.
- §8 TestAllLabelsAvoidSeparator description corrected
from "AllLabels []Label is the typed slice" to
"AllLabels is the typed [5]Label array".
The PR-B "deliberately-empty AllLabels" framing from
earlier rounds is now redundant (round-17's flag gate
already makes PR-B behavior-neutral). Documented the
sequencing change in §9 Q2 so the round-history is
internally consistent.
|
@claude review Round-23 (Copilot 8th-pass) follow-ups (
Documented the PR-B sequencing change in §9 Q2 so the round-history (round-6 empty-slice → round-17 flag-gating → round-23 array form) is internally consistent. |
|
Claude finished @bootjp's task in 6m 24s —— View job Review (Round 24)
All three Round-23 items confirmed resolved in current HEAD:
Two moderate issues surfaced from the round-23 array-form change that weren't propagated consistently through the document. Two minor issues complete the picture. Moderate — Four stale "AllLabels empty in PR-B" references contradict the §7 PR-B rowRound-23 changed PR-B to ship 1. §7 closing paragraph, line 619:
"AllLabels empty" is wrong — PR-B now ships with the populated 2. §7 closing paragraph, line 621:
" 3. §7 PR-C+D+E row adapter-wiring description:
The parenthetical is the opposite of what round-23 introduced. Should read: "Wire each adapter to pass its canonical constant ( 4. §7 PR-B row (line 614), near end of the cell:
After round-23, An implementor reading these four contradictory statements cannot determine whether PR-B ships an empty or populated Moderate —
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 795c504f56
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| behavior (legacy slots only, AllLabels empty, MaxTrackedRoutes | ||
| divisor is 1). The bundled PR-C+D+E then ships everything |
There was a problem hiding this comment.
Remove stale PR-B assumption that AllLabels is empty
This section still says PR-B is behavior-neutral because AllLabels is empty, but §7 and §9 now define PR-B as creating a populated [5]Label catalog; that contradiction can lead implementers to apply the wrong coarsening math in PR-B. If someone follows the stale assumption and uses len(AllLabels)+1 as the divisor without the flag-based slotsPerRoute guard, MaxTrackedRoutes semantics drift and route tracking/memory behavior diverges from the intended legacy-preserving rollout.
Useful? React with 👍 / 👎.
| // Fixed-size array, not a slice — append/reorder/mutate by | ||
| // an importing package becomes a compile error, eliminating | ||
| // the silent slot-drop class of bugs that mutating the |
There was a problem hiding this comment.
Stop treating exported AllLabels array as immutable
The doc claims a fixed-size exported array makes append/reorder/mutation a compile error, but other packages can still mutate exported array elements (e.g., keyviz.AllLabels[i] = ...), so this does not enforce catalog immutability. Relying on this guarantee leaves a silent runtime path for label-catalog drift that can break slot pre-creation invariants and drop labeled samples; use an unexported catalog with a copy-returning accessor if immutability is required.
Useful? React with 👍 / 👎.
Round-24 finished review verdict on `795c504f`: "design is ready to merge as PR-A after these 4 fixes". All 4 items addressed: - §7 line 619 closing paragraph "AllLabels empty" was a stale reference: round-23 array form makes AllLabels populated from PR-B; the divisor=1 reason is the --keyvizLabelsEnabled=false default, not an empty array. Updated to "AllLabels populated [5]Label array... flag default false keeps divisor at 1". - §7 line 621 "AllLabels populated" was the right framing before round-23 (PR-C populated an empty slice). With the array, PR-C+D+E only wires adapters; constants are already declared. Updated to "adapter wiring (constants already declared in keyviz/labels.go from PR-B...)". - §7 PR-C+D+E adapter-wiring description "Populate AllLabels...(created in PR-B with an empty slice)" was the inverse of round-23. Updated to "constants already declared from PR-B as part of the [5]Label array; PR-C+D+E only wires adapters to pass them, no AllLabels modification needed". - §7 PR-B row "with AllLabels empty in PR-B the divisor is 1" updated to "with the flag default to false in PR-B the divisor is 1" (round-23 made AllLabels populated; the flag is what keeps the divisor at 1). - §7 PR-B row "Missing L0... after PR-C populates AllLabels" updated to "after PR-C+D+E wires the adapters to pass labels" (the populate-step doesn't exist anymore; adapter-wiring is the round-23-correct trigger). - §7 PR-B row "With PR-B's empty-only slots" updated to "With PR-B's flag-default-false (only legacy slots pre-created)". - §8 lens-5 TestAllLabelConstantsInAllLabels "Skip when len(AllLabels) == 0" was dead code — populated array has length 5 at compile time. Removed the skip; updated description to "PR-B regression guard — array is populated from PR-B so this is a live test from PR-B onward, not a no-op". - §8 lens-5 TestAllLabelsAvoidSeparator "PR-C regression guard, written in PR-B" was contradictory after the array change. Updated to "PR-B regression guard — the array is populated from PR-B so the test is live immediately". - §9 Q2 mutation claim was overstated: "append/reorder/ mutation by an importing package would be a compile error" — only `append` is type-blocked. Element reassignment (AllLabels[0] = ...) and full-var reassignment compile against the exported var. Scoped the claim to `append` specifically and noted that TestAllLabelConstantsInAllLabels is the runtime guard for the rest.
|
@claude review Round-24 follow-ups (
|
|
Claude finished @bootjp's task in 3m 39s —— View job Review (Round 25)
All four Round-24 items confirmed resolved in current HEAD:
Three minor stale references remain in §8 lens-5 — the same class of "empty AllLabels" language fixed in §7 by this commit, but not propagated to the test descriptions. All are minor and can be folded into the PR-B doc commit. Minor — Three stale "empty AllLabels" descriptions in §8 lens-5The round-24 commit fixed four §7 references but the same stale framing survives in three §8 lens-5 test descriptions: 1. Line 659: "Slot pre-creation count (PR-C, post-
2. Line 664–665: "PR-B counterpart (with empty Same stale framing. The PR-B counterpart assertion is 3. Line 673: "PR-B counterpart (empty Same stale framing — All three are the same fix: replace "empty Also minor — §7 PR-C+D+E opening sentence describes "AllLabels population" as PR-C scopeLine 615, first sentence of the PR-C+D+E row:
What looks goodAll round-24 items and all prior-round items verified correct against current source:
The four minor items above can all be addressed in the PR-B doc commit. Design is ready to merge as PR-A. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 09c1c04ed5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| |---|---| | ||
| | **PR-A** | Land this design doc. | | ||
| | **PR-B** | Sampler API extension: extend the `Sampler` interface declaration (`keyviz/sampler.go:74`) AND the `MemSampler.Observe` implementation to `Observe(routeID uint64, op Op, keyLen, valueLen int, label Label)` (Claude bot round-8 minor: surfaces immediately as a compile error if any mock implementer is missed, but naming it keeps the scope complete). `routeSlot.Label`, slot-key type widens to `slotKey{uint64, Label}`. **`RegisterRoute` only pre-creates the legacy empty-label slot when `--keyvizLabelsEnabled=false` (the default)** — labeled-sibling pre-creation is gated on the flag (round-17), so PR-B with the default flag is a behavior-neutral refactor (per-route slot count is unchanged at 1). **PR-B creates `keyviz/labels.go`** with the `slotKey` struct, the `type Label string` definition, the `LabelLegacy`/`LabelDynamo`/`LabelRedis`/etc. **typed constants**, and `AllLabels` as a **fixed-size array** `[5]Label{LabelDynamo, ..., LabelRawKV}` (round-23 — array form prevents importing-package mutation; see §9 Q2). The constants must exist at compile time so `RemoveRoute`'s `allLabelsWithLegacy()` teardown loop and `RegisterRoute`'s pre-creation loop both compile; with the flag default to false, `RegisterRoute` skips the labeled-sibling loop body so per-route slot count stays at 1 (Claude bot round-6 + round-23 array correction). `MatrixRow.Label` field present. **L0 label copy** (Claude bot round-8 moderate): update `appendDrainedRow` (`keyviz/sampler.go:828`) to include `Label: slot.Label` in the `MatrixRow` struct literal — `slot.Label` is set once at `RegisterRoute` time and immutable, so read it directly rather than threading through `snapshotMeta`. This is the first link in the propagation chain: `slot.Label` → `MatrixRow.Label` (L0, here) → `KeyVizRow.Label` via `newKeyVizRowFrom` (L1, PR-D+E) → merged `KeyVizRow.Label` via `mergeRowInto` (L2, PR-D+E). Missing L0 silently truncates the chain and leaves `MatrixRow.Label = ""` even after PR-C+D+E wires the adapters to pass labels; L1 and L2 then faithfully copy the empty string and the feature collapses. With PR-B's flag-default-false (only legacy slots pre-created), `slot.Label` is `""` so the emitted row is `""`-labeled (identical to today's behavior), but the wiring is in place for PR-C+D+E to flip the flag and start emitting labels. `reclaimRetiredSlot` dedupes on `(RouteID, Label)` (forward-prep; only `Label=""` exists). The `MaxTrackedRoutes` coarsening check (`keyviz/sampler.go:416`) divides by `slotsPerRoute` (= `len(AllLabels)+1` when the flag is on, `1` when off; round-17 + round-23) so the cap continues to count **routes**, not slots; with the flag default to false in PR-B the divisor is 1 and behavior is identical to today (Claude round-24 fix to the prior "AllLabels empty in PR-B" framing — round-23 made AllLabels populated from PR-B, so the divisor=1 reason is the flag, not the array length). Update existing tests **and the coordinator call sites in `kv/sharded_coordinator.go`** (the only non-test caller of `Sampler.Observe`) to pass `label = ""`. | | ||
| | **PR-C+D+E** | **Bundled — must ship together.** PR-C (adapter wiring + `AllLabels` population) and PR-D+E (wire format + pivot-key widening) are not separately shippable: once an adapter calls `Observe(..., LabelDynamo)`, the sampler emits multiple `MatrixRow` per route in a single column (one per label); without the pivot-key widening from PR-D+E, `pivotKeyVizColumns` and `matrixToProto` collapse those rows back into a single `RouteID`-keyed entry where each labeled row overwrites the previous one — non-deterministic data loss in the intermediate state. Earlier drafts framed PR-C and PR-D+E as separately shippable; that was wrong (Codex round-10 P1). The bundle covers: <br>**Adapter wiring (was PR-C)**: each adapter sets its label at the `ShardedCoordinator.Observe…` dispatch entry. The five canonical constants (`LabelDynamo`, `LabelRedis`, `LabelS3`, `LabelSQS`, `LabelRawKV`) are already declared in `keyviz/labels.go` from PR-B as part of the `[5]Label` array (round-23 — see §9 Q2 and the PR-B row above); PR-C+D+E only wires the adapters to pass them, no `AllLabels` modification needed (Claude round-24 fix to the prior "populate empty slice" framing). Extend `RegisterRoute` to pre-create the labeled siblings (one slot per `AllLabels` member) **only when `--keyvizLabelsEnabled=true`**; with the flag off, `RegisterRoute` continues to pre-create just the legacy empty-label slot, identical to today's behavior (Codex round-15-4th-pass P1). **MaxTrackedRoutes is unchanged in semantics** — the §4.1.1 coarsening check divides slot count by `slotsPerRoute` (`len(AllLabels)+1` when the flag is on, `1` when off), so the existing operator-set cap continues to mean "individually-tracked routes" exactly as before; **operators do not need to raise `--keyvizMaxTrackedRoutes`** (an earlier draft told operators to scale the cap by `× (len(labels)+1)` — that contradicted Option A; Codex round-6). Memory growth: `MaxTrackedRoutes × (len(AllLabels) + 1)` slots when labels are enabled, `MaxTrackedRoutes × 1` when off (today's baseline); documented but not capped. Emit `slog.Warn` from inside `RegisterRoute` when a route coarsens. <br>**Wire-format extension (was PR-D+E)**: proto + JSON `bucket_id` composite + optional `label` field, plus the SPA `route:N / label` rendering AND the **five** code changes across the three response paths: <br>**Single-node JSON path (`internal/admin/keyviz_handler.go`)**: <br>(a) `pivotKeyVizColumns` `rowsByID` map AND `order` slice both widen from `uint64` to the composite `BucketID string` — widening only the map without the `order` slice is a compile error; <br>(b) `newKeyVizRowFrom` (`keyviz_handler.go:368`) copies `mr.Label → row.Label` via an explicit `string(mr.Label)` cast (`MatrixRow.Label` is the typed `keyviz.Label`; `KeyVizRow.Label` is plain `string` for wire-format flexibility — Claude bot round-10 minor) — **first-level** Label copy, affects single-node and cluster deployments alike; <br>(c) `bucketIDFor` (`keyviz_handler.go:383`) returns the composite `"route:<id>:<label>"` when `mr.Label != ""`, falling back to the legacy `"route:<id>"` for empty labels — without this `BucketID` is non-unique and `applyKeyVizRowBudget` / `sortKeyVizRowsByStart` lose their deterministic tiebreak; <br>**Fan-out JSON path (`internal/admin/keyviz_fanout.go`)**: <br>(d) `mergeRowInto` (`keyviz_fanout.go:509`) adds `dst.Label = row.Label` — **second-level** Label copy, only the cluster fan-out path touches this; <br>**gRPC path (`adapter/admin_grpc.go`)**: <br>(e) `matrixToProto` (`admin_grpc.go:599`) and the per-row conversion it drives: (e1) **widen `rowsByID` (line 603) and `order` (line 604) from `uint64` to the composite `BucketID string` key** — same widening as item (a); without it `(routeID=1, label="dynamo")` and `(routeID=1, label="redis")` collapse to the same map entry; (e2) copy `MatrixRow.Label → KeyVizRow.label` (proto field 4) via `string(mr.Label)` cast (same typed→untyped reasoning as item (b)); (e3) emit composite `bucket_id` (`"route:<id>:<label>"`). Without (e1)–(e3), `GetKeyVizMatrix` gRPC clients receive collapsed unlabeled rows even though HTTP/SPA responses now show per-label rows. <br>All five copies are required; missing any one leaves a flavour of deployment with empty labels. Splitting the bundle into separate PR-C and PR-D+E was the original framing but is now rejected (see opening paragraph of this row). <br>**Operator-controlled rollout gate (rolling-upgrade safety)**: a normal rolling upgrade temporarily mixes nodes that emit legacy `route:<id>` rows with nodes that emit labeled `route:<id>:<label>` rows. The fan-out aggregator keys strictly by `BucketID` in `mergeRowInto`, so those rows do **not** merge — operators would see fragmented unlabeled-plus-labeled data per route until every node converges. PR-C+D+E adds `--keyvizLabelsEnabled` (default `false`) on every node; when false, the **`ShardedCoordinator` overrides the adapter-supplied label to `keyviz.LabelLegacy` at the single `sampler.Observe(...)` call site** in `kv/sharded_coordinator.go` (one `if !s.keyvizLabelsEnabled { label = keyviz.LabelLegacy }` guard, not a 5-file flag-read duplication across adapters; Claude bot round-12 moderate). The bundled binary is therefore safe to roll out one node at a time — every node, mixed or fully upgraded, emits the legacy format. <br>**Flag-flip activation**: once the fleet is fully on the new binary, the operator flips `--keyvizLabelsEnabled=true`. The flag is a startup-only `flag.Bool` (no live-toggle / config-reload path; Claude bot round-12 minor); changing it requires a process restart. There are two restart strategies: (a) **simultaneous restart of all nodes** — KeyViz heatmap is briefly unavailable during the restart window but no mixed-format fragmentation occurs; (b) **rolling restart** — for the duration of the restart (typically minutes), the heatmap shows a transient mixed view because the legacy `route:N` rows from not-yet-restarted nodes don't merge with `route:N:label` rows from restarted nodes. Since KeyViz is a monitoring view (not a consistency-sensitive system), the rolling-restart fragmentation is acceptable and clears as the final node restarts; operators who want zero fragmentation should use the simultaneous restart. (Claude bot round-12 moderate.) <br>**Flag also gates pre-allocation**: `--keyvizLabelsEnabled` is **both** the traffic-routing toggle (override label to `LabelLegacy` at the coordinator) **and** the slot-pre-creation toggle. When `false`, `RegisterRoute` pre-creates only the legacy empty-label slot (`len(next.slots) += 1` per route, identical to today's behavior); labeled siblings are not allocated. When `true`, `RegisterRoute` pre-creates the legacy slot **and** one labeled sibling per `AllLabels` member (`len(next.slots) += len(AllLabels)+1` per route). Memory therefore stays at the today's level (`MaxTrackedRoutes × 1`) for clusters that deploy the bundled binary but leave the flag off, and grows to `MaxTrackedRoutes × (len(AllLabels)+1)` only when the operator opts in. The flag-flip activation requires a process restart (see preceding paragraph), and `RegisterRoute` re-runs at startup, so the new pre-creation regime is in effect immediately after the restart — there is no live re-allocation path needed. **An earlier draft separated memory and traffic-routing into two regimes ("memory-vs-flag separation", round-12) where labeled slots were pre-created regardless of the flag; that was wrong (Codex round-15-4th-pass P1) — it imposed a multi-x memory penalty on clusters that never enable labels, contradicting the §2.1 minimal-penalty goal.** (Codex round-11 P2 originated this rollout gate.) | |
There was a problem hiding this comment.
Define
keyvizLabelsEnabled phase consistently
The rollout table is internally inconsistent about when --keyvizLabelsEnabled exists: the PR-B row makes behavior-neutrality depend on this flag being default-false and gating pre-allocation, but the PR-C+D+E row says that same flag is added later. If implementers follow the PR-C+D+E wording literally, PR-B cannot enforce the gating it claims, so a standalone PR-B rollout can drift from the documented “legacy-equivalent” behavior (route-cap semantics and slot allocation). Please pin the flag introduction to a single phase and make all PR scopes reference that same phase.
Useful? React with 👍 / 👎.
Round-25 (Claude bot R25 finished review + Codex P2): Claude bot R25 verdict on `09c1c04e` was "design is ready to merge as PR-A" with 4 minor stale-language items. All 4 addressed: - §8 lens-5 Slot pre-creation count test: "(PR-C, post- AllLabels population)" + "PR-B counterpart (with empty AllLabels)" → "(PR-C+D+E, with --keyvizLabelsEnabled= true)" + "PR-B counterpart (with --keyvizLabelsEnabled= false, the default)". Added explicit note that PR-B vs PR-C+D+E counterparts differ on the flag, not on AllLabels length (the [5]Label array is populated from PR-B per round-23). - §8 lens-5 RemoveRoute symmetric teardown test: same fix — "(PR-C)" + "(empty AllLabels)" → "(PR-C+D+E, flag on)" + "(with the flag off)". Counterpart assertion explained as "only the legacy slot was pre-created (flag off), so only it is retired". - §7 PR-C+D+E opening sentence: "PR-C (adapter wiring + AllLabels population)" → "PR-C (adapter wiring only — AllLabels constants are already declared in PR-B as part of the [5]Label array)". Codex P2: PR-B row's behavior-neutrality claim depends on --keyvizLabelsEnabled being default-false, but PR-C+D+E row said the flag is "added in PR-C+D+E". Implementer following PR-C+D+E literally would conclude PR-B cannot enforce its own gating. Fixed by: - §7 PR-B row prepended: "PR-B introduces the --keyvizLabelsEnabled startup flag (default false) on the server binary, so the flag exists from PR-B onward and PR-B's behavior-neutrality claim does not depend on PR-C+D+E (Codex round-25 P2)". - §7 PR-C+D+E rollout-gate paragraph: "PR-C+D+E adds --keyvizLabelsEnabled" → "--keyvizLabelsEnabled is introduced in PR-B... PR-C+D+E adds the coordinator override path". Clarifies PR-B introduces the flag for pre-creation gating; PR-C+D+E adds the coordinator-side override path on top, which only matters once adapters pass non-empty labels.
|
@claude review Round-25 follow-ups ( Claude bot R25 (4 minor stale-language items):
Codex P2 (consistency bug): PR-B row's behavior-neutrality claim depends on
|
|
Claude finished @bootjp's task in 2m 19s —— View job Review (Round 26)
All five Round-25 items confirmed resolved in current HEAD:
Also verified §7 closing paragraph (lines 617–629): "PR-B widens types and prepares the slot machinery without changing observable behavior (legacy slots only, No new issues found. Final stateAll issues from rounds 1–26 are resolved. The complete design has been verified against source:
Design is ready to merge as PR-A. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 788b685832
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| | PR | Scope | | ||
| |---|---| | ||
| | **PR-A** | Land this design doc. | | ||
| | **PR-B** | Sampler API extension: extend the `Sampler` interface declaration (`keyviz/sampler.go:74`) AND the `MemSampler.Observe` implementation to `Observe(routeID uint64, op Op, keyLen, valueLen int, label Label)` (Claude bot round-8 minor: surfaces immediately as a compile error if any mock implementer is missed, but naming it keeps the scope complete). `routeSlot.Label`, slot-key type widens to `slotKey{uint64, Label}`. **PR-B introduces the `--keyvizLabelsEnabled` startup flag** (default `false`) on the server binary, so the flag exists from PR-B onward and PR-B's behavior-neutrality claim does not depend on PR-C+D+E (Codex round-25 P2). **`RegisterRoute` only pre-creates the legacy empty-label slot when `--keyvizLabelsEnabled=false` (the default)** — labeled-sibling pre-creation is gated on the flag (round-17), so PR-B with the default flag is a behavior-neutral refactor (per-route slot count is unchanged at 1). **PR-B creates `keyviz/labels.go`** with the `slotKey` struct, the `type Label string` definition, the `LabelLegacy`/`LabelDynamo`/`LabelRedis`/etc. **typed constants**, and `AllLabels` as a **fixed-size array** `[5]Label{LabelDynamo, ..., LabelRawKV}` (round-23 — array form prevents importing-package mutation; see §9 Q2). The constants must exist at compile time so `RemoveRoute`'s `allLabelsWithLegacy()` teardown loop and `RegisterRoute`'s pre-creation loop both compile; with the flag default to false, `RegisterRoute` skips the labeled-sibling loop body so per-route slot count stays at 1 (Claude bot round-6 + round-23 array correction). `MatrixRow.Label` field present. **L0 label copy** (Claude bot round-8 moderate): update `appendDrainedRow` (`keyviz/sampler.go:828`) to include `Label: slot.Label` in the `MatrixRow` struct literal — `slot.Label` is set once at `RegisterRoute` time and immutable, so read it directly rather than threading through `snapshotMeta`. This is the first link in the propagation chain: `slot.Label` → `MatrixRow.Label` (L0, here) → `KeyVizRow.Label` via `newKeyVizRowFrom` (L1, PR-D+E) → merged `KeyVizRow.Label` via `mergeRowInto` (L2, PR-D+E). Missing L0 silently truncates the chain and leaves `MatrixRow.Label = ""` even after PR-C+D+E wires the adapters to pass labels; L1 and L2 then faithfully copy the empty string and the feature collapses. With PR-B's flag-default-false (only legacy slots pre-created), `slot.Label` is `""` so the emitted row is `""`-labeled (identical to today's behavior), but the wiring is in place for PR-C+D+E to flip the flag and start emitting labels. `reclaimRetiredSlot` dedupes on `(RouteID, Label)` (forward-prep; only `Label=""` exists). The `MaxTrackedRoutes` coarsening check (`keyviz/sampler.go:416`) divides by `slotsPerRoute` (= `len(AllLabels)+1` when the flag is on, `1` when off; round-17 + round-23) so the cap continues to count **routes**, not slots; with the flag default to false in PR-B the divisor is 1 and behavior is identical to today (Claude round-24 fix to the prior "AllLabels empty in PR-B" framing — round-23 made AllLabels populated from PR-B, so the divisor=1 reason is the flag, not the array length). Update existing tests **and the coordinator call sites in `kv/sharded_coordinator.go`** (the only non-test caller of `Sampler.Observe`) to pass `label = ""`. | |
There was a problem hiding this comment.
Keep PR-B from conditionally pre-creating labeled slots
This PR-B scope line says labeled-sibling pre-creation is "gated on the flag," which contradicts §4.1.1's earlier PR-B contract that only the legacy slot exists and labeled siblings start in PR-C. If implementers follow this PR-B wording literally, they can add PR-B code that allocates len(AllLabels)+1 slots whenever --keyvizLabelsEnabled=true, reintroducing the early memory/route-budget behavior this doc is trying to defer to PR-C. Please make PR-B unambiguously legacy-slot-only regardless of flag, or explicitly state that enabling the flag in PR-B is unsupported.
Useful? React with 👍 / 👎.
Round-26 Codex P2 on `788b6858`: The round-25 PR-B row said "RegisterRoute only pre-creates the legacy empty-label slot when --keyvizLabelsEnabled=false (the default) — labeled-sibling pre-creation is gated on the flag (round-17)". This implied that an operator setting --keyvizLabelsEnabled=true in PR-B would cause RegisterRoute to pre-create len(AllLabels)+1 slots, reintroducing the early memory/route-budget regression PR-C+D+E tries to defer. Per Codex round-26 P2, PR-B should be unambiguously legacy-slot-only regardless of the flag. The flag-honoring RegisterRoute code lands in PR-C+D+E, not PR-B. Updates: - §7 PR-B row: clarified that PR-B introduces the flag for CLI parsing + Opts plumbing only; RegisterRoute in PR-B unconditionally pre-creates only the legacy slot regardless of the flag value. The flag-honoring loop lands in PR-C+D+E. Per-route slot count is unconditionally 1 in PR-B. - §7 closing paragraph: same clarification — PR-B is legacy-slots-only because the flag-honoring pre-creation loop is in PR-C+D+E, not because the flag is default-false. - §4.1.1 PR-B/PR-C+D+E body pseudocode (sampler.go:416 slotsPerRoute calculation): split into two phases — PR-B body stops at `slotsPerRoute := 1` (literal, no flag check), PR-C+D+E extends to the flag-gated conditional branch. Comment updated accordingly. The flag still exists from PR-B (per round-25 P2 — PR-B needs it on Opts for downstream wiring), but PR-B treats it as dormant for slot pre-creation. PR-C+D+E activates the flag's pre-creation effect in tandem with the adapter-wiring + wire-format changes.
|
@claude review Round-26 follow-up (
Round-25 R25 verdict from Claude bot was "design ready to merge as PR-A — no new issues found", but Codex P2 caught this consistency issue post-R25. Round-26 is the cleanup; the R25 "ready" verdict still holds modulo this clarification. |
|
Claude finished @bootjp's task in 3m 43s —— View job Review (Round 27)
All three Round-26 changes confirmed correctly applied in current HEAD:
Two issues surfaced. Both involve stale §9 Q2 and §4.1.1 references that were not propagated when round-26 locked in the "unconditionally legacy-slot-only in PR-B" semantics. Moderate — §9 Q2 "PR sequencing note" directly contradicts the round-26 §7 PR-B fixLines 835–836:
Round-26's stated fix was precisely the opposite: PR-B's The §9 note says behavior-neutrality comes from the flag default; §7 says it comes from the unconditional code structure (the flag-honoring loop simply isn't in PR-B at all). An implementor reading §9 would think: "If I set The note needs to match §7: -but the round-17 flag-gating means PR-B's `RegisterRoute` only
-pre-creates labeled siblings when `--keyvizLabelsEnabled=true`,
-which defaults to false. So PR-B remains behavior-neutral via
-the flag, not via an empty slice.
+With the round-26 unconditional model, PR-B's `RegisterRoute`
+always pre-creates only the legacy slot regardless of the flag
+— the flag-honoring pre-creation loop is simply not present in
+PR-B (it lands in PR-C+D+E). So PR-B remains behavior-neutral
+unconditionally, not via a flag default or an empty slice.Minor — Dual
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e7b0c132ae
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| onward — but the round-17 flag-gating means PR-B's | ||
| `RegisterRoute` only pre-creates labeled siblings when | ||
| `--keyvizLabelsEnabled=true`, which defaults to false. So |
There was a problem hiding this comment.
Remove PR-B claim that labeled siblings can be pre-created
Fresh evidence in this revision: the §9 PR-sequencing note says PR-B RegisterRoute pre-creates labeled siblings when --keyvizLabelsEnabled=true, which contradicts §7’s PR-B contract that PR-B is legacy-slot-only and defers flag-honoring pre-creation to PR-C+D+E. Leaving this contradiction can cause implementers to reintroduce PR-B label-slot allocation, bringing back the early memory growth and route-cap/coarsening drift the rollout is explicitly trying to avoid.
Useful? React with 👍 / 👎.
| | PR | Scope | | ||
| |---|---| | ||
| | **PR-A** | Land this design doc. | | ||
| | **PR-B** | Sampler API extension: extend the `Sampler` interface declaration (`keyviz/sampler.go:74`) AND the `MemSampler.Observe` implementation to `Observe(routeID uint64, op Op, keyLen, valueLen int, label Label)` (Claude bot round-8 minor: surfaces immediately as a compile error if any mock implementer is missed, but naming it keeps the scope complete). `routeSlot.Label`, slot-key type widens to `slotKey{uint64, Label}`. **PR-B introduces the `--keyvizLabelsEnabled` startup flag** (default `false`) on the server binary — CLI parsing and the `KeyVizLabelsEnabled bool` option on `ShardedCoordinatorOpts` / `MemSamplerOpts` ship in PR-B, so the flag exists from PR-B onward and PR-B's behavior-neutrality claim does not depend on PR-C+D+E (Codex round-25 P2). **`RegisterRoute` in PR-B always pre-creates only the legacy empty-label slot, regardless of the flag** (Codex round-26 P2 — earlier round-25 wording said pre-creation is "gated on the flag" in PR-B, which would have allowed an operator to enable the flag in PR-B and start allocating `len(AllLabels)+1` slots before any of the PR-C+D+E adapter-wiring or wire-format machinery existed). The `RegisterRoute` flag-honoring code lands in **PR-C+D+E**, not PR-B; in PR-B the flag is **dormant** for slot pre-creation (it exists for option plumbing only). Per-route slot count in PR-B is therefore unconditionally 1, regardless of the flag, and PR-B is fully behavior-neutral. **PR-B creates `keyviz/labels.go`** with the `slotKey` struct, the `type Label string` definition, the `LabelLegacy`/`LabelDynamo`/`LabelRedis`/etc. **typed constants**, and `AllLabels` as a **fixed-size array** `[5]Label{LabelDynamo, ..., LabelRawKV}` (round-23 — array form prevents importing-package mutation; see §9 Q2). The constants must exist at compile time so `RemoveRoute`'s `allLabelsWithLegacy()` teardown loop compiles in PR-B; `RegisterRoute`'s flag-honoring pre-creation loop is **not yet present in PR-B** (it lands in PR-C+D+E per the previous paragraph), so PR-B's `RegisterRoute` body is a single legacy-slot create call regardless of the flag. With PR-B unconditionally pre-creating only the legacy slot, per-route slot count stays at 1 (Claude bot round-6 + round-23 array correction + Codex round-26 P2 unconditionality). `MatrixRow.Label` field present. **L0 label copy** (Claude bot round-8 moderate): update `appendDrainedRow` (`keyviz/sampler.go:828`) to include `Label: slot.Label` in the `MatrixRow` struct literal — `slot.Label` is set once at `RegisterRoute` time and immutable, so read it directly rather than threading through `snapshotMeta`. This is the first link in the propagation chain: `slot.Label` → `MatrixRow.Label` (L0, here) → `KeyVizRow.Label` via `newKeyVizRowFrom` (L1, PR-D+E) → merged `KeyVizRow.Label` via `mergeRowInto` (L2, PR-D+E). Missing L0 silently truncates the chain and leaves `MatrixRow.Label = ""` even after PR-C+D+E wires the adapters to pass labels; L1 and L2 then faithfully copy the empty string and the feature collapses. With PR-B's flag-default-false (only legacy slots pre-created), `slot.Label` is `""` so the emitted row is `""`-labeled (identical to today's behavior), but the wiring is in place for PR-C+D+E to flip the flag and start emitting labels. `reclaimRetiredSlot` dedupes on `(RouteID, Label)` (forward-prep; only `Label=""` exists). The `MaxTrackedRoutes` coarsening check (`keyviz/sampler.go:416`) divides by `slotsPerRoute` (= `len(AllLabels)+1` when the flag is on, `1` when off; round-17 + round-23) so the cap continues to count **routes**, not slots; with the flag default to false in PR-B the divisor is 1 and behavior is identical to today (Claude round-24 fix to the prior "AllLabels empty in PR-B" framing — round-23 made AllLabels populated from PR-B, so the divisor=1 reason is the flag, not the array length). Update existing tests **and the coordinator call sites in `kv/sharded_coordinator.go`** (the only non-test caller of `Sampler.Observe`) to pass `label = ""`. | |
There was a problem hiding this comment.
Keep PR-B divisor independent of keyvizLabelsEnabled
The PR-B scope row states RegisterRoute is unconditionally legacy-slot-only in PR-B, but later in the same row it says the coarsening divisor is len(AllLabels)+1 when the flag is on. If implemented literally, enabling the flag on PR-B would divide slot count by 6 while still storing one slot per route, effectively expanding tracked-route capacity far beyond MaxTrackedRoutes and violating the documented behavior-neutral semantics for PR-B.
Useful? React with 👍 / 👎.
Claude R27 verdict on `e7b0c132`: "design is ready to merge
as PR-A after these 2 fixes". Both items addressed:
- §9 Q2 PR sequencing note (lines 835-840) directly
contradicted the round-26 §7 PR-B fix. Said "PR-B's
RegisterRoute only pre-creates labeled siblings when
--keyvizLabelsEnabled=true" — opposite of round-26's
"unconditionally legacy-slot-only regardless of the
flag". Updated to match: "With the round-23 array form,
AllLabels is non-empty from PR-B onward, AND with the
round-26 unconditional model, PR-B's RegisterRoute
always pre-creates only the legacy slot regardless of
the flag — the flag-honoring loop is simply not present
in PR-B (lands in PR-C+D+E). PR-B remains
behavior-neutral unconditionally, not via flag default
or empty slice." The flag is introduced in PR-B for
Opts plumbing only; its read-site doesn't exist in PR-B
so its value has no observable effect.
- §4.1.1 dual `slotsPerRoute :=` in single Go code block
(lines 285-291) is a Go compile error — two `:=`
declarations in the same scope ("no new variables on
left side of :="). Split into two separate code blocks:
PR-B body (unconditional `slotsPerRoute := 1`) and
PR-C+D+E body (replaces the PR-B body with the flag-gated
conditional). Added explicit narrative: "PR-C+D+E
replaces the literal 1 with the flag-gated conditional".
|
@claude review Round-27 follow-ups (
|
|
Claude finished @bootjp's task in 5m 22s —— View job Review (Round 28)
Both Round-27 items confirmed resolved in current HEAD:
Two minor naming inconsistencies remain in §4.1.1. Both can be folded into the PR-B doc commit. Minor — §4.1.1 section heading and prose use "PR-C" instead of "PR-C+D+E"§7 renamed the bundle to "PR-C+D+E (Bundled)" to emphasise that adapter wiring, wire-format changes, and pivot-key widening must ship together. Three locations in §4.1.1 still say "PR-C":
An implementor following §4.1.1 in isolation might believe the Minor — RegisterRoute pseudocode mixes PR-B and PR-C in a single fenceLines 202–224 show the PR-B body and the "PR-C extends to" block in a single Go code fence: // PR-B body:
slot := s.reclaimRetiredSlot(routeID, "")
...
next.slots[slotKey{RouteID: routeID, Label: ""}] = slot
// PR-C extends to (flag-gated, Codex round-15-4th-pass P1):
labelsToCreate := []Label{LabelLegacy}
if s.opts.KeyVizLabelsEnabled {
labelsToCreate = allLabelsWithLegacy()
}
for _, label := range labelsToCreate { ... }Unlike the The What looks goodAll Round-27 changes verified against source:
The two minor items can be addressed in the PR-B doc commit. Design is ready to merge as PR-A. |
Codex round-27 P1 #2 on `e7b0c132` (still applicable to `44d7cd45`): PR-B row claimed RegisterRoute is unconditionally legacy-slot-only (round-26 fix), but later in the same row said "MaxTrackedRoutes coarsening divides by slotsPerRoute (= len(AllLabels)+1 when the flag is on, 1 when off)". Implementer setting --keyvizLabelsEnabled=true on PR-B binary would compute divisor=6 with only 1 slot per route, expanding tracked-route capacity 6× beyond MaxTrackedRoutes. Fixed by aligning the §7 PR-B row's divisor description with round-26's unconditional model: - "MaxTrackedRoutes coarsening check in PR-B uses slotsPerRoute as the literal 1 — same unconditionality reason as the RegisterRoute body above; the flag-gated conditional lands in PR-C+D+E, not PR-B" - "PR-B with slotsPerRoute=1 keeps the cap counting routes (1 slot per route × N routes ≤ MaxTrackedRoutes), behavior identical to today regardless of any flag value" Also tightened the secondary "flag-default-false" reference in the L0 chain failure-mode discussion: the legacy-only behavior comes from the unconditional code structure, not from the flag default. Updated to "With PR-B unconditionally pre-creating only the legacy slot (regardless of flag, per round-26 P2)". Codex round-27 P1 #1 (PR-B labeled-siblings claim in §9 Q2) was already addressed in round-27 commit 44d7cd4.
|
@claude review Round-28 follow-up (
Codex P1 #1 (the §9 Q2 PR sequencing note) was already addressed in round-27 commit |
|
Claude finished @bootjp's task in 3m 8s —— View job Review (Round 28)
Both Round-27 items confirmed resolved in current HEAD:
Both changes are consistent with the round-26/27 unconditional-model fixes:
Also verified against current Final stateNo new issues found. All items from rounds 1–28 are resolved. The design is ready to merge as PR-A. Complete source verification:
|
Summary
Proposes a follow-up KeyViz feature: per-adapter labels so a single Raft route can split into per-adapter sub-rows in the heatmap.
Trigger: a user observed that on a default single-Raft-group deployment the heatmap shows one row at
route:1covering all five adapters (DynamoDB / Redis / S3 / SQS / RawKV) with no way to tell which adapter generated the traffic. Splitting routes viaSplitRangeor the multi-group startup flags (--raftRedisMap,--raftDynamoMap, etc.) works as an operational workaround, but the most common deployment shape (single group) is currently opaque on this axis.Recommended approach (§4.1)
Extend
Observe(routeID, op, keyLen, valueLen, label)with a per-call label string. Each adapter sets a constant at its dispatch entry intoShardedCoordinator; empty label preserves today's behaviour. Slot key widens fromrouteIDto(routeID, label)— one extra map lookup per Observe.Forward-compatible wire-format extension: optional
labelfield onKeyVizRow/MatrixRow; old SPA against new server ignores it, new SPA against old server falls back to legacy formatting.Implementation plan (§7)
MatrixRow.Labelkeyviz/labels.golabelfield; SPA rendersroute:N / <label>labelWhat this PR does NOT do
Per
CLAUDE.mddesign-doc-first: this PR is the proposal only. PR-B onward starts after this doc is reviewed and accepted. Listed deferrals in §10:Test plan
Summary by CodeRabbit