docs: propose KeyViz adapter / namespace labels (follow-up) by bootjp · Pull Request #694 · bootjp/elastickv

bootjp · 2026-04-27T16:18:21Z

Summary

Proposes a follow-up KeyViz feature: per-adapter labels so a single Raft route can split into per-adapter sub-rows in the heatmap.

Trigger: a user observed that on a default single-Raft-group deployment the heatmap shows one row at route:1 covering all five adapters (DynamoDB / Redis / S3 / SQS / RawKV) with no way to tell which adapter generated the traffic. Splitting routes via SplitRange or the multi-group startup flags (--raftRedisMap, --raftDynamoMap, etc.) works as an operational workaround, but the most common deployment shape (single group) is currently opaque on this axis.

Recommended approach (§4.1)

Extend Observe(routeID, op, keyLen, valueLen, label) with a per-call label string. Each adapter sets a constant at its dispatch entry into ShardedCoordinator; empty label preserves today's behaviour. Slot key widens from routeID to (routeID, label) — one extra map lookup per Observe.

Forward-compatible wire-format extension: optional label field on KeyVizRow / MatrixRow; old SPA against new server ignores it, new SPA against old server falls back to legacy formatting.

Implementation plan (§7)

PR	Scope
A	This doc
B	Sampler API extension + `MatrixRow.Label`
C	Adapter wiring + canonical label constants in `keyviz/labels.go`
D	proto + JSON `label` field; SPA renders `route:N / <label>`
E	Aggregator merge key gains `label`

What this PR does NOT do

Per CLAUDE.md design-doc-first: this PR is the proposal only. PR-B onward starts after this doc is reviewed and accepted. Listed deferrals in §10:

Per-table / per-bucket / per-queue / per-Redis-DB sub-labels (only the adapter family is in scope here)
Operator-configurable label taxonomy
Persistence of labelled rows (rides along once Phase 3 persistence ships)
Adapter-aware split triggers

Test plan

Design review: confirm Option 4.1 (per-Observe label) is the right pick vs. 4.2 (per-adapter sampler), 4.3 (operator taxonomy), 4.4 (catalog-level)
Confirm forward-compatibility expectations (old SPA + new server, new SPA + old server) match operator preferences
§9 open questions worth a look — especially Q1 (single segment vs. hierarchical) and Q2 (canonical label allocation)

Summary by CodeRabbit

Documentation
- Added a KeyViz design proposal introducing an adapter-level label axis for splitting heatmap rows within a route. Defines wire-format label handling and composite bucket IDs, merge/aggregation behavior to preserve labels across distributed transforms, rollout-safe toggle, label syntax constraints, canonical label constants, and a targeted test checklist for pre-allocation, reclaim, merge propagation, and virtual-bucket behavior.

A user observed that on a default single-Raft-group deployment the KeyViz heatmap shows one row covering all five adapters (DynamoDB / Redis / S3 / SQS / RawKV) — the smallest unit of attribution today is a Raft route, and one route covers [empty, empty) by default. Splitting routes via SplitRange or multi-group startup flags works as an operational workaround but the most common shape (single-group) is opaque. This proposal adds an independent label dimension to the sampler so a single route can split into per-adapter sub-rows in the heatmap, with zero hot-path penalty when labels are not configured. Recommended approach (4.1): per-Observe label string; slot key widens from routeID to (routeID, label). One extra map lookup per Observe; legacy callers pass empty label and behaviour is unchanged. Forward-compatible wire-format extension: optional `label` on KeyVizRow / MatrixRow; old SPAs ignore it, new SPAs render "route:N / <label>" when present. Five PRs of follow-up work outlined in 7. Implementation does not start until this doc is reviewed and accepted.

coderabbitai · 2026-04-27T16:18:28Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Introduces a new KeyViz design document proposing an independent adapter/namespace label dimension for heatmap rows, changes to sampler/slot key semantics to include typed labels, wire-format and bucket_id updates, coordinator startup flag for rollouts, and detailed implementation and testing guidance.

Changes

Cohort / File(s)	Summary
Design doc `docs/design/2026_04_28_proposed_keyviz_adapter_labels.md`	Adds a new KeyViz design proposal (~785 lines) defining: typed `Label` in sampler interface, slot keys widened to `(routeID, label)`, labeled slot lifecycle and `RemoveRoute` behavior, `--keyvizLabelsEnabled` startup flag, wire-format `label` and composite `bucket_id`, cluster merge/copy rules to preserve labels, canonical label constants and constraints, and a detailed implementation checklist and test targets.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

feat(keyviz): in-memory Sampler + ring buffer + RunFlusher #639 — Implements the in-memory Sampler whose interface/signature the design proposes to extend with a typed Label (requires aligning sampler implementation).

"I hopped through docs with a tiny scroll,
New labels bloom to brighten the roll.
Rows wear names, buckets hum along,
Coordinator sings the rollout song.
— 🐇"

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'docs: propose KeyViz adapter / namespace labels (follow-up)' accurately and specifically describes the main change—a design document proposing new labeling functionality for KeyViz.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch docs/keyviz-adapter-labels

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

bootjp · 2026-04-27T16:19:55Z

@claude review

gemini-code-assist

Code Review

This pull request introduces a design proposal for adding adapter-specific labels to the KeyViz heatmap, allowing for more granular traffic attribution within a single Raft route. The proposal recommends extending the sampler's Observe API and the wire format to include a label string while maintaining backward compatibility for existing clients. Feedback on the design highlights a potential issue where bucket_id uniqueness is broken, which could impact existing RPCs and operational stability. Additionally, there is a request for clarification on how labeled slots will be populated in the routeTable without introducing locking overhead on the performance-critical Observe path.

gemini-code-assist · 2026-04-27T16:20:06Z

+`bucket_id` formatting in the wire shape stays as-is; the label
+is a separate field. This keeps the `route_ids` / `aggregate`
+semantics from §5 of the parent design unchanged.


The design suggests that bucket_id formatting stays as-is (e.g., route:1) and the label is a separate field. This creates a problem for the GetRouteDetail(bucket_id) RPC and the SPA's row identification, as bucket_id is no longer unique per heatmap row. This change presents a significant operational risk for rolling upgrades and system migration. Per repository guidelines, please detail potential mitigation strategies (e.g., a temporary 'bridge' or 'proxy' mode) and strategies for zero-downtime cutovers to avoid service interruption.

References

When a design document identifies a significant operational risk, such as the inability to perform rolling upgrades, it must also detail potential mitigation strategies, like implementing a temporary 'bridge' or 'proxy' mode.

When designing a production-grade system migration, the plan must consider and detail strategies for live or zero-downtime cutovers to avoid service interruption.

gemini-code-assist · 2026-04-27T16:20:06Z

+Cost on hot path: one extra map lookup per Observe — slot is now
+keyed by `(routeID, label)` instead of just `routeID`. The map
+key is a struct of `{uint64, string}`; the `label` is a small
+interned constant (`"dynamo"`), so allocation should be zero.


The proposal states that Observe will use a map keyed by (routeID, label) and that RegisterRoute will be a "no-op" for labels. However, Observe is designed to be a lockless hot-path operation that relies on a pre-populated routeTable. If RegisterRoute does not pre-register the (routeID, label) combinations, Observe will fail to find a slot. If Observe were to create slots dynamically, it would require a mutex or a copy-on-write operation, violating the "zero hot-path penalty" goal. Please clarify how labeled slots are populated into the routeTable while maintaining a lockless Observe path.

claude · 2026-04-27T16:20:12Z

Claude finished @bootjp's task in 3m 19s —— View job

Review

Gather context
Read design document
Explore existing KeyViz implementation
Post comprehensive review

The motivation is sound and Option 4.1 (per-Observe label) is the right pick over 4.2–4.4. The main problem is that the design is underspecified where it matters most: how labeled slots enter the COW route table without breaking the lockless hot path. There are also a few concrete existing-code conflicts that downstream PRs will run straight into. Details below.

Critical — slot lifecycle for labeled slots is underspecified (§4.1, §7 PR-B, §8 lens 2)

The existing sampler's lockless invariant depends on every slot being pre-registered in the COW routeTable before Observe runs. Observe does Load → map-lookup → atomic.Add with no fallback path: a miss is silently dropped (return). This is intentional and correct for the current design.

PR-B's scope says "RegisterRoute no-op (label is per-Observe, not per-route)" — but if the map key widens to (routeID, label), and RegisterRoute doesn't create (routeID, "dynamo") / (routeID, "redis") / … slots, then the first Observe call from each adapter will miss and be silently dropped until something else creates the slot. §8 lens 2 says "slot-key change is contained in the routesMu COW path; the hot-path Load + map lookup keeps the same shape" — this implicitly assumes slots are pre-created, but the design doesn't say how.

Three options worth calling out explicitly:

Option A (recommended): RegisterRoute pre-creates one slot per known label constant using the adapter/keyviz_labels.go set. This preserves the lockless invariant exactly. Cost: sampler now imports a constant from the adapter package (an import-cycle risk — constants probably live in a shared keyviz package instead). Slot count is exactly N_labels × M_routes, predictable, no dynamic allocation on the hot path.
Option B: On an Observe miss for (routeID, label), fall back to (routeID, ""). This keeps the lockless path and degrades gracefully if an adapter fires before its route is registered, but the "first Observe silently loses its label" behavior is surprising and would make PR-C bugs hard to diagnose.
Option C: First-seen-label creates a slot under a mutex inside Observe. This breaks the zero-mutex hot-path contract and should be a non-option, but it should be explicitly rejected so implementers don't reach for it.

The design needs to pick one and spell out the RegisterRoute behavior accordingly. The §8 lens-2 and PR-B scope descriptions should be updated to match.

Critical — `pivotKeyVizColumns` in `keyviz_handler.go` is keyed by `routeID` alone

internal/admin/keyviz_handler.go:312:

rowsByID := make(map[uint64]*KeyVizRow)

When two MatrixRow values share the same RouteID but carry different labels (e.g., {RouteID: 1, Label: "dynamo"} and {RouteID: 1, Label: "redis"}), the second one overwrites the first in this map. The labeled rows will be silently collapsed back into a single row — exactly the bug this feature is trying to fix.

PR-D's scope says "SPA renders route:N / <label> when present" but doesn't call out that pivotKeyVizColumns also needs to widen its pivot key to (routeID, label). This should be made explicit in the design.

Similarly, bucketIDFor at line 383 produces "route:1" for both rows — so even after the map is fixed, the JSON bucket_id field is still non-unique. The design (§5) says "bucket_id formatting stays as-is; the label is a separate field" but that means bucket_id is no longer a row identifier, which breaks:

applyKeyVizRowBudget tiebreak at line 403: rows[i].BucketID < rows[j].BucketID is not deterministic when two rows share the same BucketID and total.
sortKeyVizRowsByStart at line 412: same tiebreak issue.
Any SPA interaction that uses bucket_id as a row key (the Phase 2-B flyout shows bucket_id as the row identity).

Recommendation: when Label is non-empty, format bucket_id as "route:1/dynamo" (composite). That makes bucket_id unique again, preserves backward compat (empty label → existing format), and sidesteps the RPC and tiebreak issues. Document this explicitly in §5.

Fix this →

Critical — `routeSlot` has no `Label` field; `Flush`/`appendDrainedRow` can't emit `MatrixRow.Label`

appendDrainedRow (sampler.go:828) materialises MatrixRow from a *routeSlot. routeSlot carries RouteID, Start, End, Aggregate, MemberRoutes — but no Label. Without a Label field on the slot, Flush has no way to set MatrixRow.Label. This is a concrete gap that PR-B will need to solve; the design should call it out rather than leaving it implicit.

Moderate — `MaxTrackedRoutes` silent behavior change at PR-C rollout

When PR-C wires adapter labels, the effective number of slots in a default single-group deployment jumps from M routes to ~5M (one per label per route). An operator running MaxTrackedRoutes=1024 with 300 routes will instantly start coarsening all traffic into a virtual bucket (300 routes × 5 labels = 1500 slots > 1024 cap) — with no warning, no config change, and no explanation in the heatmap other than "one aggregate row for everything".

The design acknowledges the worst-case slot count in §4.1 but buries the operator implication. This should be a first-class callout, ideally as a --keyvizMaxTrackedRoutes guidance note in §7 PR-C: "operators should raise the cap before deploying PR-C if they have more than MaxTrackedRoutes / N_labels routes".

Moderate — fan-out merge key gap between PR-D and PR-E

The Phase 2-C fan-out aggregator uses BucketID as the merge key (from keyviz_fanout.go). After PR-D ships the label field on the wire, two nodes will emit rows with the same BucketID but different labels. The aggregator, still using BucketID alone as the merge key, will collapse route:1/dynamo from node A and route:1/redis from node B into a single merged row — the opposite of what this feature wants.

PR-E's scope says "Aggregator merge key gains label" — but between D and E there is a window where the aggregator actively mis-merges labeled rows. Either:

Ship PR-D and PR-E together (simplest), or
Make PR-E a prerequisite of PR-D (not PR-B/C), or
Explicitly document the transient state and why it's acceptable (perhaps the SPA is not yet rendering labels between D and E so operators won't notice the merge error).

Minor — virtual bucket + label interaction is unspecified

If routes overflow MaxTrackedRoutes and fold into a virtual bucket, their traffic is pooled across all labels into one slot. An operator who enabled labels expecting per-adapter breakdown will instead see a single unlabeled virtual row covering all adapters for those routes. The design is silent on this edge case. A one-sentence explicit deferral ("virtual buckets always emit an empty label; per-label breakdown is only available for individually-tracked routes") would close the ambiguity.

Minor — `reclaimRetiredSlot` needs a `(routeID, label)` reclaim key

reclaimRetiredSlot (sampler.go:710) deduplicates re-registered routes by RouteID match. With labels, a route removed as (routeID, "dynamo") and re-registered as (routeID, "dynamo") inside the grace window should be reclaimed; re-registration with a different label should not reclaim the old slot. The current loop checks only r.slot.RouteID == routeID. This is a small implementation detail that PR-B will surface; worth noting in the design.

Minor — Q1 label separator ambiguity

§9 Q1 proposes a /-delimited convention for hierarchical labels (dynamo / users). If composite bucket_id uses / as the separator between route and label (recommended above), a nested label that itself contains / makes parsing bucket_id ambiguous at the consumer. If hierarchical labels ever land, prefer a different separator (:, ., or a structured proto field) rather than /.

What looks good

The Option 4.1 / 4.2 / 4.3 / 4.4 trade-off analysis is thorough and the recommendation is correct.
§8 lens-4 (forward compat) is well-reasoned — old SPA / new server and new SPA / old server are both handled.
Q3 (always carry full breakdown on the wire) is the right call. SPA-level filtering is cheap; re-querying for the collapsed view is not.
The label constants in adapter/keyviz_labels.go as a single audit point is the right design; the review burden approach to catching "DynamoDB" vs "dynamo" variants is practical.
PR sequencing (B then C then D then E) is sensible given the compatibility guarantees, except for the D/E gap noted above.

Addressing the slot-lifecycle gap (how labeled slots enter the COW route table) and the pivotKeyVizColumns pivot-key collision are the blockers before PR-B can be well-specified. The rest are design notes that should be folded into the doc before PR-B lands.

Eight items from round-1 review on the adapter-labels proposal: - 4.1.1 (new) slot lifecycle: pin the lockless invariant by having RegisterRoute pre-create one slot per known label at registration time. Add routeSlot.Label and document reclaimRetiredSlot dedupe on (RouteID, Label). Reject the two alternatives (silent fall-back and hot-path mutex) explicitly. - 5 composite bucket_id: when Label is non-empty, bucket_id = "route:N:label" so it stays globally unique. Without this, pivotKeyVizColumns (keyed by RouteID alone) silently collapses same-route different-label rows back into one row. The pivot-key widening in internal/admin/keyviz_handler.go is now an explicit PR-D+E scope item. - 5 separator: route-vs-label uses `:` (not `/`) so a future hierarchical label using `/` does not produce parser ambiguity. (Gemini minor.) - 6 aggregator: clarify that the bucketID widening encodes the new label dimension; the merge tuple width does not change. - 6.5 (new) virtual buckets: explicit deferral that virtual buckets always emit Label="" - per-label breakdown is only available below MaxTrackedRoutes. - 7 PR-D and PR-E ship together: between them, the aggregator would mis-merge labelled rows (BucketID alone as merge key). Combine into a single PR-D+E. PR-B/C remain independent. - 7 PR-C operator note: refuse to start with a warning when len(routes) x len(labels) > MaxTrackedRoutes - otherwise hot routes silently fold into the virtual bucket. - 9 Q2: canonical constants live in keyviz/labels.go, not adapter/keyviz_labels.go - sampler must not import back to adapter (cycle).

bootjp · 2026-04-27T16:42:33Z

@claude review

Round-1 follow-ups (3c14f29) addressing all 8 items from the round-1 review:

Critical / blocker:

§4.1.1 (new) slot lifecycle — RegisterRoute now pre-creates one slot per known label at registration time, preserving the lockless Observe invariant. Added routeSlot.Label, documented reclaimRetiredSlot dedupe on (RouteID, Label), and explicitly rejected the silent-fallback and hot-path-mutex alternatives.
§5 composite bucket_id — labelled rows now use bucket_id = "route:N:label" so the field stays globally unique. Calls out that pivotKeyVizColumns in internal/admin/keyviz_handler.go must widen its pivot key from uint64 → (uint64, string) to mirror the sampler.
routeSlot.Label field — added to the struct definition in §4.1.1 so PR-B has a concrete contract for appendDrainedRow to read.

Moderate:

§7 PR-D + PR-E ship together — splitting them creates a window where the aggregator (still keyed by BucketID alone) collapses labelled rows. Merged into a single PR-D+E.
§7 PR-C operator note — len(routes) × len(labels) > MaxTrackedRoutes refused at startup with a warning; without this, hot routes silently fold into the virtual bucket and operators lose per-label breakdown.

Minor:

§5 separator — : instead of / so a future hierarchical label (dynamo/users) doesn't conflict with the route-vs-label boundary. Added a §9 Q1 callout.
§6.5 (new) virtual bucket deferral — virtual buckets always emit Label=""; per-label is only for individually-tracked routes.
§9 Q2 location — canonical constants live in keyviz/labels.go, not adapter/keyviz_labels.go. Sampler-side imports must not climb back to adapter (cycle).

The "what looks good" items in your round-1 (4.1 vs 4.2/4.3/4.4 trade-off, forward-compat reasoning, Q3, label allocation discipline) are unchanged from round-1.

claude · 2026-04-27T16:42:50Z

Claude finished @bootjp's task in 4m 7s —— View job

Review (Round 2)

Gather context
Read updated design document
Check KeyViz implementation for remaining gaps
Post comprehensive review

All eight round-1 items are addressed. The design is in a substantially better state; the §4.1.1 slot-lifecycle section is the clearest improvement. Three remaining issues worth fixing before PR-B goes up for review, in rough priority order.

Moderate — §7 PR-C: "refuse to start" is wrong semantics for a dynamic catalog

The current text says: "PR-C should refuse to start with a warning when len(routes) × len(labels) > MaxTrackedRoutes."

Routes are dynamic — RegisterRoute is called by the watcher whenever a new route appears in the catalog (routes can be added after startup via SplitRange and the control-plane RPCs). There is no "startup count of routes" to check. The process would start fine, then later route additions would push the slot count over the cap — and those routes would silently fold into the virtual bucket with no diagnostic from the startup check.

The existing coarsening path already handles this gracefully: RegisterRoute returns false when a route is folded into a virtual bucket. The right operator signal is a log warning inside RegisterRoute at the point the route is coarsened, something like:

slog.Warn("route folded into virtual bucket due to label-induced slot overflow",
    "route_id", routeID,
    "slots_used", len(tbl.slots),
    "max_tracked_routes", s.opts.MaxTrackedRoutes)

The operator note in §7 PR-C is still correct (raise the cap proportionally), but "refuse to start" should become "emit a warning when the first coarsening event fires due to label-induced overflow".

Fix this →

Moderate — §4.1.1: labeled slot creation for coarsened routes is unspecified

§4.1.1 says "RegisterRoute pre-creates one slot per known label at registration time." But RegisterRoute sometimes folds a route into a virtual bucket instead of giving it an individual slot. If that path still pre-creates N_labels labeled slots and then immediately discards them, that is wasted allocation. If it doesn't, then an Observe(routeID, "dynamo") call for a coarsened route misses in tbl.slots and falls through to tbl.virtualForRoute[routeID] — which is correct (§6.5: virtual buckets aggregate across labels) but isn't spelled out.

The design should add one sentence: "When RegisterRoute determines a route will be coarsened into a virtual bucket (i.e., it returns false), no labeled slots are created for that route; the tbl.virtualForRoute[routeID] fallback in Observe handles all labels uniformly, consistent with §6.5."

Without this, a PR-B implementor may create labeled slots for coarsened routes, then wonder why they are never hit.

Fix this →

Minor — §8 lens-5: two test cases missing from the plan

The current test plan covers the happy path and the burst invariant well. Two gaps:

RegisterRoute slot count — a test that registers 1 route and verifies len(tbl.slots) == N_labels + 1 (N labeled + 1 empty-label legacy slot). Without this, PR-B could accidentally skip a label constant in the pre-creation loop and the missing slot would only be caught when that adapter fires traffic.
reclaimRetiredSlot (RouteID, Label) dedupe — a test that retires (routeID, "dynamo"), then re-registers (routeID, "dynamo") and verifies reclaim succeeds; then retires (routeID, "dynamo") again, re-registers as (routeID, "redis"), and verifies reclaim does not fire (the dynamo slot stays in the retired list to be drained by the grace window). The current implementation at keyviz/sampler.go:710 matches only on RouteID — this test would catch the regression if the implementor forgets to also match Label.

Fix this →

Minor — §4.1.1: "restart the sampler" note is slightly misleading

"New labels added after a route is registered are not auto-bound; an operator deploying a new adapter must restart the sampler."

The canonical label set lives in keyviz/labels.go. Adding a new adapter label means adding a constant to that file and shipping a new binary. A "restart" without a new binary changes nothing — the label isn't in the compiled constant set. The sentence should say "must deploy a new binary (which includes restarting)" to avoid operators thinking a process restart alone will pick up a new label.

What looks good

§6 aggregator merge: the insight that encoding the label into bucket_id means the aggregator's map[string]*KeyVizRow already separates labeled rows with no struct changes is elegant and correct. The fan-out mergeKeyVizMatrices at internal/admin/keyviz_fanout.go:471 keys on BucketID; route:1:dynamo and route:1:redis are distinct keys, so the N_labels×M_routes breakdown is naturally preserved across fan-out without touching the merge machinery.
§5 separator choice (: not /): confirmed against the existing bucketIDFor format; "route:1:dynamo/users" would parse correctly as routeID=1, label=dynamo/users by splitting on the second :.
pivotKeyVizColumns collision fix (widening to (uint64, string) pivot key) is now explicitly called out; confirmed against the actual rowsByID := make(map[uint64]*KeyVizRow) at internal/admin/keyviz_handler.go:312.
PR-D+E ship together: confirmed that between PR-C and pre-D+E the handler merges labeled MatrixRows into a single route:N row (because rowsByID[mr.RouteID] reuses the existing entry and accumulates values). Operators see the same aggregated view as before labels — no data loss, just the breakdown is not yet surfaced. The design's "keep showing one row per route until D+E ship" claim is accurate.
§6.5 virtual bucket deferral, §9 Q2 label location, and explicit rejection of the hot-path-mutex alternative are all solid.

The two moderate items are the things to fix before PR-B goes up; the two minor items can be folded into the PR-B doc commit or addressed as review comments there. The overall design is ready to proceed.

Four follow-ups from round-2: - 7 PR-C: replace "refuse to start" with a structured slog.Warn emitted from inside RegisterRoute at coarsening time. Routes are dynamic (the catalog watcher calls RegisterRoute when a SplitRange lands), so a startup-time check would miss every route added after process start. The right operator signal is a log line at the moment a coarsening event fires. - 4.1.1: spell out the coarsened-route path. RegisterRoute that returns false (route folded into a virtual bucket) does NOT pre-create labeled slots; the virtualForRoute fallback in Observe handles all labels uniformly per 6.5. - 4.1.1: tighten the new-label rollout note. "Restart the sampler" was misleading because the canonical set lives at compile time; "deploy a new binary" is what operators actually need. - 8 lens-5: add two missing test cases - slot pre-creation count (1 route -> N_labels+1 slots) and reclaimRetiredSlot (RouteID, Label) dedupe (retire dynamo + re-register redis must NOT reclaim). Plus the coarsened-route + label invariant (label="" on virtual buckets) for completeness.

bootjp · 2026-04-27T16:51:57Z

@claude review

Round-2 follow-ups (ce8f95b):

§7 PR-C / dynamic catalog — replaced refuse to start with a structured slog.Warn emitted from inside RegisterRoute at coarsening time. Right point: routes are added dynamically (catalog watcher calls RegisterRoute whenever a SplitRange lands), so a startup-time check would miss every route added after process start.
§4.1.1 / coarsened routes — added an explicit paragraph: when RegisterRoute returns false (route folded into a virtual bucket), no labeled slots are created; the virtualForRoute fallback in Observe handles all labels uniformly per §6.5. PR-B implementors won't waste cycles allocating slots that are never hit.
§4.1.1 / new-label rollout note — tightened from restart the sampler to deploy a new binary. Process restart alone changes nothing because the canonical set lives at compile time.
§8 lens-5 / test cases — added the two missing tests: slot pre-creation count (1 route → N_labels + 1 slots) and reclaimRetiredSlot (RouteID, Label) dedupe (retire dynamo + re-register redis must not reclaim). Also added the coarsened-route + label invariant (label="" on virtual buckets) for completeness.

claude · 2026-04-27T16:52:10Z

Claude finished @bootjp's task in 3m 2s —— View job

Review (Round 3)

Gather context
Read updated design document (ce8f95b)
Check existing KeyViz implementation for remaining gaps
Post comprehensive review

All four Round-2 items are correctly resolved. The dynamic-catalog slog.Warn, the coarsened-route paragraph, the "deploy a new binary" wording, and the two new §8 lens-5 test cases all read correctly against the actual code. Two moderate issues remain before PR-B can be well-specified; three minor items can be folded into the PR-B doc commit.

Moderate — `virtualForRoute` key type is unspecified after the slot-key widening

§4.1.1 says the routeTable.slots map key widens from uint64 to slotKey{uint64, string}. But routeTable.virtualForRoute (keyviz/sampler.go:218) is a separate map[uint64]*routeSlot — it maps a real RouteID to its coarsening bucket and must stay keyed by uint64 only.

This matters because the new Observe lookup chain must be:

1. tbl.slots[slotKey{routeID, label}]  → hit for individually-tracked routes
2. miss → tbl.virtualForRoute[routeID] → hit for coarsened routes (any label)

Step 2 still uses routeID alone, not slotKey{routeID, label}. If a PR-B implementor accidentally widens virtualForRoute to map[slotKey]*routeSlot as well, coarsened-route Observe calls will miss both lookups and silently drop all traffic for those routes — exactly the kind of silent data loss that is hard to diagnose in production.

The design should add one explicit sentence to §4.1.1: "virtualForRoute stays map[uint64]*routeSlot; the Observe fallback still keys on routeID alone because virtual buckets have no label dimension (§6.5)." The new Observe pseudocode would also make this concrete:

slot, ok := tbl.slots[slotKey{routeID, label}]
if !ok {
    slot, ok = tbl.virtualForRoute[routeID]  // label is irrelevant here
    if !ok {
        return
    }
}

Fix this →

Moderate — `KeyVizRow` wire form after PR-D is unspecified in §5

§5 shows the MatrixRow (Go sampler type) diff gaining Label string. The PR description says "proto + JSON label field" for PR-D. But §5 never shows what the JSON KeyVizRow struct (internal/admin/keyviz_handler.go:81) looks like after PR-D. Two possible shapes:

Option A: KeyVizRow gets no label field; the SPA parses route:1:dynamo from BucketID by splitting on the second :.
Option B: KeyVizRow gets Label string \json:"label,omitempty"`in addition to the compositeBucketID`.

Option B is what the PR description implies and is the better choice (more ergonomic for SPA consumers; avoids string-parsing on the client). But without a clear choice in the design, a PR-D+E implementor and SPA author are working from a different mental model.

Pick one and add a diff block to §5 showing the KeyVizRow change, just as §5 already shows the MatrixRow change. If Option B, note that forward-compat still holds: old SPAs see an unrecognised label field (ignored by JSON) and still use BucketID for row identity.

Fix this →

Minor — `AllLabels` (or equivalent) export is unnamed in the design

§4.1.1 says "RegisterRoute pre-creates one slot per known label at registration time" and "The label set is the canonical keyviz/labels.go constants". But RegisterRoute needs to iterate the complete label set — it needs something like keyviz.AllLabels []string or keyviz.LabelConstants() []string. §9 Q2 names the individual constants (LabelDynamo, LabelRedis, LabelS3, LabelSQS, LabelRawKV) but does not name the exported slice/function that RegisterRoute would range over.

A PR-B implementor who writes RegisterRoute will need to define this. One sentence in §9 Q2 closes the gap: "Alongside the individual constants, keyviz/labels.go exports AllLabels []string — the pre-allocation loop in RegisterRoute ranges over this slice."

Minor — PR-B scope: coordinator call sites omitted

PR-B scope says "Update existing tests to pass empty labels." The coordinator (kv/sharded_coordinator.go) is the primary non-test caller of Sampler.Observe. After the interface widens to Observe(... label string), all coordinator call sites must also pass label="" (or the empty constant from keyviz/labels.go). These are not test files; leaving them out of the scope note could mislead a PR-B implementor into believing only test files need updating.

One line fix: "Update existing tests and coordinator call sites to pass label=""."

Minor — `slog.Warn` at coarsening may flood logs on startup

The PR-C note correctly moves from "refuse to start" to "emit slog.Warn inside RegisterRoute at coarsening time." One edge case: if an operator restarts a node that has 2000 routes already in the catalog and MaxTrackedRoutes is set too low, every re-registered route calls RegisterRoute sequentially via the watcher and emits a warning — potentially 1000+ log lines in the first few seconds. This is loud but not incorrect. Noting in §7 PR-C that operators should raise the cap before a binary rollout (not just after seeing warnings) would help; the current text implies the warning is the operator's first signal, which is correct but results in a log storm on an already-misconfigured node.

What looks good

§6 aggregator merge: confirmed against keyviz_fanout.go:485 — mergeKeyVizMatrices keys on BucketID in the rowsByBucket map. Composite bucket_id (route:1:dynamo) automatically separates same-route different-label rows without any merge-logic changes. The PR-D+E bundling claim holds exactly.
All Round-2 changes verified correct against the source: reclaimRetiredSlot:710 match-on-RouteID noted as a PR-B target; pivotKeyVizColumns:312 map[uint64]*KeyVizRow correctly identified as needing widening; bucketIDFor:383 current format confirmed.
routeSlot.Label as an immutable field (set once at creation, like RouteID) means appendDrainedRow can read slot.Label directly without adding it to snapshotMeta — clean design.
The coarsened-route paragraph in §4.1.1 resolves the "wasted slot allocation" concern cleanly.
§8 lens-5 slot-count and reclaimRetiredSlot dedupe test cases are exactly the right regression guards for PR-B.

The two moderate items (virtual-bucket fallback key and KeyVizRow wire form) are the things to fix before PR-B review; the minor items can be addressed in the PR-B doc commit. Design is otherwise ready to proceed.

Five follow-ups from round-3, all doc-level: - 4.1.1 (moderate): pin virtualForRoute as map[uint64]*routeSlot (NOT widened to slotKey). Coarsened-route Observe falls back to this map on a labeled-slot miss, so widening the key would silently drop all coarsened-route traffic. Added pseudocode showing the exact lookup chain. - 5 (moderate): add the explicit JSON KeyVizRow diff alongside MatrixRow and the proto. Carry the label as a dedicated field AND keep bucket_id composite — the field is for SPA ergonomics (no client-side parsing), bucket_id is the globally-unique row identifier merge logic uses. - 9 Q2 (minor): name the exported AllLabels slice that RegisterRoute iterates over. The PR-B implementor needs an exported handle, not just the per-adapter constants. - 7 PR-B (minor): coordinator call sites in kv/sharded_coordinator.go also need the label="" update; they are the only non-test caller of Sampler.Observe. - 7 PR-C (minor): rollout discipline - raise --keyvizMaxTrackedRoutes before deploying the PR-C binary so startup doesn't flood logs with re-registration warnings on an already-misconfigured node.

bootjp · 2026-04-28T00:27:58Z

@claude review

Round-3 follow-ups (ae43bfe):

Moderate:

§4.1.1 / virtualForRoute key — pinned as map[uint64]*routeSlot (NOT widened). Added pseudocode of the new Observe lookup chain so an implementor can't accidentally widen the fallback map and silently drop all coarsened-route traffic.
§5 / KeyVizRow JSON wire form — added the explicit diff alongside MatrixRow and the proto. Picked Option B (dedicated label field) — bucket_id stays the globally-unique row identifier for the merge path, label serves the SPA's render-and-filter ergonomics.

Minor:

§9 Q2 / AllLabels — named the exported slice that RegisterRoute iterates. Deterministic order, small test surface.
§7 PR-B / coordinator call sites — explicit that kv/sharded_coordinator.go (the only non-test caller of Sampler.Observe) also needs the label="" update in PR-B.
§7 PR-C / rollout discipline — raise --keyvizMaxTrackedRoutes before binary deploy, otherwise startup floods logs with re-registration warnings on a misconfigured node.

claude · 2026-04-28T00:28:12Z

Claude finished @bootjp's task in 3m 22s —— View job

Review (Round 4)

Gather context
Read updated design document (ae43bfe)
Check KeyViz implementation for remaining gaps
Post comprehensive review

All five Round-3 items are correctly resolved. The virtualForRoute key-type pin and the KeyVizRow diff (Option B) are both verified against the actual code at keyviz_handler.go:312 and keyviz_fanout.go:485. Two new moderate items surfaced while tracing the full slot lifecycle through sampler.go and keyviz_fanout.go; two minor items complete the picture.

Moderate — `RemoveRoute` symmetric teardown is unspecified

§4.1.1 specifies that RegisterRoute pre-creates all N+1 labeled slots (N labels + 1 legacy). The symmetric teardown in RemoveRoute is nowhere in the design.

RemoveRoute (keyviz/sampler.go:532) currently does:

delete(next.slots, routeID)  // single uint64 key
s.retiredSlots = append(s.retiredSlots, retiredSlot{slot: individual, retiredAt: retiredAt})

After the key widens to slotKey{uint64, string}:

delete(next.slots, routeID) is a compile error — type mismatch. The code won't build.
Even if the delete is fixed to a loop, RemoveRoute must retire all N+1 labeled slots — not just the legacy one. If it only retires slotKey{routeID, ""}, the (routeID, "dynamo") / (routeID, "redis") / … slots remain in the live routeTable receiving Observe traffic forever, even after the route is logically removed.

Symmetrically, reclaimRetiredSlot must be called once per label inside RegisterRoute's pre-creation loop — the §4.1.1 (RouteID, Label) dedupe rule is only meaningful if RemoveRoute retires per-label and RegisterRoute reclaims per-label. The design implies this but never states it.

The design needs a sentence in §4.1.1 (or a §4.1.2 subsection): "When RemoveRoute(routeID) retires an individual route, it must delete and retire all slotKey{routeID, label} entries — one for each member of AllLabels plus slotKey{routeID, ""} for the legacy slot. RegisterRoute's reclaim loop calls reclaimRetiredSlot(routeID, label) once per slot in the same set." Without this, a PR-B implementor gets a compile error, reaches for the minimum fix (delete one slot), and ships the orphaned-slot bug silently.

Fix this →

Moderate — `mergeRowInto` does not copy `Label`; merged rows lose the field

PR-D+E adds Label string \json:"label,omitempty"`toKeyVizRow. The mergeRowInto function (keyviz_fanout.go:509) constructs new KeyVizRow` entries from scratch:

dst = &KeyVizRow{
    BucketID:  row.BucketID,
    Start:     …,
    End:       …,
    Aggregate: row.Aggregate,
    RouteIDs:  …,
    …
    // Label is absent
}

If Label is not copied here, every merged row emitted by the fan-out aggregator will have Label = "", even though the per-node matrices carry the correct label. The SPA would render route:1 instead of route:1 / dynamo for all cluster-wide responses, silently losing the per-adapter breakdown in the most important multi-node scenario.

The §6 claim that "merge key width does not change" is correct — bucketID is composite, so rowsByBucket already separates rows correctly. But rowsByBucket keying and Label field population are two independent steps; the former is handled automatically, the latter requires an explicit copy. The PR-D+E scope must add: "update mergeRowInto to copy Label from the source row."

The SPA can fall back to parsing bucket_id for route:1:dynamo, but the entire motivation for Option B (dedicated label field) was to avoid client-side parsing. Missing this step in mergeRowInto defeats that design choice specifically for the cluster path.

Fix this →

Minor — `RegisterRoute` idempotency check will not compile after key widening

The idempotency guard at keyviz/sampler.go:407:

if _, ok := cur.slots[routeID]; ok {
    return true
}

is a map[uint64]*routeSlot lookup. After PR-B widens slots to map[slotKey]*routeSlot, this is a type mismatch — the code will not compile with a plain uint64 key. The check needs to become cur.slots[slotKey{routeID, ""}] (presence of the legacy slot implies all labeled slots were already created by the same RegisterRoute call). A one-line note in §4.1.1 closes the ambiguity and prevents a PR-B implementor from picking an incorrect fix under compile pressure.

Minor — Pseudocode comment "labelled buckets fall through here" is misleading

In the §4.1.1 pseudocode:

slot, ok = tbl.virtualForRoute[routeID] // labelled buckets fall through here

"Labelled buckets" suggests the fallback is specifically for labeled traffic, but the fallback is for coarsened routes — routes that exceeded MaxTrackedRoutes and were folded into a virtual bucket. Any label (including empty) for a coarsened route falls through here. A clearer comment: // coarsened route: label irrelevant, aggregate all traffic. Avoids the implication that a virtual bucket is itself a "labelled" entity.

What looks good

All five Round-3 items verified against source. virtualForRoute stays map[uint64]*routeSlot — confirmed. KeyVizRow.Label as a dedicated field (Option B) — correctly designed against the fact that mergeRowInto (keyviz_fanout.go:518) keys on BucketID, so composite bucket_id gives correct separation for free; the only missing step is the Label copy (above).
AllLabels []string with deterministic order in §9 Q2 is the right surface for the pre-creation loop in RegisterRoute and the test surface.
The slog.Warn at coarsening time vs. startup refusal is confirmed correct against RegisterRoute:400 — there's no startup-time route count to check; the dynamic re-registration path is exactly where the warning belongs.
§8 lens-5 test cases are well-targeted: the slot-count assertion and the reclaimRetiredSlot (RouteID, Label) dedupe test are exactly the regression guards PR-B needs.
§6.5 virtual-bucket Label="" deferral is clean and consistent with the virtualForRoute fallback design.

The two moderate items — RemoveRoute symmetric teardown and mergeRowInto Label copy — are the remaining blockers before PR-B and PR-D+E respectively are well-specified. The minor items can be addressed in the PR-B doc commit. Design is close to ready.

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (1)

docs/design/2026_04_28_proposed_keyviz_adapter_labels.md (1)
301-307: Tighten the backward-compatibility claim for “old SPA parses bucket_id”.

You say an old SPA against a new server “falls back to legacy formatting” and should be non-breaking. But this depends on the old SPA’s bucket_id parsing being tolerant of extra segments (route:1:dynamo). Please either:

specify the parsing behavior the SPA already has (e.g., “splits into at most 2 parts” / “ignores suffix after route id”), or

explicitly call out that the SPA parser must be updated to be suffix-tolerant.
♻️ Proposed fix (clarify parser expectation)
 Forward compatibility: an old SPA against a new server sees a
 new `bucket_id` shape (`route:1:dynamo`) and renders it
- literally — operators get a less-pretty label but no
- correctness bug.
+ literally — operators get a less-pretty label but no
+ correctness bug, assuming the SPA’s `bucket_id` parser is
+ suffix-tolerant (it must not treat `route:1:dynamo` as a parse
+ failure or collapse it incorrectly).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/2026_04_28_proposed_keyviz_adapter_labels.md` around lines 301 -
307, The backward-compatibility claim must be tightened: update the text around
the bucket_id compatibility paragraph to either (A) explicitly state the
existing SPA parser behavior (e.g., that the SPA splits bucket_id on ':' into at
most two parts or that it ignores any suffix after the route id) and give a
concrete example like "route:1:dynamo -> route:1", or (B) state that the SPA
parser must be updated to be suffix-tolerant (modify the SPA parsing logic to
ignore additional segments after the route id) and add a short compatibility
note for operators; refer to the bucket_id field and the SPA parser behavior in
the wording so readers know which parser needs to be tolerant or changed.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/design/2026_04_28_proposed_keyviz_adapter_labels.md`:
- Around line 346-350: Choose one spelling ("labelled" or "labeled") and make it
consistent throughout this design doc; specifically change the second occurrence
flagged (in the paragraph mentioning pivotKeyVizColumns) to match your chosen
variant and sweep the file to unify all instances including mentions of
pivotKeyVizColumns, keyviz/labels.go, ShardedCoordinator.Observe, RegisterRoute,
reclaimRetiredSlot, MatrixRow.Label, and routeSlot.Label so the entire document
uses the same spelling.
- Around line 270-296: Add an explicit constraint that canonical labels must not
contain ':' and enforce it where labels are defined and used: update the
documentation near the bucket_id format (the "route:<id>:<label>" paragraph) to
state "labels must not include ':'", and add runtime validation at the
label/constant definition and any constructors that build or parse bucket_id
(places touching bucket_id, the separator constant, and codepaths used by
pivotKeyVizColumns, applyKeyVizRowBudget, sortKeyVizRowsByStart) to reject or
sanitize labels containing ':' so the split-on-':' parsing remains unambiguous.
- Around line 270-285: The document mixes “labelled” and “labeled”; pick one and
make it consistent (switch to “labeled” as suggested) by replacing all
occurrences including headings, inline text, examples like `Label`/`bucket_id`
descriptions, and any references in prose to `pivotKeyVizColumns`, `RouteID`,
`applyKeyVizRowBudget`, `sortKeyVizRowsByStart`, and `BucketID` so the
terminology is uniform across the file.
- Around line 76-86: The fenced code block in the example "Today (single group,
no labels): / After (adapters tag their traffic):" is missing a language tag
which triggers MD040; update the opening fence from ``` to ```text (or your
preferred lint language such as ```plain or ```mermaid) so the block becomes
```text and leave the closing fence as ```; this change in
docs/design/2026_04_28_proposed_keyviz_adapter_labels.md will satisfy the
markdownlint rule.
- Around line 245-258: The proto KeyVizRow currently already defines "string
label = 4", so adding "string label = 13" creates a duplicate; update the
proposal and code so there's a single canonical label field: decide whether to
deprecate/rename/remove field 4 or to replace it with field 13, then modify
proto/admin.proto's KeyVizRow to only include the chosen field number and name
and update the corresponding internal/admin/keyviz_handler.go KeyVizRow JSON
struct (Label field and tags) to match that single proto field; mention the
deprecation choice (if keeping 4) or show the removal/renumbering plan so the
design and code remain consistent.

---

Nitpick comments:
In `@docs/design/2026_04_28_proposed_keyviz_adapter_labels.md`:
- Around line 301-307: The backward-compatibility claim must be tightened:
update the text around the bucket_id compatibility paragraph to either (A)
explicitly state the existing SPA parser behavior (e.g., that the SPA splits
bucket_id on ':' into at most two parts or that it ignores any suffix after the
route id) and give a concrete example like "route:1:dynamo -> route:1", or (B)
state that the SPA parser must be updated to be suffix-tolerant (modify the SPA
parsing logic to ignore additional segments after the route id) and add a short
compatibility note for operators; refer to the bucket_id field and the SPA
parser behavior in the wording so readers know which parser needs to be tolerant
or changed.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f3929603-6a50-49cc-96c8-9ceb5358d4d6

📥 Commits

Reviewing files that changed from the base of the PR and between 08b2fb4 and ae43bfe.

📒 Files selected for processing (1)

docs/design/2026_04_28_proposed_keyviz_adapter_labels.md

Eight items from round-4 review: CodeRabbit critical: - 5 proto field collision: proto/admin.proto KeyVizRow already declares string label = 4 (currently unused). The earlier draft proposed adding string label = 13, which would have collided. Reuse the existing field 4; no schema migration needed. - 5 hard constraint: canonical labels MUST NOT contain ':'. Without this, "redis:db0" silently breaks bucket_id parsing. Pinned by a TestAllLabelsAvoidSeparator regression guard listed in 8 lens-5. Claude bot round-4 moderate: - 4.1.1 RemoveRoute symmetric teardown: when slot key widens, delete(slots, routeID) is a compile error; PR-B must loop over AllLabels + {""} and retire all N+1 slots. Otherwise labeled siblings leak into the live table forever. - 7 PR-D+E mergeRowInto: mergeKeyVizMatrices keys on composite bucket_id (correct) but mergeRowInto constructs new KeyVizRow entries from scratch and does NOT copy Label. Without an explicit dst.Label = row.Label, every cluster-merged row emits Label="", defeating Option B's dedicated-field design. Claude bot round-4 minor: - 4.1.1 RegisterRoute idempotency: cur.slots[routeID] becomes a type mismatch after key widening; document the legacy-slot presence check that PR-B must use. - 4.1.1 pseudocode comment: "labeled buckets fall through here" is misleading since the fallback is for coarsened routes, not labeled traffic. Fixed to "coarsened route: label irrelevant, aggregate all traffic". CodeRabbit minor: - 3 code fence language: tagged the surface example as `text` (MD040). - Spelling: normalized "labelled" -> "labeled" / "unlabelled" -> "unlabeled" across the doc (LanguageTool flagged the inconsistency). Plus 2 new test cases in 8 lens-5: RemoveRoute symmetric teardown, mergeRowInto Label copy, and the labels-avoid-':' constants test.

bootjp · 2026-04-28T14:51:53Z

@claude review

Round-4 follow-ups (2750f4c) — 8 items addressed:

CodeRabbit critical:

§5 proto field collision — proto/admin.proto already declares string label = 4 on KeyVizRow (currently unused). Earlier draft proposing = 13 would have collided. Reuse field 4; no schema migration. Updated the proto diff to make this explicit.
§5 labels MUST NOT contain : — added the hard constraint with a TestAllLabelsAvoidSeparator regression guard listed in §8 lens-5. Without it, "redis:db0" would silently break bucket_id parsing.

Claude bot moderate:

§4.1.1 RemoveRoute symmetric teardown — when slot key widens, delete(slots, routeID) is a compile error. PR-B must loop over AllLabels + {""} and retire all N+1 slots. Otherwise labeled siblings leak into the live table forever. Pseudocode added.
§7 PR-D+E mergeRowInto Label copy — mergeKeyVizMatrices keys on composite bucket_id (correct) but mergeRowInto constructs new KeyVizRow entries from scratch and doesn't copy Label. Without an explicit dst.Label = row.Label, every cluster-merged row emits Label="", defeating Option B. PR-D+E scope updated.

Claude bot minor:

§4.1.1 RegisterRoute idempotency — documented the legacy-slot presence check (cur.slots[slotKey{RouteID: routeID}]) that PR-B must use; the old cur.slots[routeID] becomes a type mismatch after key widening.
§4.1.1 pseudocode comment — "labeled buckets fall through here" → "coarsened route: label irrelevant, aggregate all traffic". Avoids implying that virtual buckets are themselves "labeled".

CodeRabbit minor:

§3 code fence language — tagged the surface example as text (MD040).
Spelling — normalized labelled → labeled / unlabelled → unlabeled consistently throughout the doc.

Test plan additions (§8 lens-5): RemoveRoute symmetric teardown, mergeRowInto Label copy, and the AllLabels avoids-: constant test.

bootjp · 2026-04-28T14:54:21Z

@claude review

Round-5 status check. The merge from main brought CI/workflow updates onto the branch (HEAD now at 9e1dcf17); the design-doc content remains as of round-4 (2750f4c2).

The CodeRabbit inline comments visible at lines 339 / 360 / 414 are stale (posted against 2750f4c2, line numbers shifted after merge):

Line 360 / labels MUST NOT contain : — already in the doc at line 350 ("Hard constraint on canonical labels: a label MUST NOT contain :") with the TestAllLabelsAvoidSeparator regression guard listed in §8 lens-5.
Lines 339 / 414 / spelling — grep -nE 'labelled|unlabelled' returns 0 occurrences in the current file. All normalized to labeled / unlabeled.
Gemini line 108 / lockless invariant — addressed in round-1 by §4.1.1 (slot pre-creation, alternatives explicitly rejected).

Asking for a fresh round on the post-merge HEAD.

claude · 2026-04-28T14:54:54Z

Claude finished @bootjp's task in 3m 58s —— View job

Review (Round 5)

Gather context
Read updated design document (post-merge HEAD 9e1dcf1)
Check KeyViz implementation for remaining gaps
Post comprehensive review

Stale CodeRabbit comments confirmed resolved (verified against current source):

: constraint: present at line 350, with TestAllLabelsAvoidSeparator guard in §8 lens-5 ✓
Lockless invariant: fully addressed in §4.1.1 ✓
Proto field collision: resolved by reusing existing field 4 ✓

One spelling instance was missed, and a moderate gap surfaced in newKeyVizRowFrom. Details below.

Moderate — `newKeyVizRowFrom` also constructs `KeyVizRow` from scratch but is not called out in §7 PR-D+E

§7 PR-D+E correctly identifies that mergeRowInto (keyviz_fanout.go:520) constructs KeyVizRow without copying Label. But the per-node handler has the same problem: newKeyVizRowFrom (keyviz_handler.go:368) also builds the struct from scratch:

row := &KeyVizRow{
    BucketID:  bucketIDFor(mr),
    Start:     ...,
    End:       ...,
    Aggregate: mr.Aggregate,
    RouteCount: total,
    ...
    // Label absent
}

If PR-D+E only adds dst.Label = row.Label in mergeRowInto but forgets to add row.Label = mr.Label in newKeyVizRowFrom, per-node responses — before they ever reach the fan-out — will carry Label = "" for every row. A single-node deployment would silently lose all labels even after PR-D+E ships. The aggregator fix is a second-level concern; this is the first-level one.

The PR-D+E scope bullet should read: "update both newKeyVizRowFrom (keyviz_handler.go:368) and mergeRowInto (keyviz_fanout.go:520) to copy Label — newKeyVizRowFrom populates per-node rows, mergeRowInto populates cluster-merged rows."

Fix this →

Minor — One "Labelled" (double-l) survives at line 328

grep -nE 'labelled|unlabelled' returns 0 hits (the lowercase variants are fully normalized), but grep -nE 'Labelled' finds one survivor:

328: - Labelled: `bucket_id = "route:<id>:<label>"` (e.g.

All other occurrences use Labeled. One-character fix to close the CodeRabbit spelling note completely.

Fix this →

Minor — Unlisted-label drop behavior is undocumented

After key widening, an Observe call with a label not in AllLabels — e.g., a future adapter passing "DynamoDB" instead of "dynamo" — will miss tbl.slots[slotKey{routeID, label}] and then miss tbl.virtualForRoute[routeID] (since the route is individually tracked), and silently drop the call. Today, the only way an Observe can be dropped is if the routeID is unregistered; after PR-B, a known route can also drop traffic if the label isn't in AllLabels.

This is the correct behavior (it mirrors how unknown routes behave and enforces that all labels are declared centrally), but is worth a one-line note in §4.1.1 so it's intentional rather than accidental: "An Observe call carrying a label not in AllLabels misses both lookups and is dropped — the same behavior as an unregistered route. This enforces that all label constants are declared in keyviz/labels.go before use."

Minor — Forward-compat SPA parser tolerance still unspecified

The CodeRabbit nitpick from round-4 was marked ✅ in the commit, but the current text at lines 365-371 still says "renders it literally — operators get a less-pretty label but no correctness bug" without stating whether the current SPA parser handles route:1:dynamo correctly or needs to be updated. One clarifying phrase closes this: either "the SPA renders bucket_id as a raw string without splitting, so route:1:dynamo is shown verbatim" (if that's true of the current SPA), or "the SPA parser must be updated to be suffix-tolerant" (if not).

What looks good

All of Round-4's items are correctly in the document and verified against source:

virtualForRoute stays map[uint64]*routeSlot — correctly pinned with pseudocode showing the two-stage lookup
RemoveRoute symmetric teardown — pseudocode with allLabelsWithLegacy() loop, orphaned-slot consequence spelled out, lens-5 test added
mergeRowInto Label copy — called out in §7 PR-D+E
AllLabels []string named in §9 Q2, coordinator call sites added to PR-B scope
PR-C rollout discipline note (raise cap before deploy) and dynamic-catalog slog.Warn semantics are both correct
registerRoute idempotency check (legacy slot presence) documented with the correct slotKey{RouteID: routeID} form
§8 lens-5 now has five targeted test cases that together pin every tricky PR-B/D+E behavior

The newKeyVizRowFrom gap is the only blocker before PR-B is well-specified. The spelling fix and two minor documentation gaps can be addressed inline on the PR-B doc commit.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9e1dcf176d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-28T14:54:58Z

+making `RegisterRoute` **pre-create one slot per known label** at
+registration time, but only on the individual-tracking path. The
+label set is the canonical `keyviz/labels.go` constants (§9 Q2);
+register-time slot count is `len(routes) × len(labels) + 1` (the
+1 is the empty-label legacy slot, kept so callers that pass


Avoid shrinking route budget before labels are enabled

The proposal pre-creates one slot per known label for every registered route, which makes MaxTrackedRoutes effectively count (labels+legacy) slots rather than routes. In PR-B, Sampler.Observe callers are explicitly still passing label = "" (so labels are not yet active), but this pre-allocation would still force earlier virtual-bucket coarsening and loss of per-route detail in existing deployments, contradicting the stated legacy-preserving rollout. This should be gated so unlabeled PR-B behavior keeps current route capacity.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-28T14:54:58Z

+|---|---|
+| **PR-A** | Land this design doc. |
+| **PR-B** | Sampler API extension: `Observe(... label string)`, `routeSlot.Label`, `RegisterRoute` pre-creates one slot per known label (§4.1.1). `MatrixRow.Label`. `reclaimRetiredSlot` dedupes on `(RouteID, Label)`. Update existing tests **and the coordinator call sites in `kv/sharded_coordinator.go`** (the only non-test caller of `Sampler.Observe`) to pass `label = ""` — until PR-C lands the legacy unlabeled behaviour is preserved. |
+| **PR-C** | Adapter wiring: each adapter sets its own label at the dispatch entry into `ShardedCoordinator.Observe…`. Canonical constants live in `keyviz/labels.go` (NOT `adapter/keyviz_labels.go` — sampler-side imports must not climb back to the adapter package; sampler's `RegisterRoute` reads the canonical set at registration time). **Operator note**: with N labels, slot count grows from M routes to ~N×M; raise `--keyvizMaxTrackedRoutes` proportionally or hot routes will fold into the virtual bucket and lose per-label breakdown. PR-C emits a structured `slog.Warn` from inside `RegisterRoute` at the moment a route is coarsened — `slog.Warn("route folded into virtual bucket", route_id=…, slots_used=…, max_tracked_routes=…)` — so the operator sees the symptom in the log when it actually fires. A startup-time check would not catch this because routes are added dynamically (the catalog watcher calls `RegisterRoute` whenever a `SplitRange` lands), so a process that started below the cap can cross it later. **Rollout discipline**: raise `--keyvizMaxTrackedRoutes` to at least `current_routes × len(labels)` *before* deploying the PR-C binary, otherwise an already-misconfigured node prints a wave of warnings on startup as the watcher re-registers every catalog route — loud but not incorrect, only painful for log retention. |


Size max-tracked-routes using labels plus legacy slot

The rollout guidance underestimates required capacity by using current_routes × len(labels). This document also keeps an empty-label legacy slot per route, so required individual slots are current_routes × (len(labels)+1). Operators following the current formula can still hit coarsening immediately and lose labeled breakdown despite applying the documented migration step.

Useful? React with 👍 / 👎.

claude · 2026-04-28T14:55:27Z

Claude finished @bootjp's task in 5m 42s —— View job

Review (Round 5)

Gather context
Read updated design document (2750f4c)
Check KeyViz implementation for remaining gaps
Post comprehensive review

All eight Round-4 items are correctly resolved. Four new items surfaced while tracing the full slot lifecycle through keyviz_handler.go and mapping the AllLabels timing to the PR plan. Three are moderate; one is minor.

Moderate — `newKeyVizRowFrom` and `bucketIDFor` are both missing from the PR-D+E scope

The design calls out that mergeRowInto (keyviz_fanout.go:509) must copy Label explicitly. The same gap exists in the single-node path, and the design never mentions it:

newKeyVizRowFrom (keyviz_handler.go:368) constructs KeyVizRow from MatrixRow. The Label field will exist on MatrixRow after PR-B but newKeyVizRowFrom does not copy it to KeyVizRow.Label. Single-node heatmap responses will emit Label="" for every row, exactly like the mergeRowInto regression for the fan-out path.
bucketIDFor (keyviz_handler.go:383) produces "route:" + routeID. It does not consult mr.Label. After PR-D+E, when MatrixRow.Label = "dynamo", bucketIDFor must return "route:1:dynamo". Without this change BucketID is non-unique across labeled rows — which is the invariant applyKeyVizRowBudget, sortKeyVizRowsByStart, and mergeKeyVizMatrices all depend on.
The pivotKeyVizColumns map + order slice (keyviz_handler.go:312–321) are map[uint64]*KeyVizRow and []uint64, respectively. Widening to (uint64, string) as the design says — or equivalently keying by the composite BucketID string — must cover both the map and the order slice. An implementor who widens only the map and forgets order will get a compile error; worth naming explicitly.

The PR-D+E scope currently reads: "update mergeRowInto (keyviz_fanout.go:509) … dst.Label = row.Label." It should also add: "update newKeyVizRowFrom to copy mr.Label → row.Label, update bucketIDFor to return the composite "route:<id>:<label>" format when Label is non-empty, and widen pivotKeyVizColumns's rowsByID map and order slice from uint64 to (uint64, string) (or equivalently to the composite BucketID string, consistent with rowsByBucket in the fan-out path)."

Fix this →

Moderate — `AllLabels` population timing is unspecified; slot-count jump happens at wrong PR

The design says PR-B "pre-creates one slot per known label" and PR-C "Canonical constants live in keyviz/labels.go." This creates an ambiguity: which PR populates AllLabels with the five adapter constants?

If PR-B populates AllLabels (i.e., adds LabelDynamo, LabelRedis, etc. alongside the mechanism), then the slot count jumps from 1 to 6 per route at PR-B rollout — before any adapter is wired. Operators deploying PR-B would need to raise --keyvizMaxTrackedRoutes immediately, and the slog.Warn coarsening path fires at PR-B, not PR-C. The operator note placed at §7 PR-C would be in the wrong section. (This is the unresolved Codex P1 concern from the round-4 review.)
If PR-B introduces keyviz/labels.go with an empty AllLabels slice (constants added in PR-C), then PR-B has zero capacity impact — the pre-creation loop is a no-op, RegisterRoute creates exactly one slot per route as today, and the operator note at PR-C is correctly placed.

The second interpretation is almost certainly the intent, but the design never states it. An implementor who reads "Canonical constants live in keyviz/labels.go" under PR-C might correctly defer adding the constants; or might add them in PR-B "since that's where the file is created." One sentence in §7 PR-B closes the ambiguity: "PR-B creates keyviz/labels.go with an empty AllLabels slice and the struct definitions — the five adapter constants (LabelDynamo, LabelRedis, …) are added in PR-C alongside the adapter wiring."

Fix this →

Moderate — `MaxTrackedRoutes` coarsening check counts slots after key widening, not routes

RegisterRoute at keyviz/sampler.go:416:

if len(next.slots) < s.opts.MaxTrackedRoutes {

Today next.slots is map[uint64]*routeSlot — one entry per route. After PR-B widens it to map[slotKey]*routeSlot, len(next.slots) counts labeled slots, not routes: 6 entries per route (5 labels + 1 legacy). An operator who set --keyvizMaxTrackedRoutes=1024 expecting 1024 individually-tracked routes will instead see coarsening after 1024/6 ≈ 170 routes. Their existing setting now means roughly 6× fewer routes than before without any config change.

The design needs to pick one of two approaches and state it explicitly:

Option A (recommended for backward compat): The check counts routes, not slots. Change it to len(next.slots) / (len(keyviz.AllLabels)+1) < s.opts.MaxTrackedRoutes (or count distinct RouteIDs). MaxTrackedRoutes retains its "maximum individually-tracked routes" semantics. Total slot memory is MaxTrackedRoutes × (len(AllLabels)+1).
Option B (semantics change): MaxTrackedRoutes is redefined as "maximum labeled slots." Operators must multiply their existing setting by len(AllLabels)+1 on PR-C rollout. The documentation note must state this explicitly.

Without choosing, the PR-C operator guidance is incomplete: operators cannot correctly compute the new cap value without knowing which thing is being counted.

Fix this →

Minor — Codex P2 (unresolved): operator formula underestimates by one

§7 PR-C says: "raise --keyvizMaxTrackedRoutes to at least current_routes × len(labels)."

The design keeps one legacy empty-label slot per route alongside the N labeled slots. The actual slot count per route is len(AllLabels) + 1 (5 labels + 1 legacy = 6). The formula must be current_routes × (len(AllLabels) + 1). At 5 labels this is 20% more than the current formula suggests — enough for an operator following the guide to still hit coarsening immediately on rollout.

(This is the Codex P2 concern from round-4; it was not addressed in 2750f4c.)

What looks good

All round-4 items verified correct against source: proto field 4 reuse (confirmed: string label = 4; at proto/admin.proto:95), TestAllLabelsAvoidSeparator in §8 lens-5, RemoveRoute symmetric teardown pseudocode with allLabelsWithLegacy(), mergeRowInto Label copy callout, RegisterRoute idempotency check (slotKey{RouteID: routeID} zero-value form), pseudocode comment fix ("coarsened route: label irrelevant, aggregate all traffic"), text code fence, and spelling normalized throughout.
The keyviz.AllLabels slice as the single stable iteration surface for RegisterRoute's pre-creation loop is confirmed correct — with an empty AllLabels in PR-B, the loop is a no-op and the existing single slot per route is created as before.
§8 lens-5 test cases are well-targeted; the RemoveRoute N+1 teardown and reclaimRetiredSlot (RouteID, Label) dedupe tests are the right regression guards for PR-B.
The mergeRowInto Label copy is called out for the fan-out path. The three missing single-node-path counterparts (newKeyVizRowFrom, bucketIDFor, pivotKeyVizColumns) are the only remaining blocker for PR-D+E correctness.

The newKeyVizRowFrom / bucketIDFor / MaxTrackedRoutes items need to be addressed before PR-B review (the MaxTrackedRoutes check is PR-B scope); the AllLabels timing and operator formula can also be fixed in the same doc update. Design is close to PR-A merge; these four items are the remaining delta.

Two items from Codex round-5 review: - P1 / line 125 / pre-allocation regression: the earlier draft said RegisterRoute pre-creates one slot per label at PR-B. That would shrink the effective route budget from M to ~M / (N+1) even though no adapter is yet calling with non-empty labels - PR-B is supposed to be a behavior-neutral refactor. Resolution: split the pre-allocation across PR-B and PR-C. PR-B only creates the legacy empty-label slot (per-route slot count unchanged at 1); PR-C extends RegisterRoute to pre-create the labeled siblings. PR-B is now strictly route-count-neutral on existing deployments. Pseudocode added showing both bodies. - P2 / line 412 / rollout formula off by one per route: rollout guidance said current_routes x len(labels), but operators must size for the legacy empty-label slot too: current_routes x (len(labels) + 1). Operators sizing only by len(labels) hit coarsening immediately and lose labeled breakdown anyway. Updated the rollout note to use the +1 form. Plus 7 PR-B / PR-C scope rows updated to reflect the split pre-allocation strategy so PR-B implementors do not accidentally implement the full pre-creation loop and ship a regression.

bootjp · 2026-04-28T14:59:36Z

@claude review

Round-5 follow-ups (3ae3417) addressing Codex P1 + P2:

P1 / pre-allocation regression — the earlier draft would have shrunk the effective route budget from M to ~M/(N+1) at PR-B even though no adapter yet calls with non-empty labels. PR-B is supposed to be a behavior-neutral refactor. Resolution: split the pre-allocation:
- PR-B: only the legacy empty-label slot (per-route slot count unchanged at 1).
- PR-C: extends RegisterRoute to pre-create the labeled siblings.
  Added pseudocode showing both bodies. PR-B is now strictly route-count-neutral on existing deployments.
P2 / rollout formula off by one — rollout guidance was current_routes × len(labels). Should be current_routes × (len(labels) + 1) to account for the legacy empty-label slot. Operators sizing only by × len(labels) hit coarsening immediately and lose labeled breakdown anyway.

§7 PR-B / PR-C scope rows updated to match the split, so a PR-B implementor doesn't accidentally land the full pre-creation loop and ship a regression. Per-route slot count documented at all 3 stages: 1 (today / PR-B), then len(labels) + 1 (PR-C onwards).

claude · 2026-04-28T14:59:56Z

Claude finished @bootjp's task in 4m 23s —— View job

Review (Round 6)

Gather context
Read updated design document (3ae3417)
Check KeyViz implementation for remaining gaps
Post comprehensive review

Both Codex P1 and P2 are correctly resolved. The pre-allocation split (PR-B creates only the legacy slot; PR-C creates labeled siblings) is coherent, and the rollout formula current_routes × (len(labels) + 1) is accurate. Two issues surface from the P1 split itself, plus one carry-over from Round 5 that the 3ae3417 commit did not address.

Moderate — `newKeyVizRowFrom` missing `Label` copy still absent from PR-D+E scope (carry-over from Round 5)

Round 5 found that newKeyVizRowFrom (keyviz_handler.go:368) constructs KeyVizRow from scratch without copying Label — the same problem as mergeRowInto, but affecting single-node responses before the fan-out is ever involved. The current §7 PR-D+E scope only names mergeRowInto:

"PR-D+E must add the explicit dst.Label = row.Label copy…"

newKeyVizRowFrom is the first-level fix (per-node JSON handler); mergeRowInto is the second-level fix (fan-out path). If newKeyVizRowFrom is missed, single-node deployments silently emit Label="" for every row even after PR-D+E ships. The PR-D+E scope bullet should read:

"update both newKeyVizRowFrom (keyviz_handler.go:368) — per-node handler, affects single-node and cluster deployments — and mergeRowInto (keyviz_fanout.go:509) — fan-out aggregator, affects cluster-only deployments — to copy Label from the source row."

Fix this →

Moderate — `RemoveRoute` teardown uses `allLabelsWithLegacy()` in the "PR-B fix" but `AllLabels` doesn't exist until PR-C

§4.1.1 says:

"The PR-B fix loops over AllLabels plus the empty legacy label and retires N+1 slots in one pass"

and shows pseudocode calling allLabelsWithLegacy() (which ranges over AllLabels). But §7 places keyviz/labels.go in PR-C scope: "Canonical constants live in keyviz/labels.go (NOT adapter/keyviz_labels.go)". AllLabels does not exist in PR-B; allLabelsWithLegacy() would fail to compile.

The cleanest resolution is to add keyviz/labels.go creation to PR-B scope (define the constants before any adapter uses them — purely infrastructure, not adapter coupling). The PR-B RemoveRoute loop then compiles correctly, the slot != nil guard makes it safe even though PR-B only creates the legacy slot, and PR-C just wires the adapters to those already-existing constants.

One sentence addition to §7 PR-B: "Create keyviz/labels.go with the full canonical constant set (LabelDynamo, …, AllLabels []string) — adapters don't use them until PR-C, but RemoveRoute's defensive teardown loop references AllLabels at compile time." One sentence update to §7 PR-C: "Adapters wire the constants from the keyviz/labels.go created in PR-B."

Fix this →

Minor — §8 lens-5 test descriptions still say "PR-B regression" after the P1 split

Two test descriptions are now stale:

"Slot pre-creation count": "Catches a PR-B regression that skips a label constant in the pre-creation loop." After P1, PR-B has no labeled pre-creation loop. The assertion len(tbl.slots) == len(labels) + 1 only holds post-PR-C. This should say "PR-C regression".
"RemoveRoute symmetric teardown": "Catches a PR-B regression where only the legacy slot is retired and the labeled siblings leak." After P1, PR-B only ever creates 1 slot (the legacy slot), so the first PR-B test should assert len(retiredSlots) == 1, not len(labels) + 1. The len(labels) + 1 assertion is the post-PR-C regression guard. Both descriptions should say "PR-C regression" and clarify which phase the len(labels) + 1 assertion applies to.

Minor — "Labelled" (double-l) at §5 line 359 still present (carry-over from Round 5)

One survivor from the Round-5 spelling sweep:

- Labelled: `bucket_id = "route:<id>:<label>"` (e.g.

All other instances use "Labeled". One-character fix.

Fix this →

What looks good

The P1 split (PR-B = behavior-neutral refactor, PR-C = labeled-sibling creation) is correctly described in the RegisterRoute pseudocode and in the §7 table. The comment "(Codex P1 on round-5.)" makes the motivation traceable.
The P2 correction × (len(labels) + 1) is now in both the §7 PR-C rollout note and the §4.1.1 slot-count formula, consistently.
The RemoveRoute loop pseudocode itself is correct (the slot != nil guard makes it safe even when labeled siblings don't exist), regardless of the AllLabels sourcing issue.
virtualForRoute stays map[uint64]*routeSlot confirmed correct against sampler.go:219.
The proto field-4 reuse (no duplicate label field) is verified against proto/admin.proto — string label = 4 exists unused; no conflict.
The mergeRowInto keyed-on-BucketID separation is confirmed correct against keyviz_fanout.go:518.

The two moderate items (carry-over newKeyVizRowFrom gap and new keyviz/labels.go scope ambiguity) are the remaining blockers before PR-B is fully well-specified. The minor items can be folded into the PR-B doc commit.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3ae3417d6f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-28T15:02:29Z

+     empty-label slot). Catches a PR-B regression that skips a
+     label constant in the pre-creation loop — without this the
+     missing slot only surfaces when that adapter fires traffic.


Gate pre-creation count test to PR-C, not PR-B

Section 7 defines PR-B as a behavior-neutral refactor that pre-creates only the legacy empty-label slot, but this test bullet says a PR-B regression is failing to create len(labels)+1 slots. Following this guidance during PR-B would force labeled-slot preallocation early and can reintroduce route-capacity/coarsening regressions before adapters emit labels. The assertion should be moved to PR-C (or explicitly phase-gated), with PR-B expecting a single legacy slot.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-28T15:02:29Z

+This is what makes PR-D and PR-E shippable in either order
+(see §7) — the wire-format change in PR-D *is* the merge-key
+change.


Remove claim that PR-D and PR-E can ship independently

This paragraph says PR-D and PR-E are shippable in either order, but the implementation plan later requires shipping them together because otherwise labeled rows are collapsed by the current pivot keying. Keeping both statements creates contradictory rollout instructions and increases the risk of a partial deployment that silently loses per-label breakdown in responses.

Useful? React with 👍 / 👎.

Three moderate items from Claude bot's round-5 review (the long one that scanned the actual single-node path code): - 7 PR-D+E single-node-path completeness: the previous draft only called out mergeRowInto in keyviz_fanout.go but missed three companion changes in keyviz_handler.go that PR-D+E must make. Documented all four: (a) pivotKeyVizColumns rowsByID map AND order slice widen to composite BucketID string (or equivalent (uint64, string)); (b) newKeyVizRowFrom copies mr.Label -> row.Label; (c) bucketIDFor returns composite "route:<id>:<label>" when Label is non-empty; (d) mergeRowInto adds dst.Label = row.Label. Splitting any subset leaves a window where labeled rows collapse silently. - 4.1.1 MaxTrackedRoutes semantics: pick Option A (count routes, not slots). After key widening, len(next.slots) counts labeled slots, so the coarsening check must divide by len(AllLabels)+1. This preserves operator-visible meaning of the existing flag - 1024 still means 1024 routes, not ~170. Pseudocode added at the actual line site (sampler.go:416). Option B (redefine as slots) explicitly rejected on the behavior-preserving goal in 2.1. - 7 PR-B AllLabels timing made explicit: PR-B creates keyviz/labels.go with the slotKey struct + an EMPTY AllLabels slice; the five adapter constants are added in PR-C alongside adapter wiring. PR-B's pre-creation loop is a no-op against the empty slice, so per-route slot count is identical to today (1). Closes the ambiguity an implementor might hit about "where do the constants land".

bootjp · 2026-04-28T15:04:48Z

@claude review

Round-5 follow-ups (3793366) addressing the three Claude bot moderate items from the long round-5 review on 2750f4c2:

§7 PR-D+E single-node-path completeness — was only calling out mergeRowInto (keyviz_fanout.go:509); now lists all four required changes: (a) pivotKeyVizColumns rowsByID map AND order slice widen to composite BucketID, (b) newKeyVizRowFrom copies mr.Label → row.Label, (c) bucketIDFor returns "route:<id>:<label>" when label non-empty, (d) mergeRowInto adds dst.Label = row.Label. Splitting any subset leaves a window where labeled rows silently collapse.
§4.1.1 MaxTrackedRoutes semantics — picked Option A (count routes, not slots). The coarsening check at sampler.go:416 divides by len(AllLabels)+1 so --keyvizMaxTrackedRoutes=1024 still means 1024 routes after PR-C deploys, not ~170. Pseudocode added at the actual line site. Option B explicitly rejected on the behavior-preserving goal in §2.1.
§7 PR-B AllLabels timing — explicit: PR-B creates keyviz/labels.go with the slotKey struct + an empty AllLabels slice; the five adapter constants are added in PR-C alongside the adapter wiring. PR-B's pre-creation loop is a no-op against the empty slice, so per-route slot count is identical to today (1). Closes the implementor ambiguity.

Round-6 was running concurrently against 3ae3417d — that review may surface additional items on the post-round-5 state; will fold those in next round if any.

claude · 2026-04-28T15:05:14Z

Claude finished @bootjp's task in 3m 30s —— View job

Review (Round 7)

Gather context
Read updated design document (3793366)
Check KeyViz implementation for remaining gaps
Post comprehensive review

All three Round-6 Claude bot moderate items are correctly resolved. The §7 PR-D+E scope now comprehensively lists all four required single-node-path changes (a–d), the MaxTrackedRoutes Option A pseudocode is correct, and the AllLabels timing (empty in PR-B, constants in PR-C) closes the compile-time ambiguity that would have sent implementors in different directions. One moderate contradiction and two minor carry-overs remain.

Moderate — §6 "shippable in either order" contradicts §7 "Ship together"

§6 at line 449–451:

"This is what makes PR-D and PR-E shippable in either order (see §7) — the wire-format change in PR-D is the merge-key change."

§7:

"PR-D+E — Ship together."

These two statements are contradictory. §6 still claims PR-D and PR-E can ship independently (because the composite BucketID encodes the label, so mergeKeyVizMatrices already separates rows correctly). That's true for the fan-out aggregator path but ignores the four single-node-path changes in §7 PR-D+E scope — specifically (a) pivotKeyVizColumns pivot-key widening and (c) bucketIDFor producing the composite ID. Without those two, a single-node deployment still collapses labeled rows at the handler level regardless of what the aggregator does.

§6's claim is correct in the narrow sense that "the aggregator merge key doesn't need a separate change — BucketID is sufficient." The sentence should be scoped to that specific claim: "The merge key width does not change; the new label dimension is encoded into BucketID so mergeKeyVizMatrices already separates same-route different-label rows correctly once bucketIDFor emits composite IDs (PR-D+E)." Drop the "shippable in either order" phrase, which implies PR-D and PR-E are separately deployable — they're not.

This is the Codex P2 item from the round-6 review on 3ae3417d that the trigger commit did not address.

Fix this →

Minor — §8 lens-5: two test descriptions say "PR-B regression" but apply to PR-C

After the P1 split (PR-B = behavior-neutral, PR-C = labeled-sibling creation), two test descriptions in §8 lens-5 are stale:

"Slot pre-creation count" (line 506–508): len(tbl.slots) == len(labels) + 1. This assertion is false in PR-B — there are no labeled siblings yet, so len(tbl.slots) == 1. The comment "Catches a PR-B regression that skips a label constant in the pre-creation loop" should be "Catches a PR-C regression." Verified in the current sampler.go: RegisterRoute creates exactly one slot today; after PR-B it still creates one (the legacy slot); after PR-C it creates len(AllLabels)+1. The == len(labels)+1 assertion only holds post-PR-C.
"RemoveRoute symmetric teardown" (line 511–513): len(retiredSlots) == len(labels) + 1. Same problem — in PR-B there are no labeled siblings to retire, so len(retiredSlots) == 1 is the PR-B assertion; len(labels)+1 is the PR-C assertion. The comment "Catches a PR-B regression where only the legacy slot is retired and the labeled siblings leak" is misleading because labeled siblings don't exist until PR-C. Should say "PR-C regression."

Both descriptions should be updated to clearly say "PR-C regression," and the slot-count assertion should note the PR-phase dependency: "assert len(tbl.slots) == 1 post-PR-B; len(tbl.slots) == len(labels) + 1 post-PR-C." This is the Codex P2 item from the round-6 review on 3ae3417d.

Minor — "Labelled" at line 389 survives again

grep -nE 'Labelled' finds one survivor at line 389:

- Labelled: `bucket_id = "route:<id>:<label>"` (e.g.

Every other occurrence in the document uses "Labeled". This is the same spelling carry-over flagged in Rounds 5 and 6.

Fix this →

What looks good

All three Round-6 Claude bot moderate items verified correct against source:

§7 PR-D+E scope now names all four required changes: pivotKeyVizColumns map+slice widening (a), newKeyVizRowFrom Label copy (b), bucketIDFor composite format (c), mergeRowInto Label copy (d). Verified against keyviz_handler.go:312–388 and keyviz_fanout.go:509–531 — both confirmed as gaps in current code that the design now explicitly calls out.
MaxTrackedRoutes Option A (count routes, not slots): the pseudocode at sampler.go:416 divides len(next.slots) by slotsPerRoute before comparing against MaxTrackedRoutes. Integer arithmetic is correct: len(next.slots)/slotsPerRoute < MaxTrackedRoutes returns false when the slot count reaches capacity, triggering coarsening. With AllLabels empty in PR-B, slotsPerRoute = 1 and behavior is identical to the current check.
AllLabels timing: PR-B creates keyviz/labels.go with empty slice; PR-C adds the five constants. This makes RemoveRoute's allLabelsWithLegacy() call compile in PR-B (iterates over empty AllLabels plus ""), with the slot != nil guard making the loop safe (always true for the legacy slot). The PR-B pre-creation loop is a provable no-op against empty AllLabels.
virtualForRoute stays map[uint64]*routeSlot — confirmed at keyviz/sampler.go:219. The two-stage Observe lookup (labeled slot first, virtual fallback second) is correctly described with pseudocode.
reclaimRetiredSlot at sampler.go:710–724 currently matches on r.slot.RouteID == routeID only — correctly identified as a PR-B target for widening to (RouteID, Label).
Proto field 4 reuse confirmed: proto/admin.proto line with string label = 4; on KeyVizRow is already declared, avoiding the field-13 collision from the earlier draft.
mergeRowInto Label copy gap confirmed in actual code at keyviz_fanout.go:520–531 — dst is constructed with no Label field. Design correctly captures this in PR-D+E scope.

The §6/§7 contradiction is the only remaining item that could create an incorrect mental model for an implementor reading the design for the first time. The two minor items can be addressed in the same commit. Design is ready to merge after these three fixes.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3793366704

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-28T15:06:45Z

+|---|---|
+| **PR-A** | Land this design doc. |
+| **PR-B** | Sampler API extension: `Observe(... label string)`, `routeSlot.Label`, slot-key type widens to `slotKey{uint64, string}`. **`RegisterRoute` only pre-creates the legacy empty-label slot** — labeled-sibling pre-creation lands in PR-C so PR-B is a behavior-neutral refactor (per-route slot count is unchanged at 1). PR-B creates `keyviz/labels.go` with the `slotKey` struct + an **empty `AllLabels` slice**; the five adapter constants (`LabelDynamo`, `LabelRedis`, …) are added in PR-C alongside the adapter wiring, so PR-B's pre-creation loop is a no-op against the empty slice and the existing single slot per route is created exactly as today. `MatrixRow.Label` field present but always empty. `reclaimRetiredSlot` dedupes on `(RouteID, Label)` (forward-prep; only `Label=""` exists). The `MaxTrackedRoutes` coarsening check (`keyviz/sampler.go:416`) divides by `len(AllLabels)+1` so the cap continues to count **routes**, not slots; with `AllLabels` empty in PR-B the divisor is 1 and behavior is identical to today. Update existing tests **and the coordinator call sites in `kv/sharded_coordinator.go`** (the only non-test caller of `Sampler.Observe`) to pass `label = ""`. |
+| **PR-C** | Adapter wiring: each adapter sets its own label at the dispatch entry into `ShardedCoordinator.Observe…`. Extends `RegisterRoute` to pre-create the labeled siblings (one slot per `AllLabels` member) alongside the legacy empty-label slot from PR-B. Canonical constants live in `keyviz/labels.go` (NOT `adapter/keyviz_labels.go` — sampler-side imports must not climb back to the adapter package; sampler's `RegisterRoute` reads the canonical set at registration time). **Operator note**: with N labels, slot count grows from M routes to ~N×M; raise `--keyvizMaxTrackedRoutes` proportionally or hot routes will fold into the virtual bucket and lose per-label breakdown. PR-C emits a structured `slog.Warn` from inside `RegisterRoute` at the moment a route is coarsened — `slog.Warn("route folded into virtual bucket", route_id=…, slots_used=…, max_tracked_routes=…)` — so the operator sees the symptom in the log when it actually fires. A startup-time check would not catch this because routes are added dynamically (the catalog watcher calls `RegisterRoute` whenever a `SplitRange` lands), so a process that started below the cap can cross it later. **Rollout discipline**: raise `--keyvizMaxTrackedRoutes` to at least `current_routes × (len(labels) + 1)` *before* deploying the PR-C binary — the `+ 1` accounts for the legacy empty-label slot that lives alongside the N labeled siblings on every individually-tracked route. Operators sizing only with `× len(labels)` (one short per route) hit coarsening immediately and lose labeled breakdown anyway. (Codex P2 on round-5.) Otherwise an already-misconfigured node prints a wave of warnings on startup as the watcher re-registers every catalog route — loud but not incorrect, only painful for log retention. |


Remove PR-C instruction to scale MaxTrackedRoutes by labels

Section 4.1.1 explicitly redefines the coarsening check to len(next.slots) / (len(AllLabels)+1), so MaxTrackedRoutes continues to cap routes rather than slots, but the PR-C rollout note still says operators must raise --keyvizMaxTrackedRoutes proportionally or routes will fold into virtual buckets. Those two statements cannot both be true; keeping this guidance will push operators to apply unnecessary cap increases (and memory growth) for a rollout that is documented as behavior-preserving on route capacity.

Useful? React with 👍 / 👎.

Two minor items from CodeRabbit's review on `c7637537`: - §4.1.1 line 273 sampler.go snippet `len(keyviz.AllLabels)` was a real bug — the snippet is in `keyviz/sampler.go` and references the package-local `AllLabels`, so the `keyviz.` qualifier is non-idiomatic and would not compile if copy-pasted into the package. Fixed to `len(AllLabels)` with an inline note clarifying the in-package context. - §8 line 535 `#694.)` at line start trips markdownlint MD018 (no-missing-space-atx — looks like an H1 with no space). Wrapped the PR reference in backticks (`#694`) and joined the two lines so the hash is no longer at line start.

bootjp · 2026-04-29T00:37:25Z

@claude review

Round-23 follow-ups (e5c7e39d) addressing CodeRabbit's minor review on c7637537. Both items addressed:

§4.1.1 / line 273 len(keyviz.AllLabels) in sampler.go snippet — real bug: the snippet is in keyviz/sampler.go referencing the package-local AllLabels, so the keyviz. qualifier is non-idiomatic and wouldn't compile if copy-pasted into the package. Fixed to len(AllLabels) with an inline note clarifying the in-package context.
§8 / line 535 #694.) at line start trips markdownlint MD018 (no-missing-space-atx — looks like an H1 with no space). Wrapped the PR reference in backticks (`#694`) and joined the two lines so the hash is no longer at line start.

claude · 2026-04-29T00:37:40Z

Claude encountered an error —— View job

I'll analyze this and get back to you.

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

Copilot · 2026-04-29T01:05:58Z

+ //
+ // Reuse the **existing** field 4 (`string label`), which is
+ // already declared on KeyVizRow but currently unused and
+ // uncommented — no schema migration, no field-number bump.


In the proto discussion, "unused and uncommented" reads like the field is commented out. Since the intent is that label = 4 exists but lacks a doc comment, consider changing this to "undocumented" / "has no doc comment" to avoid ambiguity.

Suggested change

// uncommented — no schema migration, no field-number bump.

// undocumented — no schema migration, no field-number bump.

Copilot · 2026-04-29T01:05:59Z

+slot only — same as today's baseline). The pre-allocation cost is
+gated on the flag, so a cluster that deploys the PR-C+D+E binary
+but leaves the flag off pays no memory penalty (Codex round-15-


The doc says leaving --keyvizLabelsEnabled=false "pays no memory penalty". Even with the flag off, widening the slot map key from uint64 to slotKey{uint64, Label} increases per-entry map key storage (string header in Label) compared to today. Consider rewording to clarify that the flag avoids the multiplicative (N-label) slot count increase, but there is still some baseline per-slot overhead after PR-B due to the wider key type.

Suggested change

slot only — same as today's baseline). The pre-allocation cost is

gated on the flag, so a cluster that deploys the PR-C+D+E binary

but leaves the flag off pays no memory penalty (Codex round-15-

slot only). The pre-allocation cost is gated on the flag, so a

cluster that deploys the PR-C+D+E binary but leaves the flag off

avoids the **multiplicative** `len(AllLabels)` slot-count increase,

though there is still a small baseline per-slot overhead versus

today from the wider `{uint64, Label}` map key (Codex round-15-

Copilot · 2026-04-29T01:05:59Z

+   ```go
+   type Label string
+   const (
+       LabelLegacy Label = ""        // legacy / unlabeled traffic, coordinator default
+       LabelDynamo Label = "dynamo"
+       LabelRedis  Label = "redis"
+       LabelS3     Label = "s3"
+       LabelSQS    Label = "sqs"
+       LabelRawKV  Label = "rawkv"
+   )
+   var AllLabels = []Label{LabelDynamo, LabelRedis, LabelS3, LabelSQS, LabelRawKV}
+   ```


AllLabels is proposed as an exported var slice. Because slices are mutable, any importing package could accidentally reorder/append/modify it, which would break slot pre-creation and could cause silent sample drops. Consider making the canonical label set unexported (or exposing it via a function that returns a copy / read-only view) so only package keyviz can mutate it.

Three real items from Copilot's 8th review pass on `e5c7e39d`: - §4.1 line 137 "no memory penalty" overstatement: even with --keyvizLabelsEnabled=false, slot map key widens from uint64 to slotKey{uint64, Label} so per-slot key storage grows. Reframed: the flag avoids the *multiplicative* len(AllLabels)+1 slot count increase (the dominant cost), but each slot's wider key adds a small per-slot overhead vs today even with the flag off. - §5 line 456 "unused and uncommented" reads as "commented-out" code. Changed to "unused and undocumented (no doc comment on the field)" for clarity. - §9 Q2 line 742 AllLabels was `var []Label{...}` (exported mutable slice) — importing packages could append/reorder/zero it and silently break slot pre-creation. Switched to fixed-size array `[5]Label{...}` so any mutation attempt becomes a compile error. Required propagating the change: - allLabelsWithLegacy() now uses `copy(result, AllLabels[:])` (slice-of-array) since copy needs slice args. - §7 PR-B row updated to declare AllLabels as the populated array from PR-B, with behavior neutrality coming from the round-17 flag-gating (--keyvizLabelsEnabled=false default) instead of an empty-slice trick. - §8 TestAllLabelsAvoidSeparator description corrected from "AllLabels []Label is the typed slice" to "AllLabels is the typed [5]Label array". The PR-B "deliberately-empty AllLabels" framing from earlier rounds is now redundant (round-17's flag gate already makes PR-B behavior-neutral). Documented the sequencing change in §9 Q2 so the round-history is internally consistent.

bootjp · 2026-04-29T01:13:08Z

@claude review

Round-23 (Copilot 8th-pass) follow-ups (795c504f) on e5c7e39d. All three items addressed:

§4.1 / line 137 "no memory penalty" overstatement — even with --keyvizLabelsEnabled=false, the slot map key widens from uint64 to slotKey{uint64, Label} (string header), so per-slot key storage grows even with the flag off. Reframed: the flag avoids the multiplicative len(AllLabels)+1 slot count increase (the dominant cost), but each slot has a small per-slot overhead vs today from the wider key.
§5 / line 456 "unused and uncommented" ambiguous — "uncommented" reads like "commented-out code". Changed to "unused and undocumented (no doc comment on the field)".
§9 Q2 / line 742 AllLabels exported mutable slice footgun — importing packages could append/reorder/zero it and silently break slot pre-creation. Switched to fixed-size array [5]Label{...} so any mutation attempt is a compile error. Propagated:
- allLabelsWithLegacy() now uses copy(result, AllLabels[:]) (slice-of-array, since copy needs slice args)
- §7 PR-B row updated: AllLabels is populated from PR-B (no longer "deliberately-empty"), behavior neutrality comes from the round-17 flag-gating (--keyvizLabelsEnabled=false default) instead of an empty-slice trick
- §8 TestAllLabelsAvoidSeparator description corrected from "AllLabels []Label is the typed slice" to "AllLabels is the typed [5]Label array"

Documented the PR-B sequencing change in §9 Q2 so the round-history (round-6 empty-slice → round-17 flag-gating → round-23 array form) is internally consistent.

claude · 2026-04-29T01:13:25Z

Claude finished @bootjp's task in 6m 24s —— View job

Review (Round 24)

Gather context
Read updated design document (795c504)
Check KeyViz implementation for remaining gaps
Post comprehensive review

All three Round-23 items confirmed resolved in current HEAD:

§4.1 "no memory penalty" overstatement (lines 45–52, 136–143): now reads "minimal hot-path penalty…only the slightly wider hash and equality work…" with an explicit note that per-slot key overhead exists regardless of the flag. ✓
§5 "unused and uncommented" (lines 462–465): now "unused and undocumented (no doc comment on the field)". ✓
AllLabels mutable-slice footgun (line 760): switched to var AllLabels = [5]Label{LabelDynamo, LabelRedis, LabelS3, LabelSQS, LabelRawKV}. allLabelsWithLegacy() uses copy(result, AllLabels[:]) because copy needs slice arguments. §9 Q2 PR sequencing note added (lines 783–792). ✓

Two moderate issues surfaced from the round-23 array-form change that weren't propagated consistently through the document. Two minor issues complete the picture.

Moderate — Four stale "AllLabels empty in PR-B" references contradict the §7 PR-B row

Round-23 changed PR-B to ship keyviz/labels.go with a populated [5]Label array. The PR-B table row (line 614) correctly states this. But four other locations in §7 still use the old "empty slice in PR-B" framing:

1. §7 closing paragraph, line 619:

"PR-B widens types and prepares the slot machinery without changing observable behavior (legacy slots only, AllLabels empty, MaxTrackedRoutes divisor is 1)."

"AllLabels empty" is wrong — PR-B now ships with the populated [5]Label array. The reason the divisor is 1 in PR-B is that --keyvizLabelsEnabled defaults to false; len(AllLabels) is 5 from PR-B on. Should read "AllLabels fully populated (constant array from PR-B — flag default false keeps divisor at 1)."

2. §7 closing paragraph, line 621:

"The bundled PR-C+D+E then ships everything operator-visible — adapter wiring, AllLabels populated, all five wire-format copies…"

"AllLabels populated" was the right framing when PR-C filled the empty slice. With the round-23 array, AllLabels is populated from PR-B; PR-C+D+E wires the adapters to the existing constants, not populates the array. Should read "adapter wiring (constants already in AllLabels from PR-B)."

3. §7 PR-C+D+E row adapter-wiring description:

"Populate AllLabels in keyviz/labels.go (created in PR-B with an empty slice) with the five adapter constants (LabelDynamo, LabelRedis, LabelS3, LabelSQS, LabelRawKV)."

The parenthetical is the opposite of what round-23 introduced. Should read: "Wire each adapter to pass its canonical constant (LabelDynamo, etc.) at the ShardedCoordinator.Observe… dispatch entry — the constants are already declared in keyviz/labels.go from PR-B; no AllLabels modification is needed in PR-C+D+E."

4. §7 PR-B row (line 614), near end of the cell:

"with AllLabels empty in PR-B the divisor is 1 and behavior is identical to today."

After round-23, AllLabels has 5 elements from PR-B. The divisor is 1 because --keyvizLabelsEnabled defaults to false, not because AllLabels is empty. Should read "with the flag off in PR-B the divisor is 1 and behavior is identical to today."

An implementor reading these four contradictory statements cannot determine whether PR-B ships an empty or populated AllLabels — a critical implementation question.

Fix this →

Moderate — `TestAllLabelConstantsInAllLabels` skip guard is dead code with array form

§8 lens-5 line 700:

"Skip when len(AllLabels) == 0 (matching the avoid-: test's pre-PR-C no-op pattern)."

With var AllLabels = [5]Label{...} (populated from PR-B), len(AllLabels) is always 5 at compile time — it cannot be 0 at runtime. The t.Skip guard is dead code from PR-B onward; the test is live from the moment PR-B lands.

The TestAllLabelsAvoidSeparator description at line 684–686 correctly acknowledges "The test runs in PR-B against the populated array." The TestAllLabelConstantsInAllLabels description should do the same: remove the "Skip when len(AllLabels) == 0" language and instead say "(PR-B regression guard, written in PR-B — the populated array means this is a live test from PR-B; it guards against adding a constant to keyviz/labels.go but forgetting to update AllLabels)."

The practical effect of the dead skip guard: the test will always run in PR-B, which is correct behavior, but an implementor reading the description will expect it to skip and may be confused when it runs instead.

Fix this →

Minor — Mutation claim overstated: only `append` is prevented, not element-wise modification

§9 Q2 code comment (lines 751–752):

// Fixed-size array, not a slice — append/reorder/mutate by
// an importing package becomes a compile error

And prose at lines 768–769:

"append/reorder/mutation by an importing package would be a compile error"

In Go, var AllLabels = [5]Label{...} is addressable; an importing package can do keyviz.AllLabels[0] = keyviz.Label("hacked") and it compiles and runs. What the array type actually prevents is append(keyviz.AllLabels, extra) — because AllLabels is [5]Label, not []Label, so append is a type error. Element-wise modification and full reassignment (keyviz.AllLabels = [5]Label{...}) still compile.

The claim should be scoped to append: "The fixed-size array type prevents append-based extension — append(AllLabels, extra) is a type error since AllLabels is [5]Label, not []Label. Element reassignment and full-var reassignment are still possible but require deliberate intent and would be caught in code review."

The TestAllLabelConstantsInAllLabels regression guard is the real runtime enforcement: it verifies constants are present in AllLabels after any edit. The type system is a compile-time guard against the append pattern specifically.

Minor — `TestAllLabelsAvoidSeparator` still labeled "PR-C regression guard" after array change

Line 679:

"Canonical labels avoid : (PR-C regression guard, written in PR-B)"

With AllLabels = [5]Label{...} from PR-B, the test iterates 5 elements from PR-B onward — it is a live PR-B test, not a no-op until PR-C. Lines 684–686 correctly acknowledge this: "The test runs in PR-B against the populated array." But the parenthetical label "(PR-C regression guard, written in PR-B)" is now contradictory — calling it a "PR-C guard" implies it's inactive in PR-B.

One line change: "(PR-B regression guard — the array is populated from PR-B so the test is live immediately, preventing any future commit from changing a constant to a :-containing literal)"

What looks good

All Round-23 items verified against source:

allLabelsWithLegacy() using make([]Label, len(AllLabels), len(AllLabels)+1) then copy(result, AllLabels[:]) then append(result, LabelLegacy) is correct Go: length = len(AllLabels), capacity = len(AllLabels)+1, append yields length len(AllLabels)+1 with one allocation. The AllLabels[:] explicit re-slice is necessary because copy needs slice arguments ([5]Label is not a slice). ✓
§4.1 "minimal hot-path penalty" distinction (extra hash/equality work on wider key vs today's bare uint64, flag avoids multiplicative slot count) is accurate. ✓
The §9 Q2 PR-sequencing note (lines 783–792) correctly describes the round-6 empty-slice → round-17 flag-gating → round-23 array transition. ✓
appendDrainedRow L0 Label copy, gRPC items (e1)–(e3), all five PR-D+E items (a)–(e), RemoveRoute symmetric teardown, virtualForRoute stays map[uint64]*routeSlot — all verified against source as correctly described. ✓
pivotKeyVizColumns at keyviz_handler.go:312 confirmed map[uint64]*KeyVizRow (item (a) target). ✓
mergeRowInto at keyviz_fanout.go:520 confirmed constructs KeyVizRow without Label (item (d) target). ✓
matrixToProto at admin_grpc.go:603 confirmed map[uint64]*pb.KeyVizRow (item (e1) target). ✓

The two moderate items (four stale "AllLabels empty" references and the dead-code skip guard) are the final inconsistencies introduced by the round-23 array-form change. Fixing them makes the §7 table internally consistent. The two minor items can be addressed in the same commit. After those four fixes, design is ready to merge as PR-A.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 795c504f56

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-29T01:16:17Z

+behavior (legacy slots only, AllLabels empty, MaxTrackedRoutes
+divisor is 1). The bundled PR-C+D+E then ships everything


Remove stale PR-B assumption that AllLabels is empty

This section still says PR-B is behavior-neutral because AllLabels is empty, but §7 and §9 now define PR-B as creating a populated [5]Label catalog; that contradiction can lead implementers to apply the wrong coarsening math in PR-B. If someone follows the stale assumption and uses len(AllLabels)+1 as the divisor without the flag-based slotsPerRoute guard, MaxTrackedRoutes semantics drift and route tracking/memory behavior diverges from the intended legacy-preserving rollout.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-29T01:16:17Z

+   // Fixed-size array, not a slice — append/reorder/mutate by
+   // an importing package becomes a compile error, eliminating
+   // the silent slot-drop class of bugs that mutating the


Stop treating exported AllLabels array as immutable

The doc claims a fixed-size exported array makes append/reorder/mutation a compile error, but other packages can still mutate exported array elements (e.g., keyviz.AllLabels[i] = ...), so this does not enforce catalog immutability. Relying on this guarantee leaves a silent runtime path for label-catalog drift that can break slot pre-creation invariants and drop labeled samples; use an unexported catalog with a copy-returning accessor if immutability is required.

Useful? React with 👍 / 👎.

Round-24 finished review verdict on `795c504f`: "design is ready to merge as PR-A after these 4 fixes". All 4 items addressed: - §7 line 619 closing paragraph "AllLabels empty" was a stale reference: round-23 array form makes AllLabels populated from PR-B; the divisor=1 reason is the --keyvizLabelsEnabled=false default, not an empty array. Updated to "AllLabels populated [5]Label array... flag default false keeps divisor at 1". - §7 line 621 "AllLabels populated" was the right framing before round-23 (PR-C populated an empty slice). With the array, PR-C+D+E only wires adapters; constants are already declared. Updated to "adapter wiring (constants already declared in keyviz/labels.go from PR-B...)". - §7 PR-C+D+E adapter-wiring description "Populate AllLabels...(created in PR-B with an empty slice)" was the inverse of round-23. Updated to "constants already declared from PR-B as part of the [5]Label array; PR-C+D+E only wires adapters to pass them, no AllLabels modification needed". - §7 PR-B row "with AllLabels empty in PR-B the divisor is 1" updated to "with the flag default to false in PR-B the divisor is 1" (round-23 made AllLabels populated; the flag is what keeps the divisor at 1). - §7 PR-B row "Missing L0... after PR-C populates AllLabels" updated to "after PR-C+D+E wires the adapters to pass labels" (the populate-step doesn't exist anymore; adapter-wiring is the round-23-correct trigger). - §7 PR-B row "With PR-B's empty-only slots" updated to "With PR-B's flag-default-false (only legacy slots pre-created)". - §8 lens-5 TestAllLabelConstantsInAllLabels "Skip when len(AllLabels) == 0" was dead code — populated array has length 5 at compile time. Removed the skip; updated description to "PR-B regression guard — array is populated from PR-B so this is a live test from PR-B onward, not a no-op". - §8 lens-5 TestAllLabelsAvoidSeparator "PR-C regression guard, written in PR-B" was contradictory after the array change. Updated to "PR-B regression guard — the array is populated from PR-B so the test is live immediately". - §9 Q2 mutation claim was overstated: "append/reorder/ mutation by an importing package would be a compile error" — only `append` is type-blocked. Element reassignment (AllLabels[0] = ...) and full-var reassignment compile against the exported var. Scoped the claim to `append` specifically and noted that TestAllLabelConstantsInAllLabels is the runtime guard for the rest.

bootjp · 2026-04-29T01:23:41Z

@claude review

Round-24 follow-ups (09c1c04e) addressing the Round 24 finished review on 795c504f ("design is ready to merge as PR-A after these 4 fixes"). All 4 items addressed:

Moderate / 4 stale "AllLabels empty in PR-B" references in §7 — round-23 changed AllLabels to a populated [5]Label array from PR-B, but four locations still used the pre-round-23 "empty slice" framing:
- §7 closing line 619 "AllLabels empty" → "populated [5]Label array; flag default false keeps divisor at 1"
- §7 closing line 621 "AllLabels populated" → "adapter wiring (constants already declared in keyviz/labels.go from PR-B)"
- §7 PR-C+D+E adapter-wiring "Populate AllLabels...(created in PR-B with an empty slice)" → "constants already declared from PR-B as part of the [5]Label array; PR-C+D+E only wires adapters, no AllLabels modification needed"
- §7 PR-B row "with AllLabels empty in PR-B the divisor is 1" → "with the flag default to false in PR-B the divisor is 1"
- Plus 2 secondary fixes in PR-B row: "after PR-C populates AllLabels" → "after PR-C+D+E wires the adapters to pass labels"; "With PR-B's empty-only slots" → "With PR-B's flag-default-false (only legacy slots pre-created)"
Moderate / §8 lens-5 dead Skip when len(AllLabels) == 0 guard — with the populated [5]Label array, length is always 5 at compile time, so the skip is dead code. Removed; updated description to "PR-B regression guard — array is populated from PR-B so this is a live test from PR-B onward".
Minor / §9 Q2 mutation claim overstated — only append(keyviz.AllLabels, extra) is a type error; element reassignment (keyviz.AllLabels[0] = ...) and full-var reassignment compile against the exported var. Scoped the claim to append specifically; noted that TestAllLabelConstantsInAllLabels is the runtime guard for the rest.
Minor / §8 TestAllLabelsAvoidSeparator "PR-C regression guard" label contradictory after array change. Updated to "PR-B regression guard — the array is populated from PR-B so the test is live immediately".

claude · 2026-04-29T01:24:09Z

Claude finished @bootjp's task in 3m 39s —— View job

Review (Round 25)

Gather context
Read updated design document (09c1c04)
Check KeyViz implementation for remaining gaps
Post comprehensive review

All four Round-24 items confirmed resolved in current HEAD:

§7 closing paragraph: "AllLabels empty" → "populated [5]Label array but --keyvizLabelsEnabled defaults to false so RegisterRoute skips the labeled-sibling loop and the MaxTrackedRoutes divisor stays at 1" ✓
§7 closing paragraph: "AllLabels populated" → "constants are already declared in keyviz/labels.go from PR-B; PR-C+D+E only wires adapters to pass them at the dispatch entry, no change to AllLabels itself" ✓
§7 PR-C+D+E adapter-wiring: "Populate AllLabels...(created in PR-B with an empty slice)" → "constants already declared from PR-B as part of the [5]Label array; PR-C+D+E only wires the adapters, no AllLabels modification needed" ✓
§7 PR-B row: "with AllLabels empty in PR-B the divisor is 1" → "with the flag default to false in PR-B the divisor is 1" ✓
§8 lens-5 dead Skip when len(AllLabels) == 0: removed; TestAllLabelConstantsInAllLabels now described as "live test from PR-B onward" ✓
§9 Q2 mutation claim: scoped to append only; element-reassignment and full-var reassignment explicitly called out as still compiling ✓
TestAllLabelsAvoidSeparator label: corrected from "PR-C regression guard, written in PR-B" to "PR-B regression guard — the array is populated from PR-B so the test is live immediately" ✓

Three minor stale references remain in §8 lens-5 — the same class of "empty AllLabels" language fixed in §7 by this commit, but not propagated to the test descriptions. All are minor and can be folded into the PR-B doc commit.

Minor — Three stale "empty AllLabels" descriptions in §8 lens-5

The round-24 commit fixed four §7 references but the same stale framing survives in three §8 lens-5 test descriptions:

1. Line 659: "Slot pre-creation count (PR-C, post-AllLabels population)"

AllLabels is populated from PR-B (round-23 array form). "Post-AllLabels population" implies the PR-C+D+E test waits for AllLabels to be populated — it doesn't; AllLabels has been non-empty since PR-B. The trigger is --keyvizLabelsEnabled=true, not a population event. Should read: "Slot pre-creation count (PR-C+D+E, with --keyvizLabelsEnabled=true)".

2. Line 664–665: "PR-B counterpart (with empty AllLabels): register 1 route, assert len(tbl.slots) == 1"

Same stale framing. The PR-B counterpart assertion is len(tbl.slots) == 1 because --keyvizLabelsEnabled defaults to false, not because AllLabels is empty. Should read: "PR-B counterpart (with the flag off): register 1 route, assert len(tbl.slots) == 1".

3. Line 673: "PR-B counterpart (empty AllLabels): register 1 route, RemoveRoute(routeID), assert len(retiredSlots) == 1"

Same stale framing — retiredSlots == 1 in PR-B because only the legacy slot was pre-created (flag off), not because AllLabels is empty. Should read: "PR-B counterpart (with the flag off): register 1 route, RemoveRoute(routeID), assert len(retiredSlots) == 1".

All three are the same fix: replace "empty AllLabels" / "post-AllLabels population" with "with the flag off" / "with --keyvizLabelsEnabled=true".

Fix this →

Also minor — §7 PR-C+D+E opening sentence describes "AllLabels population" as PR-C scope

Line 615, first sentence of the PR-C+D+E row:

"Bundled — must ship together. PR-C (adapter wiring + AllLabels population) and PR-D+E (wire format + pivot-key widening) are not separately shippable"

AllLabels is populated from PR-B (the [5]Label array). PR-C+D+E does adapter wiring only — no AllLabels population. The parenthetical "(adapter wiring + AllLabels population)" misdescribes PR-C's scope. Should read: "PR-C (adapter wiring only — AllLabels constants are already declared in PR-B)".

Fix this →

What looks good

All round-24 items and all prior-round items verified correct against current source:

§7 closing paragraph now correctly attributes the divisor=1 behavior to the flag default (not array emptiness); "adapter wiring (constants already declared from PR-B)" is accurate ✓
TestAllLabelConstantsInAllLabels in §8 lens-5 now says "PR-B regression guard — the [5]Label array is populated from PR-B (round-23), so this is a live test from PR-B onward" — correctly removes the dead Skip guard ✓
TestAllLabelsAvoidSeparator now says "PR-B regression guard — the array is populated from PR-B so the test is live immediately" ✓
§9 Q2 mutation claim: correctly scoped to append only; element-wise and full-var reassignment explicitly noted as still compiling ✓
appendDrainedRow L0 chain: keyviz/sampler.go:837 gap correctly identified and documented ✓
gRPC (e1)–(e3), items (a)–(d): all confirmed against source ✓
virtualForRoute stays map[uint64]*routeSlot: confirmed at sampler.go:219 ✓
RemoveRoute symmetric teardown with allLabelsWithLegacy() loop and slot != nil guard ✓
MaxTrackedRoutes Option A (divide by slotsPerRoute) ✓
Proto field 4 reuse: confirmed at proto/admin.proto:93 ✓
string(label) and string(mr.Label) casts: both present in §5 (line 541) and §8 lens-5 (lines 688–689) ✓
allLabelsWithLegacy() in keyviz/labels.go: correct make/copy/append idiom with len(AllLabels) length and len(AllLabels)+1 capacity ✓
Phase-gated test descriptions with PR-B (flag-off) and PR-C+D+E (flag-on) counterparts (minus the three stale "empty AllLabels" phrases above) ✓

The four minor items above can all be addressed in the PR-B doc commit. Design is ready to merge as PR-A.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 09c1c04ed5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-29T01:26:14Z

+|---|---|
+| **PR-A** | Land this design doc. |
+| **PR-B** | Sampler API extension: extend the `Sampler` interface declaration (`keyviz/sampler.go:74`) AND the `MemSampler.Observe` implementation to `Observe(routeID uint64, op Op, keyLen, valueLen int, label Label)` (Claude bot round-8 minor: surfaces immediately as a compile error if any mock implementer is missed, but naming it keeps the scope complete). `routeSlot.Label`, slot-key type widens to `slotKey{uint64, Label}`. **`RegisterRoute` only pre-creates the legacy empty-label slot when `--keyvizLabelsEnabled=false` (the default)** — labeled-sibling pre-creation is gated on the flag (round-17), so PR-B with the default flag is a behavior-neutral refactor (per-route slot count is unchanged at 1). **PR-B creates `keyviz/labels.go`** with the `slotKey` struct, the `type Label string` definition, the `LabelLegacy`/`LabelDynamo`/`LabelRedis`/etc. **typed constants**, and `AllLabels` as a **fixed-size array** `[5]Label{LabelDynamo, ..., LabelRawKV}` (round-23 — array form prevents importing-package mutation; see §9 Q2). The constants must exist at compile time so `RemoveRoute`'s `allLabelsWithLegacy()` teardown loop and `RegisterRoute`'s pre-creation loop both compile; with the flag default to false, `RegisterRoute` skips the labeled-sibling loop body so per-route slot count stays at 1 (Claude bot round-6 + round-23 array correction). `MatrixRow.Label` field present. **L0 label copy** (Claude bot round-8 moderate): update `appendDrainedRow` (`keyviz/sampler.go:828`) to include `Label: slot.Label` in the `MatrixRow` struct literal — `slot.Label` is set once at `RegisterRoute` time and immutable, so read it directly rather than threading through `snapshotMeta`. This is the first link in the propagation chain: `slot.Label` → `MatrixRow.Label` (L0, here) → `KeyVizRow.Label` via `newKeyVizRowFrom` (L1, PR-D+E) → merged `KeyVizRow.Label` via `mergeRowInto` (L2, PR-D+E). Missing L0 silently truncates the chain and leaves `MatrixRow.Label = ""` even after PR-C+D+E wires the adapters to pass labels; L1 and L2 then faithfully copy the empty string and the feature collapses. With PR-B's flag-default-false (only legacy slots pre-created), `slot.Label` is `""` so the emitted row is `""`-labeled (identical to today's behavior), but the wiring is in place for PR-C+D+E to flip the flag and start emitting labels. `reclaimRetiredSlot` dedupes on `(RouteID, Label)` (forward-prep; only `Label=""` exists). The `MaxTrackedRoutes` coarsening check (`keyviz/sampler.go:416`) divides by `slotsPerRoute` (= `len(AllLabels)+1` when the flag is on, `1` when off; round-17 + round-23) so the cap continues to count **routes**, not slots; with the flag default to false in PR-B the divisor is 1 and behavior is identical to today (Claude round-24 fix to the prior "AllLabels empty in PR-B" framing — round-23 made AllLabels populated from PR-B, so the divisor=1 reason is the flag, not the array length). Update existing tests **and the coordinator call sites in `kv/sharded_coordinator.go`** (the only non-test caller of `Sampler.Observe`) to pass `label = ""`. |
+| **PR-C+D+E** | **Bundled — must ship together.** PR-C (adapter wiring + `AllLabels` population) and PR-D+E (wire format + pivot-key widening) are not separately shippable: once an adapter calls `Observe(..., LabelDynamo)`, the sampler emits multiple `MatrixRow` per route in a single column (one per label); without the pivot-key widening from PR-D+E, `pivotKeyVizColumns` and `matrixToProto` collapse those rows back into a single `RouteID`-keyed entry where each labeled row overwrites the previous one — non-deterministic data loss in the intermediate state. Earlier drafts framed PR-C and PR-D+E as separately shippable; that was wrong (Codex round-10 P1). The bundle covers: <br>**Adapter wiring (was PR-C)**: each adapter sets its label at the `ShardedCoordinator.Observe…` dispatch entry. The five canonical constants (`LabelDynamo`, `LabelRedis`, `LabelS3`, `LabelSQS`, `LabelRawKV`) are already declared in `keyviz/labels.go` from PR-B as part of the `[5]Label` array (round-23 — see §9 Q2 and the PR-B row above); PR-C+D+E only wires the adapters to pass them, no `AllLabels` modification needed (Claude round-24 fix to the prior "populate empty slice" framing). Extend `RegisterRoute` to pre-create the labeled siblings (one slot per `AllLabels` member) **only when `--keyvizLabelsEnabled=true`**; with the flag off, `RegisterRoute` continues to pre-create just the legacy empty-label slot, identical to today's behavior (Codex round-15-4th-pass P1). **MaxTrackedRoutes is unchanged in semantics** — the §4.1.1 coarsening check divides slot count by `slotsPerRoute` (`len(AllLabels)+1` when the flag is on, `1` when off), so the existing operator-set cap continues to mean "individually-tracked routes" exactly as before; **operators do not need to raise `--keyvizMaxTrackedRoutes`** (an earlier draft told operators to scale the cap by `× (len(labels)+1)` — that contradicted Option A; Codex round-6). Memory growth: `MaxTrackedRoutes × (len(AllLabels) + 1)` slots when labels are enabled, `MaxTrackedRoutes × 1` when off (today's baseline); documented but not capped. Emit `slog.Warn` from inside `RegisterRoute` when a route coarsens. <br>**Wire-format extension (was PR-D+E)**: proto + JSON `bucket_id` composite + optional `label` field, plus the SPA `route:N / label` rendering AND the **five** code changes across the three response paths: <br>**Single-node JSON path (`internal/admin/keyviz_handler.go`)**: <br>(a) `pivotKeyVizColumns` `rowsByID` map AND `order` slice both widen from `uint64` to the composite `BucketID string` — widening only the map without the `order` slice is a compile error; <br>(b) `newKeyVizRowFrom` (`keyviz_handler.go:368`) copies `mr.Label → row.Label` via an explicit `string(mr.Label)` cast (`MatrixRow.Label` is the typed `keyviz.Label`; `KeyVizRow.Label` is plain `string` for wire-format flexibility — Claude bot round-10 minor) — **first-level** Label copy, affects single-node and cluster deployments alike; <br>(c) `bucketIDFor` (`keyviz_handler.go:383`) returns the composite `"route:<id>:<label>"` when `mr.Label != ""`, falling back to the legacy `"route:<id>"` for empty labels — without this `BucketID` is non-unique and `applyKeyVizRowBudget` / `sortKeyVizRowsByStart` lose their deterministic tiebreak; <br>**Fan-out JSON path (`internal/admin/keyviz_fanout.go`)**: <br>(d) `mergeRowInto` (`keyviz_fanout.go:509`) adds `dst.Label = row.Label` — **second-level** Label copy, only the cluster fan-out path touches this; <br>**gRPC path (`adapter/admin_grpc.go`)**: <br>(e) `matrixToProto` (`admin_grpc.go:599`) and the per-row conversion it drives: (e1) **widen `rowsByID` (line 603) and `order` (line 604) from `uint64` to the composite `BucketID string` key** — same widening as item (a); without it `(routeID=1, label="dynamo")` and `(routeID=1, label="redis")` collapse to the same map entry; (e2) copy `MatrixRow.Label → KeyVizRow.label` (proto field 4) via `string(mr.Label)` cast (same typed→untyped reasoning as item (b)); (e3) emit composite `bucket_id` (`"route:<id>:<label>"`). Without (e1)–(e3), `GetKeyVizMatrix` gRPC clients receive collapsed unlabeled rows even though HTTP/SPA responses now show per-label rows. <br>All five copies are required; missing any one leaves a flavour of deployment with empty labels. Splitting the bundle into separate PR-C and PR-D+E was the original framing but is now rejected (see opening paragraph of this row). <br>**Operator-controlled rollout gate (rolling-upgrade safety)**: a normal rolling upgrade temporarily mixes nodes that emit legacy `route:<id>` rows with nodes that emit labeled `route:<id>:<label>` rows. The fan-out aggregator keys strictly by `BucketID` in `mergeRowInto`, so those rows do **not** merge — operators would see fragmented unlabeled-plus-labeled data per route until every node converges. PR-C+D+E adds `--keyvizLabelsEnabled` (default `false`) on every node; when false, the **`ShardedCoordinator` overrides the adapter-supplied label to `keyviz.LabelLegacy` at the single `sampler.Observe(...)` call site** in `kv/sharded_coordinator.go` (one `if !s.keyvizLabelsEnabled { label = keyviz.LabelLegacy }` guard, not a 5-file flag-read duplication across adapters; Claude bot round-12 moderate). The bundled binary is therefore safe to roll out one node at a time — every node, mixed or fully upgraded, emits the legacy format. <br>**Flag-flip activation**: once the fleet is fully on the new binary, the operator flips `--keyvizLabelsEnabled=true`. The flag is a startup-only `flag.Bool` (no live-toggle / config-reload path; Claude bot round-12 minor); changing it requires a process restart. There are two restart strategies: (a) **simultaneous restart of all nodes** — KeyViz heatmap is briefly unavailable during the restart window but no mixed-format fragmentation occurs; (b) **rolling restart** — for the duration of the restart (typically minutes), the heatmap shows a transient mixed view because the legacy `route:N` rows from not-yet-restarted nodes don't merge with `route:N:label` rows from restarted nodes. Since KeyViz is a monitoring view (not a consistency-sensitive system), the rolling-restart fragmentation is acceptable and clears as the final node restarts; operators who want zero fragmentation should use the simultaneous restart. (Claude bot round-12 moderate.) <br>**Flag also gates pre-allocation**: `--keyvizLabelsEnabled` is **both** the traffic-routing toggle (override label to `LabelLegacy` at the coordinator) **and** the slot-pre-creation toggle. When `false`, `RegisterRoute` pre-creates only the legacy empty-label slot (`len(next.slots) += 1` per route, identical to today's behavior); labeled siblings are not allocated. When `true`, `RegisterRoute` pre-creates the legacy slot **and** one labeled sibling per `AllLabels` member (`len(next.slots) += len(AllLabels)+1` per route). Memory therefore stays at the today's level (`MaxTrackedRoutes × 1`) for clusters that deploy the bundled binary but leave the flag off, and grows to `MaxTrackedRoutes × (len(AllLabels)+1)` only when the operator opts in. The flag-flip activation requires a process restart (see preceding paragraph), and `RegisterRoute` re-runs at startup, so the new pre-creation regime is in effect immediately after the restart — there is no live re-allocation path needed. **An earlier draft separated memory and traffic-routing into two regimes ("memory-vs-flag separation", round-12) where labeled slots were pre-created regardless of the flag; that was wrong (Codex round-15-4th-pass P1) — it imposed a multi-x memory penalty on clusters that never enable labels, contradicting the §2.1 minimal-penalty goal.** (Codex round-11 P2 originated this rollout gate.) |


Define keyvizLabelsEnabled phase consistently

The rollout table is internally inconsistent about when --keyvizLabelsEnabled exists: the PR-B row makes behavior-neutrality depend on this flag being default-false and gating pre-allocation, but the PR-C+D+E row says that same flag is added later. If implementers follow the PR-C+D+E wording literally, PR-B cannot enforce the gating it claims, so a standalone PR-B rollout can drift from the documented “legacy-equivalent” behavior (route-cap semantics and slot allocation). Please pin the flag introduction to a single phase and make all PR scopes reference that same phase.

Useful? React with 👍 / 👎.

Round-25 (Claude bot R25 finished review + Codex P2): Claude bot R25 verdict on `09c1c04e` was "design is ready to merge as PR-A" with 4 minor stale-language items. All 4 addressed: - §8 lens-5 Slot pre-creation count test: "(PR-C, post- AllLabels population)" + "PR-B counterpart (with empty AllLabels)" → "(PR-C+D+E, with --keyvizLabelsEnabled= true)" + "PR-B counterpart (with --keyvizLabelsEnabled= false, the default)". Added explicit note that PR-B vs PR-C+D+E counterparts differ on the flag, not on AllLabels length (the [5]Label array is populated from PR-B per round-23). - §8 lens-5 RemoveRoute symmetric teardown test: same fix — "(PR-C)" + "(empty AllLabels)" → "(PR-C+D+E, flag on)" + "(with the flag off)". Counterpart assertion explained as "only the legacy slot was pre-created (flag off), so only it is retired". - §7 PR-C+D+E opening sentence: "PR-C (adapter wiring + AllLabels population)" → "PR-C (adapter wiring only — AllLabels constants are already declared in PR-B as part of the [5]Label array)". Codex P2: PR-B row's behavior-neutrality claim depends on --keyvizLabelsEnabled being default-false, but PR-C+D+E row said the flag is "added in PR-C+D+E". Implementer following PR-C+D+E literally would conclude PR-B cannot enforce its own gating. Fixed by: - §7 PR-B row prepended: "PR-B introduces the --keyvizLabelsEnabled startup flag (default false) on the server binary, so the flag exists from PR-B onward and PR-B's behavior-neutrality claim does not depend on PR-C+D+E (Codex round-25 P2)". - §7 PR-C+D+E rollout-gate paragraph: "PR-C+D+E adds --keyvizLabelsEnabled" → "--keyvizLabelsEnabled is introduced in PR-B... PR-C+D+E adds the coordinator override path". Clarifies PR-B introduces the flag for pre-creation gating; PR-C+D+E adds the coordinator-side override path on top, which only matters once adapters pass non-empty labels.

bootjp · 2026-04-29T01:31:32Z

@claude review

Round-25 follow-ups (788b6858) addressing Claude bot R25 finished review verdict ("design is ready to merge as PR-A" + 4 minor items) + Codex P2 on 09c1c04e. All 5 items addressed:

Claude bot R25 (4 minor stale-language items):

§8 lens-5 Slot pre-creation count: "(PR-C, post-AllLabels population)" + "PR-B counterpart (with empty AllLabels)" → "(PR-C+D+E, with --keyvizLabelsEnabled=true)" + "PR-B counterpart (with --keyvizLabelsEnabled=false, the default)". Added explicit note that PR-B vs PR-C+D+E counterparts differ on the flag, not on AllLabels length.
§8 lens-5 RemoveRoute symmetric teardown: same fix, "(PR-C)" + "(empty AllLabels)" → "(PR-C+D+E, flag on)" + "(with the flag off)".
§7 PR-C+D+E opening sentence: "PR-C (adapter wiring + AllLabels population)" → "PR-C (adapter wiring only — AllLabels constants are already declared in PR-B as part of the [5]Label array)".

Codex P2 (consistency bug): PR-B row's behavior-neutrality claim depends on --keyvizLabelsEnabled being default-false, but PR-C+D+E row said the flag is "added in PR-C+D+E". Implementer following PR-C+D+E literally would conclude PR-B cannot enforce its own gating. Fixed by:

§7 PR-B row prepended: "PR-B introduces the --keyvizLabelsEnabled startup flag (default false) on the server binary, so the flag exists from PR-B onward and PR-B's behavior-neutrality claim does not depend on PR-C+D+E"
§7 PR-C+D+E rollout-gate paragraph: "PR-C+D+E adds --keyvizLabelsEnabled" → "--keyvizLabelsEnabled is introduced in PR-B... PR-C+D+E adds the coordinator override path" (the override path only matters once adapters pass non-empty labels, which happens in PR-C+D+E)

claude · 2026-04-29T01:31:55Z

Claude finished @bootjp's task in 2m 19s —— View job

Review (Round 26)

Gather context
Read updated design document (788b685)
Check KeyViz implementation for remaining gaps
Post comprehensive review

All five Round-25 items confirmed resolved in current HEAD:

§8 lens-5 Slot pre-creation count: "(PR-C+D+E, with --keyvizLabelsEnabled=true)" and "PR-B counterpart (with --keyvizLabelsEnabled=false, the default)" at lines 659–670 — correctly attributes the PR-B vs. PR-C+D+E counterpart distinction to the flag state, not to AllLabels length ✓
§8 lens-5 RemoveRoute symmetric teardown: "(PR-C+D+E, flag on)" and "PR-B counterpart (with the flag off)" at lines 671–680 — same fix applied consistently ✓
§7 PR-C+D+E opening sentence: "PR-C (adapter wiring only — AllLabels constants are already declared in PR-B as part of the [5]Label array, see §9 Q2)" — correctly removes the stale "adapter wiring + AllLabels population" claim ✓
Codex P2 / flag-introduction phase: PR-B row opens with "PR-B introduces the --keyvizLabelsEnabled startup flag (default false) on the server binary, so the flag exists from PR-B onward and PR-B's behavior-neutrality claim does not depend on PR-C+D+E" — PR-B can now enforce its own pre-creation gating without waiting for PR-C+D+E ✓
Codex P2 / coordinator override distinction: PR-C+D+E row correctly distinguishes "PR-B introduces the flag for pre-creation gating" from "PR-C+D+E adds the coordinator override path"; the closing note "(Codex round-25 P2 caught the prior wording that said 'PR-C+D+E adds the flag')" documents the fix ✓

Also verified §7 closing paragraph (lines 617–629): "PR-B widens types and prepares the slot machinery without changing observable behavior (legacy slots only, AllLabels is the populated [5]Label array but --keyvizLabelsEnabled defaults to false so RegisterRoute skips the labeled-sibling loop and the MaxTrackedRoutes divisor stays at 1)" ✓ and "The bundled PR-C+D+E then ships everything operator-visible — adapter wiring (constants are already declared in keyviz/labels.go from PR-B; PR-C+D+E only wires adapters to pass them at the dispatch entry, no change to AllLabels itself)" ✓

No new issues found.

Final state

All issues from rounds 1–26 are resolved. The complete design has been verified against source:

virtualForRoute stays map[uint64]*routeSlot at sampler.go:219 ✓
appendDrainedRow L0 Label copy correctly specified ✓
RemoveRoute symmetric teardown with allLabelsWithLegacy() loop ✓
MaxTrackedRoutes Option A (divide by slotsPerRoute) ✓
All five PR-C+D+E wire-format items (a)–(e) plus gRPC sub-steps (e1)–(e3) ✓
keyviz/labels.go not yet in repo — correctly scoped to PR-B ✓
AllLabels [5]Label array, typed Label constants, allLabelsWithLegacy() semantics ✓
Phase-gated §8 lens-5 tests (flag-off and flag-on counterparts) ✓
Proto field 4 reuse confirmed at proto/admin.proto:93 ✓
PR-B introduces the flag; PR-C+D+E adds the coordinator override path — correctly distinguished ✓

Design is ready to merge as PR-A.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 788b685832

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-29T01:33:47Z

+| PR | Scope |
+|---|---|
+| **PR-A** | Land this design doc. |
+| **PR-B** | Sampler API extension: extend the `Sampler` interface declaration (`keyviz/sampler.go:74`) AND the `MemSampler.Observe` implementation to `Observe(routeID uint64, op Op, keyLen, valueLen int, label Label)` (Claude bot round-8 minor: surfaces immediately as a compile error if any mock implementer is missed, but naming it keeps the scope complete). `routeSlot.Label`, slot-key type widens to `slotKey{uint64, Label}`. **PR-B introduces the `--keyvizLabelsEnabled` startup flag** (default `false`) on the server binary, so the flag exists from PR-B onward and PR-B's behavior-neutrality claim does not depend on PR-C+D+E (Codex round-25 P2). **`RegisterRoute` only pre-creates the legacy empty-label slot when `--keyvizLabelsEnabled=false` (the default)** — labeled-sibling pre-creation is gated on the flag (round-17), so PR-B with the default flag is a behavior-neutral refactor (per-route slot count is unchanged at 1). **PR-B creates `keyviz/labels.go`** with the `slotKey` struct, the `type Label string` definition, the `LabelLegacy`/`LabelDynamo`/`LabelRedis`/etc. **typed constants**, and `AllLabels` as a **fixed-size array** `[5]Label{LabelDynamo, ..., LabelRawKV}` (round-23 — array form prevents importing-package mutation; see §9 Q2). The constants must exist at compile time so `RemoveRoute`'s `allLabelsWithLegacy()` teardown loop and `RegisterRoute`'s pre-creation loop both compile; with the flag default to false, `RegisterRoute` skips the labeled-sibling loop body so per-route slot count stays at 1 (Claude bot round-6 + round-23 array correction). `MatrixRow.Label` field present. **L0 label copy** (Claude bot round-8 moderate): update `appendDrainedRow` (`keyviz/sampler.go:828`) to include `Label: slot.Label` in the `MatrixRow` struct literal — `slot.Label` is set once at `RegisterRoute` time and immutable, so read it directly rather than threading through `snapshotMeta`. This is the first link in the propagation chain: `slot.Label` → `MatrixRow.Label` (L0, here) → `KeyVizRow.Label` via `newKeyVizRowFrom` (L1, PR-D+E) → merged `KeyVizRow.Label` via `mergeRowInto` (L2, PR-D+E). Missing L0 silently truncates the chain and leaves `MatrixRow.Label = ""` even after PR-C+D+E wires the adapters to pass labels; L1 and L2 then faithfully copy the empty string and the feature collapses. With PR-B's flag-default-false (only legacy slots pre-created), `slot.Label` is `""` so the emitted row is `""`-labeled (identical to today's behavior), but the wiring is in place for PR-C+D+E to flip the flag and start emitting labels. `reclaimRetiredSlot` dedupes on `(RouteID, Label)` (forward-prep; only `Label=""` exists). The `MaxTrackedRoutes` coarsening check (`keyviz/sampler.go:416`) divides by `slotsPerRoute` (= `len(AllLabels)+1` when the flag is on, `1` when off; round-17 + round-23) so the cap continues to count **routes**, not slots; with the flag default to false in PR-B the divisor is 1 and behavior is identical to today (Claude round-24 fix to the prior "AllLabels empty in PR-B" framing — round-23 made AllLabels populated from PR-B, so the divisor=1 reason is the flag, not the array length). Update existing tests **and the coordinator call sites in `kv/sharded_coordinator.go`** (the only non-test caller of `Sampler.Observe`) to pass `label = ""`. |


Keep PR-B from conditionally pre-creating labeled slots

This PR-B scope line says labeled-sibling pre-creation is "gated on the flag," which contradicts §4.1.1's earlier PR-B contract that only the legacy slot exists and labeled siblings start in PR-C. If implementers follow this PR-B wording literally, they can add PR-B code that allocates len(AllLabels)+1 slots whenever --keyvizLabelsEnabled=true, reintroducing the early memory/route-budget behavior this doc is trying to defer to PR-C. Please make PR-B unambiguously legacy-slot-only regardless of flag, or explicitly state that enabling the flag in PR-B is unsupported.

Useful? React with 👍 / 👎.

Round-26 Codex P2 on `788b6858`: The round-25 PR-B row said "RegisterRoute only pre-creates the legacy empty-label slot when --keyvizLabelsEnabled=false (the default) — labeled-sibling pre-creation is gated on the flag (round-17)". This implied that an operator setting --keyvizLabelsEnabled=true in PR-B would cause RegisterRoute to pre-create len(AllLabels)+1 slots, reintroducing the early memory/route-budget regression PR-C+D+E tries to defer. Per Codex round-26 P2, PR-B should be unambiguously legacy-slot-only regardless of the flag. The flag-honoring RegisterRoute code lands in PR-C+D+E, not PR-B. Updates: - §7 PR-B row: clarified that PR-B introduces the flag for CLI parsing + Opts plumbing only; RegisterRoute in PR-B unconditionally pre-creates only the legacy slot regardless of the flag value. The flag-honoring loop lands in PR-C+D+E. Per-route slot count is unconditionally 1 in PR-B. - §7 closing paragraph: same clarification — PR-B is legacy-slots-only because the flag-honoring pre-creation loop is in PR-C+D+E, not because the flag is default-false. - §4.1.1 PR-B/PR-C+D+E body pseudocode (sampler.go:416 slotsPerRoute calculation): split into two phases — PR-B body stops at `slotsPerRoute := 1` (literal, no flag check), PR-C+D+E extends to the flag-gated conditional branch. Comment updated accordingly. The flag still exists from PR-B (per round-25 P2 — PR-B needs it on Opts for downstream wiring), but PR-B treats it as dormant for slot pre-creation. PR-C+D+E activates the flag's pre-creation effect in tandem with the adapter-wiring + wire-format changes.

bootjp · 2026-04-29T01:37:33Z

@claude review

Round-26 follow-up (e7b0c132) addressing Codex P2 on 788b6858. One real consistency item:

Codex P2 / PR-B flag-gating ambiguity — the round-25 wording said "PR-B's RegisterRoute only pre-creates the legacy slot when --keyvizLabelsEnabled=false", which implied an operator could enable the flag in PR-B and trigger labeled-sibling pre-creation, reintroducing the early memory regression PR-C+D+E tries to defer. Fixed: PR-B's RegisterRoute is unambiguously legacy-slot-only regardless of the flag. The flag's pre-creation effect is dormant in PR-B (the flag exists on Opts for downstream plumbing only); the flag-honoring RegisterRoute loop lands in PR-C+D+E. Clarified in three places:
- §7 PR-B row: "RegisterRoute in PR-B always pre-creates only the legacy empty-label slot, regardless of the flag" + flag is "dormant for slot pre-creation" in PR-B.
- §7 closing paragraph: "PR-B is legacy-slots-only because the flag-honoring pre-creation loop is in PR-C+D+E, not because the flag is default-false".
- §4.1.1 PR-B/PR-C+D+E pseudocode (slotsPerRoute calculation at sampler.go:416): split into two phases — PR-B body stops at slotsPerRoute := 1 (literal, no flag check); PR-C+D+E extends to the flag-gated conditional branch.

Round-25 R25 verdict from Claude bot was "design ready to merge as PR-A — no new issues found", but Codex P2 caught this consistency issue post-R25. Round-26 is the cleanup; the R25 "ready" verdict still holds modulo this clarification.

claude · 2026-04-29T01:37:51Z

Claude finished @bootjp's task in 3m 43s —— View job

Review (Round 27)

Gather context
Read updated design document (e7b0c13)
Check for remaining gaps after Round-26 fixes
Post comprehensive review

All three Round-26 changes confirmed correctly applied in current HEAD:

§7 PR-B row: "RegisterRoute in PR-B always pre-creates only the legacy empty-label slot, regardless of the flag" + "flag is dormant for slot pre-creation (it exists for option plumbing only)" ✓
§7 closing paragraph: "PR-B is legacy-slots-only because the flag-honoring pre-creation loop lands in PR-C+D+E, not PR-B" ✓
§4.1.1 pseudocode (slotsPerRoute at sampler.go:416): split into PR-B body (slotsPerRoute := 1, comment "PR-B body stops here") and PR-C+D+E extension (flag-gated conditional) ✓

Two issues surfaced. Both involve stale §9 Q2 and §4.1.1 references that were not propagated when round-26 locked in the "unconditionally legacy-slot-only in PR-B" semantics.

Moderate — §9 Q2 "PR sequencing note" directly contradicts the round-26 §7 PR-B fix

Lines 835–836:

"but the round-17 flag-gating means PR-B's RegisterRoute only pre-creates labeled siblings when --keyvizLabelsEnabled=true, which defaults to false. So PR-B remains behavior-neutral via the flag, not via an empty slice."

Round-26's stated fix was precisely the opposite: PR-B's RegisterRoute is unconditionally legacy-slot-only regardless of the flag. The §7 PR-B row (correctly updated in round-26) says "RegisterRoute in PR-B always pre-creates only the legacy empty-label slot, regardless of the flag" and "in PR-B the flag is dormant for slot pre-creation."

The §9 note says behavior-neutrality comes from the flag default; §7 says it comes from the unconditional code structure (the flag-honoring loop simply isn't in PR-B at all). An implementor reading §9 would think: "If I set --keyvizLabelsEnabled=true on a PR-B binary, labeled-sibling pre-creation activates" — exactly the risk round-26 was written to prevent.

The note needs to match §7:

-but the round-17 flag-gating means PR-B's `RegisterRoute` only
-pre-creates labeled siblings when `--keyvizLabelsEnabled=true`,
-which defaults to false. So PR-B remains behavior-neutral via
-the flag, not via an empty slice.
+With the round-26 unconditional model, PR-B's `RegisterRoute`
+always pre-creates only the legacy slot regardless of the flag
+— the flag-honoring pre-creation loop is simply not present in
+PR-B (it lands in PR-C+D+E). So PR-B remains behavior-neutral
+unconditionally, not via a flag default or an empty slice.

Fix this →

Minor — Dual `slotsPerRoute :=` in a single code block is invalid Go

Lines 285–291 show both the PR-B and PR-C+D+E versions of the coarsening check in a single fenced Go block:

slotsPerRoute := 1 // PR-B body stops here

// PR-C+D+E extends to (Codex round-15-4th-pass P1):
slotsPerRoute := 1 // legacy slot only (when flag off)
if s.opts.KeyVizLabelsEnabled {
    slotsPerRoute = len(AllLabels) + 1
}

Two := declarations for slotsPerRoute in the same scope is a Go compile error ("no new variables on left side of :="). The intent is that PR-C+D+E replaces the first declaration with the conditional block — but as written, an implementor who copies the PR-C+D+E block while keeping the PR-B line gets an unexplained compile error.

The cleanest fix is two separate code blocks:

```go
// keyviz/sampler.go:416 — PR-B body:
slotsPerRoute := 1 // legacy slot only; unconditional (Codex round-26 P2)
```

```go
// keyviz/sampler.go:416 — PR-C+D+E replaces the above with:
slotsPerRoute := 1 // legacy slot only (when flag off)
if s.opts.KeyVizLabelsEnabled {
    slotsPerRoute = len(AllLabels) + 1 // 6 with flag on
}
```

Fix this →

What looks good

All Round-26 changes verified correct against source:

§7 PR-B row: "RegisterRoute in PR-B always pre-creates only the legacy empty-label slot, regardless of the flag" + "flag is dormant for slot pre-creation (it exists for option plumbing only)" + "the RegisterRoute flag-honoring pre-creation loop is not yet present in PR-B" — all three formulations are consistent and correct ✓
§7 closing paragraph: "RegisterRoute in PR-B unconditionally pre-creates only the legacy slot regardless of --keyvizLabelsEnabled, since the flag-honoring pre-creation loop lands in PR-C+D+E, not PR-B" ✓
slotsPerRoute at §4.1.1: comment "PR-B body stops here" + "PR-C+D+E extends to" — conceptually correct (just needs the two-block split) ✓
virtualForRoute stays map[uint64]*routeSlot: confirmed at sampler.go:219 ✓
appendDrainedRow L0 Label copy: named in §7 PR-B scope ✓
gRPC (e1)–(e3) + items (a)–(d): all five PR-C+D+E wire-format changes ✓
RemoveRoute symmetric teardown: allLabelsWithLegacy() loop with slot != nil guard ✓
MaxTrackedRoutes Option A (divide by slotsPerRoute) ✓
AllLabels [5]Label array + type Label string typed constants ✓
Phase-gated §8 lens-5 tests (flag-off and flag-on counterparts) ✓
Proto field 4 reuse: confirmed at proto/admin.proto:93 ✓
TestAllLabelsAvoidSeparator and TestAllLabelConstantsInAllLabels: both present with correct phase labels ✓

The §9 Q2 note (moderate) needs to be updated before PR-B review; it contradicts the §7 PR-B row that round-26 correctly updated. The pseudocode split (minor) can go in the same commit. After those two fixes, design is ready to merge as PR-A.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e7b0c132ae

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-29T01:41:35Z

+   onward — but the round-17 flag-gating means PR-B's
+   `RegisterRoute` only pre-creates labeled siblings when
+   `--keyvizLabelsEnabled=true`, which defaults to false. So


Remove PR-B claim that labeled siblings can be pre-created

Fresh evidence in this revision: the §9 PR-sequencing note says PR-B RegisterRoute pre-creates labeled siblings when --keyvizLabelsEnabled=true, which contradicts §7’s PR-B contract that PR-B is legacy-slot-only and defers flag-honoring pre-creation to PR-C+D+E. Leaving this contradiction can cause implementers to reintroduce PR-B label-slot allocation, bringing back the early memory growth and route-cap/coarsening drift the rollout is explicitly trying to avoid.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-29T01:41:35Z

+| PR | Scope |
+|---|---|
+| **PR-A** | Land this design doc. |
+| **PR-B** | Sampler API extension: extend the `Sampler` interface declaration (`keyviz/sampler.go:74`) AND the `MemSampler.Observe` implementation to `Observe(routeID uint64, op Op, keyLen, valueLen int, label Label)` (Claude bot round-8 minor: surfaces immediately as a compile error if any mock implementer is missed, but naming it keeps the scope complete). `routeSlot.Label`, slot-key type widens to `slotKey{uint64, Label}`. **PR-B introduces the `--keyvizLabelsEnabled` startup flag** (default `false`) on the server binary — CLI parsing and the `KeyVizLabelsEnabled bool` option on `ShardedCoordinatorOpts` / `MemSamplerOpts` ship in PR-B, so the flag exists from PR-B onward and PR-B's behavior-neutrality claim does not depend on PR-C+D+E (Codex round-25 P2). **`RegisterRoute` in PR-B always pre-creates only the legacy empty-label slot, regardless of the flag** (Codex round-26 P2 — earlier round-25 wording said pre-creation is "gated on the flag" in PR-B, which would have allowed an operator to enable the flag in PR-B and start allocating `len(AllLabels)+1` slots before any of the PR-C+D+E adapter-wiring or wire-format machinery existed). The `RegisterRoute` flag-honoring code lands in **PR-C+D+E**, not PR-B; in PR-B the flag is **dormant** for slot pre-creation (it exists for option plumbing only). Per-route slot count in PR-B is therefore unconditionally 1, regardless of the flag, and PR-B is fully behavior-neutral. **PR-B creates `keyviz/labels.go`** with the `slotKey` struct, the `type Label string` definition, the `LabelLegacy`/`LabelDynamo`/`LabelRedis`/etc. **typed constants**, and `AllLabels` as a **fixed-size array** `[5]Label{LabelDynamo, ..., LabelRawKV}` (round-23 — array form prevents importing-package mutation; see §9 Q2). The constants must exist at compile time so `RemoveRoute`'s `allLabelsWithLegacy()` teardown loop compiles in PR-B; `RegisterRoute`'s flag-honoring pre-creation loop is **not yet present in PR-B** (it lands in PR-C+D+E per the previous paragraph), so PR-B's `RegisterRoute` body is a single legacy-slot create call regardless of the flag. With PR-B unconditionally pre-creating only the legacy slot, per-route slot count stays at 1 (Claude bot round-6 + round-23 array correction + Codex round-26 P2 unconditionality). `MatrixRow.Label` field present. **L0 label copy** (Claude bot round-8 moderate): update `appendDrainedRow` (`keyviz/sampler.go:828`) to include `Label: slot.Label` in the `MatrixRow` struct literal — `slot.Label` is set once at `RegisterRoute` time and immutable, so read it directly rather than threading through `snapshotMeta`. This is the first link in the propagation chain: `slot.Label` → `MatrixRow.Label` (L0, here) → `KeyVizRow.Label` via `newKeyVizRowFrom` (L1, PR-D+E) → merged `KeyVizRow.Label` via `mergeRowInto` (L2, PR-D+E). Missing L0 silently truncates the chain and leaves `MatrixRow.Label = ""` even after PR-C+D+E wires the adapters to pass labels; L1 and L2 then faithfully copy the empty string and the feature collapses. With PR-B's flag-default-false (only legacy slots pre-created), `slot.Label` is `""` so the emitted row is `""`-labeled (identical to today's behavior), but the wiring is in place for PR-C+D+E to flip the flag and start emitting labels. `reclaimRetiredSlot` dedupes on `(RouteID, Label)` (forward-prep; only `Label=""` exists). The `MaxTrackedRoutes` coarsening check (`keyviz/sampler.go:416`) divides by `slotsPerRoute` (= `len(AllLabels)+1` when the flag is on, `1` when off; round-17 + round-23) so the cap continues to count **routes**, not slots; with the flag default to false in PR-B the divisor is 1 and behavior is identical to today (Claude round-24 fix to the prior "AllLabels empty in PR-B" framing — round-23 made AllLabels populated from PR-B, so the divisor=1 reason is the flag, not the array length). Update existing tests **and the coordinator call sites in `kv/sharded_coordinator.go`** (the only non-test caller of `Sampler.Observe`) to pass `label = ""`. |


Keep PR-B divisor independent of keyvizLabelsEnabled

The PR-B scope row states RegisterRoute is unconditionally legacy-slot-only in PR-B, but later in the same row it says the coarsening divisor is len(AllLabels)+1 when the flag is on. If implemented literally, enabling the flag on PR-B would divide slot count by 6 while still storing one slot per route, effectively expanding tracked-route capacity far beyond MaxTrackedRoutes and violating the documented behavior-neutral semantics for PR-B.

Useful? React with 👍 / 👎.

Claude R27 verdict on `e7b0c132`: "design is ready to merge as PR-A after these 2 fixes". Both items addressed: - §9 Q2 PR sequencing note (lines 835-840) directly contradicted the round-26 §7 PR-B fix. Said "PR-B's RegisterRoute only pre-creates labeled siblings when --keyvizLabelsEnabled=true" — opposite of round-26's "unconditionally legacy-slot-only regardless of the flag". Updated to match: "With the round-23 array form, AllLabels is non-empty from PR-B onward, AND with the round-26 unconditional model, PR-B's RegisterRoute always pre-creates only the legacy slot regardless of the flag — the flag-honoring loop is simply not present in PR-B (lands in PR-C+D+E). PR-B remains behavior-neutral unconditionally, not via flag default or empty slice." The flag is introduced in PR-B for Opts plumbing only; its read-site doesn't exist in PR-B so its value has no observable effect. - §4.1.1 dual `slotsPerRoute :=` in single Go code block (lines 285-291) is a Go compile error — two `:=` declarations in the same scope ("no new variables on left side of :="). Split into two separate code blocks: PR-B body (unconditional `slotsPerRoute := 1`) and PR-C+D+E body (replaces the PR-B body with the flag-gated conditional). Added explicit narrative: "PR-C+D+E replaces the literal 1 with the flag-gated conditional".

bootjp · 2026-04-29T01:43:30Z

@claude review

Round-27 follow-ups (44d7cd45) addressing Claude R27 finished review on e7b0c132. Both items addressed:

Moderate / §9 Q2 PR sequencing note contradicted §7 PR-B (round-26) fix — the §9 note still said "PR-B's RegisterRoute only pre-creates labeled siblings when --keyvizLabelsEnabled=true", which is exactly the opposite of round-26's "unconditionally legacy-slot-only regardless of the flag". Reworded the §9 note to match: PR-B is behavior-neutral unconditionally (the flag-honoring loop is simply not present in PR-B; it lands in PR-C+D+E), not via a flag default or an empty slice. Flag exists in PR-B for Opts plumbing only.
Minor / dual slotsPerRoute := in single Go code block — two := declarations for the same variable in one scope is a compile error ("no new variables on left side of :="). Split into two separate code blocks: PR-B body (unconditional slotsPerRoute := 1) and PR-C+D+E body (replaces the PR-B body with the flag-gated conditional). Added narrative "PR-C+D+E replaces the literal 1 with the flag-gated conditional" to make the intent unambiguous.

claude · 2026-04-29T01:43:46Z

Claude finished @bootjp's task in 5m 22s —— View job

Review (Round 28)

Gather context
Read updated design document (44d7cd4)
Check for remaining gaps after Round-27 fixes
Post comprehensive review

Both Round-27 items confirmed resolved in current HEAD:

§9 Q2 PR sequencing note: correctly updated at lines 841–855 — "PR-B remains behavior-neutral unconditionally, not via a flag default or an empty slice. The flag is introduced in PR-B for option plumbing only (KeyVizLabelsEnabled bool on Opts)…; in PR-B the flag's read-site doesn't exist yet, so its value has no observable effect." Fully consistent with §7 PR-B row ("RegisterRoute in PR-B always pre-creates only the legacy empty-label slot, regardless of the flag"). ✓
Dual slotsPerRoute := compile error: correctly split into two separate code fences at lines 278–305. PR-B body fence shows slotsPerRoute := 1 unconditionally; PR-C+D+E fence shows the flag-gated conditional as a replacement. The explanatory note "(Two separate code blocks — the dual slotsPerRoute := form in a single Go scope is a compile error…)" is present. ✓

Two minor naming inconsistencies remain in §4.1.1. Both can be folded into the PR-B doc commit.

Minor — §4.1.1 section heading and prose use "PR-C" instead of "PR-C+D+E"

§7 renamed the bundle to "PR-C+D+E (Bundled)" to emphasise that adapter wiring, wire-format changes, and pivot-key widening must ship together. Three locations in §4.1.1 still say "PR-C":

Line 174, section heading: "#### Pre-allocation is gated on PR-C, not PR-B" → should read "…PR-C+D+E…".
Lines 187, 195: "After PR-C with --keyvizLabelsEnabled=true" / "After PR-C with --keyvizLabelsEnabled=false" → both should say "After PR-C+D+E".
Line 212, code comment: "// PR-C extends to (flag-gated, …):" → should read "// PR-C+D+E extends to …:".

An implementor following §4.1.1 in isolation might believe the RegisterRoute pre-creation extension can ship without the wire-format and pivot-key changes — exactly the partial-deployment scenario §7's bundling rule was written to prevent.

Fix this →

Minor — RegisterRoute pseudocode mixes PR-B and PR-C in a single fence

Lines 202–224 show the PR-B body and the "PR-C extends to" block in a single Go code fence:

// PR-B body:
slot := s.reclaimRetiredSlot(routeID, "")
...
next.slots[slotKey{RouteID: routeID, Label: ""}] = slot

// PR-C extends to (flag-gated, Codex round-15-4th-pass P1):
labelsToCreate := []Label{LabelLegacy}
if s.opts.KeyVizLabelsEnabled {
    labelsToCreate = allLabelsWithLegacy()
}
for _, label := range labelsToCreate { ... }

Unlike the slotsPerRoute case (where two := declarations in the same scope were a compile error), this code doesn't fail to compile if naively copy-pasted: the PR-B body creates slotKey{RouteID, ""} and the "PR-C" loop (with labelsToCreate = []Label{LabelLegacy} when the flag is off) also creates slotKey{RouteID, ""} — resulting in the legacy slot being created twice, with the second pass silently re-inserting the same map key (a no-op, so tests pass). A PR-B implementor who copies both blocks would miss that the intent is "PR-C+D+E replaces this body."

The slotsPerRoute fix used separate fences with an explicit "PR-C+D+E replaces the PR-B body above" bridge — the same treatment here would prevent the double-implementation risk.

Fix this →

What looks good

All Round-27 changes verified against source:

slotsPerRoute code blocks split: PR-B fence (unconditional slotsPerRoute := 1) and PR-C+D+E fence (flag-gated conditional) at lines 278–305 — correctly separated ✓
§9 Q2 PR sequencing note: "unconditionally…not via a flag default or an empty slice" — consistent with §7 PR-B row and §7 closing paragraph ✓
appendDrainedRow L0 Label copy in PR-B scope ✓
gRPC (e1)–(e3) three sub-steps named in PR-C+D+E scope ✓
virtualForRoute stays map[uint64]*routeSlot at sampler.go:219 ✓
RemoveRoute symmetric teardown with allLabelsWithLegacy() loop ✓
MaxTrackedRoutes Option A (divide by slotsPerRoute) ✓
AllLabels [5]Label array + type Label string typed constants ✓
Proto field 4 reuse confirmed at proto/admin.proto:93 ✓
pivotKeyVizColumns at keyviz_handler.go:312 confirmed map[uint64]*KeyVizRow (item (a) target) ✓
mergeRowInto at keyviz_fanout.go:520 confirmed constructs KeyVizRow without Label (item (d) target) ✓
matrixToProto at admin_grpc.go:603 confirmed map[uint64]*pb.KeyVizRow (item (e1) target) ✓

The two minor items can be addressed in the PR-B doc commit. Design is ready to merge as PR-A.

Codex round-27 P1 #2 on `e7b0c132` (still applicable to `44d7cd45`): PR-B row claimed RegisterRoute is unconditionally legacy-slot-only (round-26 fix), but later in the same row said "MaxTrackedRoutes coarsening divides by slotsPerRoute (= len(AllLabels)+1 when the flag is on, 1 when off)". Implementer setting --keyvizLabelsEnabled=true on PR-B binary would compute divisor=6 with only 1 slot per route, expanding tracked-route capacity 6× beyond MaxTrackedRoutes. Fixed by aligning the §7 PR-B row's divisor description with round-26's unconditional model: - "MaxTrackedRoutes coarsening check in PR-B uses slotsPerRoute as the literal 1 — same unconditionality reason as the RegisterRoute body above; the flag-gated conditional lands in PR-C+D+E, not PR-B" - "PR-B with slotsPerRoute=1 keeps the cap counting routes (1 slot per route × N routes ≤ MaxTrackedRoutes), behavior identical to today regardless of any flag value" Also tightened the secondary "flag-default-false" reference in the L0 chain failure-mode discussion: the legacy-only behavior comes from the unconditional code structure, not from the flag default. Updated to "With PR-B unconditionally pre-creating only the legacy slot (regardless of flag, per round-26 P2)". Codex round-27 P1 #1 (PR-B labeled-siblings claim in §9 Q2) was already addressed in round-27 commit 44d7cd4.

bootjp · 2026-04-29T01:45:52Z

@claude review

Round-28 follow-up (c788a96c) addressing Codex P1 #2 (caught on e7b0c132, still applicable to 44d7cd45):

Codex P1 Update dependency go to v1.21.4 #2 / PR-B divisor inconsistency — the §7 PR-B row claimed RegisterRoute is unconditionally legacy-slot-only (round-26 fix) but later said "MaxTrackedRoutes coarsening divides by slotsPerRoute (= len(AllLabels)+1 when the flag is on, 1 when off)". An implementer setting --keyvizLabelsEnabled=true on a PR-B binary would compute divisor=6 with only 1 slot per route, expanding tracked-route capacity 6× beyond MaxTrackedRoutes. Fixed: aligned the §7 PR-B row's divisor description with round-26's unconditional model — "MaxTrackedRoutes coarsening check in PR-B uses slotsPerRoute as the literal 1; the flag-gated conditional lands in PR-C+D+E, not PR-B; PR-B with slotsPerRoute = 1 keeps the cap counting routes regardless of any flag value". Also tightened a related "flag-default-false" phrase in the L0-chain failure mode discussion.

Codex P1 #1 (the §9 Q2 PR sequencing note) was already addressed in round-27 commit 44d7cd45.

claude · 2026-04-29T01:46:04Z

Claude finished @bootjp's task in 3m 8s —— View job

Review (Round 28)

Gather context
Read updated design document (c788a96)
Check for remaining gaps after Round-27 fixes
Post comprehensive review

Both Round-27 items confirmed resolved in current HEAD:

Codex P1 Update dependency go to v1.21.4 #2 / PR-B divisor inconsistency: §7 PR-B row now says "The MaxTrackedRoutes coarsening check (keyviz/sampler.go:416) in PR-B uses slotsPerRoute as the literal 1 — same unconditionality reason as the RegisterRoute body above (Codex round-26 + round-27 P1: the flag-gated slotsPerRoute := 1; if KeyVizLabelsEnabled { slotsPerRoute = len(AllLabels)+1 } lands in PR-C+D+E, not PR-B). PR-B with slotsPerRoute = 1 keeps the cap counting routes ... behavior identical to today regardless of any flag value." ✓
L0 description tightened: "With PR-B unconditionally pre-creating only the legacy slot (regardless of flag, per round-26 P2), slot.Label is "" so the emitted row is ""-labeled (identical to today's behavior), but the type-widening + L0 wiring is in place for PR-C+D+E..." — correctly removes the earlier "flag-default-false" framing in favor of the unconditional model. ✓

Both changes are consistent with the round-26/27 unconditional-model fixes:

§4.1.1 PR-B body code block: slotsPerRoute := 1 // legacy slot only; unconditional in PR-B ✓
§9 Q2 sequencing note (round-27): "PR-B remains behavior-neutral unconditionally, not via a flag default or an empty slice" ✓
§7 closing paragraph: "PR-B ... unconditionally pre-creates only the legacy slot regardless of --keyvizLabelsEnabled" ✓

Also verified against current keyviz/sampler.go source: RegisterRoute at line 416 currently uses len(next.slots) < s.opts.MaxTrackedRoutes with no divisor — the PR-B target is to add slotsPerRoute := 1 (literal) and divide; PR-C+D+E then replaces the literal with the flag-gated conditional. The pseudocode in §4.1.1 matches this exactly. ✓

Final state

No new issues found. All items from rounds 1–28 are resolved. The design is ready to merge as PR-A.

Complete source verification:

virtualForRoute stays map[uint64]*routeSlot at sampler.go:219 ✓
RegisterRoute coarsening check at sampler.go:416 (len(next.slots) < MaxTrackedRoutes) — PR-B adds literal slotsPerRoute := 1; PR-C+D+E replaces with flag-gated conditional ✓
RemoveRoute at sampler.go:532 deletes by uint64 key — compile error after PR-B; design prescribes allLabelsWithLegacy() loop ✓
appendDrainedRow at sampler.go:837 constructs MatrixRow without Label — design prescribes L0 copy in PR-B ✓
pivotKeyVizColumns at keyviz_handler.go:312 — map[uint64]*KeyVizRow; item (a) target ✓
newKeyVizRowFrom at keyviz_handler.go:368 — no Label; item (b) target ✓
bucketIDFor at keyviz_handler.go:383 — no label suffix; item (c) target ✓
mergeRowInto at keyviz_fanout.go:520 — no Label; item (d) target ✓
matrixToProto at admin_grpc.go:603 — map[uint64]*pb.KeyVizRow; items (e1)–(e3) target ✓
proto/admin.proto:93 — string label = 4; already declared, unused; field-4 reuse confirmed ✓
keyviz/labels.go does not yet exist — correctly scoped to PR-B ✓
AllLabels [5]Label with typed Label constants and empty-in-PR-B / populated-from-PR-B transition — well-described in §9 Q2 ✓
Phase-gated §8 lens-5 tests (flag-off and flag-on counterparts) ✓

gemini-code-assist Bot reviewed Apr 27, 2026

View reviewed changes

coderabbitai Bot reviewed Apr 28, 2026

View reviewed changes

Merge branch 'main' into docs/keyviz-adapter-labels

9e1dcf1

chatgpt-codex-connector Bot reviewed Apr 28, 2026

View reviewed changes

bootjp requested a review from Copilot April 29, 2026 01:00

Copilot started reviewing on behalf of bootjp April 29, 2026 01:01 View session

Copilot AI reviewed Apr 29, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Apr 29, 2026

View reviewed changes

bootjp merged commit e6004f3 into main Apr 29, 2026
9 checks passed

bootjp deleted the docs/keyviz-adapter-labels branch April 29, 2026 06:59

	// uncommented — no schema migration, no field-number bump.
	// undocumented — no schema migration, no field-number bump.

-slot only — same as today's baseline). The pre-allocation cost is
-gated on the flag, so a cluster that deploys the PR-C+D+E binary
-but leaves the flag off pays no memory penalty (Codex round-15-
+slot only). The pre-allocation cost is gated on the flag, so a
+cluster that deploys the PR-C+D+E binary but leaves the flag off
+avoids the **multiplicative** `len(AllLabels)` slot-count increase,
+though there is still a small baseline per-slot overhead versus
+today from the wider `{uint64, Label}` map key (Codex round-15-

		behavior (legacy slots only, AllLabels empty, MaxTrackedRoutes
		divisor is 1). The bundled PR-C+D+E then ships everything

Conversation

bootjp commented Apr 27, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Recommended approach (§4.1)

Implementation plan (§7)

What this PR does NOT do

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Uh oh!

bootjp commented Apr 27, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review

Critical — slot lifecycle for labeled slots is underspecified (§4.1, §7 PR-B, §8 lens 2)

Critical — pivotKeyVizColumns in keyviz_handler.go is keyed by routeID alone

Critical — routeSlot has no Label field; Flush/appendDrainedRow can't emit MatrixRow.Label

Moderate — MaxTrackedRoutes silent behavior change at PR-C rollout

Moderate — fan-out merge key gap between PR-D and PR-E

Minor — virtual bucket + label interaction is unspecified

Minor — reclaimRetiredSlot needs a (routeID, label) reclaim key

Minor — Q1 label separator ambiguity

What looks good

Uh oh!

bootjp commented Apr 27, 2026

Uh oh!

claude Bot commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review (Round 2)

Moderate — §7 PR-C: "refuse to start" is wrong semantics for a dynamic catalog

Moderate — §4.1.1: labeled slot creation for coarsened routes is unspecified

Minor — §8 lens-5: two test cases missing from the plan

Minor — §4.1.1: "restart the sampler" note is slightly misleading

What looks good

Uh oh!

bootjp commented Apr 27, 2026

Uh oh!

claude Bot commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review (Round 3)

Moderate — virtualForRoute key type is unspecified after the slot-key widening

Moderate — KeyVizRow wire form after PR-D is unspecified in §5

Minor — AllLabels (or equivalent) export is unnamed in the design

Minor — PR-B scope: coordinator call sites omitted

Minor — slog.Warn at coarsening may flood logs on startup

What looks good

Uh oh!

bootjp commented Apr 28, 2026

Uh oh!

claude Bot commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review (Round 4)

Moderate — RemoveRoute symmetric teardown is unspecified

Moderate — mergeRowInto does not copy Label; merged rows lose the field

Minor — RegisterRoute idempotency check will not compile after key widening

Minor — Pseudocode comment "labelled buckets fall through here" is misleading

What looks good

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

bootjp commented Apr 27, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 27, 2026 •

edited

Loading

claude Bot commented Apr 27, 2026 •

edited

Loading

Critical — `pivotKeyVizColumns` in `keyviz_handler.go` is keyed by `routeID` alone

Critical — `routeSlot` has no `Label` field; `Flush`/`appendDrainedRow` can't emit `MatrixRow.Label`

Moderate — `MaxTrackedRoutes` silent behavior change at PR-C rollout

Minor — `reclaimRetiredSlot` needs a `(routeID, label)` reclaim key

claude Bot commented Apr 27, 2026 •

edited

Loading

claude Bot commented Apr 27, 2026 •

edited

Loading

Moderate — `virtualForRoute` key type is unspecified after the slot-key widening

Moderate — `KeyVizRow` wire form after PR-D is unspecified in §5

Minor — `AllLabels` (or equivalent) export is unnamed in the design

Minor — `slog.Warn` at coarsening may flood logs on startup

claude Bot commented Apr 28, 2026 •

edited

Loading

Moderate — `RemoveRoute` symmetric teardown is unspecified

Moderate — `mergeRowInto` does not copy `Label`; merged rows lose the field

Minor — `RegisterRoute` idempotency check will not compile after key widening

claude Bot commented Apr 28, 2026 •

edited

Loading

Moderate — `newKeyVizRowFrom` also constructs `KeyVizRow` from scratch but is not called out in §7 PR-D+E

claude Bot commented Apr 28, 2026 •

edited

Loading

Moderate — `newKeyVizRowFrom` and `bucketIDFor` are both missing from the PR-D+E scope

Moderate — `AllLabels` population timing is unspecified; slot-count jump happens at wrong PR

Moderate — `MaxTrackedRoutes` coarsening check counts slots after key widening, not routes

claude Bot commented Apr 28, 2026 •

edited

Loading

Moderate — `newKeyVizRowFrom` missing `Label` copy still absent from PR-D+E scope (carry-over from Round 5)

Moderate — `RemoveRoute` teardown uses `allLabelsWithLegacy()` in the "PR-B fix" but `AllLabels` doesn't exist until PR-C

claude Bot commented Apr 28, 2026 •

edited

Loading