Skip to content

feat(sqs): HT-FIFO partitioned-keyspace constructors (Phase 3.D PR 3)#703

Merged
bootjp merged 2 commits intomainfrom
feat/sqs-htfifo-keyspace
Apr 29, 2026
Merged

feat(sqs): HT-FIFO partitioned-keyspace constructors (Phase 3.D PR 3)#703
bootjp merged 2 commits intomainfrom
feat/sqs-htfifo-keyspace

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented Apr 28, 2026

Summary

Phase 3.D PR 3 of the split-queue FIFO rollout (per §11 of docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md). Adds the partitioned-keyspace constructors without touching the legacy keyspace or any existing call site. The §11 PR 2 dormancy gate still rejects PartitionCount > 1 at CreateQueue, so these helpers are dead code in production until PR 5 atomically lifts the gate and wires the data-plane fanout.

Keyspace shape

legacy:      !sqs|msg|<family>|<queue>|<gen>|<rest>
partitioned: !sqs|msg|<family>|p|<queue>|<partition>|<gen>|<rest>

The p| discriminator after the family prefix is safe by construction: validateQueueName forbids | in queue names, and the queue segment is base32-raw-URL encoded (cannot start with the literal byte 'p' followed by '|'). The partition is a 4-byte big-endian uint32 so a prefix scan !sqs|msg|<family>|p|<queue>|<partition>| picks exactly one partition's keys.

What's added

  • Constants: sqsPartitionedDiscriminator, SqsPartitionedMsg{Data,Vis,Dedup,Group,ByAge}Prefix.
  • Constructors: sqsPartitionedMsg{Data,Vis,Dedup,Group,ByAge}Key, sqsPartitionedMsgVisPrefixForQueue, sqsPartitionedMsgByAgePrefixForQueueAllPartitions.
  • Reaper helper: sqsMsgByAgePrefixesForQueue returns the {legacy, partitioned} pair so the reaper enumerates both keyspaces during cleanup.
  • Parser: parseSqsPartitionedMsgByAgeKey with the matching record type sqsPartitionedMsgByAgeRecord (legacy parseSqsMsgByAgeKey is unchanged).
  • Helper: appendU32 for the 4-byte partition segment.

What's NOT added

  • No existing call site changes — every current sqsMsgDataKey / sqsMsgVisKey / etc. invocation keeps using the legacy constructor, so existing queues stay byte-identical on disk.
  • No dispatch wrapper. PR 5 will add the if/else dispatch at each call site once partitionFor is wired into the send/receive paths (per the §3.1 design choice of "dispatch at the call site, no variadic, no silent argument loss").

Test plan

adapter/sqs_keys_test.go covers:

  • Byte-distinct legacy vs partitioned across all 5 families (data, vis, dedup, group, byage).
  • Per-partition isolation: a partition-k vis prefix never matches a partition-(k+1) key.
  • Determinism: same inputs yield byte-identical keys across calls.
  • Round-trip parse for partitioned byage keys at corner partition values (0, 7, 31).
  • Mutual rejection: legacy parser rejects partitioned keys and vice versa (the dual-parse contract).
  • Reaper enumeration helper returns both prefixes in {legacy, partitioned} order.
  • Discriminator constants all end with p|.
  • Different queue names produce different keys at the same (partition, gen).
  • go test -race ./adapter/... clean.
  • golangci-lint ./adapter/... clean.

Self-review (per CLAUDE.md "Self-review of code changes")

  1. Data loss — No FSM, Pebble, or retention path is touched. Legacy constructors are unchanged on every byte and call site, so existing queues are byte-identical on disk. The new partitioned helpers are dead code until PR 5 lifts the dormancy gate, so no production data lands under the partitioned prefix in this PR. No issue.

  2. Concurrency / distributed failures — No new goroutine, lock, or Raft path. The constructors are pure byte-builders. The reaper helper returns a fresh slice per call. No issue.

  3. Performance — New constructors are one allocation each, sized like the legacy ones. The reaper enumeration adds at most one extra Range iteration per queue (over the partitioned prefix, which is empty for legacy queues). Hot path (Send/Receive) is untouched. No issue.

  4. Data consistency — The p| discriminator + base32 queue segment combination is verified non-overlapping. Legacy and partitioned keyspaces cannot collide. Mutual-rejection tests assert each parser refuses the other's keyspace. No issue.

  5. Test coverage — 7 new test functions covering byte layout, per-partition isolation, determinism, round-trip parse, mutual rejection, reaper enumeration, and discriminator-constant invariants. PR 7 lands the HT-FIFO Jepsen suite per §11.

Adds the partitioned-keyspace constructors per §3.1 of the split-queue FIFO design without touching the legacy keyspace or any existing call site. The §11 PR 2 dormancy gate still rejects PartitionCount > 1 at CreateQueue, so these helpers are dead code in production until PR 5 atomically lifts the gate and wires the data-plane fanout.

The partitioned keyspace inserts a 'p|' discriminator after the family prefix and a fixed-width big-endian uint32 partition between the queue segment and the generation:

  legacy:      !sqs|msg|<family>|<queue>|<gen>|<rest>
  partitioned: !sqs|msg|<family>|p|<queue>|<partition>|<gen>|<rest>

The discriminator is safe by construction: validateQueueName forbids '|' in queue names, and the queue segment is base32-raw-URL encoded (cannot start with the literal byte 'p' followed by '|').

New constants:
- sqsPartitionedDiscriminator = "p|"
- SqsPartitionedMsg{Data,Vis,Dedup,Group,ByAge}Prefix

New constructors (all unexported, mirroring the legacy family):
- sqsPartitionedMsgDataKey
- sqsPartitionedMsgVisKey
- sqsPartitionedMsgVisPrefixForQueue
- sqsPartitionedMsgDedupKey
- sqsPartitionedMsgGroupKey
- sqsPartitionedMsgByAgeKey
- sqsPartitionedMsgByAgePrefixForQueueAllPartitions
- parseSqsPartitionedMsgByAgeKey
- sqsMsgByAgePrefixesForQueue (returns {legacy, partitioned} pair for the reaper)
- appendU32 (helper for the 4-byte partition segment)

New record type sqsPartitionedMsgByAgeRecord mirrors sqsMsgByAgeRecord with Partition added.

Tests in adapter/sqs_keys_test.go cover:
- Byte-distinct legacy vs partitioned across all 5 families.
- Per-partition isolation: keys for partition k must not match the prefix of partition k+1's scan, otherwise ReceiveMessage fanout would double-count.
- Determinism: same inputs yield byte-identical keys across calls.
- Round-trip parse for partitioned byage keys at corner partition values (0, 31).
- Mutual rejection: parseSqsMsgByAgeKey rejects partitioned keys, parseSqsPartitionedMsgByAgeKey rejects legacy keys (the dual-parse contract).
- Reaper enumeration helper returns both prefixes in {legacy, partitioned} order.
- Discriminator constants all end with 'p|' (asserts none lost the trailing '|').

Design doc §3.1 example code updated to use the actual implementation names (sqsMsgDataKey + sqsPartitionedMsgDataKey) rather than the placeholder legacyMsgDataKey/partitionedMsgDataKey.

go test -race ./adapter/... pass; golangci-lint ./adapter/... clean. No call site changed; existing queues stay byte-identical on disk.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 28, 2026

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 45 minutes and 55 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b8dbedb5-c701-472d-a2f6-ddf33e6024f0

📥 Commits

Reviewing files that changed from the base of the PR and between 579d193 and be7784a.

📒 Files selected for processing (3)
  • adapter/sqs_keys.go
  • adapter/sqs_keys_test.go
  • docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/sqs-htfifo-keyspace

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 45 minutes and 55 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a partitioned keyspace for HT-FIFO SQS queues, adding key constructors, prefix helpers, and unit tests. The review feedback identifies a critical issue where the lack of a delimiter after the variable-length queue name segment could cause prefix collisions between different queues. Additionally, the reviewer pointed out a discrepancy between the code comments and the actual encoding used, and suggested adding a test case to verify isolation between queues with overlapping names.

Comment thread adapter/sqs_keys.go
func sqsPartitionedMsgDataKey(queueName string, partition uint32, gen uint64, messageID string) []byte {
buf := make([]byte, 0, len(SqsPartitionedMsgDataPrefix)+sqsKeyCapLarge)
buf = append(buf, SqsPartitionedMsgDataPrefix...)
buf = append(buf, encodeSQSSegment(queueName)...)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The partitioned key layout is missing a delimiter after the variable-length queue name segment. Without a delimiter (like '|'), a prefix scan for a queue name that is a prefix of another queue name (e.g., 'queue' and 'queue1') will incorrectly return records from both queues. This can lead to state inconsistencies. The design document (§3.1) explicitly mentions that segments should be terminated by a '|' to prevent such collisions.

This issue applies to all partitioned key constructors in this file (sqsPartitionedMsgDataKey, sqsPartitionedMsgVisKey, sqsPartitionedMsgDedupKey, sqsPartitionedMsgGroupKey, sqsPartitionedMsgByAgeKey) and their corresponding prefix helpers.

Suggested change
buf = append(buf, encodeSQSSegment(queueName)...)
buf = append(buf, encodeSQSSegment(queueName)...)
buf = append(buf, '|') // Delimiter to prevent prefix collisions between queue names.
References
  1. Avoid silently dropping entries or causing state inconsistencies during serialization or normalization.

Comment thread adapter/sqs_keys.go
func sqsPartitionedMsgVisPrefixForQueue(queueName string, partition uint32, gen uint64) []byte {
buf := make([]byte, 0, len(SqsPartitionedMsgVisPrefix)+sqsKeyCapSmall)
buf = append(buf, SqsPartitionedMsgVisPrefix...)
buf = append(buf, encodeSQSSegment(queueName)...)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

As noted in the data key constructor, a delimiter is required after the queue name to ensure prefix scans are isolated to a single queue.

Suggested change
buf = append(buf, encodeSQSSegment(queueName)...)
buf = append(buf, encodeSQSSegment(queueName)...)
buf = append(buf, '|')

Comment thread adapter/sqs_keys.go
func sqsPartitionedMsgByAgePrefixForQueueAllPartitions(queueName string) []byte {
buf := make([]byte, 0, len(SqsPartitionedMsgByAgePrefix)+sqsKeyCapSmall)
buf = append(buf, SqsPartitionedMsgByAgePrefix...)
buf = append(buf, encodeSQSSegment(queueName)...)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The prefix helper used by the reaper must also include the delimiter to avoid scanning records from queues with overlapping encoded names.

Suggested change
buf = append(buf, encodeSQSSegment(queueName)...)
buf = append(buf, encodeSQSSegment(queueName)...)
buf = append(buf, '|')

Comment thread adapter/sqs_keys.go Outdated
Comment on lines +63 to +64
// — the queue-name segment is base32-encoded and cannot start with
// the literal ASCII byte 'p' followed by '|'.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The comment mentions that the queue-name segment is base32-encoded, but the implementation of encodeSQSSegment (line 216) uses base64.RawURLEncoding. This discrepancy also appears in the design document. Please update the comment to reflect the actual encoding used.

Suggested change
// — the queue-name segment is base32-encoded and cannot start with
// the literal ASCII byte 'p' followed by '|'.
// — the queue-name segment is base64-encoded and cannot start with
// the literal ASCII byte 'p' followed by '|'.

Comment thread adapter/sqs_keys_test.go Outdated
Comment on lines +248 to +253
require.False(t, bytes.HasPrefix(legacy, partitionedPrefixOnly),
"a legacy key for any queue name must not start with the partitioned prefix; "+
"the trailing | in the discriminator is what makes this true — "+
"if the legacy key starts with `!sqs|msg|data|p|...`, the queue "+
"name's encoded segment would have to start with `p|` which "+
"base32-raw-URL never produces")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

It would be beneficial to add a test case that specifically asserts that queue names with overlapping encoded prefixes do not collide in the partitioned keyspace. This would have caught the missing delimiter issue. Note: the encoding used is base64, not base32.

	require.False(t, bytes.HasPrefix(legacy, partitionedPrefixOnly),
		"a legacy key for any queue name must not start with the partitioned prefix; "+
			"the trailing | in the discriminator is what makes this true — "+
			"if the legacy key starts with '!sqs|msg|data|p|...', the queue "+
			"name's encoded segment would have to start with 'p|' which "+
			"base64-raw-URL never produces")

	// Assert that queues with overlapping encoded names are distinct and prefix-safe.
	// "queue" encodes to "cXVldWU", "queue1" encodes to "cXVldWUx".
	q1 := sqsPartitionedMsgDataKey("queue", 0, 1, "id")
	q2 := sqsPartitionedMsgDataKey("queue1", 0, 1, "id")
	require.NotEqual(t, q1, q2)
	require.False(t, bytes.HasPrefix(q2, q1), "partitioned key for 'queue' must not be a prefix of 'queue1'")

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 28, 2026

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Apr 28, 2026

Claude encountered an error —— View job


Code Review: feat(sqs): HT-FIFO partitioned-keyspace constructors (Phase 3.D PR 3)

  • Read changed files
  • Analyze key construction correctness
  • Verify encoding claims
  • Review test coverage
  • Check design doc changes

Summary

The overall structure — discriminator-based keyspace separation, named constructors, appendU32 for the partition segment, and the dual-parser reaper contract — is solid. The discriminator safety argument (p| can't collide with a base64-encoded queue name because | is not in the base64 alphabet) is correct. However, there is one critical correctness bug and several documentation discrepancies that should be resolved before this lands.


Critical: missing | delimiter after queue name segment

Affected: sqsPartitionedMsgDataKey, sqsPartitionedMsgVisKey, sqsPartitionedMsgDedupKey, sqsPartitionedMsgGroupKey, sqsPartitionedMsgByAgeKey, sqsPartitionedMsgVisPrefixForQueue, and sqsPartitionedMsgByAgePrefixForQueueAllPartitions.

The design doc layout says:

partitioned: !sqs|msg|<family>|p|<queue>|<partition>|<gen>|<rest>

but the code produces:

prefix + base64(queueName) + uint32_be(partition) + uint64_be(gen) + rest

with no | after base64(queueName).

The consequence: the all-partitions prefix sqsPartitionedMsgByAgePrefixForQueueAllPartitions(q) returns prefix + base64(q). Since base64 is a variable-length encoding, base64("queue") = "cXVldWU" is a strict byte prefix of base64("queue1") = "cXVldWUx". Therefore:

bytes.HasPrefix(
    sqsPartitionedMsgByAgeKey("queue1", partition, gen, ts, msgID),
    sqsPartitionedMsgByAgePrefixForQueueAllPartitions("queue"),
) == true   // ← wrong: reaper for "queue" scans "queue1"'s entries

This silently causes the reaper to enumerate another queue's orphan records. It also breaks parseSqsPartitionedMsgByAgeKey: if called with queueName="queue", it will successfully pass the bytes.HasPrefix check on a "queue1" key and then misparse the wrong bytes as (partition, gen, ts, msgID).

The sqsPartitionedMsgVisPrefixForQueue function (which includes the partition uint32) has partial protection because for small partition values (< ~754M) the first uint32 byte is 0x00, which is outside the base64 alphabet and disambiguates. But this is coincidental and fragile — for any queue with PartitionCount in the millions or for an adversarial queue name pair, it breaks. Relying on binary bytes to act as implicit terminators is an implicit contract that is not documented, not tested for the general case, and not true for all partition values.

Fix: add '|' after the encoded queue name everywhere, matching the stated layout:

// In every constructor and prefix helper:
buf = append(buf, encodeSQSSegment(queueName)...)
buf = append(buf, '|')   // ← terminate the variable-length segment
buf = appendU32(buf, partition)

Note: the legacy constructors in sqs_messages.go (e.g. sqsMsgByAgePrefixAllGenerations) carry the same latent bug — base64("queue") is also a prefix of base64("queue1") in the legacy byage scan. That is pre-existing and outside this PR's scope, but introducing the same design again into the new keyspace without documenting the known risk is a step backward.

Fix this →


Medium: "base32" vs "base64" in comments and design doc

encodeSQSSegment uses base64.RawURLEncoding (line 216), but three locations claim "base32":

Location Text
sqs_keys.go:63 "the queue-name segment is base32-encoded"
sqs_keys_test.go:18 "a base32-encoded queue segment"
sqs_keys_test.go:243–248 "base32-raw-URL encoding", "base32-raw-URL never produces"

The safety claim ("cannot start with the literal ASCII byte 'p' followed by '|'") is correct for base64, but the reasoning via "base32" is wrong. base32 and base64 have different alphabets — a base32-safe argument would need separate verification. Fix all three to say "base64-raw-URL".


Medium: design doc layout vs implementation

docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md shows the key shape with | separators:

partitioned: !sqs|msg|<family>|p|<queue>|<partition>|<gen>|<rest>

and inline code comments in sqs_keys.go:57–58 repeat the same layout. If the | delimiter fix above is applied, the code will match the doc. If not, the doc should be updated to reflect the actual on-disk format so future implementors don't rely on separators that aren't there.


Low: missing prefix-isolation test between queue names

TestSqsPartitionedMsgKeys_PartitionsAreDistinct verifies that two queue names produce different complete keys (require.NotEqual). But the dangerous failure mode is a prefix scan for queue A returning queue B's entries, not equal keys. The test that would have caught the delimiter bug:

q1prefix := sqsPartitionedMsgByAgePrefixForQueueAllPartitions("queue")
q1key := sqsPartitionedMsgByAgeKey("queue1", 0, 1, 0, "id")
require.False(t, bytes.HasPrefix(q1key, q1prefix),
    "prefix scan for 'queue' must not match 'queue1' keys")

Minor: stringer helper

stringer (test line 267) is a hand-rolled uint32 → string converter. strconv.FormatUint(uint64(v), 10) is one import and handles the full uint32 range without branching. The current implementation is correct for partition values but slightly surprising to a reader who expects strconv.


Positives

  • The p| discriminator safety argument is sound: since | (0x7C) is not in the base64 alphabet, it can never appear in a base64-encoded queue name, so the partitioned prefix is always unambiguous relative to a legacy queue name.
  • appendU32 implementation is correct.
  • Named-constructor dispatch pattern (no variadic) is the right call.
  • parseSqsPartitionedMsgByAgeKey correctly bounds-checks tsRaw the same way the legacy parser does.
  • The dual-parse mutual-rejection tests are well-designed.
  • Reaper enumeration helper returning both prefixes is future-proof.

The partitioned key shape from PR 3 omitted the segment terminator
between the variable-length encoded queue name and the fixed-width
partition uint32. Because base64.RawURLEncoding is variable-length,
base64("queue") = "cXVldWU" is a strict byte prefix of base64("queue1")
= "cXVldWUx", so a reaper-side prefix scan for queue "queue" would
also surface queue "queue1"'s entries (and the parser would misdecode
those bytes as the wrong (partition, gen, ts, msgID)).

Add a '|' byte after the encoded queue segment in every partitioned
constructor and prefix helper. '|' is outside the base64-raw-URL
alphabet (A-Z, a-z, 0-9, -, _) so the terminator can never collide
with the segment it terminates, and validateQueueName already rejects
'|' in raw queue names.

Other review fixes folded in:
- Comment fixes: encoder is base64.RawURLEncoding, not base32 — three
  comment locations corrected (sqs_keys.go header, two test docs).
- Replace hand-rolled stringer with strconv.FormatUint(uint64(v), 10).
- Add TestSqsPartitionedMsgKeys_QueueNamePrefixIsolation that asserts
  the bug above does not regress, using ("queue", "queue1") as the
  motivating example pair.
- Update §3.1 design-doc paragraph to call out the per-segment '|'
  terminator and to fix the same base32 → base64 wording.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 29, 2026

@claude review

Round 2 — addresses the gemini-code-assist review:

  • Critical (delimiter bug): every partitioned constructor and prefix helper now appends a '|' byte after the encoded queue segment, before the fixed-width partition uint32. base64-raw-URL never produces '|', so the terminator cannot collide with the segment. Added TestSqsPartitionedMsgKeys_QueueNamePrefixIsolation using ("queue", "queue1") — the motivating example pair from the review — to lock in the regression.
  • Medium (base32 → base64): three comment locations corrected (sqs_keys.go header, two test docs).
  • Minor (stringer): replaced hand-rolled helper with strconv.FormatUint(uint64(v), 10).
  • Design doc §3.1: updated to spell out the per-segment '|' terminator rule and the same base32 → base64 fix.

Also note: gemini's review flagged the legacy non-partitioned byage prefix (sqsMsgByAgePrefixAllGenerations) as having the same latent bug. That is genuinely a pre-existing issue but out of scope for this PR — I will open a separate follow-up for it after PR 4-A lands.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 29, 2026

Claude finished @bootjp's task in 2m 14s —— View job


Code Review: Round 2 — HT-FIFO Partitioned Keyspace Constructors

  • Read changed files (adapter/sqs_keys.go, adapter/sqs_keys_test.go, design doc)
  • Verify delimiter fix applied correctly in all constructors/prefix helpers
  • Verify base32 → base64 comment corrections
  • Verify stringer → strconv.FormatUint replacement
  • Verify design doc §3.1 update
  • Check for any remaining issues
  • Post review

All four issues from Round 1 have been correctly resolved. This is a clean pass.

Critical: delimiter bug — fixed ✅

Every partitioned constructor and both prefix helpers now append sqsPartitionedQueueTerminator ('|') after the encoded queue segment, before the fixed-width partition uint32:

buf = append(buf, encodeSQSSegment(queueName)...)
buf = append(buf, sqsPartitionedQueueTerminator)   // ← present in all 5 constructors
buf = appendU32(buf, partition)

Verified: sqsPartitionedMsgDataKey, sqsPartitionedMsgVisKey, sqsPartitionedMsgDedupKey, sqsPartitionedMsgGroupKey, sqsPartitionedMsgByAgeKey, sqsPartitionedMsgVisPrefixForQueue, and sqsPartitionedMsgByAgePrefixForQueueAllPartitions — all correct.

The new constant is well-documented:

// sqsPartitionedQueueTerminator is appended after the encoded queue
// name in every partitioned key. … base64.RawURLEncoding never emits '|'
// (it uses A-Z, a-z, 0-9, '-', '_').
const sqsPartitionedQueueTerminator = '|'

Medium: base32 → base64 — fixed ✅

All three locations corrected:

  • sqs_keys.go header comment: now says "base64-raw-URL-encoded" and "base64.RawURLEncoding never emits |"
  • sqsPartitionedQueueTerminator constant comment: also correct
  • Design doc §3.1: "The on-disk queue segment is base64.RawURLEncoding(<queue>) (alphabet A-Za-z0-9-_), and | is outside that alphabet…"

The test file (sqs_keys_test.go) doc-comments also correctly reference base64, not base32.

Minor: stringer → strconv — fixed ✅

// Before: hand-rolled helper
// After:
t.Run(tc.queue+"/p"+strconv.FormatUint(uint64(tc.partition), 10), ...)

Design doc §3.1 — fixed ✅

Now explicitly documents the per-constructor | terminator requirement:

Each partitioned constructor terminates the variable-length encoded queue segment with a | byte before the fixed-width partition uint32 — without that terminator, a prefix scan for queue q would also match queue q1 because base64("q") is a strict byte prefix of base64("q1").

Also updates the pseudocode examples throughout to use the actual function names (sqsPartitionedMsgDataKey etc.) instead of the placeholder partitionedMsgDataKey names from the earlier draft.

New test: TestSqsPartitionedMsgKeys_QueueNamePrefixIsolation

The regression test directly catches the motivating bug case from Round 1:

// Confirms base64("queue") is a strict byte prefix of base64("queue1")
require.True(t, bytes.HasPrefix([]byte(encodeSQSSegment(longQ)), []byte(encodeSQSSegment(shortQ))), ...)

// Then verifies the '|' terminator prevents leakage in all families
require.False(t, bytes.HasPrefix(longVisKey, shortVisPrefix), ...)
require.False(t, bytes.HasPrefix(longByageKey, shortByagePrefix), ...)
require.False(t, bytes.HasPrefix(longDataKey, shortDataKey), ...)
require.False(t, bytes.HasPrefix(longGroupKey, shortGroupKey), ...)

This is exactly the right test — it confirms the precondition (encoded prefix collision exists), then proves the terminator eliminates it.

One observation (informational, not blocking)

The design doc's conceptual notation |<partition>|<gen>|<rest> implies | separates all segments, but the actual binary format emits no | between the raw uint32 partition and uint64 gen fields — those are self-delimiting by their fixed width. This same notational shorthand exists for the legacy keys in the doc. Not a bug, and since the parser correctly uses fixed-width slicing (rest[:4], rest[4:12], rest[12:20]) rather than |-splitting, there's no ambiguity.

Summary

Round 1's critical delimiter bug is fixed, the encoding comments are corrected, the helper is idiomatic, the design doc is updated, and a dedicated regression test locks in the fix. The parseSqsPartitionedMsgByAgeKey parser is correct (bounds-checks, fixed-width slicing, tsRaw > 1<<63-1 guard consistent with the legacy parser). The reaper helper returning both prefixes is future-proof.

LGTM. Ready to merge.

bootjp added a commit that referenced this pull request Apr 29, 2026
… PR 4-A)

Adds the operator-config foundation for HT-FIFO routing per §5 of the split-queue FIFO design. Stacks on top of PR 3 (#703), which lands the partitioned-keyspace constructors.

The grammar matches the design: 'queue.fifo:N=group_0,...,group_{N-1};...' with multiple queue entries separated by ';' (commas are reserved for the per-queue group list). PartitionCount must be a power of two ≤ 32 — the validator rejects anything else at config-parse time so a typo cannot land a half-shaped routing map at runtime.

Two-stage validation:
- parseSQSFifoPartitionMap: grammar + per-entry consistency (PartitionCount agrees with len(Groups), partition count is a power of two within the cap, no duplicate queues).
- validateSQSFifoPartitionMap: runs after --raftGroups is parsed and asserts every group named in the partition map exists in --raftGroups; rejects with a queue/partition-pointer error message so the operator can fix typos without re-counting.

Plumbed through parseRuntimeConfig into runtimeConfig.sqsFifoPartitionMap. Not yet consumed by routing — PR 4-B will wire it through ShardedCoordinator and add the htfifo capability advertisement once the routing layer lands. The §11 PR 2 dormancy gate still rejects PartitionCount > 1 on CreateQueue, so populating this flag has no effect on production traffic until PR 5 lifts the gate atomically with the data-plane fanout.

parseSQSFifoPartitionMapEntry was extracted into parseSQSFifoPartitionCount + parseSQSFifoGroupList helpers to stay under the cyclop=10 ceiling once the four-rejection validation surface landed (zero / over-cap / non-power-of-2 / count-mismatch). parseRuntimeConfig grew over the cyclop ceiling too, so the partition-map block was extracted into buildSQSFifoPartitionMap.

Tests in shard_config_test.go cover:
- Empty input yields empty map.
- Single-queue happy path.
- Multiple queues separated by ';'.
- Whitespace trimming.
- PartitionCount > 0 enforcement.
- PartitionCount must be a power of two.
- PartitionCount within the per-queue cap.
- count and group-list length must agree.
- Malformed entry (missing '=', empty queue, empty group, trailing comma).
- Duplicate queue rejection.
- validateSQSFifoPartitionMap: all-groups-present passes; missing group fails with queue + partition-index pointer in the error message.

go test -race ./... pass; golangci-lint ./... clean. No behaviour change in production paths.
@bootjp bootjp merged commit c153659 into main Apr 29, 2026
9 checks passed
@bootjp bootjp deleted the feat/sqs-htfifo-keyspace branch April 29, 2026 01:04
bootjp added a commit that referenced this pull request Apr 29, 2026
… PR 4-A) (#704)

## Summary

Phase 3.D PR 4-A of the split-queue FIFO rollout. Operator-config
foundation: parses `--sqsFifoPartitionMap`, validates against
`--raftGroups`, plumbs through `parseRuntimeConfig` into
`runtimeConfig.sqsFifoPartitionMap`. Stacks on top of #703 (PR 3,
partitioned-keyspace constructors).

## What's added

- `--sqsFifoPartitionMap` flag with grammar
`queue.fifo:N=group_0,...,group_{N-1};...`
- `parseSQSFifoPartitionMap` + `parseSQSFifoPartitionMapEntry` (grammar
+ per-entry consistency)
- `validateSQSFifoPartitionMap` (cross-check against `--raftGroups`)
- `sqsFifoQueueRouting` struct (PartitionCount + Groups)
- `runtimeConfig.sqsFifoPartitionMap` field

## What's NOT added (deferred to PR 4-B)

- Routing layer wiring through `ShardedCoordinator`
- `htfifo` capability advertisement on `/sqs_health`
- `kv/lease_state.go` leadership-refusal hook for non-htfifo binaries
- Catalog polling for partitioned-queue discovery at startup

These are coupled by the design's "binary advertises htfifo only when
both routing AND leadership-refusal are in place" rule, so they belong
in one PR.

## Validation coverage

- PartitionCount > 0
- PartitionCount ≤ 32 (per-queue cap)
- PartitionCount must be a power of two
- `len(Groups) == PartitionCount`
- No duplicate queues across entries
- Every named group exists in `--raftGroups`
- Empty queue / empty group / trailing-comma / missing-`=` all rejected
with the offending entry quoted in the error

## Test plan

- [x] `TestParseSQSFifoPartitionMap` covers grammar, whitespace,
multiple queues, all rejection paths.
- [x] `TestValidateSQSFifoPartitionMap` covers the missing-group case
with queue + partition-index pointer.
- [x] `parseRuntimeConfig` integration: `main_bootstrap_e2e_test.go`
updated for the new parameter.
- [x] `go test -race ./adapter/... .` pass.
- [x] `golangci-lint ./...` clean.

## Self-review (per CLAUDE.md)

1. **Data loss** — config-only change. No FSM, Pebble, or retention path
is touched. No issue.
2. **Concurrency / distributed failures** — single-shot parse at
startup, no goroutines or shared state. No issue.
3. **Performance** — startup-time validation only. No hot-path effect.
No issue.
4. **Data consistency** — partition count and group list must agree at
parse time, so the runtime cannot see a half-shaped routing map. The
flag has no effect on production traffic until PR 4-B + PR 5 land. No
issue.
5. **Test coverage** — 11 sub-tests in `TestParseSQSFifoPartitionMap`
and 3 in `TestValidateSQSFifoPartitionMap`, covering every documented
rejection branch.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant