docs(design): propose per-adapter logical backup format #705
Conversation
Per-adapter directory hierarchy (dynamodb/<table>/items/<pk>.json,
s3/<bucket>/<original/path>, redis/db_<n>/<type>/<key>.{bin,json,jsonl},
sqs/<queue>/messages.jsonl) so a backup is recoverable without elastickv —
cat / jq / find / aws s3 sync / redis-cli are sufficient. Strips internal
magic prefixes and wide-column key encodings; pins read_ts via the lease-
read path and the active timestamp tracker so the dump is consistent and
the compactor cannot retire its versions mid-dump.
@claude review
Claude finished @bootjp's task in 2m 47s (Code Review)
This pull request introduces a design for a logical backup format that decomposes snapshots into adapter-specific, human-readable structures to ensure vendor-independent recovery. The proposal outlines a directory-based layout for DynamoDB, S3, Redis, and SQS, utilizing a cluster-wide read timestamp for point-in-time consistency. Review feedback highlighted several improvement opportunities: mitigating the operational risk of the 30-minute backup TTL with a heartbeat mechanism, preventing S3 metadata filename collisions by using a dedicated directory, and improving scalability and parsing consistency for Redis TTLs and stream metadata by adopting JSONL and sidecar files.
> lease reads already use; no new code path through MVCC is introduced.
> 3. `Admin.EndBackup` releases the tracker registration. A
>    tool-side crash leaves the tracker entry behind for at most
>    `backup_ts_ttl` (default 30 minutes, configurable via the same admin
The 30-minute default TTL for the read_ts registration in the active timestamp tracker represents a significant operational risk for large-scale backups. If the dump takes longer than 30 minutes, the compactor could retire MVCC versions while the backup is still scanning, causing the backup to fail. Per repository guidelines, design documents identifying such risks must detail mitigation strategies, such as implementing a heartbeat mechanism from the producer to the tracker to keep the registration alive.
Suggested change:
- `backup_ts_ttl` (default 30 minutes, configurable via the same admin
+ `backup_ts_ttl` (default 30 minutes, heartbeated by the producer or configurable via the same admin
References
- When a design document identifies a significant operational risk, it must also detail potential mitigation strategies.
> │   └── 04/
> │       └── 29/
> │           ├── img.jpg
> │           ├── img.jpg.metadata.json
Using a fixed .metadata.json suffix for sidecar files can lead to collisions with user-defined S3 object keys that happen to end with the same string. This could result in user data being overwritten or the restore tool misinterpreting metadata as user content. Using a dedicated directory for metadata is a safer approach.
Suggested change:
- │           ├── img.jpg.metadata.json
+ │           ├── _metadata/img.jpg.json
> ├── strings/
> │   ├── session%3Aabc123.bin   # SET session:abc123 …
> │   └── counter%3Aglobal.bin   # SET counter:global "42"
> ├── strings_ttl.json           # { "session%3Aabc123": 1735689600000, ... }
Storing all string TTLs in a single JSON file per database scales poorly for large datasets. A single JSON object requires loading the entire index into memory to parse or update. Using a .jsonl format would allow for streaming and better scalability, adhering to the decomposed nature of the backup.
Suggested change:
- ├── strings_ttl.json    # { "session%3Aabc123": 1735689600000, ... }
+ ├── strings_ttl.jsonl   # {"format_version": 1, "key": "session%3Aabc123", "expire_at_ms": 1735689600000}
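For a sense of why the JSONL shape scales, here is a minimal Go reader for the suggested `strings_ttl.jsonl` records: one record in memory at a time, never the whole index. The record struct mirrors the suggested line shape; nothing here is elastickv API.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

type ttlRecord struct {
	FormatVersion int    `json:"format_version"`
	Key           string `json:"key"` // filename-encoded, e.g. "session%3Aabc123"
	ExpireAtMs    int64  `json:"expire_at_ms"`
}

func main() {
	f, err := os.Open("redis/db_0/strings_ttl.jsonl")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() { // one record per line; the full index is never materialized
		var rec ttlRecord
		if err := json.Unmarshal(sc.Bytes(), &rec); err != nil {
			panic(err)
		}
		fmt.Printf("PEXPIREAT %s %d\n", rec.Key, rec.ExpireAtMs)
	}
	if err := sc.Err(); err != nil {
		panic(err)
	}
}
```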
> followed (optionally) by a trailing meta line:
>
>     {"_meta": true, "length": 2, "last_ms": 1714400000001, "last_seq": 0}
>
> The meta line lets a restore tool seed `XADD` IDs without re-deriving
> them from the entries.
Including a metadata line with a different schema at the end of a .jsonl file makes the file heterogeneous and harder to process with standard line-oriented tools like jq. Moving this to a separate _stream.json file would be more consistent with the patterns used for other adapters.
Suggested change:
- followed (optionally) by a trailing meta line:
-     {"_meta": true, "length": 2, "last_ms": 1714400000001, "last_seq": 0}
- The meta line lets a restore tool seed `XADD` IDs without re-deriving
- them from the entries.
+ followed (optionally) by a sidecar `_stream.json` containing the stream's
+ length and last ID. This lets a restore tool seed `XADD` IDs without
+ re-deriving them from the entries.
Critical:
- Replace nonexistent kv.ShardedCoordinator.ScanRange with the actual ShardStore.ScanAt (kv/shard_store.go:106) plus a thin BackupScanner iterator wrapper for paged scans.
- Specify ActiveTimestampTracker.PinWithDeadline plus a per-second sweeper goroutine (a minimal sketch follows this summary). The existing Pin has no TTL, so a crashed producer would block compaction forever without this.

Important:
- Add hll/ to the Redis layout tree.
- Replace the .metadata.json sidecar with a .elastickv-meta.json reserved suffix; document collision handling and the rename-collisions flag.
- Spell out the cross-shard wait: BeginBackup polls each group's applied_index until it covers read_ts (5s default deadline) before any scan starts. The 50ms skew alone was a liveness assertion, not an enforced wait.
- Specify CHECKSUMS as sha256sum(1)-compatible (verifiable with `sha256sum -c CHECKSUMS` from the dump root).
- Add a ListAdaptersAndScopes RPC alongside BeginBackup/EndBackup with ttl_ms / pin_token.

Minor:
- Redis TTL: default skip-expired on restore; preserve-ttl is opt-in.
- The `aws s3 sync` example needs an exclude pattern for sidecars.
- DynamoDB example clarified: composite-key vs hash-only layouts.
- SQS dedup-window reset moved into trade-offs as an explicit callout.
- Redis stream merge mode: refuse a non-empty target; the auto-id strategy falls back to `XADD *` with an _id_remap.jsonl mapping log.

Tests added: TestPinWithDeadlineExpiry, TestBeginBackupWaitsForLaggingShard, TestBackupScannerPaging, TestS3SidecarSuffixCollision, TestRedisTTLExpiredKeySkippedByDefault, TestRedisStreamMergeRejectsNonEmpty.
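A minimal sketch of the PinWithDeadline + per-second sweeper shape named above. `ActiveTimestampTracker` exists in elastickv, but the token type, field names, and sweeper wiring below are illustrative assumptions, not the real implementation.

```go
package kv

import (
	"crypto/rand"
	"sync"
	"time"
)

// PinToken identifies one backup registration (16 bytes, matching the
// pin_id width the thread later settles on for the FSM entries).
type PinToken [16]byte

type backupPin struct {
	readTS   uint64
	deadline time.Time
}

type ActiveTimestampTracker struct {
	mu   sync.Mutex
	pins map[PinToken]backupPin
}

// PinWithDeadline registers read_ts until deadline and returns the token
// the producer must present to extend or release the pin.
func (t *ActiveTimestampTracker) PinWithDeadline(readTS uint64, deadline time.Time) (PinToken, error) {
	var tok PinToken
	if _, err := rand.Read(tok[:]); err != nil {
		return tok, err
	}
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.pins == nil {
		t.pins = make(map[PinToken]backupPin)
	}
	t.pins[tok] = backupPin{readTS: readTS, deadline: deadline}
	return tok, nil
}

// Sweep drops expired pins once per second, so a crashed producer blocks
// compaction for at most one TTL window instead of forever.
func (t *ActiveTimestampTracker) Sweep(stop <-chan struct{}) {
	tick := time.NewTicker(time.Second)
	defer tick.Stop()
	for {
		select {
		case <-stop:
			return
		case now := <-tick.C:
			t.mu.Lock()
			for tok, p := range t.pins {
				if now.After(p.deadline) {
					delete(t.pins, tok)
				}
			}
			t.mu.Unlock()
		}
	}
}
```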
💡 Codex Review (reviewed commit 5719128131)
> 5. **Renew on long dumps**: the producer calls `BeginBackup` again with
>    the same `read_ts` (carrying the existing token) every `ttl_ms / 3`
>    to extend the deadline. A multi-hour dump never relies on a single
Add a true pin-renewal RPC contract
Step 5 requires renewing an existing backup pin by calling BeginBackup with the same read_ts and token, but the proposed BeginBackupRequest only contains ttl_ms and cannot carry either value. In a long-running dump, the producer therefore has no API path to extend the original pin and will either create a new read_ts (breaking point-in-time guarantees) or let the original pin expire. The request/response schema needs explicit renewal semantics (e.g., token + read_ts on request, or a dedicated RenewBackup RPC).
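For illustration, a producer-side renewal loop matching the semantics the review later converges on (a dedicated `RenewBackup` RPC called every `ttl_ms / 3`, aborting the dump on failure). The `RenewFunc` stand-in is hypothetical; only the cadence and fail-fast behavior come from the discussion.

```go
package backup

import (
	"context"
	"time"
)

// RenewFunc stands in for the Admin.RenewBackup RPC client call.
type RenewFunc func(ctx context.Context, pinToken []byte, ttl time.Duration) error

// KeepAlive renews the pin every ttl/3 and cancels the dump on the first
// renewal failure: scanning past the pin window would silently corrupt
// the artifact, so failing fast is the only safe behavior.
func KeepAlive(ctx context.Context, cancelDump context.CancelFunc, renew RenewFunc, pinToken []byte, ttl time.Duration) {
	tick := time.NewTicker(ttl / 3)
	defer tick.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-tick.C:
			if err := renew(ctx, pinToken, ttl); err != nil {
				cancelDump() // abort rather than continue past the TTL
				return
			}
		}
	}
}
```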
> also losslessly recoverable from the encoded filename alone — `KEYMAP` is a
> convenience, not a correctness dependency.
Correct filename reversibility claim for long keys
The document states key translation is recoverable from encoded filenames alone, but the long-segment rule replaces keys over 240 bytes with a hash prefix plus truncated original bytes. That transformation is not bijective from path text alone, so restores or external tools cannot reconstruct verbatim keys without side metadata (_long_keys.jsonl/KEYMAP). This should be specified as a required dependency for long keys, or the encoding scheme should be made fully reversible.
@claude review

Claude finished @bootjp's task in 3m 13s (Code Review)
HIGH:
- TTL renewal made explicit. New RenewBackup RPC; the producer renews every ttl_ms/3 and aborts the dump on renewal failure (continuing past the TTL would silently corrupt the artifact). backup_max_ttl_ms caps a single window; multi-hour dumps work via repeated renewals.

MEDIUM:
- DynamoDB scalability. Default stays per-item (the user's stated requirement: one record per file, named `<pk>.json`). Added opt-in --dynamodb-bundle-mode jsonl for tables where inode count is the binding constraint; a structured warning is emitted when an unbundled scope exceeds 1M items.
- S3 file-vs-directory collisions. The producer detects keys like "path/to" and "path/to/obj" coexisting and renames the shorter to "path/to.elastickv-leaf-data", recorded in KEYMAP.jsonl and the manifest.
- KEYMAP scaled by switching to KEYMAP.jsonl (line-streamable).
- Redis strings_ttl.json -> strings_ttl.jsonl with one TTL record per line so the TTL count is unbounded.
- max_active_backup_pins (default 4) caps concurrent BeginBackup registrations; the tracker returns ErrTooManyActiveBackups and the RPC surfaces ResourceExhausted. Prevents a misbehaving caller from holding the compactor open across the entire MVCC retention horizon.

Manifest fields added: s3_collision_strategy, dynamodb_layout, max_active_backup_pins.
CLI flags added: --ttl-ms, --scan-page-size, --dynamodb-bundle-mode/--dynamodb-bundle-size, --rename-collisions.
Tests added: TestS3PathFileVsDirectoryCollision, TestBeginBackupTooManyActiveBackups, TestRenewBackupExtendsDeadline.
💡 Codex Review (reviewed commit bbd1bb8b8b)
> - DynamoDB: `aws dynamodb create-table --cli-input-json _schema.json` and
>   then `aws dynamodb put-item --item @items/<pk>/<sk>.json` per file.
Use valid CreateTable request JSON for DynamoDB replay
The documented replay step aws dynamodb create-table --cli-input-json _schema.json will not work with the _schema.json shape defined earlier in this doc (format_version, table_name, primary_key, etc.). AWS CLI create-table expects request fields like TableName, KeySchema, and AttributeDefinitions, and --cli-input-json must follow the operation input skeleton format, so users following this runbook will fail before any item restore.
> every sidecar as if it were a user object.
> - Redis: a 100-line shell script over `find redis/db_0/strings -name '*.bin' -exec redis-cli -x SET …`,
>   with similar one-liners per type.
> - SQS: `jq -c . messages.jsonl | xargs -n1 aws sqs send-message --message-body …`.
Include FIFO-required fields in SQS replay command
The suggested SQS replay one-liner only passes --message-body, but this design explicitly includes FIFO queues (orders-fifo.fifo) and AWS send-message fails for FIFO queues without MessageGroupId (and may also require MessageDeduplicationId when content-based dedup is off). As written, the advertised external restore path cannot replay FIFO dumps and also drops message attributes/metadata needed for faithful recovery.
Round-2 carry-overs:
- Move NewBackupScanner from *ShardedCoordinator to *ShardStore. ShardedCoordinator (kv/sharded_coordinator.go:124) has no *ShardStore field; ShardStore is the natural home for ScanAt-adjacent code.
- Add a Phase-1 note that ScanAt's doc comment must be updated to reflect that BeginBackup's applied_index fence makes cross-shard ScanAt globally consistent.
- Add pin_token to ListAdaptersAndScopesRequest so the scope scan is bounded to the pinned read_ts; concurrently-created scopes are not surfaced.
- Add hll/ to the Redis db_0 example tree (the top-level layout already had it; the per-adapter example lagged).
- Add --stream-merge-strategy / --preserve-ttl / --preserve-visibility to the elastickv-restore apply CLI block.

New nits from the Gemini changes:
- Pick one form for the DynamoDB bundle-mode flags. Two-flag form (--dynamodb-bundle-mode jsonl + --dynamodb-bundle-size 64MiB) throughout; drop the inline jsonl:size=64MiB form.
- RenewBackupRequest.ttl_ms documents the same 60s-24h range constraint as BeginBackupRequest, bounded above by backup_max_ttl_ms; out-of-range returns InvalidArgument.

Tests added: TestRenewBackupTTLRangeValidation, TestListAdaptersAndScopesAtPinTS.
💡 Codex Review (reviewed commit e3b2936057)
> 3. **Pin `read_ts`** with `PinWithDeadline(read_ts, now+ttl_ms)`; return
>    the resulting `pin_token` to the producer.
Pin read_ts before waiting for shard catch-up
This flow pins read_ts only after the shard catch-up wait, which leaves a race where concurrent writes plus compaction can retire versions at read_ts before the scan ever starts. In a busy cluster (or with tight retention), BeginBackup can succeed but the subsequent ScanAt(read_ts) no longer has a guaranteed full snapshot. The pin should be established before the wait (and released on failure) so the fenced timestamp is protected for the entire handshake.
> - **S3 object keys preserve their `/` separators** as filesystem path
>   separators. `s3://photos/2026/04/29/img.jpg` becomes
>   `s3/photos/2026/04/29/img.jpg`. This is the whole point — `aws s3 sync`
>   on the dump directory works.
Encode unsafe S3 path segments before writing to disk
Mapping S3 keys directly to filesystem paths with raw / separators is unsafe for keys containing period-only path segments (for example ./ or ../) and other special segments, because many tools normalize them and can change where data is written/read. That breaks the design goal of deterministic, reversible backups and can cause object collisions or path-escape behavior during dump/restore unless segment normalization is explicitly encoded/escaped.
> The format is designed to be extractable without elastickv:
>
> - DynamoDB: `aws dynamodb create-table --cli-input-json _schema.json` and
>   then `aws dynamodb put-item --item @items/<pk>/<sk>.json` per file.
Use valid aws put-item CLI arguments in replay guidance
The DynamoDB replay command omits required put-item inputs and uses an unsupported file reference form, so operators following this runbook cannot actually restore items. AWS CLI put-item requires at least --table-name and --item, and file input is provided via file://...; without that, the external restore path fails even when item files are correct.
Important (verified against adapter/redis.go):
- Stream TTLs were silently dropped. Streams can carry TTL via EXPIRE/PEXPIRE/PEXPIREAT, kept in !redis|ttl| by buildTTLElems (adapter/redis.go:2471). Added expire_at_ms to the _meta line so restore re-applies the original epoch.
- HLL TTLs were silently dropped. HLL is reported as redisTypeString but stores TTL in the legacy scan index (the adapter/redis.go:2319-2320 comment confirms it). Added an hll_ttl.jsonl sidecar at the db_<n> root, same shape as strings_ttl.jsonl. The sidecar is omitted when no HLL key has a TTL.
- Fixed an XADD NOMKSTREAM semantic error in the skip-existing description. NOMKSTREAM only prevents stream creation; it does not skip entries by ID. Replaced with the correct mechanism: attempt XADD and treat the "ID equal or smaller" ERR as "already present, skip."

Minor:
- Added --begin-backup-deadline 5s to the elastickv-backup dump CLI block (the prose mentioned it but the flag list lagged).

Tests added: TestRedisStreamTTLRoundTrip, TestRedisHLLTTLRoundTrip, TestRedisStreamSkipExistingHandlesERR.
💡 Codex Review (reviewed commit 797d5711e5)
> - Bytes in `[A-Za-z0-9._-]` pass through.
> - Every other byte is rendered as `%HH` (uppercase hex), like
Encode case to prevent filename collisions
The filename encoding keeps [A-Za-z0-9._-] unchanged, but this does not satisfy the stated requirement to handle case-colliding names on case-insensitive filesystems. Two distinct keys like UserA and usera will map to the same path on default Windows/macOS volumes, causing overwrite or dump failure and making the backup non-recoverable in those environments.
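To make the collision surface concrete, here is the stated encoding rule as a small Go function, with the two caveats raised in this thread (case folding, dot-only segments) flagged where they bite. This is the rule as written, not a proposed fix.

```go
package enc

import "fmt"

// encodeSegment applies the documented rule: bytes in [A-Za-z0-9._-]
// pass through, every other byte becomes %HH (uppercase hex).
func encodeSegment(key []byte) string {
	out := make([]byte, 0, len(key))
	for _, b := range key {
		switch {
		case b >= 'a' && b <= 'z', b >= 'A' && b <= 'Z', b >= '0' && b <= '9',
			b == '.', b == '_', b == '-':
			// Pass-through. Caveat 1: "UserA" and "usera" still collide on
			// case-insensitive filesystems. Caveat 2: "." and ".." segments
			// pass through unchanged and can alias or escape paths after
			// normalization when segments become directories.
			out = append(out, b)
		default:
			out = append(out, []byte(fmt.Sprintf("%%%02X", b))...) // e.g. ':' -> "%3A"
		}
	}
	return string(out)
}
```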
> → CHECKSUMS → MANIFEST.json
> → EndBackup(pin_token)
Include MANIFEST.json in checksum coverage
The flow writes CHECKSUMS before MANIFEST.json, so sha256sum -c CHECKSUMS cannot detect tampering of the manifest even though restore behavior depends on manifest fields (layout, collision strategy, options). This undermines the document’s integrity claim that tampering is caught by checksum verification.
Important (correctness gap, verified against main.go):
- The BeginBackup pin was node-local. Each node owns its own ActiveTimestampTracker (main.go:294) and only its compactor consults it (main.go:335). A pin on the receiving node would not protect group leaders on other nodes; their compactors would be free to retire MVCC versions at read_ts mid-dump.
  Fix: pins propagate through each Raft group's log as BackupPin / BackupExtend / BackupRelease FSM commands. Every replica applies them to its local tracker, so the pin set is replicated and durable across leader changes (a new leader inherits the pin from the same log entry it applied). BeginBackup fans the propose out per group; if any group fails to commit within --begin-backup-deadline, BeginBackup proposes BackupRelease on the groups that did commit and returns Unavailable so retry is clean. RenewBackup / EndBackup fan out the same way; releases are idempotent so partial state is tolerated.

Minor:
- Specified the _id_remap.jsonl path: redis/db_<n>/streams/<key>._id_remap.jsonl (next to the source stream) so tools can find it on disk.
- TestRedisStreamSkipExistingHandlesERR description: "the dump" -> "the restore" (it is a restore-side test).
- Added KEYMAP.jsonl to every per-adapter scope in the layout tree; omitted when empty.

Tests added: TestBeginBackupPinFanOutAllNodes, TestBeginBackupPinSurvivesLeaderChange, TestBeginBackupGroupUnreachable.
💡 Codex Review (reviewed commit 2c00d5604d)
> - **S3 object keys preserve their `/` separators** as filesystem path
>   separators. `s3://photos/2026/04/29/img.jpg` becomes
>   `s3/photos/2026/04/29/img.jpg`. This is the whole point — `aws s3 sync`
>   on the dump directory works.
Handle trailing-slash S3 keys in path mapping
This mapping rule makes the on-disk path non-representable for valid S3 objects whose key ends with / (for example, folder-marker objects): preserving separators and writing at the “natural path” means the target path is a directory, not a file, so the object body/sidecar cannot be emitted without renaming. In datasets that contain these keys, dumps will either fail or silently omit/collide objects, breaking restore completeness; the design needs an explicit escape/rename rule for trailing-slash keys.
> - DynamoDB: `aws dynamodb create-table --cli-input-json _schema.json` and
>   then `aws dynamodb put-item --item @items/<pk>/<sk>.json` per file.
> - S3: `aws s3 sync --exclude '*.elastickv-meta.json' s3/<bucket>/ s3://target-bucket/`.
Exclude control files from documented S3 replay sync
The documented replay command excludes only *.elastickv-meta.json, so it will also upload backup control artifacts that live under each bucket scope (for example _bucket.json and KEYMAP.jsonl) as regular S3 objects. That contaminates the restored bucket with non-user objects and makes the “external restore” path non-faithful unless additional excludes (or a filtered object list) are documented.
Important (verified against kv/fsm.go):
- Phase 1 scope was missing the FSM plumbing for BackupPin /
BackupExtend / BackupRelease entries. Added explicit items:
- new byte tags 0x03/0x04/0x05 in kv/fsm.go alongside existing
raftEncodeHLCLease=0x02 (kv/fsm.go:116);
- new applyBackupPin/applyBackupExtend/applyBackupRelease
handlers, dispatched from kvFSM.Apply (kv/fsm.go:60) in the
same shape as applyHLCLease;
- new *ActiveTimestampTracker field on kvFSM (kv/fsm.go:26),
wired via NewKvFSMWithHLCAndTracker so the FSM and the
coordinator share one tracker (analogous to the *HLC sharing
invariant in CLAUDE.md);
- new kv/backup_codec.go with the 33/25/17-byte fixed binary
layouts for the three entry types.
- kvFSM.Restore (kv/fsm.go:264-287) wipes in-memory pins. A replica
that installs a Raft snapshot during the backup window loses its
BackupPin (snapshot only carries the Pebble store, not the
tracker). After the leader compacts the log past the BackupPin
entry, that entry is gone; a follower catching up via snapshot
never replays it.
Added explicit safe-bound invariant:
backup_duration + worst_case_compaction_lag < mvcc_retention_horizon
BeginBackup enforces a soft form via --snapshot-headroom-entries
(default 10000). Producer's per-scope expected_keys heuristic
catches mid-dump snapshot-induced shortfalls. Heavier mitigations
(snapshotting tracker state, sibling-pin queries) are explicitly
Phase 3, not Phase 1.
Minor:
- --begin-backup-deadline binding constraint clarified: it is the
applied_index poll for lagging shards, not the BackupPin commit
(which is sub-second on a healthy cluster).
- RenewBackup retries per-group BackupExtend up to 3 times with
500ms backoff before aborting; transient leader elections (<1s
under etcd/raft defaults) no longer abort otherwise-healthy dumps.
Tests added: TestBackupPinFSMCodecRoundTrip,
TestRestoreWipesLocalPins, TestBeginBackupRefusesNearSnapshotThreshold,
TestRenewBackupRetriesLeaderElection.
💡 Codex Review (reviewed commit 020e0c4c7f)
> every sidecar as if it were a user object.
> - Redis: a 100-line shell script over `find redis/db_0/strings -name '*.bin' -exec redis-cli -x SET …`,
>   with similar one-liners per type.
> - SQS: `jq -c . messages.jsonl | xargs -n1 aws sqs send-message --message-body …`.
Add required queue URL to SQS replay command
The documented replay one-liner cannot run as written because aws sqs send-message requires --queue-url in addition to --message-body; the AWS CLI synopsis/options mark --queue-url as required. Operators following this command will fail before any message replay begins, so the external-restore path is blocked for all queues.
Avoid xargs parsing for JSONL SQS message replay
Using jq -c . messages.jsonl | xargs -n1 ... is unsafe for common message bodies containing quotes or backslashes (for example JSON payloads), because GNU xargs treats quotes/backslashes as special delimiters by default and can error with unmatched quotes or alter arguments. That means replay can fail or send mutated payloads for valid dumped messages, so this command is not a reliable restore path.
> sidecar. Operators who hold such keys pass `--rename-collisions` to have
> them emitted as `<obj>.elastickv-meta.json.user-data` (the rename is
> recorded in `KEYMAP`).
Make sidecar-collision rename mapping injective
The proposed rename for object keys ending in .elastickv-meta.json (<obj>.elastickv-meta.json.user-data) is not collision-safe: a real user object can already exist at that exact key, so both objects map to the same dump path and one is overwritten/omitted. The collision strategy needs a guaranteed-unique escape rule (or iterative disambiguation) to keep restores lossless for these key pairs.
Important (verified against internal/raftengine/engine.go and
internal/raftengine/etcd/engine.go):
- --snapshot-headroom-entries check was inverted. Replaced with the
correct formula:
entries_since_last_snapshot = AppliedIndex - LastSnapshotIndex
remaining_headroom = SnapshotEvery - entries_since_last_snapshot
refuse if: remaining_headroom < --snapshot-headroom-entries
(raftengine.Status has LastSnapshotIndex not FirstIndex, and
defaultSnapshotEvery = 10000 is the per-engine snapshot trigger.)
A freshly-snapshotted cluster has the largest remaining headroom
and is now correctly allowed; the check refuses backups that would
trigger the snapshot-installation corner case mid-dump. Default
--snapshot-headroom-entries lowered from 10000 to 1000 (one-tenth
of SnapshotEvery) since the formula now expresses what the prose
intended. (A Go sketch of the corrected check follows this summary.)
- expected_keys heuristic was undefined. Replaced with a concrete
expected-keys baseline mechanism:
* BeginBackup runs a count-only ScanAt at read_ts before the pin
installs, returns per-scope (key_count, applied_index) in
BeginBackupResponse.
* Producer compares actual count against baseline; fails with
ErrCompactionDuringDump if below 99% +/- sqrt(baseline).
* Adds keys_only bool parameter to MVCCStore.ScanAt /
ShardStore.ScanAt so the baseline pass skips value
materialization (Pebble iterator with payload trimmed at the
block layer).
Minor:
- _long_keys.jsonl renamed to KEYMAP.jsonl on line 192 to match the
rest of the spec.
- BackupPin fan-out is now explicitly concurrent (one goroutine per
group, blocked on all-or-deadline). 100-shard cluster does not pay
100x serial Raft round-trips.
Tests added: TestExpectedKeysBaselineToleratesTTLExpiry,
TestExpectedKeysBaselineCountOnlyScan. Existing tests
TestRestoreWipesLocalPins and TestBeginBackupRefusesNearSnapshotThreshold
updated to match the corrected formulas.
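The corrected headroom formula above, expressed as a standalone Go predicate. The `AppliedIndex` / `LastSnapshotIndex` / `SnapshotEvery` fields are taken from the summary's description of raftengine.Status; the helper itself is illustrative only.

```go
package backup

import "fmt"

// refuseIfNearSnapshot implements:
//
//	entries_since_last_snapshot = AppliedIndex - LastSnapshotIndex
//	remaining_headroom          = SnapshotEvery - entries_since_last_snapshot
//	refuse if remaining_headroom < headroomEntries
func refuseIfNearSnapshot(appliedIndex, lastSnapshotIndex, snapshotEvery, headroomEntries uint64) error {
	entriesSince := appliedIndex - lastSnapshotIndex
	if entriesSince > snapshotEvery {
		// A snapshot is already overdue (or the inputs are inconsistent);
		// treat the remaining headroom as zero.
		entriesSince = snapshotEvery
	}
	remaining := snapshotEvery - entriesSince
	if remaining < headroomEntries {
		return fmt.Errorf("refusing backup: remaining snapshot headroom %d < required %d",
			remaining, headroomEntries)
	}
	return nil
}
```

A freshly-snapshotted cluster has `entriesSince` near zero, so `remaining` is largest and the backup is allowed, matching the corrected prose.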
Important:
- BeginBackupResponse proto block was missing definitions. Added
ShardApplied {group_id, applied_index} and ScopeKeyCount
{scope_id, key_count, applied_index_at_count}; added
"repeated ScopeKeyCount expected_keys = 5" to BeginBackupResponse.
- Expected-keys baseline ordering bug. Documented as running "during
step 1" but step 1 is "pick read_ts" and step 2 is shard catch-up.
A lagging shard would return 0 (or partial) for ScanAt at read_ts
before applying through it, producing an undercount and silently
disabling ErrCompactionDuringDump on exactly the shards most at
risk of snapshot-induced pin loss. Moved to "after step 2 (every
shard has applied through read_ts) and before step 3 (pin)".
Minor:
- --snapshot-headroom-entries CLI default 10000 -> 1000 (matches the
one-tenth-of-SnapshotEvery default documented in the prose).
- ScanAt keys_only addition is a source-incompatible Go API change
with ~10 callers in shard_store.go alone. Replaced with a new
ScanKeysAt overload on MVCCStore / ShardStore (returns [][]byte
not []*KVPair); existing ScanAt callers untouched.
Tests adjusted: TestExpectedKeysBaselineCountOnlyScan now verifies
both the new path and the absence of a regression on existing
ScanAt callers; new TestExpectedKeysBaselineRunsAfterShardCatchup
guards against the ordering bug.
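A sketch of the expected-keys acceptance check implied by the baseline mechanism. The exact combination of the 99% factor and the sqrt slack is one reading of the "99% +/- sqrt(baseline)" phrasing, so treat the formula as an assumption rather than the spec.

```go
package backup

import (
	"errors"
	"math"
)

var errCompactionDuringDump = errors.New("compaction retired versions during dump")

// checkExpectedKeys compares the scanned key count against the
// BeginBackup-time baseline. TTL expiry legitimately shrinks the count a
// little between the baseline pass and the scan, hence the statistical slack.
func checkExpectedKeys(baseline, actual uint64) error {
	floor := 0.99*float64(baseline) - math.Sqrt(float64(baseline))
	if float64(actual) < floor {
		return errCompactionDuringDump
	}
	return nil
}
```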
💡 Codex Review (reviewed commit 7ca1ae5169)
> │   ├── _bucket.json
> │   ├── KEYMAP.jsonl        # per-scope (omitted when empty)
> │   ├── <object-key-path>   # original object bytes
Prevent S3 object keys from colliding with control filenames
This layout puts user object payloads and backup control artifacts in the same bucket directory, so valid keys like _bucket.json or KEYMAP.jsonl map to the exact paths reserved for metadata files and will overwrite or block one another. In any bucket containing those keys, the dump cannot be lossless and restore behavior becomes ambiguous unless these control names are explicitly escaped/reserved for user keys.
> - **DynamoDB binary partition / sort keys** (the `B` attribute type) are
>   rendered as `b64.<base64url>` so a binary key never collides with a
>   string key whose hex encoding happens to look like base64.
Disambiguate DynamoDB B and S filename encodings
Encoding binary key attributes as b64.<base64url> is not collision-safe with string keys, because string segments in the same character set pass through unchanged and can legitimately equal b64.<...>. A table containing both S and B keys that serialize to the same segment will collide on disk, causing overwrite/omission unless the encoding carries an unambiguous type discriminator for all key kinds.
> is renamed to `path/to.elastickv-leaf-data`, with the rename
> recorded in `s3/<bucket>/KEYMAP.jsonl`. The sidecar follows the
Make leaf-data collision rename mapping injective
The file-vs-directory fallback renames path/to to path/to.elastickv-leaf-data, but that target can already exist as a real user object key. When a bucket has path/to, path/to/obj, and path/to.elastickv-leaf-data, two distinct objects map to one dump path, so one is lost unless the rename strategy guarantees uniqueness (for example, iterative escaping or hash-based disambiguation).
Important (verified against internal/raftengine/{etcd/,}engine.go and
kv/fsm.go):
- SnapshotEvery is not on the raftengine.Engine interface today.
Phase 1 adds StatusReader.SnapshotEvery() uint64 implemented on
the etcd backend by returning the snapshotEveryFromEnv() value
cached at Open. Also corrected the env-var name: there is no
--raftSnapshotEvery flag; the override is
ELASTICKV_RAFT_SNAPSHOT_COUNT (internal/raftengine/etcd/engine.go:93).
- Rolling-upgrade safety. An old kvFSM receiving a 0x03-tagged
BackupPin entry falls through decodeRaftRequests ->
decodeLegacyRaftRequest (kv/fsm.go:140-145) -> proto.Unmarshal ->
Apply error -> apply-loop stall. A single-release introduction
would corrupt apply on un-upgraded replicas the moment the first
BeginBackup runs.
Phase 1 ships in two releases:
Release N: kvFSM.Apply treats unknown tags in [0x03, 0x0F] as
no-ops with an unknown_fsm_tag warning (forward-compat handler,
no other backup machinery active).
Release N+1: BackupPin/Extend/Release handlers ship; BeginBackup
gates on every group member reporting version >= N via
Admin.GetRaftGroups, returning FailedPrecondition otherwise.
Operators may treat N and N+1 as the same calendar release if they
enforce a fleet-wide upgrade window first; the gate still
short-circuits.
Minor:
- ScopeKeyCount.scope_id format documented for all four adapters,
including the redis_databases uint32 -> "redis/db_<n>" mapping.
Tests added: TestForwardCompatUnknownFSMTag,
TestBeginBackupGatesOnMinVersion, TestSnapshotEveryReadsFromEngine.
💡 Codex Review (reviewed commit 5bda8a8b4b)
> activated. `BeginBackup` itself **gates on every group member
> reporting `min_backup_version ≥ N`** — verified through the
> existing `Admin.GetRaftGroups` membership view, which already
> carries per-node version metadata. If any member reports a
> version older than N, `BeginBackup` returns
Gate backups on N+1-capable members
The rollout gate currently allows BeginBackup when members are only at version N, but in this design version N explicitly treats backup FSM tags (0x03–0x0F) as no-ops. That means a mixed cluster with N members will accept BackupPin proposals yet silently skip pin application on those replicas, so their compactors can still retire read_ts versions during an active backup. This breaks the safety guarantee for long scans and leader movement; the gate needs to require members that actually implement applyBackupPin/Extend/Release (i.e., N+1 or an explicit capability flag).
> 2. **Wait for shards to catch up**: every group is required to report
>    `applied_index ≥ commit_index_at_pin` for the default group's HLC
>    ceiling proposal that produced `read_ts`. `BeginBackup` polls each
Wait on per-group fence index, not default-group index
Step 2 requires every shard to satisfy applied_index ≥ commit_index_at_pin from the default group’s HLC-ceiling proposal, but Raft log indices are group-local and not comparable across groups. A low-traffic shard can remain healthy yet never reach another group’s index, causing false FailedPrecondition, while a high-index shard passing this check still does not prove it has crossed the required fence for read_ts. The wait condition must use a fence index derived per group (or another per-group read barrier).
Important (verified against proto/admin.proto):
- The min_backup_version gate claimed Admin.GetRaftGroups already carries per-node version. It does not. RaftGroupState (proto/admin.proto:38-49) only has raft_group_id, leader_node_id, leader_term, commit_index, applied_index, last_contact_unix_ms. Phase 1 therefore adds string node_version = 7 to RaftGroupState, populated by each admin handler from the build-time version constant. Removed the false "already carries" claim.

Minor:
- MaxEntryBytes is not a constant in the codebase. Replaced with the real limit: MaxSizePerMsg (defaultMaxSizePerMsg = 1<<20, internal/raftengine/etcd/engine.go:55).
- The forward-compat unknown-tag handler position is specified: in kvFSM.Apply immediately after the raftEncodeHLCLease guard and before decodeRaftRequests, so all leading-byte dispatch stays in one place rather than splitting between Apply and the default case of decodeRaftRequests. Tags >= 0x10 still fall through and error, bounding the forward-compat door.
💡 Codex Review (reviewed commit 9bf18e1c87)
| {"format_version": 1, "fields": {"name": "alice", "age": "30"}, "expire_at_ms": null} | ||
| ``` |
There was a problem hiding this comment.
Preserve Redis collection elements as binary-safe bytes
The Redis dump format for hashes/lists/sets/streams uses plain JSON strings/objects (for example fields in hashes/<key>.json), which is not a lossless representation for arbitrary Redis payload bytes. Redis collection members/fields/values are binary-safe, but JSON strings require valid Unicode text and JSON objects require textual property names, so keys containing non-UTF8 bytes (or other raw byte sequences) will be mutated or fail serialization during backup/restore. This breaks the document’s cross-version, byte-faithful restore goal for valid Redis datasets unless these elements are encoded with an explicit byte-safe scheme (e.g., base64-wrapped fields/values).
| {"member": "alice", "score": 100.0}, | ||
| {"member": "bob", "score": 85.5} |
There was a problem hiding this comment.
Represent ZSET scores as strings to cover ±inf values
The proposed zsets/<key>.json stores score as a JSON number, but Redis sorted sets allow +inf and -inf scores (the repo already exercises this in adapter/redis_lua_collection_fastpath_test.go), and those values are not valid JSON numbers. Any dataset containing infinite scores cannot be faithfully dumped in this schema, causing either serialization failure or lossy coercion. The score field should be encoded as a string (Redis wire form) with explicit handling for +inf/-inf.
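One workable encoding along the lines the comment suggests, using the Redis wire spellings for infinities. A sketch only; the design doc would still need to pin the exact spellings accepted on restore.

```go
package enc

import (
	"math"
	"strconv"
)

// encodeScore renders a ZSET score the way redis-cli accepts it back:
// "+inf" / "-inf" for infinities, shortest round-trippable decimal otherwise.
func encodeScore(score float64) string {
	switch {
	case math.IsInf(score, 1):
		return "+inf"
	case math.IsInf(score, -1):
		return "-inf"
	default:
		return strconv.FormatFloat(score, 'g', -1, 64)
	}
}

// decodeScore is the inverse; Go's strconv.ParseFloat already accepts
// the "+inf" / "-inf" spellings.
func decodeScore(s string) (float64, error) {
	return strconv.ParseFloat(s, 64)
}
```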
Important (verified against adapter/admin_grpc.go):
- The rolling-upgrade gate was redesigned. GetRaftGroups iterates the local AdminServer's registered groups and reports each row from the local engine's Status() (adapter/admin_grpc.go:382-416), so adding node_version to RaftGroupState would only ever report the local admin server's version. Followers on other nodes would not surface, and the gate would falsely pass on a partially-upgraded cluster. Phase 1 instead adds a dedicated GetNodeVersion RPC fanned out by the BeginBackup handler to every peer in AdminServer.members. Each peer responds with its compiled-in version constant; the handler refuses BeginBackup if any peer is below N or unreachable within --begin-backup-deadline. RaftGroupState gains leader_node_version = 7 for UI visibility only (not used for the gate decision).
- AdminServer has no *kv.ShardStore today (confirmed at adapter/admin_grpc.go:62-82). The "existing AdminServer -> ShardStore wiring used by GetRaftGroups" claim was wrong; GetRaftGroups only reaches AdminGroup.Status() and Configuration(). Phase 1 adds the field explicitly, wired in main.go.

Minor:
- The AdminGroup interface (adapter/admin_grpc.go:41-44) does not embed StatusReader, so adding SnapshotEvery() to StatusReader does not reach BeginBackup through s.groups[id]. Phase 1 explicitly extends AdminGroup with SnapshotEvery() uint64; mocks in adapter/admin_grpc_test.go gain one method that returns defaultSnapshotEvery for parity with production.

TestBeginBackupGatesOnMinVersion updated to match the new GetNodeVersion fan-out path; the canonical 3-node fixture is documented.
💡 Codex Review (reviewed commit a0292c955f)
> BackupPin    : [tag:1][pin_id:16][read_ts:8][deadline_ms:8] = 33 bytes
> BackupExtend : [tag:1][pin_id:16][deadline_ms:8]            = 25 bytes
Replicate pin TTL as duration, not absolute wall-clock deadline
Encoding deadline_ms into BackupPin/BackupExtend makes expiry depend on each replica’s local wall clock, so even modest clock skew can expire pins early on fast nodes while a backup is still running. In that skew scenario, a replica can drop the pin before the next renewal and compact away read_ts versions, breaking snapshot consistency despite a successful BeginBackup. To keep pin lifetime consistent cluster-wide, propagate a lease duration (or HLC-derived deadline evaluated consistently) instead of a coordinator wall-clock timestamp.
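A round-trip codec for the quoted 33-byte BackupPin layout (tag 0x03 per the earlier summary; big-endian is my assumption, since the doc does not state a byte order). Note it deliberately encodes the absolute deadline_ms that this comment argues against.

```go
package backup

import (
	"encoding/binary"
	"fmt"
)

const tagBackupPin = 0x03

type BackupPin struct {
	PinID      [16]byte
	ReadTS     uint64
	DeadlineMs uint64 // absolute wall-clock ms; skew-sensitive, per the review above
}

// Encode emits [tag:1][pin_id:16][read_ts:8][deadline_ms:8] = 33 bytes.
func (p BackupPin) Encode() []byte {
	buf := make([]byte, 33)
	buf[0] = tagBackupPin
	copy(buf[1:17], p.PinID[:])
	binary.BigEndian.PutUint64(buf[17:25], p.ReadTS)
	binary.BigEndian.PutUint64(buf[25:33], p.DeadlineMs)
	return buf
}

// DecodeBackupPin is the inverse, rejecting wrong lengths and tags.
func DecodeBackupPin(buf []byte) (BackupPin, error) {
	var p BackupPin
	if len(buf) != 33 || buf[0] != tagBackupPin {
		return p, fmt.Errorf("not a BackupPin entry")
	}
	copy(p.PinID[:], buf[1:17])
	p.ReadTS = binary.BigEndian.Uint64(buf[17:25])
	p.DeadlineMs = binary.BigEndian.Uint64(buf[25:33])
	return p, nil
}
```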
Important (verified against adapter/admin_grpc.go):
- The BeginBackup gate fan-out target was AdminServer.members (the static bootstrap seed at lines 64 and 89, never updated after startup). Nodes added later via AddVoter would not appear, silently missing exactly the nodes most likely to be running an older version during a rolling upgrade.
  Fix: BeginBackup reuses the existing snapshotMembers helper (adapter/admin_grpc.go:158), the same path GetClusterOverview uses, which unions the seed with members discovered from each group's Configuration(). The fan-out target is the live cluster, not the startup snapshot.

Minor:
- GetRaftGroups today is a synchronous local-state read with no network calls. Adding leader_node_version populated via GetNodeVersion(leader) would turn it into a fan-out handler with an unbounded latency tail. Specified an async dial per leader with a 500ms per-leader timeout, "" on failure, and a 10s response cache so rapid polls do not fan out repeatedly.
- The GetNodeVersion failure mode now distinguishes "old version" from "unreachable" in the FailedPrecondition error string so operators can diagnose without out-of-band investigation. Also noted that --begin-backup-deadline is per-peer (the fan-out is concurrent), so the worst-case wait is one deadline period regardless of cluster size.

Tests added: TestBeginBackupGatesIncludesDynamicallyAddedNodes, TestGetRaftGroupsLeaderVersionAsync. TestBeginBackupGatesOnMinVersion updated to assert the new error shapes.
💡 Codex Review (reviewed commit 3efd68d2c7)
> - **S3 object keys preserve their `/` separators** as filesystem path
>   separators. `s3://photos/2026/04/29/img.jpg` becomes
>   `s3/photos/2026/04/29/img.jpg`. This is the whole point — `aws s3 sync`
>   on the dump directory works.
Handle S3 keys that exceed filesystem segment limits
The layout maps S3 keys directly to filesystem paths by preserving / separators, but it does not define any fallback for oversized S3 path segments. S3 allows keys up to 1024 UTF-8 bytes, so a valid key containing a single long segment (or very deep path) can exceed common filesystem limits (for example NAME_MAX/PATH_MAX) and make dump output non-representable, causing backup failure or silent omission for valid objects.
> | Risk | Mitigation |
> |------|------------|
> | Dump duration exceeds MVCC retention horizon | `read_ts` registered with `active_timestamp_tracker`; compactor cannot retire its versions; producer exits with a typed error if registration is rejected |
Correct retention-horizon mitigation summary
This mitigation row states that pinning read_ts prevents compaction when a dump exceeds retention horizon, but the design earlier documents that snapshot installation can drop replica pin state (kvFSM.Restore path), which re-allows compaction and is exactly why the safe-bound invariant is required. Leaving this summary as-is is misleading because operators may infer long-running backups are protected when the document itself says they are only safe within bounded duration/headroom conditions.
Important (verified against adapter/admin_grpc.go and
kv/grpc_conn_cache.go):
- AdminServer has no outbound gRPC connection mechanism today.
GetNodeVersion fan-out (in BeginBackup) and the async
leader_node_version probes (in GetRaftGroups) both need to dial
peer admin servers. Without specifying the dialer, two
implementors would make incompatible choices about credentials
and connection lifecycle.
Phase 1 explicitly adds a *kv.GRPCConnCache field to AdminServer
using internal.GRPCDialOptions(), kept separate from
ShardStore.connCache because admin and data-plane addresses
occupy different address spaces (admin cache is keyed on
NodeIdentity.GRPCAddress; data-plane cache is keyed on Raft-
group-leader addresses, which migrate on leader change).
Minor:
- 10s leader_node_version cache is now an explicit AdminServer
struct field (versionCache map[string]struct{...}) listed in
Phase 1 scope rather than implicit prose.
Tests added: TestAdminGRPCConnCacheReuse (verifies cache reuse,
peer-shutdown eviction, independence from ShardStore.connCache).
💡 Codex Review (reviewed commit 6e62c7e740)
> - Bytes in `[A-Za-z0-9._-]` pass through.
> - Every other byte is rendered as `%HH` (uppercase hex), like
Escape dot-only segments in filename encoding
Allowing . and .. to pass through unchanged breaks path safety for layouts that use key segments as directories (for example DynamoDB items/<pk>/<sk>.json). A valid key segment of .. becomes items/../... and can escape the intended subtree after path normalization, while . collapses away and can alias other paths. This can overwrite sibling artifacts and makes backup/restore non-deterministic unless dot-only segments are explicitly escaped or rejected.
Important (verified against adapter/admin_grpc.go:455-505):
- GetNodeVersion is itself an Admin method, gated by AdminTokenAuth on every auth-enabled cluster. The fan-out from BeginBackup through *kv.GRPCConnCache makes outbound calls with no metadata attached, so every dial would return Unauthenticated and the gate would misreport the failure as "node did not respond" on every cluster running --adminToken. Phase 1 explicitly propagates the inbound bearer token to outbound GetNodeVersion calls via metadata.NewOutgoingContext(context.Background(), inMD); a sketch follows this summary. The same pattern applies to the async GetNodeVersion(leader) probes in GetRaftGroups. Forwarding the inbound caller's token (not a server-stored value) preserves the operator's authorization decision and avoids privilege escalation by the admin server.

Minor:
- versionCache is committed to sync.Map (with key/value types spelled out) so the implementor cannot accidentally produce a data race by reusing groupsMu for cache writes. The previous "(or sync.Map equivalent)" left enough room for a plain map + groupsMu approach that would either contend or race.

Tests added: TestBeginBackupPropagatesAdminAuthToken, TestVersionCacheRaceUnderLoad (the latter runs under -race).
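The token-forwarding shape named above, using the real grpc-go metadata API; the surrounding fan-out call is elided, and the helper name is illustrative.

```go
package adapter

import (
	"context"

	"google.golang.org/grpc/metadata"
)

// withForwardedAuth copies the inbound request metadata (including the
// admin bearer token) onto a fresh outbound context, so fan-out calls
// carry the operator's own authorization, not a server-stored credential.
func withForwardedAuth(inbound context.Context) context.Context {
	md, ok := metadata.FromIncomingContext(inbound)
	if !ok {
		// No inbound metadata: the outbound call will fail auth, as it should.
		return context.Background()
	}
	return metadata.NewOutgoingContext(context.Background(), md.Copy())
}
```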
💡 Codex Review (reviewed commit 486675d414)
> - `streams/<key>.jsonl`: one entry per line, in stream order:
>
>       {"id": "1714400000000-0", "fields": {"event": "login", "user": "alice"}}
Preserve Redis stream field order and duplicates
The stream record schema models fields as a JSON object, but Redis stream entries are ordered field-value lists and can legally contain repeated field names. This proposal therefore cannot round-trip valid XADD payloads that rely on ordering or duplicate fields (the repo’s stream model uses an ordered slice for fields), so restore would silently alter message contents. Use an ordered representation (e.g., flat array or array of {field,value} pairs) instead of an object.
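An ordered, duplicate-tolerant `fields` representation along the lines this comment suggests: an array of pairs instead of a JSON object. Illustrative only; byte-safety of the strings themselves is the separate review point raised earlier.

```go
package enc

import "encoding/json"

// FieldPair preserves XADD field order and allows repeated field names,
// which a JSON object cannot.
type FieldPair struct {
	Field string `json:"field"`
	Value string `json:"value"`
}

type StreamEntry struct {
	ID     string      `json:"id"`
	Fields []FieldPair `json:"fields"` // flat, ordered; duplicates legal
}

// encodeEntry emits one JSONL line body for streams/<key>.jsonl.
func encodeEntry(e StreamEntry) ([]byte, error) {
	return json.Marshal(e)
}
```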
| {"member": "alice", "score": 100.0}, | ||
| {"member": "bob", "score": 85.5} |
There was a problem hiding this comment.
Encode Redis ZSET members with a byte-safe format
The ZSET dump shape stores member as a plain JSON string, which is not a byte-safe encoding for arbitrary Redis member bytes. For valid datasets containing non-UTF8 or control-byte members, serialization/replay will mutate or reject data, breaking the document’s lossless backup goal. Define an explicit byte-safe encoding for ZSET members (similar to other binary-safe paths) to keep restores faithful.