fix(security): block S3 bucket takeover via unsafe example defaults #139
Conversation
**Design Notes (preserved from PR preparation)**

Why a blocklist? Removing the examples from the spec fixes future copy-paste, but the bucket name itself is already registered. The blocklist is a runtime safety net for existing deployments: even if someone upgrades the proxy binary but doesn't touch their config, the proxy refuses to start instead of silently routing data to an attacker-controlled bucket. The bucket name is permanently compromised — it can never be safe again, regardless of who controls it now. The code must reject it forever. This is the same pattern used by credential scanners that block known-leaked API keys even after rotation.

Breaking change: deployments that relied on an empty bucket value now fail at startup.
Context: this fix addresses a reported S3 bucket takeover vulnerability. A security researcher registered the `kafscale-lfs` S3 bucket — the same example name used in the specs and docs.
**Scope expansion — adding LFS download integrity fix**

Per review feedback, consolidating follow-up fixes into this PR rather than opening a separate one. Additional change: LFS download integrity verification — closes the tampering vector that complements the exfiltration fix already in this PR.

Current behavior (still vulnerable after just the blocklist fix): the download handler never verifies the envelope's SHA-256 checksum, so tampered S3 content reaches consumers undetected.

Planned fix: verify downloaded bytes against the envelope checksum.

Trust model being enforced: Kafka is the authority; S3 is untrusted storage. The envelope lives in Kafka (attacker can't modify it). Downloaded bytes from S3 are verified against the Kafka-held checksum on every read.

Breaking change (intentional): the default download mode changes.

Full spec: detailed API behavior, acceptance criteria, and out-of-scope items are in the forthcoming commit. The broader platform-wide integrity audit (broker segment CRC, RecordBatch CRC32C, S3 checksums) is tracked separately. Pushing commits shortly.
The patch does not guarantee a “truncated response on mismatch”: either verification has to happen before sending the response, or the protocol needs an explicit post-body failure signal.

Hi Alex, thanks for the comment. I will work on it and update the PR.

@kamir - are you still working on that?

Yes, I am on it.
A security researcher registered the `kafscale-lfs` S3 bucket on AWS — the same name used as an example throughout the OpenAPI specs and docs. Any deployment using this default would silently route LFS uploads to an attacker-controlled bucket, enabling data exfiltration and content tampering.

Three changes:

1. Replace all `kafscale-lfs` example values in both OpenAPI specs (cmd/proxy/ and api/lfs-proxy/) with the safe placeholder `my-bucket`.
2. Add a runtime blocklist in initLFSModule(): the proxy now refuses to start if the configured S3 bucket matches any known-unsafe example name. This protects existing deployments that upgrade the binary but don't update their config.
3. Require KAFSCALE_LFS_PROXY_S3_BUCKET and KAFSCALE_LFS_PROXY_S3_REGION at startup when LFS is enabled. Previously empty values were silently accepted, causing downstream failures or unsafe fallback behavior.

Reported-by: Milan Katwal <milankatwal2056@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
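The startup checks described in points 2 and 3 can be sketched roughly as follows. This is an illustrative sketch, not the actual initLFSModule() code; the function name `validateBucket` and the blocklist contents beyond `kafscale-lfs` are assumptions:

```go
package main

import (
	"fmt"
	"strings"
)

// blockedBuckets lists example bucket names that are permanently compromised.
// "kafscale-lfs" is the name from this PR; treat the set as append-only.
var blockedBuckets = map[string]bool{
	"kafscale-lfs": true,
}

// validateBucket sketches the two startup checks: the bucket must be
// explicitly configured, and must not be a known-compromised example name.
func validateBucket(bucket string) error {
	b := strings.TrimSpace(bucket)
	if b == "" {
		return fmt.Errorf("KAFSCALE_LFS_PROXY_S3_BUCKET is required when LFS is enabled")
	}
	if blockedBuckets[strings.ToLower(b)] {
		return fmt.Errorf("bucket %q is a known-compromised example name; configure a unique bucket", b)
	}
	return nil
}

func main() {
	fmt.Println(validateBucket("kafscale-lfs") != nil) // blocklisted: refuses to start
	fmt.Println(validateBucket("") != nil)             // missing: refuses to start
	fmt.Println(validateBucket("acme-prod-lfs"))       // allowed
}
```

Failing hard at startup (rather than logging a warning) is what protects deployments that upgrade the binary without touching their config.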
Force-pushed 3235759 to e2931e2.
…on mismatch

The LFS envelope records a SHA-256 checksum at upload time, but the download handler never verified it. A consumer receiving tampered content from a compromised S3 bucket had no way to detect the modification. This closes the tampering vector that complements the exfiltration fix already in this PR.

Trust model enforced: Kafka is the authority. S3 is untrusted storage. The envelope lives in Kafka (attacker cannot modify it). Downloaded bytes from S3 are verified against the Kafka-held checksum on every read.

Response framing: chunked transfer encoding + integrity trailer + truncate on mismatch. Detailed below because the first iteration of this commit (reviewed on PR KafScale#139 by @novatechflow) had a critical framing bug — it set Content-Length up-front and verified after io.Copy, so same-size tampering was undetectable from the client's perspective. This commit replaces that attempt with a framing-correct design.

Behavior changes:

* **Default mode is now `stream`** (was `presign`). Secure-by-default.
* **Stream mode hashes bytes on the fly** via io.TeeReader with SHA-256 and compares against the client-supplied envelope checksum.
* **Stream-mode responses use chunked transfer encoding** — Content-Length is deliberately NOT set. On verification failure the proxy aborts the connection BEFORE Go's chunkWriter emits the terminating "0\r\n\r\n", producing a truncated chunked stream. Per RFC 7230 §4.1, HTTP/1.1 clients MUST treat this as a transport error. This is the framing-level signal that makes "verification failed" visible to the client's HTTP parser, not just to server logs.
* **Trailer "X-Kafscale-LFS-Verify"** is declared up-front and set to "pass" only on a successful match. On mismatch the trailer is never emitted. This is defense in depth alongside the chunked truncation; the missing terminator is the primary signal because trailer support varies among HTTP clients.
* **HTTP/2 path** has no Hijacker. panic(http.ErrAbortHandler) sends RST_STREAM, an explicit framing failure in HTTP/2 — sufficient signal on that transport.
* **Presign mode is disabled by default**, opt-in via KAFSCALE_LFS_PROXY_PRESIGN_ENABLED=true. When enabled, the proxy logs a warning banner at startup and every 5 minutes. The response body includes an `integrity` block (sha256, checksum_alg, size) so client SDKs can verify downloaded bytes themselves.
* **Integrity claim is required** on all download requests. Missing integrity.sha256 → 400 missing_integrity.
* **Advisory response headers** on stream mode: X-Kafscale-LFS-Checksum and X-Kafscale-LFS-Content-Length. Set before body flush so advanced clients can hash independently and fail fast without waiting for the trailer or connection close.
* **Ops-tracker event** download_integrity_failed emits to __lfs_ops_state with expected/actual SHA-256, bytes read, and mode — gives operators a Kafka-native alerting signal for tampering attempts.

Breaking change (intentional): clients that did not send an integrity block must add one. Clients that relied on the default mode being presign must either request presign explicitly (and have the operator opt in) or switch to stream.

OpenAPI specs (cmd/proxy/openapi.yaml and api/lfs-proxy/openapi.yaml) updated with the new schema, IntegrityClaim component, presign_disabled error code, and safe example values (no real bucket names).

Tests:

* TestHandleHTTPDownload_StreamVerifyMatch — happy path (recorder)
* TestHandleHTTPDownload_StreamVerifyMismatch — handler panics
* TestHandleHTTPDownload_PresignMode — requires integrity, echoes it back
* TestHandleHTTPDownload_PresignDisabledByDefault — 400 when not opted in
* TestHandleHTTPDownload_MissingIntegrity — 400 when claim absent
* TestHandleHTTPDownload_DefaultModeIsStream — confirms new default
* TestStreamVerify_HonestObject_ChunkedAndTrailerPass — on-wire test: asserts Content-Length absent, Transfer-Encoding=chunked, body intact, trailer X-Kafscale-LFS-Verify=pass present.
* TestStreamVerify_SameSizeTamper_ClientSeesTransportError — regression test for the framing bug. S3 object replaced with same-length tampered content; asserts io.ReadAll returns a transport error AND the trailer is not "pass". Observed: readErr=unexpected EOF, trailer="".
* TestStreamVerify_NoContentLengthHeader_OnResponse — guard against regression if a future change reintroduces Content-Length.

All cmd/proxy and pkg/lfs test suites pass; go vet clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed e2931e2 to c4b230f.
@novatechflow — you were right, the previous commit didn't actually fix the integrity vector. Replaced the approach.

What was wrong: Content-Length was set up-front and verification happened after io.Copy, so same-size tampering was invisible to the client.

Fix: chunked transfer encoding with truncate-on-mismatch plus an integrity trailer; details in the commit message.

Tests (wire-level, via httptest): see the commit message test list.

All 26 tests pass. Out of scope (separate PR): per-chunk checksums in the envelope would let us reject before any body bytes are sent — requires envelope schema v2 + producer cooperation.
Self-review of c4b230f found the chunked-truncation approach has architectural holes: bytes > ~4 KB still reach the client before the abort; the headline test was passing only because 256-byte payloads never flush past Go's internal chunk buffer; HTTP intermediaries swallow the truncation signal; and the trailer-based signal doesn't reach Python requests / JS fetch / curl --output.

Replacing with buffer-then-verify-then-stream: read S3 fully into a temp file while hashing, only emit 200 + Content-Length after the hash matches, and return 502 with a structured error on mismatch. Pushing shortly.

/cc @novatechflow — please hold review until the next push lands.
…en-stream
The LFS envelope records a SHA-256 checksum at upload time, but the download
handler never verified it. A consumer receiving tampered content from a
compromised S3 bucket had no way to detect the modification. This closes the
tampering vector that complements the exfiltration fix already in this PR.
Trust model enforced:
Kafka is the authority. S3 is untrusted storage.
The envelope lives in Kafka (attacker cannot modify it). Downloaded bytes
from S3 are verified against the Kafka-held checksum BEFORE any bytes are
sent to the client.
Design: buffer-then-verify-then-stream. Two earlier iterations on this PR
attempted to verify after streaming and signal failure via response framing
(Content-Length truncation, then chunked-encoding truncation + trailer).
Both were unsound for production traffic:
* Setting Content-Length and verifying after io.Copy: the client receives
a complete, syntactically valid 200 OK + N bytes before the abort. Same-
size tampering goes undetected. Flagged by @novatechflow on PR KafScale#139.
* Chunked transfer encoding with trailer-based signal: for any payload
larger than Go's internal chunk-writer buffer (~4 KB), tampered bytes
are already on the wire before the abort; the client's io.ReadAll
returns the bytes plus an unexpected-EOF error. HTTP intermediaries
(nginx-ingress, ALB, CDNs) buffer responses and erase the truncation
signal. Trailer-based signalling is invisible to most HTTP client
libraries (Python requests, JavaScript fetch, curl --output file).
Adversarial review confirmed neither framing trick provides the security
property advertised. The only architecturally honest approach is to verify
before sending — buffer the S3 object to temporary disk while hashing, and
only begin streaming to the client after the hash matches the envelope.
Behavior:
* Default mode is now `stream` (was `presign`). Secure-by-default.
* Stream mode: handler reads S3 into os.CreateTemp(...), computes SHA-256
via io.MultiWriter, compares against the client-supplied envelope
checksum. On match: seeks the temp file to 0, sets Content-Length and
X-Kafscale-LFS-Checksum, writes 200 OK, streams the temp file to the
client. On mismatch: returns 502 with structured JSON error code
"integrity_failure". No tampered bytes ever reach the client.
* S3 read is bounded by io.LimitReader(body, expectedSize+1). A compromised
bucket cannot exhaust proxy disk by returning a payload larger than the
envelope declares. If the read produces > expectedSize bytes, the handler
returns 502 integrity_failure immediately (before hashing).
* Integrity.Size is required on stream-mode requests (in addition to
Integrity.SHA256). Missing size → 400 missing_integrity_size. This is
what makes the LimitReader cap enforceable.
* Presign mode is disabled by default, opt-in via
KAFSCALE_LFS_PROXY_PRESIGN_ENABLED=true. When enabled, proxy logs a
warning banner at startup and every 5 minutes. Response body includes
an `integrity` block (sha256, checksum_alg, size) so client SDKs can
verify downloaded bytes themselves.
* Integrity.SHA256 is required on all download requests (stream and
presign). Missing claim → 400 missing_integrity.
* Temporary file is created via os.CreateTemp (respects TMPDIR), and
removed in a defer regardless of code path. Production deployments
should size the pod tmpfs / writable layer for the largest expected
LFS object served concurrently.
* Ops-tracker event download_integrity_failed emits to __lfs_ops_state
with expected/actual SHA-256, bytes read, and mode — gives operators a
Kafka-native alerting signal for tampering attempts.
Breaking changes (intentional):
* Stream-mode clients MUST send Integrity.SHA256 AND Integrity.Size.
* Default mode changed from presign to stream.
* Presign mode requires KAFSCALE_LFS_PROXY_PRESIGN_ENABLED=true.
OpenAPI specs (cmd/proxy/openapi.yaml and api/lfs-proxy/openapi.yaml)
updated with the IntegrityClaim component, presign_disabled error code,
missing_integrity_size error code, integrity_failure 502 response, and
safe example values (no real bucket names).
Tests (cmd/proxy/lfs_http_test.go):
Recorder-based (handler-level):
* TestHandleHTTPDownload_StreamVerifyMatch — verified response is
200 OK with Content-Length set and correct body
* TestHandleHTTPDownload_StreamVerifyMismatch_ReturnsErrorJSON —
mismatch → 502 integrity_failure, tampered bytes NOT in response
* TestHandleHTTPDownload_StreamRequiresIntegritySize — 400
missing_integrity_size when Size omitted
* TestHandleHTTPDownload_DefaultModeIsStream — confirms default
routing reaches stream-verify path (not presign)
* TestHandleHTTPDownload_PresignMode — presign requires integrity,
echoes it back in response
* TestHandleHTTPDownload_PresignDisabledByDefault — 400 when not
opted in
* TestHandleHTTPDownload_MissingIntegrity — 400 when claim absent
Wire-level (httptest.NewServer + http.Client):
* TestStreamVerify_HonestObject_AtMultipleSizes — verifies the
success path at 256 B, 4 KiB, 64 KiB, 1 MiB. Asserts 200 OK,
Content-Length matches payload length, body delivered intact.
* TestStreamVerify_SameSizeTamper_NoTamperedBytesInResponse — the
headline regression test for @novatechflow's report. Tampered S3
object of the same length as the envelope claims, at 256 B / 4 KiB
/ 64 KiB / 1 MiB. Asserts response is 502 integrity_failure and
tampered bytes do NOT appear anywhere in the response. Earlier
iterations of this fix would have leaked the full 1 MiB tampered
payload to the client here.
* TestStreamVerify_OversizedS3Response_RefusesToServe — S3 returns
10x more bytes than envelope declares. Asserts 502 and that
attacker bytes do not appear in response.
* TestStreamVerify_EmptyObjectSubstitution_RefusesToServe — S3
replaced with 0-byte object. Asserts deterministic 502 (earlier
designs were racy on this case).
* TestStreamVerify_ContentLengthSetOnlyAfterVerification — guard
against regression that would set Content-Length to S3 size on a
non-verified path.
All cmd/proxy and pkg/lfs test suites pass; go vet clean.
Operational notes:
* Trade-off: download latency increases by the S3-read time before the
client sees the first byte. Acceptable for LFS-class workloads
(integrity-critical, not latency-critical).
* Temp file disk use is bounded per request by Integrity.Size. Operators
should ensure the pod's writable layer or TMPDIR-backed volume has
capacity for the largest concurrent download workload.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed c4b230f to 64c8348.
Replaced the chunked-truncation design with buffer-then-verify-then-stream. How the fix addresses each of your points:
| Your point | How the fix addresses it | Test |
|---|---|---|
| "verification has to happen before sending the response" | SHA verified against temp file BEFORE w.WriteHeader(200) | TestStreamVerify_HonestObject_AtMultipleSizes (success path), TestStreamVerify_SameSizeTamper_NoTamperedBytesInResponse (mismatch path) |
| "same size tampered → complete 200 OK" | Mismatch returns 502 with JSON {code: "integrity_failure"}; tampered bytes never touch the response body | TestStreamVerify_SameSizeTamper_NoTamperedBytesInResponse at 256 B, 4 KiB, 64 KiB, 1 MiB — asserts bytes.Contains(respBody, tamperedPayload) == false. The 1 MiB case is the critical one: the previous chunked-truncation design would have leaked the entire payload here |
| "closing the connection afterward does not reliably turn that into a failed download" | No connection-close trick — explicit 502 status. Works for every HTTP client (Python requests, JS fetch, curl --output, all SDKs) and survives every HTTP intermediary (nginx, ALB, CDN, ingress controllers) | All wire-level tests above |

Self-review issues from the previous iteration, also fixed:

| Issue (from my self-review of c4b230f) | Fix in 64c8348 |
|---|---|
| Tampered bytes >4 KB delivered before abort | No streaming until verified — applies to all sizes |
| Headline test passing because 256 B never flushes | Tests now run at 256 B, 4 KiB, 64 KiB, 1 MiB with byte-content assertions |
| TestHandleHTTPDownload_DefaultModeIsStream testing sha256("") == sha256("") | Test rewritten to use a real envelope and assert it reaches the verify path |
| Empty-object substitution racy | Now deterministic 502 — covered by TestStreamVerify_EmptyObjectSubstitution_RefusesToServe |
| Trailer signal invisible to Python/JS/curl clients | No trailers — plain 502 with JSON body |
| Reverse proxy / ingress swallows truncation signal | No truncation — clean 502 propagates through any HTTP intermediary |
| No oversize-payload defense | io.LimitReader(body, expectedSize+1) + explicit check — TestStreamVerify_OversizedS3Response_RefusesToServe |
| HTTP/2 path uncovered | No transport-specific code paths anymore (no Hijack, no panic) — works identically on HTTP/1.1 and HTTP/2 |
| Advisory X-Kafscale-LFS-Checksum echoed input | Now set to the proxy-computed actualSHA on the success response |
Test summary
All cmd/proxy + pkg/lfs suites pass; go vet clean. Stream-verify tests:
TestStreamVerify_HonestObject_AtMultipleSizes
size=256 PASS
size=4096 PASS
size=65536 PASS
size=1048576 PASS
TestStreamVerify_SameSizeTamper_NoTamperedBytesInResponse
size=256 PASS (no tampered bytes in 502 body)
size=4096 PASS
size=65536 PASS
size=1048576 PASS (key regression — previous design would leak 1 MiB here)
TestStreamVerify_OversizedS3Response_RefusesToServe PASS
TestStreamVerify_EmptyObjectSubstitution_RefusesToServe PASS
TestStreamVerify_ContentLengthSetOnlyAfterVerification PASS
TestHandleHTTPDownload_StreamVerifyMatch PASS
TestHandleHTTPDownload_StreamVerifyMismatch_ReturnsErrorJSON PASS
TestHandleHTTPDownload_StreamRequiresIntegritySize PASS
TestHandleHTTPDownload_DefaultModeIsStream PASS
TestHandleHTTPDownload_MissingIntegrity PASS
TestHandleHTTPDownload_PresignDisabledByDefault PASS
TestHandleHTTPDownload_PresignMode PASS
Trade-offs
- Latency: client waits for the full S3 read before receiving the first byte. Acceptable for LFS workloads (integrity-critical, not latency-critical).
- Temp disk: `os.CreateTemp` honors `TMPDIR`. Operators should size the pod's writable layer / tmpfs for the largest concurrent download workload. Per-request use is bounded by `Integrity.Size`.
Breaking changes (documented in commit message)
- Stream-mode clients must send `Integrity.SHA256` AND `Integrity.Size`.
- Default mode changed `presign` → `stream`.
- Presign mode requires `KAFSCALE_LFS_PROXY_PRESIGN_ENABLED=true`.
Three spec passages still described the discarded "abort mid-stream on mismatch" framing approach. Updated to reflect what the code in 64c8348 actually does:

* Stream-mode endpoint description: buffer S3 → verify SHA-256 → only stream on match; mismatch → 502 integrity_failure with no body bytes returned.
* application/octet-stream 200 response: clarify Content-Length is the verified size; mismatch yields 502, not a truncated body.
* 502 response: enumerate possible error codes (s3_get_failed, s3_presign_failed, integrity_failure).
* mode property: note stream requires both integrity.sha256 AND integrity.size; mismatch or oversize → 502 integrity_failure.

Applied identically to cmd/proxy/openapi.yaml and api/lfs-proxy/openapi.yaml. No code change; tests and go vet unchanged-green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kamir - when we have breaking changes we need to update the documentation in the main branch as well as in gh-pages so we don't have drift between code and docs. Can you please add this to the PR too? gh-pages can be pushed and merged by everyone.

Did a review, two new bugs opened:
…s DoS
Pre-review of the buffer-then-verify design surfaced a gap: streamDownloadWithVerify
allocates os.CreateTemp per request and reads up to expectedSize+1
bytes from S3 with no upper bound on what the caller may claim.
Without a cap, an authenticated caller (or a compromised producer upstream
of the envelope) can claim Integrity.Size = TB-scale and force the proxy
to attempt that much temp storage per request — a DoS vector in the same
"trust the wrong input" class as the original integrity bug. Also closes
the integer-overflow corner on expectedSize+1 when Size approaches int64
max.
Fix:
* Reject stream-mode requests when Integrity.Size > m.maxBlob (default
KAFSCALE_LFS_PROXY_MAX_BLOB_SIZE = 5 GiB). New error code
payload_too_large, 400 BadRequest. Skipped when maxBlob is 0
(unlimited, operator opt-in).
* Added TestHandleHTTPDownload_StreamRejectsOversizedIntegritySize —
seeds maxBlob = 10 MiB, claims Size = maxBlob+1, asserts 400
payload_too_large.
* OpenAPI specs document payload_too_large under the 400 response.
Drive-by spec fix: openapi.yaml had `description` set on the
application/octet-stream MediaType object — not allowed by OpenAPI 3.0.
Moved into the Schema object where description IS allowed (and where the
caller would naturally look for binary-body docs).
All cmd/proxy + pkg/lfs tests pass; go vet clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@novatechflow — picking up both of your comments:

1. gh-pages drift (your 13:22 comment): done. Opened companion PR #143 against `gh-pages`.
2. recovery.go + kafscale-cli (your 13:27 comment): thanks for spotting both. Out of scope for this PR (#139 is exfiltration + download integrity); will track them as separate fixes. Want me to open both as GitHub issues, or have you already opened them somewhere?
3. While preparing for re-review I also pushed a follow-up commit.

PR #139 state: 4 commits, mergeable, all CI green. Ready for your review when you have a window.
Pushed follow-up commit da634bb to harden the PR after review; what changed and how it was verified are detailed in the commit message above.
Still the same issues. In recovery.go, the target topic is still created before the S3 restore runs, with no rollback on failure. That issue is still present too.
Summary
Two-commit security PR. Closes both halves of the LFS S3 trust problem reported by Milan Katwal (bucket takeover) and confirmed during review by @novatechflow (download integrity).
- `1267bbc` — replace the `kafscale-lfs` example, require `S3_BUCKET`/`S3_REGION`, runtime blocklist for known-compromised names
- `64c8348` — download integrity verification (buffer-then-verify); mismatch → 502 `integrity_failure`

Trust model enforced end-to-end: Kafka is the authority. S3 is untrusted storage.
**Commit 1 — `1267bbc` bucket-takeover blocklist**

A security researcher registered the `kafscale-lfs` S3 bucket on AWS — the exact name used as an example in our OpenAPI specs since `1b5dbd8`. Any deployment using this default silently routed all LFS uploads to an attacker-controlled bucket.

Three defense layers:

1. Spec examples replaced (`kafscale-lfs` → `my-bucket`, test fixture → `test-bucket`)
2. Runtime blocklist in `cmd/proxy/lfs.go::initLFSModule` that rejects known-compromised bucket names at startup, even if config still hardcodes them
3. `S3_BUCKET` and `S3_REGION` must now be explicitly set

The bucket name is permanently compromised — same pattern as credential scanners that block known-leaked API keys forever, regardless of who controls them now.
**Commit 2 — `64c8348` LFS download integrity (buffer-then-verify)**

The LFS envelope records a SHA-256 checksum at upload time, but the download handler never verified it on the way out. Tampered S3 bytes reached consumers undetected.

Design: buffer-then-verify-then-stream. The proxy reads the S3 object into `os.CreateTemp` via `io.MultiWriter(tmpFile, sha256Hasher)`, then:

- `written > expectedSize` → 502 `integrity_failure` (defends against oversized payloads from a compromised bucket; the S3 read is capped by `io.LimitReader(body, expectedSize+1)`)
- `actualSHA != expectedSHA` → 502 `integrity_failure`
- match → set `Content-Length`, `WriteHeader(200)`, stream the temp file to the client

No tampered bytes ever reach the client. Works identically on HTTP/1.1 and HTTP/2, behind any HTTP intermediary, with any HTTP client library — no framing tricks, no trailers, no `Hijack`.

Why not stream-and-verify-at-end (the earlier iterations): two prior approaches on this PR (Content-Length + post-stream verify, then chunked transfer + truncate-on-mismatch) were shown unsound by review:

- `Content-Length` set up-front: the client receives complete N bytes before the abort — same-size tampering undetectable (@novatechflow's report)
- chunked truncation: tampered bytes past Go's ~4 KB chunk buffer still reach the client, intermediaries erase the truncation signal, and trailers are invisible to Python `requests`, JS `fetch`, and curl `--output file`

The only architecturally honest approach is verification before delivery. See the PR comments for the full design discussion.

Trade-off: download latency includes the full S3 read before the client gets the first byte. Acceptable for LFS-class workloads (integrity-critical, not latency-critical). Pod temp storage must be sized for the largest concurrent download workload; per-request usage is bounded by `Integrity.Size`.

**Breaking changes (intentional, both commits)**

- Deployments with unset `KAFSCALE_LFS_PROXY_S3_BUCKET` fail at startup with a clear error (was: failed silently downstream)
- Deployments with `KAFSCALE_LFS_PROXY_S3_BUCKET=kafscale-lfs` fail at startup with a blocklist error
- Stream-mode clients must send `Integrity.SHA256` AND `Integrity.Size`
- Default mode changed `presign` → `stream`
- Presign mode requires `KAFSCALE_LFS_PROXY_PRESIGN_ENABLED=true`
**Files changed (highlights)**

- `cmd/proxy/lfs.go` — bucket blocklist and required-config enforcement at startup
- `cmd/proxy/lfs_http.go` — `streamDownloadWithVerify` (buffer-then-verify-then-stream); `Integrity.Size` enforcement on stream mode
- `cmd/proxy/lfs_http_test.go` — wire-level (`httptest.NewServer`) tests at 256 B / 4 KiB / 64 KiB / 1 MiB for both honest and same-size-tampered payloads; oversize, empty-object substitution, and Content-Length regression guards
- `cmd/proxy/lfs_tracker.go`, `lfs_tracker_types.go` — `download_integrity_failed` event for operator alerting via `__lfs_ops_state`
- `cmd/proxy/openapi.yaml`, `api/lfs-proxy/openapi.yaml` — `IntegrityClaim` component; `missing_integrity_size` / `integrity_failure` / `presign_disabled` error codes; updated descriptions reflecting buffer-then-verify behaviour; safe example bucket names
- `pkg/lfs/doc.go`, `pkg/lfs/envelope_test.go`

**Test plan**

CI (currently green):

- `cmd/proxy` and `pkg/lfs` tests pass
- `go vet` clean

Manual smoke (reviewer):

- `KAFSCALE_LFS_PROXY_S3_BUCKET=kafscale-lfs` → exits with blocklist error
- unset `KAFSCALE_LFS_PROXY_S3_BUCKET` → exits with "required" error
- honest stream download → 200 with `X-Kafscale-LFS-Checksum` set to the verified SHA
- tampered object → 502 `code: integrity_failure`, no body bytes returned
- missing `integrity.size` → 400 `code: missing_integrity_size`
- presign without opt-in → 400 `code: presign_disabled`

Key wire-level regression tests (also in CI):
- `TestStreamVerify_SameSizeTamper_NoTamperedBytesInResponse` at 256 B / 4 KiB / 64 KiB / 1 MiB → asserts tampered bytes never appear in the response body. The 1 MiB case is the test the earlier framing-based designs would have failed.
- `TestStreamVerify_OversizedS3Response_RefusesToServe` → defends against bucket-compromise oversized payloads
- `TestStreamVerify_EmptyObjectSubstitution_RefusesToServe` → deterministic 502 on 0-byte substitution
- `TestStreamVerify_ContentLengthSetOnlyAfterVerification` → guard against a regression that re-leaks bytes via Content-Length

**Related**
🤖 Generated with Claude Code