
fix(security): block S3 bucket takeover via unsafe example defaults #139

Merged
novatechflow merged 11 commits into KafScale:main from kamir:fix/s3-bucket-takeover-cve
May 17, 2026

Conversation

@kamir
Collaborator

@kamir kamir commented Apr 17, 2026

Summary

Two-commit security PR. Closes both halves of the LFS S3 trust problem reported by Milan Katwal (bucket takeover) and confirmed during review by @novatechflow (download integrity).

  • Commit 1267bbc — bucket-takeover blocklist: remove the unsafe kafscale-lfs example, require S3_BUCKET/S3_REGION, add a runtime blocklist for known-compromised names. Vector closed (exfiltration): deployments using the public example would have routed LFS uploads to an attacker-controlled bucket.
  • Commit 64c8348 — buffer-then-verify-then-stream download: the proxy buffers S3 to a temp file while hashing and verifies SHA-256 against the Kafka envelope BEFORE any body bytes reach the client; mismatch → 502 integrity_failure. Vector closed (tampering): consumers receiving modified S3 bytes (e.g. from a compromised bucket or an insider with S3 write access) had no way to detect it.

Trust model enforced end-to-end: Kafka is the authority. S3 is untrusted storage.

Commit 1 — 1267bbc bucket-takeover blocklist

A security researcher registered the kafscale-lfs S3 bucket on AWS — the exact name used as an example in our OpenAPI specs since 1b5dbd8. Any deployment using this default silently routed all LFS uploads to an attacker-controlled bucket.

Three defense layers:

  • Remove unsafe examples from both OpenAPI specs and Go doc/test files (kafscale-lfs → my-bucket, test fixture → test-bucket)
  • Runtime blocklist in cmd/proxy/lfs.go::initLFSModule that rejects known-compromised bucket names at startup, even if config still hardcodes them
  • Required field enforcement: S3_BUCKET and S3_REGION must now be explicitly set

The bucket name is permanently compromised — same pattern as credential scanners that block known-leaked API keys forever, regardless of who controls them now.

Commit 2 — 64c8348 LFS download integrity (buffer-then-verify)

The LFS envelope records a SHA-256 checksum at upload time, but the download handler never verified it on the way out. Tampered S3 bytes reached consumers undetected.

Design: buffer-then-verify-then-stream. The proxy reads the S3 object into os.CreateTemp via io.MultiWriter(tmpFile, sha256Hasher), then:

  • If written > expectedSize → 502 integrity_failure (defends against oversized payloads from a compromised bucket; S3 read is capped by io.LimitReader(body, expectedSize+1))
  • If actualSHA != expectedSHA → 502 integrity_failure
  • Only on match: seek temp to 0, set Content-Length, WriteHeader(200), stream temp file to client

No tampered bytes ever reach the client. Works identically on HTTP/1.1 and HTTP/2, behind any HTTP intermediary, with any HTTP client library — no framing tricks, no trailers, no Hijack.

Why not stream-and-verify-at-end (the earlier iterations):

Two prior approaches on this PR (Content-Length + post-stream verify, then chunked transfer + truncate-on-mismatch) were shown unsound by review:

  1. With Content-Length set, client receives complete N bytes before the abort — same-size tampering undetectable (@novatechflow's report)
  2. With chunked + post-stream truncate, any payload > Go's ~4 KB internal chunk buffer is already on the wire when the abort fires; HTTP intermediaries (nginx-ingress, ALB, CDNs) buffer chunked responses and erase the truncation signal; trailer-based signals are invisible to Python requests, JS fetch, and curl --output file
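The second failure mode can be demonstrated with a small self-contained sketch (hypothetical handler, not KafScale code): once a chunked body exceeds net/http's internal buffer, bytes are on the wire before any abort can stop them.

```go
package main

import (
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchTruncatedChunked shows why abort-after-write is unsound: a handler
// streams a large chunked body, then aborts before the terminating 0-chunk.
// The client still receives the flushed bytes along with a transport error.
func fetchTruncatedChunked(payloadSize int) ([]byte, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// No Content-Length set → Go switches to chunked transfer encoding.
		body := make([]byte, payloadSize)
		w.Write(body)               // bytes past the internal buffer are flushed here
		panic(http.ErrAbortHandler) // abort: terminating "0\r\n\r\n" is never sent
	}))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body) // partial bytes AND an error
}
```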

The only architecturally honest approach is verification before delivery. See PR comments for the full design discussion.

Trade-off: download latency includes the full S3 read before the client gets the first byte. Acceptable for LFS-class workloads (integrity-critical, not latency-critical). Pod temp storage must be sized for the largest concurrent download workload; per-request usage is bounded by Integrity.Size.

Breaking changes (intentional, both commits)

  • Deployments with empty KAFSCALE_LFS_PROXY_S3_BUCKET fail at startup with a clear error (was: failed silently downstream)
  • Deployments using KAFSCALE_LFS_PROXY_S3_BUCKET=kafscale-lfs fail at startup with a blocklist error
  • LFS download stream-mode clients must send both Integrity.SHA256 AND Integrity.Size
  • Default download mode changed from presign to stream
  • Presign mode requires KAFSCALE_LFS_PROXY_PRESIGN_ENABLED=true

Files changed (highlights)

  • cmd/proxy/lfs.go: bucket blocklist, required-field enforcement, presign-enabled flag
  • cmd/proxy/lfs_http.go: new streamDownloadWithVerify (buffer-then-verify-then-stream); Integrity.Size enforcement on stream mode
  • cmd/proxy/lfs_http_test.go: recorder-based handler tests + wire-level (httptest.NewServer) tests at 256 B / 4 KiB / 64 KiB / 1 MiB for both honest and same-size-tampered payloads; oversize, empty-object substitution, and Content-Length regression guards
  • cmd/proxy/lfs_tracker.go, lfs_tracker_types.go: download_integrity_failed event for operator alerting via __lfs_ops_state
  • cmd/proxy/openapi.yaml, api/lfs-proxy/openapi.yaml: IntegrityClaim component, missing_integrity_size / integrity_failure / presign_disabled error codes, updated descriptions reflecting buffer-then-verify behaviour, safe example bucket names
  • pkg/lfs/doc.go, pkg/lfs/envelope_test.go: safe example bucket names

Test plan

CI (currently green):

  • Go Test Suite — all cmd/proxy and pkg/lfs tests pass
  • go vet clean
  • CodeQL pass (actions, go, javascript-typescript, python)
  • Go Coverage Gate, Helm Lint, License Header Check

Manual smoke (reviewer):

  • Start proxy with KAFSCALE_LFS_PROXY_S3_BUCKET=kafscale-lfs → exits with blocklist error
  • Start proxy with empty KAFSCALE_LFS_PROXY_S3_BUCKET → exits with "required" error
  • Start proxy with valid bucket + region → starts normally
  • Stream-mode download with valid integrity → 200 OK, body matches; X-Kafscale-LFS-Checksum set to the verified SHA
  • Stream-mode download with wrong SHA → 502 code: integrity_failure, no body bytes returned
  • Stream-mode download without integrity.size → 400 code: missing_integrity_size
  • Presign-mode download without opt-in → 400 code: presign_disabled

Key wire-level regression tests (also in CI):

  • TestStreamVerify_SameSizeTamper_NoTamperedBytesInResponse at 256 B / 4 KiB / 64 KiB / 1 MiB → asserts tampered bytes never appear in response body. The 1 MiB case is the test the earlier framing-based designs would have failed.
  • TestStreamVerify_OversizedS3Response_RefusesToServe → defends against bucket-compromise oversized payload
  • TestStreamVerify_EmptyObjectSubstitution_RefusesToServe → deterministic 502 on 0-byte substitution
  • TestStreamVerify_ContentLengthSetOnlyAfterVerification → guard against regression that re-leaks bytes via Content-Length

Related

🤖 Generated with Claude Code

@kamir
Collaborator Author

kamir commented Apr 17, 2026

Design Notes (preserved from PR preparation)

Why a blocklist?

Removing the examples from the spec fixes future copy-paste. But the bucket name kafscale-lfs has been public in this repo since commit 1b5dbd8. Every deployment that was configured by copying from those docs — Helm values, docker-compose files, CI scripts, customer PoCs — may still have kafscale-lfs hardwired in config files.

The blocklist is a runtime safety net for existing deployments: even if someone upgrades the proxy binary but doesn't touch their config, the proxy refuses to start instead of silently routing data to an attacker-controlled bucket.

The bucket name is permanently compromised — it can never be safe again, regardless of who controls it now. The code must reject it forever. This is the same pattern used by credential scanners that block known-leaked API keys even after rotation.

Breaking change

Deployments that relied on an empty KAFSCALE_LFS_PROXY_S3_BUCKET will now fail at startup with a clear error message. This is intentional — an empty bucket was never valid, it just failed silently downstream.

Test plan

  • go build ./cmd/proxy/ passes
  • go vet ./pkg/lfs/ passes
  • go test ./pkg/lfs/... passes
  • Start proxy with KAFSCALE_LFS_PROXY_S3_BUCKET=kafscale-lfs → exits with blocklist error
  • Start proxy with empty KAFSCALE_LFS_PROXY_S3_BUCKET → exits with "required" error
  • Start proxy with valid bucket + region → starts normally
  • Grep confirms no remaining kafscale-lfs bucket references (only in blocklist)

Context

This fix addresses a reported S3 bucket takeover vulnerability. A security researcher registered the kafscale-lfs bucket on AWS — the exact name used as examples in our OpenAPI specs. The broader config hardening plan (centralized config package, startup validation for all binaries, console auth fix) is tracked in scalytics-all-in-one/PLANS/plan-001.md.

@kamir
Collaborator Author

kamir commented Apr 17, 2026

Scope expansion — adding LFS download integrity fix

Per review feedback, consolidating follow-up fixes into this PR rather than opening a separate one.

Additional changes being added:

LFS download integrity verification — closes the tampering vector that complements the exfiltration fix already in this PR.

Current behavior (still vulnerable after just the blocklist fix):

  • Any attacker/insider with S3 write access can tamper with LFS objects post-upload
  • Neither stream nor presign mode verifies the envelope's sha256 checksum

Planned fix:

  • Stream mode becomes default. Hashes bytes on the fly (io.TeeReader), compares to envelope checksum, aborts TCP connection on mismatch (no trailers — poor client support). Emits integrity_failure event to __lfs_ops_state tracker.
  • Presign mode is disabled by default. Opt-in via KAFSCALE_LFS_PROXY_PRESIGN_ENABLED=true. When enabled, proxy logs a warning banner at startup and every 5 minutes. Response body includes integrity block with sha256/checksum_alg/size so SDK can verify client-side.
  • Advisory response headers on stream mode: X-Kafscale-LFS-Checksum and X-Kafscale-LFS-Content-Length — let advanced clients hash independently and fail-fast without waiting for connection close.
  • Updated OpenAPI spec reflects new default, presign_disabled error code, integrity block.
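For presign mode, the integrity block lets the SDK verify the downloaded bytes itself. A client-side sketch (function name and error wording are assumptions; the sha256/size fields come from the description above):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// verifyDownloaded checks bytes fetched from a presigned URL against the
// integrity block returned by the proxy: size first (cheap), then SHA-256.
func verifyDownloaded(data []byte, wantSHA string, wantSize int64) error {
	if int64(len(data)) != wantSize {
		return fmt.Errorf("size mismatch: got %d, want %d", len(data), wantSize)
	}
	sum := sha256.Sum256(data)
	if hex.EncodeToString(sum[:]) != wantSHA {
		return fmt.Errorf("sha256 mismatch: downloaded bytes do not match envelope")
	}
	return nil
}
```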

Trust model being enforced:

Kafka is the authority. S3 is untrusted storage.

The envelope lives in Kafka (attacker can't modify it). Downloaded bytes from S3 are verified against the Kafka-held checksum on every read.

Breaking change (intentional):

Default mode changes from presign to stream. Existing clients that hardcoded mode: presign must also have the operator set KAFSCALE_LFS_PROXY_PRESIGN_ENABLED=true. This is secure-by-default by design.

Full spec

Detailed API behavior, acceptance criteria, and out-of-scope items are in the forthcoming commit. The broader platform-wide integrity audit (broker segment CRC, RecordBatch CRC32C, S3 checksums) is tracked separately in scalytics-all-in-one/PLANS/plan-002.md and will be addressed in follow-up PRs.

Pushing commits shortly.

@novatechflow
Collaborator

streamDownloadWithVerify writes the full body with io.Copy and sets Content-Length before comparing the digest. If the object is tampered but keeps the same size, the client can still receive a complete 200 OK response and treat it as success. Closing the connection afterward does not reliably turn that into a failed download.

The patch does not guarantee a “truncated response on mismatch”. Verification has to happen before sending the response, or the protocol needs an explicit post-body failure signal.

@kamir
Collaborator Author

kamir commented Apr 21, 2026

Hi Alex, thanks for the comment. I will work on it and update the PR.

@novatechflow
Collaborator

@kamir - are you still working on that?

@kamir
Collaborator Author

kamir commented May 15, 2026

Yes, I am on it.

A security researcher registered the `kafscale-lfs` S3 bucket on AWS —
the same name used as an example throughout the OpenAPI specs and docs.
Any deployment using this default would silently route LFS uploads to an
attacker-controlled bucket, enabling data exfiltration and content
tampering.

Three changes:

1. Replace all `kafscale-lfs` example values in both OpenAPI specs
   (cmd/proxy/ and api/lfs-proxy/) with safe placeholder `my-bucket`.

2. Add runtime blocklist in initLFSModule(): the proxy now refuses to
   start if the configured S3 bucket matches any known-unsafe example
   name. This protects existing deployments that upgrade the binary but
   don't update their config.

3. Require KAFSCALE_LFS_PROXY_S3_BUCKET and KAFSCALE_LFS_PROXY_S3_REGION
   at startup when LFS is enabled. Previously empty values were silently
   accepted, causing downstream failures or unsafe fallback behavior.

Reported-by: Milan Katwal <milankatwal2056@gmail.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@kamir kamir force-pushed the fix/s3-bucket-takeover-cve branch from 3235759 to e2931e2 on May 16, 2026 09:24
kamir added a commit to kamir/kafscale that referenced this pull request May 16, 2026
…on mismatch

The LFS envelope records a SHA-256 checksum at upload time, but the download
handler never verified it. A consumer receiving tampered content from a
compromised S3 bucket had no way to detect the modification. This closes the
tampering vector that complements the exfiltration fix already in this PR.

Trust model enforced:
  Kafka is the authority. S3 is untrusted storage.

The envelope lives in Kafka (attacker cannot modify it). Downloaded bytes
from S3 are verified against the Kafka-held checksum on every read.

Response framing: chunked transfer encoding + integrity trailer + truncate
on mismatch. Detailed below because the first iteration of this commit
(reviewed on PR KafScale#139 by @novatechflow) had a critical framing bug — it set
Content-Length up-front and verified after io.Copy, so same-size tampering
was undetectable from the client's perspective. This commit replaces that
attempt with a framing-correct design.

Behavior changes:

* **Default mode is now `stream`** (was `presign`). Secure-by-default.

* **Stream mode hashes bytes on the fly** via io.TeeReader with SHA-256
  and compares against the client-supplied envelope checksum.

* **Stream-mode responses use chunked transfer encoding** — Content-Length
  is deliberately NOT set. On verification failure the proxy aborts the
  connection BEFORE Go's chunkWriter emits the terminating "0\r\n\r\n",
  producing a truncated chunked stream. Per RFC 7230 §4.1, HTTP/1.1
  clients MUST treat this as a transport error. This is the framing-level
  signal that makes "verification failed" visible to the client's HTTP
  parser, not just to server logs.

* **Trailer "X-Kafscale-LFS-Verify"** is declared up-front and set to "pass"
  only on a successful match. On mismatch the trailer is never emitted.
  This is defense in depth alongside the chunked truncation; the missing
  terminator is the primary signal because trailer support varies among
  HTTP clients.

* **HTTP/2 path** has no Hijacker. panic(http.ErrAbortHandler) sends
  RST_STREAM, an explicit framing failure in HTTP/2 — sufficient signal
  on that transport.

* **Presign mode is disabled by default**, opt-in via
  KAFSCALE_LFS_PROXY_PRESIGN_ENABLED=true. When enabled, proxy logs a
  warning banner at startup and every 5 minutes. Response body includes
  an `integrity` block (sha256, checksum_alg, size) so client SDKs can
  verify downloaded bytes themselves.

* **Integrity claim is required** on all download requests. Missing
  integrity.sha256 → 400 missing_integrity.

* **Advisory response headers** on stream mode: X-Kafscale-LFS-Checksum
  and X-Kafscale-LFS-Content-Length. Set before body flush so advanced
  clients can hash independently and fail-fast without waiting for the
  trailer or connection close.

* **Ops-tracker event** download_integrity_failed emits to
  __lfs_ops_state with expected/actual SHA-256, bytes read, and mode —
  gives operators a Kafka-native alerting signal for tampering attempts.

Breaking change (intentional): clients that did not send an integrity
block must add one. Clients that relied on the default mode being presign
must either request presign explicitly (and have the operator opt in) or
switch to stream.

OpenAPI specs (cmd/proxy/openapi.yaml and api/lfs-proxy/openapi.yaml)
updated with the new schema, IntegrityClaim component, presign_disabled
error code, and safe example values (no real bucket names).

Tests:
  * TestHandleHTTPDownload_StreamVerifyMatch — happy path (recorder)
  * TestHandleHTTPDownload_StreamVerifyMismatch — handler panics
  * TestHandleHTTPDownload_PresignMode — requires integrity, echoes it back
  * TestHandleHTTPDownload_PresignDisabledByDefault — 400 when not opted in
  * TestHandleHTTPDownload_MissingIntegrity — 400 when claim absent
  * TestHandleHTTPDownload_DefaultModeIsStream — confirms new default
  * TestStreamVerify_HonestObject_ChunkedAndTrailerPass — on-wire test:
    asserts Content-Length absent, Transfer-Encoding=chunked, body intact,
    trailer X-Kafscale-LFS-Verify=pass present.
  * TestStreamVerify_SameSizeTamper_ClientSeesTransportError — regression
    test for the framing bug. S3 object replaced with same-length tampered
    content; asserts io.ReadAll returns transport error AND trailer is not
    "pass". Observed: readErr=unexpected EOF, trailer="".
  * TestStreamVerify_NoContentLengthHeader_OnResponse — guard against
    regression if a future change reintroduces Content-Length.

All cmd/proxy and pkg/lfs test suites pass; go vet clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kamir kamir force-pushed the fix/s3-bucket-takeover-cve branch from e2931e2 to c4b230f on May 16, 2026 10:18
@kamir
Collaborator Author

kamir commented May 16, 2026

@novatechflow — you were right, the previous commit didn't actually fix the integrity vector. Replaced e2931e2 with c4b230f.

What was wrong: streamDownloadWithVerify set Content-Length: N up-front, hashed during io.Copy, and verified after. By then the client had already received a syntactically complete 200 OK + N bytes. The Hijack+Close / panic(http.ErrAbortHandler) aborted the server handler but were invisible to the client's HTTP parser. Same-size tampering bypassed the check exactly as you described.

Fix (c4b230f):

  1. Drop Content-Length — response uses chunked transfer encoding (Go's net/http switches automatically).
  2. Hijack+Close on mismatch happens before Go writes the terminating 0\r\n\r\n chunk → client sees a truncated chunked stream → RFC 7230 §4.1 mandates this as a transport error.
  3. Trailer: X-Kafscale-LFS-Verify declared up-front, set to pass only on match (defense in depth, not the primary signal — trailer support varies).
  4. HTTP/2 path: panic(http.ErrAbortHandler) → server sends RST_STREAM, an explicit framing failure.

Tests (wire-level, via httptest.NewServer — the previous recorder-based test couldn't catch the bug):

Tests and what they assert:

  • TestStreamVerify_HonestObject_ChunkedAndTrailerPass → Content-Length absent, Transfer-Encoding: chunked, body intact, trailer = pass
  • TestStreamVerify_SameSizeTamper_ClientSeesTransportError → regression test for your report — same-length tampered S3 object; client io.ReadAll returns a transport error AND trailer ≠ pass. Observed: readErr=unexpected EOF, trailer=""
  • TestStreamVerify_NoContentLengthHeader_OnResponse → guard — fails if anyone reintroduces Content-Length

All 26 cmd/proxy + pkg/lfs suites pass, go vet clean. CI re-running.

Out of scope (separate PR): per-chunk checksums in the envelope would let us reject before any body bytes are sent — requires envelope schema v2 + producer cooperation.

@kamir
Collaborator Author

kamir commented May 16, 2026

Self-review of c4b230f found that the chunked-truncation approach has architectural holes: bytes beyond ~4 KB still reach the client before the abort; the headline test passed only because 256-byte payloads never flush past Go's internal chunk buffer; HTTP intermediaries swallow the truncation signal; and the trailer-based signal doesn't reach Python requests, JS fetch, or curl --output. Replacing it with buffer-then-verify-then-stream: read S3 fully into a temp file while hashing, emit 200 + Content-Length only after the hash matches, and return a 502 with a structured error on mismatch. Pushing shortly. /cc @novatechflow — please hold review until the next push lands.

…en-stream

The LFS envelope records a SHA-256 checksum at upload time, but the download
handler never verified it. A consumer receiving tampered content from a
compromised S3 bucket had no way to detect the modification. This closes the
tampering vector that complements the exfiltration fix already in this PR.

Trust model enforced:
  Kafka is the authority. S3 is untrusted storage.

The envelope lives in Kafka (attacker cannot modify it). Downloaded bytes
from S3 are verified against the Kafka-held checksum BEFORE any bytes are
sent to the client.

Design: buffer-then-verify-then-stream. Two earlier iterations on this PR
attempted to verify after streaming and signal failure via response framing
(Content-Length truncation, then chunked-encoding truncation + trailer).
Both were unsound for production traffic:

  * Setting Content-Length and verifying after io.Copy: the client receives
    a complete, syntactically valid 200 OK + N bytes before the abort. Same-
    size tampering goes undetected. Flagged by @novatechflow on PR KafScale#139.

  * Chunked transfer encoding with trailer-based signal: for any payload
    larger than Go's internal chunk-writer buffer (~4 KB), tampered bytes
    are already on the wire before the abort; the client's io.ReadAll
    returns the bytes plus an unexpected-EOF error. HTTP intermediaries
    (nginx-ingress, ALB, CDNs) buffer responses and erase the truncation
    signal. Trailer-based signalling is invisible to most HTTP client
    libraries (Python requests, JavaScript fetch, curl --output file).

Adversarial review confirmed neither framing trick provides the security
property advertised. The only architecturally honest approach is to verify
before sending — buffer the S3 object to temporary disk while hashing, and
only begin streaming to the client after the hash matches the envelope.

Behavior:

  * Default mode is now `stream` (was `presign`). Secure-by-default.

  * Stream mode: handler reads S3 into os.CreateTemp(...), computes SHA-256
    via io.MultiWriter, compares against the client-supplied envelope
    checksum. On match: seeks the temp file to 0, sets Content-Length and
    X-Kafscale-LFS-Checksum, writes 200 OK, streams the temp file to the
    client. On mismatch: returns 502 with structured JSON error code
    "integrity_failure". No tampered bytes ever reach the client.

  * S3 read is bounded by io.LimitReader(body, expectedSize+1). A compromised
    bucket cannot exhaust proxy disk by returning a payload larger than the
    envelope declares. If the read produces > expectedSize bytes, the handler
    returns 502 integrity_failure immediately (before hashing).

  * Integrity.Size is required on stream-mode requests (in addition to
    Integrity.SHA256). Missing size → 400 missing_integrity_size. This is
    what makes the LimitReader cap enforceable.

  * Presign mode is disabled by default, opt-in via
    KAFSCALE_LFS_PROXY_PRESIGN_ENABLED=true. When enabled, proxy logs a
    warning banner at startup and every 5 minutes. Response body includes
    an `integrity` block (sha256, checksum_alg, size) so client SDKs can
    verify downloaded bytes themselves.

  * Integrity.SHA256 is required on all download requests (stream and
    presign). Missing claim → 400 missing_integrity.

  * Temporary file is created via os.CreateTemp (respects TMPDIR), and
    removed in a defer regardless of code path. Production deployments
    should size the pod tmpfs / writable layer for the largest expected
    LFS object served concurrently.

  * Ops-tracker event download_integrity_failed emits to __lfs_ops_state
    with expected/actual SHA-256, bytes read, and mode — gives operators a
    Kafka-native alerting signal for tampering attempts.

Breaking changes (intentional):

  * Stream-mode clients MUST send Integrity.SHA256 AND Integrity.Size.
  * Default mode changed from presign to stream.
  * Presign mode requires KAFSCALE_LFS_PROXY_PRESIGN_ENABLED=true.

OpenAPI specs (cmd/proxy/openapi.yaml and api/lfs-proxy/openapi.yaml)
updated with the IntegrityClaim component, presign_disabled error code,
missing_integrity_size error code, integrity_failure 502 response, and
safe example values (no real bucket names).

Tests (cmd/proxy/lfs_http_test.go):

  Recorder-based (handler-level):
    * TestHandleHTTPDownload_StreamVerifyMatch — verified response is
      200 OK with Content-Length set and correct body
    * TestHandleHTTPDownload_StreamVerifyMismatch_ReturnsErrorJSON —
      mismatch → 502 integrity_failure, tampered bytes NOT in response
    * TestHandleHTTPDownload_StreamRequiresIntegritySize — 400
      missing_integrity_size when Size omitted
    * TestHandleHTTPDownload_DefaultModeIsStream — confirms default
      routing reaches stream-verify path (not presign)
    * TestHandleHTTPDownload_PresignMode — presign requires integrity,
      echoes it back in response
    * TestHandleHTTPDownload_PresignDisabledByDefault — 400 when not
      opted in
    * TestHandleHTTPDownload_MissingIntegrity — 400 when claim absent

  Wire-level (httptest.NewServer + http.Client):
    * TestStreamVerify_HonestObject_AtMultipleSizes — verifies the
      success path at 256 B, 4 KiB, 64 KiB, 1 MiB. Asserts 200 OK,
      Content-Length matches payload length, body delivered intact.
    * TestStreamVerify_SameSizeTamper_NoTamperedBytesInResponse — the
      headline regression test for @novatechflow's report. Tampered S3
      object of the same length as the envelope claims, at 256 B / 4 KiB
      / 64 KiB / 1 MiB. Asserts response is 502 integrity_failure and
      tampered bytes do NOT appear anywhere in the response. Earlier
      iterations of this fix would have leaked the full 1 MiB tampered
      payload to the client here.
    * TestStreamVerify_OversizedS3Response_RefusesToServe — S3 returns
      10x more bytes than envelope declares. Asserts 502 and that
      attacker bytes do not appear in response.
    * TestStreamVerify_EmptyObjectSubstitution_RefusesToServe — S3
      replaced with 0-byte object. Asserts deterministic 502 (earlier
      designs were racy on this case).
    * TestStreamVerify_ContentLengthSetOnlyAfterVerification — guard
      against regression that would set Content-Length to S3 size on a
      non-verified path.

All cmd/proxy and pkg/lfs test suites pass; go vet clean.

Operational notes:

  * Trade-off: download latency increases by the S3-read time before the
    client sees the first byte. Acceptable for LFS-class workloads
    (integrity-critical, not latency-critical).
  * Temp file disk use is bounded per request by Integrity.Size. Operators
    should ensure the pod's writable layer or TMPDIR-backed volume has
    capacity for the largest concurrent download workload.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kamir kamir force-pushed the fix/s3-bucket-takeover-cve branch from c4b230f to 64c8348 on May 16, 2026 12:47
@kamir
Collaborator Author

kamir commented May 16, 2026

Replaced c4b230f with 64c8348: buffer-then-verify-then-stream. The framing-trick designs are gone; verification now happens before any byte of the body reaches the client.

How 64c8348 works

streamDownloadWithVerify in cmd/proxy/lfs_http.go:

  1. os.CreateTemp → opens a temp file (cleaned up in defer)
  2. io.LimitReader(obj.Body, expectedSize+1) → bounds the S3 read to Integrity.Size+1 bytes; defends against a compromised bucket returning more bytes than the envelope declares
  3. io.Copy(io.MultiWriter(tmpFile, hasher), limitedReader) → simultaneously writes S3 bytes to temp file and feeds SHA-256
  4. If written > expectedSize → 502 integrity_failure, abort
  5. If actualSHA != expectedSHA → 502 integrity_failure, abort
  6. Only on match: seek temp file to 0, set Content-Length: <verified size>, WriteHeader(200), io.Copy temp → client

Integrity.Size is now required on stream-mode requests (in addition to Integrity.SHA256) — that's what makes the LimitReader cap enforceable. Missing → 400 missing_integrity_size.

Mapping your concern → fix → tests

Your point → how the fix addresses it → test:

  • "verification has to happen before sending the response" → SHA verified against the temp file BEFORE w.WriteHeader(200). Tests: TestStreamVerify_HonestObject_AtMultipleSizes (success path), TestStreamVerify_SameSizeTamper_NoTamperedBytesInResponse (mismatch path)
  • "same size tampered → complete 200 OK" → mismatch returns 502 with JSON {code: "integrity_failure"}; tampered bytes never touch the response body. Test: TestStreamVerify_SameSizeTamper_NoTamperedBytesInResponse at 256 B, 4 KiB, 64 KiB, 1 MiB — asserts bytes.Contains(respBody, tamperedPayload) == false. The 1 MiB case is the critical one: the previous chunked-truncation design would have leaked the entire payload here
  • "closing the connection afterward does not reliably turn that into a failed download" → no connection close trick — explicit 502 status. Works for every HTTP client (Python requests, JS fetch, curl --output, all SDKs) and survives every HTTP intermediary (nginx, ALB, CDN, ingress controllers). Tests: all wire-level tests above

Self-review issues from the previous iteration, also fixed

Issue (from my self-review of c4b230f) → fix in 64c8348:

  • Tampered bytes >4 KB delivered before abort → no streaming until verified; applies to all sizes
  • Headline test passing because 256 B never flushes → tests now run at 256 B, 4 KiB, 64 KiB, 1 MiB with byte-content assertions
  • TestHandleHTTPDownload_DefaultModeIsStream testing sha256("") == sha256("") → test rewritten to use a real envelope and assert it reaches the verify path
  • Empty-object substitution racy → now deterministic 502 — covered by TestStreamVerify_EmptyObjectSubstitution_RefusesToServe
  • Trailer signal invisible to Python/JS/curl clients → no trailers — plain 502 with JSON body
  • Reverse proxy / ingress swallows truncation signal → no truncation — clean 502 propagates through any HTTP intermediary
  • No oversize-payload defense → io.LimitReader(body, expectedSize+1) + explicit check — TestStreamVerify_OversizedS3Response_RefusesToServe
  • HTTP/2 path uncovered → no transport-specific code paths anymore (no Hijack, no panic) — works identically on HTTP/1.1 and HTTP/2
  • Advisory X-Kafscale-LFS-Checksum echoed input → now set to the proxy-computed actualSHA on the success response

Test summary

All cmd/proxy + pkg/lfs suites pass; go vet clean. Stream-verify tests:

TestStreamVerify_HonestObject_AtMultipleSizes
    size=256        PASS
    size=4096       PASS
    size=65536      PASS
    size=1048576    PASS
TestStreamVerify_SameSizeTamper_NoTamperedBytesInResponse
    size=256        PASS    (no tampered bytes in 502 body)
    size=4096       PASS
    size=65536      PASS
    size=1048576    PASS    (key regression — previous design would leak 1 MiB here)
TestStreamVerify_OversizedS3Response_RefusesToServe        PASS
TestStreamVerify_EmptyObjectSubstitution_RefusesToServe    PASS
TestStreamVerify_ContentLengthSetOnlyAfterVerification     PASS
TestHandleHTTPDownload_StreamVerifyMatch                   PASS
TestHandleHTTPDownload_StreamVerifyMismatch_ReturnsErrorJSON  PASS
TestHandleHTTPDownload_StreamRequiresIntegritySize         PASS
TestHandleHTTPDownload_DefaultModeIsStream                 PASS
TestHandleHTTPDownload_MissingIntegrity                    PASS
TestHandleHTTPDownload_PresignDisabledByDefault            PASS
TestHandleHTTPDownload_PresignMode                         PASS

Trade-offs

  • Latency: client waits for the full S3 read before receiving the first byte. Acceptable for LFS workloads (integrity-critical, not latency-critical).
  • Temp disk: os.CreateTemp honors TMPDIR. Operators should size the pod's writable layer / tmpfs for the largest concurrent download workload. Per-request use is bounded by Integrity.Size.

Breaking changes (documented in commit message)

  • Stream-mode clients must send Integrity.SHA256 AND Integrity.Size.
  • Default mode changed: presign → stream.
  • Presign mode requires KAFSCALE_LFS_PROXY_PRESIGN_ENABLED=true.

Three spec passages still described the discarded "abort mid-stream on
mismatch" approach. Updated to reflect what the code in 64c8348
actually does:

  * Stream-mode endpoint description: buffer S3 → verify SHA-256 →
    only stream on match; mismatch → 502 integrity_failure with no
    body bytes returned.
  * application/octet-stream 200 response: clarify Content-Length is
    the verified size; mismatch yields 502 not a truncated body.
  * 502 response: enumerate possible error codes (s3_get_failed,
    s3_presign_failed, integrity_failure).
  * mode property: note stream requires both integrity.sha256 AND
    integrity.size; mismatch or oversize → 502 integrity_failure.

Applied identically to cmd/proxy/openapi.yaml and api/lfs-proxy/openapi.yaml.
No code change; tests and go vet unchanged-green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@novatechflow
Collaborator

@kamir - when we have breaking changes we need to update the documentation in the main branch as well as in gh-pages so we don't have drift between code and docs. Can you please add this into the PR too? gh-pages can be pushed and merged by everyone.

@novatechflow
Collaborator

Did a review; opened two new bugs:

  1. pkg/storage/recovery.go uses ListSegments with default/<topic> instead of default/<topic>/. Since the S3 client does prefix matching, restoring orders will also pick up keys for topics like orders-restore or orders-v2. On the source side that can make a valid restore fail when inspectSourceSegment rejects those keys; on the target side it can falsely report that the target already has segments.

  2. cmd/kafscale-cli/main.go creates the target topic in etcd before any S3 copy happens. If RecoverTopicToTimestamp fails afterward, the command exits and leaves the new topic behind even though the restore never completed. That makes retries and cleanup messy and can expose an empty or partial topic to clients.

…s DoS

Pre-review of the buffer-then-verify design surfaced a gap:
streamDownloadWithVerify allocates os.CreateTemp per request and reads up
to expectedSize+1 bytes from S3 with no upper bound on what the caller may claim.

Without a cap, an authenticated caller (or a compromised producer upstream
of the envelope) can claim Integrity.Size = TB-scale and force the proxy
to attempt that much temp storage per request — a DoS vector in the same
"trust the wrong input" class as the original integrity bug. Also closes
the integer-overflow corner on expectedSize+1 when Size approaches int64
max.

Fix:
  * Reject stream-mode requests when Integrity.Size > m.maxBlob (default
    KAFSCALE_LFS_PROXY_MAX_BLOB_SIZE = 5 GiB). New error code
    payload_too_large, 400 BadRequest. Skipped when maxBlob is 0
    (unlimited, operator opt-in).
  * Added TestHandleHTTPDownload_StreamRejectsOversizedIntegritySize —
    seeds maxBlob = 10 MiB, claims Size = maxBlob+1, asserts 400
    payload_too_large.
  * OpenAPI specs document payload_too_large under the 400 response.

Drive-by spec fix: openapi.yaml had `description` set on the
application/octet-stream MediaType object — not allowed by OpenAPI 3.0.
Moved into the Schema object where description IS allowed (and where the
caller would naturally look for binary-body docs).

All cmd/proxy + pkg/lfs tests pass; go vet clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kamir
Collaborator Author

kamir commented May 16, 2026

@novatechflow — picking up both of your comments:

1. gh-pages drift (your 13:22 comment): done. Opened companion PR #143 against gh-pages. It updates _docs/lfs-proxy.md, _docs/lfs-helm.md, and _docs/lfs-sdks.md to fix env-var names (LFS_* → KAFSCALE_LFS_PROXY_*), drop the unsafe kafscale bucket example, replace the stale endpoint POST /v1/topics/{topic}/records with the real POST /lfs/produce, and add a new "Trust model and integrity verification" section documenting buffer-then-verify, the required integrity.sha256 + integrity.size, the KAFSCALE_LFS_PROXY_PRESIGN_ENABLED opt-in, and the new error codes. PR #143 is mergeable independently of this one — happy for you (or anyone) to merge it as soon as the rendered preview looks good.

2. recovery.go + kafscale-cli (your 13:27 comment): thanks for spotting both. Out of scope for this PR (#139 is exfiltration + download integrity); will track them as separate fixes:

  • pkg/storage/recovery.go ListSegments prefix-match bug (default/<topic> vs default/<topic>/) — separate PR
  • cmd/kafscale-cli/main.go topic-create-before-restore-fails leaving orphaned topic — separate PR

Want me to open both as GitHub issues, or have you already opened them somewhere?

3. While preparing for re-review I also pushed 5870a0e on this branch: caps download Integrity.Size at KAFSCALE_LFS_PROXY_MAX_BLOB_SIZE (default 5 GiB), returning 400 payload_too_large over the limit. Without this cap, an authenticated caller could claim TB-scale Integrity.Size and force the proxy to allocate that much temp storage per request — same "trust the wrong input" class as the original integrity bug, surfaced by a pre-review pass before pinging you. New regression test TestHandleHTTPDownload_StreamRejectsOversizedIntegritySize. OpenAPI 400 response now enumerates this and the other new codes.

PR #139 state: 4 commits, mergeable, all CI green. Ready for your review when you have a window.

@kamir
Collaborator Author

kamir commented May 16, 2026

Pushed follow-up commit da634bb to harden the PR after review.

What changed:

  • Validate integrity.sha256 as a 64-character hex SHA-256 digest before any S3 I/O.
  • Reject negative sizes and the MaxInt64 overflow case even when the download size cap is disabled.
  • Distinguish temporary verification-storage failures from S3 read failures.
  • Expose X-Kafscale-LFS-Checksum and X-Kafscale-LFS-Content-Length through CORS.
  • Make fake S3 return missing-key errors instead of silent empty objects.
  • Update OpenAPI and docker-compose download examples for stream-mode integrity.

Verification:

  • GOCACHE=/private/tmp/kafscale-go-cache go test ./cmd/proxy ./pkg/lfs
  • GOCACHE=/private/tmp/kafscale-go-cache go vet ./cmd/proxy ./pkg/lfs
  • git diff --check

@novatechflow
Collaborator

Still the same issues.

  1. pkg/storage/recovery.go (line 103) is still missing the trailing /, so the prefix-match bug is still present:

     targetPrefix := path.Join(cfg.TargetNamespace, cfg.TargetTopic)
     existing, err := s3.ListSegments(ctx, targetPrefix)
     ...
     sourcePrefix := path.Join(cfg.SourceNamespace, cfg.SourceTopic)
     objects, err := s3.ListSegments(ctx, sourcePrefix)

  2. In cmd/kafscale-cli/main.go (line 194), it still does:

     store.CreateTopic(...)
     store.UpdateTopicConfig(...)
     result, err := storage.RecoverTopicToTimestamp(...)
     if err != nil {
         return err
     }

So the target topic is still created before the S3 restore runs, with no rollback on failure. That issue is still present too.

@novatechflow novatechflow mentioned this pull request May 17, 2026
@novatechflow novatechflow self-requested a review May 17, 2026 08:59
@novatechflow novatechflow merged commit 8e11c4d into KafScale:main May 17, 2026
9 checks passed
