
fix(security): block S3 bucket takeover via unsafe example defaults #139

Merged
novatechflow merged 11 commits into KafScale:main from kamir:fix/s3-bucket-takeover-cve
May 17, 2026

Conversation

@kamir
Collaborator

@kamir kamir commented Apr 17, 2026

Summary

Two-commit security PR. Closes both halves of the LFS S3 trust problem reported by Milan Katwal (bucket takeover) and confirmed during review by @novatechflow (download integrity).

  • Commit 1267bbc — bucket-takeover blocklist: remove the unsafe kafscale-lfs example, require S3_BUCKET/S3_REGION, add a runtime blocklist for known-compromised names. Vector closed (exfiltration): deployments using the public example would have routed LFS uploads to an attacker-controlled bucket.
  • Commit 64c8348 — buffer-then-verify-then-stream download: the proxy buffers S3 to a temp file while hashing and verifies SHA-256 against the Kafka envelope BEFORE any body bytes reach the client; mismatch → 502 integrity_failure. Vector closed (tampering): consumers receiving modified S3 bytes (e.g. from a compromised bucket or an insider with S3 write access) had no way to detect it.

Trust model enforced end-to-end: Kafka is the authority. S3 is untrusted storage.

Commit 1 — 1267bbc bucket-takeover blocklist

A security researcher registered the kafscale-lfs S3 bucket on AWS — the exact name used as an example in our OpenAPI specs since 1b5dbd8. Any deployment using this default silently routed all LFS uploads to an attacker-controlled bucket.

Three defense layers:

  • Remove unsafe examples from both OpenAPI specs and Go doc/test files (kafscale-lfs → my-bucket, test fixture → test-bucket)
  • Runtime blocklist in cmd/proxy/lfs.go::initLFSModule that rejects known-compromised bucket names at startup, even if config still hardcodes them
  • Required field enforcement: S3_BUCKET and S3_REGION must now be explicitly set

The bucket name is permanently compromised — same pattern as credential scanners that block known-leaked API keys forever, regardless of who controls them now.

Commit 2 — 64c8348 LFS download integrity (buffer-then-verify)

The LFS envelope records a SHA-256 checksum at upload time, but the download handler never verified it on the way out. Tampered S3 bytes reached consumers undetected.

Design: buffer-then-verify-then-stream. The proxy reads the S3 object into os.CreateTemp via io.MultiWriter(tmpFile, sha256Hasher), then:

  • If written > expectedSize → 502 integrity_failure (defends against oversized payloads from a compromised bucket; S3 read is capped by io.LimitReader(body, expectedSize+1))
  • If actualSHA != expectedSHA → 502 integrity_failure
  • Only on match: seek temp to 0, set Content-Length, WriteHeader(200), stream temp file to client

No tampered bytes ever reach the client. Works identically on HTTP/1.1 and HTTP/2, behind any HTTP intermediary, with any HTTP client library — no framing tricks, no trailers, no Hijack.

Why not stream-and-verify-at-end (the earlier iterations):

Two prior approaches on this PR (Content-Length + post-stream verify, then chunked transfer + truncate-on-mismatch) were shown unsound by review:

  1. With Content-Length set, client receives complete N bytes before the abort — same-size tampering undetectable (@novatechflow's report)
  2. With chunked + post-stream truncate, any payload > Go's ~4 KB internal chunk buffer is already on the wire when the abort fires; HTTP intermediaries (nginx-ingress, ALB, CDNs) buffer chunked responses and erase the truncation signal; trailer-based signals are invisible to Python requests, JS fetch, and curl --output file
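The second failure mode can be demonstrated with a small self-contained sketch (hypothetical handler, not KafScale code): once a chunked body exceeds net/http's internal buffer, bytes are on the wire before any abort can stop them.

```go
package main

import (
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchTruncatedChunked shows why abort-after-write is unsound: a handler
// streams a large chunked body, then aborts before the terminating 0-chunk.
// The client still receives the flushed bytes along with a transport error.
func fetchTruncatedChunked(payloadSize int) ([]byte, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// No Content-Length set → Go switches to chunked transfer encoding.
		body := make([]byte, payloadSize)
		w.Write(body)               // bytes past the internal buffer are flushed here
		panic(http.ErrAbortHandler) // abort: terminating "0\r\n\r\n" is never sent
	}))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body) // partial bytes AND an error
}
```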

The only architecturally honest approach is verification before delivery. See PR comments for the full design discussion.

Trade-off: download latency includes the full S3 read before the client gets the first byte. Acceptable for LFS-class workloads (integrity-critical, not latency-critical). Pod temp storage must be sized for the largest concurrent download workload; per-request usage is bounded by Integrity.Size.

Breaking changes (intentional, both commits)

  • Deployments with empty KAFSCALE_LFS_PROXY_S3_BUCKET fail at startup with a clear error (was: failed silently downstream)
  • Deployments using KAFSCALE_LFS_PROXY_S3_BUCKET=kafscale-lfs fail at startup with a blocklist error
  • LFS download stream-mode clients must send both Integrity.SHA256 AND Integrity.Size
  • Default download mode changed from presign to stream
  • Presign mode requires KAFSCALE_LFS_PROXY_PRESIGN_ENABLED=true

Files changed (highlights)

  • cmd/proxy/lfs.go: bucket blocklist, required-field enforcement, presign-enabled flag
  • cmd/proxy/lfs_http.go: new streamDownloadWithVerify (buffer-then-verify-then-stream); Integrity.Size enforcement on stream mode
  • cmd/proxy/lfs_http_test.go: recorder-based handler tests + wire-level (httptest.NewServer) tests at 256 B / 4 KiB / 64 KiB / 1 MiB for both honest and same-size-tampered payloads; oversize, empty-object substitution, and Content-Length regression guards
  • cmd/proxy/lfs_tracker.go, lfs_tracker_types.go: download_integrity_failed event for operator alerting via __lfs_ops_state
  • cmd/proxy/openapi.yaml, api/lfs-proxy/openapi.yaml: IntegrityClaim component, missing_integrity_size / integrity_failure / presign_disabled error codes, updated descriptions reflecting buffer-then-verify behaviour, safe example bucket names
  • pkg/lfs/doc.go, pkg/lfs/envelope_test.go: safe example bucket names

Test plan

CI (currently green):

  • Go Test Suite — all cmd/proxy and pkg/lfs tests pass
  • go vet clean
  • CodeQL pass (actions, go, javascript-typescript, python)
  • Go Coverage Gate, Helm Lint, License Header Check

Manual smoke (reviewer):

  • Start proxy with KAFSCALE_LFS_PROXY_S3_BUCKET=kafscale-lfs → exits with blocklist error
  • Start proxy with empty KAFSCALE_LFS_PROXY_S3_BUCKET → exits with "required" error
  • Start proxy with valid bucket + region → starts normally
  • Stream-mode download with valid integrity → 200 OK, body matches; X-Kafscale-LFS-Checksum set to the verified SHA
  • Stream-mode download with wrong SHA → 502 code: integrity_failure, no body bytes returned
  • Stream-mode download without integrity.size → 400 code: missing_integrity_size
  • Presign-mode download without opt-in → 400 code: presign_disabled

Key wire-level regression tests (also in CI):

  • TestStreamVerify_SameSizeTamper_NoTamperedBytesInResponse at 256 B / 4 KiB / 64 KiB / 1 MiB → asserts tampered bytes never appear in response body. The 1 MiB case is the test the earlier framing-based designs would have failed.
  • TestStreamVerify_OversizedS3Response_RefusesToServe → defends against bucket-compromise oversized payload
  • TestStreamVerify_EmptyObjectSubstitution_RefusesToServe → deterministic 502 on 0-byte substitution
  • TestStreamVerify_ContentLengthSetOnlyAfterVerification → guard against regression that re-leaks bytes via Content-Length

Related

🤖 Generated with Claude Code

@kamir
Collaborator Author

kamir commented Apr 17, 2026

Design Notes (preserved from PR preparation)

Why a blocklist?

Removing the examples from the spec fixes future copy-paste. But the bucket name kafscale-lfs has been public in this repo since commit 1b5dbd8. Every deployment that was configured by copying from those docs — Helm values, docker-compose files, CI scripts, customer PoCs — may still have kafscale-lfs hardwired in config files.

The blocklist is a runtime safety net for existing deployments: even if someone upgrades the proxy binary but doesn't touch their config, the proxy refuses to start instead of silently routing data to an attacker-controlled bucket.

The bucket name is permanently compromised — it can never be safe again, regardless of who controls it now. The code must reject it forever. This is the same pattern used by credential scanners that block known-leaked API keys even after rotation.

Breaking change

Deployments that relied on an empty KAFSCALE_LFS_PROXY_S3_BUCKET will now fail at startup with a clear error message. This is intentional — an empty bucket was never valid, it just failed silently downstream.

Test plan

  • go build ./cmd/proxy/ passes
  • go vet ./pkg/lfs/ passes
  • go test ./pkg/lfs/... passes
  • Start proxy with KAFSCALE_LFS_PROXY_S3_BUCKET=kafscale-lfs → exits with blocklist error
  • Start proxy with empty KAFSCALE_LFS_PROXY_S3_BUCKET → exits with "required" error
  • Start proxy with valid bucket + region → starts normally
  • Grep confirms no remaining kafscale-lfs bucket references (only in blocklist)

Context

This fix addresses a reported S3 bucket takeover vulnerability. A security researcher registered the kafscale-lfs bucket on AWS — the exact name used as examples in our OpenAPI specs. The broader config hardening plan (centralized config package, startup validation for all binaries, console auth fix) is tracked in scalytics-all-in-one/PLANS/plan-001.md.

@kamir
Collaborator Author

kamir commented Apr 17, 2026

Scope expansion — adding LFS download integrity fix

Per review feedback, consolidating follow-up fixes into this PR rather than opening a separate one.

Additional changes being added:

LFS download integrity verification — closes the tampering vector that complements the exfiltration fix already in this PR.

Current behavior (still vulnerable after just the blocklist fix):

  • Any attacker/insider with S3 write access can tamper with LFS objects post-upload
  • Neither stream nor presign mode verifies the envelope's sha256 checksum

Planned fix:

  • Stream mode becomes default. Hashes bytes on the fly (io.TeeReader), compares to envelope checksum, aborts TCP connection on mismatch (no trailers — poor client support). Emits integrity_failure event to __lfs_ops_state tracker.
  • Presign mode is disabled by default. Opt-in via KAFSCALE_LFS_PROXY_PRESIGN_ENABLED=true. When enabled, proxy logs a warning banner at startup and every 5 minutes. Response body includes integrity block with sha256/checksum_alg/size so SDK can verify client-side.
  • Advisory response headers on stream mode: X-Kafscale-LFS-Checksum and X-Kafscale-LFS-Content-Length — let advanced clients hash independently and fail-fast without waiting for connection close.
  • Updated OpenAPI spec reflects new default, presign_disabled error code, integrity block.
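For presign mode, the integrity block lets the SDK verify the downloaded bytes itself. A client-side sketch (function name and error wording are assumptions; the sha256/size fields come from the description above):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// verifyDownloaded checks bytes fetched from a presigned URL against the
// integrity block returned by the proxy: size first (cheap), then SHA-256.
func verifyDownloaded(data []byte, wantSHA string, wantSize int64) error {
	if int64(len(data)) != wantSize {
		return fmt.Errorf("size mismatch: got %d, want %d", len(data), wantSize)
	}
	sum := sha256.Sum256(data)
	if hex.EncodeToString(sum[:]) != wantSHA {
		return fmt.Errorf("sha256 mismatch: downloaded bytes do not match envelope")
	}
	return nil
}
```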

Trust model being enforced:

Kafka is the authority. S3 is untrusted storage.

The envelope lives in Kafka (attacker can't modify it). Downloaded bytes from S3 are verified against the Kafka-held checksum on every read.

Breaking change (intentional):

Default mode changes from presign to stream. Existing clients that hardcoded mode: presign must also have the operator set KAFSCALE_LFS_PROXY_PRESIGN_ENABLED=true. This is secure-by-default by design.

Full spec

Detailed API behavior, acceptance criteria, and out-of-scope items are in the forthcoming commit. The broader platform-wide integrity audit (broker segment CRC, RecordBatch CRC32C, S3 checksums) is tracked separately in scalytics-all-in-one/PLANS/plan-002.md and will be addressed in follow-up PRs.

Pushing commits shortly.

@novatechflow
Collaborator

streamDownloadWithVerify writes the full body with io.Copy and sets Content-Length before comparing the digest. If the object is tampered but keeps the same size, the client can still receive a complete 200 OK response and treat it as success. Closing the connection afterward does not reliably turn that into a failed download.

The patch does not guarantee a “truncated response on mismatch”. Verification has to happen before sending the response, or the protocol needs an explicit post-body failure signal.

@kamir
Collaborator Author

kamir commented Apr 21, 2026

Hi Alex, thanks for the comment. I will work on it and update the PR.

@novatechflow
Collaborator

@kamir - are you still working on that?

@kamir
Collaborator Author

kamir commented May 15, 2026

Yes, I am on it.

A security researcher registered the `kafscale-lfs` S3 bucket on AWS —
the same name used as an example throughout the OpenAPI specs and docs.
Any deployment using this default would silently route LFS uploads to an
attacker-controlled bucket, enabling data exfiltration and content
tampering.

Three changes:

1. Replace all `kafscale-lfs` example values in both OpenAPI specs
   (cmd/proxy/ and api/lfs-proxy/) with safe placeholder `my-bucket`.

2. Add runtime blocklist in initLFSModule(): the proxy now refuses to
   start if the configured S3 bucket matches any known-unsafe example
   name. This protects existing deployments that upgrade the binary but
   don't update their config.

3. Require KAFSCALE_LFS_PROXY_S3_BUCKET and KAFSCALE_LFS_PROXY_S3_REGION
   at startup when LFS is enabled. Previously empty values were silently
   accepted, causing downstream failures or unsafe fallback behavior.

Reported-by: Milan Katwal <milankatwal2056@gmail.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@kamir kamir force-pushed the fix/s3-bucket-takeover-cve branch from 3235759 to e2931e2 on May 16, 2026 09:24
kamir added a commit to kamir/kafscale that referenced this pull request May 16, 2026
…on mismatch

The LFS envelope records a SHA-256 checksum at upload time, but the download
handler never verified it. A consumer receiving tampered content from a
compromised S3 bucket had no way to detect the modification. This closes the
tampering vector that complements the exfiltration fix already in this PR.

Trust model enforced:
  Kafka is the authority. S3 is untrusted storage.

The envelope lives in Kafka (attacker cannot modify it). Downloaded bytes
from S3 are verified against the Kafka-held checksum on every read.

Response framing: chunked transfer encoding + integrity trailer + truncate
on mismatch. Detailed below because the first iteration of this commit
(reviewed on PR KafScale#139 by @novatechflow) had a critical framing bug — it set
Content-Length up-front and verified after io.Copy, so same-size tampering
was undetectable from the client's perspective. This commit replaces that
attempt with a framing-correct design.

Behavior changes:

* **Default mode is now `stream`** (was `presign`). Secure-by-default.

* **Stream mode hashes bytes on the fly** via io.TeeReader with SHA-256
  and compares against the client-supplied envelope checksum.

* **Stream-mode responses use chunked transfer encoding** — Content-Length
  is deliberately NOT set. On verification failure the proxy aborts the
  connection BEFORE Go's chunkWriter emits the terminating "0\r\n\r\n",
  producing a truncated chunked stream. Per RFC 7230 §4.1, HTTP/1.1
  clients MUST treat this as a transport error. This is the framing-level
  signal that makes "verification failed" visible to the client's HTTP
  parser, not just to server logs.

* **Trailer "X-Kafscale-LFS-Verify"** is declared up-front and set to "pass"
  only on a successful match. On mismatch the trailer is never emitted.
  This is defense in depth alongside the chunked truncation; the missing
  terminator is the primary signal because trailer support varies among
  HTTP clients.

* **HTTP/2 path** has no Hijacker. panic(http.ErrAbortHandler) sends
  RST_STREAM, an explicit framing failure in HTTP/2 — sufficient signal
  on that transport.

* **Presign mode is disabled by default**, opt-in via
  KAFSCALE_LFS_PROXY_PRESIGN_ENABLED=true. When enabled, proxy logs a
  warning banner at startup and every 5 minutes. Response body includes
  an `integrity` block (sha256, checksum_alg, size) so client SDKs can
  verify downloaded bytes themselves.

* **Integrity claim is required** on all download requests. Missing
  integrity.sha256 → 400 missing_integrity.

* **Advisory response headers** on stream mode: X-Kafscale-LFS-Checksum
  and X-Kafscale-LFS-Content-Length. Set before body flush so advanced
  clients can hash independently and fail-fast without waiting for the
  trailer or connection close.

* **Ops-tracker event** download_integrity_failed emits to
  __lfs_ops_state with expected/actual SHA-256, bytes read, and mode —
  gives operators a Kafka-native alerting signal for tampering attempts.

Breaking change (intentional): clients that did not send an integrity
block must add one. Clients that relied on the default mode being presign
must either request presign explicitly (and have the operator opt in) or
switch to stream.

OpenAPI specs (cmd/proxy/openapi.yaml and api/lfs-proxy/openapi.yaml)
updated with the new schema, IntegrityClaim component, presign_disabled
error code, and safe example values (no real bucket names).

Tests:
  * TestHandleHTTPDownload_StreamVerifyMatch — happy path (recorder)
  * TestHandleHTTPDownload_StreamVerifyMismatch — handler panics
  * TestHandleHTTPDownload_PresignMode — requires integrity, echoes it back
  * TestHandleHTTPDownload_PresignDisabledByDefault — 400 when not opted in
  * TestHandleHTTPDownload_MissingIntegrity — 400 when claim absent
  * TestHandleHTTPDownload_DefaultModeIsStream — confirms new default
  * TestStreamVerify_HonestObject_ChunkedAndTrailerPass — on-wire test:
    asserts Content-Length absent, Transfer-Encoding=chunked, body intact,
    trailer X-Kafscale-LFS-Verify=pass present.
  * TestStreamVerify_SameSizeTamper_ClientSeesTransportError — regression
    test for the framing bug. S3 object replaced with same-length tampered
    content; asserts io.ReadAll returns transport error AND trailer is not
    "pass". Observed: readErr=unexpected EOF, trailer="".
  * TestStreamVerify_NoContentLengthHeader_OnResponse — guard against
    regression if a future change reintroduces Content-Length.

All cmd/proxy and pkg/lfs test suites pass; go vet clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kamir kamir force-pushed the fix/s3-bucket-takeover-cve branch from e2931e2 to c4b230f on May 16, 2026 10:18
@kamir
Collaborator Author

kamir commented May 16, 2026

@novatechflow — you were right, the previous commit didn't actually fix the integrity vector. Replaced e2931e2 with c4b230f.

What was wrong: streamDownloadWithVerify set Content-Length: N up-front, hashed during io.Copy, and verified after. By then the client had already received a syntactically complete 200 OK + N bytes. The Hijack+Close / panic(http.ErrAbortHandler) aborted the server handler but were invisible to the client's HTTP parser. Same-size tampering bypassed the check exactly as you described.

Fix (c4b230f):

  1. Drop Content-Length — response uses chunked transfer encoding (Go's net/http switches automatically).
  2. Hijack+Close on mismatch happens before Go writes the terminating 0\r\n\r\n chunk → client sees a truncated chunked stream → RFC 7230 §4.1 mandates this as a transport error.
  3. Trailer: X-Kafscale-LFS-Verify declared up-front, set to pass only on match (defense in depth, not the primary signal — trailer support varies).
  4. HTTP/2 path: panic(http.ErrAbortHandler) → server sends RST_STREAM, an explicit framing failure.

Tests (wire-level, via httptest.NewServer — the previous recorder-based test couldn't catch the bug):

Tests and what they assert:

  • TestStreamVerify_HonestObject_ChunkedAndTrailerPass → Content-Length absent, Transfer-Encoding: chunked, body intact, trailer = pass
  • TestStreamVerify_SameSizeTamper_ClientSeesTransportError → regression test for your report — same-length tampered S3 object; client io.ReadAll returns a transport error AND trailer ≠ pass. Observed: readErr=unexpected EOF, trailer=""
  • TestStreamVerify_NoContentLengthHeader_OnResponse → guard — fails if anyone reintroduces Content-Length

All 26 cmd/proxy + pkg/lfs suites pass, go vet clean. CI re-running.

Out of scope (separate PR): per-chunk checksums in the envelope would let us reject before any body bytes are sent — requires envelope schema v2 + producer cooperation.

@kamir
Collaborator Author

kamir commented May 16, 2026

Self-review of c4b230f found that the chunked-truncation approach has architectural holes: bytes beyond ~4 KB still reach the client before the abort; the headline test passed only because 256-byte payloads never flush past Go's internal chunk buffer; HTTP intermediaries swallow the truncation signal; and the trailer-based signal doesn't reach Python requests, JS fetch, or curl --output. Replacing it with buffer-then-verify-then-stream: read S3 fully into a temp file while hashing, emit 200 + Content-Length only after the hash matches, and return a 502 with a structured error on mismatch. Pushing shortly. /cc @novatechflow — please hold review until the next push lands.

…en-stream

The LFS envelope records a SHA-256 checksum at upload time, but the download
handler never verified it. A consumer receiving tampered content from a
compromised S3 bucket had no way to detect the modification. This closes the
tampering vector that complements the exfiltration fix already in this PR.

Trust model enforced:
  Kafka is the authority. S3 is untrusted storage.

The envelope lives in Kafka (attacker cannot modify it). Downloaded bytes
from S3 are verified against the Kafka-held checksum BEFORE any bytes are
sent to the client.

Design: buffer-then-verify-then-stream. Two earlier iterations on this PR
attempted to verify after streaming and signal failure via response framing
(Content-Length truncation, then chunked-encoding truncation + trailer).
Both were unsound for production traffic:

  * Setting Content-Length and verifying after io.Copy: the client receives
    a complete, syntactically valid 200 OK + N bytes before the abort. Same-
    size tampering goes undetected. Flagged by @novatechflow on PR KafScale#139.

  * Chunked transfer encoding with trailer-based signal: for any payload
    larger than Go's internal chunk-writer buffer (~4 KB), tampered bytes
    are already on the wire before the abort; the client's io.ReadAll
    returns the bytes plus an unexpected-EOF error. HTTP intermediaries
    (nginx-ingress, ALB, CDNs) buffer responses and erase the truncation
    signal. Trailer-based signalling is invisible to most HTTP client
    libraries (Python requests, JavaScript fetch, curl --output file).

Adversarial review confirmed neither framing trick provides the security
property advertised. The only architecturally honest approach is to verify
before sending — buffer the S3 object to temporary disk while hashing, and
only begin streaming to the client after the hash matches the envelope.

Behavior:

  * Default mode is now `stream` (was `presign`). Secure-by-default.

  * Stream mode: handler reads S3 into os.CreateTemp(...), computes SHA-256
    via io.MultiWriter, compares against the client-supplied envelope
    checksum. On match: seeks the temp file to 0, sets Content-Length and
    X-Kafscale-LFS-Checksum, writes 200 OK, streams the temp file to the
    client. On mismatch: returns 502 with structured JSON error code
    "integrity_failure". No tampered bytes ever reach the client.

  * S3 read is bounded by io.LimitReader(body, expectedSize+1). A compromised
    bucket cannot exhaust proxy disk by returning a payload larger than the
    envelope declares. If the read produces > expectedSize bytes, the handler
    returns 502 integrity_failure immediately (before hashing).

  * Integrity.Size is required on stream-mode requests (in addition to
    Integrity.SHA256). Missing size → 400 missing_integrity_size. This is
    what makes the LimitReader cap enforceable.

  * Presign mode is disabled by default, opt-in via
    KAFSCALE_LFS_PROXY_PRESIGN_ENABLED=true. When enabled, proxy logs a
    warning banner at startup and every 5 minutes. Response body includes
    an `integrity` block (sha256, checksum_alg, size) so client SDKs can
    verify downloaded bytes themselves.

  * Integrity.SHA256 is required on all download requests (stream and
    presign). Missing claim → 400 missing_integrity.

  * Temporary file is created via os.CreateTemp (respects TMPDIR), and
    removed in a defer regardless of code path. Production deployments
    should size the pod tmpfs / writable layer for the largest expected
    LFS object served concurrently.

  * Ops-tracker event download_integrity_failed emits to __lfs_ops_state
    with expected/actual SHA-256, bytes read, and mode — gives operators a
    Kafka-native alerting signal for tampering attempts.

Breaking changes (intentional):

  * Stream-mode clients MUST send Integrity.SHA256 AND Integrity.Size.
  * Default mode changed from presign to stream.
  * Presign mode requires KAFSCALE_LFS_PROXY_PRESIGN_ENABLED=true.

OpenAPI specs (cmd/proxy/openapi.yaml and api/lfs-proxy/openapi.yaml)
updated with the IntegrityClaim component, presign_disabled error code,
missing_integrity_size error code, integrity_failure 502 response, and
safe example values (no real bucket names).

Tests (cmd/proxy/lfs_http_test.go):

  Recorder-based (handler-level):
    * TestHandleHTTPDownload_StreamVerifyMatch — verified response is
      200 OK with Content-Length set and correct body
    * TestHandleHTTPDownload_StreamVerifyMismatch_ReturnsErrorJSON —
      mismatch → 502 integrity_failure, tampered bytes NOT in response
    * TestHandleHTTPDownload_StreamRequiresIntegritySize — 400
      missing_integrity_size when Size omitted
    * TestHandleHTTPDownload_DefaultModeIsStream — confirms default
      routing reaches stream-verify path (not presign)
    * TestHandleHTTPDownload_PresignMode — presign requires integrity,
      echoes it back in response
    * TestHandleHTTPDownload_PresignDisabledByDefault — 400 when not
      opted in
    * TestHandleHTTPDownload_MissingIntegrity — 400 when claim absent

  Wire-level (httptest.NewServer + http.Client):
    * TestStreamVerify_HonestObject_AtMultipleSizes — verifies the
      success path at 256 B, 4 KiB, 64 KiB, 1 MiB. Asserts 200 OK,
      Content-Length matches payload length, body delivered intact.
    * TestStreamVerify_SameSizeTamper_NoTamperedBytesInResponse — the
      headline regression test for @novatechflow's report. Tampered S3
      object of the same length as the envelope claims, at 256 B / 4 KiB
      / 64 KiB / 1 MiB. Asserts response is 502 integrity_failure and
      tampered bytes do NOT appear anywhere in the response. Earlier
      iterations of this fix would have leaked the full 1 MiB tampered
      payload to the client here.
    * TestStreamVerify_OversizedS3Response_RefusesToServe — S3 returns
      10x more bytes than envelope declares. Asserts 502 and that
      attacker bytes do not appear in response.
    * TestStreamVerify_EmptyObjectSubstitution_RefusesToServe — S3
      replaced with 0-byte object. Asserts deterministic 502 (earlier
      designs were racy on this case).
    * TestStreamVerify_ContentLengthSetOnlyAfterVerification — guard
      against regression that would set Content-Length to S3 size on a
      non-verified path.

All cmd/proxy and pkg/lfs test suites pass; go vet clean.

Operational notes:

  * Trade-off: download latency increases by the S3-read time before the
    client sees the first byte. Acceptable for LFS-class workloads
    (integrity-critical, not latency-critical).
  * Temp file disk use is bounded per request by Integrity.Size. Operators
    should ensure the pod's writable layer or TMPDIR-backed volume has
    capacity for the largest concurrent download workload.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kamir kamir force-pushed the fix/s3-bucket-takeover-cve branch from c4b230f to 64c8348 on May 16, 2026 12:47
@kamir
Collaborator Author

kamir commented May 16, 2026

Replaced c4b230f with 64c8348: buffer-then-verify-then-stream. The framing-trick designs are gone; verification now happens before any byte of the body reaches the client.

How 64c8348 works

streamDownloadWithVerify in cmd/proxy/lfs_http.go:

  1. os.CreateTemp → opens a temp file (cleaned up in defer)
  2. io.LimitReader(obj.Body, expectedSize+1) → bounds the S3 read to Integrity.Size+1 bytes; defends against a compromised bucket returning more bytes than the envelope declares
  3. io.Copy(io.MultiWriter(tmpFile, hasher), limitedReader) → simultaneously writes S3 bytes to temp file and feeds SHA-256
  4. If written > expectedSize → 502 integrity_failure, abort
  5. If actualSHA != expectedSHA → 502 integrity_failure, abort
  6. Only on match: seek temp file to 0, set Content-Length: <verified size>, WriteHeader(200), io.Copy temp → client

Integrity.Size is now required on stream-mode requests (in addition to Integrity.SHA256) — that's what makes the LimitReader cap enforceable. Missing → 400 missing_integrity_size.

Mapping your concern → fix → tests

Your point → how the fix addresses it → test:

  • "verification has to happen before sending the response" → SHA verified against the temp file BEFORE w.WriteHeader(200). Tests: TestStreamVerify_HonestObject_AtMultipleSizes (success path), TestStreamVerify_SameSizeTamper_NoTamperedBytesInResponse (mismatch path)
  • "same size tampered → complete 200 OK" → mismatch returns 502 with JSON {code: "integrity_failure"}; tampered bytes never touch the response body. Test: TestStreamVerify_SameSizeTamper_NoTamperedBytesInResponse at 256 B, 4 KiB, 64 KiB, 1 MiB — asserts bytes.Contains(respBody, tamperedPayload) == false. The 1 MiB case is the critical one: the previous chunked-truncation design would have leaked the entire payload here
  • "closing the connection afterward does not reliably turn that into a failed download" → no connection close trick — explicit 502 status. Works for every HTTP client (Python requests, JS fetch, curl --output, all SDKs) and survives every HTTP intermediary (nginx, ALB, CDN, ingress controllers). Tests: all wire-level tests above

Self-review issues from the previous iteration, also fixed

Issue (from my self-review of c4b230f) → fix in 64c8348:

  • Tampered bytes >4 KB delivered before abort → no streaming until verified; applies to all sizes
  • Headline test passing because 256 B never flushes → tests now run at 256 B, 4 KiB, 64 KiB, 1 MiB with byte-content assertions
  • TestHandleHTTPDownload_DefaultModeIsStream testing sha256("") == sha256("") → test rewritten to use a real envelope and assert it reaches the verify path
  • Empty-object substitution racy → now deterministic 502 — covered by TestStreamVerify_EmptyObjectSubstitution_RefusesToServe
  • Trailer signal invisible to Python/JS/curl clients → no trailers — plain 502 with JSON body
  • Reverse proxy / ingress swallows truncation signal → no truncation — clean 502 propagates through any HTTP intermediary
  • No oversize-payload defense → io.LimitReader(body, expectedSize+1) + explicit check — TestStreamVerify_OversizedS3Response_RefusesToServe
  • HTTP/2 path uncovered → no transport-specific code paths anymore (no Hijack, no panic) — works identically on HTTP/1.1 and HTTP/2
  • Advisory X-Kafscale-LFS-Checksum echoed input → now set to the proxy-computed actualSHA on the success response

Test summary

All cmd/proxy + pkg/lfs suites pass; go vet clean. Stream-verify tests:

TestStreamVerify_HonestObject_AtMultipleSizes
    size=256        PASS
    size=4096       PASS
    size=65536      PASS
    size=1048576    PASS
TestStreamVerify_SameSizeTamper_NoTamperedBytesInResponse
    size=256        PASS    (no tampered bytes in 502 body)
    size=4096       PASS
    size=65536      PASS
    size=1048576    PASS    (key regression — previous design would leak 1 MiB here)
TestStreamVerify_OversizedS3Response_RefusesToServe        PASS
TestStreamVerify_EmptyObjectSubstitution_RefusesToServe    PASS
TestStreamVerify_ContentLengthSetOnlyAfterVerification     PASS
TestHandleHTTPDownload_StreamVerifyMatch                   PASS
TestHandleHTTPDownload_StreamVerifyMismatch_ReturnsErrorJSON  PASS
TestHandleHTTPDownload_StreamRequiresIntegritySize         PASS
TestHandleHTTPDownload_DefaultModeIsStream                 PASS
TestHandleHTTPDownload_MissingIntegrity                    PASS
TestHandleHTTPDownload_PresignDisabledByDefault            PASS
TestHandleHTTPDownload_PresignMode                         PASS

Trade-offs

  • Latency: client waits for the full S3 read before receiving the first byte. Acceptable for LFS workloads (integrity-critical, not latency-critical).
  • Temp disk: os.CreateTemp honors TMPDIR. Operators should size the pod's writable layer / tmpfs for the largest concurrent download workload. Per-request use is bounded by Integrity.Size.

Breaking changes (documented in commit message)

  • Stream-mode clients must send Integrity.SHA256 AND Integrity.Size.
  • Default mode changed: presign → stream.
  • Presign mode requires KAFSCALE_LFS_PROXY_PRESIGN_ENABLED=true.

Three spec passages still described the discarded "abort mid-stream on
mismatch" approach. Updated to reflect what the code in 64c8348
actually does:

  * Stream-mode endpoint description: buffer S3 → verify SHA-256 →
    only stream on match; mismatch → 502 integrity_failure with no
    body bytes returned.
  * application/octet-stream 200 response: clarify Content-Length is
    the verified size; mismatch yields 502 not a truncated body.
  * 502 response: enumerate possible error codes (s3_get_failed,
    s3_presign_failed, integrity_failure).
  * mode property: note stream requires both integrity.sha256 AND
    integrity.size; mismatch or oversize → 502 integrity_failure.

Applied identically to cmd/proxy/openapi.yaml and api/lfs-proxy/openapi.yaml.
No code change; tests and go vet unchanged-green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@novatechflow
Collaborator

@kamir - when we have breaking changes we need to update the documentation in the main branch as well as in gh-pages so we don't have drift between code and docs. Can you please add this into the PR too? gh-pages can be pushed and merged by everyone.

@novatechflow
Collaborator

Did a review; opened two new bugs:

  1. pkg/storage/recovery.go uses ListSegments with default/<topic> instead of default/<topic>/. Since the S3 client does prefix matching, restoring orders will also pick up keys for topics like orders-restore or orders-v2. On the source side that can make a valid restore fail when inspectSourceSegment rejects those keys; on the target side it can falsely report that the target already has segments.

  2. cmd/kafscale-cli/main.go creates the target topic in etcd before any S3 copy happens. If RecoverTopicToTimestamp fails afterward, the command exits and leaves the new topic behind even though the restore never completed. That makes retries and cleanup messy and can expose an empty or partial topic to clients.

…s DoS

Pre-review of the buffer-then-verify design surfaced a gap:
streamDownloadWithVerify allocates os.CreateTemp per request and reads up
to expectedSize+1 bytes from S3 with no upper bound on what the caller may claim.

Without a cap, an authenticated caller (or a compromised producer upstream
of the envelope) can claim Integrity.Size = TB-scale and force the proxy
to attempt that much temp storage per request — a DoS vector in the same
"trust the wrong input" class as the original integrity bug. Also closes
the integer-overflow corner on expectedSize+1 when Size approaches int64
max.

Fix:
  * Reject stream-mode requests when Integrity.Size > m.maxBlob (default
    KAFSCALE_LFS_PROXY_MAX_BLOB_SIZE = 5 GiB). New error code
    payload_too_large, 400 BadRequest. Skipped when maxBlob is 0
    (unlimited, operator opt-in).
  * Added TestHandleHTTPDownload_StreamRejectsOversizedIntegritySize —
    seeds maxBlob = 10 MiB, claims Size = maxBlob+1, asserts 400
    payload_too_large.
  * OpenAPI specs document payload_too_large under the 400 response.

Drive-by spec fix: openapi.yaml had `description` set on the
application/octet-stream MediaType object — not allowed by OpenAPI 3.0.
Moved into the Schema object where description IS allowed (and where the
caller would naturally look for binary-body docs).

All cmd/proxy + pkg/lfs tests pass; go vet clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kamir
Collaborator Author

kamir commented May 16, 2026

@novatechflow — picking up both of your comments:

1. gh-pages drift (your 13:22 comment): done. Opened companion PR #143 against gh-pages. It updates _docs/lfs-proxy.md, _docs/lfs-helm.md, and _docs/lfs-sdks.md to fix env-var names (LFS_* → KAFSCALE_LFS_PROXY_*), drop the unsafe kafscale bucket example, replace the stale endpoint POST /v1/topics/{topic}/records with the real POST /lfs/produce, and add a new "Trust model and integrity verification" section documenting buffer-then-verify, the required integrity.sha256 + integrity.size, the KAFSCALE_LFS_PROXY_PRESIGN_ENABLED opt-in, and the new error codes. PR #143 is mergeable independently of this one — happy for you (or anyone) to merge it as soon as the rendered preview looks good.

2. recovery.go + kafscale-cli (your 13:27 comment): thanks for spotting both. Out of scope for this PR (#139 is exfiltration + download integrity); will track them as separate fixes:

  • pkg/storage/recovery.go ListSegments prefix-match bug (default/<topic> vs default/<topic>/) — separate PR
  • cmd/kafscale-cli/main.go topic-create-before-restore-fails leaving orphaned topic — separate PR

Want me to open both as GitHub issues, or have you already opened them somewhere?

3. While preparing for re-review I also pushed 5870a0e on this branch: caps download Integrity.Size at KAFSCALE_LFS_PROXY_MAX_BLOB_SIZE (default 5 GiB), returning 400 payload_too_large over the limit. Without this cap, an authenticated caller could claim TB-scale Integrity.Size and force the proxy to allocate that much temp storage per request — same "trust the wrong input" class as the original integrity bug, surfaced by a pre-review pass before pinging you. New regression test TestHandleHTTPDownload_StreamRejectsOversizedIntegritySize. OpenAPI 400 response now enumerates this and the other new codes.

PR #139 state: 4 commits, mergeable, all CI green. Ready for your review when you have a window.

@kamir
Collaborator Author

kamir commented May 16, 2026

Pushed follow-up commit da634bb to harden the PR after review.

What changed:

  • Validate integrity.sha256 as a 64-character hex SHA-256 digest before any S3 I/O.
  • Reject negative sizes and the MaxInt64 overflow case even when the download size cap is disabled.
  • Distinguish temporary verification-storage failures from S3 read failures.
  • Expose X-Kafscale-LFS-Checksum and X-Kafscale-LFS-Content-Length through CORS.
  • Make fake S3 return missing-key errors instead of silent empty objects.
  • Update OpenAPI and docker-compose download examples for stream-mode integrity.

Verification:

  • GOCACHE=/private/tmp/kafscale-go-cache go test ./cmd/proxy ./pkg/lfs
  • GOCACHE=/private/tmp/kafscale-go-cache go vet ./cmd/proxy ./pkg/lfs
  • git diff --check

@novatechflow
Collaborator

Still the same issues.

  1. pkg/storage/recovery.go (line 103) is still missing the trailing /, so the prefix-match bug is still present:

     targetPrefix := path.Join(cfg.TargetNamespace, cfg.TargetTopic)
     existing, err := s3.ListSegments(ctx, targetPrefix)
     ...
     sourcePrefix := path.Join(cfg.SourceNamespace, cfg.SourceTopic)
     objects, err := s3.ListSegments(ctx, sourcePrefix)

  2. In cmd/kafscale-cli/main.go (line 194), it still does:

     store.CreateTopic(...)
     store.UpdateTopicConfig(...)
     result, err := storage.RecoverTopicToTimestamp(...)
     if err != nil {
         return err
     }

So the target topic is still created before the S3 restore runs, with no rollback on failure. That issue is still present too.

@novatechflow novatechflow mentioned this pull request May 17, 2026
@novatechflow novatechflow self-requested a review May 17, 2026 08:59
@novatechflow novatechflow merged commit 8e11c4d into KafScale:main May 17, 2026
9 checks passed
