feat(broker): INIT_PRODUCER_ID + LIST_GROUPS fixes + standalone Helm chart for smoke tests #142

Open
kamir wants to merge 3 commits into KafScale:main from kamir:feat/broker-smoke-and-protocol-fixes

@kamir kamir commented May 16, 2026

Summary

Three small, independent changes that together unblock smoke-test convergence for the scalytics-all-in-one (SAO) bp-001 Ops Foundation scenarios and resolve OPS-005 items #1 and #2.

| Commit | What | Why |
| --- | --- | --- |
| feat(broker): INIT_PRODUCER_ID stub handler (OPS-005 #1) | Minimum-viable handler for Kafka API key 22. Allocates monotonic PIDs, epoch=0. Adds dispatch + apiVersions entry. | Unblocks Java AdminClient default producers, franz-go idempotent producers, and Schema Registry's producer-init probe. Was: UnsupportedVersionException. |
| fix(broker): advertise LIST_GROUPS versions 0-5 (was 5-5) | One-line widening of the advertised version range. | Java admin client / kafka-consumer-groups / Schema Registry negotiate [0,4]; the narrow 5-5 window caused UnsupportedVersionException: ... supported range is [5,5]. The underlying handler is version-agnostic. |
| feat(deploy): add kafscale-broker-standalone Helm chart | New minimal chart deploy/helm/kafscale-broker-standalone/: a single broker Pod pointing at external etcd + S3 (MinIO or cloud). | Lets smoke tests deploy a broker on KIND without the full operator-based chart at deploy/helm/kafscale/. Not a replacement for the operator chart; the smoke-test scope is explicit and documented in the commit message. |

Scope

  • 5 files changed, ~110 LOC
  • No protocol-breaking changes
  • No changes to the operator-based chart at deploy/helm/kafscale/

Verification

| Change | How verified |
| --- | --- |
| INIT_PRODUCER_ID | kafka-console-producer --producer-property enable.idempotence=true now succeeds; kaf-mirror (franz-go) replicates primary→standby with offsets matching and lag <1s; SCEN-bp002-06_Replication flipped SKIP → PASS. |
| LIST_GROUPS | kafka-consumer-groups --bootstrap-server kafscale-broker:9092 --list exits 0 cleanly (was UnsupportedVersionException). |
| Standalone Helm chart | Used by SAO bp-001 smoke suite: COMP-kafscale-01 (pod Ready) + COMP-kafscale-02 (Kafka TCP reachable). |

Known limitations (for INIT_PRODUCER_ID)

Tracked in OPS-005 — not addressed in this PR:

  • No sequence-number tracking → duplicate-on-retry semantics not enforced
  • No epoch management → fencing of stale producers on rebalance not implemented
  • PID allocator is process-local, not persisted across broker restart
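
To make the first limitation concrete, here is a minimal Go sketch of the kind of per-producer sequence check that the stub does not perform. It is an illustration only, not the design planned under OPS-005: the names `seqTracker`, `seqKey`, and `check` are hypothetical, and the logic is deliberately simplified (no sequence wraparound, no epoch handling, no persistence).

```go
package main

import "fmt"

// seqKey identifies one producer writing to one partition (hypothetical type).
type seqKey struct {
	ProducerID int64
	Partition  int32
}

// verdict is the outcome of checking a batch's sequence numbers.
type verdict int

const (
	accept     verdict = iota // batch is the next expected one
	duplicate                 // batch was already appended (a retry)
	outOfOrder                // batch leaves a gap or arrives before sequence 0
)

// seqTracker remembers the last accepted sequence number per producer/partition.
type seqTracker struct {
	last map[seqKey]int32
}

func newSeqTracker() *seqTracker {
	return &seqTracker{last: make(map[seqKey]int32)}
}

// check classifies a batch by its base sequence and record count and, when the
// batch is accepted, advances the stored last sequence number.
func (t *seqTracker) check(k seqKey, baseSeq, count int32) verdict {
	last, ok := t.last[k]
	switch {
	case !ok && baseSeq == 0, ok && baseSeq == last+1:
		t.last[k] = baseSeq + count - 1
		return accept
	case ok && baseSeq+count-1 <= last:
		return duplicate
	default:
		return outOfOrder
	}
}

func main() {
	tr := newSeqTracker()
	k := seqKey{ProducerID: 1, Partition: 0}
	fmt.Println(tr.check(k, 0, 3) == accept)     // first batch, sequences 0..2
	fmt.Println(tr.check(k, 0, 3) == duplicate)  // the same batch retried
	fmt.Println(tr.check(k, 9, 1) == outOfOrder) // gap after sequence 2
}
```

Without a check of this kind, a retried batch is appended a second time, which is the duplicate-on-retry gap listed above.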

Remaining OPS-005 items (transactional APIs, Schema Registry NPE on verifySchemaTopic) are substantive broker-engineering work and remain open.

Test plan

  • make test passes (Go test suite)
  • helm lint deploy/helm/kafscale-broker-standalone passes (Helm Lint CI)
  • Reviewer can deploy the standalone chart on KIND and reach the broker on :9092
  • Reviewer confirms LIST_GROUPS and InitProducerID v0 negotiation with Java AdminClient

Relationship to PR #139

This branch was originally co-mingled with the security fixes on fix/s3-bucket-takeover-cve. The branches have since been separated cleanly.

The two PRs are independent and can be reviewed/merged in any order.

🤖 Generated with Claude Code

kamir and others added 3 commits May 16, 2026 11:25

feat(deploy): add kafscale-broker-standalone Helm chart

Minimal chart that deploys a single kafscale-broker Pod pointing at external
etcd + S3 (MinIO or cloud). Intended for quick smoke tests and blueprint
convergence on KIND — NOT a replacement for the full operator-based chart at
deploy/helm/kafscale/.

Used by scalytics-all-in-one bp-001 Ops Foundation smoke suite:
COMP-kafscale-01 (pod Ready) + COMP-kafscale-02 (Kafka TCP reachable).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix(broker): advertise LIST_GROUPS versions 0-5 (was 5-5)

Resolves OPS-005 item #2. The Java admin client (kafka-consumer-groups,
AdminClient.listConsumerGroups, Schema Registry) negotiates LIST_GROUPS in
range [0,4]. Advertising a narrow 5-5 window caused:

  UnsupportedVersionException: Error listing groups ...
  The broker does not support LIST_GROUPS with version in range [0,4].
  The supported range is [5,5].

The underlying h.coordinator.ListGroups handler is version-agnostic; the
encoder handles v0 just as well as v5. The fix is one line — widen the
advertised range.
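
The failure mode can be sketched as a range intersection. The Go snippet below is an illustration under assumptions, not the broker's or the Java client's actual negotiation code; `versionRange` and `negotiate` are hypothetical names. It shows why a client range of [0,4] finds no usable version against an advertised [5,5], and why widening to [0,5] resolves it.

```go
package main

import "fmt"

// versionRange is an inclusive [Min, Max] range of API versions.
type versionRange struct {
	Min, Max int16
}

// negotiate returns the highest version supported by both sides.
// ok is false when the two ranges do not overlap.
func negotiate(client, broker versionRange) (version int16, ok bool) {
	lo := client.Min
	if broker.Min > lo {
		lo = broker.Min
	}
	hi := client.Max
	if broker.Max < hi {
		hi = broker.Max
	}
	if lo > hi {
		return 0, false
	}
	return hi, true
}

func main() {
	client := versionRange{Min: 0, Max: 4} // range the Java admin client negotiates
	before := versionRange{Min: 5, Max: 5} // advertised before the fix
	after := versionRange{Min: 0, Max: 5}  // advertised after the fix

	v, ok := negotiate(client, before)
	fmt.Printf("before fix: version=%d ok=%t\n", v, ok)

	v, ok = negotiate(client, after)
	fmt.Printf("after fix:  version=%d ok=%t\n", v, ok)
}
```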

Verified: kafka-consumer-groups --bootstrap-server kafscale-broker:9092
--list now exits 0 cleanly (from UnsupportedVersionException pre-fix).

The remaining OPS-005 items (INIT_PRODUCER_ID, transactional APIs, Schema
Registry NPE on verifySchemaTopic) are substantive broker-engineering
work and are not addressed here.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(broker): INIT_PRODUCER_ID stub handler (OPS-005 #1)

Adds a minimum-viable implementation of the Kafka INIT_PRODUCER_ID API
(API key 22). Allocates a monotonically-increasing producer ID with epoch 0;
does not yet track sequence numbers or deduplicate on replay. Sufficient to
unblock Java AdminClient default producers, franz-go idempotent producers,
and Schema Registry's producer-init probe.

Changes:
  - pkg/protocol/api.go: add APIKeyInitProducerID = 22
  - cmd/broker/main.go:
      * handler gains nextProducerID int64 (atomic allocator)
      * dispatch case for *kmsg.InitProducerIDRequest returns pid + epoch=0
      * apiVersions: InitProducerID moves from unsupported to {0, 4}
      * import sync/atomic
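
The allocator described in the list above could look roughly like the following Go sketch. It is a stand-in, not the code in cmd/broker/main.go: `pidAllocator`, `initProducerIDResult`, and `allocate` are hypothetical names, and starting the counter so that the first ID is 1 is an assumption, since the commit does not state the initial value.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// pidAllocator is a hypothetical stand-in for the handler's nextProducerID field.
type pidAllocator struct {
	next int64 // last producer ID handed out; accessed only through sync/atomic
}

// initProducerIDResult holds the two values the stub is described as returning.
type initProducerIDResult struct {
	ProducerID    int64
	ProducerEpoch int16
}

// allocate returns a fresh, monotonically increasing producer ID with epoch 0.
// The counter lives in process memory only, so it resets on broker restart.
func (a *pidAllocator) allocate() initProducerIDResult {
	return initProducerIDResult{
		ProducerID:    atomic.AddInt64(&a.next, 1),
		ProducerEpoch: 0,
	}
}

func main() {
	var alloc pidAllocator
	var wg sync.WaitGroup
	ids := make([]int64, 8)

	// Concurrent callers each receive a distinct ID without extra locking.
	for i := range ids {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			ids[i] = alloc.allocate().ProducerID
		}(i)
	}
	wg.Wait()

	fmt.Println("allocated", len(ids), "producer IDs; counter is now", atomic.LoadInt64(&alloc.next))
}
```

An atomic counter keeps the handler lock-free, at the cost of the restart and fencing gaps listed under the known limitations.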

Verified:
  - `kafka-console-producer --producer-property enable.idempotence=true`
    now succeeds (was: UnsupportedVersionException).
  - kaf-mirror (franz-go) replicates primary→standby end-to-end:
    PRIMARY offsets = STANDBY offsets, measured lag <1s.
  - SCEN-bp002-06_Replication scenario test flipped SKIP → PASS.

Known limitations (production correctness gap, tracked in OPS-005):
  - no sequence-number tracking: duplicate-on-retry semantics not enforced
  - no epoch management: fencing of stale producers on rebalance not implemented
  - PID allocator is process-local, not persisted across broker restart

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>