
Add gRPC message size limit handling to query execution #116

Open
btipling wants to merge 2 commits into main from claude/slack-session-MvKQ6

Conversation

btipling (Contributor) commented Apr 8, 2026

Description

  • New feature

This PR adds size limit handling to query execution, preventing responses from exceeding gRPC's 4MB message size limit. The changes introduce size-aware query execution that tracks cumulative response size and terminates queries early when approaching the limit.

Changes

Core Changes:

  • Added runQueryWithSizeLimit() method that executes queries with size-based early termination
  • Introduced queryResult struct to track pagination tokens, total size, item count, and whether size limits were hit
  • Added estimateRowSize() function to estimate serialized size of database rows
  • Defined constants for gRPC limits (3.5MB conservative threshold, size estimate overhead, etc.)

Size Estimation Functions:

  • estimateResourceSize() - estimates serialized size of Resource objects
  • estimateEntitlementSize() - estimates serialized size of Entitlement objects
  • estimateGrantSize() - estimates serialized size of Grant objects
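
These estimators are plain field-length sums plus a fixed per-item overhead. Below is a minimal, self-contained sketch of the shape of one such estimator; the field names mirror the snippets quoted in the review threads further down, but the type definitions here are illustrative stand-ins for the connector SDK types, not the PR's actual code.

package bsql

// Illustrative stand-in types. The real Grant/Resource types come from the
// connector SDK and carry many more fields.
type ResourceId struct {
    ResourceType string
    Resource     string
}

type Resource struct {
    Id *ResourceId
}

type Grant struct {
    Principal *Resource
}

const sizeEstimateOverhead = 500 // fixed per-item allowance for protobuf framing

// estimateGrantSize sums the lengths of the string fields expected to be
// serialized, then adds the fixed protobuf-encoding overhead.
func estimateGrantSize(g *Grant) int64 {
    if g == nil {
        return 0
    }
    var size int64
    if g.Principal != nil && g.Principal.Id != nil {
        size += int64(len(g.Principal.Id.Resource))
        size += int64(len(g.Principal.Id.ResourceType))
    }
    return size + sizeEstimateOverhead
}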

Updated Query Execution:

  • Modified List() in resources.go to use size-aware query execution
  • Modified dynamicEntitlements() in entitlements.go to use size-aware query execution
  • Modified listGrants() in grants.go to use size-aware query execution
  • All callbacks now return item size information for accurate tracking
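
A plausible shape for that callback contract, sketched here as an assumption (the actual signature in the PR may differ):

// Each row callback converts a scanned row into its output object and reports
// an estimated serialized size, letting the query loop track cumulative size.
type rowCallback func(rowMap map[string]interface{}) (itemSize int64, err error)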

How It Works

The new runQueryWithSizeLimit() method:

  1. Executes queries with standard pagination limits
  2. Estimates the serialized size of each row before processing
  3. Stops early if adding the next item would exceed the 3.5MB threshold
  4. Generates a next page token when either pagination or size limits are hit
  5. Allows callbacks to provide actual item sizes for more accurate tracking

This ensures that responses stay well below the 4MB gRPC limit while maintaining pagination functionality.
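
Condensed into code, the loop looks roughly like the self-contained sketch below. The names queryResult, estimateRowSize, and maxResponseSizeBytes follow this PR; the scanRow helper, the callback signature, and the trivial row estimator are assumptions made so the sketch compiles on its own.

package bsql

import "database/sql"

type queryResult struct {
    NextPageToken string
    TotalSize     int64
    ItemCount     int
    HitSizeLimit  bool
}

const maxResponseSizeBytes = 3 * 1024 * 1024

// estimateRowSize: trivial stand-in that sums the byte lengths of string and
// []byte column values in a scanned row.
func estimateRowSize(rowMap map[string]interface{}) int64 {
    var size int64
    for _, v := range rowMap {
        switch t := v.(type) {
        case string:
            size += int64(len(t))
        case []byte:
            size += int64(len(t))
        }
    }
    return size
}

// runQueryWithSizeLimitSketch shows the early-termination control flow only;
// pagination token generation and cursor bookkeeping are elided.
func runQueryWithSizeLimitSketch(
    rows *sql.Rows,
    scanRow func(*sql.Rows) (map[string]interface{}, error),
    callback func(map[string]interface{}) (int64, error),
) (*queryResult, error) {
    result := &queryResult{}
    for rows.Next() {
        rowMap, err := scanRow(rows)
        if err != nil {
            return nil, err
        }
        rowSize := estimateRowSize(rowMap)
        // Stop BEFORE processing if this row would push the response past the
        // threshold. The TotalSize > 0 guard guarantees every page returns at
        // least one item, avoiding infinite pagination loops.
        if result.TotalSize > 0 && result.TotalSize+rowSize > maxResponseSizeBytes {
            result.HitSizeLimit = true
            break
        }
        itemSize, err := callback(rowMap)
        if err != nil {
            return nil, err
        }
        if itemSize == 0 {
            itemSize = rowSize // fall back to the raw-row estimate
        }
        result.TotalSize += itemSize
        result.ItemCount++
    }
    return result, rows.Err()
}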

Test Plan

Existing tests pass. The size limit handling is transparent to callers: pagination behavior remains unchanged, except that queries now terminate early when the size limit is approached.

https://claude.ai/code/session_01Gg185kHrjNU2aznuRQ7CsG

Add size-aware query execution that tracks approximate result sizes and
automatically stops fetching early when approaching the 4MB gRPC message
size limit. This prevents "message too big" errors by:

- Adding runQueryWithSizeLimit() that estimates serialized size during query
- Stopping early when cumulative size approaches 3MB (conservative threshold)
- Returning with a next page token so subsequent calls continue from where we stopped
- Adding size estimation functions for resources, entitlements, and grants

Fixes CXH-1300

https://claude.ai/code/session_01Gg185kHrjNU2aznuRQ7CsG
@btipling btipling requested a review from a team April 8, 2026 15:17

github-actions Bot commented Apr 8, 2026

Claude finished @btipling's task in 2m 15s


PR Review: gRPC message size limit handling

  • Gather context and read changed files
  • Analyze code changes for bugs, security, and best practices
  • Post review feedback

Overall: Good idea to add size-based early termination to prevent hitting the 4MB gRPC limit. The approach of tracking cumulative size and breaking early with a next-page token is sound. However, there are two critical pagination bugs that will cause data loss when the size limit is triggered.

Critical Issues

  1. Cursor-based pagination skips rows on size limit (query.go:588-600) — When the size limit is hit, lastRowID has already been updated to the current (unprocessed) row's primary key. The next page query uses WHERE id > lastRowID, permanently skipping that row. Fix: track lastProcessedRowID separately, only updating it after the callback succeeds.

  2. Offset-based pagination skips rows on size limit (query.go:641-650) — nextPageToken computes offset + limit, but when the size limit stops processing early (e.g., 30 of 100 rows), the next page skips ahead by the full limit, losing rows 30-99. Fix: use actual processed item count for the offset calculation.

Minor Issues

  1. Misleading comment (query.go:30-35) — Comment says "3.5MB" but constant is 3MB. PR description also says 3.5MB. Two unused constants (minPageSizeForSizeLimit, pageSizeReductionDivisor) suggest an incomplete page-size reduction strategy.

  2. Inconsistent size estimation (query.go:606-616) — Pre-check uses estimateRowSize(rowMap) (raw row data) but accumulated size uses callback's itemSize (structured object estimate). These can diverge, making the guard unreliable.

  3. Potential nil pointer (grants.go:119-122) — g.Principal.Id could be nil even when g.Principal is not. Same pattern in estimateResourceSize with r.Id.

What looks good

  • The queryResult struct cleanly encapsulates size tracking state
  • The result.TotalSize > 0 guard ensures at least one item is always returned (avoids infinite pagination loops)
  • Preserving the original runQuery avoids breaking provisioning code that doesn't need size limits
  • Size estimation functions are straightforward and easy to tune

Fix the pagination bugs.

Comment thread on pkg/bsql/query.go (Outdated), lines +30 to +35
// gRPC message size limits - we use a conservative threshold to avoid hitting the 4MB limit
// The actual limit is 4MB, but we use 3.5MB to leave room for overhead and metadata
maxResponseSizeBytes = 3 * 1024 * 1024 // 3MB conservative limit
sizeEstimateOverhead = 500 // Overhead per item for protobuf encoding
minPageSizeForSizeLimit = 1 // Minimum page size when reducing due to size limits
pageSizeReductionDivisor = 2 // How much to reduce page size when hitting limits

Bug: Misleading comment + unused constants

  1. The comment says "we use 3.5MB" but the constant is 3 * 1024 * 1024 = 3MB, not 3.5MB. The PR description also says 3.5MB. One of them is wrong.

  2. minPageSizeForSizeLimit and pageSizeReductionDivisor are defined but never used anywhere in this PR. They suggest an incomplete "reduce page size" strategy that was never implemented. These should be removed to avoid confusion.

Suggested change:

- // gRPC message size limits - we use a conservative threshold to avoid hitting the 4MB limit
- // The actual limit is 4MB, but we use 3.5MB to leave room for overhead and metadata
- maxResponseSizeBytes = 3 * 1024 * 1024 // 3MB conservative limit
- sizeEstimateOverhead = 500 // Overhead per item for protobuf encoding
- minPageSizeForSizeLimit = 1 // Minimum page size when reducing due to size limits
- pageSizeReductionDivisor = 2 // How much to reduce page size when hitting limits
+ // gRPC message size limits - we use a conservative threshold to avoid hitting the 4MB limit
+ // The actual limit is 4MB, but we use 3MB to leave room for overhead and metadata
+ maxResponseSizeBytes = 3 * 1024 * 1024 // 3MB conservative limit
+ sizeEstimateOverhead = 500 // Overhead per item for protobuf encoding

Comment thread on pkg/bsql/query.go, lines +588 to +600
return nil, err
}

foundPaginationKey := false
rowMap := make(map[string]interface{})
for i, colName := range columns {
rowMap[colName] = values[i]
if pCtx != nil && pCtx.PrimaryKey == colName {
lastRowID = values[i]
foundPaginationKey = true
}
}


Bug (Critical): Data loss on size-limit-triggered pagination with cursor strategy

When the size limit is hit (line 611), lastRowID has already been updated to the current (unprocessed) row's primary key here at line 595. The current row is then skipped (we break without calling the callback).

For cursor-based pagination, nextPageToken returns lastRowID, and the next query uses WHERE id > lastRowID. This means the unprocessed row is permanently skipped — data loss.

For offset-based pagination, nextPageToken computes offset + limit, but we processed fewer rows than limit. This also skips rows.

Fix: track lastProcessedRowID separately and only update it after the callback succeeds.

Suggested change:

- return nil, err
- }
- foundPaginationKey := false
- rowMap := make(map[string]interface{})
- for i, colName := range columns {
- rowMap[colName] = values[i]
- if pCtx != nil && pCtx.PrimaryKey == colName {
- lastRowID = values[i]
- foundPaginationKey = true
- }
- }
+ if err := rows.Scan(scanArgs...); err != nil {
+ return nil, err
+ }
+ foundPaginationKey := false
+ var currentRowID any
+ rowMap := make(map[string]interface{})
+ for i, colName := range columns {
+ rowMap[colName] = values[i]
+ if pCtx != nil && pCtx.PrimaryKey == colName {
+ currentRowID = values[i]
+ foundPaginationKey = true
+ }
+ }

Then after the size check passes and the callback succeeds, add lastRowID = currentRowID. And for offset-based pagination, you'll need to track actual items processed to compute the correct next offset.
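
In code, the corrected bookkeeping would look roughly like this fragment (a sketch reusing the loop variables above, not the actual diff):

// Advance the cursor only once the row is actually part of the response.
itemSize, err := callback(rowMap)
if err != nil {
    return nil, err
}
lastRowID = currentRowID // safe to advance: this row will be returned
result.ItemCount++       // the offset strategy derives its next offset from this count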

Comment thread on pkg/bsql/query.go (Outdated), lines +641 to +650
// Determine if we need a next page token
// We need one if: we hit the page limit, OR we hit the size limit
if pCtx != nil && (hitPageLimit || result.HitSizeLimit) {
result.NextPageToken, err = s.nextPageToken(pCtx, lastRowID)
if err != nil {
return nil, err
}
}

return result, nil

Bug (Critical): Offset-based pagination skips rows when size limit is hit

For offset strategy, nextPageToken computes offset + pageSize (where pageSize = pCtx.Limit). But when the size limit causes early termination, the actual number of processed items is less than pCtx.Limit.

Example: offset=0, limit=100, but size limit stops after 30 items. Next token = 0 + 100 = 100, skipping rows 30-99.

The offset-based next page token must use the actual item count:

// For offset strategy when hitting size limits, adjust the pagination context
// so the next page starts at the correct offset
if result.HitSizeLimit && pCtx.Strategy == "offset" {
    pCtx.Limit = int64(result.ItemCount)
}

This adjustment should be applied before calling s.nextPageToken().

Comment thread on pkg/bsql/grants.go (Outdated), lines +119 to +122
if g.Principal != nil {
size += int64(len(g.Principal.Id.Resource))
size += int64(len(g.Principal.Id.ResourceType))
}

Nit: Potential nil pointer dereference

If g.Principal is non-nil but g.Principal.Id is nil, this will panic. Same pattern exists in estimateResourceSize with r.Id. Consider guarding:

Suggested change:

- if g.Principal != nil {
+ if g.Principal != nil && g.Principal.Id != nil {
  size += int64(len(g.Principal.Id.Resource))
  size += int64(len(g.Principal.Id.ResourceType))
  }

Comment thread on pkg/bsql/query.go (Outdated), lines +606 to +616
rowSize := estimateRowSize(rowMap)

// Check if adding this item would exceed size limits
// We check BEFORE processing to avoid returning partial results
if result.TotalSize > 0 && result.TotalSize+rowSize > maxResponseSizeBytes {
result.HitSizeLimit = true
l.Info("stopping query early due to response size limit",
zap.Int64("current_size", result.TotalSize),
zap.Int64("row_size", rowSize),
zap.Int("items_returned", result.ItemCount))
break

Design concern: Pre-check uses different estimate than post-check

The size limit pre-check uses estimateRowSize(rowMap) (raw row data), but the actual accumulated size uses itemSize from the callback (which calls estimateResourceSize/estimateEntitlementSize/estimateGrantSize). These two estimates can diverge significantly — raw row data is just column key/values, while the callbacks account for protobuf structure, nested objects, annotations, etc.

This means the pre-check guard could allow a row through that actually pushes the total well over the limit, or block a row that would have been fine. Consider using a consistent estimation approach, or at minimum adding a post-callback size check that sets HitSizeLimit if the actual accumulated size exceeds the threshold after processing.
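
A minimal sketch of such a post-callback check, reusing the loop variables from the quoted snippet (illustrative, not the actual diff):

// After accumulating the callback's itemSize, re-check the running total so a
// divergent pre-check estimate cannot let the response grow past the cap.
result.TotalSize += itemSize
result.ItemCount++
if result.TotalSize > maxResponseSizeBytes {
    result.HitSizeLimit = true
    break
}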

- Fix misleading comment (3.5MB -> 3MB) and remove unused constants
- Fix critical data loss: track lastRowID separately from currentRowID
  so unprocessed rows aren't skipped on cursor-based pagination
- Fix offset pagination: adjust limit to actual item count when size
  limit is hit, so next offset is computed correctly
- Add post-callback size check for consistent size enforcement
- Guard against nil pointer dereference in estimateGrantSize
- Fix godot lint issue (comment period)

https://claude.ai/code/session_01Gg185kHrjNU2aznuRQ7CsG

github-actions Bot commented Apr 8, 2026

Claude finished @btipling's task in 2m 24s


PR Review: gRPC message size limit handling (post-fix)

  • Gather context and read changed files
  • Verify critical pagination bugs were fixed
  • Analyze code changes for remaining issues
  • Post review feedback

Overall: The fix commit (fc6c302) correctly addresses all critical and minor issues from the previous review. The code is in good shape.

Critical bugs — verified fixed

1. Cursor-based pagination data loss — ✅ Fixed. currentRowID is now tracked separately (query.go:590), and lastRowID is only updated after the callback succeeds (query.go:630). When the size limit breaks early, lastRowID still points to the last processed row, so the unprocessed row is correctly included in the next page.

2. Offset-based pagination data loss — ✅ Fixed. When HitSizeLimit is true with offset strategy, pCtx.Limit is adjusted to result.ItemCount before computing the next token (query.go:656-658). This ensures the next page offset advances by the actual number of processed items, not the full page size.

Minor issues — verified fixed

3. Misleading comment + unused constants — ✅ Comment now accurately says "3MB" (query.go:31). Unused constants minPageSizeForSizeLimit and pageSizeReductionDivisor removed.

4. Nil pointer guard — ✅ grants.go:119 now checks g.Principal != nil && g.Principal.Id != nil.

5. Inconsistent size estimation — ✅ Addressed by adding a post-callback size check (query.go:634-640). Even if the pre-check (using raw row estimate) allows a row through, the post-callback check catches it if the actual accumulated size exceeds the threshold.

Remaining observations (non-blocking)

  1. Double estimateRowSize call (query.go:607,624) — When itemSize from the callback is 0, estimateRowSize(rowMap) is called twice for the same row (once for the pre-check, once for accumulation). Could cache the result in a local variable. Trivial inefficiency, not a bug.

  2. First item always processed regardless of size — The pre-check guard result.TotalSize > 0 (query.go:607) ensures at least one item is always returned. This is the correct design — it prevents infinite pagination loops on oversized items. A single item exceeding 3MB would still land under the 4MB hard limit given the 1MB headroom.

  3. No tests for the new size-limit path — The PR description says "existing tests pass," but there are no dedicated tests for the early-termination behavior. Consider adding a test that verifies: (a) when a batch approaches 3MB, a next page token is generated; (b) cursor-based pagination resumes at the correct row; (c) offset-based pagination resumes at the correct offset.

Verdict: The pagination fixes are correct. This is ready to merge pending the author's judgment on whether to add targeted tests for the size-limit path.
