Add gRPC message size limit handling to query execution #116
Conversation
Add size-aware query execution that tracks approximate result sizes and automatically stops fetching early when approaching the 4MB gRPC message size limit. This prevents "message too big" errors by:

- Adding `runQueryWithSizeLimit()` that estimates serialized size during query
- Stopping early when cumulative size approaches 3MB (conservative threshold)
- Returning with a next page token so subsequent calls continue from where we stopped
- Adding size estimation functions for resources, entitlements, and grants

Fixes CXH-1300

https://claude.ai/code/session_01Gg185kHrjNU2aznuRQ7CsG
Claude finished @btipling's task in 2m 15s — PR Review: gRPC message size limit handling
Overall: Good idea to add size-based early termination to prevent hitting the 4MB gRPC limit. The approach of tracking cumulative size and breaking early with a next-page token is sound. However, there are two critical pagination bugs that will cause data loss when the size limit is triggered.

Critical Issues
Minor Issues
What looks good
```go
// gRPC message size limits - we use a conservative threshold to avoid hitting the 4MB limit
// The actual limit is 4MB, but we use 3.5MB to leave room for overhead and metadata
maxResponseSizeBytes     = 3 * 1024 * 1024 // 3MB conservative limit
sizeEstimateOverhead     = 500             // Overhead per item for protobuf encoding
minPageSizeForSizeLimit  = 1               // Minimum page size when reducing due to size limits
pageSizeReductionDivisor = 2               // How much to reduce page size when hitting limits
```
Bug: Misleading comment + unused constants
- The comment says "we use 3.5MB" but the constant is `3 * 1024 * 1024` = 3MB, not 3.5MB. The PR description also says 3.5MB. One of them is wrong.
- `minPageSizeForSizeLimit` and `pageSizeReductionDivisor` are defined but never used anywhere in this PR. They suggest an incomplete "reduce page size" strategy that was never implemented. These should be removed to avoid confusion.
Suggested change:

```diff
  // gRPC message size limits - we use a conservative threshold to avoid hitting the 4MB limit
- // The actual limit is 4MB, but we use 3.5MB to leave room for overhead and metadata
+ // The actual limit is 4MB, but we use 3MB to leave room for overhead and metadata
  maxResponseSizeBytes = 3 * 1024 * 1024 // 3MB conservative limit
  sizeEstimateOverhead = 500 // Overhead per item for protobuf encoding
- minPageSizeForSizeLimit = 1 // Minimum page size when reducing due to size limits
- pageSizeReductionDivisor = 2 // How much to reduce page size when hitting limits
```
```go
	return nil, err
}

foundPaginationKey := false
rowMap := make(map[string]interface{})
for i, colName := range columns {
	rowMap[colName] = values[i]
	if pCtx != nil && pCtx.PrimaryKey == colName {
		lastRowID = values[i]
		foundPaginationKey = true
	}
}
```
Bug (Critical): Data loss on size-limit-triggered pagination with cursor strategy

When the size limit is hit (line 611), `lastRowID` has already been updated to the current (unprocessed) row's primary key here at line 595. The current row is then skipped (we break without calling the callback).

For cursor-based pagination, `nextPageToken` returns `lastRowID`, and the next query uses `WHERE id > lastRowID`. This means the unprocessed row is permanently skipped — data loss.

For offset-based pagination, `nextPageToken` computes `offset + limit`, but we processed fewer rows than `limit`. This also skips rows.

Fix: track the current row's ID separately and only update `lastRowID` after the callback succeeds.
Suggested change:

```diff
  if err := rows.Scan(scanArgs...); err != nil {
  	return nil, err
  }
  foundPaginationKey := false
+ var currentRowID any
  rowMap := make(map[string]interface{})
  for i, colName := range columns {
  	rowMap[colName] = values[i]
  	if pCtx != nil && pCtx.PrimaryKey == colName {
- 		lastRowID = values[i]
+ 		currentRowID = values[i]
  		foundPaginationKey = true
  	}
  }
```
Then, after the size check passes and the callback succeeds, add `lastRowID = currentRowID`, as sketched below. For offset-based pagination, you'll also need to track the actual number of items processed to compute the correct next offset.
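A minimal sketch of the corrected loop tail, assuming `cb` is the per-row callback that returns the item's estimated size and `result` is the accumulating query result (those names are illustrative, not from the PR):

```go
// Inside the row loop, after the size pre-check has passed:
itemSize, err := cb(rowMap) // cb converts the row and returns its estimated size
if err != nil {
	return nil, err
}
result.TotalSize += itemSize
result.ItemCount++
// Safe to advance only now: this row has been fully processed, so a
// next-page token derived from lastRowID will not skip it.
lastRowID = currentRowID
```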
```go
// Determine if we need a next page token
// We need one if: we hit the page limit, OR we hit the size limit
if pCtx != nil && (hitPageLimit || result.HitSizeLimit) {
	result.NextPageToken, err = s.nextPageToken(pCtx, lastRowID)
	if err != nil {
		return nil, err
	}
}

return result, nil
```
Bug (Critical): Offset-based pagination skips rows when size limit is hit

For the offset strategy, `nextPageToken` computes `offset + pageSize` (where `pageSize = pCtx.Limit`). But when the size limit causes early termination, the actual number of processed items is less than `pCtx.Limit`.

Example: `offset=0, limit=100`, but the size limit stops after 30 items. Next token = `0 + 100 = 100`, skipping rows 30-99.

The offset-based next page token must use the actual item count:
```go
// For offset strategy when hitting size limits, adjust the pagination context
// so the next page starts at the correct offset
if result.HitSizeLimit && pCtx.Strategy == "offset" {
	pCtx.Limit = int64(result.ItemCount)
}
```

This adjustment should be applied before calling `s.nextPageToken()`.
```go
if g.Principal != nil {
	size += int64(len(g.Principal.Id.Resource))
	size += int64(len(g.Principal.Id.ResourceType))
}
```
Nit: Potential nil pointer dereference

If `g.Principal` is non-nil but `g.Principal.Id` is nil, this will panic. The same pattern exists in `estimateResourceSize` with `r.Id`. Consider guarding:
Suggested change:

```diff
- if g.Principal != nil {
+ if g.Principal != nil && g.Principal.Id != nil {
  	size += int64(len(g.Principal.Id.Resource))
  	size += int64(len(g.Principal.Id.ResourceType))
  }
```
```go
rowSize := estimateRowSize(rowMap)

// Check if adding this item would exceed size limits
// We check BEFORE processing to avoid returning partial results
if result.TotalSize > 0 && result.TotalSize+rowSize > maxResponseSizeBytes {
	result.HitSizeLimit = true
	l.Info("stopping query early due to response size limit",
		zap.Int64("current_size", result.TotalSize),
		zap.Int64("row_size", rowSize),
		zap.Int("items_returned", result.ItemCount))
	break
```
Design concern: Pre-check uses different estimate than post-check

The size limit pre-check uses `estimateRowSize(rowMap)` (raw row data), but the actual accumulated size uses `itemSize` from the callback (which calls `estimateResourceSize`/`estimateEntitlementSize`/`estimateGrantSize`). These two estimates can diverge significantly — raw row data is just column keys/values, while the callbacks account for protobuf structure, nested objects, annotations, etc.

This means the pre-check guard could allow a row through that actually pushes the total well over the limit, or block a row that would have been fine. Consider using a consistent estimation approach, or at minimum adding a post-callback size check that sets `HitSizeLimit` if the actual accumulated size exceeds the threshold after processing.
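A sketch of such a post-callback check, using the `itemSize` the callback returned (the same value that accumulates into `result.TotalSize`), so the guard and the totals cannot diverge:

```go
result.TotalSize += itemSize
result.ItemCount++
// Post-callback guard: if the actual accumulated size crossed the
// threshold, stop here and hand back a next-page token.
if result.TotalSize > maxResponseSizeBytes {
	result.HitSizeLimit = true
	break
}
```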
- Fix misleading comment (3.5MB -> 3MB) and remove unused constants
- Fix critical data loss: track lastRowID separately from currentRowID so unprocessed rows aren't skipped on cursor-based pagination
- Fix offset pagination: adjust limit to actual item count when size limit is hit, so next offset is computed correctly
- Add post-callback size check for consistent size enforcement
- Guard against nil pointer dereference in estimateGrantSize
- Fix godot lint issue (comment period)

https://claude.ai/code/session_01Gg185kHrjNU2aznuRQ7CsG
Claude finished @btipling's task in 2m 24s — PR Review: gRPC message size limit handling (post-fix)
Overall: The fix commit addresses all five findings from the first review.

Critical bugs — verified fixed

1. Cursor-based pagination data loss — ✅ Fixed. `lastRowID` is now tracked separately from `currentRowID`, so unprocessed rows are no longer skipped.
2. Offset-based pagination data loss — ✅ Fixed. When the size limit is hit, the limit is adjusted to the actual item count so the next offset is computed correctly.

Minor issues — verified fixed

3. Misleading comment + unused constants — ✅ Comment now accurately says "3MB", and the unused constants were removed.
4. Nil pointer guard — ✅ Fixed in `estimateGrantSize`.
5. Inconsistent size estimation — ✅ Addressed by adding a post-callback size check.

Remaining observations (non-blocking)

- Consider adding targeted tests for the size-limit path.
Verdict: The pagination fixes are correct. This is ready to merge pending the author's judgment on whether to add targeted tests for the size-limit path.
Description
This PR adds gRPC message size limit handling to prevent responses from exceeding the 4MB gRPC message size limit. The changes introduce size-aware query execution that tracks response sizes and terminates queries early when approaching the limit.
Changes
Core Changes:

- `runQueryWithSizeLimit()` method that executes queries with size-based early termination
- `queryResult` struct to track pagination tokens, total size, item count, and whether size limits were hit
- `estimateRowSize()` function to estimate serialized size of database rows

Size Estimation Functions:

- `estimateResourceSize()` - estimates serialized size of Resource objects
- `estimateEntitlementSize()` - estimates serialized size of Entitlement objects
- `estimateGrantSize()` - estimates serialized size of Grant objects

Updated Query Execution:

- `List()` in resources.go to use size-aware query execution
- `dynamicEntitlements()` in entitlements.go to use size-aware query execution
- `listGrants()` in grants.go to use size-aware query execution
The new
runQueryWithSizeLimit()method:This ensures that responses stay well below the 4MB gRPC limit while maintaining pagination functionality.
Test Plan
Existing tests pass. The size limit handling is transparent to callers - pagination behavior remains unchanged, with the addition of early termination when size limits are approached.
https://claude.ai/code/session_01Gg185kHrjNU2aznuRQ7CsG