fix(api): retry on 429 rate-limit with exponential backoff (QUA-645)#106

Merged
Desperado merged 1 commit into main from strazhnyk/qua-645-api-client-retry-on-429-rate-limit-with-exponential-backoff
May 24, 2026

@Desperado
Contributor

Summary

Free-tier Anthropic keys hit per-minute token limits on the very first call because qmax-code's system prompt + tool catalog is large. The 429 was surfaced raw with no retry — blocking any trial user on a free key.

  • Adds doWithRetry() in internal/agent/retry.go
  • Retries up to 3 times on HTTP 429: 1 s → 2 s → 4 s backoff
  • Parses Retry-After header to use the server-suggested delay when present
  • Respects context cancellation during sleep (Ctrl+C exits immediately instead of hanging for up to 4 s)
  • Prints ● rate limited — retrying in 2s (1/3)… via the terminal status line
  • Wired into both callStreamingAPI() and callAPI(); the request body is rebuilt from the already-marshaled []byte slice on each attempt, so a retry never sends an already-drained body (no double-read issue)
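
The delay selection described in the bullets above could look roughly like the following. This is a minimal sketch, not the code in internal/agent/retry.go: the names `backoffDelay` and `retryDelay` are hypothetical, and it assumes Retry-After arrives in its delta-seconds form (the HTTP-date form is ignored here and falls back to the backoff schedule).

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// maxRetries mirrors the "up to 3 times" limit from the PR description.
const maxRetries = 3

// backoffDelay returns the wait before retry number attempt (1-based):
// 1s, 2s, 4s. Hypothetical helper, not the PR's actual function.
func backoffDelay(attempt int) time.Duration {
	return time.Second << (attempt - 1)
}

// retryDelay prefers the server-suggested Retry-After value (in seconds)
// and falls back to exponential backoff when the header is missing,
// non-numeric, or not positive.
func retryDelay(retryAfter string, attempt int) time.Duration {
	if secs, err := strconv.Atoi(retryAfter); err == nil && secs > 0 {
		return time.Duration(secs) * time.Second
	}
	return backoffDelay(attempt)
}

func main() {
	for attempt := 1; attempt <= maxRetries; attempt++ {
		fmt.Printf("rate limited, retrying in %s (%d/%d)\n",
			backoffDelay(attempt), attempt, maxRetries)
	}
	// With a header value, the server's suggestion wins over the schedule.
	fmt.Println(retryDelay("7", 1))
}
```

At a call site the header value would typically come from `resp.Header.Get("Retry-After")`, which returns an empty string when the header is absent.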

Closes QUA-645

Test plan

  • With a free-tier key, trigger a large-context call — should see the retry status line and succeed on retry rather than dying immediately
  • Verify Ctrl+C during a retry sleep exits cleanly rather than waiting out the delay
  • Verify a non-429 error (e.g. 401, 500) is still surfaced immediately without retry
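
The Ctrl+C check in the test plan can be approximated without a terminal. The sketch below assumes the retry sleep is a select on a timer and `ctx.Done()`; `sleepCtx` and `cancelledSleepReturnsEarly` are hypothetical names, and the cancel call stands in for the signal that Ctrl+C would deliver (for example through a context created with `signal.NotifyContext`).

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// sleepCtx waits for d, but returns ctx.Err() as soon as ctx is cancelled.
// Hypothetical helper illustrating a cancellation-aware sleep.
func sleepCtx(ctx context.Context, d time.Duration) error {
	timer := time.NewTimer(d)
	defer timer.Stop()
	select {
	case <-timer.C:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// cancelledSleepReturnsEarly cancels the context 20 ms into a 4 s sleep
// and reports whether the sleep ended with an error in under one second.
func cancelledSleepReturnsEarly() bool {
	ctx, cancel := context.WithCancel(context.Background())
	go func() {
		time.Sleep(20 * time.Millisecond)
		cancel() // stands in for Ctrl+C
	}()
	start := time.Now()
	err := sleepCtx(ctx, 4*time.Second)
	return err != nil && time.Since(start) < time.Second
}

func main() {
	fmt.Println("cancelled sleep returned early:", cancelledSleepReturnsEarly())
}
```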

🤖 Generated with Claude Code

Free-tier Anthropic keys hit per-minute token limits on the first call
when the system prompt + tool catalog is large. Previously the 429 was
surfaced raw with no retry, blocking any trial user on a free key.

Adds doWithRetry() in internal/agent/retry.go:
- Retries up to 3 times on HTTP 429 (1 s → 2 s → 4 s backoff)
- Parses Retry-After header when present to use the server-suggested delay
- Respects context cancellation during sleep (Ctrl+C exits immediately)
- Prints "rate limited — retrying in Xs (N/3)…" via term or stdout

Wired into callStreamingAPI() (with terminal + context) and callAPI()
(background context, no terminal). Both rebuild the request body from the
already-marshaled []byte slice, so no double-read issue.
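
To illustrate why rebuilding the body from the []byte slice matters, here is a rough sketch of a retry loop run against a local test server. `postWithRetry` and `demo` are hypothetical stand-ins rather than the PR's doWithRetry(), the base delay is shortened to milliseconds for the demo, and the Retry-After handling and status line are left out to keep it short.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// postWithRetry POSTs body and retries on HTTP 429 up to maxRetries times,
// doubling the delay after each attempt. Hypothetical sketch.
func postWithRetry(ctx context.Context, url string, body []byte, maxRetries int, base time.Duration) (*http.Response, error) {
	for attempt := 0; ; attempt++ {
		// A fresh reader per attempt: the previous one was drained by the last send.
		req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
		if err != nil {
			return nil, err
		}
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			return nil, err
		}
		if resp.StatusCode != http.StatusTooManyRequests || attempt == maxRetries {
			return resp, nil // success, a non-429 status, or retries exhausted
		}
		resp.Body.Close()
		select {
		case <-time.After(base << attempt):
		case <-ctx.Done():
			return nil, ctx.Err()
		}
	}
}

// demo starts a server that answers 429 twice and then echoes the request
// body, so a successful echo shows the third attempt carried the full body.
func demo() (status int, echoed string, calls int) {
	var n atomic.Int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if n.Add(1) <= 2 {
			w.WriteHeader(http.StatusTooManyRequests)
			return
		}
		io.Copy(w, r.Body)
	}))
	defer srv.Close()

	resp, err := postWithRetry(context.Background(), srv.URL, []byte(`{"q":1}`), 3, time.Millisecond)
	if err != nil {
		return 0, err.Error(), int(n.Load())
	}
	defer resp.Body.Close()
	data, _ := io.ReadAll(resp.Body)
	return resp.StatusCode, string(data), int(n.Load())
}

func main() {
	status, echoed, calls := demo()
	fmt.Println(status, echoed, calls)
}
```

Had the loop reused a single reader across attempts, the second and third sends would have carried an empty body, which is the double-read problem the commit message refers to.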

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Desperado Desperado merged commit b6c3cfe into main May 24, 2026
6 checks passed
@Desperado Desperado deleted the strazhnyk/qua-645-api-client-retry-on-429-rate-limit-with-exponential-backoff branch May 24, 2026 11:08
@qualitymaxapp

qualitymaxapp Bot commented May 24, 2026

✅ QualityMax Pipeline

| Gate | Result |
| --- | --- |
| 🔍 AI Review | ✅ Clean |
| 🧪 Repo Tests | ✅ 298/298 passed (go) |
| 🤖 AI Tests | ✅ 28/32 passed |

Powered by QualityMax — AI-Powered Test Automation
