feat: auto retry with exponential backoff for exec/read/write #75
Merged
- Added max_retries (default: 3) and retry_backoff_seconds (default: 2.0) to NodeServerConfig for configurable retry behavior
- Added _request_with_retry helper with exponential backoff (backoff * 2^attempt, capped at 30s)
- Retries on connection errors, 5xx, and 'node not connected' responses
- Updated _node_exec_impl, _node_read_impl, _node_write_impl to use retry

Signed-off-by: Blasius Patrick <blasius.patrick@gmail.com>
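The retry loop described in the commit message can be sketched roughly as follows. This is a minimal, self-contained sketch, not the code from `tools.py`: the `send` callable, the `Response` type, the function name `request_with_retry`, and detecting "node not connected" by matching the response body are all hypothetical choices made for illustration. The real `_request_with_retry` may differ in signature and error handling.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

MAX_BACKOFF_SECONDS = 30.0  # cap stated in the PR


@dataclass
class Response:
    """Hypothetical minimal response type used only for this sketch."""
    status: int
    body: str


def should_retry(resp: Response) -> bool:
    """Retry on 5xx and on 'node not connected' responses (body match is an assumption)."""
    if 500 <= resp.status < 600:
        return True
    return "node not connected" in resp.body.lower()


def backoff_delay(attempt: int, backoff: float) -> float:
    """backoff * 2^attempt, capped at 30 seconds."""
    return min(backoff * (2 ** attempt), MAX_BACKOFF_SECONDS)


def request_with_retry(
    send: Callable[[], Response],
    max_retries: int = 3,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Hypothetical stand-in for _request_with_retry: one initial try plus max_retries retries."""
    resp: Optional[Response] = None
    last_exc: Optional[ConnectionError] = None
    for attempt in range(max_retries + 1):
        try:
            resp = send()
            last_exc = None
        except ConnectionError as exc:  # also catches ConnectionRefusedError
            resp, last_exc = None, exc
        else:
            if not should_retry(resp):
                return resp  # success or a non-retryable error: return it unchanged
        if attempt < max_retries:
            sleep(backoff_delay(attempt, backoff))
    if last_exc is not None:
        raise last_exc  # the final attempt could not connect
    return resp  # the final attempt got a retryable response; surface it
```

Injecting `sleep` keeps the loop testable without real waits. With the defaults, a call that never succeeds makes four attempts in this sketch and waits 2s, 4s and 8s between them; a 4xx response is returned immediately without retrying.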
Problem: When a node disconnects briefly (network blip, restart), calls to `node_exec`/`node_read`/`node_write` fail immediately, and the agent has to retry manually.

Solution: Automatic retry with exponential backoff:

- 3 retries by default (configurable via `max_retries` in config)
- Exponential backoff: 2s, 4s, 8s... (capped at 30s)
- Retries on: connection refused, 5xx, and "node not connected" responses
- Configurable via the `HERMES_NODES_MAX_RETRIES` and `HERMES_NODES_RETRY_BACKOFF_SECONDS` env vars

Files changed:
- `config.py`: added `max_retries` and `retry_backoff_seconds` to `NodeServerConfig`
- `tools.py`: added `_request_with_retry` helper with exponential backoff; all three tools now use it
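A simplified stand-in for the config side, assuming the two env vars simply override the dataclass defaults. The `from_env` classmethod and the `retry_schedule` helper are hypothetical: the PR names the env vars and fields but does not say how `NodeServerConfig` actually reads its environment, and the real class likely has more fields than the two shown here.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional

MAX_BACKOFF_SECONDS = 30.0  # cap stated in the PR


@dataclass
class NodeServerConfig:
    """Simplified stand-in: only the two retry fields added in this PR."""
    max_retries: int = 3
    retry_backoff_seconds: float = 2.0

    @classmethod
    def from_env(cls, environ: Optional[Mapping[str, str]] = None) -> "NodeServerConfig":
        """Hypothetical loader: env vars, when set, override the defaults."""
        env = os.environ if environ is None else environ
        cfg = cls()
        if "HERMES_NODES_MAX_RETRIES" in env:
            cfg.max_retries = int(env["HERMES_NODES_MAX_RETRIES"])
        if "HERMES_NODES_RETRY_BACKOFF_SECONDS" in env:
            cfg.retry_backoff_seconds = float(env["HERMES_NODES_RETRY_BACKOFF_SECONDS"])
        return cfg


def retry_schedule(cfg: NodeServerConfig) -> List[float]:
    """Hypothetical helper: delay before each retry, backoff * 2^attempt capped at 30 s."""
    return [
        min(cfg.retry_backoff_seconds * (2 ** attempt), MAX_BACKOFF_SECONDS)
        for attempt in range(cfg.max_retries)
    ]
```

With no env vars set, `retry_schedule(NodeServerConfig.from_env({}))` gives `[2.0, 4.0, 8.0]`, matching the "2s, 4s, 8s" schedule above; setting `HERMES_NODES_MAX_RETRIES=5` gives `[2.0, 4.0, 8.0, 16.0, 30.0]`, where the cap kicks in on the last retry.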