fix: stop legacy RedisBackend corrupting binary payloads#173
Conversation
The shared Redis connection pool was created with decode_responses=True, so
redis-py ran value.decode('utf-8', 'strict') on every GET. Cached payloads are
binary (Rust ByteStorage LZ4, Arrow IPC, AES-256-GCM ciphertext) and are not
valid UTF-8, so every read through the legacy RedisBackend raised
UnicodeDecodeError. The backend's str->bytes "safety" branch was unreachable and
the comment claiming Redis "returns bytes if decode fails" was false.
- client.py: both the sync and async pools now use decode_responses=False.
- backend.py: get() returns the raw bytes (or None) behind a bytes|None
narrowing guard; the misleading str-coercion branch is removed.
- tests: add a real-Redis binary round-trip regression (reproduced the
UnicodeDecodeError before the fix); repurpose test_get_handles_string_response
into test_get_returns_raw_bytes_unchanged to assert the new pass-through
contract (raw bytes, including non-UTF-8, returned unchanged).
Fixes #154
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughThis PR fixes a critical bug in the legacy ChangesRedis binary payload support
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/cachekit/backends/redis/backend.py (1)
110-120:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDo not collapse unexpected payload types into cache misses.
Noneshould mean “key not found” only. ReturningNonefor non-bytesvalues hides a broken client contract and can silently drop valid data paths.Suggested fix
try: client = self._get_client() value = client.get(key) - # The pool is configured with decode_responses=False (see redis/client.py), - # so Redis returns raw bytes, or None for a missing key. Cached payloads are - # binary (LZ4/Arrow/AES ciphertext) and must never be UTF-8 decoded. The - # isinstance check enforces the bytes|None contract without any str coercion. - return value if isinstance(value, bytes) else None + # The pool is configured with decode_responses=False (see redis/client.py), + # so Redis returns raw bytes, or None for a missing key. + if value is None: + return None + if isinstance(value, bytes): + return value + raise BackendError( + message=f"Redis GET returned unexpected type: {type(value).__name__}", + operation="get", + key=key, + ) + except BackendError: + raise except Exception as e: raise BackendError( message=f"Redis GET failed: {e}", operation="get", key=key,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cachekit/backends/redis/backend.py` around lines 110 - 120, The current get implementation in src/cachekit/backends/redis/backend.py collapses non-bytes Redis responses into None; change the logic so None still means "key not found" (return None), bytes are returned unchanged, but any other unexpected type raises a BackendError (use the same BackendError class, include operation="get" and key in the error and a clear message mentioning the unexpected type/value) instead of silently returning None; update the exception message to surface the type/value to aid debugging.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/cachekit/backends/redis/backend.py`:
- Around line 110-120: The current get implementation in
src/cachekit/backends/redis/backend.py collapses non-bytes Redis responses into
None; change the logic so None still means "key not found" (return None), bytes
are returned unchanged, but any other unexpected type raises a BackendError (use
the same BackendError class, include operation="get" and key in the error and a
clear message mentioning the unexpected type/value) instead of silently
returning None; update the exception message to surface the type/value to aid
debugging.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: a3dc2103-27ce-41a8-a7b5-5e8b863d45b7
📒 Files selected for processing (3)
src/cachekit/backends/redis/backend.pysrc/cachekit/backends/redis/client.pytests/integration/test_redis_backend.py
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
CI only runs tests/integration/ on push-to-main, so the integration regression added in the previous commit does not cover the fix on pull requests (codecov patch coverage saw the changed lines as untested). Add the missing tests/unit/backends/test_redis_backend.py (mocked, runs on PRs) covering: - both the sync and async pools are created with decode_responses=False - RedisBackend.get() returns raw non-UTF-8 bytes unchanged, None for a missing key, and None (not a coerced str) for any non-bytes response.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/unit/backends/test_redis_backend.py`:
- Around line 77-100: The tests patch REDIS_URL but RedisBackend.__init__ uses
RedisBackendConfig.from_env() which resolves via
AliasChoices("CACHEKIT_REDIS_URL", "REDIS_URL"); to be consistent with the rest
of the test file update the decorators in
test_get_returns_non_utf8_bytes_unchanged,
test_get_returns_none_for_missing_key, and
test_get_returns_none_for_non_bytes_response to set "CACHEKIT_REDIS_URL" (or set
both "CACHEKIT_REDIS_URL" and "REDIS_URL") so the config resolution is
deterministic and matches other tests; ensure you keep clear=True if you want
isolated env and leave the DIContainer patching as-is around RedisBackend()
creation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 91e782be-9ff2-4237-ac18-321b0b202fbc
📒 Files selected for processing (1)
tests/unit/backends/test_redis_backend.py
Addresses review feedback on #173: the get() contract tests patched REDIS_URL while RedisBackend resolves via AliasChoices("CACHEKIT_REDIS_URL", "REDIS_URL"). Switch to explicit client_provider injection (the existing test_explicit_client_provider_injection pattern), dropping the DIContainer and env patching entirely so the tests are independent of env-var alias resolution.
Summary
Fixes #154. The legacy
RedisBackend's shared connection pool was created withdecode_responses=True, so redis-py ranvalue.decode('utf-8', 'strict')on every GET. Cached payloads are binary (Rust ByteStorage LZ4, Arrow IPC, AES-256-GCM ciphertext) and are not valid UTF-8 → every read raisedUnicodeDecodeError.The production
RedisBackendProviderpath was unaffected (it already usesdecode_responses=False); this bug bit the explicit/legacyRedisBackendreachable viaset_default_backend()and the Tier-3 auto-create inconfig/decorator.py.Changes
redis/client.py: both the sync and async pools now usedecode_responses=False.redis/backend.py:get()returns the raw bytes (orNone) behind abytes|Nonenarrowing guard. The unreachable str→bytes coercion is removed, along with the false comment claiming Redis "returns bytes if decode fails."UnicodeDecodeErrorbefore the fix);test_get_handles_string_responseis repurposed intotest_get_returns_raw_bytes_unchangedto assert the new pass-through contract (raw bytes, including non-UTF-8, returned unchanged).Verification
UnicodeDecodeErrorbefore the fix and passes after.ruffclean,basedpyright0 errors.Summary by CodeRabbit
Bug Fixes
Tests