Skip to content

fix: stop legacy RedisBackend corrupting binary payloads#173

Merged
27Bslash6 merged 3 commits into
mainfrom
fix/redis-binary-corruption
Jun 7, 2026
Merged

fix: stop legacy RedisBackend corrupting binary payloads#173
27Bslash6 merged 3 commits into
mainfrom
fix/redis-binary-corruption

Conversation

@27Bslash6

@27Bslash6 27Bslash6 commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes #154. The legacy RedisBackend's shared connection pool was created with decode_responses=True, so redis-py ran value.decode('utf-8', 'strict') on every GET. Cached payloads are binary (Rust ByteStorage LZ4, Arrow IPC, AES-256-GCM ciphertext) and are not valid UTF-8 → every read raised UnicodeDecodeError.

The production RedisBackendProvider path was unaffected (it already uses decode_responses=False); this bug bit the explicit/legacy RedisBackend reachable via set_default_backend() and the Tier-3 auto-create in config/decorator.py.

Changes

  • redis/client.py: both the sync and async pools now use decode_responses=False.
  • redis/backend.py: get() returns the raw bytes (or None) behind a bytes|None narrowing guard. The unreachable str→bytes coercion is removed, along with the false comment claiming Redis "returns bytes if decode fails."
  • Tests: a new real-Redis binary round-trip regression (it reproduced the UnicodeDecodeError before the fix); test_get_handles_string_response is repurposed into test_get_returns_raw_bytes_unchanged to assert the new pass-through contract (raw bytes, including non-UTF-8, returned unchanged).

Verification

  • The new regression test reproduced the UnicodeDecodeError before the fix and passes after.
  • Full Redis suite (backend + invalidation + integration): 53 passed.
  • ruff clean, basedpyright 0 errors.
  • Confirmed the invalidation channel uses its own injected client (not the shared pool), so pub/sub is unaffected by the flag change.

Summary by CodeRabbit

  • Bug Fixes

    • Redis integration now preserves raw binary cache data exactly and returns raw bytes (or None) rather than attempting string conversion, improving reliability for non‑UTF‑8 payloads.
  • Tests

    • Added and updated unit and integration tests to verify binary round‑trips, correct handling of missing keys, and safe handling of unexpected non‑bytes responses.

The shared Redis connection pool was created with decode_responses=True, so
redis-py ran value.decode('utf-8', 'strict') on every GET. Cached payloads are
binary (Rust ByteStorage LZ4, Arrow IPC, AES-256-GCM ciphertext) and are not
valid UTF-8, so every read through the legacy RedisBackend raised
UnicodeDecodeError. The backend's str->bytes "safety" branch was unreachable and
the comment claiming Redis "returns bytes if decode fails" was false.

- client.py: both the sync and async pools now use decode_responses=False.
- backend.py: get() returns the raw bytes (or None) behind a bytes|None
  narrowing guard; the misleading str-coercion branch is removed.
- tests: add a real-Redis binary round-trip regression (reproduced the
  UnicodeDecodeError before the fix); repurpose test_get_handles_string_response
  into test_get_returns_raw_bytes_unchanged to assert the new pass-through
  contract (raw bytes, including non-UTF-8, returned unchanged).

Fixes #154
@coderabbitai

coderabbitai Bot commented Jun 7, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 328265e0-504a-4c63-b74c-b4b20ded01c3

📥 Commits

Reviewing files that changed from the base of the PR and between 7f4c194 and 7acd662.

📒 Files selected for processing (1)
  • tests/unit/backends/test_redis_backend.py

Walkthrough

This PR fixes a critical bug in the legacy RedisBackend where decode_responses=True caused all binary payloads (serialised Rust ByteStorage, Arrow IPC, AES-256-GCM ciphertext) to fail with UnicodeDecodeError on every read. The fix disables decoding in both client pools, removes dead str-fallback code from the backend, and adds binary roundtrip tests.

Changes

Redis binary payload support

Layer / File(s) Summary
Redis client pool decode_responses configuration
src/cachekit/backends/redis/client.py
Both get_redis_client() and get_async_redis_client() change decode_responses from True to False, keeping pooled responses as raw bytes instead of UTF-8 decoded strings.
Backend.get() bytes-only contract
src/cachekit/backends/redis/backend.py
RedisBackend.get() removes the unreachable str-to-bytes fallback and enforces bytes-only return; comments updated to reflect the decode_responses=False contract.
Unit tests for pool config and get() contract
tests/unit/backends/test_redis_backend.py
Adds tests that assert both sync and async ConnectionPool.from_url calls set decode_responses=False, and unit tests that validate RedisBackend.get() returns raw bytes, returns None for missing keys, and returns None for non-bytes responses.
Test suite updates and binary roundtrip verification
tests/integration/test_redis_backend.py
Tests import RedisIsolationMixin, update get() unit test expectations to raw non-UTF-8 bytes, and add TestRedisBinaryRoundtrip integration test verifying binary payload survival through the shared cached Redis pool.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The PR description covers the problem, changes made, and verification steps. However, the formal description template sections (Type of Change, Security Checklist, Documentation Validation, Testing, Backward Compatibility) are not addressed. Populate the remaining template sections: explicitly select Type of Change (Bug fix), complete Security and Documentation checklists, and confirm Testing and Backward Compatibility status.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main fix: preventing the legacy RedisBackend from corrupting binary payloads by addressing the decode_responses issue.
Linked Issues check ✅ Passed All code changes directly address the objectives from issue #154: decode_responses=False in both sync/async pools [#154], bytes|None contract in backend.get() [#154], removal of unreachable str→bytes coercion [#154], and binary round-trip regression test [#154].
Out of Scope Changes check ✅ Passed All changes scope to fixing issue #154: pool configuration, backend contract, and targeted test updates. No unrelated modifications detected across the four changed files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/redis-binary-corruption

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cachekit/backends/redis/backend.py (1)

110-120: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not collapse unexpected payload types into cache misses.

None should mean “key not found” only. Returning None for non-bytes values hides a broken client contract and can silently drop valid data paths.

Suggested fix
         try:
             client = self._get_client()
             value = client.get(key)
-            # The pool is configured with decode_responses=False (see redis/client.py),
-            # so Redis returns raw bytes, or None for a missing key. Cached payloads are
-            # binary (LZ4/Arrow/AES ciphertext) and must never be UTF-8 decoded. The
-            # isinstance check enforces the bytes|None contract without any str coercion.
-            return value if isinstance(value, bytes) else None
+            # The pool is configured with decode_responses=False (see redis/client.py),
+            # so Redis returns raw bytes, or None for a missing key.
+            if value is None:
+                return None
+            if isinstance(value, bytes):
+                return value
+            raise BackendError(
+                message=f"Redis GET returned unexpected type: {type(value).__name__}",
+                operation="get",
+                key=key,
+            )
+        except BackendError:
+            raise
         except Exception as e:
             raise BackendError(
                 message=f"Redis GET failed: {e}",
                 operation="get",
                 key=key,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cachekit/backends/redis/backend.py` around lines 110 - 120, The current
get implementation in src/cachekit/backends/redis/backend.py collapses non-bytes
Redis responses into None; change the logic so None still means "key not found"
(return None), bytes are returned unchanged, but any other unexpected type
raises a BackendError (use the same BackendError class, include operation="get"
and key in the error and a clear message mentioning the unexpected type/value)
instead of silently returning None; update the exception message to surface the
type/value to aid debugging.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/cachekit/backends/redis/backend.py`:
- Around line 110-120: The current get implementation in
src/cachekit/backends/redis/backend.py collapses non-bytes Redis responses into
None; change the logic so None still means "key not found" (return None), bytes
are returned unchanged, but any other unexpected type raises a BackendError (use
the same BackendError class, include operation="get" and key in the error and a
clear message mentioning the unexpected type/value) instead of silently
returning None; update the exception message to surface the type/value to aid
debugging.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a3dc2103-27ce-41a8-a7b5-5e8b863d45b7

📥 Commits

Reviewing files that changed from the base of the PR and between d079f79 and 1d4cfdb.

📒 Files selected for processing (3)
  • src/cachekit/backends/redis/backend.py
  • src/cachekit/backends/redis/client.py
  • tests/integration/test_redis_backend.py

coderabbitai[bot]
coderabbitai Bot previously approved these changes Jun 7, 2026
@codecov

codecov Bot commented Jun 7, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

📢 Thoughts on this report? Let us know!

CI only runs tests/integration/ on push-to-main, so the integration regression
added in the previous commit does not cover the fix on pull requests (codecov
patch coverage saw the changed lines as untested). Add the missing
tests/unit/backends/test_redis_backend.py (mocked, runs on PRs) covering:
- both the sync and async pools are created with decode_responses=False
- RedisBackend.get() returns raw non-UTF-8 bytes unchanged, None for a missing
  key, and None (not a coerced str) for any non-bytes response.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/unit/backends/test_redis_backend.py`:
- Around line 77-100: The tests patch REDIS_URL but RedisBackend.__init__ uses
RedisBackendConfig.from_env() which resolves via
AliasChoices("CACHEKIT_REDIS_URL", "REDIS_URL"); to be consistent with the rest
of the test file update the decorators in
test_get_returns_non_utf8_bytes_unchanged,
test_get_returns_none_for_missing_key, and
test_get_returns_none_for_non_bytes_response to set "CACHEKIT_REDIS_URL" (or set
both "CACHEKIT_REDIS_URL" and "REDIS_URL") so the config resolution is
deterministic and matches other tests; ensure you keep clear=True if you want
isolated env and leave the DIContainer patching as-is around RedisBackend()
creation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 91e782be-9ff2-4237-ac18-321b0b202fbc

📥 Commits

Reviewing files that changed from the base of the PR and between 1d4cfdb and 7f4c194.

📒 Files selected for processing (1)
  • tests/unit/backends/test_redis_backend.py

Comment thread tests/unit/backends/test_redis_backend.py Outdated
Addresses review feedback on #173: the get() contract tests patched REDIS_URL
while RedisBackend resolves via AliasChoices("CACHEKIT_REDIS_URL", "REDIS_URL").
Switch to explicit client_provider injection (the existing
test_explicit_client_provider_injection pattern), dropping the DIContainer and
env patching entirely so the tests are independent of env-var alias resolution.
@27Bslash6 27Bslash6 merged commit 82d0417 into main Jun 7, 2026
32 checks passed
@27Bslash6 27Bslash6 deleted the fix/redis-binary-corruption branch June 7, 2026 04:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Legacy RedisBackend corrupts all binary payloads (decode_responses=True)

1 participant