Skip to content

Implemented Backend Endpoints and Logic for Monthly Dumps#104

Merged
fortune710 merged 41 commits into
mainfrom
staging
Apr 24, 2026
Merged

Implemented Backend Endpoints and Logic for Monthly Dumps#104
fortune710 merged 41 commits into
mainfrom
staging

Conversation

@fortune710

Copy link
Copy Markdown
Owner

No description provided.

fixed unit tests for rate limiting
Added Endpoints for Creating And Getting Monthly Dumps
@fortune710 fortune710 self-assigned this Apr 22, 2026
@bolt-new-by-stackblitz

Copy link
Copy Markdown

Review PR in StackBlitz Codeflow Run & review this pull request in StackBlitz Codeflow.

@coderabbitai

coderabbitai Bot commented Apr 22, 2026

Copy link
Copy Markdown
Contributor

Warning

Rate limit exceeded

@fortune710 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 8 minutes and 52 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 8 minutes and 52 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7de71682-376e-4168-af01-c561de2a99f9

📥 Commits

Reviewing files that changed from the base of the PR and between d8b4870 and 0634744.

⛔ Files ignored due to path filters (5)
  • frontend/supabase/.temp/cli-latest is excluded by !**/.temp/**
  • frontend/supabase/.temp/storage-version is excluded by !**/.temp/**
  • supabase/.temp/cli-latest is excluded by !**/.temp/**
  • supabase/.temp/gotrue-version is excluded by !**/.temp/**
  • supabase/.temp/storage-version is excluded by !**/.temp/**
📒 Files selected for processing (40)
  • backend/AGENTS.md
  • backend/Dockerfile
  • backend/config.py
  • backend/controllers/__init__.py
  • backend/controllers/base_controller.py
  • backend/controllers/entry_controller.py
  • backend/controllers/monthly_dump_controller.py
  • backend/database/__init__.py
  • backend/database/tables.py
  • backend/main.py
  • backend/queue_constants.py
  • backend/routers/entries.py
  • backend/routers/monthly_dumps.py
  • backend/routers/user.py
  • backend/schedulers/monthly_dump_enqueue_scheduler.py
  • backend/schedulers/monthly_dump_scheduler.py
  • backend/schedulers/scheduler_manager.py
  • backend/scripts/insert_photos.py
  • backend/scripts/run_insert_photos.sh
  • backend/scripts/run_worker.py
  • backend/scripts/run_worker.sh
  • backend/services/monthly_dump_service.py
  • backend/services/queue_service.py
  • backend/services/queues/entry_ingestion_queue_service.py
  • backend/services/queues/monthly_dump_enqueue_service.py
  • backend/services/queues/monthly_dump_queue_service.py
  • backend/services/storage_service.py
  • backend/tests/test_monthly_dump_service.py
  • backend/tests/test_rate_limit.py
  • backend/tests/test_scheduler_manager.py
  • backend/utils/auth.py
  • backend/utils/dumps/__init__.py
  • backend/utils/dumps/dump_utils.py
  • backend/utils/logging.py
  • backend/utils/rate_limit.py
  • frontend/supabase/migrations/20260407090200_q_monthly_dump_queues.sql
  • frontend/supabase/migrations/20260407090300_monthly_dumps.sql
  • frontend/supabase/migrations/20260420043423_add_monthly_dump_next_run.sql
  • frontend/supabase/migrations/20260420043600_init_monthly_dump_next_run.sql
  • frontend/supabase/migrations/20260420044000_set_default_monthly_dump_next_run.sql
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch staging

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@fortune710

Copy link
Copy Markdown
Owner Author

@coderabbitai

@coderabbitai

coderabbitai Bot commented Apr 22, 2026

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

@fortune710 It looks like your message came through without any specific request. How can I help? I can:

  • Review the PR – trigger a full review of the changes
  • Answer questions – about the code, architecture, or implementation
  • Generate code – unit tests, improvements, or other changes
  • Create issues – to track follow-up work

Let me know what you'd like! If you'd like me to kick off a review, just say "review" and I'll get started.

@fortune710

Copy link
Copy Markdown
Owner Author

@coderabbitai Please review the changes in this PR for any security vulnerabilities

@coderabbitai

coderabbitai Bot commented Apr 22, 2026

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

Let me examine the code changes in this PR to identify any security vulnerabilities.


🏁 Script executed:

#!/bin/bash
# Fetch the PR diff and file changes
gh pr view 104 --repo fortune710/keepsafe --json files,title,body
echo "---DIFF---"
gh pr diff 104 --repo fortune710/keepsafe

Length of output: 116209


@fortune710 — here's a thorough security review of the PR changes. I found two high-severity SSRF vulnerabilities, plus several medium/low issues.


🔴 High Severity

1. SSRF Allowlist Bypass via Substring Hostname Matching — backend/services/monthly_dump_service.py

is_allowlisted = (
    any(h in hostname for h in MonthlyDumpService.ALLOWED_IMAGE_HOSTS) or 
    ".supabase.co" in hostname
)

h in hostname performs substring matching, not domain matching. An attacker who controls a domain like images.unsplash.com.attacker.com would pass this check because "images.unsplash.com" is a substring of the hostname. The downstream DNS/IP check adds a second layer, but the allowlist itself is bypassable.

Fix — use proper suffix/exact matching:

def _is_allowlisted_host(hostname: str) -> bool:
    for allowed in MonthlyDumpService.ALLOWED_IMAGE_HOSTS:
        if hostname == allowed or hostname.endswith(f".{allowed}"):
            return True
    return ".supabase.co" in hostname  # already suffix-safe pattern
    
is_allowlisted = _is_allowlisted_host(hostname)

2. Redirect Chain Final URL Not Validated — backend/services/monthly_dump_service.py

if response.history:
    for r in response.history:
        if not MonthlyDumpService._is_safe_url(r.url):
            ...
            return None

response.history contains only intermediate redirect responses. The final destination URL (response.url) is never checked. An attacker could craft a multi-hop redirect where early hops look safe but the final one resolves to a private/SSRF-sensitive address.

Fix:

# After the history loop, also validate the final URL
if not MonthlyDumpService._is_safe_url(str(response.url)):
    logger.warning("Rejected disallowed final redirect URL", extra={"url": str(response.url)})
    return None

🟡 Medium Severity

3. Unvalidated Timezone Input — backend/routers/monthly_dumps.py

timezone_name = payload.timezone or "UTC"

timezone_name is passed directly into pytz.timezone(tz_name) with no prior validation. An invalid value raises pytz.exceptions.UnknownTimeZoneError, producing an unhandled 500 instead of a user-friendly 400. This can also be used as a minor DoS vector (forcing pytz to search its database for crafted strings).

Fix:

import pytz

timezone_name = payload.timezone or "UTC"
try:
    pytz.timezone(timezone_name)  # validate early
except pytz.exceptions.UnknownTimeZoneError:
    raise HTTPException(status_code=400, detail="Invalid timezone. Use a valid IANA timezone string (e.g. 'America/New_York').")

4. Raw Exception Detail Leaked to Client — backend/routers/entries.py

except Exception as e:
    raise HTTPException(status_code=400, detail=str(e))

str(e) from an arbitrary exception can expose internal details (module paths, stack traces, etc.) to API consumers.

Fix:

except Exception:
    raise HTTPException(status_code=400, detail="Invalid month format. Use YYYY-MM.")

5. PII Embedded in Log Message Strings — Multiple Files

Several places violate the AGENTS.md rule ("Do not embed dynamic data inside the message string — put it in extra instead"), and also log sensitive identifiers (user IDs, IP addresses) as message interpolations:

backend/utils/rate_limit.py:

logger.warning(f"Rate limit exceeded for IP: {ip}")          # ❌
logger.warning(f"Rate limit exceeded for User: {user_id}")   # ❌

backend/routers/entries.py:

logger.warning(f"Unauthorized access attempt: User {current_user.user.id} tried to access entries for {user_id}")  # ❌
logger.exception(f"Failed to fetch entries: {str(e)}")        # ❌

Fix — use extra={} consistently:

logger.warning("Rate limit exceeded for IP", extra={"ip": ip})
logger.warning("Rate limit exceeded for User", extra={"user_id": user_id})
logger.warning("Unauthorized access attempt", extra={"actor_user_id": current_user.user.id, "target_user_id": user_id})
logger.exception("Failed to fetch entries", extra={"error": str(e)})

🟢 Low Severity

6. storage_service.py Uses Deprecated Logger and Has Double-Attribute Bug

from utils.logging import Logger
logger = Logger("StorageService")   # ❌ AGENTS.md says never use this
...
logger.logger.exception(...)        # ❌ double .logger access — likely a runtime AttributeError

Replace with:

import logging
logger = logging.getLogger(__name__)
logger.exception(...)

7. Fragile HTTPException Re-raising in Rate Limiter — backend/utils/rate_limit.py

except Exception as e:
    if e.__class__.__name__ == "HTTPException":
        raise

Comparing by class name string is fragile (fails with subclasses or across module boundaries). This can cause real HTTPExceptions (429 responses) to be swallowed.

Fix:

except HTTPException:
    raise
except Exception as e:
    logger.error("Error in rate limiter", extra={"error": str(e)}, exc_info=True)
    # fail open

8. Anonymous Rate Limit Bucket Collision — backend/utils/rate_limit.py

ip = request.client.host if request.client else "unknown"
user_key = f"rate_limit:{context}:user:{user_id}:{minute}"  # user_id = "anonymous" for unauthenticated

When request.client is None, all unauthenticated requests share the same unknown IP bucket. Also, all anonymous users share one anonymous user bucket. This creates both false rate-limit triggers and potential bypass opportunities.


Summary Table

Severity Issue File
🔴 High SSRF allowlist bypass via substring hostname matching monthly_dump_service.py
🔴 High Final redirect URL not SSRF-validated monthly_dump_service.py
🟡 Medium Unvalidated timezone allows 500 / minor DoS routers/monthly_dumps.py
🟡 Medium Raw exception detail leaked in HTTP 400 response routers/entries.py
🟡 Medium PII in log message strings (violates AGENTS.md) rate_limit.py, entries.py
🟢 Low Deprecated Logger + logger.logger double-attr bug storage_service.py
🟢 Low Fragile HTTPException re-raise by class name rate_limit.py
🟢 Low Anonymous rate limit bucket collision rate_limit.py

The two SSRF issues in monthly_dump_service.py are the most urgent to address before this goes to production.

@fortune710 fortune710 merged commit a55610b into main Apr 24, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant