FilmFind — AI-Powered Semantic Movie Discovery

Describe what you want to watch. FilmFind finds it.

FilmFind is a semantic movie and TV show discovery engine that understands natural language. Ask for "dark sci-fi movies like Interstellar with less romance" and get back ranked, explained recommendations — not just keyword matches.

How It Works

A search request flows through a multi-stage AI pipeline:

Query → LLM Parse → Embed (768-dim) → FAISS HNSW → Filter → Score → LLM Re-rank → Results

QueryParser — Gemini extracts themes, genres, year/language constraints, reference titles
EmbeddingService — sentence-transformers (all-mpnet-base-v2) encodes the query to a 768-dim vector
SemanticRetrievalEngine — FAISS HNSW finds top-k candidates by cosine similarity
FilterEngine — applies hard filters (genre, year, rating, language, streaming provider)
MultiSignalScoringEngine — re-ranks by semantic 50% + genre 20% + popularity/rating/recency 10% each
LLMReRanker — Gemini re-ranks the top results and writes a natural-language explanation for each
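
The weighting used by the scoring stage can be sketched as a plain weighted sum. This is a minimal illustration, not the code in scoring_engine.py: the `Signals` dataclass, the `combined_score` function, the sample films, and the assumption that every signal is already normalised to [0, 1] are all hypothetical. Only the weights come from the description above.

```python
from dataclasses import dataclass

# Weights taken from the pipeline description; they sum to 1.0.
WEIGHTS = {
    "semantic": 0.50,
    "genre": 0.20,
    "popularity": 0.10,
    "rating": 0.10,
    "recency": 0.10,
}


@dataclass
class Signals:
    """Hypothetical per-candidate signals, each assumed normalised to [0, 1]."""
    semantic: float
    genre: float
    popularity: float
    rating: float
    recency: float


def combined_score(s: Signals) -> float:
    """Weighted sum of the five signals."""
    return sum(WEIGHTS[name] * getattr(s, name) for name in WEIGHTS)


# Made-up candidates, used only to show the ordering step.
candidates = {
    "Film A": Signals(semantic=0.9, genre=0.5, popularity=0.2, rating=0.8, recency=0.1),
    "Film B": Signals(semantic=0.6, genre=1.0, popularity=0.9, rating=0.7, recency=0.9),
}
ranked = sorted(candidates, key=lambda t: combined_score(candidates[t]), reverse=True)
```

With these made-up numbers, Film B outranks Film A even though Film A is the closer semantic match, because the genre, popularity, and recency signals together outweigh the semantic gap.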

60-Second Mode: the user picks a mood, context, and craving → SQL scoring across all enriched films → weighted random pick → the LLM generates the reasons for the pick.
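
The "weighted random pick" step might look like the sketch below, assuming each film already has a non-negative score from the SQL stage. The function name `weighted_pick`, the sample titles, and the scores are hypothetical; the real selection logic lives in the backend and may differ.

```python
import random


def weighted_pick(scored_films, rng=None):
    """Pick one title with probability proportional to its score.

    scored_films: list of (title, score) pairs with non-negative scores,
    at least one of them positive.
    """
    rng = rng or random.Random()
    titles = [title for title, _ in scored_films]
    weights = [score for _, score in scored_films]
    # random.choices returns a list of k items; k defaults to 1.
    return rng.choices(titles, weights=weights)[0]


# Made-up scores: Film A is the most likely pick, but not a guaranteed one.
films = [("Film A", 8.0), ("Film B", 1.5), ("Film C", 0.5)]
pick = weighted_pick(films, rng=random.Random(42))
```

Sampling in proportion to score, rather than always returning the top film, means repeated requests with the same inputs can surface different titles.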

Tech Stack

Layer	Technology
Backend	FastAPI, Python 3.11, SQLAlchemy, Alembic
Database	PostgreSQL via Supabase (pgvector HNSW index)
ML	sentence-transformers/all-mpnet-base-v2 (768-dim), FAISS
LLM	Gemini 2.0 Flash (primary) → Groq Llama 3.3 70B (fallback) → rule-based
Cache	Redis (local dev) / Upstash (production)
Frontend	Next.js 16, React 19, TypeScript, TailwindCSS 4, Framer Motion
Images	Supabase Storage (poster/backdrop CDN)
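
The LLM row above describes a fallback chain: try Gemini, then Groq, then a rule-based path. A minimal sketch of that pattern follows. It does not call any real provider SDK; `failing_gemini`, `fake_groq`, `rule_based`, and `complete_with_fallback` are hypothetical stand-ins, and llm_client.py may handle errors and retries differently.

```python
from typing import Callable, Optional, Sequence

Provider = Callable[[str], str]


def rule_based(prompt: str) -> str:
    """Last-resort provider that never raises (placeholder logic)."""
    return f"rule-based answer for: {prompt}"


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> str:
    """Try each provider in order and return the first successful answer."""
    last_error: Optional[Exception] = None
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
    raise RuntimeError("all providers failed") from last_error


def failing_gemini(prompt: str) -> str:
    """Stand-in for the primary provider, simulating an outage."""
    raise TimeoutError("simulated Gemini outage")


def fake_groq(prompt: str) -> str:
    """Stand-in for the fallback provider."""
    return f"groq answer for: {prompt}"


answer = complete_with_fallback(
    "parse this query", [failing_gemini, fake_groq, rule_based]
)
```

Placing the rule-based function last means the chain always produces some answer as long as that function itself does not raise.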

Getting Started

All services run via Docker Compose — no local Python or Node setup required.

1. Clone and configure

git clone https://github.com/dheerajram13/FilmFind.git
cd FilmFind
cp .env.example .env
# Fill in your API keys (see .env.example for all required vars)

2. Required API keys

Key	Where to get
`DATABASE_URL`	Supabase project → Settings → Database
`SUPABASE_URL` / `SUPABASE_ANON_KEY` / `SUPABASE_SERVICE_ROLE_KEY`	Supabase project → Settings → API
`GEMINI_API_KEY`	Google AI Studio (free)
`GROQ_API_KEY`	Groq Console (free)
`TMDB_API_KEY`	TMDB Settings (free)

3. Start

docker compose up --build

Frontend: http://localhost:3000
Backend API: http://localhost:8000
API docs: http://localhost:8000/api/docs (development only)

4. Run the data pipeline (first time)

# Ingest media from TMDB
docker compose exec backend python scripts/ingest/ingest_media.py

# Generate embeddings
docker compose exec backend python scripts/ml/generate_embeddings.py

# Build FAISS index
docker compose exec backend python scripts/ml/build_index.py

# LLM enrichment for 60-second mode (uses Gemini free tier)
docker compose exec backend python scripts/enrich/enrich_films.py
docker compose exec backend python scripts/ml/score_films.py

Project Structure

FilmFind/
├── backend/
│   ├── app/
│   │   ├── api/
│   │   │   ├── routes/           # search, sixty, admin, health
│   │   │   ├── dependencies.py   # DB session, auth, rate limiting, injection guard
│   │   │   └── exceptions.py     # Custom exception hierarchy
│   │   ├── core/
│   │   │   ├── config.py         # Pydantic settings (all env vars)
│   │   │   ├── database.py       # SQLAlchemy engine + session
│   │   │   ├── cache_manager.py  # Redis wrapper
│   │   │   ├── middleware.py     # Security headers, error handling, logging
│   │   │   ├── scheduler.py      # APScheduler background jobs
│   │   │   └── scoring.py        # 60-sec mode valid enums + profiles
│   │   ├── models/
│   │   │   ├── media.py          # Media base + Movie/TVShow subclasses (STI)
│   │   │   └── session.py        # SearchSession, SixtySession analytics
│   │   ├── schemas/
│   │   │   ├── movie.py          # MovieResponse, MovieSearchResult (Pydantic v2)
│   │   │   ├── query.py          # QueryIntent, QueryConstraints, ParsedQuery
│   │   │   └── search.py         # SearchRequest, SearchResponse
│   │   ├── services/
│   │   │   ├── query_parser.py   # LLM query → structured intent
│   │   │   ├── embedding_service.py  # sentence-transformers wrapper
│   │   │   ├── retrieval_engine.py   # FAISS HNSW search
│   │   │   ├── filter_engine.py      # Hard filter application
│   │   │   ├── scoring_engine.py     # Multi-signal scoring weights
│   │   │   ├── reranker.py           # LLM re-ranking + explanations
│   │   │   ├── sixty_scorer.py       # SQL scoring for 60-sec mode
│   │   │   ├── film_admin_service.py # Enrich / embed / cache refresh
│   │   │   └── llm_client.py         # Gemini → Groq → rule-based fallback
│   │   ├── repositories/
│   │   │   ├── movie_repository.py   # Full DB query repository
│   │   │   └── query_utils.py        # Lightweight query helpers
│   │   ├── db/
│   │   │   └── sessions.py       # Fire-and-forget analytics logging
│   │   └── main.py               # FastAPI app + middleware stack
│   ├── scripts/
│   │   ├── ingest/               # TMDB ingestion + DB seeding
│   │   ├── enrich/               # LLM narrative enrichment + streaming backfill
│   │   ├── ml/                   # Embeddings, FAISS index, sixty-mode scoring
│   │   ├── migrate/              # One-time migrations (Supabase image upload)
│   │   └── utils/                # DB health check, query parser tester
│   ├── tests/                    # pytest suite (500+ tests)
│   ├── alembic/                  # DB migrations
│   ├── Dockerfile.prod           # Production image (port 7860 for HF Spaces)
│   └── requirements.txt
│
├── frontend/
│   ├── app/
│   │   ├── page.tsx              # Entry point — renders FilmfindHome
│   │   ├── layout.tsx            # Root layout + metadata
│   │   └── globals.css           # Global styles
│   ├── components/home/
│   │   ├── FilmfindHome.tsx      # Top-level state orchestrator
│   │   ├── HomeScreen.tsx        # Landing / search input
│   │   ├── ResultsScreen.tsx     # Search results list
│   │   ├── DetailScreen.tsx      # Movie detail view
│   │   ├── ResultCard.tsx        # Individual result card
│   │   ├── FiltersSidebar.tsx    # Genre/year/streaming filters
│   │   └── SixtySecondMode.tsx   # 60-second pick mode
│   ├── hooks/
│   │   ├── useSearch.ts          # Search state + AbortController
│   │   └── useFilters.ts         # Client-side filter state
│   ├── lib/
│   │   ├── api-client.ts         # Typed fetch wrapper (AbortSignal support)
│   │   ├── movie-formatters.ts   # Pure formatting helpers
│   │   ├── streaming-providers.ts # Provider name normalisation
│   │   └── image-utils.ts        # TMDB/Supabase image URL helpers
│   └── types/
│       └── api.ts                # TypeScript interfaces matching backend schemas
│
├── .env.example                  # All required env vars with descriptions
├── docker-compose.yml            # 3-service dev stack (backend, frontend, redis)
├── plan.md                       # Deployment plan (HF Spaces + Vercel + Upstash)
└── deploy.md                     # Step-by-step deployment guide

API Reference

Search

POST /api/search
Content-Type: application/json

{
  "query": "dark sci-fi movies like Interstellar with less romance",
  "limit": 10,
  "filters": {
    "year_min": 2010,
    "language": "en"
  }
}
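
For reference, here is one way to build the same request from Python using only the standard library. The helper name `build_search_request` is hypothetical, and the base URL assumes the local Docker Compose setup from Getting Started. The response shape is not documented here, so the sketch stops at constructing the request.

```python
import json
import urllib.request


def build_search_request(query, limit=10, filters=None, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /api/search."""
    payload = {"query": query, "limit": limit}
    if filters:
        payload["filters"] = filters
    return urllib.request.Request(
        url=f"{base_url}/api/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request(
    "dark sci-fi movies like Interstellar with less romance",
    filters={"year_min": 2010, "language": "en"},
)

# To send it against a running backend:
# with urllib.request.urlopen(req) as resp:
#     results = json.load(resp)
```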

60-Second Mode

POST /api/sixty/pick
Content-Type: application/json

{
  "mood": "chill",
  "context": "solo-night",
  "craving": "mind-blown"
}

Valid values:

mood: happy, sad, charged, chill, adventurous, romantic
context: family, date-night, solo-night, friends, movie-night, background
craving: laugh, cry, mind-blown, thrilled, inspired, scared, comforted, wowed
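
A client can check a payload against these values before calling the endpoint. The sketch below is a hypothetical client-side helper (`validation_errors` is not part of the FilmFind API); the backend performs its own validation, and its error format may differ.

```python
# Allowed values copied from the lists above.
VALID_VALUES = {
    "mood": {"happy", "sad", "charged", "chill", "adventurous", "romantic"},
    "context": {"family", "date-night", "solo-night", "friends", "movie-night", "background"},
    "craving": {"laugh", "cry", "mind-blown", "thrilled", "inspired", "scared", "comforted", "wowed"},
}


def validation_errors(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload is valid."""
    errors = []
    for field, allowed in VALID_VALUES.items():
        value = payload.get(field)  # a missing field shows up as None
        if value not in allowed:
            errors.append(f"{field}: {value!r} is not one of {sorted(allowed)}")
    return errors


ok = validation_errors({"mood": "chill", "context": "solo-night", "craving": "mind-blown"})
bad = validation_errors({"mood": "sleepy", "context": "solo-night"})
```

Here `ok` is an empty list, while `bad` reports two problems: an unknown mood and a missing craving.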

Health

GET /health
GET /health/detailed

Security

Rate limiting: 20 req/min (search), 10 req/min (sixty/pick) per IP — sliding window via Redis
Prompt injection guard on all queries
Admin endpoints require Authorization: Bearer <ADMIN_SECRET>
Security headers: HSTS, CSP, X-Frame-Options, Permissions-Policy
CORS restricted to configured origins only
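
To illustrate the sliding-window rate limit listed above, here is an in-process sketch. It swaps Redis for a per-key deque held in memory, so it only demonstrates the algorithm and would not work across multiple workers. The class name `SlidingWindowLimiter` and the sample IP address are hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key."""

    def __init__(self, limit: int, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


# 20 requests per minute per IP, matching the search limit above.
search_limiter = SlidingWindowLimiter(limit=20)
results = [search_limiter.allow("203.0.113.7") for _ in range(21)]
```

The first 20 calls from the same IP are accepted and the 21st is rejected until the oldest timestamp leaves the 60-second window.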

Development Commands

# Run tests
docker compose exec backend python -m pytest

# Linting
docker compose exec backend ruff check app/
docker compose exec frontend npm run lint

# Type checking
docker compose exec backend mypy app/
docker compose exec frontend npm run type-check

# DB migrations
docker compose exec backend alembic upgrade head

# Restart a single service
docker compose restart backend

Deployment

See deploy.md for the full guide. Target stack (100% free tier):

Service	Platform
Backend	Hugging Face Spaces (CPU Basic, 16GB RAM)
Frontend	Vercel
Database	Supabase (already live)
Cache	Upstash Redis

Author

Dheeraj Srirama

License

MIT
