AI prompt engineering for image and video production. By Joost Helfers.
PromptEnhancer generates model-optimized prompts for commercial AI image and video production. It runs a three-step pipeline via OpenRouter:
- Vision Analysis — An AI vision model reads your reference images and extracts color palette, lighting, texture, and emotional tone.
- Creative Brief — A planning model develops a production brief: creative vision, visual metaphor, shot diversity, and color anchors.
- Prompt Derivation — Prompts are derived from the brief, formatted for the target model's specific strengths and syntax.
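The three steps above can be read as a chain of async stages. The following is a minimal sketch, not the project's implementation: the `Stages` interface, the two data types, and `runPipeline` are hypothetical names, and skipping the vision step when no images are supplied is an assumption based on reference images being optional.

```typescript
// Hypothetical types for illustration; the project's real definitions may differ.
interface VisionAnalysis { palette: string[]; lighting: string; texture: string; tone: string }
interface CreativeBrief { vision: string; metaphor: string; shots: string[]; colorAnchors: string[] }

// Each stage would wrap a model call in practice; here they are injected.
interface Stages {
  analyzeImages(imageUrls: string[]): Promise<VisionAnalysis>;
  writeBrief(concept: string, analysis: VisionAnalysis | null): Promise<CreativeBrief>;
  derivePrompts(brief: CreativeBrief, targetModel: string, count: number): Promise<string[]>;
}

interface PipelineInput { concept: string; imageUrls: string[]; targetModel: string; count: number }

// Runs vision -> brief -> prompts in order.
// Assumption: the vision step is skipped when there are no reference images.
async function runPipeline(
  stages: Stages,
  input: PipelineInput,
): Promise<{ brief: CreativeBrief; prompts: string[] }> {
  const analysis =
    input.imageUrls.length > 0 ? await stages.analyzeImages(input.imageUrls) : null;
  const brief = await stages.writeBrief(input.concept, analysis);
  const prompts = await stages.derivePrompts(brief, input.targetModel, input.count);
  return { brief, prompts };
}
```

Injecting the stages keeps the ordering logic testable without any network access.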
The UI is organized around two choices: what you want to do and what you're making (Image / Image edit / Video).
- Create prompts — The full pipeline. Describe a concept and/or upload reference images, and generate a diverse set of model-specific prompts. Choose Image (text-to-image), Image edit (image-to-image, needs a reference image), or Video.
- Enhance a prompt — Paste an existing prompt and optimize it for the selected model. The enhancer restructures, expands, and adapts it — it doesn't generate from scratch.
- Develop a brief — Shape a creative brief and shot list first (creative vision, visual metaphor, shot diversity, color anchors), then generate image or video prompts from it.
| Model | Type | Description |
|---|---|---|
| Z-Image | Image | Default. Alibaba's 6B photorealism + text-rendering model. Natural-language, positive-only prompts (Turbo runs at CFG 0). |
| Flux 2 Klein 9B | Image | Best for cinematic stills. Keep prompts concise (50-100 words). |
| NanoBanana 2 | Image | Fast and flexible. Up to 14 reference images with character consistency. |
| Gemini Omni Flash | Video | Google's multimodal video model. Conversational prompts, iterative editing, physics-aware. |
| Veo 3.1 | Video | Google video. Structured scenes with camera, dialogue, and audio. |
| Kling v3 | Video | Multi-shot video with character labels and temporal markers. |
| Kling o3 | Video | Enhanced Kling with deeper scene understanding for complex sequences. |
| LTX-Video 2.3 | Video | High-resolution video (up to 4K). Flowing present-tense with audio. |
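Per-model guidance from the table, such as the 50-100 word range for Flux 2 Klein 9B, could be encoded as data and checked before a prompt is shown. This is a rough sketch only: the `ModelProfile` shape, the `flux-2-klein-9b` id, and both helper functions are hypothetical and are not taken from the project's model profiles.

```typescript
// Hypothetical profile shape; the project's actual model definitions may look different.
interface ModelProfile {
  id: string;
  kind: "image" | "video";
  wordRange?: [number, number]; // [min, max], set only where the table gives a range
}

const flux: ModelProfile = { id: "flux-2-klein-9b", kind: "image", wordRange: [50, 100] };

// Counts whitespace-separated words; an empty or blank prompt counts as zero.
function countWords(prompt: string): number {
  const trimmed = prompt.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Compares the prompt length with the profile's range; profiles without a range always pass.
function checkLength(profile: ModelProfile, prompt: string): "ok" | "too-short" | "too-long" {
  if (!profile.wordRange) return "ok";
  const [min, max] = profile.wordRange;
  const n = countWords(prompt);
  if (n < min) return "too-short";
  if (n > max) return "too-long";
  return "ok";
}
```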
- Node.js (v18 or later)
- An OpenRouter API key
```bash
git clone https://github.com/joosthel/PromptEnhancer.git
cd PromptEnhancer
npm install
```

Create a `.env.local` file in the project root:

```bash
echo 'OPENROUTER_API_KEY=sk-or-v1-...' > .env.local
```

Replace `sk-or-v1-...` with your actual key from openrouter.ai/keys.

macOS: double-click `start.command`, or run it from a terminal:

```bash
./start.command
```

Windows: double-click `start.bat`, or run it from a terminal:

```bash
start.bat
```

Both scripts install dependencies (if needed), check for your API key, start the dev server, and open your browser at http://localhost:3000.

You can also start manually:

```bash
npm run dev
```

Tests run on Vitest with React Testing Library (jsdom).

```bash
npm test            # run once
npm run test:watch  # watch mode
```

- Pure-logic tests live next to the code, e.g. `src/lib/__tests__/`.
- Component tests render real components in jsdom, e.g. `src/app/__tests__/start-new.test.tsx`.
- Shared setup (DOM matchers, in-memory `localStorage`) is in `vitest.setup.ts`.
- Choose a mode — Select Create prompts, Enhance a prompt, or Develop a brief from the left panel (and, for the first two, what you're making: Image / Image edit / Video).
- Select a target model — Pick the AI model you're generating prompts for.
- Add reference images (optional) — Drag and drop, click to browse, paste from clipboard, or enter a URL. Click thumbnails to label images (style reference, subject, face, background).
- Describe your concept — Fill in the text area. All modes accept freeform descriptions.
- Set prompt count (Generation/Art Direction) — Choose 1-6 prompts for diversity.
- Generate — The pipeline runs server-side. Reference images are cached, so re-generating with the same images skips the vision step.
- Refine — Use Fix buttons on individual prompt cards to iterate without re-running the full pipeline (Hands, Lighting, Too AI, Mood, or custom notes), or "Polish all" to run an art-direction pass over the whole set. If a run ever fails, hit Retry — it reuses the brief already produced and finishes fast.
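The Generate step notes that reference images are cached so a repeat run with the same images skips the vision step. One way to picture that is a cache keyed by image fingerprints. The sketch below is an assumption about how such a cache could work, not the project's code: `fingerprint`, `cacheKey`, and `VisionCache` are hypothetical names, and the FNV-1a-style hash is a simple stand-in for whatever fingerprinting the app actually uses.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units: a simple, non-cryptographic fingerprint.
function fingerprint(data: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < data.length; i++) {
    hash ^= data.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep unsigned 32 bits
  }
  return hash.toString(16);
}

// One key per ordered set of images.
function cacheKey(images: string[]): string {
  return images.map(fingerprint).join("-");
}

// In-memory cache: the analysis function only runs for image sets not seen before.
class VisionCache<T> {
  private store = new Map<string, T>();

  getOrCompute(images: string[], analyze: (images: string[]) => T): T {
    const key = cacheKey(images);
    if (this.store.has(key)) return this.store.get(key) as T;
    const result = analyze(images);
    this.store.set(key, result);
    return result;
  }
}
```

A second call with the same image data returns the stored analysis without invoking `analyze` again.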
| Layer | Tech |
|---|---|
| Framework | Next.js 16 (App Router) |
| Language | TypeScript 5 |
| Styling | Tailwind CSS 4 |
| AI Gateway | OpenRouter (public) · Langdock (agency tenant) |
| Vision model | google/gemini-3.5-flash (fallback google/gemini-2.5-flash) — always via OpenRouter, even on the agency tenant |
| Text model (public) | openai/gpt-4o-mini (fallback openai/gpt-4.1-nano) — fast, reliable structured JSON, non-reasoning |
| Text model (agency) | gemini-2.5-flash via Langdock (env-configurable) |
No database. No auth. Runtime deps are just the Next.js scaffold + zod (schema validation) + the MCP SDK; vitest is a dev-only test dependency.
src/
app/
page.tsx # Main page — state, mode logic, generation handlers
layout.tsx # Root layout, metadata, fonts
globals.css # Global styles, accessibility rules
api/
generate-stream/route.ts # Primary pipeline (vision → brief → prompts), streamed via SSE with a heartbeat
generate/route.ts # Non-streaming pipeline (shared with the MCP server)
enhance/route.ts # Single-step prompt enhancement
revise/route.ts # Single-card fix/revision
reformat/route.ts # Cross-model prompt reformatting
refine/route.ts # Opt-in art-direction "Polish" pass
[transport]/route.ts # MCP server (/api/mcp, /api/sse)
components/
ModeSelector.tsx # Three app modes + sub-mode chips
ModelSelector.tsx # Target model cards with descriptions
ImageUploader.tsx # Drag-drop, paste, URL input, image labeling
InputForm.tsx # Description textarea + prompt count
PromptList.tsx # Results: brief, visual analysis, prompt cards
PromptCard.tsx # Individual prompt with copy, fix, reformat
FixToolbar.tsx # Fix category chips + custom input
FixHistory.tsx # Prompt revision history
ModelChips.tsx # Cross-model reformat chips per card
BatchActions.tsx # Select all, batch fix operations
LoadingAnimation.tsx # Dot-ring loading with phase labels
CreditPopup.tsx # API credit acknowledgment popup
HelpModal.tsx # How-it-works documentation modal
lib/
services.ts # Shared orchestration (enhance/generate/revise/reformat) — used by REST routes + MCP
openrouter.ts # Typed fetch wrapper for OpenRouter API
system-prompt.ts # Vision prompt, types, shared constants
prompt-engine.ts # System prompt + user message builders
model-profiles.ts # Model definitions, modes, fix categories
image-utils.ts # Canvas resize, clipboard, URL validation, fingerprinting
use-focus-trap.ts # Shared focus trap hook for modals
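The tree describes `api/generate-stream/route.ts` as streaming over SSE with a heartbeat. As a rough illustration of what those frames look like on the wire, here is a sketch of SSE message formatting. The event names (`phase`, `done`) and payload shapes are hypothetical; only the framing rules (fields ending in a blank line, comment lines starting with `:`) come from the Server-Sent Events format itself.

```typescript
// Formats one SSE message: an event name, a single-line JSON payload, then a blank line.
function sseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Lines starting with ":" are SSE comments. Clients ignore them, but the bytes
// keep idle connections from being closed while a slow model call is running.
function sseHeartbeat(): string {
  return ": heartbeat\n\n";
}

// Hypothetical sequence mirroring the vision -> brief -> prompts order.
const frames: string[] = [
  sseEvent("phase", { name: "vision" }),
  sseHeartbeat(),
  sseEvent("phase", { name: "brief" }),
  sseEvent("done", { prompts: ["..."] }),
];
```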
The same deployment also exposes an MCP server, so the prompt engine can be called from MCP clients (Langdock, Claude, Cursor, …) — not just the web UI.
- Endpoint (Streamable HTTP): `https://<deployment>/api/mcp`. Any custom domain on this project + `/api/mcp` works identically.
- Legacy SSE fallback: `/api/sse`
- Auth: set `MCP_AUTH_TOKEN` to require `Authorization: Bearer <token>` (or `x-api-key`) on tool calls. If unset, the server is open (bounded by the per-IP rate limit + provider spend cap).
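The auth rule above can be sketched as a small predicate. This is an illustration under assumptions, not the server's actual check: `isAuthorized` and `HeaderMap` are hypothetical, header names are assumed to be lower-cased already, and a production check would likely use a constant-time comparison.

```typescript
type HeaderMap = Record<string, string | undefined>;

// Returns true when the call may proceed. `expected` plays the role of MCP_AUTH_TOKEN.
function isAuthorized(headers: HeaderMap, expected: string | undefined): boolean {
  if (!expected) return true; // no token configured: the server is open

  // Accept either "Authorization: Bearer <token>" or "x-api-key: <token>".
  const auth = headers["authorization"];
  const bearer =
    auth !== undefined && auth.startsWith("Bearer ") ? auth.slice("Bearer ".length) : undefined;
  return bearer === expected || headers["x-api-key"] === expected;
}
```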
Add the /api/mcp URL as a custom MCP integration in your client. Tools exposed:
| Tool | Purpose |
|---|---|
| `usage_guide` | Returns a short how-to with examples. Call this first if unsure. |
| `enhance_prompt` | Optimize an existing prompt for a target model (optional reference images). |
| `generate_prompts` | Full pipeline (vision → brief → prompts). `briefOnly: true` returns just the creative brief (Art Direction). |
| `revise_prompt` | Refine a single prompt via a note and/or fix category. |
| `reformat_prompt` | Rewrite a prompt from one model's format to another. |
The server also advertises instructions (a short summary) that compatible clients surface to the agent automatically.
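MCP clients normally build tool calls for you, but it can help to see the underlying message. MCP uses JSON-RPC 2.0, and a tool invocation is a `tools/call` request. The sketch below builds one for `generate_prompts`. Only `briefOnly` is documented in this README; the `description` and `targetModel` argument names and the model id are hypothetical, so call `usage_guide` for the real parameter list. A real session also starts with an initialization handshake, which is omitted here.

```typescript
// JSON-RPC 2.0 envelope for an MCP tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// `briefOnly` comes from the tool description; the other argument names are hypothetical.
const request = buildToolCall(1, "generate_prompts", {
  description: "moody 1970s editorial portrait",
  targetModel: "flux-2-klein-9b",
  briefOnly: true,
});

// The serialized body is what would be POSTed to the /api/mcp endpoint.
const body = JSON.stringify(request);
```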
PromptEnhancer plugs into Langdock as a remote MCP integration, so you can call it from Chat, Agents, and Workflows.
- In Langdock, go to Integrations → Connect remote MCP.
- Server URL: `https://<agency-deployment>/api/mcp` (Streamable HTTP). Use `/api/sse` if you need the SSE transport.
- Authentication: choose API Key and paste the token (the value of `MCP_AUTH_TOKEN`). Langdock formats the header automatically.
- Test the connection, then select the tools to import (`usage_guide`, `generate_prompts`, `enhance_prompt`, `revise_prompt`, `reformat_prompt`).
- Attach the tools to an Agent, call them in Chat, or add them as Action nodes in a Workflow.
Once connected, ask an agent things like "generate 4 cinematic Flux prompts for a moody 1970s editorial portrait" — it calls generate_prompts and returns ready-to-use prompts. Call usage_guide any time for the full list of tools, modes, and supported models. On the agency deployment the whole pipeline runs on Langdock's own models, billed to the agency.
The MCP tools share the same logic as the REST routes via src/lib/services.ts. Reference
images should be passed as public URLs where possible (base64 is supported but large over MCP).
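To see why public URLs are preferred over base64, it helps to put a number on the overhead. Base64 turns every 3 bytes into 4 characters, so an inline image grows by about a third before it is even wrapped in a JSON message. The helper name below is illustrative.

```typescript
// Length in characters of the padded base64 encoding of `byteLength` bytes.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

// A 2,000,000-byte image becomes 2,666,668 characters of payload,
// while a public URL for the same image is typically well under 200 characters.
const inlineSize = base64Length(2000000);
```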
Test locally with the MCP Inspector:
```bash
npx @modelcontextprotocol/inspector
# connect to http://localhost:3000/api/mcp (Streamable HTTP)
```

| Variable | Required | Description |
|---|---|---|
| `OPENROUTER_API_KEY` | Yes | Your OpenRouter API key, stored server-side only |
| `NEXT_PUBLIC_SITE_URL` | No | Production URL (defaults to `http://localhost:3000`) |
When `APP_TENANT` is set to a non-public value, the app runs in a hybrid setup: text generation routes through Langdock (which keeps agency billing for the bulk of LLM spend), and vision routes through OpenRouter (Langdock's API does not accept image inputs).
| Variable | Required | Description |
|---|---|---|
| `APP_TENANT` | Yes | Agency identifier (e.g. `WIN`). Any non-empty value other than `public`/`openrouter` activates the agency path |
| `LANGDOCK_API_KEY` | Yes | Langdock workspace API key, used for text generation |
| `OPENROUTER_API_KEY` | Yes | Required even on the agency tenant; used for vision. Provider resolution fails fast if missing |
| `LANGDOCK_REGION` | No | Defaults to `eu` |
| `LANGDOCK_TEXT_MODEL` | No | Defaults to `gemini-2.5-flash` |
| `LANGDOCK_TEXT_FALLBACK` | No | Optional Langdock fallback model |
| `MCP_AUTH_TOKEN` | No | Bearer token guarding `/api/mcp`; recommended for agency deploys |
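The tenant rules in the two tables can be summarized as a single resolution function. The sketch below is an interpretation of those rules, not the project's provider code: `resolveProviders`, `requireVar`, and the `ProviderConfig` shape are hypothetical, and the case-sensitive comparison of `APP_TENANT` is an assumption.

```typescript
type Env = Record<string, string | undefined>;

interface ProviderConfig {
  tenant: "public" | "agency";
  vision: { provider: "openrouter"; apiKey: string };
  text:
    | { provider: "openrouter"; apiKey: string }
    | { provider: "langdock"; apiKey: string; region: string; model: string; fallback: string | undefined };
}

// Fails fast with a clear message when a required variable is missing or empty.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (!value) throw new Error(`Missing required environment variable: ${name}`);
  return value;
}

function resolveProviders(env: Env): ProviderConfig {
  const tenant = (env["APP_TENANT"] ?? "").trim();
  // Any non-empty value other than "public" or "openrouter" activates the agency path.
  const isAgency = tenant !== "" && tenant !== "public" && tenant !== "openrouter";

  // Vision always goes through OpenRouter, so its key is required on both paths.
  const openrouterKey = requireVar(env, "OPENROUTER_API_KEY");
  const vision = { provider: "openrouter" as const, apiKey: openrouterKey };

  if (!isAgency) {
    return { tenant: "public", vision, text: { provider: "openrouter", apiKey: openrouterKey } };
  }
  return {
    tenant: "agency",
    vision,
    text: {
      provider: "langdock",
      apiKey: requireVar(env, "LANGDOCK_API_KEY"),
      region: env["LANGDOCK_REGION"] || "eu",
      model: env["LANGDOCK_TEXT_MODEL"] || "gemini-2.5-flash",
      fallback: env["LANGDOCK_TEXT_FALLBACK"],
    },
  };
}
```

In a Next.js server context this would be called with `process.env`; passing the environment in as a plain object keeps the function easy to test.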
MIT