PromptEnhancer

AI prompt engineering for image and video production. By Joost Helfers.

Next.js TypeScript Tailwind


What it does

PromptEnhancer generates model-optimized prompts for commercial AI image and video production. It runs a three-step pipeline via OpenRouter:

  1. Vision Analysis — An AI vision model reads your reference images and extracts color palette, lighting, texture, and emotional tone.
  2. Creative Brief — A planning model develops a production brief: creative vision, visual metaphor, shot diversity, and color anchors.
  3. Prompt Derivation — Prompts are derived from the brief, formatted for the target model's specific strengths and syntax.
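
The three steps above can be sketched as a small orchestration function. This is an illustration only, not the project's actual code: the type names, field names, and the `Stages` interface are hypothetical, and skipping the vision step when no images are supplied is an assumption. The stages are injected so the flow can be exercised without any network access.

```typescript
// Hypothetical shapes; the fields mirror the wording above, not the real types.
interface VisionAnalysis {
  colorPalette: string[];
  lighting: string;
  texture: string;
  emotionalTone: string;
}

interface CreativeBrief {
  creativeVision: string;
  visualMetaphor: string;
  shots: string[]; // one entry per planned shot
  colorAnchors: string[];
}

// Each stage is injected, so real model calls can be swapped for stubs.
interface Stages {
  analyzeImages(imageUrls: string[]): Promise<VisionAnalysis>;
  writeBrief(concept: string, vision: VisionAnalysis | null, count: number): Promise<CreativeBrief>;
  derivePrompts(brief: CreativeBrief, targetModel: string): Promise<string[]>;
}

interface PipelineInput {
  concept: string;
  imageUrls: string[];
  targetModel: string;
  count: number;
}

interface PipelineResult {
  vision: VisionAnalysis | null;
  brief: CreativeBrief;
  prompts: string[];
}

async function runPipeline(input: PipelineInput, stages: Stages): Promise<PipelineResult> {
  // Step 1: vision analysis (assumed to be skipped when there are no images).
  const vision = input.imageUrls.length > 0 ? await stages.analyzeImages(input.imageUrls) : null;
  // Step 2: creative brief, informed by the vision analysis when present.
  const brief = await stages.writeBrief(input.concept, vision, input.count);
  // Step 3: prompts derived from the brief for the target model.
  const prompts = await stages.derivePrompts(brief, input.targetModel);
  return { vision, brief, prompts };
}
```

Because every later step only sees the output of the previous one, a retry can in principle resume from a brief that was already produced instead of starting over.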

Three modes

The UI is organized around two choices: what you want to do and what you're making (Image / Image edit / Video).

  • Create prompts — The full pipeline. Describe a concept and/or upload reference images, and generate a diverse set of model-specific prompts. Choose Image (text-to-image), Image edit (image-to-image, needs a reference image), or Video.
  • Enhance a prompt — Paste an existing prompt and optimize it for the selected model. The enhancer restructures, expands, and adapts it — it doesn't generate from scratch.
  • Develop a brief — Shape a creative brief and shot list first (creative vision, visual metaphor, shot diversity, color anchors), then generate image or video prompts from it.

Supported models

| Model | Type | Description |
|---|---|---|
| Z-Image | Image | Default. Alibaba's 6B photorealism + text-rendering model. Natural-language, positive-only prompts (Turbo runs at CFG 0). |
| Flux 2 Klein 9B | Image | Best for cinematic stills. Keep prompts concise (50-100 words). |
| NanoBanana 2 | Image | Fast and flexible. Up to 14 reference images with character consistency. |
| Gemini Omni Flash | Video | Google's multimodal video model. Conversational prompts, iterative editing, physics-aware. |
| Veo 3.1 | Video | Google video. Structured scenes with camera, dialogue, and audio. |
| Kling v3 | Video | Multi-shot video with character labels and temporal markers. |
| Kling o3 | Video | Enhanced Kling with deeper scene understanding for complex sequences. |
| LTX-Video 2.3 | Video | High-resolution video (up to 4K). Flowing present-tense with audio. |
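
To show how per-model guidance like this could be encoded, here is a minimal sketch of a profile record with a word-count check. The `ModelProfile` shape and the `lengthWarning` helper are hypothetical; the real definitions live in src/lib/model-profiles.ts and may look quite different. Only the values taken from the table (positive-only prompts for Z-Image, 50-100 words for Flux 2 Klein 9B) come from this README.

```typescript
// Hypothetical profile shape, for illustration only.
interface ModelProfile {
  name: string;
  type: "image" | "video";
  positiveOnly?: boolean; // true when the model takes no negative prompt
  wordRange?: [number, number]; // suggested [min, max] prompt length in words
}

const PROFILES: ModelProfile[] = [
  { name: "Z-Image", type: "image", positiveOnly: true },
  { name: "Flux 2 Klein 9B", type: "image", wordRange: [50, 100] },
  { name: "Veo 3.1", type: "video" },
];

function countWords(prompt: string): number {
  const trimmed = prompt.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Returns a human-readable warning, or null when the prompt fits the profile
// (or the profile has no suggested length).
function lengthWarning(prompt: string, profile: ModelProfile): string | null {
  if (!profile.wordRange) return null;
  const [min, max] = profile.wordRange;
  const words = countWords(prompt);
  if (words < min) return `${profile.name}: ${words} words, below the suggested ${min}`;
  if (words > max) return `${profile.name}: ${words} words, above the suggested ${max}`;
  return null;
}
```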

Quick start

Prerequisites

Node.js with npm, and an OpenRouter API key (from openrouter.ai/keys).

1. Clone and install

git clone https://github.com/joosthel/PromptEnhancer.git
cd PromptEnhancer
npm install

2. Add your API key

Create a .env.local file in the project root:

echo 'OPENROUTER_API_KEY=sk-or-v1-...' > .env.local

Replace sk-or-v1-... with your actual key from openrouter.ai/keys.

3. Run

macOS — double-click start.command, or run from terminal:

./start.command

Windows — double-click start.bat, or from a terminal:

start.bat

Both scripts install dependencies (if needed), check for your API key, start the dev server, and open your browser at http://localhost:3000.

You can also start manually:

npm run dev

Testing

Tests run on Vitest with React Testing Library (jsdom).

npm test            # run once
npm run test:watch  # watch mode

  • Pure-logic tests live next to the code, e.g. src/lib/__tests__/.
  • Component tests render real components in jsdom, e.g. src/app/__tests__/start-new.test.tsx.
  • Shared setup (DOM matchers, in-memory localStorage) is in vitest.setup.ts.

Usage

  1. Choose a mode — Select Create prompts, Enhance a prompt, or Develop a brief from the left panel (and, for the first two, what you're making: Image / Image edit / Video).
  2. Select a target model — Pick the AI model you're generating prompts for.
  3. Add reference images (optional) — Drag and drop, click to browse, paste from clipboard, or enter a URL. Click thumbnails to label images (style reference, subject, face, background).
  4. Describe your concept — Fill in the text area. All modes accept freeform descriptions.
  5. Set prompt count (Create prompts / Develop a brief) — Choose 1-6 prompts for diversity.
  6. Generate — The pipeline runs server-side. Reference images are cached, so re-generating with the same images skips the vision step.
  7. Refine — Use Fix buttons on individual prompt cards to iterate without re-running the full pipeline (Hands, Lighting, Too AI, Mood, or custom notes), or "Polish all" to run an art-direction pass over the whole set. If a run ever fails, hit Retry — it reuses the brief already produced and finishes fast.
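
The caching behaviour in step 6 can be pictured as a lookup keyed by a fingerprint of the reference images. The sketch below is an assumption about the general technique, not the project's implementation: `fnv1a`, `fingerprintImages`, and `VisionCache` are hypothetical names, the fingerprint is deliberately order-independent (which the real app may or may not be), and the compute step is synchronous for brevity even though a real vision call would be asynchronous.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units. Non-cryptographic;
// chosen only to keep the sketch dependency-free.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// Order-independent fingerprint of a set of image identifiers
// (URLs or data strings). Sorting a copy leaves the caller's array untouched.
function fingerprintImages(images: string[]): string {
  return fnv1a(images.slice().sort().join("\n"));
}

class VisionCache<T> {
  private entries = new Map<string, T>();

  // Returns the cached analysis for this image set, or computes and stores it.
  getOrCompute(images: string[], compute: () => T): { value: T; cached: boolean } {
    const key = fingerprintImages(images);
    if (this.entries.has(key)) {
      return { value: this.entries.get(key) as T, cached: true };
    }
    const value = compute();
    this.entries.set(key, value);
    return { value, cached: false };
  }
}
```

With this shape, re-generating with the same images returns `cached: true` and never invokes the vision step, while any change to the image set produces a new key.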

Stack

| Layer | Tech |
|---|---|
| Framework | Next.js 16 (App Router) |
| Language | TypeScript 5 |
| Styling | Tailwind CSS 4 |
| AI Gateway | OpenRouter (public) · Langdock (agency tenant) |
| Vision model | google/gemini-3.5-flash (fallback google/gemini-2.5-flash); always via OpenRouter, even on the agency tenant |
| Text model (public) | openai/gpt-4o-mini (fallback openai/gpt-4.1-nano); fast, reliable structured JSON, non-reasoning |
| Text model (agency) | gemini-2.5-flash via Langdock (env-configurable) |
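
The primary/fallback pairing in the table can be sketched as a small retry helper. This is an illustration under assumptions: `withFallback` and `ModelCall` are hypothetical names, and falling back on any thrown error is a guess, since this README does not say which failures trigger the fallback.

```typescript
// A model call takes a model id and resolves with some result.
type ModelCall<T> = (model: string) => Promise<T>;

// Model ids as listed in the stack table (public tenant, text).
const TEXT_MODELS = ["openai/gpt-4o-mini", "openai/gpt-4.1-nano"];

// Tries each model in order and returns the first success.
// If every model fails, the last error is rethrown.
async function withFallback<T>(
  models: string[],
  call: ModelCall<T>
): Promise<{ model: string; result: T }> {
  let lastError: unknown = new Error("no models configured");
  for (const model of models) {
    try {
      return { model, result: await call(model) };
    } catch (error) {
      lastError = error; // remember the failure and try the next model
    }
  }
  throw lastError;
}
```

A caller would pass `TEXT_MODELS` together with a function that performs the actual request; the returned `model` field records which one answered.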

No database. No user auth (the MCP endpoint can optionally be guarded by MCP_AUTH_TOKEN). Runtime deps are just the Next.js scaffold + zod (schema validation) + the MCP SDK; vitest is a dev-only test dependency.


Project structure

src/
  app/
    page.tsx                  # Main page — state, mode logic, generation handlers
    layout.tsx                # Root layout, metadata, fonts
    globals.css               # Global styles, accessibility rules
    api/
      generate-stream/route.ts # Primary pipeline (vision → brief → prompts), streamed via SSE with a heartbeat
      generate/route.ts        # Non-streaming pipeline (shared with the MCP server)
      enhance/route.ts         # Single-step prompt enhancement
      revise/route.ts          # Single-card fix/revision
      reformat/route.ts        # Cross-model prompt reformatting
      refine/route.ts          # Opt-in art-direction "Polish" pass
      [transport]/route.ts     # MCP server (/api/mcp, /api/sse)
  components/
    ModeSelector.tsx          # Three app modes + sub-mode chips
    ModelSelector.tsx         # Target model cards with descriptions
    ImageUploader.tsx         # Drag-drop, paste, URL input, image labeling
    InputForm.tsx             # Description textarea + prompt count
    PromptList.tsx            # Results: brief, visual analysis, prompt cards
    PromptCard.tsx            # Individual prompt with copy, fix, reformat
    FixToolbar.tsx            # Fix category chips + custom input
    FixHistory.tsx            # Prompt revision history
    ModelChips.tsx            # Cross-model reformat chips per card
    BatchActions.tsx          # Select all, batch fix operations
    LoadingAnimation.tsx      # Dot-ring loading with phase labels
    CreditPopup.tsx           # API credit acknowledgment popup
    HelpModal.tsx             # How-it-works documentation modal
  lib/
    services.ts               # Shared orchestration (enhance/generate/revise/reformat) — used by REST routes + MCP
    openrouter.ts             # Typed fetch wrapper for OpenRouter API
    system-prompt.ts          # Vision prompt, types, shared constants
    prompt-engine.ts          # System prompt + user message builders
    model-profiles.ts         # Model definitions, modes, fix categories
    image-utils.ts            # Canvas resize, clipboard, URL validation, fingerprinting
    use-focus-trap.ts         # Shared focus trap hook for modals
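
The generate-stream route is described as streaming over SSE with a heartbeat. The following sketch shows what that framing generally looks like on the wire, using the standard Server-Sent Events format, in which a line starting with a colon is a comment that clients ignore. The event names `phase` and `result`, the payload shapes, and the helper names are hypothetical, not taken from the project.

```typescript
// Formats one SSE event. JSON.stringify never emits raw newlines,
// so the payload always fits on a single "data:" line.
function sseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// A comment frame: keeps the connection alive without delivering an event.
const SSE_HEARTBEAT = ": heartbeat\n\n";

interface ParsedEvent {
  event: string;
  data: unknown;
}

// Minimal client-side parser for a complete buffer of frames.
// Comment lines (heartbeats) are skipped; blocks without data are dropped.
function parseSse(raw: string): ParsedEvent[] {
  const events: ParsedEvent[] = [];
  for (const block of raw.split("\n\n")) {
    let event = "message"; // default event name per the SSE format
    const dataLines: string[] = [];
    for (const line of block.split("\n")) {
      if (line.startsWith(":")) continue;
      if (line.startsWith("event: ")) event = line.slice("event: ".length);
      else if (line.startsWith("data: ")) dataLines.push(line.slice("data: ".length));
    }
    if (dataLines.length > 0) {
      events.push({ event, data: JSON.parse(dataLines.join("\n")) });
    }
  }
  return events;
}
```

A server would write `SSE_HEARTBEAT` on a timer while a slow model call is in flight, so proxies and browsers do not close an otherwise silent connection.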

MCP server

The same deployment also exposes an MCP server, so the prompt engine can be called from MCP clients (Langdock, Claude, Cursor, …) — not just the web UI.

  • Endpoint (Streamable HTTP): https://<deployment>/api/mcp. Any custom domain on this project + /api/mcp works identically.
  • Legacy SSE fallback: /api/sse
  • Auth: set MCP_AUTH_TOKEN to require Authorization: Bearer <token> (or x-api-key) on tool calls. If unset, the server is open (bounded by the per-IP rate limit + provider spend cap).
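
The auth rule above (open when MCP_AUTH_TOKEN is unset, otherwise a bearer token or x-api-key must match) can be sketched as a pure function. This is an assumption about the logic, not the server's code: `isAuthorized` and `HeaderMap` are hypothetical names, and header names are assumed to be lower-cased already.

```typescript
// Request headers with lower-cased names (an assumption of this sketch).
type HeaderMap = Record<string, string | undefined>;

function isAuthorized(headers: HeaderMap, expectedToken: string | undefined): boolean {
  // MCP_AUTH_TOKEN unset or empty: the server is open.
  if (!expectedToken) return true;

  // Prefer "Authorization: Bearer <token>", then fall back to "x-api-key".
  const authorization = headers["authorization"] ?? "";
  const bearer = /^Bearer\s+(.+)$/i.exec(authorization);
  const presented = bearer ? bearer[1].trim() : headers["x-api-key"];

  // Plain equality keeps the sketch short; production code would
  // typically use a constant-time comparison.
  return presented === expectedToken;
}
```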

Add the /api/mcp URL as a custom MCP integration in your client. Tools exposed:

| Tool | Purpose |
|---|---|
| usage_guide | Returns a short how-to with examples. Call this first if unsure. |
| enhance_prompt | Optimize an existing prompt for a target model (optional reference images). |
| generate_prompts | Full pipeline (vision → brief → prompts). briefOnly: true returns just the creative brief (the Develop a brief mode). |
| revise_prompt | Refine a single prompt via a note and/or fix category. |
| reformat_prompt | Rewrite a prompt from one model's format to another. |
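
For orientation, this is roughly what a call to one of these tools looks like at the protocol level. MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`. The argument names `description`, `targetModel`, and `count` below are hypothetical (only `briefOnly` is named in this README), so call usage_guide for the real schema. A real session also needs the MCP initialize handshake first, which client SDKs handle for you.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const request = buildToolCall(1, "generate_prompts", {
  description: "moody 1970s editorial portrait", // hypothetical argument name
  targetModel: "Flux 2 Klein 9B", // hypothetical argument name
  count: 4, // hypothetical argument name
  briefOnly: false, // named in the tools table
});

// The serialized body a client would POST to /api/mcp.
const body = JSON.stringify(request);
```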

The server also advertises instructions (a short summary) that compatible clients surface to the agent automatically.

Use it in Langdock (for colleagues)

PromptEnhancer plugs into Langdock as a remote MCP integration, so you can call it from Chat, Agents, and Workflows.

  1. In Langdock, go to Integrations → Connect remote MCP.
  2. Server URL: https://<agency-deployment>/api/mcp (Streamable HTTP). Use /api/sse if you need the SSE transport.
  3. Authentication: choose API Key and paste the token (the value of MCP_AUTH_TOKEN). Langdock formats the header automatically.
  4. Test the connection, then select the tools to import (usage_guide, generate_prompts, enhance_prompt, revise_prompt, reformat_prompt).
  5. Attach the tools to an Agent, call them in Chat, or add them as Action nodes in a Workflow.

Once connected, ask an agent things like "generate 4 cinematic Flux prompts for a moody 1970s editorial portrait" — it calls generate_prompts and returns ready-to-use prompts. Call usage_guide any time for the full list of tools, modes, and supported models. On the agency deployment the whole pipeline runs on Langdock's own models, billed to the agency.

The MCP tools share the same logic as the REST routes via src/lib/services.ts. Pass reference images as public URLs where possible; base64 is supported, but the payloads are large over MCP.

Test locally with the MCP Inspector:

npx @modelcontextprotocol/inspector
# connect to http://localhost:3000/api/mcp (Streamable HTTP)

Environment variables

Public tenant (default)

| Variable | Required | Description |
|---|---|---|
| OPENROUTER_API_KEY | Yes | Your OpenRouter API key (stored server-side only) |
| NEXT_PUBLIC_SITE_URL | No | Production URL (defaults to http://localhost:3000) |

Agency tenant (Langdock + OpenRouter hybrid)

When APP_TENANT is set to a non-public value, the app runs in hybrid mode. Text generation routes through Langdock, which keeps the bulk of LLM spend on agency billing; vision routes through OpenRouter, because Langdock's API does not accept image inputs.

| Variable | Required | Description |
|---|---|---|
| APP_TENANT | Yes | Agency identifier (e.g. WIN). Any non-empty value other than public/openrouter activates the agency path |
| LANGDOCK_API_KEY | Yes | Langdock workspace API key (used for text generation) |
| OPENROUTER_API_KEY | Yes | Required even on the agency tenant (used for vision). Provider resolution fails fast if missing |
| LANGDOCK_REGION | No | Defaults to eu |
| LANGDOCK_TEXT_MODEL | No | Defaults to gemini-2.5-flash |
| LANGDOCK_TEXT_FALLBACK | No | Optional Langdock fallback model |
| MCP_AUTH_TOKEN | No | Bearer token guarding /api/mcp; recommended for agency deploys |
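
The rules in the two tables can be read as a single resolution function. The sketch below follows the documented behaviour (tenant detection, required keys, defaults, fail-fast on a missing key), but `resolveProvider`, `ProviderConfig`, and the field names are hypothetical, and treating the public/openrouter values as exact lower-case matches is an assumption. The environment is passed in as a plain object so the logic can be tested without touching the real process environment.

```typescript
type Env = Record<string, string | undefined>;

type ProviderConfig =
  | { tenant: "public"; openRouterKey: string }
  | {
      tenant: "agency";
      agencyId: string;
      openRouterKey: string; // vision always goes through OpenRouter
      langdockKey: string; // text generation goes through Langdock
      langdockRegion: string;
      langdockTextModel: string;
      langdockTextFallback: string | undefined;
    };

function resolveProvider(env: Env): ProviderConfig {
  const tenant = (env["APP_TENANT"] ?? "").trim();

  // Required on both tenants; fail fast rather than at the first vision call.
  const openRouterKey = env["OPENROUTER_API_KEY"];
  if (!openRouterKey) {
    throw new Error("OPENROUTER_API_KEY is required (vision always uses OpenRouter)");
  }

  // Empty, "public", or "openrouter" selects the default public path.
  if (tenant === "" || tenant === "public" || tenant === "openrouter") {
    return { tenant: "public", openRouterKey };
  }

  const langdockKey = env["LANGDOCK_API_KEY"];
  if (!langdockKey) {
    throw new Error("LANGDOCK_API_KEY is required on the agency tenant");
  }

  return {
    tenant: "agency",
    agencyId: tenant,
    openRouterKey,
    langdockKey,
    langdockRegion: env["LANGDOCK_REGION"] ?? "eu",
    langdockTextModel: env["LANGDOCK_TEXT_MODEL"] ?? "gemini-2.5-flash",
    langdockTextFallback: env["LANGDOCK_TEXT_FALLBACK"],
  };
}
```

In a Next.js server context this would typically be called once with `process.env`, so a misconfigured deployment fails at startup instead of midway through a generation.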

License

MIT
