GitHub - abdrahman-dev/RatMD: Convert bloated PDFs into clean, AI-ready Markdown. Reduce LLM token usage with client-side parsing and tiktoken-accurate estimation.

██████╗  █████╗ ████████╗███╗   ███╗██████╗
██╔══██╗██╔══██╗╚══██╔══╝████╗ ████║██╔══██╗
██████╔╝███████║   ██║   ██╔████╔██║██║  ██║
██╔══██╗██╔══██║   ██║   ██║╚██╔╝██║██║  ██║
██║  ██║██║  ██║   ██║   ██║ ╚═╝ ██║██████╔╝
╚═╝  ╚═╝╚═╝  ╚═╝   ╚═╝   ╚═╝     ╚═╝╚═════╝

PDF to Markdown, optimized for AI — strip noise, preserve structure, and reduce token count for LLM ingestion.

✨ What is RatMD

RatMD converts PDF documents into clean, token-efficient Markdown designed for LLM workflows. It runs entirely in your browser — no uploads, no servers, no privacy leaks. The parser extracts text from PDFs using pdfjs-dist, groups content into structured lines, detects headings by font size ratios, and outputs Markdown that preserves document hierarchy.
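
The pipeline described above (group positioned text into lines, then pick headings by font size ratio) can be illustrated with a minimal sketch. This is not RatMD's actual parser: the `TextFragment` shape, the 2-unit baseline tolerance, and the ratio thresholds (1.8, 1.4, 1.15) are all assumptions chosen for illustration, and the mapping from pdfjs-dist text items to fragments is left out.

```typescript
// Hypothetical input shape: one positioned piece of text from a PDF page.
interface TextFragment {
  text: string;
  x: number;        // horizontal position
  y: number;        // baseline position (PDF coordinates grow upward)
  fontSize: number; // rendered font size
}

interface Line {
  text: string;
  fontSize: number;
}

// Group fragments whose baselines lie within `tolerance` of each other.
function groupIntoLines(fragments: TextFragment[], tolerance = 2): Line[] {
  const sorted = [...fragments].sort((a, b) => b.y - a.y); // top of page first
  const groups: TextFragment[][] = [];
  for (const frag of sorted) {
    const current = groups[groups.length - 1];
    if (current && Math.abs(current[0].y - frag.y) <= tolerance) {
      current.push(frag);
    } else {
      groups.push([frag]);
    }
  }
  return groups.map((group) => {
    const ordered = [...group].sort((a, b) => a.x - b.x); // left to right
    return {
      text: ordered.map((f) => f.text).join(" ").replace(/\s+/g, " ").trim(),
      fontSize: Math.max(...ordered.map((f) => f.fontSize)),
    };
  });
}

// Treat the font size that carries the most characters as the body size.
function bodyFontSize(lines: Line[]): number {
  const weight: Record<string, number> = {};
  for (const line of lines) {
    const key = String(line.fontSize);
    weight[key] = (weight[key] ?? 0) + line.text.length;
  }
  let best = 0;
  let bestWeight = -1;
  for (const key of Object.keys(weight)) {
    if (weight[key] > bestWeight) {
      best = Number(key);
      bestWeight = weight[key];
    }
  }
  return best;
}

// Map a size ratio to a heading level; 0 means plain paragraph text.
// The thresholds are illustrative assumptions, not RatMD's values.
function headingLevel(fontSize: number, body: number): number {
  const ratio = fontSize / body;
  if (ratio >= 1.8) return 1;
  if (ratio >= 1.4) return 2;
  if (ratio >= 1.15) return 3;
  return 0;
}

function toMarkdown(fragments: TextFragment[]): string {
  const lines = groupIntoLines(fragments).filter((l) => l.text.length > 0);
  const body = bodyFontSize(lines);
  return lines
    .map((l) => {
      const level = headingLevel(l.fontSize, body);
      return level > 0 ? `${"###".slice(0, level)} ${l.text}` : l.text;
    })
    .join("\n\n");
}
```

Because levels come from ratios against the dominant body size rather than absolute sizes, the same thresholds apply to a 10pt and a 12pt document, but a PDF that styles headings with bold text at body size would not be detected.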

Token savings are real but vary by document. Heavily formatted PDFs with repeated headers, footers, and whitespace typically see 30–60% fewer tokens. Plain academic papers with minimal formatting see smaller gains. The estimator uses OpenAI's cl100k_base encoding (via js-tiktoken) for accurate counts — not a heuristic.
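
As a rough illustration of how a savings figure like the one above can be derived, the sketch below compares token counts before and after conversion. The `TokenCounter` type, `SavingsEstimate` shape, and `estimateSavings` function are hypothetical names, not RatMD's API. The counter is injected so the arithmetic stays independent of the tokenizer; in the browser it could wrap js-tiktoken, as noted in the comment.

```typescript
// Any function that maps text to a token count. With js-tiktoken this could be:
//   import { getEncoding } from "js-tiktoken";
//   const enc = getEncoding("cl100k_base");
//   const countTokens: TokenCounter = (text) => enc.encode(text).length;
type TokenCounter = (text: string) => number;

// Hypothetical result shape for illustration.
interface SavingsEstimate {
  originalTokens: number;
  optimizedTokens: number;
  savedTokens: number;
  savedPercent: number; // 0 to 100, rounded to one decimal place
}

function estimateSavings(
  rawText: string,
  markdown: string,
  countTokens: TokenCounter
): SavingsEstimate {
  const originalTokens = countTokens(rawText);
  const optimizedTokens = countTokens(markdown);
  const savedTokens = originalTokens - optimizedTokens;
  // Guard against division by zero for empty documents.
  const savedPercent =
    originalTokens === 0
      ? 0
      : Math.round((savedTokens / originalTokens) * 1000) / 10;
  return { originalTokens, optimizedTokens, savedTokens, savedPercent };
}
```

Note that `savedPercent` can be negative if the Markdown output happens to tokenize to more tokens than the raw text, which is one reason the gains are reported as document-dependent.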

🚀 Features

PDF parsing — text extraction via pdfjs-dist v5 with line grouping and heading detection
Token estimation — real cl100k_base encoding via js-tiktoken, not approximate math
Light/dark theme — warm parchment light mode, dark-first default, persisted in localStorage
Mobile navigation — hamburger menu with animated dropdown on screens < 768px
FAQ page — 18 questions across 6 categories with accordion expand/collapse
Client-side privacy — all processing happens in the browser, zero server uploads
RAG-ready output — clean Markdown structured for vector databases and LLM context windows
Export — download .md file or copy to clipboard
Responsive design — full mobile support, floating pill navbar, container breakpoints
Framer Motion animations — scroll-triggered fade-ins, entrance sequences, pulse effects
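
The theme behavior listed above (dark by default, with the choice persisted in localStorage) could look roughly like the sketch below. It is an assumption-laden illustration rather than the project's `useTheme` hook: the storage key `ratmd-theme` and all function names are hypothetical, and the storage is passed in so the logic also runs outside a browser.

```typescript
type Theme = "dark" | "light";

// Minimal subset of the Web Storage API; window.localStorage satisfies it.
interface ThemeStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const THEME_KEY = "ratmd-theme"; // hypothetical storage key

// Dark-first: anything other than an explicit "light" resolves to dark.
function loadTheme(storage: ThemeStorage): Theme {
  return storage.getItem(THEME_KEY) === "light" ? "light" : "dark";
}

// Flip the stored theme and return the new value.
function toggleTheme(storage: ThemeStorage): Theme {
  const next: Theme = loadTheme(storage) === "dark" ? "light" : "dark";
  storage.setItem(THEME_KEY, next);
  return next;
}

// In-memory stand-in for localStorage, useful in tests or Node.
function createMemoryStorage(): ThemeStorage {
  const data: Record<string, string> = {};
  return {
    getItem: (key) => (key in data ? data[key] : null),
    setItem: (key, value) => {
      data[key] = value;
    },
  };
}
```

In a browser, `loadTheme(window.localStorage)` would return the persisted choice on page load.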

📦 Tech Stack

| Technology | Version | Purpose |
| --- | --- | --- |
| React | 19 | UI framework |
| TypeScript | 6 | Type safety |
| Vite | 8 | Bundler and dev server |
| TailwindCSS | 4 | Utility-first styling with `@theme` tokens |
| Framer Motion | 12 | Animation library |
| Zustand | 5 | State management |
| React Router | 7 | Client-side routing |
| pdfjs-dist | 5 | PDF text extraction |
| js-tiktoken | 1 | OpenAI `cl100k_base` token encoding |

📁 Project Structure

app/web/src/
├── app/
│   ├── layouts/         # RootLayout with header + footer + outlet
│   ├── router/          # React Router config (home, converter, docs, faq)
│   └── store/           # Zustand store (file state, conversion state)
├── components/
│   ├── animations/      # AnimatedElement (Framer Motion scroll-reveal wrapper)
│   ├── layout/          # Header (fixed navbar), Footer
│   ├── shared/          # Section wrapper component
│   └── ui/              # Button, Card, Badge, Container, LogoIcon, Logo
├── features/
│   ├── export/          # Download .md + clipboard copy
│   ├── markdown-preview/# Rendered Markdown output viewer
│   ├── parser/          # ParserPanel with animated stage progression
│   ├── token-estimator/ # Token comparison bars + detail view
│   └── upload/          # Drag-and-drop upload zone
├── hooks/               # useTheme (dark/light toggle with localStorage)
├── lib/
│   ├── constants/       # Routes, nav links, feature data, steps
│   ├── pdf/             # Real PDF parser (pdfjs-dist, line grouping, heading detection)
│   ├── tokenizer/       # Real token estimator (js-tiktoken cl100k_base)
│   └── utils/           # cn() helper, formatBytes, generateId
├── pages/
│   ├── converter/       # Full conversion workflow page
│   ├── docs/            # CLI reference + web guide + token explanation
│   ├── faq/             # 18-question FAQ with accordion
│   └── home/            # 7-section landing page (hero, demo, savings, features, etc.)
├── services/            # Parser service abstraction (future: swap for API)
├── styles/              # index.css — @theme tokens + light mode overrides + keyframes
├── types/               # TypeScript interfaces (ConversionResult, EstimationResult, etc.)
├── App.tsx
└── main.tsx

🛠 Getting Started

Prerequisites

Node.js 20+
npm 10+

Installation

cd app/web
npm install

Development

npm run dev
# Dev server runs at http://localhost:5173 (Vite's default port)

Build

npm run build
# Output in app/web/dist/

🐳 Docker

# From project root
docker compose up -d
# App is served at http://localhost:3000

The Docker image serves the built static app via Nginx. No backend required.

🚀 Deploy

Vercel (one-click)

1. Push the repository to GitHub
2. Import it as a new Vercel project, with app/web as the root directory
3. Vercel auto-detects Vite, so no extra configuration is needed
4. Deploy

Vercel (manual CLI)

cd app/web
npx vercel --prod

CI/CD

A GitHub Actions workflow is included at .github/workflows/deploy.yml. Configure three repository secrets:

VERCEL_TOKEN: from Vercel Account Tokens
VERCEL_ORG_ID: the orgId value in .vercel/project.json, which vercel link creates in the project directory
VERCEL_PROJECT_ID: the projectId value in the same file

⚠️ Known Limitations

Heading detection is heuristic-based — font size ratios determine heading levels. PDFs with non-standard sizing or inline formatting may produce incorrect hierarchy.
Token savings vary by document type — heavily formatted PDFs (whitespace, repeated headers, page numbers) see 30–60% reduction. Plain academic papers with minimal formatting see smaller gains.
Client-side processing limit — PDFs over 10MB may be slow or fail on low-end devices. The 10MB file cap reflects practical browser memory limits.
No image/table extraction — the current parser only extracts text. Images, tables, and complex layouts are not preserved.
Browser-only — no backend API or server-side parsing yet. CLI tools are planned.
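
The 10MB cap noted in the limitations above could be enforced before parsing begins, along the lines of the sketch below. The names (`FileLike`, `validatePdf`, `MAX_PDF_BYTES`) are hypothetical, and treating 10MB as 10 × 1024 × 1024 bytes is an assumption; the README does not state the exact constant.

```typescript
// Assumed interpretation of the 10MB cap.
const MAX_PDF_BYTES = 10 * 1024 * 1024;

// Structural subset of the browser File object (name, size, type).
interface FileLike {
  name: string;
  size: number; // bytes
  type: string; // MIME type, may be empty for some uploads
}

type ValidationResult = { ok: true } | { ok: false; reason: string };

function validatePdf(
  file: FileLike,
  maxBytes: number = MAX_PDF_BYTES
): ValidationResult {
  // Accept either the PDF MIME type or a .pdf extension, since some
  // browsers report an empty type for dragged files.
  const looksLikePdf =
    file.type === "application/pdf" || /\.pdf$/i.test(file.name);
  if (!looksLikePdf) {
    return { ok: false, reason: "Not a PDF file" };
  }
  if (file.size > maxBytes) {
    const mb = (file.size / (1024 * 1024)).toFixed(1);
    const limit = maxBytes / (1024 * 1024);
    return { ok: false, reason: `File is ${mb} MB; the limit is ${limit} MB` };
  }
  return { ok: true };
}
```

Rejecting oversized files up front gives a clear message instead of a slow or failed parse on low-end devices.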

🗺 Roadmap

Backend API — REST endpoint for server-side PDF conversion
Server-side parsing — offload heavy processing to a worker service
Auth & API keys — secure access for programmatic use
CLI tool — standalone binary for terminal workflows (ratmd convert file.pdf)
Batch processing — convert multiple PDFs in a single operation
Image extraction — preserve embedded images in output
