feat(llms): emit llms.txt, llms-full.txt, and per-page markdown #2

Merged: luangjokaj merged 6 commits into main from feat/llms-txt-support on May 8, 2026
@luangjokaj (Collaborator) commented May 8, 2026

Summary

  • Adds support for the llms.txt standard by emitting three artifacts into the generated Next.js app's public/ dir on every build / watch update
  • public/llms.txt is a curated markdown index of all docs pages, grouped by ## Section and ### Category, sorted by categoryOrder then order
  • public/llms-full.txt concatenates every page's raw MDX (JSX preserved) for one-shot LLM ingestion
  • public/<slug>.md is the per-page raw markdown, so agents can fetch a single page without parsing the React app
  • Mirrors the existing sitemap / robots pattern; regenerates on initial build, MDX add/change/delete, and config.json changes; falls back to relative URLs when no site URL is configured
  • Stale per-page .md files are cleaned up across runs via a small .doccupine-llms-manifest.json written to the output dir

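The stale-file cleanup in the last bullet can be sketched as a comparison between the previous run's manifest and the files the current run emits. This is a minimal illustration, not the PR's code: the manifest shape (`files`) and the helper names `findStaleFiles` and `buildManifest` are assumptions, and the real .doccupine-llms-manifest.json may be structured differently. Deleting the stale paths from public/ and writing the new manifest to disk are left out.

```typescript
// Hypothetical manifest shape; the real .doccupine-llms-manifest.json may differ.
interface LlmsManifest {
  files: string[]; // per-page .md paths, relative to public/
}

// Paths recorded by the previous run that the current run no longer emits.
// These are the candidates for deletion (renamed or removed pages).
function findStaleFiles(previous: LlmsManifest, currentFiles: string[]): string[] {
  const current = new Set(currentFiles);
  return previous.files.filter((file) => !current.has(file));
}

// Manifest to persist after the current run: de-duplicated and sorted
// so the file stays stable across runs.
function buildManifest(currentFiles: string[]): LlmsManifest {
  return { files: Array.from(new Set(currentFiles)).sort() };
}

const previous: LlmsManifest = {
  files: ["guides/setup.md", "index.md", "old-page.md"],
};
const emitted = ["index.md", "guides/setup.md", "guides/deploy.md"];

const stale = findStaleFiles(previous, emitted); // ["old-page.md"]
const manifest = buildManifest(emitted);
```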
luangjokaj added 6 commits May 8, 2026 12:20
Add support for the llms.txt standard (llmstxt.org) so the generated
Next.js app exposes a structured surface for LLM ingestion.

On every initial build and on MDX, fonts, public, or config changes,
the generator now writes three artifacts into the output app's public/:

- llms.txt: a curated markdown index of all docs pages, grouped by
  section and category, ordered by frontmatter.
- llms-full.txt: every page's raw MDX body concatenated for one-shot
  ingestion by long-context models.
- <slug>.md: per-page raw markdown for single-page fetches.

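As a rough sketch of how the grouped llms.txt body could be assembled: pages are sorted by categoryOrder then order, grouped under `## Section` and `### Category` headings, and linked with an absolute URL when a site URL is configured or a relative path otherwise. The `DocPage` shape, the helpers `pageUrl` and `renderIndex`, and the choice to link each entry to its `/<slug>.md` file are assumptions made for illustration; only categoryOrder and order are named in this PR.

```typescript
// Hypothetical page shape; only categoryOrder and order are named in the PR.
interface DocPage {
  slug: string;
  title: string;
  section: string;
  category: string;
  categoryOrder: number;
  order: number;
}

// Absolute URL when a site URL is configured, relative path otherwise.
function pageUrl(slug: string, siteUrl?: string): string {
  const path = `/${slug}.md`;
  return siteUrl ? `${siteUrl.replace(/\/+$/, "")}${path}` : path;
}

function renderIndex(pages: DocPage[], siteUrl?: string): string {
  // Sort a copy by categoryOrder, then by order within a category.
  const sorted = pages
    .slice()
    .sort((a, b) => a.categoryOrder - b.categoryOrder || a.order - b.order);

  // Map preserves insertion order, so groups appear in sorted order.
  const sections = new Map<string, Map<string, DocPage[]>>();
  for (const page of sorted) {
    const categories = sections.get(page.section) ?? new Map<string, DocPage[]>();
    sections.set(page.section, categories);
    const group = categories.get(page.category) ?? [];
    categories.set(page.category, group);
    group.push(page);
  }

  const lines: string[] = [];
  sections.forEach((categories, section) => {
    lines.push(`## ${section}`, "");
    categories.forEach((group, category) => {
      lines.push(`### ${category}`, "");
      for (const page of group) {
        lines.push(`- [${page.title}](${pageUrl(page.slug, siteUrl)})`);
      }
      lines.push("");
    });
  });
  return lines.join("\n");
}

const samplePages: DocPage[] = [
  { slug: "guides/deploy", title: "Deploy", section: "Docs", category: "Guides", categoryOrder: 2, order: 1 },
  { slug: "index", title: "Introduction", section: "Docs", category: "Getting Started", categoryOrder: 1, order: 1 },
  { slug: "install", title: "Install", section: "Docs", category: "Getting Started", categoryOrder: 1, order: 2 },
];

const relativeIndex = renderIndex(samplePages);
const absoluteIndex = renderIndex(samplePages, "https://docs.example.com/");
```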
URLs resolve against the configured site URL when available and fall
back to relative paths otherwise, mirroring the sitemap/robots flow.
Stale per-page .md files left over from renamed or deleted pages are
cleaned up across runs via a .doccupine-llms-manifest.json tracked in
the output dir.

Generated apps failed npm install with code 1 on pnpm 10+ because
pnpm blocks native dep build scripts (core-js, sharp, etc.) by
default. Ship a pnpm-workspace.yaml using the new allowBuilds schema
that explicitly disables those scripts so install proceeds silently.

Switch the install spawn to inherit stdio so future install errors
are visible instead of being swallowed by the unread pipe.
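The stdio change can be illustrated with Node's `child_process.spawn`. With the default `stdio: "pipe"`, the child's output goes into pipes that nothing reads, so errors never reach the terminal; `stdio: "inherit"` connects the child directly to the parent's streams. The helper names `installOptions` and `runInstall` are hypothetical, and this is a sketch of the idea rather than the generator's actual code.

```typescript
import { spawn } from "node:child_process";
import type { SpawnOptions } from "node:child_process";

// Options for the install child process. "inherit" wires the child's
// stdin/stdout/stderr to the parent's, so installer output is visible.
function installOptions(cwd: string): SpawnOptions {
  return { cwd, stdio: "inherit" };
}

// Runs `command args...` in cwd and resolves with the child's exit code
// (1 if the process was terminated by a signal and has no exit code).
function runInstall(command: string, args: string[], cwd: string): Promise<number> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, installOptions(cwd));
    child.on("error", reject); // e.g. the command was not found
    child.on("close", (code) => resolve(code ?? 1));
  });
}
```

A caller would await something like `runInstall("pnpm", ["install"], appDir)`, where both the command and `appDir` are placeholders for whatever the generator actually uses.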

Generated llms.txt, llms-full.txt, and per-page .md files are now
piped through prettier (using the user's resolved config from the
output directory) so they match .prettierrc on first write. Without
this, headings missed their trailing blank lines and code-fence
styles drifted, leaving every regenerated file dirty in git status
until manually reformatted.

Promotes prettier from devDependencies to dependencies. Also adds
a trailing newline to .doccupine-llms-manifest.json.

Running prettier on every per-page .md plus llms-full.txt on every
MDX keystroke made watch mode noticeably slow (full rebuild ~5-10s
on a typical docs site). Drop the prettier integration entirely and
instead extend the generated app's .prettierignore to cover the
artifacts we emit:

  public/llms.txt
  public/llms-full.txt
  public/**/*.md
  .doccupine-llms-manifest.json

These are generated, machine-consumed files; users shouldn't be
hand-editing them, and pnpm format shouldn't touch them. Per-page
writes are also parallelized via Promise.all. Build time on a 30+
page site drops from multi-second to ~0.6s end to end.

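A minimal sketch of parallelizing the per-page writes with `Promise.all`: every write is started before any of them is awaited, and the combined promise resolves once all have finished (or rejects on the first failure). The `PageOutput` shape and the injected `write` function are assumptions chosen to keep the example self-contained; in practice `write` would wrap a file-system call.

```typescript
// Hypothetical shapes for illustration only.
interface PageOutput {
  slug: string;
  markdown: string;
}
type WriteFile = (path: string, content: string) => Promise<void>;

// Starts all writes up front, then waits for the whole batch.
// Resolves with the written paths in the same order as `pages`.
async function writePagesInParallel(
  pages: PageOutput[],
  write: WriteFile,
): Promise<string[]> {
  return Promise.all(
    pages.map(async (page) => {
      const path = `public/${page.slug}.md`;
      await write(path, page.markdown);
      return path;
    }),
  );
}
```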
Moves prettier back to devDependencies.

Reformat embedded MDX in src/templates/mdx/**/*.mdx.ts so the content
that becomes the user's docs/ starter set is already prettier-stable.
Adds blank lines after headings, switches outer tilde fences to
backtick fences (with the right level for nesting), and normalizes
code-block indentation. With clean source, the per-page .md files
emitted by the llms generator are prettier-conformant on first write,
so we can drop the prettierignore additions added in 4aaf610.

@luangjokaj merged commit 90cb1e8 into main May 8, 2026
1 check passed
@luangjokaj deleted the feat/llms-txt-support branch May 8, 2026 19:29