feat(llms): emit llms.txt, llms-full.txt, and per-page markdown #2
Merged
Add support for the llms.txt standard (llmstxt.org) so the generated Next.js app exposes a structured surface for LLM ingestion. On every initial build and on MDX, fonts, public, or config changes, the generator now writes three artifacts into the output app's public/:

- llms.txt: a curated markdown index of all docs pages, grouped by section and category, ordered by frontmatter.
- llms-full.txt: every page's raw MDX body concatenated for one-shot ingestion by long-context models.
- <slug>.md: per-page raw markdown for single-page fetches.

URLs resolve against the configured site URL when available and fall back to relative paths otherwise, mirroring the sitemap/robots flow. Stale per-page .md files left over from renamed or deleted pages are cleaned up across runs via a .doccupine-llms-manifest.json tracked in the output dir.
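As a rough illustration of the index step, the sketch below groups pages by section and category and resolves each link against an optional site URL. The DocPage shape, the function names, and the exact sort keys are assumptions made for the example, not the generator's actual code.

```typescript
// Hypothetical page shape; field names are assumptions for illustration.
interface DocPage {
  slug: string;
  title: string;
  section: string;
  category: string;
  categoryOrder: number;
  order: number;
}

// Absolute URL when a site URL is configured, relative path otherwise.
function pageUrl(slug: string, siteUrl?: string): string {
  const path = `/${slug}.md`;
  return siteUrl ? siteUrl.replace(/\/+$/, "") + path : path;
}

function buildLlmsIndex(siteTitle: string, pages: DocPage[], siteUrl?: string): string {
  // Sort first so that Map insertion order below follows the frontmatter order.
  const sorted = [...pages].sort(
    (a, b) => a.categoryOrder - b.categoryOrder || a.order - b.order,
  );

  const sections = new Map<string, Map<string, DocPage[]>>();
  for (const page of sorted) {
    const categories = sections.get(page.section) ?? new Map<string, DocPage[]>();
    sections.set(page.section, categories);
    const list = categories.get(page.category) ?? [];
    categories.set(page.category, list);
    list.push(page);
  }

  const lines: string[] = [`# ${siteTitle}`, ""];
  for (const [section, categories] of sections) {
    lines.push(`## ${section}`, "");
    for (const [category, list] of categories) {
      lines.push(`### ${category}`, "");
      for (const page of list) {
        lines.push(`- [${page.title}](${pageUrl(page.slug, siteUrl)})`);
      }
      lines.push("");
    }
  }
  return lines.join("\n");
}
```

Passing no siteUrl yields links such as /guides/install.md, which matches the relative fallback described above.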
Generated apps failed npm install with code 1 on pnpm 10+ because pnpm blocks native dep build scripts (core-js, sharp, etc.) by default. Ship a pnpm-workspace.yaml using the new allowBuilds schema that explicitly disables those scripts so install proceeds silently. Switch the install spawn to inherit stdio so future install errors are visible instead of being swallowed by the unread pipe.
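The stdio change can be pictured with Node's child_process.spawn. This is a minimal sketch, and the helper name runInherited is hypothetical: with stdio set to "inherit", the child writes straight to the parent's terminal, so nothing sits unread in a pipe.

```typescript
import { spawn } from "node:child_process";

// Run a command with inherited stdio and resolve with its exit code.
// The helper name and signature are assumptions for this example.
function runInherited(command: string, args: string[], cwd: string): Promise<number> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { cwd, stdio: "inherit" });
    child.on("error", reject); // e.g. the command was not found
    child.on("close", (code) => resolve(code ?? 1)); // null means killed by a signal
  });
}
```

A caller might await runInherited with the package manager's install command and the output directory, then treat any non-zero code as a failure.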
Generated llms.txt, llms-full.txt, and per-page .md files are now piped through prettier (using the user's resolved config from the output directory) so they match .prettierrc on first write. Without this, headings missed their trailing blank lines and code-fence styles drifted, leaving every regenerated file dirty in git status until manually reformatted. Promotes prettier from devDependencies to dependencies. Also adds a trailing newline to .doccupine-llms-manifest.json.
Running prettier on every per-page .md plus llms-full.txt on every MDX keystroke made watch mode noticeably slow (full rebuild ~5-10s on a typical docs site). Drop the prettier integration entirely and instead extend the generated app's .prettierignore to cover the artifacts we emit:

    public/llms.txt
    public/llms-full.txt
    public/**/*.md
    .doccupine-llms-manifest.json

These are generated, machine-consumed files; users shouldn't be hand-editing them, and pnpm format shouldn't touch them. Per-page writes are also parallelized via Promise.all. Build time on a 30+ page site drops from multi-second to ~0.6s end to end. Moves prettier back to devDependencies.
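The parallel per-page write might look like the following sketch. The PageOutput shape and the writePageMarkdown name are assumptions; only Promise.all and the node:fs/promises calls are real APIs.

```typescript
import { mkdir, writeFile } from "node:fs/promises";
import { dirname, join } from "node:path";

// Hypothetical shape: a slug such as "guides/install" plus the raw markdown body.
interface PageOutput {
  slug: string;
  body: string;
}

// Start every write at once and wait for all of them, instead of awaiting one by one.
async function writePageMarkdown(publicDir: string, pages: PageOutput[]): Promise<string[]> {
  return Promise.all(
    pages.map(async ({ slug, body }) => {
      const file = join(publicDir, `${slug}.md`);
      await mkdir(dirname(file), { recursive: true }); // nested slugs need their directory
      await writeFile(file, body, "utf8");
      return file;
    }),
  );
}
```

Promise.all rejects as soon as any single write fails, so one bad path surfaces as a build error rather than a silently missing file.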
Reformat embedded MDX in src/templates/mdx/**/*.mdx.ts so the content that becomes the user's docs/ starter set is already prettier-stable. Adds blank lines after headings, switches outer tilde fences to backtick fences (with the right level for nesting), and normalizes code-block indentation. With clean source, the per-page .md files emitted by the llms generator are prettier-conformant on first write, so we can drop the prettierignore additions added in 4aaf610.
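On "the right level for nesting": an outer backtick fence has to be longer than any backtick run it contains, otherwise the inner fence closes it early. The templates in this commit were reformatted by hand as far as the message says; the helper below is only an illustrative way to compute that length, and its names are hypothetical.

```typescript
// Pick a backtick fence at least one longer than the longest backtick run in the
// content, with a minimum of three. Counting runs anywhere in the text (not just at
// line starts) is deliberately conservative.
function fenceFor(content: string): string {
  const runs = content.match(/`+/g) ?? [];
  const longest = runs.reduce((max, run) => Math.max(max, run.length), 0);
  return "`".repeat(Math.max(3, longest + 1));
}

// Wrap content in a fence that cannot be closed by anything inside it.
function wrapInFence(content: string, info = ""): string {
  const fence = fenceFor(content);
  return `${fence}${info}\n${content}\n${fence}`;
}
```

Content that already holds a three-backtick block therefore gets a four-backtick outer fence.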
Summary
- Writes three artifacts into the output app's public/ dir on every build / watch update
- public/llms.txt is a curated markdown index of all docs pages, grouped by ## Section and ### Category, sorted by categoryOrder then order
- public/llms-full.txt concatenates every page's raw MDX (JSX preserved) for one-shot LLM ingestion
- public/<slug>.md per-page raw markdown so agents can fetch a single page without parsing the React app
- Artifacts are regenerated on config.json changes; URLs fall back to relative paths when no site URL is configured
- Stale per-page .md files are cleaned up across runs via a small .doccupine-llms-manifest.json written to the output dir
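The manifest-based cleanup described above could work roughly as follows. The manifest format (a JSON array of paths relative to the output dir) and the pruneStale name are assumptions for this sketch; only the manifest filename comes from the PR.

```typescript
import { readFile, rm, writeFile } from "node:fs/promises";
import { join } from "node:path";

const MANIFEST = ".doccupine-llms-manifest.json";

// Delete files listed by the previous run that the current run no longer emits,
// then record the current list for the next run. Returns the removed paths.
async function pruneStale(outputDir: string, current: string[]): Promise<string[]> {
  let previous: string[] = [];
  try {
    const parsed: unknown = JSON.parse(await readFile(join(outputDir, MANIFEST), "utf8"));
    if (Array.isArray(parsed)) {
      previous = parsed.filter((p): p is string => typeof p === "string");
    }
  } catch {
    // No manifest yet (first run) or unreadable JSON: nothing to clean up.
  }

  const keep = new Set(current);
  const stale = previous.filter((file) => !keep.has(file));
  await Promise.all(stale.map((file) => rm(join(outputDir, file), { force: true })));

  // Trailing newline keeps the manifest stable under common formatters.
  await writeFile(join(outputDir, MANIFEST), JSON.stringify(current, null, 2) + "\n", "utf8");
  return stale;
}
```

Tracking only files the generator itself wrote means hand-placed .md files in public/ are never deleted by the cleanup.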