RDoc-3786 Add strict canonical verification #2404
Open
poissoncorp wants to merge 11 commits into ravendb:main from
Lwiel reviewed on Apr 28, 2026
Introduce the canonical-redirects Docusaurus plugin. loadContent reads scripts/redirects.json, validates schema + cycles, and builds the redirect map. postBuild walks the emitted HTML, rewrites every <link rel="canonical"> to the current-version equivalent (legacy versions get a self-canonical), and verifies each rewritten canonical against the Docusaurus route universe.
- CLI: `npm run validate-redirects` (scripts/validate-redirects.ts) runs the same schema + cycle checks standalone.
- CI: DOCUSAURUS_STRICT_CANONICALS=true gates strict builds in build-on-pr.yml.
- Handles both pretty-printed and minified HTML output from Docusaurus.
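To illustrate the "pretty-printed and minified" point, here is a minimal sketch of a canonical rewrite that tolerates both output styles. It is not the plugin's actual code: `rewriteCanonical` and the `docs.example.com` URLs are hypothetical, and a regex is only one way to do it (an HTML parser would be another).

```typescript
// Hypothetical helper, not taken from the plugin source.
// Replaces the href of the first <link rel="canonical"> tag.
// Returns null when the page has no canonical tag or the tag has no href.
function rewriteCanonical(html: string, newHref: string): string | null {
  // Match the whole tag regardless of attribute order, quoting style,
  // or line breaks between attributes ([^>] also matches newlines).
  const tagPattern = /<link\b[^>]*\srel=(["']?)canonical\1[^>]*>/i;
  const match = tagPattern.exec(html);
  if (match === null) {
    return null;
  }
  const tag = match[0];

  // href may be double-quoted, single-quoted, or unquoted (minified output).
  const hrefPattern = /\bhref=(?:"[^"]*"|'[^']*'|[^\s>]+)/i;
  if (!hrefPattern.test(tag)) {
    return null;
  }
  // A replacer function avoids "$" in the URL being read as a pattern.
  const rewrittenTag = tag.replace(hrefPattern, () => `href="${newHref}"`);

  // Splice by index so only this one tag is touched.
  return (
    html.slice(0, match.index) +
    rewrittenTag +
    html.slice(match.index + tag.length)
  );
}
```

A caller would treat a `null` result as "canonical missing" and report it rather than silently skipping the page.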
…s; hide templates

Move version handling into a single source of truth and generate SEO assets from it.
- scripts/lib/version-policy.js exports CURRENT_VERSION + LEGACY_VERSIONS. docusaurus.config.ts, the canonical rewriter, the edge handler, and the generators all import from here.
- scripts/generate-robots.js renders scripts/robots-templates/*.template.txt with the current legacy list, writing build/robots.txt on every build.
- scripts/split-sitemap.ts replaces split-sitemap.js; core logic lives in src/lib/split-sitemap with unit tests.
- src/plugins/templates-noindex-plugin injects noindex,nofollow into /templates/* so doc-authoring scaffolding doesn't surface in search.
- 6.0 and 7.0 are marked legacy in this commit's version-policy update.
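As a rough picture of how a single legacy list can drive robots.txt, here is a small sketch. The `{{LEGACY_DISALLOWS}}` placeholder, the `/<version>/` path shape, and the `renderRobots` name are assumptions made for illustration; the real template format in scripts/robots-templates may differ.

```typescript
// Illustrative values only: the commit message says 6.0 and 7.0 are marked
// legacy; the authoritative list lives in scripts/lib/version-policy.js.
const LEGACY_VERSIONS_SAMPLE: readonly string[] = ["6.0", "7.0"];

// Hypothetical placeholder token; the actual template syntax is not shown
// in this PR page.
const PLACEHOLDER = "{{LEGACY_DISALLOWS}}";

// Expands the placeholder into one Disallow line per legacy version.
function renderRobots(
  template: string,
  legacyVersions: readonly string[],
): string {
  const disallows = legacyVersions
    .map((version) => `Disallow: /${version}/`)
    .join("\n");
  // split/join replaces every occurrence without regex escaping concerns.
  return template.split(PLACEHOLDER).join(disallows);
}
```

With this shape, marking another version as legacy is a one-line change to the list, and the next build regenerates the disallow rules.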
…uild time

Make the edge function and build-time resolver equivalent, and move cycle detection upstream.
- scripts/handle_redirects.js gains a bounded chain-collapse loop so N-hop chains become exactly one 301 at the edge.
- CloudFront Functions runtime is single-file and can't resolve project-local imports, so two values are inlined in handle_redirects.js and guarded by parity tests against drift: compareVersions (mirror of the plugin's TS copy at lib/compare-versions.ts) and the CURRENT_VERSION literal (mirror of scripts/lib/version-policy.js). The parity test at __tests__/compare-versions-parity.test.ts reads handle_redirects.js as text, extracts each mirrored value, and asserts behavioural / literal equality with the authoritative source.
- validateNoCycles runs as part of npm run validate-redirects and the plugin's loadContent, making cycles unreachable at runtime.
- With cycles impossible upstream, both resolveChain and the edge chain loop drop their runtime visited set.
- __tests__/handle-redirects.test.ts covers static-asset pass-through, the /templates + /guides + /cloud branches, versioned / versionless URIs, minimumVersion gating, and chain collapse.
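The split described above (cycle detection at build time, a bounded loop with no visited set at runtime) can be sketched roughly as follows. This is a simplified model, not the PR's implementation: the redirect map is reduced to source path → target path, the minimumVersion gate is left out, and `MAX_HOPS_SAMPLE` is a made-up bound.

```typescript
// Simplified model: source path -> target path. The real entries also carry
// minimumVersion, which this sketch ignores.
type RedirectLookup = ReadonlyMap<string, string>;

// Build-time check: walk from every key and report a cycle as soon as a
// path is visited twice within the same walk.
function validateNoCyclesSketch(redirects: RedirectLookup): string[] {
  const errors: string[] = [];
  for (const start of Array.from(redirects.keys())) {
    const seen = new Set<string>([start]);
    let current: string | undefined = redirects.get(start);
    while (current !== undefined) {
      if (seen.has(current)) {
        errors.push(`redirect cycle reachable from ${start}`);
        break;
      }
      seen.add(current);
      current = redirects.get(current);
    }
  }
  return errors;
}

// Hypothetical hop limit; the actual bound is not stated on this page.
const MAX_HOPS_SAMPLE = 10;

// Runtime resolver: no visited set, because the build-time check above has
// already ruled cycles out. The hop bound is only a safety net.
function resolveChainSketch(redirects: RedirectLookup, path: string): string {
  let current = path;
  for (let hop = 0; hop < MAX_HOPS_SAMPLE; hop++) {
    const next = redirects.get(current);
    if (next === undefined) {
      return current; // terminal: nothing redirects away from this path
    }
    current = next;
  }
  return current;
}
```

Collapsing the chain this way means a two-hop move (`/a` to `/b`, later `/b` to `/c`) resolves to `/c` in one step, so the edge can answer with a single 301.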
…ckup target

Backfill 26 redirects the strict verifier surfaced the moment it was turned on:
- 19 from the initial strict-mode pass (pages moved or renamed in 7.2 without accompanying redirects.json updates).
- 7 more from the compare-exchange and Studio revisions restructures.
- One pre-existing malformed target (/encrypted-backup missing the leading slash).

CI is flipped to DOCUSAURUS_STRICT_CANONICALS=true in the same commit, because the gate can't come on before the data is complete.
…ble verifier errors

Tighten the schema and polish the failure-mode surface.
- validateRedirects now requires minimumVersion on versioned (docs-area) keys; /guides and /cloud are versionless content areas so they're exempt. Stray minimumVersion on versionless entries isn't flagged; PR review catches it.
- validateTargetsExist checks each targetUrl resolves to a real .md / .mdx file (or an index.* under a directory). Bare directories backed only by _category_.json are rejected: redirects should always land on a concrete page.
- Legacy-version pages get <meta name="robots" content="noindex,follow"> injected idempotently in addition to the self-canonical, so search engines don't keep old pages in the index even when a crawler follows an inbound link directly.
- Verifier errors ship a `fix:` block with a ready-to-paste redirects.json entry and the npm run validate-redirects command, collapsing the diagnose → fix loop.
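The minimumVersion rule from the first bullet might look something like the sketch below. The field names `targetUrl` and `minimumVersion` come from the commit message; the object-keyed-by-source-path shape, the prefix matching, and the function names are assumptions for illustration only.

```typescript
// Assumed entry shape; only the two field names are taken from the PR text.
interface RedirectEntrySketch {
  targetUrl: string;
  minimumVersion?: string;
}

// Versionless content areas named in the commit message.
const VERSIONLESS_PREFIXES = ["/guides", "/cloud"];

function isVersionlessKey(key: string): boolean {
  return VERSIONLESS_PREFIXES.some(
    (prefix) => key === prefix || key.indexOf(prefix + "/") === 0,
  );
}

// Returns one error per versioned key that lacks minimumVersion.
// A stray minimumVersion on a versionless key is deliberately not flagged,
// mirroring the behaviour described above.
function checkMinimumVersion(
  redirects: Record<string, RedirectEntrySketch>,
): string[] {
  const errors: string[] = [];
  for (const key of Object.keys(redirects)) {
    const entry = redirects[key];
    if (!isVersionlessKey(key) && entry.minimumVersion === undefined) {
      errors.push(`${key}: versioned redirect is missing minimumVersion`);
    }
  }
  return errors;
}
```

Returning a list of errors instead of throwing on the first one lets the CLI print every problem in a single run.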
…iagrams

Add the plugin's public README with two pre-rendered SVG diagrams:
- data-flow.svg: the loadContent + postBuild sequence from routesPaths to canonical rewrite to verification.
- resolve-chain.svg: the resolveChain flowchart (gate check → terminal return, no runtime cycle guard).

Source .mmd files live alongside the SVGs for regeneration via @mermaid-js/mermaid-cli. Prettier formatting is applied across the plugin source tree so everything ships consistent.
Missing <link rel="canonical"> on a versioned page previously only warned — it wasn't added to the verifier's issues[] and so strict mode wouldn't fail. Merge it into the unified issues pipeline so every canonical problem surfaces through the same gate. README failure-mode bullet updated to match.
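One way to picture the "unified issues pipeline" is a single reporting function that every canonical problem flows through, with strict mode deciding between warning and failing. This is a sketch under assumptions: the issue kinds, `reportCanonicalIssues`, and the injected `warn` callback are hypothetical, and the strict flag is passed in rather than read from the environment to keep the example self-contained.

```typescript
// Hypothetical issue shape; the plugin's real issues[] entries may differ.
type CanonicalIssueSketch = {
  page: string;
  kind: "missing-canonical" | "unresolved-canonical";
  detail: string;
};

// All issue kinds go through the same gate: in strict mode any issue fails
// the build, otherwise the summary is only logged as a warning.
// In the real build, `strict` would presumably be derived from the
// DOCUSAURUS_STRICT_CANONICALS environment variable.
function reportCanonicalIssues(
  issues: readonly CanonicalIssueSketch[],
  strict: boolean,
  warn: (message: string) => void,
): void {
  if (issues.length === 0) {
    return;
  }
  const summary = issues
    .map((issue) => `${issue.kind}: ${issue.page} (${issue.detail})`)
    .join("\n");
  if (strict) {
    throw new Error(`${issues.length} canonical issue(s):\n${summary}`);
  }
  warn(summary);
}
```

Because a missing canonical is just another entry in the list, it cannot bypass the strict gate the way a standalone warning could.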
Auto-fix 40 curly-brace and unused-eslint-disable warnings across the touched plugin + split-sitemap sources. For the 4 no-console warnings in build-time plugin loggers, swap console.log/warn for @docusaurus/logger — the idiomatic Docusaurus plugin logger writes through process.stdout / stderr with colored level prefixes and doesn't trip no-console.
Issue link
RDoc-3786 Rewrite canonical URLs between versions if there's an entry in redirects
Additional description
(didn't use ai to write it, so don't scan - read 😂)
This PR guarantees that no `<link rel="canonical">` on any page leads to a 404.

It introduces the `DOCUSAURUS_STRICT_CANONICALS` env flag: when it is enabled, any invalid redirect or canonical mismatch fails the build.

New features:
- `scripts/redirects.json`, then verifies each rewritten canonical against the actual docs structure
- `npm run validate-redirects` CLI: comprehensive validation of the `redirects.json` file. It checks that the schema is correct and validates the data. Runnable without a full build; added as a CI step
- (`scripts/handle_redirects.js`) now handles the hops to pair it with plugin validation. Added parity tests and unit tests that check that the script works as expected and handles all corner cases before pushing to prod
- `version-policy.js`, which is a single source of truth. Whenever a RavenDB version becomes legacy, simply adding the version to this list cascades all necessary SEO ops.
- `templates-noindex-plugin`, which hides `/templates/` from search engines

Redirects JSON fixes:
- `encrypted-backup`)

So, new infra consists of:
- `handle_redirects` as it was before, now just tested before pushing to AWS :)
- `validate-redirects` CI tool
- `version-policy.js`, as mentioned, to control the legacy docs ops
- `generate-robots.js` renders `robots.txt` from a template on every build, keeping legacy disallows in sync
- `split-sitemap` ported to TypeScript to make it testable

Sprinkled it with docs for agents, added `.svg`-s with UML diagrams on how the build flow works and how redirect chains are resolved (the awkward situation when we moved the document more than once between versions).

Once merged, let's remember to set the strict canonicals env on CD.
Type of change
- `/templates` or readme)

Changes in docs URLs
- `/scripts/redirects.json` file, set `Documents Moved` PR label)

Changes in UX/UI