
Improve markdown conversion quality #1229

Open
GBKS wants to merge 1 commit into master from feature/markdown-negotiation-tweaks


@GBKS
Contributor

@GBKS GBKS commented Apr 22, 2026

Refactor HTML-to-Markdown converter to eliminate navigation chrome and noise artifacts, improving content quality for AI consumption.

Based on BOLTy's feedback in this comment.

@GBKS GBKS self-assigned this Apr 22, 2026
@GBKS GBKS added the Enhancement (New feature or request) and Dev (Development-focused tasks) labels Apr 22, 2026
@netlify

netlify Bot commented Apr 22, 2026

Deploy Preview for bitcoin-design-site ready!

Name Link
🔨 Latest commit ec6661a
🔍 Latest deploy log https://app.netlify.com/projects/bitcoin-design-site/deploys/69e870cfb4489f0008ec8dde
😎 Deploy Preview https://deploy-preview-1229--bitcoin-design-site.netlify.app

Contributor

Copilot AI left a comment

Pull request overview

This PR refactors the Netlify Edge HTML-to-Markdown conversion to reduce non-content “chrome” in the generated markdown and improve the content signal for AI/agent consumption.

Changes:

  • Introduces a shared selector list to prune common non-content elements (nav/header/footer, sidebars, anchors, etc.) before conversion.
  • Updates conversion to render only a selected “primary content” root (e.g., article/main) instead of always converting the full <body>.
  • Improves the fallback (regex-based) converter by stripping more noise and refining link/image rendering.
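As a rough illustration of the first two changes, here is a minimal sketch of pruning by a shared selector list and picking a primary content root. The selector lists, `pickContentRoot`, `pruneNoise`, and the `RootLike` interface are all hypothetical; the PR's actual lists and function names are not shown in this conversation.

```typescript
// Hypothetical names throughout; the PR's real selector lists are not shown here.
interface NodeLike {
  remove(): void;
}

// Minimal structural stand-in for a DOM element (an assumption for this sketch).
interface RootLike {
  querySelector(selector: string): RootLike | null;
  querySelectorAll(selector: string): Iterable<NodeLike>;
}

// Assumed examples of non-content "chrome" selectors.
const NOISE_SELECTORS: readonly string[] = ["nav", "header", "footer", "aside", ".sidebar"];

// Assumed preference order for the primary content root.
const ROOT_SELECTORS: readonly string[] = ["article", "main", "[role='main']"];

// Return the first matching content root, falling back to the body itself.
function pickContentRoot(body: RootLike): RootLike {
  for (const selector of ROOT_SELECTORS) {
    const match = body.querySelector(selector);
    if (match !== null) return match;
  }
  return body;
}

// Remove every noise element inside the chosen root and report how many were removed.
function pruneNoise(root: RootLike): number {
  let removed = 0;
  for (const selector of NOISE_SELECTORS) {
    // Copy to an array first so removal cannot disturb iteration.
    for (const node of Array.from(root.querySelectorAll(selector))) {
      node.remove();
      removed += 1;
    }
  }
  return removed;
}
```

Falling back to the body when no preferred root matches keeps pages without an article or main element convertible, at the cost of relying on pruning alone for those pages.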

Comment thread netlify/edge-functions/markdown-negotiation.ts
Comment thread netlify/edge-functions/markdown-negotiation.ts
@GBKS GBKS requested a review from sbddesign April 23, 2026 05:51
@swedishfrenchpress
Collaborator

this is about over my head. i'm not confident i can give an accurate / valuable approval / review.

@GBKS
Contributor Author

GBKS commented Apr 23, 2026

@swedishfrenchpress all good. Interestingly, the best entities to give feedback on this PR are AI agents, since it's about improving how they understand the site content.

@sbddesign
Collaborator

> this is about over my head. i'm not confident i can give an accurate / valuable approval / review.

It's probably not over your agent's head

Collaborator

@sbddesign sbddesign left a comment

tl;dr - this is a huge improvement, though there are edge cases you could defend against

My first reaction was "LGTM".

I asked BOLTy, and he sees it as a clear improvement over the previous version.

However, there are some edge cases where the stripping might be too aggressive: for example, HTML code samples, inline SVG, and pages that don't follow the structured article format.

I don't actually think the guide contains any of those three things, so I think you're good to go. However, if you want to be cautious in case such things appear in the guide in the future, you could feed the feedback below into your coding agent to have it address them.

BOLTy's Feedback

This is a clear improvement for agent-readability. I compared the deploy preview against live, including https://bitcoin.design/guide/daily-spending-wallet/, and the preview is much better at surfacing actual page content instead of nav/sidebar noise.

A few things worth feeding back to the agent before merge:

Fallback path may break HTML code samples
In the regex fallback flow, noise stripping happens before <pre> blocks are protected. If a guide page includes HTML examples with tags like <nav>, <header>, or <footer>, those examples could get mangled.
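One way to address this, sketched under assumptions: shield each `<pre>` block behind a placeholder before any noise stripping, then restore it afterwards. `stripNoisePreservingPre`, the `NOISE_TAGS` list, and the placeholder format are hypothetical and not taken from the PR; regexes are also not a robust HTML parser, so this only illustrates the ordering.

```typescript
// Sketch only: the tag list is assumed, and regex matching ignores nested same-name tags.
const NOISE_TAGS: readonly string[] = ["nav", "header", "footer"];

function stripNoisePreservingPre(html: string): string {
  const saved: string[] = [];

  // 1. Swap each <pre>...</pre> block for a placeholder before any stripping.
  const shielded = html.replace(/<pre\b[^>]*>[\s\S]*?<\/pre>/gi, (block: string) => {
    saved.push(block);
    return `\u0000PRE${saved.length - 1}\u0000`;
  });

  // 2. Strip noise elements from the shielded markup only.
  let cleaned = shielded;
  for (const tag of NOISE_TAGS) {
    const pattern = new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?<\\/${tag}>`, "gi");
    cleaned = cleaned.replace(pattern, "");
  }

  // 3. Restore the untouched <pre> blocks.
  return cleaned.replace(/\u0000PRE(\d+)\u0000/g, (_match: string, index: string) => saved[Number(index)] ?? "");
}
```

The key point is the order of the three steps: anything inside a protected block never reaches the noise patterns.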

SVG stripping may be overly aggressive
Removing inline <svg> is probably fine for chrome/icons, but it could drop meaningful content if any pages use inline SVG diagrams or illustrations. Worth checking a few representative pages.
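A possible middle ground, as a hedged sketch: drop SVGs marked decorative and keep a short text placeholder for labeled ones. The heuristic (treat `aria-hidden="true"` as decorative, read a label from a double-quoted `aria-label` or a `<title>` child), the `[Figure: ...]` placeholder, and the name `stripDecorativeSvg` are assumptions for illustration, not the PR's behavior.

```typescript
// Sketch only: assumes double-quoted aria-label values and no ">" inside attribute values.
function stripDecorativeSvg(html: string): string {
  return html.replace(/<svg\b[^>]*>[\s\S]*?<\/svg>/gi, (svg: string) => {
    const openTag = svg.slice(0, svg.indexOf(">") + 1);

    // Explicitly decorative graphics (icons, chrome) are dropped.
    if (/aria-hidden\s*=\s*["']true["']/i.test(openTag)) return "";

    // Prefer aria-label on the <svg> tag, then fall back to a <title> child.
    const label =
      /aria-label\s*=\s*"([^"]*)"/i.exec(openTag)?.[1] ??
      /<title\b[^>]*>([\s\S]*?)<\/title>/i.exec(svg)?.[1];

    // Labeled graphics leave a short placeholder; unlabeled ones are dropped.
    return label && label.trim() !== "" ? `<p>[Figure: ${label.trim()}]</p>` : "";
  });
}
```

This keeps icon noise out of the markdown while leaving a trace of diagrams that carry a label, which is only useful if the site's diagrams are actually labeled.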

Content-root selection should be spot-checked on different page types
The new root-selection logic is directionally good, but it’s opinionated. I’d sanity-check homepage, guide article pages, and section/index pages to make sure it’s not narrowing too aggressively or skipping useful content.
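One way to make that sanity check mechanical, sketched under assumptions: accept a candidate root only if it covers a minimum share of the page's text, and otherwise fall back to the full body. `chooseRootWithCoverage`, the `RootCandidate` shape, and the 0.3 default threshold are hypothetical and arbitrary; they are not part of the PR.

```typescript
// Hypothetical shape: one entry per candidate root, in preference order.
interface RootCandidate {
  selector: string;
  textLength: number; // characters of visible text under this candidate
}

// Return the first candidate holding at least `minShare` of the body text, else "body".
function chooseRootWithCoverage(
  candidates: readonly RootCandidate[],
  bodyTextLength: number,
  minShare = 0.3, // arbitrary illustrative threshold
): string {
  if (bodyTextLength <= 0) return "body";
  for (const candidate of candidates) {
    if (candidate.textLength / bodyTextLength >= minShare) return candidate.selector;
  }
  return "body";
}
```

Running the same check against a homepage, a guide article, and a section or index page would show whether a single threshold behaves sensibly across page types.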

Overall: strong improvement, just worth validating those edge cases.

Labels

Dev Development-focused tasks. Enhancement New feature or request

4 participants