diff --git a/README.md b/README.md
index 936fbaa..3642244 100644
--- a/README.md
+++ b/README.md
@@ -40,6 +40,35 @@ pdflatex htmltrust.tex
 The compiled PDF will be output as `paper/htmltrust.pdf`.
+## Known Issue: Runtime DOM Mutation Breaks Verification
+
+HTMLTrust signs the **static HTML** that leaves the publishing pipeline. Browser verifiers, however, read the **live DOM**: the state of the page after every script on the page has finished running. If anything inside a `` is mutated between page load and verification, the verifier's recomputed `content-hash` will not match the signed one, and the signature will be reported as invalid even though it is cryptographically correct.
+
+### Concrete cases we have observed
+
+- **Hugo Blox docs theme** injects a `` into every `` block at runtime. When a signed page contains a code block, the verifier sees an extra `Copy` token inside the signed region that the signer never saw.
+- Any **client-side syntax-highlighting** library (Prism, highlight.js) that rewrites a code block's inner HTML at load time has the same effect.
+- **Analytics, lazy-loading, or social-share injection** libraries that add nodes inside content containers will break verification if they touch a signed region.
+
+### Mitigations available today
+
+| Mitigation | Trade-off |
+|---|---|
+| Ensure no client-side script writes into `` descendants | Simplest, but constrains theme/framework choice |
+| Pre-render any decoration server-side (e.g., emit the "Copy" button into the static HTML so the signer hashes it) | Works, but every page-template change requires a re-bake |
+| Move runtime-injected decoration **outside** the `` (sibling, not child) | Often the cleanest fix when you control the script |
+| Read `outerHTML` from a pristine fetch instead of `element.innerHTML` for verification | Requires a verifier-side change; doesn't help current extensions |
+
+### Open spec question
+
+This is a real, general challenge for any content-signing protocol that targets browser-side verification. The spec needs to give implementations explicit guidance, likely a combination of:
+
+1. **Stage 1 canonicalization** SHOULD define a "skip-on-mutation-marker" mechanism (e.g., `data-htmltrust-ignore="true"` on a subtree) so themes can mark decoration that must be excluded from the hash.
+2. **Authoring guidance** SHOULD warn against injecting nodes inside signed regions at runtime.
+3. **Verifier guidance** MAY recommend fetching the original document and verifying against that, treating DOM-state verification as a separate, optional capability.
+
+This is tracked as an active open design question (see also [open design questions on the implementation page](https://www.htmltrust.org/implementation/#open-design-questions)). Community input is welcome.
+
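The mismatch described in this section, and the effect of the proposed `data-htmltrust-ignore="true"` marker, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the canonicalization here is a stand-in (whitespace-collapsed text content), not HTMLTrust's actual Stage 1 rules; `TextCollector` and `content_hash` are hypothetical names; the input is assumed to be well-formed markup. The `sha256:` prefix and unpadded Base64 encoding follow the paper's description of `content-hash`.

```python
import base64
import hashlib
from html.parser import HTMLParser

# Marker name taken from the open spec question; its semantics here are an assumption.
IGNORE_ATTR = "data-htmltrust-ignore"
# Void elements never receive an end tag, so they must not affect depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TextCollector(HTMLParser):
    """Collects text nodes, skipping any subtree marked with IGNORE_ATTR."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside an ignored subtree

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth > 0:
            self.skip_depth += 1  # nested element inside an ignored subtree
        elif dict(attrs).get(IGNORE_ATTR) == "true":
            self.skip_depth = 1  # entering an ignored subtree

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def content_hash(html: str) -> str:
    """Hash the stand-in canonical form: text content with whitespace collapsed."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    text = " ".join("".join(collector.parts).split())
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Algorithm-prefixed, unpadded Base64, as described for content-hash in the paper.
    return "sha256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


signed = '<pre><code>print("hi")</code></pre>'
injected = '<pre><button>Copy</button><code>print("hi")</code></pre>'
marked = ('<pre><button data-htmltrust-ignore="true">Copy</button>'
          '<code>print("hi")</code></pre>')

print(content_hash(signed) == content_hash(injected))  # unmarked injection changes the hash
print(content_hash(signed) == content_hash(marked))    # marked injection is excluded
```

One design consideration this makes visible: anything inside an ignored subtree is shown to the reader but not covered by the signature, so a marker mechanism along these lines would probably need limits on what an excluded subtree may contain.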
 ## Companion Repositories
 
 | Repository | Description |
diff --git a/paper/htmltrust.tex b/paper/htmltrust.tex
index 05e11b9..2866458 100644
--- a/paper/htmltrust.tex
+++ b/paper/htmltrust.tex
@@ -59,7 +59,7 @@ \subsection{Signed HTML Blocks}
 
 \paragraph{Required attributes.} \texttt{keyid} (identifies the signer, resolvable per \S2.2), \texttt{signature} (the cryptographic signature, encoded per the hash encoding rules below), \texttt{content-hash} (hash of the canonicalized content, prefixed with the hash algorithm, e.g. \texttt{sha256:\ldots}), and \texttt{algorithm} (the signature algorithm, e.g. \texttt{ed25519}, \texttt{ecdsa}, or \texttt{rsa}).
 
-\paragraph{Hash and signature encoding (open feedback).} Hashes and signatures in this revision are encoded as unpadded Base64, which is shorter than hexadecimal by roughly one-third. We invite community feedback on whether hexadecimal (widespread in tooling such as git and TLS), Base32 (case-insensitive and easier to transcribe by hand), or another encoding would be preferable for broader ecosystem alignment.\footnote{Or Ecoji, anyone? A 32-byte SHA-256 digest encodes to 26 emoji characters via the Ecoji base-1024 alphabet, producing a delightful \texttt{content-hash="sha256:πŸŽ‚πŸ¦ŠπŸ™πŸŒΊπŸŽ¨πŸ•πŸš€πŸŒˆπŸŽ­πŸ”οΈβš‘πŸ€πŸ¦„βœ¨πŸŒŠπŸ„πŸŽͺπŸ–οΈπŸŒ»πŸŽ―πŸ¦πŸŽ²πŸŒ™πŸ¦‹πŸŽΈπŸŽƒ"}. It is, alas, \textit{longer} in wire bytes because each emoji occupies four UTF-8 bytes, so we have not adopted it. But we thought you should know it exists.}
+\paragraph{Hash and signature encoding (open feedback).} Hashes and signatures in this revision are encoded as unpadded Base64, which is shorter than hexadecimal by roughly one-third. We invite community feedback on whether hexadecimal (widespread in tooling such as git and TLS), Base32 (case-insensitive and easier to transcribe by hand), or another encoding would be preferable for broader ecosystem alignment.\footnote{Or Ecoji, anyone? A 32-byte SHA-256 digest encodes to 26 emoji characters via the Ecoji base-1024 alphabet, producing a delightfully unreadable \texttt{content-hash="sha256:<26 emoji from the Ecoji alphabet>"}. It is, alas, \textit{longer} in wire bytes because each emoji occupies four UTF-8 bytes, so we have not adopted it. But we thought you should know it exists.}
 
 \paragraph{Canonical signing payload.} The signature is computed over a deterministic binding string:
 \begin{lstlisting}[basicstyle=\ttfamily\footnotesize]