diff --git a/docs/html-protocol.md b/docs/html-protocol.md
index 0de34eb..2b93022 100644
--- a/docs/html-protocol.md
+++ b/docs/html-protocol.md
@@ -96,39 +96,95 @@ Both forms are valid. Verifying clients should handle either case.
Before hashing, content MUST be canonicalized:
-1. Strip all HTML tags (extract text content only)
-2. Collapse all whitespace sequences to a single space
-3. Trim leading and trailing whitespace
-4. Encode as UTF-8
+1. Parse the HTML and extract text nodes in document order
+2. Strip all HTML markup (tags and attributes); only the text content contributes to the hash
+3. Collapse all whitespace sequences to a single space
+4. Trim leading and trailing whitespace
+5. Apply the text normalization defined by the `@htmltrust/canonicalization` library (NFKC, quote normalization, dash normalization, invisible character stripping)
+6. Encode as UTF-8
-The resulting string is hashed with SHA-256 and prefixed: `sha256:`.
+The resulting string is hashed with SHA-256 and expressed as the prefix `sha256:` followed by the unpadded Base64 encoding of the 32-byte digest.
+
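The canonicalization steps and hash format above can be sketched in Python as follows. This is a minimal illustration, not the `@htmltrust/canonicalization` library itself: the quote, dash, and invisible-character tables are hypothetical stand-ins for whatever that library defines, and the standard (not URL-safe) Base64 alphabet is assumed. The helper names `canonicalize` and `content_hash` are invented for this example.

```python
import base64
import hashlib
import re
import unicodedata
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes in document order; tags and attributes are ignored."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


# Assumed mappings for illustration only; the real tables live in the library.
_QUOTES = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})
_DASHES = str.maketrans({"\u2013": "-", "\u2014": "-"})
_INVISIBLE = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")


def canonicalize(html: str) -> bytes:
    # Steps 1-2: parse and keep only text content.
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Steps 3-4: collapse whitespace runs to one space, then trim.
    text = re.sub(r"\s+", " ", text).strip()
    # Step 5: NFKC plus (assumed) quote, dash, and invisible-character handling.
    text = unicodedata.normalize("NFKC", text)
    text = text.translate(_QUOTES).translate(_DASHES)
    text = _INVISIBLE.sub("", text)
    # Step 6: encode as UTF-8.
    return text.encode("utf-8")


def content_hash(html: str) -> str:
    digest = hashlib.sha256(canonicalize(html)).digest()
    # Unpadded Base64: strip the trailing "=" from the 44-character encoding.
    b64 = base64.b64encode(digest).decode("ascii").rstrip("=")
    return "sha256:" + b64
```

Because only text contributes, `content_hash("<h1>Hello world</h1>")` and `content_hash("<p>Hello   world</p>")` produce the same value under this sketch.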
+### Text-only scope
+
+The canonicalization hashes **text content only**, not the HTML markup or attributes that surround it. This means an adversary with possession of signed text MAY:
+
+- Rewrap the text in misleading block elements (e.g., change an `