perf: eliminate double-tokenization of selector and prelude ranges#246
Closed
bartveneman wants to merge 1 commit into
Closed
perf: eliminate double-tokenization of selector and prelude ranges#246bartveneman wants to merge 1 commit into
bartveneman wants to merge 1 commit into
Conversation
Replace the token-scan loops in parse_selector() and parse_atrule() with
raw character scans (scan_to_open_brace / scan_to_block_or_semi). The main
parser previously had to tokenize selector and prelude content once just to
find the '{' / ';' boundary, and then SelectorParser / AtRulePreludeParser
would re-tokenize the same range in full detail — every token processed twice.
The new raw scans handle only what's needed to find a boundary safely:
quoted strings, /* comments */, backslash escapes, and (for preludes) paren
depth to skip semicolons inside url(...). They track newlines so the main
Lexer can be repositioned exactly at the boundary character afterward.
SelectorParser and AtRulePreludeParser are now the sole tokenizers of their
ranges, cutting the tokenization work for selector/prelude content roughly
in half.
https://claude.ai/code/session_01CQeKNnXidD5EQVJY4xBMMp
Bundle ReportChanges will increase total bundle size by 4.7kB (2.48%) ⬆️. This is within the configured threshold ✅ Detailed changes
Affected Assets, Files, and Routes:view changes for bundle: @projectwallace/css-parser-esmAssets Changed:
Files in
Files in
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #246 +/- ##
==========================================
- Coverage 93.14% 92.07% -1.08%
==========================================
Files 17 17
Lines 3035 3167 +132
Branches 845 881 +36
==========================================
+ Hits 2827 2916 +89
- Misses 208 251 +43 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Member
Author
|
before after |
Member
Author
|
benchmark slows slowdown, not merging |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The main parser previously tokenized selector and at-rule prelude content twice:
parse_selector()/parse_atrule()— a token-by-token scan just to locate the{/;boundarySelectorParser.parse_selector()/AtRulePreludeParser.parse_prelude()— the full detailed parseThis PR replaces the boundary-finding loops with lightweight raw character scans (
scan_to_open_brace/scan_to_block_or_semi) that do only what's needed to safely locate the boundary without full tokenization:'...',"...")/* comments */url(data:...;...))SelectorParserandAtRulePreludeParserare now the sole tokenizers of their ranges.Files changed
src/parse-utils.ts— two new exported scan functionssrc/parse.ts—parse_selector()and prelude scan inparse_atrule()updated;TOKEN_FUNCTIONimport removed (no longer needed in main parser)src/string-utils.ts— five new character constants used by the scan functionsBenchmark results
Two same-session back-to-back runs (main rebuilt then benchmarked, branch rebuilt then benchmarked immediately after). Average latency in ms, lower is better.
mainThe improvement is within measurement noise for smaller files. Tailwind and the parse/walk tasks hint at a small real gain (~2–3%) but the error margins (±0.7–1.7%) make this inconclusive.
Note on environment variance: An earlier measurement session showed larger gains (~22–32% on the parser tasks). That session's absolute numbers were ~30% higher for both main and branch, suggesting the container was running faster that day. The relative improvement in that session was consistent with the double-tokenization theory, but the current session's results are too close to call with confidence. The change is conceptually correct (less redundant work) and does not regress any benchmark.
https://claude.ai/code/session_01CQeKNnXidD5EQVJY4xBMMp