
perf: cache source lines, stream file bytes, and trim hot-path allocations #1535

Open

Thorium wants to merge 1 commit into ionide:main from Thorium:perf-opt-2



Thorium (Contributor) commented Jun 9, 2026

Summary

Reduces allocations and redundant work in several FsAutoComplete hot paths. Unlike a "looks faster" change set, every optimization here was validated one-by-one with real BenchmarkDotNet A/B measurements (pre-optimization baseline vs. this branch), measuring both time and allocations. Changes that turned out to be empirically pointless or to break the public API for marginal gain were deliberately excluded (see below).

All changes are body/implementation-only and backward compatible — no public signature (.fsi) changes.

What's included (with evidence)

1. RoslynSourceTextFile.Lines — lazily cache the line array

Previously every access recomputed sourceText.Lines |> Seq.toArray |> Array.map (_.ToString()). Now cached on first use.

Repeated access on one file instance (2000-line file):

| Reads | Before | After |
|---|---|---|
| 1× | 249 µs / 431 KB | one-time build, then free |
| 10× | 2,291 µs / 4,306 KB | ~free after first |
| 100× | 22,487 µs / 43,063 KB | ~free after first |

Repeat reads (the common case over a file's lifetime) go from O(lines) each to effectively free; first read costs the same as before.
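The caching pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in this branch: `CachedLinesFile` and `BuildCount` are hypothetical names, and a plain string stands in for the Roslyn `SourceText`.

```fsharp
open System

/// Hypothetical stand-in for a source-text wrapper (not the real RoslynSourceTextFile).
type CachedLinesFile(text: string) =
    // Counts how often the line array is actually built (demonstration only).
    let mutable buildCount = 0

    // The O(lines) work that was previously repeated on every access.
    let build () =
        buildCount <- buildCount + 1
        text.Split([| "\r\n"; "\n" |], StringSplitOptions.None)

    // `lazy` defers the work until first use and memoizes the result afterwards.
    let lines = lazy (build ())

    member _.Lines: string[] = lines.Value
    member _.BuildCount = buildCount
```

Reading `Lines` any number of times on one instance runs `build` once; the trade-off is that the array stays alive for as long as the instance does.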

2. FileSystem file-content read — stream bytes via chunked ISourceText.CopyTo

Replaces file.Source.ToString() |> Encoding.UTF8.GetBytes (which materializes the whole file as an intermediate string) with a chunked copy.

| File size | Time before → after | Alloc before → after |
|---|---|---|
| 1,000 lines | 178 µs → 68 µs (−62%) | 332 KB → 356 KB (+7%) |
| 20,000 lines | 1,733 µs → 1,191 µs (−31%) | 7,067 KB → 5,544 KB (−22%) |

Clear time win at both sizes and an allocation win on large files. Small files show a slight allocation bump from MemoryStream buffer doubling — addressed in a follow-up (see below).
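For readers unfamiliar with the technique, a chunked char-to-UTF-8 copy might look like the sketch below. It is an illustration under assumptions, not the branch's implementation: `utf8BytesChunked` is a hypothetical name, and a plain `string` (via `String.CopyTo`) stands in for the `ISourceText` source, so the sketch shows only the chunking and encoding mechanics, not the saved intermediate string.

```fsharp
open System
open System.IO
open System.Text

/// Encodes `text` to UTF-8 by copying fixed-size char chunks into a stream.
/// `text` is a plain string standing in for an ISourceText-like source (assumption).
let utf8BytesChunked (chunkSize: int) (text: string) : byte[] =
    let chars = Array.zeroCreate<char> chunkSize
    // Worst-case UTF-8 size for one chunk, including a carried-over surrogate.
    let bytes = Array.zeroCreate<byte> (Encoding.UTF8.GetMaxByteCount(chunkSize))
    // A stateful encoder keeps surrogate pairs intact across chunk boundaries.
    let encoder = Encoding.UTF8.GetEncoder()
    use stream = new MemoryStream()
    let mutable pos = 0
    while pos < text.Length do
        let count = min chunkSize (text.Length - pos)
        text.CopyTo(pos, chars, 0, count)
        // Flush the encoder state only on the final chunk.
        let isLast = pos + count >= text.Length
        let written = encoder.GetBytes(chars, 0, count, bytes, 0, isLast)
        stream.Write(bytes, 0, written)
        pos <- pos + count
    stream.ToArray()
```

Using `Encoding.UTF8.GetBytes` on each chunk independently would corrupt a surrogate pair that straddles a chunk boundary, which is why the sketch goes through a single `Encoder` instance.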

3. CompilerProjectOption.SourceFilesTagged — fuse two passes into one

Avoids an extra List.map/Array.toList pass when tagging source-file paths. Time is dominated by normalizePath and stays within noise; the win is allocation:

| Path | Allocation |
|---|---|
| TransparentCompiler (list) | −50% |
| BackgroundCompiler (array) | −37% |
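As a generic illustration of fusing a map pass with a collection conversion (the actual change in this branch may be shaped differently), the sketch below builds the result list directly instead of allocating an intermediate array. `normalizePath` here is a simplified hypothetical stand-in, and `tagTwoPass` / `tagOnePass` are illustrative names.

```fsharp
/// Simplified hypothetical stand-in for the real path normalization.
let normalizePath (p: string) = p.Replace('\\', '/')

// Two passes: Array.map allocates an intermediate array, then Array.toList copies it.
let tagTwoPass (files: string[]) : string list =
    files |> Array.map normalizePath |> Array.toList

// One pass: the list is built directly, with no intermediate array.
let tagOnePass (files: string[]) : string list =
    List.init files.Length (fun i -> normalizePath files.[i])
```

Both functions call `normalizePath` the same number of times, which matches the observation that time stays within noise while allocation drops.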

4. processFSIArgs — O(n²) Array.append-in-fold → O(n) ResizeArray

At its real call site (SetFSIAdditionalArguments, a small user-configured arg list) the impact is negligible (realistic N≈8: ~800 B saved, time within noise). Included as a correctness/scaling improvement — it is dramatic only at unrealistic sizes (N=1000: 25× faster, 47× less allocation).
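The scaling difference can be seen in a small head-to-head sketch. The filter used here (keep arguments starting with `--`) is a made-up placeholder, not what `processFSIArgs` actually does; only the accumulation strategy is the point.

```fsharp
open System

// Quadratic: each Array.append copies everything accumulated so far.
let collectWithAppend (args: string[]) : string[] =
    args
    |> Array.fold
        (fun (acc: string[]) (arg: string) ->
            if arg.StartsWith("--", StringComparison.Ordinal) then
                Array.append acc [| arg |]
            else
                acc)
        [||]

// Linear: ResizeArray (System.Collections.Generic.List) has amortized O(1) Add.
let collectWithResizeArray (args: string[]) : string[] =
    let acc = ResizeArray<string>()
    for arg in args do
        if arg.StartsWith("--", StringComparison.Ordinal) then
            acc.Add(arg)
    acc.ToArray()
```

Both versions return the same array in the same order; they differ only in how many intermediate copies are made along the way.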

5. OTel tag source.textsource.length

The trace tag previously boxed the entire file contents as a string. Replaced with the integer length, eliminating that per-trace allocation.

6. Completion retry only re-reads the file when content is actually stale

getCompletions now takes a rereadFile flag, so the document is re-read only when the error indicates stale content (line-lookup failure / trigger-char mismatch), not on every retry — avoiding redundant I/O on the hot completion path.

What was deliberately excluded (also evidence-based)

  • Lexer.tokenizeLine define-parsing (Array.fold → for-loop): allocations effectively identical (−1%), time within noise, because FSharpSourceTokenizer creation/scan dominates. No measurable benefit; dropped.
  • LoadedProject lazy-cache of SourceFilesTagged: required adding a public record field (_sourceFilesTagged: Lazy<…>) to AdaptiveServerState.fsi. That is a source- and binary-breaking public-API change that leaks an implementation detail into the signature, for a benefit that measured as marginal (the underlying computation is cheap, per item 3 above). Dropped: the compatibility cost outweighed the gain.

Follow-up (not in this PR)

Pre-sizing the byte buffer in item 2 (new MemoryStream(length)) measured as a large further win (1k lines: 68 µs → 14 µs, 356 KB → 170 KB; 20k lines: 1,191 µs → 725 µs, 5,544 KB → 2,866 KB) and removes the small-file allocation bump. Can be added here or as a separate PR.

Methodology

BenchmarkDotNet v0.14, .NET 8.0, Release/optimized toolchain, [<MemoryDiagnoser>], on an i9-13900H. Each optimization exercised through its real code path (or, for the private/algorithmic ones, an exact head-to-head reproduction of the old vs. new body). Numbers reported correspond to the code in this branch.
