perf: cache source lines, stream file bytes, and trim hot-path allocations#1535
Open
Thorium wants to merge 1 commit into
Open
perf: cache source lines, stream file bytes, and trim hot-path allocations#1535Thorium wants to merge 1 commit into
Thorium wants to merge 1 commit into
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Reduces allocations and redundant work in several FsAutoComplete hot paths. Unlike a "looks faster" change set, every optimization here was validated one-by-one with real BenchmarkDotNet A/B measurements (pre-optimization baseline vs. this branch), measuring both time and allocations. Changes that turned out to be empirically pointless or to break the public API for marginal gain were deliberately excluded (see below).
All changes are body/implementation-only and backward compatible — no public signature (
.fsi) changes.What's included (with evidence)
1.
RoslynSourceTextFile.Lines— lazily cache the line arrayPreviously every access recomputed
sourceText.Lines |> Seq.toArray |> Array.map (_.ToString()). Now cached on first use.Repeated access on one file instance (2000-line file):
Repeat reads (the common case over a file's lifetime) go from O(lines) each to effectively free; first read costs the same as before.
2.
FileSystemfile-content read — stream bytes via chunkedISourceText.CopyToReplaces
file.Source.ToString() |> Encoding.UTF8.GetBytes(which materializes the whole file as an intermediate string) with a chunked copy.Clear time win at both sizes and an allocation win on large files. Small files show a slight allocation bump from
MemoryStreambuffer doubling — addressed in a follow-up (see below).3.
CompilerProjectOption.SourceFilesTagged— fuse two passes into oneAvoids an extra
List.map/Array.toListpass when tagging source-file paths. Time is dominated bynormalizePathand stays within noise; the win is allocation:TransparentCompiler(list)BackgroundCompiler(array)4.
processFSIArgs— O(n²)Array.append-in-fold → O(n)ResizeArrayAt its real call site (
SetFSIAdditionalArguments, a small user-configured arg list) the impact is negligible (realistic N≈8: ~800 B saved, time within noise). Included as a correctness/scaling improvement — it is dramatic only at unrealistic sizes (N=1000: 25× faster, 47× less allocation).5. OTel tag
source.text→source.lengthThe trace tag previously boxed the entire file contents as a string. Replaced with the integer length, eliminating that per-trace allocation.
6. Completion retry only re-reads the file when content is actually stale
getCompletionsnow takes arereadFileflag, so the document is re-read only when the error indicates stale content (line-lookup failure / trigger-char mismatch), not on every retry — avoiding redundant I/O on the hot completion path.What was deliberately excluded (also evidence-based)
Lexer.tokenizeLinedefine-parsing (Array.fold→ for-loop): allocations identical (−1%), time within noise —FSharpSourceTokenizercreation/scan dominates. No measurable benefit; dropped.LoadedProjectlazy-cache ofSourceFilesTagged: required adding a public record field (_sourceFilesTagged: Lazy<…>) toAdaptiveServerState.fsi— a source/binary-breaking public-API change leaking an implementation detail into the signature — for a benefit that measured as marginal (the underlying computation is cheap, per Correct build script name #3). Dropped: compatibility cost outweighed the gain.Follow-up (not in this PR)
Pre-sizing the byte buffer in #2 (
new MemoryStream(length)) measured as a large further win (1k lines: 68 µs → 14 µs, 356 KB → 170 KB; 20k lines: 1,191 µs → 725 µs, 5,544 KB → 2,866 KB) and removes the small-file allocation bump. Can be added here or as a separate PR.Methodology
BenchmarkDotNet v0.14, .NET 8.0, Release/optimized toolchain,
[<MemoryDiagnoser>], on an i9-13900H. Each optimization exercised through its real code path (or, for the private/algorithmic ones, an exact head-to-head reproduction of the old vs. new body). Numbers reported correspond to the code in this branch.