Skip to content

SwiftQUIC: PERF: acknowledgedSendDataInner is using too much CPU#17

Merged
agnosticdev merged 3 commits into
mainfrom
agnosticdev/PerfAcknowledgedSendData
Jun 16, 2026
Merged

SwiftQUIC: PERF: acknowledgedSendDataInner is using too much CPU#17
agnosticdev merged 3 commits into
mainfrom
agnosticdev/PerfAcknowledgedSendData

Conversation

@agnosticdev

Copy link
Copy Markdown
Collaborator

In StreamSendBuffer acknowledgedSendDataInner does a insert and removal operation for almost all ACKs that are processed. This is very costly based on the CPU metrics below.
The new change uses a custom struct that holds an array of Range<StreamOffset> to not pay the overhead of dealing with Set operations on each ACK. acknowledgedSendDataInner now has two branches, a in-order branch that only does a remove operation, and an out of order path that still falls back to using both an insert and remove operation. The result of this is about a 7.3 gigacycle reduction in CPU for QUICTransfer:

Baseline:

7.46 G  7.4%	-	         StreamSendBuffer.acknowledgedSendData(offset:length:log:)
7.42 G  7.3%	52.13 M	        StreamSendBuffer.acknowledgedSendDataInner(offset:length:log:)
2.91 G  2.9%	31.01 M	           specialized RangeSet.insert(contentsOf:)
2.80 G  2.8%	135.26 M	       RangeSet.Ranges._remove(contentsOf:)
639.99 M  0.6%	33.00 M	           RangeSet.Ranges.subscript.getter
567.07 M  0.6%	9.00 M	           RangeSet.remove(contentsOf:)
132.83 M  0.1%	-	               FrameArrayQueue.claim(fromStart:)
61.00 M  0.1%	4.00 M	           swift_release

With this change:

161.49 M  96.2%	1.87 M	        StreamSendBuffer.acknowledgedSendData(offset:length:log:)
159.62 M  95.1%	2.00 M	          StreamSendBuffer.acknowledgedSendDataInner(offset:length:log:)
151.62 M  90.3%	22.00 M	          specialized StreamSendBuffer.acknowledgedSendDataInner(offset:length:log:)
129.62 M  77.2%	-	              FrameArrayQueue.claim(fromStart:)
97.76 M  58.2%	20.00 M	          Frame.claim(fromStart:fromEnd:adjustSingleIPAggregate:)
31.86 M  19.0%	-	              FrameArray._claim(fromStart:existingLength:removeClaimedFrames:)
6.00 M   3.6%	6.00 M	  Frame.claim(fromStart:fromEnd:adjustSingleIPAggregate:)

@agnosticdev agnosticdev requested a review from rnro June 14, 2026 21:56
}

var acknowledgedDataRanges = RangeSet<StreamOffset>()
var acknowledgedDataRanges = AcknowledgedRanges()

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your change here is doing two things:

  1. Making sure we don't insert and remove in the happy path
  2. Changing the storage type from RangeSet to a custom type

Can we see a perf difference if we do (1) but keep (2) using the standard type?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the first one is the most impactful. I ran a test with RangeSet<StreamOffset>() and this does work fine, but we still incur more CPU about 180 megacycles more. This is still a big win in performance but I think we need to do more investigation to see why we are getting different performance characteristics with RangeSet<StreamOffset> compared to [Range<StreamOffset>]. Let's get this change in and I can do some follow up investigation to see what could be the difference here between these two types.

@agnosticdev agnosticdev merged commit 25821c5 into main Jun 16, 2026
39 of 40 checks passed
@agnosticdev agnosticdev deleted the agnosticdev/PerfAcknowledgedSendData branch June 16, 2026 13:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants