[pull] main from containerd:main#368
Merged
Merged
Conversation
The pull progress reporter resets lastSeenTimestamp on every tick where activeReqs == 0, but never on the transition to a non-zero count. When a pull is held in content.OpenWriter (idle in HTTP terms) and then unblocks, the next request can be cancelled less than `timeout` after it was actually issued — its first byte must arrive within whatever fraction of `timeout` remains on the timer captured during the previous idle tick. Track the previous tick's activeReqs and reset the timer on the 0→1 transition so a newly-issued request always gets a full timeout window to produce its first byte. This deflakes TestCRIImagePullTimeout/HoldingContentOpenWriterWithLocalPull, which hits ghcr.io directly and can exceed the shrunken window during auth handshakes in CI. Signed-off-by: Derek McGowan <derek@mcg.dev>
…-reset cri: reset pull progress timer on idle→active transition
The CRI progress reporter cancels an image pull if it sees no progress for 5 seconds. It tracks this through active HTTP requests. During remote fetches, the HTTP response reader is closed via a deferred call after `content.Copy` completes. Diagnosis: `content.Copy` handles both downloading the stream and committing the writer to the content store. Any delays during the database commit phase (e.g. from database locks, slow disk syncs, or concurrent pull deduplication blocks) keep the HTTP connection open. The progress reporter sees the request is still active (`activeReqs = 1`) but no new bytes are coming in, leading to a premature timeout cancellation. Reproduction: We reproduced this flakiness deterministically on a GCE VM under a simulated 2 Mbps ingress bandwidth limit using Linux traffic control ingress policing (`tc filter ... action police rate 2mbit`). Under this slowness, the download took longer than the progress timeout during the slow commit phase, triggering context cancellation and failing the `TestCRIImagePullTimeout/HoldingContentOpenWriterWithLocalPull` test. Solution: To fix this, we wrap the HTTP reader in a `closeOnEOFReader` or `closeOnEOFReadSeeker` before handing it to `content.Copy`. If the underlying connection reader implements `io.Seeker`, it is dynamically wrapped in `closeOnEOFReadSeeker` to forward `Seek` operations. This ensures that O(1) Range seeks are fully preserved during network resumes or retries. The wrappers automatically close the underlying network stream as soon as `Read()` returns `io.EOF` (when the download completes, before the database commit begins). This drops `activeReqs` to `0` early, freeing the socket and preventing progress timeouts during commits. A `sync.Once` ensures that subsequent deferred `Close()` calls do not double-decrement the reporter. How it was tested: Verified the fix on a GCE VM under a simulated 2 Mbps ingress bandwidth limit. Verified seeker safety via standalone logic audits and trace proofs. Assisted-by: Antigravity Signed-off-by: Samuel Karp <samuelkarp@google.com>
The TestCRIImagePullTimeout test case "NoDataTransferred" flaked under constrained networks because the test proxy mirror registry used a blocking ReadAtLeast call to forward bytes to containerd. This blocking wait (up to 4KB) meant the mirror registry server completely stopped forwarding data during network slowness, triggering containerd's aggressive 5-second progress timeout and canceling the pull before it could reach its 3MB circuit-breaker limit. This is resolved by changing the proxy's custom copy loop from io.ReadAtLeast(src, buf, len(buf)) to standard src.Read(buf). This streams network chunks to containerd immediately as they arrive, preventing false timeout cancellations while maintaining correct circuit-breaker byte tracking. Assisted-by: Antigravity Signed-off-by: Samuel Karp <samuelkarp@google.com>
remotes: close fetch reader immediately on EOF
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )