[Deepin-Kernel-SIG] [linux 6.18.y] [Upstream] fuse: use iomap for buffered reads + readahead#1851
opsiff wants to merge 16 commits into
This reverts commit 01f84e4. That commit was a partial backport of the series that follows; revert it and merge the whole series instead. Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion from mainline-v6.19-rc1 category: feature Most callers of iomap_iter_advance() do not need the remaining length returned. Get rid of the extra iomap_length() call that iomap_iter_advance() does. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org> (cherry picked from commit ca82a7e) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
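The interface change described in this commit can be pictured with a small userspace sketch. It is only a model: `struct model_iter`, `model_iter_advance()` and `model_iter_length()` are hypothetical names invented for illustration and are not the kernel definitions. The point it shows is that the advance helper takes a plain byte count, and a caller that still wants the remaining length asks for it separately.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for an iterator; not the kernel type. */
struct model_iter {
	uint64_t pos;	/* current file position */
	uint64_t len;	/* bytes still to process */
};

/* Remaining length, computed only by the callers that need it. */
static uint64_t model_iter_length(const struct model_iter *iter)
{
	return iter->len;
}

/*
 * Advance by a plain byte count. This model returns 0 on success and -1
 * when the count exceeds what is left; the iterator is left untouched on
 * failure.
 */
static int model_iter_advance(struct model_iter *iter, uint64_t count)
{
	if (count > iter->len)
		return -1;
	iter->pos += count;
	iter->len -= count;
	return 0;
}
```

Most callers in this model would simply call `model_iter_advance(&it, n)` and check the return value, which matches the commit's observation that the remaining length is rarely needed.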
mainline inclusion from mainline-v6.19-rc1 category: feature Move the iomap_readpage_iter() bio read logic into a separate helper function, iomap_bio_read_folio_range(). This is needed to make iomap read/readahead more generically usable, especially for filesystems that do not require CONFIG_BLOCK. Additionally rename buffered write's iomap_read_folio_range() function to iomap_bio_read_folio_range_sync() to better describe its synchronous behavior. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> (cherry picked from commit 573c14c) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion from mainline-v6.19-rc1 category: feature Move the read/readahead bio submission logic into a separate helper. This is needed to make iomap read/readahead more generically usable, especially for filesystems that do not require CONFIG_BLOCK. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Tested-by: syzbot@syzkaller.appspotmail.com Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> (cherry picked from commit 7588469) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion from mainline-v6.19-rc1 category: feature Store the iomap_readpage_ctx bio generically as a "void *read_ctx". This makes the read/readahead interface more generic, which allows it to be used by filesystems that may not be block-based and may not have CONFIG_BLOCK set. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Tested-by: syzbot@syzkaller.appspotmail.com Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> (cherry picked from commit d1f9893) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion from mainline-v6.19-rc1 category: feature Iterate over all non-uptodate ranges of a folio mapping in a single call to iomap_readpage_iter() instead of leaving the partial iteration to the caller. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> (cherry picked from commit e0e1534) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion from mainline-v6.19-rc1 category: feature ->readpage was deprecated and reads are now on folios. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org> (cherry picked from commit 8805a9c) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion from mainline-v6.19-rc1 category: feature ->readpage was deprecated and reads are now on folios. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Tested-by: syzbot@syzkaller.appspotmail.com Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org> (cherry picked from commit 87a1381) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion from mainline-v6.19-rc1 category: feature Instead of incrementing read_bytes_pending for every folio range read in (which requires acquiring the spinlock to do so), set read_bytes_pending to the folio size when the first range is asynchronously read in, keep track of how many bytes total are asynchronously read in, and adjust read_bytes_pending accordingly after issuing requests to read in all the necessary ranges. iomap_read_folio_ctx->cur_folio_in_bio can be removed since a non-zero value for pending bytes necessarily indicates the folio is in the bio. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Suggested-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> (cherry picked from commit d43558a) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
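The accounting scheme in this commit message can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the kernel code: `struct model_folio` and the three `model_*` helpers are hypothetical, locking is omitted, and the reading that the folio-sized bias keeps an early completion from dropping the counter to zero is an interpretation rather than something the commit message states.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-folio read state; not the kernel type. */
struct model_folio {
	size_t size;			/* folio size in bytes */
	size_t read_bytes_pending;	/* bytes whose reads have not completed */
};

/* Called for each range handed to asynchronous I/O. */
static void model_read_range(struct model_folio *folio, size_t len,
			     size_t *bytes_submitted)
{
	/* First async range: bias the counter to the whole folio size. */
	if (*bytes_submitted == 0)
		folio->read_bytes_pending = folio->size;
	*bytes_submitted += len;
}

/* Called once after every needed range of the folio has been issued. */
static void model_read_end(struct model_folio *folio, size_t bytes_submitted)
{
	if (bytes_submitted == 0)
		return;
	/* Drop the part of the bias that was never actually submitted. */
	folio->read_bytes_pending -= folio->size - bytes_submitted;
}

/* I/O completion for len bytes; returns true once the folio is done. */
static bool model_finish_read(struct model_folio *folio, size_t len)
{
	folio->read_bytes_pending -= len;
	return folio->read_bytes_pending == 0;
}
```

In this model a non-zero `read_bytes_pending` is enough to tell that I/O has been issued for the folio, which is consistent with the commit's remark that a separate "folio is in the bio" flag is no longer needed.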
mainline inclusion from mainline-v6.19-rc1 category: feature Advance iter to the correct position before calling an IO helper to read in a folio range. This allows the helper to reliably use iter->pos to determine the starting offset for reading. This will simplify the interface for reading in folio ranges when iomap read/readahead supports caller-provided callbacks. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org> (cherry picked from commit fb7a10a) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion
from mainline-v6.19-rc1
category: feature
Add caller-provided callbacks for read and readahead so that it can be
used generically, especially by filesystems that are not block-based.
In particular, this:
* Modifies the read and readahead interface to take in a
  struct iomap_read_folio_ctx that is publicly defined as:

  struct iomap_read_folio_ctx {
          const struct iomap_read_ops *ops;
          struct folio *cur_folio;
          struct readahead_control *rac;
          void *read_ctx;
  };

  where struct iomap_read_ops is defined as:

  struct iomap_read_ops {
          int (*read_folio_range)(const struct iomap_iter *iter,
                                  struct iomap_read_folio_ctx *ctx,
                                  size_t len);
          void (*read_submit)(struct iomap_read_folio_ctx *ctx);
  };

  read_folio_range() reads in the folio range and is required by the
  caller to provide. read_submit() is optional and is used for
  submitting any pending read requests.

* Modifies existing filesystems that use iomap for read and readahead to
  use the new API, through the new statically inlined helpers
  iomap_bio_read_folio() and iomap_bio_readahead(). There is no change
  in functionality for those filesystems.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
(cherry picked from commit b2f35ac)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
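The callback design in the commit message above can be illustrated with a self-contained userspace sketch. Everything in it is hypothetical (`model_read_ops`, `model_read_ctx`, `batch_state`, `model_read()`); it only mirrors the shape of the interface: a required per-range callback, an optional submit callback, and an opaque `read_ctx` pointer for backend state. The optional callback is named `submit_read` here, following the documentation excerpt quoted later on this page, whereas the commit message spells it `read_submit`. Submitting once after the loop, even when a range fails, is a choice of this model.

```c
#include <stddef.h>

struct model_read_ctx;

/* Mirrors the shape of the ops table described in the commit message. */
struct model_read_ops {
	/* Required: start reading len bytes at byte offset pos. */
	int (*read_folio_range)(struct model_read_ctx *ctx, size_t pos,
				size_t len);
	/* Optional: flush whatever read_folio_range() has batched up. */
	void (*submit_read)(struct model_read_ctx *ctx);
};

struct model_read_ctx {
	const struct model_read_ops *ops;
	void *read_ctx;	/* backend-private state, opaque to the core loop */
};

/* Hypothetical backend state: batched byte count and submission count. */
struct batch_state {
	size_t batched;
	int submissions;
};

static int batch_read_folio_range(struct model_read_ctx *ctx, size_t pos,
				  size_t len)
{
	struct batch_state *st = ctx->read_ctx;

	(void)pos;	/* a real backend would record where to read from */
	st->batched += len;
	return 0;
}

static void batch_submit_read(struct model_read_ctx *ctx)
{
	struct batch_state *st = ctx->read_ctx;

	if (st->batched)
		st->submissions++;
}

static const struct model_read_ops batch_ops = {
	.read_folio_range = batch_read_folio_range,
	.submit_read = batch_submit_read,
};

/* Core loop: hand each {pos, len} range to the backend, then submit once. */
static int model_read(struct model_read_ctx *ctx,
		      const size_t (*ranges)[2], size_t n)
{
	int ret = 0;

	for (size_t i = 0; i < n && !ret; i++)
		ret = ctx->ops->read_folio_range(ctx, ranges[i][0],
						 ranges[i][1]);
	if (ctx->ops->submit_read)
		ctx->ops->submit_read(ctx);
	return ret;
}
```

A backend that completes every range synchronously could leave `submit_read` as NULL in this model, which is what makes the callback optional.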
mainline inclusion from mainline-v6.19-rc1 category: feature Move bio logic in the buffered io code into its own file and remove CONFIG_BLOCK gating for iomap read/readahead. [1] https://lore.kernel.org/linux-fsdevel/aMK2GuumUf93ep99@infradead.org/ Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> (cherry picked from commit c2b1adc) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion from mainline-v6.19-rc1 category: feature No errors are propagated in iomap_read_folio(). Change iomap_read_folio() to a void return to make this clearer to callers. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> (cherry picked from commit d4e88bb) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion from mainline-v6.19-rc1 category: feature Read folio data into the page cache using iomap. This gives us granular uptodate tracking for large folios, which optimizes how much data needs to be read in. If some portions of the folio are already uptodate (eg through a prior write), we only need to read in the non-uptodate portions. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> (cherry picked from commit 03e9618) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
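To make "granular uptodate tracking" concrete, here is a minimal userspace sketch, assuming a folio is divided into fixed-size blocks with one uptodate flag each. The type `model_folio_state` and both helpers are hypothetical and much simpler than the kernel's per-folio state; the sketch only shows how already-uptodate blocks are skipped so that just the remaining ranges are read.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_BLOCKS 64

/* Hypothetical per-folio state: one uptodate flag per block. */
struct model_folio_state {
	size_t block_size;
	size_t nr_blocks;
	bool uptodate[MODEL_MAX_BLOCKS];
};

/*
 * Starting at block index *start, find the next run of non-uptodate
 * blocks. Returns false when none remains; otherwise stores the byte
 * offset and length of the run and advances *start past it.
 */
static bool model_next_read_range(const struct model_folio_state *fs,
				  size_t *start, size_t *off, size_t *len)
{
	size_t i = *start;
	size_t first;

	/* Skip blocks that are already uptodate, e.g. from a prior write. */
	while (i < fs->nr_blocks && fs->uptodate[i])
		i++;
	if (i >= fs->nr_blocks)
		return false;

	first = i;
	while (i < fs->nr_blocks && !fs->uptodate[i])
		i++;

	*off = first * fs->block_size;
	*len = (i - first) * fs->block_size;
	*start = i;
	return true;
}

/* Total number of bytes that would have to be read for this folio. */
static size_t model_bytes_to_read(const struct model_folio_state *fs)
{
	size_t start = 0, off, len, total = 0;

	while (model_next_read_range(fs, &start, &off, &len))
		total += len;
	return total;
}
```

For a four-block folio in which only the second block is uptodate, the model yields two ranges (the first block, then the last two) instead of a read of the whole folio.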
mainline inclusion from mainline-v6.19-rc1 category: feature Do readahead in fuse using iomap. This gives us granular uptodate tracking for large folios, which optimizes how much data needs to be read in. If some portions of the folio are already uptodate (eg through a prior write), we only need to read in the non-uptodate portions. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> (cherry picked from commit 4ea9071) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion from mainline-v6.19-rc1 category: feature Now that fuse is integrated with iomap for read/readahead, we can remove the workaround that was added in commit bd24d21 ("fuse: fix fuseblk i_blkbits for iomap partial writes"), which was previously needed to avoid a race condition where an iomap partial write may be overwritten by a read if blocksize < PAGE_SIZE. Now that fuse does iomap read/readahead, this is protected against since there is granular uptodate tracking of blocks, which means this workaround can be removed. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Tested-by: syzbot@syzkaller.appspotmail.com Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> (cherry picked from commit 93570c6) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Reviewer's Guide
Refactors iomap buffered read infrastructure to support pluggable read backends and granular uptodate tracking, then ports FUSE buffered reads and readahead to iomap (removing the blkbits workaround) while updating existing iomap users to the new interfaces.
Sequence diagram for FUSE readahead using iomap
sequenceDiagram
actor Kernel_readahead
participant fuse_readahead
participant iomap_readahead
participant iomap_read_folio_iter
participant fuse_iomap_read_folio_range_async
participant fuse_handle_readahead
participant fuse_send_readpages
Kernel_readahead ->> fuse_readahead: fuse_readahead(rac)
activate fuse_readahead
fuse_readahead ->> iomap_readahead: iomap_readahead(&fuse_iomap_ops, &ctx)
deactivate fuse_readahead
activate iomap_readahead
loop over folios
iomap_readahead ->> iomap_read_folio_iter: iomap_read_folio_iter(iter, ctx, &cur_bytes_pending)
activate iomap_read_folio_iter
iomap_read_folio_iter ->> fuse_iomap_read_folio_range_async: read_folio_range(iter, ctx, len)
activate fuse_iomap_read_folio_range_async
fuse_iomap_read_folio_range_async ->> fuse_handle_readahead: fuse_handle_readahead(folio, rac, data, pos, len)
activate fuse_handle_readahead
alt need_to_send
fuse_handle_readahead ->> fuse_send_readpages: fuse_send_readpages(ia, file, nr_bytes, async)
end
deactivate fuse_handle_readahead
fuse_iomap_read_folio_range_async -->> iomap_read_folio_iter: ret
deactivate fuse_iomap_read_folio_range_async
iomap_read_folio_iter -->> iomap_readahead: iter.status
deactivate iomap_read_folio_iter
end
iomap_readahead ->> fuse_iomap_read_submit: submit_read(ctx)
deactivate iomap_readahead
note over fuse_send_readpages: On reply, fuse_readpages_end() calls
note over fuse_send_readpages: iomap_finish_folio_read(..., err)
Hey - I've found 2 issues
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="fs/iomap/buffered-io.c" line_range="580-589" />
<code_context>
  * the filesystem to be reentered.
  */
-void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
+void iomap_readahead(const struct iomap_ops *ops,
+		struct iomap_read_folio_ctx *ctx)
 {
+	struct readahead_control *rac = ctx->rac;
 	struct iomap_iter iter = {
 		.inode = rac->mapping->host,
 		.pos = readahead_pos(rac),
 		.len = readahead_length(rac),
 	};
-	struct iomap_readpage_ctx ctx = {
-		.rac = rac,
-	};
+	size_t cur_bytes_pending;
 	trace_iomap_readahead(rac->mapping->host, readahead_count(rac));
 	while (iomap_iter(&iter, ops) > 0)
-		iter.status = iomap_readahead_iter(&iter, &ctx);
-
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Initialize cur_bytes_pending before first use to avoid relying on control-flow for correctness.
cur_bytes_pending is declared but never initialized before being passed by reference into iomap_readahead_iter. This currently works only because iomap_readahead_iter sets it before use and relies on ctx->cur_folio being NULL. Initializing cur_bytes_pending to 0 here would remove that control-flow dependency and make the code safer against future changes.
</issue_to_address>
### Comment 2
<location path="Documentation/filesystems/iomap/operations.rst" line_range="225-227" />
<code_context>
+``iomap_read_folio`` must set:
+ * ``ops->read_folio_range()`` and ``cur_folio``
+
+``ops->submit_read()`` and ``read_ctx`` are optional. ``read_ctx`` is used to
+pass in any custom data the caller needs accessible in the ops callbacks for
+fulfilling reads.
+
Buffered Writes
</code_context>
<issue_to_address>
**nitpick (typo):** Slight grammar improvement around "needs accessible".
Rephrase to "any custom data the caller needs to be accessible in the ops callbacks" for smoother grammar.
```suggestion
``ops->submit_read()`` and ``read_ctx`` are optional. ``read_ctx`` is used to
pass in any custom data the caller needs to be accessible in the ops callbacks
for fulfilling reads.
```
</issue_to_address>
Pull request overview
This PR refactors iomap’s buffered read/readahead path to use a callback-based iomap_read_ops interface (with a shared bio-backed implementation) and switches FUSE buffered reads + readahead to iomap so that large-folio + granular uptodate tracking works safely (and removes the old FUSE blkbits workaround).
Changes:
- Introduce `struct iomap_read_folio_ctx` / `struct iomap_read_ops`, export `iomap_finish_folio_read()`, and rework `iomap_read_folio()` / `iomap_readahead()` around range-based read callbacks.
- Add `fs/iomap/bio.c` and `iomap_bio_read_ops` helpers; update multiple filesystems (XFS/EROFS/GFS2/zonefs/blkdev) to use `iomap_bio_read_folio()` / `iomap_bio_readahead()`.
- Convert FUSE read_folio + readahead to use the iomap read callbacks; remove `fc->blkbits` workaround in favor of superblock blocksize bits.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| include/linux/iomap.h | New iomap read ctx/ops API; iomap_iter_advance signature change; new exported helpers/wrappers. |
| fs/iomap/buffered-io.c | Core refactor: range-based read iteration, pending-read tracking, exports iomap_finish_folio_read. |
| fs/iomap/bio.c | New shared bio-based read/readahead helper implementing iomap_read_ops. |
| fs/iomap/internal.h | Exposes iomap_bio_read_folio_range_sync() for buffered write read-before-write logic. |
| fs/iomap/Makefile | Adds bio.o under CONFIG_BLOCK. |
| fs/iomap/iter.c | Updates iomap_iter_advance() to take a raw count. |
| fs/iomap/seek.c | Updates callers for new iomap_iter_advance() signature. |
| fs/iomap/direct-io.c | Updates callers for new iomap_iter_advance() signature. |
| fs/dax.c | Updates callers/loops for new iomap_iter_advance() signature semantics. |
| fs/fuse/file.c | Switches FUSE read_folio + readahead to iomap read callbacks; batching changes for async readahead. |
| fs/fuse/inode.c | Removes blkbits workaround; uses superblock blocksize bits when attr blksize absent. |
| fs/fuse/dir.c | Uses superblock blocksize bits when attr blksize absent. |
| fs/fuse/fuse_i.h | Removes struct fuse_conn::blkbits. |
| fs/xfs/xfs_aops.c | Uses iomap_bio_read_folio / iomap_bio_readahead. |
| fs/erofs/data.c | Uses iomap_bio_read_folio / iomap_bio_readahead. |
| fs/gfs2/aops.c | Uses iomap_bio_read_folio / iomap_bio_readahead. |
| fs/zonefs/file.c | Uses iomap_bio_read_folio / iomap_bio_readahead. |
| block/fops.c | Uses iomap_bio_read_folio / iomap_bio_readahead for blkdev aops. |
| Documentation/filesystems/iomap/operations.rst | Documents new iomap read ctx/ops interface. |
 static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file,
-				unsigned int count)
+				unsigned int count, bool async)
 {
 	struct fuse_file *ff = file->private_data;
 	struct fuse_mount *fm = ff->fm;

 	fuse_invalidate_atime(inode);

 	for (i = 0; i < ap->num_folios; i++) {
-		folio_end_read(ap->folios[i], !err);
+		iomap_finish_folio_read(ap->folios[i], ap->descs[i].offset,
+					ap->descs[i].length, err);

 	nr_pages = min(fc->max_pages, readahead_count(rac));
 	data->ia = fuse_io_alloc(NULL, nr_pages);
 	if (!data->ia)
 		return -ENOMEM;
Link: https://lore.kernel.org/all/20250926002609.1302233-1-joannelkoong@gmail.com/
This series adds fuse iomap support for buffered reads and readahead.
This is needed so that granular uptodate tracking can be used in fuse when
large folios are enabled, so that only the non-uptodate portions of a folio
need to be read in instead of the entire folio. It is also needed in order
to turn on large folios for servers that use the writeback cache. Without
it, there is a race condition that may lead to data corruption: if a partial
write is followed by a read before the write has undergone writeback, the
folio is not marked uptodate by the partial write, so the read fetches the
entire folio from disk and overwrites the partially written data.
This is on top of two locally-patched iomap patches [1] [2] patched on top of
commit f1c864be6e88 ("Merge branch 'vfs-6.18.async' into vfs.all") in
Christian's vfs.all tree.
This series was run through fstests on fuse passthrough_hp with an
out-of-kernel patch enabling fuse large folios.
This patchset does not enable large folios on fuse yet. That will be part
of a different patchset.
Thanks,
Joanne
[1] https://lore.kernel.org/linux-fsdevel/20250919214250.4144807-1-joannelkoong@gmail.com/
[2] https://lore.kernel.org/linux-fsdevel/20250922180042.1775241-1-joannelkoong@gmail.com/
Changelog
v4:
https://lore.kernel.org/linux-fsdevel/20250923002353.2961514-1-joannelkoong@gmail.com/
v4 -> v5:
was suggested by Darrick and improves both the performance and the interface
v3:
https://lore.kernel.org/linux-fsdevel/20250916234425.1274735-1-joannelkoong@gmail.com/
v3 -> v4:
v2:
https://lore.kernel.org/linux-fsdevel/20250908185122.3199171-1-joannelkoong@gmail.com/
v2 -> v3:
v1:
https://lore.kernel.org/linux-fsdevel/20250829235627.4053234-1-joannelkoong@gmail.com/
v1 -> v2:
ctx->private instead (Darrick & Christoph)
(Christoph)
has been merged into Christian's
Joanne Koong (14):
iomap: move bio read logic into helper function
iomap: move read/readahead bio submission logic into helper function
iomap: store read/readahead bio generically
iomap: iterate over folio mapping in iomap_readpage_iter()
iomap: rename iomap_readpage_iter() to iomap_read_folio_iter()
iomap: rename iomap_readpage_ctx struct to iomap_read_folio_ctx
iomap: track pending read bytes more optimally
iomap: set accurate iter->pos when reading folio ranges
iomap: add caller-provided callbacks for read and readahead
iomap: move buffered io bio logic into new file
iomap: make iomap_read_folio() a void return
fuse: use iomap for read_folio
fuse: use iomap for readahead
fuse: remove fc->blkbits workaround for partial writes
.../filesystems/iomap/operations.rst | 44 +++
block/fops.c | 5 +-
fs/erofs/data.c | 5 +-
fs/fuse/dir.c | 2 +-
fs/fuse/file.c | 288 +++++++++++-------
fs/fuse/fuse_i.h | 8 -
fs/fuse/inode.c | 13 +-
fs/gfs2/aops.c | 6 +-
fs/iomap/Makefile | 3 +-
fs/iomap/bio.c | 88 ++++++
fs/iomap/buffered-io.c | 246 +++++++--------
fs/iomap/internal.h | 12 +
fs/xfs/xfs_aops.c | 5 +-
fs/zonefs/file.c | 5 +-
include/linux/iomap.h | 63 +++-
15 files changed, 505 insertions(+), 288 deletions(-)
create mode 100644 fs/iomap/bio.c
--
2.47.3
Summary by Sourcery
Adopt iomap-based buffered read and readahead for FUSE and refactor iomap buffered read paths to support granular folio uptodate tracking while removing the legacy FUSE blkbits workaround.