path: root/mm/filemap.c
Age  Commit message  Author
2021-09-03  Merge branch 'akpm' (patches from Andrew)  [Linus Torvalds]
Merge misc updates from Andrew Morton: "173 patches. Subsystems affected by this series: ia64, ocfs2, block, and mm (debug, pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap, bootmem, sparsemem, vmalloc, kasan, pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, compaction, mempolicy, memblock, oom-kill, migration, ksm, percpu, vmstat, and madvise)" * emailed patches from Andrew Morton <[email protected]>: (173 commits) mm/madvise: add MADV_WILLNEED to process_madvise() mm/vmstat: remove unneeded return value mm/vmstat: simplify the array size calculation mm/vmstat: correct some wrong comments mm/percpu,c: remove obsolete comments of pcpu_chunk_populated() selftests: vm: add COW time test for KSM pages selftests: vm: add KSM merging time test mm: KSM: fix data type selftests: vm: add KSM merging across nodes test selftests: vm: add KSM zero page merging test selftests: vm: add KSM unmerge test selftests: vm: add KSM merge test mm/migrate: correct kernel-doc notation mm: wire up syscall process_mrelease mm: introduce process_mrelease system call memblock: make memblock_find_in_range method private mm/mempolicy.c: use in_task() in mempolicy_slab_node() mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies mm/mempolicy: advertise new MPOL_PREFERRED_MANY mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY ...
2021-09-03  mm: remove irqsave/restore locking from contexts with irqs enabled  [Johannes Weiner]
The page cache deletion paths all have interrupts enabled, so no need to use irqsafe/irqrestore locking variants. They used to have irqs disabled by the memcg lock added in commit c4843a7593a9 ("memcg: add per cgroup dirty page accounting"), but that has since been replaced by memcg taking the page lock instead, commit 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API"). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
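As a rough sketch of the shape of the change in the deletion paths (not the actual diff from the patch):

    /* Before: saved and restored IRQ flags around the xarray lock. */
    unsigned long flags;

    xa_lock_irqsave(&mapping->i_pages, flags);
    __delete_from_page_cache(page, NULL);
    xa_unlock_irqrestore(&mapping->i_pages, flags);

    /* After: interrupts are known to be enabled in these paths, so the
     * plain _irq variants suffice. */
    xa_lock_irq(&mapping->i_pages);
    __delete_from_page_cache(page, NULL);
    xa_unlock_irq(&mapping->i_pages);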
2021-08-31  Merge tag 'for-5.15-tag' of ...  [Linus Torvalds]
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "The highlights of this round are integrations with fs-verity and idmapped mounts, the rest is usual mix of minor improvements, speedups and cleanups. There are some patches outside of btrfs, namely updating some VFS interfaces, all straightforward and acked. Features: - fs-verity support, using standard ioctls, backward compatible with read-only limitation on inodes with previously enabled fs-verity - idmapped mount support - make mount with rescue=ibadroots more tolerant to partially damaged trees - allow raid0 on a single device and raid10 on two devices, degenerate cases but might be useful as an intermediate step during conversion to other profiles - zoned mode block group auto reclaim can be disabled via sysfs knob Performance improvements: - continue readahead of node siblings even if target node is in memory, could speed up full send (on sample test +11%) - batching of delayed items can speed up creating many files - fsync/tree-log speedups - avoid unnecessary work (gains +2% throughput, -2% run time on sample load) - reduced lock contention on renames (on dbench +4% throughput, up to -30% latency) Fixes: - various zoned mode fixes - preemptive flushing threshold tuning, avoid excessive work on almost full filesystems Core: - continued subpage support, preparation for implementing remaining features like compression and defragmentation; with some limitations, write is now enabled on 64K page systems with 4K sectors, still considered experimental - no readahead on compressed reads - inline extents disabled - disabled raid56 profile conversion and mount - improved flushing logic, fixing early ENOSPC on some workloads - inode flags have been internally split to read-only and read-write incompat bit parts, used by fs-verity - new tree items for fs-verity - descriptor item - Merkle tree item - inode operations extended to be namespace-aware - cleanups and refactoring Generic code changes: - fs: new export filemap_fdatawrite_wbc - fs: removed sync_inode - block: bio_trim argument type fixups - vfs: add namespace-aware lookup" * tag 'for-5.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (114 commits) btrfs: reset replace target device to allocation state on close btrfs: zoned: fix ordered extent boundary calculation btrfs: do not do preemptive flushing if the majority is global rsv btrfs: reduce the preemptive flushing threshold to 90% btrfs: tree-log: check btrfs_lookup_data_extent return value btrfs: avoid unnecessarily logging directories that had no changes btrfs: allow idmapped mount btrfs: handle ACLs on idmapped mounts btrfs: allow idmapped INO_LOOKUP_USER ioctl btrfs: allow idmapped SUBVOL_SETFLAGS ioctl btrfs: allow idmapped SET_RECEIVED_SUBVOL ioctls btrfs: relax restrictions for SNAP_DESTROY_V2 with subvolids btrfs: allow idmapped SNAP_DESTROY ioctls btrfs: allow idmapped SNAP_CREATE/SUBVOL_CREATE ioctls btrfs: check whether fsgid/fsuid are mapped during subvolume creation btrfs: allow idmapped permission inode op btrfs: allow idmapped setattr inode op btrfs: allow idmapped tmpfile inode op btrfs: allow idmapped symlink inode op btrfs: allow idmapped mkdir inode op ...
2021-08-23  fs: add a filemap_fdatawrite_wbc helper  [Josef Bacik]
Btrfs sometimes needs to flush dirty pages on a bunch of dirty inodes in order to reclaim metadata reservations. Unfortunately most helpers in this area are too smart for us: 1) The normal filemap_fdata* helpers only take range and sync modes, and don't give any indication of how much was written, so we can only flush full inodes, which isn't what we want in most cases. 2) The normal writeback path requires us to have the s_umount sem held, but we can't unconditionally take it in this path because we could deadlock. 3) The normal writeback path also skips inodes with I_SYNC set if we write with WB_SYNC_NONE. This isn't the behavior we want under heavy ENOSPC pressure, we want to actually make sure the pages are under writeback before returning, and if another thread is in the middle of writing the file we may return before they're under writeback and miss our ordered extents and not properly wait for completion. 4) sync_inode() uses the normal writeback path and has the same problem as #3. What we really want is to call do_writepages() with our wbc. This way we can make sure that writeback is actually started on the pages, and we can control how many pages are written as a whole as we write many inodes using the same wbc. Accomplish this with a new helper that does just that so we can use it for our ENOSPC flushing infrastructure. Reviewed-by: Nikolay Borisov <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
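As a minimal sketch of the intended use, a filesystem can drive the new helper with its own writeback_control; the nr_to_write budget and the inode variable below are illustrative only:

    struct writeback_control wbc = {
        .sync_mode   = WB_SYNC_NONE,
        .nr_to_write = 256,          /* illustrative per-inode budget */
        .range_start = 0,
        .range_end   = LLONG_MAX,
    };
    int ret;

    /* Starts writeback on this inode via do_writepages() with our wbc;
     * wbc.nr_to_write afterwards tells us how much budget is left, i.e.
     * how much was actually queued for writeback. */
    ret = filemap_fdatawrite_wbc(inode->i_mapping, &wbc);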
2021-07-13  mm: Add functions to lock invalidate_lock for two mappings  [Jan Kara]
Some operations such as reflinking blocks among files will need to lock invalidate_lock for two mappings. Add helper functions to do that. Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jan Kara <[email protected]>
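The commit text above does not spell the helpers out; a minimal sketch of the locking idea (ordering the two locks by address to avoid ABBA deadlocks), with an illustrative function name, might look like:

    static void example_invalidate_lock_two(struct address_space *mapping1,
                                            struct address_space *mapping2)
    {
        /* Always acquire the lower-addressed mapping's lock first. */
        if (mapping1 > mapping2)
            swap(mapping1, mapping2);
        if (mapping1)
            down_write(&mapping1->invalidate_lock);
        if (mapping2 && mapping2 != mapping1)
            down_write_nested(&mapping2->invalidate_lock, 1);
    }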
2021-07-13  mm: Protect operations adding pages to page cache with invalidate_lock  [Jan Kara]
Currently, serializing operations such as page fault, read, or readahead against hole punching is rather difficult. The basic race scheme is like: fallocate(FALLOC_FL_PUNCH_HOLE) read / fault / .. truncate_inode_pages_range() <create pages in page cache here> <update fs block mapping and free blocks> Now the problem is in this way read / page fault / readahead can instantiate pages in page cache with potentially stale data (if blocks get quickly reused). Avoiding this race is not simple - page locks do not work because we want to make sure there are *no* pages in given range. inode->i_rwsem does not work because page fault happens under mmap_sem which ranks below inode->i_rwsem. Also using it for reads makes the performance for mixed read-write workloads suffer. So create a new rw_semaphore in the address_space - invalidate_lock - that protects adding of pages to page cache for page faults / reads / readahead. Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jan Kara <[email protected]>
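Expressed as code, the serialization described above looks roughly like the following; the filemap_invalidate_lock*() helper names follow this series, but treat the exact names and surrounding flow as an assumption rather than a verbatim excerpt:

    /* Hole-punch side, e.g. fallocate(FALLOC_FL_PUNCH_HOLE): */
    filemap_invalidate_lock(mapping);              /* write-lock invalidate_lock */
    truncate_inode_pages_range(mapping, start, end);
    /* ... update the fs block mapping and free the blocks ... */
    filemap_invalidate_unlock(mapping);

    /* Page fault / read / readahead side, when instantiating pages: */
    filemap_invalidate_lock_shared(mapping);       /* read-lock invalidate_lock */
    /* ... add pages to the page cache and read data into them ... */
    filemap_invalidate_unlock_shared(mapping);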
2021-07-12  mm: Fix comments mentioning i_mutex  [Jan Kara]
inode->i_mutex has been replaced with inode->i_rwsem long ago. Fix comments still mentioning i_mutex. Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Acked-by: Hugh Dickins <[email protected]> Signed-off-by: Jan Kara <[email protected]>
2021-07-03  Merge branch 'work.iov_iter' of ...  [Linus Torvalds]
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull iov_iter updates from Al Viro: "iov_iter cleanups and fixes. There are followups, but this is what had sat in -next this cycle. IMO the macro forest in there became much thinner and easier to follow..." * 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits) csum_and_copy_to_pipe_iter(): leave handling of csum_state to caller clean up copy_mc_pipe_to_iter() pipe_zero(): we don't need no stinkin' kmap_atomic()... iov_iter: clean csum_and_copy_...() primitives up a bit copy_page_from_iter(): don't need kmap_atomic() for kvec/bvec cases copy_page_to_iter(): don't bother with kmap_atomic() for bvec/kvec cases iterate_xarray(): only of the first iteration we might get offset != 0 pull handling of ->iov_offset into iterate_{iovec,bvec,xarray} iov_iter: make iterator callbacks use base and len instead of iovec iov_iter: make the amount already copied available to iterator callbacks iov_iter: get rid of separate bvec and xarray callbacks iov_iter: teach iterate_{bvec,xarray}() about possible short copies iterate_bvec(): expand bvec.h macro forest, massage a bit iov_iter: unify iterate_iovec and iterate_kvec iov_iter: massage iterate_iovec and iterate_kvec to logics similar to iterate_bvec iterate_and_advance(): get rid of magic in case when n is 0 csum_and_copy_to_iter(): massage into form closer to csum_and_copy_from_iter() iov_iter: replace iov_iter_copy_from_user_atomic() with iterator-advancing variant [xarray] iov_iter_npages(): just use DIV_ROUND_UP() iov_iter_npages(): don't bother with iterate_all_kinds() ...
2021-06-29  mm: charge active memcg when no mm is set  [Dan Schatzberg]
set_active_memcg() worked for kernel allocations but was silently ignored for user pages. This patch establishes a precedence order for who gets charged: 1. If there is a memcg associated with the page already, that memcg is charged. This happens during swapin. 2. If an explicit mm is passed, mm->memcg is charged. This happens during page faults, which can be triggered in remote VMs (eg gup). 3. Otherwise consult the current process context. If there is an active_memcg, use that. Otherwise, current->mm->memcg. Previously, if a NULL mm was passed to mem_cgroup_charge (case 3) it would always charge the root cgroup. Now it looks up the active_memcg first (falling back to charging the root cgroup if not set). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Dan Schatzberg <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Tejun Heo <[email protected]> Acked-by: Chris Down <[email protected]> Acked-by: Jens Axboe <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Reviewed-by: Michal Koutný <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Ming Lei <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-10  iov_iter: replace iov_iter_copy_from_user_atomic() with iterator-advancing ...  [Al Viro]
variant Replacement is called copy_page_from_iter_atomic(); unlike the old primitive the callers do *not* need to do iov_iter_advance() after it. In case when they end up consuming less than they'd been given they need to do iov_iter_revert() on everything they had not consumed. That, however, needs to be done only on slow paths. All in-tree callers converted. And that kills the last user of iterate_all_kinds() Signed-off-by: Al Viro <[email protected]>
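A hedged sketch of the new caller pattern (the consume_data() helper and variable names are illustrative, not from the patch):

    size_t copied, used;

    copied = copy_page_from_iter_atomic(page, offset, bytes, i);
    /* The iterator has already been advanced by 'copied'; no
     * iov_iter_advance() is needed here.  Only if we end up consuming
     * less than we were given (a slow path) do we walk it back: */
    used = consume_data(page, offset, copied);   /* hypothetical consumer */
    if (used < copied)
        iov_iter_revert(i, copied - used);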
2021-06-02  generic_perform_write()/iomap_write_actor(): saner logics for short copy  [Al Viro]
if we run into a short copy and ->write_end() refuses to advance at all, use the amount we'd managed to copy for the next iteration to handle. Signed-off-by: Al Viro <[email protected]>
2021-05-07  mm: fix typos in comments  [Ingo Molnar]
Fix ~94 single-word typos in locking code comments, plus a few very obvious grammar mistakes. Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Randy Dunlap <[email protected]> Cc: Bhaskar Chowdhury <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05  mm/mempool: minor coding style tweaks  [Zhiyuan Dai]
Various coding style tweaks to various files under mm/ [[email protected]: mm/swapfile: minor coding style tweaks] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: mm/sparse: minor coding style tweaks] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: mm/vmscan: minor coding style tweaks] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: mm/compaction: minor coding style tweaks] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: mm/oom_kill: minor coding style tweaks] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: mm/shmem: minor coding style tweaks] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: mm/page_alloc: minor coding style tweaks] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: mm/filemap: minor coding style tweaks] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: mm/mlock: minor coding style tweaks] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: mm/frontswap: minor coding style tweaks] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: mm/vmalloc: minor coding style tweaks] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: mm/memory_hotplug: minor coding style tweaks] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: mm/mempolicy: minor coding style tweaks] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zhiyuan Dai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05  dax: account DAX entries as nrpages  [Matthew Wilcox (Oracle)]
Simplify mapping_needs_writeback() by accounting DAX entries as pages instead of exceptional entries. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Tested-by: Vishal Verma <[email protected]> Acked-by: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05  mm: stop accounting shadow entries  [Matthew Wilcox (Oracle)]
We no longer need to keep track of how many shadow entries are present in a mapping. This saves a few writes to the inode and memory barriers. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Tested-by: Vishal Verma <[email protected]> Acked-by: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-04-30  mm/filemap: update stale comment  [Rui Sun]
Commit a6de4b4873e1 ("mm: convert find_get_entry to return the head page") uses @index instead of @offset, but the comment is stale, update it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Rui Sun <[email protected]> Signed-off-by: Shaokun Zhang <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-04-30  mm/filemap: drop check for truncated page after I/O  [Matthew Wilcox (Oracle)]
If the I/O completed successfully, the page will remain Uptodate, even if it is subsequently truncated. If the I/O completed with an error, this check would cause us to retry the I/O if the page were truncated before we woke up. There is no need to retry the I/O; the I/O to fill the page failed, so we can legitimately just return -EIO. This code was originally added by commit 56f0d5fe6851 ("[PATCH] readpage-vs-invalidate fix") in 2005 (this commit ID is from the linux-fullhistory tree; it is also commit ba1f08f14b52 in tglx-history). At the time, truncate_complete_page() called ClearPageUptodate(), and so this was fixing a real bug. In 2008, commit 84209e02de48 ("mm: dont clear PG_uptodate on truncate/invalidate") removed the call to ClearPageUptodate, and this check has been unnecessary ever since. It doesn't do any real harm, but there's no need to keep it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Acked-by: William Kucharski <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-04-30  mm/filemap: use filemap_read_page in filemap_fault  [Matthew Wilcox (Oracle)]
After splitting generic_file_buffered_read() into smaller parts, it turns out we can reuse one of the parts in filemap_fault(). This fixes an oversight -- waiting for the I/O to complete is now interruptible by a fatal signal. And it saves us a few bytes of text in an unlikely path. $ ./scripts/bloat-o-meter before.o after.o add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-207 (-207) Function old new delta filemap_fault 2187 1980 -207 Total: Before=37491, After=37284, chg -0.55% Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-04-30  mm: use filemap_range_needs_writeback() for O_DIRECT reads  [Jens Axboe]
For the generic page cache read helper, use the better variant of checking for the need to call filemap_write_and_wait_range() when doing O_DIRECT reads. This avoids falling back to the slow path for IOCB_NOWAIT, if there are no pages to wait for (or write out). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-04-30  mm: provide filemap_range_needs_writeback() helper  [Jens Axboe]
Patch series "Improve IOCB_NOWAIT O_DIRECT reads", v3. An internal workload complained because it was using too much CPU, and when I took a look, we had a lot of io_uring workers going to town. For an async buffered read like workload, I am normally expecting _zero_ offloads to a worker thread, but this one had tons of them. I'd drop caches and things would look good again, but then a minute later we'd regress back to using workers. Turns out that every minute something was reading parts of the device, which would add page cache for that inode. I put patches like these in for our kernel, and the problem was solved. Don't -EAGAIN IOCB_NOWAIT dio reads just because we have page cache entries for the given range. This causes unnecessary work from the callers side, when the IO could have been issued totally fine without blocking on writeback when there is none. This patch (of 3): For O_DIRECT reads/writes, we check if we need to issue a call to filemap_write_and_wait_range() to issue and/or wait for writeback for any page in the given range. The existing mechanism just checks for a page in the range, which is suboptimal for IOCB_NOWAIT as we'll fallback to the slow path (and needing retry) if there's just a clean page cache page in the range. Provide filemap_range_needs_writeback() which tries a little harder to check if we actually need to issue and/or wait for writeback in the range. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-04-27  Merge tag 'netfs-lib-20210426' of ...  [Linus Torvalds]
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull network filesystem helper library updates from David Howells: "Here's a set of patches for 5.13 to begin the process of overhauling the local caching API for network filesystems. This set consists of two parts: (1) Add a helper library to handle the new VM readahead interface. This is intended to be used unconditionally by the filesystem (whether or not caching is enabled) and provides a common framework for doing caching, transparent huge pages and, in the future, possibly fscrypt and read bandwidth maximisation. It also allows the netfs and the cache to align, expand and slice up a read request from the VM in various ways; the netfs need only provide a function to read a stretch of data to the pagecache and the helper takes care of the rest. (2) Add an alternative fscache/cachfiles I/O API that uses the kiocb facility to do async DIO to transfer data to/from the netfs's pages, rather than using readpage with wait queue snooping on one side and vfs_write() on the other. It also uses less memory, since it doesn't do buffered I/O on the backing file. Note that this uses SEEK_HOLE/SEEK_DATA to locate the data available to be read from the cache. Whilst this is an improvement from the bmap interface, it still has a problem with regard to a modern extent-based filesystem inserting or removing bridging blocks of zeros. Fixing that requires a much greater overhaul. This is a step towards overhauling the fscache API. The change is opt-in on the part of the network filesystem. A netfs should not try to mix the old and the new API because of conflicting ways of handling pages and the PG_fscache page flag and because it would be mixing DIO with buffered I/O. Further, the helper library can't be used with the old API. This does not change any of the fscache cookie handling APIs or the way invalidation is done at this time. In the near term, I intend to deprecate and remove the old I/O API (fscache_allocate_page{,s}(), fscache_read_or_alloc_page{,s}(), fscache_write_page() and fscache_uncache_page()) and eventually replace most of fscache/cachefiles with something simpler and easier to follow. This patchset contains the following parts: - Some helper patches, including provision of an ITER_XARRAY iov iterator and a function to do readahead expansion. - Patches to add the netfs helper library. - A patch to add the fscache/cachefiles kiocb API. - A pair of patches to fix some review issues in the ITER_XARRAY and read helpers as spotted by Al and Willy. Jeff Layton has patches to add support in Ceph for this that he intends for this merge window. I have a set of patches to support AFS that I will post a separate pull request for. With this, AFS without a cache passes all expected xfstests; with a cache, there's an extra failure, but that's also there before these patches. Fixing that probably requires a greater overhaul. Ceph also passes the expected tests. 
I also have patches in a separate branch to tidy up the handling of PG_fscache/PG_private_2 and their contribution to page refcounting in the core kernel here, but I haven't included them in this set and will route them separately" Link: https://lore.kernel.org/lkml/[email protected]/ * tag 'netfs-lib-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: netfs: Miscellaneous fixes iov_iter: Four fixes for ITER_XARRAY fscache, cachefiles: Add alternate API to use kiocb for read/write to cache netfs: Add a tracepoint to log failures that would be otherwise unseen netfs: Define an interface to talk to a cache netfs: Add write_begin helper netfs: Gather stats netfs: Add tracepoints netfs: Provide readahead and readpage netfs helpers netfs, mm: Add set/end/wait_on_page_fscache() aliases netfs, mm: Move PG_fscache helper funcs to linux/netfs.h netfs: Documentation for helper library netfs: Make a netfs helper module mm: Implement readahead_control pageset expansion mm/readahead: Handle ractl nr_pages being modified fs: Document file_ra_state mm/filemap: Pass the file_ra_state in the ractl mm: Add set/end/wait functions for PG_private_2 iov_iter: Add ITER_XARRAY
2021-04-23  mm/filemap: fix mapping_seek_hole_data on THP & 32-bit  [Hugh Dickins]
No problem on 64-bit, or without huge pages, but xfstests generic/285 and other SEEK_HOLE/SEEK_DATA tests have regressed on huge tmpfs, and on 32-bit architectures, with the new mapping_seek_hole_data(). Several different bugs turned out to need fixing. u64 cast to stop losing bits when converting unsigned long to loff_t (and let's use shifts throughout, rather than mixed with * and /). Use round_up() when advancing pos, to stop assuming that pos was already THP-aligned when advancing it by THP-size. (This use of round_up() assumes that any THP has THP-aligned index: true at present and true going forward, but could be recoded to avoid the assumption.) Use xas_set() when iterating away from a THP, so that xa_index stays in synch with start, instead of drifting away to return bogus offset. Check start against end to avoid wrapping 32-bit xa_index to 0 (and to handle these additional cases, seek_data or not, it's easier to break the loop than goto: so rearrange exit from the function). [[email protected]: remove unneeded u64 casts, per Matthew] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 41139aa4c3a3 ("mm/filemap: add mapping_seek_hole_data") Signed-off-by: Hugh Dickins <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Jan Kara <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: William Kucharski <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-04-23  mm/filemap: fix find_lock_entries hang on 32-bit THP  [Hugh Dickins]
No problem on 64-bit, or without huge pages, but xfstests generic/308 hung uninterruptibly on 32-bit huge tmpfs. Since commit 0cc3b0ec23ce ("Clarify (and fix) in 4.13 MAX_LFS_FILESIZE macros"), MAX_LFS_FILESIZE is only a PAGE_SIZE away from wrapping 32-bit xa_index to 0, so the new find_lock_entries() has to be extra careful when handling a THP. Link: https://lkml.kernel.org/r/[email protected] Fixes: 5c211ba29deb ("mm: add and use find_lock_entries") Signed-off-by: Hugh Dickins <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: William Kucharski <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Jan Kara <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-04-23  mm/filemap: Pass the file_ra_state in the ractl  [Matthew Wilcox (Oracle)]
For readahead_expand(), we need to modify the file ra_state, so pass it down by adding it to the ractl. We have to do this because it's not always the same as f_ra in the struct file that is already being passed. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: David Howells <[email protected]> Tested-by: Jeff Layton <[email protected]> Tested-by: Dave Wysochanski <[email protected]> Tested-By: Marc Dionne <[email protected]> Link: https://lore.kernel.org/r/[email protected]/ Link: https://lore.kernel.org/r/161789067431.6155.8063840447229665720.stgit@warthog.procyon.org.uk/ # v6
2021-04-23  mm: Add set/end/wait functions for PG_private_2  [David Howells]
Add three functions to manipulate PG_private_2: (*) set_page_private_2() - Set the flag and take an appropriate reference on the flagged page. (*) end_page_private_2() - Clear the flag, drop the reference and wake up any waiters, somewhat analogously with end_page_writeback(). (*) wait_on_page_private_2() - Wait for the flag to be cleared. Wrappers will need to be placed in the netfs lib header in the patch that adds that. [This implements a suggestion by Linus[1] to not mix the terminology of PG_private_2 and PG_fscache in the mm core function] Changes: v7: - Use compound_head() in all the functions to make them THP safe[6]. v5: - Add set and end functions, calling the end function end rather than unlock[3]. - Keep a ref on the page when PG_private_2 is set[4][5]. v4: - Remove extern from the declaration[2]. Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: David Howells <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Tested-by: Jeff Layton <[email protected]> Tested-by: Dave Wysochanski <[email protected]> Tested-By: Marc Dionne <[email protected]> cc: Alexander Viro <[email protected]> cc: Christoph Hellwig <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected]/ # v1 Link: https://lore.kernel.org/r/CAHk-=wjgA-74ddehziVk=XAEMTKswPu1Yw4uaro1R3ibs27ztw@mail.gmail.com/ [1] Link: https://lore.kernel.org/r/[email protected]/ [2] Link: https://lore.kernel.org/r/161340387944.1303470.7944159520278177652.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/161539528910.286939.1252328699383291173.stgit@warthog.procyon.org.uk # v4 Link: https://lore.kernel.org/r/[email protected] [3] Link: https://lore.kernel.org/r/CAHk-=wh+2gbF7XEjYc=HV9w_2uVzVf7vs60BPz0gFA=+pUm3ww@mail.gmail.com/ [4] Link: https://lore.kernel.org/r/CAHk-=wjSGsRj7xwhSMQ6dAQiz53xA39pOG+XA_WeTgwBBu4uqg@mail.gmail.com/ [5] Link: https://lore.kernel.org/r/[email protected]/ [6] Link: https://lore.kernel.org/r/161653788200.2770958.9517755716374927208.stgit@warthog.procyon.org.uk/ # v5 Link: https://lore.kernel.org/r/161789066013.6155.9816857201817288382.stgit@warthog.procyon.org.uk/ # v6
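The intended usage of the three helpers named above, as a brief sketch (the comments describe the netfs-style cache write-out case; the surrounding flow is illustrative):

    /* Flag the page as having a write to the cache in flight and take a ref: */
    set_page_private_2(page);

    /* ... later, when the asynchronous write to the cache completes ... */
    end_page_private_2(page);    /* clear the flag, wake waiters, drop the ref */

    /* Elsewhere, anyone who must wait for the cache write to finish can do: */
    wait_on_page_private_2(page);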
2021-02-26  mm: pass pvec directly to find_get_entries  [Matthew Wilcox (Oracle)]
All callers of find_get_entries() use a pvec, so pass it directly instead of manipulating it in the caller. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: William Kucharski <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26  mm: add an 'end' parameter to find_get_entries  [Matthew Wilcox (Oracle)]
This simplifies the callers and leads to a more efficient implementation since the XArray has this functionality already. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: William Kucharski <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26  mm: add and use find_lock_entries  [Matthew Wilcox (Oracle)]
We have three functions (shmem_undo_range(), truncate_inode_pages_range() and invalidate_mapping_pages()) which want exactly this function, so add it to filemap.c. Before this patch, shmem_undo_range() would split any compound page which overlaps either end of the range being punched in both the first and second loops through the address space. After this patch, that functionality is left for the second loop, which is arguably more appropriate since the first loop is supposed to run through all the pages quickly, and splitting a page can sleep. [[email protected]: add assertion] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: William Kucharski <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26  iomap: use mapping_seek_hole_data  [Matthew Wilcox (Oracle)]
Enhance mapping_seek_hole_data() to handle partially uptodate pages and convert the iomap seek code to call it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jan Kara <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: William Kucharski <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26  mm/filemap: add mapping_seek_hole_data  [Matthew Wilcox (Oracle)]
Rewrite shmem_seek_hole_data() and move it to filemap.c. [[email protected]: don't put an xa_is_value() page] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: William Kucharski <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jan Kara <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
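A minimal sketch of how a filesystem's llseek could use the new helper; the myfs_* name is hypothetical and the (mapping, start, end, whence) signature is an assumption based on the description above:

    static loff_t myfs_seek_hole_data(struct file *file, loff_t offset, int whence)
    {
        struct inode *inode = file_inode(file);
        loff_t ret;

        /* whence is SEEK_HOLE or SEEK_DATA here; the helper returns the
         * next matching offset within [offset, i_size), or an error such
         * as -ENXIO when seeking data and none is found. */
        inode_lock_shared(inode);
        ret = mapping_seek_hole_data(file->f_mapping, offset,
                                     i_size_read(inode), whence);
        inode_unlock_shared(inode);
        return ret;
    }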
2021-02-26  mm/filemap: add helper for finding pages  [Matthew Wilcox (Oracle)]
There is a lot of common code in find_get_entries(), find_get_pages_range() and find_get_pages_range_tag(). Factor out find_get_entry() which simplifies all three functions. [[email protected]: remove VM_BUG_ON_PAGE()] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: William Kucharski <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26  mm/filemap: rename find_get_entry to mapping_get_entry  [Matthew Wilcox (Oracle)]
find_get_entry doesn't "find" anything. It returns the entry at a particular index. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jan Kara <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: William Kucharski <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26  mm: add FGP_ENTRY  [Matthew Wilcox (Oracle)]
The functionality of find_lock_entry() and find_get_entry() can be provided by pagecache_get_page(), which lets us delete find_lock_entry() and make find_get_entry() static. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jan Kara <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: William Kucharski <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
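Roughly, what used to be find_lock_entry(mapping, index) can now be expressed through pagecache_get_page(); handle_value_entry() below is a hypothetical stand-in for caller-specific handling:

    struct page *page;

    page = pagecache_get_page(mapping, index, FGP_ENTRY | FGP_LOCK, 0);
    if (xa_is_value(page)) {
        /* A shadow/swap/DAX value entry; FGP_ENTRY returns it instead of NULL. */
        handle_value_entry(page);
    } else if (page) {
        /* A real page, locked because of FGP_LOCK. */
        unlock_page(page);
        put_page(page);
    }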
2021-02-26  mm: make pagecache tagged lookups return only head pages  [Matthew Wilcox (Oracle)]
Patch series "Overhaul multi-page lookups for THP", v4. This THP prep patchset changes several page cache iteration APIs to only return head pages. - It's only possible to tag head pages in the page cache, so only return head pages, not all their subpages. - Factor a lot of common code out of the various batch lookup routines - Add mapping_seek_hole_data() - Unify find_get_entries() and pagevec_lookup_entries() - Make find_get_entries only return head pages, like find_get_entry(). These are only loosely connected, but they seem to make sense together as a series. This patch (of 14): Pagecache tags are used for dirty page writeback. Since dirtiness is tracked on a per-THP basis, we only want to return the head page rather than each subpage of a tagged page. All the filesystems which use huge pages today are in-memory, so there are no tagged huge pages today. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: William Kucharski <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Yang Shi <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24  mm: memcontrol: convert NR_SHMEM_THPS account to pages  [Muchun Song]
Currently we use struct per_cpu_nodestat to cache the vmstat counters, which leads to inaccurate statistics, especially for the THP vmstat counters. In systems with hundreds of processors it can be GBs of memory. For example, for a 96-CPU system, the threshold is the maximum number of 125, and the per-cpu counters can cache 23.4375 GB in total. The THP page is already a form of batched addition (it will add 512 pages' worth of memory in one go), so skipping the batching seems sensible. Although every THP stats update then overflows the per-cpu counter and resorts to atomic global updates, it makes the statistics more accurate for the THP vmstat counters. So we convert the NR_SHMEM_THPS account to pages. This patch is consistent with 8f182270dfec ("mm/swap.c: flush lru pvecs on compound page arrival"). Doing this also makes the units of the vmstat counters more unified: the units are pages, kB and bytes; the B/KB suffix tells us that the unit is bytes or kB, and counters without a suffix are in pages. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Muchun Song <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Feng Tang <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Michal Hocko <[email protected]> Cc: NeilBrown <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Rafael. J. Wysocki <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Sami Tolvanen <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24  mm: memcontrol: convert NR_FILE_THPS account to pages  [Muchun Song]
Currently we use struct per_cpu_nodestat to cache the vmstat counters, which leads to inaccurate statistics, especially for the THP vmstat counters. In systems with hundreds of processors it can be GBs of memory. For example, for a 96-CPU system, the threshold is the maximum number of 125, and the per-cpu counters can cache 23.4375 GB in total. The THP page is already a form of batched addition (it will add 512 pages' worth of memory in one go), so skipping the batching seems sensible. Although every THP stats update then overflows the per-cpu counter and resorts to atomic global updates, it makes the statistics more accurate for the THP vmstat counters. So we convert the NR_FILE_THPS account to pages. This patch is consistent with 8f182270dfec ("mm/swap.c: flush lru pvecs on compound page arrival"). Doing this also makes the units of the vmstat counters more unified: the units are pages, kB and bytes; the B/KB suffix tells us that the unit is bytes or kB, and counters without a suffix are in pages. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Muchun Song <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Feng Tang <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Michal Hocko <[email protected]> Cc: NeilBrown <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Rafael. J. Wysocki <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Sami Tolvanen <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24  mm/filemap: simplify generic_file_read_iter  [Christoph Hellwig]
Avoid the pointless goto out just for returning retval. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Kent Overstreet <[email protected]> Cc: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24  mm/filemap: rename generic_file_buffered_read to filemap_read  [Christoph Hellwig]
Rename generic_file_buffered_read to match the naming of filemap_fault, also update the written parameter to a more descriptive name and improve the kerneldoc comment. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Kent Overstreet <[email protected]> Cc: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24  mm/filemap: don't relock the page after calling readpage  [Matthew Wilcox (Oracle)]
We don't need to get the page lock again; we just need to wait for the I/O to finish, so use wait_on_page_locked_killable() like the other callers of ->readpage. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Kent Overstreet <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
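As a sketch, the tail of the read path then looks something like this (not a verbatim excerpt of filemap_read_page(); error handling around truncation is elided):

    error = mapping->a_ops->readpage(file, page);
    if (error)
        return error;

    /* Don't take the page lock again; just wait for the I/O to finish,
     * allowing a fatal signal to interrupt the wait. */
    error = wait_on_page_locked_killable(page);
    if (error)
        return error;
    if (!PageUptodate(page))
        error = -EIO;
    return error;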
2021-02-24  mm/filemap: restructure filemap_get_pages  [Matthew Wilcox (Oracle)]
Remove the got_pages label, remove indentation, rename find_page to retry, simplify error handling. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Kent Overstreet <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24  mm/filemap: split filemap_readahead out of filemap_get_pages  [Matthew Wilcox (Oracle)]
This simplifies the error handling. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Kent Overstreet <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24  mm/filemap: add filemap_range_uptodate  [Matthew Wilcox (Oracle)]
Move the complicated condition and the calculations out of filemap_update_page() into its own function. [[email protected]: unlock page before dropping its refcount] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Kent Overstreet <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24  mm/filemap: move the iocb checks into filemap_update_page  [Matthew Wilcox (Oracle)]
We don't need to give up when a non-blocking request sees a !Uptodate page. We may be able to satisfy the read from a partially-uptodate page. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Kent Overstreet <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24  mm/filemap: convert filemap_update_page to return an errno  [Matthew Wilcox (Oracle)]
Use AOP_TRUNCATED_PAGE to indicate that no error occurred, but the page we looked up is no longer valid. In this case, the reference to the page will have been removed; if we hit any other error, the caller will release the reference. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Kent Overstreet <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
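The caller-side contract then becomes, roughly (the retry label and surrounding flow are illustrative, not lifted from the patch):

    err = filemap_update_page(iocb, mapping, iter, page);
    if (err == AOP_TRUNCATED_PAGE) {
        /* Not a real error: the page was truncated and its reference
         * has already been dropped for us; just redo the lookup. */
        goto retry_lookup;
    }
    if (err) {
        /* Any other error: the caller still owns the reference. */
        put_page(page);
        return err;
    }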
2021-02-24  mm/filemap: change filemap_create_page calling conventions  [Matthew Wilcox (Oracle)]
By moving the iocb flag checks to the caller, we can pass the file and the page index instead of the iocb. It never needed the iter. By passing the pagevec, we can return an errno (or AOP_TRUNCATED_PAGE) instead of an ERR_PTR. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Kent Overstreet <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24  mm/filemap: change filemap_read_page calling conventions  [Matthew Wilcox (Oracle)]
Make this function more generic by passing the file instead of the iocb. Check in the callers whether we should call readpage or not. Also make it return an errno / 0 / AOP_TRUNCATED_PAGE, and make calling put_page() the caller's responsibility. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Kent Overstreet <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24  mm/filemap: don't call ->readpage if IOCB_WAITQ is set  [Matthew Wilcox (Oracle)]
The readpage operation can block in many (most?) filesystems, so we should punt to a work queue instead of calling it. This was the last caller of lock_page_for_iocb(), so remove it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Kent Overstreet <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24  mm/filemap: inline __wait_on_page_locked_async into caller  [Matthew Wilcox (Oracle)]
The previous patch removed wait_on_page_locked_async(), so inline __wait_on_page_locked_async into __lock_page_async(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Kent Overstreet <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24  mm/filemap: support readpage splitting a page  [Matthew Wilcox (Oracle)]
For page splitting to succeed, the thread asking to split the page has to be the only one with a reference to the page. Calling wait_on_page_locked() while holding a reference to the page will effectively prevent this from happening with sufficient threads waiting on the same page. Use put_and_wait_on_page_locked() to sleep without holding a reference to the page, then retry the page lookup after the page is unlocked. Since we now get the page lock a little earlier in filemap_update_page(), we can eliminate a number of duplicate checks. The original intent (commit ebded02788b5 ("avoid unnecessary calls to lock_page when waiting for IO to complete during a read")) behind getting the page lock later was to avoid re-locking the page after it has been brought uptodate by another thread. We still avoid that because we go through the normal lookup path again after the winning thread has brought the page uptodate. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Kent Overstreet <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
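The waiting pattern described above, sketched briefly (the repeat label stands for redoing the page cache lookup and is illustrative):

    if (!PageUptodate(page)) {
        /* Sleep without holding a page reference, so a thread that wants
         * to split (or truncate) the page is not pinned out by our
         * refcount; afterwards, redo the lookup from scratch. */
        put_and_wait_on_page_locked(page, TASK_KILLABLE);
        goto repeat;
    }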
2021-02-24  mm/filemap: pass a sleep state to put_and_wait_on_page_locked  [Matthew Wilcox (Oracle)]
This is prep work for the next patch, but I think at least one of the current callers would prefer a killable sleep to an uninterruptible one. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Kent Overstreet <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>