Skip to content

Commit

Permalink
mm, page_alloc: avoid expensive reclaim when compaction may not succeed
Browse files Browse the repository at this point in the history
Memory compaction has a couple significant drawbacks as the allocation
order increases, specifically:

 - isolate_freepages() is responsible for finding free pages to use as
   migration targets and is implemented as a linear scan of memory
   starting at the end of a zone,

 - failing order-0 watermark checks in memory compaction does not account
   for how far below the watermarks the zone actually is: to enable
   migration, there must be *some* free memory available.  Per the above,
   watermarks are not always suffficient if isolate_freepages() cannot
   find the free memory but it could require hundreds of MBs of reclaim to
   even reach this threshold (read: potentially very expensive reclaim with
   no indication compaction can be successful), and

 - if compaction at this order has failed recently so that it does not even
   run as a result of deferred compaction, looping through reclaim can often
   be pointless.

For hugepage allocations, these are quite substantial drawbacks because
these are very high order allocations (order-9 on x86) and falling back to
doing reclaim can potentially be *very* expensive without any indication
that compaction would even be successful.

Reclaim itself is unlikely to free entire pageblocks and certainly no
reliance should be put on it to do so in isolation (recall lumpy reclaim).
This means we should avoid reclaim and simply fail hugepage allocation if
compaction is deferred.

It is also not helpful to thrash a zone by doing excessive reclaim if
compaction may not be able to access that memory.  If order-0 watermarks
fail and the allocation order is sufficiently large, it is likely better
to fail the allocation rather than thrashing the zone.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  • Loading branch information
rientjes authored and ananjaser1211 committed Aug 31, 2024
1 parent e69f300 commit d3b1281
Showing 1 changed file with 22 additions and 0 deletions.
22 changes: 22 additions & 0 deletions mm/page_alloc.c
Original file line number Diff line number Diff line change
Expand Up @@ -3706,6 +3706,28 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
if (page)
goto got_pg;

if (order >= pageblock_order && (gfp_mask & __GFP_IO)) {
/*
* If allocating entire pageblock(s) and compaction
* failed because all zones are below low watermarks
* or is prohibited because it recently failed at this
* order, fail immediately.
*
* Reclaim is
* - potentially very expensive because zones are far
* below their low watermarks or this is part of very
* bursty high order allocations,
* - not guaranteed to help because isolate_freepages()
* may not iterate over freed pages as part of its
* linear scan, and
* - unlikely to make entire pageblocks free on its
* own.
*/
if (compact_result == COMPACT_SKIPPED ||
compact_result == COMPACT_DEFERRED)
goto nopage;
}

/*
* Checks for costly allocations with __GFP_NORETRY, which
* includes THP page fault allocations
Expand Down

0 comments on commit d3b1281

Please sign in to comment.