
Local Aware IBM #1378

Draft
danieljvickers wants to merge 16 commits into MFlowCode:master from danieljvickers:local-aware-ibm

Conversation

@danieljvickers
Member

Description

The current code does not scale much past 10k particles due to limitations of the MPI allreduce sum in the IB force computation. This PR attempts to alleviate this by limiting the number of IBs any given rank is aware of to those of its neighbors. This turns the allreduce into an MPI neighbor communication, removing the communication bottleneck. To support this, a massive overhaul of IB ownership between ranks was required.
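For intuition, the "neighbors" here are the (up to) 27 ranks, self included, adjacent to a rank in a 3D Cartesian decomposition. A minimal sketch of enumerating them in Python, not MFC's actual code, assuming row-major rank numbering as `MPI_Cart_create` uses by default:

```python
from itertools import product

def neighbor_ranks(coords, dims):
    """Hypothetical helper: ranks (self included) within one subdomain of
    `coords` in a 3D Cartesian decomposition of shape `dims`."""
    ranks = set()
    for off in product((-1, 0, 1), repeat=3):
        c = tuple(coords[i] + off[i] for i in range(3))
        if all(0 <= c[i] < dims[i] for i in range(3)):
            # row-major rank numbering over the Cartesian grid
            ranks.add((c[0] * dims[1] + c[1]) * dims[2] + c[2])
    return sorted(ranks)
```

An interior rank exchanges with 27 ranks (self included), a corner rank with only 8, which is why restricting IB awareness to this set bounds the communication cost independently of the total rank count.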

Type of change

  • Refactor

Testing

TBD

Checklist

  • I added or updated tests for new behavior

AI code reviews

Reviews are not triggered automatically. To request a review, comment on the PR:

  • @coderabbitai review — incremental review (new changes only)
  • @coderabbitai full review — full review from scratch
  • /review — Qodo review
  • /improve — Qodo code suggestions
  • @claude full review — Claude full review (also triggers on PR open/reopen/ready)
  • Add label claude-full-review — Claude full review via label

@github-actions

Claude Code Review

Head SHA: ee0bc0c

Files changed (8):

  • docs/documentation/case.md
  • src/common/m_constants.fpp
  • src/common/m_derived_types.fpp
  • src/simulation/m_global_parameters.fpp
  • src/simulation/m_ib_patches.fpp
  • src/simulation/m_ibm.fpp
  • src/simulation/m_start_up.fpp
  • src/simulation/m_time_steppers.fpp

Findings:

[Correctness — Out-of-bounds array access in s_reduce_ib_patch_array]
src/simulation/m_start_up.fpp

patch_ib_gbl is a fixed-size local array of size num_ib_patches_max = 50000 (from m_constants.fpp). In the 3D branch, num_aware_ibs = num_local_ibs_max * 27 = 2000 * 27 = 54000. Both the num_procs == 1 MPI path and the non-MPI #else path then execute:

patch_ib(:) = patch_ib_gbl(1:num_aware_ibs)   ! 1:54000 from a 50000-element array

Accessing elements 50001–54000 of patch_ib_gbl is an out-of-bounds array access for any 3D simulation. The 2D case (num_aware_ibs = 18000 < 50000) is unaffected. Fix: either increase patch_ib_gbl's declared size to accommodate the 3D case (e.g. max(num_ib_patches_max, num_local_ibs_max*27)), or clamp the right-hand-side slice to min(num_aware_ibs, num_ib_patches_max) (with acknowledgement that patches beyond index 50000 would be dropped).


[MPI Correctness — Global allreduce replaced by 2-hop neighborhood reduce]
src/simulation/m_ibm.fpp

The previous s_mpi_allreduce_vectors_sum(forces, ..., num_ibs, 3) (a global collective) is replaced by s_communicate_ib_forces, which propagates force/torque contributions at most 2 hops in each Cartesian direction. This is only correct if every processor that contributed ghost-point force calculations for a given IB is within 2 MPI ranks of the owning rank along every axis. An IB whose ghost-point stencil spans more than 2 subdomains in any direction (possible with fine MPI decompositions or large gp_layers) will accumulate under-counted forces and torques silently, producing wrong physics with no diagnostic. The comments acknowledge "local neighborhood" but do not document the assumption or enforce it with a runtime guard (e.g. asserting gp_layers is small relative to subdomain size).
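The runtime guard suggested above could take roughly this shape; this is a hypothetical Python sketch of the check, not the Fortran code the PR would need:

```python
def check_neighborhood_assumption(gp_layers, cells_per_rank, hops=2):
    """Hypothetical guard: a ghost-point stencil of gp_layers cells must not
    span more than `hops` subdomains along any axis, or the neighborhood
    force reduce silently under-counts contributions."""
    for axis, n in enumerate(cells_per_rank):
        if gp_layers > hops * n:
            raise RuntimeError(
                f"IB ghost stencil ({gp_layers} layers) spans more than "
                f"{hops} subdomains along axis {axis} ({n} cells/rank); "
                f"neighborhood force reduce would drop contributions")
```

Failing loudly at startup is preferable to the silent under-counting the finding describes, since the wrong physics would otherwise carry no diagnostic.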


[Memory Management / GPU Correctness — patch_ib allocated with raw allocate/deallocate]
src/simulation/m_global_parameters.fpp, src/simulation/m_start_up.fpp

patch_ib is now allocatable and listed in GPU_DECLARE. The initial allocation in s_set_default_values:

allocate (patch_ib(num_ib_patches_max))

and the resize in s_reduce_ib_patch_array:

deallocate (patch_ib)
...
allocate (patch_ib(num_aware_ibs))

both use bare Fortran statements instead of @:ALLOCATE/@:DEALLOCATE. Per CLAUDE.md, every @:ALLOCATE must have a matching @:DEALLOCATE. More critically, after s_reduce_ib_patch_array populates the freshly-allocated host patch_ib (via the filtering loop and assignments), there is no $:GPU_UPDATE(device=[patch_ib]) call to push the updated data to the device. On GPU builds, the device-side patch_ib will be stale or uninitialized after this function returns, corrupting any subsequent GPU kernel that reads it before the data is otherwise re-uploaded.
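The ordering the finding asks for (reallocate, fill on the host, then push to the device) can be modeled with a toy host/device mirror. This Python sketch is purely illustrative; the real fix uses the @:ALLOCATE/@:DEALLOCATE macros and a $:GPU_UPDATE(device=[patch_ib]) call in the Fortran source:

```python
class MirroredArray:
    """Toy model of storage mirrored on host and device (illustrative only)."""

    def __init__(self, n):
        self.host = [None] * n
        self.device = [None] * n  # device copy starts uninitialized

    def update_device(self):
        # analogue of $:GPU_UPDATE(device=[patch_ib]): push host -> device
        self.device = list(self.host)

def resize_and_fill(new_len, values):
    # analogue of s_reduce_ib_patch_array: reallocate, fill the host copy,
    # then sync; omitting update_device() leaves the device copy stale
    arr = MirroredArray(new_len)
    arr.host[:len(values)] = values
    arr.update_device()
    return arr
```

Without the final sync step, any kernel reading the device copy sees stale or uninitialized data, which is exactly the hazard the finding flags.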

