global_evaluate: faithful parallel evaluate() (fix out-of-domain mislocation) #222
Open
lmoresi wants to merge 1 commit into
…location

global_evaluate's parallel path had no correct handling for query points outside the (old) domain: the evaluation-swarm migrate routes unclaimed points by nearest rank centre-of-mass and strands them on an arbitrary rank, which then extrapolates from a geometrically-far cell -> silently-wrong values, parallel-only (e.g. an annulus boundary point reading the opposite side). This corrupted mesh-variable transfer on parallel mover-adapted meshes.

Restore the serial evaluate() contract (interpolate inside / extrapolate from the TRUE nearest cell outside / flag inside-outside) with a best-claim out-of-domain fallback in global_evaluate_nd: allgather the (small, boundary-layer) extrapolated set; every rank reports its nearest-local-cell distance + its LOCAL rbf extrapolation; Allreduce(MIN dist / MIN rank / SUM winner value) picks the globally-nearest rank's value.

Only unconditional collectives + local rbf_evaluate (never the collective FE interpolation, which would desync) -> deadlock-safe. O(boundary points), no dense global tree.

Default on; GE_LOCAL_FALLBACK=0 restores legacy. Serial unchanged (gated mpi.size>1).

Validated: deterministic-rotation gate, linear field T=x, np=5, max_err 1.06 -> 0.003 (== serial).

Underworld development team with AI support from Claude Code
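The routing failure described above can be reproduced in miniature. The sketch below is plain NumPy on a hypothetical two-rank partition (the coordinates and helper names are illustrative, not Underworld's `migrate` implementation); it only shows that the rank with the nearest centre of mass and the rank owning the nearest cell can differ, which is how a point ends up extrapolated from a geometrically-far cell.

```python
import numpy as np

# Hypothetical toy partition: each "rank" owns a few cell centroids.
# Rank 0 owns a compact cluster near the origin; rank 1 owns a long strip
# whose centre of mass sits far from its nearest cell.
rank_cells = {
    0: np.array([[0.0, 0.0], [0.2, 0.0]]),
    1: np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]),
}


def route_by_centre_of_mass(p, rank_cells):
    """Rank whose centre of mass is nearest to p (routing by rank centre-of-mass)."""
    coms = {r: c.mean(axis=0) for r, c in rank_cells.items()}
    return min(coms, key=lambda r: np.sum((coms[r] - p) ** 2))


def route_by_nearest_cell(p, rank_cells):
    """Rank owning the cell centroid nearest to p (what extrapolation needs)."""
    best = {r: np.min(np.sum((c - p) ** 2, axis=1)) for r, c in rank_cells.items()}
    return min(best, key=best.get)


p = np.array([0.7, 0.5])  # a point outside every rank's cells
print(route_by_centre_of_mass(p, rank_cells), route_by_nearest_cell(p, rank_cells))
```

Here the centre-of-mass rule sends `p` to rank 0, while the nearest cell centroid, at `(1.0, 0.0)`, belongs to rank 1.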
Pull request overview
This PR updates global_evaluate_nd to correct MPI-parallel evaluation for points that end up outside any rank’s owned cells after the swarm migrate round-trip, aiming to make parallel behavior match the serial evaluate() contract (interpolate in-domain; extrapolate just outside; return inside/outside via check_extrapolated) and to avoid deadlocks by using only unconditional collectives and rank-local RBF evaluation.
Changes:
- Adds a parallel “best-claim” out-of-domain fallback: allgather stranded points, compute per-rank distance/value, and Allreduce to select the globally-best rank’s extrapolation.
- Adds an environment-variable escape hatch (`GE_LOCAL_FALLBACK`) to disable the new fallback and restore legacy behavior.
- Expands the `global_evaluate_nd` docstring/comments to document the parallel contract and deadlock-safety constraints.
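The best-claim selection in the first bullet can be simulated serially by stacking each rank's contribution along a leading "rank" axis. This is a minimal sketch under that assumption: `best_claim` is a hypothetical helper, values are scalar per point, and each `min`/`sum` over axis 0 stands in for one `Allreduce` in the real code.

```python
import numpy as np


def best_claim(dist2, values):
    """Serial simulation of the MIN dist / MIN rank / SUM value reduction.

    dist2  : (n_ranks, n_points) squared distance from each point to the
             nearest local cell on each rank.
    values : (n_ranks, n_points) each rank's local extrapolated value.
    Returns (winner_rank, winner_value), each of shape (n_points,).
    """
    n_ranks = dist2.shape[0]
    ranks = np.arange(n_ranks)[:, None]  # (n_ranks, 1) for broadcasting

    # Step 1: Allreduce(MIN) over the per-rank distances.
    d_min = dist2.min(axis=0)

    # Step 2: Allreduce(MIN) over rank ids. A rank bids its id only if it
    # attains the minimum distance, so ties go deterministically to the
    # lowest rank; non-claimants bid an out-of-range sentinel.
    bids = np.where(dist2 == d_min, ranks, n_ranks)
    winner = bids.min(axis=0)

    # Step 3: Allreduce(SUM) over values. Every rank contributes, but only
    # the winner contributes a non-zero value, so the sum is its value.
    contrib = np.where(ranks == winner, values, 0.0)
    return winner, contrib.sum(axis=0)


dist2 = np.array([[0.5, 0.1, 0.2],
                  [0.3, 0.1, 0.9],
                  [0.3, 0.4, 0.05]])
values = np.array([[10.0, 20.0, 30.0],
                   [11.0, 21.0, 31.0],
                   [12.0, 22.0, 32.0]])
winner, picked = best_claim(dist2, values)
```

Summing a value that is zero on every non-winning rank lets a single plain `SUM` reduction carry the winner's result without a custom reduction op. In an MPI code each step would be a collective such as mpi4py's `comm.Allreduce(sendbuf, recvbuf, op=MPI.MIN)`, called by every rank even when it has nothing to claim, which keeps the collectives unconditional.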
Comment on lines +593 to +603
```python
ext_vals, ext_flag = evaluate_nd(
    expr, all_ext, rbf=True, evalf=False, verbose=False,
    check_extrapolated=True,)
ext_vals = np.ascontiguousarray(
    np.asarray(ext_vals, dtype=np.double).reshape((n_ext_total,) + expr_shape))
ext_flag = np.asarray(ext_flag).reshape(n_ext_total).astype(np.int32)

# Nearest-local-cell distance for every point (local kd-tree query).
mesh._build_kd_tree_index()
dist2, _ = mesh._centroid_index.query(all_ext, k=1, sqr_dists=True)
dist2 = np.ascontiguousarray(np.asarray(dist2, dtype=np.double).ravel())
```
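The excerpt queries `mesh._centroid_index` with `k=1` and `sqr_dists=True`, so the per-rank distance is the squared distance to the nearest local cell centroid. As a hedged illustration of what that query returns (not of Underworld's kd-tree internals, which are assumed here), a brute-force NumPy equivalent is:

```python
import numpy as np


def nearest_centroid_dist2(points, centroids):
    """Brute-force stand-in for a k=1 nearest-neighbour query returning
    squared distances.

    points    : (n_points, dim) query coordinates.
    centroids : (n_cells, dim) local cell centroids on this rank.
    Returns (dist2, index), each of shape (n_points,).
    """
    # Pairwise differences, shape (n_points, n_cells, dim).
    diff = points[:, None, :] - centroids[None, :, :]
    # Squared Euclidean distance for every (point, cell) pair.
    d2 = np.einsum("pcd,pcd->pc", diff, diff)
    idx = np.argmin(d2, axis=1)
    return d2[np.arange(len(points)), idx], idx


centroids = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
points = np.array([[0.9, 0.1], [0.1, 1.8], [-1.0, 0.0]])
d2, idx = nearest_centroid_dist2(points, centroids)
```

The brute-force version needs O(n_points × n_cells) memory, which is why a tree index is the practical choice on real meshes; it is affordable here only because the extrapolated set is a small boundary layer.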
Comment on lines +574 to +576
```python
import os
if uw.mpi.size > 1 and os.environ.get("GE_LOCAL_FALLBACK", "1") not in (
        "0", "off", "false", "no", ""):
```
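The gate above can be read as a small predicate. The sketch below mirrors the excerpt's logic with a hypothetical helper name, `local_fallback_enabled`, taking the MPI size and an environment mapping as arguments so it can be checked without MPI:

```python
import os

# Values the excerpt treats as "off"; anything else, or an unset variable, is "on".
_OFF_VALUES = ("0", "off", "false", "no", "")


def local_fallback_enabled(mpi_size, environ=os.environ):
    """Parallel-only, default-on gate for the out-of-domain fallback."""
    return mpi_size > 1 and environ.get("GE_LOCAL_FALLBACK", "1") not in _OFF_VALUES
```

Note that the comparison is case-sensitive: `GE_LOCAL_FALLBACK=OFF` or `False` leaves the fallback enabled, while an empty value (`GE_LOCAL_FALLBACK=`) disables it.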
Comment on lines +371 to +374
```
Contract: this is a faithful *parallel* counterpart of :func:`evaluate` —
a query point is interpolated wherever in the mesh it lands (on any rank),
a point just outside the mesh is extrapolated from its true nearest cell,
and ``check_extrapolated`` returns an inside/outside flag per point. The
```
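To make the three parts of the contract concrete, here is a toy 1-D version: interpolate inside the mesh, extrapolate from the nearest cell outside it, and return a per-point inside/outside flag. It is a sketch under stated assumptions: `evaluate_1d` is hypothetical, and it extrapolates with the nearest cell's linear shape functions, whereas the PR's fallback uses RBF extrapolation (`rbf=True`).

```python
import numpy as np


def evaluate_1d(x_nodes, t_nodes, xq):
    """Toy 1-D analogue of the evaluate() contract.

    x_nodes : sorted node coordinates; cell i spans [x_nodes[i], x_nodes[i+1]].
    t_nodes : nodal field values.
    xq      : query coordinates.
    Returns (values, extrapolated), where extrapolated is True outside the mesh.
    """
    x_nodes = np.asarray(x_nodes, dtype=float)
    t_nodes = np.asarray(t_nodes, dtype=float)
    xq = np.asarray(xq, dtype=float)

    # Cell containing each point; outside the mesh, clipping selects the
    # nearest boundary cell.
    cell = np.clip(np.searchsorted(x_nodes, xq) - 1, 0, len(x_nodes) - 2)
    x0, x1 = x_nodes[cell], x_nodes[cell + 1]
    t0, t1 = t_nodes[cell], t_nodes[cell + 1]

    # Linear shape functions evaluated without clamping: interpolation for
    # 0 <= s <= 1, linear extrapolation from that cell otherwise.
    s = (xq - x0) / (x1 - x0)
    values = (1.0 - s) * t0 + s * t1
    extrapolated = (xq < x_nodes[0]) | (xq > x_nodes[-1])
    return values, extrapolated


# Field T = x**2 sampled on two cells over [0, 1].
vals, ext = evaluate_1d([0.0, 0.5, 1.0], [0.0, 0.25, 1.0], [0.25, 1.5, -0.5])
```

For a linear field such as `T=x`, extrapolating from the correct nearest cell reproduces the exact value; the parallel bug comes from choosing the wrong cell, not from the extrapolation formula itself.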
Comment on lines +533 to +537
```python
# ------------------------------------------------------------------
# Out-of-domain extrapolation — keep the parallel result a faithful
# match for the serial ``evaluate()`` contract: interpolate a point
# wherever it lands across ranks, extrapolate a point just outside the
# mesh, and flag inside/outside.
```
Problem
`global_evaluate`'s parallel path silently returned wrong values for query points outside the (old) domain. The evaluation-swarm `migrate` routes unclaimed points by nearest rank centre-of-mass and strands them on an arbitrary rank, which extrapolates from a geometrically-far local cell. Deterministic reproduction (rotation gate, linear field `T=x`, np=5): a point at `x=+0.64` read `−0.42` (opposite side), `max_err=1.06`. This corrupted mesh-variable transfer on parallel mover-adapted meshes.
Fix
Make `global_evaluate` a faithful parallel `evaluate()`: interpolate inside, extrapolate from the true nearest cell outside, flag inside/outside. This adds a best-claim out-of-domain fallback in `global_evaluate_nd`: `allgather` the (small, boundary-layer) extrapolated set, then `Allreduce(MIN dist / MIN rank / SUM winner-value)` picks the globally-nearest rank's value.

Only unconditional collectives + a local `rbf_evaluate` (never the collective FE interpolation, which would desync per-rank → hang). O(boundary points), no dense global tree. Deadlock-safe by construction.

Default on; `GE_LOCAL_FALLBACK=0` restores legacy. Serial path unchanged (gated `mpi.size>1`).
Validation
Rotation gate, `T=x`, np=5: `max_err 1.06 → 0.003` (bit-identical to serial `0.003`); turning the fallback off reproduces `1.06`. tier-A green; used throughout the parallel adaptive-convection runs.

Underworld development team with AI support from Claude Code