Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
240 changes: 240 additions & 0 deletions docs/developer/design/CONSTRAINED_FREESLIP_MULTIPLIER.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,240 @@
# Constrained free-slip via a recoverable Lagrange multiplier (dynamic topography)

**Status**: proof-of-concept (Phase 0 + Phase 1), serial. Branch
`feature/constrained-freeslip-topography`.

## Motivation

Free-slip / no-normal-flow on curved (annulus, spherical) boundaries is
currently enforced with **penalty-like** methods — a penalty natural BC
(`add_natural_bc(penalty · Γ·v · Γ, ...)`) or Nitsche. These are fragile: the
penalty magnitude must be tuned against the Rayleigh number and viscosity. Too
weak and a coherent radial throughflow appears (an under-scaled `1e4` natural BC
is ~100× too weak at Ra=1e6); too strong and the system ill-conditions and the
Stokes solve diverges in line search.

This feature enforces `u·n = g` on a curved boundary with a **true Lagrange
multiplier** `λ` instead of a penalty. Because the converged multiplier *is* the
normal traction holding the boundary, it is simultaneously a direct estimate of
**dynamic surface topography**, `h = λ / (Δρ g)`. The equilibrium `λ` is also the
target end-state toward which a free surface can be integrated over a time
interval (connecting to the ETD free-surface work on
`feature/exp-integrator-freesurface`).

## Formulation

Stokes with a surface constraint `u·n = g` on Γ, multiplier `λ`:

```
[ A Bᵀ Cᵀ ] [u] [f]
[ B 0 0 ] [p] = [0] A = viscous, B = div, C = ∫_Γ (n·v) ψ
[ C 0 0 ] [λ] [g] (C couples only the boundary trace of u)
```

`C` is **co-dimension-1**: it touches only velocity DOFs on Γ. A monolithic
third FE field would therefore either waste interior DOFs or need a boundary
trace space PETSc/DMPlex does not provide on the same DM. We instead solve the
boundary Schur complement `S_λ = C K⁻¹ Cᵀ` by an **outer loop**, leaving the
validated 2×2 Stokes assembly untouched.

### Augmented Lagrangian (ALG2) — the production algorithm

Each Stokes solve carries a natural-BC traction with **both** the multiplier and
a penalty augmentation, reusing the existing penalty machinery:

$$\mathbf{t} = \bigl[\lambda + r\,(\mathbf{u}\cdot\mathbf{n} - g)\bigr]\,\mathbf{n}
\quad\text{on } \Gamma,$$

and the multiplier is updated with the same `r`:

$$\lambda \leftarrow \lambda + r\,(\mathbf{u}\cdot\mathbf{n} - g).$$

The penalty term `r(u·n)n` preconditions *every* boundary mode uniformly, so the
outer loop converges in a handful of iterations; the multiplier removes the
penalty's accuracy bias, so a **moderate, well-conditioned** `r` gives both fast
convergence *and* an exact constraint — unlike a pure penalty, which must be made
large and fragile.

## Phase-0 spike findings (what shaped the design)

Spikes (`/tmp/s3_*.py`) on a 2D annulus (no-slip inner boundary to remove the
rigid-rotation null space, multiplier free-slip on the outer boundary):

- **Plain Uzawa works but is slow.** A damped-Richardson update
`λ ← λ + ρ(u·n)` converges and matches the penalty solution, but a single
scalar `ρ` cannot kill both the fast and slow boundary-Schur modes — the
residual contracts the dominant mode in ~5 iterations then crawls.
- **`ρ ∝ μ`, NOT `ρ ∝ μ/h`.** The optimal Richardson step is `ρ ≈ C·μ` with `C`
a geometry constant, **independent of mesh resolution** (`ρ=8μ` converged in 5
iterations at cellSize 0.1/0.05/0.025). The naive `μ/h` scaling over-steps on
refinement and stalls.
- **CG is the wrong accelerator.** CG on `S_λ` diverged: each matvec is an
*inexact* iterative Stokes solve (plus pressure-null-space noise), and the
nodal Euclidean inner product is not the one in which `S_λ` is SPD. Krylov
acceleration needs an exact symmetric operator; this is neither.
- **Augmented Lagrangian is the right accelerator** (per L. Moresi). It converges
in **2 iterations** where plain Uzawa took 21, reusing the existing penalty BC.
This is the implemented algorithm.

## Implementation

`SNES_Stokes_Constrained(SNES_Stokes)` in `src/underworld3/systems/solvers.py`,
exported as `uw.systems.Stokes_Constrained`. **Purely additive** — the validated
2×2 saddle-point assembly and fieldsplit configuration are untouched, honouring
"solver stability is paramount".

```python
stokes = uw.systems.Stokes_Constrained(mesh, velocityField=v, pressureField=p)
stokes.constitutive_model = uw.constitutive_models.ViscousFlowModel
stokes.constitutive_model.Parameters.shear_viscosity_0 = mu
stokes.bodyforce = buoyancy * unit_r
stokes.add_dirichlet_bc((0.0, 0.0), "Lower") # no-slip inner

lam = stokes.add_constraint_bc("Upper", g=0.0) # free-slip outer (Gamma_P1)
stokes.solve() # no constraint tuning needed

topo = stokes.topography("Upper", buoyancy_scale=delta_rho_g) # h = lambda/(drho g)
```

The outer loop is hidden behind ``solve()``: the tolerance is **relative** to the
velocity scale (``RMS(u.n-g) < rtol·RMS|u|``, default ``rtol=1e-3``) so it is
problem-independent, and the augmentation defaults to ``1e3·μ(x)`` (local-
viscosity-weighted). On SolCx this gives ~2e-4 accuracy in 3 outer iterations at
viscosity contrast 1 → 10⁶ with no user tuning.

Key design points:

- **Multiplier representation.** `λ` is an ordinary full-mesh scalar field at the
*velocity degree* (P2). Only its trace on Γ enters the weak form. Matching the
velocity degree means the multiplier reaches every velocity normal-trace DOF
(including P2 mid-edge), so there is no penalty floor on the constraint — a P1
multiplier plateaus at ~`‖t‖/r` on the mid-edge component.
- **Clean topography field.** The dual update is restricted to boundary nodes via
a P1 boundary marker (`petsc_dm_find_labeled_points_local`) sampled at the
multiplier's nodes — boundary mid-edge nodes interpolate to 1, interior to
< 0.5. Interior `λ` therefore stays **exactly zero**, so `λ` is a directly
usable topography field (not interior garbage).
- **Coupling registered once.** `add_natural_bc([λ + r(u·n − g)]·n, boundary)`
is set up a single time; only `λ`'s boundary data changes between solves, so
nothing recompiles.
- **`add_constraint_bc(boundary, g=0, normal=None, augmentation=None)`** —
`normal` defaults to the smooth projected normals `mesh.Gamma_P1`;
`augmentation` defaults to a viscosity-scaled `r = 10³·μ`.

## Validation

`tests/test_1061_constrained_freeslip_annulus.py` (level_2 / tier_b), buoyancy-
driven annulus, no-slip inner + multiplier free-slip outer, vs a `1e6` penalty
reference. All pass (~14 s):

| Check | Result |
|---|---|
| `RMS(u·n)` on outer boundary (no penalty coefficient) | 6.5e-5 |
| `relL2(v_multiplier vs v_penalty)` | 3.1e-3 |
| Outer iterations (augmented Lagrangian) | 2 |
| Interior `λ` (clean field) | exactly 0 |
| boundary `corr(λ, −n·σ·n)` (dynamic-topography stress) | 0.9999 |

The consistent-boundary-flux identity `λ = −n·σ·n|_Γ` is the independent
cross-check: the multiplier's boundary trace equals the recovered normal Cauchy
stress (negative sign = the reaction traction holding the boundary), confirming
`λ` is the dynamic topography signal.

## The augmentation parameter `r`: true-work trade-off

`r` is a *speed* knob, not an *accuracy* knob — this is the key advantage over a
pure penalty. Sweep on the annulus (constraint tol 1e-4, wall time for one cold
solve; `tot_lin` = total outer Schur-KSP linear iterations across the loop):

cellSize 0.1 (~2940 dof):

| `r` | outer its | tot_lin | wall (s) | relL2 vs penalty |
|---:|---:|---:|---:|---:|
| 10 | 26 | 26 | 3.77 | 3.1e-3 |
| 100 | 4 | 4 | 0.61 | 3.2e-3 |
| 300 | 3 | 3 | 0.47 | 3.1e-3 |
| 1,000 | 3 | 3 | 0.53 | 3.1e-3 |
| 3,000 | 2 | 2 | **0.38** | 3.1e-3 |
| 10,000 | 2 | 2 | 0.51 | 2.9e-3 |
| 100,000 | 2 | 5 | 1.83 | 1.6e-3 |

cellSize 0.05 (~10852 dof) shows the same shape (min wall ≈ 1.25 s at r=1e3;
6.98 s at r=10; 7.07 s at r=1e5).

- **Outer iterations fall with `r`** (`26 → 4 → 3 → 2`) — bigger penalty, faster
dual convergence (`contraction ≈ ‖S_λ‖/(r+‖S_λ‖)`).
- **But the inner solve stiffens at large `r`.** Linear iterations per outer
solve stay at 1.0 up to `r=10⁴`, then rise (2.5 at `r=10⁵`), and wall time
balloons (the velocity sub-block conditioning degrades — visible in wall time
even before the outer KSP count moves).
- **True work is U-shaped**: both extremes are 3–8× slower than the optimum. The
efficient basin is `r ∈ [300, 10⁴]` (>1.5 decades) at both resolutions; the
default `r = 10³·μ` sits inside it.
- **Accuracy is `r`-independent** (relL2 ≈ 3.1e-3, flat across four decades of
`r`). So `r` is tuned for *speed* with a benign failure mode — too small just
costs iterations, too large just costs inner work; **the answer is never
wrong**. Contrast a pure penalty, where the magnitude must be tuned against
forcing strength and viscosity to get *accuracy* (too small ⇒ wrong), which is
the fragility this method removes.

## Option trade-offs and what is deferred

| Option | Verdict |
|---|---|
| (A) full-domain 3rd FE field + ε-screening | 3-way nested fieldsplit; ε re-introduces tuning. Rejected as primary. |
| (B1) boundary-stratum-only PetscFE field | No DMPlex support on the same DM. |
| (B2) co-dim-1 submesh + MATNEST | The honest monolithic form; deferred. |
| (C) reuse pressure / `_constraints` | `p` enforces `∇·u=0` interior, not `u·n=0` on Γ — not redundant. The CBF identity is a *validation* tool, not an implementation. |
| **(D) augmented-Lagrangian outer loop** | **Implemented.** Non-invasive, 2 iterations, exact, recoverable topography. |

Deferred to follow-up PRs (Phase-0 spike S2 gathers the fieldsplit-feasibility
evidence): the monolithic 3-field / co-dim-1 representation; 3D spherical shells;
**parallel** (the boundary mask and the nodal update are serial); the
**both-boundaries-free-slip** annulus case (admits a rigid-rotation velocity null
space needing explicit removal); and live free-surface equilibrium integration
(pass `λ` as the target normal-stress end-state).

## Monolithic `P'=[p,h]` fieldsplit — feasibility spike

A future direction (and a general "inject arbitrary constraints into the saddle
point" capability): rather than a third field forcing a nested 3-way Schur, group
pressure and the multiplier into a composite `P' = [p, h]` and keep a **2-way
`u | P'`** split.

**Spike result (confirmed):** a 3-field DM `(u, p, h)` on a real mesh, with the `p`
and `h` index sets grouped (`pc_fieldsplit_1_fields 1,2`, or an explicit
concatenated IS), produces exactly a 2-block `u | [p,h]` Schur fieldsplit
(block sizes 84 | 84 = u | (p+h) on a coarse test). The nested-Schur /
KSP-reconfiguration concern is therefore moot — the split structure is identical to
the current `u | p` solver. (`/tmp/spike_pph.py`.)

**Remaining work (bounded, ~2 weeks of Cython, behind a subclass):**
- Register `h` as field 2 (one `dm.setField`).
- `h`-equation residual: boundary part `∫_Γ ψ(n·u − g)` (the existing Nitsche path
already registers field-1 boundary residuals — same pattern) plus a small interior
screening `ε∫_Ω h ψ` to de-singularise the interior `h` block.
- Three new Jacobian blocks: `uh`, `hu` (boundary integrals — the `UW_PetscDSSetBdJacobian`
machinery already does arbitrary field pairs for `up`/`pu`) and `hh` (interior mass);
`ph`/`hp` are zero.
- Group `[p,h]` in the fieldsplit and extend the Schur PC (`p` keeps its `1/μ` mass PC;
`h` gets the screening diagonal).
- The 3-IS field decomposition / nullspace path (currently assumes 2 fields).

**Caveats:** the interior screening reintroduces a small `ε` (benign — it does not bias
the boundary multiplier); the work touches the validated `uu/up/pu/pp` assembly, so it
must live behind a subclass and be regression-tested. The one genuine PETSc limitation
remains a *co-dimension-1* `h` (boundary-only DOFs) — avoided here by the full-domain +
screening representation.

**Recommendation:** the architecture is de-risked, but since the outer loop converges in
2–3 iterations with no user-facing tuning, monolithic is scheduled work (a general
saddle-point-constraint capability), not an urgent replacement.

## Files

- `src/underworld3/systems/solvers.py` — `SNES_Stokes_Constrained`, `_ConstraintBC`.
- `src/underworld3/systems/__init__.py` — `Stokes_Constrained` export.
- `tests/test_1061_constrained_freeslip_annulus.py` — validation.

Pre-existing (unrelated) failures noted: `tests/test_1060_nitsche_freeslip.py`
has two failing assertions on `development` independent of this work.
1 change: 1 addition & 0 deletions src/underworld3/systems/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,7 @@
from .solvers import SNES_Poisson as Poisson
from .solvers import SNES_Darcy as SteadyStateDarcy
from .solvers import SNES_Stokes as Stokes
from .solvers import SNES_Stokes_Constrained as Stokes_Constrained
from .solvers import SNES_VE_Stokes as VE_Stokes
from .solvers import SNES_Projection as Projection
from .solvers import SNES_Vector_Projection as Vector_Projection
Expand Down
Loading
Loading