(corrected) Add MedNeXt-L + SkeletonRecall trainer: addresses #191 merger failure mode, validated on 7-cube re-score with official Kaggle topology metric #975

Open: ciscoriordan wants to merge 2 commits into ScrollPrize:main from ciscoriordan:mednext-l-skeletonrecall-trainer

ciscoriordan commented May 24, 2026

Adds a MedNeXt-L + SkeletonRecall surface trainer for the compressed and highly curved regions where the ResEnc-L surface model loses recall (#191). This re-pitches #925 with the evaluation corrected: scored on the official Kaggle Vesuvius Surface Detection metric (0.30 TopoScore + 0.35 SurfaceDice@τ + 0.35 VOI) rather than binary IoU, on all 7 held-out S1 cubes at threshold 0.5.

| Mean over 7 S1 cubes | d058 | MedNeXt-L SkelRec |
| --- | --- | --- |
| Kaggle score | 0.3996 | 0.4397 |
| SurfaceDice@τ=2 | 0.508 | 0.580 |
| VOI | 0.567 | 0.667 |
| TopoScore | 0.084 | 0.022 |
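
The weighting quoted in the description (0.30 TopoScore + 0.35 SurfaceDice@τ + 0.35 VOI) can be written out as a minimal sketch. The function names `kaggle_score` and `mean_score` are hypothetical, the inputs in the usage note are illustrative, and the official metric implementation may compute or normalize each component differently, so this is not expected to reproduce the reported means exactly.

```python
from typing import Iterable, Tuple

# Weights as stated in the PR description.
W_TOPO, W_SDICE, W_VOI = 0.30, 0.35, 0.35


def kaggle_score(topo_score: float, surface_dice: float, voi: float) -> float:
    """Combine the three per-cube components into one score (sketch only)."""
    return W_TOPO * topo_score + W_SDICE * surface_dice + W_VOI * voi


def mean_score(per_cube: Iterable[Tuple[float, float, float]]) -> float:
    """Average the combined score over cubes given (topo, sdice, voi) triples."""
    scores = [kaggle_score(*components) for components in per_cube]
    if not scores:
        raise ValueError("need at least one cube")
    return sum(scores) / len(scores)
```

For example, `mean_score([(0.1, 0.5, 0.6), (0.0, 0.6, 0.7)])` averages the two per-cube scores. Because the combination is linear, averaging per-cube scores and combining per-component means give the same value.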

Per-cube topology-aware metrics across the 7 held-out S1 cubes (merger_rate, SurfaceDice@τ, pixel_dice), d058 vs MedNeXt-L SkelRec

MedNeXt-L (MN) wins on all 7 cubes. d058 leads only on TopoScore, because MN's smoother predictions add spurious handles and cavities relative to the binary GT, but the SurfaceDice and VOI gains outweigh that. Following ensemble-not-swap, the recommended deployment is the voxel-wise max(d058, MN), which scores 0.4437 and beat the mean and every weighted blend in a 6-op fusion sweep. Per-cube numbers, the fusion sweep, and the TopoScore sub-investigation are in the supporting repo.
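
A minimal sketch of the voxel-wise max fusion, assuming both models emit per-voxel foreground probabilities on the same grid and that the fused volume is binarized at the same 0.5 threshold used for scoring. The function name `fuse_max` is hypothetical and this is not the code from the supporting repo.

```python
import numpy as np


def fuse_max(prob_a: np.ndarray, prob_b: np.ndarray, threshold: float = 0.5):
    """Return (fused probabilities, binary mask) for two same-shape volumes."""
    if prob_a.shape != prob_b.shape:
        raise ValueError(f"shape mismatch: {prob_a.shape} vs {prob_b.shape}")
    # Element-wise maximum keeps a voxel if either model is confident in it.
    fused = np.maximum(prob_a, prob_b)
    return fused, fused >= threshold
```

Taking the maximum favors recall: a voxel survives if either model predicts surface there, which fits the ensemble-not-swap framing but can also carry over false positives from either input.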

Downstream check: feeding each surface into the v4 ink detector on the same crops, MN matches the hand-segmented bruniss surface within 3-5 AUC, while d058's argmax heightmap extraction gives sub-random ink.

Downstream: v4 ink predictions on three crops with three surface choices (hand-segmented / d058 / MedNeXt-L) vs ink GT

The trainer subclasses nnUNetTrainerSkeletonRecall and only swaps in MedNeXt-L (kernel5, with a kernel3 VRAM fallback) from the optional [mednext] extra, lazy-imported so nnU-Net trainer discovery does not require it. It builds the deep-supervision heads unconditionally and toggles do_ds for the output mode, so a trained checkpoint strict-loads under nnUNetv2_predict. Pretrained weights are on HF (kernel5_skelrec_dataset059_ep33).

maxliebscher commented May 25, 2026

This re-pitch is much easier to evaluate than the earlier PR; the Kaggle-metric framing and the downstream ink AUC sanity check are the right kind of evidence for #191.

I found four small integration/doc things that would be worth tightening before merge:

  1. The new trainer imports nnunet_mednext at module import time, while [mednext] is optional. nnU-Net trainer discovery/import paths can touch trainer modules even when a user is not trying to build this MedNeXt trainer, so this would be safer as a lazy import inside build_network_architecture() with a clear pip install -e ".[mednext]" error if the extra is missing.

  2. This class inherits the default set_deep_supervision_enabled(), which toggles mod.decoder.deep_supervision. MedNeXt uses do_ds, so validation/training toggles may not switch the output mode correctly unless this trainer overrides the toggle for MedNeXt modules.

  3. The new optional extra is declared as mednext = ["mednextv1"]. Upstream MIC-DKFZ/MedNeXt currently documents installation via cloning the repository and running pip install -e ., so could you confirm this extra resolves in the intended install environment, or switch the extra to the correct package/source spec?

  4. Could you update the newly added README section and trainer module docstring to match the corrected 7-cube official Kaggle metric in this PR body, or trim the metric details from repo docs and point readers to the supporting writeup? Right now those landed-doc additions still describe the older 5-cube high_compressed IoU / overall_macro IoU benchmark, which looks like superseded #925-era evidence (Add MedNeXt-L + SkeletonRecall trainer for compressed surface regions).
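
The lazy-import pattern requested in item 1 could look roughly like the sketch below. `load_optional` and `LazyTrainerSketch` are hypothetical names used only for illustration; the real trainer subclasses nnUNetTrainerSkeletonRecall and its `build_network_architecture` takes different arguments.

```python
import importlib


def load_optional(module_name: str, extra: str):
    """Import an optional dependency, or fail with an actionable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this trainer but is not installed. "
            f'Install the optional extra with: pip install -e ".[{extra}]"'
        ) from exc


class LazyTrainerSketch:
    """Stand-in for the trainer class; importing this module needs no extra."""

    @staticmethod
    def build_network_architecture(module_name: str = "nnunet_mednext"):
        # The import happens only when a network is actually built, so
        # trainer discovery can import this module without the extra.
        return load_optional(module_name, extra="mednext")
```

Deferring the import to build time means a missing extra surfaces only for users who select this trainer, and the error tells them which command fixes it.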

I locally checked that a small patch for those first two integration risks is applicable to this PR head and that focused stubbed tests pass without installing MedNeXt. A full import/build/forward smoke with pip install -e ".[mednext]" would still be useful before merge because this PR depends on an external architecture package and bypasses the default nnU-Net architecture builder.
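
A stubbed check of the deep-supervision toggle from item 2 might be sketched as follows, without installing MedNeXt. `StubMedNeXt`, `StubCompiled`, and `unwrap` are hypothetical stand-ins; the only detail taken from this thread is that MedNeXt switches output mode through `do_ds` and has no `decoder` attribute.

```python
class StubMedNeXt:
    """Stand-in network: exposes do_ds and deliberately has no decoder."""

    def __init__(self):
        self.do_ds = True

    def forward(self, x: int):
        # Pretend outputs at full, half, and quarter resolution.
        outputs = [x, x // 2, x // 4]
        return outputs if self.do_ds else outputs[0]


class StubCompiled:
    """Mimics a wrapper that keeps the real network in `_orig_mod`."""

    def __init__(self, net):
        self._orig_mod = net


def unwrap(network):
    """Peel off an assumed DDP-style `.module`, then a compiled `._orig_mod`."""
    for attr in ("module", "_orig_mod"):
        if hasattr(network, attr):
            network = getattr(network, attr)
    return network


def set_deep_supervision_enabled(network, enabled: bool) -> None:
    """Toggle the MedNeXt-style flag instead of decoder.deep_supervision."""
    net = unwrap(network)
    if not hasattr(net, "do_ds"):
        raise AttributeError("expected a network exposing a `do_ds` flag")
    net.do_ds = enabled
```

With the flag off, `forward` returns a single output; with it on, it returns the list of multi-resolution outputs that the deep-supervision loss would consume.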

ciscoriordan added a commit to ciscoriordan/villa that referenced this pull request May 25, 2026
- Lazy-import mednextv1 inside build_network_architecture so nnU-Net trainer
  discovery doesn't break when the optional extra is absent; raise an
  actionable ImportError pointing at `pip install -e ".[mednext]"`.
- Override set_deep_supervision_enabled to toggle MedNeXt's `do_ds` instead of
  the default `decoder.deep_supervision` (MedNeXt has no decoder attribute).
- Point the `mednext` extra at the upstream git source; mednextv1 isn't on PyPI.
- Update README + docstring to the corrected 7-cube official Kaggle metric
  (0.3996 -> 0.4397, +10.0%) and drop the superseded 5-cube high_compressed IoU.

ciscoriordan force-pushed the mednext-l-skeletonrecall-trainer branch from 4e5705a to 3c336f2 May 30, 2026 00:13
ciscoriordan added a commit to ciscoriordan/villa that referenced this pull request May 30, 2026 (commit message identical to the May 25 reference above)

Adds nnUNetTrainerSkeletonRecall_MedNeXtL_kernel5 (plus a kernel3 fallback
in the same file) that extends the existing nnUNetTrainerSkeletonRecall and
only overrides build_network_architecture to swap the default ResEnc U-Net
for MedNeXt-L. Loss, transforms, data loaders, and train/validation steps
are inherited unchanged.

Motivation: issue ScrollPrize#191 ("Surface and Fiber Predictions in Compressed or
Highly Curved areas"). On a held-out 5-cube benchmark over PHerc Paris 1
(S1) and PHerc 1667 (S4), this trainer scores +0.267 absolute
high_compressed IoU (+66% relative) over the d058 ResEnc-L production
model; overall_macro IoU 0.534 -> 0.685.

MedNeXt is consumed as the upstream pip package mednextv1
(https://github.com/MIC-DKFZ/MedNeXt) and added as an optional dependency
so users not using this trainer see no change in install footprint:

  pip install -e ".[mednext]"

README updated with a short section pointing to the issue, the benchmark
comment, the HF weights, and the kernel3 fallback.
ciscoriordan force-pushed the mednext-l-skeletonrecall-trainer branch from 3c336f2 to dfcaa15 May 30, 2026 01:05
ciscoriordan commented (Author)

All four are addressed: the nnunet_mednext import is now lazy inside build_network_architecture, there's a do_ds deep-supervision override, the [mednext] extra points at the MIC-DKFZ git source, and the docs lead with the corrected 7-cube Kaggle metric. The pip install -e ".[mednext]" install plus a MedNeXt-L kernel5 build/forward both check out, and I rebased onto current main.

giorgioangel commented (Member)

@codex review

chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dfcaa15286

…or inference

nnUNetv2_predict rebuilds the network with enable_deep_supervision=False and
strict-loads a checkpoint trained with deep supervision. MedNeXt only registers
its out_1..out_4 heads when constructed with deep_supervision=True, so the
inference build was missing those weights and load_state_dict failed on
unexpected out_* keys. Construct the heads unconditionally in a shared
_build_mednext helper and use do_ds to select single- vs deep-supervision
output, so both the kernel5 and kernel3 checkpoints load for inference.
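
The failure mode described in this commit message can be illustrated with a framework-free stand-in for the key check that a strict state-dict load performs. Apart from the `out_1`..`out_4` head names taken from the message, every name here (`network_keys`, `strict_load`, `stem.weight`, `out_0.weight`) is hypothetical, and real checkpoints contain many more keys.

```python
def network_keys(build_heads: bool) -> set:
    """Parameter names a toy network registers, with or without DS heads."""
    keys = {"stem.weight", "out_0.weight"}
    if build_heads:
        # Deep-supervision heads named as in the commit message.
        keys |= {f"out_{i}.weight" for i in range(1, 5)}
    return keys


def strict_load(model_keys: set, checkpoint_keys: set) -> None:
    """Raise if the two key sets differ, as a strict load would."""
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    if missing or unexpected:
        raise RuntimeError(f"missing={missing} unexpected={unexpected}")


# A checkpoint trained with deep supervision carries the extra heads.
checkpoint = network_keys(build_heads=True)

# Heads built unconditionally: the inference network matches the checkpoint.
strict_load(network_keys(build_heads=True), checkpoint)
```

Calling `strict_load(network_keys(build_heads=False), checkpoint)` raises because the `out_1`..`out_4` keys are unexpected, which is the situation the fix avoids by always constructing the heads and using `do_ds` only to select the output mode.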