(corrected) Add MedNeXt-L + SkeletonRecall trainer: addresses #191 merger failure mode, validated on 7-cube re-score with official Kaggle topology metric #975
This re-pitch is much easier to evaluate than the earlier PR; the Kaggle-metric framing and the downstream ink AUC sanity check are the right kind of evidence for #191. I found four small integration/doc things that would be worth tightening before merge:
I locally checked that a small patch for those first two integration risks is applicable to this PR head and that focused stubbed tests pass without installing MedNeXt. A full import/build/forward smoke with |
- Lazy-import mednextv1 inside build_network_architecture so nnU-Net trainer discovery doesn't break when the optional extra is absent; raise an actionable ImportError pointing at `pip install -e ".[mednext]"`.
- Override set_deep_supervision_enabled to toggle MedNeXt's `do_ds` instead of the default `decoder.deep_supervision` (MedNeXt has no decoder attribute).
- Point the `mednext` extra at the upstream git source; mednextv1 isn't on PyPI.
- Update README + docstring to the corrected 7-cube official Kaggle metric (0.3996 -> 0.4397, +10.0%) and drop the superseded 5-cube high_compressed IoU.
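The lazy-import item can be illustrated with a minimal sketch. The helper name `load_optional_backbone` is hypothetical and not part of this PR; the default module name `nnunet_mednext` is taken from the follow-up comment in this thread, and the error text only mirrors the install hint quoted in the commit message.

```python
import importlib


def load_optional_backbone(module_name: str = "nnunet_mednext"):
    """Import an optional dependency at call time instead of module import time.

    Hypothetical helper: calling it from inside build_network_architecture
    would keep trainer discovery working when the extra is not installed.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with an actionable install hint, keeping the original cause.
        raise ImportError(
            f"{module_name!r} is required for the MedNeXt trainer. "
            'Install it with: pip install -e ".[mednext]"'
        ) from exc
```

Because the import only runs when the helper is called, importing the trainer module itself has no dependency on the optional package.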
4e5705a to 3c336f2
Adds nnUNetTrainerSkeletonRecall_MedNeXtL_kernel5 (plus a kernel3 fallback in the same file) that extends the existing nnUNetTrainerSkeletonRecall and only overrides build_network_architecture to swap the default ResEnc U-Net for MedNeXt-L. Loss, transforms, data loaders, and train/validation steps are inherited unchanged.

Motivation: issue ScrollPrize#191 ("Surface and Fiber Predictions in Compressed or Highly Curved areas"). On a held-out 5-cube benchmark over PHerc Paris 1 (S1) and PHerc 1667 (S4), this trainer scores +0.267 absolute high_compressed IoU (+66% relative) over the d058 ResEnc-L production model; overall_macro IoU 0.534 -> 0.685.

MedNeXt is consumed as the upstream pip package mednextv1 (https://github.com/MIC-DKFZ/MedNeXt) and added as an optional dependency so users not using this trainer see no change in install footprint: `pip install -e ".[mednext]"`

README updated with a short section pointing to the issue, the benchmark comment, the HF weights, and the kernel3 fallback.
3c336f2 to dfcaa15
All four are addressed: the nnunet_mednext import is now lazy inside build_network_architecture, there's a do_ds deep-supervision override, the [mednext] extra points at the MIC-DKFZ git source, and the docs lead with the corrected 7-cube Kaggle metric. The |
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dfcaa15286
…or inference nnUNetv2_predict rebuilds the network with enable_deep_supervision=False and strict-loads a checkpoint trained with deep supervision. MedNeXt only registers its out_1..out_4 heads when constructed with deep_supervision=True, so the inference build was missing those weights and load_state_dict failed on unexpected out_* keys. Construct the heads unconditionally in a shared _build_mednext helper and use do_ds to select single- vs deep-supervision output, so both the kernel5 and kernel3 checkpoints load for inference.
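The failure described above comes down to a key mismatch between the checkpoint and the rebuilt network. A toy sketch, using plain lists of parameter names rather than real modules: only the `out_1`..`out_4` names come from the commit message; `backbone.weight` and both function names are placeholders invented for illustration.

```python
def strict_load_errors(model_keys, checkpoint_keys):
    """Mimic the key check of a strict state-dict load.

    Returns (missing, unexpected); a strict load only succeeds if both are empty.
    """
    missing = sorted(set(model_keys) - set(checkpoint_keys))
    unexpected = sorted(set(checkpoint_keys) - set(model_keys))
    return missing, unexpected


def toy_keys(always_build_heads: bool, deep_supervision: bool):
    """Parameter names of a toy network with four auxiliary output heads."""
    keys = ["backbone.weight"]  # placeholder name, not a real MedNeXt key
    if always_build_heads or deep_supervision:
        keys += [f"out_{i}.weight" for i in range(1, 5)]
    return keys


# Checkpoint written during training, with deep supervision enabled.
checkpoint = toy_keys(always_build_heads=False, deep_supervision=True)

# Old behaviour: the inference build omits the heads, so the keys are unexpected.
before = strict_load_errors(toy_keys(False, False), checkpoint)

# Fixed behaviour: heads always exist; a flag only selects the output mode.
after = strict_load_errors(toy_keys(True, False), checkpoint)
```

In this toy model `before` reports the four `out_*` names as unexpected, while `after` is empty on both sides, which is the condition a strict load needs.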
Adds a MedNeXt-L + SkeletonRecall surface trainer for the compressed and highly curved regions where the ResEnc-L surface model loses recall (#191). This re-pitches #925 with the evaluation corrected: scored on the official Kaggle Vesuvius Surface Detection metric (0.30 TopoScore + 0.35 SurfaceDice@τ + 0.35 VOI) rather than binary IoU, on all 7 held-out S1 cubes at threshold 0.5.
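For reference, the weighting quoted above can be written out as a small sketch. It assumes each component has already been computed and normalised so that higher is better; the function names are illustrative, and this is not the official scoring code.

```python
def composite_score(topo_score: float, surface_dice: float, voi_score: float) -> float:
    """Weighted sum with the 0.30 / 0.35 / 0.35 weights quoted in the description.

    Assumes all three inputs are already normalised, higher-is-better scores.
    """
    return 0.30 * topo_score + 0.35 * surface_dice + 0.35 * voi_score


def relative_gain(baseline: float, candidate: float) -> float:
    """Relative improvement of candidate over baseline, as a fraction."""
    return (candidate - baseline) / baseline


# Arithmetic check of the 0.3996 -> 0.4397 figures quoted in the commit message.
gain_percent = round(relative_gain(0.3996, 0.4397) * 100, 1)  # 10.0
```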
MedNeXt-L (MN) wins on all 7 cubes. d058 leads only on TopoScore, because MN's smoother predictions add spurious handles and cavities relative to the binary GT, but the SurfaceDice and VOI gains outweigh that. Following ensemble-not-swap, the recommended deployment is voxel-wise max(d058, MN), which scores 0.4437 and beat the mean and every weighted blend in a 6-op fusion sweep. Per-cube numbers, the fusion sweep, and the TopoScore sub-investigation are in the supporting repo.
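The voxel-wise max fusion is simple enough to sketch. This version works on flattened probability lists as stand-ins for volumes (with NumPy arrays the same step would be an element-wise maximum); the function names are illustrative, and treating the 0.5 threshold as inclusive is an assumption.

```python
def fuse_max(prob_a, prob_b):
    """Voxel-wise maximum of two probability maps given as flat sequences."""
    if len(prob_a) != len(prob_b):
        raise ValueError("probability maps must have the same number of voxels")
    return [max(a, b) for a, b in zip(prob_a, prob_b)]


def binarize(probs, threshold=0.5):
    """Threshold probabilities into a binary mask (inclusive threshold assumed)."""
    return [1 if p >= threshold else 0 for p in probs]


# Made-up probabilities for four voxels; not real model outputs.
d058_probs = [0.9, 0.2, 0.4, 0.1]
mn_probs = [0.3, 0.7, 0.45, 0.05]

fused = fuse_max(d058_probs, mn_probs)  # [0.9, 0.7, 0.45, 0.1]
mask = binarize(fused)                  # [1, 1, 0, 0]
```

Taking the maximum keeps a voxel whenever either model is confident, so the fused mask is a superset of each model's thresholded mask.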
Downstream check: feeding each surface into the v4 ink detector on the same crops, MN matches the hand-segmented bruniss surface within 3-5 AUC, while d058's argmax heightmap extraction gives sub-random ink.
The trainer subclasses nnUNetTrainerSkeletonRecall and only swaps in MedNeXt-L (kernel5, with a kernel3 VRAM fallback) from the optional `[mednext]` extra, lazy-imported so nnU-Net trainer discovery does not require it. It builds the deep-supervision heads unconditionally and toggles `do_ds` for the output mode, so a trained checkpoint strict-loads under `nnUNetv2_predict`. Pretrained weights are on HF (`kernel5_skelrec_dataset059_ep33`).