Bench harness + obs_mobility unit fix #65

Open
jspaezp wants to merge 41 commits into main from infra/benchmark_fixture

@jspaezp (Collaborator) commented May 18, 2026

Summary

  • Fixture-driven bench harness under bench/: stage_fixture, push_fixture, wandb_bench, entrapment analyser (FDRBench Method 1/1B), peptide-list pipeline, plus tests.
  • Fix obs_mobility unit bug in timsseek scoring: percent error was being added to absolute 1/k0, biasing observed mobility ~50% high. Now converts pct -> abs via ref_mobility/100 before combining. Also splits NaN gating in weighted_ms1, names the magic .take(3) constant, renames a confusing variable shadow.
  • Refit supersimpleprediction on hela_iccoff_gt20peps after the unit fix: holdout MAPE 1.36% (was 1.83% in the prior docstring). New tool at scripts/refit_mobility.py to redo this on any results.parquet.
  • Bench staging path bugs: stage_fixture emitted relative paths into the staged TOML which then failed schema validation; target_peptides was unconditionally staged though the schema made it optional. wandb_bench s3_download_file calls now go through a _materialize_uri helper so absolute local paths pass through.

Test plan

  • cargo check --workspace clean
  • cargo test -p timsseek --doc supersimpleprediction passes (new expected value 1.144405)
  • bench stage hela_iccoff_gt20peps -> wandb_bench --fixtures-dir bench_out/staged runs end to end
  • bench stage hela_smoke_shuffled (entrapment fixture, shuffled mode w/ pairing) -> wandb_bench runs end to end and logs entrap/* scalars
  • obs_mobility distribution post-fix: mean ~0.92, range [0.71, 1.24], residual obs-ref bias dropped from ~+0.5 to ~+0.03 (real instrument offset, not a unit bug)
  • (followup, separate) chase the residual +0.03 obs-ref bias - likely asymmetric ridge picking or MS1<->MS2 mobility offset
  • (followup, separate) q05 entrapment combined FDR 0.109 vs nominal 0.05 on smoke fixture - investigate

jspaezp added 30 commits May 8, 2026 20:14
Adds resolve_dbs() that turns a list of --db spec strings into one
concatenated FASTA on disk (local, S3, UniProt proteome/accession).
Individual accession specs are coalesced into a single batched HTTP call.

Adds bench/_s3.py wrapping aws s3 cp for download/upload/upload-dir.

ruff reformatted 6 bench/ files (magic trailing comma, line-length normalisation).
jspaezp added 11 commits May 9, 2026 10:01
…ent metadata

Replace fasta/entrapment_fasta with target_peptides/entrapment_peptides plus
entrapment_ratio, entrapment_mode (foreign|shuffled), and pairing fields.
Adds cross-field consistency validation and has_pairing() helper on Fixture.
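A rough sketch of what cross-field validation on such a fixture could look like. The field names come from the commit message above; the specific rules, the `validation_errors` method, and the example URIs are assumptions made for illustration, not the rules implemented in bench/.

```python
from dataclasses import dataclass
from typing import List, Optional

ENTRAPMENT_MODES = ("foreign", "shuffled")


@dataclass
class Fixture:
    """Hypothetical stand-in for the bench Fixture; only entrapment fields shown."""

    name: str
    target_peptides: Optional[str] = None
    entrapment_peptides: Optional[str] = None
    entrapment_ratio: Optional[float] = None
    entrapment_mode: Optional[str] = None
    pairing: Optional[str] = None

    def has_pairing(self) -> bool:
        return self.pairing is not None

    def validation_errors(self) -> List[str]:
        """Return a list of cross-field problems (assumed rules, see lead-in)."""
        errors: List[str] = []
        if self.entrapment_peptides is None:
            # Assumed rule: entrapment metadata is meaningless without peptides.
            for field_name in ("entrapment_ratio", "entrapment_mode", "pairing"):
                if getattr(self, field_name) is not None:
                    errors.append(f"{field_name} set without entrapment_peptides")
            return errors
        if self.entrapment_mode not in ENTRAPMENT_MODES:
            errors.append(f"entrapment_mode must be one of {ENTRAPMENT_MODES}")
        if self.entrapment_ratio is None or self.entrapment_ratio <= 0:
            errors.append("entrapment_ratio must be a positive number")
        return errors


ok = Fixture(
    name="example",
    entrapment_peptides="s3://example-bucket/entrap.txt",  # hypothetical URI
    entrapment_ratio=1.0,
    entrapment_mode="shuffled",
    pairing="s3://example-bucket/pairs.tsv",  # hypothetical URI
)
print(ok.validation_errors(), ok.has_pairing())
```

Returning a list of messages rather than raising on the first problem makes it easier to report every inconsistency in a fixture TOML at once.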
obs_mobility was computed by adding the percent-of-ref mobility error
back to ref_mobility, giving values biased ~50% high. avg_delta_mobs now
converts pct -> absolute via ref_mobility/100 before feeding the
collector, so obs_mob = ref + abs_delta is dimensionally correct.
delta_ms1_ms2_mobility likewise switched to absolute 1/k0 units.
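The unit mismatch described above can be illustrated in a few lines. This is a Python sketch of the arithmetic only (the actual fix lives in the Rust scoring code); the function names and the numeric values are hypothetical.

```python
def pct_to_abs(delta_pct: float, ref_mobility: float) -> float:
    """Convert a percent-of-reference mobility error to absolute 1/k0 units."""
    return delta_pct * ref_mobility / 100.0


def observed_mobility(ref_mobility: float, delta_pct: float) -> float:
    """obs_mob = ref + abs_delta, with both terms in 1/k0 units."""
    return ref_mobility + pct_to_abs(delta_pct, ref_mobility)


ref = 0.90       # hypothetical reference 1/k0
delta_pct = 0.5  # hypothetical error: +0.5 percent of ref

buggy = ref + delta_pct                    # mixes percent with 1/k0 units
fixed = observed_mobility(ref, delta_pct)  # both terms in 1/k0 units

print(f"buggy={buggy:.4f} fixed={fixed:.4f}")
```

With the pre-fix arithmetic, any percent error of order 0.5 shifts the result by about 0.5 in 1/k0 units, which is far larger than the true offset.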

Side cleanups in offsets.rs:
- weighted_ms1 NaN gate was checking mz_error_ppm only, poisoning the
  mobility accumulator (or dropping valid mobility) on mismatched NaNs.
  Now gates each dimension independently.
- magic .take(3) on FRAGMENT_TOP_N=7 named as FRAGMENT_OBS_MOB_TOP_N.
- renamed shadow mz/mob in with_sorted_offsets to ms1_mob/ms2_mob.
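The independent NaN gating can be sketched as follows. This is a simplified Python illustration, not a port of weighted_ms1 in offsets.rs; the row layout and the function name are assumptions.

```python
import math


def weighted_means(rows):
    """Weighted means of m/z error and mobility error, gated per dimension.

    rows: iterable of (weight, mz_error_ppm, mobility_error) tuples.
    A NaN in one dimension does not drop or poison the other.
    """
    mz_sum = mz_weight = 0.0
    mob_sum = mob_weight = 0.0
    for weight, mz_err, mob_err in rows:
        if not math.isnan(mz_err):
            mz_sum += weight * mz_err
            mz_weight += weight
        if not math.isnan(mob_err):
            mob_sum += weight * mob_err
            mob_weight += weight
    mz_mean = mz_sum / mz_weight if mz_weight > 0 else math.nan
    mob_mean = mob_sum / mob_weight if mob_weight > 0 else math.nan
    return mz_mean, mob_mean


rows = [
    (1.0, 2.0, math.nan),   # valid m/z error, missing mobility
    (1.0, math.nan, 0.01),  # missing m/z error, valid mobility
    (2.0, 4.0, 0.02),       # both valid
]
mz_mean, mob_mean = weighted_means(rows)
print(mz_mean, mob_mean)
```

Gating on mz_error_ppm alone would either discard the second row's valid mobility or, for the first row, feed a NaN into the mobility accumulator.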

supersimpleprediction refit on hela_iccoff_gt20peps (34k IDs, holdout
MAPE 1.36% vs prior 1.83% claim) using scripts/refit_mobility.py.
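For reference, MAPE here is assumed to be the usual mean absolute percentage error over the holdout set. The sketch below shows that standard definition; it is not taken from scripts/refit_mobility.py, and the sample values are made up.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, in percent, relative to observed values."""
    observed = list(observed)
    predicted = list(predicted)
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal length")
    total = sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


# Made-up holdout values in 1/k0 units, for illustration only.
holdout_observed = [1.00, 0.80]
holdout_predicted = [0.99, 0.82]
print(f"holdout MAPE: {mape(holdout_observed, holdout_predicted):.2f}%")
```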
stage_fixture emitted relative paths (bench_out/cache/...) into the
staged TOML, which then failed the schema validator that requires
s3:// or absolute local paths. Now resolves to absolute via dst.resolve()
on both _stage_one_file and _stage_one_dir. Also guards
target_peptides for the gt20peps fixture (target_peptides is optional
since e6339f4 but stage still unconditionally tried to download it).
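A small sketch of why the relative path failed validation and how resolving fixes it. The `is_valid_uri` check is a hypothetical stand-in for the schema validator, and the file name is made up; only `Path.resolve()` and `Path.is_absolute()` are real pathlib calls.

```python
from pathlib import Path


def is_valid_uri(value: str) -> bool:
    """Hypothetical validator: accept s3:// URIs or absolute local paths."""
    return value.startswith("s3://") or Path(value).is_absolute()


relative = Path("bench_out/cache/example.fasta")  # hypothetical file name
resolved = relative.resolve()  # absolute, anchored at the current working directory

print(is_valid_uri(str(relative)))  # relative path is rejected
print(is_valid_uri(str(resolved)))  # resolved path is accepted
```

Note that `resolve()` anchors at the working directory at call time, so the staged TOML is only portable across machines if the staged files live at the same absolute location.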

wandb_bench grew a _materialize_uri helper used in the entrapment block
so target_peptides / entrapment_peptides / pairing pass through when
already local instead of erroring out of s3_download_file.
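The pass-through behaviour could look roughly like this. The signature, the injected `download` callable, and the example URIs are assumptions for illustration; the real _materialize_uri in wandb_bench may differ.

```python
from pathlib import Path
from typing import Callable


def materialize_uri(
    uri: str,
    dest_dir: Path,
    download: Callable[[str, Path], None],
) -> Path:
    """Return a local path for uri, downloading only when it is an s3:// URI."""
    if uri.startswith("s3://"):
        dest = dest_dir / uri.rsplit("/", 1)[-1]
        download(uri, dest)  # e.g. a wrapper around `aws s3 cp`
        return dest
    path = Path(uri)
    if not path.is_absolute():
        raise ValueError(f"expected s3:// URI or absolute path, got {uri!r}")
    return path  # already local: pass through untouched


calls = []


def fake_download(uri: str, dest: Path) -> None:
    """Stand-in downloader that records calls instead of touching S3."""
    calls.append((uri, dest))


work_dir = Path("work").resolve()  # hypothetical scratch directory
remote = materialize_uri("s3://example-bucket/fixtures/peps.txt", work_dir, fake_download)
local_input = str(Path("already_local.txt").resolve())
local = materialize_uri(local_input, work_dir, fake_download)
print(remote, local, len(calls))
```

Injecting the downloader keeps the helper testable without network access.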

scripts/stage_gt20.sh: one-shot wrapper for the gt20peps fixture.