A minimal working prototype for segmentation-free latent spatial program discovery from point-level spatial transcriptomics data.
A .parquet, .csv, or .tsv file with columns:
- x
- y
- optional z
- gene_id
No nuclei masks, cell masks, or prior assignments are required.
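As a rough illustration of the input format, the sketch below parses a delimited text table and checks for the columns listed above. It uses only the standard library `csv` module; the helper name `read_molecule_table` and the example rows are hypothetical and are not taken from the package's own `io` module, which may work quite differently (for example, it presumably also handles .parquet).

```python
import csv
import io

REQUIRED = ("x", "y", "gene_id")  # "z" is optional, per the column list above


def read_molecule_table(handle, delimiter=","):
    """Parse molecule rows from an open text file into a list of dicts.

    Hypothetical helper for illustration; not the package's reader.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    fieldnames = reader.fieldnames or []
    missing = [c for c in REQUIRED if c not in fieldnames]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    has_z = "z" in fieldnames
    rows = []
    for rec in reader:
        row = {"x": float(rec["x"]), "y": float(rec["y"]), "gene_id": rec["gene_id"]}
        if has_z:
            row["z"] = float(rec["z"])
        rows.append(row)
    return rows


# Made-up example rows, for illustration only.
demo_csv = "x,y,gene_id\n0.5,1.0,GeneA\n2.0,3.5,GeneB\n"
molecules = read_molecule_table(io.StringIO(demo_csv))
```

For a .tsv file the same helper would be called with `delimiter="\t"`.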
- reads raw molecule coordinates and gene identities
- builds a spatial k-nearest-neighbor graph over molecules
- learns latent spatial programs using a graph encoder plus probabilistic program-specific:
  - gene distributions
  - spatial Gaussian fields
- outputs soft and hard point-level program assignments
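The steps above can be sketched in miniature as follows. This is a toy under stated assumptions, not the package's model: the graph encoder is omitted entirely, the program parameters are fixed by hand rather than learned, each spatial field is assumed to be an isotropic 2D Gaussian, and the function names and dictionary keys (`knn_edges`, `soft_assign`, `weight`, `mean`, `sigma`, `gene_probs`) are hypothetical.

```python
import math


def knn_edges(points, k):
    """Return directed edges (i, j) from each point to its k nearest neighbors.

    Brute force, O(n^2): fine for a toy, not for large datasets.
    """
    edges = []
    for i, p in enumerate(points):
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        edges.extend((i, j) for _, j in others[:k])
    return edges


def soft_assign(point, gene, programs):
    """Posterior over programs for one molecule, computed in log space."""
    log_scores = []
    for prog in programs:
        d2 = sum((a - b) ** 2 for a, b in zip(point, prog["mean"]))
        s2 = prog["sigma"] ** 2
        log_spatial = -0.5 * d2 / s2 - math.log(2.0 * math.pi * s2)
        log_gene = math.log(prog["gene_probs"].get(gene, 1e-9))
        log_scores.append(math.log(prog["weight"]) + log_spatial + log_gene)
    top = max(log_scores)  # subtract the max to avoid underflow
    exps = [math.exp(s - top) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up molecules and hand-set program parameters, for illustration only.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
genes = ["GeneA", "GeneA", "GeneB", "GeneB"]
programs = [
    {"weight": 0.5, "mean": (0.0, 0.0), "sigma": 1.0,
     "gene_probs": {"GeneA": 0.9, "GeneB": 0.1}},
    {"weight": 0.5, "mean": (5.0, 5.0), "sigma": 1.0,
     "gene_probs": {"GeneA": 0.1, "GeneB": 0.9}},
]

edges = knn_edges(points, k=1)
soft = [soft_assign(p, g, programs) for p, g in zip(points, genes)]
hard = [max(range(len(s)), key=s.__getitem__) for s in soft]
```

Here `edges` is what a graph encoder would consume; in this sketch it is built but not used by `soft_assign`, which scores each molecule independently.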
- spatial_point_program.py: backward-compatible CLI entrypoint
- spatial_point_process/: package modules (io, graph, model, train, toy_data, cli)
- test_toy_run.py: toy end-to-end test
- .github/workflows/ci.yml: CI workflow running toy test
Generate toy data:

    python spatial_point_program.py make-toy --output toy.parquet

Train on toy data:

    python spatial_point_program.py train \
      --input toy.parquet \
      --outdir toy_out \
      --n-programs 3 \
      --epochs 240 \
      --k-neighbors 12

Run the toy test:

    python test_toy_run.py

Output files:
- fit_metadata.json
- diagnostics_summary.json
- point_assignments.csv
- program_gene_summary.csv
- training_history.csv
- training_metrics.png
- program_gene_heatmap.png
- assignment_scatter.png
- spatial_latent_fields.png
- embeddings.npy
- program_gene_probs.npy
- edge_index.npy
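A quick way to sanity-check a finished run is to confirm that the listed files exist. The sketch below assumes all of them land in a single run directory (such as the `toy_out` directory passed to `--outdir`); that layout is an assumption, and `missing_outputs` is a hypothetical helper, not part of the package.

```python
from pathlib import Path

# File names as listed above; that they all share one directory is an assumption.
EXPECTED_OUTPUTS = [
    "fit_metadata.json",
    "diagnostics_summary.json",
    "point_assignments.csv",
    "program_gene_summary.csv",
    "training_history.csv",
    "training_metrics.png",
    "program_gene_heatmap.png",
    "assignment_scatter.png",
    "spatial_latent_fields.png",
    "embeddings.npy",
    "program_gene_probs.npy",
    "edge_index.npy",
]


def missing_outputs(out_dir):
    """Return the expected file names that are not present in `out_dir`."""
    out_dir = Path(out_dir)
    return [name for name in EXPECTED_OUTPUTS if not (out_dir / name).is_file()]
```

Calling `missing_outputs("toy_out")` after training would return an empty list if every file was written there.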
This is a prototype, not a production-scale billion-molecule implementation. It is designed to be easy to inspect and extend.
Natural next steps would be blockwise training, an anchor/superpoint hierarchy, and minibatched, distributed graph processing.
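To make the blockwise idea concrete, one possible starting point is to partition molecules into square spatial tiles and process one tile at a time. The sketch below is only an illustration of that partitioning step; the prototype does not implement it, `assign_blocks` is a hypothetical name, and a real version would likely need overlapping margins so that neighbor edges crossing tile boundaries are not lost.

```python
from collections import defaultdict


def assign_blocks(points, block_size):
    """Group point indices by the square tile of side `block_size` they fall in."""
    blocks = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        key = (int(x // block_size), int(y // block_size))
        blocks[key].append(idx)
    return dict(blocks)


# Made-up coordinates, for illustration only.
points = [(0.2, 0.3), (0.9, 0.1), (1.5, 0.4), (3.2, 2.8)]
blocks = assign_blocks(points, block_size=1.0)
```

Each value in `blocks` is a list of point indices that could be loaded and trained on as one minibatch.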