[8.3.2] ALCF Agent #137

Draft
davramov wants to merge 57 commits into als-computing:main from davramov:alcf_agent
@davramov
Contributor

@davramov davramov commented May 1, 2026

This PR introduces an agentic workflow using Academy Agents that runs reconstruction and segmentation on ALCF.

  • Utilizes the Globus Compute Multi-user Endpoint on ALCF. This is great because it is maintained by ALCF, so we do not need to set up our own compute endpoint. Job requirements (queue, node count, etc.) can be passed in as parameters.
  • Utilizes the alcf-ai inference service:
    • A vision LLM call assesses the quality of a reconstructed slice to determine whether it is good enough to send for segmentation
    • If that check completes, the reconstruction agent hands off to the segmentation agent, which uses the ALCF SAM3 inference service. This is also great because the service is "warmed up", so time is not spent waiting for the model to load.
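
A minimal sketch of the gate-then-handoff flow described above. It is not the code in this PR: `QualityVerdict`, `run_pipeline`, and the three injected callables are hypothetical names for illustration, and it assumes the handoff to segmentation happens only when the vision LLM judges the slice acceptable.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class QualityVerdict:
    """Outcome of the vision-LLM check on one reconstructed slice (hypothetical shape)."""
    acceptable: bool
    reason: str


def run_pipeline(
    scan_path: str,
    reconstruct: Callable[[str], str],
    assess_slice: Callable[[str], QualityVerdict],
    segment: Callable[[str], str],
) -> Optional[str]:
    """Reconstruct, gate on the quality verdict, then hand off to segmentation.

    Returns whatever `segment` returns, or None if the slice was rejected.
    """
    recon_path = reconstruct(scan_path)      # stands in for the reconstruction agent
    verdict = assess_slice(recon_path)       # stands in for the vision LLM call
    if not verdict.acceptable:
        return None                          # rejected: no handoff
    return segment(recon_path)               # stands in for the segmentation agent


# Usage with stand-in callables (no ALCF services are contacted here).
result = run_pipeline(
    "scan.h5",
    reconstruct=lambda path: path + ".recon",
    assess_slice=lambda path: QualityVerdict(True, "sharp edges"),
    segment=lambda path: path + ".seg",
)
```

Passing the three steps in as callables keeps the gating logic testable without touching the compute endpoint or the inference service.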

Video demo:
https://drive.google.com/file/d/15Rv82W5HZhGXT1GRqAEHLCx-C2n-m7Qi/view?usp=drive_link

Notes:

  • still needs to be rebased
  • uses Yadu's skeleton for setting up agents: https://github.com/als-computing/splash_flows/blob/agentic_rework/orchestration/agentic/skeleton.py
  • potential to add additional features...
    • What if the agent reconstructs a single slice, assesses it (e.g., contrast/sharpness) with a vision-enabled LLM, and iterates on the parameters (e.g., turning phase retrieval on or off) to find a good setting before running the full reconstruction?
    • What if the agents read the metadata from the h5 file, such as the abstract, and an LLM helps pick a good prompt for segmentation?
    • For fine-tuning workflows, we could think about the agents capturing the center slice from reconstructions and sending it to the Tiled/segmentation app
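
One way the single-slice parameter search from the first idea could look, as a rough sketch only. Nothing here exists in the PR: `candidate_settings`, `pick_settings`, the `phase_retrieval` key, and the numeric score returned by the scorer are all assumptions made for illustration.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Params = Dict[str, object]


def candidate_settings(grid: Dict[str, List[object]]) -> Iterator[Params]:
    """Yield every combination of the parameter values in `grid`."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


def pick_settings(
    slice_index: int,
    grid: Dict[str, List[object]],
    reconstruct_slice: Callable[[int, Params], object],
    score_slice: Callable[[object], float],
    threshold: float,
) -> Tuple[Optional[Params], float]:
    """Try settings on one slice and return the best-scoring ones.

    Stops early once a score reaches `threshold`, so the full reconstruction
    can start as soon as a good-enough setting is found.
    """
    best_params: Optional[Params] = None
    best_score = float("-inf")
    for params in candidate_settings(grid):
        image = reconstruct_slice(slice_index, params)  # single-slice recon
        score = score_slice(image)                      # e.g. a vision-LLM rating
        if score > best_score:
            best_params, best_score = params, score
        if score >= threshold:
            break
    return best_params, best_score


# Usage with stand-in callables: the "image" is just the params dict, and the
# scorer rewards phase retrieval being on.
best, score = pick_settings(
    slice_index=512,
    grid={"phase_retrieval": [False, True]},
    reconstruct_slice=lambda index, params: params,
    score_slice=lambda image: 0.9 if image["phase_retrieval"] else 0.4,
    threshold=0.8,
)
```

An exhaustive grid is the simplest choice when only a few switches are in play; an LLM-guided search over the parameters would replace `candidate_settings` without changing the rest.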

…t for segmentation on ALCF. Still needs to be configured for GPU and the environment with dependencies
…e for the TomographyController. Turning off TIFF to ZARR on ALCF for the demo
…ant to collapse these into the final version, but for testing purposes I'm leaving both codes
…olaris that scales well to multiple gpu nodes