This repository contains an experimental Codon-backed port of selected CAM5 routines inside the isotope-enabled iCESM1.3/iHESP CAM component.
The native Fortran CAM implementation remains the reference. The Codon
implementation is selected at runtime with a global default switch
(CAM_CODON_IMPL) plus per-entry *_IMPL overrides. The main goal is to move
computational kernels into Codon while preserving bit-for-bit (BFB) CAM output
against a pristine native baseline.
As of the 2026-06-16 validation snapshot, the tracked selector set used for
validation contains 675 runtime *_IMPL switches, and two 6-month
production-style validations on the Derecho HPC system have achieved
overall_numeric_equal=True against matching native baselines.
- CAM5: the Community Atmosphere Model version used by this CESM/iCESM tree.
- Native path: the original CAM Fortran implementation.
- Codon path: the translated implementation compiled from *_codon.py into a shared library.
- Selector: a runtime environment variable, usually named *_IMPL, that chooses native or codon for one entry point.
- Global selector default: CAM_CODON_IMPL, which sets the default implementation for every selector unless a specific *_IMPL override is set.
- BFB: bit-for-bit equality. In this project, BFB means the compare script reports overall_numeric_equal=True.
- PI and MCO: the internal pre-industrial and Miocene validation cases used for long-run testing.
- Preserve BFB CAM output relative to a pristine native Fortran baseline.
- Move computational CAM kernels into Codon while keeping the CESM/CAM calling interface stable.
- Allow routine-by-routine rollout through runtime selectors.
- Allow all-Codon and all-native runs without generating hundreds of selector variables.
- Track progress with routine-level execution evidence rather than touched-file counts.
- Keep the native Fortran path available for comparison, fallback, and numerically fragile expression islands.
Snapshot: 2026-06-16 validation runs, after the UWSHCU positive-moisture expression-order fix.
- Selector coverage: 675 tracked runtime selectors in the validation snapshot.
- Long-run BFB evidence: PI (pre-industrial) and MCO (Miocene) 6-month all-Codon runs both compare with overall_numeric_equal=True against matching pristine native baselines.
- Progress tracking: routine status is maintained in the CAM Codon status dashboard, with separate states for complete, partial, in progress, and not started routines.
- Reference implementation: native Fortran remains authoritative.
The selector count is a snapshot, not a permanent API. It may change as new CAM entry points are added or split.
This port does not replace the CESM case workflow. It adds a runtime-selectable Codon layer beside the native CAM Fortran implementation.
- Original Fortran routines remain in place.
- Selected routines have Fortran wrappers with bind(C) interfaces that call Codon shared libraries.
- Codon implementations live in *_codon.py files and are compiled into lib*_codon.so.
- Runtime selectors choose the implementation:

      CAM_CODON_IMPL=codon
      CAM_CODON_IMPL=native
      EXAMPLE_IMPL=native
      EXAMPLE_IMPL=codon

  CAM_CODON_IMPL sets the default for all runtime selectors. Individual *_IMPL variables still override that default for a single entry point.
- CAM logs print execution proof lines such as implementation = codon or direct = codon.
- The BFB rule is strict: a Codon path is accepted only when the compare script reports overall_numeric_equal=True.
A native island is a minimal Fortran expression or statement block intentionally kept in the native path because translating that specific operation to Codon was shown to break proven BFB behavior.
Important Codon-port files live beside the original CAM source:
- src/physics/cam/*_codon.py: Codon implementations for CAM physics kernels.
- src/chemistry/*/*_codon.py: Codon chemistry and aerosol helpers.
- src/dynamics/se/*_codon.py: Codon spectral-element dynamics helpers.
- src/utils/cam_misc_codon.py: shared CAM utility helpers.
- src/utils/cam_codon_selectors.F90: shared selector lookup helper for the global CAM_CODON_IMPL default plus per-entry *_IMPL overrides.
- cime_config/buildlib: CIME build hook that compiles Codon sources during case.build.
- scripts/cam_codon_guard.py: validation/pre-commit guard for high-risk numerical changes.
- scripts/validation/compare_cesm_runpair.py: repo-local CAM output compare helper used by the examples below.
- scripts/validation/env_allcodon_675.sh: compatibility selector file for the 2026-06-16 validation snapshot. New all-Codon runs can usually use CAM_CODON_IMPL=codon instead.
- .codon_guard.yaml: guard policy, validation metadata, forbidden generated artifacts, and metadata-only compare differences.
- doc/internal_validation.md: internal dashboard, workspace, and validation artifact locations for this project.
This CAM tree is intended to sit inside a full iCESM/CESM checkout:
iCESM1.3.1_fzhu/
  cime/
  components/
    cam/        # this repository
    cice/
    clm/
    pop/
Requirements:
- A working CESM/CIME environment for the target machine.
- A Fortran compiler and MPI stack supported by the case.
- Codon available as codon in PATH, or at ~/.codon/bin/codon.
- Python for helper scripts and compare tooling.
- NetCDF/CIME runtime dependencies required by the parent CESM case.
Build a case normally through CIME:
SRCROOT=/path/to/iCESM1.3.1_fzhu
CAM_REPO="$SRCROOT/components/cam"
CASEROOT=/path/to/case
COMPSET=YOUR_COMPSET
GRID=YOUR_GRID
MACHINE=YOUR_MACHINE
COMPILER=YOUR_COMPILER
PROJECT=YOUR_PROJECT
"$SRCROOT/cime/scripts/create_newcase" \
--case "$CASEROOT" \
--compset "$COMPSET" \
--res "$GRID" \
--machine "$MACHINE" \
--compiler "$COMPILER" \
--project "$PROJECT" \
--srcroot "$SRCROOT" \
--run-unsupported
cd "$CASEROOT"
./case.setup
./case.build

The important setting is --srcroot "$SRCROOT". A CESM case points to the full
iCESM/CESM source root, and CAM is found at $SRCROOT/components/cam.
Use $SRCROOT/cime/scripts/query_config --compsets, --grids, and
--machines to find valid values for the placeholders above.
During ./case.build, components/cam/cime_config/buildlib finds the Codon
compiler and builds the CAM Codon shared libraries. If Codon is missing, the
build fails before model execution.
Codon libraries are built with floating-point contraction disabled:
codon build -release -lib --relocation-model=pic --fp-contract=off --global-ctor=no

--fp-contract=off is required for BFB because fused multiply-add contraction
changes rounding relative to the native Fortran baseline.
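To make the rounding argument concrete, the following minimal sketch contrasts an unfused multiply-add (two roundings) with an emulated fused one (one rounding). It runs in plain Python rather than Codon, and it emulates the fused result with exact rational arithmetic. The operand values are chosen only for illustration and do not come from CAM.

```python
from fractions import Fraction

# Illustrative operands (not CAM data); both are exactly representable in binary64.
a = 1.0 + 2.0**-27
c = -(1.0 + 2.0**-26)

# Unfused: a * a is rounded to binary64 first, then the add is rounded again.
unfused = a * a + c

# Fused (emulated): compute a * a + c exactly, then round once at the end.
fused = float(Fraction(a) * Fraction(a) + Fraction(c))

print(unfused)           # 0.0
print(fused)             # 5.551115123125783e-17 (2**-54)
print(unfused == fused)  # False
```

The exact product is 1 + 2^-26 + 2^-54. The unfused path rounds away the 2^-54 term before the add, while the fused path keeps it, so the two paths disagree in the final bits. That is the kind of difference a strict BFB compare rejects.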
First create or verify a CESM case that uses this CAM source tree. The CAM repo
is not selected directly by case.submit; it is selected through the case's
SRCROOT.
Create a new case:
SRCROOT=/path/to/iCESM1.3.1_fzhu
CAM_REPO="$SRCROOT/components/cam"
CASEROOT=/path/to/case
COMPSET=YOUR_COMPSET
GRID=YOUR_GRID
MACHINE=YOUR_MACHINE
COMPILER=YOUR_COMPILER
PROJECT=YOUR_PROJECT
"$SRCROOT/cime/scripts/create_newcase" \
--case "$CASEROOT" \
--compset "$COMPSET" \
--res "$GRID" \
--machine "$MACHINE" \
--compiler "$COMPILER" \
--project "$PROJECT" \
--srcroot "$SRCROOT" \
--run-unsupported
cd "$CASEROOT"
./case.setup

For an existing case, confirm it points to the same source tree before using it:

cd "$CASEROOT"
./xmlquery SRCROOT --value

The printed path must be the same as $SRCROOT, and
$SRCROOT/components/cam must be this repository. After case.setup, the CAM
file path should include this CAM source tree:

rg "$CAM_REPO" Buildconf/camconf/Filepath

If an existing case points at a different SRCROOT, recreate or clone the case
with the correct --srcroot instead of reusing stale build and run directories.
Use $SRCROOT/cime/scripts/query_config --compsets, --grids, and
--machines to find valid values for new cases.
Build the model:
./case.build

Run one Codon-enabled path:

env MICROP_DRIVER_IMPL=codon ./case.submit --skip-preview-namelist

Run all selectors through Codon by default:

env CAM_CODON_IMPL=codon ./case.submit --skip-preview-namelist

Run all selectors through native Fortran by default:

env CAM_CODON_IMPL=native ./case.submit --skip-preview-namelist

Check that the Codon path executed:

zgrep -n 'implementation = codon\|direct = codon' /path/to/run/atm.log.*.gz

For BFB validation, compare the run against a pristine native baseline produced with the same case configuration:

CAM_REPO=/path/to/iCESM1.3.1_fzhu/components/cam
python "$CAM_REPO/scripts/validation/compare_cesm_runpair.py" \
--native-run-dir /path/to/pristine/native/run \
--codon-run-dir /path/to/codon/run

The expected pass condition is:

overall_numeric_equal=True
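For scripted checks, the zgrep step above can be approximated in Python. This is a minimal sketch, not a tool shipped with the repository: the function name proof_lines and the sample log content are made up for illustration, and only the two proof strings come from the text above.

```python
import gzip
import os
import re
import tempfile

# The two proof strings quoted in this README.
PROOF = re.compile(r"implementation = codon|direct = codon")


def proof_lines(log_path):
    """Return (line_number, text) pairs for proof lines in a gzip-compressed log."""
    hits = []
    with gzip.open(log_path, "rt", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if PROOF.search(line):
                hits.append((number, line.rstrip("\n")))
    return hits


# Demo on a synthetic log; real atm.log lines will look different.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "atm.log.sample.gz")
    with gzip.open(sample, "wt") as log:
        log.write("startup banner\n")
        log.write("example_routine: implementation = codon\n")
        log.write("timestep 1\n")
    hits = proof_lines(sample)

print(hits)  # [(2, 'example_routine: implementation = codon')]
```

An empty result for a selector that was set to codon suggests the selected path did not execute in that run.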
Selectors are ordinary environment variables read by the Fortran wrappers. The decision order is:
- Use the specific *_IMPL selector if it is set.
- Otherwise use CAM_CODON_IMPL if it is set.
- Otherwise default to codon, which preserves the current Codon-first behavior of this branch.
Accepted selector values:
| Requested path | Accepted values |
|---|---|
| Codon | codon, on, true, 1 |
| Native Fortran | native, fortran, off, false, 0 |
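The decision order and accepted values can be sketched in Python as follows. This is only an illustration of the rules stated above, not the code in cam_codon_selectors.F90. The function names are hypothetical, and two behaviors are assumptions: values are matched case-insensitively, and an unrecognized value is treated as unset.

```python
import os

# Accepted spellings, taken from the table above.
CODON_VALUES = {"codon", "on", "true", "1"}
NATIVE_VALUES = {"native", "fortran", "off", "false", "0"}


def parse_choice(raw):
    """Map one selector value to 'codon' or 'native'; None if unset or unrecognized."""
    if raw is None:
        return None
    value = raw.strip().lower()  # assumption: case and padding are ignored
    if value in CODON_VALUES:
        return "codon"
    if value in NATIVE_VALUES:
        return "native"
    return None  # assumption: unrecognized values fall through to the next rule


def resolve_impl(selector, env=None):
    """Apply the order: specific selector, then CAM_CODON_IMPL, then codon."""
    env = os.environ if env is None else env
    for name in (selector, "CAM_CODON_IMPL"):
        choice = parse_choice(env.get(name))
        if choice is not None:
            return choice
    return "codon"


mixed = {"CAM_CODON_IMPL": "native", "MICROP_DRIVER_IMPL": "codon"}
print(resolve_impl("MICROP_DRIVER_IMPL", mixed))  # codon (specific override wins)
print(resolve_impl("EXAMPLE_IMPL", mixed))        # native (global default applies)
print(resolve_impl("EXAMPLE_IMPL", {}))           # codon (built-in default)
```

Passing an explicit env mapping keeps the sketch testable without touching the real process environment.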
Run all selectors through Codon by default:
env CAM_CODON_IMPL=codon ./case.submit --skip-preview-namelist

Run all selectors through native Fortran by default:

env CAM_CODON_IMPL=native ./case.submit --skip-preview-namelist

Force one entry point into Codon while the rest default to native:

env CAM_CODON_IMPL=native MICROP_DRIVER_IMPL=codon ./case.submit --skip-preview-namelist

Force one entry point back to native while the rest default to Codon:

env CAM_CODON_IMPL=codon MICROP_DRIVER_IMPL=native ./case.submit --skip-preview-namelist

For a mixed run, use CAM_CODON_IMPL=codon and leave known non-BFB or
intentionally native selectors as explicit *_IMPL=native overrides.
Generated selector files that set every individual *_IMPL value still work,
but they are no longer required for all-Codon or all-native runs. The repo keeps
the current 675-selector all-Codon snapshot here:
CAM_REPO=/path/to/iCESM1.3.1_fzhu/components/cam
set -a
source "$CAM_REPO/scripts/validation/env_allcodon_675.sh"
set +a
./case.submit --skip-preview-namelist

CAM_CODON_IMPL=native disables Codon dispatch for selector-controlled paths in
the current source tree. It is not the same thing as comparing against a
pristine native source tree; BFB validation still requires a matching pristine
baseline run.
For isolated validation or parallel work, keep each lane separate:
- one CAM source tree or worktree
- one case root
- one EXEROOT
- one fresh RUNDIR
- lane-local Codon library paths in Macros.make
Every validation must compare against a pristine native baseline for the same case configuration. Case settings that can affect outputs, such as compset, grid, restart mode, orbit settings, domain files, compiler, and runtime length, require a matching baseline.
Typical short validation settings:
./xmlchange STOP_OPTION=nsteps,STOP_N=50,REST_OPTION=nsteps,REST_N=50
./xmlchange BFBFLAG=TRUE,DOUT_S=FALSE,TIMER_DETAIL=6,TIMER_LEVEL=16
./xmlchange CONTINUE_RUN=FALSE

After the job completes, prove execution and compare:

rg -n 'CAM_CODON_IMPL|<SELECTOR_NAME>' /path/to/case/logs/run_environment.txt.*
zgrep -n 'implementation = codon\|direct = codon' /path/to/run/atm.log.*.gz
CAM_REPO=/path/to/iCESM1.3.1_fzhu/components/cam
python "$CAM_REPO/scripts/validation/compare_cesm_runpair.py" \
--native-run-dir /path/to/pristine/native/run \
--codon-run-dir /path/to/codon/run

Only overall_numeric_equal=True is accepted as BFB. Character metadata
differences such as time_written, cpath, nfpath, and nhfil are expected
when all numeric variables match.
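The acceptance rule can be illustrated with a small sketch. It is not the logic of compare_cesm_runpair.py, which this README does not show. It assumes each run is already loaded into a plain dict of variable name to values, and the helper names are hypothetical. Numeric variables are compared by their binary64 bit patterns, and the four metadata names listed above are skipped.

```python
import math
import struct

# Character metadata that the text above lists as expected to differ.
METADATA_ONLY = {"time_written", "cpath", "nfpath", "nhfil"}


def bits(values):
    """Pack floats as little-endian binary64 so the comparison is bit-for-bit."""
    return struct.pack(f"<{len(values)}d", *values)


def overall_numeric_equal(native, codon):
    """True only if both runs hold the same numeric variables with identical bits."""
    names = set(native) - METADATA_ONLY
    if names != set(codon) - METADATA_ONLY:
        return False
    return all(bits(native[n]) == bits(codon[n]) for n in names)


# Made-up variable names and values, for illustration only.
native = {"T": [280.0, 281.5], "time_written": "12:00:01"}
same = {"T": [280.0, 281.5], "time_written": "12:07:44"}
off_by_one_ulp = {"T": [280.0, math.nextafter(281.5, 300.0)], "time_written": "12:00:01"}

print(overall_numeric_equal(native, same))            # True
print(overall_numeric_equal(native, off_by_one_ulp))  # False
```

Comparing bit patterns rather than using == also distinguishes 0.0 from -0.0 and treats identical NaN payloads as equal, which matches a strict reading of bit-for-bit.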
Do not reuse an old run directory for BFB proof unless old model outputs and logs have been removed and the cleanup is recorded.
Routine status is tracked outside this repository by the CAM Codon status
dashboard. The internal project deployment is listed in
doc/internal_validation.md.
The dashboard tracks which routines are complete, in progress, partial, or not
started, and provides routine pages, formula/equation pages, coverage-case
views, CSV/HTML exports, and REST APIs such as /api/summary and
/api/routines.
Status meanings:
- done: the default active path enters Codon and returns for the same routine.
- done-native-island: the active path is Codon except for a minimal native expression or statement block retained for proven BFB reasons.
- processing: someone is actively editing or validating the routine.
- partial: Codon covers helper islands or some branches, but the default routine body still has native orchestration.
- none: covered by the current case snapshot but no Codon evidence yet.
- unknown: parser or evidence conflict that needs manual review.
Use exact (relpath, routine, kind) keys when updating status. Example:
cd /path/to/cam-codon-status
export CAM_STATUS_TOKEN='<token>'
uv run cam-codon-status remote-mark \
--remote https://<dashboard-host> \
--relpath src/physics/cam/example.F90 \
--routine example_subroutine \
--kind subroutine \
--status processing \
--note 'started Codon validation'

After validation:

uv run cam-codon-status remote-mark \
--remote https://<dashboard-host> \
--relpath src/physics/cam/example.F90 \
--routine example_subroutine \
--kind subroutine \
--status done \
--note 'commit <sha>; selector EXAMPLE_IMPL=codon; proof atm.log line; overall_numeric_equal=True'

Do not mark a routine done just because a helper library exports a related
symbol. The proof must show that the same routine, wrapper, or accepted
same-routine dispatch path actually executed.
The following long validations were run on the Derecho HPC system with GNU
builds after the UWSHCU positive-moisture expression-order fix. Both used all
tracked selectors in the Codon path (CAM_CODON_IMPL=codon, equivalent to
675 codon / 0 native in the 2026-06-16 selector snapshot) and compared against
matching pristine native baselines.
| Case | Length | Jobs | Result | Main timing |
|---|---|---|---|---|
| PI pre-industrial case, ne16_g16, startup | 6 months | baseline 6467097.desched1, all-Codon 6467103.desched1 | overall_numeric_equal=True | CPL:RUN_LOOP 6443.491 -> 7608.488, +18.080% |
| MCO Miocene case, ne16_g16, hybrid restart from MCO/restart/2001-01-01-00000 | 6 months | baseline 6467105.desched1, all-Codon 6467112.desched1 | overall_numeric_equal=True | CPL:RUN_LOOP 3826.046 -> 4744.456, +24.004% |
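The percentages in the timing column follow from the two CPL:RUN_LOOP values in each row. A short check, with a hypothetical helper name:

```python
def overhead_percent(native_time, codon_time):
    """Relative slowdown of the all-Codon run versus the native baseline, in percent."""
    return (codon_time - native_time) / native_time * 100.0


pi = overhead_percent(6443.491, 7608.488)
mco = overhead_percent(3826.046, 4744.456)

print(f"PI:  +{pi:.3f}%")   # PI:  +18.080%
print(f"MCO: +{mco:.3f}%")  # MCO: +24.004%
```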
The internal compare outputs and run directories are listed in
doc/internal_validation.md.
These runs are validation examples, not a claim that every future compset, compiler, restart state, or production campaign is automatically BFB.
- Keep source changes and generated artifacts separate. Do not commit __pycache__, run directories, build directories, case logs, compare outputs, .so files, or guard receipts.
- Use fresh run directories for validation proof.
- Record job id, global selector setting, single-selector overrides, selector counts when generated by validation tooling, run directory, atm.log, run_environment, END OF MODEL RUN, compare output, and timing.
- Preserve Fortran floating-point expression order when translating to Codon.
- If a routine is BFB only with a small native expression island, document the expression and validation evidence before marking it done-native-island.
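The expression-order rule above matters because floating-point addition is not associative. A minimal plain-Python illustration with generic values, not a CAM expression:

```python
# The same three operands, grouped two ways.
left_to_right = (0.1 + 0.2) + 0.3
regrouped = 0.1 + (0.2 + 0.3)

print(left_to_right)               # 0.6000000000000001
print(regrouped)                   # 0.6
print(left_to_right == regrouped)  # False
```

A translation that regroups terms, even in a mathematically equivalent way, can therefore break BFB. This is why a Codon expression should follow the order in which the Fortran source evaluates it.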
- Native Fortran remains the reference implementation.
- BFB has been proven only for specific case configurations and matching native baselines.
- New compsets, grids, compilers, restart states, runtime lengths, and namelist changes require separate validation.
- Some numerically fragile expressions intentionally remain in native Fortran.
- The Codon layer is not a general drop-in replacement for all CAM configurations.
- Performance is not yet the primary optimization target; the documented long-run validations are slower than native Fortran.
- codon not found during case.build: install Codon or add it to PATH. The build also checks ~/.codon/bin/codon.
- Codon selector is set but no proof line appears in atm.log: the selected path may not execute in this case configuration, or the routine may still dispatch through native orchestration.
- CAM_CODON_IMPL=native still compares non-BFB against a pristine baseline: confirm that the current source tree itself matches the pristine source tree. The global switch disables selector-controlled Codon dispatch; it does not undo source-level changes.
- Compare reports non-BFB: first verify the baseline uses the same case configuration, compiler, restart state, orbit settings, and runtime length.
- Scattered one-ULP differences: check for FMA contraction, changed operation order, pow lowering, complex sqrt, or reduction-order differences.
- A run looks complete but compare is suspicious: confirm output timestamps, job ids, run_environment, and END OF MODEL RUN all match the submitted validation run.
- Dashboard counts look inconsistent: use /api/summary for totals before interpreting paginated /api/routines results.
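When triaging the one-ULP case above, it can help to measure how far apart two values are in units in the last place. The following sketch is a generic technique with hypothetical helper names, not part of the repository tooling. It assumes finite binary64 inputs and does not handle NaN.

```python
import math
import struct


def ordered_bits(x):
    """Map a binary64 value to an integer whose ordering matches float ordering."""
    (raw,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative floats are sign-magnitude; mirror them onto the negative integers.
    return raw if raw >= 0 else -(raw & 0x7FFFFFFFFFFFFFFF)


def ulp_distance(a, b):
    """Number of representable binary64 steps between a and b (finite inputs only)."""
    return abs(ordered_bits(a) - ordered_bits(b))


print(ulp_distance(1.0, math.nextafter(1.0, 2.0)))  # 1
print(ulp_distance(0.1 + 0.2, 0.3))                 # 1
print(ulp_distance(1.0, 1.0))                       # 0
```

Note that this distance treats 0.0 and -0.0 as zero steps apart, while a bit-for-bit compare would still flag them as different. A distance of 1 across scattered grid points is typical of the contraction and ordering issues listed above.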