Various madmatrix improvements and bugfixes #3

Merged
theoheimel merged 41 commits into main from feat-madmatrix-theo on Jun 17, 2026

@theoheimel

Contributor

GPU:

  • reduce number of cudaMallocAsync calls in UMAMI interface
  • allow for fully asynchronous calls of sigmaKin without device or stream synchronization
  • allow for parallelization over helicities in a single kernel launch instead of using CUDA streams

SIMD:

  • use one flavor index per SIMD vector instead of per batch
  • reorder events in UMAMI such that the flavor index is the same for each vector
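A minimal sketch of the reordering idea in the SIMD items above, not the code from this PR: events are grouped by flavor index and each group is padded to a whole number of SIMD vectors, so that one flavor index per vector is enough. The names `FlavorLayout`, `group_by_flavor` and `npos` are hypothetical, and padding partially filled vectors is an assumption about how uniform vectors could be obtained.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Marker for padding slots that hold no real event (hypothetical convention).
constexpr std::size_t npos = static_cast<std::size_t>(-1);

struct FlavorLayout
{
  std::vector<std::size_t> slot_to_event; // source event per slot; size is a multiple of vector_size
  std::vector<int> vector_flavor;         // one flavor index per SIMD vector
};

// Group events by flavor so that every SIMD vector holds a single flavor.
// Requires vector_size > 0.
inline FlavorLayout group_by_flavor( const std::vector<int>& flavors, std::size_t vector_size )
{
  // Event indices sorted by flavor; stable, so the original order is kept within a flavor.
  std::vector<std::size_t> order( flavors.size() );
  std::iota( order.begin(), order.end(), std::size_t{ 0 } );
  std::stable_sort( order.begin(), order.end(), [&]( std::size_t a, std::size_t b )
                    { return flavors[a] < flavors[b]; } );

  FlavorLayout layout;
  std::size_t i = 0;
  while( i < order.size() )
  {
    const int flav = flavors[order[i]];
    std::size_t lane = 0;
    // Fill consecutive vectors with all events of this flavor.
    while( i < order.size() && flavors[order[i]] == flav )
    {
      if( lane == 0 ) layout.vector_flavor.push_back( flav ); // a new vector starts here
      layout.slot_to_event.push_back( order[i] );
      ++i;
      lane = ( lane + 1 ) % vector_size;
    }
    // Pad the last, partially filled vector of this flavor.
    while( lane != 0 )
    {
      layout.slot_to_event.push_back( npos );
      lane = ( lane + 1 ) % vector_size;
    }
  }
  return layout;
}
```

With flavors `{1, 0, 1, 1, 0}` and a vector size of 4, this yields two vectors: one for flavor 0 holding events 1 and 4 plus two padding slots, and one for flavor 1 holding events 0, 2 and 3 plus one padding slot.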

stloufra and others added 29 commits May 11, 2026 12:51
Define a const nw6 in ALOHAOBJ class and change the loop calls.
Before, the map was built while scanning over the different
wavefunctions; however, we have access to the full model in the
constructor, so we can build the maps there by looping over the
interactions. This pass is done only once.
Fix of inconsistency in the index of the 'S' particles.
Addition of fix for sxxxxx.
Fix of the broken_symmetry function to iterate only over the outgoing particles
Same CM energy as fortran and standalone_cpp.
Hardcoded momenta block to uncomment.
Sensible flavor index.
Solution for pointer mismatch and compile error
User can specify either --flavor <int> or -f <int> while calling ./check_sa.exe.
TODO: check on GPU, add guard for index out of range
Subcommand matrix (default) computes the matrix element for each flavor
combination and for one phase space point (it generates automatically
8 events with RAMBO and keeps only the first one - nthreads = 8, nblocks
= 1, niterations = 1).
Subcommand perf (can be activated also with -p flag) computes the matrix
element for nthreads * nblocks * niterations events and outputs
performance counters (timings) for each phase of the computation.
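The commit notes above mention a `--flavor <int>` / `-f <int>` option and a TODO for an out-of-range guard. The following is a hedged sketch of what such a guard could look like; it is not taken from check_sa. `FlavorArg`, `FlavorSelection` and `parse_flavor` are hypothetical names, and `nflavors` is assumed to be the number of valid flavor indices.

```cpp
#include <charconv>
#include <cstddef>
#include <string>
#include <system_error>
#include <vector>

enum class FlavorArg { absent, ok, invalid };

struct FlavorSelection
{
  FlavorArg status;
  std::size_t index; // meaningful only when status == FlavorArg::ok
};

// Look for "--flavor <int>" or "-f <int>" and accept only indices in [0, nflavors).
inline FlavorSelection parse_flavor( const std::vector<std::string>& args, std::size_t nflavors )
{
  for( std::size_t i = 0; i < args.size(); ++i )
  {
    if( args[i] != "--flavor" && args[i] != "-f" ) continue;
    if( i + 1 == args.size() ) return { FlavorArg::invalid, 0 }; // flag without a value
    const std::string& value = args[i + 1];
    std::size_t index = 0;
    const char* first = value.data();
    const char* last = value.data() + value.size();
    // from_chars rejects signs, empty input and overflow for unsigned types.
    const auto result = std::from_chars( first, last, index );
    if( result.ec != std::errc{} || result.ptr != last ) return { FlavorArg::invalid, 0 };
    if( index >= nflavors ) return { FlavorArg::invalid, 0 }; // the out-of-range guard
    return { FlavorArg::ok, index };
  }
  return { FlavorArg::absent, 0 }; // no flavor requested
}
```

Returning a status instead of throwing lets the caller decide whether a missing flag means "all flavors" (as the matrix subcommand is described) or a single default.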
@theoheimel requested a review from Qubitol May 28, 2026 12:32
oliviermattelaer added a commit that referenced this pull request Jun 11, 2026
…tructure

The merged standalone_cpp evaluates a flavor by index via
process.sigmaKin(iflav), which reads CPPProcess's internal flavor_table and the
per-flavor bookkeeping arrays sized by nflavors. The old check_sa-local
flavor_arr no longer exists, so the test's patch silently no-op'd it and the two
injected non-representative flavors evaluated to wrong/missing values (C++
backend failed; Fortran backend was unaffected).

Rewrite the C++ injection to the new architecture: extend the internal
flavor_table (CPPProcess.cc) + nflavors (CPPProcess.h) and maxflavor/pdg_arr
(check_sa.cpp), via a multi-line-safe array extender. Verified the test passes
on both backends: s c~ > s c~ reproduces d u~ > d u~ (8.5706e-3), s c~ > c c~
vanishes, masks partial (known) vs all-on (lookup miss). Marks
MERGE_TEST_INVESTIGATION.md item #3 RESOLVED.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
stloufra added 4 commits June 11, 2026 17:57
Additional key, error and case to obtain masses in get_meta()
Same as in standalone_cpp, will need vectorization in future.
Copy of it to dir.
Same as the massless RAMBO; for now it uses an internal RNG and is not split
into initial- and final-state particle sampling
New flag -r [c|ml] (classic, massless)
Flag only for perf - matrix always massive
Massive
	host only so for now copy to device
	self RNG inside

Massless
	kept for backward compatibility for now
	RNG outside (we pass buffer with rnd numbers)
}};
std::size_t total_size = 0;
for (auto [ptr, size] : ptrs_and_sizes) {
std::size_t aligned_size = (size + 7) / 8 * 8;

Member

Maybe a naive question: is this used to round up to the next multiple of 8 because 8 is sizeof(double)? If so, I suggest two things: use fptype rather than double, and make the numbers explicit, e.g. const std::size_t ROUND = sizeof(fptype); std::size_t aligned_size = (size + ROUND - 1) / ROUND * ROUND;. This helps readability a bit, and the constants are optimised out anyway.
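A sketch of the suggestion above, under stated assumptions: `fptype` is aliased to `double` here only as a stand-in for the project's typedef, and `align_up` and `pooled_offsets` are hypothetical helpers. The loop in the reviewed snippet appears to sum aligned sizes so that several buffers can share one allocation, which would fit the PR's goal of fewer cudaMallocAsync calls; the sketch computes the per-buffer byte offsets and the total size for such a pooled buffer.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using fptype = double; // assumption: stand-in for the project's floating-point typedef

// Round size up to the next multiple of alignment (alignment must be > 0).
constexpr std::size_t align_up( std::size_t size, std::size_t alignment )
{
  return ( size + alignment - 1 ) / alignment * alignment;
}

// Given sub-buffer sizes in bytes, return their byte offsets inside a single
// pooled allocation together with the total number of bytes to allocate.
inline std::pair<std::vector<std::size_t>, std::size_t>
pooled_offsets( const std::vector<std::size_t>& sizes )
{
  constexpr std::size_t alignment = sizeof( fptype );
  std::vector<std::size_t> offsets;
  offsets.reserve( sizes.size() );
  std::size_t total_size = 0;
  for( std::size_t size : sizes )
  {
    offsets.push_back( total_size );           // this buffer starts at the current end
    total_size += align_up( size, alignment ); // the next buffer starts aligned
  }
  return { offsets, total_size };
}

static_assert( align_up( 13, 8 ) == 16, "13 bytes round up to 16" );
```

A caller would then allocate `total_size` bytes once and hand out `base + offsets[i]` for each sub-buffer.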

{
for( std::size_t i_diag = 0; i_diag < CPPProcess::ndiagrams; ++i_diag )
std::size_t i_sorted = permutation[i_event];
std::size_t page_size = MemoryAccessMomentaBase::neppM;

Member

Maybe let's call it simd_vector_size instead of page_size to avoid confusion, since we already have pages to handle the memory. I guess it can be made constant as well.

std::size_t i_sorted = permutation[i_event];
std::size_t page_size = MemoryAccessMomentaBase::neppM;
std::size_t i_page = i_sorted / page_size;
std::size_t i_vector = i_sorted % page_size;

Member

Suggested change
- std::size_t i_vector = i_sorted % page_size;
+ std::size_t i_vector = i_sorted % page_size; // vector lane
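To illustrate the page/lane split discussed in these comments, here is a small sketch using the reviewer's proposed name `simd_vector_size`. `VectorSlot`, `locate` and `flat_index` are hypothetical, and the `[page][component][lane]` layout is an assumption about an AOSOA-style buffer, not a statement about the actual memory layout in this PR.

```cpp
#include <cstddef>

struct VectorSlot
{
  std::size_t i_page; // which SIMD vector (page) the event falls into
  std::size_t i_lane; // position of the event inside that vector
};

// Split a sorted event index into its page and its lane within the page.
constexpr VectorSlot locate( std::size_t i_sorted, std::size_t simd_vector_size )
{
  return { i_sorted / simd_vector_size, i_sorted % simd_vector_size };
}

// Flat offset into a buffer laid out as [page][component][lane] (assumed layout).
constexpr std::size_t flat_index( VectorSlot slot,
                                  std::size_t i_component,
                                  std::size_t n_components,
                                  std::size_t simd_vector_size )
{
  return ( slot.i_page * n_components + i_component ) * simd_vector_size + slot.i_lane;
}
```

For example, with a vector size of 4, sorted index 10 lands in page 2, lane 2.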

@Qubitol left a comment

Member

I just added a few comments on nomenclature, to try to improve readability for someone who is not an expert in the code. But the logic looks good to me!

stloufra and others added 4 commits June 15, 2026 14:22
naming changed to rambo.h and massless_rambo.h
flag instead of switch (--rambo-massless)
Introduction of massive RAMBO in standalone_mg7.
@theoheimel marked this pull request as draft June 17, 2026 10:04
@theoheimel marked this pull request as ready for review June 17, 2026 10:21
@theoheimel merged commit bd54791 into main Jun 17, 2026
56 of 148 checks passed
@theoheimel deleted the feat-madmatrix-theo branch June 17, 2026 10:21

3 participants