Various madmatrix improvements and bugfixes #3
Define a const nw6 in ALOHAOBJ class and change the loop calls.
Before, the map was built while scanning over the different wavefunctions. However, we have access to the full model in the constructor, so we can build the maps there by looping over the interactions. This pass is done only once.
Fix an inconsistency in the index of the 'S' particles. Add a fix for sxxxxx. Fix the broken_symmetry function to iterate only over the outgoing particles.
Same CM energy as fortran and standalone_cpp. Hardcoded momenta block to uncomment. Sensible flavor index.
Solution for pointer mismatch and compile error
The user can specify either --flavor <int> or -f <int> when calling ./check_sa.exe. TODO: check on GPU, add a guard for an out-of-range index.
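A minimal sketch of the out-of-range guard mentioned in the TODO, assuming the flavor index arrives as the string that follows --flavor or -f. The helper name `parse_flavor_value` and the `nflavors` parameter are hypothetical and are not taken from check_sa.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <optional>

// Hypothetical helper (not part of check_sa.cpp): convert the text given after
// --flavor / -f into a flavor index, rejecting malformed or out-of-range input.
// `nflavors` stands in for the number of entries in the flavor table.
inline std::optional<std::size_t> parse_flavor_value( const char* text, std::size_t nflavors )
{
  char* end = nullptr;
  const long value = std::strtol( text, &end, 10 );
  // Reject empty input and trailing garbage such as "2x".
  if( end == text || *end != '\0' ) return std::nullopt;
  // Reject negative indices and indices past the end of the table.
  if( value < 0 || static_cast<unsigned long>( value ) >= nflavors ) return std::nullopt;
  return static_cast<std::size_t>( value );
}
```

A caller would pass the argument that follows the flag and print a usage message when the result is empty, rather than indexing the flavor table with an unchecked value.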
The matrix subcommand (default) computes the matrix element for each flavor combination at a single phase-space point: it automatically generates 8 events with RAMBO and keeps only the first one (nthreads = 8, nblocks = 1, niterations = 1). The perf subcommand (also activated with the -p flag) computes the matrix element for nthreads * nblocks * niterations events and outputs performance counters (timings) for each phase of the computation.
…r all events in a vector
…h7 into feat-madmatrix-theo
…tructure

The merged standalone_cpp evaluates a flavor by index via process.sigmaKin(iflav), which reads CPPProcess's internal flavor_table and the per-flavor bookkeeping arrays sized by nflavors. The old check_sa-local flavor_arr no longer exists, so the test's patch silently no-op'd it and the two injected non-representative flavors evaluated to wrong/missing values (C++ backend failed; Fortran backend was unaffected).

Rewrite the C++ injection to the new architecture: extend the internal flavor_table (CPPProcess.cc) + nflavors (CPPProcess.h) and maxflavor/pdg_arr (check_sa.cpp), via a multi-line-safe array extender.

Verified the test passes on both backends: s c~ > s c~ reproduces d u~ > d u~ (8.5706e-3), s c~ > c c~ vanishes, masks partial (known) vs all-on (lookup miss). Marks MERGE_TEST_INVESTIGATION.md item #3 RESOLVED.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Additional key, error and case to obtain masses in get_meta()
Same as in standalone_cpp, will need vectorization in future. Copy of it to dir.
Same as the massless RAMBO; for now it uses an internal RNG and is not split into initial-state and final-state particle sampling.
New flag -r [c|ml] (classic, massless). The flag applies only to perf; matrix is always massive. Massive RAMBO is host-only, so for now the output is copied to the device, and its RNG is internal. Massless RAMBO is kept for backward compatibility for now, with the RNG outside (we pass a buffer of random numbers).
}};
std::size_t total_size = 0;
for (auto [ptr, size] : ptrs_and_sizes) {
  std::size_t aligned_size = (size + 7) / 8 * 8;
Maybe a naive question: is this used to round up to the next multiple of 8 because 8 is sizeof(double)? If so, I suggest two things: use fptype rather than double, and make the numbers explicit, e.g. const std::size_t ROUND = sizeof(fptype); std::size_t aligned_size = (size + ROUND - 1) / ROUND * ROUND;. This helps readability a bit, and the constants are optimised out anyway.
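The suggestion above can be sketched as a small standalone helper. This is only an illustration under assumptions: `fptype` is aliased to double here, and `round_up` and `padded_total` are hypothetical names, not functions from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Assumption: fptype is the build's floating-point type; double is used here.
using fptype = double;

// Round `size` up to the next multiple of `multiple` (which must be non-zero).
// A size that is already a multiple is returned unchanged.
constexpr std::size_t round_up( std::size_t size, std::size_t multiple )
{
  return ( size + multiple - 1 ) / multiple * multiple;
}

// Hypothetical helper: total byte count when every buffer is padded to a
// multiple of sizeof(fptype), mirroring the loop in the reviewed diff.
inline std::size_t padded_total( std::initializer_list<std::size_t> sizes )
{
  std::size_t total = 0;
  for( std::size_t size : sizes ) total += round_up( size, sizeof( fptype ) );
  return total;
}
```

One caveat on the design choice: if fptype were float, sizeof(fptype) would be 4 and the padding granularity would change. If the 8 in the original code is there for another reason (for example pointer alignment), a named constant equal to 8 would preserve the behaviour while still documenting the intent.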
{
for( std::size_t i_diag = 0; i_diag < CPPProcess::ndiagrams; ++i_diag )
std::size_t i_sorted = permutation[i_event];
std::size_t page_size = MemoryAccessMomentaBase::neppM;
Maybe let's call it simd_vector_size instead of page_size to avoid confusion, since we already have pages for handling the memory. I guess it can be made const as well.
std::size_t i_sorted = permutation[i_event];
std::size_t page_size = MemoryAccessMomentaBase::neppM;
std::size_t i_page = i_sorted / page_size;
std::size_t i_vector = i_sorted % page_size;
- std::size_t i_vector = i_sorted % page_size;
+ std::size_t i_vector = i_sorted % page_size; // vector lane
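For readers less familiar with the layout, the index arithmetic in this snippet can be sketched in isolation. The names `PageLane`, `split_event_index` and `join_event_index` are hypothetical, and `simd_vector_size` stands in for MemoryAccessMomentaBase::neppM, following the renaming proposed in the review.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical pair of indices: which SIMD page an event lives in, and which
// lane inside that page.
struct PageLane
{
  std::size_t i_page;
  std::size_t i_lane;
};

// Split a flat event index into (page, lane) for a given number of events per
// SIMD vector. `simd_vector_size` must be non-zero.
constexpr PageLane split_event_index( std::size_t i_event, std::size_t simd_vector_size )
{
  return PageLane{ i_event / simd_vector_size, i_event % simd_vector_size };
}

// Inverse operation: rebuild the flat event index from (page, lane).
constexpr std::size_t join_event_index( PageLane pl, std::size_t simd_vector_size )
{
  return pl.i_page * simd_vector_size + pl.i_lane;
}
```

With four events per vector, event 10 falls in page 2, lane 2, and joining the two indices recovers 10.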
Qubitol left a comment
I just added a few comments on nomenclature, to try to improve readability for a non-expert of the code. But the logic looks good to me!
Naming changed to rambo.h and massless_rambo.h; flag instead of switch (--rambo-massless).
Introduction of massive RAMBO in standalone_mg7.
GPU:
- cudaMallocAsync calls in UMAMI interface
- sigmaKin without device or stream synchronization

SIMD: