Skip to content

Fix ADF15 parser and refine regex patterns#497

Merged
jacklovell merged 6 commits into
cherab:developmentfrom
munechika-koyo:fix/parse-adf15
May 28, 2026
Merged

Fix ADF15 parser and refine regex patterns#497
jacklovell merged 6 commits into
cherab:developmentfrom
munechika-koyo:fix/parse-adf15

Conversation

@munechika-koyo
Copy link
Copy Markdown
Member

@munechika-koyo munechika-koyo commented Mar 22, 2026

Summary

Because ADF15 raw format for tungsten, like pec40#w_ic#w12.dat, is different from what cherab has handled like:

C  energy levels
C  -------------
C
C   lv      configuration    (2S+1)L(w-1/2)      energy (cm^-1)
C  ---   ------------------- --------------      --------------
c    1   60964A52B             (1) 0(  0.0)             0.0
c    2   60963A52B51C          (3) 2(  3.0)         16720.4
				:
				:
				:

C  photon emissivity atomic transitions
C  ------------------------------------
C
C   isel  wvlen(A)            transition                type  ispb nspb
C                                                             ispp nspp sz    tg pr wr
C  ----- ---------- ----------------------------------- ----- ---- ---- --    -- -- --
C      1   56.5300  1074(3)1(  2.0)-   2(3)2(  3.0)     excit   1    1  12   569 12  1
C      2   56.5520  1075(1)3(  3.0)-   3(1)4(  4.0)     excit   1    1  12  1068  7  2
C      3   70.1877   414(1)5(  5.0)-   3(1)4(  4.0)     excit   1    1  12   778 31  3
C      4   70.5640   365(3)1(  2.0)-   2(3)2(  3.0)     excit   1    1  12   275 36  4
C      5   70.6699   371(3)3(  4.0)-   3(1)4(  4.0)     excit   1    1  12   756 43  5
C      6   70.7048   367(3)3(  3.0)-   3(1)4(  4.0)     excit   1    1  12   753 44  6

I tried to modify the ADF15 parser to match these lines.

Key changes include:

  • Fixed the ADF15 parser to handle additional raw file (tungsten) formats.

  • Moved regex pattern definitions outside of loops for efficiency.

@munechika-koyo munechika-koyo marked this pull request as ready for review March 22, 2026 00:53
@jacklovell
Copy link
Copy Markdown
Member

Looks OK to me. I don't think lifting the declaration of a string literal out of the loop will have any impact on performance, as the assignment operation is trivial. If you really wanted to optimize, you could instead create compiled regex objects using r = re.compile(expression) and then use r.match instead of re.match(expression), but Python caches the compilation of regexes anyway so unless there's a large number of regular expressions all being used this wouldn't make a difference either. It is more correct to do so if you're refactoring anyway though, so I'd recommend switching to re.compile.

The other thing it would be good to have in place are some unit/regression tests for this. We can't use data from real ADF15 files in unit tests for licensing reasons, but I don't see a problem with creating some mock files using the same format but using dummy numerical data to use in unit tests: try dummy variants of Hydrogen, Carbon and Tungsten files.

@munechika-koyo
Copy link
Copy Markdown
Member Author

@jacklovell, I am so sorry for not responding for so long.
I defined some patterns using re.compile in the module's global scope and wrote a unit test to parse the ADF15 format.

This is the benchmark result comparing the newly implemented parser and the old one.

File: hydrogen
  Main branch (plain re.match):     4.5352s
  Current branch (compiled regex):  2.8475s
  Speedup:                          1.59x
  Improvement:                      +59.3%

File: carbon
  Main branch (plain re.match):     1.9508s
  Current branch (compiled regex):  1.2299s
  Speedup:                          1.59x
  Improvement:                      +58.6%

Here is the benchmark script I used.
benchmark_adf15.py
old_adf15.py

@jacklovell jacklovell merged commit 01dc361 into cherab:development May 28, 2026
3 of 6 checks passed
@munechika-koyo munechika-koyo deleted the fix/parse-adf15 branch May 28, 2026 22:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants