ld-decode: Possible performance optimizations #802

Master issue for various performance bottlenecks that could be improved.

### Memory bandwidth/use between threads
As identified by several people, a fair bit of time is spent shuffling data to and from the demod threads and concatenating it afterwards. Just removing the completely unused data from the shared recarray in #796 gave a notable performance improvement, but there is more that could be improved:
- ```demod_raw``` is only used in one spot, in the dropout detect function, to check where the data exceeds a threshold. This could just as well be done in the demod threads themselves, storing a boolean array of where the threshold is exceeded instead, which should be much smaller (1 byte per sample as a NumPy bool array versus 8 bytes for float64).
- ```demod_burst``` would likely be fine stored as 32-bit instead of 64-bit float, halving its size, since the values lie in a range where 32-bit floating point precision should be more than sufficient anyway.
- Ideally we should use shared memory for the result data, if possible, to avoid copying between threads, as noted by limer and putnam on IRC/Discord (they indicated they may submit a PR for this when they are back home).
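A minimal sketch of the three ideas above, assuming the workers are separate processes (as with `multiprocessing.Process`) and that the dropout check only needs a single upper threshold on ```demod_raw```. `reduce_block_outputs` and `write_block_to_shared` are hypothetical helpers, not existing ld-decode functions; the shared-memory part uses the standard library's `multiprocessing.shared_memory`.

```python
import numpy as np
from multiprocessing import shared_memory


def reduce_block_outputs(demod_raw, demod_burst, threshold):
    """Shrink one block's results before they leave the worker.

    Hypothetical helper: assumes the dropout check only needs to know where
    demod_raw exceeds a single threshold.
    """
    exceeds = demod_raw > threshold            # bool: 1 byte/sample instead of 8
    burst32 = demod_burst.astype(np.float32)   # float32: 4 bytes/sample instead of 8
    return exceeds, burst32


def write_block_to_shared(shm_name, total_len, start, block):
    """Worker side: copy a float32 block straight into a shared output buffer."""
    shm = shared_memory.SharedMemory(name=shm_name)
    out = np.ndarray((total_len,), dtype=np.float32, buffer=shm.buf)
    out[start:start + len(block)] = block
    del out      # drop the view so the buffer can be closed
    shm.close()  # detach only; the creating process is responsible for unlink()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = rng.normal(size=1024)
    burst = rng.normal(size=1024)
    mask, burst32 = reduce_block_outputs(raw, burst, threshold=2.0)
    print(raw.nbytes, mask.nbytes)       # 8192 1024
    print(burst.nbytes, burst32.nbytes)  # 8192 4096

    # Parent side: allocate the shared result buffer once, let workers fill it.
    total = 2048
    shm = shared_memory.SharedMemory(create=True, size=total * 4)
    try:
        write_block_to_shared(shm.name, total, 0, burst32)
        result = np.ndarray((total,), dtype=np.float32, buffer=shm.buf)[:1024].copy()
    finally:
        shm.close()
        shm.unlink()
```

If the bool mask is still too large, `np.packbits` could pack it to 1 bit per sample, at the cost of unpacking it again in the dropout detector.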

### FFT
The real-input rfft/irfft functions should be used instead of fft/ifft wherever the input is real and the negative-frequency half of the spectrum isn't needed (as far as I know, the full complex FFT is only needed for the Hilbert/demod function). They are faster, and the stored FFT filter spectra would be roughly half the size.
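A small sketch of the difference, using plain NumPy and a hypothetical 8-tap FIR filter `h` rather than ld-decode's actual filters: for a real signal and a real filter, both paths give the same circular convolution, but the rfft path computes and stores only `n // 2 + 1` spectrum bins instead of `n`.

```python
import numpy as np


def filter_full_fft(x, h):
    """Circular convolution of real x with real FIR h via the complex FFT."""
    n = len(x)
    H = np.fft.fft(h, n)                 # n complex bins stored
    return np.fft.ifft(np.fft.fft(x) * H).real


def filter_rfft(x, h):
    """Same result via the real-input FFT; only n // 2 + 1 bins are stored."""
    n = len(x)
    H = np.fft.rfft(h, n)                # n // 2 + 1 complex bins stored
    return np.fft.irfft(np.fft.rfft(x) * H, n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=4096)
    h = np.ones(8) / 8                   # hypothetical 8-tap moving-average filter
    print(np.allclose(filter_full_fft(x, h), filter_rfft(x, h)))  # True
    print(len(np.fft.fft(h, 4096)), len(np.fft.rfft(h, 4096)))    # 4096 2049
```

This only holds when both the signal and the filter's impulse response are real; a filter spectrum that is not conjugate-symmetric (as in an analytic-signal/Hilbert step) still needs the complex path.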

We're currently using pyfft rather than NumPy's FFT for speed. It has a number of settings and caching options one could experiment with to improve things further. It's currently not used on Windows, as it seems to conflict with using Thread instead of Process (Process doesn't work on Windows with the current code).
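A sketch of selecting the FFT backend with plan caching turned on, assuming the "pyfft" above refers to pyFFTW (the `pyfftw` package) and its `pyfftw.interfaces` drop-in replacement for `numpy.fft`. `get_fft_module` is a hypothetical helper; it falls back to `numpy.fft` on Windows (mirroring the current restriction) or when pyFFTW isn't installed.

```python
import sys

import numpy as np


def get_fft_module(keepalive_seconds=30.0):
    """Return pyFFTW's numpy-compatible FFT interface if usable, else numpy.fft."""
    if sys.platform != "win32":  # mirrors the current "not on Windows" restriction
        try:
            import pyfftw.interfaces.cache as fftw_cache
            import pyfftw.interfaces.numpy_fft as fftw_fft
        except ImportError:
            pass
        else:
            # Keep FFTW plan objects alive between calls with the same
            # shape/dtype, so repeated per-block transforms skip re-planning.
            fftw_cache.enable()
            fftw_cache.set_keepalive_time(keepalive_seconds)
            return fftw_fft
    return np.fft


if __name__ == "__main__":
    fft = get_fft_module()
    print(fft.__name__)  # pyfftw.interfaces.numpy_fft or numpy.fft
```

Other knobs worth trying are the `threads` and `planner_effort` arguments accepted by the `pyfftw.interfaces.numpy_fft` functions; which settings actually help would need measuring on real captures.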


### numba/native code optimization
Some of the TBC/sync code could benefit a lot from numba (or alternatively Cython or similar), as much of the logic is done in loops, which are slow in Python: ```dropout_detect_demod```, ```refine_linelocs_pilot``` and ```refine_linelocs_hsync``` in particular, but probably more. (I've partially implemented the last one in Cython in vhs-decode.)
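To illustrate the kind of per-sample loop that numba compiles well, here is a hypothetical `find_runs` helper (not ld-decode's actual ```dropout_detect_demod```) that turns a threshold mask into start/end indices of runs at least `min_len` samples long. It uses `numba.njit` when numba is installed and otherwise runs the same code as plain Python, so the sketch works either way.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # numba not installed: run the same code as plain Python
    def njit(func):
        return func


@njit
def find_runs(mask, min_len):
    """Return (starts, ends) of runs of True in mask that are >= min_len long.

    ends are exclusive indices.
    """
    n = mask.shape[0]
    starts = np.empty(n, dtype=np.int64)
    ends = np.empty(n, dtype=np.int64)
    count = 0
    run_start = -1
    for i in range(n):
        if mask[i]:
            if run_start < 0:
                run_start = i          # a new run begins
        elif run_start >= 0:
            if i - run_start >= min_len:
                starts[count] = run_start
                ends[count] = i
                count += 1
            run_start = -1             # run ended
    if run_start >= 0 and n - run_start >= min_len:
        starts[count] = run_start      # run reaches the end of the array
        ends[count] = n
        count += 1
    return starts[:count], ends[:count]


if __name__ == "__main__":
    mask = np.array([0, 1, 1, 1, 0, 1, 0, 1, 1], dtype=np.bool_)
    print(find_runs(mask, 2))  # runs [1, 4) and [7, 9)
```

The first call pays numba's compilation cost; later calls with the same argument types reuse the compiled version for the rest of the run.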

Any run involving EFM has a fair bit of extra startup time, as the EFM code uses numba classes whose compilation can't be cached, so they have to be recompiled on every run. If we start using Cython or similar in ld-decode, it might be worth using that here instead.

### JSON
I don't know whether this has a large performance hit in practice, but currently we rewrite the whole JSON file rather than appending to it, and it can get pretty large on long runs. It might be worth checking whether it's feasible to just append to the file and modify only the values that need updating at the start.
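One possible way to append without rewriting, sketched under the assumption that the per-field array is the last key in the top-level JSON object: keep the file ending in `]}` and, for each new field, seek back over those two bytes and write `, {...}]}`. `create_metadata` and `append_field` are hypothetical helpers, and the key names in the demo are illustrative only.

```python
import json
import os
import tempfile

CLOSE = b"]}"  # the open "fields" array is always the last thing in the file


def create_metadata(path, header):
    """Write the header with an empty "fields" array as the final key."""
    doc = dict(header)
    doc.pop("fields", None)
    doc["fields"] = []  # dicts keep insertion order, so this is dumped last
    with open(path, "w", encoding="utf-8") as f:
        json.dump(doc, f)  # file ends with: "fields": []}


def append_field(path, field):
    """Append one field record without rewriting the rest of the file."""
    entry = json.dumps(field).encode("utf-8")
    with open(path, "r+b") as f:
        f.seek(-len(CLOSE) - 1, os.SEEK_END)
        first = f.read(1) == b"["      # empty array: no leading comma
        f.seek(-len(CLOSE), os.SEEK_END)
        f.write((b"" if first else b", ") + entry + CLOSE)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "example.json")
        create_metadata(path, {"videoParameters": {"system": "NTSC"}})
        append_field(path, {"seqNo": 1})
        append_field(path, {"seqNo": 2})
        with open(path, encoding="utf-8") as f:
            print(f.read())
```

This covers only the append side. Updating values near the start of the file in place (e.g. a field count) would additionally need those values written at a fixed width, or a rewrite of just the header, which this sketch doesn't attempt.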
