TransInferSim is a cycle-accurate simulator for analyzing the hardware performance of Transformer NN inference on custom systolic-array accelerators. Combined with Accelergy, it reports latency, energy, area, and other efficiency metrics. This enables cache-policy analysis, memory-hierarchy optimization, and hardware design-space exploration, and the simulator can export execution plans for RTL validation and deployment.
- Analyzes Transformer NN inference on hardware
- Integrates with Accelergy for energy estimation
- Includes various plugins that extend Accelergy's flexibility
If you find our work useful, please cite our paper:
J. Klhufek, A. Marchisio, V. Mrazek, L. Sekanina and M. Shafique, "TransInferSim: Toward Fast and Accurate Evaluation of Embedded Hardware Accelerators for Transformer Networks," in IEEE Access, vol. 13, pp. 177215-177226, 2025, doi: 10.1109/ACCESS.2025.3621062.
@ARTICLE{transinfersim,
author={Klhufek, Jan and Marchisio, Alberto and Mrazek, Vojtech and Sekanina, Lukas and Shafique, Muhammad},
journal={IEEE Access},
title={TransInferSim: Toward Fast and Accurate Evaluation of Embedded Hardware Accelerators for Transformer Networks},
year={2025},
volume={13},
number={},
pages={177215-177226},
keywords={Transformers;Accuracy;Hardware acceleration;Computational modeling;Schedules;Analytical models;Data models;Computer architecture;Memory management;Register transfer level;Transformers;hardware accelerators;modeling tools;memory subsystem;evaluation and optimizations},
doi={10.1109/ACCESS.2025.3621062}}

To get started with TransInferSim, follow these steps:
- Python 3.9 or higher
- Graphviz, uv, and basic build tools (make, g++)

On Ubuntu/Debian, Graphviz and the build tools can be installed with:
sudo apt install graphviz build-essential

Clone the repository together with its submodules, then build it using uv:
git clone --recurse-submodules https://github.com/ehw-fit/TransInferSim
cd TransInferSim
make install
source .venv/bin/activate

The example.py script shows an example run: it instantiates a Transformer model or layer of your choice together with an example hardware specification, runs an inference simulation, and saves the runtime performance statistics to a stats_out.txt file.
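When the example run needs to be scripted (for instance, in a sweep or a CI job), a minimal sketch like the one below could wrap it. The run_example helper is hypothetical and not part of TransInferSim; it assumes it is called from the project root with the .venv activated, and that example.py writes stats_out.txt into the working directory.

```python
"""Hypothetical helper (not part of TransInferSim): run example.py and return its stats.

Assumes it is called from the project root with the .venv activated and that
example.py writes stats_out.txt into the current working directory.
"""
import subprocess
import sys
from pathlib import Path


def run_example(script: str = "example.py",
                stats_file: str = "stats_out.txt",
                cwd: str = ".") -> str:
    """Run `script` with the current interpreter and return the stats file text.

    Raises subprocess.CalledProcessError if the script exits with an error,
    and FileNotFoundError if the stats file was not produced.
    """
    root = Path(cwd)
    stats_path = root / stats_file
    # Remove a stale stats file so an old run's results are never returned.
    stats_path.unlink(missing_ok=True)
    # Use the same interpreter (e.g. the activated .venv) that runs this helper.
    subprocess.run([sys.executable, script], cwd=root, check=True)
    if not stats_path.is_file():
        raise FileNotFoundError(f"{stats_path} was not created by {script}")
    return stats_path.read_text()
```

For example, calling print(run_example()) from the project root would print the collected statistics. Deleting any existing stats_out.txt before the run ensures that the returned text always comes from the simulation just executed.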
To analyze memory utilization across the simulation, run the memory trace example from the project root:
python mem_trace_example.py

This project is licensed under the MIT License; see the LICENSE file for details.
