llama.cpp fork that cuts KV-cache VRAM by 82% with near-lossless quality (+0.15% perplexity). Run longer contexts on the same GPU.
Updated Apr 29, 2026 - C++
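For illustration: upstream llama.cpp already exposes KV-cache quantization from the command line, which is the kind of mechanism a fork like this builds on. A minimal sketch of a long-context run with a quantized KV cache follows; the flags shown are upstream llama.cpp's, the model path is hypothetical, and the fork's trellis-specific options are not documented here and may differ.

    # Sketch: 32k-token context with the KV cache quantized to q4_0
    # (upstream llama.cpp flags; the fork's trellis options may differ).
    # Quantizing the V cache typically also requires flash attention to be enabled.
    ./llama-cli -m ./model.gguf -c 32768 \
        --cache-type-k q4_0 --cache-type-v q4_0 \
        -p "Summarize the following report: ..."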
💎 LTS Industrial Standard: PDF/Word optimization with scientific Trellis Mimic engine. Features Turbo Parallel processing & Global Camouflage (Ricoh/Fujitsu/Canon 2025 profiles). Embedded Python 3.12, zero-install, no admin needed. Ultimate document privacy for Windows LTSC/Enterprise.