A Unix-like du command-line tool that counts token usage per file and directory
Updated Jul 1, 2026 - TypeScript
Byte Pair Encoding (BPE) tokenizer implemented from scratch in Python. Features an interactive Streamlit playground to visualize token merging, trace vocab rules, analyze compression ratios, and verify lossless decoding in real time.
LLM Tokenizers provides a user-friendly tool for exploring how large language models tokenize text. Enter your text, choose a model, and get detailed tokenized outputs including tokens, token IDs, token count, and character count. API access with Swagger documentation is available for developers.
100% client-side privacy-first converter that transforms docs, spreadsheets, code & data into token-optimized .toon files for LLMs. Powered by MarkItDownJS with RAG-ready chunking, semantic optimization & gzip packing — cutting tokens 15-40%. Zero server. Zero telemetry.