Implement core engine entry point and refactor Python inference #43
Merged
Eamon2009 merged 18 commits on May 17, 2026
# Description

This PR introduces the primary entry point for the QUADTRIX engine in `src/main.cpp`. It establishes a unified workflow that handles model lifecycle management without relying on TorchScript, using our custom internal headers for the model architecture.

# Key Features

- Dual-Mode Execution: Integrated support for both a training loop and an interactive chat mode.
- Infinite Generation: Implemented an unconstrained inference loop for continuous text generation.
- C++ Architecture: Bypasses TorchScript to use custom-defined layers and headers, ensuring direct control over the execution graph.
- Resource Management: Runs on CPU only.
# Description

This PR synchronizes the model interaction logic across both the Python backend utilities and the web frontend. It establishes a consistent way to interface with the model weights and the C++ engine.

## Python Backend (inference.py)

- Goal: Refactor the standalone inference script to support modern weight loading.
- Weight Mapping: Updated to load and map `.pt` files directly using the refactored architecture.
- Chat Mode: Implemented a robust interactive loop for rapid model testing and verification.

## Frontend Layer (frontend/src/api)

- Goal: Establish the bridge between the UI and the Quadtrix engine.
- Service Definition: Created the base API client to handle requests to the C++ backend.
- Dual-Path Logic: Added handlers for both Training control and Inference/Chat endpoints.
- Stream Support: Prepared the API layer to handle "generation" data chunks for real-time UI updates.

## Other merged PRs

#7 #6 #5 #4 #3
## Summary

<img width="2185" height="829" alt="run_20260430_192930" src="https://github.com/user-attachments/assets/420ebbb4-cadf-4408-bc69-fc32ad081c6f" />

## Model Configuration

| Parameter | Value |
|---|---|
| Layers | 6 |
| Heads | 6 |
| Embedding dim | 100 |
| Block size | 190 |
| Batch size | 64 |
| Dropout | 0.2 |
| Learning rate | 3e-4 |
| Total parameters | **10,837,257** |

## Training Details

| Field | Value |
|---|---|
| Steps | 8,000 |
| Eval every | 200 steps |
| Optimizer seed | 1337 |
| Train tokens | 14,080,249 |
| Val tokens | 1,564,473 |
| Precision | bf16 |
| MFU | 60.0% |

## Results

| Metric | Value |
|---|---|
| Best val loss | **2.3918** |
| Final train loss | 2.2825 |
| Total loss drop | 8.57 |
| Peak throughput | 19,602 tok/s |
| Mean throughput | 18,756 tok/s |
| Peak grad norm | 2.2504 |
| Mean grad norm | 1.6894 |
| Training time | **82m 43s** |
| Checkpoint | `best_model.pt` |
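As a rough sanity check on the tables above, the following sketch derives the tokens consumed per step and the throughput that the reported wall-clock time implies. It assumes each optimizer step processes exactly batch size × block size tokens (no gradient accumulation), which this PR does not state; the variable names are illustrative only.

```python
# Values copied from the Model Configuration, Training Details, and Results tables.
batch_size = 64
block_size = 190
steps = 8_000
train_tokens = 14_080_249
val_tokens = 1_564_473
training_seconds = 82 * 60 + 43  # "82m 43s"

# Assumption: one step consumes batch_size * block_size tokens.
tokens_per_step = batch_size * block_size
tokens_processed = tokens_per_step * steps

# Average rate over the whole run, including any evaluation pauses.
implied_throughput = tokens_processed / training_seconds

# How many times the run cycles through the training split.
passes_over_train_set = tokens_processed / train_tokens

# Share of all tokens held out for validation.
val_fraction = val_tokens / (train_tokens + val_tokens)

print(f"tokens per step:       {tokens_per_step:,}")
print(f"tokens processed:      {tokens_processed:,}")
print(f"implied throughput:    {implied_throughput:,.0f} tok/s")
print(f"passes over train set: {passes_over_train_set:.2f}")
print(f"validation fraction:   {val_fraction:.3f}")
```

Under that assumption the implied rate is roughly 19,600 tok/s, close to the reported peak throughput, and the validation split is about 10% of all tokens.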
…#30)

## Summary

Publish GitHub Package using npm

## Checks

- [ ] C++ build still works
- [ ] Backend changes were smoke-tested locally
- [ ] Frontend build still passes
## Summary

C++ benchmarks for performance testing

## Checks

- [ ] C++ build still works
- [ ] Backend changes were smoke-tested locally
- [ ] Frontend build still passes
- [ ] Docs were updated
## Summary

Docs improvement with chat images

## Checks

- C++ build still works
- Backend changes were smoke-tested locally
- Frontend build still passes
- Docs or screenshots were updated if needed
## Summary

Introduces configuration for real C++ and Python Quadtrix benchmark runs, including warmup, token generation, and training step dimensions.
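The actual configuration file is not shown in this conversation, so the following is only a minimal sketch of what such a benchmark configuration could look like. The class name `BenchConfig`, every field name, and every default value are hypothetical and chosen purely for illustration.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BenchConfig:
    """Hypothetical benchmark settings; names and defaults are illustrative."""

    warmup_iters: int = 3        # untimed iterations before measuring
    measured_iters: int = 20     # timed iterations per benchmark
    gen_tokens: int = 128        # tokens to generate in the generation benchmark
    train_batch_size: int = 8    # batch dimension of one training step
    train_block_size: int = 64   # sequence length of one training step

    def __post_init__(self):
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, bool) or not isinstance(value, int):
                raise ValueError(f"{f.name} must be an integer")
            # Warmup may be zero; every other dimension must be positive.
            minimum = 0 if f.name == "warmup_iters" else 1
            if value < minimum:
                raise ValueError(f"{f.name} must be >= {minimum}")

    @classmethod
    def from_json(cls, text):
        """Build a config from a JSON object, rejecting unknown keys."""
        data = json.loads(text)
        known = {f.name for f in fields(cls)}
        unknown = set(data) - known
        if unknown:
            raise ValueError(f"unknown config keys: {sorted(unknown)}")
        return cls(**data)
```

Keeping the dimensions in one validated object is one way to let both runners read identical settings, so that C++ and Python results stay comparable.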
## Summary

Introduces a CLI tool to load, index, and align benchmark JSON results from both backends. It displays a side-by-side comparison table showing latency (ms), throughput (tokens/s), and the percentage speedup/slowdown.
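The load, index, and align steps described above can be sketched as follows. The JSON schema (a list of objects with `name`, `latency_ms`, and `tokens_per_s`) and all function names are assumptions, since the real result format is not shown here. The speedup is defined, also as an assumption, as the relative latency reduction of C++ versus Python.

```python
import json


def index_results(rows):
    """Index benchmark rows by benchmark name (assumed to be unique)."""
    return {row["name"]: row for row in rows}


def compare(cpp_rows, py_rows):
    """Align rows present in both backends and compute the speedup percentage."""
    cpp, py = index_results(cpp_rows), index_results(py_rows)
    table = []
    for name in sorted(cpp.keys() & py.keys()):
        c, p = cpp[name], py[name]
        # Positive: C++ is faster than Python. Negative: C++ is slower.
        speedup_pct = (p["latency_ms"] - c["latency_ms"]) / p["latency_ms"] * 100.0
        table.append({
            "name": name,
            "cpp_ms": c["latency_ms"],
            "py_ms": p["latency_ms"],
            "cpp_tps": c["tokens_per_s"],
            "py_tps": p["tokens_per_s"],
            "speedup_pct": speedup_pct,
        })
    return table


def format_table(rows):
    """Render the aligned rows as a fixed-width text table."""
    header = (f"{'benchmark':<16}{'cpp ms':>10}{'py ms':>10}"
              f"{'cpp tok/s':>12}{'py tok/s':>12}{'speedup %':>11}")
    lines = [header, "-" * len(header)]
    for r in rows:
        lines.append(
            f"{r['name']:<16}{r['cpp_ms']:>10.2f}{r['py_ms']:>10.2f}"
            f"{r['cpp_tps']:>12.1f}{r['py_tps']:>12.1f}{r['speedup_pct']:>+11.1f}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample data, not real Quadtrix measurements.
    cpp_json = '[{"name": "forward", "latency_ms": 5.0, "tokens_per_s": 200.0}]'
    py_json = '[{"name": "forward", "latency_ms": 10.0, "tokens_per_s": 100.0}]'
    print(format_table(compare(json.loads(cpp_json), json.loads(py_json))))
```

Benchmarks that appear in only one backend are dropped by the set intersection; a real tool might prefer to list them with empty cells instead.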
suite

Introduces a standard entry-point script that invokes the core `python_benchmark` module's execution flow.
## Summary

Execution wrapper for the Python runner. Adds a boilerplate compatibility script that handles safe system exits and execution routing for the Python benchmark.
## Summary
Introduces the primary Python benchmark runner, measuring model
metadata, data throughput, forward latency, training-step latency, and
autoregressive generation. Includes utility functions for dynamic module
loading, timing, and percentile calculation.
## Model Benchmarking

- Latency Profiling: Tracks forward pass, training step, and autoregressive generation latencies.
- Throughput Tracking: Measures tokenizer processing speeds and data throughput.
- Resource Monitoring: Captures model metadata and system memory footprints during runs.
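The latency and throughput measurements listed above could be collected with a helper along these lines. This is a minimal sketch using only the standard library; the function names and default iteration counts are assumptions and do not come from the benchmark runner itself.

```python
import time


def measure_latency_ms(fn, warmup=3, iters=20):
    """Call fn repeatedly and return per-call latencies in milliseconds.

    The warmup calls run first and are not timed, so one-off costs such as
    lazy initialization do not skew the samples.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def tokens_per_second(num_tokens, elapsed_ms):
    """Convert a token count and an elapsed time in ms to tokens per second."""
    return num_tokens / (elapsed_ms / 1000.0)


if __name__ == "__main__":
    # Stand-in workload; a real run would time a forward pass or training step.
    latencies = measure_latency_ms(lambda: sum(range(10_000)))
    print(f"collected {len(latencies)} samples, min {min(latencies):.3f} ms")
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock intended for measuring short durations.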
## Math Utilities

- Dynamic Loading: Implements safe runtime module loading via `importlib` to dynamically interact with `engine/inference.py`.
- Statistical Metrics: Adds custom mathematical utility functions, including a precise percentile calculator ($P_{50}$, $P_{90}$, $P_{99}$) for latency distribution reporting.
- Standardized Exports: Lays the groundwork for structured JSON and CSV output formatting.
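For illustration, the two utilities described above might look roughly like this. The function names are hypothetical, and the percentile uses linear interpolation between closest ranks, which is one common definition and an assumption here; the runner's actual method is not shown in this conversation.

```python
import importlib.util
import math


def load_module_from_path(path, name="inference"):
    """Load a Python source file as a module without it being on sys.path."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load module from {path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def percentile(values, p):
    """Return the p-th percentile (0 to 100) using linear interpolation."""
    if not values:
        raise ValueError("percentile of an empty sequence is undefined")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    # Fractional index into the sorted samples.
    rank = (len(ordered) - 1) * p / 100.0
    lower, upper = math.floor(rank), math.ceil(rank)
    if lower == upper:
        return float(ordered[lower])
    weight = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


if __name__ == "__main__":
    latencies_ms = [12.0, 15.0, 11.0, 30.0, 14.0]  # made-up samples
    for p in (50, 90, 99):
        print(f"P{p}: {percentile(latencies_ms, p):.2f} ms")
```

A call such as `load_module_from_path("engine/inference.py")` would then return a module object whose attributes can be accessed normally, assuming that file exists relative to the working directory.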
Introduces the primary C++ benchmark runner (`cpp_benchmark.cpp`). It defines the configuration parsing, the metric-tracking structures (`Stats` and `BenchRow`), and the basic timing and utility abstractions needed to mirror the capabilities of the Python benchmark suite.
Merged commit 720ffc1 into dependabot/npm_and_yarn/frontend/multi-bb2efd036b
8 of 9 checks passed
#42
#41
#40