
Implement core engine entry point and refactor Python inference #43

Merged
Eamon2009 merged 18 commits into dependabot/npm_and_yarn/frontend/multi-bb2efd036b from master
May 17, 2026
Conversation

@Eamon2009
Owner

codeaddict-119 and others added 18 commits May 1, 2026 10:08
# Description
This PR introduces the primary entry point for the QUADTRIX engine in
src/main.cpp. It establishes a unified workflow that handles model
lifecycle management without relying on TorchScript, utilizing our
custom internal headers for model architecture.

# Key Features
- Dual-Mode Execution: Integrated support for both a training loop and
an interactive chat mode.

- Infinite Generation: Implemented an unconstrained inference loop for
continuous text generation.

- C++ Architecture: Bypasses TorchScript to use custom-defined layers
and headers, ensuring direct control over the execution graph.

- Resource Management: CPU-only execution.
# Description
This PR synchronizes the model interaction logic across both the Python
backend utilities and the web frontend. It establishes a consistent way
to interface with the model weights and the C++ engine.

##  Python Backend (inference.py)
- Goal: Refactor the standalone inference script to support modern
weight loading.

- Weight Mapping: Updated to load and map .pt files directly using the
refactored architecture.

- Chat Mode: Implemented a robust interactive loop for rapid model
testing and verification.

##  Frontend Layer (frontend/src/api)
- Goal: Establish the bridge between the UI and the Quadtrix engine.

- Service Definition: Created the base API client to handle requests to
the C++ backend.

- Dual-Path Logic: Added handlers for both Training control and
Inference/Chat endpoints.

- Stream Support: Prepared the API layer to handle "generation" data
chunks for real-time UI updates.

## Other merged PRs

#7, #6, #5, #4, #3
## Summary
<img width="2185" height="829" alt="run_20260430_192930"
src="https://github.com/user-attachments/assets/420ebbb4-cadf-4408-bc69-fc32ad081c6f"
/>

 
## Model Configuration
 
| Parameter | Value |
|---|---|
| Layers | 6 |
| Heads | 6 |
| Embedding dim | 100 |
| Block size | 190 |
| Batch size | 64 |
| Dropout | 0.2 |
| Learning rate | 3e-4 |
| Total parameters | **10,837,257** |
 
## Training Details
 
| Field | Value |
|---|---|
| Steps | 8,000 |
| Eval every | 200 steps |
| Optimizer seed | 1337 |
| Train tokens | 14,080,249 |
| Val tokens | 1,564,473 |
| Precision | bf16 |
| MFU | 60.0% |
 
## Results
 
| Metric | Value |
|---|---|
| Best val loss | **2.3918** |
| Final train loss | 2.2825 |
| Total loss drop | 8.57 |
| Peak throughput | 19,602 tok/s |
| Mean throughput | 18,756 tok/s |
| Peak grad norm | 2.2504 |
| Mean grad norm | 1.6894 |
| Training time | **82m 43s** |
| Checkpoint | `best_model.pt` |
…#30)

## Summary
 Publish GitHub Package using npm
## Checks

- [ ] C++ build still works
- [ ] Backend changes were smoke-tested locally
- [ ] Frontend build still passes
## Summary

Add C++ benchmarks for performance testing.

## Checks

- [ ] C++ build still works
- [ ] Backend changes were smoke-tested locally
- [ ] Frontend build still passes
- [ ] Docs were updated
…38)

## Summary

Add C++ benchmarks for performance testing.

## Checks

- [ ] C++ build still works
- [ ] Backend changes were smoke-tested locally
- [ ] Frontend build still passes
- [ ] Docs were updated
## Summary
docs improvement with chat images

## Checks
- C++ build still works
- Backend changes were smoke-tested locally
- Frontend build still passes
- Docs or screenshots were updated if needed
## Summary
Introduces configuration for real C++ and Python Quadtrix benchmark
runs, including warmup, token generation, and training step dimensions.
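
As a rough illustration of what such a benchmark configuration could look like, the sketch below groups warmup, token generation, and training-step dimensions into one validated object. Every field name and default here (`warmup_iters`, `measured_iters`, `gen_tokens`, `train_batch_size`, `train_block_size`) is a hypothetical placeholder; the actual configuration format in this PR is not shown in the commit message.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkConfig:
    # All names and defaults are illustrative assumptions, not the PR's values.
    warmup_iters: int = 3        # untimed iterations run before measuring
    measured_iters: int = 10     # timed iterations
    gen_tokens: int = 64         # tokens produced per generation benchmark
    train_batch_size: int = 4    # sequences per training step
    train_block_size: int = 32   # tokens per sequence in a training step

    def validate(self) -> None:
        """Reject values that would make a run meaningless."""
        if self.warmup_iters < 0:
            raise ValueError("warmup_iters must be >= 0")
        for name in ("measured_iters", "gen_tokens",
                     "train_batch_size", "train_block_size"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be > 0")

    def tokens_per_train_step(self) -> int:
        """Tokens processed by one training step (batch x block)."""
        return self.train_batch_size * self.train_block_size


def load_config(text: str) -> BenchmarkConfig:
    """Build a config from a JSON object, using defaults for missing keys."""
    cfg = BenchmarkConfig(**json.loads(text))
    cfg.validate()
    return cfg
```

Sharing one such object between the C++ and Python runners (for example through a common JSON file) would be one way to keep both backends measuring the same workload.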
## Summary
Introduces a CLI tool to load, index, and align benchmark JSON results
from both backends. It displays a side-by-side comparison table showing
latency (ms), throughput (tokens/s), and the percentage
speedup/slowdown.
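
The load, index, and align steps described above can be sketched as follows. This is a minimal illustration, not the tool from this PR: the row keys (`name`, `latency_ms`, `tokens_per_s`) and the function names are assumptions, and the speedup is defined here as the relative latency improvement of C++ over Python.

```python
def index_rows(rows):
    """Index benchmark rows by their (assumed) 'name' key."""
    return {row["name"]: row for row in rows}


def compare(python_rows, cpp_rows):
    """Align rows present in both backends and compute the C++ speedup."""
    py, cpp = index_rows(python_rows), index_rows(cpp_rows)
    table = []
    for name in sorted(py.keys() & cpp.keys()):  # only benchmarks run by both
        p, c = py[name], cpp[name]
        table.append({
            "name": name,
            "python_ms": p["latency_ms"],
            "cpp_ms": c["latency_ms"],
            "python_tok_s": p["tokens_per_s"],
            "cpp_tok_s": c["tokens_per_s"],
            # Positive: C++ is faster. Negative: C++ is slower.
            "speedup_pct": (p["latency_ms"] / c["latency_ms"] - 1.0) * 100.0,
        })
    return table


def format_table(table):
    """Render the aligned rows as a plain-text side-by-side table."""
    header = (f"{'benchmark':<16}{'py ms':>10}{'cpp ms':>10}"
              f"{'py tok/s':>12}{'cpp tok/s':>12}{'speedup':>10}")
    lines = [header]
    for row in table:
        speedup = f"{row['speedup_pct']:+.1f}%"
        lines.append(
            f"{row['name']:<16}{row['python_ms']:>10.2f}{row['cpp_ms']:>10.2f}"
            f"{row['python_tok_s']:>12.1f}{row['cpp_tok_s']:>12.1f}{speedup:>10}"
        )
    return "\n".join(lines)
```

Rows that appear in only one backend are dropped by the key intersection; a real tool might instead report them as missing.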
#40)

 suite  Introduces a standard entry point script that invokes the core python_benchmark module execution flow.
## Summary
Execution wrapper for the Python runner.

Adds a boilerplate compatibility script that routes execution to the
Python benchmark and handles safe system exits.
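
One common shape for such a wrapper is sketched below, assuming the benchmark exposes a single callable that returns an exit code. `run_benchmark` is a hypothetical stand-in for the real `python_benchmark` entry point, and the exit codes are conventional choices rather than values taken from this PR.

```python
import sys


def run_benchmark(argv):
    """Hypothetical stand-in for the real benchmark entry point."""
    if "--fail" in argv:
        raise RuntimeError("benchmark failed")
    return 0


def main(argv=None):
    """Route to the benchmark and translate failures into exit codes."""
    argv = sys.argv[1:] if argv is None else argv
    try:
        return run_benchmark(argv)
    except KeyboardInterrupt:
        # 130 is the conventional shell exit code for Ctrl-C.
        print("interrupted", file=sys.stderr)
        return 130
    except Exception as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1
```

A script built this way would typically end with `raise SystemExit(main())` under an `if __name__ == "__main__":` guard, so the process exit status always reflects the benchmark outcome.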
## Summary
Introduces the primary Python benchmark runner, measuring model
metadata, data throughput, forward latency, training-step latency, and
autoregressive generation. Includes utility functions for dynamic module
loading, timing, and percentile calculation.

## Model Benchmarking
- Latency Profiling: Tracks forward pass, training step, and autoregressive generation latencies.
- Throughput Tracking: Measures tokenizer processing speeds and data throughput.
- Resource Monitoring: Captures model metadata and system memory footprints during runs.
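
A latency measurement of this kind is often structured as untimed warmup calls followed by timed iterations. The helper below is a minimal sketch under that assumption; the function names and the injectable `clock` parameter are illustrative and are not taken from the runner in this PR.

```python
import time


def measure_latency_ms(fn, warmup=3, iters=10, clock=time.perf_counter):
    """Call fn() `warmup` times untimed, then return `iters` timings in ms."""
    for _ in range(warmup):
        fn()  # warm caches and lazy initialisation before measuring
    samples = []
    for _ in range(iters):
        start = clock()
        fn()
        samples.append((clock() - start) * 1000.0)
    return samples


def throughput_tokens_per_s(num_tokens, elapsed_ms):
    """Convert a token count and an elapsed time in ms to tokens per second."""
    return num_tokens / (elapsed_ms / 1000.0)
```

Passing the clock as a parameter keeps the helper deterministic under test while defaulting to `time.perf_counter`, which is monotonic and suited to short intervals.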

## Math Utilities
- Dynamic Loading: Implements safe runtime module loading via importlib to dynamically interact with engine/inference.py.
- Statistical Metrics: Adds custom mathematical utility functions, including a precise percentile calculator ($P_{50}$, $P_{90}$, $P_{99}$) for latency distribution reporting.
- Standardized Exports: Lays the groundwork for structured JSON and CSV output formatting.
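
The two utilities above can be sketched as follows. The commit message does not say which percentile definition the runner uses, so linear interpolation between the two nearest ranks is assumed here; the function names are also illustrative. The loader uses the standard `importlib.util` calls for importing a module from a file path.

```python
import importlib.util
import math


def load_module_from_path(name, path):
    """Import a Python file by path without requiring it to be on sys.path."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load module {name!r} from {path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def percentile(samples, p):
    """Return the p-th percentile (0..100) using linear interpolation."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be in [0, 100]")
    ordered = sorted(samples)
    rank = (len(ordered) - 1) * (p / 100.0)  # fractional index into ordered
    lo, hi = math.floor(rank), math.ceil(rank)
    if lo == hi:
        return float(ordered[lo])
    frac = rank - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac
```

For a latency report, `percentile(samples, 50)`, `percentile(samples, 90)`, and `percentile(samples, 99)` would give $P_{50}$, $P_{90}$, and $P_{99}$. With few samples, $P_{99}$ is dominated by the single slowest run, so it is worth collecting enough iterations before reading much into it.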
Introduces the primary C++ benchmark runner (cpp_benchmark.cpp). It defines the parsing configurations, tracking metrics structures (Stats and BenchRow), and basic time/utility abstractions needed to mirror the Python benchmark suite capabilities.
@Eamon2009 Eamon2009 merged commit 720ffc1 into dependabot/npm_and_yarn/frontend/multi-bb2efd036b May 17, 2026
8 of 9 checks passed

2 participants