AI researcher and systems engineer with 10+ years shipping production systems at scale. Former Co-Founder & CTO at QWERKY AI, where I distilled 70B-parameter LLMs into 3B-8B hybrid models on 24 H200 GPUs with a pending patent on novel attention architectures. Currently pursuing my MS in Computer Science at Georgia Tech (BS CS, Summa Cum Laude, University of South Carolina). I've led teams of 20+ engineers and shipped 15+ production applications across AI, blockchain, and distributed systems.
- LLM Architecture Research -- Custom CUDA kernels for novel attention mechanisms (pending patent)
- State Space Models -- Contributed the Mamba SSM architecture to Modular's MAX framework in Mojo
- QDistill -- 70B→3B-8B hybrid distillation achieving 4x throughput and 1M token context lengths
- Open Source -- Selective scan, causal conv1d, and RMSNorm kernels in the Modular ecosystem
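For context on the kernels listed above: RMSNorm normalizes activations by their root-mean-square instead of centering and scaling by mean and variance, which is cheaper to fuse into a single GPU kernel. A minimal NumPy sketch of the math (the actual contributions are fused kernels in the Modular ecosystem; this is just an illustrative reference, and the function name is mine):

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Scale each row of x by the reciprocal root-mean-square of its last axis."""
    # Mean of squares over the hidden dimension, kept for broadcasting.
    ms = np.mean(x * x, axis=-1, keepdims=True)
    # eps guards against division by zero; weight is a learned per-channel gain.
    return x / np.sqrt(ms + eps) * weight

x = np.random.randn(2, 8).astype(np.float32)
y = rms_norm(x, np.ones(8, dtype=np.float32))
```

With a unit weight, each normalized row has a mean square of ~1, which is the invariant a fused kernel must preserve while reading the input only once.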
Skills: Languages · AI / ML · Infrastructure

Projects: Modular MAX Framework · Pulley · QWERKY AI · key-gen
- Bringing Blazing Fast State Space Models to the Modular MAX Framework -- Feb 2026
- Mother May AI: An Opinion on Geoffrey Hinton's Mother AI -- Sep 2025
- Attention: The Breakthroughs and the Bottlenecks -- Jun 2025
- Incidental Non-Determinism: When AI Surprises You (and Why) -- May 2025
Read more on the QWERKY AI blog →