LLM Systems Engineer — Production Inference · Multi-Agent Orchestration · Edge Deployment
I build production LLM systems end to end, from quantized models running on Jetson edge hardware to multi-agent cloud deployments with tool use, permission gating, and audit trails. I care about systems that are secure, measurable, and actually useful.
Dallas-Fort Worth, TX
Core stack: Python, Rust, PyTorch, CUDA, Docker, LiteLLM
Security-first open-source coding agent. Hand-rolled async ReAct loop with a 4-tier deny-first permission engine, SHA-256 hash-chained audit trail, and 200+ LLM providers via LiteLLM.
- 30+ built-in tools with JSON Schema validation, MCP server/client
- Parallel + speculative tool dispatch, cost budget enforcement
- Self-evolution via LLM-guided mutations, multi-language verify gate
- SWE-bench Lite: 52.2% oracle best-of-5
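The hash-chained audit trail works by linking each log entry to the SHA-256 digest of its predecessor, so any retroactive edit invalidates every later hash. A minimal sketch of the idea (function names and record layout are illustrative, not the project's actual API):

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry


def append_entry(log: list[dict], event: dict) -> None:
    """Append an event, chaining it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    payload = json.dumps({"event": event, "prev_hash": prev_hash}, sort_keys=True)
    log.append({
        "event": event,
        "prev_hash": prev_hash,
        "hash": hashlib.sha256(payload.encode()).hexdigest(),
    })


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; a tampered entry breaks the chain from that point on."""
    prev_hash = GENESIS
    for record in log:
        payload = json.dumps({"event": record["event"], "prev_hash": prev_hash},
                             sort_keys=True)
        if record["prev_hash"] != prev_hash or \
           record["hash"] != hashlib.sha256(payload.encode()).hexdigest():
            return False
        prev_hash = record["hash"]
    return True
```

Canonical JSON serialization (`sort_keys=True`) matters here: without it, the same event could hash differently across runs and false-positive as tampering.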
Autonomous multi-agent personal intelligence system running on NVIDIA Jetson hardware. Fully on-device inference — zero cloud dependencies, privacy-preserving by design.
Multi-agent algorithmic trading pipeline with DeepSeek R1 reasoning at every stage.
- 4-agent pipeline: Technical Analysis → Chief Strategist → Risk Manager → Execution
- Kelly Criterion position sizing, Monte Carlo risk simulation
- Real-time WebSocket market data, paper trading integration
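Kelly Criterion sizing reduces to a one-line formula: for win probability p and win/loss payoff ratio b, the optimal bet fraction is f* = p − (1 − p)/b. A minimal sketch, clamped to [0, 1] since a negative Kelly fraction means "don't take the trade" (the function name is illustrative):

```python
def kelly_fraction(win_prob: float, win_loss_ratio: float) -> float:
    """Classic Kelly: f* = p - (1 - p) / b, clamped to [0, 1]."""
    f = win_prob - (1 - win_prob) / win_loss_ratio
    return max(0.0, min(f, 1.0))
```

For example, a 60% win rate with a 2:1 payoff gives f* = 0.6 − 0.4/2 = 0.4, i.e. 40% of bankroll. In practice most systems trade a fraction of Kelly (half-Kelly or less) to dampen variance.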
Qwen3.5-4B fine-tuned with ORPO for biblical question-answering.
- Hybrid RAG: dense embeddings + keyword search
- Constitutional AI self-critique guardrails for theological accuracy
- Voice pipeline: speech-to-text → LLM → text-to-speech
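Hybrid RAG needs a way to merge the dense-embedding ranking with the keyword ranking. One common, score-free approach is reciprocal rank fusion (RRF); a minimal sketch, assuming each retriever returns an ordered list of document IDs (this is a standard fusion technique, not necessarily the project's exact method):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists: each doc scores sum(1 / (k + rank))."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)
```

Because RRF only consumes ranks, it sidesteps the problem that cosine similarities and BM25 scores live on incomparable scales.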
Comprehensive GPU diagnostic toolkit modeled on NVIDIA DCGM architecture.
- Automated stress testing, memory validation, ECC detection
- Health monitoring for GPU server fleets
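The core of memory validation is a pattern test: write a known bit pattern across a buffer, read it back, and report every offset that miscompares. A minimal CPU-side sketch of the walking-ones variant over a plain bytearray (the real toolkit targets GPU memory; names here are illustrative):

```python
def pattern_test(buf: bytearray, pattern: int) -> list[int]:
    """Fill the buffer with one byte pattern, then return miscomparing offsets."""
    for i in range(len(buf)):
        buf[i] = pattern
    return [i for i, b in enumerate(buf) if b != pattern]


def walking_ones(buf: bytearray) -> dict[int, list[int]]:
    """Run the pattern test with a single 1-bit walking across each byte."""
    return {bit: pattern_test(buf, 1 << bit) for bit in range(8)}
```

Walking a single set bit through every position catches stuck-at faults on individual data lines that an all-zeros or all-ones fill would miss.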
Production-grade ML training infrastructure for single-GPU homelabs.
- Unsloth fp8 quantization, torch.compile graph optimization
- DeepSpeed ZeRO stages, vLLM + lm-eval harness
- Multi-seed reporting for statistically sound results
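Multi-seed reporting boils down to running the same config under several random seeds and publishing mean ± standard deviation instead of a single cherry-picked number. A minimal sketch (the helper name is illustrative):

```python
import statistics


def summarize_runs(scores: list[float]) -> str:
    """Report mean +/- sample stdev across seeds, e.g. '0.800 ± 0.010 (n=3)'."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return f"{mean:.3f} ± {std:.3f} (n={len(scores)})"
```

A reported spread makes it obvious when a "gain" between two training configs is smaller than the seed-to-seed noise.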
SQL + Python ETL pipeline for semiconductor quality analysis.
- Supplier performance scoring with trend detection
- Defect Pareto distributions, yield rate dashboards
- Automated alerting on quality threshold breaches
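A defect Pareto analysis ranks defect types by frequency and isolates the "vital few" categories that account for most failures. A minimal sketch of that cumulative cutoff, assuming defects arrive as a flat list of category labels (the function name and 80% default are illustrative):

```python
from collections import Counter


def defect_pareto(defects: list[str], cutoff: float = 0.8) -> list[str]:
    """Smallest set of defect types whose counts cover `cutoff` of all defects."""
    counts = Counter(defects)
    total = sum(counts.values())
    covered, vital_few = 0, []
    for defect, n in counts.most_common():
        vital_few.append(defect)
        covered += n
        if covered / total >= cutoff:
            break
    return vital_few
```

In a dashboard this list drives where engineering attention goes first: fixing the top two or three categories typically removes the bulk of the yield loss.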
Multi-model ML pipeline predicting tire wear for Tesla vehicles. Random Forest, XGBoost, Neural Network, and Ensemble models with Claude AI analysis for tire longevity insights.
- Simulated driving data with vehicle-specific tire degradation modeling
- GridSearch-tuned Random Forest, XGBoost, and TensorFlow/Keras neural network
- Ensemble averaging across all models for robust predictions
- Claude AI integration for natural language tire wear analysis
Git-backed knowledge wiki — Karpathy's LLM Wiki pattern with LangGraph ingestion pipelines for structured and unstructured content. Full diff history.
📈 Contribution Graph
| Area | Technologies |
|---|---|
| LLMs & Agents | LiteLLM, Claude/GPT/Gemini APIs, Ollama, llama.cpp, RAG, prompt engineering |
| ML Infrastructure | PyTorch, Unsloth, DeepSpeed, vLLM, lm-eval, torch.compile, MLflow |
| Systems | Python, Rust, TypeScript, CUDA, Docker, GitHub Actions |
| Edge / Hardware | NVIDIA Jetson (Orin, Nano), RTX 5070 Ti, multi-GPU inference |
| Data | PostgreSQL, SQL, pandas, SQLAlchemy, ETL pipelines |


