ML Engineer @ Mercedes-Benz R&D | M.Tech CS, IISc Bengaluru
📄 Resume
3.5+ years building production ML — LLM inference systems, fine-tuning pipelines, and edge-optimized models deployed across Mercedes car-lines.
| Project | What | Key Result |
|---|---|---|
| LLM Inference Engine | From-scratch inference stack in PyTorch: KV cache → paged memory → continuous batching → async serving | 8.7× VRAM reduction · 11.1× system throughput · 122 tests |
| Efficient LLM Fine-Tuning | LoRA/QLoRA adaptation → per-layer sensitivity profiling → selective QAT → AWQ INT4 export | 3.5× compression · 2.1× speedup · <0.1pt ROUGE-L loss |
| TinyStories Transformer | Decoder-only transformer trained from scratch with optimizer & scheduler ablations | Token-based stopping · training stability analysis |
| NN From Scratch | MLP with backprop implemented from scratch in NumPy — no frameworks | Forward/backward pass · gradient computation · training loop |
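The KV-cache idea behind the inference engine above can be sketched in a few lines of NumPy. This is an illustrative toy only (single attention head, random weights, no positional encoding, not the actual engine): during decoding, each token's key/value projections are computed once and appended to a cache, so every step attends over cached tensors instead of re-projecting the whole prefix.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 8
# Random projection weights for one attention head (illustrative only).
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

def attend(q, K, V):
    # q: (1, d); K, V: (t, d) -> attention output (1, d)
    scores = q @ K.T / np.sqrt(d)
    return softmax(scores) @ V

# Decode token by token, caching K/V so each step projects only the new token.
tokens = rng.standard_normal((5, d))
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))
cached_out = []
for x in tokens:
    x = x[None, :]
    K_cache = np.vstack([K_cache, x @ Wk])  # append this token's key
    V_cache = np.vstack([V_cache, x @ Wv])  # append this token's value
    cached_out.append(attend(x @ Wq, K_cache, V_cache))
cached_out = np.vstack(cached_out)

# Reference: re-project the entire prefix at every step (no cache).
ref_out = np.vstack([
    attend(tokens[t:t + 1] @ Wq, tokens[:t + 1] @ Wk, tokens[:t + 1] @ Wv)
    for t in range(len(tokens))
])
assert np.allclose(cached_out, ref_out)  # cached decode matches full recompute
```

Paged memory and continuous batching build on this: the cache is stored in fixed-size blocks and requests at different decode depths share one batch.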
- Head Orientation — Driver Monitoring System (Sep 2022 – Present)
Replaced legacy multi-stage pipeline with end-to-end architecture for real-time head pose estimation.
Reduced 2RMSE by 4.6° (yaw) / 2.65° (pitch) / 1.1° (roll); met the production KPI (< 4.0° 2RMSE).
Deployed across Mercedes car-lines. Currently developing a unified multi-task Transformer that consolidates head orientation, gaze, and landmarks into a single model.
- Face Detection — Driver Monitoring System (Jan – Oct 2025)
Designed lightweight multi-scale detector: 69K params, 0.126 GMACs (16.4× reduction from 2.07 GMACs baseline).
IoU@0.5: 0.9956 vs. 0.93 baseline (+6.6pt absolute) — no new data collection required.
Adopted as drop-in production replacement by downstream perception teams.
PyTorch · Transformers · KV Caching · Paged Attention · Continuous Batching · LoRA/QLoRA · Quantization (AWQ/GPTQ/QAT) · FastAPI · Mixed Precision
- Production RAG System (chunking, embedding, vector search, reranking, and evaluation pipeline)
- Distributed training (FSDP, tensor parallelism, scaling laws)
- Triton kernel development (FlashAttention, fused ops)
- OSS contributions to vLLM / SGLang
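The chunk → embed → search → rerank flow in the RAG item above can be sketched end-to-end with stand-ins for each stage. Everything here is a hypothetical toy: bag-of-words vectors replace a learned embedding model, a NumPy matrix replaces the vector store, and term overlap replaces a cross-encoder reranker.

```python
import numpy as np
from collections import Counter

def chunk(text, size=5):
    # Chunking: fixed-size word windows (real systems use smarter splitters).
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text, vocab):
    # Toy bag-of-words "embedding", L2-normalized so dot product = cosine sim.
    c = Counter(text.lower().split())
    v = np.array([c[w] for w in vocab], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n else v

corpus = "the cat sat on the mat . the dog chased the cat around the yard"
chunks = chunk(corpus)
vocab = sorted(set(corpus.lower().split()))
index = np.stack([embed(c, vocab) for c in chunks])  # tiny "vector store"

def retrieve(query, k=2):
    q = embed(query, vocab)
    scores = index @ q                    # vector search via cosine similarity
    top = np.argsort(-scores)[:k]
    # Rerank by exact term overlap with the query (cross-encoder stand-in).
    return sorted((chunks[i] for i in top),
                  key=lambda c: -len(set(c.split()) & set(query.split())))

print(retrieve("dog chased")[0])  # the chunk containing "dog chased" ranks first
```

An evaluation pipeline would then score retrieved chunks against labeled query–passage pairs (e.g. recall@k) instead of eyeballing the output.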
