Hi, I'm Achitya 👋

ML Engineer @ Mercedes-Benz R&D | M.Tech CS, IISc Bengaluru

📄 Resume

3.5+ years building production ML — LLM inference systems, fine-tuning pipelines, and edge-optimized models deployed across Mercedes car-lines.


Projects

| Project | What | Key Result |
|---|---|---|
| LLM Inference Engine | From-scratch inference stack in PyTorch: KV cache → paged memory → continuous batching → async serving | 8.7× VRAM reduction · 11.1× system throughput · 122 tests |
| Efficient LLM Fine-Tuning | LoRA/QLoRA adaptation → per-layer sensitivity profiling → selective QAT → AWQ INT4 export | 3.5× compression · 2.1× speedup · <0.1 pt ROUGE-L loss |
| TinyStories Transformer | Decoder-only transformer trained from scratch with optimizer & scheduler ablations | Token-based stopping · training-stability analysis |
| NN From Scratch | MLP with backpropagation implemented from scratch in NumPy, no frameworks | Forward/backward pass · gradient computation · training loop |
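The KV-cache idea behind the inference-engine project can be sketched in a few lines. This is a minimal NumPy sketch with hypothetical names, not the project's actual API: caching each step's key/value rows lets autoregressive decoding attend over the whole prefix without re-projecting it every step.

```python
import numpy as np

def attend(q, K, V):
    # Scaled dot-product attention for one query vector over cached keys/values.
    scores = (K @ q) / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

class KVCache:
    """Append-only cache: each decode step stores one new key/value row,
    so earlier positions are never re-projected."""
    def __init__(self, d_head):
        self.K = np.empty((0, d_head))
        self.V = np.empty((0, d_head))

    def append(self, k, v):
        self.K = np.vstack([self.K, k[None, :]])
        self.V = np.vstack([self.V, v[None, :]])

rng = np.random.default_rng(0)
d = 8
cache = KVCache(d)
for _ in range(4):                        # four decode steps
    cache.append(rng.normal(size=d), rng.normal(size=d))
q = rng.normal(size=d)
out = attend(q, cache.K, cache.V)         # attends over all 4 cached positions
```

Paged memory takes this one step further by storing the cache in fixed-size blocks instead of one contiguous array, which is where the VRAM savings in the table come from.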

Production Work

  • Head Orientation — Driver Monitoring System (Sep 2022 – Present)
    Replaced a legacy multi-stage pipeline with an end-to-end architecture for real-time head pose estimation.
    Reduced 2RMSE by 4.6° yaw / 2.65° pitch / 1.1° roll, meeting the production KPI (< 4.0° 2RMSE).
    Deployed across Mercedes car-lines. Currently developing a unified multi-task Transformer that consolidates head orientation, gaze, and landmarks into a single model.
  • Face Detection — Driver Monitoring System (Jan – Oct 2025)
    Designed a lightweight multi-scale detector: 69K params, 0.126 GMACs (16.4× reduction from the 2.07 GMACs baseline).
    IoU@0.5: 0.9956 vs. 0.93 baseline (+7 pt absolute) — no new data collection required.
    Adopted as a drop-in production replacement by downstream perception teams.

Core Skills

PyTorch · Transformers · KV Caching · Paged Attention · Continuous Batching · LoRA/QLoRA · Quantization (AWQ/GPTQ/QAT) · FastAPI · Mixed Precision
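
As a taste of the quantization side — a toy sketch, not AWQ or GPTQ themselves — symmetric per-tensor INT8 quantization maps the largest-magnitude weight to ±127 and stores a single float scale per tensor:

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor quantization: one scale maps max |w| to 127.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(1).normal(size=(64, 64)).astype(np.float32)
q, scale = quantize_int8(w)
err = np.abs(dequantize(q, scale) - w).max()   # worst-case round-off ≈ scale / 2
```

AWQ/GPTQ and QAT improve on this baseline by choosing scales (or training the weights) to minimize task loss rather than plain round-off error.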

Currently Exploring

  • Production RAG System (chunking, embedding, vector search, reranking, and evaluation pipeline)
  • Distributed training (FSDP, tensor parallelism, scaling laws)
  • Triton kernel development (FlashAttention, fused ops)
  • OSS contributions to vLLM / SGLang
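
For the RAG bullet, the chunking step can be sketched as fixed-size windows with overlap — hypothetical parameters, and production chunkers usually split on token or sentence boundaries instead:

```python
def chunk_text(text, size=200, overlap=50):
    # Fixed-size character windows; consecutive chunks share `overlap` chars
    # so a passage spanning a boundary still lands fully inside one chunk.
    step = size - overlap
    chunks = []
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + size])
    return chunks
```

Each chunk is then embedded, indexed in a vector store, retrieved by similarity, reranked, and evaluated end to end.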

Pinned Repositories

  1. efficient-llm-finetuning — Efficient LLM fine-tuning & deployment: LoRA, QLoRA, PTQ, and QAT, with benchmarking and config-driven pipelines.

  2. llm-inference-engine — From-scratch LLM inference engine built in PyTorch with custom GPT-2 transformers, KV cache, paged KV cache, continuous batching, and A100 benchmarks.

  3. tinystories-transformer-training — Decoder-only Transformer trained from scratch with token-based stopping and optimizer & scheduler ablations.

  4. nn-from-scratch-numpy — MLP with backpropagation implemented from scratch in NumPy.