Hemant Kumar B K HemantBK

ML Engineer building production-grade AI systems with safety at the core. Currently researching Multi-Agent RL for cybersecurity at the University of Arizona and co-authoring StepShield — a safety benchmark for autonomous code agents (targeting ICML 2027). Previously built recommendation engines at Escape LLC (30% engagement lift) and agentic RAG chatbots at Omdena (95% reduction in harmful responses).

I don't treat AI safety as a checkbox — I treat it as an engineering discipline.

🔬 Research

🛡️ StepShield — Co-Author

First benchmark for evaluating when autonomous code agents go rogue, not just whether they do. Detects specification violations (data exfiltration, unauthorized access) in real time across 9,213 agent trajectories. Early detection cuts monitoring costs by 75% (~$108M projected savings).

Python PyTorch LLM Safety Red-Teaming Autonomous Agents

🚀 Featured Projects

🛡️ LLM Eval Pipeline

Production-grade LLM evaluation + red-teaming

Hybrid n8n + FastAPI architecture with 4 LLM providers, LLM-as-Judge scoring, circuit breaker, DLQ, Redis caching, Prometheus/Grafana monitoring.

Python FastAPI Redis Prometheus Red-Teaming
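
To illustrate how a circuit breaker and a dead-letter queue (DLQ) can work together around an LLM provider call, here is a minimal sketch. It is not the pipeline's actual code: `CircuitBreaker`, `call_with_breaker`, the thresholds, and the plain list standing in for the DLQ are all assumptions made for illustration.

```python
import time


class CircuitBreaker:
    """Opens after repeated failures, then permits a trial call after a cooldown."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # hypothetical threshold
        self.reset_after = reset_after    # hypothetical cooldown in seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: let a call through once the cooldown has elapsed.
        return self.clock() - self.opened_at >= self.reset_after

    def record(self, ok):
        if ok:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()


def call_with_breaker(breaker, provider, prompt, dead_letters):
    """Call provider(prompt); park the prompt in dead_letters if it cannot be served."""
    if not breaker.allow():
        dead_letters.append(prompt)
        return None
    try:
        result = provider(prompt)
    except Exception:
        breaker.record(ok=False)
        dead_letters.append(prompt)
        return None
    breaker.record(ok=True)
    return result
```

In this sketch, prompts that fail or arrive while the circuit is open are kept rather than dropped, so they can be replayed once the provider recovers.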

🔐 MLShield

ML-infra-aware defense for model weights

Protects against model-weight exfiltration using a 3-layer cascaded architecture (Rules → ML → LLM). Kubernetes-native, GPU-aware anomaly detection.

Python Kubernetes Model Security Anomaly Detection
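
The Rules → ML → LLM cascade can be pictured as each layer deciding only when it is confident and escalating otherwise. The sketch below is an assumption about how such a cascade might be wired, not MLShield's implementation: the event fields, the `low`/`high` thresholds, and the three stand-in layers are hypothetical.

```python
def cascade(event, rules, ml_score, llm_review, low=0.2, high=0.8):
    """Return (verdict, layer), escalating only when a layer is unsure."""
    # Layer 1: cheap deterministic rules catch known-bad patterns.
    for rule in rules:
        if rule(event):
            return "block", "rules"
    # Layer 2: anomaly score in [0, 1]; confident scores stop here.
    score = ml_score(event)
    if score >= high:
        return "block", "ml"
    if score <= low:
        return "allow", "ml"
    # Layer 3: ambiguous cases go to the slowest, most expensive reviewer.
    return llm_review(event), "llm"


def external_weight_copy(event):
    # Hypothetical rule: weight files leaving the cluster are always blocked.
    return event["external"] and event["path"].endswith(".safetensors")


def size_score(event):
    # Stand-in for a trained anomaly model: larger transfers look more suspicious.
    return min(event["gigabytes"] / 10.0, 1.0)


def stub_llm(event):
    # Stand-in for an LLM call; a real reviewer would see the request context.
    return "block" if event["external"] else "allow"


event = {"external": True, "path": "ckpt.bin", "gigabytes": 5}
verdict, layer = cascade(event, [external_weight_copy], size_score, stub_llm)
```

Ordering the layers from cheapest to most expensive means most traffic is settled before an LLM is ever invoked.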

⚖️ LLM Bias Sentinel

7-benchmark bias evaluation + guardrails

Open-source LLM bias evaluation framework with red-teaming, guardrails, and monitoring — all running locally via Ollama. Zero API costs.

Python Ollama Red-Teaming Guardrails Responsible AI

💰 Dynamic Pricing Engine

Production-grade ML pricing system

XGBoost demand forecasting + price elasticity estimation + scipy revenue optimization. FastAPI serving, Streamlit dashboard, MLflow tracking, Evidently drift monitoring.

Python XGBoost FastAPI MLflow Streamlit
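
As a rough illustration of the revenue optimization step, the sketch below searches for the price that maximizes price × predicted demand. It uses a dependency-free ternary search as a stand-in for the scipy optimizer the project uses, and `linear_demand` is a hypothetical demand curve rather than the output of the XGBoost forecaster.

```python
def optimal_price(demand, lo, hi, tol=1e-6):
    """Ternary search for the price in [lo, hi] maximizing price * demand(price).

    Assumes revenue is unimodal on the interval.
    """
    def revenue(p):
        return p * demand(p)

    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if revenue(m1) < revenue(m2):
            lo = m1  # the maximum lies to the right of m1
        else:
            hi = m2  # the maximum lies to the left of m2
    return (lo + hi) / 2


def linear_demand(price):
    # Hypothetical curve: 100 units at price 0, losing 2 units per unit of price.
    return max(100.0 - 2.0 * price, 0.0)


best = optimal_price(linear_demand, lo=1.0, hi=50.0)
```

For this linear curve, revenue is 100p − 2p², so the search should converge near the closed-form optimum p = 25.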

🗣️ AI Voice Assistant

Full-stack speech pipeline: STT → LLM → TTS

End-to-end voice assistant running entirely on your own machine — FastAPI backend, React frontend, Docker. Private by design: zero cloud calls.

Python FastAPI React Docker LLM

🏍️ RideShala

AI motorcycle advisor for Indian riders

RAG over motorcycle specs with vLLM serving, Qdrant vector store, FastAPI. Personalized bike recommendations with source citations.

Python vLLM RAG Qdrant FastAPI

📂 All Projects

🛡️ AI Safety & Responsible AI

chatbot-auditor — Quality auditor for AI chatbots; analyzes conversation logs to surface where bots underperform.
credit-scoring-fairness-mlops — End-to-end MLOps with automated fairness gates, drift monitoring, EU AI Act compliance (XGBoost, Fairlearn, MLflow).
healthcare-bias-audit — Bias audit of healthcare ML on the MEPS dataset; AIF360 mitigation, SHAP/LIME explainability.
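
An automated fairness gate of the kind mentioned for credit-scoring-fairness-mlops can be sketched as a metric plus a threshold check that blocks promotion. The version below is a plain-Python assumption, not the repository's code: it computes the gap in positive-prediction rates across groups (the quantity Fairlearn calls demographic parity difference), and the 0.1 threshold is hypothetical.

```python
from collections import defaultdict


def demographic_parity_difference(predictions, groups):
    """Largest gap in positive-prediction rate between any two groups."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


def fairness_gate(predictions, groups, threshold=0.1):
    """Return True when the model may be promoted, False when the gate blocks it."""
    return demographic_parity_difference(predictions, groups) <= threshold


preds = [1, 1, 0, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_difference(preds, groups)  # 0.5 for "a" vs 0.25 for "b"
promote = fairness_gate(preds, groups)
```

A CI step could run such a check after training and fail the build when `promote` is false.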

🤖 LLM Systems & RAG

AI-Chief — Food science assistant with multi-agent RAG, real-time safety monitoring, dangerous-advice detection (TypeScript, Fastify, HNSW).
Interactive-Multilingual-AI-Audiobook-Assistant — OCR extraction → neural TTS → multilingual translation → real-time Q&A audiobook pipeline.
AI-Wildlife-Tracker — RAG identifying 500+ Indian wildlife species from text or photos; hybrid retrieval, ONNX inference, Langfuse observability.

⚙️ Applied ML & MLOps

Multilingual-Sentiment-Emotion-Intelligence-Engine — 5 languages + Hindi-English code-switching; multi-task XLM-RoBERTa with LoRA adapters, ONNX INT8.
Algorithmic-Trading-AI — FinBERT sentiment + spaCy NER + TimeGPT forecasting → BUY/SELL/HOLD signals from real-time financial news.
LLaMA-Sum-Fine-Tuning — LLaMA 3.2 1B fine-tuned via QLoRA; 40%+ ROUGE-2 improvement over base on CNN/DailyMail.

🛠️ Tech Stack

💻 Languages
🤖 ML / DL
🧠 LLM & Agents
🛠️ MLOps / Cloud
📊 Observability
🛡️ AI Safety & Responsible AI
🗄️ Data