Hritik Datta Hritikd

Hritik Datta

Product @ Pre6 AI · I build production-grade AI agent systems.

Product by title, builder by craft — I design AI products and ship the engineering behind them: multi-agent orchestration, agent evaluation, and AI safety infrastructure.

What I work on

I care about the unglamorous half of AI products: the part that decides whether they survive contact with real users. Most demos make a single LLM call. Production systems need orchestration, evaluation, safety gates, and observability. That gap is where I build.

Multi-agent orchestration — supervisor/specialist architectures with typed state, tool binding, and streaming traces.
Agent reliability — measurable, auditable evaluation of agent runs across reliability, safety, latency, and cost.
LLM safety — scanning retrieval context for prompt injection, secret leakage, PII, and exfiltration before it reaches a model.
Developer tooling — sharp CLIs that turn fuzzy engineering signals into decisions teams can act on.
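
To make the LLM safety item concrete, here is a minimal sketch of a deterministic context scan that returns an allow, redact, or quarantine decision. It is not the code of any project listed here: the names `Decision`, `ScanResult`, and `scan_chunk`, the three regex patterns, and the rule that an injection match quarantines the whole chunk are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REDACT = "redact"
    QUARANTINE = "quarantine"


# Illustrative patterns only (assumptions); real coverage would be far broader.
INJECTION = re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE)
SECRET = re.compile(r"\b(?:sk|ghp)_[A-Za-z0-9]{16,}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")


@dataclass(frozen=True)
class ScanResult:
    decision: Decision
    text: str                  # text that is safe to forward (empty if quarantined)
    findings: tuple[str, ...]  # labels of the rules that fired


def scan_chunk(text: str) -> ScanResult:
    """Scan one retrieved chunk with fixed rules; no model call is involved."""
    # Assumed policy: an injection attempt taints the whole chunk.
    if INJECTION.search(text):
        return ScanResult(Decision.QUARANTINE, "", ("prompt_injection",))

    findings = []
    cleaned = text
    if SECRET.search(cleaned):
        findings.append("secret")
        cleaned = SECRET.sub("[REDACTED_SECRET]", cleaned)
    if EMAIL.search(cleaned):
        findings.append("pii_email")
        cleaned = EMAIL.sub("[REDACTED_EMAIL]", cleaned)

    decision = Decision.REDACT if findings else Decision.ALLOW
    return ScanResult(decision, cleaned, tuple(findings))
```

Because the rules are plain regular expressions, the same input always yields the same decision, which keeps the gate easy to test and audit.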

Featured work

Project	What it is	Stack	Links
winnow	Budget-aware context compression for RAG and agents — BM25 relevance + MMR diversity packs the highest-signal context into a token budget. Deterministic, zero runtime deps, no API keys, with a reproducible benchmark.	Python · CI	Code
gemma4-multi-agent	Multi-agent system — a Supervisor routes work across 4 specialist agents with live reasoning traces and sandboxed tool execution.	Python · LangGraph · Gemini · Streamlit	Code
agent-evals-lab	Evaluation workbench for agent reliability — typed scoring engine, policy rules, regression detection, and a trace-inspection dashboard.	TypeScript · React · CI	Live Demo · Code
verdict	Adversarial LLM red-teaming platform — runs PAIR, Crescendo, and injection attacks against any model, then reports attack-success-rate metrics with per-category breakdowns and HTML reports.	Python · CI	Code
rag-safety-gateway	AI security gateway that scans RAG context for prompt injection, secrets, PII, and exfiltration risk, producing deterministic allow/redact/quarantine decisions.	TypeScript · React · CI	Live Demo · Code
hermes	Test-time compute scaling engine — gives any LLM o1-style reasoning search via Process Reward Models, MCTS, and beam search.	Python · CI	Code

Every featured project ships with tests, CI, and documentation — clone, run, and review the design in minutes.
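
The budget-aware compression idea described for winnow (BM25 relevance plus MMR diversity, packed into a token budget) can be sketched roughly as follows. This is a simplified illustration and not winnow's actual implementation: whitespace tokenization, Jaccard overlap as the redundancy measure, the `lam` trade-off weight, and the function names are all assumptions.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: whitespace tokens stand in for real model tokens.
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each document against the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n if n else 0.0
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


def jaccard(a: str, b: str) -> float:
    sa, sb = set(tokenize(a)), set(tokenize(b))
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0


def pack_context(query: str, docs: list[str], budget: int, lam: float = 0.7) -> list[str]:
    """Greedy MMR selection, restricted to documents that still fit the budget."""
    rel = bm25_scores(query, docs)
    top = max(rel, default=0.0) or 1.0
    rel = [r / top for r in rel]  # normalize relevance to [0, 1]
    selected: list[int] = []
    remaining = set(range(len(docs)))
    used = 0
    while remaining:
        fits = [i for i in sorted(remaining) if used + len(tokenize(docs[i])) <= budget]
        if not fits:
            break

        def mmr(i: int) -> float:
            redundancy = max((jaccard(docs[i], docs[j]) for j in selected), default=0.0)
            return lam * rel[i] - (1 - lam) * redundancy

        # Ties go to the lower index so the result is deterministic.
        best = max(fits, key=lambda i: (mmr(i), -i))
        selected.append(best)
        remaining.discard(best)
        used += len(tokenize(docs[best]))
    return [docs[i] for i in selected]
```

In this sketch a near-duplicate of an already selected passage is penalized by its overlap, so a less relevant but novel passage can take the remaining budget instead.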

repo-pulse: one of my CLIs, generating a real engineering-health report with no keys or config.

How I build

Typed contracts first   →  domain models before logic, so behavior is auditable
Deterministic by default →  scoring and decisions reproducible without a live model
Measurable, then pretty  →  evals and telemetry before dashboards
Reviewable in 60 seconds →  clone, run, understand — no API keys to start
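
As a rough illustration of the first two principles, the sketch below defines typed, immutable domain models first and then scores an agent run with a pure function, so the result is reproducible without a live model. The field names, thresholds, and scoring formulas are hypothetical and are not taken from any of the projects above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRun:
    run_id: str
    succeeded: bool
    policy_violations: int
    latency_ms: float
    cost_usd: float


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical limits, chosen only for the example.
    max_latency_ms: float = 5000.0
    max_cost_usd: float = 0.05


@dataclass(frozen=True)
class Scorecard:
    run_id: str
    reliability: float
    safety: float
    latency: float
    cost: float

    @property
    def passed(self) -> bool:
        # A run passes only if every dimension is at full score.
        return min(self.reliability, self.safety, self.latency, self.cost) >= 1.0


def score_run(run: AgentRun, limits: Thresholds = Thresholds()) -> Scorecard:
    """Pure function: the same run and limits always give the same scorecard."""
    return Scorecard(
        run_id=run.run_id,
        reliability=1.0 if run.succeeded else 0.0,
        safety=1.0 if run.policy_violations == 0 else 0.0,
        # Scores fall below 1.0 in proportion to how far a run exceeds its limit.
        latency=min(1.0, limits.max_latency_ms / max(run.latency_ms, 1e-9)),
        cost=min(1.0, limits.max_cost_usd / max(run.cost_usd, 1e-9)),
    )
```

Keeping the models frozen and the scorer free of I/O means a recorded run can be re-scored later and compared value for value, which is what makes regression checks straightforward.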

Stack

Python · TypeScript · LangGraph · LangChain · React · Streamlit · Google Gemini · OpenAI · pytest · Vitest · GitHub Actions · uv

Open to conversations on AI agent engineering, evals, and LLM safety.
