Yash Raj Pandey devYRPauli

`whoami`

class YashRajPandey:
    role        = "AI Agents Architect @ UF IFAS"
    based_in    = "Gainesville, Florida"
    focus       = ["local-first LLMs", "RAG", "vector search", "AI agents"]
    also_builds = "full-stack production systems"
    philosophy  = "Understand the real problem. Ship the simplest thing that works. Measure. Iterate."

I build AI infrastructure and production software. I'm the AI Agents Architect at the University of Florida's Institute of Food and Agricultural Sciences, where I lead a function I proposed myself: self-hosted, air-gapped AI systems built on open-weight models.

The work is end to end: open-weight models served on-prem, retrieval that pairs dense vector search with reranking, agents that call real tools, and evaluation gates that decide what ships. None of it leaves the building, and it holds up under real users rather than a demo.
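As a rough illustration of what "dense vector search with reranking" means in practice, here is a minimal two-stage sketch. It is not the code behind these systems: the function names, the toy `keyword_overlap` reranker, and the `k_dense` / `k_final` parameters are all assumptions made for the example. A real deployment would use an embedding model, a vector store, and a learned reranker instead.

```python
import math
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_overlap(query: str, text: str) -> float:
    """Toy reranker (hypothetical stand-in for a cross-encoder):
    fraction of query words that also appear in the document text."""
    q_words = set(query.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / len(q_words) if q_words else 0.0


def retrieve_then_rerank(
    query_text: str,
    query_vec: Sequence[float],
    docs: list[tuple[str, Sequence[float]]],
    rerank_score: Callable[[str, str], float] = keyword_overlap,
    k_dense: int = 20,
    k_final: int = 5,
) -> list[str]:
    """Stage 1 narrows the corpus cheaply by vector similarity;
    stage 2 re-orders only that shortlist with a costlier scorer."""
    shortlist = sorted(
        docs, key=lambda d: cosine(query_vec, d[1]), reverse=True
    )[:k_dense]
    reranked = sorted(
        shortlist, key=lambda d: rerank_score(query_text, d[0]), reverse=True
    )
    return [text for text, _ in reranked[:k_final]]
```

The design point is the split itself: the dense stage trades precision for speed over the whole corpus, and the reranker spends its budget on a few candidates only.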

I joined UF as a Software Engineer in March 2025, was promoted to Lead Software Engineer in October 2025, and moved into the AI Agents Architect role in April 2026.

What I'm building

TurboQuant on Apple Silicon

Independent evaluation of TurboQuant (arXiv 2504.19874), a near-optimal vector quantization method applied here to the LLM KV cache, ported to run CPU-only on Apple Silicon. Fixed five blocking bugs, then benchmarked long-context retrieval across an MLX path and a llama.cpp Metal path.

Needle-in-a-haystack retrieval: 0% to 100% at 16K tokens, with a large KV cache memory reduction, all on a consumer M1 Pro with no dedicated GPU.
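For readers unfamiliar with the needle-in-a-haystack test, the sketch below shows its general shape: hide one fact at varying depths in filler text, ask the model for it, and report the hit rate. This is an assumption-laden illustration rather than the benchmark in the repo: it counts words instead of tokens, and `model` is any hypothetical callable that maps a prompt string to an answer string.

```python
from typing import Callable, Sequence


def build_haystack(filler: str, needle: str, n_words: int, depth: float) -> str:
    """Repeat the filler to n_words words and insert the needle at a
    relative depth (0.0 = start of context, 1.0 = end of context)."""
    base = filler.split()
    words = (base * (n_words // max(len(base), 1) + 1))[:n_words]
    pos = int(len(words) * depth)
    return " ".join(words[:pos] + [needle] + words[pos:])


def needle_accuracy(
    model: Callable[[str], str],
    filler: str,
    needle: str,
    question: str,
    expected: str,
    n_words: int,
    depths: Sequence[float],
) -> float:
    """Fraction of depths at which the model's answer contains the expected string."""
    hits = 0
    for depth in depths:
        context = build_haystack(filler, needle, n_words, depth)
        answer = model(f"{context}\n\nQuestion: {question}")
        hits += expected.lower() in answer.lower()
    return hits / len(depths)
```

Sweeping `depths` matters because a model or cache scheme that only fails mid-context would otherwise look perfect when the needle sits at the very end.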

View the repo ->

mddocs - local-first collaborative Markdown, with an API for AI agents

A git-native, self-hostable editor for Markdown: real-time multiplayer, inline comments, and accept/reject suggestions, plus a first-class HTTP API so AI agents can read, edit, and review documents the same way people do. Published on npm and built on proof-sdk.

Local-first and git-native: every change is a commit, no central database to run
Real-time collaboration with comments and accept/reject suggestion review, backed by a CRDT (Yjs) document model that merges concurrent edits without conflicts
Agent-facing HTTP API with per-agent tokens and rate limits, so automated writers are first-class collaborators, not bolt-ons
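One common way to implement per-agent rate limits like those described above is a token bucket keyed by the agent's API token. The sketch below is an assumption about the general technique, not mddocs's actual implementation: the class name, parameters, and injectable `clock` are all hypothetical.

```python
import time
from typing import Callable


class AgentRateLimiter:
    """Token-bucket limiter with one bucket per agent token (illustrative only)."""

    def __init__(
        self,
        capacity: float,
        refill_per_sec: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        # agent token -> (tokens remaining, time of last update)
        self._buckets: dict[str, tuple[float, float]] = {}

    def allow(self, agent_token: str) -> bool:
        """Return True and spend one token if the agent is under its limit."""
        now = self.clock()
        tokens, last = self._buckets.get(agent_token, (self.capacity, now))
        # Refill in proportion to elapsed time, capped at bucket capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        if tokens >= 1.0:
            self._buckets[agent_token] = (tokens - 1.0, now)
            return True
        self._buckets[agent_token] = (tokens, now)
        return False
```

Keying buckets by token means one noisy agent exhausts only its own budget, which is what lets automated writers share a document with people without starving them.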

View the repo ->

Blue Omics - full-stack research data platform

A Django, React, and PostgreSQL platform I designed and built from scratch for a research lab. It grew from zero to 5M+ live records across 32 data models and 58 API endpoints, replaced years of manual spreadsheet workflows, and became the primary system for every lab submission.

Tuned PostgreSQL with explicit indexing and caching to hold low-millisecond median latency under concurrent access by 30+ researchers
Built 7 ingestion pipelines for heterogeneous formats: PDF, Excel, CSV, Word, PowerPoint
Automated R Markdown reporting, cutting a 2-3 hour manual process to 15-20 minutes
Deployed on GCP with Kubernetes and Terraform; optimized the frontend from 8s to 3s load time

ApplyScore - AI resume gap-analysis Chrome extension

A published Chrome extension that scores how well a resume matches any job posting on the web, with evidence-linked gaps and no hallucinated fluff.

Universal scraper that pierces Shadow DOM to read postings across LinkedIn, Greenhouse, Ashby, Lever, Workday and more
Strict, evidence-based analysis: a confidence-weighted fit score with requirement-by-requirement matches linked to the exact resume bullets that prove them
Privacy-first and bring-your-own-key (OpenAI, Anthropic, or Google), so data and model choice stay with the user
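To make "confidence-weighted fit score" concrete, here is one plausible way such a score could be computed. It is a sketch under stated assumptions, not ApplyScore's actual formula: the `RequirementMatch` fields and the rule that a match with no linked evidence counts as a gap are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RequirementMatch:
    requirement: str                 # one requirement from the job posting
    matched: bool                    # did the analysis find a match?
    confidence: float                # 0..1, how sure the analysis is
    evidence: list[str] = field(default_factory=list)  # supporting resume bullets


def fit_score(matches: list[RequirementMatch]) -> float:
    """Score in [0, 100]: confidence mass of evidence-backed matches
    divided by total confidence mass across all requirements."""
    total = sum(m.confidence for m in matches)
    if total == 0:
        return 0.0
    # A claimed match with no linked resume bullet is treated as a gap.
    supported = sum(m.confidence for m in matches if m.matched and m.evidence)
    return 100.0 * supported / total


matches = [
    RequirementMatch("Python", True, 1.0, ["Built Django REST APIs"]),
    RequirementMatch("Kubernetes", True, 0.5),   # no evidence, so not counted
    RequirementMatch("Go", False, 0.5),
]
print(fit_score(matches))  # 50.0
```

Requiring evidence before a match counts is what keeps the score from rewarding unsupported claims.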

View on the Chrome Web Store ->

Open source

I fix real bugs in the AI infrastructure I build on: inference engines, ML frameworks, and the tooling around them. 30+ merged pull requests across 20+ open-source projects, each one a root-caused fix backed by a regression test.

LLM inference and ML frameworks

llama.cpp (ggml) - fixed rms_norm_back producing wrong output under in-place aliasing on the CPU backend
MLX (Apple) - signed-integer overflow in roll and tile; undefined behavior in arange(step=0)
MLX-LM (Apple) - server 404 on short prompts; sampler top_k bound fix
RAGFlow - broken language detection; Excel parser off-by-one chunking
mem0 - Redis, FAISS, and Weaviate crashes and filtered-search truncation

Developer tooling

Cost-accuracy and edge-case fixes across a suite of open-source CLI and macOS tools: pricing-unit bugs (per-token vs per-request), timezone and Unicode handling, performance regressions, and more.

See all merged contributions ->

Stack


AI / ML	Open-weight LLMs, vLLM, Ollama, RAG, hybrid retrieval, Qdrant, rerankers, embeddings, agents and tool use, quantization, evaluation harnesses
Backend	Python, Django, Django REST, FastAPI, Node.js, PostgreSQL, Redis
Frontend	React, TypeScript, modern HTML/CSS
Infra	Docker, Kubernetes, Terraform, GCP, AWS, CI/CD, GitHub Actions

Recognition and education

Herbert Wertheim College of Engineering Achievement Award, University of Florida. Merit scholarship awarded for top academic standing.

M.S. Computer and Information Science and Engineering, University of Florida - GPA 3.8 / 4.0
Semester Exchange, University of Florida - GPA 3.7 / 4.0
B.Tech Computer Science and Engineering, Jaypee University of Engineering and Technology - GPA 9.1 / 10.0

Writing

I keep playbooks on what I learn shipping local AI, over on yashrajpandey.com:

Self-hosting open-weight LLMs without sending data to a cloud API
RAG that holds up in production: retrieval, reranking, and the evals that keep it honest
Evaluation-gated releases for LLM systems

Off the clock: football (I built Football Hub and World Cup 2026 Picks because I love the game), tactical FPS, story-rich RPGs, and lo-fi for flow state.

Open to good conversations on AI infrastructure and local-first LLM systems.
