I build white-box, governable infrastructure for long-running LLM agents: memory, agent runtimes, and evaluation/observability. I'm a career-changer, ~6 months into full-time AI work, shipping real systems with measured results and being honest about their limits.
- MASE-agent-memory — Anti-RAG dual-white-box memory engine. Pure SQLite + FTS5/BM25, no vector DB, no embeddings. Lifts qwen2.5-7B from 1.79% → 60.71% on NoLiMa-32k (+58.9pp) and to 88.71% on LV-Eval EN 256k.
- agent-cowork — A local Kimi-native agent host: ReAct loop + human approval gates + MCP tools + path/SSRF security. Added Kimi prompt-cache optimization (measured ~50–72% hit rate, ~40–58% input-cost cut).
- agent-observability-platform — AgentOps observability & evaluation for agent traces, metrics, evals, and API keys.
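
To illustrate the "pure SQLite + FTS5/BM25, no embeddings" idea behind MASE-agent-memory, here is a minimal sketch of keyword recall ranked by BM25. It is not MASE's actual schema or API: the table name `memory` and the helpers `build_memory` and `recall` are hypothetical, and it assumes a Python `sqlite3` build with FTS5 enabled (the default in most distributions).

```python
import sqlite3


def build_memory(notes):
    """Create an in-memory FTS5 index over plain-text notes (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # A single-column full-text table; no vectors or embeddings involved.
    conn.execute("CREATE VIRTUAL TABLE memory USING fts5(content)")
    conn.executemany(
        "INSERT INTO memory(content) VALUES (?)", [(n,) for n in notes]
    )
    return conn


def recall(conn, query, k=3):
    """Return up to k (content, score) rows matching the FTS5 query."""
    # FTS5's bm25() gives better matches a lower score,
    # so ascending order puts the best match first.
    return conn.execute(
        "SELECT content, bm25(memory) FROM memory "
        "WHERE memory MATCH ? ORDER BY bm25(memory) LIMIT ?",
        (query, k),
    ).fetchall()


notes = [
    "The user prefers SQLite backups every night.",
    "Deployment runs on a Raspberry Pi.",
    "Backups are stored on an external drive.",
]
conn = build_memory(notes)
hits = recall(conn, "sqlite")
```

Because every hit is an ordinary row returned by an ordinary SQL query, the retrieval step can be inspected and audited directly, which is the sense of "white-box" here.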
Merged PRs into MoonshotAI's own repos (walle #12, #13), plus a kimi-cli contribution. Responsibly disclosed a security vulnerability in kimi-code: a sensitive-file-guard bypass via Windows filename equivalence.
Python · TypeScript · SQLite/FTS5 · LangGraph / RAG · MCP · Playwright · honest evaluation & anti-overfitting
📫 zhou1051061805@gmail.com · open to remote / project-based AI-agent & RAG work

