I'm an AI Engineer at SFEIR, a French IT consulting firm, currently on assignment at Decathlon France building DaiLY, a multi-agent HR assistant for 30,000+ employees in Google Chat. A systematic eval pipeline lifted answer accuracy on its lead agent from ~60% to 97.7%. Before joining SFEIR, I spent five years inside Decathlon: first in France on internal HR tooling, then in the UK building the e-commerce marketplace platform that drove £2.6M GMV in 2024.
I work embedded inside client teams, owning delivery end-to-end. My four open-source projects mirror the patterns I ship in production. Currently exploring privacy-safe RAG and cost-attribution patterns for production LLM systems.
- 🛠️ Multi-agent systems · RAG · LLMOps · evaluation pipelines
- 🌍 🇫🇷 French (native) · 🇬🇧 English (C2) · 🇪🇸 Spanish
Four production-grade open-source projects, all type-strict, high-coverage, full CI. Together they cover the three problems every team running LLMs hits:
- Cost: gateway tracks where the budget went; autopilot prevents it going to the wrong place.
- Quality: detector catches quality drops when prompts change.
- Privacy: guardian keeps personal data out of both the index and the response.
An OpenAI-compatible gateway that attributes spend across the four stages of a RAG pipeline — retrieval, reranking, generation, evaluation — so teams stop guessing which stage is eating their budget.
RAG-aware cost attribution · <8ms gateway overhead · multi-provider fallback · circuit breakers · 92% coverage
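As a rough illustration of what stage-level attribution means (not the gateway's actual code), the sketch below sums estimated spend per pipeline stage. The `Call` record, the `PRICE_PER_1K` table, the model names, and the idea that each request carries a stage tag are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per 1K tokens; real prices depend on provider and model.
PRICE_PER_1K = {"embed-small": 0.0001, "rerank-base": 0.001, "chat-large": 0.01}

# The four RAG stages named in the project description.
STAGES = ("retrieval", "reranking", "generation", "evaluation")


@dataclass(frozen=True)
class Call:
    stage: str   # assumed to be sent by the client as request metadata
    model: str
    tokens: int


def attribute_spend(calls):
    """Return the estimated cost per stage for a batch of calls."""
    totals = {stage: 0.0 for stage in STAGES}
    for call in calls:
        if call.stage not in totals:
            raise ValueError(f"unknown stage: {call.stage}")
        totals[call.stage] += call.tokens / 1000 * PRICE_PER_1K[call.model]
    return totals


def share_by_stage(totals):
    """Convert absolute costs into each stage's fraction of total spend."""
    total = sum(totals.values())
    return {stage: (cost / total if total else 0.0) for stage, cost in totals.items()}
```

Calling `share_by_stage(attribute_spend(calls))` on a day's traffic shows which stage dominates the bill, which is the question the gateway is meant to answer.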
A two-stage router (embedding similarity, then DeBERTa zero-shot on ambiguous cases) that sends each request to the cheapest capable model, then learns from its own routing mistakes via a feedback loop.
94.6% routing accuracy · self-improving · 60–80% cost reduction on typical workloads · 95% coverage
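The two-stage idea can be sketched as follows, under stated assumptions: `embed` and `classify` are hypothetical callables standing in for the embedding model and the zero-shot classifier, `centroids` maps each model tier to a reference vector, and "ambiguous" is defined here as a small similarity margin between the top two tiers. The project itself may define ambiguity differently.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(text, embed, centroids, classify, margin=0.1):
    """Return (tier, stage_used) for a request.

    Stage 1 picks the tier whose centroid is most similar to the query.
    Stage 2 (the slower classifier) runs only when the top two tiers are
    closer than `margin`, which this sketch treats as the ambiguous case.
    """
    query_vec = embed(text)
    scored = sorted(
        ((cosine(query_vec, vec), tier) for tier, vec in centroids.items()),
        reverse=True,
    )
    best_score, best_tier = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else -1.0
    if best_score - runner_up >= margin:
        return best_tier, "embedding"
    return classify(text), "classifier"
```

Keeping the classifier behind a margin check means most requests pay only for one embedding call; the margin is the knob that trades routing accuracy against latency.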
A CI quality gate that runs your LLM against a golden dataset on every PR, diffs accuracy with Wilson 95% confidence intervals, and blocks the merge when the drop is statistically real — inspired by the eval pipeline behind DaiLY in production.
30pp accuracy regression detected automatically in CI · 86% coverage · GitHub Actions + Slack alerts
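For readers unfamiliar with the Wilson score interval, here is a minimal sketch of how such a gate could decide that a drop is statistically real. The decision rule (block only when the PR's upper bound falls below the baseline's lower bound) is a conservative assumption chosen for the example, not necessarily the rule the detector uses, and the function names are hypothetical.

```python
import math


def wilson_interval(passes, total, z=1.96):
    """Wilson score interval for a pass rate; z=1.96 gives roughly 95% confidence."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = passes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def is_regression(base_passes, base_total, pr_passes, pr_total):
    """Flag a regression only when the two intervals do not overlap."""
    base_low, _ = wilson_interval(base_passes, base_total)
    _, pr_high = wilson_interval(pr_passes, pr_total)
    return pr_high < base_low
```

With 100 golden cases, a fall from 90 to 88 passes stays inside the noise and would not block a merge under this rule, while a fall from 90 to 60 would.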
A RAG pipeline with three-stage PII detection at ingestion (Presidio + GLiNER + DeBERTa) and a post-generation audit on every answer — aligned with EU AI Act Article 10 by design.
100% PII recall · 0.93 precision · 0 post-generation leaks · 93% coverage
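The layered-detection pattern can be illustrated with simple stand-ins. In the sketch below, two regex detectors play the role of the real detection stages (they are not the Presidio, GLiNER, or DeBERTa APIs), their spans are merged before masking, and the same detectors are re-run on the generated answer as a post-generation audit. All function names are hypothetical.

```python
import re


def email_detector(text):
    """Stand-in detector: character spans that look like email addresses."""
    return [(m.start(), m.end()) for m in re.finditer(r"[\w.+-]+@[\w-]+\.[\w.]+", text)]


def phone_detector(text):
    """Stand-in detector: character spans that look like phone numbers."""
    return [(m.start(), m.end()) for m in re.finditer(r"\+?\d[\d ]{7,}\d", text)]


DETECTORS = (email_detector, phone_detector)


def merge_spans(spans):
    """Merge overlapping or touching (start, end) spans from all detectors."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def redact(text, detectors=DETECTORS, mask="[PII]"):
    """Mask every span flagged by any detector before the text is indexed."""
    spans = merge_spans([span for detect in detectors for span in detect(text)])
    parts, cursor = [], 0
    for start, end in spans:
        parts.append(text[cursor:start])
        parts.append(mask)
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts)


def audit(answer, detectors=DETECTORS):
    """Post-generation check: True when no detector fires on the answer."""
    return not any(detect(answer) for detect in detectors)
```

Taking the union of detectors favours recall over precision, which matches the metrics reported above: a span flagged by any stage is masked, at the cost of some false positives.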
Sole technical lead on a multi-agent HR assistant in Google Chat serving 30,000+ employees across France and Switzerland.
- Coordinator + 4 specialized sub-agents over the A2A protocol on Cloud Run, built on Google ADK and Gemini (Vertex AI)
- LLM-as-Judge eval pipeline: 600+ golden cases across 4 agents + a coordinator routing suite — rubric pass lifted from ~60% to 97.7% on HR Knowledge, 87–96% across the remaining agents
- 2-layer production kill switch (5–10s Cloud Run cutoff + 30s TTL registry toggle, no redeploy) · keyless CI/CD via GitHub Actions + Workload Identity Federation
- BigQuery observability tying answer quality to the exact prompt revision (per-prompt-hash, per-model, per-cost-center)
- Appointed France's technical lead on the RAG Alignment Task Force for the 100K-user global rollout
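The registry-toggle layer of the kill switch can be pictured as a cached flag with a short time-to-live. The sketch below is an assumption about the general pattern rather than DaiLY's implementation: `fetch_enabled` is a hypothetical callable that reads the flag from wherever the registry lives, and the injectable `clock` exists only to make the behaviour testable.

```python
import time


class KillSwitch:
    """Cache an on/off flag and re-read it from the registry once the TTL expires."""

    def __init__(self, fetch_enabled, ttl_seconds=30.0, clock=time.monotonic):
        self._fetch = fetch_enabled      # hypothetical registry lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._cached = None              # no value fetched yet
        self._expires_at = 0.0

    def is_enabled(self):
        """Return the cached flag, refreshing it when it is missing or stale."""
        now = self._clock()
        if self._cached is None or now >= self._expires_at:
            self._cached = bool(self._fetch())
            self._expires_at = now + self._ttl
        return self._cached
```

Each request handler would call `is_enabled()` before invoking an agent. Flipping the flag in the registry then takes effect within one TTL window on every running instance, with no redeploy, while the cache keeps the registry from being read on every request.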
Built the e-commerce marketplace connector platform across three countries.
- 8 Java/Spring Boot microservice connectors across UK, South Korea, and Switzerland
- £2.6M GMV in 2024 · €528K GMV on the Glovo connector since August 2025
- 40,000+ product updates/day via Cloud Firestore
- Onboarding time per new marketplace: 8 weeks → 4 weeks
Automation of HR processes and internal tooling.
- Built a Java/Spring Boot aggregator integrating with Greenhouse webhooks — cut manual data entry by 50%
- Streamlined contract generation and internal API workflows



