Hugo Canaval Hacanaval

Hi, I'm Hugo Canaval

Data Scientist designing and deploying ML and AI systems in production environments. I bring a business-first perspective to data problems: I spent 6+ years in commercial strategy before going deep into engineering, and that combination shapes how I build.

What I build

Most of my production work lives in private repositories, but here are the kinds of systems I design and deploy:

Predictive ML at scale — delivery probability model trained on 103M+ records at Coordinadora, reducing operational costs by 15% ($50K USD/week in logistics savings)
Multi-agent AI systems — Cloud Run + Gemini for automated operational reporting across 8+ data sources
Document intelligence pipelines — Gemini Vision for invoice OCR + automatic bank reconciliation
Production APIs — FastAPI services on Cloud Run consuming BigQuery at scale
LLM validation systems — automated photo evidence verification with Gemini Vision

Stack

Languages: Python, SQL
ML & Data: Scikit-learn, XGBoost, Pandas, NumPy, PySpark, FAISS, BERT/TF-IDF
Google Cloud Platform (GCP): BigQuery, Cloud Run, Gemini API, Cloud Storage, gcloud CLI
LLMs & AI: RAG, Prompt Engineering, LLM pipelines, multimodal (text + vision)
Backend: FastAPI, Docker, PostgreSQL
Tools: Git, Telegram Bot API

Projects

Production work (private repositories — Coordinadora)

Delivery probability model: Random Forest trained on 103M+ records, 15% reduction in failed deliveries, $50K USD/week in operational savings. Stack: Python, Scikit-learn, BigQuery, Cloud Run, Optuna
Multi-agent AI reporting system: Gemini + Cloud Run automating daily operational metrics across 8+ BigQuery data sources
Document intelligence pipeline: Gemini Vision for invoice OCR + automatic bank reconciliation across multiple bank formats. Stack: Cloud Run, BigQuery, Python
Delivery history API: Production REST API serving real-time ML scoring. Stack: FastAPI, Cloud Run, BigQuery
Photo evidence validation: Automated delivery proof classification using Gemini Vision, replacing manual quality review

Training projects (public)

Projects from the TripleTen Data Science bootcamp (2024–2025). They reflect the foundations, not the production work.

SaaS Sales Agent — Multi-tenant chatbot backend with RAG, FAISS vector search, FastAPI
Taxi Demand Forecast — Hourly demand prediction comparing SARIMA, XGBoost, Prophet

Background

Before engineering, I spent 6+ years in commercial roles at BAT, AB InBev, Mondelēz, and Nielsen, working in pricing strategy, category management, and trade marketing. That gave me something most data scientists don't have: a working understanding of what a business decision actually costs.

Connect

📍 Cali, Colombia · hacanaval@hotmail.com
