Hacanaval/README.md

Hi, I'm Hugo Canaval

Data Scientist designing and deploying ML and AI systems in production environments. I bring a business-first perspective to data problems: I spent 6+ years in commercial strategy before moving into engineering, and that combination shapes how I build.


What I build

Most of my production work lives in private repositories, but here are the kinds of systems I design and deploy:

  • Predictive ML at scale — delivery probability model trained on 103M+ records at Coordinadora, reducing operational costs by 15% ($50K USD/week in logistics savings)
  • Multi-agent AI systems — Cloud Run + Gemini for automated operational reporting across 8+ data sources
  • Document intelligence pipelines — Gemini Vision for invoice OCR + automatic bank reconciliation
  • Production APIs — FastAPI services on Cloud Run consuming BigQuery at scale
  • LLM validation systems — automated photo evidence verification with Gemini Vision

Stack

  • Languages: Python, SQL
  • ML & Data: Scikit-learn, XGBoost, Pandas, NumPy, PySpark, FAISS, BERT/TF-IDF
  • Google Cloud Platform (GCP): BigQuery, Cloud Run, Gemini API, Cloud Storage, gcloud CLI
  • LLMs & AI: RAG, Prompt Engineering, LLM pipelines, multimodal (text + vision)
  • Backend: FastAPI, Docker, PostgreSQL
  • Tools: Git, Telegram Bot API

Projects

Production work (private repositories — Coordinadora)

  • Delivery probability model: Random Forest trained on 103M+ records, 15% reduction in failed deliveries, $50K USD/week in operational savings. Stack: Python, Scikit-learn, BigQuery, Cloud Run, Optuna
  • Multi-agent AI reporting system: Gemini + Cloud Run automating daily operational metrics across 8+ BigQuery data sources
  • Document intelligence pipeline: Gemini Vision for invoice OCR + automatic bank reconciliation across multiple bank formats. Stack: Cloud Run, BigQuery, Python
  • Delivery history API: Production REST API serving real-time ML scoring. Stack: FastAPI, Cloud Run, BigQuery
  • Photo evidence validation: Automated delivery proof classification using Gemini Vision, replacing manual quality review

Training projects (public)

Projects from the TripleTen Data Science bootcamp (2024–2025). They reflect the foundations, not the production work.


Background

Before engineering, I spent 6+ years in commercial roles at BAT, AB InBev, Mondelēz, and Nielsen, working in pricing strategy, category management, and trade marketing. That gave me something most data scientists don't have: a working understanding of what a business decision actually costs.


Connect

LinkedIn · CV

📍 Cali, Colombia · hacanaval@hotmail.com

Pinned

  1. cnn-image-based-age-verification Public

    Jupyter Notebook

  2. ML-Imbalanced-Class-Handling Public

    Jupyter Notebook

  3. car-price-regression-models-comparison Public

    Jupyter Notebook

  4. nlp-sentiment-models-comparison Public

    Jupyter Notebook

  5. agente_vendedor_backend Public

    Python

  6. chatbot_multiagentico_viara Public

    Python 2