Data Scientist designing and deploying ML and AI systems in production environments. I bring a business-first perspective to data problems: I spent 6+ years in commercial strategy before going deep into engineering, and that combination shapes how I build.
Most of my production work lives in private repositories, but here are the kinds of systems I design and deploy:
- Predictive ML at scale — delivery probability model trained on 103M+ records at Coordinadora, reducing operational costs by 15% ($50K USD/week in logistics savings)
- Multi-agent AI systems — Cloud Run + Gemini for automated operational reporting across 8+ data sources
- Document intelligence pipelines — Gemini Vision for invoice OCR + automatic bank reconciliation
- Production APIs — FastAPI services on Cloud Run consuming BigQuery at scale
- LLM validation systems — automated photo evidence verification with Gemini Vision
- Languages: Python, SQL
- ML & Data: Scikit-learn, XGBoost, Pandas, NumPy, PySpark, FAISS, BERT/TF-IDF
- Google Cloud Platform (GCP): BigQuery, Cloud Run, Gemini API, Cloud Storage, gcloud CLI
- LLMs & AI: RAG, Prompt Engineering, LLM pipelines, multimodal (text + vision)
- Backend: FastAPI, Docker, PostgreSQL
- Tools: Git, Telegram Bot API
- Delivery probability model: Random Forest trained on 103M+ records, 15% reduction in failed deliveries, $50K USD/week in operational savings. Stack: Python, Scikit-learn, BigQuery, Cloud Run, Optuna
- Multi-agent AI reporting system: Gemini + Cloud Run automating daily operational metrics across 8+ BigQuery data sources
- Document intelligence pipeline: Gemini Vision for invoice OCR + automatic bank reconciliation across multiple bank formats. Stack: Cloud Run, BigQuery, Python
- Delivery history API: Production REST API serving real-time ML scoring. Stack: FastAPI, Cloud Run, BigQuery
- Photo evidence validation: Automated delivery proof classification using Gemini Vision, replacing manual quality review
Projects from the TripleTen Data Science bootcamp (2024–2025). These reflect the foundations, not the production work.
- SaaS Sales Agent — Multi-tenant chatbot backend with RAG, FAISS vector search, FastAPI
- Taxi Demand Forecast — Hourly demand prediction comparing SARIMA, XGBoost, Prophet
Before engineering, I spent 6+ years in commercial roles at BAT, AB InBev, Mondelēz, and Nielsen, working in pricing strategy, category management, and trade marketing. That gave me something most data scientists don't have: a working understanding of what a business decision actually costs.
📍 Cali, Colombia · hacanaval@hotmail.com