Data Engineer | Cloud Architecture & ML Pipelines
I build institutional-grade data pipelines, canonical ledgers, and resilient analytical systems.
I specialize in Data Engineering and Systems Architecture, transforming raw, unstructured, and fragmented data into highly reliable, canonical Data Warehouses. My focus is on performance-first engineering: working within aggressive API rate limits, optimizing database I/O, and building secure, cloud-native ingestion pipelines for the financial sector.
I don't just move data from A to B; I design systems that guarantee absolute integrity, idempotency, and strict data governance.
Over the past few years, I have architected and deployed complex data systems across multiple domains:
- Enterprise Crypto Data Lakes: Architected async ingestion engines handling multiple institutional exchange accounts, implementing custom HTTP header rate limiters to prevent API bans.
- Multi-Chain Ledger Integrators: Replaced third-party indexers with raw RPC extraction across diverse blockchains (EVM/Solana), featuring a "Pacemaker" Auto-Reconciliation engine to audit database states against live nodes.
- Custody & BaaS Integration: Developed secure data sinks using Azure Active Directory and programmatic token integration to centralize clearinghouse and institutional custody data.
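To illustrate the header-based rate limiting mentioned above, here is a minimal sketch rather than the production code. It assumes the exchange reports quota through `X-RateLimit-Remaining` and `X-RateLimit-Reset` (epoch seconds); those two header names, the `min_remaining` threshold, and the function names are hypothetical and vary by API. `Retry-After` is the standard HTTP header and is honored first when present.

```python
import asyncio
import time
from typing import Mapping, Optional


def delay_from_headers(
    headers: Mapping[str, str],
    now: Optional[float] = None,
    min_remaining: int = 5,  # hypothetical safety margin
) -> float:
    """Return how many seconds to pause before sending the next request."""
    now = time.time() if now is None else now

    # Retry-After (seconds form) takes priority when the server sends it.
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # the HTTP-date form is not handled in this sketch

    # Assumed header names; a plain dict lookup is case-sensitive.
    remaining = headers.get("X-RateLimit-Remaining")
    reset = headers.get("X-RateLimit-Reset")
    if remaining is None or reset is None:
        return 0.0

    try:
        if int(remaining) > min_remaining:
            return 0.0  # enough quota left, no pause needed
        return max(0.0, float(reset) - now)  # wait until the window resets
    except ValueError:
        return 0.0  # unparseable headers: do not block the pipeline


async def throttle(headers: Mapping[str, str]) -> None:
    """Sleep without blocking the event loop when the quota is nearly spent."""
    delay = delay_from_headers(headers)
    if delay > 0:
        await asyncio.sleep(delay)
```

Calling `await throttle(response_headers)` after each response lets concurrent workers slow down before the API starts rejecting requests, instead of reacting to errors afterwards.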
I also maintain a large end-to-end personal project focused on predictive modeling and quantitative analysis:
- Automated Data Engineering: Daily web scraping, feature engineering (42+ variables), and real-time data pipelines.
- Machine Learning: Statistical models, ensemble ML (Scikit-learn), and continuous experiment cycles.
- Production: Full deployment lifecycle including backtesting, signal logging, and metric monitoring.
This project reflects my technical depth, persistence, and ability to architect, deploy, and maintain a complex algorithmic system long-term.
- Python: Advanced Asyncio, ThreadPoolExecutor, Pandas, SQLAlchemy, Polars.
- SQL (PostgreSQL): High-performance bulk loading (COPY commands, io.StringIO), temporary tables, indexing, and query optimization.
- Cloud (Microsoft Azure): Azure Functions, Blob Storage (Checkpointing), Azure Key Vault (Dynamic Auth), Managed Identities.
- ML & Analytics: Scikit-learn, Metabase, Power BI.
- Idempotency & Resilience: Upsert patterns, robust retry/backoff strategies.
- API Security: No hardcoded credentials, secure token injection, header monitoring.
- Observability: Custom structured logging and pipeline state management.
- Clean Architecture: Modular project structures and OOP design.
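The bulk-loading and upsert items above can be combined into one idempotent load step. The sketch below is illustrative only: the `trades` table, its columns, and the `staging_trades` name are hypothetical, and `cursor` is assumed to be a psycopg2 cursor (whose `copy_expert` method streams a file-like object into `COPY ... FROM STDIN`).

```python
import csv
import io
from typing import Iterable, Sequence


def rows_to_csv_buffer(rows: Iterable[Sequence]) -> io.StringIO:
    """Serialize rows into an in-memory CSV buffer suitable for COPY."""
    buf = io.StringIO()
    # csv.writer emits None as an empty field, which COPY's csv format
    # reads as NULL by default.
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    buf.seek(0)  # rewind so COPY reads from the start
    return buf


def bulk_upsert(cursor, rows: Iterable[Sequence]) -> None:
    """Load rows through a temp table, then upsert into the target table.

    Assumes a psycopg2 cursor inside an open transaction and a hypothetical
    table trades(trade_id PRIMARY KEY, symbol, price).
    """
    # Staging table mirrors the target and disappears at commit.
    cursor.execute(
        "CREATE TEMP TABLE staging_trades "
        "(LIKE trades INCLUDING DEFAULTS) ON COMMIT DROP"
    )
    # One COPY round trip instead of one INSERT per row.
    cursor.copy_expert(
        "COPY staging_trades (trade_id, symbol, price) "
        "FROM STDIN WITH (FORMAT csv)",
        rows_to_csv_buffer(rows),
    )
    # Re-running the same batch updates rows in place, so the load is idempotent.
    cursor.execute(
        "INSERT INTO trades (trade_id, symbol, price) "
        "SELECT trade_id, symbol, price FROM staging_trades "
        "ON CONFLICT (trade_id) DO UPDATE "
        "SET symbol = EXCLUDED.symbol, price = EXCLUDED.price"
    )
```

Staging through a temporary table keeps the fast `COPY` path while leaving conflict handling to a single `INSERT ... ON CONFLICT` statement, so a retried batch does not create duplicates.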
"I enjoy solving real problems with pragmatic, clean, and reliable data solutions."
🔗 Portfolio: lukasrozado.github.io | 💼 LinkedIn: in/lukasrozado
