Prathy prathyyyyy

Hi 👋 I'm Prathy P

Data Systems & Machine Learning Engineer

Professional Summary

Data Systems and Machine Learning Engineer with experience designing high-throughput batch and real-time data pipelines, lakehouse architectures, and production ML platforms on AWS and Azure. Skilled in Spark, Kafka, Databricks, and vector search systems, with a strong focus on building scalable, reliable data and ML infrastructure for real-world applications.

🌍 Based in India, open to relocation
✉️ csprathyy@gmail.com
🤝 Open to Data Engineer | Data Scientist | ML Engineer roles

🚀 What I Build

High-throughput batch & real-time data pipelines (Spark, Kafka, Kinesis, Flink)
Lakehouse architectures using Delta, Iceberg, Hudi, Unity Catalog
Streaming analytics & security detection systems
ML pipelines on Spark with GPU acceleration
Vector search & semantic retrieval systems using FAISS & embeddings
Multimodal RAG systems (text + image retrieval)
Production ML with monitoring, CI/CD, and drift detection

🧠 Core Expertise

Data Engineering

PySpark • Kafka • Kinesis • Flink • Databricks • Delta Lake • Iceberg • Hudi • Unity Catalog

Machine Learning Systems

Spark ML • XGBoost4J-Spark • RAPIDS • Evidently AI • SageMaker Pipelines

Vector & LLM Systems

FAISS • Sentence-BERT • Embeddings • Multimodal RAG • LangChain

Cloud

AWS (Glue, Lambda, Athena, S3, SageMaker)
Azure (Databricks, Data Factory, Azure ML, DevOps)

Backend

.NET • PostgreSQL • Docker • Flask API

🏗️ Featured Projects

🔹 High-Throughput E-Commerce Streaming Analytics & Security Detection

Processed 67M+ events
Built batch + real-time analytics pipelines
Apache Hudi → 50% faster queries, 40% less storage
Kinesis + Flink + DynamoDB for DDoS/Bot detection

🔹 Truck Delay Prediction using Spark ML + GPU XGBoost

XGBoost4J-Spark + RAPIDS Accelerator
Production pipeline with SageMaker + Evidently AI
Drift monitoring, CI/CD, orchestration

🔹 Semantic Search & Relevance Platform

Sentence-BERT embeddings
FAISS vector retrieval
Iceberg storage + Dockerized Flask API

🔹 Multimodal RAG Food Recommendation System

Text + Image embeddings
FAISS vector indexing
Streamlit app deployed on AWS

🔹 Databricks Streaming ETL (Medallion Architecture)

Kafka + PySpark streaming joins
Unity Catalog governance
Azure DevOps CI/CD

🏅 Certification

Microsoft Certified: Azure Data Scientist Associate (DP-100)
https://learn.microsoft.com/en-us/users/prathy-0029/credentials/certification/azure-data-scientist

🛠️ Tech Stack

Python • PySpark • SQL • Spark ML • Kafka • Databricks • AWS • Azure • FAISS • Docker • PostgreSQL • Power BI

🤝 Let’s Collaborate

I love working on:

Distributed data systems
ML at scale
Vector search & RAG systems
Streaming analytics

⚡ Fun Fact

I enjoy translating complex data problems into scalable engineering systems.
