prathyyyyy/readme.md

Hi 👋 I'm Prathy P

Data Systems & Machine Learning Engineer

Professional Summary

Data Systems and Machine Learning Engineer with experience designing high-throughput batch and real-time data pipelines, lakehouse architectures, and production ML platforms on AWS and Azure. Skilled in Spark, Kafka, Databricks, and vector search systems, with a strong focus on building scalable, reliable data and ML infrastructure for real-world applications.

🌏 India, open to relocation
✉️ csprathyy@gmail.com
🤝 Open to Data Engineer | Data Scientist | ML Engineer roles


🚀 What I Build

  • High-throughput batch & real-time data pipelines (Spark, Kafka, Kinesis, Flink)
  • Lakehouse architectures using Delta, Iceberg, Hudi, Unity Catalog
  • Streaming analytics & security detection systems
  • ML pipelines on Spark with GPU acceleration
  • Vector search & semantic retrieval systems using FAISS & embeddings
  • Multimodal RAG systems (text + image retrieval)
  • Production ML with monitoring, CI/CD, and drift detection

🧠 Core Expertise

Data Engineering

PySpark • Kafka • Kinesis • Flink • Databricks • Delta Lake • Iceberg • Hudi • Unity Catalog

Machine Learning Systems

Spark ML • XGBoost4J-Spark • RAPIDS • Evidently AI • SageMaker Pipelines

Vector & LLM Systems

FAISS • Sentence-BERT • Embeddings • Multimodal RAG • LangChain

Cloud

AWS (Glue, Lambda, Athena, S3, SageMaker)
Azure (Databricks, Data Factory, Azure ML, DevOps)

Backend

.NET • PostgreSQL • Docker • Flask API


🏗️ Featured Projects

🔹 High-Throughput E-Commerce Streaming Analytics & Security Detection

  • Processed 67M+ events
  • Built batch + real-time analytics pipelines
  • Apache Hudi → 50% faster queries, 40% less storage
  • Kinesis + Flink + DynamoDB for DDoS/Bot detection

🔹 Truck Delay Prediction using Spark ML + GPU XGBoost

  • XGBoost4J-Spark + RAPIDS Accelerator
  • Production pipeline with SageMaker + Evidently AI
  • Drift monitoring, CI/CD, orchestration

🔹 Semantic Search & Relevance Platform

  • Sentence-BERT embeddings
  • FAISS vector retrieval
  • Iceberg storage + Dockerized Flask API

🔹 Multimodal RAG Food Recommendation System

  • Text + Image embeddings
  • FAISS vector indexing
  • Streamlit app deployed on AWS

🔹 Databricks Streaming ETL (Medallion Architecture)

  • Kafka + PySpark streaming joins
  • Unity Catalog governance
  • Azure DevOps CI/CD

🏅 Certification

Microsoft Certified: Azure Data Scientist Associate (DP-100)
https://learn.microsoft.com/en-us/users/prathy-0029/credentials/certification/azure-data-scientist


🛠️ Tech Stack

Python • PySpark • SQL • Spark ML • Kafka • Databricks • AWS • Azure • FAISS • Docker • PostgreSQL • Power BI


🤝 Let’s Collaborate

I love working on:

  • Distributed data systems
  • ML at scale
  • Vector search & RAG systems
  • Streaming analytics

⚡ Fun Fact

I enjoy translating complex data problems into scalable engineering systems.

Pinned

  1. Forest-Fire-Detection (Public)

    Forest fire detection with a convolutional neural network

    Jupyter Notebook · 16 stars · 4 forks

  2. Medical-Data-Extraction (Public)

    Medical data extraction with Pytesseract (a Python wrapper for Google's Tesseract OCR engine) and computer vision

    Jupyter Notebook · 18 stars · 3 forks