rohith-66/README.md


Hey, I'm Rohith. I'm finishing my MS in Data Science at Arizona State and I build things that turn messy, real-world data into something actually useful.

Not dashboards for the sake of dashboards. Systems that give someone a decision they couldn't make before.

Right now I'm open to Data Engineer and Data Analyst roles on F-1 OPT. Based in Tempe, AZ.


what I've built

ComorbidAlert — county-level health forecasting across the US

Python Prophet LightGBM SHAP AWS S3 Streamlit

Forecasts diabetes-cardiac comorbidity risk across all 3,144 US counties, flagging counties on a worsening trajectory 2-3 years before they hit critical thresholds. Built a 3-layer scoring model (clinical burden + social vulnerability + trajectory), ran a weighted ensemble that hit a WAPE of 0.46%, and surfaced 830 early-warning alerts with plain-English explanations.

Found a Great Plains emerging cluster (NE/IA/SD) that hadn't been documented in prior literature.
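
For anyone unfamiliar with the metric, here is a minimal sketch of WAPE (sum of absolute errors divided by the sum of absolute actuals) and of a weighted average across model forecasts. It is illustrative only and not the project's code: the function names, weights, and sample numbers are assumptions.

```python
from typing import List, Sequence


def wape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Weighted absolute percentage error: sum(|a - f|) / sum(|a|)."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    denom = sum(abs(a) for a in actual)
    if denom == 0:
        raise ValueError("WAPE is undefined when all actual values are zero")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / denom


def weighted_ensemble(
    predictions: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Combine per-model forecasts with a normalized weighted average."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    n_points = len(predictions[0])
    return [
        sum(w * p[i] for w, p in zip(weights, predictions)) / total
        for i in range(n_points)
    ]


# Hypothetical forecasts from two models (made-up numbers, not project data).
model_a = [10.0, 20.0]
model_b = [20.0, 40.0]
blended = weighted_ensemble([model_a, model_b], weights=[1.0, 3.0])
error = wape(actual=[18.0, 36.0], forecast=blended)
```

Multiply the result by 100 to express it as a percentage, which is how the 0.46% figure above is quoted.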

live dashboard · repo


DataFlow Studio — upload a CSV, get a production pipeline

React FastAPI Claude API PySpark Vercel

Drop in any CSV and it runs through a full Bronze → Silver → Gold medallion architecture powered by Claude. It detects schema issues, generates real PySpark and SQL transformations, builds a KPI dashboard, and exports a production-ready .py file. Not a toy demo: actual code you can run.

live demo · repo


Lakehouse Pipeline — GCP + Spark + BigQuery

Apache Spark Docker GCS BigQuery Parquet

Production-style lakehouse processing 80,000+ records per run. Raw JSON in GCS goes through Dockerized Spark transforms, lands as Parquet in Silver, and loads into a partitioned and clustered BigQuery warehouse. Single-command execution, with schema enforcement and deduplication built in.
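
To make the schema enforcement and deduplication steps concrete, here is a rough plain-Python stand-in for the raw-to-Silver logic. It is not the Spark job from the repo: the field names, the "keep the latest record per key" rule, and the helper names are all assumptions chosen for illustration.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]

# Hypothetical schema: field name -> required Python type.
SCHEMA = {"event_id": str, "amount": float, "updated_at": str}


def enforce_schema(record: Record, schema: Dict[str, type] = SCHEMA) -> bool:
    """Return True if every schema field is present with the expected type."""
    return all(
        isinstance(record.get(field), expected)
        for field, expected in schema.items()
    )


def deduplicate(
    records: Iterable[Record], key: str = "event_id", order_by: str = "updated_at"
) -> List[Record]:
    """Keep one record per key, preferring the largest order_by value."""
    latest: Dict[Any, Record] = {}
    for rec in records:
        current = latest.get(rec[key])
        if current is None or rec[order_by] > current[order_by]:
            latest[rec[key]] = rec
    return list(latest.values())


def to_silver(raw: Iterable[Record]) -> List[Record]:
    """Drop records that fail the schema check, then deduplicate the rest."""
    valid = [r for r in raw if enforce_schema(r)]
    return deduplicate(valid)


# Made-up sample rows: one duplicate key and one row with a bad type.
raw_rows = [
    {"event_id": "a", "amount": 1.0, "updated_at": "2024-01-01"},
    {"event_id": "a", "amount": 2.0, "updated_at": "2024-01-02"},
    {"event_id": "b", "amount": "bad", "updated_at": "2024-01-01"},
]
silver_rows = to_silver(raw_rows)
```

In Spark the same idea would typically be expressed with an explicit schema on read and a window or `dropDuplicates` step, but the exact approach used in the repo is not stated here.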

repo


Construction Portfolio — cost forecasting on 75,000+ work items

PostgreSQL Power BI

End-to-end project controls simulation. Planned vs actual cost tracking with SQL window functions, cost performance index (CPI) and estimate at completion (EAC) forecasting, red/amber/green (RAG) risk classification, and what-if financial impact modeling. Identified real cost overruns and flagged high-risk projects before they escalated.
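
The CPI and EAC numbers follow the standard earned-value formulas (CPI = earned value / actual cost, EAC = budget at completion / CPI). A minimal sketch in Python, assuming hypothetical RAG thresholds of 0.9 and 1.0 on CPI; the real project does this in PostgreSQL and Power BI, and its thresholds may differ.

```python
def cpi(earned_value: float, actual_cost: float) -> float:
    """Cost performance index: value earned per unit of cost spent."""
    if actual_cost <= 0:
        raise ValueError("actual_cost must be positive")
    return earned_value / actual_cost


def eac(budget_at_completion: float, cpi_value: float) -> float:
    """Estimate at completion, assuming current cost efficiency continues."""
    if cpi_value <= 0:
        raise ValueError("cpi_value must be positive")
    return budget_at_completion / cpi_value


def rag_status(
    cpi_value: float, amber_below: float = 1.0, red_below: float = 0.9
) -> str:
    """Classify a project as Red, Amber, or Green (thresholds are assumed)."""
    if cpi_value < red_below:
        return "Red"
    if cpi_value < amber_below:
        return "Amber"
    return "Green"


# Made-up project: 80k of value earned for 100k spent on a 500k budget.
project_cpi = cpi(earned_value=80_000, actual_cost=100_000)
project_eac = eac(budget_at_completion=500_000, cpi_value=project_cpi)
project_rag = rag_status(project_cpi)
```

A CPI below 1 means the project is spending more than the value it has earned, which pushes the EAC above the original budget.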


always exploring

data engineering is moving fast and I'm moving with it. right now I'm digging into:

  • real-time pipelines — Kafka, Flink, streaming architectures (batch is great but streams are next)
  • dbt — proper data modeling, not just SQL dumps
  • ML in production — not just notebooks, actual model serving, monitoring, drift detection
  • open source — looking for projects where I can actually contribute something useful

if you're building something interesting or know something I should learn, I'm genuinely all ears.


stack

data engineering   PySpark · Spark · Docker · GCP · BigQuery · AWS S3 · Parquet
ml & forecasting   Prophet · LightGBM · Scikit-learn · SHAP · PyTorch
languages          Python · SQL (PostgreSQL)
ai tooling         Claude API
viz & bi           Streamlit · Plotly · Power BI · Tableau
backend            FastAPI · REST APIs
frontend           React · Tailwind CSS
tools              Git · Docker · Jupyter

find me

LinkedIn · Portfolio · Email


build systems that reduce uncertainty, not increase complexity

Pinned

  1. comorbid-alert (Public)

    US county-level diabetes-cardiac comorbidity forecasting and early warning system · Live dashboard

    Python 1

  2. dataflow-studio (Public)

    AI-powered Bronze→Silver→Gold data engineering pipeline. Upload any CSV, get real PySpark & SQL transformations, KPIs, charts, and production pipeline code — powered by Claude AI.

    JavaScript 1

  3. lakehouse (Public)

    Production lakehouse on GCP: Raw JSON to BigQuery via Dockerized Spark, 80k+ records per run

    Python 1

  4. project-performance-analytics-forecasting (Public)

    End-to-end project performance and cost forecasting system using PostgreSQL and Power BI with CPI, EAC, risk classification, and scenario simulation.

    1

  5. real-time-marketplace-payments-platform (Public)

    Real-time data platform for marketplace payments using Kafka, Spark Structured Streaming, and medallion architecture (Bronze/Silver/Gold) for scalable analytics.

    Python 1

  6. squad-integrity-index (Public)

    Position-by-position squad depth analysis for all 48 FIFA World Cup 2026 teams

    Python 1