Rohith Srinivasa rohith-66

Hey, I'm Rohith. I'm finishing my MS in Data Science at Arizona State and I build things that turn messy, real-world data into something actually useful.

Not dashboards for the sake of dashboards. Systems that give someone a decision they couldn't make before.

Right now I'm open to Data Engineer and Data Analyst roles on F1 OPT. Based in Tempe, AZ.

what I've built

ComorbidAlert — county-level health forecasting across the US

Python Prophet LightGBM SHAP AWS S3 Streamlit

Forecasts diabetes-cardiac comorbidity risk across all 3,144 US counties, flagging counties on a worsening trajectory 2 to 3 years before they hit critical thresholds. Built a 3-layer scoring model (clinical burden + social vulnerability + trajectory), ran a weighted ensemble that reached a WAPE of 0.46%, and surfaced 830 early warning alerts with plain-English explanations.

Found a Great Plains emerging cluster (NE/IA/SD) that hadn't been documented in prior literature.
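For anyone unfamiliar with the metric, here is a minimal sketch of WAPE and a weighted ensemble blend. It is not the ComorbidAlert code: the function names, the equal weights, and the numbers are made up for illustration, and the real model weights are not stated here.

```python
def wape(actual, forecast):
    """Weighted absolute percentage error: sum(|a - f|) / sum(|a|)."""
    total = sum(abs(a) for a in actual)
    if total == 0:
        raise ValueError("WAPE is undefined when actuals sum to zero")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / total


def weighted_ensemble(forecasts, weights):
    """Blend per-model forecasts; weights are normalised to sum to 1."""
    total_w = sum(weights)
    return [
        sum(w * f[i] for w, f in zip(weights, forecasts)) / total_w
        for i in range(len(forecasts[0]))
    ]


# Illustrative values only (hypothetical, not project data)
actual = [10.0, 12.0, 8.0]
model_a = [11.0, 12.0, 9.0]   # stands in for one forecaster
model_b = [9.0, 12.0, 7.0]    # stands in for another
blend = weighted_ensemble([model_a, model_b], [0.5, 0.5])

print(f"model_a WAPE: {wape(actual, model_a):.2%}")
print(f"blend WAPE:   {wape(actual, blend):.2%}")
```

In this toy case the two models err in opposite directions, so the blend scores better than either one alone. That is the usual motivation for ensembling, though real gains depend on how correlated the model errors are.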

→ live dashboard · repo

DataFlow Studio — upload a CSV, get a production pipeline

React FastAPI Claude API PySpark Vercel

Drop in any CSV and it runs through a full Bronze → Silver → Gold medallion architecture powered by Claude. It detects schema issues, generates real PySpark and SQL transformations, builds a KPI dashboard, and exports a production-ready .py file. Not a toy demo: actual code you can run.

→ live demo · repo

Lakehouse Pipeline — GCP + Spark + BigQuery

Apache Spark Docker GCS BigQuery Parquet

Production-style lakehouse processing 80,000+ records per run. Raw JSON in GCS goes through Dockerized Spark transforms, lands as Parquet in Silver, and loads into a partitioned and clustered BigQuery warehouse. Single-command execution, with schema enforcement and deduplication built in.
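The schema enforcement and deduplication steps can be sketched in plain Python. This is a simplified stand-in, not the Spark job itself: the `SCHEMA` fields, the `id` key, and the "keep the latest `updated_at`" rule are all assumptions chosen for illustration.

```python
# Hypothetical schema: field name -> cast function
SCHEMA = {"id": int, "amount": float, "updated_at": str}


def enforce_schema(record, schema=SCHEMA):
    """Cast a raw record to the schema; return None if it cannot conform."""
    try:
        return {name: cast(record[name]) for name, cast in schema.items()}
    except (KeyError, TypeError, ValueError):
        return None


def deduplicate(records, key="id", order_by="updated_at"):
    """Keep one record per key, preferring the greatest order_by value."""
    latest = {}
    for rec in records:
        current = latest.get(rec[key])
        if current is None or rec[order_by] > current[order_by]:
            latest[rec[key]] = rec
    return list(latest.values())


# Made-up raw rows: one duplicate id, one bad id, one missing field
raw = [
    {"id": "1", "amount": "9.50", "updated_at": "2024-01-01"},
    {"id": "1", "amount": "10.00", "updated_at": "2024-02-01"},
    {"id": "x", "amount": "3.00", "updated_at": "2024-01-05"},
    {"id": "2", "amount": "7.25"},
]
clean = [r for r in (enforce_schema(x) for x in raw) if r is not None]
silver = deduplicate(clean)
print(silver)
```

ISO-formatted date strings sort correctly as plain strings, which is why the comparison on `updated_at` works without parsing. In Spark the same idea is typically expressed with an explicit schema on read plus a window or `dropDuplicates` step.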

→ repo

Construction Portfolio — cost forecasting on 75,000+ work items

PostgreSQL Power BI

End-to-end project controls simulation. Planned vs actual cost tracking with SQL window functions, CPI & EAC forecasting, RAG risk classification, and what-if financial impact modeling. Identified real cost overruns and flagged high-risk projects before they escalated.
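The CPI and EAC figures follow standard earned value formulas, sketched below in Python. The project itself uses SQL and Power BI, so treat this as an illustration only: the RAG thresholds (0.95 and 0.85) and the sample numbers are assumptions, not the values used in the portfolio.

```python
def cpi(earned_value, actual_cost):
    """Cost Performance Index: value earned per unit of cost spent."""
    return earned_value / actual_cost


def eac(budget_at_completion, cpi_value):
    """Estimate at Completion, assuming current cost efficiency continues."""
    return budget_at_completion / cpi_value


def rag(cpi_value, amber=0.95, red=0.85):
    """Classify cost risk; thresholds here are hypothetical."""
    if cpi_value < red:
        return "Red"
    if cpi_value < amber:
        return "Amber"
    return "Green"


# Made-up project: 80k of work earned for 100k spent, 200k total budget
project_cpi = cpi(80_000, 100_000)
project_eac = eac(200_000, project_cpi)
print(f"CPI {project_cpi:.2f}, EAC {project_eac:,.0f}, status {rag(project_cpi)}")
```

A CPI below 1 means the project is earning less value than it is spending, so the forecast final cost lands above the original budget.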

always exploring

data engineering is moving fast and I'm moving with it. right now I'm digging into:

real-time pipelines — Kafka, Flink, streaming architectures (batch is great but streams are next)
dbt — proper data modeling, not just SQL dumps
ML in production — not just notebooks, actual model serving, monitoring, drift detection
open source — looking for projects where I can actually contribute something useful

if you're building something interesting or know something I should learn, I'm genuinely all ears.

stack

data engineering   PySpark · Spark · Docker · GCP · BigQuery · AWS S3 · Parquet
ml & forecasting   Prophet · LightGBM · Scikit-learn · SHAP · PyTorch
languages          Python · SQL · PostgreSQL
ai tooling         Claude API
viz & bi           Streamlit · Plotly · Power BI · Tableau
backend            FastAPI · REST APIs
frontend           React · Tailwind CSS
tools              Git · Docker · Jupyter

find me

build systems that reduce uncertainty, not increase complexity