Hey, I'm Rohith. I'm finishing my MS in Data Science at Arizona State and I build things that turn messy, real-world data into something actually useful.
Not dashboards for the sake of dashboards. Systems that give someone a decision they couldn't make before.
Right now I'm open to Data Engineer and Data Analyst roles on F-1 OPT. Based in Tempe, AZ.
Python · Prophet · LightGBM · SHAP · AWS S3 · Streamlit
Forecasts diabetes-cardiac comorbidity risk across all 3,144 US counties, flagging counties on a worsening trajectory 2 to 3 years before they hit critical thresholds. Built a 3-layer scoring model (clinical burden + social vulnerability + trajectory), ran a weighted ensemble that reached a WAPE of 0.46%, and surfaced 830 early warning alerts with plain-English explanations.
Found a Great Plains emerging cluster (NE/IA/SD) that hadn't been documented in prior literature.
→ live dashboard · repo
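If WAPE and weighted ensembles are unfamiliar, here's a minimal sketch of both ideas. It is not the project's code: the function names, the equal weights, and the numbers are made up purely for illustration, and the real model and weights live in the repo.

```python
def wape(actual, forecast):
    """Weighted absolute percentage error: sum(|a - f|) / sum(|a|)."""
    total_abs_error = sum(abs(a - f) for a, f in zip(actual, forecast))
    total_actual = sum(abs(a) for a in actual)
    if total_actual == 0:
        raise ValueError("WAPE is undefined when the actuals sum to zero")
    return total_abs_error / total_actual


def weighted_ensemble(forecasts, weights):
    """Blend per-model forecasts using weights normalized to sum to 1."""
    total_weight = sum(weights)
    normalized = [w / total_weight for w in weights]
    n_points = len(forecasts[0])
    return [
        sum(w * model[i] for w, model in zip(normalized, forecasts))
        for i in range(n_points)
    ]


# Made-up values, not data from the project.
actual = [10.0, 20.0, 30.0]
model_a = [11.0, 19.0, 33.0]  # hypothetical output of one model
model_b = [9.0, 21.0, 27.0]   # hypothetical output of another model

blended = weighted_ensemble([model_a, model_b], weights=[0.5, 0.5])
print(f"model_a WAPE: {wape(actual, model_a):.2%}")
print(f"blended WAPE: {wape(actual, blended):.2%}")
```

In this toy case the two models err in opposite directions, so the blend scores better than either one alone. That's the usual motivation for an ensemble, though real gains depend on the models and the weights.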
React · FastAPI · Claude API · PySpark · Vercel
Drop in any CSV and it runs through a full Bronze → Silver → Gold medallion architecture powered by Claude. It detects schema issues, generates real PySpark and SQL transformations, builds a KPI dashboard, and exports a production-ready .py file. This is not a toy demo; it's actual code you can run.
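For anyone new to the medallion pattern, the three layers can be sketched in plain Python. This is only an illustration under my own assumptions: the sample CSV, column names, and function names are hypothetical, and the tool itself generates PySpark and SQL rather than this.

```python
import csv
import io

# Made-up sample input; the second row is missing its revenue value.
RAW_CSV = "region,revenue\nwest,100\nwest,\neast,50\n"


def bronze(raw_text):
    """Bronze: ingest rows as-is, with every value still a string."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def silver(rows):
    """Silver: drop rows with a missing revenue and cast the rest to float."""
    return [
        {"region": r["region"], "revenue": float(r["revenue"])}
        for r in rows
        if r["revenue"]
    ]


def gold(rows):
    """Gold: aggregate revenue per region as a simple KPI."""
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["revenue"]
    return totals


kpis = gold(silver(bronze(RAW_CSV)))
print(kpis)
```

Each layer only reads from the one before it, so a bad Silver rule can be fixed and re-run without touching the raw Bronze data.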
Apache Spark · Docker · GCS · BigQuery · Parquet
Production-style lakehouse processing 80,000+ records per run. Raw JSON in GCS goes through Dockerized Spark transforms, lands as Parquet in Silver, and loads into a partitioned and clustered BigQuery warehouse. It runs with a single command, with schema enforcement and deduplication built in.
→ repo
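To show what schema enforcement and deduplication mean here, a minimal plain-Python sketch follows. The actual pipeline does this in Spark; the schema, field names, and helper functions below are hypothetical stand-ins, not code from the repo.

```python
# Hypothetical schema for illustration; the real one is defined in the repo.
EXPECTED_SCHEMA = {"id": int, "event": str, "ts": str}


def enforce_schema(record, schema=EXPECTED_SCHEMA):
    """Return True when the record has exactly the expected fields and types."""
    if set(record) != set(schema):
        return False
    return all(isinstance(record[name], kind) for name, kind in schema.items())


def deduplicate(records, key="id"):
    """Keep the first record seen for each key value, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique


# Made-up records: one duplicate id and one row with a wrongly typed id.
raw = [
    {"id": 1, "event": "click", "ts": "2024-01-01T00:00:00"},
    {"id": 1, "event": "click", "ts": "2024-01-01T00:00:00"},
    {"id": "2", "event": "view", "ts": "2024-01-01T00:01:00"},
    {"id": 3, "event": "view", "ts": "2024-01-01T00:02:00"},
]

valid = [r for r in raw if enforce_schema(r)]
clean = deduplicate(valid)
print([r["id"] for r in clean])
```

Validating before deduplicating keeps malformed rows from ever being compared as keys, which is one reasonable ordering among several.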
PostgreSQL · Power BI
End-to-end project controls simulation. Planned vs actual cost tracking with SQL window functions, CPI and EAC forecasting, RAG (red/amber/green) risk classification, and what-if financial impact modeling. Identified real cost overruns and flagged high-risk projects before they escalated.
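For context, CPI and EAC come from standard earned value management. Here's a minimal sketch using the common formulas CPI = EV / AC and EAC = BAC / CPI (one of several accepted EAC variants). The RAG thresholds and the sample figures are assumptions for illustration, not the ones used in the project, which does this work in SQL.

```python
def cost_performance_index(earned_value, actual_cost):
    """CPI = EV / AC. Below 1.0 means the work is costing more than planned."""
    if actual_cost <= 0:
        raise ValueError("actual_cost must be positive")
    return earned_value / actual_cost


def estimate_at_completion(budget_at_completion, cpi):
    """EAC = BAC / CPI, assuming current cost efficiency continues."""
    if cpi <= 0:
        raise ValueError("cpi must be positive")
    return budget_at_completion / cpi


def rag_status(cpi, amber_below=1.0, red_below=0.9):
    """Classify a project as Red, Amber, or Green (thresholds are assumed)."""
    if cpi < red_below:
        return "Red"
    if cpi < amber_below:
        return "Amber"
    return "Green"


# Made-up project figures, in thousands of dollars.
earned_value, actual_cost, budget = 80.0, 100.0, 500.0

cpi = cost_performance_index(earned_value, actual_cost)
eac = estimate_at_completion(budget, cpi)
print(f"CPI={cpi:.2f}  EAC={eac:.1f}  status={rag_status(cpi)}")
```

With these numbers the project has earned 80 of value for 100 spent, so the forecast at completion lands well above the original budget and the status comes out Red.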
data engineering is moving fast and I'm moving with it. right now I'm digging into:
- real-time pipelines — Kafka, Flink, streaming architectures (batch is great but streams are next)
- dbt — proper data modeling, not just SQL dumps
- ML in production — not just notebooks, actual model serving, monitoring, drift detection
- open source — looking for projects where I can actually contribute something useful
if you're building something interesting or know something I should learn, I'm genuinely all ears.
data engineering: PySpark · Spark · Docker · GCP · BigQuery · AWS S3 · Parquet
ml & forecasting: Prophet · LightGBM · Scikit-learn · SHAP · PyTorch
languages: Python · SQL · PostgreSQL
ai tooling: Claude API
viz & bi: Streamlit · Plotly · Power BI · Tableau
backend: FastAPI · REST APIs
frontend: React · Tailwind CSS
tools: Git · Docker · Jupyter
