class AnshikaMishra:
    name = "Anshika Mishra"
    pronouns = "she/her"
    location = "Pune, Maharashtra"
    university = "Savitribai Phule Pune University (SPPU)"
    degree = "B.Sc. Data Science | CGPA: 8.61 / 10"
    email = "anshikamishra.25000@gmail.com"

    experience = [
        "📊 Data Science Intern @ Ernst & Young LLP",
        "🤖 AI Virtual Intern @ Infosys Springboard",
        "🔬 Project Contributor @ Dept. of Tech, SPPU",
    ]

    strengths = [
        "End-to-end ML pipeline design",
        "Time-series forecasting (ARIMA, Prophet)",
        "NLP & LLM integration (FinBERT, Gemini, LLaMA)",
        "Multi-agent AI systems (LangGraph, ReAct)",
        "Reproducible, production-grade Python workflows",
    ]

    achievements = [
        "🏅 Elite Certificate — NPTEL Python for Data Science",
        "🎓 Selected — Infosys Springboard Pragati Cohort 3",
        "💎 SheFi Scholar — Cohort 1",
        "🏛️ Workshop @ IIT Bombay — AI for Professionals",
    ]

    motto = "Explore patterns → Build systems → Create impact"
Ernst & Young LLP · Business Consulting & Supply Chain Ops

Infosys Springboard · Autonomous Cognitive Engine

Automated processing and validation of 10,000+ records for a Guinness World Record attempt — achieved 99% data accuracy using Python data-cleaning and EDA scripts under a tight deadline.

FinBERT · XGBoost · MLflow · SHAP · Streamlit · SEC EDGAR
Scored 264 earnings transcripts across 40 S&P 500 tickers — built a 3-layer weighted NLP pipeline achieving 75% classification accuracy and a 2.17 Signal Sharpe Ratio.
- 🧠 3-layer pipeline: 40% hedging detection · 35% FinBERT sentiment · 25% vocabulary signals
- ⚡ XGBoost ensemble (WEIGHTS_V2) → +2.83pp accuracy improvement; validated across 10 ablation experiments
- 🔍 Identified hedging as critical signal with 18.5pp source-quality gap (Motley Fool vs SEC EDGAR)
- 📊 Streamlit dashboard with MLflow tracking + SHAP explainability + percentile-based signals
Python · FinBERT · XGBoost · MLflow · SHAP · Streamlit · yfinance · SEC EDGAR
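
The 40/35/25 layer weighting described above can be sketched as a simple weighted sum. This is a minimal illustration, not the project's actual code: the function name, the dictionary keys, and the assumption that each layer emits a score in [0, 1] are all hypothetical.

```python
# Layer weights come from the project description; everything else is illustrative.
WEIGHTS = {"hedging": 0.40, "sentiment": 0.35, "vocabulary": 0.25}


def composite_score(layer_scores: dict[str, float]) -> float:
    """Blend per-layer scores (each assumed to lie in [0, 1]) into one signal."""
    missing = WEIGHTS.keys() - layer_scores.keys()
    if missing:
        raise ValueError(f"missing layer scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * layer_scores[name] for name in WEIGHTS)


# Hypothetical layer outputs for a single transcript.
example = {"hedging": 0.2, "sentiment": 0.8, "vocabulary": 0.6}
score = composite_score(example)
```

Because the weights sum to 1, the blended score stays in the same range as the layer scores.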
LangGraph · FastAPI · Groq/LLaMA · Pinecone · Redis · SSE
Production-grade 5-agent LangGraph pipeline: Task Planner → Code Generator → Tester → Debugger → Reviewer — autonomously plans, generates, tests, and reviews code end-to-end.
- ⚡ Real-time SSE streaming + Pinecone vector cache for code-pattern reuse
- 🔄 Redis pub/sub message queue with in-memory fallback + retry with exponential backoff (3 attempts)
- 🌐 Full FastAPI + Swagger UI REST API (5 endpoints incl. SSE stream + session management)
- 🛡️ Code review scoring: security · performance · maintainability
LangGraph · FastAPI · Groq LLaMA-70B · Pinecone · Redis · SSE · Python
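
The "retry with exponential backoff (3 attempts)" behaviour mentioned above could look roughly like the sketch below. The helper name, the 0.5-second base delay, and the injectable `sleep` parameter are assumptions made for illustration; the project's own retry logic may differ.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with delays of base_delay * 2**i."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("attempts must be at least 1")


# Hypothetical flaky operation that succeeds on the third call.
calls = {"count": 0}


def flaky_publish() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("queue unavailable")
    return "published"


delays: list[float] = []
result = retry_with_backoff(flaky_publish, sleep=delays.append)
```

Passing `delays.append` as the sleep function records the waits instead of pausing, which keeps the example fast and easy to check.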
Flask · LangGraph · SQLAlchemy · TF-IDF · Gmail API · Celery
Full-stack autonomous recruiting system with 4-agent LangGraph architecture — ingests candidates, scores, tiers, and sends Gmail-based recruiter loops via OAuth2 automation.
- 📊 Weighted NLP scoring: TF-IDF similarity + answer quality + GitHub scoring → FastTrack / Standard / Review / Reject tiers
- 🔁 Self-learning scoring loop that updates dynamic weights from hiring outcomes
- 🛡️ Anti-cheat layer: AI-answer detection + copy-ring clustering (TF-IDF cosine) + suspicious timing flags
- 🔧 Flask REST API (12 endpoints) + Celery/Redis async queue + full pytest coverage
Flask · LangGraph · SQLAlchemy · TF-IDF · Gmail API · Celery · Redis · Groq
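
One way to picture the copy-ring clustering in the anti-cheat layer is to link any two answers whose TF-IDF cosine similarity crosses a threshold and then report the connected groups. The sketch below uses only the standard library; the whitespace tokenizer, the smoothed IDF formula, the 0.8 threshold, and all function names are assumptions, not the project's implementation.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors with smoothed IDF: log((1 + n) / (1 + df)) + 1."""
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    return [
        {
            term: count * (math.log((1 + n) / (1 + doc_freq[term])) + 1)
            for term, count in Counter(tokens).items()
        }
        for tokens in tokenized
    ]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def copy_rings(answers: list[str], threshold: float = 0.8) -> list[set[int]]:
    """Group answer indices that are linked by similarity >= threshold."""
    vectors = tfidf_vectors(answers)
    parent = list(range(len(answers)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(answers)), 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            parent[find(i)] = find(j)

    groups: dict[int, set[int]] = {}
    for i in range(len(answers)):
        groups.setdefault(find(i), set()).add(i)
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical candidate answers: the first two are identical.
answers = [
    "I built a REST API with Flask and Celery",
    "I built a REST API with Flask and Celery",
    "My favourite project was a geospatial heat map",
]
rings = copy_rings(answers)
```

Here `rings` contains a single group holding indices 0 and 1, while the unrelated third answer is left out.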
XGBoost · Random Forest · SHAP · Google Earth Engine · GeoPandas · K-Means
End-to-end geospatial ML platform predicting Urban Heat Island intensity using satellite + terrain data — deployable for any global city in 5–10 minutes.
- 🛰️ GEE data ingestion: MODIS/Landsat LST, NDVI, NDBI, ESA WorldCover, SRTM DEM
- 📍 K-Means UHI zone classification + Getis-Ord Gi* hotspot analysis + SHAP explainability
- 🗺️ CLI outputs: publication-quality infographics + interactive multi-layer HTML map
- 🎯 XGBoost predicts Land Surface Temperature with Heat Risk Scores (Low / Medium / High)
Python · XGBoost · Random Forest · SHAP · Google Earth Engine · GeoPandas · K-Means
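
The description does not say how predicted Land Surface Temperature is turned into Low / Medium / High Heat Risk Scores. As one plausible reading, the sketch below buckets predictions by tercile; the function name and the tercile rule are assumptions, and the sample temperatures are made up.

```python
import statistics


def heat_risk_labels(predicted_lst_c: list[float]) -> list[str]:
    """Label each predicted LST value (in °C) by the tercile it falls into."""
    # quantiles(n=3) returns the two cut points that split the data into thirds.
    low_cut, high_cut = statistics.quantiles(predicted_lst_c, n=3)
    labels = []
    for value in predicted_lst_c:
        if value <= low_cut:
            labels.append("Low")
        elif value <= high_cut:
            labels.append("Medium")
        else:
            labels.append("High")
    return labels


# Made-up predictions for six grid cells.
predictions = [38.0, 30.0, 34.0, 40.0, 32.0, 36.0]
labels = heat_risk_labels(predictions)
```

Relative cut-offs like these adapt to each city's temperature range, at the cost of not being comparable across cities; fixed absolute thresholds would make the opposite trade-off.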
ARIMA · Prophet · Random Forest · Scikit-learn · Streamlit · Plotly
Dual-model ML health platform: ARIMA + Prophet forecasting (2.3-day MAE) + Random Forest PCOS classifier (87.3% accuracy, 0.91 AUC-ROC).
- 🩺 15+ clinical features with cross-validation, stratified sampling, automated imputation
- 📊 5-module Streamlit platform: period logging · water tracking · risk assessment · analytics · nutrition insights
- 📈 Interactive Plotly dashboards + CSV export
ARIMA · Prophet · Random Forest · Scikit-learn · Streamlit · Plotly
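
For readers unfamiliar with the metric, a "2.3-day MAE" means the forecasts miss the actual value by 2.3 days on average. A minimal sketch of the calculation follows; the function name and the sample numbers are illustrative and are not taken from the project's data.

```python
def mean_absolute_error(actual: list[float], predicted: list[float]) -> float:
    """Average of the absolute differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up cycle lengths in days, purely for illustration.
actual_cycle_days = [28, 30, 27, 29]
predicted_cycle_days = [30, 29, 30, 26]
mae = mean_absolute_error(actual_cycle_days, predicted_cycle_days)
```

In this toy example the absolute errors are 2, 1, 3, and 3 days, so `mae` is 2.25 days.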
Pandas · NumPy · Scikit-learn · Pearson Correlation · Papermill
Born at EY — open-sourced for the community. Automated price segmentation across 157 retail segments × 3 channels — replaced 2+ weeks of manual Excel work with a sub-15-minute pipeline.
Python · Pandas · Pearson Clustering · Papermill · Excel · Scikit-learn
Languages & Databases
ML / DL / Forecasting
NLP & LLMs
Data & Visualization
APIs, Deployment & DevOps
Tools
| 🏆 | Detail |
|---|---|
| 🥇 | Elite Certificate — Python for Data Science, NPTEL IIT Madras (2025) |
| 🏛️ | Workshop — Fundamentals of AI for Entrepreneurs & Professionals, IIT Bombay (2025) |
| 🎯 | Selected — Infosys Springboard Pragati Cohort 3 |
| 💎 | SheFi Scholar — Cohort 1 |
| 📜 | What is Data Science? — IBM / Coursera |
| 📊 | Foundations: Data, Data, Everywhere — Google Data Analytics Certificate |
| 🌍 | German Language (A1) — Dept. of Foreign Languages, SPPU (2024–25) |

| Status | Area | Details |
|---|---|---|
| ✅ | Core ML & Feature Engineering | Scikit-learn, XGBoost, cross-validation, EDA |
| ✅ | Time-Series Forecasting | ARIMA, Prophet, walk-forward backtesting |
| ✅ | NLP & Transformers | FinBERT, TF-IDF, sentiment analysis, NER |
| ✅ | Multi-Agent AI Systems | LangGraph, ReAct, LangSmith, Groq/LLaMA |
| ✅ | REST APIs & Dashboards | FastAPI, Flask, Streamlit, Plotly |
| ✅ | Experiment Tracking | MLflow, SHAP explainability |
| 🔄 | Geospatial ML | GeoPandas, Google Earth Engine, spatial analysis |
| 🔄 | LLM Fine-tuning & RAG | Hugging Face, vector DBs, Pinecone |
| 📅 | MLOps & Deployment | Docker, CI/CD, model serving at scale |
| 📅 | Deep Learning | LSTM, CNN, PyTorch workflows |
| 🎯 | AI Research Contributions | Open-source, papers, production AI systems |
