A supervised machine learning web application built using Streamlit to predict students' final grades based on study habits, attendance, motivation, learning style, and other educational features.
Click below to use the deployed app:
https://laya123-star-student-performance-predictor-app-o3lju5.streamlit.app/
Click below to open the notebook:
This project builds and deploys a Machine Learning Classification Model to predict students' final grades.
The complete pipeline includes:
- Data Cleaning
- Exploratory Data Analysis (EDA)
- Feature Engineering
- Feature Scaling
- Feature Selection
- Model Building
- Hyperparameter Tuning
- Model Evaluation
- Model Deployment using Streamlit
Users can input student details and get real-time grade predictions.
The main objectives of this project are:
- Analyze student performance data
- Identify key factors affecting academic success
- Build multiple ML classification models
- Compare model performances
- Optimize the best model using hyperparameter tuning
- Deploy the model using Streamlit
| Component | Description |
|---|---|
| Target Variable | FinalGrade |
| Problem Type | Multi-class Classification |
| Classes | A, B, C, D, F |
| Features | StudyHours, Attendance, Motivation, LearningStyle, Extracurricular, Resources, OnlineCourses, etc. |
The following preprocessing steps were performed:
- Loaded dataset using Pandas
- Handled missing values
- Removed duplicate records
- Detected and removed outliers
- Scaled numerical features using StandardScaler
- Split dataset into Training (80%) and Testing (20%)
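The steps above can be sketched roughly as follows. This is a minimal illustration on a synthetic DataFrame, not the notebook's actual code: the median imputation, the 1.5 × IQR outlier rule, the stratified split, and the random seeds are all assumptions, since the README does not name the specific methods used. The scaler is fitted on the training split only, so test-set statistics do not leak into training.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (values are made up for illustration).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "StudyHours": rng.normal(20, 5, 200),
    "Attendance": rng.uniform(60, 100, 200),
    "FinalGrade": rng.choice(list("ABCDF"), 200),
})
df.loc[0, "StudyHours"] = np.nan                        # inject a missing value
df.loc[1, "StudyHours"] = 500.0                         # inject an obvious outlier
df = pd.concat([df, df.iloc[[2]]], ignore_index=True)   # inject a duplicate row

numeric_cols = ["StudyHours", "Attendance"]

# Handle missing values (median imputation is an assumption) and drop duplicates.
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
df = df.drop_duplicates().reset_index(drop=True)

# Remove outliers with the 1.5 * IQR rule (assumed method).
q1 = df[numeric_cols].quantile(0.25)
q3 = df[numeric_cols].quantile(0.75)
iqr = q3 - q1
in_range = (
    (df[numeric_cols] >= q1 - 1.5 * iqr) & (df[numeric_cols] <= q3 + 1.5 * iqr)
).all(axis=1)
df = df[in_range]

# 80/20 split, then fit the scaler on the training portion only.
X, y = df[numeric_cols], df["FinalGrade"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```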
Visualizations used:
- Histogram
- Box Plot
- Count Plot
- Correlation Heatmap
- Bar Plot
- KDE Plot
- Understood feature distributions
- Identified outliers
- Analyzed feature relationships
- Checked class balance
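Two of these checks, class balance and feature relationships, can also be done numerically before plotting. The sketch below uses a tiny hand-made DataFrame; the column names come from the feature table, but the values are invented for illustration.

```python
import pandas as pd

# Tiny made-up sample; the real dataset has more rows and columns.
df = pd.DataFrame({
    "StudyHours": [5, 10, 15, 20, 25, 30],
    "Attendance": [60, 70, 75, 85, 90, 95],
    "FinalGrade": list("FDCBAA"),
})

# Class balance: share of rows per grade.
class_share = df["FinalGrade"].value_counts(normalize=True)

# Feature relationships: Pearson correlation between numeric columns.
# This is the matrix a correlation heatmap would visualize.
corr = df[["StudyHours", "Attendance"]].corr()

print(class_share)
print(corr)
```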
The following models were trained and evaluated:
- Decision Tree
- Random Forest
- k-Nearest Neighbors (k-NN)
- Support Vector Machine (SVM)
- Naive Bayes
- Logistic Regression
| Model | Accuracy | ROC-AUC |
|---|---|---|
| Decision Tree | 0.878 | 0.919 |
| Random Forest | 0.824 | 0.946 |
| k-NN | 0.401 | 0.661 |
| SVM (RBF) | 0.352 | 0.606 |
| Naive Bayes | 0.287 | 0.532 |
| Logistic Regression | 0.284 | 0.526 |
- Best Model: Decision Tree (highest accuracy, 0.878)
- Worst Model: Logistic Regression (lowest accuracy and ROC-AUC)
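A comparison like the one in the table could be produced with a loop over candidate models. The sketch below is illustrative only: it uses synthetic five-class data from `make_classification` rather than the student dataset, covers three of the six models, and assumes one-vs-rest ROC-AUC, since the README does not state how the multi-class ROC-AUC was averaged.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic 5-class problem standing in for grades A, B, C, D, F.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        # One-vs-rest averaging is an assumption, not confirmed by the README.
        "roc_auc": roc_auc_score(y_test, proba, multi_class="ovr"),
    }

for name, scores in results.items():
    print(f"{name}: accuracy={scores['accuracy']:.3f}, roc_auc={scores['roc_auc']:.3f}")
```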
- Applied GridSearchCV
- Used Pipeline to avoid data leakage
- Tuned key parameters for better performance
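The reason a Pipeline avoids leakage is that GridSearchCV refits every pipeline step, including the scaler, on the training portion of each fold, so the validation portion never influences the scaling statistics. The following is a minimal sketch on synthetic data; the k-NN parameter grid, `cv=5`, and the accuracy scoring are assumptions chosen for illustration, not the grid used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 5-class data standing in for the student dataset.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling lives inside the pipeline, so it is refit within each CV fold.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", KNeighborsClassifier()),
])

# Hypothetical grid; step-name prefixes ("clf__") route parameters to the classifier.
param_grid = {
    "clf__n_neighbors": [3, 5, 7],
    "clf__weights": ["uniform", "distance"],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_cv_score = search.best_score_          # mean cross-validated accuracy
test_accuracy = search.score(X_test, y_test)  # accuracy of the refit best pipeline
print(search.best_params_, best_cv_score, test_accuracy)
```

Here `best_score_` corresponds to the "Best CV Score" column in the table below, while the test-set accuracy is measured separately on held-out data.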
| Model | Accuracy | ROC-AUC | Best CV Score |
|---|---|---|---|
| Random Forest | 0.901 | 0.986 | 0.859 |
| k-NN | 0.899 | 0.950 | 0.860 |
| Decision Tree | 0.878 | 0.919 | 0.839 |
| SVM (RBF) | 0.402 | 0.657 | 0.395 |
| Naive Bayes | 0.287 | 0.532 | 0.255 |
| Logistic Regression | 0.284 | 0.526 | 0.235 |
Best Model: Random Forest Classifier
- Accuracy: 90.1%
- ROC-AUC: 0.986
- Random Forest Classifier selected
- Optimized using hyperparameter tuning
- Saved using Pickle for deployment
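The model was saved with:

    import pickle

    # Use a context manager so the file is closed after writing.
    with open("Models/best_rf_model.pkl", "wb") as f:
        pickle.dump(best_rf_model, f)

- Accuracy Score
- Confusion Matrix
- Precision
- Recall
- F1-Score
- ROC-AUC Score
- Feature Importance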
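As a small worked example of the first five metrics, the sketch below scores six hand-made predictions over three grades. The labels are invented for illustration, and macro averaging is an assumption, since the README does not say how per-class precision, recall, and F1 were aggregated.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)

labels = ["A", "B", "C"]
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]

# Fraction of exact matches: 4 of 6 predictions are correct.
acc = accuracy_score(y_true, y_pred)

# Rows are true grades, columns are predicted grades, in the order of `labels`.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Macro average: unweighted mean of the per-class scores (assumed choice).
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, average="macro", zero_division=0
)

print(f"accuracy={acc:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(cm)
```

For tree-based models such as the Random Forest, feature importance is available from the fitted estimator's `feature_importances_` attribute.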
The trained model is deployed using Streamlit.
- User-friendly interface
- Real-time predictions
- Input student data easily
- Instant grade prediction output
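The prediction step behind such an interface amounts to loading the pickled model and passing one row of inputs to it. The sketch below is an assumption about how `app.py` might be organized, not its actual contents: `FEATURE_ORDER`, `load_model`, and `predict_grade` are hypothetical names, and a tiny model trained on made-up data stands in for `Models/best_rf_model.pkl` so the example runs on its own.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature subset; the real model expects all training features, in order.
FEATURE_ORDER = ["StudyHours", "Attendance"]


def load_model(path):
    """Load a pickled model from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict_grade(model, inputs):
    """Turn a dict of user inputs into one feature row and return the predicted grade."""
    row = np.array([[inputs[name] for name in FEATURE_ORDER]], dtype=float)
    return str(model.predict(row)[0])


# Stand-in for the deployed model: a toy forest trained on four made-up students.
X = np.array([[5, 60], [8, 65], [25, 95], [30, 98]], dtype=float)
y = np.array(["F", "F", "A", "A"])
toy_model = RandomForestClassifier(n_estimators=10, bootstrap=False, random_state=0).fit(X, y)

# Round-trip through pickle, as the deployed app would do with the saved file.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "best_rf_model.pkl"
    with open(model_path, "wb") as f:
        pickle.dump(toy_model, f)
    model = load_model(model_path)

grade = predict_grade(model, {"StudyHours": 28, "Attendance": 97})
print("Predicted grade:", grade)
```

In a Streamlit app, the input dict would typically be filled from widgets such as `st.number_input`, and the result shown with something like `st.success`. Any preprocessing applied during training, such as scaling, would need to be applied to the inputs as well.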
- The model depends on the quality and completeness of the dataset
- Limited features may not capture all real-world academic influences
- Performance may vary for students from different educational systems
- Does not include psychological or external environmental factors
- Model may not generalize well to unseen or highly imbalanced data
| Tool | Purpose |
|---|---|
| Python | Programming language |
| Pandas | Data handling |
| NumPy | Numerical computation |
| Scikit-learn | ML models & preprocessing |
| Matplotlib / Seaborn | Visualization |
| Pickle | Model saving |
| Streamlit | Web app deployment |
| Google Colab | Development |

    student-performance-predictor/
    │
    ├── app.py
    ├── Models/
    │   └── best_rf_model.pkl
    ├── notebook.ipynb
    └── README.md

To run the app locally:

    git clone <your-repo-link>
    cd student-performance-predictor
    pip install -r requirements.txt
    streamlit run app.py

This repository was developed as part of a Machine Learning & Data Science program to demonstrate data preprocessing, exploratory data analysis (EDA), model building, hyperparameter tuning, and deployment of a machine learning model using Streamlit for real-time student performance prediction.
Name: Laya Mary Joy
Organization: Entri Elevate
Date: February 14, 2026
Thanks to Entri Elevate for guidance and support in building this project.
- Add more student behavioral features
- Improve model generalization
- Deploy using Docker / Cloud platforms
- Add visualization dashboard