A machine learning-based credit scoring system that predicts customer creditworthiness using Logistic Regression and Random Forest, reaching 78.5% accuracy with the Random Forest model through feature engineering and financial risk analysis.
This project develops a Machine Learning-based Credit Scoring System to evaluate the creditworthiness of loan applicants. By analyzing customer financial and demographic information, the model predicts whether an applicant represents a good or bad credit risk.
The goal is to support data-driven lending decisions, reduce the likelihood of loan defaults, and improve overall risk assessment.
The dataset contains information related to loan applicants, including:
- Account status
- Credit history
- Loan purpose
- Credit amount
- Savings account status
- Employment duration
- Payment-to-income ratio
- Age
- Housing status
- Number of existing credits
- Guarantor information
- Collateral details
Target Variable:
- Good Credit Risk
- Bad Credit Risk
Performed exploratory analysis to:
- Understand feature distributions
- Identify class imbalance
- Analyze relationships between variables
- Detect potential outliers and trends
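The checks above can be sketched with pandas. This is a minimal illustration on a made-up table: the column names (`credit_amount`, `age`, `credit_risk`), the values, and the 1.5 × IQR outlier rule are assumptions, not the project's actual data or method.

```python
import pandas as pd

# Hypothetical toy data; the real dataset and its column names are not shown here.
df = pd.DataFrame({
    "credit_amount": [1200, 5000, 2500, 18000, 3100, 900],
    "age": [25, 41, 33, 52, 29, 61],
    "credit_risk": ["good", "good", "bad", "good", "good", "bad"],
})

# Class balance: share of each target label.
class_share = df["credit_risk"].value_counts(normalize=True)

# Feature distributions and pairwise relationships for the numeric columns.
summary = df[["credit_amount", "age"]].describe()
correlations = df[["credit_amount", "age"]].corr()


def iqr_outliers(series: pd.Series) -> pd.Series:
    """Flag values outside 1.5 * IQR of the quartiles (one common rule of thumb)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - 1.5 * iqr) | (series > q3 + 1.5 * iqr)


outlier_mask = iqr_outliers(df["credit_amount"])

print(class_share)
print(summary)
print(correlations)
print(df[outlier_mask])
```

On this toy table the only flagged row is the 18000 credit amount; plots with Matplotlib would normally accompany these numeric summaries.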
To improve predictive performance, additional financial indicators were created:
- `credit_per_month = credit_amount / month_duration`
- `credit_age_ratio = credit_amount / age`
- `burden_score = credit_amount * payment_to_income_ratio`

These engineered features help capture an applicant's financial burden and repayment capability.
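A minimal pandas sketch of these three engineered features follows. The formulas are the ones given above; the helper name `add_engineered_features` and the sample values are hypothetical, and the input is assumed to already contain the four source columns.

```python
import pandas as pd


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the three engineered columns added."""
    out = df.copy()
    out["credit_per_month"] = out["credit_amount"] / out["month_duration"]
    out["credit_age_ratio"] = out["credit_amount"] / out["age"]
    out["burden_score"] = out["credit_amount"] * out["payment_to_income_ratio"]
    return out


# Hypothetical applicants, for illustration only.
applicants = pd.DataFrame({
    "credit_amount": [2400, 6000],
    "month_duration": [12, 24],
    "age": [30, 40],
    "payment_to_income_ratio": [4, 2],
})

features = add_engineered_features(applicants)
print(features[["credit_per_month", "credit_age_ratio", "burden_score"]])
```

For the first applicant this gives 2400 / 12 = 200 per month, 2400 / 30 = 80 for the credit-to-age ratio, and 2400 * 4 = 9600 for the burden score.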
Data preprocessing included:
- Label Encoding for ordinal categorical features
- One-Hot Encoding for nominal categorical features
- Feature Scaling using StandardScaler
- Train-Test Split for model evaluation
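One way these preprocessing steps might be wired together in scikit-learn is sketched below. The column names, category order, and toy rows are assumptions. `OrdinalEncoder` stands in for label encoding of ordinal features here, because scikit-learn's `LabelEncoder` is intended for target labels rather than input columns; the project itself may have done this differently.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Hypothetical toy data; 1 = good credit risk, 0 = bad credit risk.
df = pd.DataFrame({
    "employment_duration": ["<1y", "1-4y", ">=7y", "4-7y", "1-4y", "<1y", ">=7y", "4-7y"],
    "housing": ["rent", "own", "own", "free", "rent", "own", "own", "rent"],
    "credit_amount": [1200, 5000, 2500, 8000, 3100, 900, 4200, 2600],
    "age": [25, 41, 33, 52, 29, 61, 45, 38],
    "credit_risk": [1, 1, 0, 1, 1, 0, 1, 0],
})
X = df.drop(columns="credit_risk")
y = df["credit_risk"]

# Split first so the encoders and scaler are fitted on training data only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

preprocessor = ColumnTransformer([
    # Ordinal feature: integer codes that follow an explicit category order.
    ("ordinal", OrdinalEncoder(categories=[["<1y", "1-4y", "4-7y", ">=7y"]]),
     ["employment_duration"]),
    # Nominal feature: one indicator column per category seen in training.
    ("nominal", OneHotEncoder(handle_unknown="ignore"), ["housing"]),
    # Numeric features: zero mean and unit variance.
    ("numeric", StandardScaler(), ["credit_amount", "age"]),
])

X_train_prepared = preprocessor.fit_transform(X_train)
X_test_prepared = preprocessor.transform(X_test)
print(X_train_prepared.shape, X_test_prepared.shape)
```

Fitting the transformers on the training split and only applying them to the test split keeps test-set statistics out of the preprocessing step.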
The following classification algorithms were trained and evaluated:
- Logistic Regression: a baseline linear classification model used for credit risk prediction.
- Random Forest Classifier: an ensemble learning model that combines multiple decision trees to improve prediction accuracy and robustness.
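Training and scoring the two models could look roughly like the sketch below. It uses synthetic data from `make_classification` as a stand-in for the credit dataset, and the hyperparameters (`max_iter`, `n_estimators`, the 80/20 split) are assumptions rather than the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, mildly imbalanced stand-in data (not the real credit dataset).
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.3, 0.7], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    # Scaling matters for the linear model, so it is bundled into a pipeline.
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    # Tree ensembles do not need scaled inputs.
    "Random Forest Classifier": RandomForestClassifier(
        n_estimators=200, random_state=42
    ),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = model.score(X_test, y_test)  # accuracy on held-out data
    print(f"{name}: {accuracies[name]:.3f}")
```

Scores from this synthetic data are not comparable to the results reported for the project.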
| Model | Accuracy |
|---|---|
| Logistic Regression | 77% |
| Random Forest Classifier | 78.5% |
The Random Forest model achieved the higher accuracy (78.5% versus 77% for Logistic Regression) and was selected as the final model.
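Because accuracy alone can hide how well the minority class is handled, a comparison like this is often supplemented with a confusion matrix and per-class recall. The sketch below uses made-up labels and predictions, with 1 assumed to mean good credit risk and 0 bad credit risk; it does not reproduce the project's results.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    recall_score,
)

# Hypothetical labels: 1 = good credit risk, 0 = bad credit risk.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 1, 1]

accuracy = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes, ordered [bad, good].
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

# Share of truly bad applicants that the model flags as bad.
bad_recall = recall_score(y_true, y_pred, pos_label=0)

print(f"accuracy: {accuracy:.2f}")
print(cm)
print(f"recall on bad-risk class: {bad_recall:.2f}")
print(classification_report(y_true, y_pred, target_names=["bad", "good"]))
```

In this toy case 7 of 10 predictions are correct, yet only 2 of the 4 bad-risk applicants are caught, which is the kind of gap a single accuracy figure does not show.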
Tools and libraries:
- Python
- Pandas
- NumPy
- Matplotlib
- Scikit-learn
Project workflow:
- Data Collection
- Data Cleaning and Preprocessing
- Exploratory Data Analysis
- Feature Engineering
- Model Training
- Model Evaluation
- Performance Comparison
Key outcomes:
- Built a Credit Risk Prediction System
- Applied Feature Engineering to financial data
- Compared multiple Machine Learning models
- Evaluated model performance using classification metrics
- Automated creditworthiness assessment
Possible future enhancements:
- Hyperparameter Tuning using GridSearchCV
- Implementation of XGBoost and LightGBM
- Credit Score Generation (300–850 scale)
- Deployment using Streamlit or Flask
- Real-time Credit Risk Assessment Dashboard
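Hyperparameter tuning with GridSearchCV, listed above, might look roughly like this. It is a minimal sketch on synthetic data: the parameter grid, the 3-fold cross-validation, and the accuracy scoring are assumptions, not settings taken from the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (not the real credit dataset).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# A deliberately small, hypothetical grid to keep the search fast.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,                # 3-fold cross-validation for each combination
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.3f}")
```

After fitting, `search.best_estimator_` holds a model refitted on all the supplied data with the best parameter combination.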
Janani M
Machine Learning | Data Science | Python