CodeAlpha_Credit-Scoring-Model

A Machine Learning-based Credit Scoring System that predicts customer creditworthiness using Logistic Regression and Random Forest. The best model (Random Forest) reaches 78.5% accuracy, supported by feature engineering and financial risk analysis.

Credit Scoring & Credit Risk Prediction using Machine Learning

Project Overview

This project develops a Machine Learning-based Credit Scoring System to evaluate the creditworthiness of loan applicants. By analyzing customer financial and demographic information, the model predicts whether an applicant represents a good or bad credit risk.

The goal is to support data-driven lending decisions, reduce the likelihood of loan defaults, and improve overall risk assessment.

Dataset Information

The dataset contains information related to loan applicants, including:

Account status
Credit history
Loan purpose
Credit amount
Savings account status
Employment duration
Payment-to-income ratio
Age
Housing status
Number of existing credits
Guarantor information
Collateral details

Target Variable:

Good Credit Risk
Bad Credit Risk

Exploratory Data Analysis (EDA)

Performed exploratory analysis to:

Understand feature distributions
Identify class imbalance
Analyze relationships between variables
Detect potential outliers and trends
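
The checks above can be sketched with pandas roughly as follows. This is a minimal illustration on made-up rows, not the notebook's actual code; the column names (`credit_amount`, `age`, `risk`) and the 1.5 × IQR outlier rule are assumptions.

```python
import pandas as pd

# Hypothetical toy data; column names are assumptions, not the project's schema.
df = pd.DataFrame({
    "credit_amount": [1200, 5400, 2300, 800, 15000, 3100],
    "age": [25, 41, 33, 52, 29, 47],
    "risk": ["good", "bad", "good", "good", "bad", "good"],
})

# Class balance: share of each target label.
class_share = df["risk"].value_counts(normalize=True)

# Feature distributions: summary statistics for the numeric columns.
summary = df[["credit_amount", "age"]].describe()

# Relationships between variables: pairwise correlation of numeric features.
correlations = df[["credit_amount", "age"]].corr()

# Potential outliers: flag credit amounts outside 1.5 * IQR of the quartiles.
q1, q3 = df["credit_amount"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["credit_amount"] < q1 - 1.5 * iqr) | (df["credit_amount"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]

print(class_share)
print(outliers)
```

A skewed `class_share` is worth noting early, because it affects how the accuracy figures reported later should be read.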

Feature Engineering

To improve predictive performance, additional financial indicators were created:

Credit Per Month

credit_per_month = credit_amount / month_duration

Credit-Age Ratio

credit_age_ratio = credit_amount / age

Burden Score

burden_score = credit_amount * payment_to_income_ratio

These engineered features help capture an applicant's financial burden and repayment capability.
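
As a minimal sketch, the three formulas above translate directly to pandas column arithmetic. The helper name `add_engineered_features` and the sample rows are hypothetical; the column names follow the formulas in this section.

```python
import pandas as pd

# Hypothetical sample rows; values are illustrative only.
df = pd.DataFrame({
    "credit_amount": [1200, 6000, 2400],
    "month_duration": [12, 24, 6],
    "age": [30, 40, 24],
    "payment_to_income_ratio": [2, 4, 1],
})

def add_engineered_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of frame with the three engineered columns added."""
    out = frame.copy()
    out["credit_per_month"] = out["credit_amount"] / out["month_duration"]
    out["credit_age_ratio"] = out["credit_amount"] / out["age"]
    out["burden_score"] = out["credit_amount"] * out["payment_to_income_ratio"]
    return out

features = add_engineered_features(df)
print(features[["credit_per_month", "credit_age_ratio", "burden_score"]])
```

Working on a copy keeps the raw columns untouched, so the engineered features can be recomputed or dropped without reloading the data.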

Data Preprocessing

Label Encoding for ordinal categorical features
One-Hot Encoding for nominal categorical features
Feature Scaling using StandardScaler
Train-Test Split for model evaluation
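
One way to wire these steps together in scikit-learn is sketched below. It is an illustration under assumptions, not the notebook's code: the column names, category labels, and split size are made up, and `OrdinalEncoder` stands in for label encoding because it works on feature columns and accepts an explicit category order.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Hypothetical rows; column names and categories are assumptions.
df = pd.DataFrame({
    "employment_duration": ["<1y", "1-4y", ">=7y", "4-7y", "1-4y", "<1y", ">=7y", "4-7y"],
    "purpose": ["car", "education", "furniture", "car", "car", "education", "furniture", "car"],
    "credit_amount": [1200, 5400, 2300, 800, 7000, 3100, 4500, 1500],
    "risk": [1, 0, 1, 1, 0, 1, 0, 1],  # 1 = good, 0 = bad
})
X = df.drop(columns="risk")
y = df["risk"]

# Split first so encoders and the scaler are fitted on training data only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

employment_order = ["<1y", "1-4y", "4-7y", ">=7y"]
preprocess = ColumnTransformer(
    transformers=[
        ("ordinal", OrdinalEncoder(categories=[employment_order]), ["employment_duration"]),
        ("nominal", OneHotEncoder(handle_unknown="ignore"), ["purpose"]),
        ("numeric", StandardScaler(), ["credit_amount"]),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

X_train_t = preprocess.fit_transform(X_train)
X_test_t = preprocess.transform(X_test)
print(X_train_t.shape, X_test_t.shape)
```

Fitting the transformer on the training split and only transforming the test split avoids leaking test-set statistics into the scaler.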

Machine Learning Models

The following classification algorithms were trained and evaluated:

Logistic Regression

A baseline linear classification model used for credit risk prediction.

Random Forest Classifier

An ensemble learning model that combines multiple decision trees to improve prediction accuracy and robustness.
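
Training both models could look roughly like this. The sketch uses synthetic data from `make_classification` as a stand-in for the credit dataset, and the hyperparameters (`max_iter=1000`, `n_estimators=200`) are assumptions rather than the settings used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the credit data (about 70% of one class, 30% of the other).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5,
    weights=[0.3, 0.7], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Logistic Regression benefits from scaled inputs, so it sits behind a scaler.
log_reg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Tree ensembles are insensitive to feature scale, so no scaler is needed here.
forest = RandomForestClassifier(n_estimators=200, random_state=0)

log_reg.fit(X_train, y_train)
forest.fit(X_train, y_train)

print("Logistic Regression accuracy:", log_reg.score(X_test, y_test))
print("Random Forest accuracy:", forest.score(X_test, y_test))
```

The printed numbers come from synthetic data and are not comparable to the results reported in this README.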

Model Performance

Model	Accuracy
Logistic Regression	77%
Random Forest Classifier	78.5%

The Random Forest model achieved the higher accuracy (78.5% vs. 77%) and was selected as the final model.
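
Accuracy alone can hide weak detection of bad credit risks when the classes are imbalanced. The sketch below shows how accuracy, precision, recall, and a confusion matrix can be computed together; the labels are hypothetical and are not the project's predictions.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hypothetical labels: 1 = good risk, 0 = bad risk.
y_true = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)

# Treat "bad risk" (0) as the class of interest for precision and recall.
bad_precision = precision_score(y_true, y_pred, pos_label=0)
bad_recall = recall_score(y_true, y_pred, pos_label=0)

# Rows are true classes, columns are predicted classes, ordered [bad, good].
matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])

print(f"accuracy={accuracy:.2f}, bad precision={bad_precision:.2f}, bad recall={bad_recall:.2f}")
print(matrix)
```

In this toy case 7 of 10 predictions are correct, yet only 1 of the 3 bad-risk applicants is caught, which is the kind of gap an accuracy-only comparison would miss.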

Technologies Used

Python
Pandas
NumPy
Matplotlib
Scikit-learn

Project Workflow

Data Collection
Data Cleaning and Preprocessing
Exploratory Data Analysis
Feature Engineering
Model Training
Model Evaluation
Performance Comparison

Key Outcomes

Built a Credit Risk Prediction System
Applied Feature Engineering to financial data
Compared multiple Machine Learning models
Evaluated model performance using classification metrics
Automated creditworthiness assessment

Future Improvements

Hyperparameter Tuning using GridSearchCV
Implementation of XGBoost and LightGBM
Credit Score Generation (300–850 scale)
Deployment using Streamlit or Flask
Real-time Credit Risk Assessment Dashboard
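
Two of these items can be sketched briefly: tuning a Random Forest with `GridSearchCV`, and mapping predicted probabilities onto a 300–850 scale. Both parts are assumptions for illustration: the parameter grid is arbitrary, the data is synthetic, and the linear mapping in the hypothetical helper `probability_to_score` is only one simple choice, not an established scoring formula.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; class 1 is treated as "good risk" here.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Small illustrative grid; real searches usually cover more values.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

def probability_to_score(p_good, low=300, high=850):
    """Linearly map P(good risk) in [0, 1] to an integer score in [low, high]."""
    p = np.clip(np.asarray(p_good, dtype=float), 0.0, 1.0)
    return np.rint(low + (high - low) * p).astype(int)

# Column 1 of predict_proba is the probability of class 1 ("good risk").
p_good = search.best_estimator_.predict_proba(X)[:, 1]
scores = probability_to_score(p_good)

print("Best parameters:", search.best_params_)
print("Score range:", scores.min(), "to", scores.max())
```

A production score would typically need calibrated probabilities and a validated scaling scheme; the linear map only shows where such a step would fit.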

Author

Janani M

Machine Learning | Data Science | Python
