layamaryjoy/Codeathon-Intermediate-Assessment

🚗 Car Price Prediction – Regression Model Building

A supervised machine learning project focused on implementing and comparing multiple regression algorithms to predict car prices in the American automobile market.


🚀 Run Notebook in Google Colab

Click below to open the notebook:

Open In Colab


📘 Project Overview

This project is based on a business problem where an automobile company plans to enter the US market and aims to understand key factors influencing car prices.

The complete pipeline includes:

  • Data Cleaning

  • Exploratory Data Analysis (EDA)

  • Feature Engineering

  • Feature Scaling

  • Model Building

  • Model Evaluation

  • Hyperparameter Tuning

  • Model Comparison

The goal is to build an accurate regression model to predict car prices and support data-driven pricing strategies.


🎯 Objective

The main objectives of this project are:

🔹 Identify variables affecting car price

🔹 Analyze relationships between features and price

🔹 Build multiple regression models

🔹 Compare model performances

🔹 Optimize the best model using hyperparameter tuning

🔹 Provide business insights for pricing strategy


📂 Dataset Description

| Component | Description |
|---|---|
| Records | 205 cars |
| Features | 25+ independent variables |
| Target Variable | price |
| Data Type | Numerical and Categorical |

Key Features:

  • Engine size

  • Horsepower

  • Fuel type

  • Drive wheel type

  • Car dimensions

  • Brand

  • Mileage (fuel economy, e.g. highwaympg)

  • Technical specifications


🧹 Data Preprocessing

The following preprocessing steps were performed:

✔ Loaded dataset using Pandas

✔ Initial exploration (shape, info, summary statistics)

✔ Handled missing values

✔ Removed duplicate records

✔ Detected and handled outliers

✔ Feature engineering (brand extraction)

✔ Encoded categorical variables

✔ Split dataset into Training (80%) and Testing (20%)

✔ Scaled features using StandardScaler
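The steps above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the tiny DataFrame stands in for CarPrice_Assignment.csv, and the column names (CarName, fueltype, enginesize, horsepower), the "first word of the name is the brand" rule, and the random_state value are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real CSV; column names are assumed, not confirmed.
df = pd.DataFrame({
    "CarName": ["audi 100ls", "bmw x3", "audi 5000", "toyota corolla",
                "bmw 320i", "toyota corona", "audi 4000", "bmw x1",
                "toyota celica", "audi fox"],
    "fueltype": ["gas", "gas", "diesel", "gas", "gas",
                 "diesel", "gas", "gas", "gas", "diesel"],
    "enginesize": [109, 164, 136, 98, 108, 110, 109, 164, 146, 97],
    "horsepower": [102, 121, 110, 70, 101, 73, 102, 121, 116, 78],
    "price": [13950, 20970, 17710, 6938, 16430, 7898, 13950, 21105, 9989, 8495],
})

# Remove exact duplicate rows (none in this toy data).
df = df.drop_duplicates()

# Brand extraction: assume the brand is the first word of the car name.
df["brand"] = df["CarName"].str.split().str[0].str.lower()
df = df.drop(columns="CarName")

# One-hot encode the categorical columns; dtype=float keeps the matrix numeric.
X = pd.get_dummies(df.drop(columns="price"),
                   columns=["fueltype", "brand"],
                   drop_first=True, dtype=float)
y = df["price"]

# 80/20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training set only, then apply it to both sets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

Fitting the scaler on the training split only keeps test-set statistics out of the transformation.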


📊 Exploratory Data Analysis (EDA)

Visualizations used:

  • Histogram

  • Box Plot

  • Correlation Heatmap

  • Scatter Plot

Insights:

  • Identified strong relationships between engine size, horsepower, and price

  • Detected outliers affecting model performance

  • Understood feature distributions
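As a rough sketch of the numbers behind these plots, the snippet below computes each feature's correlation with price and flags outliers with the 1.5 × IQR rule that box plots use. The data is synthetic, the column names are assumed, and the IQR rule is only one possible choice; the README does not state which outlier method the notebook uses.

```python
import numpy as np
import pandas as pd

# Synthetic data in which price depends on engine size (illustrative only).
rng = np.random.default_rng(0)
n = 50
enginesize = rng.normal(130, 40, n)
df = pd.DataFrame({
    "enginesize": enginesize,
    "horsepower": enginesize * 0.8 + rng.normal(0, 10, n),
    "price": enginesize * 100 + rng.normal(0, 1000, n),
})

# Pearson correlation of every feature with the target; the full df.corr()
# matrix is what a correlation heatmap would display.
corr_with_price = (
    df.corr()["price"].drop("price").sort_values(ascending=False)
)

# Inject one extreme price so the outlier rule has something to find.
df.loc[0, "price"] = 100_000


def iqr_outliers(series, k=1.5):
    """Return a boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


mask = iqr_outliers(df["price"])
print(corr_with_price)
print("Outliers flagged:", int(mask.sum()))
```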


🤖 Regression Models Implemented

The following models were trained and evaluated:

  • Linear Regression

  • Decision Tree Regressor

  • Random Forest Regressor

  • Gradient Boosting Regressor

  • Support Vector Regressor (SVR)

  • Pruned Decision Tree
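One way to train and score these six models side by side is sketched below. It runs on synthetic data from make_regression, so the scores will not match the tables in this README, and the settings for the pruned tree (max_depth=4, min_samples_leaf=5) are hypothetical since the actual pruning parameters are not stated here.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data with the same number of rows as the car dataset.
X, y = make_regression(n_samples=205, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree Regressor": DecisionTreeRegressor(random_state=42),
    "Random Forest Regressor": RandomForestRegressor(random_state=42),
    "Gradient Boosting Regressor": GradientBoostingRegressor(random_state=42),
    "Support Vector Regressor": SVR(),
    # Hypothetical pruning settings, chosen only for illustration.
    "Pruned Decision Tree": DecisionTreeRegressor(
        max_depth=4, min_samples_leaf=5, random_state=42
    ),
}

results = {}
for name, model in models.items():
    # Scaling lives inside the pipeline so it is fitted on training data only.
    pipe = make_pipeline(StandardScaler(), model)
    pipe.fit(X_train, y_train)
    pred = pipe.predict(X_test)
    results[name] = {
        "r2": r2_score(y_test, pred),
        "mse": mean_squared_error(y_test, pred),
        "mae": mean_absolute_error(y_test, pred),
    }

# Print models from highest to lowest R².
for name, m in sorted(results.items(), key=lambda kv: kv[1]["r2"], reverse=True):
    print(f"{name:30s} R2={m['r2']:.3f} MSE={m['mse']:.1f} MAE={m['mae']:.1f}")
```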


📊 Model Performance (Before Tuning)

| Model | R² Score | MSE | MAE |
|---|---|---|---|
| Random Forest Regressor | 0.913 | 0.0136 | 0.0882 |
| Gradient Boosting Regressor | 0.908 | 0.0143 | 0.0838 |
| Linear Regression | 0.879 | 0.0189 | 0.1054 |
| Decision Tree Regressor | 0.860 | 0.0219 | 0.1062 |
| Support Vector Regressor | 0.831 | 0.0263 | 0.1051 |
| Pruned Decision Tree | 0.831 | 0.0263 | 0.0958 |

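✅ Best Model: Random Forest Regressor (highest R², lowest MSE)

❌ Lowest R²: Support Vector Regressor and Pruned Decision Tree (tied at 0.831; of the two, the Pruned Decision Tree has the lower MAE)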


⚙️ Hyperparameter Tuning

  • Applied GridSearchCV

  • Used Pipeline to prevent data leakage

  • Performed 5-Fold Cross Validation

Parameters Tuned:

  • n_estimators

  • max_depth

  • min_samples_split
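A minimal sketch of this setup is shown below, assuming a StandardScaler step followed by a Random Forest inside the Pipeline. The candidate values in the grid and the synthetic data are placeholders; the grid actually searched in the notebook is not listed in this README.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the prepared car features.
X, y = make_regression(n_samples=205, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Because the scaler is a pipeline step, it is refitted on the training part
# of every CV fold, so validation folds never influence the scaling.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("rf", RandomForestRegressor(random_state=42)),
])

# Placeholder candidate values; "rf__" routes each parameter to the rf step.
param_grid = {
    "rf__n_estimators": [50, 100],
    "rf__max_depth": [None, 10],
    "rf__min_samples_split": [2, 5],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)

best_params = search.best_params_
test_r2 = search.score(X_test, y_test)  # R² of the refitted best pipeline
print(best_params, round(test_r2, 3))
```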


📊 Model Performance (After Tuning)

| Model | R² Score | MSE | MAE |
|---|---|---|---|
| Random Forest (Untuned) | 0.9129 | 0.0136 | 0.0882 |
| Random Forest (Tuned) | 0.9131 | 0.0136 | 0.0847 |

🏆 Best Model: Tuned Random Forest Regressor

Final Metrics:

  • R² Score: 0.9131

  • MSE: 0.0136

  • MAE: 0.0847

✅ Hyperparameter tuning gave a small improvement: MAE fell from 0.0882 to 0.0847, while R² (0.9129 to 0.9131) and MSE (0.0136) were essentially unchanged


🏆 Final Model

✔ Random Forest Regressor selected

✔ Optimized using hyperparameter tuning

✔ Provides stable and accurate predictions


🔍 Feature Importance Analysis

Important features influencing car price:

  • Engine size

  • Curb weight

  • Horsepower

  • Car width

  • highwaympg

These features play a major role in determining car pricing strategy.
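A ranking like this can be read from a fitted Random Forest through its feature_importances_ attribute. The sketch below uses synthetic data with assumed column names, so it only illustrates the mechanism and does not reproduce the ranking above; impurity-based importance is assumed, since the README does not say which importance measure was used.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic features: price is driven mostly by enginesize, a little by
# curbweight, and not at all by the "noise" column.
rng = np.random.default_rng(0)
n = 205
X = pd.DataFrame({
    "enginesize": rng.normal(127, 40, n),
    "curbweight": rng.normal(2555, 500, n),
    "noise": rng.normal(0, 1, n),
})
y = 100 * X["enginesize"] + 2 * X["curbweight"] + rng.normal(0, 500, n)

rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X, y)

# Impurity-based importances, normalized by scikit-learn to sum to 1.
importances = pd.Series(
    rf.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

Impurity-based importances can overstate correlated or high-cardinality features, so they are best treated as a rough guide.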


📊 Evaluation Metrics

  • R² Score

  • Mean Squared Error (MSE)

  • Mean Absolute Error (MAE)
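For reference, the three metrics can be computed by hand as in the sketch below. The helper name regression_metrics and the four sample values are made up for illustration; scikit-learn's r2_score, mean_squared_error, and mean_absolute_error give the same results.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute R², MSE, and MAE from true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    mse = np.mean(residuals ** 2)        # mean of squared errors
    mae = np.mean(np.abs(residuals))     # mean of absolute errors

    ss_res = np.sum(residuals ** 2)                  # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
    r2 = 1.0 - ss_res / ss_tot

    return {"r2": r2, "mse": mse, "mae": mae}


metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0])
print(metrics)  # r2 = 0.975, mse = 0.125, mae = 0.25
```

Higher is better for R²; lower is better for MSE and MAE, and MSE penalizes large errors more heavily than MAE.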


⚠️ Limitations

  • Dataset size is relatively small (205 records)

  • Limited features may not capture all real-world pricing factors

  • Market dynamics and external economic factors are not included

  • Model may not generalize well to different regions or time periods

  • Performance may vary for unseen or highly diverse car categories


🛠 Tech Stack

| Tool | Purpose |
|---|---|
| Python | Programming language |
| Pandas | Data handling |
| NumPy | Numerical computation |
| Matplotlib / Seaborn | Visualization |
| Scikit-learn | ML models & preprocessing |
| Google Colab | Development |

📁 Repository Structure

car-price-prediction/
│
├── CarPrice_Assignment.csv
├── Car_Price_Prediction.ipynb
├── README.md


🚀 How to Run the Project

1️⃣ Open Notebook

Click the Google Colab link above


2️⃣ Install Dependencies

pip install pandas numpy matplotlib seaborn scikit-learn

3️⃣ Run the Notebook

  • Execute all cells step-by-step

  • Analyze model performance


📌 Business Insight

This project helps:

  • Identify key drivers of car pricing

  • Optimize product features

  • Support strategic pricing decisions

  • Enable data-driven business planning


📌 Academic Submission

This project was created as part of a Machine Learning & Data Science program, showcasing end-to-end regression modeling, including data preprocessing, EDA, feature engineering, model comparison, and optimization for car price prediction.


👤 Author

Name: Laya Mary Joy

Organization: Entri Elevate

Date: February 14, 2026


⭐ Acknowledgment

Thanks to Entri Elevate for guidance and support.


📌 Future Improvements

  • Use larger and more diverse datasets

  • Include real-time market data

  • Try advanced models (XGBoost, LightGBM)

  • Deploy as a web application

