VGG16-based transfer-learning model trained to detect and classify aircraft structural damage from images: the ImageNet-pretrained convolutional base is kept frozen and a new dense classification head is trained on top of it. Integrates a BLIP image captioning layer to generate natural language descriptions of detected damage. Built with TensorFlow/Keras for classification and HuggingFace Transformers for vision-language generation.
This project combines transfer learning and vision-language modelling to build an end-to-end aircraft damage detection pipeline. Given an image of an aircraft, the system:
- Classifies whether the aircraft is damaged or undamaged
- Generates a natural language caption or summary describing the damage
Input Image (128×128×3)
↓
VGG16 Base (frozen, pretrained on ImageNet)
↓
Flatten
↓
Dense(512) + ReLU + Dropout(0.3)
↓
Dense(512) + ReLU + Dropout(0.3)
↓
Dense(1) + Sigmoid
↓
Binary Output (damaged / undamaged)
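The classifier diagram above can be sketched in Keras roughly as follows. This is a minimal sketch, not the code in `main.py`: the function names `build_classifier`, `flattened_features`, and `head_parameter_count` are hypothetical, and TensorFlow is imported lazily so the two arithmetic helpers work without it.

```python
def flattened_features(image_size: int = 128, pools: int = 5, channels: int = 512) -> int:
    """Size of the Flatten output for VGG16 without its top layers.

    VGG16 has five 2x2 max-pooling stages and ends with 512 channels.
    """
    side = image_size // (2 ** pools)
    return side * side * channels


def head_parameter_count(n_features: int, hidden: int = 512) -> int:
    """Trainable parameters in Dense(512) -> Dense(512) -> Dense(1)."""
    dense1 = n_features * hidden + hidden
    dense2 = hidden * hidden + hidden
    output = hidden * 1 + 1
    return dense1 + dense2 + output


def build_classifier(weights="imagenet"):
    """Frozen VGG16 base plus the dense head from the diagram (hypothetical helper)."""
    import tensorflow as tf  # imported here so the helpers above need no TensorFlow

    base = tf.keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=(128, 128, 3)
    )
    base.trainable = False  # feature extraction: only the head is trained

    return tf.keras.Sequential([
        base,
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
```

With 128×128 inputs the five pooling stages shrink the feature map to 4×4×512, so `Flatten` yields 8192 features and, under these assumptions, the head holds 4,457,985 trainable parameters.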
Input Image
↓
BlipProcessor
↓
BlipForConditionalGeneration
↓
Natural Language Caption / Summary
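The captioning path above might look like the following when written directly against HuggingFace Transformers. It is a sketch under stated assumptions: the checkpoint name `Salesforce/blip-image-captioning-base`, the two prompt strings, and the helper names `prompt_for` and `describe_image` do not come from this README and may differ from what `main.py` uses.

```python
# Assumed prompts: the README names the two tasks but not their wording.
PROMPTS = {
    "caption": "This is a picture of",
    "summary": "This is a detailed photo showing",
}


def prompt_for(task: str) -> str:
    """Return the text prompt that conditions BLIP for the given task."""
    if task not in PROMPTS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(PROMPTS)}")
    return PROMPTS[task]


def describe_image(image_path, task="caption",
                   checkpoint="Salesforce/blip-image-captioning-base",
                   max_new_tokens=50):
    """Run BlipProcessor -> BlipForConditionalGeneration -> decoded text."""
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    # Loaded on every call for brevity; a real pipeline would load these once.
    processor = BlipProcessor.from_pretrained(checkpoint)
    model = BlipForConditionalGeneration.from_pretrained(checkpoint)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, text=prompt_for(task), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

Conditioning the generation on a short text prefix is what lets one model serve both the "caption" and the "summary" task.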
Aircraft Damage Using Pretrained Model/
│
├── main.py # Main training and inference script
├── requirements.txt # Project dependencies
├── README.md # Project documentation
│
└── aircraft_damage_dataset_v1/ # Auto-downloaded dataset
    ├── train/
    │   ├── damaged/
    │   └── undamaged/
    ├── valid/
    │   ├── damaged/
    │   └── undamaged/
    └── test/
        ├── damaged/
        └── undamaged/
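Before training, it can help to confirm that the downloaded dataset matches the layout above. The following is a small sketch using only the standard library; the helper name `count_images` and the set of accepted file extensions are assumptions, not part of the project.

```python
from pathlib import Path

SPLITS = ("train", "valid", "test")
CLASSES = ("damaged", "undamaged")
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed extensions


def count_images(root):
    """Return {split: {class: image_count}} and fail fast on a missing folder."""
    root = Path(root)
    counts = {}
    for split in SPLITS:
        counts[split] = {}
        for cls in CLASSES:
            folder = root / split / cls
            if not folder.is_dir():
                raise FileNotFoundError(f"missing folder: {folder}")
            counts[split][cls] = sum(
                1 for p in folder.iterdir()
                if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts
```

Calling `count_images("aircraft_damage_dataset_v1")` also exposes class imbalance between `damaged` and `undamaged` early, which matters when accuracy is the only metric.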
- Python 3.11 (recommended)
- pip
# Clone the repository
git clone https://github.com/yourusername/aircraft-damage-detector.git
cd aircraft-damage-detector
# Create and activate virtual environment
py -3.11 -m venv .venv # Windows
python3.11 -m venv .venv # macOS/Linux
.venv\Scripts\activate # Windows
source .venv/bin/activate # macOS/Linux
# Install dependencies
pip install -r requirements.txt
# Run the training and inference script
python main.py
The script will automatically download and extract the dataset, train the model, evaluate it, and run BLIP captioning on sample images.
| Package | Purpose |
|---|---|
| tensorflow | Model training and Keras layers |
| keras | High-level neural network API |
| torch | Required by BLIP/HuggingFace |
| transformers==4.35.0 | BLIP processor and model |
| Pillow | Image loading and processing |
| numpy | Numerical operations |
| matplotlib | Plotting training curves |
Install all at once:
pip install tensorflow torch transformers==4.35.0 Pillow numpy matplotlib
| Parameter | Value |
|---|---|
| Base model | VGG16 (ImageNet weights) |
| Image size | 128 × 128 |
| Batch size | 32 |
| Epochs | 10 |
| Optimiser | Adam (lr=0.0001) |
| Loss | Binary Crossentropy |
| Metric | Accuracy |
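To make the loss and metric in the table concrete, the sketch below computes binary cross-entropy and accuracy in plain Python and shows one way the listed optimiser settings could be passed to Keras. The helper names are hypothetical, the 0.5 decision threshold is an assumption (it is the Keras default for sigmoid outputs), and `compile_classifier` imports TensorFlow lazily.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[t*log(p) + (1-t)*log(1-p)] over all samples."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


def accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if int(p > threshold) == t)
    return hits / len(y_true)


def compile_classifier(model):
    """Apply the table's settings: Adam (lr=0.0001), binary cross-entropy, accuracy."""
    import tensorflow as tf

    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

For example, a confident correct prediction of 0.9 for a positive sample contributes -ln(0.9), about 0.105, to the loss, while a prediction of 0.4 for the same sample counts as a miss for accuracy.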
- Horizontal flip
- Vertical flip
- Rotation (±20°)
- Rescale (1/255)
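The augmentations listed above map onto Keras `ImageDataGenerator` arguments roughly as shown below. This is a sketch rather than the project's code: the helper names are hypothetical, and applying only rescaling to the validation and test splits is an assumption based on common practice.

```python
def train_augmentation_kwargs():
    """ImageDataGenerator arguments matching the list above."""
    return {
        "rescale": 1.0 / 255,
        "rotation_range": 20,      # random rotation within ±20 degrees
        "horizontal_flip": True,
        "vertical_flip": True,
    }


def eval_augmentation_kwargs():
    """Assumed: evaluation data is only rescaled, never randomly transformed."""
    return {"rescale": 1.0 / 255}


def make_generators(root="aircraft_damage_dataset_v1", image_size=128, batch_size=32):
    """Build train/validation iterators from the directory layout (hypothetical helper)."""
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    common = {
        "target_size": (image_size, image_size),
        "batch_size": batch_size,
        "class_mode": "binary",  # one label per image: damaged vs undamaged
    }
    train = ImageDataGenerator(**train_augmentation_kwargs()).flow_from_directory(
        f"{root}/train", shuffle=True, **common
    )
    valid = ImageDataGenerator(**eval_augmentation_kwargs()).flow_from_directory(
        f"{root}/valid", shuffle=False, **common
    )
    return train, valid
```

Keeping the random transforms out of the evaluation generators means validation and test scores are measured on unmodified images.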
The BlipCaptionSummaryLayer is a custom Keras layer that wraps the BLIP model using tf.py_function to bridge PyTorch and TensorFlow.
The layer supports two tasks, selected by its second argument:
# Generate a short caption
caption = blip_layer(image_path, tf.constant("caption"))
# Generate a detailed summary
summary = blip_layer(image_path, tf.constant("summary"))
After training, the following plots are generated:
- Training vs Validation Loss curve
- Training vs Validation Accuracy curve
- Sample prediction with true vs predicted label
- BLIP caption and summary for test images
transfer-learning binary-classification vgg16 pretrained-models feature-extraction
data-augmentation image-preprocessing blip-image-captioning huggingface-transformers
custom-keras-layers vision-language-models pytorch-tensorflow-bridge deep-learning
computer-vision tensorflow keras
This project is licensed under the MIT License.
- Dataset provided by IBM Skills Network
- VGG16 pretrained weights from ImageNet
- BLIP model by Salesforce Research
- HuggingFace Transformers