(Mask R-CNN + Camera Calibration)
This project implements a full computer vision pipeline for detecting, segmenting, and measuring the real-world dimensions of a target object (here, an envelope) using:
- Camera calibration (lens distortion removal)
- Mask R-CNN instance segmentation
- Pixel-to-mm conversion using a reference object
- Automated width/height measurement system
The system outputs real-world measurements (mm) from images using deep learning + classical vision.
- Camera calibration using checkerboard images
- Image undistortion for geometric accuracy
- Envelope detection using Mask R-CNN (PyTorch)
- Pixel-to-mm conversion using reference ID card
- Automatic width & height measurement
- Visualization of results with annotations
- Full dataset pipeline (train/val/test split)
- Evaluation metrics and error analysis
xis_assessment/
├── calibration/
├── dataset/
├── models/
├── measurement/
├── docs/
├── requirements.txt
└── README.md

See SETUP.md for more detail.
- Object: Envelope
- Dimensions: 139 mm × 90 mm
- Task: Instance Segmentation + Measurement
- Framework: PyTorch (Mask R-CNN)
- Checkerboard-based calibration using OpenCV
- Computes:
- Camera matrix
- Distortion coefficients
- Removes lens distortion for accurate measurement
- Images captured using smartphone camera
- Conditions:
- Different lighting
- Multiple angles
- Varied backgrounds
- Annotation tool: CVAT (COCO format)
- Split:
- Train: 70%
- Validation: 20%
- Test: 10%
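
The 70/20/10 split above can be reproduced with a seeded shuffle. This is a minimal sketch, not the project's actual split script: the function name `split_dataset`, the seed, and the example file names are all hypothetical.

```python
import random


def split_dataset(items, train=0.7, val=0.2, seed=42):
    """Shuffle items deterministically and split into train/val/test.

    The test set receives whatever remains after train and val,
    so the three parts always cover every item exactly once.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG, global state untouched
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


# Hypothetical image names, used only to illustrate the split sizes.
image_ids = [f"img_{i:03d}.jpg" for i in range(100)]
train_set, val_set, test_set = split_dataset(image_ids)
print(len(train_set), len(val_set), len(test_set))
```

Fixing the seed keeps the split stable across runs, so evaluation numbers stay comparable between training sessions.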
- Model: Mask R-CNN (ResNet-50-FPN)
- Framework: PyTorch
- Task: Instance segmentation of envelope
Hyperparameters:
| Parameter | Value |
|---|---|
| Epochs | 50 |
| Batch Size | 2 |
| Learning Rate | 0.005 |
- ID Card (85.6 mm × 54.0 mm)
- Used for pixel-to-mm conversion
- Detected using HSV color segmentation
- Detect reference card
- Compute pixels per mm
- Detect envelope using Mask R-CNN
- Extract mask and contour
- Fit rotated bounding box (`minAreaRect`)
- Convert pixels → mm
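
The last two steps reduce to simple arithmetic once both rotated boxes are known. The sketch below assumes the side lengths of each box (in pixels) are already available, for example from `minAreaRect`; the function names and the choice to average the scale over both card sides are assumptions, not necessarily what `measurement/measurement.py` does.

```python
# Reference card size in mm (long side, short side), as given in the text.
CARD_MM = (85.6, 54.0)


def pixels_per_mm(card_sides_px):
    """Estimate the image scale from the reference card's rotated box.

    A rotated box does not guarantee which side comes first,
    so the sides are sorted before being matched to the card dimensions.
    """
    long_px, short_px = max(card_sides_px), min(card_sides_px)
    # Assumption: average the two per-side estimates to reduce noise.
    return (long_px / CARD_MM[0] + short_px / CARD_MM[1]) / 2


def sides_in_mm(object_sides_px, scale):
    """Convert an object's box sides from pixels to mm (long side first)."""
    long_px, short_px = max(object_sides_px), min(object_sides_px)
    return long_px / scale, short_px / scale


# Illustrative pixel values only: a card seen at 5 px/mm.
scale = pixels_per_mm((428.0, 270.0))
long_mm, short_mm = sides_in_mm((450.0, 695.0), scale)
print(f"{scale:.2f} px/mm, envelope {long_mm:.1f} mm x {short_mm:.1f} mm")
```

Because the scale comes from a card lying in the same plane as the envelope, the conversion only holds while both objects are flat and at a similar distance from the camera.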
| Metric | Value |
|---|---|
| mAP@0.5 | 1.00 |
| mAP@0.5:0.95 | 0.9723 |
| Mean IoU | 0.957 |
| Precision | 1.00 |
| Recall | 1.00 |
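
For reference, Mean IoU averages the overlap between each predicted mask and its ground-truth mask. The sketch below uses plain nested lists to stay dependency-free; in practice NumPy arrays would be the usual choice. The function names are hypothetical, and this is not the code behind `models/evaluate.py`.

```python
def mask_iou(pred, truth):
    """Intersection over union of two binary masks with the same shape."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Convention chosen here: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(pairs):
    """Average IoU over (predicted, ground-truth) mask pairs."""
    scores = [mask_iou(pred, truth) for pred, truth in pairs]
    return sum(scores) / len(scores)


# Tiny illustrative masks: 3 overlapping pixels, 4 pixels in the union.
pred_mask = [[0, 1, 1],
             [0, 1, 1]]
true_mask = [[0, 0, 1],
             [0, 1, 1]]
iou = mask_iou(pred_mask, true_mask)
print(iou)
```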
| Metric | Width | Height |
|---|---|---|
| MAE | 3.23 mm | 7.01 mm |
| MPE | 3.59% | 5.04% |
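
The two error metrics can be computed as below. MPE is read here as the mean absolute percentage error, which is an assumption; it matches the table if width is the 90 mm side and height the 139 mm side, since 3.23 / 90 ≈ 3.59% and 7.01 / 139 ≈ 5.04%. The function names and sample predictions are illustrative, not project results.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference, in the same unit as the inputs (mm)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def mean_percentage_error(predicted, actual):
    """Average absolute error relative to the true value, in percent."""
    return 100 * sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)


# Made-up width measurements against the 90 mm ground truth.
measured_widths = [92.0, 87.0]
true_widths = [90.0, 90.0]

mae = mean_absolute_error(measured_widths, true_widths)
mpe = mean_percentage_error(measured_widths, true_widths)
print(f"MAE = {mae:.2f} mm, MPE = {mpe:.2f}%")
```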
- Requires reference card in every image
- Sensitive to lighting and color similarity (HSV detection)
- Assumes flat object plane
- Accuracy decreases with increased camera distance
git clone
cd xis_assessment
pip install -r requirements.txt
python calibration/calibrate.py
python calibration/undistort_dataset.py
python models/train.py
python models/evaluate.py
python measurement/measurement.py

Trained model: models/maskrcnn_final.pth
Loss curves: models/loss_curves.png
Evaluation metrics: models/evaluation_results.json
Measurement results: measurement/outputs/
- Python
- PyTorch
- Torchvision (Mask R-CNN)
- OpenCV
- NumPy
- Matplotlib
- CVAT (annotation tool)