title: DamageLensAI
sdk: docker
emoji: ⚡
colorFrom: red
colorTo: purple
pinned: true
🚗 DamageLens: AI-Powered Car Damage Detection
⚠️ Important Notes
Cold Startup Time: The API may take 4-5 minutes on the first request to warm up the models. Subsequent predictions will be significantly faster.
Model Size: The Fusion model is computationally intensive. Individual predictions typically complete in 30-60 seconds depending on hardware.
APP LINK : https://junaidariie.github.io/DamageLensAI/
HF REPO : https://huggingface.co/spaces/junaid17/DamageLensAI/tree/main
📋 Table of Contents
- Overview
- Features
- Architecture
- Model Performance
- CI Pipeline
- Setup & Installation
- Usage
- API Documentation
- Model Optimization
- Dataset & Training
- Web UI Features
- Directory Structure
- Limitations & Known Issues
🎯 Overview
DamageLens is an AI system for detecting and classifying car damage across vehicle front and rear sections. It offers two classifiers: a lightweight ResNet-18 model and a higher-accuracy Fusion model that concatenates features from EfficientNet-V2-S and ConvNeXt-Small.
The system can identify six damage categories:
- ✅ Front Normal / Front Breakage / Front Crushed
- ✅ Rear Normal / Rear Breakage / Rear Crushed
Additionally, it uses YOLO object detection to localize damage regions with bounding boxes.
✨ Features
| Feature | Description |
|---|---|
| Dual Model Architecture | ResNet (lightweight) and Fusion (high-accuracy) options |
| Grad-CAM Visualization | Understand which image regions drive predictions |
| Real-time YOLO Detection | Localize damage with confidence scores |
| FP16 Optimization | Reduced model size (788MB → 135MB) with minimal accuracy loss |
| FastAPI Backend | High-performance REST API with async support |
| Responsive Web UI | Modern, interactive web interface with real-time feedback |
| Static File Serving | Efficient caching and delivery of results |
| CI/CD Pipeline | Automated testing via GitHub Actions on every push/PR |
| HuggingFace Integration | Models auto-downloaded from HF Hub on first startup |
🏗️ Architecture
System Overview
┌──────────────────────────────────────────────────────┐
│ Frontend (Web UI) │
│ HTML / CSS / JavaScript (Dark Mode, Glassmorphism) │
│ ├─ Drag & Drop Image Upload │
│ ├─ Model Selection (Fusion / ResNet) │
│ └─ Real-time Result Tabs (Prediction/GradCAM/YOLO) │
└───────────────────┬──────────────────────────────────┘
│ REST API (JSON)
┌───────────────────▼──────────────────────────────────┐
│ FastAPI Backend (app.py) │
│ ├─ POST /predict/resnet → ResNet inference │
│ ├─ POST /predict/fusion → Fusion inference │
│ ├─ POST /predict?mode=* → Grad-CAM generation │
│ └─ POST /predict/yolo → YOLO detection │
│ │
│ Lifespan: models loaded once at startup │
│ Static: /static/uploads /static/results │
└──────┬───────────┬──────────────┬────────────────────┘
│ │ │
┌──────▼──┐ ┌─────▼──────┐ ┌───▼──────────┐
│ ResNet │ │ Fusion │ │ YOLO v11m │
│ (77%) │ │ (84%) │ │ Detection │
└──────┬──┘ └─────┬──────┘ └───┬──────────┘
│ │ │
└─────┬─────┘ │
│ │
┌───────▼──────┐ ┌────────▼────────┐
│ Grad-CAM │ │ Bounding Boxes │
│ Heatmaps │ │ + Confidence │
└──────────────┘ └─────────────────┘
Model Loading (scripts/load_models.py)
Startup
│
├─ hf_hub_download("junaid17/car-damage-classifier")
│ └─> ResnetCarDamagePredictor(checkpoint, class_map)
│
├─ hf_hub_download("junaid17/best_fusion_model_fp16")
│ └─> FusionCarDamagePredictor(checkpoint, class_map)
│
└─ hf_hub_download("junaid17/Yolo_Model")
└─> YOLO(checkpoint)
Fusion Model (High Accuracy — 84%)
┌─────────────────────────────────────────────────────────────────┐
│ INPUT IMAGE │
│ (3, 260, 260) │
└────────────────┬────────────────────────────────┬──────────────┘
│ │
┌───────▼────────┐ ┌─────────▼────────┐
│ EfficientNet- │ │ ConvNeXt-Small │
│ V2-S Backbone │ │ Backbone │
│ │ │ │
│ Frozen except │ │ Frozen except │
│ features[5,6,7]│ │ stages[2,3] + │
│ (unfrozen) │ │ layernorm │
└───────┬────────┘ └─────────┬────────┘
│ │
┌───────▼────────┐ ┌─────────▼────────┐
│ AdaptiveAvg │ │ Pooler Output │
│ Pool → Flatten │ │ │
└───────┬────────┘ └─────────┬────────┘
│ (1280,) │ (768,)
└──────────────┬─────────────────┘
│
┌───────▼────────┐
│ CONCATENATE │
│ 1280 + 768 │
│ = (2048,) │
└───────┬────────┘
│
┌───────────▼───────────┐
│ FUSION HEAD │
│ Dropout(0.4) │
│ Linear(2048 → 512) │
│ LayerNorm(512) │
│ GELU() │
│ Dropout(0.3) │
│ Linear(512 → 256) │
│ LayerNorm(256) │
│ GELU() │
│ Dropout(0.2) │
│ Linear(256 → 6) │
└───────────┬───────────┘
│
┌───────▼────────┐
│ OUTPUT LOGITS │
│ (6 classes) │
└────────────────┘
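To get a feel for how large the trainable fusion head is, its parameter count can be worked out from the layer sizes in the diagram above. The sketch below is plain arithmetic and assumes standard layers with bias terms (a `Linear(in, out)` holds `in*out + out` parameters, a `LayerNorm(n)` holds `2n`); the helper names are illustrative and not taken from the repository.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus bias of a fully connected layer."""
    return n_in * n_out + n_out


def layernorm_params(n: int) -> int:
    """Learnable scale and shift of a LayerNorm over n features."""
    return 2 * n


# Layer sizes copied from the fusion head diagram.
# Dropout and GELU have no learnable parameters, so they are omitted.
FUSION_HEAD = [
    linear_params(1280 + 768, 512),  # concatenated features -> 512
    layernorm_params(512),
    linear_params(512, 256),
    layernorm_params(256),
    linear_params(256, 6),           # 6 damage classes
]

fusion_head_total = sum(FUSION_HEAD)
print(f"Fusion head parameters: {fusion_head_total:,}")
```

Under these assumptions the head holds roughly 1.18 million parameters, almost all of them in the first linear layer.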
Optimizer: AdamW with per-group learning rates
- EfficientNet features[5]: lr=1e-5
- EfficientNet features[6,7]: lr=3e-5
- ConvNeXt stages[2,3] + layernorm: lr=3e-5
- Fusion head: lr=1e-4
- Loss: CrossEntropyLoss with label_smoothing=0.1
- Early stopping patience: 7
ResNet-18 (Lightweight — 77%)
┌──────────────────────────────────┐
│ INPUT IMAGE │
│ (3, 128, 128) │
└───────────────┬──────────────────┘
│
┌───────▼─────────┐
│ ResNet-18 │
│ Backbone │
│ │
│ Frozen except │
│ layer3, layer4 │
└───────┬─────────┘
│ (512,)
┌───────▼─────────────────────┐
│ Classification Head │
│ Dropout(0.5) │
│ Linear(512 → 256) │
│ ReLU() │
│ Dropout(0.3) │
│ Linear(256 → 6 classes) │
└───────┬─────────────────────┘
│
┌───────▼──────────┐
│ OUTPUT LOGITS │
│ (6 classes) │
└──────────────────┘
Optimizer: AdamW with per-group learning rates
- layer3: lr=1e-5
- layer4: lr=1e-5
- fc head: lr=1e-4
- Loss: CrossEntropyLoss
- Early stopping patience: 7
YOLO v11m Integration
┌─────────────────────────────┐
│ INPUT IMAGE │
│ imgsz=640, conf=0.05 │
└──────────────┬──────────────┘
│
┌───────▼────────┐
│ YOLO v11m │
│ Inference │
└───────┬────────┘
│
┌──────────┴──────────┐
│ │
┌───▼───────┐ ┌──────▼──────┐
│ Bboxes │ │ Confidence │
│ (x1,y1, │ │ Scores + │
│ x2,y2) │ │ Class Label │
└───┬───────┘ └──────┬──────┘
└──────────┬──────────┘
│
┌───────▼────────┐
│ result.plot() │
│ Save to disk │
└────────────────┘
Grad-CAM Pipeline (scripts/gradcam.py)
Image Path
│
├─ ResNet mode: target_layer = model.layer4[-1]
└─ Fusion mode: target_layer = model.eff_features[-1]
(FP16 → FP32 cast on CPU automatically)
│
├─ Register forward hook (_GradCAMHook)
├─ Forward pass → score.backward()
├─ acts [C,H,W] × weights (mean of grads) → CAM [H,W]
├─ ReLU → normalize → resize to original dims
└─ cv2.applyColorMap(COLORMAP_JET) → addWeighted overlay
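The weighting step in the pipeline above can be illustrated without a deep learning framework. The sketch below is a pure-Python toy, not the code in scripts/gradcam.py: activations and gradients are nested lists shaped [C][H][W], and min-max scaling is an assumption, since the pipeline only says "normalize". Resizing and the color-map overlay are left out.

```python
def grad_cam(acts, grads):
    """Compute a Grad-CAM map from activations and gradients.

    acts, grads: nested lists shaped [C][H][W].
    Returns a [H][W] map scaled to the range [0, 1].
    """
    channels = len(acts)
    height, width = len(acts[0]), len(acts[0][0])

    # Channel weights: spatial mean of each channel's gradients.
    weights = [
        sum(sum(row) for row in grads[c]) / (height * width)
        for c in range(channels)
    ]

    # Weighted sum of activation maps, followed by ReLU.
    cam = [
        [
            max(0.0, sum(weights[c] * acts[c][y][x] for c in range(channels)))
            for x in range(width)
        ]
        for y in range(height)
    ]

    # Min-max normalization (assumed); a flat map becomes all zeros.
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    if hi == lo:
        return [[0.0] * width for _ in range(height)]
    return [[(v - lo) / (hi - lo) for v in row] for row in cam]


# Toy input: only channel 0 receives gradient, so the map follows channel 0.
acts = [[[1.0, 2.0], [3.0, 4.0]], [[4.0, 3.0], [2.0, 1.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
heatmap = grad_cam(acts, grads)
```

In the toy input the second channel has zero gradient, so it contributes nothing and the resulting map is simply the first channel rescaled.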
Data Pipeline (src/data/)
Raw Images (data/dataset/)
│
├─ ingestion.py → scan folders, build file list
├─ preprocessing.py → validate / clean images
├─ augmentation.py → train/val transforms
│ ResNet: Resize(128,128) + HFlip + Rotation(15°) + ColorJitter
│ Fusion: Resize(260,260) + HFlip + Rotation(10°) + ColorJitter
└─ dataset.py → ImageFolder DataLoaders
(train 80% / val 20%, seed=42)
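The seeded 80/20 split can be pictured as a reproducible shuffle of sample indices. The sketch below uses only the standard library and is an assumption about the general idea, not the logic in dataset.py; `split_indices` is a hypothetical helper.

```python
import random


def split_indices(n_samples: int, val_fraction: float = 0.2, seed: int = 42):
    """Return (train_idx, val_idx) from a reproducible shuffled split."""
    indices = list(range(n_samples))
    # A local RNG keeps the split repeatable without touching global state.
    random.Random(seed).shuffle(indices)
    n_val = round(n_samples * val_fraction)
    return indices[n_val:], indices[:n_val]


# With about 1,800 images this yields 1,440 training and 360 validation samples.
train_idx, val_idx = split_indices(1800)
print(len(train_idx), len(val_idx))
```

Fixing the seed means every run sees the same validation images, which keeps accuracy figures comparable between experiments.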
Export & Deployment (src/export/)
Trained Checkpoints (checkpoints/)
│
├─ conver_model.py → FP32 → FP16 conversion
│ 788MB → 135MB (82.9% reduction)
└─ upload_to_huggingface.py → HfApi upload to:
junaid17/new-damagelens-resnet-classifier
junaid17/new-damagelens-fusion-fp16
junaid17/new-damagelens-yolo-detector
📊 Model Performance
Fusion Model (High Accuracy — 84% Overall)
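Classification Report: see assets/fusion_classification_report.png
Confusion Matrix: see assets/fusion_confusion_matrix.png
Training Curves: see assets/fusion_training_curves.png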
ResNet-18 (Lightweight — 77% Overall)
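Classification Report: see assets/resnet_classification_report.png
Confusion Matrix: see assets/resnet_confusion_matrix.png
Training Curves: see assets/resnet_training_curves.png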
YOLO Detection Results
🔁 CI Pipeline
DamageLens uses GitHub Actions for continuous integration. Every push or pull request to main, master, or dev triggers the full test suite automatically.
CI Screenshot (GitHub Actions — All Tests Passing):
What the pipeline tests:
| Step | Test File | What it covers |
|---|---|---|
| Config | test_config.py | Paths, constants, class map |
| Ingestion | test_ingestion.py | Dataset folder scanning |
| Preprocessing | test_preprocessing.py | Image validation & cleaning |
| Augmentation | test_augmentation.py | Transform pipelines |
| Dataset | test_dataset.py | DataLoader creation |
| ResNet Architecture | test_resnet_model.py | Model init & forward pass |
| ResNet Training | test_train_resnet.py | Smoke test training loop |
Pipeline config (.github/workflows/ci.yaml):
- Runs on: ubuntu-latest
- Python: 3.10
- Triggers: push & PR to main/master/dev
🚀 Setup & Installation
Prerequisites
- Python 3.11+
- CUDA 11.8+ (for GPU acceleration, optional but recommended)
- 8GB+ RAM (16GB recommended for Fusion model)
Installation Steps
# Clone the repository
git clone https://github.com/junaid17/damagelens.git
cd damagelens
# Create virtual environment
python -m venv myvenv
source myvenv/bin/activate # On Windows: myvenv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Create required directories
mkdir -p static/uploads static/results checkpoints assets
Download Pre-trained Models
Models are automatically downloaded from Hugging Face on first use:
- car-damage-classifier.pt: ResNet-18 checkpoint
- best_fusion_model_fp16.pt: Fusion model (FP16 optimized, 135MB)
- damage_detector.pt: YOLO v11m model
💻 Usage
Running the FastAPI Server
uvicorn app:app --reload --host 127.0.0.1 --port 8000
Open your browser at http://127.0.0.1:8000
Quick Start:
- Upload a car image (JPG/PNG)
- Select analysis mode: Fusion (accurate) or ResNet (fast)
- Click "Run AI Analysis"
- View results in tabs:
- 📊 Prediction: Confidence scores and probabilities
- 👀 Grad-CAM: Visualize which regions influenced the prediction
- 🎯 YOLO: Damage bounding boxes with confidence
Python API Example
import requests

# Classify with the lightweight ResNet model
with open('car_image.jpg', 'rb') as f:
    files = {'image': f}
    resp = requests.post('http://127.0.0.1:8000/predict/resnet', files=files)
print(resp.json())

# Classify with the higher-accuracy Fusion model
with open('car_image.jpg', 'rb') as f:
    files = {'image': f}
    resp = requests.post('http://127.0.0.1:8000/predict/fusion', files=files)
print(resp.json())
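Because the first request after a cold start can take several minutes, a client may want a generous timeout plus a simple retry. The sketch below shows one way to do that; `call_with_retry` and its parameters are hypothetical and not part of DamageLens.

```python
import time


def call_with_retry(request_fn, attempts=3, delay_s=30.0, sleep=time.sleep):
    """Call request_fn() until it succeeds or the attempts run out.

    request_fn: zero-argument callable that raises on failure.
    sleep: injectable so tests can skip the real wait.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return request_fn()
        except Exception as exc:  # sketch only; catch narrower errors in real code
            last_error = exc
            if attempt < attempts:
                sleep(delay_s)  # give the models time to finish loading
    raise last_error
```

With `requests`, `request_fn` could open the image, post it with a long `timeout` (for example 600 seconds), and call `raise_for_status()` so that HTTP errors also trigger a retry.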
📡 API Documentation
POST /predict/resnet
Content-Type: multipart/form-data
Body: image (File)
Response:
{
"status": "success",
"prediction": {
"Rear Normal": 0.47,
"Front Normal": 0.25,
...
}
}
POST /predict/fusion
Content-Type: multipart/form-data
Body: image (File)
Response:
{
"status": "success",
"prediction": {
"Rear Normal": 0.49,
"Front Normal": 0.35,
...
}
}
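Both classification endpoints return a mapping from class name to probability, so picking the predicted class is a one-liner on the client side. A minimal sketch, assuming the response shape documented above; `top_prediction` is an illustrative helper and the sample values are copied from the example response.

```python
def top_prediction(response: dict):
    """Return (label, probability) of the highest-scoring class."""
    scores = response["prediction"]
    label = max(scores, key=scores.get)
    return label, scores[label]


# Payload shaped like the documented response (truncated to two classes).
sample = {
    "status": "success",
    "prediction": {"Rear Normal": 0.47, "Front Normal": 0.25},
}
print(top_prediction(sample))
```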
POST /predict?mode={resnet|fusion} — Grad-CAM
Content-Type: multipart/form-data
Body: file (File), mode (String)
Response:
{
"status": "success",
"mode": "fusion",
"original_image": "/static/uploads/{uuid}_input.jpg",
"selected_viz": "/static/results/{uuid}_fusion.jpg",
"resnet_viz": null,
"fusion_viz": "/static/results/{uuid}_fusion.jpg"
}
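The image fields in this response are server-relative paths, so a client outside the bundled web UI needs to join them with the server address before fetching. A small sketch, assuming the local address from the Usage section; `viz_urls` is a hypothetical helper.

```python
from urllib.parse import urljoin

BASE_URL = "http://127.0.0.1:8000"  # local server address from the Usage section


def viz_urls(response: dict, base_url: str = BASE_URL) -> dict:
    """Map each non-null image field of a Grad-CAM response to an absolute URL."""
    keys = ("original_image", "selected_viz", "resnet_viz", "fusion_viz")
    return {k: urljoin(base_url, response[k]) for k in keys if response.get(k)}


# Example payload; "abc" stands in for the server-generated uuid.
example = {
    "status": "success",
    "mode": "fusion",
    "original_image": "/static/uploads/abc_input.jpg",
    "selected_viz": "/static/results/abc_fusion.jpg",
    "resnet_viz": None,
    "fusion_viz": "/static/results/abc_fusion.jpg",
}
urls = viz_urls(example)
```

Fields that come back as null, such as `resnet_viz` in fusion mode, are simply skipped.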
POST /predict/yolo
Content-Type: multipart/form-data
Body: file (File)
Response:
{
"status": "success",
"original_image": "/static/uploads/{uuid}_input.jpg",
"yolo_image": "/static/results/{uuid}_yolo.jpg",
"detections": [
{ "label": "damage", "confidence": 0.87, "box": [x1, y1, x2, y2] }
],
"total_detections": 2,
"message": "Detections found"
}
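The YOLO stage runs with a low confidence threshold (conf=0.05 in the architecture diagram), so a client may prefer to keep only the stronger detections. A minimal sketch, assuming the response shape above; the helper name and the 0.5 cutoff are arbitrary choices for illustration.

```python
def confident_detections(response: dict, min_confidence: float = 0.5) -> list:
    """Keep detections at or above min_confidence, strongest first."""
    kept = [
        d for d in response.get("detections", [])
        if d["confidence"] >= min_confidence
    ]
    return sorted(kept, key=lambda d: d["confidence"], reverse=True)


# Example payload with made-up boxes and scores.
yolo_sample = {
    "status": "success",
    "detections": [
        {"label": "damage", "confidence": 0.12, "box": [10, 20, 50, 60]},
        {"label": "damage", "confidence": 0.87, "box": [100, 120, 300, 260]},
    ],
}
strong = confident_detections(yolo_sample)
```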
🔧 Model Optimization
FP16 Conversion (Fusion Model)
Original Model (FP32): 788 MB
Optimized Model (FP16): 135 MB
───────────────────────────────────
Compression Ratio: 82.9% reduction ✅
Accuracy Loss: < 1% ⚠️
Speed Improvement: ~1.3x faster ⚡
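The reduction figure follows directly from the two file sizes. The short check below reproduces it; the function name is illustrative.

```python
def size_reduction_percent(original_mb: float, optimized_mb: float) -> float:
    """Percentage by which the optimized file is smaller than the original."""
    return (original_mb - optimized_mb) / original_mb * 100.0


reduction = size_reduction_percent(788, 135)
print(f"{reduction:.1f}% smaller")  # 82.9% smaller
```

Note that a dtype cast by itself halves tensor storage (a 50% reduction), so the reported figure implies additional savings whose source is not documented here.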
The system auto-detects FP16 checkpoints at load time:
if first_tensor.dtype == torch.float16:
    model = model.half()
# Grad-CAM on CPU: FP16 → FP32 cast applied automatically
if is_half:
    model = model.float()
📚 Dataset & Training
Data Constraints
- Total Samples: ~1,800 images
- Train/Val Split: 80/20 (seed=42)
- Classes: 6 (F_Breakage, F_Crushed, F_Normal, R_Breakage, R_Crushed, R_Normal)
- YOLO subset: ~100 annotated images (train/val split)
Data Augmentation
| Transform | ResNet | Fusion |
|---|---|---|
| Resize | 128×128 | 260×260 |
| RandomHorizontalFlip | ✅ | ✅ |
| RandomRotation | ±15° | ±10° |
| ColorJitter (b/c/s) | ±20% | ±15% |
| ImageNet Normalize | ✅ | ✅ |
Training Configuration
| Setting | ResNet | Fusion |
|---|---|---|
| Backbone | ResNet-18 | EfficientNet-V2-S + ConvNeXt-Small |
| Frozen layers | All except layer3, layer4 | All except features[5,6,7] / stages[2,3] |
| Optimizer | AdamW | AdamW (per-group LR) |
| Loss | CrossEntropyLoss | CrossEntropyLoss (label_smoothing=0.1) |
| Early stopping | patience=7 | patience=7 |
| Input size | 128×128 | 260×260 (EfficientNet) / 224×224 (ConvNeXt) |
🎨 Web UI Features
- Dark mode glassmorphism design
- Drag & drop image upload
- Model selection dropdown (Fusion / ResNet)
- Real-time confidence bar animation
- Tab navigation: Prediction → Grad-CAM → YOLO
- Scan line effect during processing
- Plotly bar chart for class probabilities
- Side-by-side original vs heatmap comparison
🔍 Grad-CAM Visualization
Gradient-weighted Class Activation Mapping highlights which image regions most influenced the model's prediction.
Original Image + Grad-CAM Heatmap = Overlay
Red = High importance
Blue = Low importance
- ResNet: hooks into layer4[-1]
- Fusion: hooks into eff_features[-1] (EfficientNet's last block)
📋 Directory Structure
DamageLens/
├── app.py # FastAPI app + all endpoints
├── index.html # Web UI
├── requirements.txt
├── README.md
│
├── .github/
│ └── workflows/
│ └── ci.yaml # GitHub Actions CI pipeline
│
├── assets/ # ← Place README images here
│ ├── fusion_classification_report.png
│ ├── fusion_confusion_matrix.png
│ ├── fusion_training_curves.png
│ ├── resnet_classification_report.png
│ ├── resnet_confusion_matrix.png
│ ├── resnet_training_curves.png
│ ├── yolo_detection_sample.png
│ └── ci_pipeline_passing.png
│
├── scripts/
│ ├── prediction_helper.py # ResNet + Fusion model classes & inference
│ ├── gradcam.py # Grad-CAM (ResNet + Fusion, CPU-optimized)
│ ├── load_models.py # HF Hub download + model initialization
│ └── yolo_predict.py # YOLO inference + bbox drawing
│
├── src/
│ ├── config.py # Paths, hyperparams, class map
│ ├── data/
│ │ ├── ingestion.py # Dataset folder scanning
│ │ ├── preprocessing.py # Image validation
│ │ ├── augmentation.py # Train/val transforms
│ │ └── dataset.py # DataLoader creation
│ ├── models/
│ │ ├── resnet_model.py # CarClassifierResNet
│ │ └── fusion_model.py # FusionClassifier
│ ├── training/
│ │ ├── trainer.py # Generic train loop (single + dual input)
│ │ ├── train_resnet.py # ResNet training entry point
│ │ ├── train_fusion.py # Fusion training entry point
│ │ └── train_yolo.py # YOLO fine-tuning
│ └── export/
│ ├── conver_model.py # FP32 → FP16 conversion
│ └── upload_to_huggingface.py # HF Hub upload script
│
├── checkpoints/
│ ├── best_resnet_model.pt
│ ├── best_fusion_model_fp16.pt
│ ├── damage_detector.pt
│ └── yolo11m.pt
│
├── Notebooks/
│ ├── Resnet18_fine_tuning_final.ipynb
│ ├── EfficientNet_ConvNext_Fusion.ipynb
│ └── damage_detector_yolo.ipynb
│
├── test/
│ ├── test_config.py
│ ├── test_ingestion.py
│ ├── test_preprocessing.py
│ ├── test_augmentation.py
│ ├── test_dataset.py
│ ├── test_resnet_model.py
│ ├── test_fusion_model.py
│ ├── test_train_resnet.py
│ ├── test_train_fusion.py
│ ├── test_train_yolo.py
│ ├── test_model_conversion.py
│ └── test_upload_to_huggingface.py
│
├── data/
│ ├── dataset/ # 6-class image folders
│ │ ├── F_Breakage/
│ │ ├── F_Crushed/
│ │ ├── F_Normal/
│ │ ├── R_Breakage/
│ │ ├── R_Crushed/
│ │ └── R_Normal/
│ └── yolo/ # YOLO annotated subset
│ ├── train/images + labels/
│ ├── val/images + labels/
│ └── dataset_custom.yaml
│
└── static/
├── uploads/ # Temp uploaded images
└── results/ # Generated Grad-CAM / YOLO outputs
⚠️ Limitations & Known Issues
Data Constraints
- Limited Training Data: ~1,800 samples — may show variance on edge cases
- Class Imbalance: Rear Crushed class has fewer samples, affecting recall
Performance
| Metric | Value | Note |
|---|---|---|
| ResNet Inference | ~500ms | Fast, lower accuracy |
| Fusion Inference | 30-60s | Accurate, computationally heavy |
| Cold Startup | 4-5 min | HF Hub download + model warmup |
| GPU Memory | ~4GB | For Fusion model |
| ResNet Accuracy | 77% | Lightweight trade-off |
| Fusion Accuracy | 84% | Best accuracy |
Technical Limitations
- Fusion accuracy is 7 percentage points higher than ResNet (84% vs 77%), but its inference is far slower (30-60s vs ~500ms)
- YOLO model may miss small or partially occluded damage
- Grad-CAM is for diagnostic/explainability purposes only
- Batch processing not currently supported
- FP16 Grad-CAM on CPU requires automatic FP32 cast (handled internally)







