title: DamageLensAI
sdk: docker
emoji: 
colorFrom: red
colorTo: purple
pinned: true

🚗 DamageLens: AI-Powered Car Damage Detection



⚠️ Important Notes

Cold Startup Time: The first request may take 4-5 minutes while the models are downloaded from the Hugging Face Hub and warmed up. Subsequent predictions are significantly faster.

Model Size: The Fusion model is computationally intensive. Individual predictions typically complete in 30-60 seconds depending on hardware.


App link: https://junaidariie.github.io/DamageLensAI/

HF repo: https://huggingface.co/spaces/junaid17/DamageLensAI/tree/main



🎯 Overview

DamageLens is an AI system for detecting and classifying car damage. It offers two classifiers: a lightweight ResNet-18 and a higher-accuracy Fusion model that concatenates features from EfficientNet-V2-S and ConvNeXt-Small. Both classify damage on the front and rear sections of a vehicle.

The system can identify six damage categories:

  • ✅ Front Normal / Front Breakage / Front Crushed
  • ✅ Rear Normal / Rear Breakage / Rear Crushed

Additionally, it uses YOLO object detection to localize damage regions with bounding boxes.


✨ Features

| Feature | Description |
| --- | --- |
| Dual Model Architecture | ResNet (lightweight) and Fusion (high-accuracy) options |
| Grad-CAM Visualization | Understand which image regions drive predictions |
| Real-time YOLO Detection | Localize damage with confidence scores |
| FP16 Optimization | Reduced model size (788MB → 135MB) with minimal accuracy loss |
| FastAPI Backend | High-performance REST API with async support |
| Responsive Web UI | Modern, interactive web interface with real-time feedback |
| Static File Serving | Efficient caching and delivery of results |
| CI/CD Pipeline | Automated testing via GitHub Actions on every push/PR |
| HuggingFace Integration | Models auto-downloaded from HF Hub on first startup |

🏗️ Architecture

System Overview

┌──────────────────────────────────────────────────────┐
│                   Frontend (Web UI)                  │
│  HTML / CSS / JavaScript  (Dark Mode, Glassmorphism) │
│  ├─ Drag & Drop Image Upload                         │
│  ├─ Model Selection (Fusion / ResNet)                │
│  └─ Real-time Result Tabs (Prediction/GradCAM/YOLO)  │
└───────────────────┬──────────────────────────────────┘
                    │ REST API (JSON)
┌───────────────────▼──────────────────────────────────┐
│              FastAPI Backend  (app.py)               │
│  ├─ POST /predict/resnet    → ResNet inference       │
│  ├─ POST /predict/fusion    → Fusion inference       │
│  ├─ POST /predict?mode=*    → Grad-CAM generation    │
│  └─ POST /predict/yolo      → YOLO detection         │
│                                                      │
│  Lifespan: models loaded once at startup             │
│  Static:   /static/uploads  /static/results          │
└──────┬───────────┬──────────────┬────────────────────┘
       │           │              │
┌──────▼──┐  ┌─────▼──────┐  ┌───▼──────────┐
│ ResNet  │  │   Fusion   │  │  YOLO v11m   │
│  (77%)  │  │   (84%)    │  │  Detection   │
└──────┬──┘  └─────┬──────┘  └───┬──────────┘
       │           │              │
       └─────┬─────┘              │
             │                    │
     ┌───────▼──────┐    ┌────────▼────────┐
     │  Grad-CAM    │    │  Bounding Boxes │
     │  Heatmaps    │    │  + Confidence   │
     └──────────────┘    └─────────────────┘

Model Loading (scripts/load_models.py)

Startup
  │
  ├─ hf_hub_download("junaid17/car-damage-classifier")
  │       └─> ResnetCarDamagePredictor(checkpoint, class_map)
  │
  ├─ hf_hub_download("junaid17/best_fusion_model_fp16")
  │       └─> FusionCarDamagePredictor(checkpoint, class_map)
  │
  └─ hf_hub_download("junaid17/Yolo_Model")
          └─> YOLO(checkpoint)

Fusion Model (High Accuracy — 84%)

┌─────────────────────────────────────────────────────────────────┐
│                          INPUT IMAGE                            │
│                         (3, 260, 260)                           │
└────────────────┬────────────────────────────────┬──────────────┘
                 │                                │
         ┌───────▼────────┐             ┌─────────▼────────┐
         │ EfficientNet-  │             │  ConvNeXt-Small  │
         │ V2-S Backbone  │             │  Backbone        │
         │                │             │                  │
         │ Frozen except  │             │ Frozen except    │
         │ features[5,6,7]│             │ stages[2,3] +    │
         │ (unfrozen)     │             │ layernorm        │
         └───────┬────────┘             └─────────┬────────┘
                 │                                │
         ┌───────▼────────┐             ┌─────────▼────────┐
         │ AdaptiveAvg    │             │  Pooler Output   │
         │ Pool → Flatten │             │                  │
         └───────┬────────┘             └─────────┬────────┘
                 │  (1280,)                        │  (768,)
                 └──────────────┬─────────────────┘
                                │
                        ┌───────▼────────┐
                        │  CONCATENATE   │
                        │  1280 + 768    │
                        │  = (2048,)     │
                        └───────┬────────┘
                                │
                    ┌───────────▼───────────┐
                    │   FUSION HEAD         │
                    │  Dropout(0.4)         │
                    │  Linear(2048 → 512)   │
                    │  LayerNorm(512)       │
                    │  GELU()               │
                    │  Dropout(0.3)         │
                    │  Linear(512 → 256)    │
                    │  LayerNorm(256)       │
                    │  GELU()               │
                    │  Dropout(0.2)         │
                    │  Linear(256 → 6)      │
                    └───────────┬───────────┘
                                │
                        ┌───────▼────────┐
                        │ OUTPUT LOGITS  │
                        │  (6 classes)   │
                        └────────────────┘
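
The fusion head in the diagram above can be sketched in PyTorch as follows. This is a minimal illustration built only from the layer sizes shown in the diagram; the class name `FusionHead`, its argument names, and the random stand-in features are assumptions, and the real `FusionClassifier` in src/models/fusion_model.py may be organized differently.

```python
import torch
from torch import nn


class FusionHead(nn.Module):
    """Hypothetical head: concatenates two feature vectors and classifies."""

    def __init__(self, eff_dim=1280, convnext_dim=768, num_classes=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Dropout(0.4),
            nn.Linear(eff_dim + convnext_dim, 512),  # 1280 + 768 = 2048 inputs
            nn.LayerNorm(512),
            nn.GELU(),
            nn.Dropout(0.3),
            nn.Linear(512, 256),
            nn.LayerNorm(256),
            nn.GELU(),
            nn.Dropout(0.2),
            nn.Linear(256, num_classes),
        )

    def forward(self, eff_feat, convnext_feat):
        # Concatenate along the feature dimension: [B, 1280] + [B, 768] -> [B, 2048]
        fused = torch.cat([eff_feat, convnext_feat], dim=1)
        return self.net(fused)


# Random tensors stand in for the pooled backbone outputs.
head = FusionHead().eval()
with torch.no_grad():
    logits = head(torch.randn(2, 1280), torch.randn(2, 768))
print(logits.shape)
```

Running the sketch on a batch of two feature pairs yields one row of six logits per image, matching the six damage classes.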

Training setup: AdamW with per-group learning rates

  • EfficientNet features[5]: lr=1e-5
  • EfficientNet features[6,7]: lr=3e-5
  • ConvNeXt stages[2,3] + layernorm: lr=3e-5
  • Fusion head: lr=1e-4
  • Loss: CrossEntropyLoss with label_smoothing=0.1
  • Early stopping patience: 7
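
The per-group learning rates listed above map onto PyTorch parameter groups. The sketch below assumes small `nn.Linear` modules as stand-ins for the unfrozen backbone blocks and the fusion head; the variable names are hypothetical and only the learning rates and the label smoothing value come from the list.

```python
import torch
from torch import nn

# Hypothetical stand-ins for the trainable parts named in the list above.
eff_block5 = nn.Linear(4, 4)      # EfficientNet features[5]
eff_blocks67 = nn.Linear(4, 4)    # EfficientNet features[6,7]
convnext_tail = nn.Linear(4, 4)   # ConvNeXt stages[2,3] + layernorm
fusion_head = nn.Linear(4, 6)     # Fusion head

# One dict per group; each group carries its own learning rate.
param_groups = [
    {"params": eff_block5.parameters(), "lr": 1e-5},
    {"params": eff_blocks67.parameters(), "lr": 3e-5},
    {"params": convnext_tail.parameters(), "lr": 3e-5},
    {"params": fusion_head.parameters(), "lr": 1e-4},
]
optimizer = torch.optim.AdamW(param_groups)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

lrs = [group["lr"] for group in optimizer.param_groups]
print(lrs)
```

Giving earlier backbone blocks a smaller learning rate than the freshly initialized head is a common fine-tuning choice: pretrained features move slowly while the head adapts quickly.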

ResNet-18 (Lightweight — 77%)

┌──────────────────────────────────┐
│      INPUT IMAGE                 │
│     (3, 128, 128)                │
└───────────────┬──────────────────┘
                │
        ┌───────▼─────────┐
        │   ResNet-18     │
        │   Backbone      │
        │                 │
        │  Frozen except  │
        │  layer3, layer4 │
        └───────┬─────────┘
                │  (512,)
        ┌───────▼─────────────────────┐
        │  Classification Head        │
        │  Dropout(0.5)               │
        │  Linear(512 → 256)          │
        │  ReLU()                     │
        │  Dropout(0.3)               │
        │  Linear(256 → 6 classes)    │
        └───────┬─────────────────────┘
                │
        ┌───────▼──────────┐
        │  OUTPUT LOGITS   │
        │  (6 classes)     │
        └──────────────────┘

Training setup: AdamW with per-group learning rates

  • layer3: lr=1e-5
  • layer4: lr=1e-5
  • fc head: lr=1e-4
  • Loss: CrossEntropyLoss
  • Early stopping patience: 7
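
Early stopping with a patience of 7 can be sketched as below. The README does not say which metric is monitored, so validation loss is an assumption here, and the `EarlyStopping` class and `epochs_run` helper are hypothetical names rather than code from src/training/trainer.py.

```python
class EarlyStopping:
    """Stop when the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience=7, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(losses, patience=7):
    """Return how many epochs run before early stopping triggers."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(losses, start=1):
        if stopper.step(loss):
            return epoch
    return len(losses)


# Loss improves for two epochs, then plateaus above the best value.
losses = [1.0, 0.8] + [0.9] * 10
print(epochs_run(losses))
```

With these losses the best value is reached at epoch 2, so seven epochs without improvement end training at epoch 9.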

YOLO v11m Integration

┌─────────────────────────────┐
│   INPUT IMAGE               │
│   imgsz=640, conf=0.05      │
└──────────────┬──────────────┘
               │
       ┌───────▼────────┐
       │  YOLO v11m     │
       │  Inference     │
       └───────┬────────┘
               │
    ┌──────────┴──────────┐
    │                     │
┌───▼───────┐      ┌──────▼──────┐
│ Bboxes    │      │ Confidence  │
│ (x1,y1,   │      │ Scores +    │
│  x2,y2)   │      │ Class Label │
└───┬───────┘      └──────┬──────┘
    └──────────┬──────────┘
               │
       ┌───────▼────────┐
       │ result.plot()  │
       │ Save to disk   │
       └────────────────┘

Grad-CAM Pipeline (scripts/gradcam.py)

Image Path
    │
    ├─ ResNet mode:  target_layer = model.layer4[-1]
    └─ Fusion mode:  target_layer = model.eff_features[-1]
                     (FP16 → FP32 cast on CPU automatically)
    │
    ├─ Register forward hook  (_GradCAMHook)
    ├─ Forward pass → score.backward()
    ├─ acts [C,H,W]  ×  weights (mean of grads) → CAM [H,W]
    ├─ ReLU → normalize → resize to original dims
    └─ cv2.applyColorMap(COLORMAP_JET) → addWeighted overlay
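
The core arithmetic of the pipeline above (channel weights from mean gradients, weighted sum, ReLU, normalize) can be sketched with NumPy. This is an illustration under assumptions: the function name `grad_cam` is hypothetical, normalization divides by the maximum (the README only says "normalize"), and the resize and colormap overlay steps are left out.

```python
import numpy as np


def grad_cam(acts, grads):
    """Compute a Grad-CAM map from activations and gradients of shape [C, H, W]."""
    # Channel weights: spatial mean of the gradients, one scalar per channel.
    weights = grads.mean(axis=(1, 2))                # shape [C]
    # Weighted sum of activation maps over the channel axis.
    cam = np.tensordot(weights, acts, axes=1)        # shape [H, W]
    # Keep only regions that push the class score up.
    cam = np.maximum(cam, 0.0)
    # Scale to [0, 1]; an all-zero map is returned unchanged.
    peak = cam.max()
    if peak > 0:
        cam = cam / peak
    return cam


# Toy example: channel 0 supports the class, channel 1 opposes it.
acts = np.array([[[1.0, 0.0], [0.0, 0.0]],
                 [[0.0, 0.0], [0.0, 2.0]]])
grads = np.stack([np.ones((2, 2)), -np.ones((2, 2))])
cam = grad_cam(acts, grads)
print(cam)
```

In the toy example only the top-left cell stays active: the bottom-right activation belongs to a channel with negative weight and is removed by the ReLU.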

Data Pipeline (src/data/)

Raw Images (data/dataset/)
    │
    ├─ ingestion.py   → scan folders, build file list
    ├─ preprocessing.py → validate / clean images
    ├─ augmentation.py  → train/val transforms
    │     ResNet:  Resize(128,128) + HFlip + Rotation(15°) + ColorJitter
    │     Fusion:  Resize(260,260) + HFlip + Rotation(10°) + ColorJitter
    └─ dataset.py   → ImageFolder DataLoaders
                       (train 80% / val 20%, seed=42)
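
A reproducible 80/20 split with a fixed seed can be sketched with the standard library. How dataset.py actually performs the split is not shown in this README, so the helper `split_train_val` and the file names below are hypothetical.

```python
import random


def split_train_val(items, val_fraction=0.2, seed=42):
    """Shuffle deterministically and split into (train, val) lists."""
    items = sorted(items)           # fixed starting order, independent of input order
    rng = random.Random(seed)       # local RNG so global random state is untouched
    rng.shuffle(items)
    n_val = round(len(items) * val_fraction)
    return items[n_val:], items[:n_val]


# Hypothetical file names standing in for dataset images.
files = [f"img_{i:02d}.jpg" for i in range(10)]
train, val = split_train_val(files)
print(len(train), len(val))
```

Sorting before shuffling means the split depends only on the set of files and the seed, not on the order in which the file system happens to list them.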

Export & Deployment (src/export/)

Trained Checkpoints (checkpoints/)
    │
    ├─ conver_model.py         → FP32 → FP16 conversion
    │                            788MB → 135MB (82.9% reduction)
    └─ upload_to_huggingface.py → HfApi upload to:
          junaid17/new-damagelens-resnet-classifier
          junaid17/new-damagelens-fusion-fp16
          junaid17/new-damagelens-yolo-detector

📊 Model Performance

Fusion Model (High Accuracy — 84% Overall)

Classification Report:

[Image: Fusion Classification Report]

Confusion Matrix:

[Image: Fusion Confusion Matrix]

Training Curves:

[Image: Fusion Training Curves]


ResNet-18 (Lightweight — 77% Overall)

Classification Report:

[Image: ResNet Classification Report]

Confusion Matrix:

[Image: ResNet Confusion Matrix]

Training Curves:

[Image: ResNet Training Curves]


YOLO Detection Results

[Image: YOLO Detection Sample]


🔁 CI Pipeline

DamageLens uses GitHub Actions for continuous integration. Every push or pull request to main, master, or dev triggers the full test suite automatically.

CI Screenshot (GitHub Actions, all tests passing):

[Image: CI Pipeline Passing]

What the pipeline tests:

| Step | Test File | What it covers |
| --- | --- | --- |
| Config | test_config.py | Paths, constants, class map |
| Ingestion | test_ingestion.py | Dataset folder scanning |
| Preprocessing | test_preprocessing.py | Image validation & cleaning |
| Augmentation | test_augmentation.py | Transform pipelines |
| Dataset | test_dataset.py | DataLoader creation |
| ResNet Architecture | test_resnet_model.py | Model init & forward pass |
| ResNet Training | test_train_resnet.py | Smoke test training loop |

Pipeline config (.github/workflows/ci.yaml):

  • Runs on: ubuntu-latest
  • Python: 3.10
  • Triggers: push & PR to main / master / dev

🚀 Setup & Installation

Prerequisites

  • Python 3.11+
  • CUDA 11.8+ (for GPU acceleration, optional but recommended)
  • 8GB+ RAM (16GB recommended for Fusion model)

Installation Steps

# Clone the repository
git clone https://github.com/junaid17/damagelens.git
cd damagelens  # git clone names the directory after the repository URL

# Create virtual environment
python -m venv myvenv
source myvenv/bin/activate  # On Windows: myvenv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Create required directories
mkdir -p static/uploads static/results checkpoints assets

Download Pre-trained Models

Models are automatically downloaded from Hugging Face on first use:

  • car-damage-classifier.pt — ResNet-18 checkpoint
  • best_fusion_model_fp16.pt — Fusion model (FP16 optimized, 135MB)
  • damage_detector.pt — YOLO v11m model

💻 Usage

Running the FastAPI Server

uvicorn app:app --reload --host 127.0.0.1 --port 8000

Open your browser at http://127.0.0.1:8000

Quick Start:

  1. Upload a car image (JPG/PNG)
  2. Select analysis mode: Fusion (accurate) or ResNet (fast)
  3. Click "Run AI Analysis"
  4. View results in tabs:
    • 📊 Prediction: Confidence scores and probabilities
    • 👀 Grad-CAM: Visualize which regions influenced the prediction
    • 🎯 YOLO: Damage bounding boxes with confidence

Python API Example

import requests

BASE_URL = 'http://127.0.0.1:8000'

# Send the same image to both classifiers; the upload field is named 'image'.
for model in ('resnet', 'fusion'):
    with open('car_image.jpg', 'rb') as f:
        resp = requests.post(f'{BASE_URL}/predict/{model}', files={'image': f})
    resp.raise_for_status()  # fail loudly on HTTP errors
    print(model, resp.json())

📡 API Documentation

POST /predict/resnet

Content-Type: multipart/form-data
Body: image (File)

Response:
{
  "status": "success",
  "prediction": {
    "Rear Normal": 0.47,
    "Front Normal": 0.25,
    ...
  }
}

POST /predict/fusion

Content-Type: multipart/form-data
Body: image (File)

Response:
{
  "status": "success",
  "prediction": {
    "Rear Normal": 0.49,
    "Front Normal": 0.35,
    ...
  }
}
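
Both classifier endpoints return a mapping from class label to probability, so the predicted class is the key with the largest value. The sketch below assumes a response shaped like the examples above; the helper name `top_class` is hypothetical, and the sample contains only the two classes shown in the ResNet example.

```python
def top_class(response):
    """Return (label, probability) for the highest-scoring class."""
    prediction = response["prediction"]
    label = max(prediction, key=prediction.get)
    return label, prediction[label]


# Sample limited to the two values shown in the README's ResNet example.
sample = {
    "status": "success",
    "prediction": {"Rear Normal": 0.47, "Front Normal": 0.25},
}
print(top_class(sample))
```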

POST /predict?mode={resnet|fusion} — Grad-CAM

Content-Type: multipart/form-data
Body: file (File), mode (String)

Response:
{
  "status": "success",
  "mode": "fusion",
  "original_image": "/static/uploads/{uuid}_input.jpg",
  "selected_viz": "/static/results/{uuid}_fusion.jpg",
  "resnet_viz": null,
  "fusion_viz": "/static/results/{uuid}_fusion.jpg"
}

POST /predict/yolo

Content-Type: multipart/form-data
Body: file (File)

Response:
{
  "status": "success",
  "original_image": "/static/uploads/{uuid}_input.jpg",
  "yolo_image": "/static/results/{uuid}_yolo.jpg",
  "detections": [
    { "label": "damage", "confidence": 0.87, "box": [x1, y1, x2, y2] }
  ],
  "total_detections": 1,
  "message": "Detections found"
}
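
Because YOLO inference runs with a low confidence threshold (conf=0.05 in the architecture diagram), a client may want to drop weak detections itself. The sketch below assumes the response shape shown above; the helper `filter_detections`, the 0.5 cutoff, and the numeric box values are illustrative assumptions.

```python
def filter_detections(response, min_confidence=0.5):
    """Keep only detections at or above the given confidence."""
    return [
        det for det in response.get("detections", [])
        if det["confidence"] >= min_confidence
    ]


# Hypothetical response with one strong and one weak detection.
sample = {
    "status": "success",
    "detections": [
        {"label": "damage", "confidence": 0.87, "box": [10, 20, 110, 220]},
        {"label": "damage", "confidence": 0.08, "box": [300, 40, 340, 90]},
    ],
    "total_detections": 2,
}
strong = filter_detections(sample)
print(len(strong), strong[0]["confidence"])
```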

🔧 Model Optimization

FP16 Conversion (Fusion Model)

Original Model (FP32):     788 MB
Optimized Model (FP16):    135 MB
───────────────────────────────────
Size Reduction:            82.9%           ✅
Accuracy Loss:             < 1%            ⚠️
Speed Improvement:         ~1.3x faster   ⚡

The system auto-detects FP16 checkpoints at load time:

if first_tensor.dtype == torch.float16:
    model = model.half()

# Grad-CAM on CPU: FP16 → FP32 cast applied automatically
if is_half:
    model = model.float()
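
The conversion and the dtype check can be sketched end to end on a toy model. This is an illustration, not the contents of conver_model.py: the helper names are hypothetical and a small `nn.Linear` stands in for the Fusion model. Note that a dtype cast alone halves the storage of floating-point tensors; any reduction beyond that has to come from elsewhere in the checkpoint, which this sketch does not model.

```python
import torch
from torch import nn


def to_fp16(state_dict):
    """Cast floating-point tensors to FP16; leave integer buffers untouched."""
    return {
        name: t.half() if t.is_floating_point() else t
        for name, t in state_dict.items()
    }


def size_bytes(state_dict):
    """Total bytes occupied by the tensors in a state dict."""
    return sum(t.numel() * t.element_size() for t in state_dict.values())


def is_fp16_checkpoint(state_dict):
    """Same idea as the check above: inspect the dtype of the first tensor."""
    first_tensor = next(iter(state_dict.values()))
    return first_tensor.dtype == torch.float16


# Hypothetical stand-in for a trained model: 36 parameters in total.
model = nn.Linear(8, 4)
fp32_state = model.state_dict()
fp16_state = to_fp16(fp32_state)

print(size_bytes(fp32_state), size_bytes(fp16_state))
print(is_fp16_checkpoint(fp32_state), is_fp16_checkpoint(fp16_state))
```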

📚 Dataset & Training

Data Constraints

  • Total Samples: ~1,800 images
  • Train/Val Split: 80/20 (seed=42)
  • Classes: 6 (F_Breakage, F_Crushed, F_Normal, R_Breakage, R_Crushed, R_Normal)
  • YOLO subset: ~100 annotated images (train/val split)

Data Augmentation

| Transform | ResNet | Fusion |
| --- | --- | --- |
| Resize | 128×128 | 260×260 |
| RandomHorizontalFlip | Yes | Yes |
| RandomRotation | ±15° | ±10° |
| ColorJitter (b/c/s) | ±20% | ±15% |
| ImageNet Normalize | | |

Training Configuration

| Setting | ResNet | Fusion |
| --- | --- | --- |
| Backbone | ResNet-18 | EfficientNet-V2-S + ConvNeXt-Small |
| Frozen layers | All except layer3, layer4 | All except features[5,6,7] / stages[2,3] |
| Optimizer | AdamW | AdamW (per-group LR) |
| Loss | CrossEntropyLoss | CrossEntropyLoss (label_smoothing=0.1) |
| Early stopping | patience=7 | patience=7 |
| Input size | 128×128 | 260×260 (EfficientNet) / 224×224 (ConvNeXt) |

🎨 Web UI Features

  • Dark mode glassmorphism design
  • Drag & drop image upload
  • Model selection dropdown (Fusion / ResNet)
  • Real-time confidence bar animation
  • Tab navigation: Prediction → Grad-CAM → YOLO
  • Scan line effect during processing
  • Plotly bar chart for class probabilities
  • Side-by-side original vs heatmap comparison

🔍 Grad-CAM Visualization

Gradient-weighted Class Activation Mapping highlights which image regions most influenced the model's prediction.

Original Image    +    Grad-CAM Heatmap    =    Overlay
                       Red   = High importance
                       Blue  = Low importance

  • ResNet: hooks into layer4[-1]
  • Fusion: hooks into eff_features[-1] (EfficientNet's last block)

📋 Directory Structure

DamageLens/
├── app.py                              # FastAPI app + all endpoints
├── index.html                          # Web UI
├── requirements.txt
├── README.md
│
├── .github/
│   └── workflows/
│       └── ci.yaml                     # GitHub Actions CI pipeline
│
├── assets/                             # ← Place README images here
│   ├── fusion_classification_report.png
│   ├── fusion_confusion_matrix.png
│   ├── fusion_training_curves.png
│   ├── resnet_classification_report.png
│   ├── resnet_confusion_matrix.png
│   ├── resnet_training_curves.png
│   ├── yolo_detection_sample.png
│   └── ci_pipeline_passing.png
│
├── scripts/
│   ├── prediction_helper.py            # ResNet + Fusion model classes & inference
│   ├── gradcam.py                      # Grad-CAM (ResNet + Fusion, CPU-optimized)
│   ├── load_models.py                  # HF Hub download + model initialization
│   └── yolo_predict.py                 # YOLO inference + bbox drawing
│
├── src/
│   ├── config.py                       # Paths, hyperparams, class map
│   ├── data/
│   │   ├── ingestion.py                # Dataset folder scanning
│   │   ├── preprocessing.py            # Image validation
│   │   ├── augmentation.py             # Train/val transforms
│   │   └── dataset.py                  # DataLoader creation
│   ├── models/
│   │   ├── resnet_model.py             # CarClassifierResNet
│   │   └── fusion_model.py             # FusionClassifier
│   ├── training/
│   │   ├── trainer.py                  # Generic train loop (single + dual input)
│   │   ├── train_resnet.py             # ResNet training entry point
│   │   ├── train_fusion.py             # Fusion training entry point
│   │   └── train_yolo.py               # YOLO fine-tuning
│   └── export/
│       ├── conver_model.py             # FP32 → FP16 conversion
│       └── upload_to_huggingface.py    # HF Hub upload script
│
├── checkpoints/
│   ├── best_resnet_model.pt
│   ├── best_fusion_model_fp16.pt
│   ├── damage_detector.pt
│   └── yolo11m.pt
│
├── Notebooks/
│   ├── Resnet18_fine_tuning_final.ipynb
│   ├── EfficientNet_ConvNext_Fusion.ipynb
│   └── damage_detector_yolo.ipynb
│
├── test/
│   ├── test_config.py
│   ├── test_ingestion.py
│   ├── test_preprocessing.py
│   ├── test_augmentation.py
│   ├── test_dataset.py
│   ├── test_resnet_model.py
│   ├── test_fusion_model.py
│   ├── test_train_resnet.py
│   ├── test_train_fusion.py
│   ├── test_train_yolo.py
│   ├── test_model_conversion.py
│   └── test_upload_to_huggingface.py
│
├── data/
│   ├── dataset/                        # 6-class image folders
│   │   ├── F_Breakage/
│   │   ├── F_Crushed/
│   │   ├── F_Normal/
│   │   ├── R_Breakage/
│   │   ├── R_Crushed/
│   │   └── R_Normal/
│   └── yolo/                           # YOLO annotated subset
│       ├── train/images + labels/
│       ├── val/images + labels/
│       └── dataset_custom.yaml
│
└── static/
    ├── uploads/                        # Temp uploaded images
    └── results/                        # Generated Grad-CAM / YOLO outputs

⚠️ Limitations & Known Issues

Data Constraints

  • Limited Training Data: ~1,800 samples, so predictions may be less reliable on unusual or edge-case images
  • Class Imbalance: Rear Crushed class has fewer samples, affecting recall

Performance

| Metric | Value | Note |
| --- | --- | --- |
| ResNet Inference | ~500ms | Fast, lower accuracy |
| Fusion Inference | 30-60s | Accurate, computationally heavy |
| Cold Startup | 4-5 min | HF Hub download + model warmup |
| GPU Memory | ~4GB | For Fusion model |
| ResNet Accuracy | 77% | Lightweight trade-off |
| Fusion Accuracy | 84% | Best accuracy |

Technical Limitations

  • ResNet trades accuracy for speed: it scores 7 percentage points below Fusion (77% vs 84%)
  • YOLO model may miss small or partially occluded damage
  • Grad-CAM is for diagnostic/explainability purposes only
  • Batch processing not currently supported
  • Grad-CAM on CPU requires casting FP16 models to FP32 (handled internally)