	---
	title: DamageLensAI
	sdk: docker
	emoji: ⚡
	colorFrom: red
	colorTo: purple
	pinned: true
	---
	# 🚗 DamageLens: AI-Powered Car Damage Detection

	[![Python 3.11+](https://img.shields.io/badge/Python-3.11%2B-brightgreen)](https://python.org)
	[![PyTorch](https://img.shields.io/badge/PyTorch-2.0%2B-red)](https://pytorch.org)
	[![FastAPI](https://img.shields.io/badge/FastAPI-Latest-teal)](https://fastapi.tiangolo.com)
	[![CI Pipeline](https://github.com/junaidariie/DamageLensAI/actions/workflows/ci.yaml/badge.svg)](https://github.com/junaidariie/DamageLensAI/actions/workflows/ci.yaml)
	[![License](https://img.shields.io/badge/License-MIT-orange)](LICENSE)

	---

	## ⚠️ Important Notes

	> Cold Startup Time: The API may take 4-5 minutes on the first request to warm up the models. Subsequent predictions will be significantly faster.

	> Inference Time: The Fusion model is computationally intensive. A single prediction typically takes 30-60 seconds, depending on hardware.

	---

	APP LINK : https://junaidariie.github.io/DamageLensAI/

	HF REPO : https://huggingface.co/spaces/junaid17/DamageLensAI/tree/main

	---

	## 📋 Table of Contents

	- [Overview](#-overview)
	- [Features](#-features)
	- [Architecture](#-architecture)
	- [Model Performance](#-model-performance)
	- [CI Pipeline](#-ci-pipeline)
	- [Setup & Installation](#-setup--installation)
	- [Usage](#-usage)
	- [API Documentation](#-api-documentation)
	- [Model Optimization](#-model-optimization)
	- [Dataset & Training](#-dataset--training)
	- [Web UI Features](#-web-ui-features)
	- [Grad-CAM Visualization](#-grad-cam-visualization)
	- [Directory Structure](#-directory-structure)
	- [Limitations & Known Issues](#-limitations--known-issues)

	---

	## 🎯 Overview

	DamageLens is an AI system for detecting and classifying car damage. It offers two classifiers: a lightweight ResNet-18 and a higher-accuracy Fusion model that combines EfficientNet-V2-S and ConvNeXt-Small backbones. Both classify damage on the front and rear sections of a vehicle.

	The system can identify six damage categories:
	- ✅ Front Normal / Front Breakage / Front Crushed
	- ✅ Rear Normal / Rear Breakage / Rear Crushed

	Additionally, it uses YOLO object detection to localize damage regions with bounding boxes.

	---

	## ✨ Features

	| Feature | Description |
	|---------|-------------|
	| Dual Model Architecture | ResNet (lightweight) and Fusion (high-accuracy) options |
	| Grad-CAM Visualization | Understand which image regions drive predictions |
	| Real-time YOLO Detection | Localize damage with confidence scores |
	| FP16 Optimization | Reduced model size (788MB → 135MB) with minimal accuracy loss |
	| FastAPI Backend | High-performance REST API with async support |
	| Responsive Web UI | Modern, interactive web interface with real-time feedback |
	| Static File Serving | Efficient caching and delivery of results |
	| CI/CD Pipeline | Automated testing via GitHub Actions on every push/PR |
	| HuggingFace Integration | Models auto-downloaded from HF Hub on first startup |

	---

	## 🏗️ Architecture

	### System Overview

	```
	┌──────────────────────────────────────────────────────┐
	│                  Frontend (Web UI)                   │
	│  HTML / CSS / JavaScript (Dark Mode, Glassmorphism)  │
	│  ├─ Drag & Drop Image Upload                         │
	│  ├─ Model Selection (Fusion / ResNet)                │
	│  └─ Real-time Result Tabs (Prediction/GradCAM/YOLO)  │
	└───────────────────┬──────────────────────────────────┘
	                    │ REST API (JSON)
	┌───────────────────▼──────────────────────────────────┐
	│               FastAPI Backend (app.py)               │
	│  ├─ POST /predict/resnet → ResNet inference          │
	│  ├─ POST /predict/fusion → Fusion inference          │
	│  ├─ POST /predict?mode=* → Grad-CAM generation       │
	│  └─ POST /predict/yolo   → YOLO detection            │
	│                                                      │
	│  Lifespan: models loaded once at startup             │
	│  Static:   /static/uploads  /static/results          │
	└──────┬───────────┬──────────────┬────────────────────┘
	       │           │              │
	┌──────▼──┐  ┌─────▼──────┐   ┌───▼──────────┐
	│ ResNet  │  │   Fusion   │   │  YOLO v11m   │
	│  (77%)  │  │   (84%)    │   │  Detection   │
	└──────┬──┘  └─────┬──────┘   └───┬──────────┘
	       │           │              │
	       └─────┬─────┘              │
	             │                    │
	     ┌───────▼──────┐    ┌────────▼────────┐
	     │   Grad-CAM   │    │ Bounding Boxes  │
	     │   Heatmaps   │    │  + Confidence   │
	     └──────────────┘    └─────────────────┘
	```

	### Model Loading (scripts/load_models.py)

	```
	Startup
	  │
	  ├─ hf_hub_download("junaid17/car-damage-classifier")
	  │    └─> ResnetCarDamagePredictor(checkpoint, class_map)
	  │
	  ├─ hf_hub_download("junaid17/best_fusion_model_fp16")
	  │    └─> FusionCarDamagePredictor(checkpoint, class_map)
	  │
	  └─ hf_hub_download("junaid17/Yolo_Model")
	       └─> YOLO(checkpoint)
	```

	### Fusion Model (High Accuracy — 84%)

	```
	┌────────────────────────────────────────────────────────────────┐
	│                          INPUT IMAGE                           │
	│                         (3, 260, 260)                          │
	└────────────────┬────────────────────────────────┬──────────────┘
	                 │                                │
	         ┌───────▼────────┐             ┌─────────▼────────┐
	         │ EfficientNet-  │             │ ConvNeXt-Small   │
	         │ V2-S Backbone  │             │ Backbone         │
	         │                │             │                  │
	         │ Frozen except  │             │ Frozen except    │
	         │ features[5,6,7]│             │ stages[2,3] +    │
	         │ (unfrozen)     │             │ layernorm        │
	         └───────┬────────┘             └─────────┬────────┘
	                 │                                │
	         ┌───────▼────────┐             ┌─────────▼────────┐
	         │ AdaptiveAvg    │             │  Pooler Output   │
	         │ Pool → Flatten │             │                  │
	         └───────┬────────┘             └─────────┬────────┘
	                 │ (1280,)                        │ (768,)
	                 └──────────────┬─────────────────┘
	                                │
	                        ┌───────▼────────┐
	                        │  CONCATENATE   │
	                        │   1280 + 768   │
	                        │   = (2048,)    │
	                        └───────┬────────┘
	                                │
	                    ┌───────────▼───────────┐
	                    │      FUSION HEAD      │
	                    │  Dropout(0.4)         │
	                    │  Linear(2048 → 512)   │
	                    │  LayerNorm(512)       │
	                    │  GELU()               │
	                    │  Dropout(0.3)         │
	                    │  Linear(512 → 256)    │
	                    │  LayerNorm(256)       │
	                    │  GELU()               │
	                    │  Dropout(0.2)         │
	                    │  Linear(256 → 6)      │
	                    └───────────┬───────────┘
	                                │
	                        ┌───────▼────────┐
	                        │ OUTPUT LOGITS  │
	                        │  (6 classes)   │
	                        └────────────────┘
	```

	Training setup (AdamW optimizer with per-group learning rates):
	- EfficientNet features[5]: lr=1e-5
	- EfficientNet features[6,7]: lr=3e-5
	- ConvNeXt stages[2,3] + layernorm: lr=3e-5
	- Fusion head: lr=1e-4
	- Loss: CrossEntropyLoss with label_smoothing=0.1
	- Early stopping patience: 7
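
To make the label-smoothed loss concrete, here is a minimal, framework-free sketch of cross-entropy for a single sample with smoothing 0.1 over 6 classes. It follows the usual definition, in which the target distribution is (1 - eps) on the true class plus eps / K spread over all K classes. The function name and the example logits are illustrative assumptions, not taken from the project's training code.

```python
import math

def smoothed_cross_entropy(logits, target, smoothing=0.1):
    """Cross-entropy for one sample against a label-smoothed target."""
    num_classes = len(logits)
    # Numerically stable log-softmax.
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    log_probs = [z - log_sum for z in logits]
    # Smoothed target: eps / K on every class, plus (1 - eps) on the true class.
    loss = 0.0
    for idx, log_p in enumerate(log_probs):
        weight = smoothing / num_classes
        if idx == target:
            weight += 1.0 - smoothing
        loss -= weight * log_p
    return loss

logits = [2.0, 0.5, 0.1, -1.0, 0.0, 0.3]  # made-up scores for the 6 classes
plain = smoothed_cross_entropy(logits, target=0, smoothing=0.0)
smoothed = smoothed_cross_entropy(logits, target=0, smoothing=0.1)
print(f"plain={plain:.4f} smoothed={smoothed:.4f}")
```

When the true class already has the highest score, the smoothed loss is larger than the plain one, which discourages over-confident logits.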

	### ResNet-18 (Lightweight — 77%)

	```
	┌──────────────────────────────────┐
	│           INPUT IMAGE            │
	│          (3, 128, 128)           │
	└───────────────┬──────────────────┘
	                │
	        ┌───────▼─────────┐
	        │    ResNet-18    │
	        │    Backbone     │
	        │                 │
	        │  Frozen except  │
	        │ layer3, layer4  │
	        └───────┬─────────┘
	                │ (512,)
	        ┌───────▼─────────────────────┐
	        │     Classification Head     │
	        │  Dropout(0.5)               │
	        │  Linear(512 → 256)          │
	        │  ReLU()                     │
	        │  Dropout(0.3)               │
	        │  Linear(256 → 6 classes)    │
	        └───────┬─────────────────────┘
	                │
	        ┌───────▼──────────┐
	        │  OUTPUT LOGITS   │
	        │   (6 classes)    │
	        └──────────────────┘
	```

	Training setup (AdamW optimizer with per-group learning rates):
	- layer3: lr=1e-5
	- layer4: lr=1e-5
	- fc head: lr=1e-4
	- Loss: CrossEntropyLoss
	- Early stopping patience: 7
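
Early stopping with a patience of 7 can be sketched as a small counter that resets whenever the monitored metric improves. The class below is a hypothetical helper, not the project's trainer; it assumes validation loss is the monitored metric, which the README does not state.

```python
class EarlyStopping:
    """Stop training once validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience=7, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=7)
val_losses = [0.90, 0.75, 0.70] + [0.72] * 10  # made-up validation curve
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at)  # 10: the best epoch is 3, followed by 7 epochs without improvement
```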

	### YOLO v11m Integration

	```
	┌─────────────────────────────┐
	│         INPUT IMAGE         │
	│    imgsz=640, conf=0.05     │
	└──────────────┬──────────────┘
	               │
	       ┌───────▼────────┐
	       │   YOLO v11m    │
	       │   Inference    │
	       └───────┬────────┘
	               │
	    ┌──────────┴──────────┐
	    │                     │
	┌───▼───────┐      ┌──────▼──────┐
	│ Bboxes    │      │ Confidence  │
	│ (x1,y1,   │      │ Scores +    │
	│  x2,y2)   │      │ Class Label │
	└───┬───────┘      └──────┬──────┘
	    └──────────┬──────────┘
	               │
	       ┌───────▼────────┐
	       │ result.plot()  │
	       │  Save to disk  │
	       └────────────────┘
	```

	### Grad-CAM Pipeline (scripts/gradcam.py)

	```
	Image Path
	  │
	  ├─ ResNet mode: target_layer = model.layer4[-1]
	  └─ Fusion mode: target_layer = model.eff_features[-1]
	                  (FP16 → FP32 cast on CPU automatically)
	  │
	  ├─ Register forward hook (_GradCAMHook)
	  ├─ Forward pass → score.backward()
	  ├─ acts [C,H,W] × weights (mean of grads) → CAM [H,W]
	  ├─ ReLU → normalize → resize to original dims
	  └─ cv2.applyColorMap(COLORMAP_JET) → addWeighted overlay
	```
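
The core arithmetic of the pipeline above (channel weights from mean gradients, weighted sum, ReLU, normalize) can be sketched on plain nested lists. This is an illustrative, dependency-free version, not the code in `scripts/gradcam.py`; the function name and toy values are assumptions, and the resize and colormap steps are left out.

```python
def grad_cam(acts, grads):
    """Grad-CAM map from activations and gradients shaped [C][H][W]."""
    channels, height, width = len(acts), len(acts[0]), len(acts[0][0])
    # Channel weights: spatial mean of the gradients for each channel.
    weights = [
        sum(sum(row) for row in grads[c]) / (height * width)
        for c in range(channels)
    ]
    # Weighted sum over channels, then ReLU to keep positive evidence only.
    cam = [
        [
            max(0.0, sum(weights[c] * acts[c][y][x] for c in range(channels)))
            for x in range(width)
        ]
        for y in range(height)
    ]
    # Normalize to [0, 1]; an all-zero map is returned unchanged.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam

# Toy 2-channel, 2x2 feature map with made-up values.
acts = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
grads = [[[0.5, 0.5], [0.5, 0.5]], [[-1.0, -1.0], [-1.0, -1.0]]]
cam = grad_cam(acts, grads)
print(cam)  # [[0.5, 0.0], [0.0, 1.0]]
```

The second channel has negative mean gradient, so locations where it is active are suppressed by the ReLU rather than highlighted.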

	### Data Pipeline (src/data/)

	```
	Raw Images (data/dataset/)
	  │
	  ├─ ingestion.py     → scan folders, build file list
	  ├─ preprocessing.py → validate / clean images
	  ├─ augmentation.py  → train/val transforms
	  │     ResNet: Resize(128,128) + HFlip + Rotation(15°) + ColorJitter
	  │     Fusion: Resize(260,260) + HFlip + Rotation(10°) + ColorJitter
	  └─ dataset.py       → ImageFolder DataLoaders
	                        (train 80% / val 20%, seed=42)
	```
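
A seeded 80/20 split like the one in the last step can be sketched with the standard library alone. This is a generic illustration rather than the logic in `dataset.py` (which may split differently, for example with a PyTorch utility); the function name and file names are hypothetical.

```python
import random

def split_train_val(paths, val_fraction=0.2, seed=42):
    """Shuffle deterministically and split into (train, val) lists."""
    shuffled = sorted(paths)  # fixed starting order, independent of the filesystem
    random.Random(seed).shuffle(shuffled)
    val_size = int(len(shuffled) * val_fraction)
    return shuffled[val_size:], shuffled[:val_size]

paths = [f"data/dataset/F_Normal/img_{i:04d}.jpg" for i in range(100)]  # hypothetical files
train, val = split_train_val(paths)
print(len(train), len(val))  # 80 20
```

Fixing the seed means repeated runs see the same validation images, so accuracy figures stay comparable between experiments.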

	### Export & Deployment (src/export/)

	```
	Trained Checkpoints (checkpoints/)
	  │
	  ├─ conver_model.py          → FP32 → FP16 conversion
	  │                             788MB → 135MB (82.9% reduction)
	  └─ upload_to_huggingface.py → HfApi upload to:
	                                  junaid17/new-damagelens-resnet-classifier
	                                  junaid17/new-damagelens-fusion-fp16
	                                  junaid17/new-damagelens-yolo-detector
	```

	---

	## 📊 Model Performance

	### Fusion Model (High Accuracy — 84% Overall)

	Classification Report:

	![Fusion Classification Report](assets/fusion_classification_report.png)

	Confusion Matrix:

	![Fusion Confusion Matrix](assets/fusion_confusion_matrix.png)

	Training Curves:

	![Fusion Training Curves](assets/fusion_training_curves.png)

	---

	### ResNet-18 (Lightweight — 77% Overall)

	Classification Report:

	![ResNet Classification Report](assets/resnet_classification_report.png)

	Confusion Matrix:

	![ResNet Confusion Matrix](assets/resnet_confusion_matrix.png)

	Training Curves:

	![ResNet Training Curves](assets/resnet_training_curves.png)

	---

	### YOLO Detection Results

	![YOLO Detection Sample](assets/yolo_detection_sample.jpg)

	---

	## 🔁 CI Pipeline

	DamageLens uses GitHub Actions for continuous integration. Every push or pull request to `main`, `master`, or `dev` triggers the full test suite automatically.

	CI Screenshot (GitHub Actions — All Tests Passing):

	![CI Pipeline Passing](assets/ci_pipeline_passing.png)

	### What the pipeline tests:

	| Step | Test File | What it covers |
	|------|-----------|----------------|
	| Config | `test_config.py` | Paths, constants, class map |
	| Ingestion | `test_ingestion.py` | Dataset folder scanning |
	| Preprocessing | `test_preprocessing.py` | Image validation & cleaning |
	| Augmentation | `test_augmentation.py` | Transform pipelines |
	| Dataset | `test_dataset.py` | DataLoader creation |
	| ResNet Architecture | `test_resnet_model.py` | Model init & forward pass |
	| ResNet Training | `test_train_resnet.py` | Smoke test training loop |

	### Pipeline config (`.github/workflows/ci.yaml`):
	- Runs on: `ubuntu-latest`
	- Python: `3.10`
	- Triggers: push & PR to `main` / `master` / `dev`

	---

	## 🚀 Setup & Installation

	### Prerequisites

	- Python 3.11+
	- CUDA 11.8+ (for GPU acceleration, optional but recommended)
	- 8GB+ RAM (16GB recommended for Fusion model)

	### Installation Steps

	```bash
	# Clone the repository
	git clone https://github.com/junaidariie/DamageLensAI.git
	cd DamageLensAI

	# Create virtual environment
	python -m venv myvenv
	source myvenv/bin/activate # On Windows: myvenv\Scripts\activate

	# Install dependencies
	pip install -r requirements.txt

	# Create required directories
	mkdir -p static/uploads static/results checkpoints assets
	```

	### Download Pre-trained Models

	Models are automatically downloaded from Hugging Face on first use:
	- `car-damage-classifier.pt` — ResNet-18 checkpoint
	- `best_fusion_model_fp16.pt` — Fusion model (FP16 optimized, 135MB)
	- `damage_detector.pt` — YOLO v11m model

	---

	## 💻 Usage

	### Running the FastAPI Server

	```bash
	uvicorn app:app --reload --host 127.0.0.1 --port 8000
	```

	Open your browser at `http://127.0.0.1:8000`

	#### Quick Start:
	1. Upload a car image (JPG/PNG)
	2. Select analysis mode: Fusion (accurate) or ResNet (fast)
	3. Click "Run AI Analysis"
	4. View results in tabs:
	   - 📊 Prediction: Confidence scores and probabilities
	   - 👀 Grad-CAM: Visualize which regions influenced the prediction
	   - 🎯 YOLO: Damage bounding boxes with confidence

	### Python API Example

	```python
	import requests

	# Classify with the lightweight ResNet model
	with open('car_image.jpg', 'rb') as f:
	    files = {'image': f}
	    resp = requests.post('http://127.0.0.1:8000/predict/resnet', files=files)
	    print(resp.json())

	# Classify with the higher-accuracy Fusion model
	with open('car_image.jpg', 'rb') as f:
	    files = {'image': f}
	    resp = requests.post('http://127.0.0.1:8000/predict/fusion', files=files)
	    print(resp.json())
	```

	---

	## 📡 API Documentation

	### `POST /predict/resnet`
	```
	Content-Type: multipart/form-data
	Body: image (File)

	Response:
	{
	  "status": "success",
	  "prediction": {
	    "Rear Normal": 0.47,
	    "Front Normal": 0.25,
	    ...
	  }
	}
	```

	### `POST /predict/fusion`
	```
	Content-Type: multipart/form-data
	Body: image (File)

	Response:
	{
	  "status": "success",
	  "prediction": {
	    "Rear Normal": 0.49,
	    "Front Normal": 0.35,
	    ...
	  }
	}
	```
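
Both classification endpoints return a mapping from class label to probability, so a client usually wants the top class. A minimal sketch, assuming the response shape documented above; the helper name and the example values are illustrative.

```python
def top_prediction(response):
    """Return (label, probability) for the highest-scoring class."""
    if response.get("status") != "success":
        raise ValueError(f"prediction failed: {response!r}")
    scores = response["prediction"]
    label = max(scores, key=scores.get)
    return label, scores[label]

# Example payload shaped like the documented response; the values are made up.
example = {
    "status": "success",
    "prediction": {"Rear Normal": 0.49, "Front Normal": 0.35, "Rear Breakage": 0.16},
}
print(top_prediction(example))  # ('Rear Normal', 0.49)
```

In practice `response` would be the result of `resp.json()` from a request to `/predict/resnet` or `/predict/fusion`.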

	### `POST /predict?mode={resnet|fusion}` (Grad-CAM)
	```
	Content-Type: multipart/form-data
	Body: file (File), mode (String)

	Response:
	{
	  "status": "success",
	  "mode": "fusion",
	  "original_image": "/static/uploads/{uuid}_input.jpg",
	  "selected_viz": "/static/results/{uuid}_fusion.jpg",
	  "resnet_viz": null,
	  "fusion_viz": "/static/results/{uuid}_fusion.jpg"
	}
	```

	### `POST /predict/yolo`
	```
	Content-Type: multipart/form-data
	Body: file (File)

	Response:
	{
	  "status": "success",
	  "original_image": "/static/uploads/{uuid}_input.jpg",
	  "yolo_image": "/static/results/{uuid}_yolo.jpg",
	  "detections": [
	    { "label": "damage", "confidence": 0.87, "box": [x1, y1, x2, y2] }
	  ],
	  "total_detections": 2,
	  "message": "Detections found"
	}
	```
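
Because detection runs with a low confidence threshold (`conf=0.05` in the YOLO diagram), a client may want to drop weak boxes before showing them. The sketch below filters a response shaped like the one above; the helper name, the 0.5 cutoff, and the example boxes are assumptions for illustration.

```python
def confident_detections(response, min_confidence=0.5):
    """Keep detections at or above `min_confidence`, highest confidence first."""
    kept = [
        det for det in response.get("detections", [])
        if det["confidence"] >= min_confidence
    ]
    return sorted(kept, key=lambda det: det["confidence"], reverse=True)

# Made-up payload in the documented shape.
example = {
    "status": "success",
    "detections": [
        {"label": "damage", "confidence": 0.12, "box": [40, 60, 120, 150]},
        {"label": "damage", "confidence": 0.87, "box": [200, 90, 340, 210]},
    ],
    "total_detections": 2,
}
for det in confident_detections(example):
    x1, y1, x2, y2 = det["box"]
    area = (x2 - x1) * (y2 - y1)  # box area in pixels
    print(det["label"], det["confidence"], area)
```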

	---

	## 🔧 Model Optimization

	### FP16 Conversion (Fusion Model)

	```
	Original Model (FP32): 788 MB
	Optimized Model (FP16): 135 MB
	───────────────────────────────────
	Compression Ratio: 82.9% reduction ✅
	Accuracy Loss: < 1% ⚠️
	Speed Improvement: ~1.3x faster ⚡
	```
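
The reduction figure follows directly from the two file sizes. As a quick check (the helper name is illustrative):

```python
def size_reduction_percent(original_mb, optimized_mb):
    """Percentage by which the file shrank relative to its original size."""
    return (original_mb - optimized_mb) / original_mb * 100

reduction = size_reduction_percent(788, 135)
ratio = 788 / 135
print(f"{reduction:.1f}% smaller, about {ratio:.1f}x compression")
```

Note that "82.9%" is the share of the original size that was removed; expressed as a ratio, the optimized checkpoint is roughly 5.8 times smaller.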

	The system auto-detects FP16 checkpoints at load time:

	```python
	# Excerpt: switch the model to half precision for FP16 checkpoints
	if first_tensor.dtype == torch.float16:
	    model = model.half()

	# Grad-CAM on CPU: FP16 → FP32 cast applied automatically
	if is_half:
	    model = model.float()
	```

	---

	## 📚 Dataset & Training

	### Data Constraints

	- Total Samples: ~1,800 images
	- Train/Val Split: 80/20 (seed=42)
	- Classes: 6 (F_Breakage, F_Crushed, F_Normal, R_Breakage, R_Crushed, R_Normal)
	- YOLO subset: ~100 annotated images (train/val split)
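
The folder names above map onto the six display labels used in API responses. A minimal sketch of such a class map follows; the dictionary names are hypothetical, and the index order assumes the alphabetical folder ordering that `ImageFolder`-style loaders use, which may differ from the map in `src/config.py`.

```python
# Folder name -> display label (names taken from the dataset folders and the Overview).
CLASS_LABELS = {
    "F_Breakage": "Front Breakage",
    "F_Crushed": "Front Crushed",
    "F_Normal": "Front Normal",
    "R_Breakage": "Rear Breakage",
    "R_Crushed": "Rear Crushed",
    "R_Normal": "Rear Normal",
}

# Assumption: class indices follow sorted folder names.
CLASS_TO_INDEX = {name: idx for idx, name in enumerate(sorted(CLASS_LABELS))}
INDEX_TO_LABEL = {idx: CLASS_LABELS[name] for name, idx in CLASS_TO_INDEX.items()}

print(INDEX_TO_LABEL[0], "|", INDEX_TO_LABEL[5])  # Front Breakage | Rear Normal
```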

	### Data Augmentation

	| Transform | ResNet | Fusion |
	|-----------|--------|--------|
	| Resize | 128×128 | 260×260 |
	| RandomHorizontalFlip | ✅ | ✅ |
	| RandomRotation | ±15° | ±10° |
	| ColorJitter (b/c/s) | ±20% | ±15% |
	| ImageNet Normalize | ✅ | ✅ |

	### Training Configuration

	| Setting | ResNet | Fusion |
	|---------|--------|--------|
	| Backbone | ResNet-18 | EfficientNet-V2-S + ConvNeXt-Small |
	| Frozen layers | All except layer3, layer4 | All except features[5,6,7] / stages[2,3] |
	| Optimizer | AdamW | AdamW (per-group LR) |
	| Loss | CrossEntropyLoss | CrossEntropyLoss (label_smoothing=0.1) |
	| Early stopping | patience=7 | patience=7 |
	| Input size | 128×128 | 260×260 (EfficientNet) / 224×224 (ConvNeXt) |

	---

	## 🎨 Web UI Features

	- Dark mode glassmorphism design
	- Drag & drop image upload
	- Model selection dropdown (Fusion / ResNet)
	- Real-time confidence bar animation
	- Tab navigation: Prediction → Grad-CAM → YOLO
	- Scan line effect during processing
	- Plotly bar chart for class probabilities
	- Side-by-side original vs heatmap comparison

	---

	## 🔍 Grad-CAM Visualization

	Gradient-weighted Class Activation Mapping highlights which image regions most influenced the model's prediction.

	```
	Original Image + Grad-CAM Heatmap = Overlay
	Red = High importance
	Blue = Low importance
	```

	- ResNet: hooks into `layer4[-1]`
	- Fusion: hooks into `eff_features[-1]` (EfficientNet's last block)

	---

	## 📋 Directory Structure

	```
	DamageLens/
	├── app.py  # FastAPI app + all endpoints
	├── index.html  # Web UI
	├── requirements.txt
	├── README.md
	│
	├── .github/
	│   └── workflows/
	│       └── ci.yaml  # GitHub Actions CI pipeline
	│
	├── assets/  # ← Place README images here
	│   ├── fusion_classification_report.png
	│   ├── fusion_confusion_matrix.png
	│   ├── fusion_training_curves.png
	│   ├── resnet_classification_report.png
	│   ├── resnet_confusion_matrix.png
	│   ├── resnet_training_curves.png
	│   ├── yolo_detection_sample.png
	│   └── ci_pipeline_passing.png
	│
	├── scripts/
	│   ├── prediction_helper.py  # ResNet + Fusion model classes & inference
	│   ├── gradcam.py  # Grad-CAM (ResNet + Fusion, CPU-optimized)
	│   ├── load_models.py  # HF Hub download + model initialization
	│   └── yolo_predict.py  # YOLO inference + bbox drawing
	│
	├── src/
	│   ├── config.py  # Paths, hyperparams, class map
	│   ├── data/
	│   │   ├── ingestion.py  # Dataset folder scanning
	│   │   ├── preprocessing.py  # Image validation
	│   │   ├── augmentation.py  # Train/val transforms
	│   │   └── dataset.py  # DataLoader creation
	│   ├── models/
	│   │   ├── resnet_model.py  # CarClassifierResNet
	│   │   └── fusion_model.py  # FusionClassifier
	│   ├── training/
	│   │   ├── trainer.py  # Generic train loop (single + dual input)
	│   │   ├── train_resnet.py  # ResNet training entry point
	│   │   ├── train_fusion.py  # Fusion training entry point
	│   │   └── train_yolo.py  # YOLO fine-tuning
	│   └── export/
	│       ├── conver_model.py  # FP32 → FP16 conversion
	│       └── upload_to_huggingface.py  # HF Hub upload script
	│
	├── checkpoints/
	│   ├── best_resnet_model.pt
	│   ├── best_fusion_model_fp16.pt
	│   ├── damage_detector.pt
	│   └── yolo11m.pt
	│
	├── Notebooks/
	│   ├── Resnet18_fine_tuning_final.ipynb
	│   ├── EfficientNet_ConvNext_Fusion.ipynb
	│   └── damage_detector_yolo.ipynb
	│
	├── test/
	│   ├── test_config.py
	│   ├── test_ingestion.py
	│   ├── test_preprocessing.py
	│   ├── test_augmentation.py
	│   ├── test_dataset.py
	│   ├── test_resnet_model.py
	│   ├── test_fusion_model.py
	│   ├── test_train_resnet.py
	│   ├── test_train_fusion.py
	│   ├── test_train_yolo.py
	│   ├── test_model_conversion.py
	│   └── test_upload_to_huggingface.py
	│
	├── data/
	│   ├── dataset/  # 6-class image folders
	│   │   ├── F_Breakage/
	│   │   ├── F_Crushed/
	│   │   ├── F_Normal/
	│   │   ├── R_Breakage/
	│   │   ├── R_Crushed/
	│   │   └── R_Normal/
	│   └── yolo/  # YOLO annotated subset
	│       ├── train/images + labels/
	│       ├── val/images + labels/
	│       └── dataset_custom.yaml
	│
	└── static/
	    ├── uploads/  # Temp uploaded images
	    └── results/  # Generated Grad-CAM / YOLO outputs
	```

	---

	## ⚠️ Limitations & Known Issues

	### Data Constraints
	- Limited Training Data: ~1,800 samples — may show variance on edge cases
	- Class Imbalance: Rear Crushed class has fewer samples, affecting recall

	### Performance

	| Metric | Value | Note |
	|--------|-------|------|
	| ResNet Inference | ~500ms | Fast, lower accuracy |
	| Fusion Inference | 30-60s | Accurate, computationally heavy |
	| Cold Startup | 4-5 min | HF Hub download + model warmup |
	| GPU Memory | ~4GB | For Fusion model |
	| ResNet Accuracy | 77% | Lightweight trade-off |
	| Fusion Accuracy | 84% | Best accuracy |

	### Technical Limitations
	- Fusion is 7 percentage points more accurate than ResNet (84% vs 77%), at the cost of much slower inference
	- YOLO model may miss small or partially occluded damage
	- Grad-CAM is for diagnostic/explainability purposes only
	- Batch processing not currently supported
	- FP16 Grad-CAM on CPU requires automatic FP32 cast (handled internally)