---
license: apache-2.0
language:
- en
tags:
- image-quality-assessment
- document-quality
- mplug-owl2
- vision-language
- document-analysis
- color-quality
- IQA
pipeline_tag: image-to-text
library_name: transformers
---
# DeQA-Doc-Color: Document Image Color Quality Assessment

**DeQA-Doc-Color** is a vision-language model specialized in assessing the **color quality** of document images. It evaluates color fidelity, saturation, white balance, and color-related artifacts in scanned or photographed documents.

## Model Family

This model is part of the **DeQA-Doc** family, which includes three specialized models:

| Model | Description | HuggingFace |
|-------|-------------|-------------|
| **DeQA-Doc-Overall** | Overall document quality | [mapo80/DeQA-Doc-Overall](https://huggingface.co/mapo80/DeQA-Doc-Overall) |
| **DeQA-Doc-Color** | Color quality assessment (this model) | [mapo80/DeQA-Doc-Color](https://huggingface.co/mapo80/DeQA-Doc-Color) |
| **DeQA-Doc-Sharpness** | Sharpness/clarity assessment | [mapo80/DeQA-Doc-Sharpness](https://huggingface.co/mapo80/DeQA-Doc-Sharpness) |
## Quick Start

```python
import torch
from transformers import AutoModelForCausalLM
from PIL import Image

# Load the model
model = AutoModelForCausalLM.from_pretrained(
    "mapo80/DeQA-Doc-Color",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Score an image
image = Image.open("document.jpg").convert("RGB")
score = model.score([image])
print(f"Color Quality Score: {score.item():.2f} / 5.0")
```
## What Does Color Quality Measure?

The color quality score evaluates:

- **Color Fidelity**: How accurately colors are reproduced
- **White Balance**: Neutral whites without color casts (yellow, blue tints)
- **Saturation**: Appropriate color intensity (not washed out or oversaturated)
- **Color Artifacts**: Absence of color bleeding, banding, or chromatic aberration
- **Uniformity**: Consistent color reproduction across the document
## Score Interpretation

| Score Range | Quality Level | Typical Issues |
|-------------|---------------|----------------|
| 4.5 - 5.0 | **Excellent** | Perfect color reproduction |
| 3.5 - 4.5 | **Good** | Minor color shifts, slight tinting |
| 2.5 - 3.5 | **Fair** | Noticeable color cast, uneven colors |
| 1.5 - 2.5 | **Poor** | Strong color distortion, washed out |
| 1.0 - 1.5 | **Bad** | Severe color problems, unusable |

Each range includes its lower bound and excludes its upper bound (except the top range, which includes 5.0), so a score of exactly 4.5 counts as **Excellent**.
## Batch Processing

```python
from PIL import Image

# Reuses the `model` object loaded in Quick Start
images = [
    Image.open("doc1.jpg").convert("RGB"),
    Image.open("doc2.jpg").convert("RGB"),
    Image.open("doc3.jpg").convert("RGB"),
]

# score() takes a list of images and returns one score per image
scores = model.score(images)
for i, score in enumerate(scores):
    print(f"Document {i+1} Color Score: {score.item():.2f} / 5.0")
```
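
For large collections, opening every image up front may use more memory than necessary. The following is a minimal sketch of chunked scoring, not part of the model's API: `chunked` and `score_in_chunks` are hypothetical helper names, and `score_fn` is assumed to be any callable that takes a list of items and returns one score per item (for example, a thin wrapper around `model.score`).

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def score_in_chunks(
    items: Sequence,
    score_fn: Callable[[Sequence], Sequence],
    size: int = 4,
) -> List[float]:
    """Score `items` chunk by chunk and return the scores in input order."""
    scores: List[float] = []
    for chunk in chunked(items, size):
        # float() also accepts single-element tensors, so tensor outputs work too
        scores.extend(float(s) for s in score_fn(chunk))
    return scores
```

With the model loaded, `score_fn` could open the images for one chunk of file paths and pass them to `model.score`, so that only `size` images are held in memory at a time.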
## Use Cases

- **Scanner Calibration**: Detect when scanners need color calibration
- **Photo Document QA**: Flag photos with poor lighting/white balance
- **Color-Critical Documents**: Verify color accuracy for maps, charts, branded materials
- **Archive Preservation**: Identify documents with color degradation
- **Print Quality Control**: Verify color reproduction in printed documents
## Example: Detect Color Issues

```python
import torch
from transformers import AutoModelForCausalLM
from PIL import Image

model = AutoModelForCausalLM.from_pretrained(
    "mapo80/DeQA-Doc-Color",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)


def diagnose_color_quality(image_path):
    img = Image.open(image_path).convert("RGB")
    score = model.score([img]).item()

    if score >= 4.5:
        diagnosis = "Excellent color quality"
    elif score >= 3.5:
        diagnosis = "Good - minor color issues"
    elif score >= 2.5:
        diagnosis = "Fair - consider color correction"
    elif score >= 1.5:
        diagnosis = "Poor - needs color correction or rescan"
    else:
        diagnosis = "Bad - severe color problems, rescan required"

    return score, diagnosis


score, diagnosis = diagnose_color_quality("scanned_document.jpg")
print(f"Score: {score:.2f}/5.0 - {diagnosis}")
```
## Multi-Dimensional Quality Assessment

Combine this model with the other DeQA-Doc models for a comprehensive assessment. Note that this keeps three 7B-class models in memory at once, so it needs roughly three times the memory of a single model.

```python
import torch
from transformers import AutoModelForCausalLM
from PIL import Image

# Load all three models
models = {
    "overall": AutoModelForCausalLM.from_pretrained(
        "mapo80/DeQA-Doc-Overall", trust_remote_code=True,
        torch_dtype=torch.float16, device_map="auto"
    ),
    "color": AutoModelForCausalLM.from_pretrained(
        "mapo80/DeQA-Doc-Color", trust_remote_code=True,
        torch_dtype=torch.float16, device_map="auto"
    ),
    "sharpness": AutoModelForCausalLM.from_pretrained(
        "mapo80/DeQA-Doc-Sharpness", trust_remote_code=True,
        torch_dtype=torch.float16, device_map="auto"
    ),
}


def full_quality_report(image_path):
    img = Image.open(image_path).convert("RGB")
    scores = {}
    for name, model in models.items():
        scores[name] = model.score([img]).item()
    return scores


report = full_quality_report("document.jpg")
print(f"Overall: {report['overall']:.2f}/5.0")
print(f"Color: {report['color']:.2f}/5.0")
print(f"Sharpness: {report['sharpness']:.2f}/5.0")
```
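
Once the three scores are in a dict like `report`, a small helper can point to the dimension that most limits quality. This is a sketch only: `weakest_dimension` is a hypothetical name, and the default threshold of 3.5 is simply the lower bound of the **Good** band in the Score Interpretation table.

```python
from typing import Dict, Optional, Tuple


def weakest_dimension(
    report: Dict[str, float], threshold: float = 3.5
) -> Optional[Tuple[str, float]]:
    """Return (name, score) of the lowest specialized score below `threshold`.

    The "overall" entry is skipped because it is not a specific dimension.
    Returns None when every specialized score is at or above the threshold.
    """
    specific = {k: v for k, v in report.items() if k != "overall"}
    if not specific:
        return None
    name = min(specific, key=specific.get)
    score = specific[name]
    return (name, score) if score < threshold else None
```

For example, a report with a low color score and a high sharpness score would return the color entry, suggesting color correction rather than a sharper rescan.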
## Model Architecture

- **Base Model**: mPLUG-Owl2 (LLaMA2-7B + ViT-L Vision Encoder)
- **Vision Encoder**: CLIP ViT-L/14 (1024 visual tokens via Visual Abstractor)
- **Language Model**: LLaMA2-7B
- **Training**: Full fine-tuning on document color quality datasets
- **Input Resolution**: Images are resized to 448x448 (with aspect ratio preservation)
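
The card does not say how the aspect ratio is preserved during resizing. One common approach is letterboxing: scale the image so its longer side is 448 pixels, then pad the shorter side. The sketch below only computes that geometry. It is an assumption about the preprocessing, not a description of the model's actual code, and `letterbox_geometry` is a hypothetical name.

```python
from typing import Tuple


def letterbox_geometry(
    width: int, height: int, target: int = 448
) -> Tuple[int, int, int, int]:
    """Return (new_width, new_height, pad_left, pad_top) for a centered letterbox."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Scale so that the longer side exactly fills the target square
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    # Split the leftover space evenly; any odd pixel goes to the right/bottom
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top
```

An A4 scan at 2480x3508 pixels would, under this scheme, shrink to 317x448 with 65 pixels of padding on the left.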
## Technical Details

| Property | Value |
|----------|-------|
| Model Size | ~16 GB (float16) |
| Parameters | ~7.2B |
| Input | RGB images (any resolution) |
| Output | Color quality score (1.0 - 5.0) |
| Inference | ~2-3 seconds per image on A100 |
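
The per-image inference figure gives a rough way to budget a batch job. The sketch below assumes sequential processing on a single GPU comparable to an A100 and ignores model loading and image decoding time. `estimate_runtime_minutes` is a hypothetical helper name.

```python
from typing import Tuple


def estimate_runtime_minutes(
    n_images: int, sec_low: float = 2.0, sec_high: float = 3.0
) -> Tuple[float, float]:
    """Return (best_case, worst_case) minutes for scoring `n_images` one by one."""
    if n_images < 0:
        raise ValueError("n_images must be non-negative")
    return n_images * sec_low / 60, n_images * sec_high / 60


low, high = estimate_runtime_minutes(1000)
print(f"1000 images: about {low:.0f} to {high:.0f} minutes")
```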
## Hardware Requirements

| Setup | VRAM Required | Recommended |
|-------|---------------|-------------|
| Full precision (fp32) | ~32 GB | A100, H100 |
| Half precision (fp16) | ~16 GB | A100, A40, RTX 4090 |
| With CPU offload | ~8 GB GPU + RAM | RTX 3090, RTX 4080 |
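
The table can be read as a simple selection rule. The sketch below encodes it as a function of free GPU memory. The thresholds are copied from the table and are approximate, while the function name `pick_loading_mode` and the returned labels are hypothetical.

```python
def pick_loading_mode(free_vram_gb: float) -> str:
    """Map free GPU memory (in GB) to the setup suggested by the table."""
    if free_vram_gb >= 32:
        return "fp32"
    if free_vram_gb >= 16:
        return "fp16"
    if free_vram_gb >= 8:
        return "fp16 + CPU offload"
    return "insufficient"
```

In practice the free memory could be read with `torch.cuda.mem_get_info()`, which returns free and total bytes for the current device. Because the table values are approximate, leaving some headroom above each threshold is safer.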
## Installation

```bash
pip install torch transformers accelerate pillow sentencepiece protobuf
```

**Note**: Use `transformers>=4.36.0` for best compatibility.
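
A quick way to confirm that the installed `transformers` meets the 4.36.0 minimum, using only the standard library. This is a simplified sketch: `parse_version`, `meets_minimum`, and `transformers_is_compatible` are hypothetical helpers, and the parser only reads the leading numeric part of a version string, so pre-release suffixes are ignored.

```python
import re
from importlib import metadata
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric components, e.g. "4.36.0.dev0" -> (4, 36, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str = "4.36.0") -> bool:
    """Compare two versions after padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def transformers_is_compatible() -> bool:
    """Return False if transformers is missing or older than the minimum."""
    try:
        return meets_minimum(metadata.version("transformers"))
    except metadata.PackageNotFoundError:
        return False
```

For stricter comparisons that account for pre-releases, the third-party `packaging` library is the usual choice.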
## Limitations

- Optimized for document images (may not generalize to natural photos)
- Color assessment is relative to the training data distribution
- Black & white documents may receive lower scores (use the Overall model instead)
- Requires a GPU with sufficient VRAM for efficient inference
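
Because black and white documents may receive lower color scores, a pipeline might detect them first and route them to the Overall model. The sketch below is a heuristic, not part of the model: a pixel counts as gray when its RGB channels differ by at most `tolerance`, and the image counts as grayscale when nearly all pixels are gray. The function name and both thresholds are hypothetical.

```python
from typing import Iterable, Tuple


def is_mostly_grayscale(
    pixels: Iterable[Tuple[int, int, int]],
    tolerance: int = 8,
    min_fraction: float = 0.99,
) -> bool:
    """Return True when at least `min_fraction` of the pixels are near-gray."""
    total = 0
    gray = 0
    for r, g, b in pixels:
        total += 1
        # A near-gray pixel has almost equal red, green, and blue values
        if max(r, g, b) - min(r, g, b) <= tolerance:
            gray += 1
    if total == 0:
        raise ValueError("no pixels provided")
    return gray / total >= min_fraction
```

The pixel tuples could come from Pillow, for instance from `Image.getdata()` on a downscaled RGB copy of the document, which keeps the check cheap.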
## Credits & Attribution

This model is based on the **DeQA-Doc** project by Junjie Gao et al., which won the **Championship** in the VQualA 2025 DIQA (Document Image Quality Assessment) Challenge.

**Original Repository**: [https://github.com/Junjie-Gao19/DeQA-Doc](https://github.com/Junjie-Gao19/DeQA-Doc)

All credit for the research, training methodology, and model architecture goes to the original authors.

## Citation

If you use this model in your research, please cite the original paper:

```bibtex
@inproceedings{deqadoc,
  title={{DeQA-Doc}: Adapting {DeQA-Score} to Document Image Quality Assessment},
  author={Gao, Junjie and Liu, Runze and Peng, Yingzhe and Yang, Shujian and Zhang, Jin and Yang, Kai and You, Zhiyuan},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop},
  year={2025},
}
```

**ArXiv**: [https://arxiv.org/abs/2507.12796](https://arxiv.org/abs/2507.12796)

## License

Apache 2.0

## Related Models

- [DeQA-Doc-Overall](https://huggingface.co/mapo80/DeQA-Doc-Overall) - Overall quality assessment
- [DeQA-Doc-Sharpness](https://huggingface.co/mapo80/DeQA-Doc-Sharpness) - Sharpness assessment