---
library_name: transformers
pipeline_tag: image-text-to-text
tags:
- qwen3-vl
- vision-language
- multimodal
- image-text-to-text
---

# Pager

This repository contains the model weights, tokenizer, processor, and configuration files for **Pager**, a vision-language model based on the Qwen3-VL architecture.

## Files

The repository includes:

- `config.json`
- `generation_config.json`
- `tokenizer.json`
- `tokenizer_config.json`
- `vocab.json`
- `merges.txt`
- `special_tokens_map.json`
- `added_tokens.json`
- `preprocessor_config.json`
- `video_preprocessor_config.json`
- `chat_template.jinja`
- `model.safetensors.index.json`
- `model-00001-of-00004.safetensors`
- `model-00002-of-00004.safetensors`
- `model-00003-of-00004.safetensors`
- `model-00004-of-00004.safetensors`

## Usage

Install dependencies:

```bash
pip install -U transformers accelerate safetensors pillow
```

Load the model:

```python
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "OpenRaiser/Pager"

processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True
)

model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)

print("Model loaded successfully.")
```

If your local `transformers` version does not support this model class, please upgrade `transformers` first.

## Notes

- The model weights are stored in four `.safetensors` shards.
- `model.safetensors.index.json` maps model parameters to the corresponding weight shards.
- This repository is intended for research and development use.

## Citation

If you use this model, please cite or link to this repository:

```text
https://huggingface.co/OpenRaiser/Pager
```