---
library_name: transformers
pipeline_tag: image-text-to-text
tags:
- qwen3-vl
- vision-language
- multimodal
- image-text-to-text
---

# Pager

This repository contains the model weights, tokenizer, processor, and configuration files for **Pager**, a vision-language model based on the Qwen3-VL architecture.

## Files

The repository includes:

- `config.json`
- `generation_config.json`
- `tokenizer.json`
- `tokenizer_config.json`
- `vocab.json`
- `merges.txt`
- `special_tokens_map.json`
- `added_tokens.json`
- `preprocessor_config.json`
- `video_preprocessor_config.json`
- `chat_template.jinja`
- `model.safetensors.index.json`
- `model-00001-of-00004.safetensors`
- `model-00002-of-00004.safetensors`
- `model-00003-of-00004.safetensors`
- `model-00004-of-00004.safetensors`

## Usage

Install dependencies:

```bash
pip install -U transformers accelerate safetensors pillow
```

Load the model:

```python
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "OpenRaiser/Pager"

processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True
)

model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)

print("Model loaded successfully.")
```

If your local `transformers` version does not support this model class, please upgrade `transformers` first.

## Notes

- The model weights are stored in four `.safetensors` shards.
- `model.safetensors.index.json` maps model parameters to the corresponding weight shards.
- This repository is intended for research and development use.

## Citation

If you use this model, please cite or link to this repository:

```text
https://huggingface.co/OpenRaiser/Pager
```
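
## Example: inspecting the shard index

The Notes state that `model.safetensors.index.json` maps model parameters to weight shards. The sketch below shows one way to read that mapping from a local copy of the file. It assumes the index follows the usual sharded-checkpoint layout, with a top-level `weight_map` object whose keys are parameter names and whose values are shard filenames. The helper names `group_params_by_shard` and `summarize` are illustrative and not part of this repository.

```python
import json
from collections import defaultdict
from pathlib import Path


def group_params_by_shard(index_path):
    """Return {shard_filename: [parameter names]} from a safetensors index file.

    Assumes the file has a top-level "weight_map" object (parameter -> shard).
    """
    index = json.loads(Path(index_path).read_text(encoding="utf-8"))
    shards = defaultdict(list)
    for param_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(param_name)
    return dict(shards)


def summarize(index_path):
    """Print one line per shard with the number of tensors it holds."""
    grouped = group_params_by_shard(index_path)
    for shard_file in sorted(grouped):
        print(f"{shard_file}: {len(grouped[shard_file])} tensors")
```

Calling `summarize` with the path to a downloaded `model.safetensors.index.json` prints one line per shard, which is a quick way to check that all four shard files are referenced before loading the model.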