- license: apache-2.0
- pipeline_tag: video-text-to-text
- library_name: transformers

**<center><span style="font-size:2em;">TinyLLaVA-Video-R1</span></center>**

[Paper (arXiv:2504.09641)](https://arxiv.org/abs/2504.09641) [Code (GitHub)](https://github.com/ZhangXJ199/TinyLLaVA-Video-R1)

Here, we introduce TinyLLaVA-Video-R1, a small-scale video reasoning model built on the traceably trained model [TinyLLaVA-Video](https://github.com/ZhangXJ199/TinyLLaVA-Video). After reinforcement learning on general Video-QA datasets, the model not only significantly improves its reasoning and thinking abilities but also exhibits the emergent characteristic of “aha moments”.

### Result

| Model (HF Path) | Video-MME (w/o sub) | MVBench | MLVU | MMVU (mc) |
| :----------------------------------------: | :-------------: | :-------: | :--------------: | :----------: |
| [Zhang199/TinyLLaVA-Video-R1](https://huggingface.co/Zhang199/TinyLLaVA-Video-R1) | 46.6 | 49.5 | 52.4 | 46.9 |
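
For readers who want to compare these numbers programmatically, the following is a minimal sketch that stores the four reported scores in a dictionary and derives two simple summaries. The variable names and the unweighted mean are illustrative assumptions; the mean is not a figure reported by the authors.

```python
# Scores copied from the result table above for Zhang199/TinyLLaVA-Video-R1.
# The dictionary layout and variable names are illustrative, not part of the model repo.
scores = {
    "Video-MME (w/o sub)": 46.6,
    "MVBench": 49.5,
    "MLVU": 52.4,
    "MMVU (mc)": 46.9,
}

# Unweighted mean across the four benchmarks (a derived convenience number,
# not a result reported in the model card).
mean_score = sum(scores.values()) / len(scores)

# Benchmark on which the model scores highest.
best_benchmark = max(scores, key=scores.get)

print(f"mean: {mean_score:.2f}")
print(f"best: {best_benchmark} ({scores[best_benchmark]})")
```

Because the benchmarks differ in task mix and difficulty, the mean is only a rough single-number summary; the per-benchmark scores remain the authoritative results.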