| --- |
| library_name: RAT |
| language: |
| - en |
| license: mit |
| datasets: |
| - HuggingFaceFW/fineweb-edu |
| tags: |
| - efficient architecture |
| - recurrence |
| - attention |
| - pretraining |
| metrics: |
| - perplexity |
| - accuracy |
| --- |
| |
| ## Description |
| Models released with the [RAT paper](https://arxiv.org/abs/2507.04416), "RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling". |
|
|
| ## Citation |
| If you find these models useful, please consider citing the paper: |
| ```bibtex |
| @article{wei2025rat, |
| title={RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling}, |
| author={Wei, Xiuying and Yadav, Anunay and Pascanu, Razvan and Gulcehre, Caglar}, |
| journal={arXiv preprint arXiv:2507.04416}, |
| year={2025} |
| } |
| ``` |