Clockz's picture
Add files using upload-large-folder tool
6ff72af verified
---
license: other
library_name: transformers
tags:
- reasoning
- mid-training
- extrapolation
- synthetic-data
- transformers
---
# Interplay-LM Extrapolation Mid-Train Models
This repository contains the `op11-14` CPT checkpoints and corresponding local RL outputs used by `scripts/composition/op-difficulty-10B/script_cpt_rl/id2-10_0.2easy_0.3medium_0.5hard_cpt11-14`.
For pretraining, only `cpt0.2-uniform_0.8-11-14_plus` is included. For RL, only final `actor/huggingface` checkpoints found locally are uploaded.
## CPT Checkpoints
| Path | Checkpoint | Used by nominal step / CPT epoch |
| --- | --- | --- |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-387` | checkpoint-387 | 50step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-774` | checkpoint-774 | 100step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1161` | checkpoint-1161 | 50step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1548` | checkpoint-1548 | 200step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1935` | checkpoint-1935 | 100step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-2322` | checkpoint-2322 | 300step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-3096` | checkpoint-3096 | 100step/0.8, 400step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-3870` | checkpoint-3870 | 500step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-4644` | checkpoint-4644 | 600step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-6192` | checkpoint-6192 | 300step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-6579` | checkpoint-6579 | 800step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-7740` | checkpoint-7740 | 954step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-8127` | checkpoint-8127 | 400step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-9675` | checkpoint-9675 | 300step/0.8 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-10062` | checkpoint-10062 | 500step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-11997` | checkpoint-11997 | 600step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-12771` | checkpoint-12771 | 400step/0.8 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-15867` | checkpoint-15867 | 800step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-16254` | checkpoint-16254 | 500step/0.8 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-18963` | checkpoint-18963 | 954step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-19350` | checkpoint-19350 | 600step/0.8 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-25542` | checkpoint-25542 | 800step/0.8 |
## RL Checkpoints
| Path | Nominal step | CPT epoch | Source CPT checkpoint | Uploaded checkpoint |
| --- | --- | --- | --- | --- |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-50step-0.8RL` | 50 | 0.2 | checkpoint-387 | `global_step_40` |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-50step-0.5RL` | 50 | 0.5 | checkpoint-1161 | `global_step_25` |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-50step-0.2RL` | 50 | 0.8 | checkpoint-1548 | `global_step_9` |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-100step-0.2RL` | 100 | 0.8 | checkpoint-3096 | `global_step_19` |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-100step-0.5RL` | 100 | 0.5 | checkpoint-1935 | `global_step_50` |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-100step-0.8RL` | 100 | 0.2 | checkpoint-774 | `global_step_80` |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-200step-0.2RL` | 200 | 0.8 | checkpoint-6579 | `global_step_39` |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-200step-0.5RL` | 200 | 0.5 | checkpoint-3870 | `global_step_100` |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-200step-0.8RL` | 200 | 0.2 | checkpoint-1548 | `global_step_160` |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-300step-0.2RL` | 300 | 0.8 | checkpoint-9675 | `global_step_59` |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-300step-0.5RL` | 300 | 0.5 | checkpoint-6192 | `global_step_150` |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-300step-0.8RL` | 300 | 0.2 | checkpoint-2322 | `global_step_240` |
## Load
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
repo_id = "Interplay-LM-Reasoning/extrapolation_midtrain"
subdir = "id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-25542"
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subdir)
model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=subdir)
```
## Citation
```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
author={Charlie Zhang and Graham Neubig and Xiang Yue},
year={2025},
eprint={2512.07783},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2512.07783},
}
```