---
license: other
library_name: transformers
tags:
- reasoning
- mid-training
- extrapolation
- synthetic-data
- transformers
---

# Interplay-LM Extrapolation Mid-Train Models

This repository contains the `op11-14` CPT checkpoints and the corresponding local RL outputs used by `scripts/composition/op-difficulty-10B/script_cpt_rl/id2-10_0.2easy_0.3medium_0.5hard_cpt11-14`.

For pretraining, only the `cpt0.2-uniform_0.8-11-14_plus` run is included. For RL, only the final `actor/huggingface` checkpoints that were found locally are uploaded.

## CPT Checkpoints

| Path | Checkpoint | Used by nominal step / CPT epoch |
| --- | --- | --- |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-387` | checkpoint-387 | 50step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-774` | checkpoint-774 | 100step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1161` | checkpoint-1161 | 50step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1548` | checkpoint-1548 | 200step/0.2, 50step/0.8 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1935` | checkpoint-1935 | 100step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-2322` | checkpoint-2322 | 300step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-3096` | checkpoint-3096 | 100step/0.8, 400step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-3870` | checkpoint-3870 | 500step/0.2, 200step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-4644` | checkpoint-4644 | 600step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-6192` | checkpoint-6192 | 300step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-6579` | checkpoint-6579 | 800step/0.2, 200step/0.8 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-7740` | checkpoint-7740 | 954step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-8127` | checkpoint-8127 | 400step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-9675` | checkpoint-9675 | 300step/0.8 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-10062` | checkpoint-10062 | 500step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-11997` | checkpoint-11997 | 600step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-12771` | checkpoint-12771 | 400step/0.8 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-15867` | checkpoint-15867 | 800step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-16254` | checkpoint-16254 | 500step/0.8 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-18963` | checkpoint-18963 | 954step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-19350` | checkpoint-19350 | 600step/0.8 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-25542` | checkpoint-25542 | 800step/0.8 |

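Selecting a CPT checkpoint by its nominal step and CPT epoch can be sketched as a plain lookup. The mapping below is transcribed from the CPT and RL tables in this card; the names `CPT_RUN`, `CPT_CHECKPOINTS`, and `cpt_subfolder` are hypothetical helpers introduced for illustration and are not part of the repository.

```python
# Directory of the CPT run inside the repository, copied from the Path column.
CPT_RUN = "id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus"

# (nominal step, CPT epoch) -> checkpoint number, transcribed from this card.
# Epochs are kept as strings so lookups never depend on float rounding.
CPT_CHECKPOINTS = {
    (50, "0.2"): 387, (100, "0.2"): 774, (200, "0.2"): 1548,
    (300, "0.2"): 2322, (400, "0.2"): 3096, (500, "0.2"): 3870,
    (600, "0.2"): 4644, (800, "0.2"): 6579, (954, "0.2"): 7740,
    (50, "0.5"): 1161, (100, "0.5"): 1935, (300, "0.5"): 6192,
    (400, "0.5"): 8127, (500, "0.5"): 10062, (600, "0.5"): 11997,
    (800, "0.5"): 15867, (954, "0.5"): 18963,
    (100, "0.8"): 3096, (300, "0.8"): 9675, (400, "0.8"): 12771,
    (500, "0.8"): 16254, (600, "0.8"): 19350, (800, "0.8"): 25542,
    # Pairs that appear only as "Source CPT checkpoint" in the RL table.
    (50, "0.8"): 1548, (200, "0.5"): 3870, (200, "0.8"): 6579,
}


def cpt_subfolder(nominal_step: int, cpt_epoch: str) -> str:
    """Return the repository subfolder for one (nominal step, CPT epoch) pair."""
    key = (nominal_step, cpt_epoch)
    if key not in CPT_CHECKPOINTS:
        raise KeyError(f"no checkpoint listed for {nominal_step}step/{cpt_epoch}")
    return f"{CPT_RUN}/checkpoint-{CPT_CHECKPOINTS[key]}"


print(cpt_subfolder(800, "0.8"))
```

The returned string can be passed as the `subfolder` argument of `from_pretrained`. Pairs that are not listed raise a `KeyError` rather than guessing a checkpoint number.
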
## RL Checkpoints

| Path | Nominal step | CPT epoch | Source CPT checkpoint | Uploaded checkpoint |
| --- | --- | --- | --- | --- |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-50step-0.8RL` | 50 | 0.2 | checkpoint-387 | `global_step_40` |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-50step-0.5RL` | 50 | 0.5 | checkpoint-1161 | `global_step_25` |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-50step-0.2RL` | 50 | 0.8 | checkpoint-1548 | `global_step_9` |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-100step-0.2RL` | 100 | 0.8 | checkpoint-3096 | `global_step_19` |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-100step-0.5RL` | 100 | 0.5 | checkpoint-1935 | `global_step_50` |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-100step-0.8RL` | 100 | 0.2 | checkpoint-774 | `global_step_80` |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-200step-0.2RL` | 200 | 0.8 | checkpoint-6579 | `global_step_39` |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-200step-0.5RL` | 200 | 0.5 | checkpoint-3870 | `global_step_100` |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-200step-0.8RL` | 200 | 0.2 | checkpoint-1548 | `global_step_160` |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-300step-0.2RL` | 300 | 0.8 | checkpoint-9675 | `global_step_59` |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-300step-0.5RL` | 300 | 0.5 | checkpoint-6192 | `global_step_150` |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-300step-0.8RL` | 300 | 0.2 | checkpoint-2322 | `global_step_240` |

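The RL run paths in the table follow a regular naming pattern, so they can be rebuilt from the nominal step and CPT epoch. The sketch below does that, and then guesses where the weights sit inside a run. That second part is an assumption: the card only says that final `actor/huggingface` checkpoints are uploaded, so the layout `<run>/<global_step_N>/actor/huggingface` should be checked against the repository file listing before relying on it. `RL_ROOT`, `RL_SUFFIX`, `rl_run_path`, and `rl_subfolder` are hypothetical names.

```python
# Parent directory of all RL runs, copied from the Path column.
RL_ROOT = "id2-10_0.2easy_0.3medium_0.5hard/rl"

# CPT epoch -> trailing "<x>RL" value, as the two are paired in the run names.
RL_SUFFIX = {"0.2": "0.8", "0.5": "0.5", "0.8": "0.2"}


def rl_run_path(nominal_step: int, cpt_epoch: str) -> str:
    """Rebuild a run path as it is spelled in the RL table."""
    suffix = RL_SUFFIX[cpt_epoch]
    return f"{RL_ROOT}/cpt{cpt_epoch}-rl-op11-14_uniform-{nominal_step}step-{suffix}RL"


def rl_subfolder(nominal_step: int, cpt_epoch: str, uploaded_checkpoint: str) -> str:
    """Guess the subfolder holding the weights of one RL run.

    ASSUMPTION: the weights live in <run>/<global_step_N>/actor/huggingface.
    The card does not state this layout explicitly.
    """
    run = rl_run_path(nominal_step, cpt_epoch)
    return f"{run}/{uploaded_checkpoint}/actor/huggingface"


print(rl_run_path(50, "0.2"))
print(rl_subfolder(50, "0.2", "global_step_40"))
```

The `uploaded_checkpoint` argument is passed in explicitly, taken from the "Uploaded checkpoint" column, rather than derived from the nominal step.
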
## Load

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Interplay-LM-Reasoning/extrapolation_midtrain"
# A CPT checkpoint path from the table above (here, the last one listed).
subdir = "id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-25542"

# `subfolder` selects one checkpoint directory inside the repository.
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subdir)
model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=subdir)
```

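Because the repository holds many checkpoints, it can be useful to fetch a single checkpoint folder to disk instead of going through `subfolder`. The sketch below uses `snapshot_download` from `huggingface_hub` with `allow_patterns`; the helper names `checkpoint_patterns` and `download_checkpoint` are hypothetical, and the code assumes `huggingface_hub` is installed.

```python
import os


def checkpoint_patterns(subdir: str) -> list[str]:
    """Glob patterns that select every file below one checkpoint folder."""
    return [f"{subdir.strip('/')}/*"]


def download_checkpoint(repo_id: str, subdir: str) -> str:
    """Download one checkpoint folder and return its local directory."""
    # Imported lazily so the pure helper above works without huggingface_hub.
    from huggingface_hub import snapshot_download

    local_root = snapshot_download(
        repo_id=repo_id,
        allow_patterns=checkpoint_patterns(subdir),
    )
    return os.path.join(local_root, subdir.strip("/"))


# Example usage (needs network access, so it is left commented out):
# local_dir = download_checkpoint(
#     "Interplay-LM-Reasoning/extrapolation_midtrain",
#     "id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-25542",
# )
# model = AutoModelForCausalLM.from_pretrained(local_dir)
```

The returned directory can be passed directly to `from_pretrained` without a `subfolder` argument.
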
## Citation

```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
  title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
  author={Charlie Zhang and Graham Neubig and Xiang Yue},
  year={2025},
  eprint={2512.07783},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2512.07783},
}
```