	---
	license: other
	library_name: transformers
	tags:
	- reasoning
	- mid-training
	- extrapolation
	- synthetic-data
	- transformers
	---

	# Interplay-LM Extrapolation Mid-Train Models

	This repository contains the `op11-14` CPT checkpoints and corresponding local RL outputs used by `scripts/composition/op-difficulty-10B/script_cpt_rl/id2-10_0.2easy_0.3medium_0.5hard_cpt11-14`.

	For pretraining, only `cpt0.2-uniform_0.8-11-14_plus` is included. For RL, only the final `actor/huggingface` checkpoint of each run that was found locally is uploaded.

	## CPT Checkpoints

	| Path | Checkpoint | Used by nominal step / CPT epoch |
	| --- | --- | --- |
	| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-387` | checkpoint-387 | 50step/0.2 |
	| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-774` | checkpoint-774 | 100step/0.2 |
	| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1161` | checkpoint-1161 | 50step/0.5 |
	| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1548` | checkpoint-1548 | 50step/0.8, 200step/0.2 |
	| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1935` | checkpoint-1935 | 100step/0.5 |
	| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-2322` | checkpoint-2322 | 300step/0.2 |
	| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-3096` | checkpoint-3096 | 100step/0.8, 400step/0.2 |
	| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-3870` | checkpoint-3870 | 200step/0.5, 500step/0.2 |
	| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-4644` | checkpoint-4644 | 600step/0.2 |
	| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-6192` | checkpoint-6192 | 300step/0.5 |
	| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-6579` | checkpoint-6579 | 200step/0.8, 800step/0.2 |
	| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-7740` | checkpoint-7740 | 954step/0.2 |
	| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-8127` | checkpoint-8127 | 400step/0.5 |
	| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-9675` | checkpoint-9675 | 300step/0.8 |
	| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-10062` | checkpoint-10062 | 500step/0.5 |
	| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-11997` | checkpoint-11997 | 600step/0.5 |
	| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-12771` | checkpoint-12771 | 400step/0.8 |
	| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-15867` | checkpoint-15867 | 800step/0.5 |
	| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-16254` | checkpoint-16254 | 500step/0.8 |
	| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-18963` | checkpoint-18963 | 954step/0.5 |
	| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-19350` | checkpoint-19350 | 600step/0.8 |
	| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-25542` | checkpoint-25542 | 800step/0.8 |
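
Every path in the table follows one pattern, so the `subfolder` argument for a given checkpoint can be built from its step number. The sketch below is illustrative only: `CPT_DIR`, `CPT_STEPS`, and `cpt_subfolder` are hypothetical names that do not exist in this repository, and the step numbers are copied from the table.

```python
# Illustrative helper; these names are hypothetical, not part of the repo.
CPT_DIR = "id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus"

# Step numbers copied from the CPT table.
CPT_STEPS = (
    387, 774, 1161, 1548, 1935, 2322, 3096, 3870, 4644, 6192, 6579,
    7740, 8127, 9675, 10062, 11997, 12771, 15867, 16254, 18963, 19350, 25542,
)


def cpt_subfolder(step: int) -> str:
    """Return the repo-relative directory of an uploaded CPT checkpoint."""
    if step not in CPT_STEPS:
        raise ValueError(f"checkpoint-{step} is not listed in the table")
    return f"{CPT_DIR}/checkpoint-{step}"


print(cpt_subfolder(25542))
```

All 22 listed steps are multiples of 387, the step of the first checkpoint, so a step number that is not a multiple of 387 cannot be a valid entry.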

	## RL Checkpoints

	| Path | Nominal step | CPT epoch | Source CPT checkpoint | Uploaded checkpoint |
	| --- | --- | --- | --- | --- |
	| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-50step-0.8RL` | 50 | 0.2 | checkpoint-387 | `global_step_40` |
	| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-50step-0.5RL` | 50 | 0.5 | checkpoint-1161 | `global_step_25` |
	| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-50step-0.2RL` | 50 | 0.8 | checkpoint-1548 | `global_step_9` |
	| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-100step-0.2RL` | 100 | 0.8 | checkpoint-3096 | `global_step_19` |
	| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-100step-0.5RL` | 100 | 0.5 | checkpoint-1935 | `global_step_50` |
	| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-100step-0.8RL` | 100 | 0.2 | checkpoint-774 | `global_step_80` |
	| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-200step-0.2RL` | 200 | 0.8 | checkpoint-6579 | `global_step_39` |
	| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-200step-0.5RL` | 200 | 0.5 | checkpoint-3870 | `global_step_100` |
	| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-200step-0.8RL` | 200 | 0.2 | checkpoint-1548 | `global_step_160` |
	| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-300step-0.2RL` | 300 | 0.8 | checkpoint-9675 | `global_step_59` |
	| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-300step-0.5RL` | 300 | 0.5 | checkpoint-6192 | `global_step_150` |
	| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-300step-0.8RL` | 300 | 0.2 | checkpoint-2322 | `global_step_240` |
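
The run names and the `Uploaded checkpoint` column appear to follow a simple pattern. The `RL` suffix in each run name equals one minus the value in the `CPT epoch` column, and every uploaded `global_step_N` in the table matches the nominal step multiplied by that RL share and truncated to an integer in floating-point arithmetic. This is an observation about the table, not a confirmed description of the training scripts; `rl_run_name` and `expected_global_step` are hypothetical helpers written for this illustration.

```python
def rl_run_name(cpt: float, nominal_step: int) -> str:
    """Build the run directory name (pattern inferred from the table)."""
    return f"cpt{cpt}-rl-op11-14_uniform-{nominal_step}step-{1 - cpt:.1f}RL"


def expected_global_step(cpt: float, nominal_step: int) -> int:
    """Nominal step times the RL share (1 - cpt), truncated toward zero.

    In binary floating point, 1 - 0.8 evaluates to 0.19999999999999996,
    so 50 * (1 - 0.8) truncates to 9 rather than 10.
    """
    return int(nominal_step * (1 - cpt))


# (CPT epoch value, nominal step, uploaded global step), copied from the table.
RL_TABLE = [
    (0.2, 50, 40), (0.5, 50, 25), (0.8, 50, 9),
    (0.8, 100, 19), (0.5, 100, 50), (0.2, 100, 80),
    (0.8, 200, 39), (0.5, 200, 100), (0.2, 200, 160),
    (0.8, 300, 59), (0.5, 300, 150), (0.2, 300, 240),
]

for cpt, step, uploaded in RL_TABLE:
    # Each row of the table is reproduced by the truncation rule.
    assert expected_global_step(cpt, step) == uploaded

print(rl_run_name(0.8, 50), expected_global_step(0.8, 50))
```

If the pattern holds, it would explain why the `0.8` rows end at steps 9, 19, 39, and 59 instead of 10, 20, 40, and 60.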

	## Load

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

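	# Hub repository that hosts every checkpoint listed above.
	repo_id = "Interplay-LM-Reasoning/extrapolation_midtrain"
	# Repo-relative checkpoint directory, taken from the Path column of the CPT table.
	subdir = "id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-25542"

	# `subfolder` makes transformers read the files from that directory of the repo.
	tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subdir)
	model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=subdir)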
	```
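
RL checkpoints can presumably be loaded the same way. The sketch below assumes each run stores its weights under `<run>/global_step_<N>/actor/huggingface`. That layout is inferred from the `actor/huggingface` remark and the `Uploaded checkpoint` column and is not confirmed by this card, so check the repository file listing before relying on it. `rl_subfolder` and `load_rl_checkpoint` are hypothetical helpers.

```python
REPO_ID = "Interplay-LM-Reasoning/extrapolation_midtrain"
RL_DIR = "id2-10_0.2easy_0.3medium_0.5hard/rl"


def rl_subfolder(run: str, global_step: int) -> str:
    """Build the repo-relative directory of an RL checkpoint.

    ASSUMPTION: files live under <run>/global_step_<N>/actor/huggingface.
    """
    return f"{RL_DIR}/{run}/global_step_{global_step}/actor/huggingface"


def load_rl_checkpoint(run: str, global_step: int):
    """Load tokenizer and model for one RL run (downloads from the Hub)."""
    # Imported lazily so the path helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    subdir = rl_subfolder(run, global_step)
    tokenizer = AutoTokenizer.from_pretrained(REPO_ID, subfolder=subdir)
    model = AutoModelForCausalLM.from_pretrained(REPO_ID, subfolder=subdir)
    return tokenizer, model


# Path for the 300-step run trained from the 0.2 CPT checkpoint.
print(rl_subfolder("cpt0.2-rl-op11-14_uniform-300step-0.8RL", 240))
```

Calling `load_rl_checkpoint("cpt0.2-rl-op11-14_uniform-300step-0.8RL", 240)` would then return the tokenizer and model, provided the assumed layout matches the uploaded files.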

	## Citation

	```bibtex
	@misc{zhang2025interplaypretrainingmidtrainingrl,
	title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
	author={Charlie Zhang and Graham Neubig and Xiang Yue},
	year={2025},
	eprint={2512.07783},
	archivePrefix={arXiv},
	primaryClass={cs.CL},
	url={https://arxiv.org/abs/2512.07783},
	}
	```
