---
license: mit
base_model:
  - Qwen/Qwen2.5-7B
---
A model trained on GSM8K following the simple_GRPO project, for 200 steps; the word "however" appeared once in the model's outputs. Training used 3× A800 80G GPUs and took a little over 20 minutes.
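GRPO's central trick is to score each sampled completion relative to the other completions for the same prompt, so no value network is needed. The following is an illustrative sketch of that group-relative advantage computation, not code taken from the simple_GRPO repository:

```python
# Minimal sketch of GRPO's group-relative advantage (illustrative,
# not from the simple_GRPO codebase).
def grpo_advantages(rewards):
    """Normalize rewards within a group of completions sampled for one prompt."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    if std == 0:
        # All completions scored the same: no learning signal for this group.
        return [0.0] * n
    return [(r - mean) / std for r in rewards]

# Example: 4 completions for one GSM8K question, reward 1.0 if the
# final answer is correct, 0.0 otherwise.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # [1.0, -1.0, -1.0, 1.0]
```

Correct completions get a positive advantage and incorrect ones a negative advantage, which is then used to weight the policy-gradient update.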

Training results: loss curve and GPU memory usage (plots omitted).

Test results: demo_math_chat_gen(simple_GRPO_why) vs. demo_math_chat_gen(Qwen2.5-7B) (screenshots omitted; see notice below).

Evaluated on GSM8K, Qwen2.5-7B scores 85.4. The likely cause is https://github.com/open-compass/opencompass/issues/1878
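GSM8K scoring usually compares the last number in the model's completion against the reference answer, and discrepancies like the one in the linked issue often come down to how that number is extracted. A hypothetical helper (not from the actual evaluation code) might look like:

```python
import re

# Illustrative sketch, not the evaluation code used here: pull the final
# numeric value out of a GSM8K-style completion for exact-match scoring.
def extract_answer(text):
    # Drop thousands separators, then take the last number in the text.
    nums = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return nums[-1] if nums else None

print(extract_answer("She buys 3 boxes, so the total is 1,200 pencils."))  # 1200
```

Different extraction rules (last number vs. text after `####`, handling of commas and units) can shift reported GSM8K scores by several points, which is why such issues are worth checking before comparing numbers across frameworks.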


Safetensors · Model size: 8B params · Tensor type: BF16