AIPlans/Qwen3-0.6B-ORPO-Crosscoder-MixedDataset
Updated
AIPlans/Qwen3-0.6B-GRPO-Crosscoder-MixedDataset
Updated
AIPlans/Qwen3-0.6B-KTO-Crosscoder-MixedDataset
Updated
AIPlans/Qwen3-0.6B-IPO-Crosscoder-MixedDataset
Updated
Reinforcement Learning • 0.6B • Updated • 2 • 2
AIPlans/Qwen3-0.6B-GRPO-RM_NVIDIA
Text Generation • 0.6B • Updated • 10
AIPlans/Qwen3-0.6B-GRPO_Epoch2
Text Generation • 0.6B • Updated • 1
AIPlans/Qwen3-0.6B-GRPO_Epoch1
Text Generation • 0.6B • Updated • 4
Reinforcement Learning • 0.6B • Updated • 37 • 1
AIPlans/qwen3-0.6b-base-PPO-hs2
Updated
AIPlans/Qwen3-0.6B-DPO_Epoch_1
Text Generation • 0.6B • Updated • 2
AIPlans/Qwen3-0.6B-SFT-hs2
Text Generation • 0.6B • Updated • 13
AIPlans/Qwen3-0.6B-RM-hs2
Text Classification • 0.6B • Updated • 1
Text Generation • Updated • 18
AIPlans/Qwen3-0.6B-DPO_NOTLORA
Text Generation • 0.6B • Updated • 13
Text Generation • Updated • 13 • 1
Text Generation • Updated • 6
AIPlans/qwen3-0.6b-hh-rlhf-sft
0.6B • Updated
AIPlans/Qwen3-0.6B-KTO_trial
Text Generation • 0.6B • Updated • 1 • 1
AIPlans/qwen3-0.6b-sft-hh-rlhf-lora
Updated
AIPlans/qwen3-0.6b-base-PPO-PM
AIPlans/qwen3-0.6b-base-hl-RM
Text Classification • 0.6B • Updated
0.6B • Updated
AIPlans/qwen3-0.6b-dpo-lora
Text Generation • 0.6B • Updated • 1 • 1
AIPlans/qwen3-0.6B-reward-hh-rlhf
Text Generation • 0.6B • Updated
AIPlans/qwen3-8b-ipo-hh-rlhf
Text Generation • Updated • 1