Resources for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
Xin Lai
xinlai
AI & ML interests
Multimodal LLM, LLM Reasoning, Point Cloud Segmentation, Image Segmentation
Models (21)
xinlai/Qwen2-7B-Instruct-Step-DPO • Text Generation • 8B
xinlai/Qwen2-57B-A14B-SFT-Step-DPO • Text Generation • 57B
xinlai/Qwen1.5-32B-SFT-Step-DPO • Text Generation • 33B
xinlai/Llama-3-70B-SFT-Step-DPO • Text Generation • 71B
xinlai/DeepSeekMath-Base-SFT-Step-DPO • Text Generation • 7B
xinlai/Qwen2-7B-SFT-Step-DPO • Text Generation • 8B
xinlai/Qwen2-72B-Instruct-Step-DPO • Text Generation • 73B
xinlai/DeepSeekMath-RL-Step-DPO • Text Generation • 7B
xinlai/Qwen2-57B-A14B-SFT • Text Generation • 57B
xinlai/Qwen1.5-32B-SFT • Text Generation • 33B