Models and datasets from the paper "WPO: Enhancing RLHF with Weighted Preference Optimization".
Wenxuan Zhou (wzhouad)
Models (8)
wzhouad/Llama3-Instruct-8B-WPO-HB-v2 • Text Generation • 8B
wzhouad/Llama3-Instruct-8B-WPO-HB • Text Generation • 8B
wzhouad/zephyr-7B-WPO-HB • Text Generation • 7B
wzhouad/gemma-2-9b-it-WPO-HB • Text Generation • 9B
wzhouad/gemma-2-9b-it-WPO-FP • Text Generation • 9B
wzhouad/zephyr-7B-WPO-FP • Text Generation • 7B
wzhouad/Llama3-Instruct-8B-WPO-FP • Text Generation • 8B
wzhouad/prix-lm • Text Generation