Demystifying Group Relative Policy Optimization: Its Policy Gradient is a U-Statistic • Paper • 2603.01162 • Published 5 days ago
Learn-to-Distance: Distance Learning for Detecting LLM-Generated Text • Paper • 2601.21895 • Published Jan 29
AdaDetectGPT: Adaptive Detection of LLM-Generated Text with Statistical Guarantees • Paper • 2510.01268 • Published Sep 29, 2025
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning • Paper • 2504.03784 • Published Apr 3, 2025