Meta Reward Modeling (MRM)
Overview
Meta Reward Modeling (MRM) is a personalized reward modeling framework designed to adapt to diverse user preferences with limited feedback.
Instead of learning a single global reward function, MRM treats each user as a separate learning task and applies a meta-learning approach to learn a shared initialization that enables fast, few-shot personalization.
MRM represents each user's reward as an adaptive combination of shared base reward functions and optimizes this structure with a bi-level meta-learning procedure.
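As a rough illustration of the idea, the sketch below scores one response for one user by combining base reward scores with that user's weights. It assumes a plain linear combination, which this card does not confirm; `user_reward`, `base_scores`, `shared_init`, and `user_weights` are hypothetical names and values chosen for the example, not part of the released code.

```python
from typing import Sequence


def user_reward(weights: Sequence[float], base_rewards: Sequence[float]) -> float:
    """Combine K shared base reward scores with one user's weights.

    Assumption: the combination is a weighted sum. The actual MRM
    combination rule may differ.
    """
    if len(weights) != len(base_rewards):
        raise ValueError("weights and base_rewards must have the same length")
    return sum(w * r for w, r in zip(weights, base_rewards))


# Hypothetical numbers: three base reward functions scoring one response.
base_scores = [0.2, -0.5, 1.0]
shared_init = [1 / 3, 1 / 3, 1 / 3]  # stand-in for the meta-learned initialization
user_weights = [0.1, 0.0, 0.9]       # stand-in for weights after adapting to a user

print("reward under shared init:", user_reward(shared_init, base_scores))
print("reward under user weights:", user_reward(user_weights, base_scores))
```

Only the weight vector differs between users in this sketch; the base scores are shared, which is what makes adaptation from a handful of preference pairs plausible.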
To improve robustness across heterogeneous users, MRM introduces a Robust Personalization Objective (RPO) that emphasizes hard-to-learn users during meta-training.
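This card does not give the exact form of RPO. Purely to illustrate what "emphasizing hard-to-learn users" can look like, the sketch below reweights per-user losses with a softmax so that users with larger loss contribute more to the meta-objective. The softmax weighting, the `temperature` parameter, and both function names are assumptions made for this example and should not be read as the authors' objective.

```python
import math
from typing import List


def hard_user_weights(user_losses: List[float], temperature: float = 1.0) -> List[float]:
    """Softmax over per-user losses: a larger loss gets a larger weight.

    Assumption: this is one generic way to emphasize hard users,
    not the RPO definition from the paper.
    """
    peak = max(user_losses)  # subtract the max for numerical stability
    exps = [math.exp((loss - peak) / temperature) for loss in user_losses]
    total = sum(exps)
    return [e / total for e in exps]


def robust_meta_loss(user_losses: List[float], temperature: float = 1.0) -> float:
    """Weighted sum of per-user losses under the weights above."""
    weights = hard_user_weights(user_losses, temperature)
    return sum(w * loss for w, loss in zip(weights, user_losses))


# Hypothetical post-adaptation losses for three users.
losses = [0.1, 0.4, 2.0]
print("weights:", hard_user_weights(losses))
print("robust loss:", robust_meta_loss(losses))
print("plain mean:", sum(losses) / len(losses))
```

With a very large `temperature` the weights approach uniform and the objective approaches the plain mean; with a small one it approaches the worst-user loss.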
This repository provides trained checkpoints for reward modeling and user-level preference evaluation.
Links
- 📄 arXiv Paper: https://arxiv.org/abs/XXXX.XXXXX
- 🤗 Hugging Face Paper: https://huggingface.co/papers/XXXX.XXXXX
- 💻 GitHub Code: https://github.com/ModalityDance/MRM
- 📦 Hugging Face Collection: https://huggingface.co/collections/ModalityDance/mrm
Evaluation
The model is evaluated using user-level preference accuracy with few-shot personalization.
Inference follows the same adaptation procedure used during training: for each user, the reward weights start from the meta-learned initialization and are updated with a small number of gradient steps on that user's preference data.
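The adaptation-then-evaluate loop can be sketched as follows. This is a minimal stand-in, not the logic of `inference.py`: it assumes a linear reward over base scores, a Bradley-Terry (logistic) pairwise loss, plain gradient descent, and a metric that averages per-user accuracy across users. All function names and the `support`/`query` split keys are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One preference pair: base-reward scores of the chosen and the rejected response.
Pair = Tuple[Sequence[float], Sequence[float]]


def score(w: Sequence[float], feats: Sequence[float]) -> float:
    return sum(wi * fi for wi, fi in zip(w, feats))


def adapt(init_w: Sequence[float], support: List[Pair],
          inner_lr: float, steps: int) -> List[float]:
    """Run a few gradient steps on one user's pairs, starting from init_w."""
    w = list(init_w)
    if not support:
        return w
    for _ in range(steps):
        grad = [0.0] * len(w)
        for chosen, rejected in support:
            margin = score(w, chosen) - score(w, rejected)
            # Gradient of -log(sigmoid(margin)) with respect to the margin.
            coeff = -(1.0 - 1.0 / (1.0 + math.exp(-margin)))
            for k in range(len(w)):
                grad[k] += coeff * (chosen[k] - rejected[k]) / len(support)
        w = [wk - inner_lr * gk for wk, gk in zip(w, grad)]
    return w


def preference_accuracy(w: Sequence[float], query: List[Pair]) -> float:
    """Fraction of pairs where the chosen response scores strictly higher."""
    return sum(score(w, c) > score(w, r) for c, r in query) / len(query)


def user_level_accuracy(init_w: Sequence[float], users: List[Dict[str, List[Pair]]],
                        inner_lr: float = 1e-3, steps: int = 1) -> float:
    """Adapt per user, score that user's held-out pairs, then average over users."""
    accs = [
        preference_accuracy(adapt(init_w, u["support"], inner_lr, steps), u["query"])
        for u in users
    ]
    return sum(accs) / len(accs)
```

In this sketch `inner_lr` and `steps` play the roles that `--inner_lr` and `--eval_inner_epochs` appear to play in the evaluation command; that mapping is inferred from the flag names only.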
Example evaluation script
python inference.py \
--embed_pt data/emb/prism/V1.pt \
--meta_json data/emb/prism/V1.json \
--ckpt path/to/checkpoint.pt \
--dataset PRISM \
--seen_train_limit -1 \
--unseen_train_limit -1 \
--hidden_layers 2 \
--inner_lr 1e-3 \
--eval_inner_epochs 1 \
--val_ratio 0.9 \
--score_threshold -1 \
--seed 42 \
--device cuda:0
Citation
If you use this model or code in your research, please cite:
@article{mrm2025,
title = {Meta Reward Modeling for Personalized Alignment},
author = {Author Names},
journal = {arXiv preprint arXiv:XXXX.XXXXX},
year = {2025}
}
License
This model is released under the MIT License.
Model tree for ModalityDance/MRM-PRISM-V1
Base model: meta-llama/Llama-3.1-8B