This is an attempt to replicate the RLHF pipeline.
Base Model
We used bloomz-7b1-mt because of its less restrictive license and its multilingual ability.
Supervised Fine-tuning
For SFT we used a combination of multiple datasets, including:
- RyokoAI/ShareGPT52K
- GPTeacher
- Alpaca-GPT4 en & zh
- A filtered subset of the ShareGPT dataset, machine-translated into Chinese
Reward Model
For RM we used the code of reward-modeling repo and datasets from
Reinforcement Learning
For RL we used the code of trlx with slight modifications.
Instead of building the value network on top of the policy network with a single linear layer, we added another hydra head on top of the reference network's frozen bottom layers to serve as the value network.
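The value-network layout described above can be sketched roughly as follows. This is a minimal illustration, not the modified trlx code: `ToyBlock` and `ValueBranch` are hypothetical names, the toy blocks stand in for real transformer layers, and the shape of the extra head (a trainable copy of the top blocks followed by a linear layer that outputs one scalar per token) is an assumption, since the card does not spell it out.

```python
import copy

import torch
import torch.nn as nn


class ToyBlock(nn.Module):
    """Stand-in for one transformer block (hypothetical, for illustration only)."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.ff = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return hidden + torch.tanh(self.ff(hidden))


class ValueBranch(nn.Module):
    """Value network built on the reference network's frozen bottom blocks."""

    def __init__(self, ref_blocks: nn.ModuleList, num_frozen: int, hidden_size: int):
        super().__init__()
        blocks = list(ref_blocks)
        # Bottom blocks are reused from the reference network and kept frozen.
        self.frozen_bottom = nn.ModuleList(blocks[:num_frozen])
        for param in self.frozen_bottom.parameters():
            param.requires_grad_(False)
        # Assumption: the extra head is a trainable copy of the remaining top blocks.
        self.value_top = nn.ModuleList([copy.deepcopy(b) for b in blocks[num_frozen:]])
        for param in self.value_top.parameters():
            param.requires_grad_(True)
        # Assumption: a linear layer maps each hidden state to a scalar value.
        self.value_out = nn.Linear(hidden_size, 1)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():  # no gradients flow into the frozen trunk
            for block in self.frozen_bottom:
                hidden = block(hidden)
        for block in self.value_top:
            hidden = block(hidden)
        return self.value_out(hidden).squeeze(-1)  # shape: (batch, seq_len)


torch.manual_seed(0)
ref_blocks = nn.ModuleList([ToyBlock(8) for _ in range(4)])
value_net = ValueBranch(ref_blocks, num_frozen=2, hidden_size=8)
values = value_net(torch.randn(3, 5, 8))  # one value estimate per token
print(values.shape)
```

In this sketch only the branch and the output layer receive gradients, so value-function updates cannot disturb the policy's own weights.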
Example
We used the Vicuna v1.1 template for model training:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "keyfan/bloomz-rlhf"
# Load the tokenizer and the model, then move the model to the GPU.
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).cuda()

# Vicuna v1.1 single-turn prompt template.
template = ("A chat between a curious human and an artificial intelligence assistant. "
            "The assistant gives helpful, detailed, and polite answers to the human's questions. "
            "USER: {}\nASSISTANT:")
question = template.format("Who was the president of the United States in 1955?")

# Tokenize the prompt and sample up to 512 new tokens with nucleus sampling.
inputs = tokenizer.encode(question, return_tensors="pt").cuda()
outputs = model.generate(inputs, do_sample=True, top_p=0.8, max_new_tokens=512)
print(tokenizer.decode(outputs[0]))
```
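The decoded string in the example contains the prompt as well as the model's reply. A small helper along the following lines could isolate the reply. `extract_reply` is a hypothetical name, and the `</s>` end-of-sequence string is an assumption about the tokenizer that should be checked against `tokenizer.eos_token`.

```python
def extract_reply(decoded: str, eos_token: str = "</s>") -> str:
    """Return the text after the first "ASSISTANT:" marker, without the EOS token."""
    marker = "ASSISTANT:"
    # Fall back to the full text if the marker is missing.
    reply = decoded.split(marker, 1)[1] if marker in decoded else decoded
    # Assumption: generation ends with the EOS string; drop it and anything after it.
    reply = reply.split(eos_token, 1)[0]
    return reply.strip()


sample = "USER: Who was the president in 1955?\nASSISTANT: Dwight D. Eisenhower.</s>"
print(extract_reply(sample))
```

The helper assumes the single-turn template shown above, where the marker appears exactly once in the prompt.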
Evaluations
Results on the Chinese BELLE eval set
| others | rewrite | classification | generation | summarization | extract | open qa | brainstorming | closed qa | macro avg | macro avg w/o others |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.619 | 0.873 | 0.706 | 0.934 | 0.755 | 0.619 | 0.527 | 0.908 | 0.615 | 0.728 | 0.742 |
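The two macro-average columns can be reproduced from the per-category scores, assuming a macro average here means the unweighted mean over categories. The snippet below is a quick check under that assumption, using only the numbers from the table.

```python
# Per-category scores copied from the table above.
scores = {
    "others": 0.619,
    "rewrite": 0.873,
    "classification": 0.706,
    "generation": 0.934,
    "summarization": 0.755,
    "extract": 0.619,
    "open qa": 0.527,
    "brainstorming": 0.908,
    "closed qa": 0.615,
}

# Unweighted mean over all nine categories.
macro_avg = sum(scores.values()) / len(scores)

# Same mean with the "others" category left out.
rest = [value for name, value in scores.items() if name != "others"]
macro_avg_wo_others = sum(rest) / len(rest)

print(f"{macro_avg:.3f} {macro_avg_wo_others:.3f}")  # 0.728 0.742
```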
- We found that in GPT-4 evaluation, the order in which the responses are presented has a non-negligible effect on the final score, even with the well-designed Vicuna prompt. We therefore removed the scores on the Vicuna eval set.
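One way to quantify such an order effect is to score the same pair of responses in both orders and compare the results. The sketch below is illustrative only: `order_effect` and `toy_judge` are hypothetical names, and the toy judge is a deliberately biased stand-in rather than a GPT-4 call.

```python
from typing import Callable, Tuple

# Hypothetical judge signature:
# (question, first_answer, second_answer) -> (first_score, second_score)
Judge = Callable[[str, str, str], Tuple[float, float]]


def order_effect(judge: Judge, question: str, answer_a: str, answer_b: str) -> Tuple[float, float]:
    """Return how much each answer's score changes when it is shown first instead of second."""
    a_first, b_second = judge(question, answer_a, answer_b)
    b_first, a_second = judge(question, answer_b, answer_a)
    return a_first - a_second, b_first - b_second


def toy_judge(question: str, first: str, second: str) -> Tuple[float, float]:
    """Toy judge: scores by word count and gives the first answer a 1-point bonus."""
    return len(first.split()) + 1.0, float(len(second.split()))


shift_a, shift_b = order_effect(
    toy_judge, "q", "a short answer", "a somewhat longer answer here"
)
print(shift_a, shift_b)  # 1.0 1.0
```

A judge without position bias would give shifts of zero for both answers.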