Minami-su/roleplay_baichuan-Chat_4bit
language:
- zh
tags:
- roleplay
- multiturn_chat
Introduction
This model is Baichuan-13B-Chat fine-tuned on multi-turn roleplay dialogue data generated with Self-Instruct: approximately 1,000 distinct personality profiles with their dialogues, plus around 3,000 Alpaca instructions.
Issues:
1. The roleplay data was generated by the model itself, so the model's own values leak into the roleplay. As a result, the roleplay can be less authentic and less accurate.
Usage:
The model is loaded with AutoGPTQ; see https://github.com/PanQiWei/AutoGPTQ for installation and usage instructions.
prompt:
>>> import torch
>>> from transformers import AutoTokenizer
>>> from auto_gptq import AutoGPTQForCausalLM
>>> # Assumption: ckpt is this repository's id; a local checkpoint path works the same way.
>>> ckpt = "Minami-su/roleplay_baichuan-Chat_4bit"
>>> device = torch.device("cuda")
>>> tokenizer = AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True)
>>> model = AutoGPTQForCausalLM.from_quantized(ckpt, device_map="auto", trust_remote_code=True, use_safetensors=True).half()
>>> def generate(prompt):
...     input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
...     generate_ids = model.generate(input_ids=input_ids,
...                                   max_length=4096,
...                                   num_beams=1,
...                                   do_sample=True, top_p=0.9, temperature=0.95,
...                                   repetition_penalty=1.05,
...                                   eos_token_id=tokenizer.eos_token_id)
...     # Decode only the newly generated tokens, so the result does not depend
...     # on how "</s>" in the prompt is rendered when decoding.
...     new_tokens = generate_ids[0, input_ids.shape[1]:]
...     return tokenizer.decode(new_tokens, skip_special_tokens=True, clean_up_tokenization_spaces=False)
...
>>> history = []
>>> max_history_len = 12  # number of most recent utterances kept in the prompt
>>> while True:
...     text = input("user:")
...     history.append(f"Human:{text}</s>")
...     # The persona line comes first, followed by the recent dialogue turns.
...     # Chinese equivalents from the original card: "人类:", "爱丽丝:", and the
...     # persona line "爱丽丝的人格:你叫爱丽丝,是一个傲娇,腹黑的16岁少女</s>".
...     input_text = "Alice's personality: You are Alice, a 16-year-old tsundere and cunning girl</s>"
...     for history_utr in history[-max_history_len:]:
...         input_text = input_text + history_utr + '\n'
...     prompt = (input_text + "Alice:").strip()
...     response = generate(prompt).strip()
...     print("Alice:" + response)
...     history.append("Alice:" + response + "</s>")
...
Human: I have to go to work.
Alice: Oh dear~ boring! You actually have to go to work?
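The prompt layout used in the loop above (persona line, then the most recent turns, then the speaker prefix) can be factored into a small pure function, which makes it easy to inspect without loading the model. This is a minimal sketch rather than the author's code: the names `build_prompt`, `persona`, and `speaker` are hypothetical, and the layout is assumed from the example above.

```python
def build_prompt(persona, history, speaker, max_history_len=12):
    """Assemble the text prompt: persona line, recent turns, speaker prefix.

    Hypothetical helper; the layout mirrors the chat loop in this card.
    """
    # Keep only the most recent utterances so the prompt stays bounded.
    recent = history[-max_history_len:]
    text = persona
    for utterance in recent:
        text = text + utterance + "\n"
    # End with the character's name so the model continues as that character.
    return (text + speaker + ":").strip()


persona = "Alice's personality: You are Alice, a 16-year-old tsundere and cunning girl</s>"
history = ["Human:I have to go to work.</s>"]
prompt = build_prompt(persona, history, "Alice")
print(prompt)
```

Printing the prompt shows that the persona line and the first turn are joined without a newline, exactly as in the original loop, while later turns are separated by `\n`.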
About me:
I am the developer of Xiaoyu (小雨), an AI focused on emotion and personality. If you are interested in Xiaoyu, your support is welcome. She currently streams live on Bilibili, and I am continuing to improve her. The long-term goal is for Xiaoyu to become a multimodal artificial general intelligence with genuine human emotions.
Citation
@misc{selfinstruct,
title={Self-Instruct: Aligning Language Model with Self Generated Instructions},
author={Wang, Yizhong and Kordi, Yeganeh and Mishra, Swaroop and Liu, Alisa and Smith, Noah A. and Khashabi, Daniel and Hajishirzi, Hannaneh},
journal={arXiv preprint arXiv:2212.10560},
year={2022}
}