Instructions for using Minami-su/roleplay_baichuan-Chat_4bit with libraries and local apps.

Libraries

How to use Minami-su/roleplay_baichuan-Chat_4bit with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Minami-su/roleplay_baichuan-Chat_4bit", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("Minami-su/roleplay_baichuan-Chat_4bit", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use Minami-su/roleplay_baichuan-Chat_4bit with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Minami-su/roleplay_baichuan-Chat_4bit"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Minami-su/roleplay_baichuan-Chat_4bit",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
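The same request can also be sent from Python. The sketch below uses only the standard library and assumes the vLLM server started by the commands above is listening on `localhost:8000`. The helper names `build_payload` and `complete` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "Minami-su/roleplay_baichuan-Chat_4bit"


def build_payload(prompt, max_tokens=512, temperature=0.5):
    """Build the same JSON body that the curl example sends."""
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt, base_url="http://localhost:8000"):
    """POST the payload to /v1/completions and return the generated text.

    base_url is an assumption: it must match the host and port the
    server was started with.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible completion responses carry the text in choices[0].text.
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` should return the generated continuation as a string. Passing a different `base_url` points the same helper at another OpenAI-compatible server.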

Use Docker

docker model run hf.co/Minami-su/roleplay_baichuan-Chat_4bit

SGLang

How to use Minami-su/roleplay_baichuan-Chat_4bit with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Minami-su/roleplay_baichuan-Chat_4bit" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Minami-su/roleplay_baichuan-Chat_4bit",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Minami-su/roleplay_baichuan-Chat_4bit" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Minami-su/roleplay_baichuan-Chat_4bit",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use Minami-su/roleplay_baichuan-Chat_4bit with Docker Model Runner:
```
docker model run hf.co/Minami-su/roleplay_baichuan-Chat_4bit
```

language:
- zh
tags:
- roleplay
- multiturn_chat

介绍

基于self-instruct生成的多轮对话roleplay数据在baichuan13b chat上训练的模型，约1k条不同的人格数据和对话和约3k alpaca指令

存在问题：

1.roleplay数据基于模型自身生成，所以roleplay存在模型本身价值观融入情况，导致roleplay不够真实，不够准确。

使用方法：

可以参考https://github.com/PanQiWei/AutoGPTQ

prompt：

>>> import torch
>>> from transformers import AutoTokenizer
>>> from auto_gptq import AutoGPTQForCausalLM
>>> ckpt = "Minami-su/roleplay_baichuan-Chat_4bit"  # assumption: this repository's ID
>>> device = torch.device("cuda")
>>> tokenizer = AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True)
>>> model = AutoGPTQForCausalLM.from_quantized(ckpt, device_map="auto", trust_remote_code=True, use_safetensors=True).half()
>>> def generate(prompt):
...     input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
...     generate_ids = model.generate(input_ids=input_ids,
...                                   max_length=4096,
...                                   num_beams=1,
...                                   do_sample=True, top_p=0.9, temperature=0.95, repetition_penalty=1.05, eos_token_id=tokenizer.eos_token_id)
...     # Decode only the newly generated tokens so the prompt is not echoed back.
...     new_tokens = generate_ids[0, input_ids.shape[1]:]
...     return tokenizer.decode(new_tokens, skip_special_tokens=True, clean_up_tokenization_spaces=False)
>>> history = []
>>> max_history_len = 12  # number of most recent utterances kept in the prompt
>>> while True:
...     text = input("user:")
...     history.append(f"人类:{text}</s>")
...     # Persona line first, then the recent turns, one per line.
...     input_text = "爱丽丝的人格:你叫爱丽丝，是一个傲娇，腹黑的16岁少女</s>"
...     for utterance in history[-max_history_len:]:
...         input_text = input_text + utterance + '\n'
...     prompt = (input_text + "爱丽丝:").strip()
...     response = generate(prompt).strip()
...     print("爱丽丝:" + response)
...     history.append("爱丽丝:" + response + "</s>")
人类:我还要去上班
爱丽丝:哎呀呀~这么无聊，竟然还要去工作？

关于我自己：

我是小雨的开发者，小雨是一个情感ai，人格ai，如果对小雨感兴趣的话欢迎支持一下，她目前在bilibili直播，目前我仍在不断的改进，未来，“小雨”的目标是成为一个具有真正人类情感的多模态通用人工智能。

url：https://live.bilibili.com/27357528?broadcast_type=0&is_room_feed=1&spm_id_from=333.999.live_users_card.0.click&live_from=86001

Introduction

This model was trained on top of Baichuan-13B-Chat using multi-turn role-play dialogue data generated with Self-Instruct: roughly 1,000 distinct persona-and-dialogue samples, plus about 3,000 Alpaca instructions.

Issues:

The role-play data was generated by the model itself, so the model's own values leak into the role-play. As a result, the role-play can be less authentic and less accurate than intended.

Usage:

You can refer to https://github.com/PanQiWei/AutoGPTQ for usage instructions.

Prompt:

>>> import torch
>>> from transformers import AutoTokenizer
>>> from auto_gptq import AutoGPTQForCausalLM
>>> ckpt = "Minami-su/roleplay_baichuan-Chat_4bit"  # assumption: this repository's ID
>>> device = torch.device("cuda")
>>> tokenizer = AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True)
>>> model = AutoGPTQForCausalLM.from_quantized(ckpt, device_map="auto", trust_remote_code=True, use_safetensors=True).half()
>>> def generate(prompt):
...     input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
...     generate_ids = model.generate(input_ids=input_ids,
...                                   max_length=4096,
...                                   num_beams=1,
...                                   do_sample=True, top_p=0.9, temperature=0.95, repetition_penalty=1.05, eos_token_id=tokenizer.eos_token_id)
...     # Decode only the newly generated tokens so the prompt is not echoed back.
...     new_tokens = generate_ids[0, input_ids.shape[1]:]
...     return tokenizer.decode(new_tokens, skip_special_tokens=True, clean_up_tokenization_spaces=False)
>>> history = []
>>> max_history_len = 12  # number of most recent utterances kept in the prompt
>>> while True:
...     text = input("user:")
...     history.append(f"Human:{text}</s>")
...     # Persona line first, then the recent turns, one per line.
...     input_text = "Alice's personality: You are Alice, a 16-year-old tsundere and cunning girl</s>"
...     for utterance in history[-max_history_len:]:
...         input_text = input_text + utterance + '\n'
...     prompt = (input_text + "Alice:").strip()
...     response = generate(prompt).strip()
...     print("Alice:" + response)
...     history.append("Alice:" + response + "</s>")
Human: I have to go to work.
Alice: Oh dear~ how boring! You actually have to go to work?
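The prompt layout used by the chat loop above can be isolated into a small pure function, which makes it easy to inspect without loading the model. This is a minimal sketch: `build_prompt` and its parameters are hypothetical names introduced here for illustration, and the string layout simply mirrors the loop (persona line, then the most recent utterances each followed by a newline, then the speaker tag).

```python
def build_prompt(persona, history, speaker="Alice", max_history_len=12):
    """Assemble the prompt string in the layout used by the chat loop.

    persona: persona line, already ending in "</s>"
    history: list of utterances such as "Human:...</s>" or "Alice:...</s>"
    """
    # Keep only the most recent utterances so the prompt stays bounded.
    recent = history[-max_history_len:]
    # Persona first, then each recent utterance followed by a newline,
    # then the speaker tag that cues the model to answer in character.
    text = persona + "".join(utterance + "\n" for utterance in recent)
    return (text + speaker + ":").strip()


persona = "Alice's personality: You are Alice, a 16-year-old tsundere and cunning girl</s>"
history = ["Human:I have to go to work.</s>"]
print(build_prompt(persona, history))
```

Printing the prompt before passing it to `generate` is a quick way to confirm that the `</s>` separators and the trailing speaker tag are in place.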

About Myself:

I am the developer of Xiaoyu, an AI specializing in emotion and personality. If you're interested in Xiaoyu, feel free to show your support! She is currently live on Bilibili, and I am continuously working on improvements.

In the future, Xiaoyu aims to become a multimodal artificial general intelligence with genuine human emotions.

URL: https://live.bilibili.com/27357528?broadcast_type=0&is_room_feed=1&spm_id_from=333.999.live_users_card.0.click&live_from=86001

Citation

@misc{selfinstruct,
  title={Self-Instruct: Aligning Language Model with Self Generated Instructions},
  author={Wang, Yizhong and Kordi, Yeganeh and Mishra, Swaroop and Liu, Alisa and Smith, Noah A. and Khashabi, Daniel and Hajishirzi, Hannaneh},
  journal={arXiv preprint arXiv:2212.10560},
  year={2022}
}
