How to use DataPilot/ArrowIdeative-13b-Instruct-test-llm-jp-v0.2 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="DataPilot/ArrowIdeative-13b-Instruct-test-llm-jp-v0.2")
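A short usage sketch for the pipeline; the prompt and sampling parameters here are illustrative, not recommended defaults:
# Generate a continuation (sampling settings are illustrative)
out = pipe("昔々あるところに、", max_new_tokens=128, do_sample=True, temperature=0.5)
print(out[0]["generated_text"])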
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("DataPilot/ArrowIdeative-13b-Instruct-test-llm-jp-v0.2")
model = AutoModelForCausalLM.from_pretrained("DataPilot/ArrowIdeative-13b-Instruct-test-llm-jp-v0.2")

How to use DataPilot/ArrowIdeative-13b-Instruct-test-llm-jp-v0.2 with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "DataPilot/ArrowIdeative-13b-Instruct-test-llm-jp-v0.2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "DataPilot/ArrowIdeative-13b-Instruct-test-llm-jp-v0.2",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
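The same server can also be called from Python; a minimal sketch using the openai client package (the base_url and placeholder API key assume the default local vLLM setup above):
from openai import OpenAI

# vLLM exposes an OpenAI-compatible API; the key is unused for a local server
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
completion = client.completions.create(
    model="DataPilot/ArrowIdeative-13b-Instruct-test-llm-jp-v0.2",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(completion.choices[0].text)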
How to use DataPilot/ArrowIdeative-13b-Instruct-test-llm-jp-v0.2 with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "DataPilot/ArrowIdeative-13b-Instruct-test-llm-jp-v0.2" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "DataPilot/ArrowIdeative-13b-Instruct-test-llm-jp-v0.2",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Alternatively, run the SGLang server via Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "DataPilot/ArrowIdeative-13b-Instruct-test-llm-jp-v0.2" \
--host 0.0.0.0 \
--port 30000
# The running server is called with the same curl request as above.

How to use DataPilot/ArrowIdeative-13b-Instruct-test-llm-jp-v0.2 with Docker Model Runner:
docker model run hf.co/DataPilot/ArrowIdeative-13b-Instruct-test-llm-jp-v0.2
ArrowIdeative-13b-NeoBase-ZERO-llm-jp is a Japanese-focused LLM designed around post-training the base model with GRPO (RL) alone. The aim is not to push it all the way toward the typical strong instruction following of an Instruct model, but to preserve a base-model-like degree of output freedom while also achieving the minimum format adherence needed for chat use and raising baseline answer quality.
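This card does not spell out the GRPO objective; as background, here is a minimal sketch of the group-relative advantage computation at the core of standard GRPO (illustrative only, not the training code used for this model):
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # rewards: shape (group_size,), one scalar reward per completion sampled
    # from the same prompt. GRPO normalizes each reward by the group's mean
    # and std, so no learned value function (critic) is needed.
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: four completions for one prompt, scored by the reward functions
print(group_relative_advantages(torch.tensor([0.2, 0.9, 0.5, 0.1])))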
Positioning in one line: a base-model-leaning LLM, post-trained only with GRPO, that keeps output freedom while staying usable in chat.

A usage example with Transformers:
import torch
from copy import deepcopy
from transformers import AutoTokenizer, AutoModelForCausalLM, StoppingCriteria, StoppingCriteriaList
# ===== Model =====
model_path = "DataPilot/ArrowIdeative-13b-NeoBase-ZERO-llm-jp-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
model_path,
device_map="auto",
torch_dtype=torch.bfloat16,
)
model.eval()
# System prompt: "You are a capable assistant. Please answer politely in Japanese."
system_prompt = """あなたは有能なアシスタントです。日本語で丁寧に答えてください。"""
# User prompt: "Please explain the difference between a CPU and a GPU."
prompt = """CPUとGPUの違いについて教えてください。"""
# (Keep the ChatML format of the original code)
text = f"""<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
"""
inputs = tokenizer(text, add_special_tokens=False, return_tensors="pt", return_token_type_ids=False).to(model.device)
prompt_len = inputs["input_ids"].shape[1]
# "<|im_end|>" のトークン列(1トークンとは限らないので列で扱う)
stop_ids = tokenizer.encode("<|im_end|>", add_special_tokens=False)
stop_ids = torch.tensor(stop_ids, device=model.device, dtype=inputs["input_ids"].dtype)
class StopOnImEnd(StoppingCriteria):
def __init__(self, stop_ids_tensor: torch.Tensor):
super().__init__()
self.stop_ids = stop_ids_tensor
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
k = int(self.stop_ids.numel())
if k == 0 or input_ids.shape[1] < k:
return False
return torch.equal(input_ids[0, -k:], self.stop_ids)
stopping_criteria = StoppingCriteriaList([StopOnImEnd(stop_ids)])
# Don't stop on the default EOS (i.e., stop only on "<|im_end|>")
gen_config = deepcopy(model.generation_config)
gen_config.eos_token_id = None
gen_config.pad_token_id = tokenizer.pad_token_id if tokenizer.pad_token_id is not None else model.config.eos_token_id
with torch.inference_mode():
output = model.generate(
**inputs,
generation_config=gen_config,
stopping_criteria=stopping_criteria,
max_new_tokens=1024,
do_sample=True,
top_p=0.95,
temperature=0.5,
repetition_penalty=1.05,
)
generated = tokenizer.decode(output[0, prompt_len:], skip_special_tokens=False)
print(generated.split("<|im_end|>", 1)[0])
This data was created with the following synthesis flow (summary):
The reward is composed of five reward functions that guide learning from multiple angles. Among them, a format reward evaluates proper emission of the stop token (<|im_end|>) and compliance with the expected format; reward values that are None are ignored (masked) and do not affect training.
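The five functions themselves are not listed here; below is a minimal sketch of how a composite reward with None masking could work. The function names and the averaging scheme are illustrative assumptions, not the actual training code:
from typing import Callable, Optional

# Each reward function returns a float score, or None when it cannot be
# computed for a given completion (masked: contributes no training signal).
RewardFn = Callable[[str, str], Optional[float]]

def format_reward(prompt: str, completion: str) -> Optional[float]:
    # Format compliance: reward proper emission of the <|im_end|> stop token.
    return 1.0 if completion.rstrip().endswith("<|im_end|>") else 0.0

def combined_reward(prompt: str, completion: str,
                    reward_fns: list[RewardFn]) -> Optional[float]:
    scores = [fn(prompt, completion) for fn in reward_fns]
    valid = [s for s in scores if s is not None]
    if not valid:
        return None  # fully masked sample: ignored during training
    return sum(valid) / len(valid)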
If you cite this repository or model card, please adapt the following:
@misc{arrowideative_13b_neobase_zero_llm_jp,
title = {ArrowIdeative-13b-NeoBase-ZERO-llm-jp},
author = {holy-fox},
year = {2026},
}
Base model: llm-jp/llm-jp-3-13b