Instructions for using sds-ai/Yee-R1-mini with libraries and local apps.

Libraries

How to use sds-ai/Yee-R1-mini with Transformers:

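```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="sds-ai/Yee-R1-mini")
messages = [
    {"role": "user", "content": "Who are you?"},
]
result = pipe(messages)
# For chat input, generated_text holds the whole conversation; the last message is the model's reply.
print(result[0]["generated_text"][-1]["content"])
```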

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("sds-ai/Yee-R1-mini")
model = AutoModelForCausalLM.from_pretrained("sds-ai/Yee-R1-mini")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# 40 new tokens is usually only enough for the start of the reasoning trace;
# the model card recommends up to 32,768 tokens for complete answers.
outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

Local Apps

vLLM

How to use sds-ai/Yee-R1-mini with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server (runs in the foreground; use a second terminal for the next step):
vllm serve "sds-ai/Yee-R1-mini"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "sds-ai/Yee-R1-mini",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
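The same request can be sent from Python without third-party dependencies. This is a minimal sketch that assumes the vLLM server started above is listening on `localhost:8000`. `build_chat_request` and `post_chat` are illustrative helper names, not part of vLLM.

```python
import json
import urllib.request

# Default endpoint of the `vllm serve` example above (assumed unchanged).
VLLM_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_request(model: str, content: str, **sampling) -> dict:
    """Build an OpenAI-style chat completion body with a single user turn.

    Extra keyword arguments (e.g. temperature, top_p) are copied into the body as-is.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    body.update(sampling)
    return body


def post_chat(body: dict, url: str = VLLM_URL, timeout: float = 600.0) -> dict:
    """POST the body as JSON to the server and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Once the server is running, `post_chat(build_chat_request("sds-ai/Yee-R1-mini", "What is the capital of France?"))["choices"][0]["message"]["content"]` holds the reply text. For the SGLang server, pass `url="http://localhost:30000/v1/chat/completions"`.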

Use Docker

```bash
docker model run hf.co/sds-ai/Yee-R1-mini
```

SGLang

How to use sds-ai/Yee-R1-mini with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server (runs in the foreground; use a second terminal for the next step):
python3 -m sglang.launch_server \
    --model-path "sds-ai/Yee-R1-mini" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "sds-ai/Yee-R1-mini",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "sds-ai/Yee-R1-mini" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "sds-ai/Yee-R1-mini",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use sds-ai/Yee-R1-mini with Docker Model Runner:
```
docker model run hf.co/sds-ai/Yee-R1-mini
```

Yee-R1-mini / README.md (uploaded by Shining-Data, commit d11487f)


---
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-1.7B/blob/main/LICENSE
pipeline_tag: text-generation
base_model:
- Qwen/Qwen3-1.7B-Base
---

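# Yee (小熠): AI Data Security Expert

![Logo](logo.png)

> An intelligent data-security assistant built on large language model technology, developed by [广州熠数信息技术有限公司](https://shining-data.com) (Guangzhou Yishu Information Technology Co., Ltd.).

Yee (小熠) is an AI expert system focused on data security. It is built on the Qwen3-1.7B large language model and adds domain capabilities such as data classification and grading, security auditing, and protection and detection. It offers lightweight, intelligent data-security solutions for industrial, government, and telecom-operator customers, helping them make their data compliant, visible, controllable, and defensible.

With the AI data-security expert model as its technical foundation, Yee powers a full-stack data-security auditing system and an end-to-end data-leakage-prevention system. These are deployed across cloud, network, and endpoint scenarios to help enterprises meet the security challenges of the digital economy.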

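---

## 🔍 Key Features

- Built on Qwen3-1.7B
  - Uses Qwen3, the latest generation of Alibaba's Tongyi Qianwen (Qwen) large language models, which offers strong reasoning, logical judgment, and instruction-following capabilities.
  - Switches flexibly between Thinking Mode and Non-Thinking Mode to suit different scenarios.

- Dual-mode reasoning
  - Enable Thinking Mode for complex logic tasks such as code analysis, mathematical calculation, and strategy formulation.
  - Use Non-Thinking Mode for everyday conversation and fast-response scenarios to improve efficiency.

- Agent capabilities
  - Integrates the Qwen-Agent framework to call external tools (e.g., database interfaces, log analyzers, and APIs) for automated task execution.

- High compatibility
  - Supports mainstream deployment options: local execution, Docker containers, Kubernetes clusters, and SaaS API endpoints.
  - Compatible with inference frameworks such as Hugging Face Transformers, vLLM, SGLang, and Ollama.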
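The README does not document Yee's exact tool-call format. The following minimal sketch assumes Yee keeps the Qwen3 chat template, where the model wraps each call as JSON inside `<tool_call>...</tool_call>` tags. It shows the parse-and-dispatch step a host program performs. The `count_matches` tool and the `TOOLS` registry are hypothetical examples, not part of Qwen-Agent.

```python
import json
import re
from typing import Any, Callable

# Assumption: tool calls arrive as <tool_call>{"name": ..., "arguments": {...}}</tool_call>,
# the Hermes-style format used by Qwen-family chat templates.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def count_matches(text: str, keyword: str) -> int:
    """Hypothetical tool: count occurrences of a keyword in a log excerpt."""
    return text.count(keyword)


# Hypothetical registry mapping tool names to Python callables.
TOOLS: dict[str, Callable[..., Any]] = {"count_matches": count_matches}


def dispatch_tool_calls(model_output: str) -> list[dict]:
    """Parse every tool call in the model output and run the matching tool.

    Returns one record per call; unknown tools are reported, not executed.
    """
    results = []
    for raw in TOOL_CALL_RE.findall(model_output):
        call = json.loads(raw)
        name, args = call["name"], call.get("arguments", {})
        func = TOOLS.get(name)
        if func is None:
            results.append({"name": name, "error": "unknown tool"})
        else:
            results.append({"name": name, "result": func(**args)})
    return results
```

In a full agent loop, each result would be sent back to the model as a tool message before the next generation step.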

---

## 📊 Benchmark Results

The table below shows Yee's overall score and its scores across multiple security domains on [CS-Eval](https://cs-eval.com/#/app/leaderBoard). The scores come from an evaluation framework that simulates real business scenarios:

| Category | Score |
|----------|-------|
| Overall | 77.48 |
| System and software security fundamentals | 78.00 |
| Access control and identity management | 79.31 |
| Cryptography and key management | 71.90 |
| Infrastructure security | 78.37 |
| AI and cybersecurity | 84.65 |
| Vulnerability management and penetration testing | 75.24 |
| Threat detection and prevention | 78.41 |
| Data security and privacy protection | 73.02 |
| Supply chain security | 86.71 |
| Security architecture design | 80.49 |
| Business continuity and incident response/recovery | 71.33 |
| Chinese tasks | 77.58 |
| English tasks | 76.03 |

---

## 📦 Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("sds-ai/Yee-R1-mini")
model = AutoModelForCausalLM.from_pretrained(
    "sds-ai/Yee-R1-mini",
    torch_dtype="auto",
    device_map="auto",
)

# Input prompt ("Please check whether this data contains sensitive fields.")
prompt = "请帮我检查这份数据是否包含敏感字段？"

# Apply the chat template and choose the mode
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # Thinking Mode; set to False for Non-Thinking Mode
)

# Encode the input
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate the response and decode only the new tokens
response_ids = model.generate(**inputs, max_new_tokens=32768)
response = tokenizer.decode(response_ids[0][len(inputs.input_ids[0]):], skip_special_tokens=True)

print("Yee:\n", response)
```
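With `enable_thinking=True`, the decoded `response` usually contains the reasoning trace before the final answer. The README does not state this explicitly, but this sketch assumes Yee keeps Qwen3's convention of closing the trace with a literal `</think>` tag. Under that assumption, plain string handling can separate the two parts. `split_thinking` is an illustrative helper name.

```python
def split_thinking(response: str) -> tuple[str, str]:
    """Split a decoded response into (thinking, answer).

    Assumes the reasoning trace may open with "<think>" and closes with "</think>".
    """
    head, sep, tail = response.partition("</think>")
    if sep:
        return head.replace("<think>", "", 1).strip(), tail.strip()
    stripped = response.strip()
    if stripped.startswith("<think>"):
        # Trace was opened but never closed, e.g. cut off by max_new_tokens.
        return stripped[len("<think>"):].strip(), ""
    # No thinking markers at all, e.g. enable_thinking=False.
    return "", stripped
```

For example, `thinking, answer = split_thinking(response)` lets you print only `answer` to end users.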

---

## 🛠️ Deployment

You can deploy Yee in any of the following ways:

### Serve with SGLang
```bash
python -m sglang.launch_server --model-path sds-ai/Yee-R1-mini --reasoning-parser qwen3
```

### Serve with vLLM
```bash
vllm serve sds-ai/Yee-R1-mini --enable-reasoning --reasoning-parser deepseek_r1
```
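When either server is started with a reasoning parser as shown above, the OpenAI-compatible response returns the reasoning trace separately from the answer. This sketch assumes the trace appears in the message's `reasoning_content` field. Field names may differ between server versions, so check an actual response from your deployment. `read_reply` is an illustrative name.

```python
from typing import Optional


def read_reply(response: dict) -> tuple[Optional[str], str]:
    """Return (reasoning, answer) from an OpenAI-style chat completion response.

    Assumes the reasoning parser puts the trace in choices[0].message.reasoning_content.
    reasoning is None when that field is absent (e.g. no parser configured);
    a null content is returned as an empty string.
    """
    message = response["choices"][0]["message"]
    reasoning = message.get("reasoning_content")
    answer = message.get("content") or ""
    return reasoning, answer
```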

### Ollama / LMStudio / llama.cpp / KTransformers
Qwen3 is widely supported by mainstream local LLM tools; refer to each tool's official documentation for details.

---

## 📚 Best Practices

For best results, use the following recommended settings:

| Scenario | Temperature | TopP | TopK | MinP | Presence Penalty |
|----------|-------------|------|------|------|------------------|
| Thinking mode (`enable_thinking=True`) | 0.6 | 0.95 | 20 | 0 | 1.5 (reduces repetitive output) |
| Non-thinking mode (`enable_thinking=False`) | 0.7 | 0.8 | 20 | 0 | Not recommended |
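The recommended settings can be kept in one place and merged into each request. This sketch encodes the table as Python dictionaries and produces fields for an OpenAI-compatible chat request. It assumes the server accepts `top_k` and `min_p` as extra request fields and forwards `chat_template_kwargs` to the chat template. vLLM supports both, but other servers may not. `SAMPLING_PRESETS` and `sampling_for` are illustrative names.

```python
# Recommended sampling settings from the table above, keyed by enable_thinking.
# Presence penalty is set only for thinking mode; the table advises against it otherwise.
SAMPLING_PRESETS = {
    True: {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0, "presence_penalty": 1.5},
    False: {"temperature": 0.7, "top_p": 0.8, "top_k": 20, "min_p": 0.0},
}


def sampling_for(enable_thinking: bool, max_tokens: int = 32768) -> dict:
    """Return request fields for the chosen mode without mutating the presets.

    Assumption: the server accepts top_k/min_p as extra fields and passes
    chat_template_kwargs through to the chat template.
    """
    fields = dict(SAMPLING_PRESETS[enable_thinking])
    fields["max_tokens"] = max_tokens
    fields["chat_template_kwargs"] = {"enable_thinking": enable_thinking}
    return fields
```

The returned dictionary can be merged into a request body alongside `model` and `messages`.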

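- Set the output length to 32,768 tokens; for complex tasks it can be raised to 38,912 tokens.
- In multi-turn conversations, keep only the final output of each assistant turn in the history. Leaving out the thinking content keeps it from interfering with context understanding.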
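Keeping only the final output can be done by removing the reasoning trace from each earlier assistant turn before sending the next request. This minimal sketch assumes the trace is delimited by `<think>...</think>` tags, as in Qwen3. `strip_thinking_from_history` is a hypothetical helper name.

```python
import re

# Assumption: reasoning traces are wrapped in <think>...</think> (Qwen3 convention).
THINK_BLOCK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def strip_thinking_from_history(messages: list[dict]) -> list[dict]:
    """Return a copy of the chat history with reasoning removed from assistant turns.

    User and system messages pass through unchanged; the input list is not mutated.
    """
    cleaned = []
    for message in messages:
        if message.get("role") == "assistant":
            content = THINK_BLOCK_RE.sub("", message.get("content", "")).strip()
            message = {**message, "content": content}
        cleaned.append(message)
    return cleaned
```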


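---

## 📞 Contact Us

To learn more about Yee, visit the [Shining-Data website](https://shining-data.com).

---

## 🌟 Acknowledgments

Thanks to Alibaba's Tongyi Lab for open-sourcing the Qwen3 model, which gives Yee a solid foundation for language understanding and generation.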