Instructions to use vidavox/SKK-Router-1.5B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use vidavox/SKK-Router-1.5B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="vidavox/SKK-Router-1.5B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("vidavox/SKK-Router-1.5B")
model = AutoModelForCausalLM.from_pretrained("vidavox/SKK-Router-1.5B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use vidavox/SKK-Router-1.5B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "vidavox/SKK-Router-1.5B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "vidavox/SKK-Router-1.5B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
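
The same request can also be sent from Python. The sketch below mirrors the curl call above using only the standard library. The helper names `build_chat_payload` and `post_chat` are illustrative, not part of vLLM, and the code assumes the server started in the previous step is listening on `localhost:8000`.

```python
import json
import urllib.request

MODEL_ID = "vidavox/SKK-Router-1.5B"


def build_chat_payload(content: str, model: str = MODEL_ID) -> dict:
    """Build the JSON body for an OpenAI-compatible chat completion request."""
    return {"model": model, "messages": [{"role": "user", "content": content}]}


def post_chat(content: str, base_url: str = "http://localhost:8000") -> dict:
    """POST the payload to /v1/chat/completions and return the decoded JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_payload(content)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage (requires a running server):
# reply = post_chat("What is the capital of France?")
# print(reply["choices"][0]["message"]["content"])
```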

Use Docker

docker model run hf.co/vidavox/SKK-Router-1.5B

SGLang

How to use vidavox/SKK-Router-1.5B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "vidavox/SKK-Router-1.5B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "vidavox/SKK-Router-1.5B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "vidavox/SKK-Router-1.5B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "vidavox/SKK-Router-1.5B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use vidavox/SKK-Router-1.5B with Docker Model Runner:
```
docker model run hf.co/vidavox/SKK-Router-1.5B
```

SKK-Router-1.5B / README.md

ZeArkh

Update README.md

aaab0fc verified 6 months ago


	---
	base_model:
	- katanemo/Arch-Router-1.5B
	language:
	- en
	- id
	library_name: transformers
	license: other
	license_name: katanemo-research
	license_link: https://huggingface.co/katanemo/Arch-Router-1.5B/blob/main/LICENSE
	pipeline_tag: text-generation
	tags:
	- routing
	- preference
	- llm
	- qwen2.5
	- reasoning
	- Indonesian
	- SKK
	- internal-routing
	- question-complexity
	paper: https://arxiv.org/abs/2506.16655
	---

	# vidavox/SKK-Router-1.5B

	Version: v1.0 – SKK Router for internal routing
	Base model: [katanemo/Arch-Router-1.5B](https://huggingface.co/katanemo/Arch-Router-1.5B) (itself built on Qwen2.5-1.5B-Instruct)

	SKK-Router-1.5B is a domain-specialized router model fine-tuned from Arch-Router-1.5B for question complexity routing inside an internal SKK agent system.

	Instead of routing across many domains and actions, this model focuses on a single domain (SKK upstream oil & gas and related KSMI regulations) and chooses between:

	- a non-reasoning model for basic questions
	- a reasoning model for complex questions

	The model outputs a minimal JSON object:

	```json
	{"route": "basic"}
	```

	or

	```json
	{"route": "complex"}
	```

	It is designed for internal orchestration, not for direct end-user text generation.
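
Because the output is a small JSON object, downstream code can parse it defensively. The following is a minimal sketch, not the SKK system's actual parser: `parse_route` is a hypothetical helper, and falling back to `"complex"` on malformed or unexpected output is an assumed policy that you may want to change.

```python
import json

VALID_ROUTES = {"basic", "complex"}


def parse_route(raw_output: str, default: str = "complex") -> str:
    """Extract the route name from the router's JSON output.

    Returns `default` (an assumed fallback policy) when the text is not
    valid JSON or does not name one of the two expected routes.
    """
    try:
        parsed = json.loads(raw_output.strip())
    except json.JSONDecodeError:
        return default
    route = parsed.get("route") if isinstance(parsed, dict) else None
    if isinstance(route, str) and route in VALID_ROUTES:
        return route
    return default


print(parse_route('{"route": "basic"}'))  # basic
print(parse_route('{"route": "other"}'))  # complex (fallback)
print(parse_route("not json"))            # complex (fallback)
```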

	---

	## 1. Intended Use

	### Primary use case

	* Task: Route incoming questions to either a basic or complex LLM path based on question difficulty and reasoning requirements.
	* Domain: SKK internal agent system, with content grounded in KSMI and related SKK upstream O&G documents.
	* Users: Internal systems and engineers building the SKK agent stack. Not intended for general public use.

	### What the routes mean

	* `"basic"` route
	  * Short, direct, or factoid-style questions.
	  * Queries that can be answered with light or no multi-step reasoning.
	  * Good for low-latency, low-cost non-reasoning models.

	* `"complex"` route
	  * Multi-step reasoning, multi-constraint, or ambiguous questions.
	  * Questions that require combining multiple facts, interpreting regulations, or deeper analysis.
	  * Intended for slower, more capable reasoning models.

	### Out of scope

	* General conversational use outside SKK / KSMI context.
	* Safety-critical routing (e.g., medical, legal, or financial decisions).
	* Direct Q&A: this router only selects models; it does not itself produce the final answer.

	---

	## 2. How It Relates to Arch-Router

	Arch-Router-1.5B is a 1.5B-parameter preference-aligned router that maps queries to user-defined domains and actions for flexible multi-model routing. ([Hugging Face][1])

	SKK-Router-1.5B:

	* keeps the same routing prompt format as the original Arch-Router model (including the JSON route output).
	* narrows the routing space to question complexity within the SKK domain.
	* is trained on a bilingual (Indonesian/English) mix of synthetic and manually written Q&A tailored to SKK’s internal use.

	If you are already familiar with Arch-Router, you can plug this model in as a drop-in replacement for the router, as long as your route configuration reflects the `"basic"` and `"complex"` choices used during fine-tuning.

	---

	## 3. Model Architecture

	* Backbone: Qwen2.5-1.5B-Instruct via Arch-Router-1.5B ([Hugging Face][2])
	* Parameters: ≈1.5B (same as base router) ([Hugging Face][1])
	* Tokenizer & chat template: inherited from Arch-Router-1.5B.
	* Fine-tune type: PEFT/LoRA fine-tune on Arch-Router-1.5B, followed by merging the adapter into the base weights to form a standalone checkpoint (`vidavox/SKK-Router-1.5B`).
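
The merge step described above can be sketched with the PEFT library. This is an illustration under assumptions rather than the script used for this checkpoint: `merge_lora_adapter`, `adapter_dir`, and `output_dir` are placeholder names, and the adapter itself is not published in this repository.

```python
def merge_lora_adapter(
    adapter_dir: str,
    output_dir: str,
    base_id: str = "katanemo/Arch-Router-1.5B",
) -> None:
    """Merge a LoRA adapter into its base model and save a standalone checkpoint."""
    # Imported lazily so the function can be defined without the libraries installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
    model = PeftModel.from_pretrained(base, adapter_dir)  # attach the adapter weights
    merged = model.merge_and_unload()  # fold the LoRA deltas into the base weights
    merged.save_pretrained(output_dir)
    # Save the tokenizer alongside so the output directory loads on its own.
    AutoTokenizer.from_pretrained(base_id).save_pretrained(output_dir)
```

After merging, the output directory can be loaded with `AutoModelForCausalLM.from_pretrained` without PEFT installed.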

	Languages:

	* Indonesian (Bahasa Indonesia)
	* English

	---

	## 4. Training Data

	The fine-tune uses a private, domain-specific dataset:

	```text
	DatasetDict({
	    train: 3096 samples
	    val:   884 samples
	    test:  443 samples
	})
	```

	Each split has the following fields:

	* `instruction`: the main user question / request.
	* `input`: optional auxiliary context (may be empty).
	* `route`: original label in the data pipeline.
	* `output_route`: JSON string used as the target, e.g. `{"route": "basic"}`.
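
To make the field layout concrete, here is a small sketch of how one record could be assembled. It assumes `route` holds the bare label string, which the card does not state explicitly. `make_record` and the sample question are hypothetical and not taken from the private dataset.

```python
import json


def make_record(instruction: str, route: str, input_text: str = "") -> dict:
    """Build one record with the four fields listed above."""
    if route not in ("basic", "complex"):
        raise ValueError(f"unexpected route label: {route!r}")
    return {
        "instruction": instruction,
        "input": input_text,
        "route": route,
        # json.dumps produces the target string, e.g. {"route": "basic"}.
        "output_route": json.dumps({"route": route}),
    }


# Hypothetical example question, for illustration only.
record = make_record("Apa itu KSMI?", "basic")
print(record["output_route"])  # {"route": "basic"}
```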

	### Data sources

	* Synthetic conversations and prompts generated to reflect SKK’s internal workflows.
	* Manually authored Q&A examples capturing realistic SKK / KSMI questions.
	* All data is private and not released with this model.
	* Domain focus: questions grounded in KSMI and related SKK upstream O&G regulations.

	### Label space

	For this fine-tune, the router is effectively binary:

	* `basic` – non-reasoning route
	* `complex` – reasoning route

	The original Arch-Router `"other"` route is present in the base model evaluation but not used as a target in the fine-tuned test set (see evaluation below).

	---

	## 5. Training Details

	* Framework: [TRL](https://github.com/huggingface/trl) `SFTTrainer` with `SFTConfig` (supervised fine-tuning).
	* Adapter: PEFT / LoRA attached to Arch-Router-1.5B; final model created by merging adapters into base.
	* Hardware: single NVIDIA GeForce RTX 3090 GPU.

	Key training configuration (high-level):

	* `per_device_train_batch_size = 2`
	* `per_device_eval_batch_size = 4`
	* `gradient_accumulation_steps = 8`
	  → effective batch size = 2 × 8 = 16 sequences per optimizer step on the single GPU
	* Early stopping with patience = 1 based on validation loss.
	* Train/val splits above; `test` used only for the final benchmark.
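
As a quick check of the effective batch size quoted above, the arithmetic is the per-device batch size times the accumulation steps times the number of GPUs (one, per the hardware note):

```python
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
num_gpus = 1  # single RTX 3090, as listed under Hardware

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)
print(effective_batch_size)  # 16
```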

	For full configuration details, see the `Router-SFTTrainer.ipynb` notebook in this repository.

	---

	## 6. Evaluation

	The model was evaluated on a held-out test set of 443 samples, containing only `basic` and `complex` routes as the target labels.

	### 6.1 Route distribution

	Comparison of how often each model predicts each route:

	| Route   | Target test data | Fine-tuned model | Base Arch-Router |
	| ------- | ---------------- | ---------------- | ---------------- |
	| Basic   | 147              | 160              | 201              |
	| Complex | 296              | 283              | 156              |
	| Other   | 0                | 0                | 86               |

	Observations:

	* The fine-tuned model routes every test query to `basic` or `complex` (no `other` predictions), and its split (160 / 283) is close to the target split (147 / 296).
	* The base Arch-Router tends to:
	  * over-predict `basic` (201 predicted vs 147 in the target), and
	  * send many SKK-style queries to the generic `other` route (86 of 443).
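
The counts in the table can be turned into percentage shares to make the comparison easier to read. This sketch only restates the table's numbers; the variable names are illustrative.

```python
counts = {
    "target":     {"basic": 147, "complex": 296, "other": 0},
    "fine_tuned": {"basic": 160, "complex": 283, "other": 0},
    "base":       {"basic": 201, "complex": 156, "other": 86},
}

shares = {}
for name, by_route in counts.items():
    total = sum(by_route.values())  # 443 for every column
    shares[name] = {
        route: round(100 * n / total, 1) for route, n in by_route.items()
    }
    print(name, total, shares[name])
```

On these numbers the base router sends about 19.4% of the test queries to `other`, while the fine-tuned model sends none.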

	### 6.2 Routing accuracy

	Accuracy is computed as:

	* prediction is correct if the chosen `"route"` matches the `output_route` label for that sample.
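
The rule above can be sketched as a small function. It assumes per-route accuracy is measured over the samples whose target label is that route, which is consistent with the base-model figures but is not spelled out in the card. `route_accuracy` and the toy data are illustrative only.

```python
from collections import Counter


def route_accuracy(predictions: list[str], labels: list[str]) -> dict[str, float]:
    """Return per-route and overall accuracy in percent."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    totals = Counter(labels)
    correct = Counter(
        label for pred, label in zip(predictions, labels) if pred == label
    )
    result = {route: 100 * correct[route] / n for route, n in totals.items()}
    result["overall"] = 100 * sum(correct.values()) / len(labels)
    return result


# Toy data for illustration only; not drawn from the private test set.
labels = ["basic", "basic", "complex", "complex"]
predictions = ["basic", "complex", "complex", "complex"]
print(route_accuracy(predictions, labels))
# {'basic': 50.0, 'complex': 100.0, 'overall': 75.0}
```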

	| Metric                 | Fine-tuned model | Base Arch-Router |
	| ---------------------- | ---------------- | ---------------- |
	| Basic route accuracy   | 91.50%           | 74.83%           |
	| Complex route accuracy | 93.10%           | 45.27%           |
	| Overall accuracy       | 92.55%           | 55.08%           |

	Improvements (absolute percentage points):

	* Basic route: +16.67 pp
	* Complex route: +47.83 pp
	* Overall: +37.47 pp
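
These differences follow directly from the accuracy table; a short check using the reported percentages:

```python
reported = {
    "basic":   {"fine_tuned": 91.50, "base": 74.83},
    "complex": {"fine_tuned": 93.10, "base": 45.27},
    "overall": {"fine_tuned": 92.55, "base": 55.08},
}

# Absolute improvement in percentage points, rounded to two decimals.
deltas = {k: round(v["fine_tuned"] - v["base"], 2) for k, v in reported.items()}
print(deltas)  # {'basic': 16.67, 'complex': 47.83, 'overall': 37.47}
```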

	In practice, this means:

	* The router is much more reliable at distinguishing between simple and complex SKK queries.
	* Mis-routing complex questions to the non-reasoning path is drastically reduced compared to the base Arch-Router.

	> Note: These metrics are computed on private data (synthetic plus manually authored) tailored to the SKK domain. Performance on other domains may be substantially different.

	---

	## 7. How to Use

	> ⚠️ Important: This model assumes the same overall routing prompt structure as `katanemo/Arch-Router-1.5B`. For best results, follow the upstream Arch-Router prompt format and simply adapt the `route_config` to your use case. ([Hugging Face][1])

	### 7.1 Minimal example

	```python
	import json
	from typing import Any, Dict, List
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "vidavox/SKK-Router-1.5B"
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    device_map="auto",
	    torch_dtype="auto",
	    trust_remote_code=True,
	)
	tokenizer = AutoTokenizer.from_pretrained(model_name)

	# Please use our provided prompt for best performance
	TASK_INSTRUCTION = """
	You are a helpful assistant designed to find the best suited route.
	You are provided with route description within <routes></routes> XML tags:
	<routes>

	{routes}

	</routes>

	<conversation>

	{conversation}

	</conversation>
	"""

	FORMAT_PROMPT = """
	Your task is to decide which route is best suit with user intent on the conversation in <conversation></conversation> XML tags. Follow the instruction:
	1. If the latest intent from user is irrelevant or user intent is full filled, response with other route {"route": "other"}.
	2. You must analyze the route descriptions and find the best match route for user latest intent.
	3. You only response the name of the route that best matches the user's request, use the exact name in the <routes></routes>.

	Based on your analysis, provide your response in the following JSON formats if you decide to match any route:
	{"route": "route_name"}
	"""

	# Define route config
	route_config = [
	    {
	        "name": "basic",
	        "description": "Answering simple questions that ask for factual information, term meanings, or general knowledge.",
	    },
	    {
	        "name": "complex",
	        "description": "Handling specific, complex, or multi (more than one task) questions that require multi-step reasoning and interaction with databases to fetch and process data. For example, answering questions that need calculations, data analysis, or synthesis of information from multiple sources.",
	    },
	]

	# Helper function to create the system prompt for our model
	def format_prompt(
	    route_config: List[Dict[str, Any]], conversation: List[Dict[str, Any]]
	):
	    # Only TASK_INSTRUCTION is formatted; FORMAT_PROMPT contains literal braces.
	    return (
	        TASK_INSTRUCTION.format(
	            routes=json.dumps(route_config), conversation=json.dumps(conversation)
	        )
	        + FORMAT_PROMPT
	    )

	# Define conversations
	conversation = [
	    {
	        "role": "user",
	        "content": "Apa pengertian dari Cadangan A dan berapa jumlahnya untuk Lapangan X?",
	    }
	]

	# 1. Build the routing prompt and tokenize it with the chat template
	route_prompt = format_prompt(route_config, conversation)
	messages = [
	    {"role": "user", "content": route_prompt},
	]
	input_ids = tokenizer.apply_chat_template(
	    messages, add_generation_prompt=True, return_tensors="pt"
	).to(model.device)

	# 2. Generate
	generated_ids = model.generate(
	    input_ids=input_ids,  # or just positional: model.generate(input_ids, …)
	    max_new_tokens=32768,
	)

	# 3. Strip the prompt from each sequence
	prompt_lengths = input_ids.shape[1]  # same length for every row here
	generated_only = [
	    output_ids[prompt_lengths:]  # slice off the prompt tokens
	    for output_ids in generated_ids
	]

	# 4. Decode if you want text
	response = tokenizer.batch_decode(generated_only, skip_special_tokens=True)[0]
	print(response)
	```

	In the actual SKK agent system, this `"route"` is then used to decide whether to call the basic or reasoning LLM.
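
A minimal sketch of that dispatch step is shown below. The real SKK orchestration code is not part of this repository, so `dispatch`, the placeholder handlers, and the choice to fall back to the reasoning path for unknown route names are all assumptions made for illustration.

```python
import json
from typing import Callable, Dict


def dispatch(
    route: object,
    question: str,
    handlers: Dict[str, Callable[[str], str]],
    fallback: str = "complex",
) -> str:
    """Call the handler registered for `route`, or the fallback handler."""
    if not isinstance(route, str) or route not in handlers:
        route = fallback  # assumed policy: unknown routes go to the reasoning path
    return handlers[route](question)


# Placeholder handlers standing in for the real basic and reasoning LLM calls.
handlers = {
    "basic": lambda q: f"[basic model] {q}",
    "complex": lambda q: f"[reasoning model] {q}",
}

response = '{"route": "complex"}'  # example router output, as printed in 7.1
route = json.loads(response)["route"]
question = "Apa pengertian dari Cadangan A dan berapa jumlahnya untuk Lapangan X?"
print(dispatch(route, question, handlers))
```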

	---

	## 8. Limitations & Known Failure Modes

	### Limitations

	* Multi-turn conversations: The model may be less reliable on very long, multi-turn chats with shifting intent. It was primarily trained on shorter, focused interactions.
	* Ambiguous queries: If the question does not clearly indicate complexity (e.g., vague or underspecified prompts), the router may pick an unintuitive route.
	* Out-of-domain content: Questions unrelated to SKK / KSMI / upstream O&G may be routed unpredictably, since the training data is domain-specific.
	* Binary perspective: The router assumes a simple basic vs complex split; if you need multiple levels of reasoning or different tools, you may need to extend the label space and re-train.

	### Safety considerations

	* Not designed for medical, legal, or financial decision-making.
	* Should not be used in settings where an incorrect routing decision can cause harm or safety-critical failures.
	* Outputs are not explanations; they are discrete labels used for orchestration.

	---

	## 9. Bias & Data Caveats

	* Training data is heavily skewed toward:
	  * SKK upstream petroleum / regulatory topics.
	  * Text derived from or inspired by KSMI and related technical documents.
	* Language mix:
	  * Bilingual Indonesian/English, but primarily focused on expert / technical wording typical for this domain.
	* As a result:
	  * The model may over-assume that questions with regulatory or technical phrasing are “complex”.
	  * It may not behave sensibly on informal, social-media style data or on domains very different from SKK.

	Because the underlying data is private and internal, users cannot independently audit its biases or coverage. Treat this model as highly specialized rather than general-purpose.

	---

	## 10. License & Usage

	This model is a fine-tuned derivative of katanemo/Arch-Router-1.5B, which is distributed under the Katanemo research license. ([Hugging Face][1])

	* License on this repo: `other` – `katanemo-research`.
	* By using this model, you must comply with:
	  * the original Katanemo license for Arch-Router, and
	  * any additional internal policies that apply to SKK data and systems.

	### Intended usage policy

	* Allowed / intended:
	  * Research and experimentation on routing for question complexity.
	  * Internal use as part of the SKK Internal Agent System.
	  * Exploration of routing strategies in similar regulatory or technical domains, provided you have rights to the underlying data.

	* Not recommended / discouraged:
	  * Exposing this router directly to end users as a chatbot.
	  * Using it as a general-purpose router outside its domain without additional evaluation.
	  * Using the model, or any system built with it, as the sole basis for safety-critical decisions.

	This description is not legal advice. For any production or commercial deployment, please review the Katanemo research license and your own organizational policies with qualified counsel.

	---

	## 11. Citation

	If you use this model or build upon it in academic or technical work, please consider citing the Arch-Router paper:

	```bibtex
	@article{tran2025archrouter,
	title = {Arch-Router: Aligning LLM Routing with Human Preferences},
	author = {Tran, Co and Paracha, Salman and Hafeez, Adil and Chen, Shuguang},
	journal = {arXiv preprint arXiv:2506.16655},
	year = {2025}
	}
	```

	And you may also reference this checkpoint as:

	> vidavox/SKK-Router-1.5B (v1.0 – SKK Router for internal routing), fine-tuned from katanemo/Arch-Router-1.5B on SKK-specific synthetic and manually curated routing data for basic vs complex question routing.

	[1]: https://huggingface.co/katanemo/Arch-Router-1.5B "katanemo/Arch-Router-1.5B"
	[2]: https://huggingface.co/katanemo/Arch-Router-1.5B/commit/c3a3b356644a64c519091e56d1a19d013eb5290e "Upload folder using huggingface_hub · katanemo/Arch- ..."