Instructions to use vidavox/SKK-Router-1.5B with libraries, notebooks, and local apps.
- Libraries
- Transformers
How to use vidavox/SKK-Router-1.5B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="vidavox/SKK-Router-1.5B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("vidavox/SKK-Router-1.5B")
model = AutoModelForCausalLM.from_pretrained("vidavox/SKK-Router-1.5B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Local Apps
- vLLM
How to use vidavox/SKK-Router-1.5B with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "vidavox/SKK-Router-1.5B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "vidavox/SKK-Router-1.5B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker
```bash
docker model run hf.co/vidavox/SKK-Router-1.5B
```
- SGLang
How to use vidavox/SKK-Router-1.5B with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "vidavox/SKK-Router-1.5B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "vidavox/SKK-Router-1.5B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "vidavox/SKK-Router-1.5B" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "vidavox/SKK-Router-1.5B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
- Docker Model Runner
How to use vidavox/SKK-Router-1.5B with Docker Model Runner:
```bash
docker model run hf.co/vidavox/SKK-Router-1.5B
```
| base_model: | |
| - katanemo/Arch-Router-1.5B | |
| language: | |
| - en | |
| - id | |
| library_name: transformers | |
| license: other | |
| license_name: katanemo-research | |
| license_link: https://huggingface.co/katanemo/Arch-Router-1.5B/blob/main/LICENSE | |
| pipeline_tag: text-generation | |
| tags: | |
| - routing | |
| - preference | |
| - llm | |
| - qwen2.5 | |
| - reasoning | |
| - Indonesian | |
| - SKK | |
| - internal-routing | |
| - question-complexity | |
| paper: https://arxiv.org/abs/2506.16655 | |
| # vidavox/SKK-Router-1.5B | |
| **Version:** v1.0 – SKK Router for internal routing | |
| **Base model:** [katanemo/Arch-Router-1.5B](https://huggingface.co/katanemo/Arch-Router-1.5B) (itself built on Qwen2.5-1.5B-Instruct) | |
| SKK-Router-1.5B is a domain-specialized router model fine-tuned from Arch-Router-1.5B for **question complexity routing** inside an internal SKK agent system. | |
| Instead of routing across many domains and actions, this model focuses on a **single domain** (SKK upstream oil & gas and related KSMI regulations) and chooses between: | |
| - a **non-reasoning model** for **basic** questions | |
| - a **reasoning model** for **complex** questions | |
| The model outputs a minimal JSON object: | |
| ```json | |
| {"route": "basic"} | |
| ``` | |
| or | |
| ```json | |
| {"route": "complex"} | |
| ``` | |
| It is designed for **internal orchestration**, not for direct end-user text generation. | |
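Because downstream code consumes this JSON rather than a human, it helps to validate it before acting on it. The following is a minimal sketch, not part of the released code: the helper name `parse_route` and the choice to fall back to `"complex"` on malformed or unexpected output are assumptions made for illustration.

```python
import json

# Label space described in this card.
VALID_ROUTES = {"basic", "complex"}


def parse_route(raw_output: str, fallback: str = "complex") -> str:
    """Return the route named in the router's JSON output, or `fallback`.

    The fallback value is an assumption: sending uncertain cases to the
    more capable path is one possible policy, not the card's prescription.
    """
    try:
        payload = json.loads(raw_output.strip())
    except json.JSONDecodeError:
        return fallback
    route = payload.get("route") if isinstance(payload, dict) else None
    return route if route in VALID_ROUTES else fallback


print(parse_route('{"route": "basic"}'))  # basic
print(parse_route('{"route": "other"}'))  # complex (fallback)
print(parse_route("not json"))            # complex (fallback)
```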
| --- | |
| ## 1. Intended Use | |
| ### Primary use case | |
| * **Task:** Route incoming questions to either a *basic* or *complex* LLM path based on question difficulty and reasoning requirements. | |
| * **Domain:** SKK internal agent system, with content grounded in **KSMI** and related SKK upstream O&G documents. | |
| * **Users:** Internal systems and engineers building the SKK agent stack. Not intended for general public use. | |
| ### What the routes mean | |
| * **`"basic"` route** | |
| * Short, direct, or factoid-style questions. | |
| * Queries that can be answered with light or no multi-step reasoning. | |
| * Good for low-latency, low-cost non-reasoning models. | |
| * **`"complex"` route** | |
| * Multi-step reasoning, multi-constraint, or ambiguous questions. | |
| * Questions that require combining multiple facts, interpreting regulations, or deeper analysis. | |
| * Intended for slower, more capable reasoning models. | |
| ### Out of scope | |
| * General conversational use outside SKK / KSMI context. | |
| * Safety-critical routing (e.g., medical, legal, or financial decisions). | |
| * Direct Q&A: this router only **selects** models; it does not itself produce the final answer. | |
| --- | |
| ## 2. How It Relates to Arch-Router | |
| Arch-Router-1.5B is a 1.5B-parameter preference-aligned router that maps queries to user-defined domains and actions for flexible multi-model routing. ([Hugging Face][1]) | |
| SKK-Router-1.5B: | |
| * keeps the **same routing prompt format** as the original Arch-Router model (including the JSON route output). | |
| * narrows the routing space to **question complexity** within the SKK domain. | |
| * is trained on a bilingual (Indonesian/English) mix of **synthetic and manually written Q&A** tailored to SKK’s internal use. | |
| If you are already familiar with Arch-Router, you can plug this model in as a **drop-in replacement** for the router, as long as your route configuration reflects the `"basic"` and `"complex"` choices used during fine-tuning. | |
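One way to guard the drop-in swap is to check the route configuration before sending it to the router. This is a sketch under assumptions: `check_route_config` is a hypothetical helper, and the rule that the config must contain exactly `basic` and `complex` is inferred from the fine-tuning label space described in this card.

```python
from typing import Any, Dict, List

# Route names used during fine-tuning, per this card.
EXPECTED_ROUTES = {"basic", "complex"}


def check_route_config(route_config: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems; an empty list means the config looks compatible."""
    problems: List[str] = []
    names = [entry.get("name") for entry in route_config]
    for name in sorted(EXPECTED_ROUTES - set(names)):
        problems.append(f"missing route: {name}")
    for name in sorted(set(names) - EXPECTED_ROUTES, key=str):
        problems.append(f"unexpected route: {name}")
    for entry in route_config:
        if not str(entry.get("description", "")).strip():
            problems.append(f"empty description for route: {entry.get('name')}")
    return problems


config = [
    {"name": "basic", "description": "Simple factual questions."},
    {"name": "complex", "description": "Multi-step reasoning questions."},
]
print(check_route_config(config))  # []
```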
| --- | |
| ## 3. Model Architecture | |
| * **Backbone:** Qwen2.5-1.5B-Instruct via Arch-Router-1.5B ([Hugging Face][2]) | |
| * **Parameters:** ≈1.5B (same as base router) ([Hugging Face][1]) | |
| * **Tokenizer & chat template:** inherited from Arch-Router-1.5B. | |
| * **Fine-tune type:** PEFT/LoRA fine-tune on Arch-Router-1.5B, followed by **merging the adapter into the base weights** to form a standalone checkpoint (`vidavox/SKK-Router-1.5B`). | |
| Languages: | |
| * **Indonesian** (Bahasa Indonesia) | |
| * **English** | |
| --- | |
| ## 4. Training Data | |
| The fine-tune uses a private, domain-specific dataset: | |
| ```text | |
| DatasetDict({ | |
| train: 3096 samples | |
| val: 884 samples | |
| test: 443 samples | |
| }) | |
| ``` | |
| Each split has the following fields: | |
| * `instruction`: the main user question / request. | |
| * `input`: optional auxiliary context (may be empty). | |
| * `route`: original label in the data pipeline. | |
| * `output_route`: JSON string used as the target, e.g. `{"route": "basic"}`. | |
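To make the field layout concrete, here is a sketch of how one record could be turned into a chat-style training pair. The record contents are invented placeholders, `to_chat_example` is a hypothetical helper, and the way `instruction` and `input` are joined is an assumption. The actual pipeline (see the training notebook) presumably also wraps the question in the routing prompt, which is omitted here.

```python
import json
from typing import Any, Dict


def to_chat_example(record: Dict[str, Any]) -> Dict[str, Any]:
    """Build a user/assistant message pair from one dataset record."""
    question = record["instruction"]
    if record.get("input"):  # optional auxiliary context
        question = f"{question}\n\n{record['input']}"
    # Parsing the target first catches malformed labels early.
    target = json.loads(record["output_route"])
    return {
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": json.dumps(target)},
        ]
    }


# Placeholder record for illustration only; not taken from the private dataset.
record = {
    "instruction": "What does this term mean?",
    "input": "",
    "route": "basic",
    "output_route": '{"route": "basic"}',
}
example = to_chat_example(record)
print(example["messages"][1]["content"])  # {"route": "basic"}
```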
| ### Data sources | |
| * Synthetic conversations and prompts generated to reflect SKK’s internal workflows. | |
| * Manually authored Q&A examples capturing realistic SKK / KSMI questions. | |
| * All data is **private** and not released with this model. | |
| * Domain focus: questions grounded in **KSMI** and related SKK upstream O&G regulations. | |
| ### Label space | |
| For this fine-tune, the router is effectively binary: | |
| * `basic` – non-reasoning route | |
| * `complex` – reasoning route | |
| The upstream Arch-Router prompt also defines an `"other"` route. The **base model** still predicts it during evaluation, but it is not used as a target label in this fine-tune or its test set (see evaluation below). | |
| --- | |
| ## 5. Training Details | |
| * **Framework:** [TRL](https://github.com/huggingface/trl) `SFTTrainer` with `SFTConfig` (supervised fine-tuning). | |
| * **Adapter:** PEFT / LoRA attached to Arch-Router-1.5B; final model created by merging adapters into base. | |
| * **Hardware:** single **NVIDIA GeForce RTX 3090** GPU. | |
| Key training configuration (high-level): | |
| * `per_device_train_batch_size = 2` | |
| * `per_device_eval_batch_size = 4` | |
| * `gradient_accumulation_steps = 8` | |
| → effective batch size = 2 × 8 = 16 sequences per optimizer step on the single GPU | |
| * Early stopping with patience = 1 based on validation loss. | |
| * Training and validation use the `train` and `val` splits listed above; `test` is used only for the final benchmark. | |
| For full configuration details, see the `Router-SFTTrainer.ipynb` notebook in this repository. | |
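The batch-size arithmetic and the early-stopping rule above can be sketched in plain Python. This is an illustration, not the trainer's implementation: the step count per epoch is approximate (the exact value depends on how the trainer handles the last partial batch), `early_stop_index` is a hypothetical helper, and the validation losses are invented.

```python
import math
from typing import List, Optional

per_device_train_batch_size = 2
gradient_accumulation_steps = 8
num_gpus = 1          # single RTX 3090
train_samples = 3096  # size of the train split

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
updates_per_epoch = math.ceil(train_samples / effective_batch)
print(effective_batch)    # 16
print(updates_per_epoch)  # 194 (approximate, see note above)


def early_stop_index(val_losses: List[float], patience: int = 1) -> Optional[int]:
    """Return the index of the evaluation at which training would stop.

    Training stops once validation loss has failed to improve for
    `patience` consecutive evaluations; returns None if that never happens.
    """
    best = float("inf")
    bad_evals = 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            bad_evals = 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return i
    return None


# Invented losses: the third evaluation is worse than the best so far.
print(early_stop_index([0.9, 0.6, 0.65, 0.5]))  # 2
```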
| --- | |
| ## 6. Evaluation | |
| The model was evaluated on a held-out **test set of 443 samples**, containing only `basic` and `complex` routes as the target labels. | |
| ### 6.1 Route distribution | |
| Comparison of how often each model predicts each route: | |
| | Route | Target test data | Fine-tuned model | Base Arch-Router | | |
| | ------- | ---------------- | ---------------- | ---------------- | | |
| | Basic | 147 | 160 | 201 | | |
| | Complex | 296 | 283 | 156 | | |
| | Other | 0 | 0 | 86 | | |
| Observations: | |
| * The **fine-tuned model** routes every test query to `basic` or `complex` (no `other` predictions), and its split (160 / 283) is close to the target distribution (147 / 296). | |
| * The **base Arch-Router** tends to: | |
| * over-predict `basic`, and | |
| * send many SKK-style queries to the generic `other` route. | |
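Converting the counts in the table to percentages makes the shift easier to see. The snippet below only restates the table; the dictionary layout and variable names are illustrative.

```python
# Counts copied from the route distribution table (443 test samples each).
counts = {
    "target": {"basic": 147, "complex": 296, "other": 0},
    "fine_tuned": {"basic": 160, "complex": 283, "other": 0},
    "base": {"basic": 201, "complex": 156, "other": 86},
}

shares = {}
for name, per_route in counts.items():
    total = sum(per_route.values())
    shares[name] = {
        route: round(100 * n / total, 1) for route, n in per_route.items()
    }
    print(name, total, shares[name])

# target 443 {'basic': 33.2, 'complex': 66.8, 'other': 0.0}
# fine_tuned 443 {'basic': 36.1, 'complex': 63.9, 'other': 0.0}
# base 443 {'basic': 45.4, 'complex': 35.2, 'other': 19.4}
```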
| ### 6.2 Routing accuracy | |
| Accuracy is the share of test samples whose predicted `"route"` matches the `output_route` label for that sample: | |
| | Metric | Fine-tuned model | Base Arch-Router | | |
| | ---------------------- | ---------------- | ---------------- | | |
| | Basic route accuracy | **91.50%** | 74.83% | | |
| | Complex route accuracy | **93.10%** | 45.27% | | |
| | **Overall accuracy** | **92.55%** | 55.08% | | |
| Improvements (absolute percentage points): | |
| * **Basic route:** +16.67 pp | |
| * **Complex route:** +47.83 pp | |
| * **Overall:** +37.47 pp | |
| In practice, this means: | |
| * The router is **much more reliable** at distinguishing between simple and complex SKK queries. | |
| * Mis-routing complex questions to the non-reasoning path is drastically reduced compared to the base Arch-Router. | |
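As a sketch of the metric, the function below computes per-route and overall accuracy from label/prediction pairs. It assumes that per-route accuracy is measured over the samples whose target label is that route, which the card does not state explicitly. The toy labels are invented. The last lines recompute the reported improvements from the table values.

```python
from collections import Counter
from typing import Dict, List, Tuple


def route_accuracy(
    labels: List[str], predictions: List[str]
) -> Tuple[Dict[str, float], float]:
    """Return (accuracy per target route, overall accuracy)."""
    totals = Counter(labels)
    correct = Counter(l for l, p in zip(labels, predictions) if l == p)
    per_route = {route: correct[route] / n for route, n in totals.items()}
    overall = sum(correct.values()) / len(labels)
    return per_route, overall


# Toy data for illustration only.
labels = ["basic", "basic", "complex", "complex", "complex"]
predictions = ["basic", "complex", "complex", "complex", "other"]
per_route, overall = route_accuracy(labels, predictions)
print(per_route["basic"], overall)  # 0.5 0.6

# Improvements in percentage points, from the accuracy table.
fine_tuned = {"basic": 91.50, "complex": 93.10, "overall": 92.55}
base = {"basic": 74.83, "complex": 45.27, "overall": 55.08}
gains = {k: round(fine_tuned[k] - base[k], 2) for k in fine_tuned}
print(gains)  # {'basic': 16.67, 'complex': 47.83, 'overall': 37.47}
```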
| > Note: These metrics are computed on private, synthetic + manually-authored data tailored to the SKK domain. Performance on other domains may be substantially different. | |
| --- | |
| ## 7. How to Use | |
| > ⚠️ **Important:** This model assumes the same overall routing prompt structure as `katanemo/Arch-Router-1.5B`. For best results, follow the upstream Arch-Router prompt format and simply adapt the `route_config` to your use case. ([Hugging Face][1]) | |
| ### 7.1 Minimal example | |
| ```python | |
| import json | |
| from typing import Any, Dict, List | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| model_name = "vidavox/SKK-Router-1.5B" | |
| model = AutoModelForCausalLM.from_pretrained( | |
|     model_name, | |
|     device_map="auto", | |
|     torch_dtype="auto", | |
|     trust_remote_code=True, | |
| ) | |
| tokenizer = AutoTokenizer.from_pretrained(model_name) | |
| # Please use our provided prompt for best performance | |
| TASK_INSTRUCTION = """ | |
| You are a helpful assistant designed to find the best suited route. | |
| You are provided with route description within <routes></routes> XML tags: | |
| <routes> | |
| {routes} | |
| </routes> | |
| <conversation> | |
| {conversation} | |
| </conversation> | |
| """ | |
| FORMAT_PROMPT = """ | |
| Your task is to decide which route is best suit with user intent on the conversation in <conversation></conversation> XML tags. Follow the instruction: | |
| 1. If the latest intent from user is irrelevant or user intent is full filled, response with other route {"route": "other"}. | |
| 2. You must analyze the route descriptions and find the best match route for user latest intent. | |
| 3. You only response the name of the route that best matches the user's request, use the exact name in the <routes></routes>. | |
| Based on your analysis, provide your response in the following JSON formats if you decide to match any route: | |
| {"route": "route_name"} | |
| """ | |
| # Define route config | |
| route_config = [ | |
|     { | |
|         "name": "basic", | |
|         "description": "Answering simple questions that ask for factual information, term meanings, or general knowledge.", | |
|     }, | |
|     { | |
|         "name": "complex", | |
|         "description": "Handling specific, complex, or multi (more than one task) questions that require multi-step reasoning and interaction with databases to fetch and process data. For example, answering questions that need calculations, data analysis, or synthesis of information from multiple sources.", | |
|     }, | |
| ] | |
| # Helper function to create the system prompt for our model | |
| def format_prompt( | |
|     route_config: List[Dict[str, Any]], conversation: List[Dict[str, Any]] | |
| ): | |
|     return ( | |
|         TASK_INSTRUCTION.format( | |
|             routes=json.dumps(route_config), conversation=json.dumps(conversation) | |
|         ) | |
|         + FORMAT_PROMPT | |
|     ) | |
| # Define conversations | |
| conversation = [ | |
|     { | |
|         "role": "user", | |
|         # English: "What is the definition of Reserve A, and how much of it is there for Field X?" | |
|         "content": "Apa pengertian dari Cadangan A dan berapa jumlahnya untuk Lapangan X?", | |
|     } | |
| ] | |
| route_prompt = format_prompt(route_config, conversation) | |
| messages = [ | |
|     {"role": "user", "content": route_prompt}, | |
| ] | |
| # Apply the chat template and tokenize the routing prompt | |
| input_ids = tokenizer.apply_chat_template( | |
|     messages, add_generation_prompt=True, return_tensors="pt" | |
| ).to(model.device) | |
| # Generate the route; max_new_tokens is only an upper bound, since the JSON answer is a few tokens long | |
| generated_ids = model.generate( | |
|     input_ids=input_ids,  # or just positional: model.generate(input_ids, …) | |
|     max_new_tokens=32768, | |
| ) | |
| # Strip the prompt from each sequence | |
| prompt_lengths = input_ids.shape[1]  # same length for every row here | |
| generated_only = [ | |
|     output_ids[prompt_lengths:]  # slice off the prompt tokens | |
|     for output_ids in generated_ids | |
| ] | |
| # Decode the generated tokens to text | |
| response = tokenizer.batch_decode(generated_only, skip_special_tokens=True)[0] | |
| print(response) | |
| ``` | |
| In the actual SKK agent system, this `"route"` is then used to decide whether to call the **basic** or **reasoning** LLM. | |
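The dispatch step can be sketched as a small lookup table. Everything here is hypothetical: `call_basic_llm` and `call_reasoning_llm` are stand-ins for whatever clients the agent system actually uses, and defaulting to the reasoning path for unknown routes is an assumed policy.

```python
import json
from typing import Callable, Dict


def call_basic_llm(question: str) -> str:
    """Hypothetical stand-in for the low-latency, non-reasoning model."""
    return f"[basic model] {question}"


def call_reasoning_llm(question: str) -> str:
    """Hypothetical stand-in for the slower reasoning model."""
    return f"[reasoning model] {question}"


HANDLERS: Dict[str, Callable[[str], str]] = {
    "basic": call_basic_llm,
    "complex": call_reasoning_llm,
}


def dispatch(route: str, question: str) -> str:
    """Send the question to the model selected by the router."""
    # Assumed policy: unknown routes go to the more capable model.
    handler = HANDLERS.get(route, call_reasoning_llm)
    return handler(question)


router_response = '{"route": "basic"}'  # e.g. the `response` string printed above
route = json.loads(router_response)["route"]
print(dispatch(route, "What does this term mean?"))
# [basic model] What does this term mean?
```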
| --- | |
| ## 8. Limitations & Known Failure Modes | |
| ### Limitations | |
| * **Multi-turn conversations:** The model may be less reliable on very long, multi-turn chats with shifting intent. It was primarily trained on shorter, focused interactions. | |
| * **Ambiguous queries:** If the question does not clearly indicate complexity (e.g., vague or underspecified prompts), the router may pick an unintuitive route. | |
| * **Out-of-domain content:** Questions unrelated to SKK / KSMI / upstream O&G may be routed unpredictably, since the training data is domain-specific. | |
| * **Binary perspective:** The router assumes a simple **basic vs complex** split; if you need multiple levels of reasoning or different tools, you may need to extend the label space and re-train. | |
| ### Safety considerations | |
| * Not designed for **medical, legal, or financial** decision-making. | |
| * Should not be used in settings where an incorrect routing decision can cause **harm or safety-critical failures**. | |
| * Outputs are **not** explanations; they are discrete labels used for orchestration. | |
| --- | |
| ## 9. Bias & Data Caveats | |
| * Training data is heavily skewed toward: | |
| * SKK upstream petroleum / regulatory topics. | |
| * Text derived from or inspired by **KSMI** and related technical documents. | |
| * Language mix: | |
| * Bilingual Indonesian/English, but primarily focused on expert / technical wording typical for this domain. | |
| * As a result: | |
| * The model may **over-assume** that questions with regulatory or technical phrasing are “complex”. | |
| * It may not behave sensibly on informal, social-media style data or on domains very different from SKK. | |
| Because the underlying data is private and internal, users **cannot** independently audit its biases or coverage. Treat this model as **highly specialized** rather than general-purpose. | |
| --- | |
| ## 10. License & Usage | |
| This model is a fine-tuned derivative of **katanemo/Arch-Router-1.5B**, which is distributed under the **Katanemo research license**. ([Hugging Face][1]) | |
| * **License on this repo:** `other` – `katanemo-research`. | |
| * If you use this model, you must comply with: | |
| * the original **Katanemo license** for Arch-Router, and | |
| * any additional internal policies that apply to SKK data and systems. | |
| ### Intended usage policy | |
| * **Allowed / intended:** | |
| * Research and experimentation on routing for question complexity. | |
| * Internal use as part of the **SKK Internal Agent System**. | |
| * Exploration of routing strategies in similar regulatory or technical domains, provided you have rights to the underlying data. | |
| * **Not recommended / discouraged:** | |
| * Exposing this router directly to end users as a chatbot. | |
| * Using it as a general-purpose router outside its domain without additional evaluation. | |
| * Using the model, or any system built with it, as the sole basis for safety-critical decisions. | |
| This description is **not legal advice**. For any production or commercial deployment, please review the **Katanemo research license** and your own organizational policies with qualified counsel. | |
| --- | |
| ## 11. Citation | |
| If you use this model or build upon it in academic or technical work, please consider citing the Arch-Router paper: | |
| ```bibtex | |
| @article{tran2025archrouter, | |
| title = {Arch-Router: Aligning LLM Routing with Human Preferences}, | |
| author = {Tran, Co and Paracha, Salman and Hafeez, Adil and Chen, Shuguang}, | |
| journal = {arXiv preprint arXiv:2506.16655}, | |
| year = {2025} | |
| } | |
| ``` | |
| You may also reference this checkpoint as: | |
| > vidavox/SKK-Router-1.5B (v1.0 – SKK Router for internal routing), fine-tuned from katanemo/Arch-Router-1.5B on SKK-specific synthetic and manually curated routing data for basic vs complex question routing. | |
| [1]: https://huggingface.co/katanemo/Arch-Router-1.5B "katanemo/Arch-Router-1.5B" | |
| [2]: https://huggingface.co/katanemo/Arch-Router-1.5B/commit/c3a3b356644a64c519091e56d1a19d013eb5290e "Upload folder using huggingface_hub · katanemo/Arch- ..." | |