Quark-270M-Instruct: Bilingual Chat Model
Quark-270M-Instruct is the instruction-tuned version of Quark-270M Base, fine-tuned for conversational use in Italian and English. Built entirely from scratch by ThingsAI.
Quick Start
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model = AutoModelForCausalLM.from_pretrained(
"ThingAI/Quark-270m-Instruct",
trust_remote_code=True,
torch_dtype=torch.bfloat16
).cuda()
model.lm_head.weight = model.embed_tokens.weight # ensure weight tying
tokenizer = AutoTokenizer.from_pretrained("ThingAI/Quark-270m-Instruct")
prompt = "<|user|>\nCiao, come stai?\n<|end|>\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=150, do_sample=True, temperature=0.7, top_k=40)
print(tokenizer.decode(out[0], skip_special_tokens=False))
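Because the Quick Start decodes with `skip_special_tokens=False`, the printed text contains the prompt and the chat tags as well as the reply. A minimal sketch of pulling out only the assistant's answer follows; `extract_reply` is a hypothetical helper (not part of the model repository), and it assumes the tags decode back to the literal strings `<|assistant|>` and `<|end|>`.

```python
def extract_reply(decoded: str) -> str:
    """Return the text of the last assistant turn in a decoded generation."""
    # Keep only what follows the final assistant tag; the prompt precedes it.
    _, _, tail = decoded.rpartition("<|assistant|>")
    # Cut at the first end-of-turn tag, if the model emitted one.
    reply, _, _ = tail.partition("<|end|>")
    return reply.strip()


decoded = "<|user|>\nCiao, come stai?\n<|end|>\n<|assistant|>\nSto bene, grazie!\n<|end|>"
print(extract_reply(decoded))
```

If generation stops at `max_new_tokens` before an `<|end|>` tag appears, the helper simply returns everything after the assistant tag.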
Chat Format
<|user|>
{user message}
<|end|>
<|assistant|>
{model response}
<|end|>
Multi-turn:
<|user|>
Ciao!
<|end|>
<|assistant|>
Ciao! Come posso aiutarti?
<|end|>
<|user|>
Cos'è l'intelligenza artificiale?
<|end|>
<|assistant|>
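The format above can be assembled programmatically. The sketch below is one possible way to do it, assuming the newline placement used in the Quick Start prompt; `build_prompt` is a hypothetical helper name, and it rejects any role other than `user` and `assistant` because the model was not trained with other tags.

```python
def build_prompt(messages: list[dict[str, str]]) -> str:
    """Render a list of {"role", "content"} dicts into the Quark chat format."""
    parts = []
    for message in messages:
        role = message["role"]
        if role not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        # Each turn is: tag, newline, content, newline, end tag, newline.
        parts.append(f"<|{role}|>\n{message['content']}\n<|end|>\n")
    # Leave an open assistant turn for the model to complete.
    parts.append("<|assistant|>\n")
    return "".join(parts)


history = [
    {"role": "user", "content": "Ciao!"},
    {"role": "assistant", "content": "Ciao! Come posso aiutarti?"},
    {"role": "user", "content": "Cos'è l'intelligenza artificiale?"},
]
print(build_prompt(history))
```

The resulting string can be passed to the tokenizer in place of a hand-written prompt.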
Model Details
| Property | Value |
|---|---|
| Base Model | Quark-270M Base |
| Parameters | 252M (with weight tying) |
| Architecture | Decoder-only Transformer (GQA, SwiGLU, RMSNorm, RoPE) |
| Vocabulary | 65,537 tokens |
| Context Length | 2,048 tokens |
| Precision | BF16 |
| Languages | Italian, English |
Architecture
| Hyperparameter | Value |
|---|---|
| d_model | 768 |
| Layers | 32 |
| Query Heads | 12 |
| KV Heads | 4 |
| Head Dim | 64 |
| FFN Dim | 2,048 |
| Activation | SwiGLU |
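As a sanity check, the 252M figure can be reproduced from the numbers in the two tables above. The calculation below is a sketch under stated assumptions that the card does not confirm: no bias terms, three weight matrices per SwiGLU block, two RMSNorm weight vectors per layer plus one final norm, and an output head that shares the embedding matrix (weight tying).

```python
d_model, n_layers = 768, 32
n_q_heads, n_kv_heads, head_dim = 12, 4, 64
ffn_dim, vocab = 2048, 65_537

embed = vocab * d_model                        # token embeddings, shared with the LM head
q_proj = d_model * n_q_heads * head_dim        # query projection
kv_proj = 2 * d_model * n_kv_heads * head_dim  # key and value projections (GQA: 4 KV heads)
o_proj = n_q_heads * head_dim * d_model        # attention output projection
ffn = 3 * d_model * ffn_dim                    # SwiGLU: gate, up, and down matrices (assumed)
norms = 2 * d_model                            # two RMSNorm weight vectors per layer (assumed)

per_layer = q_proj + kv_proj + o_proj + ffn + norms
total = embed + n_layers * per_layer + d_model  # final d_model term: last RMSNorm (assumed)

print(f"{total:,}")  # 251,708,928
```

Under these assumptions the total rounds to 252M; an untied output head would add another `vocab * d_model` (about 50M) parameters.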
Training
Base Pretraining
The base model was pretrained on ~10B tokens of a bilingual mix (Italian 50%, English 43%, code 7%) on NVIDIA B200. See Quark-270M Base for details.
SFT (Instruction Tuning)
Fine-tuned on a diverse mix of conversational and instructional data:
| Dataset | Examples | Type |
|---|---|---|
| FreedomIntelligence/alpaca-gpt4-italian | ~52,000 | Italian instructions |
| HuggingFaceH4/no_robots | ~9,500 | English conversations |
| m-a-p/CodeFeedback-Filtered-Instruction | 5,000 | Code instructions |
| yogeshm/text_to_bash (×80) | ~9,900 | Terminal commands |
| Custom chitchat (×100) | ~3,000 | Identity, greetings, basic Q&A |
| Total | ~80,000 | |

| Setting | Value |
|---|---|
| Hardware | NVIDIA B200 |
| Epochs | 3 |
| Learning Rate | 2e-5 (cosine decay) |
| Batch Size | 16 × 4 = 64 effective |
| Sequence Length | 512 |
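For a rough sense of scale, the settings above imply the step count and learning-rate curve sketched below. This is an illustration, not the actual training script: it assumes the approximate total of 80,000 examples, no warmup, and a cosine decay from the peak rate down to zero, none of which the card specifies. `cosine_lr` is a hypothetical helper.

```python
import math

num_examples = 80_000      # approximate total from the dataset table
effective_batch = 16 * 4   # 64, as listed in the settings
epochs = 3
peak_lr = 2e-5

steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * epochs


def cosine_lr(step: int, total: int, peak: float = peak_lr, floor: float = 0.0) -> float:
    """Cosine decay from `peak` at step 0 to `floor` at step `total` (no warmup assumed)."""
    progress = min(step / total, 1.0)
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))


print(steps_per_epoch, total_steps)
for step in (0, total_steps // 2, total_steps):
    print(step, f"{cosine_lr(step, total_steps):.2e}")
```

With these numbers one epoch is about 1,250 optimizer steps, so the full run is about 3,750 steps.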
Inference Server
Quark-270M-Instruct powers Things Chat via a self-hosted FastAPI server with SSE streaming, conversation memory, web search, and content moderation.
Limitations
- 252M is small: Limited factual knowledge, prone to hallucination
- Mathematics: Unreliable beyond basic arithmetic
- Code: Generates plausible but often non-functional code
- Context: 2,048 token window
- No system prompt: The model was not trained with <|system|> tags
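The 2,048-token window means long conversations have to be trimmed before they are sent to the model. One simple approach, sketched below, drops the oldest turns until the history fits. `trim_history` and its `count_tokens` argument are hypothetical names; the default reserve of 150 tokens mirrors `max_new_tokens=150` from the Quick Start, and the few tokens used by the chat tags are ignored for simplicity.

```python
from typing import Callable


def trim_history(
    messages: list[dict[str, str]],
    count_tokens: Callable[[str], int],
    max_tokens: int = 2048,
    reserve: int = 150,
) -> list[dict[str, str]]:
    """Drop the oldest turns until the remaining history fits the context budget."""
    budget = max_tokens - reserve  # leave room for the generated reply
    kept = list(messages)

    def used() -> int:
        return sum(count_tokens(m["content"]) for m in kept)

    while len(kept) > 1 and used() > budget:
        kept.pop(0)
        # Keep the history starting on a user turn.
        while len(kept) > 1 and kept[0]["role"] != "user":
            kept.pop(0)
    return kept


# Demo with a whitespace word count standing in for a real tokenizer.
chat = [
    {"role": "user", "content": "a b c"},
    {"role": "assistant", "content": "d e f g"},
    {"role": "user", "content": "h i"},
]
print(trim_history(chat, lambda text: len(text.split()), max_tokens=10, reserve=4))
```

In practice `count_tokens` would wrap the model's tokenizer, for example `lambda text: len(tokenizer(text)["input_ids"])`. The most recent message is always kept, even if it alone exceeds the budget.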
Good for
- Self-hosted bilingual chatbot
- Learning about LLM training from scratch
- Terminal command assistance
- Light conversational AI
Not suited for
- Factual Q&A requiring accuracy
- Complex reasoning or math
- Production-grade code generation
- Safety-critical applications
The Quark Family
| Model | Parameters | Type |
|---|---|---|
| Quark-50M | 51M | Base |
| Quark-135M | 135M | Base |
| Quark-270M Base | 252M | Base |
| Quark-270M-Instruct | 252M | Chat |
Links
- ThingsAI
- Things Chat
- QuarkTokenizer
- Open SLM Leaderboard
Built from scratch by ThingsAI 🇮🇹