# SmolLM2-Rethink-135M-GGUF
SmolLM2-Rethink-135M is an experimental lightweight model trained on the Celestia3-DeepSeek-R1-0528 reasoning dataset. Based on the SmolLM2-135M-Instruct architecture, this model is specifically optimized for reasoning, structured outputs, and efficient small-scale deployment. Despite its compact size (135M parameters), it demonstrates strong capabilities in logical deduction, conversational coherence, and lightweight inference tasks.
## Model Files

| File Name | Size | Type | Description |
|-----------|------|------|-------------|
| SmolLM2-Rethink-135M.Q2_K.gguf | 88.2 MB | Model | Q2_K quantized model (smallest) |
| SmolLM2-Rethink-135M.Q3_K_S.gguf | 88.2 MB | Model | Q3_K_S quantized model |
| SmolLM2-Rethink-135M.Q3_K_M.gguf | 93.5 MB | Model | Q3_K_M quantized model |
| SmolLM2-Rethink-135M.Q3_K_L.gguf | 97.5 MB | Model | Q3_K_L quantized model |
| SmolLM2-Rethink-135M.Q4_K_S.gguf | 102 MB | Model | Q4_K_S quantized model |
| SmolLM2-Rethink-135M.Q4_K_M.gguf | 105 MB | Model | Q4_K_M quantized model |
| SmolLM2-Rethink-135M.Q5_K_S.gguf | 110 MB | Model | Q5_K_S quantized model |
| SmolLM2-Rethink-135M.Q5_K_M.gguf | 112 MB | Model | Q5_K_M quantized model |
| SmolLM2-Rethink-135M.Q6_K.gguf | 138 MB | Model | Q6_K quantized model |
| SmolLM2-Rethink-135M.Q8_0.gguf | 145 MB | Model | Q8_0 quantized model |
| SmolLM2-Rethink-135M.BF16.gguf | 271 MB | Model | BF16 precision model |
| SmolLM2-Rethink-135M.F16.gguf | 271 MB | Model | F16 precision model |
| SmolLM2-Rethink-135M.F32.gguf | 540 MB | Model | F32 full precision model (largest) |
| .gitattributes | 2.4 kB | Config | Git LFS configuration |
| config.json | 29 Bytes | Config | Model configuration |
| README.md | 31 Bytes | Documentation | Repository documentation |
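To fetch a single quantized file rather than loading one immediately, here is a minimal sketch using `huggingface_hub` (the library choice and the `Q4_K_M` filename are illustrative, not repo requirements; any filename from the table works):

```python
# Minimal sketch: download one GGUF file from the repo with huggingface_hub.
# Assumes `pip install huggingface_hub`; the Q4_K_M filename is just an example.
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="prithivMLmods/SmolLM2-Rethink-135M-GGUF",
    filename="SmolLM2-Rethink-135M.Q4_K_M.gguf",
)
print(local_path)  # path to the cached GGUF file on disk
```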
## Quants Usage

The quants above are sorted by size, not necessarily by quality; IQ-quants are often preferable over similarly sized non-IQ quants. ikawrakow has published a handy graph comparing some of the lower-quality quant types (lower is better).
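As a rough illustration of trading file size against fidelity, below is a toy helper (hypothetical, not part of this repo) that picks the largest quant fitting a disk or RAM budget, using the sizes from the table above:

```python
# Hypothetical helper, not part of this repo: pick the largest quant that fits
# a size budget. Sizes (MB) are copied from the Model Files table above.
# Caveat from the note above: file size does not strictly track output quality.
QUANT_SIZES_MB = {
    "Q2_K": 88.2, "Q3_K_S": 88.2, "Q3_K_M": 93.5, "Q3_K_L": 97.5,
    "Q4_K_S": 102, "Q4_K_M": 105, "Q5_K_S": 110, "Q5_K_M": 112,
    "Q6_K": 138, "Q8_0": 145, "BF16": 271, "F16": 271, "F32": 540,
}

def pick_quant(budget_mb: float) -> str:
    """Return the largest quant whose file size fits within budget_mb."""
    fitting = {q: mb for q, mb in QUANT_SIZES_MB.items() if mb <= budget_mb}
    if not fitting:
        raise ValueError(f"no quant fits in {budget_mb} MB")
    return max(fitting, key=fitting.get)

print(pick_quant(120))  # -> "Q5_K_M"
```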
## Run with llama-cpp-python

Install llama-cpp-python, load a quantized checkpoint directly from the Hub, and chat with it. Any GGUF filename from the table above works; `Q4_K_M` is used below purely as an example.

```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="prithivMLmods/SmolLM2-Rethink-135M-GGUF",
    filename="SmolLM2-Rethink-135M.Q4_K_M.gguf",  # any quant from the table above
)

llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)
```
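`create_chat_completion` returns an OpenAI-style dict rather than printing anything; a short sketch of capturing the reply text (the `max_tokens` and `temperature` values here are illustrative, not repo defaults):

```python
# create_chat_completion returns an OpenAI-style dict; the reply text lives
# under choices[0]["message"]["content"]. Sampling values are illustrative.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=64,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```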