Falcon-H1-Tiny
A series of extremely small, yet powerful language models redefining capabilities at small scale.
For more details about the training protocol of this model, please refer to the Tiny-H1 technical blog post.
Currently, to use this model you can rely on Hugging Face transformers, vLLM, SGLang, llama.cpp, Ollama, or the mlx-lm library.
Refer to the snippet below to run H1 models using 🤗 transformers:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-H1-Tiny-R-0.6B-pre-GRPO"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Perform text generation
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
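For chat-style prompts, and assuming the checkpoint ships a chat template, you can format the conversation with apply_chat_template before generating; a minimal sketch (the prompt is illustrative):

# Minimal sketch: chat-style generation, reusing `model` and `tokenizer`
# from the snippet above. Assumes the tokenizer defines a chat template.
messages = [{"role": "user", "content": "Explain state-space models in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))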
Alternatively, you can serve the model through the transformers CLI:
transformers serve tiiuae/Falcon-H1-Tiny-R-0.6B-pre-GRPO
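The serve command exposes an OpenAI-compatible API; a minimal sketch of querying it with the openai Python client, assuming the server listens on its default http://localhost:8000:

# Minimal sketch: query the OpenAI-compatible endpoint started by
# `transformers serve` (default base URL assumed; adjust if needed).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="tiiuae/Falcon-H1-Tiny-R-0.6B-pre-GRPO",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)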
llama.cpp
You can find all GGUF files compatible with llama.cpp under our official collection. An example setup could be:
brew install llama.cpp
pip install huggingface_hub
hf download tiiuae/Falcon-H1-Tiny-R-0.6B-pre-GRPO Falcon-H1-Tiny-R-0.6B-pre-GRPO-Q8_0.gguf --local-dir ./
llama-cli -m ./Falcon-H1-Tiny-R-0.6B-pre-GRPO-Q8_0.gguf -cnv
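llama.cpp also ships llama-server, which exposes an OpenAI-compatible endpoint over the same GGUF file; a minimal sketch of querying it from Python, assuming llama-server -m ./Falcon-H1-Tiny-R-0.6B-pre-GRPO-Q8_0.gguf is running on its default port 8080:

# Minimal sketch: query a running llama-server instance
# (default http://localhost:8080 assumed).
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 128,
    },
)
print(resp.json()["choices"][0]["message"]["content"])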
ollama
ollama run hf.co/tiiuae/Falcon-H1-Tiny-R-0.6B-pre-GRPO:Q8_0
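Once the model is pulled, it is also reachable through Ollama's local REST API; a minimal sketch, assuming the default port 11434:

# Minimal sketch: call Ollama's local REST API (default port 11434 assumed).
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "hf.co/tiiuae/Falcon-H1-Tiny-R-0.6B-pre-GRPO:Q8_0",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": False,
    },
)
print(resp.json()["message"]["content"])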
mlx
mlx_lm.chat --model tiiuae/Falcon-H1-Tiny-R-0.6B-pre-GRPO
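The model can also be driven from Python through the mlx-lm API; a minimal sketch (the prompt is illustrative):

# Minimal sketch: generation with the mlx-lm Python API (Apple silicon).
from mlx_lm import load, generate

model, tokenizer = load("tiiuae/Falcon-H1-Tiny-R-0.6B-pre-GRPO")
text = generate(model, tokenizer, prompt="Hello!", max_tokens=128)
print(text)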
For vLLM, simply start a server by executing the command below:
# pip install "vllm>=0.9.0"
vllm serve tiiuae/Falcon-H1-Tiny-R-0.6B-pre-GRPO --tensor-parallel-size 2 --data-parallel-size 1
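The vLLM server also speaks the OpenAI API; a minimal sketch of hitting its plain completions endpoint, assuming the default port 8000:

# Minimal sketch: query vLLM's OpenAI-compatible /v1/completions endpoint
# (default port 8000 assumed).
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "tiiuae/Falcon-H1-Tiny-R-0.6B-pre-GRPO",
        "prompt": "The capital of France is",
        "max_tokens": 32,
    },
)
print(resp.json()["choices"][0]["text"])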
For SGLang, launch a server by executing:
python -m sglang.launch_server \
  --model-path tiiuae/Falcon-H1-Tiny-R-0.6B-pre-GRPO \
  --tensor-parallel-size 1
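The SGLang server is likewise OpenAI-compatible; a minimal sketch of streaming a response, assuming its default port 30000:

# Minimal sketch: stream tokens from a running SGLang server
# (default http://localhost:30000/v1 assumed).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
stream = client.chat.completions.create(
    model="tiiuae/Falcon-H1-Tiny-R-0.6B-pre-GRPO",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)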
For a detailed evaluation of the Tiny-H1 series, please refer to our technical blog post.
If the Tiny-H1 family of models has been helpful to your work, feel free to cite us:
@misc{falconllm2025tinyh1,
  title={Tiny-H1: A series of extremely small, yet powerful language models redefining capabilities at small scale},
  author={Falcon-LLM Team},
  year={2025},
}