Instructions to use Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF")

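# Load model directly; a GGUF repo needs gguf_file, and generation needs a causal-LM head
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
	"Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF",
	gguf_file="Fortytwo_Strand-Rust-Coder-14B-BF16.gguf",  # file name as used in the llama-cpp-python example
	dtype="auto",
)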

llama-cpp-python

How to use Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF",
	filename="Fortytwo_Strand-Rust-Coder-14B-BF16.gguf",
)

output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)

Local Apps

llama.cpp

How to use Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF:Q4_K_M

Use Docker

docker model run hf.co/Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF:Q4_K_M


vLLM

How to use Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
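
For scripted use, the same call can be made from Python. The sketch below uses only the standard library and mirrors the curl payload above. The helper names `build_completion_request` and `complete` are illustrative, and the default URL assumes the vLLM server started above is listening on localhost:8000.

```python
import json
import urllib.request

MODEL_ID = "Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the first completion's text.

    Needs a running server; urllib raises URLError if nothing is listening.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` returns the generated text. Passing `base_url="http://localhost:30000"` would point the same helper at a server on another port.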

Use Docker

docker model run hf.co/Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF:Q4_K_M

SGLang

How to use Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Ollama
How to use Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF with Ollama:
```
ollama run hf.co/Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF:Q4_K_M
```

Unsloth Studio

How to use Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF to start chatting

Docker Model Runner
How to use Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF with Docker Model Runner:
```
docker model run hf.co/Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF:Q4_K_M
```

Lemonade

How to use Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Strand-Rust-Coder-14B-v1-GGUF-Q4_K_M

List all available models

lemonade list

Strand-Rust-Coder-14B-v1-GGUF / README.md

inikitin

Update README.md

56fca9d verified 4 months ago


	---
	license: apache-2.0
	base_model:
	- Fortytwo-Network/Strand-Rust-Coder-14B-v1
	base_model_relation: quantized
	datasets:
	- Fortytwo-Network/Strandset-Rust-v1
	pipeline_tag: text-generation
	library_name: transformers
	---


	![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/63aeda3a2314b93f9e706a68/I6WwY8U7I5V8lc138UmGt.jpeg)

	# Strand-Rust-Coder-14B-v1

	## Overview

	Strand-Rust-Coder-14B-v1 is the first domain-specialized Rust language model created through Fortytwo’s Swarm Inference, a decentralized AI architecture where multiple models collaboratively generate, validate, and rank outputs through peer consensus.

	The model fine-tunes Qwen2.5-Coder-14B-Instruct for Rust-specific programming tasks using a synthetic dataset of 191,008 examples built via multi-model generation and peer-reviewed validation.
	It scores 48% on the hold-out set and 43% on RustEvo^2, ahead of much larger proprietary models such as GPT-5 Codex (47% and 28%), while maintaining competitive general coding performance.

	[Strand-Rust-Coder-v1: Technical Report](https://huggingface.co/blog/Fortytwo-Network/strand-rust-coder-tech-report)

	## Key Features

	- Rust-specialized fine-tuning on 15 diverse programming task categories
	- Peer-validated synthetic dataset (191,008 verified examples, 94.3% compile rate)
	- LoRA-based fine-tuning for efficient adaptation
	- Benchmarked across Rust-specific suites:
	  - RustEvo^2
	  - Evaluation on Hold-Out Set
	- Deployed in the Fortytwo decentralized inference network for collective AI reasoning

	---

	## Performance Summary

	| Model | Hold-Out Set | RustEvo^2 |
	|------------|------------------|---------------|
	| Fortytwo-Rust-One-14B (Ours) | 48.00% | 43.00% |
	| openai/gpt-5-codex | 47.00% | 28.00% |
	| anthropic/claude-sonnet-4.5 | 46.00% | 21.00% |
	| anthropic/claude-3.7-sonnet | 42.00% | 31.00% |
	| qwen/qwen3-max | 42.00% | 40.00% |
	| qwen/qwen3-coder-plus | 41.00% | 22.00% |
	| x-ai/grok-4 | 39.00% | 37.00% |
	| deepseek/deepseek-v3.1-terminus | 37.00% | 33.00% |
	| Qwen3-Coder-30B-A3B-Instruct | 36.00% | 20.00% |
	| openai/gpt-4o-latest | 34.00% | 39.00% |
	| deepseek/deepseek-chat | 34.00% | 41.00% |
	| google/gemini-2.5-flash | 33.00% | 7.00% |
	| Qwen2.5-Coder-14B-Instruct (Base) | 29.00% | 30.00% |
	| Qwen2.5-Coder-32B-Instruct | 29.00% | 31.00% |
	| google/gemini-2.5-pro | 28.00% | 22.00% |
	| qwen/qwen-2.5-72b | 28.00% | 32.00% |
	| Tesslate/Tessa-Rust-T1-7B | 23.00% | 19.00% |

	Code-task scores are unit-test pass@1 rates, measured in a Docker-isolated Rust 1.86.0 environment.
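
As a reading aid, pass@1 means one generated solution per task, counted as correct only if it passes that task's unit tests. The sketch below computes the metric on made-up outcomes and then the percentage-point gap to the base model from the table above. The function name `pass_at_1` and the sample outcomes are illustrative and do not come from the evaluation harness.

```python
def pass_at_1(outcomes):
    """Fraction of tasks whose single generated solution passed its unit tests."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    return sum(1 for passed in outcomes if passed) / len(outcomes)


# Hypothetical per-task results for one model: True = unit tests passed.
sample_outcomes = [True, False, True, True, False]
print(f"pass@1 = {pass_at_1(sample_outcomes):.2%}")

# Scores in percent, copied from the table above.
ours = {"hold_out": 48.0, "rustevo2": 43.0}   # Fortytwo-Rust-One-14B (Ours)
base = {"hold_out": 29.0, "rustevo2": 30.0}   # Qwen2.5-Coder-14B-Instruct (Base)

# Gap to the base model in percentage points, per benchmark.
gap_points = {bench: ours[bench] - base[bench] for bench in ours}
print(gap_points)
```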

	---

	## Task Breakdown

	| Task | Base | Strand-14B |
	|------|------|-------------|
	| test_generation | 0.00 | 0.51 |
	| api_usage_prediction | 0.27 | 0.71 |
	| function_naming | 0.53 | 0.87 |
	| code_refactoring | 0.04 | 0.19–0.20 |
	| variable_naming | 0.87 | 1.00 |
	| code_generation | 0.40 | 0.49 |

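	The largest absolute gains are in test generation (0.00 to 0.51) and API usage prediction (0.27 to 0.71), and refactoring rises roughly fivefold from a low base (0.04 to 0.19–0.20). These are areas that demand strong semantic reasoning about Rust’s ownership and lifetime rules.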
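
The per-task comparison can be reproduced directly from the table. This is a small sketch that ranks tasks by absolute improvement. The scores are copied from the rows above, and for `code_refactoring`, which is reported as a range, the lower bound is used as a simplifying assumption.

```python
# (base, fine-tuned) scores copied from the table above.
# code_refactoring is reported as 0.19–0.20; the lower bound is used here.
task_scores = {
    "test_generation": (0.00, 0.51),
    "api_usage_prediction": (0.27, 0.71),
    "function_naming": (0.53, 0.87),
    "code_refactoring": (0.04, 0.19),
    "variable_naming": (0.87, 1.00),
    "code_generation": (0.40, 0.49),
}


def absolute_gains(scores):
    """Return (task, gain) pairs sorted by absolute improvement, largest first."""
    gains = [(task, round(tuned - base, 2)) for task, (base, tuned) in scores.items()]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


for task, gain in absolute_gains(task_scores):
    print(f"{task:<22} +{gain:.2f}")
```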

	---

	## Dataset

	Fortytwo-Network/Strandset-Rust-v1 (191,008 examples, 15 categories)
	Built through Fortytwo’s Swarm Inference pipeline, where multiple small language models (SLMs) generate and cross-validate examples with peer-review consensus and output aggregation.

	- 94.3% compile success rate
	- 73.2% consensus acceptance
	- Coverage of 89% of Rust language features
	- Tasks include:
	  - `code_generation`, `code_completion`, `bug_detection`, `refactoring`, `optimization`
	  - `docstring_generation`, `code_review`, `summarization`, `test_generation`
	  - `naming`, `API usage prediction`, `search`

	Dataset construction involved 2,383 crates from crates.io, automatic compilation tests, and semantic validation of ownership and lifetime correctness.

	Dataset: [Fortytwo-Network/Strandset-Rust-v1](https://huggingface.co/datasets/Fortytwo-Network/Strandset-Rust-v1)
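
The two acceptance signals quoted above, compilation and peer consensus, can be pictured as a two-stage filter. The sketch below is a toy illustration only. The `Candidate` record, the vote format, and the two-thirds threshold are assumptions made for the example, not details of Fortytwo's pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """One synthetic example plus the checks run on it (hypothetical schema)."""
    task: str
    compiles: bool
    votes: list = field(default_factory=list)  # True = a peer model accepts it


def accepted(candidate, threshold=2 / 3):
    """Keep an example only if it compiles and enough peers accept it."""
    if not candidate.compiles or not candidate.votes:
        return False
    return sum(candidate.votes) / len(candidate.votes) >= threshold


def acceptance_rate(candidates, threshold=2 / 3):
    """Share of candidates that survive both stages."""
    if not candidates:
        return 0.0
    return sum(accepted(c, threshold) for c in candidates) / len(candidates)


# Made-up candidates, for illustration only.
pool = [
    Candidate("code_generation", compiles=True, votes=[True, True, False]),
    Candidate("bug_detection", compiles=True, votes=[True, False, False]),
    Candidate("test_generation", compiles=False, votes=[True, True, True]),
    Candidate("refactoring", compiles=True, votes=[True, True, True]),
]
print([c.task for c in pool if accepted(c)])
print(f"accepted: {acceptance_rate(pool):.0%}")
```

Ordering the stages this way means a non-compiling example is rejected regardless of how its peers voted.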

	---

	## Training Configuration

	| Setting | Value |
	|----------|-------|
	| Base model | Qwen2.5-Coder-14B-Instruct |
	| Method | LoRA (r=64, α=16) |
	| Learning rate | 5e-5 |
	| Batch size | 128 |
	| Epochs | 3 |
	| Optimizer | AdamW |
	| Precision | bfloat16 |
	| Objective | Completion-only loss |
	| Context length | 32,768 |
	| Framework | PyTorch + FSDP + Flash Attention 2 |
	| Hardware | 8× H200 GPUs |
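
To make the LoRA row concrete: a rank-r adapter adds two small matrices next to a frozen weight and scales their product by α/r. The sketch below works through that arithmetic for r=64 and α=16 from the table. The 5120×5120 projection shape is an assumed example size, and the helper names are illustrative.

```python
def lora_scaling(alpha, rank):
    """Scale applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


def lora_trainable_params(d_in, d_out, rank):
    """Parameters in A (rank x d_in) plus B (d_out x rank) for one adapted matrix."""
    return rank * d_in + d_out * rank


RANK, ALPHA = 64, 16          # from the training configuration above
D_IN = D_OUT = 5120           # assumed projection shape, for illustration only

scale = lora_scaling(ALPHA, RANK)
adapter = lora_trainable_params(D_IN, D_OUT, RANK)
full = D_IN * D_OUT

print(f"update scale alpha/r = {scale}")
print(f"adapter params per matrix = {adapter:,}")
print(f"share of the frozen matrix = {adapter / full:.2%}")
```

The share for the whole model depends on which matrices carry adapters, which the table does not list, so the per-matrix figure here is not a model-wide estimate.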

	---

	## Model Architecture

	- Base: Qwen2.5-Coder (14B parameters, GQA attention, extended RoPE embeddings)
	- Tokenizer: 151k vocabulary optimized for Rust syntax
	- Context: 32k tokens
	- Fine-tuning: Parameter-efficient LoRA adapters (≈1% of parameters updated)
	- Deployment: Compatible with local deployment and Fortytwo Capsule runtime for distributed swarm inference

	---

	## Evaluation Protocol

	- All evaluations were executed in a Docker-isolated Rust 1.86.0 environment
	- Code tasks: measured via unit test pass rate
	- Documentation & naming tasks: scored via LLM-based correctness (Claude Sonnet 4 judge)
	- Code completion & API tasks: syntax-weighted Levenshtein similarity
	- Comment generation: compilation success metric
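
For the similarity-scored tasks, the underlying idea is edit distance normalized to a score between 0 and 1. The sketch below implements plain character-level Levenshtein similarity. The syntax weighting mentioned above is not specified in this card, so it is left out rather than guessed, and the example strings are made up.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Map edit distance to [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


reference = "let v: Vec<u8> = Vec::new();"
prediction = "let v: Vec<u8> = Vec::with_capacity(8);"
print(f"{similarity(reference, prediction):.2f}")
```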

	---

	## Why It Matters

	Rust is a high-safety, low-level language with complex ownership semantics that make it uniquely challenging for general-purpose LLMs.
	At the same time, there is simply not enough high-quality training data on Rust, as it remains a relatively modern and rapidly evolving language.
	This scarcity of large, reliable Rust datasets – combined with the language’s intricate borrow checker and type system – makes it an ideal benchmark for evaluating true model understanding and reasoning precision.

	Strand-Rust-Coder demonstrates how specialized models can outperform giant centralized models – achieving domain mastery with a fraction of the compute.
	Through Fortytwo’s Swarm Inference, the network was able to generate an extremely accurate synthetic dataset, enabling a state-of-the-art Rust model to be built through an efficient LoRA fine-tune rather than full retraining.

	This work validates Fortytwo’s thesis: intelligence can scale horizontally through networked specialization rather than centralized scale.

	---

	## Research & References

	- [Fortytwo: Swarm Inference with Peer-Ranked Consensus (arXiv)](https://arxiv.org/abs/2510.24801) – Fortytwo Swarm Inference technical report
	- [Self-Supervised Inference of Agents in Trustless Environments (arXiv)](https://arxiv.org/abs/2409.08386) – High-level overview of Fortytwo architecture

	---

	## Intended Use

	- Rust code generation, completion, and documentation
	- Automated refactoring and test generation
	- Integration into code copilots and multi-agent frameworks
	- Research on domain-specialized model training and evaluation

	### Limitations
	- May underperform on purely algorithmic or multi-language tasks (e.g., HumanEval-style puzzles).
	- Not suitable for generating unverified production code without compilation and test validation.
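
One way to act on the second limitation is to gate generated code behind a compiler check before it goes anywhere else. The sketch below shells out to `rustc` if it is installed. The helper names and the choice of a metadata-only library build are assumptions for illustration, and a real gate would also run the project's tests.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def rustc_check_command(source_path, out_dir, edition="2021"):
    """Build a rustc invocation that checks a file without producing a binary."""
    return [
        "rustc",
        "--edition", edition,
        "--crate-type", "lib",
        "--emit=metadata",
        "--out-dir", str(out_dir),
        str(source_path),
    ]


def compiles(rust_source):
    """Return True/False for compile success, or None if rustc is not installed."""
    if shutil.which("rustc") is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        source_path = Path(tmp) / "generated.rs"
        source_path.write_text(rust_source, encoding="utf-8")
        result = subprocess.run(
            rustc_check_command(source_path, tmp),
            capture_output=True,
            text=True,
        )
        return result.returncode == 0


snippet = "pub fn add(a: i32, b: i32) -> i32 { a + b }\n"
print(compiles(snippet))
```

Returning `None` when the compiler is missing keeps "could not check" distinct from "failed to compile", so a caller does not mistake an unverified snippet for a rejected one.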

	---

	## Integration with Fortytwo Network

	Strand-Rust-Coder models are integrated into Fortytwo’s decentralized Swarm Inference Network, where specialized models collaborate and rank each other’s outputs.
	This structure enables peer-reviewed inference, improving reliability while reducing hallucinations and cost.

	To run a Fortytwo node or contribute your own models and fine-tunes, visit: [fortytwo.network](https://fortytwo.network)

	---

	## GGUF Quantized Versions

	This repository provides GGUF-format quantizations of the model [Fortytwo-Network/Strand-Rust-Coder-14B-v1](https://huggingface.co/Fortytwo-Network/Strand-Rust-Coder-14B-v1), optimized for local inference using tools such as llama.cpp, Jan, Ollama, LM Studio and other compatible runtimes.

	These quantizations significantly reduce memory requirements while preserving near-original accuracy, making deployment possible on a wide range of consumer hardware.

	| Quantization | File Size | Bit Precision | Description |
	|------------------|-----------|------------------|----------------|
	| Q8_0 | 15.7 GB | 8-bit | Near-full precision, for the most demanding local inference |
	| Q6_K | 12.1 GB | 6-bit | Balanced performance and efficiency |
	| Q5_K_M | 10.5 GB | 5-bit | Lightweight deployment with strong accuracy retention |
	| Q4_K_M | 8.99 GB | 4-bit | Ultra-fast, compact variant for consumer GPUs and laptops |
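
A rough way to pick a file from the table is to take the largest one that fits in available memory with room left for the KV cache and runtime buffers. The sketch below encodes that rule. The 2 GB headroom is an assumed placeholder rather than a measured requirement, and `pick_quantization` is an illustrative name.

```python
# File sizes in GB, copied from the table above.
QUANT_SIZES_GB = {
    "Q8_0": 15.7,
    "Q6_K": 12.1,
    "Q5_K_M": 10.5,
    "Q4_K_M": 8.99,
}


def pick_quantization(available_gb, headroom_gb=2.0):
    """Return the largest quantization that fits, or None if none does.

    headroom_gb is an assumed allowance for the KV cache and runtime buffers.
    """
    budget = available_gb - headroom_gb
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= budget}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


for memory_gb in (8, 12, 16, 24):
    print(f"{memory_gb} GB -> {pick_quantization(memory_gb)}")
```

Longer contexts need more headroom, so the default is best treated as a starting point to adjust after observing actual memory use.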

	---

	### Usage

	You can load the GGUF models with llama.cpp or compatible backends:

	```bash
	./llama-cli -m models/Strand-Rust-Coder-14B-v1.Q5_K_M.gguf -p "Write a Rust function that reads a file line by line."
	```

	Or run interactively in Jan, LM Studio or Ollama by simply importing the model.

	---

	### License

	These quantized weights are distributed under the same Apache 2.0 License as the original model.


	Fortytwo – An open, networked intelligence shaped collectively by its participants

	Join the swarm: [fortytwo.network](https://fortytwo.network)

	X: [@fortytwo](https://x.com/fortytwo)