# Granite 350M - 5M Context GGUF
This model has been modified to advertise a native context length of 5 million tokens.
## Context Length

- Native context: 5,242,880 tokens (5M)
- Based on: glogwa68/granite-4.0-h-350m-DISTILL-gemini-think-GGUF
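At a 5M-token window, the attention KV cache, not the 350M weights, dominates memory. A back-of-envelope sketch (the layer and head dimensions below are illustrative placeholders, not the actual Granite 4.0-h hybrid architecture values):

```python
# Rough KV-cache size estimate for a long-context run.
# NOTE: n_layers / n_kv_heads / head_dim are hypothetical placeholder
# values, NOT the real Granite 350M architecture parameters.
def kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # K and V each hold n_ctx * n_kv_heads * head_dim elements per layer,
    # hence the factor of 2; bytes_per_elem=2 assumes an f16 cache.
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

gib = kv_cache_bytes(n_ctx=5_242_880, n_layers=32,
                     n_kv_heads=8, head_dim=64) / 2**30
print(f"~{gib:.1f} GiB")  # → ~320.0 GiB
```

Even with placeholder dimensions, the estimate shows why multi-GPU setups are recommended below: the cache grows linearly with `n_ctx`.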
## Usage

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./granite-4.0-h-350m-DISTILL-gemini-3-pro-think-f16-5M-ctx.gguf",
    n_ctx=5242880,        # 5M context window
    n_gpu_layers=-1,      # offload all layers to GPU
    tensor_split=[1, 1, 1, 1, 1, 1, 1, 1],  # split evenly across 8 GPUs
)

output = llm("Your prompt here", max_tokens=500)
```
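The same settings carry over to the llama.cpp command-line tool; a hypothetical invocation mirroring the Python example (binary path is a placeholder for wherever your llama.cpp build lives):

```shell
# Hypothetical llama.cpp invocation; ./llama-cli path is a placeholder.
# -c sets the context window, -ngl offloads all layers to GPU,
# -n caps the number of generated tokens.
./llama-cli -m ./granite-4.0-h-350m-DISTILL-gemini-3-pro-think-f16-5M-ctx.gguf \
    -c 5242880 -ngl 99 -n 500 -p "Your prompt here"
```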
## Files

| File | Quantization | Size |
|---|---|---|
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-f16-5M-ctx.gguf | F16 | 0.64 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q2_k-5M-ctx.gguf | Q2_K | 0.15 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q3_k_l-5M-ctx.gguf | Q3_K_L | 0.18 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q3_k_m-5M-ctx.gguf | Q3_K_M | 0.18 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q3_k_s-5M-ctx.gguf | Q3_K_S | 0.17 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q4_0-5M-ctx.gguf | Q4_0 | 0.20 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q4_1-5M-ctx.gguf | Q4_1 | 0.22 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q4_k_m-5M-ctx.gguf | Q4_K_M | 0.21 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q4_k_s-5M-ctx.gguf | Q4_K_S | 0.20 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q5_0-5M-ctx.gguf | Q5_0 | 0.23 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q5_1-5M-ctx.gguf | Q5_1 | 0.25 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q5_k_m-5M-ctx.gguf | Q5_K_M | 0.24 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q5_k_s-5M-ctx.gguf | Q5_K_S | 0.23 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q6_k-5M-ctx.gguf | Q6_K | 0.26 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q8_0-5M-ctx.gguf | Q8_0 | 0.34 GB |
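A single quantization can be fetched without cloning the whole repository, e.g. with the Hugging Face CLI (repo id assumed to match this model card):

```shell
# Download just the Q4_K_M file into the current directory.
huggingface-cli download glogwa68/granite-4.0-h-350m-DISTILL-gemini-5M-CTX-GGUF \
    granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q4_k_m-5M-ctx.gguf \
    --local-dir .
```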
## Requirements

- llama-cpp-python with CUDA support
- Multi-GPU recommended for 5M context
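llama-cpp-python builds CPU-only by default; a CUDA-enabled install typically looks like the following (CMake flag per the llama-cpp-python docs; older releases used `-DLLAMA_CUBLAS=on` instead):

```shell
# Rebuild llama-cpp-python against CUDA (requires the CUDA toolkit).
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python \
    --upgrade --force-reinstall --no-cache-dir
```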
## Credits

- Original model: glogwa68
- Context extension: automated conversion
## Model tree for glogwa68/granite-4.0-h-350m-DISTILL-gemini-5M-CTX-GGUF

- Base model: ibm-granite/granite-4.0-h-350m-base
- Finetuned from: ibm-granite/granite-4.0-h-350m