Granite 350M - 5M Context GGUF

This model has been modified to natively support a 5-million-token context window.

Context Length

  • Native context: 5,242,880 tokens (5M), stored in the GGUF metadata (verify with the snippet below)
  • Architecture: granitehybrid, ~0.3B parameters
  • Based on: glogwa68/granite-4.0-h-350m-DISTILL-gemini-think-GGUF
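
You can verify the extended window directly from the file's metadata. Below is a minimal sketch using the gguf Python package; the key name is architecture-prefixed (assumed here to be granitehybrid.context_length), and the scalar-field access pattern may differ across gguf versions:

from gguf import GGUFReader

reader = GGUFReader("./granite-4.0-h-350m-DISTILL-gemini-3-pro-think-f16-5M-ctx.gguf")
for name, field in reader.fields.items():
    # Match the context-length key regardless of the architecture prefix
    if name.endswith(".context_length"):
        # Scalar fields store their value in the part indexed by field.data
        print(name, int(field.parts[field.data[-1]][0]))  # expect 5242880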

Usage

from llama_cpp import Llama

llm = Llama(
    model_path="./granite-4.0-h-350m-DISTILL-gemini-3-pro-think-f16-5M-ctx.gguf",
    n_ctx=5242880,  # 5M context
    n_gpu_layers=-1,
    tensor_split=[1,1,1,1,1,1,1,1],  # 8 GPUs
)

output = llm("Your prompt here", max_tokens=500)
print(output["choices"][0]["text"])
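
For long-context runs it is usually more practical to stream tokens as they are generated rather than wait for the full completion. A sketch using llama-cpp-python's stream=True mode on the same Llama object; each chunk mirrors the layout of the non-streaming response dict:

# Stream the completion chunk by chunk instead of waiting for it all
for chunk in llm("Your prompt here", max_tokens=500, stream=True):
    print(chunk["choices"][0]["text"], end="", flush=True)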

Files

File                                                               Quantization   Size
granite-4.0-h-350m-DISTILL-gemini-3-pro-think-f16-5M-ctx.gguf      f16            0.64 GB
granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q2_k-5M-ctx.gguf     q2_k           0.15 GB
granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q3_k_l-5M-ctx.gguf   q3_k_l         0.18 GB
granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q3_k_m-5M-ctx.gguf   q3_k_m         0.18 GB
granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q3_k_s-5M-ctx.gguf   q3_k_s         0.17 GB
granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q4_0-5M-ctx.gguf     q4_0           0.20 GB
granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q4_1-5M-ctx.gguf     q4_1           0.22 GB
granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q4_k_m-5M-ctx.gguf   q4_k_m         0.21 GB
granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q4_k_s-5M-ctx.gguf   q4_k_s         0.20 GB
granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q5_0-5M-ctx.gguf     q5_0           0.23 GB
granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q5_1-5M-ctx.gguf     q5_1           0.25 GB
granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q5_k_m-5M-ctx.gguf   q5_k_m         0.24 GB
granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q5_k_s-5M-ctx.gguf   q5_k_s         0.23 GB
granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q6_k-5M-ctx.gguf     q6_k           0.26 GB
granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q8_0-5M-ctx.gguf     q8_0           0.34 GB
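
To fetch a single quant instead of cloning the whole repo, here is a minimal sketch using huggingface_hub (repo id as shown on this page; substitute any filename from the table above):

from huggingface_hub import hf_hub_download

# Downloads to the local HF cache and returns the resolved path
model_path = hf_hub_download(
    repo_id="glogwa68/granite-4.0-h-350m-DISTILL-gemini-5M-CTX-GGUF",
    filename="granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q4_k_m-5M-ctx.gguf",
)
print(model_path)  # pass this to Llama(model_path=...)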

Requirements

  • llama-cpp-python with CUDA support
  • Multi-GPU recommended for the full 5M context, since the attention KV cache grows linearly with n_ctx (rough sizing sketch below)
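
A back-of-the-envelope KV-cache estimate shows why. Every hyperparameter below is an illustrative placeholder, not the real granitehybrid config (hybrid models swap most attention layers for state-space layers, so the true attention-layer count is small); read the actual values from the GGUF metadata before trusting the figure:

# Rough KV-cache size for the attention layers at full 5M context.
# ALL hyperparameters are assumed placeholders, NOT the real
# granitehybrid config -- read the true values from the GGUF metadata.
n_ctx         = 5_242_880  # 5M tokens
n_attn_layers = 4          # assumed; hybrids keep only a few attention layers
n_kv_heads    = 8          # assumed
head_dim      = 64         # assumed
bytes_per_el  = 2          # f16 K and V entries

kv_bytes = 2 * n_ctx * n_attn_layers * n_kv_heads * head_dim * bytes_per_el
print(f"KV cache ~ {kv_bytes / 2**30:.1f} GiB")  # ~40 GiB under these assumptions

Even under conservative assumptions the cache is far larger than the 0.3B weights themselves, which is why a single GPU rarely suffices at full context.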

Credits

  • Original model: glogwa68
  • Context extension: Automated conversion