# Granite 350M - 5M Context GGUF
This model has been modified to advertise a native context length of 5 million tokens.
## Context Length

- Native context: 5,242,880 tokens (5M)
- Based on: glogwa68/granite-4.0-h-350m-DISTILL-gemini-think-GGUF
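At a 5M-token window, the attention KV cache, not the 350M weights, dominates memory. A back-of-envelope sketch (the layer and head dimensions below are illustrative placeholders, not the actual Granite 4.0-h hybrid architecture values):

```python
# Rough KV-cache size estimate for a long-context run.
# NOTE: n_layers / n_kv_heads / head_dim are hypothetical placeholder
# values, NOT the real Granite 350M architecture parameters.
def kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # K and V each hold n_ctx * n_kv_heads * head_dim elements per layer,
    # hence the factor of 2; bytes_per_elem=2 assumes an f16 cache.
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

gib = kv_cache_bytes(n_ctx=5_242_880, n_layers=32,
                     n_kv_heads=8, head_dim=64) / 2**30
print(f"~{gib:.1f} GiB")  # → ~320.0 GiB
```

Even with placeholder dimensions, the estimate shows why multi-GPU setups are recommended below: the cache grows linearly with `n_ctx`.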
## Usage

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./granite-4.0-h-350m-DISTILL-gemini-3-pro-think-f16-5M-ctx.gguf",
    n_ctx=5242880,        # 5M context window
    n_gpu_layers=-1,      # offload all layers to GPU
    tensor_split=[1, 1, 1, 1, 1, 1, 1, 1],  # split evenly across 8 GPUs
)

output = llm("Your prompt here", max_tokens=500)
```
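The same settings carry over to the llama.cpp command-line tool; a hypothetical invocation mirroring the Python example (binary path is a placeholder for wherever your llama.cpp build lives):

```shell
# Hypothetical llama.cpp invocation; ./llama-cli path is a placeholder.
# -c sets the context window, -ngl offloads all layers to GPU,
# -n caps the number of generated tokens.
./llama-cli -m ./granite-4.0-h-350m-DISTILL-gemini-3-pro-think-f16-5M-ctx.gguf \
    -c 5242880 -ngl 99 -n 500 -p "Your prompt here"
```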
## Files

| File | Quantization | Size |
|---|---|---|
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-f16-5M-ctx.gguf | F16 | 0.64 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q2_k-5M-ctx.gguf | Q2_K | 0.15 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q3_k_l-5M-ctx.gguf | Q3_K_L | 0.18 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q3_k_m-5M-ctx.gguf | Q3_K_M | 0.18 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q3_k_s-5M-ctx.gguf | Q3_K_S | 0.17 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q4_0-5M-ctx.gguf | Q4_0 | 0.20 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q4_1-5M-ctx.gguf | Q4_1 | 0.22 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q4_k_m-5M-ctx.gguf | Q4_K_M | 0.21 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q4_k_s-5M-ctx.gguf | Q4_K_S | 0.20 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q5_0-5M-ctx.gguf | Q5_0 | 0.23 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q5_1-5M-ctx.gguf | Q5_1 | 0.25 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q5_k_m-5M-ctx.gguf | Q5_K_M | 0.24 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q5_k_s-5M-ctx.gguf | Q5_K_S | 0.23 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q6_k-5M-ctx.gguf | Q6_K | 0.26 GB |
| granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q8_0-5M-ctx.gguf | Q8_0 | 0.34 GB |
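A single quantization can be fetched without cloning the whole repository, e.g. with the Hugging Face CLI (repo id assumed to match this model card):

```shell
# Download just the Q4_K_M file into the current directory.
huggingface-cli download glogwa68/granite-4.0-h-350m-DISTILL-gemini-5M-CTX-GGUF \
    granite-4.0-h-350m-DISTILL-gemini-3-pro-think-q4_k_m-5M-ctx.gguf \
    --local-dir .
```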
## Requirements

- llama-cpp-python with CUDA support
- Multi-GPU recommended for 5M context
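llama-cpp-python builds CPU-only by default; a CUDA-enabled install typically looks like the following (CMake flag per the llama-cpp-python docs; older releases used `-DLLAMA_CUBLAS=on` instead):

```shell
# Rebuild llama-cpp-python against CUDA (requires the CUDA toolkit).
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python \
    --upgrade --force-reinstall --no-cache-dir
```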
## Credits

- Original model: glogwa68
- Context extension: automated conversion
## Model tree for glogwa68/granite-4.0-h-350m-DISTILL-gemini-5M-CTX-GGUF

- Base model: ibm-granite/granite-4.0-h-350m-base
- Finetuned from: ibm-granite/granite-4.0-h-350m