---
base_model: unsloth/Olmo-3-7B-Think
library_name: transformers
license: apache-2.0
language:
- en
- fr
tags:
- olmo
- fine-tuned
- conversational
- distillation
- thinking
- reasoning
datasets:
- TeichAI/glm-4.7-2000x
pipeline_tag: text-generation
---

# olmo-3-DISTILL-glm-4.7-think
|
|
This model is a fine-tuned version of [unsloth/Olmo-3-7B-Think](https://huggingface.co/unsloth/Olmo-3-7B-Think), trained on high-reasoning conversational data distilled from GLM 4.7 by Z.ai.
|
|
## Model Details
|
|
- **Base Model:** unsloth/Olmo-3-7B-Think
- **Fine-tuning Dataset:** TeichAI/glm-4.7-2000x
- **Context Length:** 1,048,576 tokens
- **Special Feature:** emits its reasoning inside `<think>` tags before the final answer
|
|
## Quantized Versions (GGUF)
|
|
**🔗 GGUF versions available here: [olmo-3-DISTILL-glm-4.7-think-GGUF](https://huggingface.co/glogwa68/olmo-3-DISTILL-glm-4.7-think-GGUF)**
|
|
| Format | Size | Use Case |
|--------|------|----------|
| Q2_K | Smallest | Low memory, reduced quality |
| Q4_K_M | Medium | Recommended: best balance of size and quality |
| Q5_K_M | Larger | Higher quality |
| Q8_0 | Large | Near-lossless quality |
| F16 | Largest | Original precision |
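For rough sizing, a quantized file's footprint can be estimated from its bits per weight. A minimal sketch, assuming a nominal 7B parameter count and approximate bits-per-weight figures (actual GGUF sizes vary with architecture and metadata):

```python
# Approximate bits per weight for common GGUF formats (illustrative values).
APPROX_BPW = {"Q2_K": 2.6, "Q4_K_M": 4.8, "Q5_K_M": 5.5, "Q8_0": 8.5, "F16": 16.0}

def estimated_size_gb(params_billions: float, fmt: str) -> float:
    """Estimate file size in GB: parameters * bits-per-weight / 8 bytes."""
    bits = params_billions * 1e9 * APPROX_BPW[fmt]
    return bits / 8 / 1e9

for fmt in APPROX_BPW:
    print(f"{fmt}: ~{estimated_size_gb(7.0, fmt):.1f} GB")
```

This is only a back-of-the-envelope guide; check the file listing on the GGUF repo for exact sizes.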
|
|
## Usage
|
|
### Transformers
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "glogwa68/olmo-3-DISTILL-glm-4.7-think",
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # place weights on GPU if available
)
tokenizer = AutoTokenizer.from_pretrained("glogwa68/olmo-3-DISTILL-glm-4.7-think")

messages = [{"role": "user", "content": "Hello, how are you?"}]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
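Since the model wraps its reasoning in `<think>` tags, downstream code often wants the trace and the final answer separately. A minimal sketch, assuming a single well-formed `<think>…</think>` block in the decoded output:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Separate the <think>...</think> trace from the final answer."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()  # no reasoning block found
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer

reasoning, answer = split_reasoning("<think>2+2 is 4.</think>The answer is 4.")
print(answer)  # -> The answer is 4.
```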
|
|
### Ollama (GGUF)
|
|
```bash
ollama run hf.co/glogwa68/olmo-3-DISTILL-glm-4.7-think-GGUF:Q4_K_M
```
|
|
### llama.cpp
|
|
```bash
llama-cli --hf-repo glogwa68/olmo-3-DISTILL-glm-4.7-think-GGUF \
  --hf-file olmo-3-distill-glm-4.7-think-q4_k_m.gguf -p "Hello"
```
|
|
## Training Details
|
|
- **Epochs:** 2
- **Learning Rate:** 2e-5
- **Batch Size:** 8 per device (with gradient accumulation)
- **Precision:** FP16
- **Hardware:** multi-GPU with DeepSpeed ZeRO-3
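Under gradient accumulation, the effective batch size is the per-device batch multiplied by the accumulation steps and the number of GPUs. The card states only the per-device batch of 8, so the accumulation steps and GPU count below are purely illustrative:

```python
def effective_batch_size(per_device: int, accum_steps: int, num_gpus: int) -> int:
    """Samples consumed per optimizer step under gradient accumulation."""
    return per_device * accum_steps * num_gpus

# Hypothetical values for illustration; only per_device=8 comes from this card.
print(effective_batch_size(per_device=8, accum_steps=4, num_gpus=2))  # -> 64
```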
|
|
## License
|
|
This model is released under the Apache 2.0 license.
|
|