Fix for thinking block not showing up

#1
by jukofyork - opened

Just in case anybody else has problems with the thinking block not showing up, I posted the fix needed to the jinja template here:

https://huggingface.co/unsloth/Kimi-K2.5-GGUF/discussions/1#697b46fdf48287bb9c2e92dc

I've figured out what is causing this:

https://huggingface.co/moonshotai/Kimi-K2.5/blob/main/chat_template.jinja

{%- if add_generation_prompt -%}
  <|im_assistant|>assistant<|im_middle|>
  {%- if thinking is defined and thinking is false -%}
  <think></think>
  {%- else -%}
  <think>        # <--- This forced <think> tag is part of the prompt, so it never gets returned as part of the chat response.
  {%- endif -%}
{%- endif -%}

To fix this, make a copy of chat_template.jinja and delete the {%- else -%} <think> branch at the bottom, like this:

{%- if add_generation_prompt -%}
  <|im_assistant|>assistant<|im_middle|>
  {%- if thinking is defined and thinking is false -%}
  <think></think>
  {%- endif -%}
{%- endif -%}
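The difference between the two templates can be sketched in plain Python (a hedged illustration of the Jinja logic only, not the actual template engine; the function names are mine):

```python
PREFIX = "<|im_assistant|>assistant<|im_middle|>"

def render_original(add_generation_prompt: bool, thinking=None) -> str:
    """Mirrors the original chat_template.jinja generation prompt."""
    if not add_generation_prompt:
        return ""
    out = PREFIX
    if thinking is False:
        out += "<think></think>"   # thinking explicitly disabled
    else:
        out += "<think>"           # forced open tag: sent in the prompt, never echoed back
    return out

def render_fixed(add_generation_prompt: bool, thinking=None) -> str:
    """Mirrors the fixed template: no forced <think>, so the model emits its own."""
    if not add_generation_prompt:
        return ""
    out = PREFIX
    if thinking is False:
        out += "<think></think>"
    return out
```

With the fix, the model generates the opening `<think>` itself, so it appears in the response and the thinking block shows up in clients.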

Then run llama.cpp with the --jinja --chat-template-file chat_template.jinja options.

(Also: there no longer seems to be any need to use the --special option.)

Thanks juk! I've tested this locally and it works on my setup now, much appreciated. The first shard of the quant has been updated with the corrected template.

hi @AesSedai

Kimi-K2.5-Q4_X-00001-of-00014.gguf is 7MB, but the old one was more like 45GB.

Hi @CalvinZero , the old one should have been about 7MB as well. I split the GGUF with --no-tensor-first-split, so the very first shard should be small. I ran into issues with the previous GLM release that required a chat template update, and that meant re-uploading the entire ~50GB first shard just to change that metadata.

Since then I've been using that split flag so the first shard can be modified easily. Because there are no tensors in the first split, it's expected to be under 10MB.
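For reference, this is roughly how such a split might be produced with llama.cpp's llama-gguf-split tool (the paths and size are illustrative assumptions, not the exact command used here; check your build's --help for the available flags):

```shell
# Split one large GGUF into shards, keeping the first shard metadata-only
# so the chat template can later be edited without re-uploading tensor data.
./llama-gguf-split --split \
    --split-max-size 50G \
    --no-tensor-first-split \
    Kimi-K2.5-Q4_X.gguf Kimi-K2.5-Q4_X
```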

Appreciate the fix. Getting about 5.3 tok/s tg so far, mostly on CPU with mmap. Good quant, thank you!

Thanks for the quant! And great idea making the first file small!!
