Fix for thinking block not showing up

#1
by jukofyork - opened

Just in case anybody else has problems with the thinking block not showing up, I posted the fix needed to the jinja template here:

https://huggingface.co/unsloth/Kimi-K2.5-GGUF/discussions/1#697b46fdf48287bb9c2e92dc

I've figured out what is causing this:

https://huggingface.co/moonshotai/Kimi-K2.5/blob/main/chat_template.jinja

{%- if add_generation_prompt -%}
  <|im_assistant|>assistant<|im_middle|>
  {%- if thinking is defined and thinking is false -%}
  <think></think>
  {%- else -%}
  <think>        # <--- This forced <think> tag is part of the prompt, so it never gets returned as part of the chat response.
  {%- endif -%}
{%- endif -%}

To fix this, make a copy of chat_template.jinja and delete the {%- else -%} <think> branch at the bottom, like this:

{%- if add_generation_prompt -%}
  <|im_assistant|>assistant<|im_middle|>
  {%- if thinking is defined and thinking is false -%}
  <think></think>
  {%- endif -%}
{%- endif -%}
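The difference between the two templates can be sketched in plain Python (a hedged illustration of the Jinja logic only, not the actual template engine; the function names are mine):

```python
PREFIX = "<|im_assistant|>assistant<|im_middle|>"

def render_original(add_generation_prompt: bool, thinking=None) -> str:
    """Mirrors the original chat_template.jinja generation prompt."""
    if not add_generation_prompt:
        return ""
    out = PREFIX
    if thinking is False:
        out += "<think></think>"   # thinking explicitly disabled
    else:
        out += "<think>"           # forced open tag: sent in the prompt, never echoed back
    return out

def render_fixed(add_generation_prompt: bool, thinking=None) -> str:
    """Mirrors the fixed template: no forced <think>, so the model emits its own."""
    if not add_generation_prompt:
        return ""
    out = PREFIX
    if thinking is False:
        out += "<think></think>"
    return out
```

With the fix, the model generates the opening `<think>` itself, so it appears in the response and the thinking block shows up in clients.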

Then run llama.cpp with the --jinja --chat-template-file chat_template.jinja options.

(Also: there no longer seems to be any need to use the --special option.)

Thanks juk! I've tested this locally and it works on my setup now, much appreciated. The first shard of the quant has been updated with the corrected template.

hi @AesSedai

Kimi-K2.5-Q4_X-00001-of-00014.gguf is 7MB, but the old one was more like 45GB.

Hi @CalvinZero , the old one should have been about 7MB as well. I split the GGUF with --no-tensor-first-split, so the very first shard should be small. I ran into issues with the previous GLM release that required a chat template update, and that meant re-uploading the entire ~50GB first shard just to change that metadata.

Since then I've been using that split flag so the first shard can be modified easily. Because there are no tensors in the first split, it's expected to be under 10MB.
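For reference, this is roughly how such a split might be produced with llama.cpp's llama-gguf-split tool (the paths and size are illustrative assumptions, not the exact command used here; check your build's --help for the available flags):

```shell
# Split one large GGUF into shards, keeping the first shard metadata-only
# so the chat template can later be edited without re-uploading tensor data.
./llama-gguf-split --split \
    --split-max-size 50G \
    --no-tensor-first-split \
    Kimi-K2.5-Q4_X.gguf Kimi-K2.5-Q4_X
```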

Appreciate the fix. Getting about 5.3 tok/s tg so far, mostly on CPU with mmap. Good quant, thank you!

Thanks for the quant! And great idea making the first file small!!
