fix chat template to avoid empty historical `<think>` blocks

#68

This fixes a chat template issue where historical assistant turns are rendered with an empty <think>...</think> block when reasoning_content is empty.

That matters because these empty historical <think> blocks change the serialized prompt without adding any useful information.

Why this is important:

  • it reduces unnecessary prompt drift
  • it improves prefix-cache reuse
  • it prevents avoidable cache misses
  • it reduces extra token processing caused by equivalent histories rendering differently

In practice, this means less wasted compute and better cache stability, especially in longer multi-turn or tool-using conversations.
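To make the caching effect concrete, here is a rough sketch that measures how much of two renderings of the same assistant turn is shared before they diverge. The helper name `shared_prefix_len` and the sample strings are made up for illustration, and it compares characters, whereas a real prefix cache would compare token IDs. The point is only that anything after the first differing position cannot be reused.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Count the leading characters that two serialized prompts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Two renderings of the same historical turn (illustrative strings only).
with_empty_think = "assistant\n<think>\n\n</think>\n\n<tool_call>..."
without_think = "assistant\n<tool_call>..."

# The renderings diverge right after the role header, so a cache keyed on the
# prompt prefix cannot reuse the tool call or anything that follows it.
reusable = shared_prefix_len(with_empty_think, without_think)
print(f"{reusable} of {len(without_think)} characters are shared")
```

If both renderings were identical, the shared prefix would cover the whole turn and later turns could still hit the cache.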

The change is intentionally minimal:

  • keep the historical <think> wrapper when reasoning_content is actually present
  • do not emit an empty <think> block when there is no reasoning content
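The two rules above can be sketched in plain Python. This is not the template change itself, only an illustration of the guard: the function name `render_historical_assistant`, the exact whitespace around the wrapper, and the choice to treat whitespace-only reasoning as empty are all assumptions made for the example.

```python
from typing import Optional


def render_historical_assistant(content: str,
                                reasoning_content: Optional[str] = None) -> str:
    """Render one prior assistant turn (hypothetical format, for illustration)."""
    parts = ["assistant\n"]
    # Assumption: missing, empty, and whitespace-only reasoning all count as "no reasoning".
    reasoning = (reasoning_content or "").strip()
    if reasoning:
        # Keep the wrapper only when there is real reasoning to show.
        parts.append(f"<think>\n{reasoning}\n</think>\n\n")
    # Otherwise fall through and emit the content or tool call directly.
    parts.append(content)
    return "".join(parts)


print(render_historical_assistant("<tool_call>...", reasoning_content=""))
print(render_historical_assistant("<tool_call>...", reasoning_content="check the weather first"))
```

With the guard in place, a turn with empty reasoning and a turn with no reasoning field at all serialize to the same string.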

Without this guard, the template can produce prior turns like:

assistant
<think>

</think>

<tool_call>...

instead of rendering just the assistant content or tool call directly.

So this change preserves real reasoning content while avoiding empty reasoning scaffolding that can hurt caching behavior.

Edit: Made a video explaining the bug
