fix chat template to avoid empty historical `<think>` blocks
#68
by latent-variable - opened
This fixes a chat template issue where historical assistant turns can emit empty `<think>...</think>` blocks even when `reasoning_content` is empty.
That matters because these empty historical `<think>` blocks change the serialized prompt without adding any useful information.
Why this is important:
- it reduces unnecessary prompt drift
- it improves prefix-cache reuse
- it avoids unnecessary cache misses
- it reduces extra token processing caused by equivalent histories rendering differently
In practice, this means less wasted compute and better cache stability, especially in longer multi-turn or tool-using conversations.
The change is intentionally minimal:
- keep the historical `<think>` wrapper when `reasoning_content` is actually present
- do not emit an empty `<think>` block when there is no reasoning content
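As a rough illustration of the guard, here is a minimal Python sketch of the rendering logic. The helper name `render_assistant_turn`, the message dict layout, and the exact whitespace are assumptions made for illustration only; the actual change lives in the Jinja chat template and may differ in detail.

```python
def render_assistant_turn(message: dict) -> str:
    """Render one historical assistant turn.

    Hypothetical helper for illustration; it is not the real chat template.
    """
    # Treat a missing, None, or whitespace-only value as "no reasoning".
    reasoning = (message.get("reasoning_content") or "").strip()
    content = message.get("content") or ""

    parts = ["assistant\n"]
    # Guard: only emit the <think> wrapper when there is real reasoning text.
    if reasoning:
        parts.append(f"<think>\n{reasoning}\n</think>\n")
    parts.append(content)
    return "".join(parts)


with_reasoning = render_assistant_turn(
    {"reasoning_content": "Need the weather tool.", "content": "<tool_call>..."}
)
without_reasoning = render_assistant_turn(
    {"reasoning_content": "", "content": "<tool_call>..."}
)
```

In this sketch, `with_reasoning` keeps its `<think>` wrapper, while `without_reasoning` renders only the role marker followed by the tool call.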
Without this guard, the template can produce prior turns like:
assistant
<think>
</think>
<tool_call>...
instead of rendering just the assistant content or tool call directly.
So this change preserves real reasoning content while avoiding empty reasoning scaffolding that can hurt caching behavior.
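To make the caching argument concrete, the sketch below measures how many leading characters two renderings of the same logical history share. It assumes, as a simplification, that a prefix cache can only reuse work up to the first point where two prompts diverge. Real caches operate on tokens or blocks rather than characters, and the prompt strings and the `shared_prefix_len` helper are made up for illustration.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Return the number of leading characters that a and b have in common."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


# Hypothetical serialized conversation pieces (not the real template output).
system = "system\nYou are a helpful assistant.\n"
user = "user\nWhat is the weather?\n"
later_turns = "tool\n...\nuser\nThanks!\n"

# The same history rendered without and with an empty <think> block.
clean = system + user + "assistant\n<tool_call>...\n" + later_turns
drifted = system + user + "assistant\n<think>\n</think>\n<tool_call>...\n" + later_turns

reusable = shared_prefix_len(clean, drifted)
wasted = len(clean) - reusable  # characters after the divergence point
```

Under this simplified model, everything after the divergence at the assistant turn, including all later turns, falls outside the shared prefix, even though the two histories carry the same information.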