fix chat template to avoid empty historical `<think>` blocks

#68

by latent-variable - opened 1 day ago


This fixes a chat template issue where historical assistant turns are rendered with an empty <think>...</think> block whenever reasoning_content is empty.

That matters because these empty historical <think> blocks change the serialized prompt without adding any useful information.

Why this is important:

- it reduces unnecessary prompt drift between renders of the same conversation
- it improves prefix-cache reuse
- it avoids cache misses caused only by the empty scaffolding
- it reduces extra token processing caused by equivalent histories rendering differently

In practice, this means less wasted compute and better cache stability, especially in longer multi-turn or tool-using conversations.
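As a rough illustration of the caching effect, the sketch below compares how much of a previously serialized prompt is still shared after re-rendering the same history with and without the empty block. The turn markers, the sample strings, and the `shared_prefix_len` helper are hypothetical, and real prefix caches typically match on tokens or token blocks rather than characters, so treat this as a toy model only.

```python
import os


def shared_prefix_len(a: str, b: str) -> int:
    """Length of the longest common character prefix of two prompts."""
    # os.path.commonprefix compares character by character, so it works on any strings.
    return len(os.path.commonprefix([a, b]))


# Hypothetical serialized history; the real template uses its own turn markers.
user_turn = "user\nWhat is the weather in Paris?\n"
tool_call = "<tool_call>...</tool_call>\n"

# Prompt assumed to be in the cache already: assistant turn without a <think> block.
cached = user_turn + "assistant\n" + tool_call

# Re-render of the same history with an empty <think> block injected.
with_empty_think = user_turn + "assistant\n<think>\n\n</think>\n\n" + tool_call

# Re-render of the same history with the guard in place.
without_empty_think = user_turn + "assistant\n" + tool_call

print(shared_prefix_len(cached, with_empty_think), "of", len(cached), "characters shared")
print(shared_prefix_len(cached, without_empty_think), "of", len(cached), "characters shared")
```

In this toy setup the guarded render matches the cached prompt in full, while the render with the empty block diverges shortly after the start of the assistant turn, so everything after that point would need to be processed again.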

The change is intentionally minimal:

- keep the historical <think> wrapper when reasoning_content is actually present
- do not emit an empty <think> block when there is no reasoning content
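The two rules above can be sketched in Python roughly as follows. This is not the template itself (which is a Jinja chat template); `render_assistant_turn` is a hypothetical helper, the bare `assistant` marker is a placeholder for the real turn delimiters, and treating whitespace-only reasoning as empty is an assumption made here for illustration.

```python
from typing import Optional


def render_assistant_turn(content: str, reasoning_content: Optional[str] = None) -> str:
    """Render one historical assistant turn (hypothetical helper, not the real template)."""
    parts = ["assistant\n"]
    # Assumption: None, "" and whitespace-only reasoning all count as "no reasoning".
    reasoning = (reasoning_content or "").strip()
    if reasoning:
        # Keep the <think> wrapper only when there is real reasoning to show.
        parts.append(f"<think>\n{reasoning}\n</think>\n\n")
    # With no reasoning, the content or tool call follows the turn marker directly.
    parts.append(content)
    return "".join(parts)


print(render_assistant_turn("<tool_call>...", reasoning_content=""))
print(render_assistant_turn("<tool_call>...", reasoning_content="Need the weather tool."))
```

With the guard, a turn whose reasoning is empty renders as the turn marker followed directly by the tool call, while a turn with real reasoning still carries its <think> wrapper.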

Without this guard, the template can produce prior turns like:

assistant
<think>

</think>

<tool_call>...

instead of rendering just the assistant content or tool call directly.

So this change preserves real reasoning content while avoiding empty reasoning scaffolding that can hurt caching behavior.


