endless response
#31
by ramidahbash - opened
I have tried this with the AWQ version.
I deployed it using vLLM 0.10.2 on 4 H100 GPUs, and the response never ends. It looks like the model is having a conversation with itself: the response is a question it asks itself and then answers, in a never-ending loop.
Setting the temperature to 1.0 doesn't help.
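For reference, the requests go through vLLM's OpenAI-compatible endpoint, roughly like the sketch below (the port, model name, and stop strings are placeholders, not the exact values I use); `max_tokens` and `stop` only bound the output, they don't fix the looping itself:

```python
from openai import OpenAI

# vLLM serves an OpenAI-compatible API; the port and model name below are placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="my-awq-model",             # placeholder: whatever name the server registered
    messages=[{"role": "user", "content": "Hello, can you introduce yourself?"}],
    temperature=1.0,
    max_tokens=512,                   # hard cap so a runaway generation still terminates
    stop=["\nUser:", "\nQuestion:"],  # placeholder stop strings for the self-dialogue pattern
)
print(response.choices[0].message.content)
```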
Try using vLLM 0.12.0.
vLLM 0.11.2 doesn't work for me, so why would 0.12.0 help?
This model is supported with vllm>=0.10.2.
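If you do upgrade, you can sanity-check generation outside the server with vLLM's offline API, something like this rough sketch (the model path, prompt, and stop string are placeholders):

```python
from vllm import LLM, SamplingParams

# Rough offline smoke test; the model path and prompt are placeholders.
llm = LLM(
    model="/path/to/awq-model",   # placeholder path to the AWQ checkpoint
    quantization="awq",
    tensor_parallel_size=4,       # matches the 4x H100 setup from the report
)

params = SamplingParams(
    temperature=1.0,
    max_tokens=256,               # bound the output even if the model keeps going
    stop=["\nUser:"],             # placeholder stop string
)

outputs = llm.generate(["Hello, can you introduce yourself?"], params)
print(outputs[0].outputs[0].text)
```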