It can use DFlash directly with z-lab/Qwen3.6-35B-A3B-Dflash

by syvvvv - opened 15 days ago

Discussion

syvvvv

15 days ago

It can use DFlash directly with z-lab/Qwen3.6-35B-A3B-Dflash.

wenhuach

Intel org 15 days ago

Thanks for the information. With MTP, I don't think speculative decoding can achieve a significant speedup, since the MTP module itself is relatively small. Also, if the original model authors were unable to increase num_speculative_tokens further, I doubt a third-party implementation would be able to do so either. Please feel free to correct me if I'm mistaken.
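
As a rough illustration of the num_speculative_tokens point, the sketch below uses the standard speculative-decoding estimate of how many tokens one verification pass of the target model yields. It assumes every drafted token is accepted independently with the same probability `alpha`, which is a simplification. The function names, the `alpha` values, and the relative draft cost are hypothetical and are not measurements of MTP or DFlash.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    alpha: assumed per-token acceptance probability (hypothetical, i.i.d.).
    k:     number of drafted tokens per step (plays the role of
           num_speculative_tokens in this sketch).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        # Every draft is accepted, plus the one token the target adds itself.
        return float(k + 1)
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^k
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, k: int, draft_cost: float) -> float:
    """Rough speedup over plain decoding.

    draft_cost: assumed cost of drafting one token, as a fraction of one
                target-model forward pass (hypothetical value).
    """
    return expected_tokens_per_step(alpha, k) / (1.0 + k * draft_cost)


if __name__ == "__main__":
    # Hypothetical numbers, only to show how the estimate changes with k.
    for k in (1, 2, 4, 8):
        tokens = expected_tokens_per_step(alpha=0.7, k=k)
        speedup = estimated_speedup(alpha=0.7, k=k, draft_cost=0.05)
        print(f"k={k}: tokens/step={tokens:.2f}, speedup={speedup:.2f}")
```

Under this assumption the expected tokens per pass can never exceed 1 / (1 - alpha), so raising the number of drafted tokens gives diminishing returns unless the acceptance rate itself improves.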
