it can use dflash directly with z-lab/Qwen3.6-35B-A3B-Dflash
#2
by syvvvv - opened
It can use DFlash directly with z-lab/Qwen3.6-35B-A3B-Dflash.
Thanks for the information. I don't think speculative decoding with MTP can achieve a significant speedup, since the MTP module itself is relatively small. Also, if the original model authors were unable to increase num_speculative_tokens further, I doubt a third-party implementation could do so either. Please feel free to correct me if I'm mistaken.
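To make the num_speculative_tokens point a bit more concrete, here is a rough sketch of why a longer draft gives diminishing returns. It assumes the usual idealization that each drafted token is accepted independently with the same probability, which is a simplification and not something measured on this model. Under that assumption, the expected number of tokens emitted per verification step is (1 - a^(k+1)) / (1 - a), where a is the acceptance rate and k is num_speculative_tokens. The function name and the 0.7 acceptance rate below are made up for illustration.

```python
def expected_tokens_per_step(acceptance_rate: float, num_speculative_tokens: int) -> float:
    """Expected tokens emitted per target-model verification step.

    Assumes every drafted token is accepted independently with the same
    probability (an idealization; real acceptance rates vary by position).
    """
    a, k = acceptance_rate, num_speculative_tokens
    if not 0.0 <= a <= 1.0:
        raise ValueError("acceptance_rate must be in [0, 1]")
    if k < 0:
        raise ValueError("num_speculative_tokens must be non-negative")
    if a == 1.0:
        # Every draft token is accepted, plus the one token the target adds.
        return float(k + 1)
    # Geometric series: 1 + a + a^2 + ... + a^k
    return (1.0 - a ** (k + 1)) / (1.0 - a)


if __name__ == "__main__":
    # Hypothetical acceptance rate, chosen only to show the trend.
    for k in (1, 2, 4, 8):
        print(k, round(expected_tokens_per_step(0.7, k), 3))
```

The value can never exceed 1 / (1 - a) no matter how large k gets, so past a few draft tokens the extra drafting cost buys very little unless the acceptance rate itself improves.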