I recently bought a Framework Desktop intending to run Qwen 3.5 as the model behind my AI assistant on Nanobot. At first I could not run the model on this hardware because of a weird bug in one of the libraries; I explained the cause and the fix here. In this post I will just list the packages I ended up using to run Qwen, along with the vLLM command and its switches and parameters.
Here is the list of packages that I used to finally get it working:
- vllm 0.17.1+rocm700
- amd-aiter 0.1.10.post2
- torch 2.9.1+git8907517
- triton 3.4.0
- rocm 7.2.0.70200-43~24.04
And here is the script that I am using:

```shell
TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 \
VLLM_ROCM_USE_AITER=1 \
vllm serve \
  cyankiwi/Qwen3.5-35B-A3B-AWQ-4bit \
  --host 0.0.0.0 \
  --port 8000 \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --dtype float16 \
  --max-model-len 128k \
  --gpu-memory-utilization 0.33
```
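Once the server is up, vLLM exposes an OpenAI-compatible API on the host and port given above, so a quick smoke test from another terminal might look like the following (a sketch, not from the original setup; the `model` field must match the repo id passed to `vllm serve`, and the prompt is just an example):

```shell
# Send a minimal chat-completion request to the local vLLM server
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "cyankiwi/Qwen3.5-35B-A3B-AWQ-4bit",
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 64
      }'
```

If the server is healthy, this returns a JSON body with a `choices` array containing the model's reply.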
Happy hacking!

