Running a 35B LLM on an RTX 3050 8GB

Local models really are getting more and more impressive…

Running a 35B (IQ2_M) model on an RTX 3050 8GB,

it still manages a speed of 30 t/s (tokens per second).

.\llama-server.exe -m models\Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-IQ2_M.gguf --mmproj models\mmproj-Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-f16.gguf -c 16384 -np 1 -t 6 --flash-attn on --image-min-tokens 1024 --no-mmap
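For a rough sense of why this is notable, here is a back-of-the-envelope size estimate in Python. It is only a sketch under stated assumptions: the 35B total and 3B active parameter counts are read off the "35B-A3B" part of the model name, and 2.7 bits per weight is the nominal figure for IQ2_M (real GGUF files mix quantization types, so the actual file size will differ). The function name is made up for illustration.

```python
def estimate_weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GiB (weights only, no KV cache)."""
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / 2**30


# Assumptions, not measurements: parameter counts taken from the model name,
# 2.7 bpw as the nominal IQ2_M rate.
IQ2_M_BPW = 2.7
total_gib = estimate_weights_gib(35e9, IQ2_M_BPW)   # all experts
active_gib = estimate_weights_gib(3e9, IQ2_M_BPW)   # weights touched per token

print(f"all weights:    ~{total_gib:.1f} GiB")
print(f"active / token: ~{active_gib:.2f} GiB")
```

Under these assumptions the full weights come to roughly 11 GiB, more than the card's 8 GB of VRAM, so part of the model presumably sits in system RAM. Only about 1 GiB of weights would be read per generated token, which is a plausible reason the speed stays usable.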
