LLM Performance Leaderboard
Model performance benchmarks run on real DGX Spark hardware
Workload: input 1024, output 128, concurrent requests
Showing 23 of 41 entries
| Model | Runtime | Total tok/s | Output tok/s | Per User tok/s | TTFT Avg (ms) | TTFT P99 (ms) | TPOT Avg (ms) | TPOT P99 (ms) |
|---|---|---|---|---|---|---|---|---|
| Qwen3.6-27B | nv_vllm_26_03 26.03-py3 | 244.96 | 28.32 | 3.54 | 4,203.96 | 5,904.29 | 248.43 | 263.89 |
| Qwen3.6-27B-AWQ-INT4 | vllm_awq- | 446.40 | 51.60 | 6.45 | 6,066.27 | 8,453.90 | 107.74 | 136.89 |
| Qwen3.6-27B-FP8 | nv_vllm_26_03 26.03-py3 | 409.67 | 47.37 | 5.92 | 2,901.23 | 4,052.95 | 145.36 | 157.99 |
| Qwen3.6-27B-FP8 (dual-node parallel) | nv_vllm_26_03_raytp2 26.03-py3 | 665.80 | 76.98 | 9.62 | 1,779.37 | 2,769.37 | 89.84 | 97.19 |
| Qwen3.6-35B-A3B | nv_vllm_26_03 26.03-py3 | 238.96 | 27.63 | 3.46 | 16,520.76 | 32,734.38 | 33.83 | 34.23 |
| Qwen3.6-35B-A3B-FP8 | nv_vllm_26_03 26.03-py3 | 1,407.61 | 162.76 | 20.34 | 1,004.83 | 1,395.39 | 41.29 | 44.64 |
| Qwen3.6-35B-A3B-NVFP4 | nv_vllm_26_03 26.03-py3 | 1,504.51 | 173.96 | 21.74 | 869.22 | 1,241.96 | 39.02 | 42.74 |
| Qwen3.6-35B-A3B-NVFP4 (NV) | vllm_nightly nightly (0.22.1rc1) | 1,510.80 | 174.69 | 21.84 | 1,064.54 | 1,442.21 | 37.47 | 41.32 |
| Qwen3.5-0.8B | nv_vllm_26_03 26.03-py3 | 5,599.30 | 647.43 | 80.93 | 220.56 | 313.28 | 10.61 | 11.46 |
| Qwen3.5-122B-A10B-NVFP4 | nv_vllm_26_03 26.03-py3 | 430.19 | 49.75 | 6.22 | 6,398.16 | 12,091.33 | 70.42 | 78.92 |
| Qwen3.5-27B | nv_vllm_26_03 26.03-py3 | 242.25 | 28.01 | 3.50 | 4,349.80 | 6,046.66 | 250.89 | 264.55 |
| Qwen3.5-27B-FP8 | nv_vllm_26_03 26.03-py3 | 404.03 | 46.72 | 5.84 | 2,965.73 | 4,143.11 | 147.16 | 160.01 |
| Qwen3.5-2B | nv_vllm_26_03 26.03-py3 | 2,883.74 | 333.44 | 41.68 | 357.18 | 525.02 | 21.09 | 22.53 |
| Qwen3.5-35B-A3B | nv_vllm_26_03 26.03-py3 | 916.63 | 105.98 | 13.25 | 1,442.78 | 2,082.57 | 64.16 | 69.80 |
| Qwen3.5-35B-A3B-FP8 | nv_vllm_26_03 26.03-py3 | 1,451.05 | 167.78 | 20.98 | 994.98 | 1,384.13 | 39.89 | 43.22 |
| Qwen3.5-4B | nv_vllm_26_03 26.03-py3 | 1,319.12 | 152.53 | 19.06 | 757.23 | 1,188.72 | 46.22 | 50.07 |
| Qwen3.5-9B | nv_vllm_26_03 26.03-py3 | 764.53 | 88.40 | 11.05 | 1,202.78 | 1,870.22 | 80.87 | 85.48 |
| Qwen3-32B-NVFP4 | nv_vllm_26_03 26.03-py3 | 753.74 | 87.23 | 10.90 | 214.96 | 250.68 | 90.47 | 90.49 |
| Qwen3-Next-80B-A3B-Thinking-FP8 | nv_vllm_26_03 26.03-py3 | 413.81 | 47.98 | 5.99 | 8,489.40 | 16,453.49 | 38.08 | 38.59 |
| DeepSeek-V4-Flash (dual-node parallel) | vllm_ds4_sm120_ray_tp2 latest | 481.76 | 56.11 | 7.01 | 7,720.81 | 8,367.23 | 92.52 | 126.43 |
| Gemma 4 E2B | vllm_gemma4_cu130 gemma4-cu130 | 1,339.75 | 155.62 | 19.45 | 1,704.62 | 3,353.61 | 25.41 | 25.43 |
| Gemma 4 E2B-it | vllm_gemma4_cu130 gemma4-cu130 | 1,334.37 | 154.57 | 19.32 | 1,719.58 | 3,381.83 | 25.52 | 25.55 |
| Gemma 4 E4B-it | vllm_gemma4_cu130 gemma4-cu130 | 477.81 | 55.35 | 6.92 | 5,524.23 | 12,487.85 | 47.83 | 48.06 |
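
The latency columns in the table above use TTFT (time to first token) and TPOT (time per output token). The page does not say exactly how they are computed, so the following is only a sketch using the definitions commonly applied in serving benchmarks: TTFT is the delay from sending a request to receiving its first token, and TPOT is the remaining generation time divided by the number of tokens after the first. The `RequestTiming` record, the function names, and the nearest-rank P99 are all illustrative assumptions, not the leaderboard's actual script.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Hypothetical per-request record: timestamps in seconds plus output length."""
    start_s: float        # when the request was sent
    first_token_s: float  # when the first output token arrived
    end_s: float          # when the last output token arrived
    output_tokens: int    # number of generated tokens


def ttft_ms(r: RequestTiming) -> float:
    """Time to first token, in milliseconds."""
    return (r.first_token_s - r.start_s) * 1000.0


def tpot_ms(r: RequestTiming) -> float:
    """Time per output token after the first, in milliseconds."""
    decode_tokens = r.output_tokens - 1
    if decode_tokens <= 0:
        return 0.0  # a single-token response has no decode phase to average
    return (r.end_s - r.first_token_s) * 1000.0 / decode_tokens


def p99(values: list[float]) -> float:
    """Nearest-rank 99th percentile (an assumed convention)."""
    if not values:
        raise ValueError("p99 needs at least one value")
    ordered = sorted(values)
    rank = math.ceil(0.99 * len(ordered))
    return ordered[rank - 1]


def summarize(requests: list[RequestTiming]) -> dict[str, float]:
    """Aggregate one load-test round into the four latency columns."""
    ttfts = [ttft_ms(r) for r in requests]
    tpots = [tpot_ms(r) for r in requests]
    return {
        "ttft_avg_ms": sum(ttfts) / len(ttfts),
        "ttft_p99_ms": p99(ttfts),
        "tpot_avg_ms": sum(tpots) / len(tpots),
        "tpot_p99_ms": p99(tpots),
    }
```

With only a handful of requests per round, a nearest-rank P99 is effectively the slowest request, which is worth keeping in mind when comparing the Avg and P99 columns.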
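Test Methodology and Data Notes
- Data source: a server-side load-test pipeline on the DGX Spark host; a fixed script runs warm-up and then a concurrent load test for each model in turn.
- Performance metrics: throughput, TTFT, and TPOT are all taken from the steady-state summary (the mean over the stable rounds after warm-up) and reflect end-to-end serving performance.
- Sort order: by default, rows are grouped by version number in descending order; clicking a column header toggles that column between descending and ascending, and clicking again restores the default order.
- Steady-state aggregation: the standard procedure runs a cold start, a first warm-up round, and a second warm-up round in sequence, three requests in total. Core metrics are the mean of the two warm-up rounds; excluding the cold start removes its bias and ensures every model is measured in the same warmed-up state.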
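
The steady-state aggregation described in the last note can be sketched as follows. This is a minimal illustration under stated assumptions: each round is represented as a dictionary of metric names to values, the first entry is the cold start, and the two that follow are the warm-up rounds. The function name, the metric keys, and the sample numbers are hypothetical and are not taken from the leaderboard's script or data.

```python
from statistics import mean


def steady_state(rounds: list[dict[str, float]]) -> dict[str, float]:
    """Average each metric over the warm-up rounds, discarding the cold start.

    rounds[0] is assumed to be the cold start; rounds[1:] are warm-up rounds.
    """
    if len(rounds) < 2:
        raise ValueError("need a cold start plus at least one warm-up round")
    warm = rounds[1:]  # drop the cold-start round
    return {name: mean(r[name] for r in warm) for name in warm[0]}


# Hypothetical numbers, for illustration only.
rounds = [
    {"ttft_avg_ms": 900.0, "tpot_avg_ms": 30.0},  # cold start (excluded)
    {"ttft_avg_ms": 200.0, "tpot_avg_ms": 24.0},  # warm-up round 1
    {"ttft_avg_ms": 220.0, "tpot_avg_ms": 26.0},  # warm-up round 2
]
summary = steady_state(rounds)
```

Averaging only the warm rounds keeps one-time costs, such as anything that happens only on the first request, out of the reported figures, at the price of hiding how slow that first request actually was.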