Leaderboard

LLM Performance Leaderboard

Model performance tests run on real DGX Spark hardware

Workload: input 1024, output 128, concurrency (value not shown)
23 / 41
| Model | Runtime / image tag | Total tok/s | Output tok/s | Per User tok/s | TTFT Avg (ms) | TTFT P99 (ms) | TPOT Avg (ms) | TPOT P99 (ms) |
|---|---|---|---|---|---|---|---|---|
| Qwen3.6-27B | nv_vllm_26_03 / 26.03-py3 | 244.96 | 28.32 | 3.54 | 4,203.96 | 5,904.29 | 248.43 | 263.89 |
| Qwen3.6-27B-AWQ-INT4 | vllm_awq / - | 446.40 | 51.60 | 6.45 | 6,066.27 | 8,453.90 | 107.74 | 136.89 |
| Qwen3.6-27B-FP8 | nv_vllm_26_03 / 26.03-py3 | 409.67 | 47.37 | 5.92 | 2,901.23 | 4,052.95 | 145.36 | 157.99 |
| Qwen3.6-27B-FP8 (two-node parallel) | nv_vllm_26_03_raytp2 / 26.03-py3 | 665.80 | 76.98 | 9.62 | 1,779.37 | 2,769.37 | 89.84 | 97.19 |
| Qwen3.6-35B-A3B | nv_vllm_26_03 / 26.03-py3 | 238.96 | 27.63 | 3.46 | 16,520.76 | 32,734.38 | 33.83 | 34.23 |
| Qwen3.6-35B-A3B-FP8 | nv_vllm_26_03 / 26.03-py3 | 1,407.61 | 162.76 | 20.34 | 1,004.83 | 1,395.39 | 41.29 | 44.64 |
| Qwen3.6-35B-A3B-NVFP4 | nv_vllm_26_03 / 26.03-py3 | 1,504.51 | 173.96 | 21.74 | 869.22 | 1,241.96 | 39.02 | 42.74 |
| Qwen3.6-35B-A3B-NVFP4 (NV) | vllm_nightly / nightly (0.22.1rc1) | 1,510.80 | 174.69 | 21.84 | 1,064.54 | 1,442.21 | 37.47 | 41.32 |
| Qwen3.5-0.8B | nv_vllm_26_03 / 26.03-py3 | 5,599.30 | 647.43 | 80.93 | 220.56 | 313.28 | 10.61 | 11.46 |
| Qwen3.5-122B-A10B-NVFP4 | nv_vllm_26_03 / 26.03-py3 | 430.19 | 49.75 | 6.22 | 6,398.16 | 12,091.33 | 70.42 | 78.92 |
| Qwen3.5-27B | nv_vllm_26_03 / 26.03-py3 | 242.25 | 28.01 | 3.50 | 4,349.80 | 6,046.66 | 250.89 | 264.55 |
| Qwen3.5-27B-FP8 | nv_vllm_26_03 / 26.03-py3 | 404.03 | 46.72 | 5.84 | 2,965.73 | 4,143.11 | 147.16 | 160.01 |
| Qwen3.5-2B | nv_vllm_26_03 / 26.03-py3 | 2,883.74 | 333.44 | 41.68 | 357.18 | 525.02 | 21.09 | 22.53 |
| Qwen3.5-35B-A3B | nv_vllm_26_03 / 26.03-py3 | 916.63 | 105.98 | 13.25 | 1,442.78 | 2,082.57 | 64.16 | 69.80 |
| Qwen3.5-35B-A3B-FP8 | nv_vllm_26_03 / 26.03-py3 | 1,451.05 | 167.78 | 20.98 | 994.98 | 1,384.13 | 39.89 | 43.22 |
| Qwen3.5-4B | nv_vllm_26_03 / 26.03-py3 | 1,319.12 | 152.53 | 19.06 | 757.23 | 1,188.72 | 46.22 | 50.07 |
| Qwen3.5-9B | nv_vllm_26_03 / 26.03-py3 | 764.53 | 88.40 | 11.05 | 1,202.78 | 1,870.22 | 80.87 | 85.48 |
| Qwen3-32B-NVFP4 | nv_vllm_26_03 / 26.03-py3 | 753.74 | 87.23 | 10.90 | 214.96 | 250.68 | 90.47 | 90.49 |
| Qwen3-Next-80B-A3B-Thinking-FP8 | nv_vllm_26_03 / 26.03-py3 | 413.81 | 47.98 | 5.99 | 8,489.40 | 16,453.49 | 38.08 | 38.59 |
| DeepSeek-V4-Flash (two-node parallel) | vllm_ds4_sm120_ray_tp2 / latest | 481.76 | 56.11 | 7.01 | 7,720.81 | 8,367.23 | 92.52 | 126.43 |
| Gemma 4 E2B | vllm_gemma4_cu130 / gemma4-cu130 | 1,339.75 | 155.62 | 19.45 | 1,704.62 | 3,353.61 | 25.41 | 25.43 |
| Gemma 4 E2B-it | vllm_gemma4_cu130 / gemma4-cu130 | 1,334.37 | 154.57 | 19.32 | 1,719.58 | 3,381.83 | 25.52 | 25.55 |
| Gemma 4 E4B-it | vllm_gemma4_cu130 / gemma4-cu130 | 477.81 | 55.35 | 6.92 | 5,524.23 | 12,487.85 | 47.83 | 48.06 |

Test methodology and data notes

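  1. Data source: a server-side load-test pipeline on the DGX Spark host. A fixed script runs warm-up and concurrent load tests on each model in turn.
  2. Performance metrics: throughput, TTFT, and TPOT are all taken from the steady-state summary (the average over the stable rounds after warm-up) and reflect end-to-end serving performance.
  3. Sort order: by default, rows are grouped by version number in descending order. Clicking a column header toggles that column between descending and ascending order; clicking again returns to the default order.
  4. Steady-state aggregation: the standard procedure issues three requests in sequence: a cold start, warm-up round one, and warm-up round two. Core metrics are the average of warm-up rounds one and two, which excludes cold-start bias and ensures every model is measured in the same warmed-up state.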
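
The steady-state aggregation in item 4 can be sketched as follows. This is a minimal illustration, not the actual benchmark script: the record layout, the phase labels (`cold_start`, `warmup_1`, `warmup_2`), the field names, and all numbers are hypothetical and chosen only to show the cold start being dropped and the two warm-up rounds being averaged.

```python
from statistics import mean

# Hypothetical per-run records for one model. Field names, phase labels,
# and values are illustrative assumptions, not taken from the leaderboard.
runs = [
    {"phase": "cold_start", "output_tok_s": 20.0, "ttft_avg_ms": 9000.0, "tpot_avg_ms": 60.0},
    {"phase": "warmup_1", "output_tok_s": 40.0, "ttft_avg_ms": 1000.0, "tpot_avg_ms": 50.0},
    {"phase": "warmup_2", "output_tok_s": 44.0, "ttft_avg_ms": 1200.0, "tpot_avg_ms": 52.0},
]


def steady_state(runs, metrics):
    """Average each metric over the warm-up rounds, excluding the cold start."""
    warm = [r for r in runs if r["phase"] != "cold_start"]
    if len(warm) != 2:
        # The described procedure has exactly two warm-up rounds.
        raise ValueError(f"expected 2 warm-up rounds, got {len(warm)}")
    return {m: mean(r[m] for r in warm) for m in metrics}


summary = steady_state(runs, ["output_tok_s", "ttft_avg_ms", "tpot_avg_ms"])
print(summary)
```

With these made-up inputs, the slow cold-start round (9000 ms TTFT) has no influence on the reported figures, which is the stated purpose of the aggregation.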