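AI BENCHY Categories
General Knowledge Q&A Rankings
See which AI models perform best on General Knowledge Q&A, which are more consistent, and where the gaps mainly appear. Sorted by: total cost ↑ (ascending).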
| Rank | Model | Company | General Knowledge Q&A score | Score | Total cost | Tests correct | Response time (avg) |
|---|---|---|---|---|---|---|---|
| #106 | Qwen3.5 Plus 2026-02-15 none | Qwen | 3.0 | 5.8 | $0.016 | 0/1 | 1.11s |
| #119 | MiMo-V2.5-Pro none | Xiaomi | 3.0 | 5.5 | $0.017 | 0/1 | 1.89s |
| #126 | DeepSeek V3.2 none | DeepSeek | 3.0 | 5.3 | $0.017 | 0/1 | 17.2s |
| #84 | Gemini 3.1 Flash Lite Preview none | Google | 3.0 | 6.4 | $0.018 | 0/1 | 814ms |
| #86 | Hy3 preview low | Tencent | 3.0 | 6.4 | $0.018 | 0/1 | 41.7s |
| #92 | Seed-2.0-Lite none | Bytedance Seed | 3.0 | 6.2 | $0.019 | 0/1 | 1.96s |
| #125 | Qwen3.5-122B-A10B none | Qwen | 3.0 | 5.3 | $0.020 | 0/1 | 295ms |
| #168 | Step 3.5 Flash none | Stepfun | 3.0 | 2.6 | $0.020 | 0/1 | 114.1s |
| #87 | Nemotron 3 Super medium | NVIDIA | 3.0 | 6.3 | $0.021 | 0/1 | 55.3s |
| #114 | Mimo V2 Omni none | Xiaomi | 3.0 | 5.7 | $0.021 | 0/1 | 1.30s |
| #54 | Hy3 preview medium | Tencent | 3.0 | 7.3 | $0.021 | 0/1 | 39.9s |
| #60 | Qwen3.7 Plus none | Qwen | 3.0 | 7.2 | $0.023 | 0/1 | 1.21s |
| #67 | Gemini 3 Flash Preview none | Google | 3.0 | 6.9 | $0.025 | 0/1 | 1.07s |
| #159 | MiMo-V2-Flash none | Xiaomi | 3.0 | 4.3 | $0.025 | 0/1 | 1.82s |
| #82 | Gemini 3.1 Flash Lite Preview low | Google | 3.0 | 6.5 | $0.026 | 0/1 | 1.35s |