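AI BENCHY Categories
General Knowledge Q&A Rankings
See which AI models perform best on General Knowledge Q&A, which are more consistent, and where the gaps mainly appear. Sorted by: total cost (descending).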
| Rank | Model | Company | General Knowledge Q&A Score | Overall Score | Total Cost | Tests Passed | Avg. Response Time |
|---|---|---|---|---|---|---|---|
| #77 | Mimo V2 PRO medium | Xiaomi | 3.0 | 6.7 | $0.333 | 0/1 | 82.7s |
| #14 | GLM 5.2 medium | Z.ai | 3.0 | 8.7 | $0.324 | 0/1 | 34.2s |
| #21 | GLM 5 Turbo medium | Z.ai | 3.0 | 8.4 | $0.323 | 0/1 | 40.2s |
| #33 | Qwen3.5 Plus 2026-04-20 medium | Qwen | 3.0 | 7.8 | $0.317 | 0/1 | 92.6s |
| #55 | Claude Sonnet 4.6 none | Anthropic | 3.0 | 7.3 | $0.316 | 0/1 | 4.67s |
| #28 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 3.0 | 8.0 | $0.310 | 0/1 | 103.8s |
| #146 | MiniMax M2.5 medium | MiniMax | 3.0 | 4.7 | $0.303 | 0/1 | 80.8s |
| #30 | Qwen3.6 Plus medium | Qwen | 3.0 | 7.8 | $0.294 | 0/1 | 47.5s |
| #64 | GLM 5.1 medium | Z.ai | 3.0 | 7.1 | $0.292 | 0/1 | 29.4s |
| #47 | Qwen3.6 Flash medium | Qwen | 3.0 | 7.5 | $0.288 | 0/1 | 122.9s |
| #90 | GPT-5.5 none | OpenAI | 3.0 | 6.3 | $0.231 | 0/1 | 5.01s |
| #15 | GLM 5 medium | Z.ai | 3.0 | 8.6 | $0.228 | 0/1 | 67.4s |
| #25 | Qwen3.7 Plus medium | Qwen | 3.0 | 8.2 | $0.177 | 0/1 | 91.1s |
| #18 | Seed-2.0-Lite medium | ByteDance Seed | 3.0 | 8.5 | $0.175 | 0/1 | 48.3s |
| #16 | GPT-5 Mini medium | OpenAI | 3.0 | 8.5 | $0.159 | 0/1 | 9.99s |