AI BENCHY Category Failures
Domain-Specific: Wrong Answer
See which AI models most often give wrong answers in the Domain-Specific category, so you can spot weak points faster.
| Rank | Model | Company | Wrong answers | Category score | Tests passed | Avg. response time |
|---|---|---|---|---|---|---|
| #139 | DeepSeek V4 Flash none | DeepSeek | 2 | 5.3 | 1/3 | 19.7s |
| #140 | Qwen3 Coder Next none | Qwen | 2 | 5.3 | 1/3 | 962ms |
| #142 | Mistral Small 4 none | Mistral | 2 | 5.3 | 1/3 | 367ms |
| #146 | Laguna Xs.2 none | Poolside | 2 | 5.3 | 1/3 | 371ms |
| #149 | Nemotron 3 Nano Omni 30b A3b Reasoning medium | NVIDIA | 2 | 2.9 | 0/3 | 56.7s |
| #150 | Qwen3 Coder Next medium | Qwen | 2 | 5.3 | 1/3 | 638ms |
| #151 | Trinity Large Preview none | Arcee AI | 2 | 5.3 | 1/3 | 877ms |
| #152 | MiMo-V2-Flash none | Xiaomi | 2 | 5.3 | 1/3 | 564ms |
| #155 | Mercury 2 none | Inception | 2 | 5.3 | 1/3 | 534ms |
| #156 | Hy3 preview none | Tencent | 2 | 3.6 | 0/3 | 17.6s |
| #157 | Grok 4.1 Fast none | X AI | 2 | 5.9 | 1/3 | 1.06s |
| #158 | GLM 4.7 Flash medium | Z.ai | 2 | 3.5 | 0/3 | 174.6s |
| #2 | Gemini 3.5 Flash high | | 1 | 7.6 | 2/3 | 14.1s |
| #3 | Gemini 3.5 Flash low | | 1 | 7.7 | 2/3 | 3.39s |
| #4 | Gemini 3.1 Pro Preview medium | | 1 | 7.7 | 2/3 | 32.7s |