AI BENCHY category failures
General knowledge: Incorrect answer
See which AI models are most likely to give an Incorrect answer in General knowledge, so you can spot weaknesses quickly.
Failure reasons
133/133
| Rank | Model | Company | Incorrect answer count | Category score | Total cost | Correct attempts | Response time (average) |
|---|---|---|---|---|---|---|---|
| #59 | Gemma 4 26B A4B medium | | 1 | 3.0 | $0.045 | 0/1 | 180.9s |
| #60 | Qwen3.7 Plus none | Qwen | 1 | 3.0 | $0.023 | 0/1 | 1.21s |
| #61 | GLM 5.2 none | Z.ai | 1 | 3.0 | $0.076 | 0/1 | 3.41s |
| #62 | MiMo-V2-Flash medium | Xiaomi | 1 | 3.0 | $0.043 | 0/1 | 1.96s |
| #64 | GLM 5.1 medium | Z.ai | 1 | 3.0 | $0.292 | 0/1 | 29.4s |
| #65 | Kimi K2.7 Code medium | Moonshot AI | 1 | 3.0 | $0.583 | 0/1 | 341.8s |
| #66 | Gemini 3.5 Flash none | | 1 | 2.8 | $1.079 | 0/1 | 4.87s |
| #67 | Gemini 3 Flash Preview none | | 1 | 3.0 | $0.025 | 0/1 | 1.07s |
| #68 | Qwen3.7 Max none | Qwen | 1 | 3.0 | $0.054 | 0/1 | 856ms |
| #70 | Qwen3.5-Flash medium | Qwen | 1 | 3.0 | $0.080 | 0/1 | 49.0s |
| #71 | Gemini 3.5 Flash minimal | | 1 | 3.0 | $0.108 | 0/1 | 1.76s |
| #72 | Ring-2.6-1T medium | Inclusionai | 1 | 3.0 | $0.033 | 0/1 | 113.9s |
| #73 | Mimo V2 Omni medium | Xiaomi | 1 | 3.0 | $0.683 | 0/1 | 234.2s |
| #74 | Hy3 preview high | Tencent | 1 | 3.0 | $0.059 | 0/1 | 47.7s |
| #75 | Qwen3.6 35B A3B medium | Qwen | 1 | 3.0 | $0.146 | 0/1 | 32.9s |