AI BENCHY Failures
Wrong Answer Failures
See which AI models return wrong answers most often, so you can spot reliability risks before choosing one.
| Rank | Model | Company | Wrong Answer Count | Score | Tests Passed | Response Time (avg) |
|---|---|---|---|---|---|---|
| #133 | Mercury 2 none | Inception | 14 | 4.7 | 4/19 | 610ms |
| #137 | GPT-5.4 Nano none | OpenAI | 14 | 4.5 | 3/19 | 1.36s |
| #114 | Kimi K2.5 none | Moonshot AI | 13 | 5.4 | 6/19 | 12.6s |
| #126 | Mistral Small 4 none | Mistral | 13 | 5.1 | 5/19 | 651ms |
| #129 | GPT-4o-mini none | OpenAI | 13 | 4.9 | 5/19 | 1.90s |
| #139 | MiMo-V2-Flash none | Xiaomi | 13 | 4.5 | 3/19 | 2.73s |
| #141 | Grok 4.1 Fast none | xAI | 13 | 4.4 | 3/19 | 1.67s |
| #123 | Qwen3 Coder Next none | Qwen | 12 | 5.2 | 5/19 | 9.44s |
| #124 | Nemotron 3 Super none | NVIDIA | 12 | 5.2 | 5/19 | 5.80s |
| #130 | MiMo-V2.5 none | Xiaomi | 12 | 4.9 | 4/19 | 2.02s |
| #132 | Trinity Large Preview none | Arcee AI | 12 | 4.8 | 4/19 | 3.03s |
| #134 | Qwen3.5-9B none | Qwen | 12 | 4.7 | 4/19 | 1.51s |
| #140 | Ling-2.6-1T none | InclusionAI | 12 | 4.5 | 4/19 | 8.79s |
| #144 | Granite 4.1 8B none | IBM Granite | 12 | 4.1 | 2/19 | 743ms |
| #88 | Seed-2.0-Lite none | Bytedance Seed | 11 | 6.0 | 8/19 | 2.50s |