AI BENCHY Failures
Wrong Answer Failures
See which AI models give wrong answers most often, so you can spot reliability risks before choosing one.
| Rank | Model | Company | Wrong answers | Score | Tests passed | Response time (avg) |
|---|---|---|---|---|---|---|
| #12 | Gemini 3 Flash Preview low | | 4 | 8.6 | 16/20 | 5.86s |
| #15 | GPT-5.3-Codex medium | OpenAI | 4 | 8.3 | 14/20 | 16.0s |
| #17 | Grok 4.20 Beta medium | X AI | 4 | 8.2 | 13/18 | 9.81s |
| #20 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 4 | 8.1 | 14/20 | 67.9s |
| #24 | Gemini 3.5 Flash minimal | | 4 | 7.9 | 14/20 | 1.58s |
| #28 | GLM 5 Turbo medium | Z.ai | 4 | 7.9 | 13/20 | 22.7s |
| #29 | Hy3 preview medium | Tencent | 4 | 7.8 | 14/20 | 16.0s |
| #30 | Qwen3.6 35B A3B medium | Qwen | 4 | 7.8 | 14/20 | 17.3s |
| #31 | Grok 4.3 medium | X AI | 4 | 7.8 | 13/20 | 49.2s |
| #37 | Hy3 preview low | Tencent | 4 | 7.7 | 15/20 | 24.6s |
| #45 | Grok Build 0.1 medium | X AI | 4 | 7.6 | 12/20 | 26.4s |
| #47 | Gemma 4 26B A4B medium | | 4 | 7.5 | 13/20 | 51.4s |
| #51 | GLM 5.1 medium | Z.ai | 4 | 7.4 | 12/20 | 32.2s |
| #53 | MiMo-V2.5 medium | Xiaomi | 4 | 7.4 | 12/20 | 20.4s |
| #56 | Qwen3.5-Flash medium | Qwen | 4 | 7.4 | 11/20 | 65.6s |