AI BENCHY category failures
Puzzle solving: Wrong answer
See which AI models are most likely to give a wrong answer in puzzle solving, so you can spot weaknesses quickly.
Failure reasons
| Rank | Model | Company | Wrong answers (count) | Category score | Correct attempts | Response time (average) |
|---|---|---|---|---|---|---|
| #156 | Hy3 preview none | Tencent | 2 | 3.1 | 0/3 | 4.56s |
| #158 | GLM 4.7 Flash medium | Z.ai | 2 | 2.9 | 0/3 | 12.9s |
| #159 | Ling-2.6-1T none | Inclusionai | 2 | 3.1 | 0/3 | 5.36s |
| #160 | LFM2-24B-A2B none | Liquid | 2 | 3.8 | 0/3 | 1.78s |
| #163 | Granite 4.1 8B none | IBM Granite | 2 | 3.2 | 0/3 | 608ms |
| #7 | Gemini 3.5 Flash medium | Google | 1 | 7.7 | 2/3 | 2.38s |
| #24 | GPT-5.2 Chat none | OpenAI | 1 | 7.7 | 2/3 | 4.10s |
| #28 | Gemini 2.5 Flash medium | Google | 1 | 7.7 | 2/3 | 3.18s |
| #36 | Qwen3.5 Plus 2026-04-20 medium | Qwen | 1 | 8.2 | 2/3 | 17.7s |
| #38 | Grok 4.3 medium | X AI | 1 | 5.9 | 1/3 | 22.5s |
| #40 | Gemini 3.1 Flash Lite Preview medium | Google | 1 | 7.7 | 2/3 | 5.30s |
| #43 | MiMo-V2.5-Pro medium | Xiaomi | 1 | 6.7 | 1/3 | 5.31s |
| #44 | Gemini 3.1 Flash Lite medium | Google | 1 | 7.6 | 2/3 | 1.95s |
| #46 | Qwen3.6 35B A3B medium | Qwen | 1 | 8.0 | 2/3 | 5.95s |
| #47 | Grok Build 0.1 medium | X AI | 1 | 7.7 | 2/3 | 18.3s |