AI BENCHY Category Failures
Puzzle Solving: Wrong Answer
See which AI models are most likely to give a wrong answer on Puzzle Solving, so you can spot weak points faster.
Failure reasons
| Rank | Model | Company | Wrong-answer count | Category score | Correct tests | Response time (avg.) |
|---|---|---|---|---|---|---|
| #156 | Hy3 preview none | Tencent | 2 | 3.1 | 0/3 | 4.56s |
| #158 | GLM 4.7 Flash medium | Z.ai | 2 | 2.9 | 0/3 | 12.9s |
| #159 | Ling-2.6-1T none | Inclusionai | 2 | 3.1 | 0/3 | 5.36s |
| #160 | LFM2-24B-A2B none | Liquid | 2 | 3.8 | 0/3 | 1.78s |
| #163 | Granite 4.1 8B none | IBM Granite | 2 | 3.2 | 0/3 | 608ms |
| #7 | Gemini 3.5 Flash medium | | 1 | 7.7 | 2/3 | 2.38s |
| #24 | GPT-5.2 Chat none | OpenAI | 1 | 7.7 | 2/3 | 4.10s |
| #28 | Gemini 2.5 Flash medium | | 1 | 7.7 | 2/3 | 3.18s |
| #36 | Qwen3.5 Plus 2026-04-20 medium | Qwen | 1 | 8.2 | 2/3 | 17.7s |
| #38 | Grok 4.3 medium | X AI | 1 | 5.9 | 1/3 | 22.5s |
| #40 | Gemini 3.1 Flash Lite Preview medium | | 1 | 7.7 | 2/3 | 5.30s |
| #43 | MiMo-V2.5-Pro medium | Xiaomi | 1 | 6.7 | 1/3 | 5.31s |
| #44 | Gemini 3.1 Flash Lite medium | | 1 | 7.6 | 2/3 | 1.95s |
| #46 | Qwen3.6 35B A3B medium | Qwen | 1 | 8.0 | 2/3 | 5.95s |
| #47 | Grok Build 0.1 medium | X AI | 1 | 7.7 | 2/3 | 18.3s |