AI BENCHY Errors
Wrong-answer errors
See which AI models most often return a wrong answer, so you can spot reliability risks before you choose one. Sorted by: average response time (ascending).
| Rank | Model | Company | Wrong answers | Score | Correct tests | Avg. response time |
|---|---|---|---|---|---|---|
| #3 | Claude Opus 4.7 medium | Anthropic | 1 | 9.2 | 16/18 | 3.53s |
| #70 | Qwen3.5-122B-A10B none | Qwen | 11 | 5.7 | 6/18 | 3.69s |
| #17 | Gemini 3.1 Flash Lite Preview medium | Google | 4 | 8.2 | 13/18 | 3.74s |
| #63 | Qwen3.5-35B-A3B none | Qwen | 9 | 6.1 | 7/18 | 3.82s |
| #48 | Gemma 4 31B none | Google | 5 | 6.9 | 10/18 | 4.02s |
| #53 | GLM 5 none | Z.ai | 9 | 6.6 | 9/18 | 4.23s |
| #75 | GLM 5.1 none | Z.ai | 10 | 5.6 | 5/18 | 4.33s |
| #72 | Hunter Alpha none | OpenRouter | 9 | 5.7 | 6/18 | 4.58s |
| #42 | Claude Sonnet 4.6 none | Anthropic | 3 | 7.4 | 11/18 | 4.98s |
| #78 | Trinity Large Preview none | Arcee AI | 11 | 5.3 | 5/18 | 5.07s |
| #73 | Mistral Small 4 medium | Mistral | 8 | 5.7 | 5/18 | 5.64s |
| #36 | GPT-5.3 Chat none | OpenAI | 5 | 7.7 | 11/18 | 5.88s |
| #5 | Gemini 3 Flash Preview low | Google | 3 | 8.8 | 15/18 | 6.01s |
| #60 | Gemma 4 26B A4B none | Google | 7 | 6.2 | 7/18 | 6.59s |
| #28 | GPT-5.2 Chat none | OpenAI | 5 | 7.9 | 12/18 | 6.84s |
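The table above is sorted by response time, but the page's stated goal is to find models that most often give wrong answers. The minimal Python sketch below re-sorts a few rows copied from the table by wrong-answer count. The `Row` type, the `ROWS` subset and the helper `most_wrong_answers` are hypothetical names for illustration and are not part of AI BENCHY.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    # Hypothetical record; values are copied from the table above.
    model: str
    wrong_answers: int
    correct: int           # correct tests out of 18
    avg_response_s: float  # average response time in seconds


# A small subset of the leaderboard rows (not the full table).
ROWS: List[Row] = [
    Row("Claude Opus 4.7 medium", 1, 16, 3.53),
    Row("Qwen3.5-122B-A10B none", 11, 6, 3.69),
    Row("GLM 5.1 none", 10, 5, 4.33),
    Row("Trinity Large Preview none", 11, 5, 5.07),
    Row("GPT-5.2 Chat none", 5, 12, 6.84),
]


def most_wrong_answers(rows: List[Row], top: int = 3) -> List[Row]:
    """Return the `top` rows with the most wrong answers.

    Ties on wrong-answer count are broken by the faster average response time.
    """
    return sorted(rows, key=lambda r: (-r.wrong_answers, r.avg_response_s))[:top]


if __name__ == "__main__":
    for r in most_wrong_answers(ROWS):
        print(f"{r.model}: {r.wrong_answers} wrong, {r.correct}/18 correct, {r.avg_response_s}s")
```

For example, Claude Opus 4.7 medium has 1 wrong answer and 16/18 correct tests. That accounts for only 17 of 18 tests, which suggests the remaining failures fall into other error categories. Sorting by wrong answers alone therefore does not give the full reliability picture.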