AI BENCHY Category failures
General knowledge: Wrong answer
See which AI models are most likely to give a Wrong answer on General knowledge, so you can spot weak points faster.
Failure reasons
133/133
| Rank | Model | Company | Wrong answer count | Category score | Total cost | Correct tests | Response time (avg.) |
|---|---|---|---|---|---|---|---|
| #25 | Qwen3.7 Plus medium | Qwen | 1 | 3.0 | $0.177 | 0/1 | 91.1s |
| #26 | Nemotron 3 Ultra 550b A55b medium | NVIDIA | 1 | 3.0 | $0.158 | 0/1 | 38.5s |
| #27 | GPT-5.4 Mini medium | OpenAI | 1 | 3.0 | $0.526 | 0/1 | 30.1s |
| #28 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 1 | 3.0 | $0.310 | 0/1 | 103.8s |
| #29 | Qwen3.5-27B medium | Qwen | 1 | 3.0 | $0.536 | 0/1 | 85.1s |
| #30 | Qwen3.6 Plus medium | Qwen | 1 | 3.0 | $0.294 | 0/1 | 47.5s |
| #31 | Claude Sonnet 4.6 medium | Anthropic | 1 | 3.0 | $1.418 | 0/1 | 30.1s |
| #32 | Gemini 3.1 Flash Lite Preview medium | Google | 1 | 3.0 | $0.068 | 0/1 | 2.68s |
| #33 | Qwen3.5 Plus 2026-04-20 medium | Qwen | 1 | 3.0 | $0.317 | 0/1 | 92.6s |
| #34 | Gemini 3.1 Flash Lite medium | Google | 1 | 3.0 | $0.071 | 0/1 | 3.08s |
| #35 | Kimi K2.6 medium | Moonshot AI | 1 | 3.0 | $0.889 | 0/1 | 130.3s |
| #36 | Qwen3.5-122B-A10B medium | Qwen | 1 | 3.0 | $0.588 | 0/1 | 52.9s |
| #37 | Grok 4.3 medium | X AI | 1 | 3.0 | $0.614 | 0/1 | 44.5s |
| #38 | Claude Opus 4.6 medium | Anthropic | 1 | 3.0 | $2.053 | 0/1 | 63.2s |
| #41 | DeepSeek V4 Pro high | DeepSeek | 1 | 3.0 | $0.157 | 0/1 | 34.0s |