AI BENCHY Category Failures
Trivia: Wrong answer
See which AI models are most likely to give a wrong answer on Trivia tests, so you can spot weak points faster. Rows are sorted by average response time, slowest first.
Failure Reasons
| Rank | Model | Company | Wrong Answer Count | Category Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|
| #77 | Grok 4.1 Fast medium | X AI | 1 | 3.0 | 0/1 | 25.5s |
| #123 | MiniMax M2.7 medium | Minimax | 1 | 3.0 | 0/1 | 22.8s |
| #83 | GPT-5 Nano medium | OpenAI | 1 | 3.0 | 0/1 | 20.1s |
| #104 | DeepSeek V3.2 none | DeepSeek | 1 | 3.0 | 0/1 | 17.2s |
| #84 | DeepSeek V4 Pro none | DeepSeek | 1 | 3.0 | 0/1 | 15.6s |
| #13 | GPT-5.3-Codex medium | OpenAI | 1 | 2.8 | 0/1 | 14.4s |
| #26 | GPT-5.4 medium | OpenAI | 1 | 3.0 | 0/1 | 14.0s |
| #136 | GLM 4.7 Flash medium | Z.ai | 1 | 3.0 | 0/1 | 11.1s |
| #6 | GPT-5.5 low | OpenAI | 1 | 3.0 | 0/1 | 10.1s |
| #67 | GPT-5 Mini medium | OpenAI | 1 | 3.0 | 0/1 | 9.99s |
| #122 | Nemotron 3 Super none | NVIDIA | 1 | 3.0 | 0/1 | 8.94s |
| #41 | GPT-5.2 Chat none | OpenAI | 1 | 3.0 | 0/1 | 6.89s |
| #107 | Mistral Small 4 medium | Mistral | 1 | 3.0 | 0/1 | 5.92s |
| #72 | GPT-5.5 none | OpenAI | 1 | 3.0 | 0/1 | 5.01s |
| #53 | GPT-5.4 Nano medium | OpenAI | 1 | 3.0 | 0/1 | 4.81s |