AI BENCHY Category Failures
Trivia: Wrong answer
See which AI models are most likely to give a wrong answer on Trivia, so you can spot weak points faster. Sorted by average response time, fastest first.
| Rank | Model | Company | Wrong answer Count | Category Score | Total Cost | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|---|
| #116 | GLM 5.1 none | Z.ai | 1 | 3.0 | $0.058 | 0/1 | 2.34s |
| #123 | GLM 5 Turbo none | Z.ai | 1 | 3.0 | $0.047 | 0/1 | 2.37s |
| #108 | Owl Alpha medium | Openrouter | 1 | 3.0 | $0.000 | 0/1 | 2.38s |
| #110 | Owl Alpha none | Openrouter | 1 | 3.0 | $0.000 | 0/1 | 2.50s |
| #44 | Mercury 2 medium | Inception | 1 | 3.0 | $0.058 | 0/1 | 2.58s |
| #32 | Gemini 3.1 Flash Lite Preview medium | Google | 1 | 3.0 | $0.068 | 0/1 | 2.68s |
| #158 | Hy3 preview none | Tencent | 1 | 3.0 | $0.003 | 0/1 | 2.71s |
| #24 | Gemini 2.5 Flash medium | Google | 1 | 3.0 | $0.379 | 0/1 | 2.76s |
| #117 | DeepSeek V4 Flash none | DeepSeek | 1 | 3.0 | $0.007 | 0/1 | 3.07s |
| #34 | Gemini 3.1 Flash Lite medium | Google | 1 | 3.0 | $0.071 | 0/1 | 3.08s |
| #61 | GLM 5.2 none | Z.ai | 1 | 3.0 | $0.076 | 0/1 | 3.41s |
| #101 | GLM 5 none | Z.ai | 1 | 3.0 | $0.027 | 0/1 | 3.62s |
| #134 | MiMo-V2.5 none | Xiaomi | 1 | 3.0 | $0.007 | 0/1 | 3.89s |
| #118 | Kimi K2.5 none | Moonshot AI | 1 | 3.0 | $0.027 | 0/1 | 3.90s |
| #120 | Qwen3.6 27B none | Qwen | 1 | 3.0 | $0.028 | 0/1 | 4.03s |