AI BENCHY

AI BENCHY Category Failures

Trivia: Wrong answer

This page shows which AI models are most likely to give a wrong answer on Trivia tests, so you can spot weak points faster. Rows are sorted by average response time, slowest first.

Models Shown: 15
Total Failures: 117
Most Affected Model: MiMo-V2-Omni (1)

Failure Reasons

| Rank | Model | Company | Wrong answer Count | Category Score | Tests Correct | Response Time (avg) |
|------|-------|---------|--------------------|----------------|---------------|---------------------|
| #70 | Qwen3.6 27B medium | Qwen | 1 | 3.0 | 0/1 | 81.0s |
| #106 | MiniMax M2.5 medium | Minimax | 1 | 3.0 | 0/1 | 80.8s |
| #19 | GLM 5 medium | Z.ai | 1 | 3.0 | 0/1 | 67.4s |
| #66 | Grok 4.20 medium | X AI | 1 | 3.0 | 0/1 | 63.5s |
| #52 | Claude Opus 4.6 medium | Anthropic | 1 | 3.0 | 0/1 | 63.2s |
| #9 | Qwen3.6 Max Preview medium | Qwen | 1 | 3.0 | 0/1 | 60.6s |
| #56 | Seed-2.0-Mini medium | Bytedance Seed | 1 | 3.0 | 0/1 | 56.8s |
| #85 | Nemotron 3 Super medium | NVIDIA | 1 | 3.0 | 0/1 | 55.3s |
| #48 | DeepSeek V4 Flash high | DeepSeek | 1 | 3.0 | 0/1 | 54.5s |
| #31 | Qwen3.5-122B-A10B medium | Qwen | 1 | 3.0 | 0/1 | 52.9s |
| #45 | Qwen3.5-Flash medium | Qwen | 1 | 3.0 | 0/1 | 49.0s |
| #11 | Seed-2.0-Lite medium | Bytedance Seed | 1 | 3.0 | 0/1 | 48.3s |
| #22 | HY3 Preview high | Tencent | 1 | 3.0 | 0/1 | 47.7s |
| #28 | Qwen3.6 Plus medium | Qwen | 1 | 3.0 | 0/1 | 47.5s |
| #119 | gpt-oss-120b none | OpenAI | 1 | 3.0 | 0/1 | 47.3s |
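
For readers who want to work with these rows outside the page, here is a minimal Python sketch that reproduces the sort order and groups the failures by company. The tuple layout and the helper names (`sort_by_response_time`, `failures_by_company`, `mean_response_time`) are illustrative assumptions, not part of AI Benchy. The data covers only the 15 rows shown above, not all 117 reported failures.

```python
from collections import Counter

# (rank, model, company, avg_response_seconds), copied from the table above.
# Every row has a wrong answer count of 1 and 0/1 tests correct.
ROWS = [
    (70, "Qwen3.6 27B medium", "Qwen", 81.0),
    (106, "MiniMax M2.5 medium", "Minimax", 80.8),
    (19, "GLM 5 medium", "Z.ai", 67.4),
    (66, "Grok 4.20 medium", "X AI", 63.5),
    (52, "Claude Opus 4.6 medium", "Anthropic", 63.2),
    (9, "Qwen3.6 Max Preview medium", "Qwen", 60.6),
    (56, "Seed-2.0-Mini medium", "Bytedance Seed", 56.8),
    (85, "Nemotron 3 Super medium", "NVIDIA", 55.3),
    (48, "DeepSeek V4 Flash high", "DeepSeek", 54.5),
    (31, "Qwen3.5-122B-A10B medium", "Qwen", 52.9),
    (45, "Qwen3.5-Flash medium", "Qwen", 49.0),
    (11, "Seed-2.0-Lite medium", "Bytedance Seed", 48.3),
    (22, "HY3 Preview high", "Tencent", 47.7),
    (28, "Qwen3.6 Plus medium", "Qwen", 47.5),
    (119, "gpt-oss-120b none", "OpenAI", 47.3),
]


def sort_by_response_time(rows, descending=True):
    """Return rows ordered by average response time (slowest first by default)."""
    return sorted(rows, key=lambda r: r[3], reverse=descending)


def failures_by_company(rows):
    """Count listed rows per company; each row carries one wrong answer."""
    return Counter(r[2] for r in rows)


def mean_response_time(rows):
    """Average of the per-model average response times, in seconds."""
    return sum(r[3] for r in rows) / len(rows)


if __name__ == "__main__":
    for company, count in failures_by_company(ROWS).most_common():
        print(f"{company}: {count}")
    print(f"Mean response time: {mean_response_time(ROWS):.1f}s")
```

Because each listed model failed exactly one test, counting rows per company is the same as summing the wrong answer counts for the models shown.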

Charts (not reproduced here): Top Models by Wrong answer Count; Wrong answer Count vs Score; Top Models by Response Time (avg); Top Models by Estimated Wasted Cost.