Trivia x Wrong answer Ranking

AI BENCHY Category Failures

See which AI models are most likely to give a wrong answer on Trivia tests, so you can spot weak points faster. Sorted by: Tests Correct ↓.

Models Shown

Total Failures: 133

Most Affected Model: Qwen3.7 Max (1 wrong answer)

Failure Reasons

Wrong answer: 133, API error: 13, No answer: 8

Categories

Domain specific: 325, Anti-AI Tricks: 250, Coding: 201, Puzzle Solving: 154, Trivia: 133, Instructions following: 54, Combined: 53, General Intelligence: 36, Data parsing and extraction: 35, Tool Calling: 2

133/133

Rank	Model	Company	Wrong answer Count	Category Score	Total Cost	Tests Correct	Response Time (avg)
#93	Gemini 2.5 Flash none	Google	1	3.0	$0.016	0/1	1.15s
#94	Gemini 3.1 Flash Lite minimal	Google	1	3.0	$0.013	0/1	724ms
#96	Gemini 3.1 Flash Lite none	Google	1	3.0	$0.013	0/1	733ms
#97	Qwen3.5-Flash none	Qwen	1	3.0	$0.005	0/1	588ms
#98	Gemma 4 31B none	Google	1	3.0	$0.004	0/1	1.25s
#99	Nemotron 3 Ultra 550b A55b none	NVIDIA	1	3.0	$0.027	0/1	1.83s
#100	Qwen3.6 Max Preview none	Qwen	1	3.0	$0.075	0/1	1.97s
#101	GLM 5 none	Z.ai	1	3.0	$0.027	0/1	3.62s
#102	Qwen3.6 Flash none	Qwen	1	3.0	$0.015	0/1	649ms
#103	Qwen3.5-35B-A3B none	Qwen	1	3.0	$0.012	0/1	493ms
#104	Qwen3.5-27B none	Qwen	1	3.0	$0.015	0/1	599ms
#105	GLM 5V Turbo none	Z.ai	1	3.0	$0.052	0/1	2.23s
#106	Qwen3.5 Plus 2026-02-15 none	Qwen	1	3.0	$0.016	0/1	1.11s
#108	Owl Alpha medium	Openrouter	1	3.0	$0.000	0/1	2.38s
#109	Mimo V2 PRO none	Xiaomi	1	3.0	$0.045	0/1	1.63s


Charts: Top Models by Wrong answer Count; Wrong answer Count vs Score; Top Models by Response Time (avg); Top Models by Estimated Wasted Cost.
