Trivia Model Ranking

AI BENCHY Category

See which AI models perform best on Trivia, which ones stay reliable, and where the biggest gaps appear. The table below is sorted by Total Cost, highest first.

Models Shown

Average Trivia Score

3.1

Best Model

Grok 4.20 Multi Agent Beta 0.0

Failure Reasons

Wrong answer: 133, API error: 13, No answer: 8

169/169

Rank	Model	Company	Trivia Score	Score	Total Cost	Tests Correct	Response Time (avg)
#29	Qwen3.5-27B medium	Qwen	3.0	7.9	$0.536	0/1	85.1s
#27	GPT-5.4 Mini medium	OpenAI	3.0	8.0	$0.526	0/1	30.1s
#3	Qwen3.7 Max medium	Qwen	3.0	9.4	$0.523	0/1	33.4s
#49	Claude Opus 4.7 none	Anthropic	3.0	7.4	$0.505	0/1	1.46s
#56	GLM 5V Turbo medium	Z.ai	3.0	7.3	$0.457	0/1	41.0s
#81	Qwen3.6 27B medium	Qwen	3.0	6.6	$0.440	0/1	81.0s
#45	GPT-5.3 Chat none	OpenAI	3.0	7.5	$0.433	0/1	4.38s
#89	Qwen3.5-35B-A3B medium	Qwen	3.0	6.3	$0.401	0/1	177.4s
#19	GPT-5.2 Chat none	OpenAI	3.0	8.5	$0.393	0/1	6.89s
#91	Gemini 3 PRO Preview medium	Google	3.0	6.2	$0.385	0/1	0ms
#24	Gemini 2.5 Flash medium	Google	3.0	8.2	$0.379	0/1	2.76s
#20	Step 3.7 Flash medium	Stepfun	3.0	8.5	$0.376	0/1	114.0s
#5	Gemini 3.5 Flash low	Google	10.0	9.2	$0.349	1/1	1.88s
#43	Kimi K2.5 medium	Moonshot AI	3.0	7.5	$0.348	0/1	83.9s
#39	Step 3.7 Flash low	Stepfun	3.0	7.7	$0.341	0/1	124.8s

Trivia Ranking

[Charts: Top Models by Trivia Score; Trivia Score vs Total Cost; Top Models by Response Time (avg)]