Combined Model Ranking

See which AI models perform best on the Combined benchmark, which ones stay reliable, and where the biggest gaps appear. Rows are sorted by Combined Score, ascending.

Models Shown

Average Combined Score: 5.5

Best Model: Gemini 3 PRO Preview 1.5

Failure Reasons

Invalid tool call: 96
Wrong answer: 71
No answer: 33
API error: 26
Timed out: 5
Did not follow instructions: 1
Extra formatting: 1
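The failure counts above add up to 233. A minimal Python sketch that turns them into percentage shares so the relative weight of each reason is easier to compare; it reuses only the numbers listed on this page, and the name `failure_shares` is hypothetical.

```python
# Failure-reason counts as listed on this leaderboard page.
failure_counts = {
    "Invalid tool call": 96,
    "Wrong answer": 71,
    "No answer": 33,
    "API error": 26,
    "Timed out": 5,
    "Did not follow instructions": 1,
    "Extra formatting": 1,
}


def failure_shares(counts):
    """Return (reason, count, percent of all failures), largest count first."""
    total = sum(counts.values())
    rows = [(reason, n, 100.0 * n / total) for reason, n in counts.items()]
    # sorted() is stable, so reasons with equal counts keep their listed order.
    return sorted(rows, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    for reason, n, pct in failure_shares(failure_counts):
        print(f"{reason:<30} {n:>3}  {pct:5.1f}%")
```

Invalid tool calls alone account for roughly two in five failures, more than wrong answers.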

220/220

Rank	Model	Company	Combined Score	Score	Total Cost	Tests Correct	Response Time (avg)
#120	Qwen3.5-Flash medium	Qwen	6.4	6.2	$0.139	1/2	266.6s
#132	Qwen3.5 Plus 2026-04-20 none	Qwen	6.4	6.1	$0.122	1/2	109.7s
#134	GPT-5 Nano medium	OpenAI	6.4	6.1	$0.114	1/2	146.9s
#146	Nemotron 3 Super medium	NVIDIA	6.4	5.7	$0.055	1/2	259.9s
#165	KAT-Coder-Air V2.5 low	Kwaipilot	6.4	5.4	$0.041	1/2	55.9s
#20	Claude Fable 5 medium	Anthropic	6.5	8.6	$3.478	1/2	27.5s
#23	Grok 4.5 low	X AI	6.5	8.4	$0.935	1/2	12.8s
#37	Kimi K3 max	Moonshot AI	6.5	8.0	$3.112	1/2	223.0s
#63	Qwen3.7 Max none	Qwen	6.5	7.4	$0.197	1/2	37.2s
#74	Qwen3.5 Plus 2026-04-20 medium	Qwen	6.5	7.2	$0.317	1/2	92.4s
#77	Grok 4.3 medium	X AI	6.5	7.1	$0.779	1/2	55.1s
#87	GPT-5.6 Sol none	OpenAI	6.5	6.9	$0.524	1/2	8.37s
#89	Qwen3.6 Flash medium	Qwen	6.5	6.9	$0.738	1/2	299.2s
#91	GPT-5.5 none	OpenAI	6.5	6.9	$0.544	1/2	8.90s
#103	Qwen3.6 Max Preview none	Qwen	6.5	6.6	$0.231	1/2	61.6s
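Many rows share the same Combined Score, so cost is a natural tie-breaker. A minimal Python sketch that re-enters the rows above and picks the cheapest model at each Combined Score value; it makes no assumption about whether a higher or lower Combined Score is better, and the `Row` fields and `cheapest_per_combined_score` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Row:
    rank: int
    model: str
    company: str
    combined: float
    score: float
    cost_usd: float
    avg_time_s: float


# Rows copied from the table above (every model answered 1 of 2 tests correctly).
ROWS = [
    Row(120, "Qwen3.5-Flash medium", "Qwen", 6.4, 6.2, 0.139, 266.6),
    Row(132, "Qwen3.5 Plus 2026-04-20 none", "Qwen", 6.4, 6.1, 0.122, 109.7),
    Row(134, "GPT-5 Nano medium", "OpenAI", 6.4, 6.1, 0.114, 146.9),
    Row(146, "Nemotron 3 Super medium", "NVIDIA", 6.4, 5.7, 0.055, 259.9),
    Row(165, "KAT-Coder-Air V2.5 low", "Kwaipilot", 6.4, 5.4, 0.041, 55.9),
    Row(20, "Claude Fable 5 medium", "Anthropic", 6.5, 8.6, 3.478, 27.5),
    Row(23, "Grok 4.5 low", "X AI", 6.5, 8.4, 0.935, 12.8),
    Row(37, "Kimi K3 max", "Moonshot AI", 6.5, 8.0, 3.112, 223.0),
    Row(63, "Qwen3.7 Max none", "Qwen", 6.5, 7.4, 0.197, 37.2),
    Row(74, "Qwen3.5 Plus 2026-04-20 medium", "Qwen", 6.5, 7.2, 0.317, 92.4),
    Row(77, "Grok 4.3 medium", "X AI", 6.5, 7.1, 0.779, 55.1),
    Row(87, "GPT-5.6 Sol none", "OpenAI", 6.5, 6.9, 0.524, 8.37),
    Row(89, "Qwen3.6 Flash medium", "Qwen", 6.5, 6.9, 0.738, 299.2),
    Row(91, "GPT-5.5 none", "OpenAI", 6.5, 6.9, 0.544, 8.90),
    Row(103, "Qwen3.6 Max Preview none", "Qwen", 6.5, 6.6, 0.231, 61.6),
]


def cheapest_per_combined_score(rows):
    """Map each Combined Score value to its lowest-cost row."""
    # groupby only merges adjacent items, so sort by the grouping key first.
    ordered = sorted(rows, key=lambda r: r.combined)
    return {
        combined: min(group, key=lambda r: r.cost_usd)
        for combined, group in groupby(ordered, key=lambda r: r.combined)
    }


if __name__ == "__main__":
    for combined, row in cheapest_per_combined_score(ROWS).items():
        print(f"{combined}: {row.model} (${row.cost_usd:.3f})")
```

Because every listed model got 1 of 2 tests right, total cost here is also the cost per correct answer, so the cheapest row in each group is the most cost-efficient one on this page.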

Charts (not reproduced): Top Models by Combined Score; Combined Score vs Total Cost; Top Models by Response Time (avg).