Coding Model Ranking

AI BENCHY Category

See which AI models perform best on Coding, which ones stay reliable, and where the biggest gaps appear. Sorted by: Response Time (avg), descending (slowest first).

Models Shown

Average Coding Score

5.7

Best Model

North Mini Code 4.5

Failure Reasons

Wrong answer: 230
API error: 43
Timed out: 25
No answer: 18
Did not follow instructions: 16
Extra formatting: 12
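
To put the failure counts in proportion, here is a minimal sketch that turns them into percentage shares. It assumes the six reasons are mutually exclusive and that together they cover all recorded failures, which the page does not state; the names `failure_counts` and `failure_shares` are illustrative only.

```python
# Counts copied from the Failure Reasons summary on this page.
failure_counts = {
    "Wrong answer": 230,
    "API error": 43,
    "Timed out": 25,
    "No answer": 18,
    "Did not follow instructions": 16,
    "Extra formatting": 12,
}


def failure_shares(counts):
    """Return each reason's share of all listed failures, in percent.

    Assumption: the reasons do not overlap, so the counts can be summed.
    """
    total = sum(counts.values())
    return {reason: 100 * n / total for reason, n in counts.items()}


shares = failure_shares(failure_counts)

# Print the reasons from most to least common.
for reason, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{reason}: {pct:.1f}%")
```

Under that assumption, wrong answers account for roughly two thirds of the listed failures, well ahead of API errors and timeouts.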

189/189

Rank	Model	Company	Coding Score	Score	Total Cost	Tests Correct	Response Time (avg)
#21	Seed-2.0-Lite medium	Bytedance Seed	8.0	8.5	$0.175	2/3	156.7s
#26	Grok 4.5 medium	X AI	7.6	8.3	$1.696	2/3	155.7s
#37	Qwen3.6 Plus medium	Qwen	6.1	7.8	$0.294	1/3	153.1s
#101	Nemotron 3 Super medium	NVIDIA	3.1	6.3	$0.020	0/3	147.3s
#79	Kimi K2.7 Code medium	Moonshot AI	7.6	7.0	$0.581	2/3	146.7s
#14	Qwen3.6 Max Preview medium	Qwen	8.8	8.9	$0.960	2/3	146.5s
#52	MiniMax M3 medium	Minimax	6.1	7.6	$0.131	1/3	144.7s
#164	Ring-2.6-1T none	Inclusionai	5.3	4.8	$0.026	1/3	143.8s
#95	Qwen3.6 27B medium	Qwen	7.7	6.6	$0.336	2/3	143.0s
#97	Gemini 3.1 Flash Lite high	Google	3.3	6.5	$2.044	1/1	137.6s
#42	Qwen3.5 Plus 2026-04-20 medium	Qwen	6.2	7.8	$0.317	1/3	125.3s
#45	Qwen3.5-122B-A10B medium	Qwen	6.0	7.7	$0.588	1/3	114.5s
#66	Grok 4.20 medium	X AI	6.3	7.3	$0.609	1/3	109.9s
#77	GLM 5.1 medium	Z.ai	4.6	7.1	$0.288	0/3	109.6s
#30	Qwen3.7 Plus medium	Qwen	6.1	8.2	$0.177	1/3	108.6s
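
Because the table is sorted by response time, it can be handy to re-sort it by another column. The following is a rough sketch, not part of the original page, that parses a few rows copied from the table and orders them by Coding Score. The helper name `parse_rows` and the dictionary keys are made up for illustration, and the parsing assumes every cost starts with `$` and every time ends with `s`, as in the rows shown.

```python
import csv
import io

# Header plus four rows copied from the table (tab-separated, as on the page).
TABLE = (
    "Rank\tModel\tCompany\tCoding Score\tScore\tTotal Cost\tTests Correct\tResponse Time (avg)\n"
    "#21\tSeed-2.0-Lite medium\tBytedance Seed\t8.0\t8.5\t$0.175\t2/3\t156.7s\n"
    "#101\tNemotron 3 Super medium\tNVIDIA\t3.1\t6.3\t$0.020\t0/3\t147.3s\n"
    "#14\tQwen3.6 Max Preview medium\tQwen\t8.8\t8.9\t$0.960\t2/3\t146.5s\n"
    "#97\tGemini 3.1 Flash Lite high\tGoogle\t3.3\t6.5\t$2.044\t1/1\t137.6s\n"
)


def parse_rows(text):
    """Parse the tab-separated table into a list of dicts with numeric fields."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text), delimiter="\t"):
        # "2/3" means 2 tests correct out of 3 tests run.
        correct, total = (int(x) for x in rec["Tests Correct"].split("/"))
        rows.append({
            "model": rec["Model"],
            "coding_score": float(rec["Coding Score"]),
            "cost_usd": float(rec["Total Cost"].lstrip("$")),
            "correct": correct,
            "total": total,
            "avg_seconds": float(rec["Response Time (avg)"].rstrip("s")),
        })
    return rows


rows = parse_rows(TABLE)

# Re-sort from highest to lowest Coding Score instead of by response time.
by_coding_score = sorted(rows, key=lambda r: r["coding_score"], reverse=True)
for r in by_coding_score:
    print(f'{r["model"]}: {r["coding_score"]} '
          f'({r["correct"]}/{r["total"]} correct, {r["avg_seconds"]}s)')
```

Note that most models here ran only three coding tests, and one ran a single test, so small differences in Tests Correct should be read with caution.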

Coding Ranking

Charts: Top Models by Coding Score; Coding Score vs Total Cost; Top Models by Response Time (avg).