Coding x No answer Ranking

AI BENCHY Category Failures

See which AI models most often fail Coding tests with a "No answer" result, so you can spot weak points faster. Sorted by Tests Correct, ascending.

Most Affected Model: Gemma 4 26B A4B (2)

Failure Reasons

Wrong answer: 230, API error: 43, Timed out: 23, No answer: 18, Did not follow instructions: 16, Extra formatting: 12

Categories

Coding: 18, Trivia: 10, Domain specific: 6, Data parsing and extraction: 5, Anti-AI Tricks: 4, Combined: 3, Instructions following: 2, Puzzle Solving: 2, Tool Calling: 2

Models shown: 16/16

Rank	Model	Company	No answer Count	Category Score	Total Cost	Tests Correct	Response Time (avg)
#71	Gemma 4 26B A4B medium	Google	2	2.9	$0.045	0/3	272.5s
#75	Step 3.7 Flash high	Stepfun	2	4.0	$1.148	0/3	206.2s
#76	GLM 5.1 medium	Z.ai	1	4.6	$0.288	0/3	109.6s
#86	Mimo V2 Omni medium	Xiaomi	1	3.3	$0.683	0/3	183.9s
#93	Step 3.5 Flash medium	Stepfun	1	2.4	$0.070	0/2	258.4s
#102	Gemma 4 31B medium	Google	1	4.3	$0.033	0/3	219.8s
#165	MiniMax M2.5 medium	Minimax	1	3.4	$0.303	0/3	188.6s
#176	GLM 4.7 Flash medium	Z.ai	1	3.2	$0.054	0/3	55.3s
#184	Qwen3.5-9B medium	Qwen	1	2.9	$0.036	0/3	100.9s
#43	Kimi K2.6 medium	Moonshot AI	1	5.7	$0.888	1/3	214.4s
#55	Kimi K2.5 medium	Moonshot AI	1	6.1	$0.348	1/3	217.5s
#103	Qwen3.5-35B-A3B medium	Qwen	1	5.9	$0.401	1/3	206.6s
#146	MiniMax M2.7 medium	Minimax	1	5.7	$0.100	1/3	101.9s
#24	GLM 5 Turbo medium	Z.ai	1	8.2	$0.323	2/3	45.9s
#88	Qwen3.6 35B A3B medium	Qwen	1	7.7	$0.146	2/3	50.5s
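The row order in the ranking table can be reproduced with a simple multi-key sort. The Python sketch below transcribes the listed rows (cost and timing columns omitted) and assumes a tie-break that the page does not state: among models with the same Tests Correct, more No answer failures come first, then the lower rank number. The `Row` type and the `sort_key` and `most_affected` helpers are hypothetical names used only for illustration, not part of the benchmark.

```python
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    rank: int        # value of the "Rank" column, without the "#"
    model: str
    no_answer: int   # "No answer Count"
    correct: int     # numerator of "Tests Correct"
    total: int       # denominator of "Tests Correct"


# Rows transcribed from the ranking table, listed here in rank order on purpose.
ROWS: List[Row] = [
    Row(24, "GLM 5 Turbo medium", 1, 2, 3),
    Row(43, "Kimi K2.6 medium", 1, 1, 3),
    Row(55, "Kimi K2.5 medium", 1, 1, 3),
    Row(71, "Gemma 4 26B A4B medium", 2, 0, 3),
    Row(75, "Step 3.7 Flash high", 2, 0, 3),
    Row(76, "GLM 5.1 medium", 1, 0, 3),
    Row(86, "Mimo V2 Omni medium", 1, 0, 3),
    Row(88, "Qwen3.6 35B A3B medium", 1, 2, 3),
    Row(93, "Step 3.5 Flash medium", 1, 0, 2),
    Row(102, "Gemma 4 31B medium", 1, 0, 3),
    Row(103, "Qwen3.5-35B-A3B medium", 1, 1, 3),
    Row(146, "MiniMax M2.7 medium", 1, 1, 3),
    Row(165, "MiniMax M2.5 medium", 1, 0, 3),
    Row(176, "GLM 4.7 Flash medium", 1, 0, 3),
    Row(184, "Qwen3.5-9B medium", 1, 0, 3),
]


def sort_key(row: Row) -> Tuple[int, int, int]:
    # Assumed ordering: fewest tests correct first, then more
    # "No answer" failures first, then lower rank number first.
    return (row.correct, -row.no_answer, row.rank)


def most_affected(rows: List[Row]) -> Row:
    # Highest "No answer" count wins; assumed tie-break is the lower rank number.
    return min(rows, key=lambda r: (-r.no_answer, r.rank))


ordered = sorted(ROWS, key=sort_key)
for r in ordered:
    print(f"#{r.rank}\t{r.model}\t{r.no_answer}\t{r.correct}/{r.total}")
print("Most affected:", most_affected(ROWS).model)
```

Under these assumptions the printed order matches the table, and the tie between Gemma 4 26B A4B and Step 3.7 Flash (2 each) resolves to Gemma 4 26B A4B, which agrees with the Most Affected Model value shown on the page.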

Charts (not reproduced): Top Models by No answer Count; No answer Count vs Score; Top Models by Response Time (avg); Top Models by Estimated Wasted Cost.
