API Error Failure Ranking

See which AI models hit API errors most often, so you can spot reliability risks before choosing one.

Models Shown

Total Failures: 161

Most Affected Model

Failures by Category

Coding: 45
Combined: 26
Tool Calling: 17
Anti-AI Tricks: 14
Data parsing and extraction: 14
Trivia: 13
General Intelligence: 12
Puzzle Solving: 12
Domain specific: 7
Instructions following: 1

68/68

Overall Rank	Model	Company	API Error Count	Score	Total Cost (USD)	Tests Correct	Response Time (avg)
#158	KAT-Coder-Air V2.5 low	Kwaipilot	2	5.4	$0.041	7/22	10.1s
#161	Qwen3.6 35B A3B none	Qwen	2	5.3	$0.061	4/22	5.52s
#167	Mistral Small 4 medium	Mistral	2	5.1	$0.096	5/22	10.8s
#178	Ling-2.6-flash none	Inclusionai	2	4.9	$0.002	6/22	10.7s
#181	Grok 4.20 Multi Agent Beta medium	X AI	2	4.8	$5.599	8/18	9.69s
#183	Trinity Large Preview none	Arcee AI	2	4.8	$0.008	4/21	2.98s
#27	Muse Spark 1.1 high	Meta	1	8.1	$1.694	12/22	31.5s
#32	Inkling medium	Thinkingmachines	1	8.0	$0.391	15/22	16.2s
#37	Qwen3.6 Plus medium	Qwen	1	7.8	$0.405	15/22	43.1s
#46	DeepSeek V4 Pro high	DeepSeek	1	7.7	$0.200	10/22	79.1s
#51	Nemotron 3 Ultra medium	NVIDIA	1	7.5	$0.774	13/22	32.2s
#52	Kimi K2.7 Code medium	Moonshot AI	1	7.5	$0.751	12/22	84.2s
#57	Qwen3.5 Plus 2026-02-15 medium	Qwen	1	7.5	$0.437	14/22	89.2s
#60	LongCat 2.0 medium	Meituan	1	7.4	$0.478	12/22	136.6s
#62	KAT-Coder-Pro V2.5 low	Kwaipilot	1	7.4	$0.387	11/22	19.5s
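The visible rows appear to be ordered by API error count (highest first) and then by score (highest first); the page does not state this, so treat it as an inference from the rows shown. A minimal Python sketch of that assumed ordering, using a hypothetical `Row` record and four rows copied from the table. It also derives wrong tests as total tests minus tests correct, which matches the figures the page reports.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One leaderboard row. Field names are hypothetical; values come from the table."""
    model: str
    api_errors: int
    score: float
    correct: int
    total: int

    @property
    def wrong(self) -> int:
        # Wrong tests = total tests - tests correct (e.g. 22 - 7 = 15).
        return self.total - self.correct


rows = [
    Row("Muse Spark 1.1 high", 1, 8.1, 12, 22),
    Row("Mistral Small 4 medium", 2, 5.1, 5, 22),
    Row("KAT-Coder-Air V2.5 low", 2, 5.4, 7, 22),
    Row("Inkling medium", 1, 8.0, 15, 22),
]

# Assumed sort key: more API errors first, ties broken by higher score.
ranked = sorted(rows, key=lambda r: (-r.api_errors, -r.score))

for r in ranked:
    print(f"{r.model}: {r.api_errors} API errors, score {r.score}, {r.wrong} wrong")
```

Sorting on a tuple of negated values keeps both criteria descending in a single stable pass. Rows that tie on both keys (such as the two 4.8-score models with 2 errors) would keep their input order, so their relative position on the page cannot be inferred this way.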


API Error Failures

[Charts: Top Models by API Error Count; API Error Count vs. Score; Top Models by Response Time (avg)]