Invalid tool call Failure Ranking

See which AI models run into Invalid tool call failures most often, so you can spot reliability risks before choosing one.

Models Shown: 83/83
Total Failures: 100
Most Affected Model: Ling-2.6-flash (3)
Categories: Combined 91, Tool Calling 9

Rank	Model	Company	Invalid tool call Count	Score	Total Cost	Tests Correct	Response Time (avg)
#174	Ling-2.6-flash none	Inclusionai	3	4.9	$0.002	6/22	10.7s
#27	Muse Spark 1.1 high	Meta	2	8.1	$1.694	12/22	31.5s
#28	Inkling high	Thinkingmachines	2	8.0	$1.006	15/22	64.2s
#87	Gemini 3.5 Flash minimal	Google	2	6.8	$0.300	14/22	2.65s
#91	GLM 5V Turbo medium	Z.ai	2	6.7	$0.457	11/21	23.1s
#96	Qwen3.6 27B medium	Qwen	2	6.5	$0.779	10/22	106.3s
#119	Inkling low	Thinkingmachines	2	6.1	$0.187	10/22	5.15s
#120	Qwen3.6 Flash none	Qwen	2	6.1	$0.062	7/22	3.74s
#146	DeepSeek V4 Flash none	DeepSeek	2	5.6	$0.044	5/22	36.8s
#148	Qwen3.6 27B none	Qwen	2	5.5	$0.087	7/22	10.7s
#165	Qwen3.5-9B none	Qwen	2	5.1	$0.021	4/22	19.2s
#167	North Mini Code none	Cohere	2	5.1	$0.000	4/22	29.9s
#169	DeepSeek V3.2 none	DeepSeek	2	5.0	$0.054	6/22	18.3s
#172	GLM 4.7 Flash none	Z.ai	2	4.9	$0.016	6/22	9.15s
#190	GLM 4.7 Flash medium	Z.ai	2	4.3	$0.166	4/22	142.6s
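The raw counts in the table are easier to compare once normalized, since not every model ran the same number of tests (GLM 5V Turbo medium ran 21, the others 22). The sketch below parses one tab-separated row and derives the wrong-test count and the Invalid tool call count per test run. The names `Row`, `parse_row`, and `invalid_rate` are hypothetical and not part of the leaderboard; the rate itself is an illustrative metric, not one the page reports.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One leaderboard row, reduced to the fields used here (hypothetical type)."""
    model: str
    invalid_tool_calls: int
    tests_correct: int
    total_tests: int

    @property
    def wrong_tests(self) -> int:
        # Wrong tests follow directly from the "Tests Correct" column (e.g. 6/22 -> 16).
        return self.total_tests - self.tests_correct

    @property
    def invalid_rate(self) -> float:
        # Illustrative metric (an assumption): Invalid tool call count per test run.
        return self.invalid_tool_calls / self.total_tests


def parse_row(line: str) -> Row:
    """Parse a tab-separated table row in the column order shown in the header."""
    rank, model, company, count, score, cost, correct, avg_time = line.split("\t")
    passed, total = correct.split("/")
    return Row(
        model=model,
        invalid_tool_calls=int(count),
        tests_correct=int(passed),
        total_tests=int(total),
    )


if __name__ == "__main__":
    row = parse_row("#174\tLing-2.6-flash none\tInclusionai\t3\t4.9\t$0.002\t6/22\t10.7s")
    print(row.model, row.wrong_tests, round(row.invalid_rate, 3))
```

Dividing by total tests rather than wrong tests keeps the figure comparable across models with very different overall scores.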


Invalid tool call Failures

[Charts: Top Models by Invalid tool call Count; Invalid tool call Count vs Score; Top Models by Response Time (avg)]