Invalid tool call Failure Ranking

See which AI models hit Invalid tool call failures most often, so you can spot reliability risks before choosing one. Sorted by Response Time (avg), descending.

Models Shown: 83/83
Total Failures: 100
Most Affected Model: GLM 4.7 Flash (2)
Categories: Combined 91, Tool Calling 9

Rank	Model	Company	Invalid tool call Count	Score	Total Cost	Tests Correct	Response Time (avg)
#194	GLM 4.7 Flash medium	Z.ai	2	4.3	$0.166	4/22	142.6s
#137	North Mini Code medium	Cohere	1	5.9	$0.000	9/22	137.1s
#119	Qwen3.5-35B-A3B medium	Qwen	1	6.2	$0.837	11/22	112.5s
#58	Qwen3.5-27B medium	Qwen	1	7.4	$1.627	13/22	111.9s
#68	Kimi K2.6 medium	Moonshot AI	1	7.2	$1.036	12/22	110.0s
#99	Qwen3.6 27B medium	Qwen	2	6.5	$0.779	10/22	106.3s
#95	Gemma 4 26B A4B medium	Google	1	6.6	$0.089	14/22	103.8s
#77	Kimi K2.5 medium	Moonshot AI	1	7.0	$0.600	10/22	99.0s
#57	Qwen3.5 Plus 2026-02-15 medium	Qwen	1	7.5	$0.437	14/22	89.2s
#114	Qwen3.5-Flash medium	Qwen	1	6.2	$0.139	12/22	84.8s
#110	Gemma 4 31B medium	Google	1	6.3	$0.163	14/22	75.4s
#108	Ring-2.6-1T medium	Inclusionai	1	6.3	$0.103	11/22	68.7s
#76	DeepSeek V3.2 medium	DeepSeek	1	7.0	$0.078	11/22	68.6s
#190	MiniMax M2.5 medium	Minimax	1	4.6	$0.340	5/22	68.3s
#86	Step 3.7 Flash high	Stepfun	1	6.9	$1.207	11/22	64.7s
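
The raw Invalid tool call Count is easier to compare once it is set against how many tests each model got wrong. The sketch below parses a few tab-separated rows copied from the table and computes that share. The names `Row`, `parse_row`, and `invalid_share` are hypothetical, and the ratio assumes each invalid tool call corresponds to one failed test, which the page does not state.

```python
from dataclasses import dataclass


@dataclass
class Row:
    model: str
    invalid_tool_calls: int
    tests_correct: int
    total_tests: int
    avg_response_s: float


def parse_row(line: str) -> Row:
    """Parse one tab-separated table row (same column order as the table)."""
    _rank, model, _company, count, _score, _cost, correct, resp = line.split("\t")
    passed, total = correct.split("/")
    return Row(model, int(count), int(passed), int(total), float(resp.rstrip("s")))


def invalid_share(row: Row) -> float:
    """Invalid tool calls as a fraction of wrong tests.

    Assumption: each invalid tool call accounts for one failed test.
    """
    wrong = row.total_tests - row.tests_correct
    return row.invalid_tool_calls / wrong if wrong else 0.0


# Three rows copied verbatim from the table.
LINES = [
    "#194\tGLM 4.7 Flash medium\tZ.ai\t2\t4.3\t$0.166\t4/22\t142.6s",
    "#99\tQwen3.6 27B medium\tQwen\t2\t6.5\t$0.779\t10/22\t106.3s",
    "#95\tGemma 4 26B A4B medium\tGoogle\t1\t6.6\t$0.089\t14/22\t103.8s",
]

rows = [parse_row(line) for line in LINES]
ranked = sorted(rows, key=invalid_share, reverse=True)

if __name__ == "__main__":
    for r in ranked:
        print(f"{r.model}: {invalid_share(r):.3f}")
```

Under that assumption, a model with the same raw count but fewer wrong tests ranks higher, because invalid tool calls explain a larger part of its failures.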

Invalid tool call Failures

[Charts: Top Models by Invalid tool call Count; Invalid tool call Count vs Score; Top Models by Response Time (avg)]