Invalid tool call Failure Ranking

See which AI models produce Invalid tool call failures most often, so you can spot reliability risks before choosing one.

Models Shown: 83/83

Total Failures: 100

Most Affected Model: Ling-2.6-flash (3)

Categories: Combined 91, Tool Calling 9

Rank	Model	Company	Invalid tool call Count	Score	Total Cost	Tests Correct	Response Time (avg)
#201	Granite 4.1 8B none	IBM Granite	2	4.0	$0.007	2/22	1.45s
#2	Gemini 3.5 Flash high	Google	1	9.5	$1.976	20/22	15.1s
#8	Qwen3.7 Max medium	Qwen	1	9.2	$1.116	18/22	40.6s
#11	Gemini 3.5 Flash low	Google	1	8.9	$0.433	19/22	5.55s
#16	Muse Spark 1.1 medium	Meta	1	8.6	$1.357	15/22	25.0s
#17	Claude Fable 5 medium	Anthropic	1	8.6	$3.478	17/22	17.2s
#23	Claude Sonnet 5 medium	Anthropic	1	8.3	$0.922	16/22	12.5s
#24	Muse Spark 1.1 low	Meta	1	8.3	$0.647	13/22	11.5s
#29	Step 3.7 Flash medium	Stepfun	1	8.0	$0.515	14/22	26.4s
#32	Inkling medium	Thinkingmachines	1	8.0	$0.391	15/22	16.2s
#34	GPT-5.6 Terra high	OpenAI	1	8.0	$1.055	14/22	11.3s
#36	Qwen3.7 Plus medium	Qwen	1	7.9	$0.267	15/22	51.5s
#45	DeepSeek V4 Flash high	DeepSeek	1	7.7	$0.042	13/22	49.7s
#51	Nemotron 3 Ultra medium	NVIDIA	1	7.5	$0.774	13/22	32.2s
#55	GPT-5.6 Terra low	OpenAI	1	7.5	$0.519	13/22	5.31s
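To compare rows beyond the raw count, it can help to parse the table and derive the number of wrong tests from the Tests Correct column. The following is a minimal sketch, not part of the leaderboard itself: it embeds three rows copied from the table above, the field names (`invalid_tool_calls`, `wrong_tests`, and so on) are hypothetical, and the "share" figure assumes each invalid tool call corresponds to one distinct failed test, which the page does not confirm.

```python
import csv
import io

# Three rows copied from the table above (tab-separated, same column order).
ROWS = (
    "#201\tGranite 4.1 8B none\tIBM Granite\t2\t4.0\t$0.007\t2/22\t1.45s\n"
    "#2\tGemini 3.5 Flash high\tGoogle\t1\t9.5\t$1.976\t20/22\t15.1s\n"
    "#45\tDeepSeek V4 Flash high\tDeepSeek\t1\t7.7\t$0.042\t13/22\t49.7s\n"
)


def parse_row(fields):
    """Convert one table row into a dict with typed values."""
    rank, model, company, count, score, cost, tests, latency = fields
    correct, total = (int(x) for x in tests.split("/"))
    return {
        "rank": int(rank.lstrip("#")),
        "model": model,
        "company": company,
        "invalid_tool_calls": int(count),
        "score": float(score),
        "total_cost_usd": float(cost.lstrip("$")),
        "tests_correct": correct,
        "total_tests": total,
        "wrong_tests": total - correct,  # derived from "Tests Correct"
        "avg_response_s": float(latency.rstrip("s")),
    }


def load(text):
    """Parse tab-separated leaderboard rows, skipping blank lines."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [parse_row(row) for row in reader if row]


models = load(ROWS)
for m in models:
    # Assumption: each invalid tool call belongs to a distinct failed test.
    wrong = m["wrong_tests"]
    share = m["invalid_tool_calls"] / wrong if wrong else 0.0
    print(f'{m["model"]}: {wrong} wrong tests, '
          f'invalid tool calls are {share:.0%} of them')
```

Read this way, a single invalid tool call weighs more heavily for a model with few wrong tests than for one that fails most of the suite, which the count column alone does not show.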

Invalid tool call Failures

Charts (not reproduced here): Top Models by Invalid tool call Count; Invalid tool call Count vs Score; Top Models by Response Time (avg)