Tool Calling Model Ranking

See which AI models perform best on tool calling, which ones stay reliable, and where the biggest gaps appear. The table below is sorted by average response time, slowest first.

Models Shown

Average Tool Calling Score: 8.7

Best Model: Ring-2.6-1T (10.0)

Failure Reasons

API error: 17
Invalid tool call: 9
Did not follow instructions: 8
Wrong answer: 3
No answer: 2

210/210

Rank	Model	Company	Tool Calling Score	Score	Total Cost	Tests Correct	Response Time (avg)
#7	Gemini 3.1 Pro Preview medium	Google	10.0	9.2	$1.361	1/1	23.1s
#148	Owl Alpha none	OpenRouter	10.0	5.6	$0.000	1/1	22.8s
#38	GLM 5.2 medium	Z.ai	10.0	7.8	$0.222	1/1	20.4s
#81	KAT-Coder-Pro V2.5 medium	Kwaipilot	10.0	6.9	$0.467	1/1	19.0s
#178	Ling-2.6-flash none	Inclusionai	3.0	4.9	$0.002	0/1	18.8s
#26	GPT-5 Mini medium	OpenAI	10.0	8.1	$0.237	1/1	18.6s
#62	KAT-Coder-Pro V2.5 low	Kwaipilot	10.0	7.4	$0.387	1/1	18.4s
#19	Qwen3.6 Max Preview medium	Qwen	10.0	8.4	$1.143	1/1	18.3s
#153	Hy3 preview low	Tencent	2.8	5.5	$0.015	0/1	17.8s
#73	Grok 4.3 medium	X AI	10.0	7.1	$0.779	1/1	17.7s
#184	Hunter Alpha medium	OpenRouter	10.0	4.7	$0.000	1/1	17.3s
#17	Claude Fable 5 medium	Anthropic	10.0	8.6	$3.478	1/1	17.0s
#99	Qwen3.6 27B medium	Qwen	10.0	6.5	$0.779	1/1	16.9s
#84	MiMo-V2.5-Pro medium	Xiaomi	10.0	6.9	$0.187	1/1	16.9s
#177	Nemotron 3 Super none	NVIDIA	4.7	4.9	$0.008	0/1	16.0s
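
For readers who want to slice the table themselves, here is a minimal Python sketch. It assumes the rows are tab-separated in the column order shown above; the `Row` class, `parse_row` helper, and the six sample rows copied from the table are illustrative choices, not part of the leaderboard. Each model here has a single test, so the comparisons are indicative at best.

```python
from dataclasses import dataclass

# Six rows copied from the table above (tab-separated, same column order).
ROWS = """\
#7\tGemini 3.1 Pro Preview medium\tGoogle\t10.0\t9.2\t$1.361\t1/1\t23.1s
#38\tGLM 5.2 medium\tZ.ai\t10.0\t7.8\t$0.222\t1/1\t20.4s
#178\tLing-2.6-flash none\tInclusionai\t3.0\t4.9\t$0.002\t0/1\t18.8s
#26\tGPT-5 Mini medium\tOpenAI\t10.0\t8.1\t$0.237\t1/1\t18.6s
#17\tClaude Fable 5 medium\tAnthropic\t10.0\t8.6\t$3.478\t1/1\t17.0s
#177\tNemotron 3 Super none\tNVIDIA\t4.7\t4.9\t$0.008\t0/1\t16.0s
"""


@dataclass
class Row:  # hypothetical record type for one leaderboard row
    rank: int
    model: str
    company: str
    tool_score: float
    score: float
    cost_usd: float
    correct: int
    total: int
    seconds: float


def parse_row(line: str) -> Row:
    """Split one tab-separated row and strip the '#', '$', and 's' decorations."""
    rank, model, company, tool, score, cost, tests, rt = line.split("\t")
    correct, total = tests.split("/")
    return Row(
        rank=int(rank.lstrip("#")),
        model=model,
        company=company,
        tool_score=float(tool),
        score=float(score),
        cost_usd=float(cost.lstrip("$")),
        correct=int(correct),
        total=int(total),
        seconds=float(rt.rstrip("s")),
    )


rows = [parse_row(line) for line in ROWS.splitlines() if line.strip()]

# Keep only models with a perfect Tool Calling Score, then compare cost and speed.
perfect = [r for r in rows if r.tool_score == 10.0]
cheapest = min(perfect, key=lambda r: r.cost_usd)
fastest = min(perfect, key=lambda r: r.seconds)

print(f"Cheapest perfect score: {cheapest.model} (${cheapest.cost_usd:.3f})")
print(f"Fastest perfect score:  {fastest.model} ({fastest.seconds}s)")
```

Among these six sample rows, the cheapest perfect-score model is GLM 5.2 medium and the fastest is Claude Fable 5 medium; pasting in the full table would change both answers.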

Tool Calling Ranking

Charts: Top Models by Tool Calling Score; Tool Calling Score vs Total Cost; Top Models by Response Time (avg)