Tool Calling Model Ranking

See which AI models perform best on Tool Calling, which ones stay reliable, and where the biggest gaps appear. The table below is sorted by average response time, slowest first.

Models Shown

210/210

Average Tool Calling Score

8.7

Best Model

Ring-2.6-1T 10.0

Failure Reasons

API error: 17
Invalid tool call: 9
Did not follow instructions: 8
Wrong answer: 3
No answer: 2
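To put the failure-reason counts above in proportion, here is a minimal Python sketch that converts them into shares of all tagged failures. The counts are copied from this page; the page does not say whether each count refers to failed tests or to affected models, and the names `failure_counts` and `failure_shares` are illustrative only.

```python
# Counts copied from the Failure Reasons summary on this page.
# Assumption: the categories do not overlap, so they can be summed.
failure_counts = {
    "API error": 17,
    "Invalid tool call": 9,
    "Did not follow instructions": 8,
    "Wrong answer": 3,
    "No answer": 2,
}


def failure_shares(counts):
    """Return each reason's share of all tagged failures, as a fraction."""
    total = sum(counts.values())
    return {reason: n / total for reason, n in counts.items()}


total_failures = sum(failure_counts.values())
shares = failure_shares(failure_counts)

# Print reasons from most to least frequent.
for reason, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{reason}: {failure_counts[reason]} ({share:.1%})")
```

Under that assumption, the 39 tagged failures are dominated by API errors (17 of 39), which reflect provider availability rather than the model's ability to form a valid tool call.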

Rank	Model	Company	Tool Calling Score	Score	Total Cost	Tests Correct	Response Time (avg)
#108	Ring-2.6-1T medium	Inclusionai	10.0	6.3	$0.103	1/1	104.4s
#80	Seed-2.0-Mini medium	Bytedance Seed	10.0	7.0	$0.101	1/1	88.7s
#135	Hy3 preview high	Tencent	10.0	5.9	$0.048	1/1	78.8s
#150	DeepSeek V4 Flash none	DeepSeek	10.0	5.6	$0.044	1/1	77.9s
#45	DeepSeek V4 Flash high	DeepSeek	10.0	7.7	$0.042	1/1	74.7s
#156	Gemma 4 26B A4B none	Google	10.0	5.5	$0.015	1/1	57.1s
#140	Nemotron 3 Super medium	NVIDIA	10.0	5.7	$0.050	1/1	39.7s
#76	DeepSeek V3.2 medium	DeepSeek	10.0	7.0	$0.078	1/1	34.8s
#199	Hy3 preview none	Tencent	10.0	4.0	$0.003	1/1	33.8s
#128	GPT-5 Nano medium	OpenAI	10.0	6.1	$0.114	1/1	33.3s
#77	Kimi K2.5 medium	Moonshot AI	10.0	7.0	$0.600	1/1	31.7s
#69	KAT-Coder-Pro V2.5 high	Kwaipilot	10.0	7.2	$0.482	1/1	28.0s
#113	MiMo-V2-Flash medium	Xiaomi	10.0	6.3	$0.043	1/1	27.8s
#185	Grok 4.1 Fast medium	X AI	2.8	4.7	$0.069	0/1	27.7s
#162	Ling-2.6-1T none	Inclusionai	3.0	5.3	$0.016	0/1	25.7s
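The table can be re-sorted or filtered offline. The following Python sketch parses a few rows copied from the table above (tab-separated, in the header's column order) and picks the fastest and the cheapest model among those with a perfect Tool Calling Score. `Row` and `parse_row` are hypothetical helper names, not part of any published tooling for this leaderboard, and every row shown rests on a single test, so the comparison is indicative at best.

```python
from dataclasses import dataclass


@dataclass
class Row:
    model: str
    tool_score: float
    cost_usd: float
    response_s: float


def parse_row(line: str) -> Row:
    """Parse one tab-separated leaderboard row (8 columns, as in the header)."""
    rank, model, company, tool_score, score, cost, correct, resp = line.split("\t")
    return Row(
        model=model,
        tool_score=float(tool_score),
        cost_usd=float(cost.lstrip("$")),   # "$0.103" -> 0.103
        response_s=float(resp.rstrip("s")),  # "104.4s" -> 104.4
    )


# A sample of rows copied from the table on this page.
lines = [
    "#108\tRing-2.6-1T medium\tInclusionai\t10.0\t6.3\t$0.103\t1/1\t104.4s",
    "#199\tHy3 preview none\tTencent\t10.0\t4.0\t$0.003\t1/1\t33.8s",
    "#113\tMiMo-V2-Flash medium\tXiaomi\t10.0\t6.3\t$0.043\t1/1\t27.8s",
    "#185\tGrok 4.1 Fast medium\tX AI\t2.8\t4.7\t$0.069\t0/1\t27.7s",
]
rows = [parse_row(line) for line in lines]

# Keep only models that scored a perfect 10.0 on Tool Calling.
perfect = [r for r in rows if r.tool_score == 10.0]
fastest = min(perfect, key=lambda r: r.response_s)
cheapest = min(perfect, key=lambda r: r.cost_usd)

print("Fastest perfect scorer:", fastest.model, f"{fastest.response_s}s")
print("Cheapest perfect scorer:", cheapest.model, f"${cheapest.cost_usd:.3f}")
```

Filtering on score before ranking by speed matters here: in this sample the quickest response overall (Grok 4.1 Fast medium, 27.7s) failed its one test.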

Charts on the original page (not reproduced here): Top Models by Tool Calling Score; Tool Calling Score vs Total Cost; Top Models by Response Time (avg).