Tool Calling Model Ranking

See which AI models perform best on tool calling, which ones stay reliable, and where the biggest gaps appear. The table below is sorted by average response time, slowest first.

Models Shown

216/216

Average Tool Calling Score

8.8

Best Model

Ring-2.6-1T 10.0

Failure Reasons

API error: 17
Invalid tool call: 9
Did not follow instructions: 8
Wrong answer: 3
No answer: 2

Rank	Model	Company	Tool Calling Score	Score	Total Cost	Tests Correct	Response Time (avg)
#200	GLM 4.7 Flash medium	Z.ai	10.0	4.3	$0.166	1/1	15.9s
#46	GLM 5 medium	Z.ai	10.0	7.7	$0.307	1/1	15.9s
#50	DeepSeek V4 Pro high	DeepSeek	9.8	7.7	$0.200	1/1	15.9s
#101	GLM 5.2 none	Z.ai	10.0	6.6	$0.128	1/1	15.8s
#196	MiniMax M2.5 medium	Minimax	10.0	4.6	$0.340	1/1	15.4s
#106	Hy3 preview medium	Tencent	10.0	6.5	$0.018	1/1	15.0s
#40	Qwen3.7 Plus medium	Qwen	10.0	7.9	$0.267	1/1	15.0s
#74	Qwen3.5 Plus 2026-04-20 medium	Qwen	10.0	7.2	$0.317	1/1	14.7s
#161	Kimi K2.5 none	Moonshot AI	10.0	5.5	$0.127	1/1	14.0s
#140	Mimo V2 Omni medium	Xiaomi	10.0	5.9	$0.683	1/1	14.0s
#79	Grok 4.20 medium	X AI	3.0	7.1	$0.777	0/1	13.7s
#21	GPT-5.4 medium	OpenAI	10.0	8.5	$1.533	1/1	13.3s
#52	Grok Build 0.1 medium	X AI	10.0	7.6	$1.097	1/1	13.1s
#3	Gemini 3 Flash Preview medium	Google	10.0	9.6	$0.742	1/1	12.6s
#98	GLM 5V Turbo medium	Z.ai	7.0	6.7	$0.457	0/1	12.5s
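
The comparisons this table invites (fastest or cheapest model among those with a perfect Tool Calling Score) can be sketched in a few lines of Python. This is only an illustration: the `Row` class and the helper names are hypothetical, and the five rows are copied by hand from the table above rather than fetched from any API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    # Hypothetical record type; the fields mirror the table columns.
    model: str
    tool_score: float   # Tool Calling Score
    cost_usd: float     # Total Cost
    correct: int        # numerator of Tests Correct
    total: int          # denominator of Tests Correct
    avg_seconds: float  # Response Time (avg)


# Five rows copied by hand from the table above.
ROWS = [
    Row("GLM 4.7 Flash medium", 10.0, 0.166, 1, 1, 15.9),
    Row("Hy3 preview medium", 10.0, 0.018, 1, 1, 15.0),
    Row("Grok 4.20 medium", 3.0, 0.777, 0, 1, 13.7),
    Row("GPT-5.4 medium", 10.0, 1.533, 1, 1, 13.3),
    Row("Gemini 3 Flash Preview medium", 10.0, 0.742, 1, 1, 12.6),
]


def perfect(rows):
    """Keep rows with a 10.0 Tool Calling Score and every test correct."""
    return [r for r in rows if r.tool_score == 10.0 and r.correct == r.total]


def fastest_perfect(rows):
    """Perfect-scoring row with the lowest average response time."""
    return min(perfect(rows), key=lambda r: r.avg_seconds)


def cheapest_perfect(rows):
    """Perfect-scoring row with the lowest total cost."""
    return min(perfect(rows), key=lambda r: r.cost_usd)


def slowest_first(rows):
    """Reproduce the page's sort order: average response time, descending."""
    return sorted(rows, key=lambda r: r.avg_seconds, reverse=True)


if __name__ == "__main__":
    print("Fastest perfect:", fastest_perfect(ROWS).model)
    print("Cheapest perfect:", cheapest_perfect(ROWS).model)
```

Each model in this slice was run on a single test, so a result like this reflects one data point per model rather than a stable estimate.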

Tool Calling Ranking

[Charts: Top Models by Tool Calling Score; Tool Calling Score vs Total Cost; Top Models by Response Time (avg)]