AI BENCHY

AI BENCHY Category Failures

Combined: Invalid tool call

See which AI models are most likely to hit an Invalid tool call failure in the Combined category, so you can spot weak points faster. Results are sorted by average response time, descending.

Models Shown: 4
Total Failures: 19
Most Affected Model: DeepSeek V3.2 (1)

Rank  Model                     Company      Invalid tool call Count  Category Score  Tests Correct  Response Time (avg)
#128  Qwen3.6 Flash none        Qwen         1                        3.0             0/1            4.22s
#32   Gemini 3.5 Flash minimal  Google       1                        3.0             0/1            3.56s
#122  GLM 4.7 Flash none        Z.ai         1                        3.0             0/1            3.22s
#163  Granite 4.1 8B none       IBM Granite  1                        3.0             0/1            1.88s

Charts (not reproduced here): Top Models by Invalid tool call Count; Invalid tool call Count vs Score; Top Models by Response Time (avg); Top Models by Estimated Wasted Cost.