AI BENCHY

AI BENCHY Category Failures

Combined: Invalid tool call

See which AI models most often fail with an "Invalid tool call" error in the Combined category, so you can spot weak points faster.

Models Shown: 4
Total Failures: 19
Most Affected Model: Gemini 3.5 Flash 1
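
| Rank | Model | Company | Invalid tool call Count | Category Score | Tests Correct | Response Time (avg) |
|------|-------|---------|-------------------------|----------------|---------------|---------------------|
| #145 | Laguna M.1 none | Poolside | 1 | 3.0 | 0/1 | 4.32s |
| #154 | Qwen3.5-9B none | Qwen | 1 | 3.0 | 0/1 | 5.91s |
| #158 | GLM 4.7 Flash medium | Z.ai | 1 | 2.8 | 0/1 | 65.6s |
| #163 | Granite 4.1 8B none | IBM Granite | 1 | 3.0 | 0/1 | 1.88s |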

Charts (not reproduced here): Top Models by Invalid tool call Count; Invalid tool call Count vs Score; Top Models by Response Time (avg); Top Models by Estimated Wasted Cost.