AI BENCHY
❤️ Made by XCS

AI BENCHY Failures

Invalid tool call Failures

See which AI models run into Invalid tool call failures most often, so you can spot reliability risks before choosing one. Models are sorted by average score, lowest first.

Models Shown: 4
Total Failures: 4
Most Affected Model: GLM 4.7 Flash (1)


Rank | Model                | Company  | Invalid tool call Count | Avg Score | Tests Correct | Response Time (avg)
#52  | GLM 4.7 Flash medium | Z.ai     | 1                       | 3.1       | 4/16          | 36.8s
#49  | GLM 4.7 Flash none   | Z.ai     | 1                       | 3.9       | 4/16          | 2.99s
#43  | MiniMax M2.5 medium  | Minimax  | 1                       | 4.7       | 5/16          | 43.0s
#33  | DeepSeek V3.2 none   | DeepSeek | 1                       | 5.5       | 7/16          | 12.9s
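
The summary figures and the sort order can be rechecked from the table rows themselves. The sketch below is only an illustration: the `Row` class, its field names, and the `summarize` helper are hypothetical and are not part of AI Benchy. It assumes "Tests Correct" is a correct/total pair and that "Total Failures" is the sum of the per-model counts.

```python
from dataclasses import dataclass


@dataclass
class Row:
    # Hypothetical record type; field names are assumptions, not a site API.
    rank: int
    model: str
    company: str
    invalid_tool_calls: int
    avg_score: float
    correct: int
    total: int
    avg_response_s: float

    @property
    def pass_rate(self) -> float:
        # Fraction of tests answered correctly, e.g. 4/16 -> 0.25.
        return self.correct / self.total


# Values copied from the table above.
ROWS = [
    Row(52, "GLM 4.7 Flash medium", "Z.ai", 1, 3.1, 4, 16, 36.8),
    Row(49, "GLM 4.7 Flash none", "Z.ai", 1, 3.9, 4, 16, 2.99),
    Row(43, "MiniMax M2.5 medium", "Minimax", 1, 4.7, 5, 16, 43.0),
    Row(33, "DeepSeek V3.2 none", "DeepSeek", 1, 5.5, 7, 16, 12.9),
]


def summarize(rows):
    # Sort ascending by average score, matching the order of the listing.
    by_score = sorted(rows, key=lambda r: r.avg_score)
    return {
        "models_shown": len(rows),
        "total_failures": sum(r.invalid_tool_calls for r in rows),
        "order": [r.model for r in by_score],
        "pass_rates": {r.model: r.pass_rate for r in rows},
    }


if __name__ == "__main__":
    for key, value in summarize(ROWS).items():
        print(key, value)
```

Under these assumptions the four rows give 4 models shown and 4 total failures, in line with the summary figures, and sorting by average score reproduces the row order of the table.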

[Chart: Top Models by Invalid tool call Count]
[Chart: Invalid tool call Count vs Avg Score]
[Chart: Top Models by Response Time (avg)]