AI BENCHY


Invalid tool call Failures

See which AI models run into Invalid tool call failures most often, so you can spot reliability risks before choosing one. The table below is sorted by average response time, slowest first.

Models Shown: 4

Total Failures: 4

Most Affected Model: MiniMax M2.5 (1 failure)


Rank | Model                | Company  | Invalid tool call Count | Avg Score | Tests Correct | Response Time (avg)
#43  | MiniMax M2.5 medium  | Minimax  | 1                       | 4.7       | 5/16          | 43.0s
#52  | GLM 4.7 Flash medium | Z.ai     | 1                       | 3.1       | 4/16          | 36.8s
#33  | DeepSeek V3.2 none   | DeepSeek | 1                       | 5.5       | 7/16          | 12.9s
#49  | GLM 4.7 Flash none   | Z.ai     | 1                       | 3.9       | 4/16          | 2.99s
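
The summary figures (Models Shown, Total Failures) and the sort order can be recomputed from the table rows. The following is a minimal sketch, not the site's actual implementation: the `Row` class, its field names, and the helper functions are hypothetical, and the values are copied by hand from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One table row. Field names are illustrative, not an AI Benchy schema."""
    rank: int
    model: str
    company: str
    invalid_tool_calls: int
    avg_score: float
    tests_correct: int
    tests_total: int
    avg_response_s: float


# Values transcribed from the table.
ROWS = [
    Row(43, "MiniMax M2.5 medium", "Minimax", 1, 4.7, 5, 16, 43.0),
    Row(52, "GLM 4.7 Flash medium", "Z.ai", 1, 3.1, 4, 16, 36.8),
    Row(33, "DeepSeek V3.2 none", "DeepSeek", 1, 5.5, 7, 16, 12.9),
    Row(49, "GLM 4.7 Flash none", "Z.ai", 1, 3.9, 4, 16, 2.99),
]


def summarize(rows):
    """Recompute the headline counters from the rows."""
    return {
        "models_shown": len(rows),
        "total_failures": sum(r.invalid_tool_calls for r in rows),
        "max_failures": max(r.invalid_tool_calls for r in rows),
    }


def sort_by_response_time(rows):
    """Order rows by average response time, slowest first."""
    return sorted(rows, key=lambda r: r.avg_response_s, reverse=True)


if __name__ == "__main__":
    print(summarize(ROWS))
    for r in sort_by_response_time(ROWS):
        print(f"#{r.rank} {r.model}: {r.avg_response_s}s")
```

Every listed model has exactly one Invalid tool call failure, so the count alone does not separate them. Average score and response time are the more useful columns for comparison here.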

Charts: Top Models by Invalid tool call Count; Invalid tool call Count vs Avg Score; Top Models by Response Time (avg).