AI BENCHY
❤️ Made by XCS


Invalid tool call Failures

See which AI models run into Invalid tool call failures most often, so you can spot reliability risks before choosing one. Sorted by failure count.

Models Shown: 4

Total Failures: 4

Most Affected Model: DeepSeek V3.2 (1 failure)


| Rank | Model | Company | Invalid tool call Count | Avg Score | Tests Correct | Response Time (avg) |
| --- | --- | --- | --- | --- | --- | --- |
| #33 | DeepSeek V3.2 none | DeepSeek | 1 | 5.5 | 7/16 | 12.9s |
| #43 | MiniMax M2.5 medium | Minimax | 1 | 4.7 | 5/16 | 43.0s |
| #49 | GLM 4.7 Flash none | Z.ai | 1 | 3.9 | 4/16 | 2.99s |
| #52 | GLM 4.7 Flash medium | Z.ai | 1 | 3.1 | 4/16 | 36.8s |

Charts: Top Models by Invalid tool call Count; Invalid tool call Count vs Avg Score; Top Models by Response Time (avg)