AI BENCHY
❤️ Made by XCS

AI BENCHY Failures

Invalid tool call Failures

See which AI models produce invalid tool calls most often, so you can spot reliability risks before choosing one. Sorted by: Tests Correct (descending).

Models Shown: 4
Total Failures: 4
Most Affected Model: DeepSeek V3.2 (1)

Rank | Model | Company | Invalid tool call Count | Avg Score | Tests Correct | Response Time (avg)
#33 | DeepSeek V3.2 none | DeepSeek | 1 | 5.5 | 7/16 | 12.9s
#43 | MiniMax M2.5 medium | Minimax | 1 | 4.7 | 5/16 | 43.0s
#49 | GLM 4.7 Flash none | Z.ai | 1 | 3.9 | 4/16 | 2.99s
#52 | GLM 4.7 Flash medium | Z.ai | 1 | 3.1 | 4/16 | 36.8s
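
The summary figures (Models Shown, Total Failures, Most Affected Model) appear to follow directly from the table rows. The sketch below recomputes them in Python as a sanity check. The tuple layout, the `summarize` helper, and the tie-break rule (best rank wins when counts are equal) are assumptions made for illustration, not the site's documented method.

```python
# Rows transcribed from the table above.
# Assumed layout: (rank, model, company, invalid_tool_calls,
#                  avg_score, tests_correct, tests_total, avg_response_s)
rows = [
    (33, "DeepSeek V3.2 none", "DeepSeek", 1, 5.5, 7, 16, 12.9),
    (43, "MiniMax M2.5 medium", "Minimax", 1, 4.7, 5, 16, 43.0),
    (49, "GLM 4.7 Flash none", "Z.ai", 1, 3.9, 4, 16, 2.99),
    (52, "GLM 4.7 Flash medium", "Z.ai", 1, 3.1, 4, 16, 36.8),
]


def summarize(rows):
    """Recompute the summary figures from the table rows (hypothetical helper)."""
    models_shown = len(rows)
    total_failures = sum(r[3] for r in rows)
    # Assumption: with equal failure counts, the best-ranked model
    # (lowest rank number) is reported as the most affected.
    most_affected = min(rows, key=lambda r: (-r[3], r[0]))
    return {
        "models_shown": models_shown,
        "total_failures": total_failures,
        "most_affected": most_affected[1],
        "most_affected_count": most_affected[3],
    }


summary = summarize(rows)
print(summary)
```

Because every listed model has exactly one invalid tool call, the "most affected" label depends entirely on how ties are broken. The rule assumed here happens to reproduce the DeepSeek V3.2 result shown in the summary.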

Charts: Top Models by Invalid tool call Count; Invalid tool call Count vs Avg Score; Top Models by Response Time (avg)