AI BENCHY

AI BENCHY Failures

Invalid tool call Failures

See which AI models hit Invalid tool call failures most often, so you can spot reliability risks before choosing one. Models are sorted by score, ascending.

Models Shown: 12

Total Failures: 13

Most Affected Model: GLM 4.7 Flash (1)

| Rank | Model | Company | Invalid tool call Count | Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|
| #93 | GLM 4.7 Flash medium | Z.ai | 1 | 4.6 | 4/18 | 32.3s |
| #90 | Qwen3.5-9B none | Qwen | 1 | 4.8 | 4/18 | 1.47s |
| #85 | Elephant none | Openrouter | 1 | 5.2 | 5/18 | 1.23s |
| #82 | Grok 4.20 none | X AI | 1 | 5.2 | 5/18 | 1.11s |
| #81 | Elephant medium | Openrouter | 1 | 5.2 | 5/18 | 1.27s |
| #80 | MiniMax M2.7 medium | Minimax | 1 | 5.3 | 4/18 | 31.1s |
| #79 | Grok 4.20 Beta none | X AI | 1 | 5.3 | 4/18 | 1.19s |
| #74 | GLM 4.7 Flash none | Z.ai | 1 | 5.6 | 5/18 | 3.35s |
| #75 | GLM 5.1 none | Z.ai | 1 | 5.6 | 5/18 | 4.33s |
| #71 | MiniMax M2.5 medium | Minimax | 1 | 5.7 | 5/18 | 39.6s |
| #64 | DeepSeek V3.2 none | DeepSeek | 1 | 6.1 | 7/18 | 12.1s |
| #31 | GLM 5V Turbo medium | Z.ai | 2 | 7.8 | 11/18 | 15.0s |

Charts: Top Models by Invalid tool call Count; Invalid tool call Count vs Score; Top Models by Response Time (avg)