AI BENCHY Category Failures
Combined
Invalid tool call
The table below shows which AI models hit the Invalid tool call failure in the Combined category, so you can spot weak points faster.
| Rank | Model | Company | Invalid tool call Count | Category Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|
| #33 | DeepSeek V3.2 none | DeepSeek | 1 | 8.0 | 0/1 | 115.9s |
| #43 | MiniMax M2.5 medium | MiniMax | 1 | 10.0 | 0/1 | 60.4s |
| #49 | GLM 4.7 Flash none | Z.ai | 1 | 10.0 | 0/1 | 3.22s |
| #52 | GLM 4.7 Flash medium | Z.ai | 1 | 10.0 | 0/1 | 65.6s |