AI BENCHY Category Failures
Combined
Invalid tool call
See which AI models are most likely to hit an Invalid tool call failure in the Combined category, so you can spot weak points faster. The table is sorted by average response time, ascending.
| Rank | Model | Company | Invalid tool call Count | Category Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|
| #49 | GLM 4.7 Flash none | Z.ai | 1 | 10.0 | 0/1 | 3.22s |
| #43 | MiniMax M2.5 medium | Minimax | 1 | 10.0 | 0/1 | 60.4s |
| #52 | GLM 4.7 Flash medium | Z.ai | 1 | 10.0 | 0/1 | 65.6s |
| #33 | DeepSeek V3.2 none | DeepSeek | 1 | 8.0 | 0/1 | 115.9s |