AI BENCHY Category Failures
Tool Calling: Invalid tool call
See which AI models are most likely to fail with an "Invalid tool call" error in the Tool Calling category, so you can spot weak points faster. Sorted by Tests Correct, descending.
| Rank | Model | Company | Invalid tool call Count | Category Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|
| #31 | GLM 5V Turbo medium | Z.ai | 1 | 7.0 | 0/1 | 12.5s |
| #81 | Elephant medium | Openrouter | 1 | 3.0 | 0/1 | 2.83s |
| #85 | Elephant none | Openrouter | 1 | 3.0 | 0/1 | 2.79s |