AI BENCHY Category Failures
Instructions following: API error
See which AI models are most likely to hit an API error on Instructions following tests, so you can spot weak points faster. Rows are sorted by average response time, ascending.
Failure Reasons
| Rank | Model | Company | API error Count | Category Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|
| #47 | Grok 4.20 medium | xAI | 1 | 7.3 | 1/2 | 4.42s |