AI BENCHY Category Failures
Domain specific
Extra formatting
See which AI models most often hit the Extra formatting failure in the Domain specific category, so you can spot weak points faster. Models are sorted by failure count, ascending.
| Rank | Model | Company | Extra formatting Count | Category Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|
| #11 | Claude Sonnet 4.6 medium | Anthropic | 1 | 10.0 | 0/3 | 0ms |
| #26 | Claude Opus 4.6 medium | Anthropic | 2 | 10.0 | 0/3 | 83.4s |