AI BENCHY Category Failures
Puzzle Solving
Extra formatting
See which AI models are most likely to hit the Extra formatting failure on Puzzle Solving tests. Rows are sorted by average response time, ascending.
| Rank | Model | Company | Extra formatting Count | Category Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|
| #25 | Claude Sonnet 4.6 none | Anthropic | 1 | 7.0 | 2/3 | 2.92s |