AI BENCHY Category Failures
Puzzle Solving: Extra formatting
See which AI models most often fail Puzzle Solving tests because of Extra formatting, so you can spot weak points faster. Rows are sorted by average response time, slowest first.
Failure Reasons
| Rank | Model | Company | Extra formatting Count | Category Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|
| #23 | MiMo-V2-Pro medium | Xiaomi | 1 | 7.0 | 1/3 | 4.71s |
| #42 | Claude Sonnet 4.6 none | Anthropic | 1 | 7.7 | 2/3 | 2.92s |