AI BENCHY

AI BENCHY Category Failures

Puzzle Solving: Extra formatting

See which AI models most often add extra formatting to their answers on Puzzle Solving tasks, so you can spot weak points faster. The table is sorted by average response time, fastest first.

Models shown: 5
Total failures: 5
Most affected model: Claude Sonnet 4.6 (1 failure; every model shown has 1)

Rank  Model              Reasoning  Company    Extra formatting count  Category score  Tests correct  Response time (avg)
#77   Claude Sonnet 4.6  none       Anthropic  1                       7.7             2/3            2.53s
#68   Claude Opus 4.8    none       Anthropic  1                       7.7             2/3            2.74s
#51   Mimo V2 PRO        medium     Xiaomi     1                       6.4             1/3            5.08s
#113  DeepSeek V4 Pro    none       DeepSeek   1                       7.6             2/3            16.0s
#139  DeepSeek V4 Flash  none       DeepSeek   1                       3.1             0/3            23.7s

Charts (not reproduced): top models by extra formatting count; extra formatting count vs. score; top models by average response time; top models by estimated wasted cost.