AI BENCHY

AI BENCHY Category Failures

Puzzle Solving: Extra formatting

See which AI models are most likely to fail Puzzle Solving tests because of extra formatting, so you can spot weak points faster. Sorted by failure count, ascending.

Models shown: 5
Total failures: 5
Most affected model: Mimo V2 PRO (1 failure)

Rank  Model                     Company    Extra formatting  Category score  Tests correct  Avg response time
#51   Mimo V2 PRO (medium)      Xiaomi     1                 6.4             1/3            5.08 s
#68   Claude Opus 4.8 (none)    Anthropic  1                 7.7             2/3            2.74 s
#77   Claude Sonnet 4.6 (none)  Anthropic  1                 7.7             2/3            2.53 s
#113  DeepSeek V4 Pro (none)    DeepSeek   1                 7.6             2/3            16.0 s
#139  DeepSeek V4 Flash (none)  DeepSeek   1                 3.1             0/3            23.7 s
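
The summary figures (models shown, total failures, most affected model) follow directly from the table rows. Below is a minimal Python sketch, assuming the rows are transcribed by hand from the table; the `Row` type and the `summarize` and `sort_by_failures` helpers are hypothetical illustrations, not part of AI Benchy. It also makes visible that all five models tie at one failure, so "most affected model" depends on a tie-break the page does not state (assumed here to be the first row in rank order).

```python
from dataclasses import dataclass


@dataclass
class Row:
    # Hypothetical record type mirroring one table row.
    rank: int
    model: str
    company: str
    extra_formatting: int
    category_score: float
    tests_correct: int
    tests_total: int
    avg_response_s: float


# Rows transcribed from the table above.
ROWS = [
    Row(51, "Mimo V2 PRO (medium)", "Xiaomi", 1, 6.4, 1, 3, 5.08),
    Row(68, "Claude Opus 4.8 (none)", "Anthropic", 1, 7.7, 2, 3, 2.74),
    Row(77, "Claude Sonnet 4.6 (none)", "Anthropic", 1, 7.7, 2, 3, 2.53),
    Row(113, "DeepSeek V4 Pro (none)", "DeepSeek", 1, 7.6, 2, 3, 16.0),
    Row(139, "DeepSeek V4 Flash (none)", "DeepSeek", 1, 3.1, 0, 3, 23.7),
]


def sort_by_failures(rows):
    # Ascending failure count; ties broken by benchmark rank (an assumption).
    return sorted(rows, key=lambda r: (r.extra_formatting, r.rank))


def summarize(rows):
    models_shown = len(rows)
    total_failures = sum(r.extra_formatting for r in rows)
    top = max(r.extra_formatting for r in rows)
    # Every model with the highest count, in rank order; the dashboard
    # appears to display only the first of these.
    most_affected = [r.model for r in sorted(rows, key=lambda r: r.rank)
                     if r.extra_formatting == top]
    return models_shown, total_failures, most_affected


shown, total, tied = summarize(ROWS)
print(shown, total, tied[0], len(tied))
```

With this data the sketch reports 5 models and 5 failures, and the "most affected" list holds all five models, of which Mimo V2 PRO (medium) comes first by rank.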

[Charts: Top models by extra formatting count; Extra formatting count vs. category score; Top models by average response time; Top models by estimated wasted cost]