AI BENCHY
❤️ Made by XCS


Extra formatting Failures

See which AI models hit the "Extra formatting" failure most often, so you can spot reliability risks before choosing one. The table is sorted by average response time, ascending.

Models shown: 6
Total failures: 13
Most affected model: Claude Opus 4.6 medium (4)
Rank  Model              Reasoning  Company    Extra formatting  Avg score  Tests correct  Response time (avg)
#54   MiMo-V2-Flash      none       Xiaomi     1                 2.9        3/16           2.97s
#25   Claude Sonnet 4.6  none       Anthropic  3                 6.8        10/16          5.57s
#11   Claude Sonnet 4.6  medium     Anthropic  2                 7.7        12/16          11.2s
#48   Qwen3 Coder Next   none       Qwen       1                 4.0        4/16           11.7s
#33   DeepSeek V3.2      none       DeepSeek   2                 5.5        7/16           12.9s
#26   Claude Opus 4.6    medium     Anthropic  4                 6.6        10/16          22.9s
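
The summary figures can be recomputed from the table rows. The Python sketch below transcribes the six rows by hand into tuples (a layout chosen only for illustration; it is not the site's data format) and derives the total failure count, the model with the highest Extra formatting count, and the ascending sort by average response time.

```python
# Hypothetical hand-transcribed rows from the leaderboard table:
# (model, reasoning setting, extra-formatting count, avg response time in seconds)
rows = [
    ("MiMo-V2-Flash", "none", 1, 2.97),
    ("Claude Sonnet 4.6", "none", 3, 5.57),
    ("Claude Sonnet 4.6", "medium", 2, 11.2),
    ("Qwen3 Coder Next", "none", 1, 11.7),
    ("DeepSeek V3.2", "none", 2, 12.9),
    ("Claude Opus 4.6", "medium", 4, 22.9),
]

# Total failures is the sum of the per-model counts.
total_failures = sum(count for _, _, count, _ in rows)

# The most affected model is the row with the largest count.
most_affected = max(rows, key=lambda row: row[2])

# Sort ascending by average response time (the table's sort order).
by_response_time = sorted(rows, key=lambda row: row[3])

print(total_failures)  # 13
print(f"{most_affected[0]} {most_affected[1]} ({most_affected[2]})")  # Claude Opus 4.6 medium (4)
```

`sorted` returns a new list and leaves `rows` untouched; because the rows were transcribed in ascending response-time order, the sorted list matches the original order.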

Charts (not reproduced): Top models by Extra formatting count; Extra formatting count vs. average score; Top models by average response time.