AI BENCHY
❤️ Made by XCS

AI BENCHY Category Failures

Anti-AI Tricks
Extra formatting

See which AI models most often produce Extra formatting failures on Anti-AI Tricks tests, so you can spot weak points faster. Sorted by Tests Correct, descending.

Models Shown: 5
Total Failures: 8
Most Affected Model: Claude Sonnet 4.6 (1)

| Rank | Model | Company | Extra formatting Count | Category Score | Tests Correct | Response Time (avg) |
| --- | --- | --- | --- | --- | --- | --- |
| #11 | Claude Sonnet 4.6 medium | Anthropic | 1 | 7.0 | 2/3 | 4.95s |
| #25 | Claude Sonnet 4.6 none | Anthropic | 2 | 4.0 | 1/3 | 4.83s |
| #26 | Claude Opus 4.6 medium | Anthropic | 2 | 4.0 | 1/3 | 11.9s |
| #33 | DeepSeek V3.2 none | DeepSeek | 2 | 10.0 | 0/3 | 8.79s |
| #48 | Qwen3 Coder Next none | Qwen | 1 | 2.3 | 0/3 | 4.39s |

Charts: Top Models by Extra formatting Count; Extra formatting Count vs Avg Score; Top Models by Response Time (avg); Top Models by Estimated Wasted Cost.