AI BENCHY

AI BENCHY Failures

No answer Failures

See which AI models most often fail to return an answer ("No answer"), so you can spot reliability risks before choosing one. Models are sorted by Tests Correct, descending.

Models Shown: 6
Total Failures: 7
Most Affected Model: GLM 5 (1)

Rank | Model                  | Company     | No answer Count | Avg Score | Tests Correct | Response Time (avg)
#14  | GLM 5 medium           | Z.ai        | 1               | 7.4       | 11/16         | 16.2s
#27  | GPT-5.2 medium         | OpenAI      | 1               | 6.5       | 10/16         | 15.3s
#28  | Kimi K2.5 medium       | Moonshot AI | 1               | 6.4       | 9/16          | 69.8s
#30  | Grok 4.1 Fast medium   | X AI        | 1               | 6.2       | 9/16          | 26.3s
#35  | Qwen3.5-35B-A3B medium | Qwen        | 1               | 5.5       | 8/16          | 43.9s
#52  | GLM 4.7 Flash medium   | Z.ai        | 2               | 3.1       | 4/16          | 36.8s
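
The summary figures can be cross-checked against the table rows. The following is a minimal sketch, not AI Benchy's own code: the tuple layout and variable names are assumptions made for illustration, and the values are copied by hand from the table above.

```python
# Rows copied from the table above.
# Assumed layout (illustrative only): (model, no_answer_count, tests_correct, tests_total)
rows = [
    ("GLM 5 medium", 1, 11, 16),
    ("GPT-5.2 medium", 1, 10, 16),
    ("Kimi K2.5 medium", 1, 9, 16),
    ("Grok 4.1 Fast medium", 1, 9, 16),
    ("Qwen3.5-35B-A3B medium", 1, 8, 16),
    ("GLM 4.7 Flash medium", 2, 4, 16),
]

# "Models Shown" is the number of rows listed.
models_shown = len(rows)

# "Total Failures" is the sum of the No answer Count column.
total_failures = sum(count for _, count, _, _ in rows)

# Reproduce the page's ordering: Tests Correct, descending.
# sorted() is stable, so rows with equal scores keep their listed order.
ranked = sorted(rows, key=lambda row: row[2], reverse=True)

print(f"Models shown: {models_shown}")
print(f"Total failures: {total_failures}")
for model, count, correct, total in ranked:
    print(f"{model}: {count} no-answer, {correct}/{total} correct")
```

Summing the No answer Count column reproduces the reported total of 7 across the 6 listed models.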

[Charts: Top Models by No answer Count; No answer Count vs Avg Score; Top Models by Response Time (avg)]