AI BENCHY
❤️ Made by XCS

AI BENCHY Failures

No answer Failures

See which AI models most often return no answer, so you can spot reliability risks before choosing one. Models are sorted by Avg Score, ascending.

Models Shown: 6
Total Failures: 7
Most Affected Model: GLM 4.7 Flash (No answer Count: 2)

Rank  Model                   Company      No answer Count  Avg Score  Tests Correct  Response Time (avg)
#52   GLM 4.7 Flash medium    Z.ai         2                3.1        4/16           36.8s
#35   Qwen3.5-35B-A3B medium  Qwen         1                5.5        8/16           43.9s
#30   Grok 4.1 Fast medium    X AI         1                6.2        9/16           26.3s
#28   Kimi K2.5 medium        Moonshot AI  1                6.4        9/16           69.8s
#27   GPT-5.2 medium          OpenAI       1                6.5        10/16          15.3s
#14   GLM 5 medium            Z.ai         1                7.4        11/16          16.2s
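
The summary figures (Models Shown, Total Failures, Most Affected Model) appear to follow directly from the table rows. A minimal sketch of that arithmetic, assuming the summary is a plain count, sum, and maximum over the No answer Count column; the `rows` list and the `summarize` helper are hypothetical names used only for illustration and are not part of AI BENCHY:

```python
# Rows copied from the table: (model, company, no_answer_count, avg_score).
# The "medium" tag shown next to each model name is omitted here.
rows = [
    ("GLM 4.7 Flash", "Z.ai", 2, 3.1),
    ("Qwen3.5-35B-A3B", "Qwen", 1, 5.5),
    ("Grok 4.1 Fast", "X AI", 1, 6.2),
    ("Kimi K2.5", "Moonshot AI", 1, 6.4),
    ("GPT-5.2", "OpenAI", 1, 6.5),
    ("GLM 5", "Z.ai", 1, 7.4),
]


def summarize(rows):
    """Return (models shown, total failures, (most affected model, its count))."""
    models_shown = len(rows)
    total_failures = sum(count for _, _, count, _ in rows)
    # Assumption: "most affected" means the highest No answer Count.
    model, _, count, _ = max(rows, key=lambda r: r[2])
    return models_shown, total_failures, (model, count)


print(summarize(rows))  # (6, 7, ('GLM 4.7 Flash', 2))
```

With only six models and seven failures in total, the per-model counts are too small to rank reliability with much confidence; the Avg Score and Tests Correct columns give more signal.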

Charts: Top Models by No answer Count; No answer Count vs Avg Score; Top Models by Response Time (avg).