AI BENCHY

No answer Failures

See which AI models most often fail with No answer, so you can spot reliability risks before choosing one.

Models Shown: 15
Total Failures: 43
Most Affected Model: Step 3.7 Flash (4 failures)

| Rank | Model | Company | No answer Count | Score | Tests Correct | Response Time (avg) |
|------|-------|---------|-----------------|-------|---------------|---------------------|
| #71 | Step 3.7 Flash high | Stepfun | 4 | 7.0 | 11/21 | 64.5s |
| #78 | Qwen3.6 27B medium | Qwen | 3 | 6.8 | 10/21 | 59.7s |
| #158 | GLM 4.7 Flash medium | Z.ai | 3 | 4.4 | 4/21 | 35.1s |
| #37 | Gemma 4 26B A4B medium | Google | 2 | 7.6 | 14/21 | 63.4s |
| #66 | Qwen3.5-35B-A3B medium | Qwen | 2 | 7.1 | 11/21 | 72.6s |
| #76 | Kimi K2.5 medium | Moonshot AI | 2 | 6.8 | 10/21 | 98.4s |
| #80 | Mimo V2 Omni medium | Xiaomi | 2 | 6.7 | 10/21 | 41.2s |
| #107 | Laguna Xs.2 medium | Poolside | 2 | 5.8 | 6/19 | 6.73s |
| #161 | Qwen3.5-9B medium | Qwen | 2 | 4.2 | 3/21 | 82.2s |
| #10 | Claude Opus 4.8 medium | Anthropic | 1 | 8.7 | 17/21 | 9.66s |
| #17 | GLM 5 medium | Z.ai | 1 | 8.3 | 15/21 | 33.5s |
| #22 | Step 3.7 Flash medium | Stepfun | 1 | 8.0 | 14/21 | 20.4s |
| #23 | GLM 5 Turbo medium | Z.ai | 1 | 8.0 | 14/21 | 23.0s |
| #27 | Gemma 4 31B medium | Google | 1 | 7.8 | 14/21 | 56.5s |
| #42 | GPT-5.2 medium | OpenAI | 1 | 7.5 | 13/21 | 16.9s |

[Chart: Top Models by No answer Count]
[Chart: No answer Count vs Score]
[Chart: Top Models by Response Time (avg)]