AI BENCHY


Timed out Failures

See which AI models run into timed-out failures most often, so you can spot reliability risks before choosing one. The table below is sorted by Tests Correct, descending.

Models Shown: 15
Total Failures: 61
Most Affected Model: Claude Opus 4.7 (1)

Rank  Model                           Company         Timed Out Count  Score  Tests Correct  Response Time (avg)
#3    Claude Opus 4.7 medium          Anthropic       1                9.2    16/18          3.53s
#8    Qwen3.5 Plus 2026-02-15 medium  Qwen            2                8.5    14/18          46.6s
#10   Qwen3.5-27B medium              Qwen            1                8.4    13/18          53.0s
#13   GLM 5 medium                    Z.ai            1                8.4    13/18          23.3s
#14   Gemma 4 31B medium              Google          1                8.3    13/18          24.9s
#19   Qwen3.5-122B-A10B medium        Qwen            2                8.1    13/18          31.4s
#24   Gemma 4 26B A4B medium          Google          2                8.0    13/18          25.0s
#26   Claude Sonnet 4.6 medium        Anthropic       1                8.0    13/18          12.7s
#18   GLM 5 Turbo medium              Z.ai            1                8.1    12/18          17.7s
#23   MiMo-V2-Pro medium              Xiaomi          1                8.1    12/18          12.3s
#27   DeepSeek V3.2 medium            DeepSeek        2                8.0    12/18          46.4s
#33   GLM 5.1 medium                  Z.ai            2                7.8    12/18          24.1s
#32   Qwen3.5-Flash medium            Qwen            4                7.8    11/18          66.7s
#34   Kimi K2.6 medium                Moonshot AI     2                7.7    11/18          45.2s
#39   Seed-2.0-Mini medium            Bytedance Seed  4                7.5    11/18          69.7s
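
Because the table is ordered by Tests Correct rather than by timeouts, it can help to re-sort the same rows by Timed out Count. The following is a minimal Python sketch of that re-sort, not part of the AI Benchy site. The `Row` type, the `ROWS` list, and the `by_timeouts` helper are hypothetical names introduced for illustration, and the values are copied by hand from the rows listed above.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    """One leaderboard row (hypothetical structure, subset of the columns)."""
    model: str
    timed_out: int       # "Timed out Count" column
    tests_correct: int   # numerator of "Tests Correct" (out of 18)


# Values copied by hand from the listing above, in the listed order.
ROWS: List[Row] = [
    Row("Claude Opus 4.7", 1, 16),
    Row("Qwen3.5 Plus 2026-02-15", 2, 14),
    Row("Qwen3.5-27B", 1, 13),
    Row("GLM 5", 1, 13),
    Row("Gemma 4 31B", 1, 13),
    Row("Qwen3.5-122B-A10B", 2, 13),
    Row("Gemma 4 26B A4B", 2, 13),
    Row("Claude Sonnet 4.6", 1, 13),
    Row("GLM 5 Turbo", 1, 12),
    Row("MiMo-V2-Pro", 1, 12),
    Row("DeepSeek V3.2", 2, 12),
    Row("GLM 5.1", 2, 12),
    Row("Qwen3.5-Flash", 4, 11),
    Row("Kimi K2.6", 2, 11),
    Row("Seed-2.0-Mini", 4, 11),
]


def by_timeouts(rows: List[Row]) -> List[Row]:
    """Sort by timed-out count (descending), then tests correct (descending).

    sorted() is stable, so rows that tie on both keys keep their listed order.
    """
    return sorted(rows, key=lambda r: (-r.timed_out, -r.tests_correct))


if __name__ == "__main__":
    for row in by_timeouts(ROWS):
        print(f"{row.timed_out}  {row.tests_correct}/18  {row.model}")
```

With the rows as listed, the two models that show four timeouts each (Qwen3.5-Flash and Seed-2.0-Mini) sort to the top.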

[Charts: Top Models by Timed out Count; Timed out Count vs Score; Top Models by Response Time (avg)]