AI BENCHY
❤️ Made by XCS

AI BENCHY Failures

Timed out Failures

See which AI models time out most often, so you can spot reliability risks before choosing one. Rows are sorted by Avg Score, descending.

Models Shown: 15
Total Failures: 25
Most Affected Model: Qwen3.5 Plus 2026-02-15 (2)

| Rank | Model | Company | Timed out Count | Avg Score | Tests Correct | Response Time (avg) |
|------|-------|---------|-----------------|-----------|---------------|---------------------|
| #4 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 2 | 8.3 | 13/16 | 34.5s |
| #7 | Qwen3.5-27B medium | Qwen | 1 | 8.2 | 12/16 | 52.1s |
| #10 | Qwen3.5-122B-A10B medium | Qwen | 1 | 7.7 | 12/16 | 29.7s |
| #11 | Claude Sonnet 4.6 medium | Anthropic | 1 | 7.7 | 12/16 | 11.2s |
| #14 | GLM 5 medium | Z.ai | 1 | 7.4 | 11/16 | 16.2s |
| #18 | DeepSeek V3.2 medium | DeepSeek | 1 | 7.3 | 11/16 | 39.5s |
| #23 | Seed-2.0-Mini medium | Bytedance Seed | 4 | 6.9 | 10/16 | 65.1s |
| #24 | Qwen3.5-Flash medium | Qwen | 3 | 6.9 | 10/16 | 70.8s |
| #27 | GPT-5.2 medium | OpenAI | 1 | 6.5 | 10/16 | 15.3s |
| #28 | Kimi K2.5 medium | Moonshot AI | 1 | 6.4 | 9/16 | 69.8s |
| #30 | Grok 4.1 Fast medium | X AI | 1 | 6.2 | 9/16 | 26.3s |
| #32 | GPT-5 Mini medium | OpenAI | 1 | 6.0 | 8/16 | 25.1s |
| #34 | GPT-5 Nano medium | OpenAI | 1 | 5.5 | 7/16 | 47.9s |
| #35 | Qwen3.5-35B-A3B medium | Qwen | 4 | 5.5 | 8/16 | 43.9s |
| #43 | MiniMax M2.5 medium | Minimax | 2 | 4.7 | 5/16 | 43.0s |
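
The summary figures can be rechecked directly from the table rows. The sketch below is only an illustration: the names `ROWS` and `summarize` are hypothetical, and the counts are copied by hand from the table above. It assumes "most affected" means the highest raw Timed out Count. Under that assumption the maximum is 4, shared by two models, so the page's Most Affected Model entry (Qwen3.5 Plus 2026-02-15, with 2) presumably follows a different criterion that the page does not state.

```python
# (model, timed_out_count) pairs copied by hand from the table above.
ROWS = [
    ("Qwen3.5 Plus 2026-02-15", 2),
    ("Qwen3.5-27B", 1),
    ("Qwen3.5-122B-A10B", 1),
    ("Claude Sonnet 4.6", 1),
    ("GLM 5", 1),
    ("DeepSeek V3.2", 1),
    ("Seed-2.0-Mini", 4),
    ("Qwen3.5-Flash", 3),
    ("GPT-5.2", 1),
    ("Kimi K2.5", 1),
    ("Grok 4.1 Fast", 1),
    ("GPT-5 Mini", 1),
    ("GPT-5 Nano", 1),
    ("Qwen3.5-35B-A3B", 4),
    ("MiniMax M2.5", 2),
]


def summarize(rows):
    """Recompute the summary cards from (model, count) rows.

    Assumption: "most affected" is taken to mean the highest raw count;
    ties are all reported, in table order.
    """
    counts = [count for _, count in rows]
    top = max(counts)
    return {
        "models_shown": len(rows),
        "total_failures": sum(counts),
        "max_count": top,
        "models_at_max": [model for model, count in rows if count == top],
    }


summary = summarize(ROWS)
print(summary)
```

The recomputed row count and failure total match the Models Shown and Total Failures figures reported on the page (15 and 25).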

[Charts: Top Models by Timed out Count; Timed out Count vs Avg Score; Top Models by Response Time (avg)]