AI BENCHY
❤️ Made by XCS

Timed out Failures

See which AI models time out most often, so you can spot reliability risks before choosing one.

Models Shown: 15

Total Failures: 25

Most Affected Model: Seed-2.0-Mini (4)

| Rank | Model | Company | Timed out Count | Avg Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|
| #23 | Seed-2.0-Mini medium | Bytedance Seed | 4 | 6.9 | 10/16 | 65.1s |
| #35 | Qwen3.5-35B-A3B medium | Qwen | 4 | 5.5 | 8/16 | 43.9s |
| #24 | Qwen3.5-Flash medium | Qwen | 3 | 6.9 | 10/16 | 70.8s |
| #4 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 2 | 8.3 | 13/16 | 34.5s |
| #43 | MiniMax M2.5 medium | Minimax | 2 | 4.7 | 5/16 | 43.0s |
| #7 | Qwen3.5-27B medium | Qwen | 1 | 8.2 | 12/16 | 52.1s |
| #10 | Qwen3.5-122B-A10B medium | Qwen | 1 | 7.7 | 12/16 | 29.7s |
| #11 | Claude Sonnet 4.6 medium | Anthropic | 1 | 7.7 | 12/16 | 11.2s |
| #14 | GLM 5 medium | Z.ai | 1 | 7.4 | 11/16 | 16.2s |
| #18 | DeepSeek V3.2 medium | DeepSeek | 1 | 7.3 | 11/16 | 39.5s |
| #27 | GPT-5.2 medium | OpenAI | 1 | 6.5 | 10/16 | 15.3s |
| #28 | Kimi K2.5 medium | Moonshot AI | 1 | 6.4 | 9/16 | 69.8s |
| #30 | Grok 4.1 Fast medium | X AI | 1 | 6.2 | 9/16 | 26.3s |
| #32 | GPT-5 Mini medium | OpenAI | 1 | 6.0 | 8/16 | 25.1s |
| #34 | GPT-5 Nano medium | OpenAI | 1 | 5.5 | 7/16 | 47.9s |
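
The summary figures on this page can be checked against the table rows. The sketch below is only an illustration: the `timed_out` dictionary and the `summarize` helper are hypothetical names, and the counts are copied by hand from the Timed out Count column. It assumes that "Total Failures" is the plain sum of that column and that "Models Shown" is the row count, which the numbers appear to support.

```python
# Timed out counts copied from the table above (model name -> count).
# The dictionary and the helper below are illustrative, not part of AI BENCHY.
timed_out = {
    "Seed-2.0-Mini": 4,
    "Qwen3.5-35B-A3B": 4,
    "Qwen3.5-Flash": 3,
    "Qwen3.5 Plus 2026-02-15": 2,
    "MiniMax M2.5": 2,
    "Qwen3.5-27B": 1,
    "Qwen3.5-122B-A10B": 1,
    "Claude Sonnet 4.6": 1,
    "GLM 5": 1,
    "DeepSeek V3.2": 1,
    "GPT-5.2": 1,
    "Kimi K2.5": 1,
    "Grok 4.1 Fast": 1,
    "GPT-5 Mini": 1,
    "GPT-5 Nano": 1,
}


def summarize(counts):
    """Return the row count, the summed failures, and every model tied for the highest count."""
    top = max(counts.values())
    return {
        "models_shown": len(counts),
        "total_failures": sum(counts.values()),
        "most_affected": [model for model, c in counts.items() if c == top],
    }


summary = summarize(timed_out)
print(summary)
```

Under these assumptions the sketch reports 15 models and 25 failures, matching the page. It also returns two models tied at 4 (Seed-2.0-Mini and Qwen3.5-35B-A3B), while the page names only Seed-2.0-Mini; how the page breaks that tie is not stated.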

Charts (not reproduced here): Top Models by Timed out Count; Timed out Count vs Avg Score; Top Models by Response Time (avg)