AI BENCHY
❤️ Made by XCS

AI BENCHY Failures

Timed out Failures

See which AI models time out most often, so you can spot reliability risks before choosing one. Sorted by: Tests Correct (ascending).

Models Shown

15

Total Failures

25

Most Affected Model

Qwen3.5-35B-A3B and Seed-2.0-Mini (4 timeouts each)
| Rank | Model | Company | Timed out Count | Avg Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|
| #43 | MiniMax M2.5 medium | Minimax | 2 | 4.7 | 5/16 | 43.0s |
| #34 | GPT-5 Nano medium | OpenAI | 1 | 5.5 | 7/16 | 47.9s |
| #32 | GPT-5 Mini medium | OpenAI | 1 | 6.0 | 8/16 | 25.1s |
| #35 | Qwen3.5-35B-A3B medium | Qwen | 4 | 5.5 | 8/16 | 43.9s |
| #28 | Kimi K2.5 medium | Moonshot AI | 1 | 6.4 | 9/16 | 69.8s |
| #30 | Grok 4.1 Fast medium | X AI | 1 | 6.2 | 9/16 | 26.3s |
| #23 | Seed-2.0-Mini medium | Bytedance Seed | 4 | 6.9 | 10/16 | 65.1s |
| #24 | Qwen3.5-Flash medium | Qwen | 3 | 6.9 | 10/16 | 70.8s |
| #27 | GPT-5.2 medium | OpenAI | 1 | 6.5 | 10/16 | 15.3s |
| #14 | GLM 5 medium | Z.ai | 1 | 7.4 | 11/16 | 16.2s |
| #18 | DeepSeek V3.2 medium | DeepSeek | 1 | 7.3 | 11/16 | 39.5s |
| #7 | Qwen3.5-27B medium | Qwen | 1 | 8.2 | 12/16 | 52.1s |
| #10 | Qwen3.5-122B-A10B medium | Qwen | 1 | 7.7 | 12/16 | 29.7s |
| #11 | Claude Sonnet 4.6 medium | Anthropic | 1 | 7.7 | 12/16 | 11.2s |
| #4 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 2 | 8.3 | 13/16 | 34.5s |
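
The summary figures can be cross-checked against the table rows. The following is a minimal Python sketch, assuming the model names and timed-out counts are transcribed by hand from the table; the `rows` list and the `summarize` helper are illustrative names, not part of the AI BENCHY site or any API it exposes.

```python
# Rows transcribed by hand from the table: (model, timed-out count).
# The "medium" suffix is dropped from the model names for brevity.
rows = [
    ("MiniMax M2.5", 2),
    ("GPT-5 Nano", 1),
    ("GPT-5 Mini", 1),
    ("Qwen3.5-35B-A3B", 4),
    ("Kimi K2.5", 1),
    ("Grok 4.1 Fast", 1),
    ("Seed-2.0-Mini", 4),
    ("Qwen3.5-Flash", 3),
    ("GPT-5.2", 1),
    ("GLM 5", 1),
    ("DeepSeek V3.2", 1),
    ("Qwen3.5-27B", 1),
    ("Qwen3.5-122B-A10B", 1),
    ("Claude Sonnet 4.6", 1),
    ("Qwen3.5 Plus 2026-02-15", 2),
]


def summarize(rows):
    """Recompute the summary figures from (model, count) pairs."""
    counts = [count for _, count in rows]
    top = max(counts)
    return {
        "models_shown": len(rows),
        "total_failures": sum(counts),
        "max_count": top,
        # Keep every model tied at the maximum, in table order.
        "most_affected": [model for model, count in rows if count == top],
    }


summary = summarize(rows)
print(summary)
```

Run against these rows, the sketch reports 15 models and 25 total timeouts, with Qwen3.5-35B-A3B and Seed-2.0-Mini tied at 4 timeouts each.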

[Charts: Top Models by Timed out Count; Timed out Count vs Avg Score; Top Models by Response Time (avg)]