AI BENCHY
AI BENCHY Category Failures

Domain specific: Timed out

See which AI models most often time out on Domain specific tests, so you can spot weak points faster.

Models Shown: 15
Total Failures: 34
Most Affected Model: Seed-2.0-Mini (3)

| Rank | Model | Company | Timed out Count | Category Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|
| #73 | Seed-2.0-Mini medium | Bytedance Seed | 3 | 3.0 | 0/3 | 0ms |
| #161 | Qwen3.5-9B medium | Qwen | 3 | 3.6 | 0/3 | 137.7s |
| #60 | Kimi K2.6 medium | Moonshot AI | 2 | 5.3 | 1/3 | 202.4s |
| #66 | Qwen3.5-35B-A3B medium | Qwen | 2 | 4.1 | 0/3 | 88.3s |
| #67 | MiniMax M3 medium | Minimax | 2 | 5.5 | 1/3 | 233.1s |
| #130 | MiniMax M2.7 medium | Minimax | 2 | 3.0 | 0/3 | 19.0s |
| #11 | Claude Opus 4.7 medium | Anthropic | 1 | 7.7 | 2/3 | 1.17s |
| #17 | GLM 5 medium | Z.ai | 1 | 3.5 | 0/3 | 0ms |
| #23 | GLM 5 Turbo medium | Z.ai | 1 | 2.9 | 0/3 | 71.1s |
| #25 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 1 | 5.3 | 1/3 | 17.5s |
| #30 | Qwen3.5-27B medium | Qwen | 1 | 5.3 | 1/3 | 79.5s |
| #37 | Gemma 4 26B A4B medium | Google | 1 | 2.9 | 0/3 | 23.6s |
| #42 | GPT-5.2 medium | OpenAI | 1 | 5.9 | 1/3 | 77.8s |
| #49 | Qwen3.5-Flash medium | Qwen | 1 | 5.3 | 1/3 | 146.5s |
| #51 | Mimo V2 PRO medium | Xiaomi | 1 | 5.3 | 1/3 | 8.82s |

[Charts: Top Models by Timed out Count; Timed out Count vs Score; Top Models by Response Time (avg); Top Models by Estimated Wasted Cost]