AI BENCHY

AI BENCHY Category Failures

Domain specific: Timed out

See which AI models most often time out on Domain specific tests, so you can spot weak points faster.

Models Shown: 11
Total Failures: 34
Most Affected Model: Seed-2.0-Mini (3)

| Rank | Model | Company | Timed out Count | Category Score | Tests Correct | Response Time (avg) |
|------|-------|---------|-----------------|----------------|---------------|---------------------|
| #52 | Claude Sonnet 4.6 medium | Anthropic | 1 | 2.9 | 0/3 | 0ms |
| #54 | GPT-5 Mini medium | OpenAI | 1 | 3.6 | 0/3 | 44.6s |
| #55 | GLM 5.1 medium | Z.ai | 1 | 5.3 | 1/3 | 29.8s |
| #72 | DeepSeek V3.2 medium | DeepSeek | 1 | 2.9 | 0/3 | 24.3s |
| #76 | Kimi K2.5 medium | Moonshot AI | 1 | 3.5 | 0/3 | 137.3s |
| #79 | Hunter Alpha medium | OpenRouter | 1 | 3.0 | 0/3 | 10.5s |
| #86 | Grok 4.1 Fast medium | X AI | 1 | 5.8 | 1/3 | 121.8s |
| #94 | GPT-5 Nano medium | OpenAI | 1 | 5.2 | 1/3 | 204.0s |
| #103 | DeepSeek V4 Pro high | DeepSeek | 1 | 2.9 | 0/3 | 205.7s |
| #105 | Nemotron 3 Super medium | NVIDIA | 1 | 2.9 | 0/3 | 16.2s |
| #129 | MiniMax M2.5 medium | Minimax | 1 | 2.9 | 0/3 | 237.3s |
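
To work with these rows outside the page, the mixed duration units ("0ms", "44.6s") and the "0/3" style fractions need to be converted to numbers first. The following is a minimal sketch, not part of AI Benchy itself: the helper names `to_seconds`, `pass_rate`, and `slowest` are hypothetical, and the rows are copied by hand from the table above. It ranks the listed models by average response time.

```python
import re

# Rows copied from the table above: (model, tests correct, avg response time).
ROWS = [
    ("Claude Sonnet 4.6 medium", "0/3", "0ms"),
    ("GPT-5 Mini medium", "0/3", "44.6s"),
    ("GLM 5.1 medium", "1/3", "29.8s"),
    ("DeepSeek V3.2 medium", "0/3", "24.3s"),
    ("Kimi K2.5 medium", "0/3", "137.3s"),
    ("Hunter Alpha medium", "0/3", "10.5s"),
    ("Grok 4.1 Fast medium", "1/3", "121.8s"),
    ("GPT-5 Nano medium", "1/3", "204.0s"),
    ("DeepSeek V4 Pro high", "0/3", "205.7s"),
    ("Nemotron 3 Super medium", "0/3", "16.2s"),
    ("MiniMax M2.5 medium", "0/3", "237.3s"),
]


def to_seconds(text: str) -> float:
    """Convert a duration such as '44.6s' or '0ms' to seconds (hypothetical helper)."""
    match = re.fullmatch(r"([\d.]+)(ms|s)", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = float(match.group(1)), match.group(2)
    return value / 1000 if unit == "ms" else value


def pass_rate(text: str) -> float:
    """Convert a 'correct/total' string such as '1/3' to a fraction (hypothetical helper)."""
    correct, total = text.split("/")
    return int(correct) / int(total)


def slowest(rows, n=3):
    """Return the n rows with the highest average response time."""
    return sorted(rows, key=lambda row: to_seconds(row[2]), reverse=True)[:n]


for model, correct, duration in slowest(ROWS):
    print(f"{model}: {to_seconds(duration):.1f}s avg, {pass_rate(correct):.0%} correct")
```

Note that the 0ms entry is kept as reported; the page does not explain it, so the sketch converts it to zero seconds without reinterpreting it.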

[Charts: Top Models by Timed out Count; Timed out Count vs Score; Top Models by Response Time (avg); Top Models by Estimated Wasted Cost]