AI BENCHY Category Failures
Domain specific
Timed out
Domain specific
Timed out
See which AI models are most likely to hit Timed out on Domain specific, so you can spot weak points faster. Sort by: Response Time (avg) ↓.
Related Failure Reasons
Related Categories
| Rank | Model | Company | Timed out Count | Category Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|
| #43 | MiniMax M2.5 medium | Minimax | 1 | 10.0 | 0/3 | 237.3s |
| #34 | GPT-5 Nano medium | OpenAI | 1 | 4.0 | 1/3 | 204.0s |
| #24 | Qwen3.5-Flash medium | Qwen | 1 | 4.0 | 1/3 | 146.5s |
| #28 | Kimi K2.5 medium | Moonshot AI | 1 | 10.0 | 0/3 | 137.3s |
| #30 | Grok 4.1 Fast medium | X AI | 1 | 4.0 | 1/3 | 121.8s |
| #35 | Qwen3.5-35B-A3B medium | Qwen | 2 | 10.0 | 0/3 | 88.3s |
| #7 | Qwen3.5-27B medium | Qwen | 1 | 4.0 | 1/3 | 79.5s |
| #27 | GPT-5.2 medium | OpenAI | 1 | 4.0 | 1/3 | 77.8s |
| #32 | GPT-5 Mini medium | OpenAI | 1 | 10.0 | 0/3 | 44.6s |
| #18 | DeepSeek V3.2 medium | DeepSeek | 1 | 4.0 | 1/3 | 39.3s |
| #4 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 1 | 4.0 | 1/3 | 17.5s |
| #11 | Claude Sonnet 4.6 medium | Anthropic | 1 | 10.0 | 0/3 | 0ms |
| #14 | GLM 5 medium | Z.ai | 1 | 10.0 | 0/3 | 0ms |
| #23 | Seed-2.0-Mini medium | Bytedance Seed | 3 | 10.0 | 0/3 | 0ms |