AI BENCHY Category Failures
Combined: Wrong answer
See which AI models are most likely to give a wrong answer in the Combined category, so you can spot weak points faster. Rows are sorted by average response time, ascending.
Failure Reasons
| Rank | Model | Company | Wrong Answer Count | Category Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|
| #132 | Mistral Small 4 medium | Mistral | 1 | 3.0 | 0/1 | 25.3s |
| #102 | Gemma 4 26B A4B none | | 1 | 3.0 | 0/1 | 30.5s |
| #156 | Hy3 preview none | Tencent | 1 | 3.0 | 0/1 | 35.8s |
| #140 | Qwen3 Coder Next none | Qwen | 1 | 3.0 | 0/1 | 45.1s |
| #131 | Qwen3.5-122B-A10B none | Qwen | 1 | 3.0 | 0/1 | 46.0s |
| #117 | Qwen3.5-35B-A3B none | Qwen | 1 | 3.0 | 0/1 | 47.4s |
| #51 | Mimo V2 PRO medium | Xiaomi | 1 | 4.7 | 0/1 | 64.7s |