AI BENCHY Failures
Timeout Failures
See which AI models hit timeouts most often so you can spot reliability risks before choosing one. Sort by: Failure count ↑.
| Rank | Model | Company | Timeouts | Score | Correct tests | Response time (avg) |
|---|---|---|---|---|---|---|
| #150 | Qwen3 Coder Next medium | Qwen | 1 | 4.6 | 4/21 | 8.58s |
| #25 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 2 | 7.9 | 14/21 | 73.8s |
| #27 | Gemma 4 31B medium | | 2 | 7.8 | 14/21 | 56.5s |
| #29 | Qwen3.5-122B-A10B medium | Qwen | 2 | 7.8 | 14/21 | 42.5s |
| #37 | Gemma 4 26B A4B medium | | 2 | 7.6 | 14/21 | 63.4s |
| #55 | GLM 5.1 medium | Z.ai | 2 | 7.3 | 12/21 | 33.7s |
| #72 | DeepSeek V3.2 medium | DeepSeek | 2 | 7.0 | 11/21 | 68.7s |
| #76 | Kimi K2.5 medium | Moonshot AI | 2 | 6.8 | 10/21 | 98.4s |
| #79 | Hunter Alpha medium | OpenRouter | 2 | 6.7 | 8/18 | 10.3s |
| #130 | MiniMax M2.7 medium | Minimax | 2 | 5.3 | 5/21 | 38.2s |
| #158 | GLM 4.7 Flash medium | Z.ai | 2 | 4.4 | 4/21 | 35.1s |
| #49 | Qwen3.5-Flash medium | Qwen | 3 | 7.4 | 12/21 | 63.3s |
| #60 | Kimi K2.6 medium | Moonshot AI | 3 | 7.2 | 12/21 | 71.7s |
| #67 | MiniMax M3 medium | Minimax | 3 | 7.1 | 11/21 | 68.2s |
| #103 | DeepSeek V4 Pro high | DeepSeek | 3 | 6.0 | 8/21 | 65.2s |