AI BENCHY Failures
Timeout Failures
See which AI models hit timeouts most often, so you can spot reliability risks before choosing one. Sorted by: Response time (average), ascending.
| Rank | Model | Company | Timeout count | Score | Correct tests | Response time (average) |
|---|---|---|---|---|---|---|
| #130 | MiniMax M2.7 medium | Minimax | 2 | 5.3 | 5/21 | 38.2s |
| #18 | Qwen3.7 Plus medium | Qwen | 1 | 8.2 | 15/21 | 38.9s |
| #29 | Qwen3.5-122B-A10B medium | Qwen | 2 | 7.8 | 14/21 | 42.5s |
| #94 | GPT-5 Nano medium | OpenAI | 1 | 6.3 | 9/21 | 42.5s |
| #27 | Gemma 4 31B medium | | 2 | 7.8 | 14/21 | 56.5s |
| #49 | Qwen3.5-Flash medium | Qwen | 3 | 7.4 | 12/21 | 63.3s |
| #37 | Gemma 4 26B A4B medium | | 2 | 7.6 | 14/21 | 63.4s |
| #103 | DeepSeek V4 Pro high | DeepSeek | 3 | 6.0 | 8/21 | 65.2s |
| #129 | MiniMax M2.5 medium | Minimax | 4 | 5.3 | 5/21 | 65.4s |
| #67 | MiniMax M3 medium | Minimax | 3 | 7.1 | 11/21 | 68.2s |
| #30 | Qwen3.5-27B medium | Qwen | 1 | 7.8 | 13/21 | 68.4s |
| #72 | DeepSeek V3.2 medium | DeepSeek | 2 | 7.0 | 11/21 | 68.7s |
| #60 | Kimi K2.6 medium | Moonshot AI | 3 | 7.2 | 12/21 | 71.7s |
| #62 | Step 3.5 Flash medium | Stepfun | 1 | 7.2 | 11/20 | 72.5s |
| #66 | Qwen3.5-35B-A3B medium | Qwen | 5 | 7.1 | 11/21 | 72.6s |
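
The table is ordered by average response time, while the stated goal is to find the models that time out most often. As a minimal sketch, assuming a handful of rows are copied by hand from the table, the following Python re-sorts them by timeout count. The function name `rank_by_timeouts` and the tuple layout are illustrative choices, not part of AI BENCHY.

```python
# Rows copied from the table above: (model, timeout_count, avg_response_seconds).
# Only a subset is included here to keep the sketch short.
rows = [
    ("MiniMax M2.7 medium", 2, 38.2),
    ("Qwen3.7 Plus medium", 1, 38.9),
    ("Qwen3.5-Flash medium", 3, 63.3),
    ("MiniMax M2.5 medium", 4, 65.4),
    ("Qwen3.5-35B-A3B medium", 5, 72.6),
]


def rank_by_timeouts(rows):
    """Sort by timeout count (highest first); ties go to the slower model."""
    return sorted(rows, key=lambda r: (-r[1], -r[2]))


ranked = rank_by_timeouts(rows)
for model, timeouts, avg_s in ranked:
    print(f"{model}: {timeouts} timeouts, {avg_s:.1f}s average")
```

Sorting on a tuple key keeps the tie-break rule explicit; a different tie-break (for example, by score) would only need a different second element in the key.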