AI BENCHY Failures
Timeout Failures
See which AI models time out most often so you can spot reliability risks before choosing one. Sorted by: Correct tests ↑ (ascending).
| Rank | Model | Company | Timeout count | Score | Correct tests | Response time (average) |
|---|---|---|---|---|---|---|
| #161 | Qwen3.5-9B medium | Qwen | 11 | 4.2 | 3/21 | 82.2s |
| #150 | Qwen3 Coder Next medium | Qwen | 1 | 4.6 | 4/21 | 8.58s |
| #158 | GLM 4.7 Flash medium | Z.ai | 2 | 4.4 | 4/21 | 35.1s |
| #129 | MiniMax M2.5 medium | Minimax | 4 | 5.3 | 5/21 | 65.4s |
| #130 | MiniMax M2.7 medium | Minimax | 2 | 5.3 | 5/21 | 38.2s |
| #102 | Gemma 4 26B A4B none | | 1 | 6.0 | 8/21 | 5.91s |
| #103 | DeepSeek V4 Pro high | DeepSeek | 3 | 6.0 | 8/21 | 65.2s |
| #105 | Nemotron 3 Super medium | NVIDIA | 1 | 5.8 | 8/21 | 32.0s |
| #94 | GPT-5 Nano medium | OpenAI | 1 | 6.3 | 9/21 | 42.5s |
| #79 | Hunter Alpha medium | OpenRouter | 2 | 6.7 | 8/18 | 10.3s |
| #86 | Grok 4.1 Fast medium | X AI | 1 | 6.5 | 9/19 | 23.8s |
| #76 | Kimi K2.5 medium | Moonshot AI | 2 | 6.8 | 10/21 | 98.4s |
| #66 | Qwen3.5-35B-A3B medium | Qwen | 5 | 7.1 | 11/21 | 72.6s |
| #67 | MiniMax M3 medium | Minimax | 3 | 7.1 | 11/21 | 68.2s |
| #72 | DeepSeek V3.2 medium | DeepSeek | 2 | 7.0 | 11/21 | 68.7s |
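
The table is ordered by correct tests, not by how often a model times out. As a rough sketch, the rows can be re-ranked by the share of tests that ended in a timeout. This assumes the timeout count is measured against the same total that appears as the denominator of the correct-tests column (for example, 21 in `3/21`), which the page does not state. The `Row` class and `timeout_rate` property are hypothetical names used only for illustration, and only a handful of rows from the table are included.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One leaderboard row (hypothetical structure, values copied from the table)."""
    model: str
    timeouts: int
    correct: int
    total: int

    @property
    def timeout_rate(self) -> float:
        # Assumption: timeouts are counted out of the same total as correct tests.
        return self.timeouts / self.total


rows = [
    Row("Qwen3.5-9B medium", 11, 3, 21),
    Row("Qwen3.5-35B-A3B medium", 5, 11, 21),
    Row("MiniMax M2.5 medium", 4, 5, 21),
    Row("Hunter Alpha medium", 2, 8, 18),
    Row("Qwen3 Coder Next medium", 1, 4, 21),
]

# Highest timeout share first, so the least reliable models surface at the top.
ranked = sorted(rows, key=lambda r: r.timeout_rate, reverse=True)

for r in ranked:
    print(f"{r.model}: {r.timeout_rate:.1%} timeouts, {r.correct}/{r.total} correct")
```

Using a rate rather than the raw count keeps rows with fewer total tests, such as the 18-test row, comparable with the 21-test rows.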