AI BENCHY Failures
Timeout Failures
See which AI models hit timeouts most often so you can spot reliability risks before choosing one. Sorted by: Response time (average) ↓.
Categories
Domain-specific: 31; Coding: 12; Puzzle solving: 6; General intelligence: 4; Anti-AI techniques: 4; Mixed: 2; Data analysis and extraction: 1; Instruction following: 1
| Rank | Model | Company | Timeouts | Score | Correct tests | Response time (average) |
|---|---|---|---|---|---|---|
| #97 | Qwen3.5-9B medium | Qwen | 11 | 4.4 | 3/18 | 73.6s |
| #46 | Kimi K2.5 medium | Moonshot AI | 2 | 7.0 | 9/18 | 72.4s |
| #39 | Seed-2.0-Mini medium | Bytedance Seed | 4 | 7.5 | 11/18 | 69.7s |
| #32 | Qwen3.5-Flash medium | Qwen | 4 | 7.8 | 11/18 | 66.7s |
| #10 | Qwen3.5-27B medium | Qwen | 1 | 8.4 | 13/18 | 53.0s |
| #8 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 2 | 8.5 | 14/18 | 46.6s |
| #27 | DeepSeek V3.2 medium | DeepSeek | 2 | 8.0 | 12/18 | 46.4s |
| #34 | Kimi K2.6 medium | Moonshot AI | 2 | 7.7 | 11/18 | 45.2s |
| #43 | Qwen3.5-35B-A3B medium | Qwen | 4 | 7.4 | 10/18 | 44.5s |
| #57 | GPT-5 Nano medium | OpenAI | 1 | 6.3 | 7/18 | 44.1s |
| #71 | MiniMax M2.5 medium | Minimax | 4 | 5.7 | 5/18 | 39.6s |
| #93 | GLM 4.7 Flash medium | Z.ai | 1 | 4.6 | 4/18 | 32.3s |
| #19 | Qwen3.5-122B-A10B medium | Qwen | 2 | 8.1 | 13/18 | 31.4s |
| #80 | MiniMax M2.7 medium | Minimax | 2 | 5.3 | 4/18 | 31.1s |
| #24 | Gemma 4 26B A4B medium | | 2 | 8.0 | 13/18 | 25.0s |