AI BENCHY Category
Puzzle Solving Leaderboard
See which AI models perform best at Puzzle Solving, which ones stay consistent, and where the biggest gaps are. Sorted by: Correct attempts ↑.
| Rank | Model | Company | Puzzle Solving score | Score | Correct attempts | Response time (avg) |
|---|---|---|---|---|---|---|
| #163 | Granite 4.1 8B none | IBM Granite | 3.2 | 4.0 | 0/3 | 608ms |
| #22 | Step 3.7 Flash medium | Stepfun | 5.7 | 8.0 | 1/3 | 6.19s |
| #38 | Grok 4.3 medium | X AI | 5.9 | 7.6 | 1/3 | 22.5s |
| #41 | Nemotron 3 Ultra 550b A55b medium | NVIDIA | 5.5 | 7.5 | 1/3 | 3.54s |
| #43 | MiMo-V2.5-Pro medium | Xiaomi | 6.7 | 7.5 | 1/3 | 5.31s |
| #51 | Mimo V2 PRO medium | Xiaomi | 6.4 | 7.4 | 1/3 | 5.08s |
| #53 | Gemini 3.1 Flash Lite high | | 5.7 | 7.3 | 1/3 | 50.8s |
| #54 | GPT-5 Mini medium | OpenAI | 5.6 | 7.3 | 1/3 | 15.2s |
| #57 | Step 3.7 Flash low | Stepfun | 5.5 | 7.3 | 1/3 | 1.84s |
| #60 | Kimi K2.6 medium | Moonshot AI | 6.0 | 7.2 | 1/3 | 25.1s |
| #62 | Step 3.5 Flash medium | Stepfun | 5.3 | 7.2 | 1/3 | 7.22s |
| #71 | Step 3.7 Flash high | Stepfun | 5.3 | 7.0 | 1/3 | 10.2s |
| #72 | DeepSeek V3.2 medium | DeepSeek | 7.0 | 7.0 | 1/3 | 37.7s |
| #75 | Ring-2.6-1T medium | Inclusionai | 5.9 | 6.9 | 1/3 | 20.7s |
| #76 | Kimi K2.5 medium | Moonshot AI | 5.3 | 6.8 | 1/3 | 43.2s |