AI BENCHY category
Instruction following leaderboard
See which AI models perform best at instruction following, which stay consistent, and where the biggest gaps are. Sorted by: Response time (average) ↑.
| Rank | Model | Company | Instruction following score | Overall score | Correct tests | Response time (average) |
|---|---|---|---|---|---|---|
| #93 | Qwen3.6 Plus Preview medium | Qwen | 6.5 | 6.3 | 1/2 | 3.40s |
| #63 | GPT-5.3 Chat none | OpenAI | 9.8 | 7.2 | 2/2 | 3.51s |
| #84 | Grok 4.20 Multi Agent Beta medium | X AI | 9.8 | 6.6 | 2/2 | 3.52s |
| #59 | GLM 5V Turbo medium | Z.ai | 9.9 | 7.2 | 2/2 | 3.74s |
| #6 | GPT-5.5 low | OpenAI | 9.9 | 9.0 | 2/2 | 3.74s |
| #1 | Gemini 3 Flash Preview medium | | 10.0 | 9.8 | 2/2 | 4.04s |
| #79 | Hunter Alpha medium | OpenRouter | 9.9 | 6.7 | 2/2 | 4.18s |
| #101 | Mimo V2 Omni none | Xiaomi | 6.5 | 6.0 | 1/2 | 4.26s |
| #65 | Grok 4.20 medium | X AI | 9.8 | 7.1 | 2/2 | 4.26s |
| #64 | MiMo-V2-Flash medium | Xiaomi | 10.0 | 7.2 | 2/2 | 4.28s |
| #92 | Laguna M.1 medium | Poolside | 10.0 | 6.4 | 2/2 | 4.30s |
| #86 | Grok 4.1 Fast medium | X AI | 6.5 | 6.5 | 1/2 | 4.63s |
| #62 | Step 3.5 Flash medium | Stepfun | 8.3 | 7.2 | 1/2 | 4.78s |
| #13 | Grok 4.20 Beta medium | X AI | 9.8 | 8.5 | 2/2 | 4.89s |
| #80 | Mimo V2 Omni medium | Xiaomi | 8.3 | 6.7 | 1/2 | 4.99s |