AI BENCHY Category
Tool Calling Leaderboard
See which AI models perform best at tool calling, which ones stay consistent, and where the biggest gaps are. Sorted by: Response time (average) ↓.
| Rank | Model | Company | Tool calling score | Overall score | Correct tests | Response time (average) |
|---|---|---|---|---|---|---|
| #76 | Kimi K2.5 none | Moonshot AI | 10.0 | 5.5 | 1/1 | 14.0s |
| #47 | Grok 4.20 medium | X AI | 3.0 | 7.0 | 0/1 | 13.7s |
| #16 | GPT-5.4 medium | OpenAI | 10.0 | 8.2 | 1/1 | 13.3s |
| #31 | GLM 5V Turbo medium | Z.ai | 7.0 | 7.8 | 0/1 | 12.5s |
| #25 | Grok 4.20 Beta medium | X AI | 3.0 | 8.0 | 0/1 | 12.4s |
| #6 | Seed-2.0-Lite medium | Bytedance Seed | 10.0 | 8.6 | 1/1 | 12.4s |
| #80 | MiniMax M2.7 medium | Minimax | 4.7 | 5.3 | 0/1 | 12.0s |
| #12 | Gemini 3 PRO Preview medium | | 10.0 | 8.4 | 1/1 | 12.0s |
| #30 | Step 3.5 Flash medium | Stepfun | 10.0 | 7.9 | 1/1 | 11.9s |
| #64 | DeepSeek V3.2 none | DeepSeek | 10.0 | 6.1 | 1/1 | 11.8s |
| #53 | GLM 5 none | Z.ai | 10.0 | 6.6 | 1/1 | 11.1s |
| #35 | MiMo-V2-Omni medium | Xiaomi | 10.0 | 7.7 | 1/1 | 11.1s |
| #75 | GLM 5.1 none | Z.ai | 10.0 | 5.6 | 1/1 | 10.7s |
| #1 | Gemini 3 Flash Preview medium | | 10.0 | 10.0 | 1/1 | 10.6s |
| #32 | Qwen3.5-Flash medium | Qwen | 10.0 | 7.8 | 1/1 | 10.3s |