AI BENCHY Category
Instruction Following Leaderboard
See which AI models perform best at instruction following, which stay consistent, and where the biggest gaps are. Sorted by: Response time (average) ↓.
| Rank | Model | Company | Instruction following score | Average score | Correct attempts | Response time (average) |
|---|---|---|---|---|---|---|
| #28 | Kimi K2.5 medium | Moonshot AI | 10.0 | 6.4 | 2/2 | 92.5s |
| #8 | Gemini 3.1 Flash Lite Preview high | Google | 9.0 | 8.2 | 1/2 | 70.1s |
| #24 | Qwen3.5-Flash medium | Qwen | 10.0 | 6.9 | 2/2 | 63.5s |
| #18 | DeepSeek V3.2 medium | DeepSeek | 10.0 | 7.3 | 2/2 | 35.8s |
| #4 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 10.0 | 8.3 | 2/2 | 31.9s |
| #35 | Qwen3.5-35B-A3B medium | Qwen | 10.0 | 5.5 | 2/2 | 24.4s |
| #7 | Qwen3.5-27B medium | Qwen | 10.0 | 8.2 | 2/2 | 19.7s |
| #23 | Seed-2.0-Mini medium | Bytedance Seed | 10.0 | 6.9 | 2/2 | 17.5s |
| #32 | GPT-5 Mini medium | OpenAI | 7.5 | 6.0 | 1/2 | 15.7s |
| #34 | GPT-5 Nano medium | OpenAI | 9.0 | 5.5 | 1/2 | 11.9s |
| #10 | Qwen3.5-122B-A10B medium | Qwen | 10.0 | 7.7 | 2/2 | 9.88s |
| #2 | Gemini 3.1 Pro Preview medium | Google | 10.0 | 9.4 | 2/2 | 9.56s |
| #37 | Qwen3.5-Flash none | Qwen | 5.0 | 5.2 | 1/2 | 8.81s |
| #48 | Qwen3 Coder Next none | Qwen | 4.5 | 4.0 | 0/2 | 7.71s |
| #39 | gpt-oss-120b medium | OpenAI | 9.5 | 5.1 | 2/2 | 7.63s |
| #50 | Qwen3 Coder Next medium | Qwen | 4.5 | 3.5 | 0/2 | 7.34s |
| #14 | GLM 5 medium | Z.ai | 10.0 | 7.4 | 2/2 | 7.25s |
| #5 | Gemini 3 Flash Preview low | Google | 9.5 | 8.2 | 2/2 | 7.02s |
| #1 | Gemini 3 Flash Preview medium | Google | 10.0 | 10.0 | 2/2 | 6.10s |
| #15 | GPT-5.2 Chat none | OpenAI | 6.0 | 7.4 | 1/2 | 5.46s |
| #30 | Grok 4.1 Fast medium | X AI | 5.5 | 6.2 | 1/2 | 5.30s |
| #13 | Step 3.5 Flash medium | Stepfun | 9.0 | 7.4 | 1/2 | 4.98s |
| #43 | MiniMax M2.5 medium | Minimax | 8.0 | 4.7 | 1/2 | 4.64s |
| #21 | MiMo-V2-Flash medium | Xiaomi | 10.0 | 7.2 | 2/2 | 4.28s |
| #19 | GPT-5.3 Chat none | OpenAI | 9.0 | 7.3 | 1/2 | 3.29s |
| #6 | Gemini 3 Pro Preview medium | Google | 9.5 | 8.2 | 2/2 | 3.26s |
| #27 | GPT-5.2 medium | OpenAI | 9.5 | 6.5 | 2/2 | 3.12s |
| #9 | GPT-5.4 medium | OpenAI | 10.0 | 8.0 | 2/2 | 3.11s |
| #3 | GPT-5.3-Codex medium | OpenAI | 10.0 | 8.4 | 2/2 | 3.04s |
| #52 | GLM 4.7 Flash medium | Z.ai | 5.0 | 3.1 | 1/2 | 2.97s |
| #46 | Kimi K2.5 none | Moonshot AI | 5.5 | 4.1 | 1/2 | 2.67s |
| #16 | Gemini 2.5 Flash medium | Google | 9.5 | 7.4 | 2/2 | 2.62s |
| #11 | Claude Sonnet 4.6 medium | Anthropic | 10.0 | 7.7 | 2/2 | 2.61s |
| #26 | Claude Opus 4.6 medium | Anthropic | 10.0 | 6.6 | 2/2 | 2.43s |
| #25 | Claude Sonnet 4.6 none | Anthropic | 5.5 | 6.8 | 1/2 | 1.96s |
| #12 | Gemini 3.1 Flash Lite Preview medium | Google | 10.0 | 7.5 | 2/2 | 1.91s |
| #29 | Qwen3.5 Plus 2026-02-15 none | Qwen | 10.0 | 6.2 | 2/2 | 1.67s |
| #20 | Gemini 3 Flash Preview none | Google | 5.5 | 7.2 | 1/2 | 1.58s |
| #33 | DeepSeek V3.2 none | DeepSeek | 10.0 | 5.5 | 2/2 | 1.52s |
| #17 | Gemini 3.1 Flash Lite Preview low | Google | 10.0 | 7.3 | 2/2 | 1.49s |
| #31 | GLM 5 none | Z.ai | 10.0 | 6.0 | 2/2 | 1.48s |
| #47 | GPT-4o-mini none | OpenAI | 4.5 | 4.0 | 0/2 | 1.27s |
| #22 | Gemini 3.1 Flash Lite Preview none | Google | 10.0 | 7.1 | 2/2 | 1.13s |
| #45 | Trinity Large Preview none | Arcee AI | 3.5 | 4.2 | 0/2 | 1.09s |
| #55 | LFM2-24B-A2B none | Liquid | 4.5 | 2.6 | 0/2 | 1.09s |
| #44 | GPT-5.4 none | OpenAI | 5.5 | 4.5 | 1/2 | 1.07s |
| #36 | Mercury 2 medium | Inception | 10.0 | 5.3 | 2/2 | 1.07s |
| #53 | Grok 4.1 Fast none | X AI | 10.0 | 2.9 | 0/2 | 923ms |
| #49 | GLM 4.7 Flash none | Z.ai | 5.5 | 3.9 | 1/2 | 888ms |
| #54 | MiMo-V2-Flash none | Xiaomi | 5.5 | 2.9 | 1/2 | 857ms |
| #41 | Qwen3.5-27B none | Qwen | 4.5 | 4.9 | 0/2 | 815ms |
| #42 | Qwen3.5-35B-A3B none | Qwen | 5.0 | 4.7 | 1/2 | 809ms |
| #38 | Gemini 2.5 Flash none | Google | 9.0 | 5.2 | 1/2 | 672ms |
| #40 | Qwen3.5-122B-A10B none | Qwen | 4.5 | 5.0 | 0/2 | 585ms |
| #51 | Mercury 2 none | Inception | 5.5 | 3.4 | 1/2 | 551ms |
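
The response-time column mixes seconds (`92.5s`) and milliseconds (`923ms`), so the raw strings cannot be compared directly. The sketch below shows one way to normalize the values to seconds and reproduce the slowest-first ordering used in the table. The helper name `parse_duration` and the `rows` sample are illustrative assumptions, not part of AI BENCHY; the four sample values are copied from the table above.

```python
import re


def parse_duration(text: str) -> float:
    """Convert a string such as '92.5s' or '923ms' to seconds.

    Hypothetical helper: it only handles the two units seen in the table.
    """
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*(ms|s)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = float(match.group(1)), match.group(2)
    return value / 1000.0 if unit == "ms" else value


# A few (model, response time) pairs taken from the table.
rows = [
    ("Mercury 2 none", "551ms"),
    ("Kimi K2.5 medium", "92.5s"),
    ("Grok 4.1 Fast none", "923ms"),
    ("Gemini 3 Flash Preview medium", "6.10s"),
]

# Sort slowest first, matching the "Response time (average) ↓" ordering.
slowest_first = sorted(rows, key=lambda row: parse_duration(row[1]), reverse=True)

for model, raw in slowest_first:
    print(f"{model}: {parse_duration(raw):.3f}s")
```

Sorting on the parsed number rather than the raw string keeps sub-second entries such as `923ms` below entries such as `1.07s`.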