AI BENCHY Category
Tool Calling Leaderboard
See which AI models perform best at tool calling, which stay consistent, and where the biggest gaps are. Sorted by: Response time (average) ↓.
Correlated failure reasons
| Rank | Model | Company | Tool calling score | Average score | Correct trials | Response time (avg) |
|---|---|---|---|---|---|---|
| #23 | Seed-2.0-Mini medium | Bytedance Seed | 10.0 | 6.9 | 1/1 | 88.7s |
| #18 | DeepSeek V3.2 medium | DeepSeek | 10.0 | 7.3 | 1/1 | 34.8s |
| #34 | GPT-5 Nano medium | OpenAI | 10.0 | 5.5 | 1/1 | 33.3s |
| #28 | Kimi K2.5 medium | Moonshot AI | 10.0 | 6.4 | 1/1 | 31.7s |
| #21 | MiMo-V2-Flash medium | Xiaomi | 10.0 | 7.2 | 1/1 | 27.8s |
| #30 | Grok 4.1 Fast medium | X AI | 10.0 | 6.2 | 0/1 | 27.7s |
| #2 | Gemini 3.1 Pro Preview medium | Google | 10.0 | 9.4 | 1/1 | 23.1s |
| #32 | GPT-5 Mini medium | OpenAI | 10.0 | 6.0 | 1/1 | 18.6s |
| #52 | GLM 4.7 Flash medium | Z.ai | 10.0 | 3.1 | 1/1 | 15.9s |
| #14 | GLM 5 medium | Z.ai | 10.0 | 7.4 | 1/1 | 15.9s |
| #43 | MiniMax M2.5 medium | Minimax | 10.0 | 4.7 | 1/1 | 15.4s |
| #46 | Kimi K2.5 none | Moonshot AI | 10.0 | 4.1 | 1/1 | 14.0s |
| #9 | GPT-5.4 medium | OpenAI | 10.0 | 8.0 | 1/1 | 13.3s |
| #6 | Gemini 3 Pro Preview medium | Google | 10.0 | 8.2 | 1/1 | 12.0s |
| #13 | Step 3.5 Flash medium | Stepfun | 10.0 | 7.4 | 1/1 | 11.9s |
| #33 | DeepSeek V3.2 none | DeepSeek | 10.0 | 5.5 | 1/1 | 11.8s |
| #31 | GLM 5 none | Z.ai | 10.0 | 6.0 | 1/1 | 11.1s |
| #1 | Gemini 3 Flash Preview medium | Google | 10.0 | 10.0 | 1/1 | 10.6s |
| #24 | Qwen3.5-Flash medium | Qwen | 10.0 | 6.9 | 1/1 | 10.3s |
| #27 | GPT-5.2 medium | OpenAI | 10.0 | 6.5 | 0/1 | 10.3s |
| #26 | Claude Opus 4.6 medium | Anthropic | 10.0 | 6.6 | 1/1 | 9.73s |
| #17 | Gemini 3.1 Flash Lite Preview low | Google | 10.0 | 7.3 | 1/1 | 9.54s |
| #19 | GPT-5.3 Chat none | OpenAI | 10.0 | 7.3 | 1/1 | 8.36s |
| #8 | Gemini 3.1 Flash Lite Preview high | Google | 10.0 | 8.2 | 1/1 | 7.73s |
| #4 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 10.0 | 8.3 | 1/1 | 7.54s |
| #11 | Claude Sonnet 4.6 medium | Anthropic | 10.0 | 7.7 | 1/1 | 7.48s |
| #7 | Qwen3.5-27B medium | Qwen | 10.0 | 8.2 | 1/1 | 7.45s |
| #49 | GLM 4.7 Flash none | Z.ai | 10.0 | 3.9 | 0/1 | 7.05s |
| #39 | gpt-oss-120b medium | OpenAI | 9.0 | 5.1 | 1/1 | 6.91s |
| #45 | Trinity Large Preview none | Arcee AI | 10.0 | 4.2 | 1/1 | 6.67s |
| #3 | GPT-5.3-Codex medium | OpenAI | 10.0 | 8.4 | 1/1 | 6.37s |
| #16 | Gemini 2.5 Flash medium | Google | 10.0 | 7.4 | 1/1 | 6.20s |
| #53 | Grok 4.1 Fast none | X AI | 10.0 | 2.9 | 0/1 | 5.51s |
| #5 | Gemini 3 Flash Preview low | Google | 10.0 | 8.2 | 1/1 | 4.99s |
| #15 | GPT-5.2 Chat none | OpenAI | 10.0 | 7.4 | 1/1 | 4.68s |
| #35 | Qwen3.5-35B-A3B medium | Qwen | 10.0 | 5.5 | 1/1 | 4.65s |
| #10 | Qwen3.5-122B-A10B medium | Qwen | 10.0 | 7.7 | 1/1 | 4.60s |
| #25 | Claude Sonnet 4.6 none | Anthropic | 10.0 | 6.8 | 1/1 | 4.11s |
| #12 | Gemini 3.1 Flash Lite Preview medium | Google | 10.0 | 7.5 | 1/1 | 3.80s |
| #37 | Qwen3.5-Flash none | Qwen | 10.0 | 5.2 | 1/1 | 3.67s |
| #41 | Qwen3.5-27B none | Qwen | 10.0 | 4.9 | 1/1 | 3.54s |
| #22 | Gemini 3.1 Flash Lite Preview none | Google | 10.0 | 7.1 | 1/1 | 3.39s |
| #20 | Gemini 3 Flash Preview none | Google | 10.0 | 7.2 | 1/1 | 3.35s |
| #29 | Qwen3.5 Plus 2026-02-15 none | Qwen | 10.0 | 6.2 | 1/1 | 3.33s |
| #44 | GPT-5.4 none | OpenAI | 10.0 | 4.5 | 1/1 | 2.75s |
| #50 | Qwen3 Coder Next medium | Qwen | 10.0 | 3.5 | 1/1 | 2.64s |
| #47 | GPT-4o-mini none | OpenAI | 10.0 | 4.0 | 1/1 | 2.51s |
| #48 | Qwen3 Coder Next none | Qwen | 10.0 | 4.0 | 1/1 | 2.47s |
| #42 | Qwen3.5-35B-A3B none | Qwen | 10.0 | 4.7 | 1/1 | 2.30s |
| #54 | MiMo-V2-Flash none | Xiaomi | 10.0 | 2.9 | 1/1 | 2.28s |
| #40 | Qwen3.5-122B-A10B none | Qwen | 10.0 | 5.0 | 1/1 | 2.04s |
| #38 | Gemini 2.5 Flash none | Google | 10.0 | 5.2 | 1/1 | 1.91s |
| #36 | Mercury 2 medium | Inception | 10.0 | 5.3 | 1/1 | 1.89s |
| #51 | Mercury 2 none | Inception | 10.0 | 3.4 | 1/1 | 1.27s |
| #55 | LFM2-24B-A2B none | Liquid | 10.0 | 2.6 | 0/1 | 0ms |