AI BENCHY category
Anti-AI Tricks leaderboard
See which AI models perform best on Anti-AI Tricks, which stay consistent, and where the biggest gaps are. Sorted by: average response time, descending (↓).
| Rank | Model | Company | Anti-AI Tricks score | Average score | Correct attempts | Response time (avg) |
|---|---|---|---|---|---|---|
| #23 | Seed-2.0-Mini medium | Bytedance Seed | 7.0 | 6.9 | 2/3 | 99.0s |
| #28 | Kimi K2.5 medium | Moonshot AI | 7.0 | 6.4 | 2/3 | 85.3s |
| #24 | Qwen3.5-Flash medium | Qwen | 10.0 | 6.9 | 3/3 | 71.4s |
| #8 | Gemini 3.1 Flash Lite Preview high | Google | 10.0 | 8.2 | 3/3 | 43.9s |
| #34 | GPT-5 Nano medium | OpenAI | 7.0 | 5.5 | 2/3 | 37.7s |
| #18 | DeepSeek V3.2 medium | DeepSeek | 7.0 | 7.3 | 2/3 | 33.4s |
| #43 | MiniMax M2.5 medium | Minimax | 9.3 | 4.7 | 2/3 | 32.4s |
| #52 | GLM 4.7 Flash medium | Z.ai | 4.0 | 3.1 | 1/3 | 27.1s |
| #14 | GLM 5 medium | Z.ai | 10.0 | 7.4 | 3/3 | 22.3s |
| #35 | Qwen3.5-35B-A3B medium | Qwen | 10.0 | 5.5 | 3/3 | 21.8s |
| #39 | gpt-oss-120b medium | OpenAI | 7.0 | 5.1 | 2/3 | 19.8s |
| #13 | Step 3.5 Flash medium | Stepfun | 10.0 | 7.4 | 3/3 | 18.5s |
| #21 | MiMo-V2-Flash medium | Xiaomi | 9.7 | 7.2 | 3/3 | 16.8s |
| #32 | GPT-5 Mini medium | OpenAI | 7.0 | 6.0 | 2/3 | 16.5s |
| #50 | Qwen3 Coder Next medium | Qwen | 1.3 | 3.5 | 0/3 | 15.3s |
| #27 | GPT-5.2 medium | OpenAI | 7.0 | 6.5 | 2/3 | 14.3s |
| #26 | Claude Opus 4.6 medium | Anthropic | 4.0 | 6.6 | 1/3 | 11.9s |
| #46 | Kimi K2.5 none | Moonshot AI | 2.7 | 4.1 | 0/3 | 11.4s |
| #4 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 10.0 | 8.3 | 3/3 | 10.4s |
| #7 | Qwen3.5-27B medium | Qwen | 10.0 | 8.2 | 3/3 | 9.69s |
| #2 | Gemini 3.1 Pro Preview medium | Google | 10.0 | 9.4 | 3/3 | 9.52s |
| #33 | DeepSeek V3.2 none | DeepSeek | 10.0 | 5.5 | 0/3 | 8.79s |
| #10 | Qwen3.5-122B-A10B medium | Qwen | 10.0 | 7.7 | 3/3 | 6.99s |
| #16 | Gemini 2.5 Flash medium | Google | 7.3 | 7.4 | 2/3 | 6.98s |
| #49 | GLM 4.7 Flash none | Z.ai | 10.0 | 3.9 | 0/3 | 6.59s |
| #30 | Grok 4.1 Fast medium | X AI | 10.0 | 6.2 | 3/3 | 5.65s |
| #1 | Gemini 3 Flash Preview medium | Google | 10.0 | 10.0 | 3/3 | 5.61s |
| #9 | GPT-5.4 medium | OpenAI | 10.0 | 8.0 | 3/3 | 5.02s |
| #11 | Claude Sonnet 4.6 medium | Anthropic | 7.0 | 7.7 | 2/3 | 4.95s |
| #25 | Claude Sonnet 4.6 none | Anthropic | 4.0 | 6.8 | 1/3 | 4.83s |
| #19 | GPT-5.3 Chat none | OpenAI | 7.3 | 7.3 | 2/3 | 4.72s |
| #3 | GPT-5.3-Codex medium | OpenAI | 10.0 | 8.4 | 3/3 | 4.69s |
| #48 | Qwen3 Coder Next none | Qwen | 2.3 | 4.0 | 0/3 | 4.39s |
| #15 | GPT-5.2 Chat none | OpenAI | 10.0 | 7.4 | 3/3 | 3.97s |
| #6 | Gemini 3 Pro Preview medium | Google | 10.0 | 8.2 | 3/3 | 3.75s |
| #45 | Trinity Large Preview none | Arcee AI | 10.0 | 4.2 | 0/3 | 3.59s |
| #5 | Gemini 3 Flash Preview low | Google | 10.0 | 8.2 | 3/3 | 3.50s |
| #31 | GLM 5 none | Z.ai | 4.0 | 6.0 | 1/3 | 3.39s |
| #29 | Qwen3.5 Plus 2026-02-15 none | Qwen | 4.0 | 6.2 | 1/3 | 2.74s |
| #12 | Gemini 3.1 Flash Lite Preview medium | Google | 9.0 | 7.5 | 2/3 | 2.53s |
| #17 | Gemini 3.1 Flash Lite Preview low | Google | 7.0 | 7.3 | 2/3 | 2.18s |
| #47 | GPT-4o-mini none | OpenAI | 4.0 | 4.0 | 1/3 | 1.83s |
| #42 | Qwen3.5-35B-A3B none | Qwen | 10.0 | 4.7 | 0/3 | 1.76s |
| #53 | Grok 4.1 Fast none | X AI | 1.3 | 2.9 | 0/3 | 1.73s |
| #37 | Qwen3.5-Flash none | Qwen | 2.3 | 5.2 | 0/3 | 1.62s |
| #20 | Gemini 3 Flash Preview none | Google | 7.0 | 7.2 | 2/3 | 1.59s |
| #44 | GPT-5.4 none | OpenAI | 10.0 | 4.5 | 0/3 | 1.41s |
| #54 | MiMo-V2-Flash none | Xiaomi | 10.0 | 2.9 | 0/3 | 1.36s |
| #36 | Mercury 2 medium | Inception | 7.3 | 5.3 | 2/3 | 1.30s |
| #22 | Gemini 3.1 Flash Lite Preview none | Google | 6.0 | 7.1 | 1/3 | 1.16s |
| #40 | Qwen3.5-122B-A10B none | Qwen | 4.0 | 5.0 | 1/3 | 927ms |
| #41 | Qwen3.5-27B none | Qwen | 4.0 | 4.9 | 1/3 | 796ms |
| #38 | Gemini 2.5 Flash none | Google | 10.0 | 5.2 | 0/3 | 668ms |
| #55 | LFM2-24B-A2B none | Liquid | 10.0 | 2.6 | 0/3 | 471ms |
| #51 | Mercury 2 none | Inception | 10.0 | 3.4 | 0/3 | 466ms |
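
The response-time column mixes seconds (`9.69s`) and milliseconds (`927ms`), so a plain text sort would misorder the rows. The sketch below shows one way to normalize those values to seconds before sorting. The helper name `to_seconds` and the `rows` sample are illustrative assumptions, not part of AI BENCHY; the sample values are copied from the table above.

```python
import re


def to_seconds(value: str) -> float:
    """Convert a response-time string such as '9.69s' or '927ms' to seconds.

    Hypothetical helper: it assumes only the two unit suffixes seen in the table.
    """
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*(ms|s)\s*", value)
    if match is None:
        raise ValueError(f"unrecognized response time: {value!r}")
    number, unit = float(match.group(1)), match.group(2)
    return number / 1000.0 if unit == "ms" else number


# A few (model, response time) pairs copied from the table above.
rows = [
    ("Mercury 2 none", "466ms"),
    ("Seed-2.0-Mini medium", "99.0s"),
    ("Qwen3.5-27B none", "796ms"),
    ("Gemini 3 Flash Preview none", "1.59s"),
]

# Sort slowest first, matching the table's descending order.
slowest_first = sorted(rows, key=lambda row: to_seconds(row[1]), reverse=True)

for model, response_time in slowest_first:
    print(f"{model}: {to_seconds(response_time):.3f} s")
```

Converting to a single unit first keeps the comparison numeric, so `796ms` correctly sorts below `1.59s`.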