AI BENCHY Category
Puzzle Solving Leaderboard
See which AI models perform best at Puzzle Solving, which stay consistent, and where the biggest gaps are. Sorted by: Response time (average) ↓.
| Rank | Model | Company | Puzzle Solving score | Average score | Correct tests | Response time (average) |
|---|---|---|---|---|---|---|
| #7 | Qwen3.5-27B medium | Qwen | 8.3 | 8.2 | 2/3 | 64.6s |
| #24 | Qwen3.5-Flash medium | Qwen | 4.0 | 6.9 | 1/3 | 56.7s |
| #8 | Gemini 3.1 Flash Lite Preview high | Google | 7.0 | 8.2 | 2/3 | 46.3s |
| #28 | Kimi K2.5 medium | Moonshot AI | 4.0 | 6.4 | 1/3 | 45.4s |
| #18 | DeepSeek V3.2 medium | DeepSeek | 7.0 | 7.3 | 2/3 | 36.9s |
| #4 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 10.0 | 8.3 | 3/3 | 34.6s |
| #35 | Qwen3.5-35B-A3B medium | Qwen | 4.0 | 5.5 | 1/3 | 31.6s |
| #23 | Seed-2.0-Mini medium | Bytedance Seed | 7.0 | 6.9 | 2/3 | 25.9s |
| #48 | Qwen3 Coder Next none | Qwen | 1.3 | 4.0 | 0/3 | 22.9s |
| #34 | GPT-5 Nano medium | OpenAI | 4.0 | 5.5 | 1/3 | 19.8s |
| #10 | Qwen3.5-122B-A10B medium | Qwen | 10.0 | 7.7 | 3/3 | 17.2s |
| #14 | GLM 5 medium | Z.ai | 10.0 | 7.4 | 3/3 | 15.6s |
| #32 | GPT-5 Mini medium | OpenAI | 4.3 | 6.0 | 1/3 | 14.1s |
| #52 | GLM 4.7 Flash medium | Z.ai | 10.0 | 3.1 | 0/3 | 12.9s |
| #39 | gpt-oss-120b medium | OpenAI | 1.7 | 5.1 | 0/3 | 11.8s |
| #43 | MiniMax M2.5 medium | Minimax | 4.0 | 4.7 | 1/3 | 11.5s |
| #9 | GPT-5.4 medium | OpenAI | 7.0 | 8.0 | 2/3 | 9.13s |
| #30 | Grok 4.1 Fast medium | X AI | 4.0 | 6.2 | 1/3 | 8.08s |
| #13 | Step 3.5 Flash medium | Stepfun | 4.0 | 7.4 | 1/3 | 7.72s |
| #33 | DeepSeek V3.2 none | DeepSeek | 7.7 | 5.5 | 2/3 | 7.37s |
| #2 | Gemini 3.1 Pro Preview medium | Google | 10.0 | 9.4 | 3/3 | 7.15s |
| #5 | Gemini 3 Flash Preview low | Google | 10.0 | 8.2 | 3/3 | 6.11s |
| #37 | Qwen3.5-Flash none | Qwen | 1.3 | 5.2 | 0/3 | 5.90s |
| #27 | GPT-5.2 medium | OpenAI | 7.0 | 6.5 | 2/3 | 5.47s |
| #3 | GPT-5.3-Codex medium | OpenAI | 9.3 | 8.4 | 2/3 | 5.12s |
| #11 | Claude Sonnet 4.6 medium | Anthropic | 10.0 | 7.7 | 3/3 | 4.80s |
| #46 | Kimi K2.5 none | Moonshot AI | 10.0 | 4.1 | 0/3 | 4.73s |
| #26 | Claude Opus 4.6 medium | Anthropic | 7.0 | 6.6 | 2/3 | 4.60s |
| #1 | Gemini 3 Flash Preview medium | Google | 10.0 | 10.0 | 3/3 | 4.43s |
| #15 | GPT-5.2 Chat none | OpenAI | 7.0 | 7.4 | 2/3 | 4.42s |
| #16 | Gemini 2.5 Flash medium | Google | 7.0 | 7.4 | 2/3 | 3.94s |
| #6 | Gemini 3 Pro Preview medium | Google | 10.0 | 8.2 | 3/3 | 3.91s |
| #21 | MiMo-V2-Flash medium | Xiaomi | 7.0 | 7.2 | 2/3 | 3.77s |
| #12 | Gemini 3.1 Flash Lite Preview medium | Google | 7.0 | 7.5 | 2/3 | 3.58s |
| #45 | Trinity Large Preview none | Arcee AI | 4.0 | 4.2 | 1/3 | 3.30s |
| #19 | GPT-5.3 Chat none | OpenAI | 10.0 | 7.3 | 3/3 | 2.93s |
| #25 | Claude Sonnet 4.6 none | Anthropic | 7.0 | 6.8 | 2/3 | 2.92s |
| #29 | Qwen3.5 Plus 2026-02-15 none | Qwen | 7.0 | 6.2 | 2/3 | 2.82s |
| #17 | Gemini 3.1 Flash Lite Preview low | Google | 10.0 | 7.3 | 3/3 | 2.76s |
| #50 | Qwen3 Coder Next medium | Qwen | 10.0 | 3.5 | 0/3 | 2.30s |
| #31 | GLM 5 none | Z.ai | 7.0 | 6.0 | 2/3 | 2.05s |
| #55 | LFM2-24B-A2B none | Liquid | 3.3 | 2.6 | 0/3 | 1.69s |
| #44 | GPT-5.4 none | OpenAI | 4.0 | 4.5 | 1/3 | 1.52s |
| #54 | MiMo-V2-Flash none | Xiaomi | 10.0 | 2.9 | 0/3 | 1.38s |
| #41 | Qwen3.5-27B none | Qwen | 6.3 | 4.9 | 1/3 | 1.37s |
| #42 | Qwen3.5-35B-A3B none | Qwen | 1.7 | 4.7 | 0/3 | 1.34s |
| #47 | GPT-4o-mini none | OpenAI | 2.3 | 4.0 | 0/3 | 1.30s |
| #53 | Grok 4.1 Fast none | X AI | 1.3 | 2.9 | 0/3 | 1.28s |
| #20 | Gemini 3 Flash Preview none | Google | 7.0 | 7.2 | 2/3 | 1.06s |
| #49 | GLM 4.7 Flash none | Z.ai | 3.7 | 3.9 | 0/3 | 1.00s |
| #40 | Qwen3.5-122B-A10B none | Qwen | 4.0 | 5.0 | 1/3 | 982ms |
| #22 | Gemini 3.1 Flash Lite Preview none | Google | 10.0 | 7.1 | 3/3 | 972ms |
| #36 | Mercury 2 medium | Inception | 1.7 | 5.3 | 0/3 | 934ms |
| #38 | Gemini 2.5 Flash none | Google | 4.7 | 5.2 | 1/3 | 576ms |
| #51 | Mercury 2 none | Inception | 10.0 | 3.4 | 0/3 | 533ms |
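
The response-time column mixes seconds (`64.6s`) and milliseconds (`982ms`), so sorting it as plain text would give the wrong order. The following is a minimal sketch of how the slowest-first ordering could be reproduced, assuming only those two unit suffixes occur. The helper names `parse_duration` and `sort_slowest_first` are hypothetical and not part of AI BENCHY; the sample rows are copied from the table above.

```python
import re


def parse_duration(text: str) -> float:
    """Convert a duration such as '64.6s' or '982ms' to seconds."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*(ms|s)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = float(match.group(1)), match.group(2)
    # Milliseconds are scaled down so every value shares one unit.
    return value / 1000.0 if unit == "ms" else value


def sort_slowest_first(rows):
    """Order (model, response_time) pairs by response time, descending."""
    return sorted(rows, key=lambda row: parse_duration(row[1]), reverse=True)


# A few (model, response time) pairs taken from the table above.
rows = [
    ("Mercury 2 none", "533ms"),
    ("Qwen3.5-27B medium", "64.6s"),
    ("GLM 4.7 Flash none", "1.00s"),
    ("Qwen3.5-122B-A10B none", "982ms"),
]

ranked = sort_slowest_first(rows)
for model, response_time in ranked:
    print(f"{model}: {response_time}")
```

Converting to a single unit before comparing keeps `1.00s` ahead of `982ms`, which matches the order shown in the table.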